Tidy a corpus object from the quanteda package. tidy returns a tbl_df with one row per document, with a text column containing the document's text and one column for each document-level metadata field. glance returns a one-row tbl_df with corpus-level metadata, such as source and created. For Corpus objects from the tm package, see tidy.Corpus().
A Corpus object, such as a VCorpus or PCorpus
Extra arguments, not used
For the most part, the tidy output is equivalent to the "documents" data frame in the corpus object, except that it is converted to a tbl_df and the texts column is renamed to text, to be consistent with other uses in tidytext.
Similarly, the glance output is simply the "metadata" object, with NULL fields removed, turned into a one-row tbl_df.
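The two transformations described above can be sketched in base R. This is an illustration only, not tidytext's implementation: docs and meta are hypothetical stand-ins for a corpus's "documents" data frame and "metadata" object, and the final conversion to a tbl_df is left out so the sketch has no package dependencies.

```r
# Hypothetical stand-in for the "documents" data frame of a corpus
docs <- data.frame(
  texts = c("First document.", "Second document."),
  Year = c(1789L, 1793L),
  stringsAsFactors = FALSE
)

# tidy-like step: rename the texts column to text
tidy_like <- docs
names(tidy_like)[names(tidy_like) == "texts"] <- "text"

# Hypothetical stand-in for the corpus-level "metadata" object
meta <- list(source = "example source", created = "2020-01-01", notes = NULL)

# glance-like step: drop NULL fields, then build a one-row data frame
meta_kept <- Filter(Negate(is.null), meta)
glance_like <- as.data.frame(meta_kept, stringsAsFactors = FALSE)
```

Here tidy_like keeps one row per document with the columns text and Year, and glance_like has a single row holding only the non-NULL fields source and created.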
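if (requireNamespace("quanteda", quietly = TRUE)) {
  # Load the inaugural address corpus that ships with quanteda
  data("data_corpus_inaugural", package = "quanteda")
  data_corpus_inaugural
  # One row per document: text plus document-level metadata
  tidy(data_corpus_inaugural)
}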
#> # A tibble: 59 × 5
#>    text                                               Year Presi…¹ First…² Party
#>    <chr>                                             <int> <chr>   <chr>   <fct>
#>  1 "Fellow-Citizens of the Senate and of the House …  1789 Washin… George  none
#>  2 "Fellow citizens, I am again called upon by the …  1793 Washin… George  none
#>  3 "When it was first perceived, in early times, th…  1797 Adams   John    Fede…
#>  4 "Friends and Fellow Citizens:\n\nCalled upon to …  1801 Jeffer… Thomas  Demo…
#>  5 "Proceeding, fellow citizens, to that qualificat…  1805 Jeffer… Thomas  Demo…
#>  6 "Unwilling to depart from examples of the most r…  1809 Madison James   Demo…
#>  7 "About to add the solemnity of an oath to the ob…  1813 Madison James   Demo…
#>  8 "I should be destitute of feeling if I was not d…  1817 Monroe  James   Demo…
#>  9 "Fellow citizens, I shall not attempt to describ…  1821 Monroe  James   Demo…
#> 10 "In compliance with an usage coeval with the exi…  1825 Adams   John Q… Demo…
#> # … with 49 more rows, and abbreviated variable names ¹President, ²FirstName