Tidy a corpus object from the quanteda package. tidy returns a tbl_df with one row per document, with a text column containing the document's text and one column for each document-level metadata field. glance returns a one-row tbl_df with corpus-level metadata, such as source and created. For Corpus objects from the tm package, see tidy.Corpus.

# S3 method for corpus
tidy(x, ...)

# S3 method for corpus
glance(x, ...)

Arguments

x

A corpus object from the quanteda package

...

Extra arguments, not used

Details

For the most part, the tidy output is equivalent to the "documents" data frame in the corpus object, except that it is converted to a tbl_df and the texts column is renamed to text, to be consistent with other uses in tidytext.

Similarly, the glance output is simply the "metadata" object, with NULL fields removed and turned into a one-row tbl_df.
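
The two transformations described above can be sketched in base R. This is only an illustration under stated assumptions: mock_corpus is a hypothetical plain list standing in for a corpus object, and tidy_sketch and glance_sketch are made-up helper names, not functions from tidytext or quanteda. The sketch returns ordinary data frames, whereas the real methods return a tbl_df.

```r
# Hypothetical stand-in for a corpus object: a plain list holding a
# "documents" data frame and a "metadata" list. This is an assumption
# made for illustration, not quanteda's actual internal structure.
mock_corpus <- list(
  documents = data.frame(
    texts = c("First document.", "Second document."),
    Year = c(1789L, 1793L),
    stringsAsFactors = FALSE
  ),
  metadata = list(source = "example source", created = "2020-01-01", notes = NULL)
)

# tidy-like step: take the documents data frame and rename texts to text
tidy_sketch <- function(x) {
  docs <- x$documents
  names(docs)[names(docs) == "texts"] <- "text"
  docs
}

# glance-like step: drop NULL fields, then build a one-row data frame
glance_sketch <- function(x) {
  meta <- Filter(Negate(is.null), x$metadata)
  as.data.frame(meta, stringsAsFactors = FALSE)
}

tidied <- tidy_sketch(mock_corpus)
glanced <- glance_sketch(mock_corpus)
```

Here tidied keeps one row per document with the metadata columns alongside text, and glanced has a single row whose columns are the non-NULL metadata fields.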

Examples

if (requireNamespace("quanteda", quietly = TRUE)) {
  data("data_corpus_inaugural", package = "quanteda")

  data_corpus_inaugural

  tidy(data_corpus_inaugural)
}
#> # A tibble: 58 x 5
#>    text                                   Year President  FirstName Party
#>    <chr>                                 <int> <chr>      <chr>     <fct>
#>  1 "Fellow-Citizens of the Senate and o…  1789 Washington George    none
#>  2 "Fellow citizens, I am again called …  1793 Washington George    none
#>  3 "When it was first perceived, in ear…  1797 Adams      John      Federalist
#>  4 "Friends and Fellow Citizens:\n\nCal…  1801 Jefferson  Thomas    Democratic-…
#>  5 "Proceeding, fellow citizens, to tha…  1805 Jefferson  Thomas    Democratic-…
#>  6 "Unwilling to depart from examples o…  1809 Madison    James     Democratic-…
#>  7 "About to add the solemnity of an oa…  1813 Madison    James     Democratic-…
#>  8 "I should be destitute of feeling if…  1817 Monroe     James     Democratic-…
#>  9 "Fellow citizens, I shall not attemp…  1821 Monroe     James     Democratic-…
#> 10 "In compliance with an usage coeval …  1825 Adams      John Qui… Democratic-…
#> # … with 48 more rows