## Transform and tokenize text

Function(s) | Description
---|---
`unnest_tokens()` | Split a column into tokens
`unnest_characters()` `unnest_character_shingles()` | Wrappers around `unnest_tokens()` for characters and character shingles
`unnest_ngrams()` | Wrapper around `unnest_tokens()` for n-grams
`unnest_sentences()` `unnest_lines()` `unnest_paragraphs()` | Wrappers around `unnest_tokens()` for sentences, lines, and paragraphs
`unnest_regex()` | Wrapper around `unnest_tokens()` for regular expressions
`unnest_ptb()` | Wrapper around `unnest_tokens()` for the Penn Treebank tokenizer
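A minimal sketch of the tokenizing workflow, using a small hypothetical data frame rather than a real corpus:

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per line of text
d <- tibble(line = 1:2,
            text = c("The quick brown fox", "jumped over the lazy dog"))

# One row per word; unnest_tokens() lowercases and strips punctuation by default
d %>% unnest_tokens(word, text)

# The same idea via a convenience wrapper: one row per bigram
d %>% unnest_ngrams(bigram, text, n = 2)
```

The wrappers simply preset the `token` argument of `unnest_tokens()`, so `unnest_ngrams(d, bigram, text, n = 2)` and `unnest_tokens(d, bigram, text, token = "ngrams", n = 2)` are equivalent.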
|
## Term frequency and inverse document frequency

Function(s) | Description
---|---
`bind_tf_idf()` | Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset
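A sketch of how `bind_tf_idf()` is called, on hypothetical per-document word counts (the function expects one row per term per document plus a count column):

```r
library(dplyr)
library(tidytext)

# Hypothetical counts, one row per (document, word) pair
word_counts <- tibble(
  document = c("a", "a", "b"),
  word     = c("apple", "pear", "apple"),
  n        = c(10, 5, 7)
)

# Appends tf, idf, and tf_idf columns to the input data frame
word_counts %>% bind_tf_idf(word, document, n)
```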
|
## Tidy

Function(s) | Description
---|---
`tidy(<corpus>)` `glance(<corpus>)` | Tidiers for a corpus object from the quanteda package
`tidy(<dictionary2>)` | Tidy dictionary objects from the quanteda package
`tidy(<LDA>)` `tidy(<CTM>)` `augment(<LDA>)` `glance(<LDA>)` | Tidiers for LDA and CTM objects from the topicmodels package
`tidy(<jobjRef>)` `augment(<jobjRef>)` | Tidiers for Latent Dirichlet Allocation models from the mallet package
`tidy(<STM>)` `augment(<STM>)` `glance(<STM>)` | Tidiers for Structural Topic Models from the stm package
`tidy(<DocumentTermMatrix>)` `tidy(<TermDocumentMatrix>)` | Tidy DocumentTermMatrix, TermDocumentMatrix, and related objects from the tm package
`tidy(<Corpus>)` | Tidy a Corpus object from the tm package
`tidy_triplet()` | Utility function to tidy a simple triplet matrix
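As an illustration of the tidiers, a DocumentTermMatrix can be converted to one-row-per-count tidy form; this sketch builds a small DTM from tm's bundled `crude` corpus:

```r
library(tm)       # supplies the example "crude" corpus and DocumentTermMatrix()
library(tidytext)

data("crude")
dtm <- DocumentTermMatrix(crude)

# tidy() returns only the non-zero entries, one row per document-term
# count, with columns `document`, `term`, and `count`
tidy(dtm)
```

The same `tidy()` verb dispatches on class, so the model tidiers above (LDA, STM, mallet) are called the same way on their respective fitted objects.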
|
## Cast

Function(s) | Description
---|---
`cast_tdm()` `cast_dtm()` `cast_dfm()` | Casting a data frame to a DocumentTermMatrix, TermDocumentMatrix, or dfm
`cast_sparse()` | Create a sparse matrix from row names, column names, and values in a table
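A sketch of the reverse direction, casting hypothetical tidy counts back into matrix form:

```r
library(dplyr)
library(tidytext)

# Hypothetical tidy counts, one row per (document, word) pair
counts <- tibble(
  document = c("a", "a", "b"),
  word     = c("apple", "pear", "apple"),
  n        = c(10, 5, 7)
)

# Back to a tm DocumentTermMatrix (documents in rows, terms in columns)
counts %>% cast_dtm(document, word, n)

# Or to a bare sparse Matrix, using the same three columns
m <- counts %>% cast_sparse(document, word, n)
dim(m)  # 2 documents x 2 terms
```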
|
## Supporting data sets

Function(s) | Description
---|---
`get_sentiments()` | Get a tidy data frame of a single sentiment lexicon
`get_stopwords()` | Get a tidy data frame of a single stopword lexicon
`sentiments` | Sentiment lexicon from Bing Liu and collaborators
`stop_words` | Various lexicons for English stop words
`nma_words` | English negators, modals, and adverbs
`parts_of_speech` | Parts of speech for English words from the Moby Project
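A sketch of pulling these lexicons; the "bing" lexicon ships with tidytext, while others (e.g. "afinn", "nrc") are downloaded via the textdata package on first use:

```r
library(dplyr)
library(tidytext)

# One row per word, with a positive/negative sentiment column
get_sentiments("bing") %>% head()

# Stop words from the default "snowball" source (via the stopwords package)
get_stopwords(language = "en", source = "snowball") %>% head()
```

A typical pattern is `anti_join(get_stopwords())` to drop stop words from a data frame produced by `unnest_tokens()`.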
|
## Graphing helpers

Function(s) | Description
---|---
`reorder_within()` `scale_x_reordered()` `scale_y_reordered()` | Reorder an x or y axis within facets
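A sketch of the faceted-reordering pattern, on hypothetical per-facet counts; the same words can rank differently in each facet, which a plain `reorder()` cannot express:

```r
library(ggplot2)
library(tidytext)

# Hypothetical counts: "a" ranks first in facet x, "b" in facet y
d <- data.frame(
  facet = rep(c("x", "y"), each = 3),
  word  = rep(c("a", "b", "c"), 2),
  n     = c(3, 1, 2, 1, 3, 2)
)

ggplot(d, aes(reorder_within(word, n, facet), n)) +
  geom_col() +
  scale_x_reordered() +            # strips the suffix reorder_within() appends
  facet_wrap(~ facet, scales = "free_x") +
  coord_flip()
```

`reorder_within()` works by appending a per-facet suffix to each level, so the paired `scale_x_reordered()` (or `scale_y_reordered()`) is needed to restore readable axis labels.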
## Package info

Function(s) | Description
---|---
`tidytext` | tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools