Transform and tokenize text

unnest_tokens()

Split a column into tokens

unnest_characters() unnest_character_shingles()

Wrapper around unnest_tokens for characters and character shingles

unnest_ngrams() unnest_skip_ngrams()

Wrapper around unnest_tokens for n-grams

unnest_sentences() unnest_lines() unnest_paragraphs()

Wrapper around unnest_tokens for sentences, lines, and paragraphs

unnest_regex()

Wrapper around unnest_tokens for regular expressions

unnest_ptb()

Wrapper around unnest_tokens for the Penn Treebank tokenizer
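A minimal sketch of the tokenizing functions above, using a two-document corpus invented for illustration. unnest_tokens() takes the output column name first and then the input text column. By default it splits into words, lowercases them, and drops punctuation. unnest_ngrams() follows the same calling pattern and adds an n argument.

```r
library(dplyr)
library(tidytext)

# Toy corpus (invented): one row per document
docs <- tibble(
  doc  = c(1, 2),
  text = c("The quick brown fox.", "Jumps over the lazy dog!")
)

# One row per word; the `doc` column is kept alongside each token
words <- docs %>%
  unnest_tokens(word, text)

# One row per bigram, using the n-gram wrapper
bigrams <- docs %>%
  unnest_ngrams(bigram, text, n = 2)
```

The same pattern applies to the other wrappers (unnest_sentences(), unnest_regex(), and so on). Only the tokenizer-specific arguments change.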

Term frequency and inverse document frequency

bind_tf_idf()

Add term frequency (tf), inverse document frequency (idf), and tf-idf columns to a tidy text dataset
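A small sketch of bind_tf_idf() on hypothetical counts with one row per document-term pair. Within each document, tf is a term's count divided by that document's total count. idf is the natural log of the number of documents divided by the number of documents that contain the term. A term such as "cat", which appears in every document, therefore gets an idf and tf-idf of 0.

```r
library(dplyr)
library(tidytext)

# Hypothetical counts: one row per document-term pair
counts <- tibble(
  document = c("a", "a", "b", "b"),
  term     = c("cat", "dog", "cat", "fish"),
  n        = c(3, 1, 2, 2)
)

# Adds tf, idf, and tf_idf columns; row order is preserved
tfidf <- counts %>%
  bind_tf_idf(term, document, n)
```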

Tidy

tidy(<corpus>) glance(<corpus>)

Tidiers for a corpus object from the quanteda package

tidy(<dictionary2>)

Tidy dictionary objects from the quanteda package

tidy(<LDA>) tidy(<CTM>) augment(<LDA>) augment(<CTM>) glance(<LDA>) glance(<CTM>)

Tidiers for LDA and CTM objects from the topicmodels package

tidy(<jobjRef>) augment(<jobjRef>)

Tidiers for Latent Dirichlet Allocation models from the mallet package

tidy(<STM>) tidy(<estimateEffect>) glance(<estimateEffect>) augment(<STM>) glance(<STM>)

Tidiers for Structural Topic Models from the stm package

tidy(<DocumentTermMatrix>) tidy(<TermDocumentMatrix>) tidy(<dfm>) tidy(<dfmSparse>) tidy(<simple_triplet_matrix>)

Tidy DocumentTermMatrix and TermDocumentMatrix objects from the tm package, and related sparse matrices (dfm objects from quanteda, simple_triplet_matrix objects from slam)

tidy(<Corpus>)

Tidy a Corpus object from the tm package

tidy_triplet()

Utility function to tidy a simple triplet matrix
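One way to see a tidier in action without an external corpus is a round trip: build a DocumentTermMatrix from a tidy table with cast_dtm() (listed under Cast), then convert it back with tidy(). This sketch assumes the tm package is installed, and the counts are invented. The tidied result has one row per nonzero cell, in document, term, and count columns. The row order is not guaranteed to match the input.

```r
library(dplyr)
library(tm)        # assumption: tm is installed (provides DocumentTermMatrix)
library(tidytext)

# Hypothetical counts in tidy form
counts <- tibble(
  document = c("a", "a", "b"),
  term     = c("cat", "dog", "cat"),
  n        = c(3, 1, 2)
)

# Tidy table -> tm DocumentTermMatrix -> tidy table again
dtm  <- cast_dtm(counts, document, term, n)
back <- tidy(dtm)
```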

Cast

cast_tdm() cast_dtm() cast_dfm()

Cast a data frame to a DocumentTermMatrix, TermDocumentMatrix, or dfm

cast_sparse()

Create a sparse matrix from row names, column names, and values in a table
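A minimal cast_sparse() sketch on invented counts. The first column argument supplies the row names, the second supplies the column names, and the third supplies the cell values. Pairs that never occur are stored as implicit zeros in the resulting Matrix object. The exact column order is not relied on here.

```r
library(dplyr)
library(Matrix)    # sparse matrix class and methods used by cast_sparse()
library(tidytext)

# Hypothetical counts: one row per document-term pair
counts <- tibble(
  document = c("a", "a", "b", "b"),
  term     = c("cat", "dog", "cat", "fish"),
  n        = c(3, 1, 2, 2)
)

# Rows from `document`, columns from `term`, values from `n`
m <- cast_sparse(counts, document, term, n)
```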

Supporting data sets

get_sentiments()

Get a tidy data frame of a single sentiment lexicon

get_stopwords()

Get a tidy data frame of a single stopword lexicon

sentiments

Sentiment lexicon from Bing Liu and collaborators

stop_words

Various lexicons for English stop words

nma_words

English negators, modals, and adverbs

parts_of_speech

Parts of speech for English words from the Moby Project
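A sketch of how the bundled lexicons are typically joined against tokenized text. The token list is invented. anti_join() with stop_words drops common function words. inner_join() with get_sentiments("bing") keeps only words that the Bing lexicon labels as positive or negative. The Bing lexicon ships with the package, so this sketch assumes no download step is needed.

```r
library(dplyr)
library(tidytext)

# Hypothetical tokens, e.g. the output of unnest_tokens()
tokens <- tibble(word = c("the", "movie", "was", "terrible"))

# Remove stop words listed in any of the stop_words lexicons
content <- tokens %>%
  anti_join(stop_words, by = "word")

# Attach Bing sentiment labels; unmatched words are dropped
sentiment <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word")
```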

Graphing helpers

reorder_within() scale_x_reordered() scale_y_reordered() reorder_func()

Reorder an x or y axis within facets
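A sketch of the facet-wise reordering pattern, assuming ggplot2 is installed and using invented counts. reorder_within() pastes each value to its facet ("a___g1", and so on) and orders the resulting factor by the second argument. Each facet therefore gets its own ordering. scale_x_reordered() then strips the "___facet" suffix from the axis labels.

```r
library(dplyr)
library(ggplot2)   # assumption: ggplot2 is installed
library(tidytext)

# Hypothetical word counts in two facets
top_words <- tibble(
  facet = c("g1", "g1", "g2", "g2"),
  word  = c("a", "b", "a", "b"),
  n     = c(3, 1, 2, 4)
)

# Factor levels like "b___g1", ordered by n within each facet
top_words <- top_words %>%
  mutate(word_in_facet = reorder_within(word, n, facet))

# Free x scales so each facet shows only its own words
p <- ggplot(top_words, aes(word_in_facet, n)) +
  geom_col() +
  scale_x_reordered() +
  facet_wrap(~ facet, scales = "free_x")
```

Here "b" comes before "a" in facet g1, and "a" comes before "b" in facet g2. That per-facet difference is what plain reorder() cannot express.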

Package info

tidytext tidytext-package

tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools