Tidy LDA models fit by the mallet package, which wraps the Mallet topic modeling package in Java. The arguments and return values are similar to lda_tidiers.

Usage

# S3 method for jobjRef
tidy(
  x,
  matrix = c("beta", "gamma"),
  log = FALSE,
  normalized = TRUE,
  smoothed = TRUE,
  ...
)

# S3 method for jobjRef
augment(x, data, ...)



Arguments

x
A jobjRef object of type RTopicModel, such as one created by MalletLDA.

matrix
Whether to tidy the beta (per-term-per-topic, default) or gamma (per-document-per-topic) matrix.

log
Whether beta/gamma should be on a log scale; default FALSE.

normalized
If TRUE (default), normalize so that each document or word sums to one across the topics. If FALSE (and smoothed is also FALSE), values are integers representing the actual number of word-topic or document-topic assignments.

smoothed
If TRUE (default), add the smoothing parameter to each value so that none is zero. This smoothing parameter is initialized as alpha.sum in MalletLDA.

...
Extra arguments, not used.

data
For augment, the data given to the LDA function, either as a DocumentTermMatrix or as a tidied table with "document" and "term" columns.
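The smoothed, normalized, and log arguments can be pictured as successive transformations of a matrix of assignment counts. The following is a conceptual sketch in base R, not the code that mallet or this tidier actually runs: it assumes a hypothetical 2 x 3 matrix of document-topic assignment counts (the gamma case) and a placeholder smoothing value of 0.1 standing in for the model's own smoothing parameter.

```r
# Hypothetical counts of topic assignments: 2 documents (rows) x 3 topics (columns)
counts <- matrix(
  c(5, 0, 3,
    1, 4, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("topic1", "topic2", "topic3"))
)

# Placeholder smoothing value (assumption); a fitted model supplies its own parameter
smoothing <- 0.1

# smoothed = TRUE: add the smoothing value so that no entry is zero
smoothed_counts <- counts + smoothing

# normalized = TRUE: divide each row by its total so each document sums to one
# across the topics (R recycles the row sums down each column)
proportions <- smoothed_counts / rowSums(smoothed_counts)

# log = TRUE: report the same values on a natural-log scale
log_proportions <- log(proportions)

print(proportions)
print(rowSums(proportions))
```

With smoothed = FALSE and normalized = FALSE the values would simply be the integer entries of counts; without smoothing, the zero counts in this toy matrix would also become -Inf under log().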


Value

augment must be given a data argument with one row per original document-term pair, such as the output of tdm_tidiers, containing the columns document and term. It returns that same data with an additional column, .topic, giving the topic assigned to each document-term combination.


Details

Note that the LDA models from MalletLDA are technically a special case of S4 objects with class jobjRef. These tidiers are therefore implemented as jobjRef methods, which check that the object's toString output is as expected.

See also

lda_tidiers

Examples

if (FALSE) {
  library(mallet)
  library(dplyr)
  library(ggplot2)
  library(tidytext)

  data("AssociatedPress", package = "topicmodels")
  td <- tidy(AssociatedPress)

  # mallet needs a file with stop words
  tmp <- tempfile()
  writeLines(stop_words$word, tmp)

  # two vectors: one with document IDs, one with text
  docs <- td %>%
    group_by(document = as.character(document)) %>%
    summarize(text = paste(rep(term, count), collapse = " "))

  docs <- mallet.import(docs$document, docs$text, tmp)

  # create and run a topic model
  topic_model <- MalletLDA(num.topics = 4)
  topic_model$loadDocuments(docs)
  topic_model$train(20)

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # examine the four topics
  td_beta %>%
    group_by(topic) %>%
    top_n(8, beta) %>%
    ungroup() %>%
    mutate(term = reorder(term, beta)) %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # find the assignments of each word in each document
  assignments <- augment(topic_model, td)
  assignments
}