Tidy topic models fit by the stm package. The arguments and return values are similar to those of lda_tidiers.

# S3 method for STM
tidy(
  x,
  matrix = c("beta", "gamma", "theta"),
  log = FALSE,
  document_names = NULL,
  ...
)

# S3 method for estimateEffect
tidy(x, ...)

# S3 method for estimateEffect
glance(x, ...)

# S3 method for STM
augment(x, data, ...)

# S3 method for STM
glance(x, ...)

Arguments

x

An STM fitted model object from either stm or estimateEffect from the stm package.

matrix

Whether to tidy the beta (per-term-per-topic, default) or gamma/theta (per-document-per-topic) matrix. The stm package calls this the theta matrix, but other topic modeling packages call this gamma.

log

Whether beta/gamma/theta should be on a log scale, default FALSE

document_names

Optional vector of document names for use with per-document-per-topic tidying

...

Extra arguments, not used

data

For augment, the data given to the stm function, either as a dfm from quanteda or as a tidied table with "document" and "term" columns

Value

tidy returns a tidied version of either the beta or gamma matrix if called on an object from stm, or a tidied version of the estimated regressions if called on an object from estimateEffect.

For an object from estimateEffect, glance returns a one-row table, with columns

k

Number of topics in the model

docs

Number of documents in the model

uncertainty

Uncertainty measure

augment must be provided a data argument, either a dfm from quanteda or a table containing one row per original document-term pair, such as is returned by tdm_tidiers, containing columns document and term. It returns that same data as a table with an additional column .topic with the topic assignment for that document-term combination.
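As a rough illustration of the data argument described above, the sketch below builds a tidied table with document and term columns, casts the same table to a sparse matrix for stm, and then passes the table to augment. The object names (austen_counts, assignments) and the choice of K = 4 are assumptions made for this example only, and the code runs only if the stm and janeaustenr packages are installed.

```r
if (requireNamespace("stm", quietly = TRUE) &&
    requireNamespace("janeaustenr", quietly = TRUE)) {
  library(dplyr)
  library(tidytext)

  # One row per document-term pair, with columns named document and term
  # (austen_counts is a hypothetical name used only in this sketch)
  austen_counts <- janeaustenr::austen_books() %>%
    unnest_tokens(term, text) %>%
    anti_join(stop_words, by = c("term" = "word")) %>%
    count(document = as.character(book), term)

  # Fit the model on a sparse matrix built from that same table
  austen_sparse <- cast_sparse(austen_counts, document, term, n)
  topic_model <- stm::stm(austen_sparse, K = 4, verbose = FALSE,
                          init.type = "Spectral")

  # augment should return the input rows plus a .topic column
  assignments <- augment(topic_model, data = austen_counts)
  assignments
}
```

Building the sparse matrix and the data argument from one table keeps the document and term labels consistent between the model and the augmented output.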

For an STM model object, glance returns a one-row table, with columns

k

Number of topics in the model

docs

Number of documents in the model

terms

Number of terms in the model

iter

Number of iterations used

alpha

If an LDA model, the parameter of the Dirichlet distribution for topics over documents
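A minimal sketch of calling glance on an STM model follows. It assumes that the stm package is installed and that it ships the pre-fit example model gadarianFit; model_summary is a name chosen for this illustration.

```r
if (requireNamespace("stm", quietly = TRUE)) {
  library(tidytext)

  # Assumption: gadarianFit is the pre-fit STM example object shipped with stm
  data("gadarianFit", package = "stm")

  # One-row summary of the fitted model
  model_summary <- glance(gadarianFit)
  model_summary
}
```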

See also: lda_tidiers

If matrix == "beta" (default), tidy returns a table with one row per topic and term, with columns

topic

Topic, as an integer

term

Term

beta

Probability of a term generated from a topic according to the structural topic model

If matrix == "gamma", tidy returns a table with one row per topic and document, with columns

topic

Topic, as an integer

document

Document name (if given in vector of document_names) or ID as an integer

gamma

Probability of topic given document
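To show how a table of this shape is typically used, the sketch below picks the most probable topic for each document. The table td_gamma_toy is typed by hand for illustration and is not output from a fitted model; only its column names follow the description above.

```r
library(dplyr)

# Hypothetical gamma-shaped table, typed by hand for illustration
td_gamma_toy <- tibble::tibble(
  document = rep(c("doc_a", "doc_b"), each = 3),
  topic = rep(1:3, times = 2),
  gamma = c(0.7, 0.2, 0.1, 0.25, 0.15, 0.6)
)

# Topic proportions sum to 1 within each document
gamma_totals <- td_gamma_toy %>%
  group_by(document) %>%
  summarise(total = sum(gamma))

# Keep the most probable topic for each document
top_topics <- td_gamma_toy %>%
  group_by(document) %>%
  slice_max(gamma, n = 1) %>%
  ungroup()
top_topics
```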

If called on an object from estimateEffect, tidy returns a table with columns

topic

Topic, as an integer

term

The term in the model being estimated and tested

estimate

The estimated coefficient

std.error

The standard error from the linear model

statistic

t-statistic

p.value

two-sided p-value

Examples


if (FALSE) {
if (requireNamespace("stm", quietly = TRUE)) {
  library(dplyr)
  library(ggplot2)
  library(stm)
  library(tidytext)
  library(janeaustenr)

  # count words per book and cast to a sparse document-term matrix
  austen_sparse <- austen_books() %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    count(book, word) %>%
    cast_sparse(book, word, n)
  topic_model <- stm(austen_sparse, K = 12, verbose = FALSE,
                     init.type = "Spectral")

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma",
                   document_names = rownames(austen_sparse))
  td_gamma

  # using stm's gadarianFit, we can tidy the result of a model
  # estimated with covariates