Get a specific stop word lexicon via the stopwords package's stopwords function, in a tidy format with one word per row.
get_stopwords(language = "en", source = "snowball")
The language of the stopword lexicon specified as a two-letter ISO code, such as "es", "de", or "fr". Default is "en" for English. Use stopwords_getlanguages from stopwords to see available languages.
The source of the stopword lexicon. Default is "snowball". Use stopwords_getsources from stopwords to see available sources.
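The two lookup functions mentioned above can be used to check which language and source combinations exist before calling get_stopwords. The following is a minimal sketch, assuming the stopwords package is installed; the variable names are illustrative only.

```r
# Assumes the stopwords package is installed.
library(stopwords)

# Sources (lexicons) bundled with the stopwords package
sources <- stopwords_getsources()
sources

# Two-letter language codes available for one source
snowball_langs <- stopwords_getlanguages("snowball")
snowball_langs

# Check a language/source pair before requesting it
"es" %in% snowball_langs
```

Not every source covers every language, so checking the pair first avoids an error from a combination that does not exist.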
A tibble with two columns, word and lexicon. The lexicon column records the source the words were taken from, for example "snowball" for the default source.
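As a quick check of the return value, the sketch below inspects the column names and the distinct values of the lexicon column for a non-default source. It assumes tidytext and stopwords are installed; `smart` is just an illustrative variable name.

```r
library(tidytext)

# Request a non-default source and inspect the result
smart <- get_stopwords(source = "smart")

names(smart)           # column names of the returned tibble
unique(smart$lexicon)  # the source the words were taken from
```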
library(dplyr)
library(tidytext)
get_stopwords()
#> # A tibble: 175 × 2
#>    word      lexicon
#>    <chr>     <chr>
#>  1 i         snowball
#>  2 me        snowball
#>  3 my        snowball
#>  4 myself    snowball
#>  5 we        snowball
#>  6 our       snowball
#>  7 ours      snowball
#>  8 ourselves snowball
#>  9 you       snowball
#> 10 your      snowball
#> # ℹ 165 more rows
get_stopwords(source = "smart")
#> # A tibble: 571 × 2
#>    word        lexicon
#>    <chr>       <chr>
#>  1 a           smart
#>  2 a's         smart
#>  3 able        smart
#>  4 about       smart
#>  5 above       smart
#>  6 according   smart
#>  7 accordingly smart
#>  8 across      smart
#>  9 actually    smart
#> 10 after       smart
#> # ℹ 561 more rows
get_stopwords("es", "snowball")
#> # A tibble: 308 × 2
#>    word  lexicon
#>    <chr> <chr>
#>  1 de    snowball
#>  2 la    snowball
#>  3 que   snowball
#>  4 el    snowball
#>  5 en    snowball
#>  6 y     snowball
#>  7 a     snowball
#>  8 los   snowball
#>  9 del   snowball
#> 10 se    snowball
#> # ℹ 298 more rows
get_stopwords("ru", "snowball")
#> # A tibble: 159 × 2
#>    word  lexicon
#>    <chr> <chr>
#>  1 и     snowball
#>  2 в     snowball
#>  3 во    snowball
#>  4 не    snowball
#>  5 что   snowball
#>  6 он    snowball
#>  7 на    snowball
#>  8 я     snowball
#>  9 с     snowball
#> 10 со    snowball
#> # ℹ 149 more rows
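
Because the result has one word per row, it can be joined against tokenized text to drop stop words. The sketch below shows one common pattern using dplyr's anti_join and tidytext's unnest_tokens. The `docs` data frame is a made-up toy input, not data from the package.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy input, for illustration only
docs <- tibble(
  id = 1:2,
  text = c("I saw the cat on our street",
           "You and your dog are welcome")
)

# One lowercase token per row, in a column named `word`
tokens <- docs %>%
  unnest_tokens(word, text)

# Keep only the rows whose word is not in the stop word lexicon
content_words <- tokens %>%
  anti_join(get_stopwords(), by = "word")

content_words
```

Naming the token column `word` keeps the join key identical to the column returned by get_stopwords, so no renaming is needed.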