Parts of speech for English words from the Moby Project by Grady Ward. Words with non-ASCII characters and items with a space have been removed.
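The two cleaning rules above can be checked directly on the data. This is a minimal sketch, not the package's own build script. It assumes the tidytext package (which ships `parts_of_speech`) is installed, and it treats "non-ASCII" as any character outside the 7-bit range `\x01`-`\x7F`.

```r
library(tidytext)

# Flag words containing a literal space (the description says these were removed)
has_space <- grepl(" ", parts_of_speech$word, fixed = TRUE)

# Flag words containing any character outside the 7-bit ASCII range
# (assumption: this is what "non-ASCII" means here)
non_ascii <- grepl("[^\\x01-\\x7F]", parts_of_speech$word, perl = TRUE)

# Both counts should be zero if the cleaning rules hold
c(with_space = sum(has_space), non_ascii = sum(non_ascii))
```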
parts_of_speech
A data frame with 208,259 rows and 2 variables:
word: An English word
pos: The part of speech of the word. One of 13 categories, such as "Noun", "Adverb", or "Adjective"; 2,274 rows have a missing (NA) part of speech
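Because each row pairs one word with one part of speech, a word with several parts of speech appears on several rows ("3-d" is listed as both "Adjective" and "Noun" in the printed output below). The following is a minimal sketch for finding such words, assuming dplyr and tidytext are installed. The name `ambiguous` is illustrative only.

```r
library(dplyr)
library(tidytext)

# Drop any repeated word/pos pairs, then count the distinct tags per word
ambiguous <- parts_of_speech %>%
  distinct(word, pos) %>%
  count(word, name = "n_pos") %>%
  filter(n_pos > 1) %>%
  arrange(desc(n_pos))

ambiguous
```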
Another dataset of English parts of speech, restricted to non-commercial use, is included in SUBTLEXus: https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/.
library(dplyr)
parts_of_speech
#> # A tibble: 208,259 × 2
#> word pos
#> <chr> <chr>
#> 1 3-d Adjective
#> 2 3-d Noun
#> 3 4-f Noun
#> 4 4-h'er Noun
#> 5 4-h Adjective
#> 6 a' Adjective
#> 7 a-1 Noun
#> 8 a-axis Noun
#> 9 a-bomb Noun
#> 10 a-frame Noun
#> # ℹ 208,249 more rows
parts_of_speech %>%
count(pos, sort = TRUE)
#> # A tibble: 14 × 2
#> pos n
#> <chr> <int>
#> 1 Noun 104542
#> 2 Adjective 47719
#> 3 Verb (transitive) 15723
#> 4 Adverb 13234
#> 5 Verb (usu participle) 11402
#> 6 Plural 7764
#> 7 Verb (intransitive) 4626
#> 8 NA 2274
#> 9 Interjection 395
#> 10 Preposition 159
#> 11 Noun Phrase 115
#> 12 Pronoun 113
#> 13 Definite Article 103
#> 14 Conjunction 90
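A common use of this table is tagging tokenized text with a join on `word`. This is a sketch rather than a prescribed workflow. It assumes dplyr 1.1.0 or later (for the `relationship` argument, which silences the many-to-many warning that arises because a word can carry several tags), and the one-line text is a hypothetical example. `inner_join()` drops tokens absent from the lexicon; `left_join()` would keep them with an NA `pos`.

```r
library(dplyr)
library(tidytext)

# A hypothetical one-line text, used only for illustration
text_df <- tibble(line = 1, text = "The quick brown fox jumps over the lazy dog")

tagged <- text_df %>%
  unnest_tokens(word, text) %>%              # one lowercased word per row
  inner_join(parts_of_speech, by = "word",
             relationship = "many-to-many")  # a word can match several tags

# Distribution of tags among the matched tokens
tagged %>% count(pos, sort = TRUE)
```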