Parts of speech for English words from the Moby Project by Grady Ward. Words with non-ASCII characters and entries containing a space have been removed.

parts_of_speech

Format

A data frame with 205,985 rows and 2 variables:

word

An English word

pos

The part of speech of the word. One of 13 options, such as "Noun", "Adverb", or "Adjective".

Source

https://archive.org/details/mobypartofspeech03203gut

Details

Another dataset of English parts of speech, licensed for non-commercial use only, is distributed as part of SUBTLEXus at https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/.

Examples

library(dplyr)

parts_of_speech
#> # A tibble: 208,259 x 2
#>    word    pos
#>    <chr>   <chr>
#>  1 3-d     Adjective
#>  2 3-d     Noun
#>  3 4-f     Noun
#>  4 4-h'er  Noun
#>  5 4-h     Adjective
#>  6 a'      Adjective
#>  7 a-1     Noun
#>  8 a-axis  Noun
#>  9 a-bomb  Noun
#> 10 a-frame Noun
#> # … with 208,249 more rows

parts_of_speech %>%
  count(pos, sort = TRUE)
#> # A tibble: 14 x 2
#>    pos                        n
#>    <chr>                  <int>
#>  1 Noun                  104542
#>  2 Adjective              47719
#>  3 Verb (transitive)      15723
#>  4 Adverb                 13234
#>  5 Verb (usu participle)  11402
#>  6 Plural                  7764
#>  7 Verb (intransitive)     4626
#>  8 NA                      2274
#>  9 Interjection             395
#> 10 Preposition              159
#> 11 Noun Phrase              115
#> 12 Pronoun                  113
#> 13 Definite Article         103
#> 14 Conjunction               90
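
Because a word can appear on several rows, once per part of speech (as "3-d" does in the printed rows), it can be useful to list the words that carry more than one tag. The following is a minimal sketch, not part of the original examples: `pos_sample` and `multi_pos` are hypothetical names, and `pos_sample` is a small stand-in built from the first rows shown in the output so that the code runs without the full dataset. With the real data, `parts_of_speech` would take the place of `pos_sample`.

```r
library(dplyr)

# Hypothetical stand-in copied from the first rows of the printed output;
# replace pos_sample with parts_of_speech to run this on the full data.
pos_sample <- tibble(
  word = c("3-d", "3-d", "4-f", "4-h'er", "4-h", "a'"),
  pos  = c("Adjective", "Noun", "Noun", "Noun", "Adjective", "Adjective")
)

# For each word, count its distinct parts of speech and keep only the
# words tagged with more than one.
multi_pos <- pos_sample %>%
  group_by(word) %>%
  summarise(
    n_pos   = n_distinct(pos),
    all_pos = paste(sort(unique(pos)), collapse = ", ")
  ) %>%
  filter(n_pos > 1)

multi_pos
```

On the stand-in rows, only "3-d" remains, tagged as both an adjective and a noun.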