Parts of speech for English words from the Moby Project

Parts of speech for English words from the Moby Project by Grady Ward. Words containing non-ASCII characters and entries containing a space have been removed.

parts_of_speech

Format

A data frame with 208,259 rows (205,985 of which have a non-missing pos) and 2 variables:

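word: An English word. A word appears once for each part of speech it can take.
pos: The part of speech of the word. One of 13 options, such as "Noun", "Adverb", "Adjective", or NA when no part of speech is recorded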

Source

https://archive.org/details/mobypartofspeech03203gut

Details

Another dataset of English parts of speech, licensed for non-commercial use only, is distributed as part of SUBTLEXus at https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/.

Examples


library(dplyr)

parts_of_speech
#> # A tibble: 208,259 × 2
#>    word    pos      
#>    <chr>   <chr>    
#>  1 3-d     Adjective
#>  2 3-d     Noun     
#>  3 4-f     Noun     
#>  4 4-h'er  Noun     
#>  5 4-h     Adjective
#>  6 a'      Adjective
#>  7 a-1     Noun     
#>  8 a-axis  Noun     
#>  9 a-bomb  Noun     
#> 10 a-frame Noun     
#> # ℹ 208,249 more rows
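
The printout above lists "3-d" twice, once as an adjective and once as a noun, so a word can occupy several rows. A minimal sketch of collapsing the data to one row per word is shown here. It assumes a small hand-built tibble, `pos_sample` (a hypothetical name), copied from the first rows printed above, so it runs without the full dataset. The same pipeline should apply to `parts_of_speech` itself.

```r
library(dplyr)

# Hypothetical sample: the first six rows of parts_of_speech as printed above
pos_sample <- tibble(
  word = c("3-d", "3-d", "4-f", "4-h'er", "4-h", "a'"),
  pos  = c("Adjective", "Noun", "Noun", "Noun", "Adjective", "Adjective")
)

# One row per word: how many parts of speech it has, and which ones
pos_by_word <- pos_sample %>%
  group_by(word) %>%
  summarise(
    n_pos   = n(),
    all_pos = paste(pos, collapse = ", ")
  )

pos_by_word
```

Filtering `pos_by_word` on `n_pos > 1` would then pick out the words that take more than one part of speech.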

parts_of_speech %>%
  count(pos, sort = TRUE)
#> # A tibble: 14 × 2
#>    pos                        n
#>    <chr>                  <int>
#>  1 Noun                  104542
#>  2 Adjective              47719
#>  3 Verb (transitive)      15723
#>  4 Adverb                 13234
#>  5 Verb (usu participle)  11402
#>  6 Plural                  7764
#>  7 Verb (intransitive)     4626
#>  8 NA                      2274
#>  9 Interjection             395
#> 10 Preposition              159
#> 11 Noun Phrase              115
#> 12 Pronoun                  113
#> 13 Definite Article         103
#> 14 Conjunction               90
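
The count above includes an NA row for words with no recorded part of speech. A minimal sketch of dropping those rows before counting follows. It assumes a hand-built tibble, `pos_sample` (a hypothetical name), whose first four rows come from the printout above. The last row, with the made-up word "placeholder-word", is an assumption added only to illustrate a missing value.

```r
library(dplyr)

# Hypothetical sample: four rows from the printout above, plus one
# made-up row with a missing part of speech (for illustration only)
pos_sample <- tibble(
  word = c("a'", "a-1", "a-axis", "a-bomb", "placeholder-word"),
  pos  = c("Adjective", "Noun", "Noun", "Noun", NA_character_)
)

# Drop rows with a missing part of speech, then count as before
counts <- pos_sample %>%
  filter(!is.na(pos)) %>%
  count(pos, sort = TRUE)

counts
```

Applied to `parts_of_speech`, the same filter should leave the 13 named parts of speech and remove the 2,274 NA rows shown in the count above.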