Parts of speech for English words from the Moby Project by Grady Ward. Words with non-ASCII characters and items with a space have been removed.

parts_of_speech

Format

A data frame with 208,259 rows and 2 variables:

word

An English word

pos

The part of speech of the word. One of 13 options, such as "Noun", "Adverb", or "Adjective", or NA where no part of speech is recorded.

Details

Another dataset of English parts of speech is distributed as part of SUBTLEXus at https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/. It is licensed for non-commercial use only.

Examples


library(dplyr)
library(tidytext)  # assumed here to be the package that provides parts_of_speech

parts_of_speech
#> # A tibble: 208,259 × 2
#>    word    pos      
#>    <chr>   <chr>    
#>  1 3-d     Adjective
#>  2 3-d     Noun     
#>  3 4-f     Noun     
#>  4 4-h'er  Noun     
#>  5 4-h     Adjective
#>  6 a'      Adjective
#>  7 a-1     Noun     
#>  8 a-axis  Noun     
#>  9 a-bomb  Noun     
#> 10 a-frame Noun     
#> # ℹ 208,249 more rows
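
# A minimal sketch, not part of the original examples: look up every part of
# speech recorded for a single word. "a-bomb" is taken from the rows printed
# above; any other word in the `word` column can be substituted.
# Assumes dplyr and the dataset are loaded as in the example above.
parts_of_speech %>%
  filter(word == "a-bomb")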

parts_of_speech %>%
  count(pos, sort = TRUE)
#> # A tibble: 14 × 2
#>    pos                        n
#>    <chr>                  <int>
#>  1 Noun                  104542
#>  2 Adjective              47719
#>  3 Verb (transitive)      15723
#>  4 Adverb                 13234
#>  5 Verb (usu participle)  11402
#>  6 Plural                  7764
#>  7 Verb (intransitive)     4626
#>  8 NA                      2274
#>  9 Interjection             395
#> 10 Preposition              159
#> 11 Noun Phrase              115
#> 12 Pronoun                  113
#> 13 Definite Article         103
#> 14 Conjunction               90
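
# A further sketch, not part of the original examples. The count above shows
# an NA group, so some words carry no part of speech. One way to work only
# with tagged words is to drop those rows first.
tagged <- parts_of_speech %>%
  filter(!is.na(pos))

# A word appears once per part of speech (for example "3-d" is listed as both
# Adjective and Noun), so counting rows per word shows how many tags each
# word has. `n_pos` is an illustrative column name.
tagged %>%
  count(word, name = "n_pos", sort = TRUE)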