Compute the delta distance (in either of its two variants) between all pairs of documents in a tidy table.
Usage
pairwise_delta(tbl, item, feature, value, method = "burrows", ...)
pairwise_delta_(tbl, item, feature, value, method = "burrows", ...)
Arguments
- tbl
Table
- item
Item to compare; will end up in item1 and item2 columns
- feature
Column describing the feature that links one item to others
- value
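Column holding the value for each item-feature pair (for example, the word count n in the examples below)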
- method
Delta variant to use, either "burrows" (the default) or "argamon"
- ...
Extra arguments passed on to squarely(), such as diag and upper
Examples
library(janeaustenr)
library(dplyr)
library(tidytext)
# closest documents in terms of 1000 most frequent words
closest <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word) %>%
  top_n(1000, n) %>%
  pairwise_delta(book, word, n, method = "burrows") %>%
  arrange(delta)
closest
#> # A tibble: 30 × 3
#> item1 item2 delta
#> <fct> <fct> <dbl>
#> 1 Persuasion Northanger Abbey 0.408
#> 2 Northanger Abbey Persuasion 0.408
#> 3 Pride & Prejudice Sense & Sensibility 0.491
#> 4 Sense & Sensibility Pride & Prejudice 0.491
#> 5 Persuasion Pride & Prejudice 0.769
#> 6 Pride & Prejudice Persuasion 0.769
#> 7 Northanger Abbey Pride & Prejudice 0.779
#> 8 Pride & Prejudice Northanger Abbey 0.779
#> 9 Persuasion Sense & Sensibility 0.814
#> 10 Sense & Sensibility Persuasion 0.814
#> # … with 20 more rows
closest %>%
  filter(item1 == "Pride & Prejudice")
#> # A tibble: 5 × 3
#> item1 item2 delta
#> <fct> <fct> <dbl>
#> 1 Pride & Prejudice Sense & Sensibility 0.491
#> 2 Pride & Prejudice Persuasion 0.769
#> 3 Pride & Prejudice Northanger Abbey 0.779
#> 4 Pride & Prejudice Mansfield Park 1.04
#> 5 Pride & Prejudice Emma 1.09
# to remove duplicates, use upper = FALSE
closest <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word) %>%
  top_n(1000, n) %>%
  pairwise_delta(book, word, n, method = "burrows", upper = FALSE) %>%
  arrange(delta)
# Can also use Argamon's Linear Delta
closest <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word) %>%
  top_n(1000, n) %>%
  pairwise_delta(book, word, n, method = "argamon", upper = FALSE) %>%
  arrange(delta)
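
For intuition about what the delta column measures, the following is a minimal base-R sketch of a Burrows-style delta under its usual textbook definition: standardise each feature to z-scores across documents, then take the mean absolute difference between two documents. It is an illustration only and is not taken from the package source; the toy matrix `freq` and its document and word names are hypothetical.

```r
# Hypothetical toy data: rows are documents, columns are word counts.
freq <- matrix(
  c(10, 4, 1,
     8, 5, 2,
     2, 9, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b", "doc_c"), c("the", "of", "and"))
)

# Standardise each feature (column) to z-scores across documents.
z <- scale(freq)

# Burrows-style delta (assumed definition): Manhattan distance between the
# z-score vectors of two documents, divided by the number of features.
burrows <- as.matrix(dist(z, method = "manhattan")) / ncol(z)

burrows
```

The result is a symmetric document-by-document matrix with zeros on the diagonal, which is the wide counterpart of the item1/item2/delta rows shown above.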