Given a tidy table of features describing each item, perform k-means
clustering using kmeans() and retidy the result into one row per item,
each labeled with its assigned cluster.
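The widen, cluster, retidy idea described above can be sketched in base R. This is only an illustration on made-up data, not necessarily how the function is implemented; the table `tidy` and its column names are hypothetical.

```r
# Toy tidy table (hypothetical): one row per (item, feature) pair.
tidy <- data.frame(
  item    = rep(c("a", "b", "c", "d"), each = 2),
  feature = rep(c("x", "y"), times = 4),
  value   = c(1, 1,  1.2, 0.9,  8, 8,  8.1, 7.9)
)

# 1. Widen: one row per item, one column per feature.
wide <- tapply(tidy$value, list(tidy$item, tidy$feature), sum)
wide[is.na(wide)] <- 0  # stand-in for a `fill` value

# 2. Cluster the rows of the wide matrix.
set.seed(2024)  # k-means starts from random centers
fit <- kmeans(wide, centers = 2, nstart = 5)

# 3. Retidy: one row per item, labeled with its cluster.
result <- data.frame(
  item    = rownames(wide),
  cluster = factor(unname(fit$cluster))
)
result <- result[order(result$cluster), ]
result
```

Cluster numbers are arbitrary labels, so only the grouping of items is meaningful, not which group is called 1 or 2.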
Arguments
- tbl
Tidy table, with one row per item-feature pair
- item
Item to cluster (as a bare column name)
- feature
Feature column (dimension in clustering)
- value
Value column
- k
Number of clusters
- fill
Value to use for item-feature combinations that are missing from the table
- ...
Other arguments passed on to kmeans()
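As a hedged sketch of how the arguments fit together, the call below clusters a small made-up table in which one item-feature pair is absent. The data frame `toy` is hypothetical, and `nstart` is an argument of kmeans() that is assumed here to be forwarded through `...`.

```r
library(widyr)

# Toy data (hypothetical): item "d" has no row for feature "y".
toy <- data.frame(
  item    = c("a", "a", "b", "b", "c", "c", "d"),
  feature = c("x", "y", "x", "y", "x", "y", "x"),
  value   = c(1, 1, 1.2, 0.9, 8, 8, 8.1)
)

set.seed(2024)  # k-means starts from random centers
toy_clusters <- widely_kmeans(
  toy, item, feature, value,
  k = 2,
  fill = 0,     # the missing (d, y) combination is treated as 0
  nstart = 10   # assumed to be passed on to kmeans() through ...
)
toy_clusters
```

Because the fill value takes part in the distance calculation, choosing it carelessly can pull items with many missing features toward each other.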
Examples
library(gapminder)
library(dplyr)
library(widyr)

# Cluster countries by their life expectancy in each year
clusters <- gapminder %>%
  widely_kmeans(country, year, lifeExp, k = 5)
clusters
#> # A tibble: 142 × 2
#>    country       cluster
#>    <fct>         <fct>
#>  1 Bangladesh    1
#>  2 Benin         1
#>  3 Bolivia       1
#>  4 Botswana      1
#>  5 Cambodia      1
#>  6 Cameroon      1
#>  7 Comoros       1
#>  8 Congo, Rep.   1
#>  9 Cote d'Ivoire 1
#> 10 Gabon         1
#> # … with 132 more rows
clusters %>%
  count(cluster)
#> # A tibble: 5 × 2
#>   cluster     n
#>   <fct>   <int>
#> 1 1          31
#> 2 2          24
#> 3 3          29
#> 4 4          31
#> 5 5          27
# Examine a few clusters
clusters %>% filter(cluster == 1)
#> # A tibble: 31 × 2
#>    country       cluster
#>    <fct>         <fct>
#>  1 Bangladesh    1
#>  2 Benin         1
#>  3 Bolivia       1
#>  4 Botswana      1
#>  5 Cambodia      1
#>  6 Cameroon      1
#>  7 Comoros       1
#>  8 Congo, Rep.   1
#>  9 Cote d'Ivoire 1
#> 10 Gabon         1
#> # … with 21 more rows
clusters %>% filter(cluster == 2)
#> # A tibble: 24 × 2
#>    country                  cluster
#>    <fct>                    <fct>
#>  1 Afghanistan              2
#>  2 Angola                   2
#>  3 Burkina Faso             2
#>  4 Burundi                  2
#>  5 Central African Republic 2
#>  6 Chad                     2
#>  7 Congo, Dem. Rep.         2
#>  8 Djibouti                 2
#>  9 Equatorial Guinea        2
#> 10 Eritrea                  2
#> # … with 14 more rows
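One possible follow-up, sketched here under the assumption that the cluster table is joined back to gapminder by country, is to summarize what distinguishes the clusters. The names `cluster_trends` and `mean_life_exp` are made up for this illustration, and the seed is arbitrary, so cluster numbering may differ from the output shown above.

```r
library(gapminder)
library(dplyr)
library(widyr)

set.seed(2024)  # arbitrary seed; k-means starts from random centers
clusters <- gapminder %>%
  widely_kmeans(country, year, lifeExp, k = 5)

# Average life expectancy per cluster and year
cluster_trends <- gapminder %>%
  inner_join(clusters, by = "country") %>%
  group_by(cluster, year) %>%
  summarize(mean_life_exp = mean(lifeExp), .groups = "drop")

cluster_trends
```

Each cluster then has one row per year, which makes it easier to see whether clusters differ mainly in level or in trajectory over time.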