Cluster items based on k-means across features

Given a tidy table of features describing each item, perform k-means clustering using kmeans() and retidy the result into one row per item, with a column giving the cluster each item was assigned to.
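The steps described above (widen, cluster, retidy) can be sketched in base R. This is only an illustration of the idea, not the package's actual implementation; the toy data frame `tidy` and the names `wide`, `fit` and `result` are made up for the example.

```r
# Hypothetical toy data: one row per (item, feature) pair.
tidy <- data.frame(
  item    = rep(c("a", "b", "c", "d"), each = 2),
  feature = rep(c("x", "y"), times = 4),
  value   = c(1, 1, 1.2, 0.9, 8, 8, 8.1, 7.9)
)

# Cast to a wide matrix: one row per item, one column per feature.
wide <- tapply(tidy$value, list(tidy$item, tidy$feature), sum)

# Cluster the rows of the matrix (nstart repeats the random start).
set.seed(1)
fit <- kmeans(wide, centers = 2, nstart = 5)

# Retidy: one row per item with its assigned cluster.
result <- data.frame(
  item    = names(fit$cluster),
  cluster = factor(unname(fit$cluster))
)
result
```

Items "a" and "b" should land in one cluster and "c" and "d" in the other; which cluster is numbered 1 depends on the random start.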

Usage

widely_kmeans(tbl, item, feature, value, k, fill = 0, ...)

Arguments

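tbl: Tidy table with one row per item-feature pair
item: Item to cluster (as a bare column name)
feature: Feature column (dimension in clustering)
value: Value column (the value of each feature for each item)
k: Number of clusters
fill: Value to use for item-feature combinations missing from the table (default 0)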
...: Other arguments passed on to kmeans()
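To see why fill matters, here is a minimal base R sketch, assuming the table is widened before clustering. An item that lacks a feature produces a missing cell, and kmeans() does not accept missing values, so the gap has to be filled first. The data and variable names are hypothetical; nstart and iter.max are genuine kmeans() arguments of the kind that could be supplied through `...`.

```r
# Hypothetical toy data: item "b" has no value for feature "y".
tidy <- data.frame(
  item    = c("a", "a", "b", "c", "c"),
  feature = c("x", "y", "x", "x", "y"),
  value   = c(1, 2, 1, 9, 8)
)

# Widening leaves an NA where the (item, feature) pair is absent.
wide <- tapply(tidy$value, list(tidy$item, tidy$feature), sum)
had_na <- anyNA(wide)

# Substitute the fill value (0 here, matching the default fill = 0).
wide[is.na(wide)] <- 0

# Extra arguments such as nstart and iter.max go straight to kmeans().
set.seed(1)
fit <- kmeans(wide, centers = 2, nstart = 10, iter.max = 20)
fit$cluster
```

Choosing 0 as the fill treats an absent feature as a measured zero, which suits counts but may distort clusters for features where zero is not a natural baseline.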

Examples


library(gapminder)
library(dplyr)

clusters <- gapminder %>%
  widely_kmeans(country, year, lifeExp, k = 5)

clusters
#> # A tibble: 142 × 2
#>    country       cluster
#>    <fct>         <fct>  
#>  1 Bangladesh    1      
#>  2 Benin         1      
#>  3 Bolivia       1      
#>  4 Botswana      1      
#>  5 Cambodia      1      
#>  6 Cameroon      1      
#>  7 Comoros       1      
#>  8 Congo, Rep.   1      
#>  9 Cote d'Ivoire 1      
#> 10 Gabon         1      
#> # … with 132 more rows

clusters %>%
  count(cluster)
#> # A tibble: 5 × 2
#>   cluster     n
#>   <fct>   <int>
#> 1 1          31
#> 2 2          24
#> 3 3          29
#> 4 4          31
#> 5 5          27

# Examine a few clusters
clusters %>% filter(cluster == 1)
#> # A tibble: 31 × 2
#>    country       cluster
#>    <fct>         <fct>  
#>  1 Bangladesh    1      
#>  2 Benin         1      
#>  3 Bolivia       1      
#>  4 Botswana      1      
#>  5 Cambodia      1      
#>  6 Cameroon      1      
#>  7 Comoros       1      
#>  8 Congo, Rep.   1      
#>  9 Cote d'Ivoire 1      
#> 10 Gabon         1      
#> # … with 21 more rows
clusters %>% filter(cluster == 2)
#> # A tibble: 24 × 2
#>    country                  cluster
#>    <fct>                    <fct>  
#>  1 Afghanistan              2      
#>  2 Angola                   2      
#>  3 Burkina Faso             2      
#>  4 Burundi                  2      
#>  5 Central African Republic 2      
#>  6 Chad                     2      
#>  7 Congo, Dem. Rep.         2      
#>  8 Djibouti                 2      
#>  9 Equatorial Guinea        2      
#> 10 Eritrea                  2      
#> # … with 14 more rows
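# kmeans() chooses its starting centers at random, so cluster numbers and
# memberships like those printed above may differ between runs. A small
# base R sketch (random toy matrix, arbitrary seed values) of making a
# run repeatable with set.seed():

```r
# Hypothetical data: 20 items described by 3 numeric features.
set.seed(42)
m <- matrix(rnorm(60), ncol = 3)

# Resetting the seed before each call reproduces the same random start.
set.seed(1)
first <- kmeans(m, centers = 3)$cluster

set.seed(1)
second <- kmeans(m, centers = 3)$cluster

identical(first, second)
```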
