mlr3cluster

Package website: release | dev

Cluster analysis for mlr3.


mlr3cluster is an extension package for cluster analysis within the mlr3 ecosystem. It succeeds the clustering capabilities of mlr2.

Installation

Install the latest release from CRAN:

install.packages("mlr3cluster")

Install the development version from GitHub:

# install.packages("pak")
pak::pak("mlr-org/mlr3cluster")

Feature Overview

The current version of mlr3cluster contains the 24 cluster learners and 4 cluster measures listed in the tables below.

The package is also integrated with mlr3viz, which lets you visualize clustering results with a single line of code.
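
As a rough sketch of the mlr3viz integration, the snippet below trains a k-means learner and plots its prediction. It assumes that mlr3viz and the plotting packages it relies on for this plot type (ggfortify, for `type = "pca"`) are installed; the choice of plot type is only illustrative.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3viz)

task = tsk("usarrests")
learner = lrn("clust.kmeans")
prediction = learner$train(task)$predict(task)

# The "one line": project the observations onto the first two principal
# components and color them by predicted cluster
p = autoplot(prediction, task, type = "pca")
print(p)
```

Other plot types may be available for predictions and learners; the mlr3viz documentation lists them.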

Cluster Analysis

Cluster Learners

Key                 Label                           Packages
clust.MBatchKMeans  Mini Batch K-Means              ClusterR
clust.SimpleKMeans  K-Means (Weka)                  RWeka
clust.agnes         Agglomerative Nesting           cluster
clust.ap            Affinity Propagation            apcluster
clust.bico          BICO                            stream
clust.birch         BIRCH                           stream
clust.cmeans        Fuzzy C-Means                   e1071, clue
clust.cobweb        Cobweb                          RWeka
clust.dbscan        DBSCAN                          dbscan
clust.dbscan_fpc    DBSCAN (fpc)                    fpc
clust.diana         Divisive Analysis               cluster
clust.em            Expectation-Maximization        RWeka
clust.fanny         Fuzzy Analysis                  cluster
clust.featureless   Featureless Clustering Learner
clust.ff            Farthest First                  RWeka
clust.hclust        Hierarchical Clustering         stats
clust.hdbscan       HDBSCAN                         dbscan
clust.kkmeans       Kernel K-Means                  kernlab
clust.kmeans        K-Means                         stats, clue
clust.mclust        Gaussian Mixture Model          mclust
clust.meanshift     Mean Shift                      LPCM
clust.optics        OPTICS                          dbscan
clust.pam           Partitioning Around Medoids     cluster, clue
clust.xmeans        X-Means                         RWeka
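
The keys in the table can also be looked up programmatically. The following is a minimal sketch that queries the learner dictionary and constructs one learner with a hyperparameter; the value `centers = 3` is only an illustrative choice, not a recommendation.

```r
library(mlr3)
library(mlr3cluster)

# Keys of all registered clustering learners (they share the "clust." prefix)
clust_keys = mlr_learners$keys("^clust\\.")
head(clust_keys)

# Construct a learner by key and set a hyperparameter at construction time
learner = lrn("clust.kmeans", centers = 3)
learner$param_set$values
```

Constructing a learner does not load its backing packages; those listed in the Packages column are needed once the learner is trained.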

Cluster Measures

Key               Label                  Packages
clust.ch          Calinski Harabasz      fpc
clust.dunn        Dunn                   fpc
clust.silhouette  Silhouette             cluster
clust.wss         Within Sum of Squares  fpc
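
Measures are retrieved from the measure dictionary in the same way as learners. A small sketch, assuming only the keys shown in the table above:

```r
library(mlr3)
library(mlr3cluster)

# Keys of all registered clustering measures
clust_measures = mlr_measures$keys("^clust\\.")
clust_measures

# Retrieve a single measure and check whether lower or higher values are better
measure = msr("clust.silhouette")
measure$minimize
measure$range
```

Checking `$minimize` before comparing models helps avoid reading a measure in the wrong direction.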

Example

library(mlr3)
library(mlr3cluster)

task = tsk("usarrests")
task
#> 
#> ── <TaskClust> (50x4): US Arrests ──────────────────────────────────────────────
#> • Target:
#> • Properties: -
#> • Features (4):
#>   • int (2): Assault, UrbanPop
#>   • dbl (2): Murder, Rape

learner = lrn("clust.kmeans")
prediction = learner$train(task)$predict(task)
measures = msrs(c("clust.wss", "clust.silhouette"))
prediction$score(measures, task)
#>        clust.wss clust.silhouette 
#>     9.639903e+04     5.926554e-01
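
K-means starts from randomly chosen centers, so the scores above can vary between runs. The sketch below fixes the seed and looks at the cluster assignments directly; the seed value and `centers = 3` are arbitrary choices made for illustration.

```r
library(mlr3)
library(mlr3cluster)

set.seed(1)  # fix the random initialization so repeated runs agree
task = tsk("usarrests")
learner = lrn("clust.kmeans", centers = 3)
prediction = learner$train(task)$predict(task)

# Number of observations assigned to each cluster
table(prediction$partition)

# Cluster centers of the fitted model stored in the learner
learner$model$centers
```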

More Resources

Check out the blog post for a more detailed introduction to the package. The mlr3book also has a section on clustering.

Future Plans

If you have any questions, feedback, or ideas, feel free to open an issue in the mlr-org/mlr3cluster repository on GitHub.