Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high-dimensional data, and very large data sets.
- spark
- embeddings
- spark-mllib
- itakura-saito-divergence
- cosine-similarity
- kullback-leibler-divergence
- k-means
- entropy
- clustering
- euclidean-distance
- bregman-divergence
- similarity-search
Scala versions: 2.10
Latest version
[![massivedatascience-clusterer Scala version support](https://index.scala-lang.org/derrickburns/generalized-kmeans-clustering/massivedatascience-clusterer/latest.svg)](https://index.scala-lang.org/derrickburns/generalized-kmeans-clustering/massivedatascience-clusterer)
JVM badge
[![massivedatascience-clusterer Scala version support](https://index.scala-lang.org/derrickburns/generalized-kmeans-clustering/massivedatascience-clusterer/latest-by-scala-version.svg?platform=jvm)](https://index.scala-lang.org/derrickburns/generalized-kmeans-clustering/massivedatascience-clusterer)