Kernel KMeans clustering with GEMINI

Since the MMD GEMINI objective in one-vs-one (OvO) mode is equivalent to a kernel KMeans objective, we can pair it with the nonparametric model, which directly assigns a cluster to each sample. The resulting model behaves like a kernel KMeans algorithm, except that it is trained by gradient descent.
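To make the kernel KMeans side of this equivalence concrete, the sketch below evaluates the textbook kernel KMeans objective (within-cluster scatter in the kernel feature space) for a hard assignment. It is an illustration only, not the gemclus implementation: the helper `kernel_kmeans_objective` and the `*_demo` names are hypothetical, and the RBF bandwidth is scikit-learn's default, which may differ from the one used by `CategoricalMMD`.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics.pairwise import rbf_kernel


def kernel_kmeans_objective(K, labels):
    """Within-cluster scatter in feature space for a hard assignment.

    Hypothetical helper for illustration; not part of gemclus.
    """
    total = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        K_cc = K[np.ix_(idx, idx)]
        # sum_i k(x_i, x_i) - (1 / |c|) * sum_{i,j} k(x_i, x_j)
        total += np.trace(K_cc) - K_cc.sum() / len(idx)
    return total


# Two concentric circles, centred and scaled by the global standard deviation
X_demo, y_demo = datasets.make_circles(
    n_samples=200, noise=0.05, factor=0.05, random_state=0
)
X_demo = (X_demo - X_demo.mean(0)) / X_demo.std()

# RBF kernel with scikit-learn's default gamma (1 / n_features); an assumption
K_demo = rbf_kernel(X_demo)

# Compare the ground-truth partition with a random one
rng = np.random.default_rng(0)
random_labels = rng.integers(0, 2, size=len(y_demo))

obj_true = kernel_kmeans_objective(K_demo, y_demo)
obj_random = kernel_kmeans_objective(K_demo, random_labels)
print(f"true partition: {obj_true:.2f}, random partition: {obj_random:.2f}")
```

A lower value means tighter clusters in feature space, so the ground-truth partition of the two circles should score lower than the random one.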

import numpy as np
from matplotlib import pyplot as plt
from sklearn import metrics, datasets

from gemclus.nonparametric import CategoricalMMD

Draw samples from a circular dataset

# We start by generating samples distributed on two circles
X, y = datasets.make_circles(n_samples=200, noise=0.05, factor=0.05, random_state=0)

# then centre each feature and scale by the global standard deviation
X = (X - np.mean(X, 0)) / np.std(X, ddof=0)

# Have a look at it
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.axis("off")
plt.xlim((-3, 3))
plt.ylim((-3, 3))
plt.show()

Train the model

Create the nonparametric GEMINI clustering model and call the .fit_predict method to optimise the cluster assignment of the samples and retrieve it.

model = CategoricalMMD(n_clusters=2, random_state=0, kernel="rbf")
y_pred = model.fit_predict(X)

Final Clustering

plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.show()

ari_score = metrics.adjusted_rand_score(y, y_pred)
gemini_score = model.score(X)
print(f"Final ARI score: {ari_score:.3f}")
print(f"GEMINI score is {gemini_score:.3f}")
Final ARI score: 1.000
GEMINI score is 0.330
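For contrast, the following sketch runs ordinary Euclidean KMeans from scikit-learn on the same kind of data. Two centroids can only split the plane with a straight line, so the inner and outer circles cannot be separated this way, which is why a kernel is needed here. The `*_ring` and `*_km` names are illustrative and not part of the example above.

```python
from sklearn import datasets, metrics
from sklearn.cluster import KMeans

# Same generator settings as in the example above
X_ring, y_ring = datasets.make_circles(
    n_samples=200, noise=0.05, factor=0.05, random_state=0
)

# Plain KMeans separates its two clusters with a straight boundary
km = KMeans(n_clusters=2, n_init=10, random_state=0)
y_km = km.fit_predict(X_ring)

ari_km = metrics.adjusted_rand_score(y_ring, y_km)
print(f"Euclidean KMeans ARI: {ari_km:.3f}")
```

The resulting ARI should be far below the score reported for the kernel-based model.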

Total running time of the script: (0 minutes 5.231 seconds)
