Building an unsupervised tree with kernel-kmeans objective: KAURI¶
We show here how to obtain two different decision trees for clustering by using two different kernels with the KAURI method.
The KAURI model builds decision trees using gain metrics derived from the squared MMD-GEMINI, which are equivalent to KMeans optimisation.
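For intuition only (this is not the KAURI gain computation), the idea of an interpretable, tree-shaped clustering can be approximated by first running KMeans and then fitting a shallow decision tree on the resulting labels as a surrogate; the sketch below uses only standard scikit-learn components on the same iris data.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustration only, NOT the KAURI algorithm: cluster with KMeans, then fit a
# shallow decision tree on the cluster labels to get an interpretable surrogate.
X = datasets.load_iris()["data"]
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
print(export_text(surrogate))
```

Unlike this two-step surrogate, KAURI grows the tree directly from the kernel-based objective, so no intermediate KMeans labels are needed.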
from sklearn import datasets, metrics
from gemclus.tree import Kauri, print_kauri_tree
Load the dataset¶
iris = datasets.load_iris()
X, y = iris["data"], iris["target"]
Create a first tree using a linear kernel¶
# Notice that we limit the depth of the tree for simplicity
linear_model = Kauri(max_clusters=3, kernel="linear", max_depth=3)
y_pred_linear = linear_model.fit_predict(X)
print("Score of the model is: ", linear_model.score(X))
Score of the model is: 9459.167022308022
Create a second tree using an additive chi2 kernel¶
additive_chi2_model = Kauri(max_clusters=3, kernel="additive_chi2", max_depth=3)
y_pred_additive_chi2 = additive_chi2_model.fit_predict(X)
print("Score of the model is: ", additive_chi2_model.score(X))
Score of the model is: -22.43532371061057
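The negative score is expected: scikit-learn's additive chi-squared kernel is defined as k(x, y) = -sum_i (x_i - y_i)^2 / (x_i + y_i), so it is zero on the diagonal and non-positive elsewhere. A minimal check with `sklearn.metrics.pairwise.additive_chi2_kernel` on a toy matrix:

```python
import numpy as np
from sklearn.metrics.pairwise import additive_chi2_kernel

# Additive chi-squared kernel on two non-negative points.
# k(x, x) = 0 and k(x, y) <= 0 for x != y.
X = np.array([[1.0, 2.0], [2.0, 1.0]])
K = additive_chi2_kernel(X)
print(K)
# Here k(x, y) = -((1-2)**2/(1+2) + (2-1)**2/(2+1)) = -2/3 off the diagonal.
```

This kernel assumes non-negative features (it was designed for histograms), which holds for the iris measurements used in this example.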
Evaluate the performances of the model¶
print("ARI of linear kernel: ", metrics.adjusted_rand_score(y, y_pred_linear))
print("ARI of additive chi2 kernel: ", metrics.adjusted_rand_score(y, y_pred_additive_chi2))
ARI of linear kernel: 0.7172759168337549
ARI of additive chi2 kernel: 0.8680377279943841
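The adjusted Rand index (ARI) compares two partitions while ignoring how the clusters are labelled: identical partitions score 1.0 even under a permutation of the labels, and random partitions score around 0. A quick check on toy labels:

```python
from sklearn.metrics import adjusted_rand_score

# ARI is invariant to label permutation: these two clusterings split the
# points identically, only the cluster IDs are swapped.
print(adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0]))  # prints 1.0
```

This invariance is what makes ARI suitable here: KAURI's cluster indices have no reason to match the iris class indices, yet a perfect grouping would still score 1.0.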
Visualise the tree structure¶
print("Structure of the additive chi2 model")
print_kauri_tree(additive_chi2_model, iris["feature_names"])
Structure of the additive chi2 model
Node 0
|=petal width (cm) <= 0.6
| Node 1
| Cluster: 0
|=petal width (cm) > 0.6
| Node 2
| |=petal length (cm) <= 4.7
| | Node 3
| | Cluster: 2
| |=petal length (cm) > 4.7
| | Node 4
| | |=petal width (cm) <= 1.5
| | | Node 5
| | | Cluster: 2
| | |=petal width (cm) > 1.5
| | | Node 6
| | | Cluster: 1
Total running time of the script: (0 minutes 0.013 seconds)