User Guide

Content of the package

Which GEMINIs are implemented

All GEMINIs from the initial work are available: the MMD and the Wasserstein distance, which take the geometry of the data space into account, as well as the Kullback-Leibler divergence, the Total Variation distance and the squared Hellinger distance. Both the one-vs-all (OvA) and one-vs-one (OvO) versions are available in all models; in most clustering models, the OvO mode is enabled by passing ovo=True to the model's constructor.
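For the f-divergence GEMINIs in their OvA form, Bayes' rule lets the expectation over clusters be rewritten using only the cluster probabilities p(y|x) and their average p(y), so no data-space metric is needed. The sketch below is a NumPy illustration of these empirical estimates under that assumption, not gemclus's own code: the helper name ova_f_gemini is hypothetical, only the OvA versions are covered, and the squared Hellinger distance uses the convention H²(p, q) = 1 - ∫√(pq).

```python
import numpy as np

def ova_f_gemini(y_pred, eps=1e-12):
    """Hypothetical helper: empirical OvA GEMINIs for the f-divergences.

    y_pred is an (N, K) array of cluster probabilities p(y=k | x_i).
    Each value estimates E_y[D(p(x|y) || p(x))] from y_pred alone.
    """
    pi = y_pred.mean(axis=0)  # estimated cluster proportions p(y=k)
    # KL: E_y KL(p(x|y) || p(x)) is the usual mutual information I(x; y)
    ratio = np.clip(y_pred / pi, eps, None)  # clipping avoids log(0); 0 * log(eps) = 0
    kl = np.mean(np.sum(y_pred * np.log(ratio), axis=1))
    # TV: E_y TV(p(x|y), p(x)) = 1/2 E_x sum_k |p(y=k|x) - p(y=k)|
    tv = 0.5 * np.mean(np.sum(np.abs(y_pred - pi), axis=1))
    # Squared Hellinger, assuming H^2(p, q) = 1 - integral of sqrt(p q)
    hellinger = 1.0 - np.mean(np.sum(np.sqrt(pi * y_pred), axis=1))
    return {"kl": float(kl), "tv": float(tv), "hellinger": float(hellinger)}

# Two balanced, perfectly separated clusters versus uninformative predictions
confident = np.repeat(np.eye(2), 50, axis=0)  # 100 samples with one-hot assignments
uniform = np.full((100, 2), 0.5)
print(ova_f_gemini(confident))
print(ova_f_gemini(uniform))
```

With confident balanced assignments, the KL GEMINI equals log 2, the TV GEMINI equals 0.5 and the squared Hellinger GEMINI equals 1 - √0.5; uniform predictions carry no information about the clusters, so all three are zero.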

Some models come with a GEMINI already built in; for some models, it is also possible to set a custom GEMINI.

Computing the Wasserstein distance requires a distance function in the data space. We support all distances available in sklearn.metrics.pairwise_distances, with the Euclidean distance as the default. Similarly, the MMD accepts all kernels available in sklearn.metrics.pairwise_kernels, with the linear kernel as the default. For both GEMINIs, you can also supply your own precomputed distance or kernel matrix, which must then be passed to the GEMINI.
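To make the role of the kernel concrete, here is a minimal NumPy sketch of an empirical MMD-GEMINI computed from cluster probabilities and a kernel matrix built with sklearn.metrics.pairwise_kernels. It assumes that p(x|y=k) is represented by reweighting the samples with p(y=k|x_i) / (N p(y=k)), and that the OvA and OvO versions average the (non-squared) MMD over p(y) and p(y_a)p(y_b) respectively. The helper name mmd_gemini is hypothetical and this is not gemclus's implementation.

```python
import numpy as np
from sklearn.metrics import pairwise_kernels

def mmd_gemini(y_pred, kernel, ovo=False):
    """Hypothetical helper: empirical MMD-GEMINI.

    y_pred: (N, K) cluster probabilities p(y=k | x_i).
    kernel: (N, N) kernel matrix between the samples (any precomputed kernel works).
    """
    n = y_pred.shape[0]
    pi = y_pred.mean(axis=0)  # cluster proportions p(y=k); assumed non-zero
    # Sample weights representing p(x | y=k), obtained by Bayes' rule
    weights = y_pred / (n * pi)
    if ovo:
        # Inner products <mu_k, mu_l> between the kernel mean embeddings of the clusters
        gram = weights.T @ kernel @ weights
        sq_norms = np.diag(gram)
        mmd2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
        mmd = np.sqrt(np.maximum(mmd2, 0.0))  # clip tiny negative round-off
        return float(pi @ mmd @ pi)  # E_{y_a, y_b}[MMD(p(x|y_a), p(x|y_b))]
    # OvA: compare each cluster embedding with that of p(x) (uniform weights 1/N)
    diff = weights - 1.0 / n
    mmd2 = np.einsum("ik,ij,jk->k", diff, kernel, diff)
    mmd = np.sqrt(np.maximum(mmd2, 0.0))
    return float(pi @ mmd)  # E_y[MMD(p(x|y), p(x))]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
K_linear = pairwise_kernels(X, metric="linear")       # the default kernel mentioned above
K_rbf = pairwise_kernels(X, metric="rbf", gamma=0.5)  # any other sklearn kernel works too
good = np.repeat(np.eye(2), 50, axis=0)  # assignments matching the two blobs
uniform = np.full((100, 2), 0.5)         # uninformative assignments
for K in (K_linear, K_rbf):
    print(mmd_gemini(good, K), mmd_gemini(good, K, ovo=True), mmd_gemini(uniform, K))
```

The assignment matching the two blobs scores higher than the uniform one, whose GEMINI is zero. In the same spirit, sklearn.metrics.pairwise_distances(X, metric="euclidean") yields the kind of distance matrix the Wasserstein GEMINI needs.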

What discriminative distributions are available

We propose several clustering distributions depending on the purpose:

  • Logistic regressions

  • Two-layer multi-layer perceptrons (MLPs) with ReLU activations

  • Decision trees (only compatible with the MMD GEMINI)

  • Differentiable trees
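The first two distributions in the list above can be written in a few lines. The NumPy sketch below assumes the usual form of such models, a softmax over the outputs of either a linear map or an MLP with one ReLU hidden layer, and uses randomly drawn weights; the function names are hypothetical and do not reflect gemclus's internals. In practice, the weights would be trained to maximise one of the GEMINIs.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def logistic_regression_proba(X, W, b):
    """p(y|x) = softmax(X W + b), with W of shape (d, K) and b of shape (K,)."""
    return softmax(X @ W + b)

def mlp_proba(X, W1, b1, W2, b2):
    """Two-layer MLP: one ReLU hidden layer followed by a softmax output layer."""
    hidden = np.maximum(X @ W1 + b1, 0.0)  # ReLU activation
    return softmax(hidden @ W2 + b2)

rng = np.random.default_rng(0)
n, d, h, k = 8, 5, 16, 3  # samples, features, hidden units, clusters
X = rng.normal(size=(n, d))
p_lin = logistic_regression_proba(X, rng.normal(size=(d, k)), np.zeros(k))
p_mlp = mlp_proba(X, rng.normal(size=(d, h)), np.zeros(h),
                  rng.normal(size=(h, k)), np.zeros(k))
print(p_lin.sum(axis=1))     # each row is a probability distribution over clusters
print(p_mlp.argmax(axis=1))  # hard cluster assignments
```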

For the logistic regression and the MLP, we also propose sparse versions that perform feature selection alongside clustering. The sparse MLP architecture is inspired by LassoNet [1], which adds a linear skip connection between the inputs and the clustering output.
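The idea behind the LassoNet-style architecture can be sketched as follows: the skip-connection weights theta decide which features are active, and each feature's first-layer weights are constrained by the magnitude of its skip weights, so a feature whose skip row is zero disappears from the whole network. This NumPy sketch assumes a multi-output variant where the constraint is max_h |W1[j, h]| ≤ M ||theta[j]||_2; the names are hypothetical, and the clipping step keeps theta fixed, which is simpler than LassoNet's own hier-prox operator.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_mlp_proba(X, theta, W1, b1, W2, b2):
    """LassoNet-style network: linear skip connection X @ theta added to a ReLU MLP."""
    hidden = np.maximum(X @ W1 + b1, 0.0)
    return softmax(X @ theta + hidden @ W2 + b2)

def project_hierarchy(theta, W1, M):
    """Clip W1 so that max_h |W1[j, h]| <= M * ||theta[j]||_2 for every feature j.

    Simplification: theta is held fixed, unlike LassoNet's hier-prox operator.
    """
    bound = M * np.linalg.norm(theta, axis=1, keepdims=True)  # shape (d, 1)
    return np.clip(W1, -bound, bound)

rng = np.random.default_rng(0)
n, d, h, k = 6, 5, 8, 2
X = rng.normal(size=(n, d))
theta = rng.normal(size=(d, k))
theta[[2, 3]] = 0.0  # pretend the sparsity penalty zeroed out features 2 and 3
W1 = project_hierarchy(theta, rng.normal(size=(d, h)), M=10.0)
W2, b1, b2 = rng.normal(size=(h, k)), np.zeros(h), np.zeros(k)

selected = np.flatnonzero(np.linalg.norm(theta, axis=1))
print(selected)  # [0 1 4]
X_perturbed = X.copy()
X_perturbed[:, [2, 3]] += 5.0  # changing dropped features leaves predictions unchanged
print(np.allclose(sparse_mlp_proba(X, theta, W1, b1, W2, b2),
                  sparse_mlp_proba(X_perturbed, theta, W1, b1, W2, b2)))  # True
```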

We also include other models from the literature that fit the scope of discriminative clustering with mutual information, e.g. regularized information maximization (RIM): gemclus.linear.RIM [2].
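As a rough illustration of what RIM optimises, the sketch below evaluates the RIM criterion for a multinomial logistic model as formulated by Krause et al.: an empirical mutual information estimate, H(p̂(y)) minus the average entropy of p(y|x_i), minus an L2 penalty on the weights. The function names and the penalty weight lam are illustrative assumptions; this is not the code of gemclus.linear.RIM.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy in nats; eps guards against log(0)
    return -np.sum(p * np.log(p + eps), axis=axis)

def rim_objective(X, W, b, lam):
    """RIM criterion for a multinomial logistic model: empirical MI minus an L2 penalty on W."""
    proba = softmax(X @ W + b)     # p(y | x_i), shape (N, K)
    marginal = proba.mean(axis=0)  # empirical p(y)
    mutual_info = entropy(marginal) - entropy(proba).mean()
    return mutual_info - lam * np.sum(W ** 2)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
b = np.zeros(2)
W_zero = np.zeros((2, 2))                        # uniform predictions
W_split = np.array([[-5.0, 5.0], [-5.0, 5.0]])   # separates the blobs along x1 + x2
print(rim_objective(X, W_zero, b, lam=0.1))   # mutual information is zero
print(rim_objective(X, W_split, b, lam=0.0))  # close to log(2) for two balanced clusters
print(rim_objective(X, W_split, b, lam=0.01))
```

The penalty prevents the criterion from being increased indefinitely by scaling up the weights, which would only make the predictions more confident.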

To use another model, you can subclass gemclus.DiscriminativeModel and override its hidden methods _infer, _get_weights, _init_params and _compute_grads. An example of such an extension is given here.

Basic examples

We provide some basic examples in the Example gallery, including the clustering of simple distributions and feature selection with the sparse models from gemclus.sparse.