`gemclus.sparse`.SparseMLPMMD

class gemclus.sparse.SparseMLPMMD(n_clusters=3, groups=None, max_iter=1000, learning_rate=0.001, n_hidden_dim=20, kernel='linear', M=10, batch_size=None, alpha=0.01, ovo=False, dynamic=False, solver='adam', verbose=False, random_state=None, kernel_params=None)

This is the sparse version of the MLP MMD model.

On top of the vanilla MLP GEMINI model, this variation adds a skip connection from the data to the cluster output. The skip connection enforces sparsity through a group-lasso penalty, and a proximal gradient step also eliminates the corresponding input features in the first layer of the MLP.

This architecture is inspired by LassoNet (Lemhadri et al., 2021).

Parameters:

n_clusters: int, default=3

The maximum number of clusters to form as well as the number of output neurons in the neural network.

groups: list of arrays of various shapes, default=None

If groups is set, it must describe a partition of the indices of variables. This will be used for performing variable selection with groups of features considered to represent one variable. This option can typically be used for one-hot-encoded variables. Variable indices that are not entered will be considered alone. For example, with 3 features, accepted values can be [[0],[1],[2]], [[0,1],[2]] or [[0,1]].
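As a rough illustration of the rule that unlisted indices are treated alone, the sketch below completes a partial list of groups into a full partition. The helper name `complete_groups` is hypothetical and is not part of gemclus; the ordering of the resulting groups is also an assumption.

```python
def complete_groups(groups, n_features):
    """Return a full partition of range(n_features) from a partial list of groups.

    Hypothetical helper: indices missing from `groups` become singleton groups.
    """
    seen = set()
    for group in groups:
        for index in group:
            if not 0 <= index < n_features:
                raise ValueError(f"index {index} is out of range")
            if index in seen:
                raise ValueError(f"index {index} appears in more than one group")
            seen.add(index)
    completed = [list(group) for group in groups]
    # Every variable that was not listed forms its own group.
    completed += [[i] for i in range(n_features) if i not in seen]
    return completed


print(complete_groups([[0, 1]], n_features=3))  # [[0, 1], [2]]
```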

max_iter: int, default=1000

Maximum number of epochs to perform gradient descent in a single run.

learning_rate: float, default=1e-3

Initial learning rate used. It controls the step-size in updating the weights.

n_hidden_dim: int, default=20

The number of neurons in the hidden layer of the neural network.

kernel: {‘additive_chi2’, ‘chi2’, ‘cosine’, ‘linear’, ‘poly’, ‘polynomial’, ‘rbf’, ‘laplacian’, ‘sigmoid’, ‘precomputed’}, default=‘linear’

The kernel to use in combination with the MMD objective. It corresponds to one value of KERNEL_PARAMS. Currently, all kernel parameters are the default ones. If the kernel is set to ‘precomputed’, then a custom kernel matrix must be passed to the argument y of fit, fit_predict and/or score.
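For the ‘precomputed’ case, a kernel matrix of shape (n_samples, n_samples) has to be built beforehand. The sketch below builds an RBF Gram matrix with NumPy only; the toy data and the bandwidth `gamma` are illustrative assumptions, not library defaults, and the final call is shown as a comment because it requires gemclus to be installed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # toy data: 6 samples, 4 features

# Squared Euclidean distances between all pairs of samples.
sq_norms = (X ** 2).sum(axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding

gamma = 1.0 / X.shape[1]       # illustrative bandwidth (assumption)
K = np.exp(-gamma * sq_dists)  # Gram matrix of shape (n_samples, n_samples)

# With gemclus installed, the matrix would then be passed through `y`:
# SparseMLPMMD(kernel="precomputed").fit(X, K)
```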

ovo: bool, default=False

Whether to run the model using the MMD OvA (False) or the MMD OvO (True).
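For orientation, the two variants can be written as below, following the one-vs-all and one-vs-one definitions of the GEMINI paper listed under References. The notation is an assumption introduced here: p(y) is the cluster proportion, p(x|y) the distribution of data assigned to cluster y, and p(x) the data distribution.

```latex
\mathcal{I}^{\mathrm{OvA}}_{\mathrm{MMD}}
  = \mathbb{E}_{y \sim p(y)}\left[ \mathrm{MMD}\left( p(x \mid y) \,\|\, p(x) \right) \right],
\qquad
\mathcal{I}^{\mathrm{OvO}}_{\mathrm{MMD}}
  = \mathbb{E}_{y,\, y' \sim p(y)}\left[ \mathrm{MMD}\left( p(x \mid y) \,\|\, p(x \mid y') \right) \right]
```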

solver: {‘sgd’, ‘adam’}, default=‘adam’

The solver for weight optimisation.

‘sgd’ refers to stochastic gradient descent.
‘adam’ refers to a stochastic gradient-based optimiser proposed by Diederik Kingma and Jimmy Ba.

alpha: float, default=1e-2

The weight of the group-lasso penalty in the optimisation scheme.
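To give an intuition of how a group-lasso penalty removes features, the sketch below applies plain group soft-thresholding to the rows of a weight matrix. This is a simplified stand-in and not the library's update: LassoNet-style training uses a hierarchical proximal operator that also involves M, and treating the threshold as a single number is an assumption made here.

```python
import numpy as np


def group_soft_threshold(W, threshold):
    """Shrink each row of W towards zero by `threshold` in Euclidean norm.

    Rows whose norm is below `threshold` become exactly zero, which is how
    a group-lasso proximal step discards a feature.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Scaling factor max(0, 1 - threshold / norm), guarded against norm == 0.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return scale * W


W_skip = np.array([[3.0, 4.0],   # norm 5: shrunk but kept
                   [0.1, 0.0]])  # norm 0.1: set to zero
print(group_soft_threshold(W_skip, threshold=1.0))
```

A larger threshold zeroes out more rows, which mirrors how a larger alpha leads to fewer selected features.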

M: float, default=10

The hierarchy coefficient that controls the relative strength between the group-lasso penalty of the skip connection and the sparsity of the first layer of the MLP.

dynamic: bool, default=False

Whether to run the path in dynamic mode or not. In dynamic mode, affinities are computed using only the subset of selected variables instead of all variables.

batch_size: int, default=None

The size of batches during gradient descent training. If set to None, the whole dataset is used as a single batch.

verbose: bool, default=False

Whether to print progress messages to stdout.

random_state: int, RandomState instance, default=None

Determines random number generation for weights and bias initialisation. Pass an int for reproducible results across multiple function calls.

Attributes:

W1_: ndarray of shape (n_features, n_hidden_dim): The linear weights of the first layer
b1_: ndarray of shape (1, n_hidden_dim): The biases of the first layer
W2_: ndarray of shape (n_hidden_dim, n_clusters): The linear weights of the hidden layer
b2_: ndarray of shape (1, n_clusters): The biases of the hidden layer
W_skip_: ndarray of shape (n_features, n_clusters): The linear weights of the skip connection
optimiser_: `AdamOptimizer` or `SGDOptimizer`: The optimisation algorithm used for training depending on the chosen solver parameter.
labels_: ndarray of shape (n_samples,): The labels that were assigned to the samples passed to the fit() method.
n_iter_: int: The number of iterations that the model took to converge.
H_: ndarray of shape (n_samples, n_hidden_dim): The hidden representation of the samples after fitting.
groups_: list of lists of int or None: The explicit partition of the variables formed by the groups parameter if it was not None.
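The shapes listed above fit together as in the following sketch of a forward pass with a skip connection. It is written under assumptions: the ReLU hidden activation and the softmax output are guesses rather than documented behaviour, and `forward` is a hypothetical function, not a gemclus method.

```python
import numpy as np


def forward(X, W1, b1, W2, b2, W_skip):
    """Cluster probabilities from an MLP with a linear skip connection."""
    H = np.maximum(X @ W1 + b1, 0.0)             # hidden layer (ReLU assumed)
    logits = H @ W2 + b2 + X @ W_skip            # MLP output plus skip connection
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_samples, n_features, n_hidden_dim, n_clusters = 5, 4, 20, 3
X = rng.normal(size=(n_samples, n_features))
W1 = rng.normal(size=(n_features, n_hidden_dim))
b1 = np.zeros((1, n_hidden_dim))
W2 = rng.normal(size=(n_hidden_dim, n_clusters))
b2 = np.zeros((1, n_clusters))
W_skip = rng.normal(size=(n_features, n_clusters))

proba = forward(X, W1, b1, W2, b2, W_skip)
print(proba.shape)  # (5, 3)
```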

See also

SparseMLPModel: sparse two-layer neural network trained with any GEMINI
SparseLinearMMD: sparse logistic regression trained for clustering with the MMD GEMINI

References

GEMINI - Generalised Mutual Information for Discriminative Clustering: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Warith Harchaoui, Mickaël Leclercq, Arnaud Droit, Frederic Precioso
LassoNet architecture - LassoNet: A Neural Network with Feature Sparsity.: Lemhadri, I., Ruan, F., Abraham, L., & Tibshirani, R.
Sparse GEMINI - Sparse GEMINI for joint discriminative clustering and feature selection: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickaël Leclercq, Arnaud Droit, Frederic Precioso

Examples

>>> from sklearn.datasets import load_iris
>>> from gemclus.sparse import SparseMLPMMD
>>> X, y = load_iris(return_X_y=True)
>>> clf = SparseMLPMMD(random_state=0).fit(X)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :]).shape
(2, 3)
>>> clf.score(X)
1.7664211836

__init__(n_clusters=3, groups=None, max_iter=1000, learning_rate=0.001, n_hidden_dim=20, kernel='linear', M=10, batch_size=None, alpha=0.01, ovo=False, dynamic=False, solver='adam', verbose=False, random_state=None, kernel_params=None)

fit(X, y=None)

Compute GEMINI clustering.

Parameters:

X: {array-like, sparse matrix} of shape (n_samples, n_features): Training instances to cluster.
y: ndarray of shape (n_samples, n_samples), default=None: Use this parameter to give a precomputed affinity metric if the option “precomputed” was passed during construction. Otherwise, it is not used and is present here only for API consistency by convention.

Returns:

self: object: Fitted estimator.

fit_predict(X, y=None)

Compute GEMINI clustering and return the predicted clusters.

Parameters:

X: {array-like, sparse matrix} of shape (n_samples, n_features): Training instances to cluster.
y: ndarray of shape (n_samples, n_samples), default=None: Use this parameter to give a precomputed affinity metric if the option “precomputed” was passed during construction. Otherwise, it is not used and is present here only for API consistency by convention.

Returns:

y_pred: ndarray of shape (n_samples,): Vector containing the cluster label for each sample.

get_gemini()

Initialise a gemclus.GEMINI instance that will be used to train the model.

Returns:

gemini: a GEMINI instance

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing: MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep: bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params: dict: Parameter names mapped to their values.

get_selection()

Retrieves the indices of features that were selected by the model.

Returns:

ind: ndarray: The indices of the selected features.
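One plausible way to read selected features off the fitted weights is sketched below. It assumes that a feature counts as selected when its row of the skip-connection weights has a non-zero norm; this criterion is an assumption for illustration, it ignores groups, and the library's actual rule may differ.

```python
import numpy as np

# Toy skip-connection weights: 4 features, 3 clusters.
W_skip = np.array([[0.5, -0.2, 0.0],
                   [0.0,  0.0, 0.0],   # feature 1 eliminated
                   [0.0,  0.3, 0.1],
                   [0.0,  0.0, 0.0]])  # feature 3 eliminated

row_norms = np.linalg.norm(W_skip, axis=1)
selected = np.flatnonzero(row_norms > 0)
print(selected)  # [0 2]
```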

path(X, y=None, alpha_multiplier=1.05, min_features=2, keep_threshold=0.9, restore_best_weights=True, early_stopping_factor=0.99, max_patience=10)

Unfold the progressive geometric increase of the penalty weight starting from the initial alpha until there remains only a specified amount of features.

The history of the GEMINI scores is kept, as well as the best weights: those with the fewest features for which the GEMINI score remains above a given percentage of the maximum GEMINI score seen during the path.
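The geometric increase of the penalty can be pictured with the short sketch below. The function `alpha_schedule` and its `n_steps` argument are hypothetical: the actual path stops according to the number of remaining features rather than after a fixed number of steps.

```python
def alpha_schedule(alpha, alpha_multiplier, n_steps):
    """Return the first `n_steps` penalty weights of a geometric path."""
    if alpha_multiplier <= 1:
        raise ValueError("alpha_multiplier must be greater than 1")
    # Step t uses the initial alpha multiplied t times by alpha_multiplier.
    return [alpha * alpha_multiplier ** t for t in range(n_steps)]


alphas = alpha_schedule(alpha=0.01, alpha_multiplier=1.05, n_steps=4)
print([round(a, 6) for a in alphas])
```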

Parameters:

X: {array-like, sparse matrix} of shape (n_samples, n_features): Test samples on which the feature reduction will be made.
y: ndarray of shape (n_samples, n_samples), default=None: Use this parameter to give a precomputed affinity metric if the option “precomputed” was passed during construction. Otherwise, it is not used. This parameter is incompatible with the dynamic mode.
alpha_multiplier: float, default=1.05: The geometric increase of the group-lasso penalty at each retraining. It must be greater than 1.
min_features: int, default=2: The path stops once at most this many features remain.
keep_threshold: float, default=0.9: The fraction of the maximal GEMINI score that a solution must retain; among the solutions meeting it, the one with the fewest features is deemed best.
restore_best_weights: bool, default=True: After performing the path, the best weights offering simultaneously good GEMINI score and few features are restored to the model. If the model is set to dynamic=True, then this option will be ignored because of the incomparable nature of GEMINIs when the number of selected variables changes.
early_stopping_factor: float, default=0.99: The percentage factor beyond which improvements of the GEMINI or the group-lasso penalty are considered too small, triggering early stopping.
max_patience: int, default=10: The maximum number of iterations to wait without any improvement in either the GEMINI score or the group-lasso penalty before stopping the current step.

Returns:

best_weights: list of ndarray of various shapes of length 5: The list containing the best weights during the path. Sequentially: W1_, W2_, W_skip_, b1_, b2_
geminis: list of float of length T: The history of the gemini scores as the penalty alpha was increased.
group_penalties: list of float of length T: The history of the group-lasso penalties
alphas: list of float of length T: The history of the penalty alphas during the path.
n_features: list of float of length T: The number of features that were selected at step t.
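Given the returned histories, the role of keep_threshold can be sketched as follows. The histories are made up for illustration, `pick_best_step` is a hypothetical helper, the tie-breaking rule is an assumption, and the sketch presumes positive GEMINI scores.

```python
def pick_best_step(geminis, n_features, keep_threshold=0.9):
    """Index of the step with the fewest features among high-scoring steps."""
    cutoff = keep_threshold * max(geminis)
    candidates = [t for t, g in enumerate(geminis) if g >= cutoff]
    # Fewest features first; ties broken in favour of the higher GEMINI.
    return min(candidates, key=lambda t: (n_features[t], -geminis[t]))


# Made-up histories for illustration only.
geminis = [2.0, 1.95, 1.9, 1.5]
n_features = [10, 6, 4, 2]
print(pick_best_step(geminis, n_features))  # 2
```

Here the last step is rejected because its score falls below 90% of the maximum, so the step with 4 features is retained.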

predict(X)

Return the cluster membership of samples. This can only be called after the model was fit to some data.

Parameters:

X: {array-like, sparse matrix} of shape (n_samples, n_features): The input samples.

Returns:

y: ndarray of shape (n_samples,): The label for each sample is the label of the closest sample seen during fit.

predict_proba(X)

Probability estimates that are the output of the neural network p(y|x). The returned estimates for all classes are ordered by the label of classes.

Parameters:

X: {array-like, sparse matrix} of shape (n_samples, n_features): Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns:

T: array-like of shape (n_samples, n_clusters): Returns the probability of the sample for each cluster in the model.

score(X, y=None)

Return the value of the GEMINI evaluated on the given test data.

Parameters:

X: {array-like, sparse matrix} of shape (n_samples, n_features): Test samples.
y: ndarray of shape (n_samples, n_samples), default=None: Use this parameter to give a precomputed affinity metric if the option “precomputed” was passed during construction. Otherwise, it is not used and is present here only for API consistency by convention.

Returns:

score: float: GEMINI evaluated on the output of self.predict(X).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params: dict: Estimator parameters.

Returns:

self: estimator instance: Estimator instance.

Examples using `gemclus.sparse.SparseMLPMMD`

Feature selection using the Sparse MMD OvA (MLP)
