`gemclus.sparse`.SparseMLPModel¶

class gemclus.sparse.SparseMLPModel(n_clusters=3, gemini='mmd_ova', groups=None, max_iter=1000, learning_rate=0.001, n_hidden_dim=20, M=10, alpha=0.01, dynamic=False, solver='adam', batch_size=None, verbose=False, random_state=None)[source]¶

Implementation of a neural network as a clustering distribution \(p(y|x)\) with variable selection.

On top of the vanilla MLP GEMINI model, this variation brings a skip connection from the data to the cluster output. This skip connection ensures a sparsity constraint through a group-lasso penalty and a proximal gradient that eliminates input features as well in the first layer of the MLP.

This architecture is inspired from LassoNet by Lemhadri et al. (2021).

Parameters:

n_clustersint, default=3

The maximum number of clusters to form as well as the number of output neurons in the neural network.

groups: list of arrays of various shapes, default=None

If groups is set, it must describe a partition of the indices of variables. This will be used for performing variable selection with groups of features considered to represent one variable. This option can typically be used for one-hot-encoded variables. Variable indices that are not entered will be considered alone. For example, with 3 features, accepted values can be [[0],[1],[2]], [[0,1],[2]] or [[0,1]].

max_iter: int, default=1000

Maximum number of epochs to perform gradient descent in a single run.

learning_rate: float, default=1e-3

Initial learning rate used. It controls the step-size in updating the weights.

n_hidden_dim: int, default=20

The number of neurons in the hidden layer of the neural network.

dynamic: bool, default=False

Whether to run the path in dynamic mode or not. The dynamic mode consists of affinities computed using only the subset of selected variables instead of all variables.

solver: {‘sgd’,’adam’}, default=’adam’

The solver for weight optimisation.

‘sgd’ refers to stochastic gradient descent.
‘adam’ refers to a stochastic gradient-based optimiser proposed by Kingma, Diederik and Jimmy Ba.

alpha: float, default=1e-2

The weight of the group-lasso penalty in the optimisation scheme.

M: float, default=10 The hierarchy coefficient that controls the relative strength between the group-lasso

penalty of the skip connection and the sparsity of the first layer of the MLP.

batch_size: int, default=None

The size of batches during gradient descent training. If set to None, the whole data will be considered.

verbose: bool, default=False

Whether to print progress messages to stdout

random_state: int, RandomState instance, default=None

Determines random number generation for weights and bias initialisation. Pass an int for reproducible results across multiple function calls.

Attributes:

W1_: ndarray, shape (n_features, n_hidden_dim): The linear weights of the first layer
b1_: ndarray of shape (1, n_hidden_dim): The biases of the first layer
W2_: ndarray of shape (n_hidden_dim, n_clusters): The linear weights of the hidden layer
b2_: ndarray of shape (1, n_clusters): The biases of the hidden layer
W_skip_: ndarray of shape (n_features, n_clusters): The linear weights of the skip connection
optimiser_: `AdamOptimizer` or `SGDOptimizer`: The optimisation algorithm used for training depending on the chosen solver parameter.
labels_: ndarray of shape (n_samples): The labels that were assigned to the samples passed to the fit() method.
n_iter_: int: The number of iterations that the model took for converging.
H_: ndarray of shape (n_samples, n_hidden_dim): The hidden representation of the samples after fitting.
groups_: list of lists of int or None: The explicit partition of the variables formed by the groups parameter if it was not None.

Examples using `gemclus.sparse.SparseMLPModel`¶

Feature selection using the Sparse MMD OvA (MLP)

gemclus.sparse.SparseMLPModel¶

Examples using gemclus.sparse.SparseMLPModel¶

`gemclus.sparse`.SparseMLPModel¶

Examples using `gemclus.sparse.SparseMLPModel`¶