gemclus.linear.RIM
- class gemclus.linear.RIM(n_clusters=3, max_iter=1000, learning_rate=0.001, reg=0.1, solver='adam', batch_size=None, verbose=False, random_state=None)
Implementation of the maximisation of the classical mutual information using a logistic regression with an \(\ell_2\) penalty on the weights. This implementation follows the framework described by Krause et al. in the RIM paper.
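The objective described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: it uses the standard decomposition I(X; Y) = H(mean p(y|x)) - mean H(p(y|x)), and the helper names (`entropy`, `mutual_information`, `penalised_objective`) as well as the exact form of the penalty (sum of squared weights scaled by `reg`) are assumptions made for this example.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mutual_information(probs):
    """Estimate I(X; Y) from per-sample cluster probabilities p(y|x_i).

    Uses I(X; Y) = H(mean_i p(y|x_i)) - mean_i H(p(y|x_i)).
    """
    n = len(probs)
    k = len(probs[0])
    # Cluster proportions: average of the conditional distributions.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # Average uncertainty of the individual predictions.
    conditional = sum(entropy(row) for row in probs) / n
    return entropy(marginal) - conditional


def penalised_objective(probs, weights, reg=0.1):
    """Mutual information minus an l2 penalty on the weights (assumed form)."""
    penalty = sum(w * w for row in weights for w in row)
    return mutual_information(probs) - reg * penalty


# Confident, balanced assignments maximise the mutual information.
confident = [[1.0, 0.0], [0.0, 1.0]]
# Uninformative predictions carry no information about the clusters.
uncertain = [[0.5, 0.5], [0.5, 0.5]]
print(mutual_information(confident))
print(mutual_information(uncertain))
```

The first call evaluates to log(2), the largest value reachable with two clusters, while the second evaluates to zero.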
- Parameters:
- n_clusters: int, default=3
The maximum number of clusters to form as well as the number of output neurons in the neural network.
- max_iter: int, default=1000
Maximum number of epochs to perform gradient descent in a single run.
- learning_rate: float, default=1e-3
Initial learning rate used. It controls the step-size in updating the weights.
- reg: float, default=0.1
Regularisation hyperparameter for the \(\ell_2\) weight penalty.
- solver: {‘sgd’, ‘adam’}, default=‘adam’
The solver for weight optimisation.
‘sgd’ refers to stochastic gradient descent.
‘adam’ refers to a stochastic gradient-based optimiser proposed by Diederik Kingma and Jimmy Ba.
- batch_size: int, default=None
The size of the batches used during gradient descent training. If set to None, the whole dataset is used as a single batch.
- verbose: bool, default=False
Whether to print progress messages to stdout.
- random_state: int, RandomState instance, default=None
Determines random number generation for weights and bias initialisation. Pass an int for reproducible results across multiple function calls.
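To make the difference between the two solver options concrete, here is a minimal sketch of a single update for one scalar weight. It is not taken from the library: the function names and the `state` dictionary are hypothetical, and the values of `beta1`, `beta2` and `eps` are the defaults suggested in the Adam paper, which this estimator may or may not use.

```python
import math


def sgd_step(w, grad, learning_rate=0.001):
    """One plain gradient-descent update for a single scalar weight."""
    return w - learning_rate * grad


def adam_step(w, grad, state, learning_rate=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and moment estimates."""
    state["t"] += 1
    # Exponential moving averages of the gradient and its square.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialisation.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - learning_rate * m_hat / (math.sqrt(v_hat) + eps)


state = {"t": 0, "m": 0.0, "v": 0.0}
w_sgd = sgd_step(1.0, grad=2.0)
w_adam = adam_step(1.0, grad=2.0, state=state)
print(w_sgd, w_adam)
```

With plain gradient descent the step scales with the gradient magnitude, whereas the first Adam step has a size close to `learning_rate` whatever the gradient scale.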
See also
LinearModel
logistic regression trained for clustering with any GEMINI
LinearWasserstein
logistic regression trained for clustering with the Wasserstein GEMINI
LinearMMD
logistic regression trained for clustering with the MMD GEMINI
References
- RIM - Discriminative Clustering by Regularized Information Maximization
Ryan Gomes, Andreas Krause, Pietro Perona. 2010.
Examples
>>> from sklearn.datasets import load_iris
>>> from gemclus.linear import RIM
>>> X, y = load_iris(return_X_y=True)
>>> clf = RIM(random_state=0).fit(X)
>>> clf.predict(X[:2,:])
array([0, 0])
>>> clf.predict_proba(X[:2,:]).shape
(2, 3)
>>> clf.score(X)
0.4390485754
- Attributes:
- W_: ndarray of shape (n_features_in, n_clusters)
The linear weights of the model.
- b_: ndarray of shape (1, n_clusters)
The biases of the model.
- optimiser_: `AdamOptimizer` or `SGDOptimizer`
The optimisation algorithm used for training depending on the chosen solver parameter.
- labels_: ndarray of shape (n_samples,)
The labels that were assigned to the samples passed to the fit() method.
- n_iter_: int
The number of iterations the model took to converge.
- __init__(n_clusters=3, max_iter=1000, learning_rate=0.001, reg=0.1, solver='adam', batch_size=None, verbose=False, random_state=None)
- fit(X, y=None)
Compute GEMINI clustering.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Training instances to cluster.
- y: ndarray of shape (n_samples, n_samples), default=None
Use this parameter to give a precomputed affinity metric if the option “precomputed” was passed during construction. Otherwise, it is ignored and is present only for API consistency by convention.
- Returns:
- self: object
Fitted estimator.
- fit_predict(X, y=None)
Compute GEMINI clustering and return the predicted clusters.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Training instances to cluster.
- y: ndarray of shape (n_samples, n_samples), default=None
Use this parameter to give a precomputed affinity metric if the option “precomputed” was passed during construction. Otherwise, it is ignored and is present only for API consistency by convention.
- Returns:
- y_pred: ndarray of shape (n_samples,)
Vector containing the cluster label for each sample.
- get_gemini()
Initialise a gemclus.GEMINI instance that will be used to train the model.
- Returns:
- gemini: a GEMINI instance
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params: dict
Parameter names mapped to their values.
- predict(X)
Return the cluster membership of samples. This can only be called after the model has been fitted to some data.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
The input samples.
- Returns:
- y: ndarray of shape (n_samples,)
The cluster label assigned to each sample.
- predict_proba(X)
Probability estimates p(y|x) output by the model. The returned estimates for all clusters are ordered by cluster label.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
- Returns:
- T: array-like of shape (n_samples, n_clusters)
Returns the probability of the sample for each cluster in the model.
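As a rough illustration of how the fitted attributes could yield these probabilities, the sketch below applies a row-wise softmax to `X @ W + b` and takes the most probable cluster as the label. This is an assumption based on the model being a logistic regression with `W_` of shape (n_features_in, n_clusters) and `b_` of shape (1, n_clusters); the function names are hypothetical and plain Python lists stand in for arrays to keep the example dependency-free.

```python
import math


def predict_proba_sketch(X, W, b):
    """Row-wise softmax of X @ W + b using plain Python lists."""
    probs = []
    for x in X:
        logits = [sum(x_i * W[i][j] for i, x_i in enumerate(x)) + b[0][j]
                  for j in range(len(b[0]))]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def predict_sketch(X, W, b):
    """Assign each sample to its most probable cluster (assumed rule)."""
    probs = predict_proba_sketch(X, W, b)
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


W = [[1.0, -1.0], [0.0, 0.0]]  # shape (n_features=2, n_clusters=2)
b = [[0.0, 0.0]]               # shape (1, n_clusters)
X = [[2.0, 0.0], [-2.0, 0.0]]  # two samples on opposite sides
print(predict_proba_sketch(X, W, b))
print(predict_sketch(X, W, b))
```

Each row of the probability output sums to one, and the two samples fall into different clusters because their first feature has opposite signs.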
- score(X, y=None)
Return the value of the GEMINI evaluated on the given test data.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Test samples.
- y: ndarray of shape (n_samples, n_samples), default=None
Use this parameter to give a precomputed affinity metric if the option “precomputed” was passed during construction. Otherwise, it is ignored and is present only for API consistency by convention.
- Returns:
- score: float
GEMINI evaluated on the output of self.predict(X).
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params: dict
Estimator parameters.
- Returns:
- self: estimator instance
Estimator instance.
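The `<component>__<parameter>` naming convention can be illustrated with a small, dependency-free sketch. The helper `split_nested_params` and the component name `clustering` are hypothetical and only show how such keys separate into a component and a parameter; this is not the actual implementation of `set_params`.

```python
def split_nested_params(params):
    """Group `<component>__<parameter>` keys by component name.

    Keys without a double underscore belong to the estimator itself and
    are stored under the empty-string key.
    """
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


# "clustering" is a made-up step name, as it could appear in a Pipeline.
grouped = split_nested_params({"reg": 0.5, "clustering__n_clusters": 4})
print(grouped)
```

Here `reg` is treated as a parameter of the estimator itself, while `n_clusters` is routed to the component named `clustering`.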
Examples using gemclus.linear.RIM
Simple logistic regression with RIM