gemclus.gemini.MMDGEMINI
- class gemclus.gemini.MMDGEMINI(ovo=False, kernel='linear', kernel_params=None, epsilon=1e-12)
Implements the one-vs-all and one-vs-one MMD GEMINI. The one-vs-all version compares the maximum mean discrepancy (MMD) between each cluster distribution and the data distribution:
\[\mathcal{I} = \mathbb{E}_{y \sim p(y)}[\text{MMD}_\kappa(p(x|y)\|p(x))]\]
where \(\kappa\) is a kernel defined between the samples of the data space.
The one-vs-one version compares the maximum mean discrepancy between two cluster distributions:
\[\mathcal{I} = \mathbb{E}_{y_a,y_b \sim p(y)}[\text{MMD}_\kappa(p(x|y_a)\|p(x|y_b))]\]
The one-vs-one objective is equivalent to a kernel KMeans objective.
- Parameters:
- ovo: bool, default=False
Whether to use the one-vs-all objective (False) or the one-vs-one objective (True).
- kernel: {'additive_chi2', 'chi2', 'cosine', 'linear', 'poly', 'polynomial', 'rbf', 'laplacian', 'sigmoid', 'precomputed'}, default='linear'
The kernel to use in combination with the MMD objective. It corresponds to one value of KERNEL_PARAMS. Currently, all kernel parameters are the default ones. If the kernel is set to 'precomputed', then a custom kernel matrix must be passed to the argument affinity of the evaluate method.
- kernel_params: dict, default=None
Additional keyword arguments for the kernel function. Ignored if the kernel is callable or precomputed.
- epsilon: float, default=1e-12
The precision for clipping the prediction values in order to avoid numerical instabilities.
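The two objectives above can be illustrated with a small NumPy sketch. It is a plug-in estimate written for this page, not the code of gemclus: the helper name `mmd_gemini_sketch` is hypothetical, the cluster proportions \(p(y)\) are assumed to be estimated by averaging the predictions, and each \(p(x|y)\) is assumed to be represented by importance weights over the samples.

```python
import numpy as np


def mmd_gemini_sketch(y_pred, kernel_matrix, ovo=False, epsilon=1e-12):
    """Hypothetical helper: plug-in estimate of the MMD GEMINI (not gemclus code)."""
    # Clip predictions to avoid divisions by zero for empty clusters.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), epsilon, 1 - epsilon)
    n_samples = y_pred.shape[0]

    # Cluster proportions p(y), estimated by averaging the predictions.
    pi = y_pred.mean(axis=0)
    # Column c holds sample weights representing p(x|y=c); each column sums to 1.
    weights = y_pred / (n_samples * pi)

    # Inner products between the kernel mean embeddings of the clusters.
    gram = weights.T @ kernel_matrix @ weights

    if ovo:
        # Squared MMD between every pair of clusters.
        diag = np.diag(gram)
        sq_mmd = diag[:, None] + diag[None, :] - 2 * gram
        mmd = np.sqrt(np.clip(sq_mmd, 0, None))
        # Expectation over independent draws y_a, y_b ~ p(y).
        return float(pi @ mmd @ pi)

    # Squared MMD between each cluster and the whole data distribution,
    # the latter being represented by uniform weights over the samples.
    uniform = np.full(n_samples, 1 / n_samples)
    cross = weights.T @ kernel_matrix @ uniform
    data_term = uniform @ kernel_matrix @ uniform
    sq_mmd = np.diag(gram) - 2 * cross + data_term
    # Expectation over y ~ p(y).
    return float(pi @ np.sqrt(np.clip(sq_mmd, 0, None)))


# Toy data: two groups on a line, with a linear kernel.
X = np.array([[0.0], [0.0], [2.0], [2.0]])
K = X @ X.T
hard = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
uniform_pred = np.full((4, 2), 0.5)

ova_hard = mmd_gemini_sketch(hard, K)
ovo_hard = mmd_gemini_sketch(hard, K, ovo=True)
ova_flat = mmd_gemini_sketch(uniform_pred, K)
```

On this toy data, both scores for the hard assignment are approximately 1, while the uninformative uniform prediction scores approximately 0, which matches the intuition that the GEMINI rewards clusters whose distributions differ.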
- compute_affinity(X, y=None)
Compute the kernel between all samples of X.
- Parameters:
- X: ndarray of shape (n_samples, n_features)
The samples between which all affinities must be computed.
- y: ndarray of shape (n_samples, n_samples), default=None
Values of the affinity between samples in case of a “precomputed” affinity. Ignored if None and the affinity is not precomputed.
- Returns:
- affinity: ndarray of shape (n_samples, n_samples)
The kernel between all samples if it is needed for the GEMINI computations, None otherwise.
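As an illustration of what such a kernel matrix looks like, the sketch below builds one with NumPy for two of the listed kernels. The function `affinity_sketch` is a hypothetical stand-in, not the library's implementation. The default `gamma = 1 / n_features` for the RBF kernel and the error raised when a precomputed matrix is missing are assumptions made for this example.

```python
import numpy as np


def affinity_sketch(X, y=None, kernel="linear", gamma=None):
    """Hypothetical stand-in that returns an (n_samples, n_samples) kernel matrix."""
    if kernel == "precomputed":
        # Assumption: the precomputed matrix is passed through y.
        if y is None:
            raise ValueError("A precomputed kernel needs the matrix passed as y.")
        return np.asarray(y, dtype=float)

    X = np.asarray(X, dtype=float)
    if kernel == "linear":
        # k(a, b) = <a, b>
        return X @ X.T
    if kernel == "rbf":
        # k(a, b) = exp(-gamma * ||a - b||^2)
        if gamma is None:
            gamma = 1.0 / X.shape[1]  # assumed default for this sketch
        sq_norms = (X ** 2).sum(axis=1)
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
        return np.exp(-gamma * np.clip(sq_dists, 0, None))
    raise ValueError(f"Kernel {kernel!r} is not covered by this sketch.")


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_linear = affinity_sketch(X)
K_rbf = affinity_sketch(X, kernel="rbf")
```

Both matrices are symmetric with shape (n_samples, n_samples), which is the form the affinity argument of evaluate expects.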
- evaluate(y_pred, affinity, return_grad=False)
Compute the GEMINI objective given the predictions \(p(y|x)\) and an affinity matrix. If return_grad is True, the gradients of the GEMINI w.r.t. the predictions are returned as well. Depending on the context, affinity can be either a kernel matrix or a distance matrix resulting from the compute_affinity method.
- Parameters:
- y_pred: ndarray of shape (n_samples, n_clusters)
The conditional distribution (prediction) of clustering assignment per sample.
- affinity: ndarray of shape (n_samples, n_samples)
The affinity matrix resulting from the compute_affinity method. The matrix must be symmetric.
- return_grad: bool, default=False
If True, the method also returns the gradient of the GEMINI w.r.t. the predictions \(p(y|x)\).
- Returns:
- gemini: float
The GEMINI score of the model given the predictions and affinities.
- gradients: ndarray of shape (n_samples, n_clusters)
The derivative w.r.t. the predictions y_pred: \(\nabla_{p (y|x)} \mathcal{I}\). Returned only if return_grad is True.
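One way to sanity-check a gradient of shape (n_samples, n_clusters), such as the one described above, is to compare it with central finite differences. The sketch below shows the idea on a stand-in score, \(\mathrm{tr}(P^\top K P)\), whose gradient \(2KP\) is known in closed form for a symmetric \(K\). This stand-in is not the MMD GEMINI, and the names `finite_difference_grad` and `toy_score` are hypothetical.

```python
import numpy as np


def finite_difference_grad(score_fn, y_pred, affinity, h=1e-6):
    """Approximate d score / d y_pred entry by entry with central differences."""
    grad = np.zeros_like(y_pred, dtype=float)
    for i in range(y_pred.shape[0]):
        for c in range(y_pred.shape[1]):
            step = np.zeros_like(y_pred, dtype=float)
            step[i, c] = h
            upper = score_fn(y_pred + step, affinity)
            lower = score_fn(y_pred - step, affinity)
            grad[i, c] = (upper - lower) / (2 * h)
    return grad


def toy_score(y_pred, affinity):
    # Stand-in objective with a known gradient; not the MMD GEMINI.
    return float(np.trace(y_pred.T @ affinity @ y_pred))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K = X @ X.T                             # symmetric linear kernel matrix
P = rng.dirichlet(np.ones(3), size=5)   # rows are distributions over 3 clusters

numeric = finite_difference_grad(toy_score, P, K)
analytic = 2 * K @ P                    # closed-form gradient for symmetric K
```

The same comparison can be applied to any score function that takes predictions and an affinity matrix, at the cost of two evaluations per entry of y_pred.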