gemclus.data.gstm

gemclus.data.gstm(n=500, alpha=2, df=1, random_state=None)

Reproduces the Gaussian-Student Mixture dataset from the GEMINI article.

Parameters:
n: int, default=500

The number of samples to draw from the dataset.

alpha: float, default=2

Controls the spacing between the means of the Gaussian components and the location of the Student-t component.

df: float, default=1

The degrees of freedom for the Student-t distribution.

random_state: int, RandomState instance or None, default=None

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple runs.

Returns:
X: ndarray of shape (n, 2)

The samples of the dataset, in an array of shape (n_samples, n_features).

y: ndarray of shape (n,)

The index of the mixture component (Gaussian or Student-t) from which each sample was drawn.
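
To make the roles of alpha and df concrete, here is a minimal NumPy sketch of a Gaussian-Student mixture sampler. It is not the gemclus implementation: the helper name sketch_gstm, the four equally likely components placed at (±alpha, ±alpha), the unit covariance, and the choice of the last component as the Student-t one are all assumptions made for illustration. It also accepts only an int or None as random_state, unlike the documented function.

```python
import numpy as np


def sketch_gstm(n=500, alpha=2.0, df=1.0, random_state=None):
    """Hypothetical stand-in for gemclus.data.gstm (layout is assumed)."""
    rng = np.random.default_rng(random_state)

    # Assumption: four equally likely components centred at (+-alpha, +-alpha).
    locations = alpha * np.array(
        [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    )
    y = rng.integers(0, 4, size=n)

    # Every sample starts as standard Gaussian noise around its location.
    noise = rng.standard_normal((n, 2))

    # Assumption: the last component is the Student-t one. A multivariate
    # Student-t draw is a Gaussian draw divided by sqrt(chi2(df) / df).
    is_student = y == 3
    scale = np.sqrt(rng.chisquare(df, size=is_student.sum()) / df)
    noise[is_student] /= scale[:, None]

    X = locations[y] + noise
    return X, y


X, y = sketch_gstm(n=200, alpha=3.0, df=1.0, random_state=0)
print(X.shape, y.shape)  # (200, 2) (200,)
```

In this sketch, a larger alpha moves the component centres further apart, while a small df (such as the default of 1) gives the Student-t component heavy tails and therefore occasional samples very far from its location. The documented function is called the same way, for example gemclus.data.gstm(n=200, alpha=3, df=1, random_state=0), and returns X and y with the shapes listed above.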

References

GEMINI - Ohl, L., Mattei, P. A., Bouveyron, C., Harchaoui, W., Leclercq, M., Droit, A., & Precioso, F. (2022, October). Generalised Mutual Information for Discriminative Clustering. In Advances in Neural Information Processing Systems.

Examples using gemclus.data.gstm

Example of decision boundary map for a mixture of Gaussian and low-degree Student distributions
