gemclus.data.celeux_one

gemclus.data.celeux_one(n=300, p=20, mu=1.7, random_state=None) → Tuple[ndarray, ndarray]

Draws n samples from a Gaussian mixture with 3 isotropic components of respective means 1, 0 and 1 over 5 dimensions, scaled by mu. The data is then concatenated with p additional noise variables that are independent of the true clusters. This dataset is taken from Celeux et al., section 3.1.

Parameters:
n: int, default=300

The number of samples to draw from the Gaussian mixture model.

p: int, default=20

The number of extra noise variables to concatenate to the dataset.

mu: float, default=1.7

Scaling factor applied to the component means; it controls how close the means are to each other.

random_state: int, RandomState instance or None, default=None

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple runs.

Returns:
X: ndarray of shape (n, 5+p)

The samples of the dataset in an array of shape n_samples x n_features

y: ndarray of shape (n,)

The component of the GMM from which each sample was drawn.
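The shape contract above can be illustrated with a small NumPy sketch. This is not the gemclus implementation: the function name `sketch_celeux_like`, the `base_means` and `seed` parameters, the equal component weights, the identity covariance and the standard-normal noise are all assumptions made for illustration. The base means are copied as listed in the description.

```python
import numpy as np


def sketch_celeux_like(n=300, p=20, mu=1.7, base_means=(1.0, 0.0, 1.0), seed=None):
    """Hypothetical stand-in (not gemclus code): 3 isotropic Gaussians
    over 5 dimensions, followed by p independent noise columns."""
    rng = np.random.default_rng(seed)
    # Assumption: the three components are equally likely.
    y = rng.integers(0, len(base_means), size=n)
    # Each component mean is a constant 5-vector, scaled by mu.
    centers = mu * np.asarray(base_means, dtype=float)[:, None] * np.ones((1, 5))
    # Assumption: identity covariance for the 5 informative variables.
    informative = centers[y] + rng.normal(size=(n, 5))
    # Assumption: noise variables are standard normal and independent of y.
    noise = rng.normal(size=(n, p))
    return np.concatenate([informative, noise], axis=1), y


X, y = sketch_celeux_like(n=300, p=20, mu=1.7, seed=0)
print(X.shape, y.shape)  # (300, 25) (300,)
```

Only the first 5 columns of X carry cluster information; the remaining p columns are noise, so a variable-selection method would be expected to discard them.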

References

Dataset - Celeux, G., Martin-Magniette, M. L., Maugis-Rabusseau, C., & Raftery, A. E. (2014). Comparing model selection and regularization approaches to variable selection in model-based clustering. Journal de la Societe francaise de statistique, 155(2), 57-71.