gemclus.data.celeux_one
- gemclus.data.celeux_one(n=300, p=20, mu=1.7, random_state=None) -> Tuple[ndarray, ndarray]
Draws n samples from a Gaussian mixture with 3 isotropic components of respective means 1, 0 and 1 over 5 dimensions, scaled by mu. The samples are then concatenated with p additional noise variables that are independent of the true clusters. This dataset is taken from Celeux et al. (2014), section 3.1.
- Parameters:
- n: int, default=300
The number of samples to draw from the Gaussian mixture model.
- p: int, default=20
The number of additional noise variables to concatenate to the dataset.
- mu: float, default=1.7
Scaling factor applied to the component means. It controls how close the means of the components are to each other.
- random_state: int, RandomState instance or None, default=None
Determines random number generation for dataset creation. Pass an int for reproducible output across multiple runs.
- Returns:
- X: ndarray of shape (n, 5+p)
The samples of the dataset, as an array of shape (n_samples, n_features).
- y: ndarray of shape (n,)
The component of the GMM from which each sample was drawn.
References
- Dataset - Celeux, G., Martin-Magniette, M. L., Maugis-Rabusseau, C., & Raftery, A. E. (2014). Comparing model selection and regularization approaches to variable selection in model-based clustering. Journal de la Société Française de Statistique, 155(2), 57-71.