gemclus.data.celeux_two
- gemclus.data.celeux_two(n=2000, random_state=None) -> Tuple[ndarray, ndarray]
Draws samples from a mixture of four Gaussian distributions in 2D, augmented with additional variables that depend linearly on the informative variables and with non-informative noise variables. This dataset is taken from Celeux et al. (2014), section 3.2.
- Parameters:
- n: int, default=2000
The number of samples to draw.
- random_state: int, RandomState instance or None, default=None
Determines random number generation for dataset creation. Pass an int for reproducible output across multiple runs.
- Returns:
- X: ndarray of shape (n, 14)
The samples of the dataset, in an array of shape (n_samples, n_features).
- y: ndarray of shape (n,)
The component of the GMM from which each sample was drawn.
References
- Dataset - Celeux, G., Martin-Magniette, M. L., Maugis-Rabusseau, C., & Raftery, A. E. (2014). Comparing model selection and regularization approaches to variable selection in model-based clustering. Journal de la Société Française de Statistique, 155(2), 57-71.
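A minimal usage sketch, assuming gemclus is installed and that the labels in y can be converted to integers. The helper names component_counts and demo are hypothetical and are not part of gemclus; only celeux_two and its n and random_state parameters come from the signature documented above.

```python
from collections import Counter


def component_counts(y):
    """Count how many samples fall in each mixture component.

    Hypothetical helper (not part of gemclus). Accepts any iterable of
    labels that can be converted with int().
    """
    return dict(sorted(Counter(int(label) for label in y).items()))


def demo(n=500, seed=0):
    """Draw a small dataset and print its documented shapes."""
    # Imported lazily so the helper above stays usable without gemclus.
    from gemclus.data import celeux_two

    X, y = celeux_two(n=n, random_state=seed)
    print(X.shape)  # documented as (n, 14)
    print(y.shape)  # documented as (n,)
    print(component_counts(y))
    return X, y
```

Calling demo() draws 500 samples with a fixed integer seed. Per the random_state description, passing the same integer again should reproduce the same output across runs.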