gemclus.data.celeux_two

gemclus.data.celeux_two(n=2000, random_state=None) → Tuple[ndarray, ndarray]

Draws samples from a mixture of 4 Gaussian distributions in 2D, augmented with additional variables that depend linearly on the informative variables and with non-informative noise variables. This dataset is taken from Celeux et al. (2014), section 3.2.
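The structure described above can be illustrated with a minimal NumPy sketch. It does not reproduce the parameters of Celeux et al. and is not the gemclus implementation. The function name `sketch_redundant_mixture`, the component means, the regression coefficients, the noise scales and the 2 / 9 / 3 split of the 14 columns are all assumptions chosen for illustration.

```python
import numpy as np


def sketch_redundant_mixture(n=2000, seed=0):
    """Hypothetical illustration: 2 informative + 9 dependent + 3 noise columns."""
    rng = np.random.default_rng(seed)

    # Assumed component means for the 4 Gaussians in 2D (illustrative values).
    means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 2.0], [4.0, 2.0]])

    # Draw a component label for each sample, then sample around its mean
    # with identity covariance (an assumption).
    y = rng.integers(0, 4, size=n)
    informative = means[y] + rng.standard_normal((n, 2))

    # Variables that depend linearly on the informative ones, plus small noise.
    # The coefficient matrix is random here, purely for illustration.
    coef = rng.uniform(-1.0, 1.0, size=(2, 9))
    dependent = informative @ coef + 0.5 * rng.standard_normal((n, 9))

    # Non-informative variables: independent of the cluster labels.
    noise = rng.standard_normal((n, 3))

    X = np.hstack([informative, dependent, noise])
    return X, y
```

Only the first two columns carry cluster information directly. The dependent columns are redundant given those two, and the noise columns are unrelated to the labels, which is what makes this kind of dataset useful for testing variable selection in clustering.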

Parameters:
n: int, default=2000

The number of samples to draw.

random_state: int, RandomState instance or None, default=None

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple runs.

Returns:
X: ndarray of shape (n, 14)

The drawn samples, as an array of shape n_samples x n_features.

y: ndarray of shape (n,)

The component of the Gaussian mixture model (GMM) from which each sample was drawn.

References

Dataset - Celeux, G., Martin-Magniette, M. L., Maugis-Rabusseau, C., & Raftery, A. E. (2014). Comparing model selection and regularization approaches to variable selection in model-based clustering. Journal de la Société Française de Statistique, 155(2), 57-71.