gemclus.data.celeux_one

gemclus.data.celeux_one(n=300, p=20, mu=1.7, random_state=None) -> Tuple[ndarray, ndarray]

Draws \(n\) samples from a Gaussian mixture with 3 isotropic components of respective means 1, 0 and 1 over 5 dimensions, scaled by \(\mu\). The data is concatenated with \(p\) additional noise variables that are independent of the true clusters. This dataset is taken from Celeux et al. (2014), section 3.1.

Parameters:
n: int, default=300

The number of samples to draw from the Gaussian mixture model.

p: int, default=20

The number of additional noise variables to concatenate to the dataset.

mu: float, default=1.7

Scaling factor applied to the component means. It controls how close the means of the components are to each other.

random_state: int, RandomState instance or None, default=None

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple runs.

Returns:
X: ndarray of shape (n, 5+p)

The samples of the dataset, in an array of shape n_samples x n_features.

y: ndarray of shape (n,)

The component of the GMM from which each sample was drawn.
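The layout of the returned arrays can be illustrated with a small NumPy sketch. This is not the GemClus implementation: the helper name `sketch_celeux_like`, the `base_means` and `seed` parameters, the equal component probabilities, the unit variances and the standard normal noise columns are all assumptions made for illustration. In particular, the `base_means` values are placeholders chosen so that the three components are distinct, and they are not confirmed by the description above.

```python
import numpy as np


def sketch_celeux_like(n=300, p=20, mu=1.7, base_means=(1.0, -1.0, 0.0), seed=None):
    """Hypothetical re-creation of the documented layout, not GemClus code."""
    rng = np.random.default_rng(seed)
    # Assumption: each sample picks one of the components with equal probability.
    y = rng.integers(0, len(base_means), size=n)
    # Each component mean is a constant vector over the 5 informative
    # dimensions, scaled by mu. Resulting shape: (n_components, 5).
    centers = mu * np.asarray(base_means)[:, None] * np.ones((1, 5))
    # Isotropic, unit-variance Gaussian noise around the selected center.
    informative = centers[y] + rng.standard_normal((n, 5))
    # Assumption: the p extra variables are standard normal and drawn
    # independently of the labels, so they carry no cluster information.
    noise = rng.standard_normal((n, p))
    X = np.concatenate([informative, noise], axis=1)
    return X, y


X, y = sketch_celeux_like(n=300, p=20, mu=1.7, seed=0)
print(X.shape, y.shape)  # (300, 25) (300,)
```

With the library itself, the corresponding call would follow the signature documented above, for instance `X, y = gemclus.data.celeux_one(n=300, p=20, mu=1.7, random_state=0)`, with only the first 5 columns of `X` being informative about `y`.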

References

Dataset - Celeux, G., Martin-Magniette, M. L., Maugis-Rabusseau, C., & Raftery, A. E. (2014). Comparing model selection and regularization approaches to variable selection in model-based clustering. Journal de la Societe francaise de statistique, 155(2), 57-71.

Examples using gemclus.data.celeux_one

Feature selection using the Sparse MMD OvO (Logistic regression)

Feature selection using the Sparse Linear MI (Logistic regression)

Consensus clustering with linking constraints on sample pairs
