.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/feature_selection/plot_feature_selection_mlp.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_feature_selection_plot_feature_selection_mlp.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_feature_selection_plot_feature_selection_mlp.py:

=================================================
Feature selection using the Sparse MMD OvA (MLP)
=================================================

In this example, we ask the :class:`gemclus.sparse.SparseMLPMMD` model to follow a
regularisation path along which the penalty is progressively increased until all
features but 2 are discarded. The model then keeps the weights that use the smallest
number of features while maintaining a GEMINI score above 90% of the maximum GEMINI
value encountered during the path.

The dataset consists of 3 isotropic Gaussian distributions (so 3 clusters to find)
in 2d, padded with 18 noisy variables. The optimal solution should therefore find
that only the 2 informative features are relevant and sufficient to recover the
correct clustering.

.. GENERATED FROM PYTHON SOURCE LINES 14-21

.. code-block:: Python

    import numpy as np
    from matplotlib import pyplot as plt
    from sklearn import datasets

    from gemclus.sparse import SparseMLPMMD

.. GENERATED FROM PYTHON SOURCE LINES 22-24

Load a simple synthetic dataset
--------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 26-34

.. code-block:: Python

    # Generate samples that are simple to separate
    X, y = datasets.make_blobs(centers=3, cluster_std=0.5, n_samples=200, random_state=0)

    # Add extra noisy variables
    np.random.seed(0)
    X = np.concatenate([X, np.random.normal(scale=0.5, size=(200, 18))], axis=1)
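The 90% rule described above can be sketched independently of the estimator: among
all steps of the path whose GEMINI score stays above 90% of the best value seen, keep
the step that uses the fewest features. A minimal NumPy illustration (the path values
below are made up for the sketch, not produced by gemclus):

```python
import numpy as np

# Hypothetical path results: GEMINI score and feature count at each alpha step
geminis = np.array([1.55, 1.58, 1.52, 1.45, 1.10, 0.60])
n_features = np.array([20, 12, 6, 2, 2, 1])

threshold = 0.9 * geminis.max()      # 90% of the best GEMINI on the path
admissible = geminis >= threshold    # steps that keep enough of the score

# Among admissible steps, pick the one with the fewest selected features
best_step = np.flatnonzero(admissible)[np.argmin(n_features[admissible])]
print(best_step, n_features[best_step])  # -> 3 2
```

Here the fourth step (index 3) wins: it keeps only 2 features while its GEMINI score,
1.45, is still above the 1.42 threshold.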
.. GENERATED FROM PYTHON SOURCE LINES 35-39

Train the model
--------------------------------------------------------------
Create the GEMINI clustering model (here, a small MLP) and call the `.path` method
to iteratively select features through gradient descent.

.. GENERATED FROM PYTHON SOURCE LINES 41-47

.. code-block:: Python

    clf = SparseMLPMMD(random_state=0, alpha=1, batch_size=50, max_iter=25, learning_rate=0.001)

    # Perform a path search to eliminate all features
    best_weights, geminis, penalties, alphas, n_features = clf.path(X)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /home/circleci/.local/lib/python3.10/site-packages/sklearn/base.py:474: FutureWarning: `BaseEstimator._validate_data` is deprecated in 1.6 and will be removed in 1.7. Use `sklearn.utils.validation.validate_data` instead. This function becomes public and is part of the scikit-learn developer API.
      warnings.warn(
    /home/circleci/.local/lib/python3.10/site-packages/sklearn/base.py:474: FutureWarning: `BaseEstimator._validate_data` is deprecated in 1.6 and will be removed in 1.7. Use `sklearn.utils.validation.validate_data` instead. This function becomes public and is part of the scikit-learn developer API.
      warnings.warn(

.. GENERATED FROM PYTHON SOURCE LINES 48-52

Path results
------------
Take a look at how our features are distributed.

.. GENERATED FROM PYTHON SOURCE LINES 54-85
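The group-lasso penalty plotted below is what drives features out of the model: each
input feature owns one group of first-layer weights, and the penalty sums the L2 norms
of these groups, so increasing alpha pushes whole rows of the weight matrix to zero at
once. The exact penalty is internal to gemclus; the sketch below only illustrates the
group-lasso idea on a made-up weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))  # hypothetical first-layer weights: 5 features x 8 hidden units
W[3] = 0.0                   # a feature whose whole weight row was shrunk to zero

# Group-lasso penalty: one group per input feature, i.e. one row of W
group_norms = np.linalg.norm(W, axis=1)
penalty = group_norms.sum()

# A feature counts as selected while its group norm is non-zero
selected = np.flatnonzero(group_norms > 0)
print(selected)  # feature 3 has been dropped
```

Because the penalty is non-differentiable exactly at zero, shrinking a group to zero
removes the corresponding feature entirely rather than merely down-weighting it.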
.. code-block:: Python

    # Highlight the number of selected features and the GEMINI score along increasing alphas
    plt.figure(figsize=(10, 5))
    plt.subplot(2, 2, 1)
    plt.title("Number of features depending on $\\alpha$")
    plt.plot(alphas, n_features)
    plt.xlabel("Regularisation penalty $\\alpha$")
    plt.ylabel("Number of used features")
    plt.subplot(2, 2, 2)
    plt.title("GEMINI score depending on $\\alpha$")
    plt.plot(alphas, geminis)
    plt.xlabel("$\\alpha$")
    plt.ylabel("GEMINI score")
    plt.ylim(0, max(geminis) + 0.5)
    plt.subplot(2, 2, 3)
    plt.title("Group-Lasso penalty depending on $\\alpha$")
    plt.plot(alphas, penalties)
    plt.xlabel("$\\alpha$")
    plt.ylabel("Penalty")
    plt.subplot(2, 2, 4)
    plt.title("Total score depending on $\\alpha$")
    plt.plot(alphas, np.array(geminis) - np.array(penalties) * alphas)
    plt.xlabel("$\\alpha$")
    plt.ylabel("Total score")
    plt.tight_layout()
    plt.show()

    print(f"Selected features: {clf.get_selection()}")
    print(f"The model score is {clf.score(X)}")
    print(f"Top GEMINI score was {max(geminis)}, which sets the selection threshold at {0.9 * max(geminis)}")

.. image-sg:: /auto_examples/feature_selection/images/sphx_glr_plot_feature_selection_mlp_001.png
   :alt: Number of features depending on $\alpha$, GEMINI score depending on $\alpha$, Group-Lasso penalty depending on $\alpha$, Total score depending on $\alpha$
   :srcset: /auto_examples/feature_selection/images/sphx_glr_plot_feature_selection_mlp_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Selected features: [0 1]
    The model score is 1.4596486286280357
    Top GEMINI score was 1.5824499242836227, which sets the selection threshold at 1.4242049318552605

.. GENERATED FROM PYTHON SOURCE LINES 86-88

Final Clustering
----------------

.. GENERATED FROM PYTHON SOURCE LINES 90-110
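Since this synthetic dataset comes with ground-truth labels `y`, the recovered
clustering can also be checked externally, for instance with the adjusted Rand index.
The sketch below uses scikit-learn's `KMeans` as a stand-in clusterer so that it runs
on its own; with the model trained above, you would score `clf.predict(X)` instead:

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Same easy blobs as above; only the two informative features are used here
X, y = datasets.make_blobs(centers=3, cluster_std=0.5, n_samples=200, random_state=0)

y_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(adjusted_rand_score(y, y_pred))  # close to 1 on such well-separated blobs
```

An adjusted Rand index of 1 means the predicted partition matches the ground truth
exactly, up to a relabelling of the clusters.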
.. code-block:: Python

    # Now, show the cluster predictions
    y_pred = clf.predict(X)

    X_0 = X[y_pred == 0]
    X_1 = X[y_pred == 1]
    X_2 = X[y_pred == 2]

    ax0 = plt.scatter(X_0[:, 0], X_0[:, 1], c='crimson', s=50)
    ax1 = plt.scatter(X_1[:, 0], X_1[:, 1], c='deepskyblue', s=50)
    ax2 = plt.scatter(X_2[:, 0], X_2[:, 1], c='darkgreen', s=50)
    leg = plt.legend([ax0, ax1, ax2],
                     ['Cluster 0', 'Cluster 1', 'Cluster 2'],
                     loc='upper left', fancybox=True, scatterpoints=1)
    leg.get_frame().set_alpha(0.5)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

.. image-sg:: /auto_examples/feature_selection/images/sphx_glr_plot_feature_selection_mlp_002.png
   :alt: plot feature selection mlp
   :srcset: /auto_examples/feature_selection/images/sphx_glr_plot_feature_selection_mlp_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 6.522 seconds)

.. _sphx_glr_download_auto_examples_feature_selection_plot_feature_selection_mlp.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_feature_selection_mlp.ipynb <plot_feature_selection_mlp.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_feature_selection_mlp.py <plot_feature_selection_mlp.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_feature_selection_mlp.zip <plot_feature_selection_mlp.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_