.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/feature_selection/plot_grouped_selection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_feature_selection_plot_grouped_selection.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_feature_selection_plot_grouped_selection.py:

=================================================================
Grouped feature selection with a linear model
=================================================================

In this example, we ask :class:`gemclus.sparse.SparseLinearMMD` to perform a path in which the
regularisation penalty is progressively increased until all features but the two informative
variables are discarded. Moreover, we produce categorical variables that are one-hot encoded and
constrain the model, through its `groups` option, to treat all columns of a variable as a single
unit.

The dataset consists of 2 multinomial variables whose parameters depend on the cluster (2 clusters
to find), plus 8 noisy variables. The optimal solution should therefore find that only these two
grouped variables (7 one-hot columns in total) are relevant and sufficient to recover the correct
clustering.

.. GENERATED FROM PYTHON SOURCE LINES 15-23

.. code-block:: Python

    import numpy as np
    from matplotlib import pyplot as plt

    from gemclus.sparse import SparseLinearMMD

    np.random.seed(0)

.. GENERATED FROM PYTHON SOURCE LINES 24-26

Load a simple synthetic dataset
--------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 28-63

.. code-block:: Python

    # Generate the informative variables as outcomes of multinomial distributions
    # whose probabilities depend on the cluster
    X1_class_1 = np.random.multinomial(n=1, pvals=np.array([0.05, 0.45, 0.45, 0.05]), size=(50,))
    X2_class_1 = np.random.multinomial(n=1, pvals=np.array([0.1, 0.1, 0.8]), size=(50,))
    X_class_1 = np.concatenate([X1_class_1, X2_class_1], axis=1)
    X1_class_2 = np.random.multinomial(n=1, pvals=np.array([0.45, 0.05, 0.05, 0.45]), size=(50,))
    X2_class_2 = np.random.multinomial(n=1, pvals=np.array([0.8, 0.1, 0.1]), size=(50,))
    X_class_2 = np.concatenate([X1_class_2, X2_class_2], axis=1)
    X_informative = np.concatenate([X_class_1, X_class_2], axis=0) * 2

    # Generate noisy variables
    X_noise = np.random.normal(size=(100, 8))

    X = np.concatenate([X_informative, X_noise], axis=1)

    # The true cluster assignments
    y = np.repeat(np.arange(2), 50)

    # Finally, define the feature groups: columns 0-3 are the one-hot encoding of the
    # first variable, columns 4-6 that of the second
    groups = [np.arange(4), np.arange(4, 7)]
    # Optionally, each noisy feature could be added as its own singleton group:
    # for i in range(8):
    #     groups += [np.array([i + 7])]
    print(groups, X.shape)

    # Visualise the clusters, with a small jitter to separate overlapping points
    def rand_jitter(data):
        return data + np.random.randn(len(data)) * 0.01

    plt.scatter(rand_jitter(X1_class_1.argmax(1)), rand_jitter(X2_class_1.argmax(1)), c="red")
    plt.scatter(rand_jitter(X1_class_2.argmax(1)), rand_jitter(X2_class_2.argmax(1)), c="blue")
    plt.show()

.. image-sg:: /auto_examples/feature_selection/images/sphx_glr_plot_grouped_selection_001.png
   :alt: plot grouped selection
   :srcset: /auto_examples/feature_selection/images/sphx_glr_plot_grouped_selection_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [array([0, 1, 2, 3]), array([4, 5, 6])] (100, 15)
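The `groups` lists above were written out by hand from the known category counts (4 and 3).
As an aside, here is a minimal sketch of how the same index lists could be built from the block
sizes alone; the names `block_sizes` and `groups_auto` are hypothetical and not part of the
original example.

.. code-block:: Python

    import numpy as np

    # Number of one-hot columns per categorical variable: 4 for the first, 3 for the second
    block_sizes = [4, 3]

    # Cumulative boundaries [0, 4, 7] yield one consecutive column range per variable
    boundaries = np.cumsum([0] + block_sizes)
    groups_auto = [np.arange(start, stop) for start, stop in zip(boundaries[:-1], boundaries[1:])]
    print(groups_auto)  # [array([0, 1, 2, 3]), array([4, 5, 6])]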
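Grouping matters because grouped feature selection typically relies on a group-lasso-style
penalty: a sum of L2 norms, one per group's block of weights, so that all one-hot columns of a
categorical variable are kept or dropped together. The snippet below is only an illustration of
that penalty under this assumption, using a hypothetical weight matrix; refer to the gemclus
documentation for the exact penalty used by :class:`gemclus.sparse.SparseLinearMMD`.

.. code-block:: Python

    import numpy as np

    # Hypothetical (n_features x n_clusters) weight matrix of a linear clustering head
    rng = np.random.default_rng(0)
    W = rng.normal(size=(15, 2))

    groups = [np.arange(4), np.arange(4, 7)]

    # Group-lasso-style penalty (illustrative assumption): the L2 norm of each
    # group's block of rows, summed over groups. Shrinking a block's norm to zero
    # discards the whole categorical variable at once.
    penalty = sum(np.linalg.norm(W[g]) for g in groups)
    print(penalty)

Without such grouping, individual one-hot columns could be discarded independently, leaving a
categorical variable only partially represented.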
.. GENERATED FROM PYTHON SOURCE LINES 64-68

Train the model
--------------------------------------------------------------
Create the GEMINI clustering model (just a logistic regression) and call the `path` method to
iteratively select features through gradient descent.

.. GENERATED FROM PYTHON SOURCE LINES 70-76

.. code-block:: Python

    clf = SparseLinearMMD(groups=groups, random_state=0, alpha=1, max_iter=100, batch_size=50, learning_rate=1e-2)

    # Perform a path search to progressively eliminate features; a model is kept
    # as long as it scores at least 80% of the maximum GEMINI reached along the path
    best_weights, geminis, penalties, alphas, n_features = clf.path(X, keep_threshold=0.8)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /home/circleci/.local/lib/python3.10/site-packages/sklearn/base.py:474: FutureWarning: `BaseEstimator._validate_data` is deprecated in 1.6 and will be removed in 1.7. Use `sklearn.utils.validation.validate_data` instead. This function becomes public and is part of the scikit-learn developer API.
      warnings.warn(

.. GENERATED FROM PYTHON SOURCE LINES 77-81

Path results
------------

Take a look at which features were selected and at the resulting scores.

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: Python

    print(f"Selected features: {clf.get_selection()}")
    print(f"The model score is {clf.score(X)}")
    print(f"Top GEMINI score was {max(geminis)}, which sets the selection threshold at {0.8 * max(geminis)}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Selected features: [0 1 2 3 4 5 6]
    The model score is 1.4935152433868386
    Top GEMINI score was 0.8084378363828287, which sets the selection threshold at 0.646750269106263

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 3.583 seconds)


.. _sphx_glr_download_auto_examples_feature_selection_plot_grouped_selection.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_grouped_selection.ipynb <plot_grouped_selection.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_grouped_selection.py <plot_grouped_selection.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_grouped_selection.zip <plot_grouped_selection.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_