.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_graph_node_clustering.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_graph_node_clustering.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_graph_node_clustering.py:


=================================================
Graph node clustering with a nonparametric model
=================================================

We create here a random graph by following a simplified version of the `Latent Position Model` generative procedure.
To create the graph, we sample latent positions from a Gaussian Mixture Model and create a graph with as many nodes
as samples. Edges are then determined according to probability depending only on the distance between samples.

To perform clustering, we then use a nonparametric model which will associated to each node a clustering probability.
We indicate to this model a specific distance that is adequate for our graph nodes. Note that the parameters given to
the `fit` function instead of the data is a simple identity matrix.

.. GENERATED FROM PYTHON SOURCE LINES 14-24

.. code-block:: default

    import itertools

    import numpy as np
    from matplotlib import pyplot as plt
    from scipy.sparse import csgraph
    from sklearn import metrics

    from gemclus import data
    from gemclus.nonparametric import CategoricalWasserstein


.. GENERATED FROM PYTHON SOURCE LINES 25-27

Draw samples from a GMM
--------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 29-37

.. code-block:: default


    # Generate samples on that are simple to separate
    N = 100  # Number of nodes in the graph
    # GMM parameters
    means = np.array([[1, -1], [1, 1], [-1, -1], [-1, 1]])*3
    covariances = [np.eye(2)]*4
    X, y = data.draw_gmm(N, means, covariances, np.ones(4) / 4, random_state=0)


.. GENERATED FROM PYTHON SOURCE LINES 38-40

Create the graph edges
--------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 42-52

.. code-block:: default


    distances = metrics.pairwise_distances(X, metric="euclidean")
    edge_probs = np.exp(-distances)

    np.random.seed(0)
    adjacency_matrix = np.random.binomial(n=1, p=edge_probs)  # Determine if there is an edge from node i->j

    # Make the adjacency matrix symmetric
    adjacency_matrix = np.maximum(adjacency_matrix, adjacency_matrix.T)


.. GENERATED FROM PYTHON SOURCE LINES 53-55

Pre-compute a specific metric between samples
--------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 57-64

.. code-block:: default


    # compute the all-pairs shortest path in this graph
    distances = csgraph.floyd_warshall(adjacency_matrix, directed=False, unweighted=True)

    # Replace np.inf with 2 times the size of the matrix
    distances[np.isinf(distances)] = 2 * distances.shape[0]


.. GENERATED FROM PYTHON SOURCE LINES 65-69

Train the model
--------------------------------------------------------------
Create the Non parametric GEMINI clustering model and call the .fit method to optimise the cluster
assignment of the nodes

.. GENERATED FROM PYTHON SOURCE LINES 71-78

.. code-block:: default


    # We specify a custom metric and will pass the distance matrix to the `y` argument of `.fit`.
    model = CategoricalWasserstein(n_clusters=4, metric="precomputed", ovo=True, random_state=1789, learning_rate=1e-1)
    # In the nonparametric model, X is a dummy unnecessary variable because the parameters do not depend on the values
    # of X. There is only an index matching.
    y_pred = model.fit_predict(np.eye(N), y=distances)


.. GENERATED FROM PYTHON SOURCE LINES 79-81

Final Clustering
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 83-94

.. code-block:: default


    for node_i, node_j in itertools.combinations(range(N),2):
        if adjacency_matrix[node_i,node_j]:
            plt.plot([X[node_i,0],X[node_j,0]], [X[node_i,1],X[node_j,1]], c="gray",linewidth=1,alpha=0.5)
    plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=50)
    plt.show()

    ari_score = metrics.adjusted_rand_score(y, y_pred)
    gemini_score = model.score(np.eye(N), y=distances)
    print(f"Final ARI score: {ari_score:.3f}")
    print(f"GEMINI score is {gemini_score:.3f}")


.. image-sg:: /auto_examples/images/sphx_glr_plot_graph_node_clustering_001.png
   :alt: plot graph node clustering
   :srcset: /auto_examples/images/sphx_glr_plot_graph_node_clustering_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Final ARI score: 0.977
    GEMINI score is 2.141


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 7.613 seconds)


.. _sphx_glr_download_auto_examples_plot_graph_node_clustering.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_graph_node_clustering.py <plot_graph_node_clustering.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_graph_node_clustering.ipynb <plot_graph_node_clustering.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_