gemclus.tree.Kauri

class gemclus.tree.Kauri(max_clusters=3, max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=None, max_leaves=None, kernel='linear', verbose=False, random_state=None)

Implementation of the KAURI (KMeans as Unsupervised Reward Ideal tree) algorithm. This model learns clusters by iteratively splitting nodes of the tree and either assigning the resulting nodes to new clusters or reassigning them to an existing one, according to kernel-guided gain scores.

Parameters:
max_clusters: int, default=3

The maximum number of clusters to form.

max_depth: int, default=None

The maximum depth to limit the tree construction. If set to None, then the tree is not limited in depth.

min_samples_split: int, default=2

The minimum number of samples that must be contained in a leaf node to consider splitting it into two new leaves.

min_samples_leaf: int, default=1

The minimum number of samples that each leaf must contain. Note that the logical constraint min_samples_leaf * 2 <= min_samples_split must be satisfied.

max_features: int, default=None

The maximal number of features (randomly selected) to consider upon the choice of splitting a leaf. If set to None, then all features of the data will be used.

max_leaves: int, default=None

The maximal number of leaves that can be found in the tree. If set to None, then the tree is not limited in number of leaves.

kernel: {'additive_chi2', 'chi2', 'cosine', 'linear', 'poly', 'polynomial', 'rbf', 'laplacian', 'sigmoid', 'precomputed'}, default='linear'

The kernel to use in combination with the MMD objective. It corresponds to one value of KERNEL_PARAMS. Currently, all kernel parameters are the default ones. If set to 'precomputed', then a custom kernel must be passed to the y argument of the fit or fit_predict method.

verbose: bool, default=False

Whether to print progress messages to stdout.

random_state: int, RandomState instance, default=None

Determines random number generation for feature exploration. Pass an int for reproducible results across multiple function calls.
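
The constraint between min_samples_leaf and min_samples_split stated in the parameter list above can be checked before building an estimator. The following is a minimal sketch in plain Python; the helper name check_split_constraints is hypothetical and is not part of gemclus.

```python
def check_split_constraints(min_samples_split=2, min_samples_leaf=1):
    """Return True when the two size limits are mutually consistent."""
    # Hypothetical helper, not part of gemclus: a node holding exactly
    # min_samples_split samples must be divisible into two leaves that
    # each hold at least min_samples_leaf samples.
    return 2 * min_samples_leaf <= min_samples_split


defaults_ok = check_split_constraints()      # 2 * 1 <= 2 holds
too_strict = check_split_constraints(5, 3)   # 2 * 3 <= 5 does not hold
```

With the default values the constraint holds; raising min_samples_leaf to 3 would require min_samples_split to be at least 6.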

Attributes:
labels_: ndarray of shape (n_samples,)

The cluster to which each sample of the data was assigned.

tree_: Tree instance

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object.
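
A short usage sketch may help tie the parameters and attributes together. It assumes that gemclus and numpy are installed and uses synthetic data invented for illustration; only the constructor arguments and methods documented on this page are called. The import guard is there so that the snippet stays harmless when the packages are missing.

```python
# Assumption: gemclus and numpy are installed; the data below is synthetic.
try:
    import numpy as np
    from gemclus.tree import Kauri
except ImportError:  # allow the sketch to load without the dependencies
    Kauri = None

labels = None
if Kauri is not None:
    rng = np.random.default_rng(0)
    # Three well-separated 2-D blobs of 30 samples each.
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
    X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

    model = Kauri(max_clusters=3, max_depth=3, kernel="linear", random_state=0)
    labels = model.fit_predict(X)       # one cluster label per sample
    new_labels = model.predict(X[:5])   # route samples through the fitted tree
    print(labels.shape, new_labels.shape)
```

After fitting, the same labels should also be available through the labels_ attribute, and the fitted tree through tree_.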

__init__(max_clusters=3, max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=None, max_leaves=None, kernel='linear', verbose=False, random_state=None)

fit(X, y=None)

Runs the KAURI algorithm by repeatedly choosing leaves, evaluating the best gain, and growing the tree until structural limits or maximal gains are reached.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Training instances to cluster.

y: ndarray of shape (n_samples, n_samples), default=None

Use this parameter to pass a precomputed affinity matrix if the option 'precomputed' was passed during construction. Otherwise, it is ignored and is present only for API consistency, by convention.

Returns:
self: object

Fitted estimator.
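
To make the expected shape of a precomputed kernel concrete, here is a minimal sketch that builds a linear Gram matrix in plain Python. The helper linear_gram is hypothetical and not part of gemclus; in practice a library routine for pairwise kernels would likely be used instead.

```python
def linear_gram(X):
    """Gram matrix with K[i][j] = <x_i, x_j> for a list of feature vectors."""
    # Hypothetical helper, not part of gemclus.
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # 3 samples, 2 features
K = linear_gram(X)                          # 3 x 3, i.e. (n_samples, n_samples)

# Assumption: with gemclus and numpy installed, K would be passed as y, e.g.
#   Kauri(kernel="precomputed").fit(np.asarray(X), np.asarray(K))
```

The matrix is square and symmetric, with one row and one column per training sample, which matches the (n_samples, n_samples) shape documented for y.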

fit_predict(X, y=None)

Runs the KAURI algorithm by repeatedly choosing leaves, evaluating the best gain, and growing the tree until structural limits or maximal gains are reached. Returns the clusters assigned to the data samples.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Training instances to cluster.

y: ndarray of shape (n_samples, n_samples), default=None

Use this parameter to pass a precomputed affinity matrix if the option 'precomputed' was passed during construction. Otherwise, it is ignored and is present only for API consistency, by convention.

Returns:
y_pred: ndarray of shape (n_samples,)

Vector containing the cluster label for each sample.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

predict(X)

Passes the data samples X through the tree structure to assign cluster membership. This method can only be called after fit or fit_predict has been called.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Samples to assign to clusters.

Returns:
y_pred: ndarray of shape (n_samples,)

Vector containing the cluster label for each sample.

score(X, y=None)

Return the value of the GEMINI evaluated on the given test data. Note that this score is a special variant of the MMD-GEMINI with Dirac distributions and may therefore differ from the actual GEMINI by constants or multiplicative factors.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Test samples.

y: ndarray of shape (n_samples, n_samples), default=None

Use this parameter to pass a precomputed affinity matrix if the option 'precomputed' was passed during construction. Otherwise, it is ignored and is present only for API consistency, by convention.

Returns:
score: float

GEMINI evaluated on the output of self.predict(X).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.
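
The <component>__<parameter> naming used for nested objects can be illustrated with a small sketch in plain Python. The helper split_nested_param and the step name "kauri" are hypothetical and chosen only for illustration.

```python
def split_nested_param(name):
    """Split 'component__parameter' at the first double underscore."""
    # Hypothetical helper: a name without a double underscore targets
    # the estimator itself rather than one of its components.
    component, sep, parameter = name.partition("__")
    return (component, parameter) if sep else (None, name)


nested = split_nested_param("kauri__max_clusters")
direct = split_nested_param("max_depth")
print(nested)  # ('kauri', 'max_clusters')
print(direct)  # (None, 'max_depth')
```

A single underscore, as in max_depth, is part of the parameter name; only the double underscore separates a component from its parameter.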

Examples using gemclus.tree.Kauri

Building an unsupervised tree with kernel-kmeans objective: KAURI
