fit

PLSCAN.fit(X, y=None, *, sample_weights=None, **fit_params)

Computes PLSCAN clusters and hierarchies for the input data. Several inputs are supported, including feature vectors, precomputed sorted (partial) minimum spanning trees, dense or sparse distance matrices, and k-nearest neighbors graphs.

The input data does not have to form a single connected component. The algorithm selects the minimum cluster size that maximizes the total persistence over all components. The components themselves are never selected as clusters.
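As an illustration of the dense distance matrix input, the sketch below builds both a full square matrix and a condensed vector with NumPy. It assumes that the condensed layout follows the scipy.spatial.distance.pdist convention (strict upper triangle in row-major order), which this page does not state. The name clusterer in the final comment is hypothetical and stands for a PLSCAN instance created with metric set to “precomputed”.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))

# Full square Euclidean distance matrix, shape (num_points, num_points).
diff = points[:, None, :] - points[None, :, :]
full = np.sqrt((diff ** 2).sum(axis=-1)).astype(np.float32)

# Condensed form: strict upper triangle in row-major order, with
# num_points * (num_points - 1) / 2 entries (assumed pdist convention).
rows, cols = np.triu_indices(len(points), k=1)
condensed = full[rows, cols]

# Hypothetical usage, not executed here:
# clusterer.fit(full)  or  clusterer.fit(condensed)
```

Both arrays use float32 to match the dtype listed for the dense matrix format.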

Parameters:
  • X (ndarray[tuple[int, ...]] | tuple | csr_array) –

    The input data. If metric is not set to “precomputed”, X must be a 2D array of shape (num_points, num_features). Missing values are not supported.

    If metric is set to “precomputed”, the input is a (sparse) distance matrix in one of the following formats:

    1. tuple of (edges, num_points)

      A minimum spanning tree where edges is a 2D array in the format (parent, child, distance) and num_points is the number of points in the input data. There should be at most num_points - 1 edges. Edges must be sorted by distance.

    2. tuple of (distances, indices)

      A k-nearest neighbors graph where distances is a 2D array of distances and indices is a 2D array of child indices. Rows must be sorted by distance. Negative indices indicate missing edges and must occur after all valid edges in their row.

    3. np.ndarray[tuple[int, …], np.dtype[np.float32]]:

      A condensed or full square distance matrix. The diagonal is filled with zeros before processing.

    4. csr_array:

      A sparse distance matrix in CSR format. Self-loops and explicit zeros are removed before processing.

    In all cases, distance values should be non-negative. In cases 2 through 4, each point should have at least min_samples neighbors. Infinite distances, either as input or as a result of too few neighbors, may break plots and the bi-persistence computation.

  • y (None, default: None) – Ignored, present for compatibility with scikit-learn.

  • sample_weights (ndarray[tuple[int], dtype[single]] | None, default: None) – Sample weights for the points in the sorted minimum spanning tree. If None, all samples are considered equally weighted.

  • **fit_params – Unused additional parameters for compatibility with scikit-learn.
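The constraints on a k-nearest neighbors graph (format 2 of X) can be checked before fitting. The helper below is a minimal sketch and not part of PLSCAN: check_knn_graph is a hypothetical name, and the value stored in distances at a missing edge is ignored because this page does not specify it.

```python
import numpy as np

def check_knn_graph(distances, indices):
    """Return True if (distances, indices) meets the constraints listed for format 2."""
    distances = np.asarray(distances)
    indices = np.asarray(indices)
    if distances.ndim != 2 or distances.shape != indices.shape:
        return False
    valid = indices >= 0
    # Negative indices mark missing edges and must follow all valid edges
    # in their row: a missing entry may never precede a valid one.
    if np.any(~valid[:, :-1] & valid[:, 1:]):
        return False
    # Distances of valid edges must be non-negative.
    if np.any(distances[valid] < 0):
        return False
    # Valid edges must be sorted by distance within each row.
    both_valid = valid[:, :-1] & valid[:, 1:]
    non_decreasing = distances[:, :-1] <= distances[:, 1:]
    return bool(np.all(non_decreasing | ~both_valid))

# Three points, two neighbor slots each; the last row has one missing edge.
# np.inf at the missing slot is an arbitrary placeholder for this check.
distances = np.array([[0.1, 0.4], [0.1, 0.2], [0.2, np.inf]], dtype=np.float32)
indices = np.array([[1, 2], [0, 2], [1, -1]])
graph_ok = check_knn_graph(distances, indices)
```

A graph that passes this check can still contain points with fewer valid neighbors than min_samples, so that condition needs a separate count of non-negative indices per row.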

Returns:

self – The fitted PLSCAN instance.
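To make the (edges, num_points) format concrete, the sketch below computes a minimum spanning tree with Prim's algorithm over plain Euclidean distances and sorts the edges by distance. It only illustrates the input layout: sorted_mst_edges is a hypothetical helper, the float64 dtype of edges is an assumption, and the distances are not the ones PLSCAN would derive internally from feature vectors. The name clusterer in the final comment is likewise hypothetical.

```python
import numpy as np

def sorted_mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (edges, num_points)."""
    points = np.asarray(points, dtype=np.float64)
    n = len(points)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    # best[j]: distance from j to its closest tree node; parent[j]: that node.
    best = np.linalg.norm(points - points[0], axis=1)
    parent = np.zeros(n, dtype=np.int64)
    edges = []
    for _ in range(n - 1):
        # Pick the non-tree point closest to the current tree.
        child = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((parent[child], child, best[child]))
        in_tree[child] = True
        # Update the remaining points that are closer to the new tree node.
        dist = np.linalg.norm(points - points[child], axis=1)
        closer = ~in_tree & (dist < best)
        best[closer] = dist[closer]
        parent[closer] = child
    edges = np.array(edges, dtype=np.float64).reshape(-1, 3)
    # Rows are (parent, child, distance), sorted by distance.
    return edges[np.argsort(edges[:, 2], kind="stable")], n

points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [5.0, 2.0]])
edges, num_points = sorted_mst_edges(points)

# Hypothetical usage, not executed here. Because fit returns self,
# the call can be chained:
# fitted = clusterer.fit((edges, num_points))
```

The result has num_points - 1 rows for a connected input, which is the maximum number of edges the format allows.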