fit
- PLSCAN.fit(X, y=None, *, sample_weights=None, **fit_params)
Computes PLSCAN clusters and hierarchies for the input data. Several input types are supported: feature vectors, precomputed sorted (partial) minimum spanning trees, dense or sparse distance matrices, and k-nearest neighbors graphs.
The input data does not have to form a single connected component; the algorithm selects the minimum cluster size that maximizes the total persistence over all components. The components themselves are never selected as clusters.
- Parameters:
X (ndarray[tuple[int, ...]] | tuple | csr_array) – The input data. If metric is not set to “precomputed”, X must be a 2D array of shape (num_points, num_features). Missing values are not supported.
If metric is set to “precomputed”, the input is a (sparse) distance matrix in one of the following formats:
- tuple of (edges, num_points)
A minimum spanning tree where edges is a 2D array in the format (parent, child, distance) and num_points is the number of points in the input data. There should be at most num_points - 1 edges. Edges must be sorted by distance.
- tuple of (distances, indices)
A k-nearest neighbors graph where distances is a 2D array of distances and indices is a 2D array of child indices. Rows must be sorted by distance. Negative indices indicate missing edges and must occur after all valid edges in their row.
- np.ndarray[tuple[int, ...], np.dtype[np.float32]]:
A condensed or full square distance matrix. The diagonal is filled with zeros before processing.
- csr_array:
A sparse distance matrix in CSR format. Self-loops and explicit zeros are removed before processing.
In all cases, distance values should be non-negative. For formats 2 through 4 (the k-nearest neighbors graph and the dense and sparse distance matrices), each point should have min_samples neighbors. Infinite distances, whether given as input or caused by too few neighbors, may break plots and the bi-persistence computation.
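The four precomputed formats listed above can be built from a small point set with NumPy and SciPy, as in the sketch below. Everything in it is illustrative rather than part of PLSCAN: the variable names, the toy data, the value of k, the choice of plain Euclidean distances, and the decision to leave each point out of its own neighbor list are all assumptions. The edges array is laid out as a plain float array with three columns, which is one reading of the (parent, child, distance) description.

```python
import numpy as np
from scipy.sparse import csr_array
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))  # toy data, illustrative only
num_points = points.shape[0]
k = 5  # illustrative neighbor count

# Dense format: condensed or full square distance matrix in float32.
condensed = pdist(points).astype(np.float32)
square = squareform(condensed).astype(np.float32)

# k-nearest neighbors format: (distances, indices), rows sorted by distance.
# Column 0 of the argsort is the point itself (distance 0) and is skipped;
# whether self-neighbors belong in the graph is an assumption of this sketch.
order = np.argsort(square, axis=1)[:, 1 : k + 1]
knn_distances = np.take_along_axis(square, order, axis=1)
knn_graph = (knn_distances, order)

# Minimum spanning tree format: (edges, num_points), with rows of
# (parent, child, distance) sorted by distance.
mst = minimum_spanning_tree(square).tocoo()
by_distance = np.argsort(mst.data)
edges = np.column_stack([mst.row, mst.col, mst.data])[by_distance]
mst_input = (edges, num_points)

# Sparse format: CSR matrix that stores only the k nearest distances per row.
rows = np.repeat(np.arange(num_points), k)
sparse = csr_array(
    (knn_distances.ravel(), (rows, order.ravel())),
    shape=(num_points, num_points),
)
```

Any of `mst_input`, `knn_graph`, `condensed`, `square`, or `sparse` would then be passed as X to an estimator whose metric is set to “precomputed”.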
y (None, default: None) – Ignored, present for compatibility with scikit-learn.
sample_weights (ndarray[tuple[int], dtype[single]] | None, default: None) – Sample weights for the points in the sorted minimum spanning tree. If None, all samples are considered equally weighted.
**fit_params – Unused additional parameters for compatibility with scikit-learn.
- Returns:
self – The fitted PLSCAN instance.
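A call with feature vectors and sample weights might look like the sketch below. The helper `fit_with_weights` is hypothetical and not part of the library; it only checks the constraints stated above (2D input, no missing values, one single-precision weight per point) before forwarding to `fit`. The assumption that weights have shape (num_points,) is inferred from the type annotation, and the import path in the trailing comment is likewise an assumption.

```python
import numpy as np


def fit_with_weights(model, X, weights=None):
    """Hypothetical helper: validate inputs, then call model.fit."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("X must have shape (num_points, num_features)")
    if np.isnan(np.asarray(X, dtype=float)).any():
        raise ValueError("missing values are not supported")
    if weights is not None:
        # Annotated as a 1D single-precision array in the parameter list.
        weights = np.asarray(weights, dtype=np.float32)
        if weights.shape != (X.shape[0],):
            raise ValueError("expected one weight per point")
    # sample_weights is keyword-only in the documented signature.
    return model.fit(X, sample_weights=weights)


# Example usage (import path is an assumption):
# from fast_plscan import PLSCAN
# model = fit_with_weights(PLSCAN(), X, weights)
```

Because `fit` returns the fitted instance, the result can be assigned directly or chained, as is usual for scikit-learn style estimators.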