distance_cut

PLSCAN.distance_cut(epsilon)

Return a DBSCAN*-style clustering at a fixed distance threshold.

Selects every leaf-cluster whose birth distance is at most epsilon and returns labels and membership probabilities for those clusters. This is equivalent to running DBSCAN* with eps = epsilon and min_samples set to the fitted value. Points that fall outside every selected cluster are labelled as noise (-1).

Parameters:

epsilon (float) – Distance threshold. Only leaf-clusters with birth distance at most epsilon are selected. Use epsilon = 0 to select no clusters (all noise) and epsilon = np.inf to select all leaf-clusters.

Return type:

Labelling

Returns:

  • labels – int64 array of shape (n_samples,). Cluster indices are zero-based; noise points are -1.

  • probabilities – float32 array of shape (n_samples,) with cluster membership probabilities in [0, 1].
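Example:

The following is a minimal sketch of how the returned arrays might be inspected, not part of the library itself. The helper names summarize_cut and sweep_cuts are hypothetical, the sample arrays are made-up values, and sweep_cuts assumes that the returned Labelling unpacks like a (labels, probabilities) tuple.

```python
import numpy as np


def summarize_cut(labels, probabilities):
    """Summarise one (labels, probabilities) pair from distance_cut.

    Hypothetical helper: returns the number of clusters, the fraction of
    noise points, and the mean membership probability of clustered points.
    """
    labels = np.asarray(labels)
    probabilities = np.asarray(probabilities)
    clustered = labels >= 0  # noise points carry the label -1
    n_clusters = int(np.unique(labels[clustered]).size)
    noise_fraction = float(np.mean(~clustered)) if labels.size else 0.0
    mean_probability = (
        float(probabilities[clustered].mean()) if clustered.any() else 0.0
    )
    return n_clusters, noise_fraction, mean_probability


def sweep_cuts(model, epsilons):
    """Summarise model.distance_cut(eps) for each eps in epsilons.

    `model` is any fitted object exposing distance_cut(epsilon).
    Assumption: the result unpacks as (labels, probabilities).
    """
    rows = []
    for eps in epsilons:
        labels, probabilities = model.distance_cut(eps)
        rows.append((eps, *summarize_cut(labels, probabilities)))
    return rows


# Made-up arrays in the documented dtypes, standing in for a real result.
labels = np.array([0, 0, 1, -1, 1, -1], dtype=np.int64)
probabilities = np.array([1.0, 0.8, 0.9, 0.0, 0.7, 0.0], dtype=np.float32)

# Two clusters (0 and 1); two of the six points are noise.
summary = summarize_cut(labels, probabilities)
print(summary)
```

With a fitted model, a call such as sweep_cuts(model, [0.1, 0.5, np.inf]) would show how the cluster count and noise fraction change as epsilon grows.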