clusters_from_spanning_forest

fast_plscan.clusters_from_spanning_forest(sorted_mst, num_points, *, min_cluster_size=2.0, max_cluster_size=inf, persistence_measure='size', sample_weights=None)

Compute PLSCAN clusters from a sorted minimum spanning forest.

Parameters:
  • sorted_mst (SpanningTree) – A sorted (partial) minimum spanning forest.

  • num_points (int) – The number of points in the sorted minimum spanning forest.

  • min_cluster_size (float, default: 2.0) – The minimum size of a cluster.

  • max_cluster_size (float, default: inf) – The maximum size of a cluster.

  • persistence_measure (str, default: 'size') – Selects the persistence measure. Valid options are “size”, “distance”, “density”, “size-distance”, and “size-density”. The “size”, “distance”, and “density” options compute persistence as the range of size, distance, or density values over which a cluster is a leaf. The “size-distance” and “size-density” options compute bi-persistence as the area, in the plane spanned by distance (or density) and minimum cluster size, over which a cluster is a leaf. Density is computed as exp(-dist).

  • sample_weights (ndarray[tuple[int], dtype[single]] | None, default: None) – Sample weights for the points in the sorted minimum spanning forest. If None, all samples are weighted equally.
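
The relation between the “distance” and “density” measures in the persistence_measure description can be illustrated with a minimal sketch. It is not the library's implementation: the helper names and the interval endpoints are hypothetical, and only the transform density = exp(-dist) is taken from the text above.

```python
import math


def density_from_distance(dist: float) -> float:
    """Density as stated in the parameter description: exp(-dist)."""
    return math.exp(-dist)


def range_persistence(start: float, end: float) -> float:
    """Length of a value range (hypothetical helper, not part of fast_plscan)."""
    return abs(end - start)


# Hypothetical distance range over which a single cluster is a leaf.
d_lo, d_hi = 0.5, 2.0

# "distance": length of the range measured in distance units.
distance_persistence = range_persistence(d_lo, d_hi)

# "density": length of the same range after mapping both ends through exp(-dist).
density_persistence = range_persistence(
    density_from_distance(d_lo), density_from_distance(d_hi)
)
```

Because exp(-dist) flattens out at large distances, the same distance range contributes less density persistence the farther it lies from zero, so the two measures can rank clusters differently.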

Return type:

tuple[Labelling, ndarray[tuple[int], dtype[uintc]], PersistenceTrace, LeafTree, CondensedTree, LinkageTree]

Returns:

  • labels – A tuple-like object holding the cluster label and membership probability of each point.

  • trace – A trace of the total (bi-)persistence per minimum cluster size.

  • leaf_tree – A leaf tree with cluster-leaves at minimum cluster sizes.

  • condensed_tree – A condensed tree with the cluster merge distances.

  • linkage_tree – A single-linkage dendrogram of the sorted minimum spanning forest. Its order matches the input sorted_mst.