Persistent Leaves Spatial Clustering of Applications with Noise
PLSCAN is a refinement of HDBSCAN* for practical density-based clustering. The primary advantages of PLSCAN over the hdbscan and fast_hdbscan libraries are:
PLSCAN automatically finds the optimal minimum cluster size;
PLSCAN can easily use all available cores to speed up computation;
PLSCAN has much faster implementations of tree condensing and cluster extraction;
PLSCAN does not rely on JIT compilation.
Using PLSCAN requires minimal tuning. The min_samples parameter (same role
as in HDBSCAN*) controls how many neighbors are used in mutual reachability
distance calculations. In contrast, the minimum cluster size threshold that
controls cluster granularity is selected automatically by PLSCAN.
Increasing min_samples generally yields fewer, smoother, and more stable
clusters by requiring stronger local density support.
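To make the role of min_samples concrete, the sketch below computes mutual reachability distances in plain Python. It uses the standard HDBSCAN* definition: the maximum of the two points' core distances and the distance between them. This is an illustration only, not PLSCAN's implementation. The helper names core_distances and mutual_reachability are hypothetical, and the sketch assumes a point is not counted as its own neighbor (conventions differ between libraries).

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbor.

    Assumption: the point itself is not counted as a neighbor.
    """
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(points, min_samples):
    """Pairwise max(core(a), core(b), dist(a, b)), with a zero diagonal."""
    cores = core_distances(points, min_samples)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data: three nearby points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]

print(core_distances(points, 1))  # prints: [1.0, 1.0, 1.0, 8.0]
print(core_distances(points, 2))  # prints: [2.0, 1.0, 2.0, 9.0]
```

On this toy data, raising min_samples from 1 to 2 never decreases a core distance, so the isolated point ends up even further from the dense group in mutual reachability terms.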
import numpy as np
import matplotlib.pyplot as plt
from fast_plscan import PLSCAN

data = np.load("docs/data/data.npy")

# Fit the clusterer; the minimum cluster size is selected automatically.
clusterer = PLSCAN(
    min_samples=5,  # same as in HDBSCAN*
).fit(data)

# Color points by cluster label, cycling through the 10 colors of tab10.
plt.figure()
plt.scatter(
    *data.T, c=clusterer.labels_ % 10, s=5, alpha=0.5,
    edgecolor="none", cmap="tab10", vmin=0, vmax=9
)
plt.axis("off")
plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
plt.show()
The algorithm creates a hierarchy of leaf clusters by identifying the minimum
cluster size thresholds at which the local density maxima change. A leaf cluster is a
cluster with no child clusters in the density cluster hierarchy. As the
threshold changes, leaf clusters appear and disappear. PLSCAN measures
persistence, the range of threshold values over which each leaf cluster remains
alive, and selects the threshold with the highest total persistence. You can
visualize this hierarchy using the leaf_tree_ attribute, which provides an
alternative to HDBSCAN*’s condensed tree visualization.
clusterer.leaf_tree_.plot(leaf_separation=0.1)
plt.show()
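The selection rule described above can be sketched on toy data. The example below assumes a deliberately simplified reading: each leaf cluster is alive over a range of thresholds [birth, death), its persistence is the length of that range, and the total persistence at a threshold is the sum over the leaves alive there. The Leaf class, both helper functions, and all numbers are hypothetical; PLSCAN's actual persistence measure and implementation may differ.

```python
from typing import NamedTuple


class Leaf(NamedTuple):
    """A leaf cluster and the threshold range over which it is alive."""
    name: str
    birth: float  # threshold at which the leaf appears (toy value)
    death: float  # threshold at which the leaf disappears (toy value)


def total_persistence(leaves, threshold):
    """Sum (death - birth) over the leaves alive at `threshold`."""
    return sum(
        leaf.death - leaf.birth
        for leaf in leaves
        if leaf.birth <= threshold < leaf.death
    )


def best_threshold(leaves):
    """Return the threshold with the highest total persistence.

    The set of alive leaves only grows at birth values, so for this toy
    definition it is enough to evaluate those.
    """
    candidates = sorted({leaf.birth for leaf in leaves})
    return max(candidates, key=lambda t: total_persistence(leaves, t))


leaves = [
    Leaf("A", 5, 40),
    Leaf("B", 5, 25),
    Leaf("C", 25, 90),
    Leaf("D", 10, 12),
]
best = best_threshold(leaves)
print(best, total_persistence(leaves, best))  # prints: 25 100
```

Here leaves A and C are alive together at threshold 25 and contribute 35 + 65 = 100, which beats the thresholds where the shorter-lived leaves B and D are alive.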
You can also explore how the clustering changes for other important values of
the minimum cluster size. The cluster_layers method automatically finds the
most persistent clusterings and returns their cluster labels and membership
strengths.
layers = clusterer.cluster_layers(max_peaks=4)
for i, (size, labels, probs) in enumerate(layers):
    # One panel per layer; fade points with low membership strength.
    plt.subplot(2, 2, i + 1)
    plt.scatter(
        *data.T,
        c=labels % 10,
        alpha=np.maximum(0.1, probs),
        s=1,
        linewidth=0,
        cmap="tab10",
    )
    plt.title(f"min_cluster_size={int(size)}")
    plt.axis("off")
plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
plt.show()
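Because each layer is a (size, labels, probs) tuple, simple per-layer summaries can help when comparing them numerically rather than visually. The sketch below is a hypothetical helper (summarize_layer is not part of fast_plscan). It assumes noise points are labelled -1, as is conventional for HDBSCAN*, and it uses made-up toy layers in place of real output.

```python
def summarize_layer(size, labels, probs, noise_label=-1):
    """Summarize one (size, labels, probs) layer.

    Assumption: points labelled `noise_label` are noise.
    """
    labels = [int(label) for label in labels]
    probs = [float(prob) for prob in probs]
    # Membership strengths of the points assigned to a cluster.
    clustered = [p for label, p in zip(labels, probs) if label != noise_label]
    return {
        "min_cluster_size": int(size),
        "n_clusters": len({label for label in labels if label != noise_label}),
        "noise_fraction": 1 - len(clustered) / len(labels) if labels else 0.0,
        "mean_strength": sum(clustered) / len(clustered) if clustered else 0.0,
    }


# Toy layers standing in for the output of cluster_layers.
toy_layers = [
    (5.0, [0, 0, 1, 1, -1, 2], [1.0, 0.8, 0.9, 0.7, 0.0, 0.6]),
    (20.0, [0, 0, 0, 0, -1, -1], [1.0, 0.5, 0.5, 1.0, 0.0, 0.0]),
]
summaries = [summarize_layer(*layer) for layer in toy_layers]
for summary in summaries:
    print(summary)
```

The helper accepts any iterable of labels and strengths, so the same call should work on the tuples returned by clusterer.cluster_layers(max_peaks=4), provided the noise label assumption holds.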
Installation instructions
Pre-built binaries are available on PyPI, so the package can be installed with pip and similar package managers on most systems:
pip install fast-plscan
Builds are also available on conda-forge:
conda install conda-forge::fast-plscan
See our documentation for instructions on compiling the package locally.
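After installing with either command above, a quick check from Python can confirm that the package is visible in the current environment. The snippet below is a minimal sketch that uses only the standard library. The helper name installed_version is hypothetical, and it assumes the distribution name fast-plscan and the import name fast_plscan shown in this document.

```python
from importlib import metadata, util


def installed_version(dist_name="fast-plscan", module_name="fast_plscan"):
    """Return the installed version string, or None if the module is missing."""
    if util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name.
        return "unknown"


print(installed_version())
```

A result of None suggests the package is not importable from the interpreter that ran the check, which often means it was installed into a different environment.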
Citing
When using this work, please cite our preprint:
@misc{bot2025plscan,
  title         = {Persistent Multiscale Density-based Clustering},
  author        = {Dani{\"{e}}l Bot and Leland McInnes and Jan Aerts},
  year          = {2025},
  eprint        = {2512.16558},
  archiveprefix = {arXiv},
  primaryclass  = {cs.LG},
  url           = {https://arxiv.org/abs/2512.16558}
}
Licensing
The fast-plscan package has a 3-Clause BSD license.