DBSCAN is a popular algorithm for finding clusters in spatial data. It identifies core points: points that have at least a user-defined number of neighbors within a user-defined distance. Points that are not core points but lie within that distance of a core point are considered border points of the cluster. Points that are neither core points nor within that distance of a core point are considered outliers and are not part of any cluster. The algorithm requires two parameters:
epsilon - The farthest apart two points can be while still being considered connected or related. epsilon must be a positive double-precision floating-point number.
minPoints - The minimum number of neighbor points (as determined by epsilon). A point needs minPoints neighbors to be considered a core point. minPoints must be a positive integer.
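The core, border, and outlier rules above can be sketched in a few lines of plain Python. This is only an illustrative brute-force version, not the Wherobots implementation: the function name `classify_points` and the sample coordinates are made up for the example, and the sketch assumes a point does not count as its own neighbor (implementations differ on this).

```python
from math import dist


def classify_points(points, epsilon, min_points):
    """Label each point as 'core', 'border', or 'outlier'."""
    # Neighbors of point i: every *other* point within epsilon of it.
    # Assumption: a point is not counted as its own neighbor.
    neighbors = [
        [j for j, q in enumerate(points) if j != i and dist(p, q) <= epsilon]
        for i, p in enumerate(points)
    ]
    # A core point has at least min_points neighbors.
    is_core = [len(n) >= min_points for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not core, but within epsilon of a core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels


# Hypothetical sample data: a tight square, one nearby point, one far point.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.35, 0.1), (2, 2)]
labels = classify_points(points, epsilon=0.3, min_points=3)
print(labels)
# ['core', 'core', 'core', 'core', 'border', 'outlier']
```

The fifth point has only two neighbors, so it is not a core point, but both of those neighbors are core points, which makes it a border point. The pairwise comparison is quadratic in the number of points and is only meant to make the definitions concrete.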
Example overview
In this example, we will generate some random data and use DBSCAN to cluster that data. Then, we’ll visualize the clusters using a scatter plot. This demo is derived from the scikit-learn DBSCAN demo.
Define Sedona Context
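A minimal sketch of this step, assuming the standard Apache Sedona Python entry point (`SedonaContext` from `sedona.spark`). The helper name `create_sedona_context` is hypothetical, and the import sits inside the function only so that the definition loads in environments where Sedona is not installed.

```python
def create_sedona_context():
    """Build (or reuse) a Spark session with Sedona's spatial functions registered."""
    # Assumption: the Apache Sedona Python package is available in the runtime.
    from sedona.spark import SedonaContext

    # Reuse the session configured by the environment, or create a new one.
    config = SedonaContext.builder().getOrCreate()
    # Register Sedona's spatial types and functions on that session.
    return SedonaContext.create(config)
```

In a notebook, this would typically be called once near the top, for example `sedona = create_sedona_context()`, and the resulting object reused in the later steps.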
Data Generation
In the following code section, we’ll generate some data using sklearn’s make_blobs function. We’ve set the data to consist of 750 points with 3 clusters. After clustering the data, we’ll visualize it in pyplot.
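The generation step might look like the sketch below. Only the 750 points and 3 clusters come from this page. The center coordinates, `cluster_std=0.4`, `random_state=0`, and the `StandardScaler` step are assumptions borrowed from the scikit-learn DBSCAN demo and may not match the values used here.

```python
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumed cluster centers, taken from the scikit-learn DBSCAN demo.
centers = [[1, 1], [-1, -1], [1, -1]]

# 750 two-dimensional points spread around the 3 centers.
X, labels_true = make_blobs(
    n_samples=750,
    centers=centers,
    cluster_std=0.4,  # assumed spread of each blob
    random_state=0,   # fixed seed so the data is reproducible
)

# Rescale each coordinate to zero mean and unit variance, as the
# scikit-learn demo does, so a single epsilon suits both axes.
X = StandardScaler().fit_transform(X)

print(X.shape)
```

`labels_true` holds the blob each point was drawn from. DBSCAN never sees it, but it is useful for checking the clusters afterwards.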
Clustering
In the following section, we’ll use the DBSCAN implementation in Wherobots to cluster the data in a dataframe, setting epsilon to 0.3 and minPoints to 10.
Wherobots’ DBSCAN returns outliers by default.
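A hedged sketch of the clustering call follows. It assumes the `dbscan` function from Apache Sedona's `sedona.stats.clustering.dbscan` module and the `ST_Point` constructor from `sedona.sql.st_constructors`. Both import paths may differ between Sedona and Wherobots releases, and the helper name `cluster_points` is hypothetical.

```python
def cluster_points(sedona, points, epsilon=0.3, min_points=10):
    """Cluster (x, y) pairs with DBSCAN and return the resulting Spark DataFrame."""
    # Assumed import paths; check them against the installed release.
    from sedona.sql.st_constructors import ST_Point
    from sedona.stats.clustering.dbscan import dbscan

    # One row per input pair, converted to a point geometry column.
    df = sedona.createDataFrame(
        [(float(x), float(y)) for x, y in points], ["x", "y"]
    ).select(ST_Point("x", "y").alias("geometry"))

    # include_outliers=True is passed explicitly so that points belonging
    # to no cluster stay in the result.
    return dbscan(df, epsilon, min_points, include_outliers=True)
```

With a Sedona context named `sedona` and the generated array `X`, a call such as `cluster_points(sedona, X)` would return the clustered dataframe. Collecting it to pandas is one way to draw the scatter plot with pyplot.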

