# DBSCAN Python Module

> DBSCAN is a density-based algorithm ideal for spatial data. It groups records that are closely packed together, marking records in low-density regions as noise. This implementation is built on Apache Spark, Apache Sedona, and GraphFrames to support large-scale datasets and heterogeneous GeometryType features.

## dbscan

Annotates a DataFrame with a cluster label for each record using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm.

```python theme={"system"}
dbscan(
    dataframe: DataFrame,
    epsilon: float,
    min_pts: int,
    geometry: Optional[str],
    include_outliers: bool,
    use_sphere: bool,
    is_core_column_name: str,
    cluster_column_name: str
)
```

### Parameters

* `dataframe`: Spark DataFrame containing the geometries. The input DataFrame must contain at least one `GeometryType` column. All rows in the DataFrame must be unique.
* `epsilon`: the distance parameter of the DBSCAN algorithm, i.e. the neighborhood radius within which records count as neighbors.
* `min_pts`: the minimum number of points parameter of the DBSCAN algorithm, i.e. how many records must lie within `epsilon` of a record for it to count as a core point.
* `geometry`: name of the geometry column. If not provided, the algorithm automatically selects the geometry column to use based on the following rules:
  * If only one `GeometryType` column exists, it is used.
  * If multiple `GeometryType` columns exist and one is named `geometry`, it is used.
  * If multiple `GeometryType` columns exist and none are named `geometry`, the column name must be explicitly provided as a parameter.
* `include_outliers`: whether to return outlier points. If True, outliers are returned with a cluster value of -1.
* `use_sphere`: whether to use a spherical distance calculation instead of a Cartesian one. Default is False.
* `is_core_column_name`: name of the output column indicating whether a record is a core point. Default is "isCore".
* `cluster_column_name`: name of the output column holding the cluster id. Default is "cluster".

### Returns

A PySpark DataFrame containing the cluster label for each row.
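To make the output columns concrete, here is a minimal pure-Python sketch of the classic DBSCAN labelling on a handful of 2D points. It is an illustration only, not this module's Spark/GraphFrames implementation: `toy_dbscan` is a hypothetical name, the neighbor count is assumed to include the point itself, and the sequential cluster ids are an assumption (the ids produced by `dbscan` may differ).

```python
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def toy_dbscan(points: List[Point], epsilon: float, min_pts: int) -> List[Dict]:
    """Label each point with a cluster id (-1 for noise) and a core flag."""
    n = len(points)
    # Assumption of this sketch: a neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= epsilon]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    cluster = [-1] * n  # -1 marks noise until a cluster claims the point
    next_id = 0
    for i in range(n):
        if not is_core[i] or cluster[i] != -1:
            continue
        # Grow a new cluster outward from an unassigned core point.
        cluster[i] = next_id
        stack = [i]
        while stack:
            current = stack.pop()
            for j in neighbors[current]:
                if cluster[j] == -1:
                    cluster[j] = next_id
                    # Only core points keep expanding the cluster.
                    if is_core[j]:
                        stack.append(j)
        next_id += 1

    return [
        {"point": p, "isCore": is_core[i], "cluster": cluster[i]}
        for i, p in enumerate(points)
    ]


# Two tight groups of three points and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
rows = toy_dbscan(sample, epsilon=1.5, min_pts=3)

# Dropping rows labelled -1 mirrors the documented include_outliers=False case.
without_outliers = [r for r in rows if r["cluster"] != -1]
```

The brute-force neighbor search is quadratic and only suitable for tiny inputs; it stands in for the distributed spatial join a large dataset would need.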
### Usage Examples

```python theme={"system"}
from dbscan import *

# Example usage of dbscan.
# EXAMPLE_NAME, EXAMPLE_FLOAT_VALUE and EXAMPLE_INT_VALUE are placeholders:
# substitute your own DataFrame, epsilon distance and min_pts count.
result = dbscan(
    dataframe=EXAMPLE_NAME,
    epsilon=EXAMPLE_FLOAT_VALUE,
    min_pts=EXAMPLE_INT_VALUE
)
```

## get\_knee\_locator

Creates a `KneeLocator` for selecting an epsilon value to pass into a DBSCAN execution.

This function implements a common heuristic for `epsilon` selection by finding the "knee" of the k-distance plot. It operates as follows:

1. Calculates the distance to the k-th nearest neighbor for a random sample of records (up to `max_sample_size`).
2. The value `k` is set to the `min_points` value.
3. These distances are sorted, and their values (y-axis) are plotted against their sorted index (x-axis).
4. The resulting plot is fed into a `kneed.KneeLocator` object to find the point of maximum curvature (the "knee").

Because the distances are on the y-axis, the `knee_y` attribute of the returned `KneeLocator` object is the suggested `epsilon` value.

This method is a heuristic and is not foolproof. The calculated knee may not be optimal for all datasets. It is **strongly recommended** to visualize the knee plot (e.g., using the `KneeLocator`'s built-in plotting methods) to manually sanity-check the selected `epsilon` value before proceeding.

### Parameters

```python theme={"system"}
get_knee_locator(
    dataframe: DataFrame,
    min_points: int,
    geometry: Optional[str],
    approximate_knn: bool,
    use_sphere: bool,
    max_sample_size: Optional[int]
) -> KneeLocator
```

* `dataframe`: Apache Sedona DataFrame containing the geometries. This should be the same dataframe you intend to
* `min_points`: the min points parameter you intend to pass to the dbscan function. This will impact the epsilon
* `geometry`: name of the geometry column
* `approximate_knn`: whether to use approximate KNN. When False, an exact KNN join is used. Default is False
* `use_sphere`: whether to use a cartesian or sphere distance calculation. False will use Cartesian. Default
* `max_sample_size`: the maximum number of records from dataframe to use when calculating the knee. If the

### Returns

A `KneeLocator` object derived from the input DataFrame, downsampled to approximately `max_sample_size` records. Retrieve the recommended epsilon value with the return value's `knee_y` property.
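The k-distance heuristic described above can be sketched in plain Python on a small point set. This is a hedged illustration, not the module's implementation: `k_distances` and `knee_index` are hypothetical helpers, the k-th neighbor is assumed to exclude the point itself, and the knee is found with a simple "farthest from the chord between the endpoints" rule rather than the Kneedle algorithm that `kneed.KneeLocator` implements.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def k_distances(points: List[Point], k: int) -> List[float]:
    """Sorted distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Assumption of this sketch: a point is not its own neighbor.
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def knee_index(values: List[float]) -> int:
    """Index whose (index, value) point lies farthest from the chord
    joining the first and last points of the sorted curve."""
    last = len(values) - 1
    y0, y1 = values[0], values[-1]

    def gap(i: int) -> float:
        # Unnormalised point-to-line distance; the constant denominator
        # does not change which index is the maximum.
        return abs((y1 - y0) * i - last * values[i] + last * y0)

    return max(range(len(values)), key=gap)


# A 3x3 unit grid (dense region) plus two far-away outliers.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
sample = grid + [(10.0, 10.0), (20.0, 0.0)]

distances = k_distances(sample, k=3)       # y-axis of the k-distance plot
idx = knee_index(distances)                # x-axis position of the knee
suggested_epsilon = distances[idx]         # analogous to reading knee_y
```

On this toy data the knee sits at the last distance before the curve jumps to the outliers, which is the value one would read off the plot by eye. On real data, plotting the curve remains the safer check.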