# DBSCAN Scala Object

> DBSCAN is a density-based clustering algorithm well suited to spatial data. It groups records that are closely packed together and marks records in low-density regions as noise. This implementation is built on Apache Spark, Apache Sedona, and GraphFrames to support large-scale datasets and heterogeneous GeometryType features.

Annotates a dataframe with a cluster label for each data record using the DBSCAN algorithm.

```scala theme={"system"}
def dbscan(
      dataframe: DataFrame,
      epsilon: Double,
      minPts: Int,
      geometry: String = null,
      includeOutliers: Boolean = true,
      useSpheroid: Boolean = false,
      isCoreColumnName: String = "isCore",
      clusterColumnName: String = "cluster"): DataFrame =
```

## Parameters

<ParamField path="dataframe" type="DataFrame">
  DataFrame to cluster. Must contain at least one `GeometryType` column.
</ParamField>

<ParamField path="epsilon" type="Double">
  Distance parameter of the DBSCAN algorithm: the maximum distance at which two records are considered neighbors.
</ParamField>

<ParamField path="minPts" type="Int">
  Minimum-points parameter of the DBSCAN algorithm: the minimum number of records that must lie within `epsilon` of a record for it to be a core point.
</ParamField>

<ParamField path="geometry" type="String">
  Name of the geometry column to cluster on. Default is `null`, in which case the column is selected automatically as described below.

  The dataframe must contain at least one `GeometryType` column, and its rows must be unique.
  If exactly one geometry column is present, it is used automatically.
  If more than one is present, the one named `'geometry'` is used. If more than one is present
  and none is named `'geometry'`, the column name must be provided. The cluster label column is named `'cluster'` unless `clusterColumnName` is set.
</ParamField>

<ParamField path="includeOutliers" type="Boolean">
  Whether to include outliers in the output. Default is `true`.
</ParamField>

<ParamField path="useSpheroid" type="Boolean">
  Whether to use a spheroidal (`true`) or Cartesian (`false`) distance calculation. Default is `false`.
</ParamField>

<ParamField path="isCoreColumnName" type="String">
  Name of the output column that flags whether a record is a core point. Default is `"isCore"`.
</ParamField>

<ParamField path="clusterColumnName" type="String">
  Name of the output column that holds the cluster id. Default is `"cluster"`.
</ParamField>
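
To make the roles of `epsilon` and `minPts` concrete, here is a minimal plain-Scala sketch of the core-point test that DBSCAN is built around. It is a brute-force illustration only, not the Sedona implementation. The names `Pt` and `CorePointSketch` are hypothetical, and the sketch assumes planar distance and that a point counts itself among its own neighbors.

```scala
// Illustrative only: brute-force core-point test, not the Sedona/GraphFrames implementation.
final case class Pt(x: Double, y: Double)

object CorePointSketch {
  // Planar Euclidean distance (comparable to useSpheroid = false).
  def dist(a: Pt, b: Pt): Double = math.hypot(a.x - b.x, a.y - b.y)

  // Assumption: a point is included in its own epsilon-neighborhood.
  // A point is "core" when at least minPts points lie within epsilon of it.
  def isCore(p: Pt, all: Seq[Pt], epsilon: Double, minPts: Int): Boolean =
    all.count(q => dist(p, q) <= epsilon) >= minPts

  def main(args: Array[String]): Unit = {
    val pts = Seq(Pt(0.0, 0.0), Pt(0.0, 1.0), Pt(1.0, 0.0), Pt(10.0, 10.0))
    pts.foreach { p =>
      println(s"$p core=${isCore(p, pts, epsilon = 1.5, minPts = 3)}")
    }
  }
}
```

With these sample points, the first three lie within 1.5 units of one another and are each core, while the isolated point at (10, 10) has only itself as a neighbor and is not.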

## Returns

The input DataFrame with a cluster label (column `clusterColumnName`) and a core-point flag (column `isCoreColumnName`) added to each row. Outliers, if included, have a cluster
value of `-1`.

## Usage Example

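```scala theme={"system"}
import org.apache.sedona.stats.clustering.DBSCAN

// `dataframe` is an existing DataFrame with a GeometryType column.
// The parameter values below are illustrative; tune them for your data.
val epsilon = 0.5 // neighborhood distance
val minPts = 5    // minimum neighbors for a core point
val result = DBSCAN.dbscan(dataframe, epsilon, minPts)
```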
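
For a more complete picture, the following sketch builds a small point DataFrame, calls `dbscan` with named arguments, and filters out noise. It assumes that Apache Sedona and GraphFrames are on the classpath and that a local Spark session is acceptable; in a managed notebook the session is usually provided already. The object name `DbscanExample`, the sample coordinates, and the parameter values are illustrative, and the explicit checkpoint directory is a precaution because GraphFrames' connected-components step typically needs one.

```scala
import java.nio.file.Files

import org.apache.sedona.spark.SedonaContext
import org.apache.sedona.stats.clustering.DBSCAN
import org.apache.spark.sql.functions.{col, expr}

object DbscanExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session; reuse the provided session where one exists.
    val sedona = SedonaContext.create(
      SedonaContext.builder().master("local[*]").appName("dbscan-example").getOrCreate())
    import sedona.implicits._

    // Precaution (assumption): give GraphFrames a checkpoint directory.
    sedona.sparkContext.setCheckpointDir(
      Files.createTempDirectory("dbscan-checkpoint").toString)

    // Unique rows with a single GeometryType column named "geometry".
    val points = Seq((1, 0.0, 0.0), (2, 0.0, 1.0), (3, 1.0, 0.0), (4, 10.0, 10.0))
      .toDF("id", "x", "y")
      .withColumn("geometry", expr("ST_Point(x, y)"))
      .drop("x", "y")

    // Named arguments make the remaining defaults explicit at the call site.
    val clustered = DBSCAN.dbscan(
      points,
      epsilon = 1.5,
      minPts = 3,
      geometry = "geometry",
      includeOutliers = true,
      useSpheroid = false)

    // Outliers carry a cluster value of -1; keep only clustered records.
    clustered.filter(col("cluster") =!= -1).show()
  }
}
```

Keeping `includeOutliers = true` and filtering on `-1` afterwards preserves the noise records for inspection; setting `includeOutliers = false` instead drops them from the result.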
