localOutlierFactor

Annotates a DataFrame with a column containing the local outlier factor (LOF) of each record. The DataFrame should contain at least one GeometryType column, and its rows must be unique. If exactly one geometry column is present, it is used automatically. If several are present, the one named ‘geometry’ is used; if none of them is named ‘geometry’, the column name must be provided.
def localOutlierFactor(
      dataframe: DataFrame,
      k: Int = 20,
      geometry: String = null,
      approximateKNN: Boolean = false,
      handleTies: Boolean = false,
      useSphere: Boolean = false,
      resultColumnName: String = "lof"): DataFrame

Parameters

dataframe
DataFrame
dataframe containing the point geometries
k
Int
number of nearest neighbors that will be considered for the LOF calculation
geometry
String
name of the geometry column
approximateKNN
Boolean
whether to use approximate KNN. When false, an exact KNN join is used. Default is false
handleTies
Boolean
whether to handle ties in the k-distance calculation. Default is false
useSphere
Boolean
whether to use a spheroidal distance calculation instead of a Cartesian one. Default is false (Cartesian)
resultColumnName
String
the name of the column containing the lof for each row. Default is “lof”
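
To make the role of k concrete, here is a minimal brute-force sketch of the textbook LOF definition on plain 2D points. It is an illustration only and not Sedona's implementation: the object name LofSketch and its helpers are hypothetical, distances are Cartesian, ties are not expanded, and points are assumed unique (duplicates would produce zero distances and a division by zero).

```scala
// Illustrative brute-force LOF; NOT the Sedona implementation.
object LofSketch {
  type Point = (Double, Double)

  // Cartesian distance between two points
  def dist(a: Point, b: Point): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)

  def lof(points: Vector[Point], k: Int): Vector[Double] = {
    val n = points.size
    require(k >= 1 && k < n, "k must be between 1 and n - 1")

    // k nearest neighbors of each point as (neighborIndex, distance)
    val knn: Vector[Vector[(Int, Double)]] = Vector.tabulate(n) { i =>
      (0 until n)
        .filter(_ != i)
        .map(j => (j, dist(points(i), points(j))))
        .sortBy(_._2)
        .take(k)
        .toVector
    }

    // k-distance: distance from each point to its k-th nearest neighbor
    val kDist: Vector[Double] = knn.map(_.last._2)

    // Local reachability density: inverse of the mean reachability
    // distance from a point to its neighbors
    val lrd: Vector[Double] = knn.map { nbrs =>
      val reach = nbrs.map { case (j, d) => math.max(kDist(j), d) }
      nbrs.size / reach.sum
    }

    // LOF: mean ratio of the neighbors' density to the point's own density
    Vector.tabulate(n) { i =>
      knn(i).map { case (j, _) => lrd(j) / lrd(i) }.sum / knn(i).size
    }
  }
}
```

For example, calling LofSketch.lof on the four corners of a unit square plus the point (10, 10) with k = 2 gives the isolated point the largest score, while the corners score 1. Values close to 1 indicate a density similar to that of the neighbors; values well above 1 indicate a point in a sparser region than its neighbors.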

Returns

The input DataFrame with an additional column, named by resultColumnName, containing the LOF for each row

Usage Examples

import org.apache.sedona.stats.outlierDetection.LocalOutlierFactor

// Example usage: score every row using its 20 nearest neighbors
val result = LocalOutlierFactor.localOutlierFactor(dataframe, k = 20)
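
Once the LOF column exists, a common next step is to keep only the rows that look anomalous. The following sketch uses only standard Spark DataFrame operations. The helper name flagOutliers and the threshold of 1.5 are assumptions made for illustration, not part of Sedona; a suitable threshold depends on the data.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object LofPostProcessing {
  // Keeps rows whose LOF exceeds `threshold`, most anomalous first.
  // The default threshold of 1.5 is an illustrative assumption,
  // not a Sedona default.
  def flagOutliers(
      scored: DataFrame,
      threshold: Double = 1.5,
      lofColumn: String = "lof"): DataFrame =
    scored
      .filter(col(lofColumn) > threshold)
      .orderBy(col(lofColumn).desc)
}
```

Here scored stands for the DataFrame returned by localOutlierFactor, and lofColumn should match the resultColumnName passed to it.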