
spatiallyStratifiedSample

Spatially stratified sampling of a DataFrame containing spatial data.
def spatiallyStratifiedSample(
      dataframe: Dataset[Row],
      fraction: Double,
      partitionCount: Int,
      geometry: String = null,
      seed: Long = 42): Dataset[Row]

Parameters

dataframe
Dataset[Row]
DataFrame containing spatial data to be sampled. Must contain the geometry column named by the geometry parameter.
fraction
Double
Sampling rate, between 0 and 1 (the fraction of rows to keep).
partitionCount
Int
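Number of spatial partitions (strata) to divide the data into, arranged as a grid with the same number of partitions in each dimension. If partitionCount is not a perfect square, the number of partitions in each dimension is its square root rounded to the nearest integer.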
geometry
String
Column containing the geometry data. Default is “geometry”
seed
Long
Random seed used when sampling the data. Default is 42.
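To make the role of fraction, partitionCount, and seed concrete, here is a minimal, self-contained Scala sketch of grid-based proportional stratified sampling. It does not use Spark or Sedona and is not Sedona's implementation. It assumes point geometries held in a hypothetical Pt class, lays a side x side grid over the bounding box of the points (side = sqrt(partitionCount) rounded to the nearest integer), and takes round(fraction * cellSize) seeded random points from each non-empty cell. The names Pt, StratifiedSketch, gridSide, cellOf, and sample are all hypothetical.

```scala
import scala.util.Random

// Hypothetical stand-in for a row holding a point geometry; not a Sedona or Spark type.
final case class Pt(x: Double, y: Double)

object StratifiedSketch {
  // Cells per grid dimension: sqrt(partitionCount) rounded to the nearest integer, at least 1.
  def gridSide(partitionCount: Int): Int =
    math.max(1, math.round(math.sqrt(partitionCount.toDouble)).toInt)

  // (column, row) of the grid cell containing p, for a side x side grid over the bounding box.
  def cellOf(p: Pt, minX: Double, maxX: Double, minY: Double, maxY: Double, side: Int): (Int, Int) = {
    def idx(v: Double, lo: Double, hi: Double): Int =
      if (hi <= lo) 0 else math.min(side - 1, ((v - lo) / (hi - lo) * side).toInt)
    (idx(p.x, minX, maxX), idx(p.y, minY, maxY))
  }

  // Proportional stratified sample: each non-empty cell contributes round(fraction * cellSize)
  // randomly chosen points, drawn with a per-cell generator derived from the seed.
  def sample(points: Seq[Pt], fraction: Double, partitionCount: Int, seed: Long = 42L): Seq[Pt] = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be between 0 and 1")
    if (points.isEmpty) Seq.empty
    else {
      val side = gridSide(partitionCount)
      val xs = points.map(_.x)
      val ys = points.map(_.y)
      val (minX, maxX, minY, maxY) = (xs.min, xs.max, ys.min, ys.max)
      // Group points into strata (grid cells).
      val strata = points.groupBy(p => cellOf(p, minX, maxX, minY, maxY, side))
      // Visit cells in a fixed order so the result is reproducible for a given seed.
      strata.toSeq.sortBy(_._1).flatMap { case ((cx, cy), members) =>
        val rng = new Random(seed + cx * 1000003L + cy)
        val k = math.round(fraction * members.size).toInt
        rng.shuffle(members).take(k)
      }
    }
  }
}
```

With partitionCount = 10, for instance, gridSide returns 3, so this sketch uses a 3 x 3 grid of 9 cells. Sampling each cell separately, rather than the whole dataset at once, keeps sparse regions from being under-represented by chance.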

Returns

The input DataFrame sampled down to the specified rate

Usage Examples

import org.apache.sedona.stats.clustering.SpatiallyStratifiedSampling

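// Example usage: sample at a rate of 0.1, stratified over 16 spatial partitions (a 4 x 4 grid)
val result = SpatiallyStratifiedSampling.spatiallyStratifiedSample(
  dataframe, fraction = 0.1, partitionCount = 16, geometry = "geometry", seed = 42L)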