The weighting module provides functions for creating spatial weight matrices that define neighborhood relationships between spatial features. These weights are commonly used in spatial autocorrelation analysis, hotspot detection, and other spatial statistics operations.

add_binary_distance_band_column()

Annotates a dataframe with a weights column containing the other records within the threshold and their weight.
def add_binary_distance_band_column(
    dataframe: DataFrame,
    threshold: float,
    include_zero_distance_neighbors: bool = True,
    include_self: bool = False,
    geometry: Optional[str] = None,
    use_spheroid: bool = False,
    saved_attributes: Optional[List[str]] = None,
    result_name: str = 'weights'
) -> DataFrame
Weights will always be 1.0. The dataframe must contain at least one GeometryType column, and its rows must be unique. If exactly one geometry column is present, it is used automatically. If more than one is present, the one named ‘geometry’ is used; if none of them is named ‘geometry’, the column name must be provided through the geometry parameter.
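The geometry column selection rule described above can be expressed as a small function. This is a minimal sketch, not Sedona's code: `resolve_geometry_column` is a hypothetical helper that receives the names of the DataFrame's geometry columns, and the assumption that an explicitly passed `geometry` name always takes precedence is ours.

```python
from typing import List, Optional


def resolve_geometry_column(
    geometry_columns: List[str], geometry: Optional[str] = None
) -> str:
    """Hypothetical helper that applies the documented selection rule."""
    if geometry is not None:
        # Assumption: an explicit name wins, but it must be a geometry column.
        if geometry not in geometry_columns:
            raise ValueError(f"{geometry!r} is not a geometry column")
        return geometry
    if not geometry_columns:
        raise ValueError("DataFrame has no geometry column")
    if len(geometry_columns) == 1:
        # A single geometry column is used automatically.
        return geometry_columns[0]
    if "geometry" in geometry_columns:
        # Several geometry columns: prefer the one named 'geometry'.
        return "geometry"
    raise ValueError("Multiple geometry columns found; pass geometry=<column name>")


print(resolve_geometry_column(["geom"]))
print(resolve_geometry_column(["geometry", "centroid"]))
```

With two geometry columns named, say, `start` and `end`, the function raises unless `geometry="start"` or `geometry="end"` is passed, matching the requirement that the name be provided.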

Parameters

  • dataframe (DataFrame, required): DataFrame with a geometry column.
  • threshold (float, required): Distance threshold for considering neighbors.
  • include_zero_distance_neighbors (bool, default True): Whether to include neighbors at a distance of 0. Because every weight in this function is 1.0, zero-distance neighbors do not produce infinite weights here.
  • include_self (bool, default False): Whether to include the row itself in its list of neighbors.
  • geometry (Optional[str], default None): Name of the geometry column.
  • use_spheroid (bool, default False): If True, distances are spheroidal; if False, they are Cartesian (planar).
  • saved_attributes (Optional[List[str]], default None): The attributes to save in the neighbor column. If None, all columns are saved.
  • result_name (str, default 'weights'): The name of the resulting column.

Returns

The input DataFrame with an added weights column that lists, for each row, its neighbors and their weights (always 1.0).
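To make the binary semantics concrete, here is a brute-force, pure-Python sketch over a list of 2D points. It is not how Sedona computes the band: the function name `binary_distance_band`, the `(neighbor_index, weight)` output format, and the inclusive `distance <= threshold` test are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def binary_distance_band(
    points: List[Point],
    threshold: float,
    include_zero_distance_neighbors: bool = True,
    include_self: bool = False,
) -> List[List[Tuple[int, float]]]:
    """Hypothetical O(n^2) sketch: for each point, list (neighbor_index, 1.0)."""
    result = []
    for i, p in enumerate(points):
        neighbors = []
        for j, q in enumerate(points):
            if i == j:
                # The row itself is listed only when include_self is set.
                if include_self:
                    neighbors.append((j, 1.0))
                continue
            d = math.dist(p, q)  # Cartesian (planar) distance
            if d > threshold:
                continue  # assumed inclusive band: d <= threshold is a neighbor
            if d == 0.0 and not include_zero_distance_neighbors:
                continue
            neighbors.append((j, 1.0))  # binary weight
        result.append(neighbors)
    return result


# Rows 0 and 2 share a location; row 3 is far from everything.
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0), (10.0, 0.0)]
print(binary_distance_band(points, threshold=6.0))
```

Setting `include_zero_distance_neighbors=False` drops row 2 from row 0's list (and vice versa) while leaving the other neighbors untouched.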

add_distance_band_column()

Annotates a dataframe with a weights column containing the other records within the threshold and their weight.
def add_distance_band_column(
    dataframe: DataFrame,
    threshold: float,
    binary: bool = True,
    alpha: float = -1.0,
    include_zero_distance_neighbors: bool = False,
    include_self: bool = False,
    self_weight: float = 1.0,
    geometry: Optional[str] = None,
    use_spheroid: bool = False,
    saved_attributes: Optional[List[str]] = None,
    result_name: str = 'weights'
) -> DataFrame
The dataframe must contain at least one GeometryType column, and its rows must be unique. If exactly one geometry column is present, it is used automatically. If more than one is present, the one named ‘geometry’ is used; if none of them is named ‘geometry’, the column name must be provided through the geometry parameter.

Parameters

  • dataframe (DataFrame, required): DataFrame with a geometry column.
  • threshold (float, required): Distance threshold for considering neighbors.
  • binary (bool, default True): If True, every neighbor gets a weight of 1.0; if False, neighbors get inverse distance weights computed as dist^alpha.
  • alpha (float, default -1.0): Exponent for inverse distance weights. Ignored when binary is True.
  • include_zero_distance_neighbors (bool, default False): Whether to include neighbors at a distance of 0. If they are included, binary is False, and alpha is negative, their weights are infinity, as the floating-point specification prescribes for division by 0.
  • include_self (bool, default False): Whether to include the row itself in its list of neighbors.
  • self_weight (float, default 1.0): The weight assigned to the row itself when include_self is True.
  • geometry (Optional[str], default None): Name of the geometry column.
  • use_spheroid (bool, default False): If True, distances are spheroidal; if False, they are Cartesian (planar).
  • saved_attributes (Optional[List[str]], default None): The attributes to save in the neighbor column. If None, all columns are saved.
  • result_name (str, default 'weights'): The name of the resulting column.

Returns

The input DataFrame with an added weights column that lists, for each row, its neighbors and their weights.
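The weight that add_distance_band_column() assigns to each neighbor can be sketched as a single function. `pair_weight` is a hypothetical helper, and two choices in it are assumptions: `self_weight` applies to the self entry regardless of `binary`, and a zero distance with a negative `alpha` returns infinity explicitly, because Python's `0.0 ** -1.0` raises `ZeroDivisionError` rather than following IEEE 754 the way Java's `Math.pow` does.

```python
import math


def pair_weight(
    distance: float,
    is_self: bool = False,
    binary: bool = True,
    alpha: float = -1.0,
    self_weight: float = 1.0,
) -> float:
    """Hypothetical sketch of the weight for one (row, neighbor) pair."""
    if is_self:
        # Assumption: the self entry always uses self_weight.
        return self_weight
    if binary:
        # Binary mode ignores alpha entirely.
        return 1.0
    if distance == 0.0 and alpha < 0:
        # Mirror IEEE 754 / Java Math.pow: 0 ** negative -> +infinity.
        return math.inf
    return distance ** alpha  # inverse distance weight dist^alpha


print(pair_weight(2.0))                            # binary mode
print(pair_weight(2.0, binary=False))              # 2 ** -1
print(pair_weight(2.0, binary=False, alpha=-2.0))  # 2 ** -2
```

Keeping the binary branch ahead of the alpha computation reflects the documented behavior that alpha is ignored when binary is True.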

add_weighted_distance_band_column()

Annotates a dataframe with a weights column containing the other records within the threshold and their weight.
def add_weighted_distance_band_column(
    dataframe: DataFrame,
    threshold: float,
    alpha: float,
    include_zero_distance_neighbors: bool = True,
    include_self: bool = False,
    self_weight: float = 1.0,
    geometry: Optional[str] = None,
    use_spheroid: bool = False,
    saved_attributes: Optional[List[str]] = None,
    result_name: str = 'weights'
) -> DataFrame
Weights will be distance^alpha. The dataframe must contain at least one GeometryType column, and its rows must be unique. If exactly one geometry column is present, it is used automatically. If more than one is present, the one named ‘geometry’ is used; if none of them is named ‘geometry’, the column name must be provided through the geometry parameter.

Parameters

  • dataframe (DataFrame, required): DataFrame with a geometry column.
  • threshold (float, required): Distance threshold for considering neighbors.
  • alpha (float, required): Exponent for inverse distance weights; each weight is computed as dist^alpha. Typical values are negative, such as -1.0 or -2.0.
  • include_zero_distance_neighbors (bool, default True): Whether to include neighbors at a distance of 0. If they are included and alpha is negative, their weights are infinity, as the floating-point specification prescribes for division by 0.
  • include_self (bool, default False): Whether to include the row itself in its list of neighbors.
  • self_weight (float, default 1.0): The weight assigned to the row itself when include_self is True.
  • geometry (Optional[str], default None): Name of the geometry column.
  • use_spheroid (bool, default False): If True, distances are spheroidal; if False, they are Cartesian (planar).
  • saved_attributes (Optional[List[str]], default None): The attributes to save in the neighbor column. If None, all columns are saved.
  • result_name (str, default 'weights'): The name of the resulting column.

Returns

The input DataFrame with an added weights column that lists, for each row, its neighbors and their weights (dist^alpha).
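Applying dist^alpha to one row's neighbor distances shows where infinite weights come from when zero-distance neighbors are kept, which is the default for this function. The helper `inverse_distance_weights` is hypothetical plain Python, not part of Sedona; it returns `math.inf` explicitly for the zero-distance case to mirror floating-point division by zero.

```python
import math
from typing import List


def inverse_distance_weights(distances: List[float], alpha: float) -> List[float]:
    """Hypothetical helper: dist ** alpha for each neighbor distance."""
    weights = []
    for d in distances:
        if d == 0.0 and alpha < 0:
            # Division by zero under IEEE 754 rules yields +infinity.
            weights.append(math.inf)
        else:
            weights.append(d ** alpha)
    return weights


# One row's neighbor distances, including a coincident (zero-distance) neighbor.
print(inverse_distance_weights([0.0, 1.0, 2.0, 4.0], alpha=-1.0))
```

Passing include_zero_distance_neighbors=False keeps such coincident neighbors, and therefore the infinite entry, out of the result.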

Usage Examples

from sedona.spark.stats.weighting import (
    add_binary_distance_band_column,
    add_distance_band_column,
    add_weighted_distance_band_column
)

# spatial_df is assumed to be an existing DataFrame with a geometry column named "geometry"
# Binary distance band weighting (weights are always 1.0)
binary_weights_df = add_binary_distance_band_column(
    dataframe=spatial_df,
    threshold=1000.0,
    include_zero_distance_neighbors=True,
    include_self=False,
    geometry="geometry",
    use_spheroid=False,
    result_name="binary_weights"
)

# Distance band with inverse distance weights (binary=False, weight = dist^-1.0)
distance_weights_df = add_distance_band_column(
    dataframe=spatial_df,
    threshold=1000.0,
    binary=False,
    alpha=-1.0,
    include_zero_distance_neighbors=False,
    include_self=True,
    self_weight=1.0,
    geometry="geometry",
    use_spheroid=False,
    result_name="distance_weights"
)

# Weighted distance band with inverse distance weights
weighted_distance_df = add_weighted_distance_band_column(
    dataframe=spatial_df,
    threshold=1000.0,
    alpha=-2.0,
    include_zero_distance_neighbors=False,
    include_self=True,
    self_weight=1.0,
    geometry="geometry",
    use_spheroid=True,
    result_name="weighted_weights"
)

# Using specific saved attributes
limited_weights_df = add_distance_band_column(
    dataframe=spatial_df,
    threshold=500.0,
    saved_attributes=["id", "name", "value"],
    result_name="neighbor_weights"
)

Weight Types

Binary Weights

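Binary weights assign a value of 1.0 to every neighbor within the threshold distance. Records beyond the threshold are left out of the weights column, which is equivalent to a weight of 0.0. This is the simplest form of spatial weighting and is created with add_binary_distance_band_column(), or with add_distance_band_column() when binary is True.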
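Because neighbors beyond the threshold are simply omitted, each row's neighbor list is a sparse form of one row of the weight matrix. The sketch below fills in the implied 0.0 entries to build the dense matrix. It assumes a hypothetical list of `(neighbor_index, weight)` pairs per row, not Sedona's actual column layout.

```python
from typing import List, Tuple


def to_dense(neighbor_lists: List[List[Tuple[int, float]]]) -> List[List[float]]:
    """Expand hypothetical (neighbor_index, weight) lists into a dense matrix."""
    n = len(neighbor_lists)
    # Every pair not listed as a neighbor gets the implied weight 0.0.
    matrix = [[0.0] * n for _ in range(n)]
    for i, neighbors in enumerate(neighbor_lists):
        for j, weight in neighbors:
            matrix[i][j] = weight
    return matrix


# Three points on a line where only adjacent points are within the threshold.
neighbors = [[(1, 1.0)], [(0, 1.0), (2, 1.0)], [(1, 1.0)]]
for row in to_dense(neighbors):
    print(row)
```

The sparse form grows with the number of neighbors rather than with the square of the row count, which is one reason to store neighbors per row instead of a full matrix.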

Inverse Distance Weights

Inverse distance weights use the formula dist^alpha, where alpha is typically negative (e.g., -1.0 or -2.0), so closer neighbors receive higher weights and farther neighbors receive lower weights. Use add_weighted_distance_band_column(), or add_distance_band_column() with binary=False, for this type.
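A quick worked comparison shows how alpha controls the decay. With weights $w_{ij} = d_{ij}^{\alpha}$, doubling the distance halves the weight for $\alpha = -1$ and quarters it for $\alpha = -2$:

```latex
w_{ij} = d_{ij}^{\alpha}
\qquad
\begin{array}{c|ccc}
            & d = 1 & d = 2 & d = 4 \\ \hline
\alpha = -1 & 1     & 0.5   & 0.25  \\
\alpha = -2 & 1     & 0.25  & 0.0625
\end{array}
```

A more negative alpha therefore concentrates influence on the nearest neighbors.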

Flexible Distance Band Weights

The add_distance_band_column() function provides flexibility to choose between binary or inverse distance weighting based on the binary parameter, making it the most versatile option.

Notes

  • All functions require a DataFrame with at least one geometry column
  • Rows in the DataFrame must be unique
  • If multiple geometry columns exist and none is named ‘geometry’, the column name must be specified
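  • The use_spheroid parameter determines whether distances are computed in Cartesian (planar) space or on a spheroid (geodesic distance on an ellipsoidal model of the Earth)
  • Zero-distance neighbors produce infinite weights under inverse distance weighting with a negative alpha (binary=False in add_distance_band_column(), and always in add_weighted_distance_band_column())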
  • The saved_attributes parameter allows you to control which columns are preserved in the neighbor information
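To see why use_spheroid matters when choosing a threshold, compare a planar distance computed on longitude/latitude degrees with a great-circle distance in meters. This sketch uses the haversine formula on a sphere as a simpler stand-in for a spheroidal calculation, which works on an ellipsoid and gives slightly different values. The helper names are hypothetical, and the assumption that the spheroidal result is in meters is ours.

```python
import math
from typing import Tuple

LonLat = Tuple[float, float]

# Mean Earth radius in meters. This models a sphere, not a spheroid.
EARTH_RADIUS_M = 6_371_008.8


def planar_distance(a: LonLat, b: LonLat) -> float:
    """Cartesian distance, in the same units as the coordinates (here degrees)."""
    return math.dist(a, b)


def haversine_m(a: LonLat, b: LonLat) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


a, b = (0.0, 0.0), (1.0, 0.0)  # (longitude, latitude), one degree apart on the equator
print(planar_distance(a, b))
print(haversine_m(a, b))
```

The same pair of points is 1.0 apart in degrees but roughly 111 km apart on the sphere, so the threshold has to be expressed in the units of whichever calculation use_spheroid selects.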