schc

Experimental Feature. Spatially constrained hierarchical clustering (SCHC) algorithm implementation.

schc(geo_df, feature_attrs, geo_column, id_column, target_count)

Performs spatially constrained hierarchical clustering of a PySpark spatial DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| geo_df | | PySpark DataFrame containing the feature columns on which clustering will be performed | required |
| feature_attrs | | Python list of the column names to use as clustering features | required |
| geo_column | | Name of the column that contains the spatial geometry objects | required |
| id_column | | Name of the column that contains the IDs of the spatial locations | required |
| target_count | | Target number of clusters | required |

Returns:

| Type | Description |
| --- | --- |
| | A PySpark DataFrame containing the cluster labels for each data record |
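
To illustrate what "spatially constrained" means here, the following is a minimal pure-Python sketch of the general idea, not this library's implementation. It assumes centroid-distance linkage and a hand-written neighbor map. The helper name `toy_schc` and its arguments `features` and `neighbors` are hypothetical. How `schc` derives adjacency from `geo_column`, and which linkage it uses, is not stated on this page.

```python
def toy_schc(features, neighbors, target_count):
    """Toy spatially constrained agglomerative clustering (illustrative only).

    features:     dict mapping location id -> tuple of numeric feature values
    neighbors:    dict mapping location id -> iterable of adjacent location ids
    target_count: stop once this many clusters remain
    """
    # Every location starts in its own cluster, keyed by its id.
    members = {i: [i] for i in features}

    # Build a symmetric adjacency map between clusters.
    adjacency = {i: set() for i in features}
    for i, adjacent in neighbors.items():
        for j in adjacent:
            adjacency[i].add(j)
            adjacency[j].add(i)

    def centroid(cluster_id):
        rows = [features[m] for m in members[cluster_id]]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def distance(a, b):
        # Squared Euclidean distance between cluster centroids (assumed linkage).
        return sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))

    while len(members) > target_count:
        # Only spatially adjacent clusters are candidates for merging.
        candidates = [
            (distance(a, b), a, b)
            for a in members
            for b in adjacency[a]
            if a < b
        ]
        if not candidates:
            break  # remaining clusters are not adjacent; cannot merge further
        _, keep, drop = min(candidates)
        members[keep].extend(members.pop(drop))
        # Neighbors of the absorbed cluster become neighbors of the merged one.
        for n in adjacency.pop(drop):
            adjacency[n].discard(drop)
            if n != keep:
                adjacency[n].add(keep)
                adjacency[keep].add(n)

    # Assign consecutive integer labels to the surviving clusters.
    return {m: label for label, cluster in enumerate(members.values()) for m in cluster}


# Four locations in a chain: 1 - 2 - 3 - 4.
features = {1: (0.0,), 2: (0.1,), 3: (5.0,), 4: (0.2,)}
neighbors = {1: [2], 2: [3], 3: [4]}
labels = toy_schc(features, neighbors, target_count=2)
print(labels)  # {1: 0, 2: 0, 3: 1, 4: 1}
```

Location 4 has feature values close to locations 1 and 2, but it is adjacent only to location 3, so the contiguity constraint places it in the same cluster as 3. An unconstrained hierarchical clustering would instead group 1, 2, and 4 together.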