What you will learn
This notebook will teach you to:
- Perform standard spatial joins: identifying features within other geometries
- Execute nearest neighbor joins: finding the closest feature between datasets
Loading datasets for a spatial join
Spatial joins combine two tables by comparing columns with geometry types and examining how those geometries relate in space. For example, you could join line-shaped delivery route data with point coordinates of customer locations to optimize logistics, or join building polygons with flood zone polygons to analyze risk.
In this example, we are going to combine data from two tables in the Wherobots Open Data catalog.
- Polygons that are administrative boundaries of US cities and towns from the Overture Maps Foundation (OMF)
- Points that are places of interest from Foursquare
Points within polygons with ST_Intersects
With both datasets loaded, we can now join them based on their spatial relationship. In this case, we want to find which places of interest (points) fall within each administrative boundary (polygons). We use the ST_Intersects function to check if a point’s geometry is inside or directly on a boundary’s geometry.
The spatial join keeps only the pairs of points and polygons whose geometries intersect, so each row of the resulting DataFrame holds a point together with the columns of an administrative boundary that it intersects.
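To make the join semantics concrete, here is a minimal plain-Python sketch of an intersects-style inner join. It is not how Wherobots executes ST_Intersects; it only mimics the behavior described above (a point inside or directly on a boundary counts as a match). The coordinates, the IDs such as `town_a` and `cafe`, and the helper names are all hypothetical, and polygons are assumed to be simple rings without holes.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # polygon exterior as (x, y) vertices; no holes (assumption)


def on_segment(p: Point, a: Point, b: Point, eps: float = 1e-12) -> bool:
    """True if p lies on the segment a-b, so boundary points count as matches."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False  # p is not collinear with a and b
    return (min(a[0], b[0]) - eps <= p[0] <= max(a[0], b[0]) + eps
            and min(a[1], b[1]) - eps <= p[1] <= max(a[1], b[1]) + eps)


def point_intersects_polygon(p: Point, ring: Ring) -> bool:
    """Ray-casting test that also accepts points lying on the boundary."""
    inside = False
    for i in range(len(ring)):
        a, b = ring[i], ring[(i + 1) % len(ring)]
        if on_segment(p, a, b):
            return True
        # Does a horizontal ray from p toward +x cross the edge a-b?
        if (a[1] > p[1]) != (b[1] > p[1]):
            x_cross = a[0] + (p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x_cross > p[0]:
                inside = not inside
    return inside


def spatial_join(points: Dict[str, Point],
                 polygons: Dict[str, Ring]) -> List[Tuple[str, str]]:
    """Inner join: keep only (point_id, polygon_id) pairs that intersect."""
    return [(pid, gid)
            for pid, p in points.items()
            for gid, ring in polygons.items()
            if point_intersects_polygon(p, ring)]


# Hypothetical sample data: two adjacent square "towns" and four places.
polygons = {
    "town_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "town_b": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
points = {"cafe": (1, 1), "museum": (4, 2), "pier": (9, 9), "park": (6, 3)}

joined = spatial_join(points, polygons)
for point_id, polygon_id in joined:
    print(point_id, "->", polygon_id)
```

In this toy data, `museum` sits on the shared edge and therefore pairs with both towns, while `pier` intersects nothing and drops out of the result, as an inner join would do. The nested loop compares every point with every polygon, which is only reasonable for tiny inputs.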
Spatial join and aggregate points within polygons
After performing a spatial join, a common analysis is to count how many points fall within each polygon. We can perform this in a single operation by combining the spatial join with a groupBy and the COUNT aggregation.
This query joins the polygons and points, groups the results by the polygon ID, and counts the matching points.
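The join-then-count pattern can be illustrated in plain Python as below. This is a rough sketch of the logic only, not the Wherobots query: to keep it short, each polygon is simplified to an axis-aligned bounding box, and every name and coordinate is hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(p: Point, box: Box) -> bool:
    """Point-in-box test; points on the box edge count as intersecting."""
    min_x, min_y, max_x, max_y = box
    return min_x <= p[0] <= max_x and min_y <= p[1] <= max_y


def count_points_per_polygon(points: Dict[str, Point],
                             polygons: Dict[str, Box]) -> Dict[str, int]:
    """Join points to polygons, group by polygon ID, and count the matches."""
    counts: Counter = Counter()
    for p in points.values():
        for polygon_id, box in polygons.items():
            if intersects(p, box):          # the join condition
                counts[polygon_id] += 1     # the grouped COUNT
    return dict(counts)


# Hypothetical sample data; town_c contains no points at all.
polygons = {
    "town_a": (0, 0, 4, 4),
    "town_b": (4, 0, 8, 4),
    "town_c": (20, 20, 24, 24),
}
points = {"cafe": (1, 1), "museum": (4, 2), "park": (6, 3), "pier": (9, 9)}

counts = count_points_per_polygon(points, polygons)
print(counts)
```

Because the counting happens on top of an inner join, a polygon with no matching points (`town_c` here) does not appear in the output at all rather than showing a count of zero.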
Nearest-neighbor spatial join
In some cases, you may want to find the closest feature from another dataset, such as identifying the nearest city for each point of interest. A nearest neighbor join finds the closest points or polygons based on geographic proximity. (Docs: K-Nearest Neighbor Joins) Wherobots has two functions for finding nearest neighbors using a k-nearest-neighbors approach.
ST_KNN returns the exact nearest neighbors, while ST_AKNN trades off some accuracy for speed by using approximate algorithms. In this case, we will join the point data to the polygons using ST_KNN to find the four polygons whose centroids are nearest to each point.
We are passing four parameters to this function:
- R: Table of query geometries, which are our points of interest
- S: Table of object geometries, the centroids of the localities
- k: The number of neighbors to find for each query geometry
- use_sphere: A boolean indicating whether to use a spherical distance model instead of a planar one
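The roles of these four parameters can be sketched with a brute-force nearest-neighbor join in plain Python. This is an illustration under stated assumptions, not the ST_KNN implementation: the function `knn_join`, the sample IDs, the (longitude, latitude) ordering in degrees, and the mean Earth radius used for the haversine formula are all choices made for this example.

```python
import heapq
import math
from typing import Dict, List, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees (assumption)


def planar_distance(a: LonLat, b: LonLat) -> float:
    """Euclidean distance computed directly on the degree values."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def haversine_distance(a: LonLat, b: LonLat, radius_m: float = 6371008.8) -> float:
    """Great-circle distance in meters on a sphere (mean Earth radius assumed)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))


def knn_join(queries: Dict[str, LonLat], objects: Dict[str, LonLat],
             k: int, use_sphere: bool = False) -> Dict[str, List[str]]:
    """For each query geometry (R), return the IDs of its k nearest objects (S)."""
    dist = haversine_distance if use_sphere else planar_distance
    result = {}
    for query_id, q in queries.items():
        nearest = heapq.nsmallest(k, objects.items(),
                                  key=lambda item: dist(q, item[1]))
        result[query_id] = [object_id for object_id, _ in nearest]
    return result


# Hypothetical data at 60 degrees north, where a degree of longitude is
# roughly half as long on the ground as a degree of latitude.
queries = {"poi": (0.0, 60.0)}
objects = {"east": (3.0, 60.0), "north": (0.0, 62.0), "far": (10.0, 50.0)}

planar = knn_join(queries, objects, k=2, use_sphere=False)
spherical = knn_join(queries, objects, k=2, use_sphere=True)
print("planar:   ", planar["poi"])
print("spherical:", spherical["poi"])
```

The two runs rank the same neighbors differently: measured in raw degrees, `north` (2 degrees away) beats `east` (3 degrees away), but on the sphere `east` is closer on the ground. That difference is the reason the choice of distance model matters for longitude and latitude data.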

