Skip to main content
This guide covers best practices for writing spatial SQL queries in Wherobots, including proper setup, common pitfalls, and debugging techniques. The examples use Overture Maps Foundation data available through the Wherobots Open Data catalog but the principles apply to any spatial queries you write in Wherobots.

Key principles

The following principles will help ensure your spatial queries run successfully in Wherobots:

Always initialize the Sedona context

Every spatial query session in Wherobots must begin with proper Sedona context initialization:
from sedona.spark import *
from pyspark.sql.functions import expr

# Initialize Sedona Context
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

Understand table schema before querying

Always verify the structure of tables you’re querying, especially for complex data types like STRUCT and ARRAY columns.
Many common query errors stem from assumptions about table schema. Always check the actual column names and data types before writing queries.

Use spatial predicates for geographic filtering

Leverage spatial functions like ST_Intersects for precise geographic filtering instead of text-based approaches.

Use wkls for geographic boundaries

The wkls library provides accurate WKT representations for countries, states, and regions. Use it to define geographic boundaries in spatial queries instead of hard-coding coordinates:
import wkls
usa_wkt = wkls.us.wkt()

Validate category values when filtering

When filtering on criteria that have a set of predefined valid values, always verify the exact values used in the dataset. This helps ensure accurate results and prevents zero-result queries caused by mismatched category names. For example, to find all baseball stadiums in the Overture dataset, you might assume that the category is ballpark or baseball_field, but the correct category is actually baseball_stadium. The following query returns all of the baseball-related categories in the places_place dataset to help you understand the correct values to filter on:
# Find all baseball-related categories
sedona.sql("""
SELECT DISTINCT categories.primary
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE categories.primary LIKE '%baseball%' OR categories.primary LIKE '%stadium%'
""").show()

Common pitfalls and solutions

The following sections highlight common mistakes when writing spatial queries in Wherobots and how to avoid them.

Incorrect database names

Using the wrong database or table name is a common source of errors. While you might assume the database is overture, the correct database for the Overture Maps Foundation data is overture_maps_foundation:
-- Wrong database name
SELECT COUNT(*)
FROM wherobots_open_data.overture.places_place
WHERE categories LIKE '%ballpark%'

Mishandling STRUCT columns

The categories column in many Overture datasets is a STRUCT with primary (string) and alternate (array of strings) fields. Attempting to filter on categories as if it were a simple string will lead to errors or zero results.
-- Treating STRUCT as string
WHERE categories LIKE '%ballpark%'

Using non-existent columns

Assuming column names without checking the schema can lead to errors: You must make sure to use the correct column names as defined in the dataset schema. For example, there is no region column in the places_place dataset, so filtering on region = 'US' will not work. Instead, use the wkls library to get the correct geometry for the U.S. and use spatial predicates to filter:
-- Non-existent 'region' column
WHERE region = 'US'

Unsupported functions

Some SQL functions that work in other databases may not be available in Apache Spark SQL.
-- UNNEST() is not supported in Spark SQL
FROM UNNEST(addresses) AS addr
WHERE addr.country = 'US'

Complete example: counting baseball stadiums in the U.S.

The following example brings the above best practices together into a complete, executable query:
from sedona.spark import *
from pyspark.sql.functions import expr
import wkls

# Initialize Sedona Context
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Get accurate U.S. boundary geometry using the wkls library
usa_wkt = wkls.us.wkt()

# Execute the spatial query
result = sedona.sql(f'''
SELECT COUNT(*) as stadium_count
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE (
  categories.primary = 'baseball_stadium'
  OR ARRAY_CONTAINS(categories.alternate, 'baseball_stadium')
)
AND ST_Intersects(ST_GeomFromWKT('{usa_wkt}'), geometry)
''')

# Extract and print the result
count = result.collect()[0][0]
print(f"Number of baseball stadiums in the U.S.: {count}")

Why this query works

Understand why this query is structured the way it is:
Sets up Sedona for spatial operations.
Uses wkls.us.wkt() for precise U.S. boundaries.
Accesses categories.primary and categories.alternate with dot notation.
Uses ST_Intersects instead of text-based filtering.
Uses .collect()[0][0] to get the scalar count value.

Debugging steps for query issues

If you encounter errors or unexpected results, follow these debugging steps:

Verify table schema

Make sure you’re using the correct database and table names, and check the schema to understand column types:
# Check table structure
sedona.sql("DESCRIBE wherobots_open_data.overture_maps_foundation.places_place").show()

Examine sample data

Return a small sample of records to confirm that the data contains the expected fields and values:
# Look at sample records to understand data structure
sedona.sql("""
SELECT categories, addresses
FROM wherobots_open_data.overture_maps_foundation.places_place
LIMIT 5
""").show(truncate=False)

Test categories structure

Understand how the categories STRUCT is organized to ensure you’re accessing the correct fields:
# Understand how categories are structured
sedona.sql("""
SELECT categories.primary, categories.alternate
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE categories.primary IS NOT NULL
LIMIT 10
""").show()

Validate spatial operations

Confirm that your spatial predicates are working as expected by testing them with known geometries:
import wkls

# Test spatial intersection without filtering first
usa_wkt = wkls.us.wkt()
sedona.sql(f"""
SELECT COUNT(*) as total_places_in_us
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE ST_Intersects(ST_GeomFromWKT('{usa_wkt}'), geometry)
""").show()

Best Practices Summary

Review the following best practices and common mistakes to ensure that your spatial queries run smoothly in Wherobots:
The following best practices will help ensure your spatial queries run successfully in Wherobots:
  • Use the correct, fully-qualified database names (e.g., overture_maps_foundation).
  • Access STRUCT fields with dot notation (e.g., categories.primary).
  • Use ARRAY_CONTAINS to search within array fields.
  • Use spatial predicates like ST_Intersects for geographic filtering.
  • Always initialize the Sedona context before running spatial operations.
  • Use .collect()[0][0] to extract a single scalar result.
  • Leverage the wkls library for accurate geographic boundaries.
  • Test queries incrementally to isolate issues.