Key principles
The following principles will help ensure your spatial queries run successfully in Wherobots:
Always initialize the Sedona context
Every spatial query session in Wherobots must begin with proper Sedona context initialization.
Understand table schema before querying
Always verify the structure of the tables you’re querying, especially for complex data types like STRUCT and ARRAY columns.
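The two principles above can be sketched as follows. This is a minimal illustration, not a definitive setup: the catalog prefix `wherobots_open_data` and the helper names `create_sedona_context` and `describe_table` are assumptions made for this example, so adjust them to match your environment.

```python
# Assumed fully-qualified table name. The catalog prefix `wherobots_open_data`
# is not stated in this guide; change it if your environment differs.
PLACES_TABLE = "wherobots_open_data.overture_maps_foundation.places_place"


def create_sedona_context():
    """Create the Sedona context used by every spatial query in the session."""
    # Imported inside the function so this module still loads where Sedona is absent.
    from sedona.spark import SedonaContext

    config = SedonaContext.builder().getOrCreate()
    return SedonaContext.create(config)


def describe_table(sedona, table=PLACES_TABLE):
    """Return a DataFrame listing each column and its type (STRUCT, ARRAY, ...)."""
    return sedona.sql(f"DESCRIBE TABLE {table}")
```

In a notebook, calling `sedona = create_sedona_context()` followed by `describe_table(sedona).show(truncate=False)` should print the full, untruncated type of each column.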
Use spatial predicates for geographic filtering
Leverage spatial functions like ST_Intersects for precise geographic filtering instead of text-based approaches.
Use wkls for geographic boundaries
The wkls library provides accurate WKT representations for countries, states, and regions. Use it to define geographic boundaries in spatial queries instead of hard-coding coordinates:
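As a minimal sketch, a boundary from wkls can be combined with ST_Intersects like this. The helper names and the `geometry` column name are assumptions for illustration; confirm the actual geometry column in the table schema.

```python
def load_us_boundary_wkt():
    """Return the U.S. boundary as WKT using the wkls call named in this guide."""
    # Imported lazily so the helper below can be used without wkls installed.
    import wkls

    return wkls.us.wkt()


def intersects_filter(boundary_wkt, geometry_column="geometry"):
    """Build a WHERE-clause fragment that keeps rows intersecting the boundary."""
    # WKT text contains no single quotes, so it can be embedded as a SQL literal.
    # `geometry` is an assumed column name; check the schema of your table.
    return f"ST_Intersects({geometry_column}, ST_GeomFromWKT('{boundary_wkt}'))"
```

The fragment returned by `intersects_filter(load_us_boundary_wkt())` can then be placed in the WHERE clause of a query, so the boundary comes from the library rather than from hand-typed coordinates.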
Validate category values when filtering
When filtering on criteria that have a set of predefined valid values, always verify the exact values used in the dataset. This helps ensure accurate results and prevents zero-result queries caused by mismatched category names. For example, to find all baseball stadiums in the Overture dataset, you might assume that the category is ballpark or baseball_field, but the correct category is actually baseball_stadium.
The following query returns all of the baseball-related categories in the places_place dataset to help you understand the correct values to filter on:
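One way such a lookup might be written is sketched below. The function name, the assumed catalog prefix `wherobots_open_data`, and the choice to count rows per category are illustrative assumptions, not the guide's original query.

```python
# Assumed fully-qualified table name; the catalog prefix is an assumption.
PLACES_TABLE = "wherobots_open_data.overture_maps_foundation.places_place"


def category_lookup_query(keyword, table=PLACES_TABLE):
    """Build a query listing primary categories whose name contains `keyword`."""
    # Accept only letters, digits, and underscores so the keyword is safe to embed.
    if not keyword.replace("_", "").isalnum():
        raise ValueError(f"unexpected characters in keyword: {keyword!r}")
    return (
        "SELECT categories.primary AS category, COUNT(*) AS place_count\n"
        f"FROM {table}\n"
        f"WHERE categories.primary LIKE '%{keyword}%'\n"
        "GROUP BY categories.primary\n"
        "ORDER BY place_count DESC"
    )
```

Running `sedona.sql(category_lookup_query("baseball")).show(truncate=False)` lists the matching category values, which makes it easy to spot baseball_stadium before filtering on it.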
Common pitfalls and solutions
The following sections highlight common mistakes when writing spatial queries in Wherobots and how to avoid them.
Incorrect database names
Using the wrong database or table name is a common source of errors. While you might assume the database is overture, the correct database for the Overture Maps Foundation data is overture_maps_foundation:
- Incorrect
- Correct
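As a minimal sketch of the contrast, the two query strings below differ only in the database name. The catalog prefix `wherobots_open_data` and the helper `qualified_table` are assumptions for illustration; only the database name overture_maps_foundation comes from this guide.

```python
def qualified_table(table, database="overture_maps_foundation",
                    catalog="wherobots_open_data"):
    """Join catalog, database, and table into one fully-qualified name."""
    # The default catalog is an assumption; the database name is from this guide.
    return f"{catalog}.{database}.{table}"


# Incorrect: `overture` is the name one might guess, not the actual database.
INCORRECT_QUERY = "SELECT COUNT(*) FROM overture.places_place"

# Correct: uses the overture_maps_foundation database.
CORRECT_QUERY = f"SELECT COUNT(*) FROM {qualified_table('places_place')}"
```

Building the name in one place keeps every query in a notebook pointed at the same database, so a typo cannot creep into just one of them.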
Mishandling STRUCT columns
The categories column in many Overture datasets is a STRUCT with primary (string) and alternate (array of strings) fields.
Attempting to filter on categories as if it were a simple string will lead to errors or zero results.
- Incorrect
- Correct
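The difference can be sketched as two WHERE-clause fragments. The helper name `category_filter` is hypothetical; the field names and ARRAY_CONTAINS come from this guide.

```python
# Incorrect: compares the whole STRUCT to a string, which errors or matches nothing.
INCORRECT_FILTER = "categories = 'baseball_stadium'"


def category_filter(category):
    """Build a filter matching `category` in either field of the categories STRUCT."""
    # Accept only letters, digits, and underscores so the value is safe to embed.
    if not category.replace("_", "").isalnum():
        raise ValueError(f"unexpected characters in category: {category!r}")
    # Dot notation reaches into the STRUCT; ARRAY_CONTAINS searches the array field.
    return (
        f"(categories.primary = '{category}' "
        f"OR ARRAY_CONTAINS(categories.alternate, '{category}'))"
    )
```

Checking both fields means a place is still found when baseball_stadium appears only among its alternate categories.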
Using non-existent columns
Assuming column names without checking the schema can lead to errors, so always use the column names defined in the dataset schema. For example, there is no region column in the places_place dataset, so filtering on region = 'US' will not work.
Instead, use the wkls library to get the correct geometry for the U.S. and use spatial predicates to filter:
- Incorrect
- Correct
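Before writing a filter, one way to catch a missing column such as region is to compare the names you plan to use against the table's actual columns. This is a minimal sketch; `missing_columns` is a hypothetical helper, and `sedona` is assumed to be an initialized Sedona context (a Spark session).

```python
def missing_columns(sedona, table, expected):
    """Return the names in `expected` that the table does not define."""
    # `table(...).columns` lists the table's top-level column names.
    available = set(sedona.table(table).columns)
    return [name for name in expected if name not in available]
```

An empty list means every name exists. Any name that comes back, such as region, needs a different approach, for example a spatial predicate against a wkls boundary.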
Unsupported functions
Some SQL functions that work in other databases may not be available in Apache Spark SQL.
- Incorrect
- Correct
Complete example: counting baseball stadiums in the U.S.
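A minimal sketch of such a query follows, combining context initialization, a wkls boundary, STRUCT-aware category filtering, ST_Intersects, and scalar extraction. The catalog prefix `wherobots_open_data`, the `geometry` column name, and both function names are assumptions for illustration rather than the guide's original code.

```python
# Assumed fully-qualified table name; the catalog prefix is an assumption.
PLACES_TABLE = "wherobots_open_data.overture_maps_foundation.places_place"


def build_stadium_count_query(us_wkt, table=PLACES_TABLE):
    """Build the SQL that counts baseball stadiums intersecting the given boundary."""
    # `geometry` is an assumed column name; verify it with DESCRIBE TABLE.
    return f"""
    SELECT COUNT(*) AS stadium_count
    FROM {table}
    WHERE (categories.primary = 'baseball_stadium'
           OR ARRAY_CONTAINS(categories.alternate, 'baseball_stadium'))
      AND ST_Intersects(geometry, ST_GeomFromWKT('{us_wkt}'))
    """


def count_baseball_stadiums():
    """Run the query in a Sedona session and return the count as a plain integer."""
    # Imported lazily so the query builder can be used without these packages.
    from sedona.spark import SedonaContext
    import wkls

    config = SedonaContext.builder().getOrCreate()
    sedona = SedonaContext.create(config)

    query = build_stadium_count_query(wkls.us.wkt())
    # collect() returns a list of rows; [0][0] is the first column of the first row.
    return sedona.sql(query).collect()[0][0]
```

Separating the query builder from the execution step makes it possible to print and inspect the SQL before running it against the full dataset.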
The complete example brings the above best practices together into a single, executable query.
Why this query works
Understand why this query is structured the way it is:
- Proper context initialization: Sets up Sedona for spatial operations.
- Accurate geometry: Uses wkls.us.wkt() for precise U.S. boundaries.
- Correct STRUCT handling: Accesses categories.primary and categories.alternate with dot notation.
- Spatial precision: Uses ST_Intersects instead of text-based filtering.
- Proper result extraction: Uses .collect()[0][0] to get the scalar count value.
Debugging steps for query issues
If you encounter errors or unexpected results, follow these debugging steps:
Verify table schema
Make sure you’re using the correct database and table names, and check the schema to understand column types.
Examine sample data
Return a small sample of records to confirm that the data contains the expected fields and values.
Test categories structure
Understand how the categories STRUCT is organized to ensure you’re accessing the correct fields:
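The three checks above can be sketched as a small set of queries. The helper names, the step labels, and the catalog prefix `wherobots_open_data` are assumptions for illustration; `sedona` is assumed to be an initialized Sedona context.

```python
# Assumed fully-qualified table name; the catalog prefix is an assumption.
PLACES_TABLE = "wherobots_open_data.overture_maps_foundation.places_place"


def debugging_queries(table=PLACES_TABLE, sample_size=5):
    """Return one diagnostic query per debugging step, keyed by step name."""
    limit = int(sample_size)  # force an integer so the LIMIT clause stays valid
    return {
        "schema": f"DESCRIBE TABLE {table}",
        "sample": f"SELECT * FROM {table} LIMIT {limit}",
        "categories": (
            "SELECT categories.primary, categories.alternate "
            f"FROM {table} WHERE categories IS NOT NULL LIMIT {limit}"
        ),
    }


def run_debugging_queries(sedona, table=PLACES_TABLE, sample_size=5):
    """Run each diagnostic query in order and print its untruncated result."""
    for step, query in debugging_queries(table, sample_size).items():
        print(f"-- {step}")
        sedona.sql(query).show(truncate=False)
```

Running the steps in this order narrows a problem quickly: a failure in the schema step points to a naming issue, while surprises in the sample or categories steps point to wrong assumptions about the data.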
Validate spatial operations
Confirm that your spatial predicates are working as expected by testing them with known geometries.
Best practices summary
Review the following best practices to ensure that your spatial queries run smoothly in Wherobots:
- Use the correct, fully-qualified database names (e.g., overture_maps_foundation).
- Access STRUCT fields with dot notation (e.g., categories.primary).
- Use ARRAY_CONTAINS to search within array fields.
- Use spatial predicates like ST_Intersects for geographic filtering.
- Always initialize the Sedona context before running spatial operations.
- Use .collect()[0][0] to extract a single scalar result.
- Leverage the wkls library for accurate geographic boundaries.
- Test queries incrementally to isolate issues.

