> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wherobots.com/llms.txt
> Use this file to discover all available pages before exploring further.

# How to Write Effective Spatial Queries

> Best practices and common pitfalls for writing spatial queries using Wherobots

This guide covers best practices for writing spatial SQL queries in Wherobots, including proper setup, common pitfalls, and debugging techniques.

The examples use [Overture Maps Foundation](https://overturemaps.org/) data available through the Wherobots Open Data catalog but the principles apply to any spatial queries you write in Wherobots.

## Key principles

The following principles will help ensure your spatial queries run successfully in Wherobots:

### Always initialize the Sedona context

Every spatial query session in Wherobots must begin with proper Sedona context initialization:

```python theme={"system"}
from sedona.spark import *
from pyspark.sql.functions import expr

# Initialize Sedona Context
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
```

### Understand table schema before querying

Always verify the structure of tables you're querying, especially for complex data types like `STRUCT` and `ARRAY` columns.

<Warning>
  Many common query errors stem from assumptions about table schema. Always check the actual column names and data types before writing queries.
</Warning>

### Use spatial predicates for geographic filtering

Leverage spatial functions like `ST_Intersects` for precise geographic filtering instead of text-based approaches.

### Use `wkls` for geographic boundaries

The [`wkls`](/develop/wkls) library provides accurate WKT representations for countries, states, and regions. Use it to define geographic boundaries in spatial queries instead of hard-coding coordinates:

```python theme={"system"}
import wkls
usa_wkt = wkls.us.wkt()
```

<Tip>
  `wkls` comes **pre-installed in WherobotsDB**, so you do not need to install or import it separately if you're using [Wherobots Cloud](https://cloud.wherobots.com).
</Tip>

### Validate category values when filtering

When filtering on criteria that have a set of predefined valid values, always verify the exact values used in the dataset.
This helps ensure accurate results and prevents zero-result queries caused by mismatched category names.

For example, to find all baseball stadiums in the Overture dataset, you might assume that the category is `ballpark` or `baseball_field`, but the correct category is actually `baseball_stadium`.

The following query returns all of the baseball-related categories in the `places_place` dataset to help you understand the correct values to filter on:

```python theme={"system"}
# Find all baseball-related categories
sedona.sql("""
SELECT DISTINCT categories.primary
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE categories.primary LIKE '%baseball%' OR categories.primary LIKE '%stadium%'
""").show()
```

## Common pitfalls and solutions

The following sections highlight common mistakes when writing spatial queries in Wherobots and how to avoid them.

### Incorrect database names

Using the wrong database or table name is a common source of errors.

While you might assume the database is `overture`, the correct database for the Overture Maps Foundation data is `overture_maps_foundation`:

<Tabs>
  <Tab title="Incorrect">
    ```sql theme={"system"}
    -- Wrong database name
    SELECT COUNT(*)
    FROM wherobots_open_data.overture.places_place
    WHERE categories LIKE '%ballpark%'
    ```
  </Tab>

  <Tab title="Correct">
    ```sql theme={"system"}
    -- Correct database name
    SELECT COUNT(*)
    FROM wherobots_open_data.overture_maps_foundation.places_place
    WHERE categories.primary = 'baseball_stadium'
    ```
  </Tab>
</Tabs>

### Mishandling STRUCT columns

The `categories` column in many Overture datasets is a `STRUCT` with `primary` (string) and `alternate` (array of strings) fields.

Attempting to filter on `categories` as if it were a simple string will lead to errors or zero results.

<Tabs>
  <Tab title="Incorrect">
    ```sql theme={"system"}
    -- Treating STRUCT as string
    WHERE categories LIKE '%ballpark%'
    ```
  </Tab>

  <Tab title="Correct">
    ```sql theme={"system"}
    -- Properly accessing STRUCT fields
    WHERE (
      categories.primary = 'baseball_stadium'
      OR ARRAY_CONTAINS(categories.alternate, 'baseball_stadium')
    )
    ```
  </Tab>
</Tabs>

### Using non-existent columns

Assuming column names without checking the schema can lead to errors:

You must make sure to use the correct column names as defined in the dataset schema. For example, there is no `region` column in the `places_place` dataset, so filtering on `region = 'US'` will not work.

Instead, [use the `wkls` library](#use-wkls-for-geographic-boundaries) to get the correct geometry for the U.S. and use spatial predicates to filter:

<Tabs>
  <Tab title="Incorrect">
    ```sql theme={"system"}
    -- Non-existent 'region' column
    WHERE region = 'US'
    ```
  </Tab>

  <Tab title="Correct">
    ```sql theme={"system"}
    -- Use spatial intersection with actual geometry
    WHERE ST_Intersects(ST_GeomFromWKT('{usa_wkt}'), geometry)
    ```
  </Tab>
</Tabs>

### Unsupported functions

Some SQL functions that work in other databases may not be available in Apache Spark SQL.

<Tabs>
  <Tab title="Incorrect">
    ```sql theme={"system"}
    -- UNNEST() is not supported in Spark SQL
    FROM UNNEST(addresses) AS addr
    WHERE addr.country = 'US'
    ```
  </Tab>

  <Tab title="Correct">
    ```sql theme={"system"}
    -- Use spatial predicates instead
    WHERE ST_Intersects(ST_GeomFromWKT('{usa_wkt}'), geometry)
    ```
  </Tab>
</Tabs>

## Complete example: counting baseball stadiums in the U.S.

The following example brings the above best practices together into a complete, executable query:

```python theme={"system"}
from sedona.spark import *
from pyspark.sql.functions import expr
import wkls

# Initialize Sedona Context
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Get accurate U.S. boundary geometry using the wkls library
usa_wkt = wkls.us.wkt()

# Execute the spatial query
result = sedona.sql(f'''
SELECT COUNT(*) as stadium_count
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE (
  categories.primary = 'baseball_stadium'
  OR ARRAY_CONTAINS(categories.alternate, 'baseball_stadium')
)
AND ST_Intersects(ST_GeomFromWKT('{usa_wkt}'), geometry)
''')

# Extract and print the result
count = result.collect()[0][0]
print(f"Number of baseball stadiums in the U.S.: {count}")
```

### Why this query works

Understand why this query is structured the way it is:

<AccordionGroup>
  <Accordion title="Proper context initialization">
    Sets up Sedona for spatial operations.
  </Accordion>

  <Accordion title="Accurate geometry">
    Uses `wkls.us.wkt()` for precise U.S. boundaries.
  </Accordion>

  <Accordion title="Correct STRUCT handling">
    Accesses `categories.primary` and `categories.alternate` with dot notation.
  </Accordion>

  <Accordion title="Spatial precision">
    Uses `ST_Intersects` instead of text-based filtering.
  </Accordion>

  <Accordion title="Proper result extraction">
    Uses `.collect()[0][0]` to get the scalar count value.
  </Accordion>
</AccordionGroup>

## Debugging steps for query issues

If you encounter errors or unexpected results, follow these debugging steps:

### Verify table schema

Make sure you're using the correct database and table names, and check the schema to understand column types:

```python theme={"system"}
# Check table structure
sedona.sql("DESCRIBE wherobots_open_data.overture_maps_foundation.places_place").show()
```

### Examine sample data

Return a small sample of records to confirm that the data contains the expected fields and values:

```python theme={"system"}
# Look at sample records to understand data structure
sedona.sql("""
SELECT categories, addresses
FROM wherobots_open_data.overture_maps_foundation.places_place
LIMIT 5
""").show(truncate=False)
```

### Test categories structure

Understand how the `categories` STRUCT is organized to ensure you're accessing the correct fields:

```python theme={"system"}
# Understand how categories are structured
sedona.sql("""
SELECT categories.primary, categories.alternate
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE categories.primary IS NOT NULL
LIMIT 10
""").show()
```

### Validate spatial operations

Confirm that your spatial predicates are working as expected by testing them with known geometries:

```python theme={"system"}
import wkls

# Test spatial intersection without filtering first
usa_wkt = wkls.us.wkt()
sedona.sql(f"""
SELECT COUNT(*) as total_places_in_us
FROM wherobots_open_data.overture_maps_foundation.places_place
WHERE ST_Intersects(ST_GeomFromWKT('{usa_wkt}'), geometry)
""").show()
```

## Best Practices Summary

Review the following best practices and common mistakes to ensure that your spatial queries run smoothly in Wherobots:

<Tabs>
  <Tab title="Do">
    The following best practices will help ensure your spatial queries run successfully in Wherobots:

    * Use the correct, fully-qualified database names (e.g., `overture_maps_foundation`).
    * Access `STRUCT` fields with dot notation (e.g., `categories.primary`).
    * Use `ARRAY_CONTAINS` to search within array fields.
    * Use spatial predicates like `ST_Intersects` for geographic filtering.
    * Always initialize the Sedona context before running spatial operations.
    * Use `.collect()[0][0]` to extract a single scalar result.
    * Leverage the `wkls` library for accurate geographic boundaries.
    * Test queries incrementally to isolate issues.
  </Tab>

  <Tab title="Don't">
    The following common mistakes can lead to query errors or incorrect results:

    * Assume column names without checking the table schema.
    * Treat `STRUCT` columns as plain strings (e.g., `WHERE categories LIKE ...`).
    * Use functions unsupported in Spark SQL, such as `UNNEST()`.
    * Rely on text-based geographic filtering instead of spatial predicates.
    * Forget to initialize the Sedona context.
    * Use incorrect argument order in spatial functions.
  </Tab>
</Tabs>
