Introducing the STAC Reader for Wherobots

This table highlights key features available for each Organization tier:

Feature availability is indicated by ✅ (available) or ❌ (not available).

Feature	Available in Professional and Enterprise Editions	Available in Community Edition
STAC Reader	✅	✅

We are excited to announce the new STAC Reader data source in Wherobots! SpatioTemporal Asset Catalog (STAC) is a specification that standardizes the way geospatial assets are described and cataloged. This new reader loads STAC items and collections directly into Sedona DataFrames within the Wherobots environment.
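For orientation, a STAC item is a GeoJSON Feature with a few additional fields. The sketch below builds a minimal item as a Python dictionary so the structure the reader consumes is visible at a glance. The id, coordinates, and asset URL are made-up placeholders, not entries from a real catalog.

```python
import json

# A minimal STAC item written as a plain Python dict.
# Every value here is an illustrative placeholder, not real catalog data.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-74, 40], [-73, 40], [-73, 41], [-74, 41], [-74, 40]]],
    },
    "bbox": [-74, 40, -73, 41],  # [west, south, east, north]
    "properties": {"datetime": "2023-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "visual": {
            "href": "https://example.com/example-item/visual.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}

# Pretty-print the item as JSON, the form in which catalogs serve it.
print(json.dumps(item, indent=2))
```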

Benefits

Direct STAC Integration: Easily access and load data from the growing ecosystem of STAC-compliant catalogs directly into your Wherobots workflows.
Unified Geospatial Analysis: Leverage Sedona’s powerful spatial and temporal data processing capabilities on data loaded from STAC sources.
Performance Optimization: The reader supports spatial and temporal filter pushdown for spatial predicates (e.g., st_contains, st_intersects) and temporal predicates (e.g., datetime BETWEEN) in your queries.
- Filters are pushed down to the STAC API where the API supports them.
- Because filtering happens at the source, less data is transferred and processed, which speeds up query execution.
Flexible Data Access: Connect to STAC collections through an HTTP/HTTPS endpoint, an S3-compatible object store, or a local JSON file.
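To make the pushdown idea concrete, the sketch below builds the kind of request body that a STAC API item search accepts (collections, bbox, datetime, limit, as defined by the STAC API specification). It only illustrates where a spatial or temporal filter ends up when the API evaluates it. How the reader translates a query internally is not described here, and build_search_body is a hypothetical helper, not part of Sedona or Wherobots.

```python
import json


def build_search_body(collection, bbox, start, end, limit):
    """Return a STAC API item-search body (hypothetical helper, for illustration only)."""
    return {
        "collections": [collection],
        "bbox": list(bbox),              # [west, south, east, north]
        "datetime": f"{start}/{end}",    # closed interval, RFC 3339 timestamps
        "limit": limit,                  # page size requested from the API
    }


# Example values are assumptions chosen to mirror the queries in this page.
body = build_search_body(
    "sentinel-2-l2a",
    [-74, 40, -73, 41],
    "2022-01-01T00:00:00Z",
    "2022-12-31T23:59:59Z",
    50,
)
print(json.dumps(body, indent=2))
```

When the API applies these constraints itself, only matching items cross the network, which is the effect the pushdown bullets describe.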

Key Considerations

Configuration Options: You can fine-tune the reader’s behavior using Sedona configuration properties (e.g., spark.sedona.stac.load.maxPartitionItemFiles, spark.sedona.stac.load.numPartitions) and reader options (e.g., itemsLimitMax, itemsLoadProcessReportThreshold, itemsLimitPerRequest).
- These configurations allow you to control partitioning, limit the number of items loaded, manage API request sizes, and monitor loading progress.
Resulting Schema: The STAC reader returns a Sedona DataFrame with a predefined schema that reflects STAC item properties, including metadata, geometry, timestamps, links, and assets.
API Limits: When accessing public or private STAC APIs, be mindful of potential rate limits or query constraints imposed by the API provider.
- The itemsLimitPerRequest option can help manage this.
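As a rough way to reason about API load, the sketch below estimates an upper bound on the number of requests needed for a given pair of limits. It assumes that each request returns at most itemsLimitPerRequest items and that loading stops once itemsLimitMax items are collected. That reading of the two options is an assumption, and max_requests is a hypothetical helper.

```python
import math


def max_requests(items_limit_max, items_limit_per_request):
    """Upper bound on paged requests, assuming one page per request (illustrative)."""
    if items_limit_max < 0 or items_limit_per_request <= 0:
        raise ValueError("limits must be non-negative and the page size positive")
    return math.ceil(items_limit_max / items_limit_per_request)


# With a cap of 1000 items and 50 items per request, at most 20 requests are needed.
print(max_requests(1000, 50))
```

Comparing this bound with a provider's published rate limit gives a quick sanity check before running a large load.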

Get Started

There are two main approaches for accessing and working with STAC data within the Wherobots environment.

Use the Spark data source reader:
1. Load the STAC data.
2. Apply optional configuration.
3. Query the loaded data.
Use the dedicated Python STAC Client.

1. Load STAC Data into a DataFrame

Use the stac format with sedona.read.

From an HTTP/HTTPS Endpoint

df = sedona.read.format("stac").load("https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a")
df.printSchema()
df.show()

From an S3 File

df = sedona.read.format("stac").load("s3a://example.com/stac_bucket/stac_collection.json")
df.printSchema()
df.show()

From a Local File

df = sedona.read.format("stac").load("/path/to/your/stac_collection.json")
df.printSchema()
df.show()

2. Apply Configuration Options

# Example applying reader options
df = sedona.read \
    .format("stac") \
    .option("itemsLimitMax", "1000") \
    .option("itemsLimitPerRequest", "50") \
    .load("https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a")
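When the same options are reused across several loads, it can help to keep them in a dictionary and apply them in a loop. The sketch below is one way to do that. stringify_options and load_stac are hypothetical helpers, and sedona is assumed to be an already created Sedona session, so load_stac is defined here but not called.

```python
def stringify_options(options):
    """Convert option values to strings, matching the string values used above."""
    return {key: str(value) for key, value in options.items()}


def load_stac(sedona, url, options):
    """Load a STAC collection with the given reader options applied.

    `sedona` is assumed to be an existing Sedona session; this function
    is a hypothetical convenience wrapper, not part of the reader itself.
    """
    reader = sedona.read.format("stac")
    for key, value in stringify_options(options).items():
        reader = reader.option(key, value)
    return reader.load(url)


# Option names come from this page; the values are example assumptions.
reader_options = stringify_options({"itemsLimitMax": 1000, "itemsLimitPerRequest": 50})
print(reader_options)
```

In a notebook with a live session, calling load_stac(sedona, url, {"itemsLimitMax": 1000, "itemsLimitPerRequest": 50}) would be equivalent to the chained form shown above.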

3. Query Loaded Data (Leveraging Filter Pushdown)

The following assumes the data is loaded into a table named STAC_TABLE.

Spatial Filter

SELECT id, geometry
FROM STAC_TABLE
WHERE st_contains(ST_GeomFromText('POLYGON((...))'), geometry)

Temporal Filter

SELECT id, datetime as dt, geometry
FROM STAC_TABLE
WHERE datetime BETWEEN '2022-01-01T00:00:00Z' AND '2022-12-31T23:59:59Z'
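The queries above need a table named STAC_TABLE and a polygon in WKT form. The sketch below shows one way to get both from Python: it registers the DataFrame as a temporary view and builds the polygon from a bounding box. bbox_to_wkt and query_within are hypothetical helpers, the bounding box is an example value, and sedona and df are assumed to exist already, so query_within is defined but not called.

```python
def bbox_to_wkt(min_x, min_y, max_x, max_y):
    """Return a closed WKT polygon ring for a bounding box (illustrative helper)."""
    return (
        f"POLYGON(({min_x} {min_y}, {max_x} {min_y}, "
        f"{max_x} {max_y}, {min_x} {max_y}, {min_x} {min_y}))"
    )


def query_within(sedona, df, wkt, view_name="STAC_TABLE"):
    """Register `df` as a temp view and select items contained in the polygon.

    `sedona` and `df` are assumed to be an existing session and a DataFrame
    loaded with the STAC reader.
    """
    df.createOrReplaceTempView(view_name)
    return sedona.sql(
        f"SELECT id, geometry FROM {view_name} "
        f"WHERE st_contains(ST_GeomFromText('{wkt}'), geometry)"
    )


# Example bounding box (an assumption): roughly the New York area.
wkt = bbox_to_wkt(-74, 40, -73, 41)
print(wkt)
```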

Use the Python STAC Client

You can also interact directly with STAC APIs using the provided Python client.

Python STAC Client Example

from sedona.spark import *
import datetime

# Initialize the client
client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search for items within a date range and get a DataFrame
df_items = client.search(
    collection_id="sentinel-2-l2a",
    datetime=["2023-01-01T00:00:00Z", "2023-01-05T00:00:00Z"],
    bbox=[-74, 40, -73, 41], # Example bbox for New York area
    return_dataframe=True,
    max_items=10
)

df_items.show()

# Save results to GeoParquet
client.get_collection("sentinel-2-l2a").save_to_geoparquet(
    output_path="/path/to/output/s2_data",
    bbox=[-74, 40, -73, 41],
    datetime="2023-01"
)
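If the date range comes from Python datetime objects rather than literal strings, a small helper can produce the two-element list form used in the search call above. This is a minimal sketch: to_stac_interval is a hypothetical helper, and the only assumption about the client is the string format already shown in the example.

```python
from datetime import datetime, timezone


def to_stac_interval(start, end):
    """Format two timezone-aware datetimes as a [start, end] pair of UTC strings."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("datetimes must be timezone-aware")
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return [
        start.astimezone(timezone.utc).strftime(fmt),
        end.astimezone(timezone.utc).strftime(fmt),
    ]


# Reproduces the interval used in the search example above.
interval = to_stac_interval(
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 5, tzinfo=timezone.utc),
)
print(interval)
```

The resulting list can then be passed as the datetime argument in place of the hand-written strings.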

Read the Documentation

For comprehensive details on the STAC Reader, configuration options, the Python Client API, and examples, refer to the official Wherobots STAC Reader Documentation.

Additional Resources

STAC Specification: https://stacspec.org/
STAC Browser (Example Tool): https://github.com/radiantearth/stac-browser
