
STAC Reader

This table highlights key features available for each Organization tier [1]:

Feature | Available in Professional and Enterprise Editions | Available in Community Edition
STAC Reader

We are excited to announce the new STAC Reader data source in Wherobots!

SpatioTemporal Asset Catalog (STAC) is a specification that standardizes the way geospatial assets are described and cataloged.
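To make "describing an asset" concrete, here is a minimal sketch of a single STAC Item written as a Python dict. The id, coordinates, and asset URL are made-up placeholders rather than entries from a real catalog; the field names follow the STAC Item specification.

```python
import json

# A hand-written, minimal STAC Item. The id, coordinates, and asset URL are
# made-up placeholders, not a real catalog entry.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-74.0, 40.0], [-73.0, 40.0], [-73.0, 41.0],
                         [-74.0, 41.0], [-74.0, 40.0]]],
    },
    "bbox": [-74.0, 40.0, -73.0, 41.0],  # [min lon, min lat, max lon, max lat]
    "properties": {"datetime": "2023-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "visual": {
            "href": "https://example.com/example-item-001.tif",
            "type": "image/tiff; application=geotiff",
        },
    },
}

# Top-level fields the STAC Item specification requires
# (bbox is required whenever geometry is not null).
REQUIRED_FIELDS = ["type", "stac_version", "id", "geometry", "bbox",
                   "properties", "links", "assets"]
missing = [name for name in REQUIRED_FIELDS if name not in item]

print(json.dumps(item, indent=2))
print("missing fields:", missing)
```

A STAC collection groups many such items under shared metadata.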

This new reader loads STAC items and collections directly into Sedona DataFrames within the Wherobots environment.

Benefits

  • Direct STAC Integration: Easily access and load data from the growing ecosystem of STAC-compliant catalogs directly into your Wherobots workflows.
  • Unified Geospatial Analysis: Leverage Sedona's powerful spatial and temporal data processing capabilities on data loaded from STAC sources.
  • Performance Optimization: The reader supports spatial and temporal filter pushdown for spatial predicates (e.g., st_contains, st_intersects) and temporal predicates (e.g., datetime BETWEEN) in your queries.
    • Where the STAC API supports them, these filters are pushed down to the API instead of being applied only after the data is loaded.
    • This reduces the amount of data that needs to be transferred and processed, which can lead to faster query execution.
  • Flexible Data Access: Connect to STAC collections through an HTTP/HTTPS endpoint, an S3-compatible object store, or a local JSON file.
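The pushdown idea described under Performance Optimization can be sketched as follows: the filter travels inside the API request, so the server returns fewer items. The helper below is hypothetical and is not the reader's internal implementation; it only shows how a bounding box and a time range map onto the `bbox`, `datetime`, `collections`, and `limit` parameters defined by the STAC API item search.

```python
from datetime import datetime, timezone


def build_search_params(collection, bbox=None, start=None, end=None, limit=100):
    """Build STAC API item-search parameters from simple filter values.

    Hypothetical helper for illustration only; not part of Sedona or Wherobots.
    Expects timezone-aware datetimes for start and end.
    """

    def fmt(dt):
        # RFC 3339 timestamp in UTC, e.g. 2022-01-01T00:00:00Z
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    params = {"collections": [collection], "limit": limit}
    if bbox is not None:
        # STAC API bbox order: [min lon, min lat, max lon, max lat]
        params["bbox"] = list(bbox)
    if start is not None or end is not None:
        # Intervals are written "start/end", with ".." marking an open side.
        left = fmt(start) if start is not None else ".."
        right = fmt(end) if end is not None else ".."
        params["datetime"] = f"{left}/{right}"
    return params


params = build_search_params(
    "sentinel-2-l2a",
    bbox=(-74, 40, -73, 41),
    start=datetime(2022, 1, 1, tzinfo=timezone.utc),
    end=datetime(2022, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
    limit=50,
)
print(params)
```

Without pushdown, the same filters would have to be evaluated after every item in the collection had already been downloaded.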

Key Considerations

  • Configuration Options: You can fine-tune the reader's behavior using Sedona configuration properties (e.g., spark.sedona.stac.load.maxPartitionItemFiles, spark.sedona.stac.load.numPartitions) and reader options (e.g., itemsLimitMax, itemsLoadProcessReportThreshold, itemsLimitPerRequest).
    • These configurations allow you to control partitioning, limit the number of items loaded, manage API request sizes, and monitor loading progress.
  • Resulting Schema: Data loaded via the STAC reader results in a Sedona DataFrame with a predefined schema reflecting STAC item properties, including metadata, geometry, timestamps, links, and assets.
  • API Limits: When accessing public or private STAC APIs, be mindful of potential rate limits or query constraints imposed by the API provider.
    • The itemsLimitPerRequest option can help manage this.
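As a rough illustration of how the two item limits above can interact, the simulation below assumes that itemsLimitMax caps the total number of items loaded and that itemsLimitPerRequest sets the page size of each API request. Those semantics are an assumption drawn from the option names, and `fetch_page` is a stand-in for a real HTTP call, so treat this as a sketch rather than the reader's actual behavior.

```python
def fetch_page(offset, limit, catalog_size=1234):
    """Stand-in for one STAC API request; returns fake item ids.

    Hypothetical: a real API is called over HTTP and usually paged via "next" links.
    """
    end = min(offset + limit, catalog_size)
    return [f"item-{i}" for i in range(offset, end)]


def load_items(items_limit_max, items_limit_per_request):
    """Collect up to items_limit_max items, items_limit_per_request at a time."""
    items, requests_made = [], 0
    while len(items) < items_limit_max:
        # Never ask for more than is still needed to reach the overall cap.
        page_size = min(items_limit_per_request, items_limit_max - len(items))
        page = fetch_page(len(items), page_size)
        requests_made += 1
        if not page:  # catalog exhausted before the cap was reached
            break
        items.extend(page)
    return items, requests_made


items, requests_made = load_items(items_limit_max=1000, items_limit_per_request=50)
print(len(items), requests_made)  # 1000 20
```

Under these assumptions, a smaller per-request limit means more but lighter requests, which is the lever to reach for when a provider enforces request-size or rate limits.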

Get Started

There are two main approaches for accessing and working with STAC data within the Wherobots environment.

  1. Use the Spark data source reader:
    1. Load the STAC data
    2. Apply optional configuration
    3. Query the loaded data.
  2. Use the dedicated Python STAC Client.

1. Load STAC Data into a DataFrame

Use the stac format with sedona.read.

From an HTTP/HTTPS endpoint:

df = sedona.read.format("stac").load("https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a")
df.printSchema()
df.show()

From an S3-compatible object store:

df = sedona.read.format("stac").load("s3a://example.com/stac_bucket/stac_collection.json")
df.printSchema()
df.show()

From a local JSON file:

df = sedona.read.format("stac").load("/path/to/your/stac_collection.json")
df.printSchema()
df.show()
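To give a feel for what a loaded row might look like, here is a small sketch that flattens one STAC Item dict into a column-like dict. It assumes that fields under `properties`, such as `datetime`, surface as top-level columns, which is consistent with the queries later in this section filtering on `datetime` directly. The column list and the `item_to_row` helper are illustrative only; `df.printSchema()` remains the authoritative source for the real schema.

```python
def item_to_row(item):
    """Flatten a STAC Item dict into a flat, column-like dict (illustrative only)."""
    props = item.get("properties", {})
    return {
        "id": item.get("id"),
        "collection": item.get("collection"),
        "geometry": item.get("geometry"),
        "bbox": item.get("bbox"),
        # Promoted from "properties" so it can be filtered on directly.
        "datetime": props.get("datetime"),
        "links": item.get("links", []),
        "assets": item.get("assets", {}),
    }


# Placeholder item; the id, collection, and asset URL are made up.
sample_item = {
    "type": "Feature",
    "id": "example-item-001",
    "collection": "example-collection",
    "geometry": {"type": "Point", "coordinates": [-73.5, 40.5]},
    "bbox": [-73.5, 40.5, -73.5, 40.5],
    "properties": {"datetime": "2023-01-01T00:00:00Z"},
    "links": [],
    "assets": {"visual": {"href": "https://example.com/example-item-001.tif"}},
}

row = item_to_row(sample_item)
print(sorted(row))
```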

2. Apply Configuration Options

# Example applying reader options
df = sedona.read \
    .format("stac") \
    .option("itemsLimitMax", "1000") \
    .option("itemsLimitPerRequest", "50") \
    .load("https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a")

3. Query Loaded Data (Leveraging Filter Pushdown)

The following assumes the data is available as a table named STAC_TABLE, for example after registering the DataFrame with df.createOrReplaceTempView("STAC_TABLE").

Spatial filter:

SELECT id, geometry
FROM STAC_TABLE
WHERE st_contains(ST_GeomFromText('POLYGON((...))'), geometry)

Temporal filter:

SELECT id, datetime as dt, geometry
FROM STAC_TABLE
WHERE datetime BETWEEN '2022-01-01T00:00:00Z' AND '2022-12-31T23:59:59Z'
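The polygon in the spatial query is elided, so here is one way you might build such a query string from a bounding box. The `bbox_to_wkt` helper is hypothetical, and the bounding box is the New York area example used elsewhere on this page. Combining both predicates in a single WHERE clause is shown only to illustrate the SQL; whether the combined filter is pushed down depends on the reader and on the API.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon for a lon/lat bounding box."""
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first point to close the ring
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


wkt = bbox_to_wkt(-74, 40, -73, 41)

query = f"""
SELECT id, datetime AS dt, geometry
FROM STAC_TABLE
WHERE st_contains(ST_GeomFromText('{wkt}'), geometry)
  AND datetime BETWEEN '2022-01-01T00:00:00Z' AND '2022-12-31T23:59:59Z'
"""
print(query)
```

Assuming `sedona` is the same session object used to load the data and STAC_TABLE has been registered, the string could then be passed to `sedona.sql(query)`.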

Use the Python STAC Client

You can also interact directly with STAC APIs using the provided Python client.

Python STAC Client Example
from sedona.stac.client import Client
import datetime

# Initialize the client
client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search for items within a date range and get a DataFrame
df_items = client.search(
    collection_id="sentinel-2-l2a",
    datetime=["2023-01-01T00:00:00Z", "2023-01-05T00:00:00Z"],
    bbox=[-74, 40, -73, 41], # Example bbox for New York area
    return_dataframe=True,
    max_items=10
)

df_items.show()

# Save results to GeoParquet
client.get_collection("sentinel-2-l2a").save_to_geoparquet(
    output_path="/path/to/output/s2_data",
    bbox=[-74, 40, -73, 41],
    datetime="2023-01"
)
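The search above passes its date range as hard-coded strings. If you would rather compute the range, a small helper like the one below can produce timestamps in the same format from `datetime` objects. `to_stac_datetime` is a hypothetical name and is not part of the client.

```python
from datetime import datetime, timedelta, timezone


def to_stac_datetime(dt):
    """Format an aware datetime as an RFC 3339 UTC timestamp."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=4)

# Same two-element form as the datetime argument in the search example.
datetime_range = [to_stac_datetime(start), to_stac_datetime(end)]
print(datetime_range)  # ['2023-01-01T00:00:00Z', '2023-01-05T00:00:00Z']
```

Requiring timezone-aware inputs avoids silently treating a naive local time as UTC.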

Read the Documentation

For comprehensive details on the STAC Reader, configuration options, the Python Client API, and examples, refer to the official Wherobots STAC Reader Documentation.



  1. If a feature is available in a given Edition, this will be indicated by ✅. If a feature is not available in a given Edition, this will be indicated by ❌.