STAC Reader
This table highlights key features available for each Organization tier¹:
Feature | Available in Professional and Enterprise Editions | Available in Community Edition |
---|---|---|
STAC Reader | ✅ | ✅ |
We are excited to announce the new STAC Reader data source in Wherobots!
SpatioTemporal Asset Catalog (STAC) is a specification that standardizes the way geospatial assets are described and cataloged.
This new reader loads STAC items and collections directly into Sedona DataFrames within the Wherobots environment.
Benefits
- Direct STAC Integration: Easily access and load data from the growing ecosystem of STAC-compliant catalogs directly into your Wherobots workflows.
- Unified Geospatial Analysis: Leverage Sedona's powerful spatial and temporal data processing capabilities on data loaded from STAC sources.
- Performance Optimization: The reader features Spatial and Temporal Filter Pushdown, allowing you to apply spatial (e.g., `st_contains`, `st_intersects`) or temporal (e.g., `datetime BETWEEN`) filters in your queries.
  - These filters are pushed down to the STAC API level where they're supported by the API.
  - This significantly reduces the amount of data that needs to be transferred and processed, leading to faster query execution.
- Flexible Data Access: Connect to STAC collections through an HTTP/HTTPS endpoint, an S3-compatible object store, or a local JSON file.
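To make the pushdown benefit above more concrete, here is a small sketch of the kind of request a pushed-down filter could correspond to. It only illustrates the `bbox`, `datetime`, and `limit` parameters from the STAC API Item Search specification. How the Wherobots reader builds its own requests is not described on this page, and the helper name `stac_search_query` is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def stac_search_query(bbox, start, end, limit=100):
    """Build a query string for a STAC API item-search request.

    Hypothetical helper for illustration only; it is not part of the
    Wherobots or Sedona API.
    """
    def fmt(ts):
        # RFC 3339 timestamp in UTC, e.g. 2020-01-01T00:00:00Z
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    params = {
        "bbox": ",".join(str(v) for v in bbox),   # west,south,east,north
        "datetime": f"{fmt(start)}/{fmt(end)}",   # start/end interval
        "limit": limit,                           # page size per request
    }
    # Keep commas, slashes, and colons readable in the query string.
    return urlencode(params, safe=",/:")


query = stac_search_query(
    bbox=(17.0, 10.0, 18.0, 11.0),
    start=datetime(2020, 1, 1, tzinfo=timezone.utc),
    end=datetime(2020, 12, 31, tzinfo=timezone.utc),
)
print(query)
```

When the server can evaluate these parameters, only matching items come back, which is where the reduction in transferred data comes from.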
Key Considerations
- Configuration Options: You can fine-tune the reader's behavior using Sedona configuration properties (e.g., `spark.sedona.stac.load.maxPartitionItemFiles`, `spark.sedona.stac.load.numPartitions`) and reader options (e.g., `itemsLimitMax`, `itemsLoadProcessReportThreshold`, `itemsLimitPerRequest`).
  - These configurations allow you to control partitioning, limit the number of items loaded, manage API request sizes, and monitor loading progress.
- Resulting Schema: Data loaded via the STAC reader results in a Sedona DataFrame with a predefined schema reflecting STAC item properties, including metadata, geometry, timestamps, links, and assets.
- API Limits: When accessing public or private STAC APIs, be mindful of potential rate limits or query constraints imposed by the API provider.
  - The `itemsLimitPerRequest` option can help manage this.
Get Started
There are two main approaches for accessing and working with STAC data within the Wherobots environment.
- Use the Spark data source reader:
  - Load the STAC data.
  - Apply optional configuration.
  - Query the loaded data.
- Use the dedicated Python STAC Client.
1. Load STAC Data into a DataFrame
Use the `stac` format with `sedona.read`.
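A minimal sketch of the loading step, assuming a Sedona session object is already available as `sedona`. The `format("stac")` call follows the sentence above; the helper name `load_stac` and every URL and path in `EXAMPLE_SOURCES` are hypothetical placeholders, one per source type this page lists.

```python
def load_stac(sedona, source):
    """Load a STAC collection into a Sedona DataFrame.

    `sedona` is an existing Sedona/Spark session. `source` may be an
    HTTP(S) endpoint, an S3-compatible object store path, or a local
    JSON file.
    """
    return sedona.read.format("stac").load(source)


# Hypothetical locations, one per supported source type.
EXAMPLE_SOURCES = {
    "http": "https://example.com/stac/collections/my-collection",
    "s3": "s3://my-bucket/stac/collection.json",
    "local": "/tmp/stac/collection.json",
}

# Usage, assuming `sedona` is already defined in your notebook:
# df = load_stac(sedona, EXAMPLE_SOURCES["http"])
# df.printSchema()
```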
2. Apply Configuration Options
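The following sketch shows one way to apply the options named under Key Considerations. The option and property names come from this page, while the values are placeholders rather than recommendations. Setting the `spark.sedona.stac.*` properties on a running session through `conf.set` is an assumption; they may need to be supplied when the session is created instead.

```python
# Reader options named in this guide; placeholder values only.
READER_OPTIONS = {
    "itemsLimitMax": "100",
    "itemsLoadProcessReportThreshold": "1000",
    "itemsLimitPerRequest": "50",
}

# Sedona configuration properties named in this guide; placeholder values.
SESSION_CONF = {
    "spark.sedona.stac.load.maxPartitionItemFiles": "20",
    "spark.sedona.stac.load.numPartitions": "8",
}


def load_stac_with_options(sedona, source, options=None, conf=None):
    """Apply session properties and reader options, then load `source`."""
    options = READER_OPTIONS if options is None else options
    conf = SESSION_CONF if conf is None else conf

    # Assumption: these properties can be changed on a live session.
    for key, value in conf.items():
        sedona.conf.set(key, value)

    reader = sedona.read.format("stac")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(source)


# Usage, assuming `sedona` is already defined:
# df = load_stac_with_options(sedona, "https://example.com/stac/collections/my-collection")
```

Lowering `itemsLimitPerRequest` is the setting to reach for first when an API provider enforces small page sizes or rate limits.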
3. Query Loaded Data (Leveraging Filter Pushdown)
The following assumes the data is loaded into a table named `STAC_TABLE`.
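As a sketch of this step, the helpers below build one temporal and one spatial query against `STAC_TABLE`. The column names `id` and `geometry` are assumptions based on standard STAC item fields, so check the loaded DataFrame's schema first. The helper names, date range, and polygon are illustrative.

```python
def temporal_query(table, start, end):
    """SQL with a `datetime BETWEEN` filter, a candidate for pushdown."""
    return (
        f"SELECT id, datetime, geometry FROM {table} "
        f"WHERE datetime BETWEEN '{start}' AND '{end}'"
    )


def spatial_query(table, wkt):
    """SQL with an `st_intersects` filter, a candidate for pushdown."""
    return (
        f"SELECT id, geometry FROM {table} "
        f"WHERE st_intersects(ST_GeomFromText('{wkt}'), geometry)"
    )


# Illustrative area of interest as WKT.
AREA = "POLYGON((17 10, 18 10, 18 11, 17 11, 17 10))"

print(temporal_query("STAC_TABLE", "2020-01-01", "2020-12-31"))
print(spatial_query("STAC_TABLE", AREA))

# Usage, assuming `sedona` and a loaded DataFrame `df` already exist:
# df.createOrReplaceTempView("STAC_TABLE")
# sedona.sql(temporal_query("STAC_TABLE", "2020-01-01", "2020-12-31")).show()
# sedona.sql(spatial_query("STAC_TABLE", AREA)).show()
```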
Use the Python STAC Client
You can also interact directly with STAC APIs using the provided Python client.
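A hedged sketch of a client-based search follows. The `search` keyword arguments (`collection_id`, `bbox`, `datetime`, `return_dataframe`) are modeled on the Apache Sedona STAC client and are assumptions here, as are the import path and endpoint in the comments. Confirm all of them against the Wherobots documentation before relying on them. The wrapper name `search_items` is hypothetical.

```python
def search_items(client, collection_id, bbox=None, datetime=None, as_dataframe=True):
    """Search one collection through an already opened STAC client.

    Assumption: `client.search` accepts these keyword arguments, as the
    Apache Sedona STAC client does.
    """
    return client.search(
        collection_id=collection_id,
        bbox=bbox,                      # [west, south, east, north]
        datetime=datetime,              # e.g. "2020" or an interval
        return_dataframe=as_dataframe,  # False to get an item iterator
    )


# Usage (import path and endpoint are assumptions, not confirmed here):
# from sedona.stac.client import Client
# client = Client.open("https://example.com/stac/v1")
# df = search_items(client, "my-collection", bbox=[-10.0, 40.0, 5.0, 50.0], datetime="2020")
# df.show()
```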
Read the Documentation
For comprehensive details on the STAC Reader, configuration options, the Python Client API, and examples, refer to the official Wherobots STAC Reader Documentation.
Additional Resources
- STAC Specification: https://stacspec.org/
- STAC Browser (Example Tool): https://github.com/radiantearth/stac-browser
1. If a feature is available in a given Edition, this will be indicated by ✅. If a feature is not available in a given Edition, this will be indicated by ❌.