
- GeoParquet: Open source format that is optimized for modern, very large geospatial workflows.
- Shapefile: Legacy format for geospatial data.
- GeoJSON: Lightweight and human-readable.
- CSV: Tabular data that can contain geometries serialized in a WKT (well-known text) column or point coordinates as multiple columns.
- Cloud-Optimized GeoTIFF (COG): Designed for efficient cloud storage and access.
- NetCDF: Often used for multidimensional climate data.
Connect to data stored in Amazon S3
Most geospatial datasets are too large to store locally, so we use Amazon S3 to manage and access spatial data. Wherobots queries run on cloud-based data and support out-of-database (“Out-DB”) rasters, meaning Wherobots reads only the parts of a raster needed to process a query. Let’s test whether we can list files in an S3 bucket. We will verify our connection to Wherobots’s public S3 bucket for the data in this tutorial and confirm that we can access spatial datasets stored in the cloud.
Loading vector data
The next few cells show examples of how to load:
- GeoParquet from an S3 bucket
- GeoJSON from the notebook’s local file storage
- A CSV file with latitude and longitude stored in two columns

Key calls used in those cells:
- format("geoparquet") → Specifies that we are reading a GeoParquet file.
- load("s3a://...") → Loads the dataset directly from S3 without downloading it locally.
- option("header", "true") → Reads the first line as column names.
- ST_MakePoint() → Converts decimal coordinates from columns into a geometry object.
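
The GeoParquet and CSV loading steps described above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's own cells: it assumes `sedona` is an already-configured Sedona/Wherobots session object, and the helper names, the default column names `longitude` and `latitude`, and any paths you pass in are hypothetical.

```python
def load_geoparquet(sedona, path):
    """Read a GeoParquet dataset directly from object storage.

    `sedona` is assumed to be an existing session; `path` is any
    s3a:// URI you have access to (hypothetical here).
    """
    return sedona.read.format("geoparquet").load(path)


def load_csv_points(sedona, path, lon_col="longitude", lat_col="latitude"):
    """Read a CSV with a header row and add a point geometry column.

    The column names are assumptions; adjust them to match your file.
    """
    df = sedona.read.format("csv").option("header", "true").load(path)
    # CSV values are read as strings, so cast them before building the point.
    # ST_MakePoint takes x (longitude) first, then y (latitude).
    point_expr = (
        f"ST_MakePoint(CAST({lon_col} AS DOUBLE), "
        f"CAST({lat_col} AS DOUBLE)) AS geometry"
    )
    return df.selectExpr("*", point_expr)
```

Both helpers only build a DataFrame. Nothing is read from S3 until an action such as `show()` or `count()` runs, so a wrong path or column name surfaces at that point rather than at load time.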
Loading raster data
Raster data represents continuous spatial information such as pixels in satellite imagery, heights in elevation models, or temperatures in climate or weather data. These values are stored as a grid and come in a variety of formats.

| Format | Description |
|---|---|
| GeoTIFF | A widely used raster format for geospatial imagery |
| Cloud-Optimized GeoTIFF (COG) | A version of GeoTIFF optimized for fast cloud access |
| NetCDF | Commonly used for scientific climate and weather data |
| JPEG2000 | A compressed raster format with high quality |
| HDF (Hierarchical Data Format) | Used for large datasets in Earth science |
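
Cloud-Optimized GeoTIFFs (COGs) are well suited to cloud-based raster processing because they offer:
- Faster access in cloud storage by reading only necessary parts of the file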
- Good parallel processing for large-scale data environments
- Broad compatibility with GIS tools, including Wherobots
Tips for using raster data
Optimizing with tiling: Breaking large raster files into tiles can improve query performance. RS_TileExplode and RS_Tile are two Wherobots functions that create tiles as database records or as arrays, respectively.
Docs: Raster functions
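
As a rough illustration of the tiling tip, the helper below builds a SQL query that calls RS_TileExplode on a raster column. It is a sketch under stated assumptions: the three-argument form `RS_TileExplode(raster, tileWidth, tileHeight)` follows the Apache Sedona raster documentation and should be checked against the raster functions docs for your Wherobots version, and the helper name, the view name, and the column name `rast` are hypothetical.

```python
def tile_explode_sql(view, raster_col="rast", tile_width=256, tile_height=256):
    """Build a query that splits each raster into fixed-size tiles.

    Assumes `view` is a registered temporary view with a raster column
    named `raster_col` (both hypothetical names).
    """
    if tile_width <= 0 or tile_height <= 0:
        raise ValueError("tile dimensions must be positive")
    # RS_TileExplode is expected to emit one output row per tile.
    return (
        f"SELECT RS_TileExplode({raster_col}, {tile_width}, {tile_height}) "
        f"FROM {view}"
    )


# Usage with a live session (assumed to be named `sedona`):
# tiles_df = sedona.sql(tile_explode_sql("rasters", tile_width=512, tile_height=512))
```

Smaller tiles give finer-grained parallelism but produce more rows, so the tile size is worth tuning against the queries you actually run.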


