Whether you’re new to Wherobots or to geospatial data, this glossary provides clear definitions for the terms you’ll encounter throughout these docs.

Wherobots Platform

Core terminology for the Wherobots platform, its products, and key concepts.

Products and Engines

The following terms describe Wherobots’ core products and engines that power its geospatial analytics capabilities. Each product is designed to address specific aspects of spatial data management, analysis, and AI-driven insights at scale.
The fully managed cloud platform for running geospatial workloads. Wherobots Cloud provides a serverless environment with notebooks, job scheduling, storage integrations, and a web console for managing your organization, runtimes, and data.
Get started with Wherobots Cloud
A Model Context Protocol server that lets you interact with Wherobots through natural language in AI-powered IDEs. Use it to explore spatial data catalogs and generate Spatial SQL queries conversationally.
Set up the MCP Server
A cloud-native, serverless analytics engine optimized for geospatial workloads. WherobotsDB supports Spatial SQL, Python, and Scala, and is compatible with Apache Sedona. WherobotsDB provides 300+ spatial functions for both vector and raster data and delivers up to 20x faster performance than open-source Apache Sedona.
Introduction to WherobotsDB
A managed inference engine for large-scale raster processing. RasterFlow provides a high-level API to build mosaics from multiple raster data sources, run computer vision model inference, and vectorize results, all without managing distributed infrastructure.
Get started with RasterFlow
Wherobots’ spatial table format built on Apache Iceberg. Havasu adds native spatial indexing, optimized spatial queries, and geometry-aware storage to Iceberg tables, enabling efficient spatial data lake management.
Havasu reference
A suite of AI and machine learning tools for geospatial data analysis. WherobotsAI includes:
  • Raster Inference — Run computer vision models (classification, segmentation, object detection) on satellite and aerial imagery at planetary scale.
  • GeoStats — Distributed machine learning algorithms (DBSCAN, Getis-Ord Gi*, Local Outlier Factor) for detecting spatial patterns in vector data.
  • Map Matching — Align GPS or location-tracking coordinates to digital road networks at scale.
Introduction to WherobotsAI

Compute and Billing

The following terms address key concepts related to Wherobots Cloud compute resources, billing, and organizational management.
The period of inactivity after which a notebook runtime automatically scales down. Configurable to 15, 45, or 120 minutes. While idle timeout reduces costs, you are still charged for minimum resources until the notebook is destroyed.
Your Organization’s maximum concurrent computing power, measured in Spatial Units. Quota depends on your Organization Edition and payment history. You can view your current quota utilization in Workload History.
A dedicated, serverless computing cluster that powers your Wherobots workloads. Runtimes come in three types:
Type | Best for
General Purpose | Most workloads, from getting started to planetary-scale queries
Memory Optimized | Memory-intensive tasks like map matching
GPU Optimized | WherobotsAI Raster Inference
Runtimes range in size from Micro to 4X-Large. Larger runtimes consume more Spatial Units per hour but can process data faster.
Learn more about runtimes
The maximum duration a runtime instance can remain active. Professional and Enterprise organizations have an 8-hour default TTL, while Community Edition organizations have a 4-hour default TTL. You can request an increase through a Compute Request.
The billing unit that measures computational horsepower provisioned to a runtime. One Spatial Unit provides performance similar to a 32-vCPU Apache Sedona cluster on the latest stable Apache Spark runtime. Costs are calculated as Spatial Units multiplied by the hourly price.
Understand Spatial Unit pricing
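The billing rule can be sketched as simple arithmetic. The sketch below assumes that "hourly price" means the price of one Spatial Unit and that a runtime consumes Spatial Units at a fixed hourly rate while it is active. The function name, its parameters, and every number are hypothetical and are not Wherobots pricing.

```python
def runtime_cost(spatial_units_per_hour: float, hours: float,
                 price_per_spatial_unit: float) -> float:
    """Estimate cost as Spatial Units consumed multiplied by the unit price."""
    if min(spatial_units_per_hour, hours, price_per_spatial_unit) < 0:
        raise ValueError("inputs must be non-negative")
    spatial_units_consumed = spatial_units_per_hour * hours
    return spatial_units_consumed * price_per_spatial_unit


# Hypothetical inputs for illustration only (not real rates or prices).
example_cost = runtime_cost(
    spatial_units_per_hour=10, hours=2.5, price_per_spatial_unit=0.5
)
```

Check the pricing page for the actual rate of each runtime size before estimating real workloads.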
Any computational task that runs on a Wherobots runtime, including Job Runs, SQL Sessions, notebooks, and AI model inference.

Data and Storage

The following terms relate to how Wherobots manages data, storage integrations, and querying capabilities across different catalogs and formats.
The ability to query data across multiple catalogs and storage systems from a single Wherobots runtime. Data Federation lets you join datasets across your own S3 storage, Wherobots Managed Storage, and external catalogs like Unity Catalog without duplicating data.
Learn about Data Federation
The central interface in Wherobots Cloud for managing your data connections. Use Data Hub to configure storage integrations, browse catalogs, and access built-in open datasets.
Configure Data Hub
Cloud storage provisioned and managed by Wherobots for your organization. Managed Storage provides a ready-to-use location for reading and writing data without configuring external storage integrations.
A connection between Wherobots and your own Amazon S3 buckets, enabling your runtimes to read and write data directly in your cloud storage.
Set up S3 integration
The entry point for all WherobotsDB operations. Creating a SedonaContext initializes both the Sedona geospatial engine and the underlying Spark cluster. It is the first step in any Wherobots notebook or job.
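from sedona.spark import SedonaContext
# Build or reuse the underlying Spark session.
config = SedonaContext.builder().getOrCreate()
# Register Sedona's spatial types and functions on that session.
sedona = SedonaContext.create(config)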
A Databricks data governance layer that Wherobots can connect to as a foreign catalog. This lets you query your Databricks-managed tables directly in Wherobots without moving data.
Connect to Unity Catalog
A curated catalog of publicly available geospatial datasets (referenced as wherobots_open_data in queries) that are pre-loaded and ready to query in any Wherobots runtime. Useful for prototyping, learning, and enriching your own datasets.

Organization and Access

The following terms define key concepts related to organizational structure, access control, and the different service tiers available in Wherobots Cloud.
A scheduled or on-demand execution of a geospatial workload outside of the notebook environment. Job Runs integrate with Apache Airflow via the WherobotsRunOperator for orchestrating production-grade spatial ETL pipelines.
Learn about Job Runs
Wherobots offers three tiers of service:
Edition | Highlights
Community | Free tier. Tiny runtime only, 4-hour TTL, 20 SU rate limit.
Professional | Paid via AWS Marketplace. Access to larger runtimes, 8-hour TTL, and WherobotsAI features.
Enterprise | Custom contract. Full feature access, SAML SSO, dedicated support, and custom quotas.
Compare editions
A non-human identity used for programmatic access to the Wherobots API. Service Principals enable automation, CI/CD pipelines, and integrations without tying access to an individual user account.
Set up Service Principals
A REST API for executing Spatial SQL queries against WherobotsDB without starting a notebook. It provides a programmatic interface for integrating Wherobots query capabilities into applications and workflows.
Spatial SQL API docs

General Geospatial Terms

This section covers foundational geospatial concepts and standards referenced throughout the Wherobots documentation and the broader spatial industry.

Spatial Data Types

The following terms describe fundamental spatial data types that represent geographic features and phenomena. Understanding these data types is essential for working with geospatial data in WherobotsDB and performing spatial analysis effectively.
Spatial data stored as a grid of cells (pixels), where each cell contains one or more values representing a measurement (e.g., elevation, temperature, reflectance). Raster data is suited for continuous phenomena like satellite imagery, elevation models, and climate data.
Common formats include GeoTIFF / COG, netCDF, and Zarr.
Introduction to Spatial Data
Spatial data that represents discrete features using geometric primitives:
  • Point — A single coordinate pair (e.g., a city, a sensor location).
  • Line (LineString) — An ordered sequence of points forming a path (e.g., a road, a river).
  • Polygon — A closed shape defining an area (e.g., a park boundary, a country border).
  • MultiPoint / MultiLineString / MultiPolygon — Collections of the respective geometry types.
Vector data is the category, while Geometry and Geography are the spatial data types used to store and process vector data. Whether you define a polygon as Geometry or Geography depends on whether calculations should use a flat plane or account for Earth’s curvature.
Common formats include GeoJSON, Shapefile, GeoParquet, and GeoPackage.
Introduction to Spatial Data
The “round Earth” spatial data type. Geography uses a geodetic (ellipsoidal) coordinate system with latitude and longitude, measuring distances along the shortest path over a curved surface (a great circle on a sphere, a geodesic on an ellipsoid). This produces more accurate measurements over long distances than planar Geometry operations, but the math is more expensive.
Property | Geometry | Geography
Surface | Flat plane | Curved surface (spheroid)
Units | Linear (meters, feet) | Angular (degrees)
Accuracy | Distorts over large areas | Handles Earth’s curvature
Performance | Faster | Slower
Best for: Global datasets, flight paths, or regional analysis where Earth’s curvature affects precision.
Geometry uses a Cartesian (planar) coordinate system, calculating distances and areas with Euclidean math on a flat plane. In WherobotsDB, geometries follow the OGC Simple Features specification and can be points, lines, polygons, or collections thereof. They are typically stored in a geometry column and manipulated with spatial functions like ST_Area(), ST_Buffer(), and ST_Intersection().
Best for: Local-scale analysis (city-level or smaller) where Earth’s curvature doesn’t significantly impact accuracy.
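The practical difference between the two types shows up when the same pair of coordinates is measured both ways. The sketch below is a simplified illustration, not how WherobotsDB computes distances: it uses the haversine formula on a sphere (an approximation of the ellipsoid the Geography type is described as using), and the function names and constant are chosen here for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def planar_distance(x1, y1, x2, y2):
    """Euclidean distance on a flat plane, in whatever units the inputs use."""
    return math.hypot(x2 - x1, y2 - y1)


def great_circle_distance_m(lon1, lat1, lon2, lat2):
    """Haversine distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude, measured at the equator and at 60 degrees north.
at_equator_m = great_circle_distance_m(0, 0, 1, 0)
at_60_north_m = great_circle_distance_m(0, 60, 1, 60)

# Treating raw degrees as planar coordinates gives 1.0 at any latitude.
planar_degrees = planar_distance(0, 60, 1, 60)
```

On the sphere, the one-degree step at 60 degrees north covers roughly half the ground distance of the same step at the equator, while the planar calculation on raw degrees cannot tell the two apart.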

Spatial Formats

The following terms describe common geospatial data formats for both vector and raster data, each with its own strengths, limitations, and use cases. WherobotsDB supports querying across all of these formats through its flexible data connectors and format compatibility layers.
A GeoTIFF file organized so that it can be efficiently streamed over HTTP using range requests. COGs use internal tiling and overviews to allow clients to read only the portion of the image they need, making them ideal for cloud-native raster workflows.
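The range-request mechanism can be illustrated with the standard library alone. This minimal sketch only builds the request and does not send it; the URL is a placeholder, the helper name is hypothetical, and the 16 KiB read size is an arbitrary choice for the example.

```python
from urllib.request import Request


def build_range_request(url: str, start: int, length: int) -> Request:
    """Build (but do not send) a GET request for `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


# Placeholder URL; asks for the first 16 KiB of the file only.
request = build_range_request(
    "https://example.com/scene.tif", start=0, length=16 * 1024
)
```

A server that honors the header answers with status 206 (Partial Content) and only the requested bytes, which is what lets a client read one tile or overview without downloading the whole image.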
A JSON-based format for encoding geographic data structures. GeoJSON supports Point, LineString, Polygon, and collection types, and is widely used in web mapping and APIs. Its human-readable structure makes it easy to inspect but less efficient for large datasets compared to binary formats.
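As an illustration of that structure, the sketch below builds a small FeatureCollection with Python's json module. The helper name and the property values are made up for the example; the coordinate order (longitude first, then latitude) follows the GeoJSON specification.

```python
import json


def point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Return a GeoJSON Feature wrapping a Point (longitude first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [point_feature(30.0, 10.0, {"name": "sample sensor"})],
}

# Serialize to human-readable text, as it would appear in a .geojson file.
geojson_text = json.dumps(collection, indent=2)
```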
An OGC standard built on SQLite for storing vector features, tile matrices, raster data, and metadata in a single portable file. GeoPackage is a modern, open alternative to Shapefiles without their size and encoding limitations.
An extension of Apache Parquet that adds standardized geospatial metadata and geometry encoding. GeoParquet combines Parquet’s columnar storage efficiency and compression with native spatial support, making it ideal for large-scale analytical workloads. WherobotsDB has native read/write support for GeoParquet.
A single-file archive format for tiled map data (vector or raster tiles) designed for cloud-native access. PMTiles enables serving map tiles directly from cloud object storage without a tile server, using HTTP range requests.
PMTiles reference
One of the oldest and most widely supported vector data formats, developed by Esri. A Shapefile is actually a collection of files (.shp, .shx, .dbf, and others). While ubiquitous, Shapefiles have limitations including a 2 GB file size cap, 10-character field name limits, and no built-in support for UTF-8 encoding.
The binary equivalent of WKT. WKB provides a compact, machine-readable encoding of geometry objects and is the default serialization format in most spatial databases.
A text-based markup language for representing geometry objects and coordinate reference systems. For example, POINT(30 10) or POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)). WKT is commonly used in SQL queries with functions like ST_GeomFromWKT().
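To make the relationship between the text and binary encodings concrete, the sketch below writes the same point as WKT and as WKB using only the standard library. It handles 2D points only, and the helper names are hypothetical; in practice a spatial library or functions such as ST_GeomFromWKT() would do this for you.

```python
import struct


def point_to_wkt(x: float, y: float) -> str:
    """Format a 2D point as WKT text."""
    return f"POINT({x:g} {y:g})"


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB.

    Layout: 1 byte for byte order (1 = little-endian), a 4-byte unsigned
    geometry type (1 = Point), then x and y as 8-byte floats.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb: bytes) -> tuple:
    """Decode a 2D point from WKB, honoring the byte-order flag."""
    prefix = "<" if wkb[0] == 1 else ">"
    geometry_type, x, y = struct.unpack(prefix + "Idd", wkb[1:21])
    if geometry_type != 1:
        raise ValueError("only 2D Point WKB is supported in this sketch")
    return x, y


wkt_text = point_to_wkt(30, 10)
wkb_bytes = point_to_wkb(30, 10)
```

The WKT form is 12 characters of readable text, while the WKB form is a fixed 21 bytes that a database can read without parsing.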
A format for storing chunked, compressed, N-dimensional arrays. In geospatial contexts, Zarr is used for large raster datasets and multi-dimensional data (e.g., time-series satellite imagery). RasterFlow uses Zarr internally for storage of raster data, including mosaics and model outputs.

Spatial Concepts

The following terms cover fundamental geospatial concepts and techniques that are essential for understanding how to work with spatial data and perform spatial analysis in WherobotsDB and beyond.
A user-defined geographic boundary that scopes an analysis or data request to a specific region. In RasterFlow, the AOI is specified as a WKT string to define where mosaics are built and inference is run.
A framework that defines how coordinates map to locations on Earth. A CRS includes a coordinate system and a datum. A geographic CRS like WGS 84 (EPSG:4326) uses latitude and longitude directly on the ellipsoid, with no projection involved. A projected CRS like Web Mercator (EPSG:3857) adds a map projection to flatten the curved surface onto a 2D plane, which introduces distortion.
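The step from a geographic to a projected CRS can be sketched with the spherical Mercator formulas that Web Mercator uses. This is an illustration under stated assumptions rather than a replacement for a projection library: the function name is hypothetical, and the latitude limit is the conventional Web Mercator cutoff.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0
MAX_WEB_MERCATOR_LAT = 85.051129  # conventional cutoff; poles map to infinity


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Convert EPSG:4326 degrees to EPSG:3857 meters (spherical Mercator)."""
    if abs(lat_deg) > MAX_WEB_MERCATOR_LAT:
        raise ValueError("latitude is outside the Web Mercator range")
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon_deg)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


x_at_antimeridian, _ = wgs84_to_web_mercator(180, 0)
_, y_at_30_north = wgs84_to_web_mercator(0, 30)
_, y_at_60_north = wgs84_to_web_mercator(0, 60)
```

Doubling the latitude from 30 to 60 degrees more than doubles the projected y value, which is the distortion the definition refers to: the projection stretches features increasingly toward the poles.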
The process of converting a human-readable address or place name into geographic coordinates (latitude/longitude). Reverse geocoding does the opposite: converting coordinates into an address or place name.
A technique that defines a virtual boundary around a real-world geographic area. When a tracked entity (device, vehicle, person) enters or exits the geofence, an event is triggered. Geofencing relies on spatial predicates like ST_Contains or ST_Intersects.
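The enter and exit logic can be sketched in plain Python. Here a ray-casting point-in-polygon test stands in for a containment predicate such as ST_Contains; the function names, the square fence, and the track are all made up for the example, and points exactly on the boundary are not treated specially.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: toggle on each edge crossed by a ray going right."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def geofence_events(track, ring):
    """Return (step, 'enter' | 'exit') for each boundary transition."""
    events = []
    was_inside = None
    for step, (x, y) in enumerate(track):
        is_inside = point_in_polygon(x, y, ring)
        if was_inside is not None and is_inside != was_inside:
            events.append((step, "enter" if is_inside else "exit"))
        was_inside = is_inside
    return events


fence = [(0, 0), (4, 0), (4, 4), (0, 4)]        # hypothetical square fence
track = [(-1, 2), (1, 2), (3, 2), (5, 2)]       # hypothetical device positions
events = geofence_events(track, fence)
```

The device starts outside, crosses in at the second position, and leaves at the fourth, so the sketch reports one enter and one exit event.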
An algorithm that aligns noisy or inaccurate location data (e.g., GPS traces) to the most probable path on a road network. Map matching is essential for fleet tracking, traffic analysis, and routing applications.
WherobotsAI Map Matching
The process of ingesting raw geospatial data, applying spatial transformations (reprojection, clipping, enrichment, joins), and loading the results into a target system. Wherobots is purpose-built for spatial ETL at scale, supporting both vector and raster data in the same pipeline.
A data structure that organizes geometries to accelerate spatial queries (e.g., “find all buildings within this polygon”). Common spatial index types include R-tree, Quadtree, and H3. WherobotsDB automatically manages spatial indexing for you, eliminating the need for manual tuning. Advanced users can still tune partitions or thresholds if their use case requires it.
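The idea behind any spatial index is to avoid checking every geometry for every query. The sketch below uses a uniform grid, which is simpler than the R-tree, Quadtree, or H3 structures named above and is not a description of how WherobotsDB indexes data; the class and its methods are hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Bucket points into square cells so queries scan only nearby cells."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, item_id, x, y):
        self.cells[self._cell(x, y)].append((item_id, x, y))

    def query(self, min_x, min_y, max_x, max_y):
        """Return ids of points inside the box, visiting overlapping cells."""
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty buckets in the defaultdict.
                for item_id, x, y in self.cells.get((cx, cy), ()):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        hits.append(item_id)
        return hits


index = GridIndex(cell_size=10.0)
index.insert("a", 1, 1)
index.insert("b", 15, 15)
index.insert("c", 95, 95)
nearby = index.query(0, 0, 20, 20)
```

The query touches nine cells and never looks at point "c", which sits in a cell far from the search box.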
A database join where the join condition is a spatial relationship (e.g., ST_Intersects, ST_Contains, ST_DWithin) rather than a key match. Spatial joins are fundamental to geospatial analytics — for example, joining building footprints with parcel boundaries to determine which buildings are in which parcels.
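A spatial join can be sketched as an ordinary join whose match condition is a geometric test. The example below is deliberately naive: it uses a nested loop, reduces buildings and parcels to axis-aligned rectangles, and implements only an intersects-style predicate. All names and coordinates are hypothetical, and a distributed engine would use indexing and partitioning rather than comparing every pair.

```python
def rectangles_intersect(a, b):
    """a, b are (min_x, min_y, max_x, max_y); True if they share any point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(left, right, predicate):
    """Pair each left id with every right id whose geometry satisfies predicate."""
    return [
        (left_id, right_id)
        for left_id, left_geom in left.items()
        for right_id, right_geom in right.items()
        if predicate(left_geom, right_geom)
    ]


buildings = {"b1": (1, 1, 2, 2), "b2": (6, 1, 7, 2), "b3": (4, 4, 6, 6)}
parcels = {"p1": (0, 0, 5, 5), "p2": (5, 0, 10, 5)}
pairs = spatial_join(buildings, parcels, rectangles_intersect)
```

Building "b3" straddles the shared parcel edge, so it pairs with both parcels, which is the kind of one-to-many result a key-based join would not produce.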
The World Geodetic System 1984, the most widely used geographic coordinate system. WGS 84 uses latitude and longitude in degrees and is the standard CRS for GPS, web mapping, and most geospatial interchange formats. EPSG:4326 is its identifier in the EPSG registry.
A spatial analysis technique that summarizes raster cell values within defined vector zones (polygons). For example, calculating the average elevation, maximum temperature, or total rainfall within each county boundary.
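A minimal sketch of the calculation, assuming the vector zones have already been rasterized onto the same grid as the values (each cell carries a zone id, or None where no zone applies). The function name and the sample grids are invented for illustration.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Average the raster values that fall in each zone.

    `values` and `zones` are 2D lists of equal shape; `zones` holds a zone
    id per cell, or None for cells outside every zone.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is not None:
                totals[zone] += value
                counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical 3x3 elevation raster and matching zone grid.
elevation = [
    [10, 12, 30],
    [14, 16, 34],
    [0, 0, 38],
]
counties = [
    ["A", "A", "B"],
    ["A", "A", "B"],
    [None, None, "B"],
]
mean_elevation = zonal_mean(elevation, counties)
```

Swapping the running total for a running maximum or sum gives the other statistics mentioned above.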

Remote Sensing and Raster Analysis

The following terms define key concepts related to remote sensing, raster data analysis, and computer vision techniques commonly applied to satellite and aerial imagery in WherobotsAI Raster Inference and RasterFlow.
A raster dataset representing the bare-earth surface elevation. DEMs are used for terrain analysis, flood modeling, viewshed calculations, and slope/aspect mapping. Related models include Digital Surface Models (DSMs), which include buildings and vegetation.
A composite raster created by stitching together multiple overlapping images into a single seamless dataset. In RasterFlow, mosaics are spatially aligned raster datasets stored in Zarr format, combining one or more source images across an Area of Interest and a temporal dimension. Model inference outputs can also be mosaics.
A widely used spectral index that quantifies vegetation health by measuring the difference between near-infrared (strongly reflected by plants) and red light (absorbed by plants). NDVI values range from -1 to 1, where higher values indicate denser, healthier vegetation.
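The index is the normalized difference of the two bands, (NIR - Red) / (NIR + Red). The sketch below applies it to single reflectance values; the function name and sample values are illustrative, and returning None when both bands are zero is a choice made here, not a convention from the source.

```python
def ndvi(nir: float, red: float):
    """Return (NIR - Red) / (NIR + Red), or None when both bands are zero."""
    total = nir + red
    if total == 0:
        return None  # undefined: no signal in either band
    return (nir - red) / total


# Illustrative reflectance values, not measurements.
dense_vegetation = ndvi(nir=0.5, red=0.1)   # strong NIR, weak red
bare_surface = ndvi(nir=0.1, red=0.1)       # equal reflectance in both bands
```

Because the difference is divided by the sum, the result stays between -1 and 1 for non-negative reflectance values, regardless of overall brightness.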
A computer vision task that identifies and locates discrete objects within an image using bounding boxes. In WherobotsAI Raster Inference, object detection is used to find features like ships, solar panels, and aircraft in satellite imagery.
The acquisition of information about the Earth’s surface from sensors mounted on satellites or aircraft. Remote sensing produces raster imagery used for land cover classification, change detection, environmental monitoring, and more.
A computer vision task that classifies every pixel in an image into a category (e.g., water, forest, building, road). In WherobotsAI Raster Inference and RasterFlow, semantic segmentation is used for applications like predicting agricultural field boundaries or rural roads.
The conversion of raster data into vector geometries. In RasterFlow, vectorization converts pixel-level model predictions into polygon features (e.g., turning a segmentation mask into building footprint polygons stored in GeoParquet).

Ecosystem and Standards

The following terms reference key technologies, standards bodies, and open-source projects that Wherobots integrates with or builds upon to deliver its geospatial analytics capabilities.
An open-source workflow orchestration platform for authoring, scheduling, and monitoring data pipelines. Wherobots integrates with Airflow via the WherobotsRunOperator to schedule and automate Job Runs.
Airflow configuration
An open table format for large analytic datasets. Iceberg provides ACID transactions, schema evolution, time travel, and partition evolution. Havasu extends Iceberg with native spatial indexing and geometry-aware optimizations.
An open-source cluster computing system for processing large-scale spatial data. Sedona extends Apache Spark (and Apache Flink) with distributed spatial datasets and Spatial SQL. Wherobots was founded by the creators of Apache Sedona and uses it as an open-core pillar of WherobotsDB.
Apache Sedona vs. Wherobots
A unified analytics engine for large-scale data processing. Spark provides distributed computing primitives that WherobotsDB extends with geospatial capabilities. When you create a SedonaContext, you are initializing a Spark cluster under the hood.
An open-source translator library for raster and vector geospatial data formats. GDAL supports 200+ formats and is a foundational building block for most geospatial software. WherobotsDB leverages GDAL-compatible format support for data loading.
An international organization that develops open standards for geospatial content and services. OGC standards referenced in Wherobots include Simple Features (geometry model), GeoPackage, and WKT/WKB encoding.
The extension of standard SQL with spatial data types and functions (e.g., ST_Intersects, ST_Buffer, ST_Area). Spatial SQL is the primary query language in WherobotsDB, following the OGC/ISO SQL/MM spatial standard. It enables performing geospatial analysis using familiar SQL syntax.
Write Spatial SQL queries
A specification for cataloging spatiotemporal data (primarily satellite imagery) in a standardized, searchable way. STAC makes it easy to discover, access, and manage large archives of geospatial assets across providers.