Use Wherobots for spatial ETL and analytics on your Databricks Delta Lake tables
This workflow demonstrates how Wherobots can directly read and process data from your existing Delta Lake tables, allowing you to leverage powerful spatial analytics without a separate ingestion step.
While reading is fully supported, writing to Delta Lake tables is not.
Benefits
Reading Databricks’ Delta Lake tables with Wherobots Cloud allows you to:

- Avoid data migration: Access and process your Delta Lake tables directly in Databricks without needing to move them.
- Leverage existing data infrastructure: Integrate your Databricks-managed data with Wherobots’ geospatial capabilities.
- Enable advanced geospatial analysis: Perform operations like Map Matching and Raster Inference on data stored in Delta Lake tables, which may not have native geospatial support.
- Handle large-scale datasets: Read and process large, partitioned Parquet files (even those without explicit geometry columns) stored within Delta Lake tables.
Before you start
Before using this feature, ensure that you have the following required resources:

- An Account within a Community, Professional, or Enterprise Edition Organization: For more information, see Create a Wherobots Account.
- A pre-existing Delta Lake table: This table should be set up and managed within the Databricks platform.
- A Databricks Personal Access Token (PAT): For more information, see Databricks personal access tokens for workspace users or Databricks personal access tokens for service principals in the official Databricks Documentation.
- A pre-existing Unity Catalog: In Databricks, Unity Catalog is the governance layer for your data.
Set your Databricks permissions
This table outlines the specific Unity Catalog privileges required for a user or service principal to read data from Databricks tables and volumes.

Databricks permissions are hierarchical
To access a table, you must have privileges for the catalog and schema that contain it. For more information, consult the official Databricks Documentation or your Workspace Admin.
| Access Privilege/Permission | Applies To (Object) | Purpose |
|---|---|---|
| USE CATALOG | Catalog | Allows the user or service principal to see and access the specified catalog. This is the top-level requirement. |
| USE SCHEMA | Schema | Grants the ability to see and access the schema (database) that contains the desired table or function. |
| SELECT | Table / View | Grants read-only permission to query data from a specific table or view within the schema. |
| EXECUTE | Function / Notebook | Grants the ability to run a user-defined function (UDF) or execute a notebook. This does not grant read access to tables. |
| READ VOLUME | External Volume | Grants read-only permission to access files stored within a specific Unity Catalog external volume. |
| EXTERNAL USE SCHEMA | Schema | Grants an external system the ability to read data from tables within a schema. For more information, see Enable external data access to Unity Catalog. |
More information on EXTERNAL USE SCHEMA
The user or service principal authenticating the external request must be explicitly granted the EXTERNAL USE SCHEMA privilege on the schema containing the target table. This is a critical security measure to prevent accidental data exfiltration and is not included in broader privileges like ALL PRIVILEGES.
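For reference, these grants can be issued from a Databricks notebook by a user with permission to manage them. The sketch below is a minimal example; the catalog (main), schema (trips), table (gps_pings), and principal (wherobots-reader@example.com) are hypothetical placeholders, not values from this guide.

```python
# Minimal sketch: grant a principal read access to one table, plus the
# external-access privilege needed by an external engine such as Wherobots.
# All object names and the principal below are hypothetical placeholders.
principal = "`wherobots-reader@example.com`"

grants = [
    f"GRANT USE CATALOG ON CATALOG main TO {principal}",
    f"GRANT USE SCHEMA ON SCHEMA main.trips TO {principal}",
    f"GRANT SELECT ON TABLE main.trips.gps_pings TO {principal}",
    f"GRANT EXTERNAL USE SCHEMA ON SCHEMA main.trips TO {principal}",
]

for stmt in grants:
    spark.sql(stmt)  # `spark` is predefined in Databricks notebooks
```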
Reading a Delta Lake table in a Wherobots Notebook
This section outlines how a user can access and read data from an existing Delta Lake table.

- Log in to Wherobots Cloud.
- Start a runtime.
- Once the runtime has loaded, open your Wherobots Notebook.
- In your Wherobots Notebook:
- Define your connection parameters and credentials (see the configuration sketch after this list).

  Finding Your Catalog, Schema, and Table Names

  To locate the names for your catalog, schema, and table, see the official Databricks guide on how to explore database objects using the Catalog Explorer.

  - Replace https://your-databricks-workspace.cloud.databricks.com with the URI of your Databricks workspace.
  - Replace YOUR_CATALOG_NAME with your Unity Catalog name.
  - Replace YOUR_DATABRICKS_SCHEMA_NAME with your Databricks Schema Name.
  - Replace YOUR_DATABRICKS_TABLE_NAME with your Databricks Table Name.
- Configure the Sedona Context to access the specific Delta Lake table.
- Read the Delta Lake table using its catalog name (see the final step in the sketch below).
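The exact connection settings depend on your runtime and on how your Databricks workspace exposes Unity Catalog to external engines. The sketch below is one possible configuration, assuming the open-source Unity Catalog Spark connector (io.unitycatalog.spark.UCSingleCatalog) is available in the runtime and that external data access is enabled on the metastore; the URI, token, and object names are the placeholders from the steps above, not literal values.

```python
from sedona.spark import SedonaContext

# Placeholders -- replace with your own values (see the steps above).
workspace_uri = "https://your-databricks-workspace.cloud.databricks.com"
databricks_token = "YOUR_DATABRICKS_PAT"   # ideally loaded from a secret, not hard-coded
catalog_name = "YOUR_CATALOG_NAME"
schema_name = "YOUR_DATABRICKS_SCHEMA_NAME"
table_name = "YOUR_DATABRICKS_TABLE_NAME"

# Assumption: the Unity Catalog Spark connector is on the classpath and the
# workspace allows external engines to read Unity Catalog tables.
config = (
    SedonaContext.builder()
    .config(f"spark.sql.catalog.{catalog_name}", "io.unitycatalog.spark.UCSingleCatalog")
    .config(f"spark.sql.catalog.{catalog_name}.uri", f"{workspace_uri}/api/2.1/unity-catalog")
    .config(f"spark.sql.catalog.{catalog_name}.token", databricks_token)
    .getOrCreate()
)
sedona = SedonaContext.create(config)

# Read the Delta Lake table by its three-level (catalog.schema.table) name.
df = sedona.table(f"{catalog_name}.{schema_name}.{table_name}")
df.printSchema()
df.show(5)
```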
Performing geospatial analysis on data from a Delta Lake table
Once the Delta Lake table is loaded into a Wherobots Notebook, you can prepare your data for advanced geospatial operations like Map Matching and Raster Inference, as you would with any data stored directly within Wherobots Cloud. To use spatial functions, the DataFrame needs a column with the actual geometry type, not just a String of WKT or a Binary representation of a shape.
If your data is currently stored in one of these formats, you must first convert it into a true geometry object.
Preprocessing for Spatial Analysis
The method of preprocessing your data for spatial analysis depends on the initial state of your data. The following examples demonstrate this by transforming raw GPS records from a Delta Lake table into a DataFrame with a specialized geometry column.
While ST_Point creates a geometry from numerical coordinate columns (such as FLOAT), ST_GeomFromEWKB decodes a geometry that is already stored in a non-spatial Binary format. In both cases, you create a new column to hold the resulting geometry object.
- Load Data: Follow steps from Reading a Delta Lake table in a Wherobots Notebook to load your data.
- Prepare Data: Create a geometry column using the coordinate fields, depending on the initial data type (see the sketch after this list).

  Data stored as a Numerical Type

  Consider a DataFrame loaded from a Delta Lake table that contains separate numerical columns for longitude (x-coordinate) and latitude (y-coordinate):

  | trip_id | start_lon | start_lat |
  |---|---|---|
  | 101 | -122.4194 | 37.7749 |
  | 102 | -122.4080 | 37.7850 |

  - Input: A PySpark DataFrame (read from a Delta table) where the location data is stored in separate, non-spatial columns.
  - Data Types: The start_lon and start_lat columns are standard numerical types, such as Double, Decimal, or Float, not spatial geometry.
  - Spatial Awareness: At this stage, the DataFrame is not spatially aware. The database sees start_lon and start_lat as two independent numbers; it has no intrinsic understanding that they represent a single geographic location.

  Use ST_Point(start_lon, start_lat) to create a new geometry column from these two columns.

  - Output: A new PySpark DataFrame that now includes a spatially-aware geometry column.
  - Structure: A table containing all the original columns plus the new geometry column.
  - Data Types: The data type of the new geometry column is a specialized GeometryUDT (User-Defined Type). This is a complex object that Apache Sedona understands as a spatial object. When you display the column using the .show() command, this GeometryUDT object is presented in a human-readable string format called Well-Known Text (WKT).
  - Spatial Awareness: The output DataFrame is now spatially aware. You can run spatial functions (e.g., ST_Distance, ST_Contains) on the geometry column.

  The ST_Point function is essential for spatial analysis because it transforms simple numerical coordinates into a powerful geometry object, as this table illustrates:

  | Feature | Before ST_Point | After ST_Point |
  |---|---|---|
  | Data Format | FLOAT | geometry |
  | Spatial Awareness | No. Seen as independent numerical columns. | Yes. Acknowledged as a spatial object. |
  | Functionality | Cannot use spatial functions. | Can use functions like ST_Distance, ST_Area, etc. |

  Data stored as a Binary Type

  If your geometry is instead stored as a Binary column, use ST_GeomFromEWKB to decode it into a geometry column in the same way.
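A minimal sketch of this preparation step is shown below, assuming the sedona session and df DataFrame from the reading section above; the column names trip_id, start_lon, start_lat, and geom_wkb are hypothetical.

```python
from pyspark.sql.functions import expr

# Numerical columns -> geometry: build a point geometry from lon/lat values.
points_df = df.withColumn("geometry", expr("ST_Point(start_lon, start_lat)"))

# .show() renders the GeometryUDT column as WKT, e.g. POINT (-122.4194 37.7749).
points_df.select("trip_id", "geometry").show(truncate=False)

# Binary (EWKB) column -> geometry: decode the stored bytes instead
# (only applies if your table has such a column, here called geom_wkb).
decoded_df = df.withColumn("geometry", expr("ST_GeomFromEWKB(geom_wkb)"))
```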
Next Steps
Now that you have created a spatially-aware geometry column, you can leverage it to perform powerful geospatial analysis.
Analyze Your Data with Spatial SQL Functions
After you create a geometry column, you can use it with different kinds of spatial functions. You can now analyze the relationships between your geometries using powerful and highly-optimized functions like ST_Distance, ST_Contains, and ST_Intersects.
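For example, here is a short sketch that queries the points_df DataFrame from the preparation step with Spatial SQL; the reference point and bounding box coordinates are illustrative only.

```python
# Register the spatially-aware DataFrame so it can be queried with Spatial SQL.
points_df.createOrReplaceTempView("trips")

# Distance from each start point to a reference point, and a containment test
# against an illustrative bounding box (coordinates are lon/lat degrees).
result = sedona.sql("""
    SELECT
        trip_id,
        ST_Distance(geometry, ST_Point(-122.4194, 37.7749)) AS dist_to_ref,
        ST_Contains(
            ST_PolygonFromEnvelope(-122.52, 37.70, -122.35, 37.83),
            geometry
        ) AS within_bbox
    FROM trips
""")
result.show()
```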
Perform Map Matching on a Real-World Road Network
Wherobots offers powerful map matching capabilities to take your prepared GPS data and link it to actual roads. For a full, hands-on example, open a Wherobots Notebook and navigate to the tutorial at examples/Analyzing_Data/GPS_Map_Matching.ipynb.
You can also review the static version on GitHub: GPS_Map_Matching.ipynb
Usage considerations
- Read-Only Access to Delta tables: Wherobots Cloud currently supports reading from Delta Lake tables. Writing directly to Delta Lake tables from Wherobots is not supported.
- Large File Handling: This process is designed to handle large, partitioned Parquet files commonly found in Delta Lake Tables, including those without pre-existing geometry columns.
- Permissions Management: All permissions and access controls for the Delta Lake tables themselves must be managed on the Databricks side.

