- Read a Managed Delta table from Unity Catalog.
- Transform coordinates into spatial POINT geometries.
- Enrich the data by calculating each forecast’s proximity to Tokyo to create a new threat feature.
- Write the results back to a new external Delta table, dropping the geometry column as it is not natively supported by Databricks.
- Note: This example writes to an External Delta table because Databricks prevents external platforms like Wherobots from writing to Managed tables.
This example uses the `samples.accuweather.forecast_daily_calendar_imperial` dataset, which comes pre-loaded in your Databricks workspace.
Data Disclaimer: Review the following about the dataset used in this notebook:
- The weather data used in this demonstration originates from the `samples.accuweather.forecast_daily_calendar_imperial` dataset provided within Databricks.
- Wherobots is not responsible for the accuracy or completeness of this data.
- This analysis is based on daily forecast data and does not represent real-time conditions.
- For complete information about the dataset, go to Forecast Weather Data and click Documentation within the Product Links section.
Prerequisites
To run this example notebook, you’ll need:
- An existing Databricks catalog and schema governed by Unity Catalog.
- A Connection between Wherobots and your Unity Catalog-governed schema and catalog.
- For more information on connecting Unity Catalog to your Wherobots Organization, including the necessary Databricks catalog permissions, see Connect to Unity Catalog.
- If your Unity Catalog has been successfully connected to Wherobots, you will be able to see it in the Wherobots Data Hub.
- The necessary permissions to read from and write to Delta tables within your Databricks Unity Catalog.
Note: Wherobots discovers Databricks catalogs only at your runtime’s initialization. If you created a new Databricks catalog after the Wherobots runtime was started, that catalog won’t be visible until you restart the Wherobots runtime.
To make a new catalog visible, complete the following steps to restart the runtime:
- Save active work: Ensure any running jobs or SQL sessions are saved.
- Destroy runtime: Stop the current Wherobots runtime in Wherobots Cloud.
- Start a new runtime: Start the runtime again.
In Databricks
Create a table in your Unity Catalog that copies the data provided by AccuWeather’s `samples.accuweather.forecast_daily_calendar_imperial` dataset. After copying this data into its own Delta table, you can query and modify it in Wherobots.
Create the Sample Table
Update the `YOUR-CATALOG` and `YOUR-SCHEMA` variables (maintaining the backticks around each) in the cell below to point to the resources in your Databricks environment where you have permission to create tables.
Run the following command in a Databricks SQL editor to create a new table with the necessary sample data from the built-in Accuweather sample data.
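The original SQL cell is not reproduced here. A minimal sketch of the command might look like the following; it assumes the target table name `forecast_daily_calendar_imperial_wbc_demo`, which is the name this notebook reads from later:

```sql
-- Copy the built-in AccuWeather sample data into your own Delta table.
-- Replace `YOUR-CATALOG` and `YOUR-SCHEMA` with your own resources.
CREATE TABLE `YOUR-CATALOG`.`YOUR-SCHEMA`.forecast_daily_calendar_imperial_wbc_demo AS
SELECT *
FROM samples.accuweather.forecast_daily_calendar_imperial;
```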
In a Wherobots Notebook
Run the following commands in this Wherobots notebook.
Import Libraries
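The import cell is not shown here. A typical set of imports for this workflow, assuming the Apache Sedona `sedona.spark` package that ships with Wherobots runtimes, would be (runnable only inside a Wherobots runtime):

```python
# Sedona / Spark imports used throughout this notebook.
from sedona.spark import SedonaContext
from pyspark.sql.functions import expr
```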
Set up Wherobots notebook variables
To define the resources for this ETL pipeline, update the following variables in this Wherobots notebook. To list the Databricks external locations available for `OUTPUT_TABLE_BASE_PATH`, run the following SQL command in a Databricks notebook cell, then copy the URL from the `url` column and use it in your code.
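As a sketch, the variables cell might look like the following. Every value here is a placeholder you must replace with resources from your own environment; the output table name is hypothetical:

```python
# Placeholder values -- replace each with resources from your own environment.
CATALOG = "your_catalog"    # Databricks catalog connected to Wherobots
SCHEMA = "your_schema"      # Schema governed by Unity Catalog
INPUT_TABLE = "forecast_daily_calendar_imperial_wbc_demo"
OUTPUT_TABLE = "forecast_daily_tokyo_threat"  # hypothetical output table name
# `url` value from the output of SHOW EXTERNAL LOCATIONS (Databricks SQL).
OUTPUT_TABLE_BASE_PATH = "s3://your-external-location/path"

INPUT_TABLE_FQN = f"{CATALOG}.{SCHEMA}.{INPUT_TABLE}"
print(INPUT_TABLE_FQN)
```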
Create the SedonaContext
The following creates a `SedonaContext` object.
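The cell itself is omitted here; a minimal sketch following the standard Apache Sedona pattern (runnable only inside a Wherobots runtime) is:

```python
from sedona.spark import SedonaContext

# Build the session configuration and create a Sedona-enabled Spark session.
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
```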
Confirm that you can read data from the Unity Catalog table in your Wherobots Notebook
Read the table and confirm that it returns a DataFrame containing the AccuWeather forecast data.
Running Spatial Operations
In this step, we will convert the latitude and longitude columns from `forecast_daily_calendar_imperial_wbc_demo` into a POINT geometry and add it to the table.
The following code transforms latitude and longitude data in a DataFrame into a spatially aware geometry column and then validates the result.
In short, it adds a new column named point by converting latitude and longitude values into a standard geographic point.
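As a sketch of this step (runnable only in a Wherobots runtime; the catalog/schema placeholders and the `latitude`/`longitude` column names are assumptions, so check your table’s schema):

```python
from pyspark.sql.functions import expr

# Read the sample table copied earlier (replace the catalog/schema placeholders).
df = sedona.table("your_catalog.your_schema.forecast_daily_calendar_imperial_wbc_demo")

# ST_Point takes (x, y) -- longitude first, then latitude.
df_points = df.withColumn("point", expr("ST_Point(longitude, latitude)"))
df_points.select("latitude", "longitude", "point").show(5, truncate=False)
```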
Proximity Analysis: Calculate Distances to Key Locations
In this section, you will perform a proximity analysis to calculate the distance from each weather forecast in your dataset to a specific point of interest. This allows you to filter data based on location and answer questions like, “Which of these weather events is closest to my operations center?”
A Practical Example
Imagine your business has major operations or supply chain dependencies in the Tokyo metropolitan area, where severe weather can disrupt logistics and public safety. Your raw data contains thousands of forecasts across the region but lacks the context of which ones pose a direct threat to the city. By defining Tokyo’s coordinates, you can calculate the distance from every weather event to the city center, saving the result in a new column like `distance_to_tokyo_meters`.
With this new column, your data becomes an early-warning system. You can now easily ask critical business questions like:
“Show me cities with wind gusts over 40 mph or heavy precipitation within a 500-kilometer radius of Tokyo.”
This analysis turns your spatial data into actionable intelligence, allowing you to focus only on the events that directly impact your operations.
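In the notebook, this distance would come from a Sedona spherical-distance function on the `point` column. The underlying arithmetic can be sketched in plain Python with the haversine formula; Tokyo’s city-center coordinates (35.6895° N, 139.6917° E) and the Osaka example point are assumptions for illustration:

```python
import math

TOKYO_LAT, TOKYO_LON = 35.6895, 139.6917  # assumed city-center coordinates

def haversine_meters(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example forecast location: Osaka, roughly 400 km from central Tokyo.
d = haversine_meters(34.6937, 135.5023, TOKYO_LAT, TOKYO_LON)
within_500km = d <= 500_000
```

Filtering the DataFrame on `distance_to_tokyo_meters <= 500000` is the Spark equivalent of the `within_500km` check above.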
Writing the Results
In this step, you will write the results back to an external Delta table managed by Unity Catalog.
Note: Before storing the data in Databricks, we convert the geometry column back into WKT and then drop the point column, because Databricks does not natively support geometries. Also, keep in mind that when you write the data back to Databricks, your user may not have the necessary permissions to query it; you will need to grant those permissions explicitly in Unity Catalog.
Data preparation for Databricks
Before loading the data into Databricks, you must drop the `point` column because it contains the geometry data type POINT.
This procedure is required because Databricks does not offer native support for columns containing geometry data types.
Note: A geometry data type is a special data type used in spatial databases to represent geographic features such as points (POINT), lines (LINESTRING), or polygons (POLYGON). For more information, see Introduction to Spatial Data in the Wherobots documentation.
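A sketch of the preparation and write, assuming the `df_points` DataFrame built earlier; `ST_AsText` is the standard Sedona function for serializing a geometry to WKT, and the output path and table names below are placeholders (runnable only in a Wherobots runtime):

```python
from pyspark.sql.functions import expr

# Serialize the geometry as WKT text, then drop the unsupported geometry column.
df_out = (
    df_points
    .withColumn("point_wkt", expr("ST_AsText(point)"))
    .drop("point")
)

# Write an external Delta table under a Unity Catalog external location.
(
    df_out.write.format("delta")
    .mode("overwrite")
    .option("path", "s3://your-external-location/path/forecast_daily_tokyo_threat")
    .saveAsTable("your_catalog.your_schema.forecast_daily_tokyo_threat")
)
```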
Confirm Successful Write to Databricks
If your notebook has run to this point and you see the Spark job status showing COMPLETED for all jobs, the process has finished successfully.
This indicates that the data has been successfully written from your Wherobots environment to Databricks.
To double-check, navigate to your Databricks workspace and verify that the new table has been created as expected.
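One quick check from a Databricks SQL editor; the table and column names below match the placeholders used in this walkthrough and are assumptions:

```sql
-- Confirm the external table exists and contains the new columns.
SELECT point_wkt, distance_to_tokyo_meters
FROM `YOUR-CATALOG`.`YOUR-SCHEMA`.forecast_daily_tokyo_threat
LIMIT 5;
```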
