WherobotsAI Raster Inference - Segmentation¶
This example demonstrates how to run inference with a segmentation model using Raster Inference to identify solar farms in satellite imagery. We will use a machine learning model from Satlas [1], which was trained on imagery from the European Space Agency's Sentinel-2 satellites.
Note: This notebook requires the Wherobots Inference functionality to be enabled and a GPU runtime selected in Wherobots Cloud. Please contact us to enable these features.
Step 1: Set Up The WherobotsDB Context¶
import warnings
warnings.filterwarnings('ignore')
from wherobots.inference.data.io import read_raster_table
from sedona.spark import SedonaContext
from pyspark.sql.functions import expr
config = SedonaContext.builder().appName('segmentation-batch-inference')\
.getOrCreate()
sedona = SedonaContext.create(config)
Step 2: Load Satellite Imagery¶
Next, we load the satellite imagery that we will run inference over. These GeoTIFF images are loaded as out-db rasters in WherobotsDB, where each row represents a different scene.
tif_folder_path = 's3a://wherobots-benchmark-prod/data/ml/satlas/'
files_df = read_raster_table(tif_folder_path, sedona, limit=400)
df_raster_input = files_df.withColumn(
"outdb_raster", expr("RS_FromPath(path)")
)
df_raster_input.cache().count()
df_raster_input.show(truncate=False)
df_raster_input.createOrReplaceTempView("df_raster_input")
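As an optional sanity check (a sketch using standard Sedona raster functions; not part of the original workflow), you can inspect the band count and georeferencing metadata of a few loaded scenes before running inference.
# Inspect band count and georeferencing metadata for a few scenes (optional check).
sedona.sql("""
    SELECT
        path,
        RS_NumBands(outdb_raster) AS num_bands,
        RS_MetaData(outdb_raster) AS metadata
    FROM df_raster_input
    LIMIT 3
""").show(truncate=False)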
Step 3: Run Predictions And Visualize Results¶
To run predictions, we specify the model we wish to use. Some models are pre-loaded and made available in Wherobots Cloud, and we can also load our own models. Predictions can be run with the Raster Inference SQL function RS_Segment or with the Python API.
Here we generate 400 raster predictions using RS_Segment.
model_id = 'solar-satlas-sentinel2'
predictions_df = sedona.sql(f"""
SELECT
outdb_raster,
segment_result.*
FROM (
SELECT
outdb_raster,
RS_SEGMENT('{model_id}', outdb_raster) AS segment_result
FROM
df_raster_input
) AS segment_fields
""")
predictions_df.cache().count()
predictions_df.show()
predictions_df.createOrReplaceTempView("predictions")
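Before extracting geometries, it can help to confirm the field names returned by RS_SEGMENT; a quick optional check is to print the schema, which should show the confidence_array and class_map columns used in the next step.
# Confirm the columns produced by RS_SEGMENT (optional).
predictions_df.printSchema()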
Now that we've generated predictions over our satellite imagery, we can use the RS_Segment_To_Geoms function to extract the geometries the model has identified as possible solar farms. We'll specify the following:
- the raster column to use for georeferencing our results
- the prediction result from the previous step
- the category label "1" returned by the model, which represents solar farms, and the class map to use for assigning labels to the predictions
- a confidence threshold between 0 and 1
df_multipolys = sedona.sql("""
WITH t AS (
SELECT RS_SEGMENT_TO_GEOMS(outdb_raster, confidence_array, array(1), class_map, 0.65) result
FROM predictions
)
SELECT result.* FROM t
""")
df_multipolys.cache().count()
df_multipolys.show()
df_multipolys.createOrReplaceTempView("multipolygon_predictions")
Since we ran inference across the state of Arizona, many scenes don't contain solar farms and don't have positive detections. Let's filter out scenes without segmentation detections so that we can plot the results.
df_merged_predictions = sedona.sql("""
SELECT
element_at(class_name, 1) AS class_name,
cast(element_at(average_pixel_confidence_score, 1) AS double) AS average_pixel_confidence_score,
ST_Collect(geometry) AS merged_geom
FROM
multipolygon_predictions
""")
This leaves us with a few predicted solar farm polygons from our 400 satellite image samples.
df_filtered_predictions = df_merged_predictions.filter("ST_IsEmpty(merged_geom) = False")
df_filtered_predictions.cache().count()
df_filtered_predictions.show()
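For a quick per-scene summary (an optional sketch using Sedona's ST_NumGeometries; not part of the original example), you can count how many individual polygons each merged detection contains.
# Count the polygons contained in each merged detection (optional summary).
df_filtered_predictions.selectExpr(
    "class_name",
    "average_pixel_confidence_score",
    "ST_NumGeometries(merged_geom) AS num_polygons"
).show()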
We'll plot these with SedonaKepler. Compare the satellite basemap with the predictions and see if there's a match!
from sedona.maps.SedonaKepler import SedonaKepler
config = {
'version': 'v1',
'config': {
'mapStyle': {
'styleType': 'dark',
'topLayerGroups': {},
'visibleLayerGroups': {},
'mapStyles': {}
},
}
}
map = SedonaKepler.create_map(config=config)
SedonaKepler.add_df(map, df=df_filtered_predictions, name="Solar Farm Detections")
map
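If you want to keep the detections for later analysis, Sedona can also write them out as GeoParquet. This optional step uses a placeholder output path that you would replace with your own bucket.
# Persist the filtered detections as GeoParquet (placeholder output path).
df_filtered_predictions.write.format("geoparquet") \
    .mode("overwrite") \
    .save("s3a://your-bucket/solar-farm-detections/")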
wherobots.inference Python API¶
If you prefer Python, wherobots.inference offers a module for registering the SQL inference functions as Python functions. Below, we run the same inference as before using RS_SEGMENT.
from wherobots.inference.engine.register import create_semantic_segmentation_udfs
from pyspark.sql.functions import col
rs_segment = create_semantic_segmentation_udfs(batch_size = 10, sedona=sedona)
df = df_raster_input.withColumn("segment_result", rs_segment(model_id, col("outdb_raster"))).select(
"outdb_raster",
col("segment_result.confidence_array").alias("confidence_array"),
col("segment_result.class_map").alias("class_map")
)
df.show(3)
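The Python API returns the same outdb_raster, confidence_array, and class_map columns as the SQL query above, so the RS_SEGMENT_TO_GEOMS step shown earlier works on this result unchanged; for example, register it as a view and reuse the same query.
# Register the Python API output so the earlier geometry-extraction SQL can be reused.
df.createOrReplaceTempView("predictions")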
References¶
1. Bastani, Favyen, Wolters, Piper, Gupta, Ritwik, Ferdinando, Joe, and Kembhavi, Aniruddha. "SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding." arXiv preprint arXiv:2211.15660 (2023). https://doi.org/10.48550/arXiv.2211.15660