WherobotsAI Raster Inference - Object Detection
This example demonstrates how to run inference with an object detection model using Raster Inference to identify marine infrastructure (offshore wind farms and platforms) in satellite imagery. We will use a machine learning model from Satlas [1], which was trained on imagery from the European Space Agency’s Sentinel-2 satellites.
Note: This notebook requires the Wherobots Inference functionality to be enabled and a GPU runtime selected in Wherobots Cloud. Please contact us to enable these features.
Step 1: Set Up The WherobotsDB Context
Here we configure WherobotsDB to enable access to the necessary cloud object storage buckets with sample data.
import warnings
warnings.filterwarnings('ignore')
from wherobots.inference.data.io import read_raster_table
from sedona.spark import SedonaContext
from pyspark.sql.functions import expr
config = SedonaContext.builder().appName('object-detection-batch-inference') \
    .getOrCreate()
sedona = SedonaContext.create(config)
Step 2: Load Satellite Imagery
Next, we load the satellite imagery that we will run inference on. These GeoTIFF images are loaded as out-db rasters in WherobotsDB, where each row represents a different scene.
tif_folder_path = 's3://wherobots-benchmark-prod/data/ml/satlas-offshore-wind-scenes/'
files_df = read_raster_table(tif_folder_path, sedona, limit=500)
# RS_FromPath creates an out-db raster that references each GeoTIFF by its path
df_raster_input = files_df.withColumn(
    "outdb_raster", expr("RS_FromPath(path)")
)
%%time
df_raster_input.cache().count()
df_raster_input.show(truncate=False)
Step 3: Run Predictions And Visualize Results
To run predictions, we specify the model we wish to use. Some models are preloaded and made available in Wherobots Cloud, and we can also load our own. Predictions are run with Wherobots' Spatial SQL functions, in this case RS_DETECT_BBOXES.
Here we generate 100 predictions using RS_DETECT_BBOXES, which returns the bounding boxes (as WKT), confidence scores, and labels detected in each raster.
df_raster_input.createOrReplaceTempView("df_raster_input")
model_id = 'marine-satlas-sentinel2'
predictions_df = sedona.sql(f"""
    SELECT
        outdb_raster,
        detect_result.*
    FROM (
        SELECT
            outdb_raster,
            RS_DETECT_BBOXES('{model_id}', outdb_raster) AS detect_result
        FROM
            df_raster_input
    ) AS detect_fields
""")
predictions_df.cache().count()
predictions_df.show()
predictions_df.createOrReplaceTempView("predictions")
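The outer query above uses detect_result.* to promote the fields of the struct returned by RS_DETECT_BBOXES into top-level columns. As a rough, library-free illustration, the sketch below does the same thing to a plain Python dict. The field names (bboxes_wkt, confidence_scores, labels) are the ones this notebook uses later; the helper expand_struct and all values are hypothetical placeholders, not real model output.

```python
# Pure-Python analogue of `SELECT outdb_raster, detect_result.* ...`:
# the nested struct's fields become top-level columns.
# expand_struct and the sample values are hypothetical placeholders.

def expand_struct(row, struct_col):
    """Return a new row dict with the fields of row[struct_col] promoted to top level."""
    flat = {k: v for k, v in row.items() if k != struct_col}
    flat.update(row[struct_col])
    return flat


row = {
    "outdb_raster": "<raster ref>",  # stand-in for the out-db raster value
    "detect_result": {
        "bboxes_wkt": ["POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"],
        "confidence_scores": [0.9],
        "labels": [2],
    },
}

flat = expand_struct(row, "detect_result")
print(sorted(flat))  # ['bboxes_wkt', 'confidence_scores', 'labels', 'outdb_raster']
```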
Because the scenes span the coastlines of many countries around the world, many of them contain no wind farms and therefore have no positive detections. Now that we've generated predictions over our satellite imagery, we can filter the detections by confidence score (here, a threshold of 0.65) with RS_FILTER_BOX_CONFIDENCE, and by the integer label representing offshore wind farms, 2, to locate predicted offshore wind farms.
filtered_predictions = sedona.sql("""
    SELECT
        outdb_raster,
        filtered.*
    FROM (
        SELECT
            outdb_raster,
            RS_FILTER_BOX_CONFIDENCE(bboxes_wkt, confidence_scores, labels, 0.65) AS filtered
        FROM
            predictions
    ) AS temp
    WHERE size(filtered.max_confidence_bboxes) > 0
        AND array_contains(filtered.max_confidence_labels, '2')
""")
filtered_predictions.createOrReplaceTempView("filtered_predictions")
filtered_predictions.cache().count()
filtered_predictions.show()
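To make the filtering step concrete, here is a minimal pure-Python sketch that thresholds one scene's parallel arrays and then applies the same checks as the WHERE clause. It assumes RS_FILTER_BOX_CONFIDENCE simply keeps detections whose score is at least the threshold. The function's exact behavior (inclusive or exclusive bound, any extra selection implied by the max_confidence_ prefix) is not documented here, so treat filter_by_confidence and has_label as hypothetical stand-ins.

```python
# Hypothetical stand-in for RS_FILTER_BOX_CONFIDENCE plus the WHERE clause.
# ASSUMPTION: the real function keeps boxes with score >= threshold; it may differ.

def filter_by_confidence(bboxes_wkt, scores, labels, threshold):
    """Keep only detections whose confidence score meets the threshold."""
    kept = [
        (box, score, label)
        for box, score, label in zip(bboxes_wkt, scores, labels)
        if score >= threshold
    ]
    return {
        "max_confidence_bboxes": [k[0] for k in kept],
        "max_confidence_scores": [k[1] for k in kept],
        "max_confidence_labels": [k[2] for k in kept],
    }


def has_label(filtered, target_label):
    """Mirror the WHERE clause: at least one kept box, and the target label present."""
    # The SQL compares against the string '2'; comparing as strings keeps this
    # sketch agnostic to whether labels are stored as ints or strings.
    labels = [str(label) for label in filtered["max_confidence_labels"]]
    return len(filtered["max_confidence_bboxes"]) > 0 and str(target_label) in labels


# Hypothetical detections for a single scene.
filtered = filter_by_confidence(
    ["POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))", "POLYGON ((2 2, 3 2, 3 3, 2 3, 2 2))"],
    [0.91, 0.40],
    [2, 1],
    threshold=0.65,
)
print(filtered["max_confidence_scores"])  # [0.91]
print(has_label(filtered, 2))             # True
```

A scene whose detections all fall below the threshold ends up with empty arrays, which is why the query also checks size(...) > 0 before looking for label 2.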
Our final step before plotting our prediction results is to convert our table from a format where each row represents a raster scene's predictions to a format where each row represents one predicted bounding box. To do this, we combine our list columns with arrays_zip and then use explode to convert the lists to rows. To convert our string column representing a geometry into a GeometryType column, we use ST_GeomFromWKT so we can plot it with SedonaKepler.
exploded_df = sedona.sql("""
    SELECT
        outdb_raster,
        exploded.*
    FROM (
        SELECT
            outdb_raster,
            explode(arrays_zip(max_confidence_bboxes, max_confidence_scores, max_confidence_labels)) AS exploded
        FROM
            filtered_predictions
    ) temp
""")
df_exploded = (
    exploded_df
    .withColumn("geometry", expr("ST_GeomFromWKT(max_confidence_bboxes)"))
    .drop("max_confidence_bboxes")
)
print(df_exploded.cache().count())
df_exploded.show()
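For readers more familiar with Python than Spark SQL, the following sketch mimics explode(arrays_zip(...)) on a single dict: the parallel arrays are zipped index by index and each position becomes its own row, with the other columns copied along. It also includes a deliberately minimal parser, polygon_bounds, that reads a simple single-ring POLYGON WKT string and returns its extent. It only illustrates what the WKT bounding boxes contain and is not a substitute for ST_GeomFromWKT. Both helpers and the sample row are hypothetical.

```python
# Pure-Python analogue of explode(arrays_zip(...)); helpers and data are hypothetical.

def explode_zipped(row, array_cols):
    """Yield one dict per index across the parallel arrays named in array_cols."""
    other = {k: v for k, v in row.items() if k not in array_cols}
    for values in zip(*(row[c] for c in array_cols)):
        yield {**other, **dict(zip(array_cols, values))}


def polygon_bounds(wkt):
    """Return (min_x, min_y, max_x, max_y) for a simple 'POLYGON ((x y, ...))' string.

    Illustration only: handles a single exterior ring with no holes.
    """
    body = wkt.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("only simple POLYGON WKT is supported")
    coords_text = body[body.index("((") + 2 : body.rindex("))")]
    points = [tuple(float(v) for v in pair.split()) for pair in coords_text.split(",")]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


row = {
    "outdb_raster": "<raster ref>",
    "max_confidence_bboxes": [
        "POLYGON ((0 0, 2 0, 2 1, 0 1, 0 0))",
        "POLYGON ((5 5, 6 5, 6 7, 5 7, 5 5))",
    ],
    "max_confidence_scores": [0.9, 0.8],
    "max_confidence_labels": [2, 2],
}
cols = ["max_confidence_bboxes", "max_confidence_scores", "max_confidence_labels"]

rows = list(explode_zipped(row, cols))
for r in rows:
    print(r["max_confidence_scores"], polygon_bounds(r["max_confidence_bboxes"]))
# 0.9 (0.0, 0.0, 2.0, 1.0)
# 0.8 (5.0, 5.0, 6.0, 7.0)
```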
Zoom into the coasts of China or the Netherlands to spot some detected wind farms!
from sedona.maps.SedonaKepler import SedonaKepler
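map_config = {
    'version': 'v1',
    'config': {
        'mapStyle': {
            'styleType': 'dark',
            'topLayerGroups': {},
            'visibleLayerGroups': {},
            'mapStyles': {}
        }
    }
}
# Use a descriptive name instead of shadowing Python's built-in map()
kepler_map = SedonaKepler.create_map(config=map_config)
# Drop the raster column before mapping; only the detection geometries and attributes are plotted
SedonaKepler.add_df(kepler_map, df=df_exploded.drop("outdb_raster"), name="Wind Farm Detections")
kepler_map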
wherobots.inference Python API
If you prefer Python, wherobots.inference offers a module that registers the SQL inference functions as Python functions. Below we run the same inference as before with RS_DETECT_BBOXES.
from wherobots.inference.engine.register import create_object_detection_udfs
from pyspark.sql.functions import col
rs_detect, rs_threshold_geoms = create_object_detection_udfs(batch_size=10, sedona=sedona)
df = df_raster_input.withColumn("detect_result", rs_detect(model_id, col("outdb_raster"))).select(
    "outdb_raster",
    col("detect_result.bboxes_wkt").alias("bboxes_wkt"),
    col("detect_result.confidence_scores").alias("confidence_scores"),
    col("detect_result.labels").alias("labels")
)
df.show()
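create_object_detection_udfs takes a batch_size argument. Our assumption (not confirmed here) is that it controls how many rasters are grouped into a single model call, a common way to keep a GPU busy without exhausting its memory. The sketch below illustrates that grouping in plain Python with a stand-in model; batched and fake_model are hypothetical helpers, not part of wherobots.inference.

```python
# Hypothetical illustration of batching rasters before inference.
# ASSUMPTION: batch_size groups inputs per model call; this is NOT the library's code.
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def fake_model(batch):
    """Stand-in detector: returns one empty detection result per input raster."""
    return [{"bboxes_wkt": [], "confidence_scores": [], "labels": []} for _ in batch]


rasters = [f"scene_{i}.tif" for i in range(23)]  # hypothetical file names
results = []
for batch in batched(rasters, batch_size=10):
    results.extend(fake_model(batch))

print([len(b) for b in batched(rasters, 10)])  # [10, 10, 3]
print(len(results))                            # 23
```

Larger batches generally mean fewer model calls but more memory per call, so the right value depends on the model and the GPU.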