Predicting Bounding Boxes and Segments from an LLM-based Text Query
Learn how to perform text-based object detection on aerial imagery using WherobotsAI's Raster Inference with Segment Anything 2 (SAM2) and Google DeepMind's Open Vocabulary Object Detection (OWLv2).
Follow along as we use the simple query, "airplanes", to identify, segment, and draw bounding boxes around commercial aircraft in satellite imagery of Miami airport.
Before you start
This is a read-only preview of this notebook.
To execute the cells in this Jupyter Notebook, do the following:
- Log in to Wherobots Cloud.
- Start a GPU-Optimized runtime instance.
- Open a notebook. We recommend using a Tiny GPU-Optimized runtime.
- Click File > Open from Path....
- Enter the examples/Analyzing_Data/Raster_Text_To_Segments_Airplanes.ipynb path.
For more information on starting and using notebooks, see the Wherobots Documentation.
Access a GPU-Optimized runtime
This notebook requires a GPU-Optimized runtime. For more information on GPU-Optimized runtimes, see Runtime types.
To access this runtime category, do the following:
- Sign up for a paid Wherobots Organization Edition (Professional or Enterprise).
- Submit a Compute Request for a GPU-Optimized runtime.
Start WherobotsDB
from sedona.spark import SedonaContext
from sedona.raster_utils.SedonaUtils import SedonaUtils
from sedona.maps.SedonaKepler import SedonaKepler
from pyspark.sql.functions import expr
config = (
    SedonaContext.builder()
    .getOrCreate()
)
sedona = SedonaContext.create(config)
Load Aerial Imagery Efficiently
In this step, we'll load the aerial imagery so we can run inference in a later step.
The GeoTIFF image is large, so we'll split it into tiles and load those tiles as out-of-database or "out-db" rasters in WherobotsDB.
url = "s3://wherobots-examples/data/naip/miami-airport.tiff"
tile_size = 256
df = sedona.read.format("raster").option("tileWidth", tile_size).option("tileHeight", tile_size).load(url)
df.createOrReplaceTempView("df")
df.show()
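To get a rough sense of how many tiles a given tile_size produces, the arithmetic can be sketched in plain Python. The image dimensions below are hypothetical (the actual size of miami-airport.tiff is not stated here), tile_grid is an illustrative helper rather than a Wherobots function, and the sketch assumes partial tiles at the image edges are kept, which may not match the raster reader's exact behavior.

```python
import math


def tile_grid(width_px, height_px, tile_size):
    """Return (columns, rows, total) for a square tiling of an image.

    Assumes partial tiles at the right and bottom edges are kept.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    cols = math.ceil(width_px / tile_size)
    rows = math.ceil(height_px / tile_size)
    return cols, rows, cols * rows


# Hypothetical 10,000 x 8,000 pixel image, tiled with the tile_size used above.
cols, rows, total = tile_grid(10_000, 8_000, tile_size=256)
print(f"{cols} x {rows} = {total} tiles")
```

Smaller tiles mean more rows in the DataFrame and more inference calls, while larger tiles mean fewer calls on bigger inputs.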
Viewing the Model's Imagery Inputs
We can see the footprints of the tiled images with the SedonaKepler.create_map() integration. Using SedonaUtils.display_image(), we can view the images as well.
Tip: Save the map to an HTML file using kepler_map.save_to_html()
kepler_map = SedonaKepler.create_map()
df = df.withColumn('footprint', expr("ST_TRANSFORM(RS_CONVEXHULL(rast),'EPSG:4326')"))
SedonaKepler.add_df(kepler_map, df=df, name="Image Footprints")
kepler_map
Run Inference and Visualize Results
To run inference, specify the model to use with model_id. Five models are pre-loaded and made available in Wherobots Cloud to Professional and Enterprise customers. You can also load your own models; see the Wherobots documentation to learn more about that process.
Inference can be run using Wherobots' Spatial SQL functions, in this case: RS_Text_to_Segments().
Here, we generate predictions for all images in the Region of Interest (ROI). In the output, a label value of 1 signifies a positive prediction corresponding to the input text prompt.
Then, we'll filter and print some of the results to see how our positive detection results look.
model_id = "sam2"
prompt = "airplanes"
threshold = 0.5
preds = sedona.sql(
    f"""SELECT rast, RS_TEXT_TO_SEGMENTS('{model_id}', rast, '{prompt}', {threshold}) AS preds from df"""
)
preds.cache().count()
preds.createOrReplaceTempView("preds")
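Because the query above is assembled with an f-string, a prompt containing an apostrophe would break the SQL string. One way to guard against that is sketched below. build_inference_sql is a hypothetical helper, not part of Wherobots or Sedona, and the 0 to 1 threshold range is an assumption based on the confidence scores seen in this notebook.

```python
def build_inference_sql(function_name, model_id, prompt, threshold, table="df"):
    """Build the inference query string, rejecting inputs that would break it.

    Hypothetical helper for illustration; not a Wherobots or Sedona API.
    """
    for name, value in (("model_id", model_id), ("prompt", prompt)):
        # Quotes or backslashes would end or alter the SQL string literal.
        if "'" in value or "\\" in value:
            raise ValueError(f"{name} must not contain quotes or backslashes")
    # Assumption: thresholds are confidence scores between 0 and 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold is assumed to lie between 0 and 1")
    return (
        f"SELECT rast, {function_name}('{model_id}', rast, '{prompt}', {threshold}) "
        f"AS preds FROM {table}"
    )


query = build_inference_sql("RS_Text_to_Segments", "sam2", "airplanes", 0.5)
print(query)
```

The resulting string can be passed to sedona.sql() in the same way as the query in the cell above.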
Prepare Results
Before plotting our predictions, we need to transform our results.
We'll need to transform our table so that each row corresponds to a single predicted segment instead of every prediction for a raster tile.
Segments are polygons that trace the outline of an object of interest.
To do this, combine the list columns containing our prediction results (segments_wkt, confidence_scores, and labels) with arrays_zip. Then, use explode to convert lists to rows.
To map the results with SedonaKepler, the segments_wkt column can be converted to a GeometryType column with ST_GeomFromWKT.
preds_filtered = sedona.sql("""
    SELECT *
    FROM preds
    WHERE
        size(preds.labels) > 0
        AND array_contains(preds.labels, 1)
        AND NOT array_contains(preds.segments_wkt, 'POLYGON EMPTY')
""")
preds_filtered.createOrReplaceTempView("preds_filtered")
preds_filtered.show()
+--------------------+--------------------+
|                rast|               preds|
+--------------------+--------------------+
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
+--------------------+--------------------+
only showing top 20 rows
exploded = sedona.sql("""
    SELECT
        rast,
        exploded_predictions.*
    FROM
        preds_filtered
    LATERAL VIEW explode(arrays_zip(preds.segments_wkt, preds.confidence_scores, preds.labels)) AS exploded_predictions
    WHERE
        exploded_predictions.confidence_scores != 0.0
""")
exploded.cache().count()
exploded.createOrReplaceTempView("exploded")
exploded.show()
+--------------------+--------------------+-----------------+------+
|                rast|        segments_wkt|confidence_scores|labels|
+--------------------+--------------------+-----------------+------+
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.92690223|     1|
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.9711131|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9245136|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9354013|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.81421095|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|        0.9641227|     1|
|OutDbGridCoverage...|POLYGON ((-80.284...|        0.9747455|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9429674|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.84172434|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.96940964|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9551783|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.8909463|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9304122|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.93405795|     1|
|OutDbGridCoverage...|POLYGON ((-80.286...|       0.95852387|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.86694515|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.72578835|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9040828|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9244062|     1|
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.9236686|     1|
+--------------------+--------------------+-----------------+------+
only showing top 20 rows
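The arrays_zip and explode step can be pictured in plain Python: three parallel lists on one row become one flat record per prediction. This is only a conceptual sketch. The row contents are made up, explode_predictions is an illustrative name, and the lists are assumed to have equal length (Python's zip would otherwise stop at the shortest one).

```python
# One hypothetical row of inference output: three parallel lists per raster tile.
row = {
    "rast": "tile_0",
    "segments_wkt": [
        "POLYGON ((0 0, 1 0, 1 1, 0 0))",
        "POLYGON ((2 2, 3 2, 3 3, 2 2))",
        "POLYGON ((4 4, 5 4, 5 5, 4 4))",
    ],
    "confidence_scores": [0.93, 0.0, 0.81],
    "labels": [1, 1, 1],
}


def explode_predictions(row):
    """Yield one flat record per prediction, skipping zero-confidence entries."""
    zipped = zip(row["segments_wkt"], row["confidence_scores"], row["labels"])
    for wkt, score, label in zipped:
        if score != 0.0:
            yield {
                "rast": row["rast"],
                "segments_wkt": wkt,
                "confidence_scores": score,
                "labels": label,
            }


exploded_rows = list(explode_predictions(row))
for record in exploded_rows:
    print(record)
```

Each output record repeats the rast value, which mirrors how the exploded table keeps the raster alongside every individual prediction.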
Viewing Model Results: Airplane Segmentation Predictions
Just like we visualized the footprints of the tiled images earlier, we can also view our prediction geometries! Highlight a prediction to view its confidence score.
kepler_map = SedonaKepler.create_map()
SedonaKepler.add_df(kepler_map, df=exploded, name="Airplane Detections")
kepler_map
To view results on the underlying imagery used by the model, you can use the show_detections function. This function accepts a DataFrame containing an outdb_raster column as well as other arguments to control the plot result. Check out the full docs for the function by calling show_detections?
from wherobots.inference.plot.detections import show_detections
show_detections?
Signature:
show_detections(
    df: pandas.core.frame.DataFrame | pyspark.sql.dataframe.DataFrame,
    geometry_column: str = None,
    geometry_crs: str = 'EPSG:4326',
    confidence_threshold: float = 0.05,
    plot_geoms: bool = True,
    side_by_side: bool = True,
) -> None
Docstring:
Plot raster images with detected object geometries overlaid.

This function handles both Pandas and PySpark DataFrames, automatically
detecting the raster column. It is compatible with dataframes returned from
a SQL inference function. Exploded dataframes are not supported.

Args:
    df: Pandas or PySpark DataFrame containing raster data and detection results
    geometry_column: Column name containing WKT geometries (if None, automatically detected)
    geometry_crs: Coordinate reference system of the geometries (default: EPSG:4326)
    confidence_threshold: Minimum confidence score for displaying detections
    plot_geoms: Whether to overlay geometries on the images
    side_by_side: Whether to show original and detection images side by side

Raises:
    ValueError: If the DataFrame is empty or required columns cannot be found
File: /opt/conda/envs/wherobots/lib/python3.11/site-packages/wherobots/inference/plot/detections.py
Type: function
unpacked_preds_df = sedona.sql("SELECT rast, preds.* FROM preds_filtered")
show_detections(
    unpacked_preds_df,
    confidence_threshold=0.7,
    plot_geoms=True,
    geometry_column="segments_wkt",
)
Running Object Detection with a Text Prompt
We can also get bounding box predictions instead of segments using RS_Text_to_BBoxes. BBoxes, or bounding boxes, are more useful when you are only concerned with counting and localizing objects rather than delineating exact shape and area with RS_Text_to_Segments.
The inference process is largely the same for RS_Text_to_BBoxes and RS_Text_to_Segments.
There are 2 key differences:
- Using the owlv2 model_id instead of sam2.
- Changing our SQL queries to operate on the bboxes_wkt column instead of the segments_wkt column when working with prediction results.
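To make the distinction between a segment and a bounding box concrete, here is a plain-Python sketch that computes the axis-aligned box enclosing a polygon outline and compares the two areas. The coordinates and helper names are made up for illustration, and this only shows the geometry: it is not how OWLv2 produces its boxes.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    if not ring:
        raise ValueError("ring must contain at least one vertex")
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def box_area(box):
    """Area of an axis-aligned box given as (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    return (max_x - min_x) * (max_y - min_y)


def ring_area(ring):
    """Polygon area via the shoelace formula; the ring is implicitly closed."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical plus-shaped outline, a crude stand-in for an airplane segment.
outline = [
    (1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (2, 2),
    (2, 3), (1, 3), (1, 2), (0, 2), (0, 1), (1, 1),
]

box = bounding_box(outline)
print("bounding box:", box)
print("box area:", box_area(box), "segment area:", ring_area(outline))
```

The box locates the object well enough for counting, but it overstates the area the object actually covers, which is why segments are the better fit when shape or area matters.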
model_id = "owlv2"
prompt = "airplanes"
threshold = 0.5
preds = sedona.sql(
    f"""SELECT rast, RS_TEXT_TO_BBoxes('{model_id}', rast, '{prompt}', {threshold}) AS preds from df"""
)
preds.cache().count()
preds.createOrReplaceTempView("preds")
Just like before, we'll filter predictions by labels, remove empty predictions, and show the results in a browsable map and on top of the original imagery for comparison.
preds_filtered = sedona.sql("""
    SELECT *
    FROM preds
    WHERE
        size(preds.labels) > 0
        AND array_contains(preds.labels, 1)
        AND NOT array_contains(preds.bboxes_wkt, 'POLYGON EMPTY')
""")
preds_filtered.createOrReplaceTempView("preds_filtered")
preds_filtered.show()
+--------------------+--------------------+
|                rast|               preds|
+--------------------+--------------------+
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
+--------------------+--------------------+
only showing top 20 rows
exploded = sedona.sql("""
    SELECT
        rast,
        exploded_predictions.*
    FROM
        preds_filtered
    LATERAL VIEW explode(arrays_zip(preds.bboxes_wkt, preds.confidence_scores, preds.labels)) AS exploded_predictions
    WHERE
        exploded_predictions.confidence_scores != 0.0
""")
exploded.cache().count()
exploded.createOrReplaceTempView("exploded")
exploded.show()
+--------------------+--------------------+-----------------+------+
|                rast|          bboxes_wkt|confidence_scores|labels|
+--------------------+--------------------+-----------------+------+
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.5493794|     1|
|OutDbGridCoverage...|POLYGON ((-80.273...|        0.5114912|     1|
|OutDbGridCoverage...|POLYGON ((-80.281...|       0.60383886|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|        0.5903963|     1|
|OutDbGridCoverage...|POLYGON ((-80.273...|        0.5410369|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|       0.62446827|     1|
|OutDbGridCoverage...|POLYGON ((-80.284...|         0.653508|     1|
|OutDbGridCoverage...|POLYGON ((-80.281...|        0.6892973|     1|
|OutDbGridCoverage...|POLYGON ((-80.271...|       0.55916643|     1|
|OutDbGridCoverage...|POLYGON ((-80.270...|        0.5294066|     1|
|OutDbGridCoverage...|POLYGON ((-80.271...|        0.5224226|     1|
|OutDbGridCoverage...|POLYGON ((-80.280...|         0.663254|     1|
|OutDbGridCoverage...|POLYGON ((-80.280...|       0.60684156|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|       0.62298393|     1|
|OutDbGridCoverage...|POLYGON ((-80.286...|       0.66921747|     1|
|OutDbGridCoverage...|POLYGON ((-80.286...|        0.5258749|     1|
|OutDbGridCoverage...|POLYGON ((-80.276...|        0.5431616|     1|
|OutDbGridCoverage...|POLYGON ((-80.277...|        0.6640583|     1|
|OutDbGridCoverage...|POLYGON ((-80.282...|        0.5973092|     1|
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.6232186|     1|
+--------------------+--------------------+-----------------+------+
only showing top 20 rows
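The OWLv2 scores in these 20 sample rows sit much closer to the 0.5 threshold than the SAM2 scores did, so the threshold choice matters more here. The sketch below counts how many of the sample rows would survive a few stricter thresholds. It assumes a detection is kept when its score is at or above the threshold, and it covers only the 20 rows shown, not the full result set.

```python
# Confidence scores copied from the 20 sample rows shown above.
scores = [
    0.5493794, 0.5114912, 0.60383886, 0.5903963, 0.5410369,
    0.62446827, 0.653508, 0.6892973, 0.55916643, 0.5294066,
    0.5224226, 0.663254, 0.60684156, 0.62298393, 0.66921747,
    0.5258749, 0.5431616, 0.6640583, 0.5973092, 0.6232186,
]


def count_at_or_above(scores, threshold):
    """Count scores that meet the threshold (assumed inclusive)."""
    return sum(score >= threshold for score in scores)


counts = {t: count_at_or_above(scores, t) for t in (0.5, 0.55, 0.6, 0.65)}
for threshold, kept in counts.items():
    print(f"threshold {threshold}: {kept} of {len(scores)} detections kept")
```

On this sample, moving the threshold from 0.5 to 0.6 would drop half of the detections, so it is worth checking the plotted results before tightening it.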
kepler_map = SedonaKepler.create_map()
SedonaKepler.add_df(kepler_map, df=exploded, name="Airplane Detections")
kepler_map
unpacked_preds_df = sedona.sql("SELECT rast, preds.* FROM preds_filtered")
We see below that OWLv2 and SAM2 do remarkably well at identifying airplanes with little user effort! Previously, achieving similar results was a significant undertaking. An entire Machine Learning engineering team would have needed to build such a model from scratch.
show_detections(
    unpacked_preds_df,
    confidence_threshold=0.5,
    plot_geoms=True,
    side_by_side=False,
    geometry_column="bboxes_wkt",
)
Next Steps with Raster Inference
With access to general-purpose, text-promptable models, what will you predict and georeference next?
Some ideas for next steps include:
- Predicting different objects next to the airplanes in the image tiles above using new text prompts.
- Adjusting the confidence score threshold for RS_Text_to_Segments or RS_Text_to_BBoxes to see how SAM2 or OWLv2 respond.
- Loading a new imagery dataset with our STAC Reader and trying to predict a different feature of interest, such as agriculture, buildings, or tree crowns.
We're excited to hear about what you're doing with SAM2 and OWLv2!