Predicting Bounding Boxes and Segments from an LLM-based Text Query

Learn how to perform text-based object detection on aerial imagery using WherobotsAI’s Raster Inference with Segment Anything 2 (SAM2) and Google DeepMind’s Open Vocabulary Object Detection (OWLv2). Follow along as we use the simple query “airplanes” to identify, segment, and draw bounding boxes around commercial aircraft in aerial imagery of Miami airport.

Before you start

This is a read-only preview of this notebook. To execute the cells in this Jupyter Notebook, do the following:

Log in to Wherobots Cloud.
Start a GPU-Optimized runtime instance. We recommend using a Tiny GPU-Optimized runtime.
Open a notebook.
Click File > Open from Path….
Enter the examples/Analyzing_Data/Raster_Text_To_Segments_Airplanes.ipynb path.

For more information on starting and using notebooks, see the following Wherobots Documentation:

Access a GPU-Optimized runtime

This notebook requires a GPU-Optimized runtime. For more information on GPU-Optimized runtimes, see Runtime types. To access this runtime category, do the following:

Sign up for a paid Wherobots Organization Edition (Professional or Enterprise).
Submit a Compute Request for a GPU-Optimized runtime.

Start WherobotsDB

from sedona.spark import SedonaContext
from sedona.raster_utils.SedonaUtils import SedonaUtils
from sedona.maps.SedonaKepler import SedonaKepler
from pyspark.sql.functions import expr

config = (
    SedonaContext.builder()
    .getOrCreate()
)

sedona = SedonaContext.create(config)

Load Aerial Imagery Efficiently

In this step, we’ll load the aerial imagery so we can run inference in a later step. The GeoTIFF image is large, so we’ll split it into tiles and load those tiles as out-of-database or “out-db” rasters in WherobotsDB.

url = "s3://wherobots-examples/data/naip/miami-airport.tiff"
tile_size = 256
df = sedona.read.format("raster").option("tileWidth", tile_size).option("tileHeight", tile_size).load(url)
df.createOrReplaceTempView("df")
df.show()
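
To get a feel for how many rows the tiled read produces, the tile count can be estimated from the raster dimensions. The following is a minimal sketch, not part of WherobotsDB: `tile_grid` is an illustrative helper, the image size is hypothetical (the real dimensions of miami-airport.tiff are not stated here), and it assumes that partially covered edge tiles still count as tiles.

```python
import math


def tile_grid(width_px, height_px, tile_size=256):
    """Return (columns, rows, total tiles) for a raster cut into square tiles."""
    # Round up so that partially covered tiles at the right and bottom edges count.
    cols = math.ceil(width_px / tile_size)
    rows = math.ceil(height_px / tile_size)
    return cols, rows, cols * rows


# Hypothetical 5000 x 3000 pixel image; not the real size of the Miami scene.
cols, rows, total = tile_grid(5000, 3000, tile_size=256)
print(cols, rows, total)
```

Smaller tiles mean more rows and more inference calls, while larger tiles mean fewer but heavier calls, so the tile size is a trade-off worth experimenting with.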

Viewing the Model’s Imagery Inputs

We can see the footprints of the tiled images with the SedonaKepler.create_map() integration. Using SedonaUtils.display_image(), we can view the images as well. Tip: Save the map to an HTML file using kepler_map.save_to_html().

kepler_map = SedonaKepler.create_map()
df = df.withColumn('footprint', expr("ST_TRANSFORM(RS_CONVEXHULL(rast),'EPSG:4326')"))
SedonaKepler.add_df(kepler_map, df=df, name="Image Footprints")

kepler_map

htmlDf = sedona.sql(f"""SELECT RS_AsImage(rast, 250) as image FROM df limit 4""")
SedonaUtils.display_image(htmlDf)

Run Inference and Visualize Results

To run inference, specify the model to use with model_id. Five models are pre-loaded and made available in Wherobots Cloud to Professional and Enterprise customers. You can also load your own models; see the Wherobots documentation to learn more about that process. Inference can be run using Wherobots’ Spatial SQL functions, in this case RS_Text_to_Segments(). Here, we generate predictions for all images in the Region of Interest (ROI). In the output, a label value of 1 signifies a positive prediction corresponding to the input text prompt. Then, we’ll filter and print some of the results to see how our positive detection results look.

model_id = "sam2"
prompt = "airplanes"
threshold = 0.5

preds = sedona.sql(
    f"""SELECT rast, RS_TEXT_TO_SEGMENTS('{model_id}', rast, '{prompt}', {threshold}) AS preds from df"""
)
preds.cache().count()
preds.createOrReplaceTempView("preds")
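
Because the query above is assembled with an f-string, a stray quote in the prompt would break the SQL. The sketch below shows one way to build the same query string with basic input checks. `build_text_inference_sql` is a hypothetical helper, not a Wherobots API, and the 0 to 1 threshold range is an assumption based on the confidence scores shown in this notebook.

```python
def build_text_inference_sql(function_name, model_id, prompt, threshold, view="df"):
    """Assemble the inference query string that would be passed to sedona.sql()."""
    # Reject characters that would terminate or escape the SQL string literal.
    for value in (model_id, prompt):
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in {value!r}")
    # Assumption: thresholds are confidence scores between 0 and 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return (
        f"SELECT rast, {function_name}('{model_id}', rast, '{prompt}', {threshold}) "
        f"AS preds FROM {view}"
    )


query = build_text_inference_sql("RS_Text_to_Segments", "sam2", "airplanes", 0.5)
print(query)
```

The resulting string can be passed to sedona.sql() exactly as in the cell above.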

Prepare Results

Before plotting our predictions, we need to transform our results. First, keep only the tiles that have at least one positive, non-empty prediction. Then reshape the table so that each row pairs a raster tile with a single prediction instead of every prediction for that tile. To do this, combine the list columns containing our prediction results (segments_wkt, confidence_scores, and labels) with arrays_zip. Then, use explode to convert lists to rows. The same steps apply later to bounding boxes (or BBoxes), which are essentially boundaries drawn around an object of interest. To map the results with SedonaKepler, convert the segments_wkt column to a GeometryType column with ST_GeomFromWKT.

preds_filtered = sedona.sql(f"""
  SELECT *
  FROM preds
  WHERE
    size(preds.labels) > 0
    AND array_contains(preds.labels, 1)
    AND NOT array_contains(preds.segments_wkt, 'POLYGON EMPTY')
""")
preds_filtered.createOrReplaceTempView("preds_filtered")
preds_filtered.show()

Output:

+--------------------+--------------------+
|                rast|               preds|
+--------------------+--------------------+
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
|OutDbGridCoverage...|{[MULTIPOLYGON ((...|
+--------------------+--------------------+
only showing top 20 rows
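
The WHERE clause above can be read as a per-tile predicate. The following plain-Python sketch mirrors its three conditions on hypothetical sample data. It ignores Spark's NULL handling, and `keep_tile` is an illustrative name only.

```python
def keep_tile(preds):
    """Mirror the SQL filter: non-empty labels, a positive label, no empty polygon."""
    labels = preds["labels"]
    segments = preds["segments_wkt"]
    return (
        len(labels) > 0                      # size(preds.labels) > 0
        and 1 in labels                      # array_contains(preds.labels, 1)
        and "POLYGON EMPTY" not in segments  # NOT array_contains(..., 'POLYGON EMPTY')
    )


# Hypothetical prediction structs for three tiles (WKT strings are shortened).
tiles = [
    {"segments_wkt": ["POLYGON ((0 0, 1 0, 1 1, 0 0))"], "confidence_scores": [0.93], "labels": [1]},
    {"segments_wkt": ["POLYGON EMPTY"], "confidence_scores": [0.0], "labels": [1]},
    {"segments_wkt": [], "confidence_scores": [], "labels": []},
]
kept = [tile for tile in tiles if keep_tile(tile)]
print(len(kept))
```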

exploded = sedona.sql("""
SELECT
    rast,
    exploded_predictions.*
FROM
    preds_filtered
LATERAL VIEW explode(arrays_zip(preds.segments_wkt, preds.confidence_scores, preds.labels)) AS exploded_predictions
WHERE
    exploded_predictions.confidence_scores != 0.0
""")
exploded.cache().count()
exploded.createOrReplaceTempView("exploded")
exploded.show()

Output:

+--------------------+--------------------+-----------------+------+
|                rast|        segments_wkt|confidence_scores|labels|
+--------------------+--------------------+-----------------+------+
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.92690223|     1|
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.9711131|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9245136|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9354013|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.81421095|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|        0.9641227|     1|
|OutDbGridCoverage...|POLYGON ((-80.284...|        0.9747455|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9429674|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.84172434|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.96940964|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9551783|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.8909463|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9304122|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.93405795|     1|
|OutDbGridCoverage...|POLYGON ((-80.286...|       0.95852387|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.86694515|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|       0.72578835|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9040828|     1|
|OutDbGridCoverage...|MULTIPOLYGON (((-...|        0.9244062|     1|
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.9236686|     1|
+--------------------+--------------------+-----------------+------+
only showing top 20 rows
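
For readers less familiar with arrays_zip and explode, the reshaping can be sketched in plain Python: zip the parallel lists into tuples, then emit one row per tuple. This is an illustration on hypothetical data, not the Spark implementation. It assumes the three lists have equal length (Spark's arrays_zip pads shorter arrays with nulls, while Python's zip truncates).

```python
def explode_predictions(rows):
    """Turn one row per tile (list columns) into one row per prediction."""
    exploded = []
    for row in rows:
        preds = row["preds"]
        # zip() plays the role of arrays_zip; the loop plays the role of explode.
        for wkt, score, label in zip(
            preds["segments_wkt"], preds["confidence_scores"], preds["labels"]
        ):
            if score != 0.0:  # same condition as the WHERE clause
                exploded.append(
                    {"rast": row["rast"], "segments_wkt": wkt,
                     "confidence_scores": score, "labels": label}
                )
    return exploded


# Hypothetical input: one tile with two predictions, one of them zero-confidence.
rows = [
    {"rast": "tile_0",
     "preds": {"segments_wkt": ["POLYGON ((0 0, 1 0, 1 1, 0 0))", "POLYGON ((2 2, 3 2, 3 3, 2 2))"],
               "confidence_scores": [0.92, 0.0],
               "labels": [1, 1]}},
]
result = explode_predictions(rows)
print(result)
```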

Viewing Model Results: Airplane Segmentation Predictions

Just like we visualized the footprints of the tiled images earlier, we can also view our prediction geometries! Highlight a prediction to view its confidence score.

kepler_map = SedonaKepler.create_map()
SedonaKepler.add_df(kepler_map, df=exploded, name="Airplane Detections")

kepler_map

To view results on the underlying imagery used by the model, you can use the show_detections function. This function accepts a DataFrame containing an out-db raster column, as well as other arguments that control the plot. To see the full documentation for the function, run show_detections? in a notebook cell.

from wherobots.inference.plot.detections import show_detections
show_detections?

Signature:
show_detections(
    df: pandas.core.frame.DataFrame | pyspark.sql.dataframe.DataFrame,
    geometry_column: str = None,
    geometry_crs: str = 'EPSG:4326',
    confidence_threshold: float = 0.05,
    plot_geoms: bool = True,
    side_by_side: bool = True,
) -> None
Docstring:
Plot raster images with detected object geometries overlaid.

This function handles both Pandas and PySpark DataFrames, automatically detecting the raster column.
It is compatible with dataframes returned from a SQL inference function. Exploded dataframes are not
supported.

Args:
    df: Pandas or PySpark DataFrame containing raster data and detection results
    geometry_column: Column name containing WKT geometries (if None, automatically detected)
    geometry_crs: Coordinate reference system of the geometries (default: EPSG:4326)
    confidence_threshold: Minimum confidence score for displaying detections
    plot_geoms: Whether to overlay geometries on the images
    side_by_side: Whether to show original and detection images side by side

Raises:
    ValueError: If the DataFrame is empty or required columns cannot be found
File:      /opt/conda/envs/wherobots/lib/python3.11/site-packages/wherobots/inference/plot/detections.py
Type:      function

unpacked_preds_df = sedona.sql("SELECT rast, preds.* FROM preds_filtered")

show_detections(
    unpacked_preds_df,
    confidence_threshold=0.7,
    plot_geoms=True,
    geometry_column="segments_wkt",
)

Too many detections to plot (57). Randomly sampling 3 records to plot.

Running Object Detection with a Text Prompt

We can also get bounding box predictions instead of segments using RS_Text_To_BBoxes. Bounding boxes (BBoxes) are more useful when you only need to count and localize objects, whereas RS_Text_To_Segments delineates their exact shape and area. The inference process is largely the same for RS_Text_To_BBoxes and RS_Text_To_Segments. There are two key differences:

Using the owlv2 model_id instead of sam2.
Changing our SQL queries to operate on the bboxes_wkt column instead of the segments_wkt column when working with prediction results.
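
Those two differences can be captured in a small lookup so that the rest of the pipeline stays identical. This is only a sketch: `TASKS` and `explode_sql` are hypothetical names introduced for illustration, while the function names, model ids, and column names come from this notebook.

```python
# What changes between the two workflows, keyed by task.
TASKS = {
    "segments": {"function": "RS_Text_to_Segments", "model_id": "sam2",
                 "geometry_column": "segments_wkt"},
    "bboxes": {"function": "RS_Text_To_BBoxes", "model_id": "owlv2",
               "geometry_column": "bboxes_wkt"},
}


def explode_sql(task, view="preds_filtered"):
    """Build the explode query for a task; only the geometry column differs."""
    column = TASKS[task]["geometry_column"]
    return (
        f"SELECT rast, exploded_predictions.* FROM {view} "
        f"LATERAL VIEW explode(arrays_zip(preds.{column}, "
        f"preds.confidence_scores, preds.labels)) AS exploded_predictions "
        f"WHERE exploded_predictions.confidence_scores != 0.0"
    )


print(explode_sql("bboxes"))
```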

model_id = "owlv2"
prompt = "airplanes"
threshold = 0.5

preds = sedona.sql(
    f"""SELECT rast, RS_TEXT_TO_BBoxes('{model_id}', rast, '{prompt}', {threshold}) AS preds from df"""
)
preds.cache().count()
preds.createOrReplaceTempView("preds")

Just like before, we’ll filter predictions by labels, remove empty predictions, and show the results in a browsable map and on top of the original imagery for comparison.

preds_filtered = sedona.sql(f"""
  SELECT *
  FROM preds
  WHERE
    size(preds.labels) > 0
    AND array_contains(preds.labels, 1)
    AND NOT array_contains(preds.bboxes_wkt, 'POLYGON EMPTY')
""")
preds_filtered.createOrReplaceTempView("preds_filtered")
preds_filtered.show()

Output:

+--------------------+--------------------+
|                rast|               preds|
+--------------------+--------------------+
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
|OutDbGridCoverage...|{[POLYGON ((-80.2...|
+--------------------+--------------------+
only showing top 20 rows

exploded = sedona.sql("""
SELECT
    rast,
    exploded_predictions.*
FROM
    preds_filtered
LATERAL VIEW explode(arrays_zip(preds.bboxes_wkt, preds.confidence_scores, preds.labels)) AS exploded_predictions
WHERE
    exploded_predictions.confidence_scores != 0.0
""")
exploded.cache().count()
exploded.createOrReplaceTempView("exploded")
exploded.show()

Output:

+--------------------+--------------------+-----------------+------+
|                rast|          bboxes_wkt|confidence_scores|labels|
+--------------------+--------------------+-----------------+------+
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.5493794|     1|
|OutDbGridCoverage...|POLYGON ((-80.273...|        0.5114912|     1|
|OutDbGridCoverage...|POLYGON ((-80.281...|       0.60383886|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|        0.5903963|     1|
|OutDbGridCoverage...|POLYGON ((-80.273...|        0.5410369|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|       0.62446827|     1|
|OutDbGridCoverage...|POLYGON ((-80.284...|         0.653508|     1|
|OutDbGridCoverage...|POLYGON ((-80.281...|        0.6892973|     1|
|OutDbGridCoverage...|POLYGON ((-80.271...|       0.55916643|     1|
|OutDbGridCoverage...|POLYGON ((-80.270...|        0.5294066|     1|
|OutDbGridCoverage...|POLYGON ((-80.271...|        0.5224226|     1|
|OutDbGridCoverage...|POLYGON ((-80.280...|         0.663254|     1|
|OutDbGridCoverage...|POLYGON ((-80.280...|       0.60684156|     1|
|OutDbGridCoverage...|POLYGON ((-80.274...|       0.62298393|     1|
|OutDbGridCoverage...|POLYGON ((-80.286...|       0.66921747|     1|
|OutDbGridCoverage...|POLYGON ((-80.286...|        0.5258749|     1|
|OutDbGridCoverage...|POLYGON ((-80.276...|        0.5431616|     1|
|OutDbGridCoverage...|POLYGON ((-80.277...|        0.6640583|     1|
|OutDbGridCoverage...|POLYGON ((-80.282...|        0.5973092|     1|
|OutDbGridCoverage...|POLYGON ((-80.272...|        0.6232186|     1|
+--------------------+--------------------+-----------------+------+
only showing top 20 rows
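
Each bboxes_wkt value is a WKT polygon tracing the box corners. If you only need the box extent outside of Spark, it can be read directly from the text. The sketch below is a deliberately simple regex-based parser run on hypothetical coordinates. It does not handle scientific notation or polygons with holes, and a geometry library would be the more robust choice for real work.

```python
import re

_NUMBER = r"-?\d+(?:\.\d+)?"


def wkt_polygon_bounds(wkt):
    """Return (min_x, min_y, max_x, max_y) for a simple WKT POLYGON string."""
    pairs = re.findall(rf"({_NUMBER})\s+({_NUMBER})", wkt)
    if not wkt.strip().upper().startswith("POLYGON") or not pairs:
        raise ValueError(f"not a non-empty WKT polygon: {wkt!r}")
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical bounding box; coordinates are made up for illustration.
sample = (
    "POLYGON ((-80.2725 25.7950, -80.2720 25.7950, "
    "-80.2720 25.7955, -80.2725 25.7955, -80.2725 25.7950))"
)
print(wkt_polygon_bounds(sample))
```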

kepler_map = SedonaKepler.create_map()
SedonaKepler.add_df(kepler_map, df=exploded, name="Airplane Detections")

kepler_map

unpacked_preds_df = sedona.sql("SELECT rast, preds.* FROM preds_filtered")

We see below that OWLv2 and SAM2 do remarkably well at identifying airplanes with little user effort! Previously, achieving similar results was a significant undertaking. An entire Machine Learning engineering team would have needed to build such a model from scratch.

show_detections(
    unpacked_preds_df,
    confidence_threshold=0.5,
    plot_geoms=True,
    side_by_side=False,
    geometry_column="bboxes_wkt",
)

Too many detections to plot (57). Randomly sampling 3 records to plot.

Next Steps with Raster Inference

With access to general-purpose, text-promptable models, what will you predict and georeference next? Some ideas to try next:

Predicting different objects next to the airplanes in the image tiles above using new text prompts.
Adjusting the confidence score threshold for RS_Text_to_Segments or RS_Text_to_BBoxes to see how SAM2 or OWLv2 respond.
Loading a new imagery dataset with our STAC Reader and trying to predict a different feature of interest, such as agriculture, buildings, or tree crowns.
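
As a starting point for the threshold experiment, you can post-filter scores you already have before re-running inference. The sketch below uses the 20 OWLv2 confidence scores displayed in the exploded output earlier on this page. `count_at_or_above` is an illustrative helper. Note that post-filtering can only raise the threshold; to see detections below 0.5, inference has to be re-run with a lower threshold.

```python
# The 20 OWLv2 confidence scores shown in the exploded output above.
owlv2_scores = [
    0.5493794, 0.5114912, 0.60383886, 0.5903963, 0.5410369,
    0.62446827, 0.653508, 0.6892973, 0.55916643, 0.5294066,
    0.5224226, 0.663254, 0.60684156, 0.62298393, 0.66921747,
    0.5258749, 0.5431616, 0.6640583, 0.5973092, 0.6232186,
]


def count_at_or_above(scores, threshold):
    """Count detections whose confidence meets the threshold."""
    return sum(1 for score in scores if score >= threshold)


for threshold in (0.5, 0.6, 0.65, 0.7):
    print(threshold, count_at_or_above(owlv2_scores, threshold))
```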

We’re excited to hear about what you’re doing with SAM2 and OWLv2!
