> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wherobots.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Bring Your Own Rasters (BYOR) with RasterFlow

<Badge color="purple">Private Preview</Badge>

<Tip>
  The following content is a read-only preview of an executable Jupyter notebook.

  To run this notebook interactively:

  1. Go to [**Wherobots Cloud**](https://cloud.wherobots.com).
  2. Start a runtime.
  3. Open the notebook.
  4. In the Jupyter Launcher:
     1. Click **File > Open Path**.
     2. Paste the following path to access this notebook: `examples/Analyzing_Data/RasterFlow_Bring_Your_Own_Rasters_NAIP.ipynb`
     3. Click **Enter**.
</Tip>

This notebook demonstrates how to bring your own rasters (BYOR) into RasterFlow by querying a STAC catalog and creating a GTI (GDAL Raster Tile Index). You will learn how to query the NAIP collection, extract Cloud-Optimized GeoTIFF (COG) URLs, and build a mosaic using RasterFlow's `build_gti_mosaic` function.

## What you will learn

This notebook will teach you to:

* Query a STAC catalog to discover available imagery for an area of interest
* Create a GTI (GDAL Raster Tile Index) from STAC items
* Build a seamless mosaic from multiple image tiles using RasterFlow's `build_gti_mosaic`
* Visualize the resulting mosaic and inference outputs
* (Optional) Run road detection using the ChesapeakeRSC model

### NAIP (National Agriculture Imagery Program)

The [National Agriculture Imagery Program (NAIP)](https://www.usgs.gov/centers/eros/science/usgs-eros-archive-aerial-photography-national-agriculture-imagery-program-naip) acquires aerial imagery during the agricultural growing seasons in the continental United States.

Key characteristics:

* **Resolution**: 1 meter ground sample distance
* **Bands**: 4-band imagery (Red, Green, Blue, NIR) in a single COG file
* **Coverage**: Continental United States
* **Update cycle**: Typically every 2-3 years per state

We will use the Element84 Earth Search STAC catalog to query NAIP imagery for Montgomery County, Maryland.

## Setup and Imports

```python theme={"system"}
import os

import geopandas as gpd
import wkls
from ipyleaflet import basemaps
from leafmap import Map
```

## Selecting an Area of Interest (AOI)

We will use Montgomery County, Maryland as our AOI. This county is well-covered by NAIP imagery and provides a good example for demonstrating the BYOR workflow.

```python theme={"system"}
# Generate a geometry for Montgomery County, MD using Well-Known Locations (https://github.com/wherobots/wkls)
gdf = gpd.read_file(wkls['us']['md']['montgomery county'].geojson())

# Save the geometry to a parquet file in the user's S3 path
aoi_path = os.getenv("USER_S3_PATH") + "montgomery_county_md.parquet"
gdf.to_parquet(aoi_path)

print(f"AOI saved to: {aoi_path}")
print(f"Bounding box: {gdf.total_bounds}")
print(f"CRS: {gdf.crs}")

# Display the AOI boundary on an interactive map with basemap
minx, miny, maxx, maxy = gdf.total_bounds
center_lat = (miny + maxy) / 2
center_lon = (minx + maxx) / 2

m = Map(center=(center_lat, center_lon), zoom=10, basemap=basemaps.Esri.WorldImagery)
m.add_gdf(
    gdf, 
    layer_name="Montgomery County, MD", 
    zoom_to_layer=True,
    style={"color": "red", "fillColor": "lightblue", "fillOpacity": 0.3, "weight": 2}
)
m
```

## Loading the NAIP STAC collection

We use Wherobots' built-in STAC reader to load the NAIP collection from Element84's Earth Search catalog. This catalog provides direct S3 URLs to the imagery files.

```python theme={"system"}
from pyspark.sql import functions as F
from sedona.spark import *

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
```

```python theme={"system"}
# Get the actual polygon geometry as WKT for spatial filter
aoi_wkt = gdf.geometry.iloc[0].wkt

print(f"AOI geometry type: {gdf.geometry.iloc[0].geom_type}")

# Load NAIP collection from Element84 Earth Search
naip_df = (
    sedona.read.format("stac")
    .option("itemsLimitMax", "200")
    .option("itemsLimitPerRequest", "50")
    .load("https://earth-search.aws.element84.com/v1/collections/naip")
)

naip_df.createOrReplaceTempView("naip_raw")

print(f"\nLoaded NAIP STAC collection")
print(f"Schema:")
naip_df.printSchema()
```

## Applying spatial and temporal filters

We filter the NAIP collection to only include tiles that:

* Intersect with our AOI (Montgomery County)
* Were captured in 2017 (the most recent year with 1-meter resolution NAIP imagery available for Maryland so we can test the ChesapeakeRSC model)

```python theme={"system"}
# Filter by:
# - Date range: 2017 (full year)
# - Spatial: Intersects with Montgomery County polygon boundary

filtered_df = sedona.sql(f"""
    SELECT 
        id,
        geometry,
        datetime,
        assets
    FROM naip_raw
    WHERE datetime BETWEEN '2017-01-01' AND '2017-12-31'
      AND ST_Intersects(geometry, ST_GeomFromText('{aoi_wkt}'))
""")

filtered_df.createOrReplaceTempView("naip_filtered")

tile_count = filtered_df.count()
print(f"Found {tile_count} NAIP tiles for Montgomery County, MD in 2017")
filtered_df.select("id", "datetime").show(10, truncate=False)
```

## Exploring the STAC asset structure

Before creating the GTI, we need to understand the structure of the STAC assets. NAIP stores all 4 bands (RGBIR) in a single COG file, so we only need one row per tile in our GTI.

```python theme={"system"}
# Examine the assets structure
# NAIP has a single 'image' asset containing all 4 bands (RGBIR)

sample_df = sedona.sql("SELECT id, assets FROM naip_filtered LIMIT 1")
sample = sample_df.first()

if sample:
    print(f"Tile ID: {sample['id']}")
    print(f"\nAvailable asset keys:")
    for key in sorted(sample['assets'].keys()):
        print(f"  - {key}")
    
    # Check structure of the 'image' asset (4-band COG)
    print(f"\n--- Structure of 'image' asset ---")
    image_asset = sample['assets'].get('image')
    if image_asset:
        print(f"Type: {type(image_asset)}")
        print(f"Content: {image_asset}")
else:
    print("No samples found - check filters")
```

```python theme={"system"}
# Show the S3 URL for the 4-band COG

sample_df = sedona.sql("""
    SELECT 
        id,
        assets['image'].href as image_url
    FROM naip_filtered 
    LIMIT 5
""")

sample_df.show(truncate=False)
```

## Creating the GTI (GDAL Raster Tile Index)

The GTI is a spatial index that maps tile geometries to their corresponding COG URLs.  For more information on GTI, see [the reference documentation](https://gdal.org/en/stable/drivers/raster/gti.html).

Since NAIP has a single 4-band COG per tile, we create one row per tile.

```python theme={"system"}
# Create the GTI
# NAIP has a single 4-band COG per tile, so we just need one row per tile

gti_df = sedona.sql("""
    SELECT 
        id as tile_id,
        ST_AsText(geometry) as geometry_wkt,
        geometry,
        datetime,
        assets['image'].href as url
    FROM naip_filtered
    WHERE assets['image'] IS NOT NULL
""")

# Filter out any rows where URL is null
gti_df = gti_df.filter(F.col("url").isNotNull())

gti_df.createOrReplaceTempView("gti")

row_count = gti_df.count()
print(f"GTI has {row_count} rows (one row per NAIP tile)")
gti_df.select("tile_id", "datetime", "url").show(10, truncate=80)
```

## Saving the GTI as GeoParquet

We convert the Spark DataFrame to a GeoDataFrame and save it as GeoParquet for use with RasterFlow.

```python theme={"system"}
from shapely import wkt

# Select columns for the GTI (excluding the Sedona geometry, using WKT instead)
gti_export_df = gti_df.select(
    "tile_id",
    "geometry_wkt",
    "datetime",
    "url"
)

# Convert Spark DataFrame to Pandas
gti_pandas = gti_export_df.toPandas()

# Convert WKT geometry to Shapely geometry and create GeoDataFrame
gti_pandas['geometry'] = gti_pandas['geometry_wkt'].apply(wkt.loads)
gti_gdf = gpd.GeoDataFrame(gti_pandas, geometry='geometry', crs="EPSG:4326")

# Drop the WKT column (no longer needed)
gti_gdf = gti_gdf.drop(columns=['geometry_wkt'])

# Save to S3 as GeoParquet
gti_path = os.getenv("USER_S3_PATH") + "naip_gti.parquet"
gti_gdf.to_parquet(gti_path)

print(f"GTI saved to: {gti_path}")
print(f"\nGTI Schema:")
print(gti_gdf.dtypes)
print(f"\nSample rows:")
gti_gdf.head()
```

## Verifying the GTI

Before building the mosaic, we verify that the GTI was saved correctly and visualize the tile footprints.

```python theme={"system"}
# Reload and verify the GTI
gti_verify = gpd.read_parquet(gti_path)

print(f"GTI contains {len(gti_verify)} rows")
print(f"Unique tiles: {gti_verify['tile_id'].nunique()}")
print(f"Date range: {gti_verify['datetime'].min()} to {gti_verify['datetime'].max()}")

print(f"\nSample URLs:")
for i, row in gti_verify.head(3).iterrows():
    print(f"  {row['tile_id']}: {row['url'][:70]}...")
```

```python theme={"system"}
# Visualize the GTI footprints with the AOI boundary on an interactive map

# Calculate center from the AOI
minx, miny, maxx, maxy = gdf.total_bounds
center_lat = (miny + maxy) / 2
center_lon = (minx + maxx) / 2

m = Map(center=(center_lat, center_lon), zoom=10, basemap=basemaps.Esri.WorldImagery)

# Add NAIP tile footprints
m.add_gdf(
    gti_verify, 
    layer_name="NAIP Tile Footprints", 
    style={"color": "blue", "fillColor": "lightblue", "fillOpacity": 0.3, "weight": 1}
)

# Add AOI boundary on top
m.add_gdf(
    gdf, 
    layer_name="Montgomery County, MD", 
    zoom_to_layer=True,
    style={"color": "red", "fillColor": "none", "fillOpacity": 0, "weight": 3}
)

print(f"Showing {len(gti_verify)} NAIP tile footprints over Montgomery County, MD")
m
```

## Building a mosaic with RasterFlow

Now we use RasterFlow's `build_gti_mosaic` function to create a seamless 4-band mosaic from the NAIP tiles.

> **Note:** The `naip-analytic` S3 bucket is requester-pays, so we set `requester_pays=True`.

**Expected runtime: \~5-10 minutes** depending on the number of tiles.

```python theme={"system"}
from rasterflow_remote import RasterflowClient

client = RasterflowClient()

# Build a 4-band mosaic (Red, Green, Blue, NIR)
mosaic_index = client.build_gti_mosaic(
    # Path to the GTI GeoParquet file
    gti=os.getenv("USER_S3_PATH") + "naip_gti.parquet",
    
    # Path to the AOI GeoParquet file
    aoi=os.getenv("USER_S3_PATH") + "montgomery_county_md.parquet",
    
    # Output bands (NAIP has 4 bands: r, g, b, ir)
    # Note: We use 'ir' to match the ChesapeakeRSC model's expected input bands
    bands=["r", "g", "b", "ir"],
    
    # Column in the GTI containing the COG URLs
    location_field="url",
    # Group tiles for time dimension by any column in the gti index
    time_column="datetime",
    
    # Output CRS (UTM Zone 18N for Maryland)
    crs_epsg=32618,
    
    # NAIP bucket is requester-pays
    requester_pays=True,
    
    # NAIP uses 0 as nodata value
    nodata=0.0,
)

# This example AOI has one geometry, so we select the first (only) mosaic location.
mosaic_store = mosaic_index.first_row_mosaic

print(f"Mosaic index URI: {mosaic_index.uri}")
print(f"Mosaic store URI: {mosaic_store}")
```

## Visualizing outputs

If RasterFlow is enabled for your organization, you can visualize the mosaic and inference outputs from this notebook using [cloud.wherobots.com/map](https://cloud.wherobots.com/map). Zarr, GeoParquet, and other geospatial data formats are supported, so you can inspect either the mosaic output or the road-probability output from the preceding workflow steps.

## (Optional) Running inference with ChesapeakeRSC

Now that we have built the NAIP mosaic, we can optionally run a semantic segmentation model to detect roads. The [ChesapeakeRSC](https://github.com/isaaccorley/ChesapeakeRSC) model was trained on NAIP imagery from Maryland and can detect roads even when occluded by tree cover.

The model outputs two classes:

* `background`
* `road`

**Expected runtime: \~10-20 minutes** depending on the mosaic size.

```python theme={"system"}
# # Uncomment to run road detection on the NAIP mosaic
# 
# from rasterflow_remote import MosaicToMosaicActorEnum
# from rasterflow_remote.data_models import MergeModeEnum
# 
# # ChesapeakeRSC model - direct URL (no HuggingFace auth required)
# MODEL_PATH = "https://huggingface.co/wherobots/chesapeakersc-pt2/resolve/main/chesapeakersc-ep.pt2"
# 
# prediction_store = client.predict_mosaic(
#     store=mosaic_store,
#     model_path=MODEL_PATH,
#     patch_size=512,
#     clip_size=64,
#     device="cuda",
#     features=["r", "g", "b", "ir"],  # Must match the bands in our mosaic
#     labels=["background", "road"],
#     actor=MosaicToMosaicActorEnum.SEMANTIC_SEGMENTATION_PYTORCH,
#     max_batch_size=256,
#     merge_mode=MergeModeEnum.WEIGHTED_AVERAGE,
# )
# 
# print(f"Prediction index URI: {prediction_store.uri}")
```

You can also open the road-probability output from the preceding inference workflow using [cloud.wherobots.com/map](https://cloud.wherobots.com/map) if RasterFlow is enabled for your organization.
