> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wherobots.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Detecting roads with RasterFlow

<Badge color="purple">Private Preview</Badge>

<Tip>
  The following content is a read-only preview of an executable Jupyter notebook.

  To run this notebook interactively:

  1. Go to the [**Wherobots Model Hub**](https://www.wherobots.com/model-hub).
  2. Select the specific notebook you wish to run.
  3. Click **Run Model in Notebook**.
</Tip>

<img src="https://mintcdn.com/wherobots/ymWKde3Wg4LC2o0o/tutorials/example-notebooks/images/rasterflow-chesapeake-chesapeakersc-banner.jpg?fit=max&auto=format&n=ymWKde3Wg4LC2o0o&q=85&s=c15b1163224ffd6807d9914880aa8395" alt="chesapeakersc banner" width="1292" height="231" data-path="tutorials/example-notebooks/images/rasterflow-chesapeake-chesapeakersc-banner.jpg" />

This notebook will guide you through detecting roads from aerial imagery, using Wherobots RasterFlow and the ChesapeakeRSC model. You will gain a hands-on understanding of how to run models like ChesapeakeRSC on your selected area of interest, vectorize the model outputs, and work with those vectors in WherobotsDB.

### ChesapeakeRSC

The [ChesapeakeRSC](https://github.com/isaaccorley/ChesapeakeRSC)<sup>1</sup> model is an open source segmentation model that can detect roads from high resolution imagery, even when those roads may be occluded by tree cover.

This model predicts 2 classes:

* background
* road

It was trained on high resolution imagery from the [National Agriculture Imagery Program (NAIP)](https://www.usgs.gov/centers/eros/science/usgs-eros-archive-aerial-photography-national-agriculture-imagery-program-naip) and labeled land cover data for the state of Maryland in the United States.

## Selecting an Area of Interest (AOI)

We will choose an Area of Interest (AOI) for our analysis. ChesapeakeRSC was trained on select geographies in the Northeastern United States.

We will try it out in Montgomery County, Maryland.

```python theme={"system"}
import wkls
import geopandas as gpd
import os

# Generate a geometry for Montgomery County, Maryland using Well-Known Locations (https://github.com/wherobots/wkls)
gdf = gpd.read_file(wkls['us']['md']['Montgomery County'].geojson())

# Save the geometry to a parquet file in the user's S3 path
aoi_path = os.getenv("USER_S3_PATH") + "montgomery.parquet"
gdf.to_parquet(aoi_path)
```

## Initializing the RasterFlow client

```python theme={"system"}
from datetime import datetime

from rasterflow_remote import RasterflowClient

from rasterflow_remote.data_models import (
    ModelRecipes, 
    VectorizeMethodEnum
)

rf_client = RasterflowClient()
```

## Running a model

RasterFlow has pre-defined workflows to simplify orchestration of the processing steps for model inference.  These steps include:

* Ingesting imagery for the specified Area of Interest (AOI)
* Generating a seamless image from multiple image tiles (a mosaic)
* Running inference with the selected model

The output is a Zarr store of the model outputs.

Note: This step will take approximately 30 minutes to complete.

```python theme={"system"}
model_output_index = rf_client.build_and_predict_mosaic_recipe(
    # Path to our AOI in GeoParquet or GeoJSON format
    aoi = aoi_path,

    # Date range for imagery to be used by the model
    start = datetime(2017, 1, 1),
    end = datetime(2018, 1, 1),

    # Coordinate Reference System EPSG code for the output mosaic   
    crs_epsg = 3857,

    # The model recipe to be used for inference (ChesapeakeRSC in this case)
    model_recipe = ModelRecipes.CHESAPEAKE_RSC
)

model_output_index.mosaics
```

```python theme={"system"}
# This example AOI has one geometry, so we select the first (only) mosaic location.
model_output_store = model_output_index.first_row_mosaic

model_output_store
```

## Visualizing outputs

If RasterFlow is enabled for your organization, you can visualize the Zarr, GeoParquet, and other geospatial outputs using [cloud.wherobots.com/map](https://cloud.wherobots.com/map).

## Vectorize the raster model outputs

The output for the ChesapeakeRSC model is a raster with two classes: background and road.

We can run a seperate flow to convert the roads into vector geometries, based on the confidence threshold.  Converting these results to geometries allows us to more easily post process the results or join the resuilts with other vector data.

```python theme={"system"}
import xarray as xr
import s3fs
import zarr
# Determine the classes that are predicted by the model
fs = s3fs.S3FileSystem(profile="default", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path=model_output_store[5:])
ds = xr.open_zarr(zstore)
model_features = ds['band'].data.tolist()

# Remove the 'background' class for vectorization
vector_features = [f for f in model_features if f != 'background']  
```

```python theme={"system"}
# Note: this should take about 5 minutes to complete
vectorized_results = rf_client.vectorize_mosaic(
        store = model_output_store,
        features = vector_features,
        threshold = 0.5,
        vectorize_method = VectorizeMethodEnum.SEMANTIC_SEGMENTATION_RASTERIO,
        vectorize_config={"stats": True, "medial_skeletonize": False}
    )

print(vectorized_results)
```

## Save the vectorized results to the catalog

We can store these vectorized outputs in the catalog by using WherobotsDB to persist the GeoParquet results.

```python theme={"system"}
from sedona.spark import *

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
```

```python theme={"system"}
sedona.sql("CREATE DATABASE IF NOT EXISTS org_catalog.chesapeake_db")

df = sedona.read.format("geoparquet").load(vectorized_results.uri)
df = df.withColumnRenamed("label", "layer")
df.writeTo("org_catalog.chesapeake_db.chesapeake_vectorized").createOrReplace()
```

## Visualize the vectorized results

To visualize the vectorized results, we will choose a small subset of the outputs that intersect with a bounding box around the center of the AOI.  We will also filter out results with a score lower than 0.7.

```python theme={"system"}
from sedona.spark.sql.st_constructors import ST_MakeEnvelope
from sedona.spark.sql.st_predicates import ST_Intersects
import pyspark.sql.functions as f

# Compute a small bounding box around the center of the AOI
minx, miny, maxx, maxy = gdf.total_bounds
center_lat = (miny + maxy) / 2
center_lon = (minx + maxx) / 2
delta = 0.01

min_lon = center_lon - delta
max_lon = center_lon + delta
min_lat = center_lat - delta
max_lat = center_lat + delta

df = df.filter(
    ST_Intersects(f.col("geometry"), ST_MakeEnvelope(min_lon, min_lat, max_lon, max_lat))
).filter("score_mean > 0.7")
```

```python theme={"system"}
from sedona.spark.maps.SedonaKepler import SedonaKepler
map = SedonaKepler.create_map(df=df, name="Vectorized results")
map
```

## Generate PM Tiles for visualization

To improve visualization performance of a large number of geometries, we can use Wherobots built-in high performance PM tile generator.

```python theme={"system"}
from wherobots import vtiles

full_tiles_path = os.getenv("USER_S3_PATH") + "tiles.pmtiles"
vtiles.generate_pmtiles(df, full_tiles_path)
```

```python theme={"system"}
vtiles.show_pmtiles(full_tiles_path)
```

```python theme={"system"}
full_tiles_path
```

### References

1. **Robinson, C., Corley, I., Ortiz, A., Dodhia, R., Lavista Ferres, J. M., & Najafirad, P. (2024).** Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery. *arXiv preprint arXiv:2401.06762*. [https://arxiv.org/abs/2401.06762](https://arxiv.org/abs/2401.06762)
