Private Preview
The following content is a read-only preview of an executable Jupyter notebook. To run this notebook interactively:
- Go to the Wherobots Model Hub.
- Select the specific notebook you wish to run.
- Click Run Model in Notebook.
This notebook will guide you through detecting sidewalks from aerial imagery, using Wherobots RasterFlow and the Tile2Net model. You will gain a hands-on understanding of how to run models like Tile2Net on your selected area of interest, vectorize the model outputs, and work with those vectors in WherobotsDB.
Tile2Net
The Tile2Net [1] model is an open-source segmentation model that can detect sidewalks and other pathways in very high resolution imagery.
This model predicts 4 classes:
- background
- road
- sidewalk
- crosswalk
It was trained on ~19cm aerial imagery from USGS for Cambridge, Manhattan, Brooklyn, and Washington DC. We will demonstrate results using 30cm resolution data from the National Agriculture Imagery Program (NAIP).
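As a toy illustration, the four classes can be thought of as a band-index-to-name mapping. Note that the index order below is an assumption for illustration only; the authoritative band names come from the `band` coordinate of the model's Zarr output, which we read later in this notebook.

```python
# Hypothetical mapping of Tile2Net output bands to class names.
# The index order is assumed for illustration; the real names come
# from the Zarr output's 'band' coordinate.
TILE2NET_CLASSES = {
    0: "background",
    1: "road",
    2: "sidewalk",
    3: "crosswalk",
}

# The classes most relevant to pedestrian-network mapping
PEDESTRIAN_CLASSES = [name for name in TILE2NET_CLASSES.values()
                      if name in ("sidewalk", "crosswalk")]
print(PEDESTRIAN_CLASSES)  # ['sidewalk', 'crosswalk']
```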
Selecting an Area of Interest (AOI)
To start, we will choose an Area of Interest (AOI) for our analysis. Tile2Net was trained on select geographies in the Northeastern United States, so we will pick a location in that region that was not in the training data and where there is recent 30cm resolution NAIP data: College Park, Maryland.
The National Agriculture Imagery Program (NAIP) provides aerial imagery for the United States, capturing high-resolution images during the agricultural growing seasons. The 2023 NAIP dataset offers 30cm resolution imagery in certain regions; see this map for more details.
import wkls
import geopandas as gpd
import os
# Generate a geometry for College Park, Maryland using Well-Known Locations (https://github.com/wherobots/wkls)
# NOTE: later in the notebook we fix an area to inspect assuming you ran on College Park. If you change the location here
# you will need to update the visualization area in "Visualize a subset of the model outputs"
gdf = gpd.read_file(wkls.us.md.collegepark.geojson())
# Save the geometry to a parquet file in the user's S3 path
aoi_path = os.getenv("USER_S3_PATH") + "collegepark.parquet"
gdf.to_parquet(aoi_path)
Initializing the RasterFlow client
from datetime import datetime
from rasterflow_remote import RasterflowClient
from rasterflow_remote.data_models import (
ModelRecipes,
VectorizeMethodEnum
)
rf_client = RasterflowClient()
Running a model
RasterFlow has pre-defined workflows to simplify orchestration of the processing steps for model inference. These steps include:
- Ingesting imagery for the specified Area of Interest (AOI)
- Generating a seamless image from multiple image tiles (a mosaic)
- Running inference with the selected model
The output is a Zarr store of the model outputs.
Note: This step will take approximately 30 minutes to complete.
model_output_index = rf_client.build_and_predict_mosaic_recipe(
# Path to our AOI in GeoParquet or GeoJSON format
aoi = aoi_path,
# Date range for imagery to be used by the model
start = datetime(2023, 1, 1),
end = datetime(2024, 1, 1),
# Coordinate Reference System EPSG code for the output mosaic
crs_epsg = 3857,
# The model recipe to be used for inference (Tile2Net in this case)
model_recipe = ModelRecipes.TILE_2_NET,
)
model_output_index.mosaics
# This example AOI has one geometry, so we select the first (only) mosaic location.
model_output_store = model_output_index.first_row_mosaic
model_output_store
Visualizing outputs
If RasterFlow is enabled for your organization, you can visualize the Zarr, GeoParquet, and other geospatial outputs using cloud.wherobots.com/map.
Vectorize the raster model outputs
The output of the Tile2Net model is a raster with four classes: background, road, sidewalk, and crosswalk.
We can run a separate flow to convert the roads, sidewalks, and crosswalks into vector geometries based on a confidence threshold. Converting these results to geometries makes it easier to post-process them or join them with other vector data.
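Conceptually, vectorization starts by thresholding each class's confidence raster into a binary mask and then tracing polygon boundaries around the connected regions. Below is a minimal numpy sketch of the thresholding step only, on made-up toy data; the actual mask-to-polygon tracing is handled by rasterio inside RasterFlow, and whether the comparison is strict is an implementation detail (>= is assumed here).

```python
import numpy as np

# Toy 4x4 confidence raster for the "sidewalk" class (values are made up)
confidence = np.array([
    [0.01, 0.02, 0.60, 0.70],
    [0.00, 0.03, 0.55, 0.80],
    [0.02, 0.01, 0.04, 0.10],
    [0.00, 0.00, 0.02, 0.03],
])

# Pixels at or above the threshold join the binary mask, mirroring
# the threshold=0.05 passed to vectorize_mosaic below
threshold = 0.05
mask = confidence >= threshold

print(int(mask.sum()))  # number of pixels that would be vectorized: 5
```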
import xarray as xr
import s3fs
import zarr
# Determine the classes that are predicted by the model
fs = s3fs.S3FileSystem(profile="default", asynchronous=True)
# Strip the "s3://" scheme prefix (5 characters) from the store URI
zstore = zarr.storage.FsspecStore(fs, path=model_output_store[5:])
ds = xr.open_zarr(zstore)
model_features = ds['band'].data.tolist()
# Only vectorize the 'sidewalk' and 'crosswalk' classes
relevant_features = ['sidewalk', 'crosswalk']
vector_features = [f for f in model_features if f in relevant_features]
# Note: this should take about 5 minutes to complete
vectorized_results = rf_client.vectorize_mosaic(
store = model_output_store,
features = vector_features,
threshold = 0.05,
vectorize_method = VectorizeMethodEnum.SEMANTIC_SEGMENTATION_RASTERIO,
vectorize_config={"stats": True, "medial_skeletonize": False}
)
print(vectorized_results)
Save the vectorized results to the catalog
We can store these vectorized outputs in the catalog by using WherobotsDB to persist the GeoParquet results.
from sedona.spark import *
import pyspark.sql.functions as f
from pyspark.sql.functions import expr
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
sedona.sql("CREATE DATABASE IF NOT EXISTS org_catalog.tile2net_db")
df = sedona.read.format("geoparquet").load(vectorized_results.uri)
df = df.withColumnRenamed("label", "layer")
df.writeTo("org_catalog.tile2net_db.tile2net_vectorized").createOrReplace()
Visualize the vectorized results
To visualize the vectorized results, we will filter out results with a score lower than 0.2. This threshold was chosen empirically to strike a balance: it eliminates obvious noise without being so aggressive that it filters out relevant results.
Gaps in the predictions are likely due to Tile2Net being run on lower resolution imagery (30cm) than it was trained on (19cm), as well as changes in geographic context and occlusion of pathways by tree cover.
df = df.filter("score_mean > 0.2")
df_filtered = df.withColumn(
"area_m2",
expr("ST_AreaSpheroid(geometry)")
).filter("area_m2 > 1000")
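The combined effect of the score and area filters above can be sketched on toy records in plain Python (the values below are made up; in the notebook the filtering runs as Spark SQL expressions over the vectorized GeoParquet results):

```python
# Toy stand-ins for vectorized polygon records (hypothetical values)
records = [
    {"layer": "sidewalk",  "score_mean": 0.85, "area_m2": 2500.0},
    {"layer": "sidewalk",  "score_mean": 0.15, "area_m2": 3000.0},  # low score
    {"layer": "crosswalk", "score_mean": 0.40, "area_m2": 120.0},   # too small
    {"layer": "crosswalk", "score_mean": 0.65, "area_m2": 1800.0},
]

# Keep confident predictions (score_mean > 0.2) covering more than
# 1000 m^2, mirroring the two Spark filters applied above
kept = [r for r in records if r["score_mean"] > 0.2 and r["area_m2"] > 1000]
print(len(kept))  # 2 records survive both filters
```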
from sedona.spark.maps.SedonaKepler import SedonaKepler
map = SedonaKepler.create_map(df=df_filtered, name="Vectorized results")
map
Generate PMTiles for visualization
To improve visualization performance for a large number of geometries, we can use Wherobots' built-in high-performance PMTiles generator.
from wherobots import vtiles
full_tiles_path = os.getenv("USER_S3_PATH") + "tiles.pmtiles"
vtiles.generate_pmtiles(df_filtered, full_tiles_path)
vtiles.show_pmtiles(full_tiles_path)
References
- Hosseini, M., Sevtsuk, A., Miranda, F., Cesar Jr, R. M., & Silva, C. T. (2023). Mapping the walk: A scalable computer vision approach for generating sidewalk network datasets from aerial imagery. Computers, Environment and Urban Systems, 101, 101950.