> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wherobots.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Change Detection with RasterFlow

<Badge color="purple">Private Preview</Badge>

<Tip>
  The following content is a read-only preview of an executable Jupyter notebook.

  To run this notebook interactively:

  1. Go to [**Wherobots Cloud**](https://cloud.wherobots.com).
  2. Start a runtime.
  3. Open the notebook.
  4. In the Jupyter Launcher:
     1. Click **File > Open Path**.
     2. Paste the following path to access this notebook: `examples/Analyzing_Data/RasterFlow_ChangeDetection.ipynb`
     3. Click **Enter**.
</Tip>

This notebook demonstrates RasterFlow's change detection inference workflow using the [Fields of the World (FTW)](https://fieldsofthe.world/)<sup>1</sup> model as an example.

**Related notebooks:**

* [RasterFlow\_FTW.ipynb](/tutorials/example-notebooks/rasterflow-ftw) — Complete FTW field boundary detection tutorial
* [RasterFlow\_Bring\_Your\_Own\_Model.ipynb](/tutorials/example-notebooks/rasterflow-bring-your-own-model) — Export custom PyTorch models

### Why Fields of the World (FTW)?

FTW predicts 3 classes (`non_field_background`, `field`, `field_boundaries`). While FTW is a field boundary segmentation model (not a true change detection model), it takes bi-temporal input (two seasons), making it useful for demonstrating change detection.

### Model signature

Change detection models receive multi-temporal imagery stacked along the channel dimension:

|            | Shape                                                                                   |
| ---------- | --------------------------------------------------------------------------------------- |
| **Input**  | `(Batch, Channels × TimeSteps, H, W)` — e.g. `(N, 8, 256, 256)` for 4 bands × 2 seasons |
| **Output** | `(Batch, Classes, H, W)` — e.g. `(N, 3, 256, 256)` for 3 classes                        |

```python theme={"system"}
class ChangeDetectionModel(nn.Module):
    def forward(self, x):  # x: (N, Channels*TimeSteps, H, W)
        return logits      # (N, Classes, H, W)
```

We'll use a particular **Actor** to run change detection inference using Rasterflow: `MosaicToMosaicActorEnum.SEMANTIC_SEGMENTATION_CHANGE_DETECTION_PYTORCH`.

## Selecting an Area of Interest (AOI)

To start, we will choose an Area of Interest (AOI) for our analysis. The area around Haskell County, Kansas has some interesting crop field patterns so we will try out the model there.

```python theme={"system"}
import wkls
import geopandas as gpd
import os

# Generate a geometry for Haskell County, Kansas using WKLS (https://github.com/wherobots/wkls)
gdf = gpd.read_file(wkls.us.ks.haskellcounty.geojson())

# Make variables for the bounds of our aoi for visualizing results after inference
min_lon, min_lat, max_lon, max_lat = gdf.total_bounds
```

## Initializing the RasterFlow client

See our docs to learn more about all methods available on the client [https://docs.wherobots.com/reference/rasterflow/client](https://docs.wherobots.com/reference/rasterflow/client)

```python theme={"system"}
from datetime import datetime

from rasterflow_remote import RasterflowClient, DatasetEnum, MosaicToMosaicActorEnum

from rasterflow_remote.data_models import (
    VectorizeMethodEnum,
    MergeModeEnum
)

rf_client = RasterflowClient()
```

## Building a Mosaic

RasterFlow provides a `build_mosaics` workflow to create analysis-ready imagery for your Area of Interest (AOI). This step:

* Ingests Sentinel-2 imagery for the specified AOI across your defined time range (e.g., 2 years)
* Applies [quality filtering and cloud masking](https://docs.wherobots.com/develop/rasterflow/rasterflow-datasets#cloud-and-quality-filtering) to select valid observations
* Generates a seamless, temporally-composited mosaic from multiple image tiles

The output is a Zarr store containing the mosaic, ready for inference or analysis.

This step takes about 5 minutes for this AOI.

```python theme={"system"}
mosaic_index = rf_client.build_mosaics(
    datasets=[DatasetEnum.S2_MED_WINDOWED_PIXEL],
    aoi=gdf,
    start=datetime(2023, 1, 1),
    end=datetime(2025, 1, 1),
)

mosaic_index.mosaics
```

```python theme={"system"}
# This example AOI has one geometry, so we select the first (only) mosaic location.
input_store = mosaic_index.first_row_mosaic

input_store
```

With `xarray` we can print out a nice representation of our mosaic and see that it has two time steps for our change detection task.

```python theme={"system"}
import xarray as xr
import s3fs
import zarr

fs = s3fs.S3FileSystem(profile="default", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path=input_store[5:])
mosaic = xr.open_zarr(zstore)

mosaic['variables']
```

We can also see what bands are included in the mosaic.

```python theme={"system"}
mosaic['variables']['band']
```

## Mosaic Inference

Now that we have built the mosaic and understand the features that were included, we can run `predict_mosaic`.

We'll pass in the band names we want to predict, as well as other inference configs that specify how to run inference.

* `mosaics` is the mosaic we just created for sentinel-2
* `model_path` can be a path to a Pytorch 2 Archive file on s3 or Huggingface. In this case we'll use the Fields of the World model from the Wherobots' Huggingface Collection.
* `patch_size` controls the XY size of the array input to the model
* `clip_size` in conjunction with `MergeModeEnum`, controls how to run overlapping windowed inference to reduce edge effects relative to non-overlapping inference
* `device` specifies what device to use for running inference. Our runtimes are GPU only at this time, but you could run on the CPU device if you wanted to!
* `features` selects what bands and in what order to run inference on
* `labels` sets the labels on the models output. Order matters here based on what your model returns!
* `actor` determines the kind of inference handler to use. The correct actor to select depends on the task your model is solving and the kind of input it expects. For change detection, we support models that take in an input where time and channels dimensions are stacked together. So the channel dimension is ordered like so in in this example:

```
        "<time_1>s2med_windowed_pixel:B04",
        "<time_1>s2med_windowed_pixel:B03",
        "<time_1>s2med_windowed_pixel:B02",
        "<time_1>s2med_windowed_pixel:B08",
        "<time_2>s2med_windowed_pixel:B04",
        "<time_2>s2med_windowed_pixel:B03",
        "<time_2>s2med_windowed_pixel:B02",
        "<time_2>s2med_windowed_pixel:B08"
```

* `max_batch_size`: Maximum number of patches fed through the model in a single forward pass.
  * Higher values = better GPU utilization but more GPU memory required
  * Lower values = less memory but potentially slower inference
* `xy_block_multiplier`: Controls how many Zarr chunks are processed together as a single block during inference.
  * Higher values = larger blocks = fewer tasks but more memory per task
  * Lower values = smaller blocks = more tasks but less memory per task
    Use a smaller multiplier (e.g., 1) if you're running into memory issues; use the default (4) for better throughput when memory
    allows.

```python theme={"system"}
model_output_index = rf_client.predict_mosaic(
    mosaics=input_store,
    model_path="https://huggingface.co/wherobots/ftw-v1.1-pt2/resolve/main/ftw-v1.1-ep.pt2",
    patch_size=256,
    clip_size=32,
    device="cuda",
    features=[
        "s2med_windowed_pixel:B04",
        "s2med_windowed_pixel:B03",
        "s2med_windowed_pixel:B02",
        "s2med_windowed_pixel:B08",
    ],
    labels=[
        "non_field_background",
        "field",
        "field_boundaries",
    ],
    actor=MosaicToMosaicActorEnum.SEMANTIC_SEGMENTATION_CHANGE_DETECTION_PYTORCH,
    max_batch_size=128,
    merge_mode=MergeModeEnum.WEIGHTED_AVERAGE,
    xy_block_multiplier=1
)

model_output_index.mosaics
```

```python theme={"system"}
# This example AOI has one geometry, so we select the first (only) mosaic location.
model_output_store = model_output_index.first_row_mosaic

model_output_store
```

## (Optional) Build an optimized Zarr for visualization

RasterFlow writes its outputs as Zarr stores at native resolution. To explore them interactively on [cloud.wherobots.com/map](https://cloud.wherobots.com/map), you can build an *optimized* multiscale Zarr. `build_zarr_multiscales` adds downsampled overview levels (image pyramids) to the store so the map can stream coarse tiles when zoomed out and full-resolution pixels when zoomed in.

This step is optional and can take a few minutes for large outputs, so the code below is commented out by default — uncomment it to run it.

```python theme={"system"}
# optimized_store = rf_client.build_zarr_multiscales(source_store=model_output_store)
# optimized_store
```

Let's check out our mosaic result with xarray to see what a Zarr change detection store looks like.

```python theme={"system"}
import s3fs
import zarr

fs = s3fs.S3FileSystem(profile="default", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path=model_output_store[5:])
result_mosaic = xr.open_zarr(zstore)
```

The prediction result mosaic is shaped similar to the input, height and width are the same.

Rather than have 4 bands, it has 3 bands for the prediction result, representing the categories. The FTW model outputs 3 categories, whereas other models may output a binary classification and there will be only one prediction band.

Instead of 2 time steps, there is only 1, representing the change interval. We index the `time` coordinate of the Zarr store using the start date used to build the input mosaic. We also store a time delta coordinate to represent the interval of time between the start\_date and end\_date.

```python theme={"system"}
result_mosaic['variables']
```

```python theme={"system"}
start_time = result_mosaic['variables'].coords['time'].values[0]
print("start time:", start_time)
end_time = (result_mosaic['variables'].coords['time'] + result_mosaic['variables'].coords['time_delta']).values[0]
print("end time", end_time)
```

## Visualizing outputs

If RasterFlow is enabled for your organization, you can visualize the Zarr, GeoParquet, and other geospatial outputs using [cloud.wherobots.com/map](https://cloud.wherobots.com/map).

## Vectorize the raster model outputs

The output for the FTW model is a raster with three classes as bands: field, field\_boundaries, and non\_field\_background.

We will run a separate flow to convert the fields and field boundaries into vector geometries.  Converting these results to geometries allows us to more easily post process the results or join the results with other vector data.

```python theme={"system"}
# Determine the classes that are predicted by the model
model_features = result_mosaic['band'].data.tolist()

# Remove the 'non_field_background' class for vectorization
vector_features = [f for f in model_features if f != 'non_field_background']    
```

```python theme={"system"}
# Note: this should take about 5 minutes to complete
vectorized_results = rf_client.vectorize_mosaic(
        mosaic = model_output_store,
        features = vector_features,
        threshold = 0.5,
        vectorize_method = VectorizeMethodEnum.SEMANTIC_SEGMENTATION_RASTERIO,
        vectorize_config={"stats": True, "medial_skeletonize": False}
    )

print(vectorized_results)
```

## Save the vectorized results to the catalog

We can store these vectorized outputs in the catalog by using WherobotsDB to persist the GeoParquet results.

```python theme={"system"}
from sedona.spark import *
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
```

```python theme={"system"}
sedona.sql("CREATE DATABASE IF NOT EXISTS examples_temp.ftw_db")

vectorize_result_df = sedona.read.format("geoparquet").load(vectorized_results.uri)
vectorize_result_df = vectorize_result_df.withColumnRenamed("label", "layer")
vectorize_result_df.writeTo("examples_temp.ftw_db.ftw_vectorized").createOrReplace()
```

## Visualize the vectorized results

To visualize the vectorized results, we will show the fields around Plymell, Kansas and filter out results with a score lower than 0.5. This threshold was determined through observation to strike a balance: it eliminates obvious noise without being overly aggressive, ensuring that we don't accidentally filter out too many relevant results.

```python theme={"system"}
from sedona.spark.sql.st_constructors import ST_GeomFromText
from sedona.spark.sql.st_predicates import ST_Intersects
import pyspark.sql.functions as f

# Bounding box for Plymell, Kansas
plymell_wkt = "POLYGON ((-100.98 37.83, -100.98 37.68, -100.8 37.68, -100.8 37.83, -100.98 37.83))"
vectorize_result_filtered_df = vectorize_result_df.filter("layer == 'field'").filter("score_mean > 0.5") \
    .filter(ST_Intersects(f.col("geometry"), ST_GeomFromText(lit(plymell_wkt))))
```

We will apply some WherobotsDB vector post processing operations to refine the model results.

```python theme={"system"}
tolerance = 0.00001
decimal_places = 6  # roughly 10cm in lat/lon
df = vectorize_result_filtered_df.withColumn("geometry", f.expr("ST_MakeValid(geometry)"))
df = df.withColumn("geometry", f.expr("ST_SetSRID(geometry, 4326)")) # vectorize_mosaic defaults to returning geometries in the CRS EPSG:4326
df = df.withColumn("geometry", f.expr(f"ST_ReducePrecision(geometry, {decimal_places})"))
df = df.withColumn("geometry", f.expr(f"ST_SimplifyPreserveTopology(geometry, {tolerance})"))
df = df.repartition(200)
df.show()
```

```python theme={"system"}
from sedona.spark.maps.SedonaKepler import SedonaKepler
map = SedonaKepler.create_map(df=df, name="Vectorized results")
map
```

## Generate PM Tiles for visualization

To improve visualization performance of a large number of geometries, we can use Wherobots built-in high performance PM tile generator.

The FTW model has a tendency to create extremely large boundary geometries, which doesn't play nicely with PMTiles. To avoid this we subdivide the boundary geometries.

```python theme={"system"}
fields_df = df.where("layer = 'field'")
# 1266 is the vertex count of the second-largest boundary geometry.
# We use this instead of the absolute largest boundary (an extreme outlier)
# so the subdivision threshold is high enough to keep most geometries intact
# while still forcing that outlier-sized boundary to be subdivided for
# better PMTiles rendering performance.
boundaries_df = df.where("layer = 'field_boundaries'").withColumn("geometry", f.expr("ST_SubDivideExplode(geometry, 1266)"))
tile_features_df = fields_df.withColumn("layer", f.lit("fields")).unionByName(
    boundaries_df.withColumn("layer", f.lit("boundaries"))
)
```

```python theme={"system"}
from wherobots import vtiles
full_tiles_path = os.getenv("USER_S3_PATH") + "tiles.pmtiles"
vtiles.generate_pmtiles(tile_features_df, full_tiles_path)
```

```python theme={"system"}
vtiles.show_pmtiles(full_tiles_path)
```

## Sharing PMTiles results with the Wherobots PMTiles Viewer

You can generate a pre-signed url to your pmtiles using `get_url`.

Then, copy this to your clipboard with right-click + "Copy Output to Clipboard".

You can paste this url into [https://tile-viewer.wherobots.com/](https://tile-viewer.wherobots.com/) and create a publicly accessible PMTiles map served from your own bucket.

```python theme={"system"}
from wherobots.tools.utility.s3_utils import get_url

get_url(full_tiles_path)
```

### References

1. **Kerner, H., Chaudhari, S., Ghosh, A., Robinson, C., Ahmad, A., Choi, E., Jacobs, N., Holmes, C., Mohr, M., et al. (2024).** Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation. *arXiv preprint arXiv:2409.16252*. Accepted at AAAI-2025 Artificial Intelligence for Social Impact (AISI) track.
2. ESA. (2015). Sentinel-2 User Handbook (Issue 1, Rev. 2). European Space Agency. [https://sentinels.copernicus.eu/documents/247904/685211/Sentinel-2\_User\_Handbook](https://sentinels.copernicus.eu/documents/247904/685211/Sentinel-2_User_Handbook)