WherobotsDB provides a range of APIs for working with raster data. This page walks through some of the most commonly used functions; the full catalog of raster functions is available in the Spatial SQL reference.

Raster Manipulation

Coordinate translation

WherobotsDB can translate pixel locations to world coordinates and vice versa.

PixelAsPoint

Use RS_PixelAsPoint to translate pixel coordinates to a world location.
SELECT RS_PixelAsPoint(rast, 450, 400) FROM rasterDf
Output:
POINT (-13063342 3992403.75)

World to Raster Coordinate

Use RS_WorldToRasterCoord to translate a world location to pixel coordinates. To get only the X coordinate, use RS_WorldToRasterCoordX; to get only the Y coordinate, use RS_WorldToRasterCoordY.
SELECT RS_WorldToRasterCoord(rast, -1.3063342E7, 3992403.75)
Output:
POINT (450 400)
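
Conceptually, both directions come down to applying a raster's affine geotransform or its inverse. The following is a minimal sketch of that arithmetic in plain Python, not WherobotsDB's implementation. The `GeoTransform` class, its field names, and the sample values are hypothetical. It uses 0-based pixel indices that refer to a pixel's upper-left corner, which may differ from the conventions of the SQL functions above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """Hypothetical container for a raster's affine georeferencing."""
    origin_x: float      # world X of the raster's upper-left corner
    origin_y: float      # world Y of the raster's upper-left corner
    scale_x: float       # pixel width in world units
    scale_y: float       # pixel height in world units (negative for north-up rasters)
    skew_x: float = 0.0
    skew_y: float = 0.0


def pixel_to_world(gt, col, row):
    """Map a (col, row) pixel index to the world coordinate of its upper-left corner."""
    x = gt.origin_x + col * gt.scale_x + row * gt.skew_x
    y = gt.origin_y + col * gt.skew_y + row * gt.scale_y
    return x, y


def world_to_pixel(gt, x, y):
    """Invert the affine transform and return the (col, row) containing the point."""
    det = gt.scale_x * gt.scale_y - gt.skew_x * gt.skew_y
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt.origin_x, y - gt.origin_y
    col = (gt.scale_y * dx - gt.skew_x * dy) / det
    row = (gt.scale_x * dy - gt.skew_y * dx) / det
    return math.floor(col), math.floor(row)


# Assumed example: a north-up raster with 30-unit pixels.
gt = GeoTransform(origin_x=500000.0, origin_y=4100000.0, scale_x=30.0, scale_y=-30.0)
corner = pixel_to_world(gt, 10, 20)
pixel = world_to_pixel(gt, 500315.0, 4099385.0)
```

Flooring the fractional result means that any world point inside a pixel maps back to that pixel's index.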

Pixel Manipulation

Use RS_Values to fetch pixel values at an array of point geometries. The coordinates of each point geometry represent a real-world location, not a pixel index.
SELECT RS_Values(rast, Array(ST_Point(-13063342, 3992403.75), ST_Point(-13074192, 3996020)))
Output:
[132.0, 148.0]
To change values over a grid or an area defined by a geometry, use RS_SetValues.
SELECT RS_SetValues(
        rast, 1, 250, 260, 3, 3,
        Array(10, 12, 17, 26, 28, 37, 43, 64, 66)
    )
Refer to each function's reference page for details on its parameters and appropriate usage.

Band Manipulation

WherobotsDB provides APIs to select specific bands from a raster image and create a new raster. For example, to select 2 bands from a raster, you can use the RS_Band API to retrieve the desired multi-band raster. Let’s use a multi-band raster for this example. The process of loading and converting it to raster type is the same.
SELECT RS_Band(colorRaster, Array(1, 2))
Suppose you have several single-band rasters and want to add a band to one of them so that you can perform map algebra operations. You can do so using the RS_AddBand function.
SELECT RS_AddBand(raster1, raster2, 1, 2)
This will result in raster1 having raster2’s specified band.
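
To make the idea of band selection and stacking concrete, here is a small sketch that treats a raster as a plain nested list of bands (each band is a list of rows). It is an illustration only, not how WherobotsDB stores rasters. `select_bands` and `append_band` are hypothetical helpers, and the 1-based band indexing is an assumption taken from the `Array(1, 2)` call above.

```python
def select_bands(bands, indices):
    """Return the requested bands, using 1-based indices (assumed convention)."""
    for i in indices:
        if not 1 <= i <= len(bands):
            raise IndexError(f"band index {i} is outside 1..{len(bands)}")
    return [bands[i - 1] for i in indices]


def append_band(bands, new_band):
    """Return a copy of `bands` with `new_band` added as the last band."""
    height, width = len(bands[0]), len(bands[0][0])
    if len(new_band) != height or any(len(row) != width for row in new_band):
        raise ValueError("the new band must match the existing height and width")
    return bands + [new_band]


# A tiny 3-band, 2x2 raster used only for this demonstration.
color = [
    [[1, 2], [3, 4]],      # band 1
    [[5, 6], [7, 8]],      # band 2
    [[9, 10], [11, 12]],   # band 3
]
first_two = select_bands(color, [1, 2])
stacked = append_band(first_two, color[2])
```

The shape check in `append_band` reflects the general requirement that stacked bands share the same grid; the sketch does not model the target-index argument of the SQL function.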

Resample raster data

WherobotsDB allows you to resample raster data with RS_Resample, using interpolation methods such as nearest neighbor, bilinear, and bicubic, to change the cell size or align raster grids.
SELECT RS_Resample(rast, 50, -50, -13063342, 3992403.75, true, "bicubic")
For more information, refer to the RS_Resample documentation.
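
As a rough intuition for what resampling does, the sketch below implements nearest-neighbor resampling on a plain 2-D list. It is a simplified illustration, not the algorithm used by RS_Resample. The `resample_nearest` function is hypothetical, it works in pixel counts rather than cell sizes, and it ignores georeferencing entirely.

```python
def resample_nearest(grid, new_width, new_height):
    """Resample a 2-D grid (list of rows) to new dimensions by nearest neighbor."""
    if new_width <= 0 or new_height <= 0:
        raise ValueError("output dimensions must be positive")
    old_height, old_width = len(grid), len(grid[0])
    out = []
    for r in range(new_height):
        # Map the center of the output pixel back into source pixel space.
        src_r = min(int((r + 0.5) * old_height / new_height), old_height - 1)
        row = []
        for c in range(new_width):
            src_c = min(int((c + 0.5) * old_width / new_width), old_width - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


small = [[1, 2], [3, 4]]
upsampled = resample_nearest(small, 4, 4)        # each source pixel becomes a 2x2 block
downsampled = resample_nearest(upsampled, 2, 2)  # back to the original 2x2 grid
```

Nearest neighbor never invents new values, which is why it suits categorical data; bilinear and bicubic methods blend neighboring pixels instead.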

Execute map algebra operations

Map algebra is a way to perform raster calculations using mathematical expressions. The expression can be a simple arithmetic operation or a complex combination of multiple operations. A common example is the Normalized Difference Vegetation Index (NDVI), a simple graphical indicator used to analyze remote sensing measurements, typically from a space platform, and assess whether the observed target contains live green vegetation.
NDVI = (NIR - Red) / (NIR + Red)
where NIR is the near-infrared band and Red is the red band.
SELECT RS_MapAlgebra(raster, 'D', 'out = (rast[3] - rast[0]) / (rast[3] + rast[0]);') as ndvi FROM raster_table
For more information please refer to Map Algebra API.
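
To see the NDVI formula at work on individual pixels, here is a minimal sketch in plain Python that operates on two small 2-D lists. It is not the Map Algebra engine. The `ndvi` function and the sample values are hypothetical, and returning NaN where NIR + Red equals zero is one possible choice for handling the undefined case.

```python
import math


def ndvi(nir, red, undefined=math.nan):
    """Compute NDVI per pixel for two equally sized 2-D grids."""
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("NIR and Red grids must have the same shape")
    out = []
    for nir_row, red_row in zip(nir, red):
        out_row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # Avoid division by zero where both bands are zero.
            out_row.append((n - r) / total if total != 0 else undefined)
        out.append(out_row)
    return out


nir_band = [[50, 80], [0, 30]]
red_band = [[10, 20], [0, 30]]
result = ndvi(nir_band, red_band)
```

Values close to 1 indicate strong near-infrared reflectance relative to red, which is typical of healthy vegetation; values near 0 or below suggest bare ground, water, or built surfaces.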

Interoperability between raster and vector data

Geometry As Raster

WherobotsDB allows you to rasterize a geometry by using RS_AsRaster.
SELECT RS_AsRaster(
        ST_GeomFromWKT('POLYGON((150 150, 220 260, 190 300, 300 220, 150 150))'),
        RS_MakeEmptyRaster(1, 'b', 4, 6, 1, -1, 1),
        'b', 230
    )
The image created for the vector is shown below.
[Figure: Rasterized vector]
The vector coordinates are exaggerated to showcase the output; a real use case may not match this example.
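
For intuition about what rasterizing a geometry involves, the following sketch burns a value into every pixel whose center falls inside a polygon, using an even-odd ray casting test. It is a simplified illustration under stated assumptions, not the algorithm behind RS_AsRaster. The function names, the pixel-center rule, and the sample grid are all hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, width, height, origin_x, origin_y, pixel_size, value, fill=0):
    """Burn `value` into every pixel whose center lies inside the polygon."""
    grid = []
    for row in range(height):
        # North-up raster: world Y decreases as the row index grows.
        center_y = origin_y - (row + 0.5) * pixel_size
        grid.append([
            value
            if point_in_polygon(origin_x + (col + 0.5) * pixel_size, center_y, ring)
            else fill
            for col in range(width)
        ])
    return grid


square = [(1, 1), (1, 3), (3, 3), (3, 1)]
burned = rasterize(square, width=4, height=4, origin_x=0, origin_y=4, pixel_size=1, value=230)
```

With this 4x4 grid, only the four central pixels have centers inside the square, so only they receive the value 230.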

Spatial range query

WherobotsDB provides raster predicates for running a range query with a geometry window. For example, using RS_Intersects:
SELECT rast FROM rasterDf WHERE RS_Intersects(rast, ST_GeomFromWKT('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'))

Spatial join query

WherobotsDB’s raster predicates can also perform a spatial join between a raster column and a geometry column, using the same function as above.
SELECT r.rast, g.geom FROM rasterDf r, geomDf g WHERE RS_Intersects(r.rast, g.geom)
These range and join queries filter rasters by comparing the provided geometric boundary with the spatial boundary of each raster.
WherobotsDB offers more raster predicates for spatial range queries and spatial join queries. Please refer to the raster predicates docs.
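
The filtering idea behind a range query can be sketched with axis-aligned bounding boxes. The example below is a deliberate simplification: the real predicates work on actual geometries and raster extents, and may also account for coordinate reference systems. `envelopes_intersect`, `range_query`, and the tile names are hypothetical.

```python
def envelopes_intersect(a, b):
    """Return True when two (min_x, min_y, max_x, max_y) boxes share any point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(raster_extents, window):
    """Keep the names of rasters whose extent intersects the query window."""
    return [
        name
        for name, extent in raster_extents.items()
        if envelopes_intersect(extent, window)
    ]


# Assumed raster extents, keyed by a made-up tile name.
extents = {
    "tile_a": (0, 0, 5, 5),
    "tile_b": (20, 20, 30, 30),
    "tile_c": (8, 8, 12, 12),
}
matches = range_query(extents, window=(0, 0, 10, 10))
```

A spatial join applies the same test to every raster and geometry pair instead of a single fixed window.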

Collecting raster DataFrames and working with them locally in Python

WherobotsDB allows collecting DataFrames with raster columns and working with them locally in Python. The raster objects are represented as SedonaRaster objects in Python, which can be used to perform raster operations.
df_raster = sedona.read.format("binaryFile").load("/path/to/raster.tif").selectExpr("RS_FromGeoTiff(content) as rast")
rows = df_raster.collect()
raster = rows[0].rast
raster  # <sedona.raster.sedona_raster.InDbSedonaRaster at 0x1618fb1f0>
You can retrieve the metadata of the raster by accessing the properties of the SedonaRaster object.
raster.width        # width of the raster
raster.height       # height of the raster
raster.affine_trans # affine transformation matrix
raster.crs_wkt      # coordinate reference system as WKT
You can get a numpy array containing the band data of the raster using the as_numpy or as_numpy_masked method. The band data is organized in CHW order.
raster.as_numpy()        # numpy array of the raster
raster.as_numpy_masked() # numpy array with nodata values masked as nan
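
CHW order means the first index selects the band (channel), the second the row, and the third the column. The sketch below illustrates that layout with plain nested lists so it runs without any dependencies; the helper names and sample data are hypothetical. With a real NumPy array, `np.transpose(arr, (1, 2, 0))` performs the same CHW-to-HWC reordering.

```python
def chw_shape(data):
    """Return (channels, height, width) for a nested-list raster in CHW order."""
    return len(data), len(data[0]), len(data[0][0])


def chw_to_hwc(data):
    """Reorder CHW data so that each pixel holds its values across all bands."""
    channels, height, width = chw_shape(data)
    return [
        [[data[c][r][x] for c in range(channels)] for x in range(width)]
        for r in range(height)
    ]


# Two bands, two rows, three columns.
sample = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
]
value = sample[1][0][2]   # band index 1, row 0, column 2
hwc = chw_to_hwc(sample)  # hwc[row][col] lists the pixel's value in every band
```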
If you want to work with the raster data using rasterio, you can retrieve a rasterio.DatasetReader object using the as_rasterio method.
ds = raster.as_rasterio()  # rasterio.DatasetReader object
# Work with the raster using rasterio
band1 = ds.read(1)         # read the first band

Writing Python UDFs to work with raster data

You can write Python UDFs that receive raster data, process it with NumPy, SciPy, or scikit-learn, and return either a scalar value or a new raster. Use the sedona_vectorized_udf decorator for the best performance; it automatically preserves all raster metadata (CRS, affine transform, nodata values, etc.).

Raster to scalar

Compute a summary statistic from each raster tile and return it as a DataFrame column:
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import col
from sedona.spark.sql.functions import sedona_vectorized_udf
from sedona.spark.raster import SedonaRaster


@sedona_vectorized_udf(return_type=DoubleType())
def mean_pixels(raster: SedonaRaster) -> float:
    return float(raster.as_numpy().mean())


df_raster.select(mean_pixels(col("rast")).alias("mean")).show()

Raster to raster

Return a SedonaRaster to pass raster data back to WherobotsDB with full metadata preserved. Use raster.with_bands() to replace pixel data; band count and data type can change freely.
Note: Raster-to-raster UDFs using with_bands() only work with in-db rasters. Out-db rasters do not carry the metadata needed to round-trip through a UDF. Use RS_AsInDB to convert out-db rasters to in-db before passing them to a raster-returning UDF.
import numpy as np
from pyspark.sql.functions import col
from scipy import ndimage
from sedona.spark.raster import SedonaRaster
from sedona.spark.sql.functions import sedona_vectorized_udf
from sedona.spark.sql.types import RasterType


@sedona_vectorized_udf(return_type=RasterType())
def find_peaks(raster: SedonaRaster) -> SedonaRaster:
    """Label connected regions of local maxima in a single-band raster."""
    band = raster.as_numpy()[0].astype(np.float32)
    local_max = ndimage.maximum_filter(band, size=5) == band
    labels, _ = ndimage.label(local_max)
    return raster.with_bands(labels[np.newaxis])  # 1 band of integer labels (ndimage.label returns int32 by default)


df_raster.select(find_peaks(col("rast")).alias("peak_labels")).show()
The with_bands() method accepts any NumPy array in CHW order (channels × height × width). The band count and data type can differ from the input — for example, a 4-band float32 input can produce a 1-band int64 output. The spatial dimensions (height and width) must match the original raster.
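
The shape constraint can be checked before handing data back. The sketch below validates CHW-ordered data against the original height and width, using nested lists to stay dependency-free. `check_replacement_shape` is a hypothetical helper, not part of SedonaRaster, and with_bands() may perform its own validation.

```python
def check_replacement_shape(new_bands, height, width):
    """Validate nested-list CHW data against the original raster's height and width.

    Returns the number of bands when every band matches the expected grid.
    """
    if not new_bands:
        raise ValueError("at least one band is required")
    for index, band in enumerate(new_bands):
        if len(band) != height or any(len(row) != width for row in band):
            raise ValueError(
                f"band {index} does not match the original {height}x{width} grid"
            )
    return len(new_bands)


# A single 2x2 band is acceptable for a raster that was originally 2x2,
# regardless of how many bands the original had.
band_count = check_replacement_shape([[[0, 1], [1, 0]]], height=2, width=2)
```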

Working with out-db rasters

SedonaRaster automatically constructs a rasterio Env from the Hadoop S3A configurations when loading out-db rasters, so out-db rasters should load without credential problems. However, if you work with out-db rasters in a subprocess, SedonaRaster cannot infer the Hadoop S3A configurations. To make subprocesses pick up these configurations and load out-db rasters properly, export the S3 configs to environment variables using sedona.spark.raster.gdal_conf.export_gdal_conf_to_env:
from sedona.spark.raster import gdal_conf
gdal_conf.export_gdal_conf_to_env()

# ... launch subprocesses and work with out-db rasters
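
Exporting settings to environment variables helps because child processes inherit the parent's environment by default. The sketch below demonstrates that inheritance with the standard library only. The variable name `EXAMPLE_RASTER_SETTING` is made up for this demonstration and is not one of the variables written by export_gdal_conf_to_env.

```python
import os
import subprocess
import sys

# Hypothetical variable name, used only to show environment inheritance.
os.environ["EXAMPLE_RASTER_SETTING"] = "demo-value"

# The child process reads the variable from the environment it inherited.
child_code = "import os; print(os.environ.get('EXAMPLE_RASTER_SETTING', 'missing'))"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
```

Because the variable is set before the subprocess starts, the child sees it without any extra configuration, which is why the export call should run before subprocesses are launched.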