Sedona's raster loaders are available in Scala, Java and Python and share the same API.
Sedona's raster loader builds on Spark's built-in binary file data source and works with several RS constructors to produce values of the Raster type. Each raster (or raster tile, when tiling is enabled) is one row in the resulting DataFrame, stored in Raster format. By default, these functions use lon/lat order.

Loading raster using the raster loader

The raster loader reads raster data from binary files as out-of-database (out-db) rasters and then splits each raster into smaller tiles.
val rawDf = sedona.read.format("raster").load("/FILE-PATH/*.tif")
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()
The output will look like this:
+--------------------+---+---+
|                rast|  x|  y|
+--------------------+---+---+
|OutDbGridCoverage...|  0|  0|
|OutDbGridCoverage...|  1|  0|
|OutDbGridCoverage...|  2|  0|
...
The output contains the following columns:
  • rast: The raster data in Raster format. This is an out-db raster tile that references the original raster file.
  • x: The 0-based x-coordinate (column index) of the tile. This column is only present when retile is enabled (the default).
  • y: The 0-based y-coordinate (row index) of the tile. This column is only present when retile is enabled (the default).
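As a rough illustration of how the x and y columns index the tile grid, the Python sketch below uses only the standard library, so it runs without Spark. It assumes a regular grid anchored at the raster's top-left pixel, with x as the column index and y as the row index, enumerated row by row like the sample output. `tile_indices` is a hypothetical helper, not a Sedona API, and Sedona's own tiling code may differ.

```python
import math


def tile_indices(raster_width, raster_height, tile_width, tile_height):
    """Hypothetical model of the (x, y) tile grid; not Sedona's implementation.

    Returns 0-based (x, y) pairs row by row, so x varies fastest.
    """
    cols = math.ceil(raster_width / tile_width)    # tiles per row (last may be partial)
    rows = math.ceil(raster_height / tile_height)  # tiles per column (last may be partial)
    return [(x, y) for y in range(rows) for x in range(cols)]


# A 1000 x 600 pixel raster cut into 256 x 256 tiles gives 4 columns x 3 rows.
grid = tile_indices(1000, 600, 256, 256)
print(len(grid))  # 12
print(grid[:5])   # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1)]
```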
The tile size is determined by the internal tiling scheme of the raster data. Storing raster data as Cloud Optimized GeoTIFF (COG) is recommended, because COG files usually organize pixel data as square internal tiles. You can also disable automatic tiling using option("retile", "false"), or specify the tile size manually using options such as option("tileWidth", "256") and option("tileHeight", "256"). The raster loader supports the following options:
  • retile: Enables tiling. Default is true.
  • tileWidth: The width of the tile. If not specified, the size of internal tiles will be used.
  • tileHeight: The height of the tile. If not specified, tileWidth is used when it is explicitly set; otherwise, the size of the internal tiles is used.
  • padWithNoData: Pad the right and bottom of the tile with NODATA values if the tile is smaller than the specified tile size. Default is false.
  • autoRescale: Whether to rescale the pixel values using the scale and offset values in the GeoTIFF file. Default is false.
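The defaulting rules for tileWidth and tileHeight, and the effect of padWithNoData on edge tiles, can be modeled in a few lines of Python. This is a simplified sketch, not Sedona's implementation: `resolve_tile_size` and `tile_shapes` are hypothetical helpers, and the sketch assumes that, with padding off, edge tiles on the right and bottom are truncated to the pixels that remain.

```python
import math
from typing import List, Optional, Tuple


def resolve_tile_size(
    internal_width: int,
    internal_height: int,
    tile_width: Optional[int] = None,
    tile_height: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply the documented defaults for tileWidth and tileHeight."""
    # tileWidth falls back to the internal tile width.
    width = tile_width if tile_width is not None else internal_width
    if tile_height is not None:
        height = tile_height
    elif tile_width is not None:
        # tileHeight defaults to an explicitly set tileWidth (square tiles).
        height = tile_width
    else:
        height = internal_height
    return width, height


def tile_shapes(
    raster_width: int,
    raster_height: int,
    width: int,
    height: int,
    pad_with_nodata: bool = False,
) -> List[Tuple[int, int]]:
    """Hypothetical (w, h) of each tile, row by row."""
    shapes = []
    for y in range(math.ceil(raster_height / height)):
        for x in range(math.ceil(raster_width / width)):
            if pad_with_nodata:
                # Missing right/bottom pixels are filled with NODATA.
                shapes.append((width, height))
            else:
                # Edge tiles keep only the pixels that remain.
                shapes.append((
                    min(width, raster_width - x * width),
                    min(height, raster_height - y * height),
                ))
    return shapes


print(resolve_tile_size(512, 512, tile_width=256))  # (256, 256)
print(resolve_tile_size(512, 128))                  # (512, 128)
print(tile_shapes(600, 300, 256, 256))
# [(256, 256), (256, 256), (88, 256), (256, 44), (256, 44), (88, 44)]
```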
If the internal tiling scheme of the raster data is not suitable for tiling, the raster loader throws an error. To work around this, disable automatic tiling using option("retile", "false") or specify the tile size manually. A better solution is to convert the raster data to COG format using gdal_translate or similar tools.
The raster loader also supports Spark's generic file source options, such as option("pathGlobFilter", "*.tif*") and option("recursiveFileLookup", "true"). For instance, the following loads all .tif and .tiff files in a directory and its subdirectories:
sedona.read.format("raster").option("recursiveFileLookup", "true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder)
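In Spark, pathGlobFilter is matched against file names rather than full paths. The Python sketch below is a simplified model of which files such a directory load would select. It uses `fnmatch.fnmatchcase` as an approximation of Hadoop's glob matching (assumed case-sensitive here) and assumes that, without recursiveFileLookup, only files directly inside the load directory are considered. `select_files` is a hypothetical helper; paths are relative to the directory passed to load().

```python
from fnmatch import fnmatchcase
from posixpath import basename


def select_files(relative_paths, path_glob_filter="*.tif*", recursive=False):
    """Approximate which files a directory load would pick up (simplified model)."""
    selected = []
    for path in relative_paths:
        # Without recursiveFileLookup, skip files inside subdirectories.
        if not recursive and "/" in path:
            continue
        # pathGlobFilter is matched against the file name, not the full path.
        if fnmatchcase(basename(path), path_glob_filter):
            selected.append(path)
    return selected


files = ["a.tif", "b.tiff", "a.tif.aux.xml", "notes.txt", "2023/c.tif", "2023/d.TIF"]
print(select_files(files))                  # ['a.tif', 'b.tiff', 'a.tif.aux.xml']
print(select_files(files, recursive=True))  # ['a.tif', 'b.tiff', 'a.tif.aux.xml', '2023/c.tif']
```

Note that `*.tif*` also matches GDAL sidecar files such as `a.tif.aux.xml`. If such files sit next to your rasters, a stricter pattern may be preferable; Hadoop glob syntax (unlike `fnmatch`) supports alternation such as `*.{tif,tiff}`.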
The raster loader handles paths ending in / specially by searching for files recursively. This is equivalent to omitting the trailing / from the path and setting option("recursiveFileLookup", "true").
By default, the DataFrame returned by the raster loader is automatically repartitioned so that the work of processing raster tiles is distributed evenly across the cluster. The number of partitions is proportional to the number of executor CPU cores in the cluster. To disable automatic repartitioning, set the Spark session configuration spark.sedona.raster.load.autoRepartition to false. To specify the number of partitions manually, set the Spark session configuration spark.sedona.raster.load.numPartitions to the desired value.
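The trailing-slash rule and the repartitioning settings described above can be summarized in the Python sketch below. `normalize_load_path` and `target_partitions` are hypothetical helpers, not Sedona APIs. The `per_core` factor stands in for the proportionality constant, which is not documented here, and the sketch assumes that numPartitions only takes effect while autoRepartition is enabled; verify that precedence against your Sedona version.

```python
from typing import Dict, Optional, Tuple


def normalize_load_path(path: str, options: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Model the documented rule: 'dir/' behaves like 'dir' plus recursiveFileLookup."""
    opts = dict(options)  # copy so the caller's options are not mutated
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]
        opts["recursiveFileLookup"] = "true"
    return path, opts


def target_partitions(conf: Dict[str, str], executor_cores: int, per_core: int) -> Optional[int]:
    """Partition count to repartition to, or None to keep the input partitioning.

    Assumption: numPartitions only matters while autoRepartition is enabled.
    per_core is a hypothetical factor, not a documented Sedona constant.
    """
    if conf.get("spark.sedona.raster.load.autoRepartition", "true").lower() == "false":
        return None
    manual = conf.get("spark.sedona.raster.load.numPartitions")
    if manual is not None:
        return int(manual)
    return executor_cores * per_core


print(normalize_load_path("/data/rasters/", {}))
# ('/data/rasters', {'recursiveFileLookup': 'true'})
print(target_partitions({}, executor_cores=16, per_core=2))  # 32
```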

Loading raster using binaryFile loader (Deprecated)

Step 1: Load raster to a binary DataFrame

You can load raster files of any format as binary data using the code below, then use the RS constructors described in Step 2 to create a Raster DataFrame.
sedona.read.format("binaryFile").load("/some/path/*.asc")

Step 2: Create a raster type column

After loading the raster data files using the binaryFile loader, you can either use RS_FromPath to load each raster as an out-db raster, or use one of RS_FromGeoTiff, RS_FromArcInfoAsciiGrid or RS_FromNetCDF to load the binary content of the raster file as an in-db raster.
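The choice between these constructors can be sketched as a small Python function that returns the Spark SQL expression string for one file. This is a hypothetical helper, not part of Sedona: the extension-to-format mapping (.tif/.tiff, .asc, .nc) is an assumption, the NetCDF branch assumes the two-argument form RS_FromNetCDF(content, recordVariableName), and the sketch does not check which formats RS_FromPath accepts.

```python
from posixpath import splitext
from typing import Optional


def raster_expr(path: str, in_db: bool, netcdf_variable: Optional[str] = None) -> str:
    """Hypothetical helper: pick an RS constructor expression for one file."""
    if not in_db:
        # Out-db: the raster references the file by path, so `content` is not needed.
        return "RS_FromPath(path)"
    ext = splitext(path)[1].lower()
    if ext in (".tif", ".tiff"):
        return "RS_FromGeoTiff(content)"
    if ext == ".asc":
        return "RS_FromArcInfoAsciiGrid(content)"
    if ext == ".nc":
        if netcdf_variable is None:
            raise ValueError("a NetCDF record variable name is required")
        # Assumes the two-argument form RS_FromNetCDF(content, recordVariableName).
        return f"RS_FromNetCDF(content, '{netcdf_variable}')"
    raise ValueError(f"no in-db constructor mapped for extension {ext!r}")


print(raster_expr("/some/path/a.tiff", in_db=False))  # RS_FromPath(path)
print(raster_expr("/some/path/dem.asc", in_db=True))  # RS_FromArcInfoAsciiGrid(content)
```

In a Spark job, the returned string is what you would pass to expr(...) inside withColumn.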

Loading raster files as out-db raster using RS_FromPath

When loading out-db rasters with RS_FromPath, drop the binary content column so that the file contents are not read at all.
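import org.apache.spark.sql.functions.expr
// Out-db: only the path column is needed, so drop the binary content column
var df = sedona.read.format("binaryFile").load("/some/path/*.tiff").drop("content")
df = df.withColumn("raster", expr("RS_FromPath(path)"))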

Loading raster content as in-db raster using RS_FromGeoTiff

To load an in-db raster, use the binary content column. This requires loading the entire raster file into memory.
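import org.apache.spark.sql.functions.expr
// In-db: RS_FromGeoTiff builds the raster from the file bytes in the content column
var df = sedona.read.format("binaryFile").load("/FILE-PATH/*.tiff")
df = df.withColumn("raster", expr("RS_FromGeoTiff(content)"))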

Raster loading functions

The following functions can be used to create raster objects from various file formats. Each function has its own dedicated reference page:
  • RS_FromGeoTiff: Create a raster from GeoTiff binary data.
  • RS_FromArcInfoAsciiGrid: Create a raster from Arc Info ASCII Grid binary data.
  • RS_FromNetCDF: Create a raster from NetCDF binary data.
  • RS_FromPath: Create an out-db raster from a file path.
  • RS_MakeEmptyRaster: Create an empty raster with specified dimensions.
  • RS_MakeRaster: Create a raster from an array of pixel values.
  • RS_AsInDB: Convert an out-db raster to an in-db raster.
  • RS_BandPath: Get the file path of an out-db raster.
  • RS_NetCDFInfo: Get variable information from a NetCDF file.