Sedona loaders are available in Scala, Java and Python and have the same APIs.
Raster format.
By default, these functions use lon/lat order.
Loading raster using the raster loader
The raster loader reads raster data from binary files as out-of-database (out-db) rasters, then splits the raster data into smaller tiles.
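Below is a minimal Python sketch of the raster loader. It assumes `spark` is a SparkSession with Apache Sedona enabled, for example one created with `SedonaContext.create`. The helper name `load_raster_tiles` is hypothetical, and so is the path in the usage comment. The Scala and Java readers follow the same `format("raster")` / `load(...)` pattern.

```python
def load_raster_tiles(spark, path):
    """Load raster files as out-db raster tiles (hypothetical helper).

    `spark` is assumed to be a Sedona-enabled SparkSession.
    """
    # With default options each file is split into tiles, and the
    # resulting DataFrame has the columns rast, x and y.
    return spark.read.format("raster").load(path)


# Example usage (requires a Sedona-enabled SparkSession; path is hypothetical):
# df = load_raster_tiles(sedona, "/some/path/*.tif")
# df.printSchema()
```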
The loaded DataFrame has the following columns:
- rast: The raster data in Raster format. This is an out-db raster tile that references the original raster data file.
- x: The 0-based x-coordinate of the tile. This column is present only when retile is enabled.
- y: The 0-based y-coordinate of the tile. This column is present only when retile is enabled.
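The hypothetical helper below makes the x and y columns concrete. It enumerates the tiles that a raster of a given size would be split into, assuming tiles are laid out row by row from the upper-left corner. It is a conceptual sketch, not Sedona's tiling code.

```python
from typing import Iterator, Tuple


def tile_grid(width: int, height: int, tile_width: int,
              tile_height: int) -> Iterator[Tuple[int, int, int, int, int, int]]:
    """Yield (x, y, col_offset, row_offset, tile_w, tile_h) for each tile.

    Conceptual sketch (not Sedona code): assumes row-major tiles starting at
    the upper-left corner. x and y are 0-based tile coordinates.
    """
    for y, row_off in enumerate(range(0, height, tile_height)):
        for x, col_off in enumerate(range(0, width, tile_width)):
            # Tiles on the right and bottom edges may be smaller than requested.
            yield (x, y, col_off, row_off,
                   min(tile_width, width - col_off),
                   min(tile_height, height - row_off))
```

Take a 600 x 300 raster with 256 x 256 tiles. This yields six tiles, with x in 0..2 and y in 0..1. The last tile covers only 88 x 44 pixels, which is the situation the padWithNoData option addresses.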
You can disable automatic tiling using option("retile", "false"), or specify the tile size manually using options such as option("tileWidth", "256") and option("tileHeight", "256").
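Here are both alternatives as PySpark reader calls. This assumes the same Sedona-enabled `spark` session, and the function names are hypothetical. When retile is disabled, the x and y columns are not produced.

```python
def load_untiled(spark, path):
    """Disable automatic tiling (hypothetical helper)."""
    return spark.read.format("raster").option("retile", "false").load(path)


def load_fixed_tiles(spark, path, tile_size=256):
    """Use fixed square tiles instead of the internal tiling scheme (hypothetical helper)."""
    return (
        spark.read.format("raster")
        # Option values are passed as strings, as in the text above.
        .option("tileWidth", str(tile_size))
        .option("tileHeight", str(tile_size))
        .load(path)
    )
```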
The options for the raster loader are as follows:
- retile: Enables tiling. Default is true.
- tileWidth: The width of the tile. If not specified, the size of internal tiles will be used.
- tileHeight: The height of the tile. If not specified, tileWidth is used if it is explicitly set; otherwise the size of internal tiles is used.
- padWithNoData: Pads the right and bottom of the tile with NODATA values if the tile is smaller than the specified tile size. Default is false.
- autoRescale: Whether to rescale the pixel values using the scale and offset values in the GeoTIFF file. Default is false.
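The defaulting rules for tileWidth and tileHeight can be restated as a small function. This hypothetical helper only mirrors the option descriptions above and does not query Sedona. The `internal_tile_size` argument stands for the file's internal tile dimensions, which the caller is assumed to know.

```python
def effective_tile_size(options, internal_tile_size):
    """Return (width, height) of the tiles, or None if retile is disabled.

    Hypothetical helper restating the documented defaults, not Sedona code.
    `options` holds reader options as strings, e.g. {"tileWidth": "256"}.
    """
    if options.get("retile", "true").lower() == "false":
        return None  # tiling disabled: no tile size applies
    internal_width, internal_height = internal_tile_size
    if "tileWidth" in options:
        width = int(options["tileWidth"])
        # tileHeight falls back to an explicitly set tileWidth.
        height = int(options.get("tileHeight", width))
    else:
        # Without tileWidth, the internal tile size is used, unless
        # tileHeight is given on its own.
        width = internal_width
        height = int(options.get("tileHeight", internal_height))
    return width, height
```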
If the internal tiling scheme of the raster data does not conform to tiling, the raster loader will throw an error. To work around this issue, you can disable automatic tiling using option("retile", "false") or specify the tile size manually. A better solution is to translate the raster data into COG (Cloud Optimized GeoTIFF) format using gdal_translate or other tools.
The raster loader also works with Spark's generic file source options, such as option("pathGlobFilter", "*.tif*") and option("recursiveFileLookup", "true"). For instance, combining these two options loads all the .tif files in a directory recursively.
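Here is a sketch of that recursive load. It assumes a Sedona-enabled `spark` session, and the helper name and directory argument are hypothetical.

```python
def load_all_tifs_recursively(spark, directory):
    """Load every .tif/.tiff file under `directory` (hypothetical helper)."""
    return (
        spark.read.format("raster")
        # Spark generic file source options, honored by the raster loader.
        .option("pathGlobFilter", "*.tif*")
        .option("recursiveFileLookup", "true")
        .load(directory)
    )
```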
The raster loader handles paths ending in / specially by searching for files recursively. This is equivalent to omitting the trailing / from the path and setting option("recursiveFileLookup", "true").
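A hypothetical helper can spell out this equivalence by rewriting a trailing-slash path into the explicit form. It is only an illustration: in practice you can pass the path with the trailing / directly.

```python
def load_recursive_explicit(spark, path):
    """Rewrite "dir/" into "dir" plus recursiveFileLookup (hypothetical helper)."""
    reader = spark.read.format("raster")
    stripped = path.rstrip("/")
    if path.endswith("/") and stripped:
        # Per the text above, "dir/" behaves like "dir" with recursive lookup.
        path = stripped
        reader = reader.option("recursiveFileLookup", "true")
    return reader.load(path)
```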
By default, the DataFrame loaded by the raster loader is automatically repartitioned so that the work of processing raster tiles is distributed evenly across the cluster. The number of partitions is proportional to the number of executor CPU cores in the cluster. You can disable automatic repartitioning by setting the Spark session configuration spark.sedona.raster.load.autoRepartition to false. To specify the number of partitions manually, set the Spark session configuration spark.sedona.raster.load.numPartitions to the desired number of partitions.
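Here is a sketch of setting these configurations on an existing session with `spark.conf.set`. It assumes they take effect when set before the raster is loaded, and the helper name is hypothetical. The values are passed as strings.

```python
def configure_raster_partitioning(spark, num_partitions=None, auto_repartition=True):
    """Set Sedona's raster-loader partitioning configs (hypothetical helper)."""
    # "true" or "false"
    spark.conf.set("spark.sedona.raster.load.autoRepartition",
                   str(auto_repartition).lower())
    if num_partitions is not None:
        spark.conf.set("spark.sedona.raster.load.numPartitions",
                       str(num_partitions))
```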
Loading raster using binaryFile loader (Deprecated)
Step 1: Load raster to a binary DataFrame
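Here is a minimal sketch of Step 1. It assumes a Sedona-enabled `spark` session and a hypothetical path. The binaryFile source yields one row per file, with the columns path, modificationTime, length and content.

```python
def load_binary_rasters(spark, path):
    """Read raster files as raw bytes with Spark's binaryFile source (hypothetical helper)."""
    return spark.read.format("binaryFile").load(path)


# Example usage (path is hypothetical):
# raw_df = load_binary_rasters(sedona, "/some/path/*.tif")
```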
You can load any type of raster data into a binary DataFrame using Spark's binaryFile data source, and then use the RS constructors described in Step 2 to create a raster DataFrame.
Step 2: Create a raster type column
After loading the raster data files using the binaryFile loader, you can either use RS_FromPath to load the raster as an out-db raster, or use one of RS_FromGeoTiff, RS_FromArcInfoAsciiGrid and RS_FromNetCDF to load the binary data of the raster file as an in-db raster.
Loading raster files as out-db raster using RS_FromPath
When using RS_FromPath to load out-db rasters, we can drop the binary content column so that the file contents are never read.
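Here is a sketch of the out-db route. It assumes a Sedona-enabled `spark` session, so that RS_FromPath is registered as a SQL function. The helper name is hypothetical. The content column is dropped before the raster is constructed, so only the path column is used.

```python
def load_outdb_rasters(spark, path):
    """Create out-db rasters from file paths (hypothetical helper)."""
    return (
        spark.read.format("binaryFile")
        .load(path)
        # Only the path is needed, so drop the file bytes.
        .drop("content")
        .selectExpr("RS_FromPath(path) AS rast")
    )
```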
Loading raster content as in-db raster using RS_FromGeoTiff
We’ll use the binary content column to load in-db rasters. This requires loading each entire raster file into memory.
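Here is a sketch of the in-db route under the same assumptions, again with a hypothetical helper name. The pathGlobFilter option is an illustrative addition that keeps non-GeoTIFF files away from RS_FromGeoTiff.

```python
def load_indb_geotiffs(spark, path):
    """Create in-db rasters from GeoTIFF bytes (hypothetical helper)."""
    return (
        spark.read.format("binaryFile")
        # Illustrative filter: only .tif/.tiff files reach RS_FromGeoTiff.
        .option("pathGlobFilter", "*.tif*")
        .load(path)
        # The whole file content is decoded into the raster.
        .selectExpr("path", "RS_FromGeoTiff(content) AS rast")
    )
```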
Raster loading functions
The following functions can be used to create raster objects from various file formats and to inspect or convert them. Each function has its own dedicated reference page:

| Function | Description |
|---|---|
| RS_FromGeoTiff | Create a raster from GeoTiff binary data |
| RS_FromArcInfoAsciiGrid | Create a raster from Arc Info ASCII Grid binary data |
| RS_FromNetCDF | Create a raster from NetCDF binary data |
| RS_FromPath | Create an out-db raster from a file path |
| RS_MakeEmptyRaster | Create an empty raster with specified dimensions |
| RS_MakeRaster | Create a raster from an array of pixel values |
| RS_AsInDB | Convert an out-db raster to an in-db raster |
| RS_BandPath | Get the file path of an out-db raster |
| RS_NetCDFInfo | Get variable information from a NetCDF file |
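These functions can be combined. For example, out-db tiles from the raster loader can be converted to in-db rasters with RS_AsInDB. This sketch assumes RS_AsInDB takes the raster as its only argument (check its reference page), and the helper name is hypothetical.

```python
def materialize_in_db(spark, path):
    """Load tiles with the raster loader and convert them to in-db rasters (hypothetical helper)."""
    return (
        spark.read.format("raster")
        .load(path)
        # Assumed single-argument form of RS_AsInDB.
        .selectExpr("x", "y", "RS_AsInDB(rast) AS rast")
    )
```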

