# Raster Loaders

<Note>
  Sedona loaders are available in Scala, Java, and Python and share the same API.
</Note>

Sedona's raster loader leverages Spark's built-in binary data source and works with several RS constructors to produce values of the `Raster` type. Each raster is stored as a row in the resulting DataFrame, in `Raster` format.

By default, these functions use lon/lat order.

## Loading raster using the raster loader

The `raster` loader reads raster data from binary files as out-of-database (out-db) rasters, then splits them into smaller tiles.

<Tabs>
  <Tab title="Scala">
    ```scala theme={"system"}
    val rawDf = sedona.read.format("raster").load("/FILE-PATH/*.tif")
    rawDf.createOrReplaceTempView("rawdf")
    rawDf.show()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> rawDf = sedona.read().format("raster").load("/FILE-PATH/*.tif");
    rawDf.createOrReplaceTempView("rawdf");
    rawDf.show();
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={"system"}
    rawDf = sedona.read.format("raster").load("/FILE-PATH/*.tif")
    rawDf.createOrReplaceTempView("rawdf")
    rawDf.show()
    ```
  </Tab>
</Tabs>

The output will look like this:

```
+--------------------+---+---+
|                rast|  x|  y|
+--------------------+---+---+
|OutDbGridCoverage...|  0|  0|
|OutDbGridCoverage...|  1|  0|
|OutDbGridCoverage...|  2|  0|
...
```

The output contains the following columns:

* `rast`: The raster data in `Raster` format. This is an out-db raster tile that references the original raster data file.
* `x`: The 0-based x-coordinate of the tile. This column is present only when retiling is enabled (the default).
* `y`: The 0-based y-coordinate of the tile. This column is present only when retiling is enabled (the default).

The tile size is determined by the internal tiling scheme of the raster data. We recommend using the [Cloud Optimized GeoTIFF (COG)](https://www.cogeo.org/) format for raster data, because COG files usually organize pixel data into square tiles.

You can also disable automatic tiling using `option("retile", "false")`, or specify the tile size manually using options such as `option("tileWidth", "256")` and `option("tileHeight", "256")`.
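To show how the tile size determines the `x` and `y` values, here is a minimal pure-Python sketch of splitting a raster of a given pixel size into a tile grid. It assumes that tiles are laid out from the top-left corner and that edge tiles are clipped to the raster extent unless padding is requested. `Tile` and `tile_grid` are hypothetical names used only for illustration. They are not part of Sedona's API, and this is not Sedona's actual implementation.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    x: int        # 0-based tile column (compare the `x` output column)
    y: int        # 0-based tile row (compare the `y` output column)
    col_off: int  # first pixel column of the tile in the source raster
    row_off: int  # first pixel row of the tile in the source raster
    width: int    # tile width in pixels
    height: int   # tile height in pixels


def tile_grid(raster_w: int, raster_h: int, tile_w: int, tile_h: int,
              pad_with_nodata: bool = False) -> List[Tile]:
    """Hypothetical sketch: enumerate tiles row by row, x varying fastest."""
    n_cols = -(-raster_w // tile_w)  # ceiling division
    n_rows = -(-raster_h // tile_h)
    tiles = []
    for y in range(n_rows):
        for x in range(n_cols):
            col_off, row_off = x * tile_w, y * tile_h
            if pad_with_nodata:
                # Edge tiles keep the full size; extra pixels would hold NODATA
                w, h = tile_w, tile_h
            else:
                # Edge tiles are clipped to the raster extent
                w = min(tile_w, raster_w - col_off)
                h = min(tile_h, raster_h - row_off)
            tiles.append(Tile(x, y, col_off, row_off, w, h))
    return tiles
```

For example, `tile_grid(600, 300, 256, 256)` yields a 3 x 2 grid. The rightmost column is 88 pixels wide and the bottom row is 44 pixels high. The row order of a real DataFrame is not guaranteed, particularly after repartitioning.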

The options for the `raster` loader are as follows:

* `retile`: Enables tiling. Default is `true`.
* `tileWidth`: The width of the tile. If not specified, the size of internal tiles will be used.
* `tileHeight`: The height of the tile. If not specified, `tileWidth` is used if it was explicitly set; otherwise, the size of internal tiles is used.
* `padWithNoData`: Pad the right and bottom of the tile with NODATA values if the tile is smaller than the specified tile size. Default is `false`.
* `autoRescale`: Whether to rescale the pixel values using the scale and offset values in the GeoTIFF file. Default is `false`.
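The fallback rules for `tileWidth` and `tileHeight` can be expressed as a small function. The sketch below also shows the arithmetic that `autoRescale` implies, assuming the common GDAL convention `value = raw * scale + offset` for GeoTIFF scale and offset metadata. `resolve_tile_size` and `rescale` are hypothetical helpers used for illustration only. They are not Sedona functions.

```python
from typing import Optional, Tuple


def resolve_tile_size(internal_w: int, internal_h: int,
                      tile_width: Optional[int] = None,
                      tile_height: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper mirroring the documented tileWidth/tileHeight rules."""
    # tileWidth falls back to the internal tile width
    width = tile_width if tile_width is not None else internal_w
    if tile_height is not None:
        height = tile_height
    elif tile_width is not None:
        # tileHeight falls back to an explicitly set tileWidth
        height = tile_width
    else:
        height = internal_h
    return width, height


def rescale(raw: float, scale: float = 1.0, offset: float = 0.0) -> float:
    """Assumed scale/offset convention (as in GDAL): raw * scale + offset."""
    return raw * scale + offset
```

For instance, setting only `tileWidth` to 128 on a raster with 512 x 256 internal tiles produces 128 x 128 tiles under these rules.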

<Note>
  If the internal tiling scheme of the raster data is not suitable for tiling, the `raster` loader will throw an error. To work around this, disable automatic tiling using `option("retile", "false")` or specify the tile size manually. A better solution is to convert the raster data to COG format using `gdal_translate` or other tools.
</Note>

The `raster` loader also works with Spark's generic file source options, such as `option("pathGlobFilter", "*.tif*")` and `option("recursiveFileLookup", "true")`. For instance, you can recursively load all `.tif` files in a directory as follows:

```python theme={"system"}
sedona.read.format("raster").option("recursiveFileLookup", "true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder)
```

The `raster` loader handles paths ending in `/` specially by searching for files recursively. This is equivalent to omitting the trailing `/` from the path and setting `option("recursiveFileLookup", "true")`.
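The equivalence between a trailing `/` and the `recursiveFileLookup` option can be written as a path-rewriting rule. The following is a minimal sketch. `normalize_raster_path` is a hypothetical helper that illustrates the rule and does not reflect how Sedona implements it internally.

```python
from typing import Dict, Tuple


def normalize_raster_path(path: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: rewrite a trailing-slash path into path + options."""
    stripped = path.rstrip("/")
    if path.endswith("/") and stripped:
        # A trailing "/" means: look for files in the directory recursively
        return stripped, {"recursiveFileLookup": "true"}
    # Any other path is passed through unchanged, with no extra options
    return path, {}
```

In PySpark, the returned pair could be passed as `sedona.read.format("raster").options(**opts).load(p)`, which should behave like loading the original trailing-slash path.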

By default, the DataFrame loaded by the `raster` loader is automatically repartitioned to distribute the workload of processing raster tiles evenly across the cluster. The number of partitions is proportional to the number of executor CPU cores in the cluster. You can disable auto-repartitioning by setting the Spark session configuration `spark.sedona.raster.load.autoRepartition` to `false`. To specify the number of partitions manually, set the Spark session configuration `spark.sedona.raster.load.numPartitions` to the desired value.

## Loading raster using binaryFile loader (Deprecated)

### Step 1: Load raster to a binary DataFrame

You can load any type of raster file into a binary DataFrame using the code below. Then use the RS constructors described in Step 2 to create a Raster DataFrame.

```scala theme={"system"}
sedona.read.format("binaryFile").load("/some/path/*.asc")
```

### Step 2: Create a raster type column

After loading the raster data files using `binaryFile` loader, you can either use `RS_FromPath` to load the raster as an out-db raster, or use one of `RS_FromGeoTiff`, `RS_FromArcInfoAsciiGrid` and `RS_FromNetCDF` to load the binary data of the raster file as an in-db raster.
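Choosing between these constructors can be sketched as a mapping from file extension to a SQL expression over the `binaryFile` columns (`path` and `content`). `IN_DB_CONSTRUCTORS` and `raster_expr` are hypothetical names. The mapping covers only GeoTIFF and Arc Info ASCII Grid. NetCDF is omitted here; see the `RS_FromNetCDF` reference page for its arguments.

```python
import os

# Hypothetical mapping from file extension to an in-db constructor named above
IN_DB_CONSTRUCTORS = {
    ".tif": "RS_FromGeoTiff",
    ".tiff": "RS_FromGeoTiff",
    ".asc": "RS_FromArcInfoAsciiGrid",
}


def raster_expr(path: str, out_db: bool = False) -> str:
    """Hypothetical helper: build the SQL expression for one raster file."""
    if out_db:
        # Out-db rasters only need the file path, so `content` can be dropped
        return "RS_FromPath(path)"
    ext = os.path.splitext(path)[1].lower()
    if ext not in IN_DB_CONSTRUCTORS:
        raise ValueError(f"no in-db constructor mapped for extension {ext!r}")
    # In-db rasters are built from the binary `content` column
    return f"{IN_DB_CONSTRUCTORS[ext]}(content)"
```

The resulting string could then be used with `pyspark.sql.functions.expr`, for example `df.withColumn("raster", expr(raster_expr("/some/path/a.tiff")))`.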

#### Loading raster files as out-db raster using `RS_FromPath`

When using `RS_FromPath` to load out-db rasters, you can drop the binary `content` column so that the file contents are never read.

```scala theme={"system"}
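import org.apache.spark.sql.functions.expr
// Drop `content` so only file paths are read; RS_FromPath references the file
var df = sedona.read.format("binaryFile").load("/some/path/*.tiff").drop("content")
df = df.withColumn("raster", expr("RS_FromPath(path)"))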
```

#### Loading raster content as in-db raster using `RS_FromGeoTiff`

Use the binary `content` column to load an in-db raster. This requires loading the entire raster file into memory.

```scala theme={"system"}
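import org.apache.spark.sql.functions.expr
// Keep `content`: RS_FromGeoTiff parses the in-memory GeoTIFF bytes
var df = sedona.read.format("binaryFile").load("/FILE-PATH/*.tiff")
df = df.withColumn("raster", expr("RS_FromGeoTiff(content)"))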
```

## Raster loading functions

The following functions can be used to create raster objects from various file formats. Each function has its own dedicated reference page:

| Function                                                                                             | Description                                          |
| ---------------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
| [RS\_FromGeoTiff](/reference/wherobots-db/raster-data/constructors/RS_FromGeoTiff)                   | Create a raster from GeoTiff binary data             |
| [RS\_FromArcInfoAsciiGrid](/reference/wherobots-db/raster-data/constructors/RS_FromArcInfoAsciiGrid) | Create a raster from Arc Info ASCII Grid binary data |
| [RS\_FromNetCDF](/reference/wherobots-db/raster-data/constructors/RS_FromNetCDF)                     | Create a raster from NetCDF binary data              |
| [RS\_FromPath](/reference/wherobots-db/raster-data/constructors/RS_FromPath)                         | Create an out-db raster from a file path             |
| [RS\_MakeEmptyRaster](/reference/wherobots-db/raster-data/constructors/RS_MakeEmptyRaster)           | Create an empty raster with specified dimensions     |
| [RS\_MakeRaster](/reference/wherobots-db/raster-data/constructors/RS_MakeRaster)                     | Create a raster from an array of pixel values        |
| [RS\_AsInDB](/reference/wherobots-db/raster-data/constructors/RS_AsInDB)                             | Convert an out-db raster to an in-db raster          |
| [RS\_BandPath](/reference/wherobots-db/raster-data/accessors/RS_BandPath)                            | Get the file path of an out-db raster                |
| [RS\_NetCDFInfo](/reference/wherobots-db/raster-data/constructors/RS_NetCDFInfo)                     | Get variable information from a NetCDF file          |
