> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wherobots.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Raster Support in Havasu

Havasu supports storing raster data as in-db or out-db rasters, and allows users to use raster functions in WherobotsDB for manipulating raster data. This document describes how to use raster data type and raster functions in Havasu.

## Raster as a primitive type

Besides the primitive types supported by Apache Iceberg, Havasu introduced a new data type `raster` to represent raster data. For instance, user can create a table with a raster column using SQL:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        rast raster
    )""")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        rast raster
    )""")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
        "CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (" +
        "id bigint," +
        "data string," +
        "rast raster" +
        ")"
    );
    ```
  </Tab>
</Tabs>

This will create an empty table with a raster column `rast`. We can inspect the table schema using `DESCRIBE` command:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show();
    ```
  </Tab>
</Tabs>

```
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|   bigint|   null|
|    data|   string|   null|
|    rast|   raster|   null|
+--------+---------+-------+
```

Or using `.printSchema()` function:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema();
    ```
  </Tab>
</Tabs>

```
root
 |-- id: long (nullable = true)
 |-- data: string (nullable = true)
 |-- rast: raster (nullable = true)
```

## Creating table with raster column

As mentioned above, user can create a Havasu table using SQL:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        rast raster
    )""")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        rast raster
    )""")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
        "CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (" +
        "id bigint," +
        "data string," +
        "rast raster" +
        ")"
    );
    ```
  </Tab>
</Tabs>

If user has a DataFrame containing raster data, they can also create a Havasu table using the DataFrame:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    # Create a DataFrame containing raster data
    df_binary = sedona.read.format("binaryFile")\
        .option("pathGlobFilter", "*.tif")\
        .option("recursiveFileLookup", "true")\
        .load('s3://wherobots-examples/data/eurosat_small')
    df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))\
        .selectExpr("rast", "path AS data")

    # Create a Havasu table using the DataFrame
    df_geotiff.writeTo("org_catalog.test_db.test_table").create()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    // Create a DataFrame containing raster data
    val df_binary = sedona.read.format("binaryFile")
        .option("pathGlobFilter", "*.tif")
        .option("recursiveFileLookup", "true")
        .load("s3://wherobots-examples/data/eurosat_small")

    val df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
        .selectExpr("rast", "path AS data")

    // Create a Havasu table using the DataFrame
    df_geotiff.writeTo("org_catalog.test_db.test_table").create()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    // Create a DataFrame containing raster data
    Dataset<Row> df_binary = sedona.read().format("binaryFile")
        .option("pathGlobFilter", "*.tif")
        .option("recursiveFileLookup", "true")
        .load("s3://wherobots-examples/data/eurosat_small");

    Dataset<Row> df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
        .selectExpr("rast", "path AS data");

    // Create a Havasu table using the DataFrame
    df_geotiff.writeTo("org_catalog.test_db.test_table").create();
    ```
  </Tab>
</Tabs>

## Writing data

### Writing DataFrame to Havasu table

Users can write a DataFrame containing geometry data to a Havasu table:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df_geotiff.writeTo("org_catalog.test_db.test_table").append()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    df_geotiff.writeTo("org_catalog.test_db.test_table").append()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    df_geotiff.writeTo("org_catalog.test_db.test_table").append();
    ```
  </Tab>
</Tabs>

### ACID Transactions

Havasu supports all table update statements supported by Apache Iceberg, such as `UPDATE`, `DELETE` and `MERGE INTO`.
The syntax is identical to the open source Apache Iceberg, please refer to [Apache Iceberg - Write with SQL](https://iceberg.apache.org/docs/latest/spark-writes/#writing-with-sql) for more information.

## Querying data

User can load data from a Havasu table using `sedona.table(...)`:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    val df = sedona.table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.table("org_catalog.test_db.test_table");
    ```
  </Tab>
</Tabs>

You can apply some configurations for reading the table, such as the split size if you want to read the table into a DataFrame with more partitions:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.read.option('split-size', '1000').table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    val df = sedona.read.option('split-size', '1000').table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.read().option('split-size', '1000').table("org_catalog.test_db.test_table");
    ```
  </Tab>
</Tabs>

You can run spatial range query on the table using Spatial SQL:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.sql("""
    SELECT * FROM org_catalog.test_db.test_table
    WHERE RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    val df = sedona.sql("""
    SELECT * FROM org_catalog.test_db.test_table
    WHERE RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.sql(
            "SELECT * FROM org_catalog.test_db.test_table " +
            "WHERE RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))"
    );
    ```
  </Tab>
</Tabs>

Or using the DataFrame API:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.table("org_catalog.test_db.test_table")\
        .where("RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    val df = sedona.table("org_catalog.test_db.test_table")
        .where("RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.table("org_catalog.test_db.test_table")
        .where("RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))");
    ```
  </Tab>
</Tabs>

Users can also load a Havasu table by specifying the name of the data source explicitly using `.format("havasu.iceberg")`, this will load an isolated table reference that will not automatically refresh tables used by queries.

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.read.format("havasu.iceberg").load("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    val df = sedona.read.format("havasu.iceberg").load("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.read().format("havasu.iceberg").load("org_catalog.test_db.test_table");
    ```
  </Tab>
</Tabs>

Havasu is capable of optimizing raster spatial range queries using data skipping. This feature allows user to skip reading data files that don't contain data that satisfy the spatial filter.
Please refer to [Spatial Filter Push-down for Rasters](/reference/havasu/raster/rs-filter-push-down/) for more information.

## Working with raster data

Data in columns with `raster` type will be loaded as `RasterUDT` values in Sedona, user can use any RS\_ functions provided by WherobotsDB to manipulate the raster data. For example, user can use `RS_MapAlgebra` to create a new raster by applying a map algebra expression to the raster column:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table").show();
    ```
  </Tab>
</Tabs>

The resulting DataFrame can also be written back to a Havasu table using a CTAS statement:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    CREATE TABLE org_catalog.test_db.test_table_mapalg AS
    SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table
    """)
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    CREATE TABLE org_catalog.test_db.test_table_mapalg AS
    SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table
    """)
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
            "CREATE TABLE org_catalog.test_db.test_table_mapalg AS " +
            "SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table"
    );
    ```
  </Tab>
</Tabs>

Or simply call `writeTo` function on the resulting DataFrame of the query:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table")\
        .writeTo("org_catalog.test_db.test_table_mapalg").create()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table")
        .writeTo("org_catalog.test_db.test_table_mapalg").create()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM org_catalog.test_db.test_table")
        .writeTo("org_catalog.test_db.test_table_mapalg").create()
    ```
  </Tab>
</Tabs>

## Further reading

Havasu is based on Apache Iceberg, all features of Apache Iceberg except MOREtables are supported in WherobotsDB. Please refer to [Apache Iceberg documentation for Spark](https://iceberg.apache.org/docs/latest/getting-started/) for more information.

Out-db rasters usually requires tuning to achieve good performance. To learn more about out-db rasters, please refer to [Out-DB Rasters](/reference/havasu-table/raster-data/out-db-rasters/).

Havasu supports spatial filter push-down for raster tables. Please refer to [Spatial Filter Push-down for Rasters](/reference/havasu/raster/rs-filter-push-down/) to learn how to take advantage of this feature.