> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wherobots.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Geometry Support in Havasu

Havasu supports geometry data type and allows users to use geometry functions in WherobotsDB for manipulating geometry data. This document describes how to use geometry data type and geometry functions in Havasu.

## Geometry as a Primitive Type

Besides the primitive types supported by Apache Iceberg, Havasu introduced a new data type `GEOMETRY` to represent geospatial data. For instance, user can create a table with a geometry column using SQL:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        geom geometry
    ) PARTITIONED BY (bucket(4, id))""")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        geom geometry
    ) PARTITIONED BY (bucket(4, id))""")

    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
        "CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (" +
        "id bigint," +
        "data string," +
        "geom geometry" +
        ") PARTITIONED BY (bucket(4, id))"
    );
    ```
  </Tab>
</Tabs>

This will create an empty table partitioned by the bucketed value of `id`. Notice that Havasu introduced a new data type `GEOMETRY` to represent geospatial data. We can inspect the table schema using `DESCRIBE` command:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show();
    ```
  </Tab>
</Tabs>

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("DESCRIBE TABLE org_catalog.test_db.test_table").show();
    ```
  </Tab>
</Tabs>

```
+--------------+-------------+-------+
|      col_name|    data_type|comment|
+--------------+-------------+-------+
|            id|       bigint|   null|
|          data|       string|   null|
|          geom|     geometry|   null|
|              |             |       |
|# Partitioning|             |       |
|        Part 0|bucket(4, id)|       |
+--------------+-------------+-------+
```

Or using `.printSchema()` function:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema();
    ```
  </Tab>
</Tabs>

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.table("org_catalog.test_db.test_table").printSchema();
    ```
  </Tab>
</Tabs>

```
root
 |-- id: long (nullable = true)
 |-- data: string (nullable = true)
 |-- geom: geometry (nullable = true)
```

## Creating Table with Geometry Column

As mentioned above, user can create a Havasu table using SQL:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        geom geometry
    ) PARTITIONED BY (bucket(4, id))""")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
        id bigint,
        data string,
        geom geometry
    ) PARTITIONED BY (bucket(4, id))""")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
        "CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (" +
        "id bigint," +
        "data string," +
        "geom geometry" +
        ") PARTITIONED BY (bucket(4, id))"
    );
    ```
  </Tab>
</Tabs>

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
            id bigint,
            data string,
            geom geometry
    ) PARTITIONED BY (bucket(4, id))""")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (
            id bigint,
            data string,
            geom geometry
    ) PARTITIONED BY (bucket(4, id))""")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
            "CREATE TABLE IF NOT EXISTS org_catalog.test_db.test_table (" +
            "id bigint," +
            "data string," +
            "geom geometry" +
            ") PARTITIONED BY (bucket(4, id))"
    );
    ```
  </Tab>
</Tabs>

If user has a DataFrame containing geometry data, they can also create a Havasu table using the DataFrame:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    # Create a DataFrame with geometry data
    df = sedona.createDataFrame([
        (1, "a", "POINT (1 2)"),
        (2, "b", "POINT (2 3)")
    ], ["id", "data", "geom"])
    .withColumn("geom", expr("ST_GeomFromText(geom)"))

    # Create a Havasu table using the DataFrame
    df.writeTo("org_catalog.test_db.test_table").create()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    // Create a DataFrame with geometry data
    val df = Seq(
        (1, "a", "POINT (1 2)"),
        (2, "b", "POINT (2 3)")
    ).toDF("id", "data", "geom")
    .withColumn("geom", expr("ST_GeomFromText(geom)"))

    // Create a Havasu table using the DataFrame
    df.writeTo("org_catalog.test_db.test_table").create()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    // Create a DataFrame with geometry data
    List<Row> data = Arrays.asList(
        RowFactory.create(1, "a", "POINT (1 2)"),
        RowFactory.create(2, "b", "POINT (2 3)")
    );

    List<StructField> fields = Arrays.asList(
        DataTypes.createStructField("id", DataTypes.IntegerType, true),
        DataTypes.createStructField("data", DataTypes.StringType, true),
        DataTypes.createStructField("geom", DataTypes.StringType, true)
    );

    StructType schema = DataTypes.createStructType(fields);

    Dataset<Row> df = sedona.createDataFrame(data, schema)
        .withColumn("geom", expr("ST_GeomFromText(geom)"));

    // Create a Havasu table using the DataFrame
    df.writeTo("org_catalog.test_db.test_table").create();// Create a DataFrame with geometry data
    ```
  </Tab>
</Tabs>

## Writing Data

### INSERT INTO

User can insert data into a Havasu table using `INSERT INTO table_name VALUES`:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    INSERT INTO org_catalog.test_db.test_table
    VALUES (1, 'a', ST_GeomFromText('POINT (1 2)')), (2, 'b', ST_Point(2, 3))
    """)
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    INSERT INTO org_catalog.test_db.test_table
    VALUES (1, 'a', ST_GeomFromText('POINT (1 2)')), (2, 'b', ST_Point(2, 3))
    """)
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
            "INSERT INTO org_catalog.test_db.test_table " +
            "VALUES (1, 'a', ST_GeomFromText('POINT (1 2)')), (2, 'b', ST_Point(2, 3))"
    );

    ```
  </Tab>
</Tabs>

Or using `INSERT INTO table_name SELECT ...` to insert result set of a query into the table:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("INSERT INTO org_catalog.test_db.test_table SELECT 3, 'c', ST_Point(3, 4)")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("INSERT INTO org_catalog.test_db.test_table SELECT 3, 'c', ST_Point(3, 4)")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("INSERT INTO org_catalog.test_db.test_table SELECT 3, 'c', ST_Point(3, 4)");
    ```
  </Tab>
</Tabs>

### Writing DataFrame to Havasu table

User can write a DataFrame containing geometry data to a Havasu table:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    # Create a DataFrame with geometry data
    df = sedona.createDataFrame([
        (1, "a", "POINT (1 2)"),
        (2, "b", "POINT (2 3)")
    ], ["id", "data", "geom"])
    .withColumn("geom", expr("ST_GeomFromText(geom)"))

    # Write the DataFrame to a Havasu table
    df.writeTo("org_catalog.test_db.test_table").append()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    // Create a DataFrame with geometry data
    val df = Seq(
        (1, "a", "POINT (1 2)"),
        (2, "b", "POINT (2 3)")
    ).toDF("id", "data", "geom")
    .withColumn("geom", expr("ST_GeomFromText(geom)"))

    // Write the DataFrame to a Havasu table
    df.writeTo("org_catalog.test_db.test_table").append()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    List<Row> data = Arrays.asList(
        RowFactory.create(1, "a", "POINT (1 2)"),
        RowFactory.create(2, "b", "POINT (2 3)")
    );

    List<StructField> fields = Arrays.asList(
        DataTypes.createStructField("id", DataTypes.IntegerType, true),
        DataTypes.createStructField("data", DataTypes.StringType, true),
        DataTypes.createStructField("geom", DataTypes.StringType, true)
    );

    StructType schema = DataTypes.createStructType(fields);

    Dataset<Row> df = sedona.createDataFrame(data, schema)
        .withColumn("geom", expr("ST_GeomFromText(geom)"));

    df.writeTo("org_catalog.test_db.test_table").append();
    ```
  </Tab>
</Tabs>

This is semantically equivalent to the `INSERT INTO table_name SELECT ...` statement.

### Updating data in Havasu table

Havasu supports UPDATE queries that update matching rows in tables. Update queries accept a filter to match rows to update. Spatial filters are also supported in Havasu.

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("UPDATE org_catalog.test_db.test_table SET data = 'd' WHERE id = 1")

    # Use ST_Intersects to update rows that intersect with a polygon
    sedona.sql("""
    UPDATE org_catalog.test_db.test_table
    SET data = 'd'
    WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("UPDATE org_catalog.test_db.test_table SET data = 'd' WHERE id = 1")

    // Use ST_Intersects to update rows that intersect with a polygon
    sedona.sql("""
    UPDATE org_catalog.test_db.test_table
    SET data = 'd'
    WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("UPDATE org_catalog.test_db.test_table SET data = 'd' WHERE id = 1");

    // Use ST_Intersects to update rows that intersect with a polygon
    sedona.sql(
            "UPDATE org_catalog.test_db.test_table " +
            "SET data = 'd' " +
            "WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))"
    );
    ```
  </Tab>
</Tabs>

### Deleting data from Havasu table

Havasu supports DELETE FROM queries to remove data from tables. Delete queries accept a filter to match rows to delete. Spatial filters are also supported in Havasu.

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("DELETE FROM org_catalog.test_db.test_table WHERE id = 1")

    # Use ST_Intersects to delete rows that intersect with a polygon
    sedona.sql("""
    DELETE FROM org_catalog.test_db.test_table
    WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("DELETE FROM org_catalog.test_db.test_table WHERE id = 1")

    // Use ST_Intersects to delete rows that intersect with a polygon
    sedona.sql("""
    DELETE FROM org_catalog.test_db.test_table
    WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("DELETE FROM org_catalog.test_db.test_table WHERE id = 1");

    // Use ST_Intersects to delete rows that intersect with a polygon
    sedona.sql(
            "DELETE FROM org_catalog.test_db.test_table " +
            "WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))"
    );
    ```
  </Tab>
</Tabs>

### Merging DataFrame into Havasu table using `MERGE INTO`

Havasu supports MERGE INTO by rewriting data files that contain rows that need to be updated in an overwrite commit.
The syntax is identical to the open source Apache Iceberg, please refer to [Apache Iceberg - MERGE INTO](https://iceberg.apache.org/docs/latest/spark-writes/#merge-into) for more information.

## Querying Data

User can load data from a Havasu table using `sedona.table(...)`:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    df = sedona.table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.table("org_catalog.test_db.test_table");
    ```
  </Tab>
</Tabs>

You can apply some configurations for reading the table, such as the split size if you want to read the table into a DataFrame with more partitions:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.read.option('split-size', '1000').table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    df = sedona.read.option('split-size', '1000').table("org_catalog.test_db.test_table")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.read().option("split-size", "1000").table("org_catalog.test_db.test_table");
    ```
  </Tab>
</Tabs>

You can run spatial range query on the table using Spatial SQL:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.sql("""
    SELECT * FROM org_catalog.test_db.test_table
    WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    df = sedona.sql("""
    SELECT * FROM org_catalog.test_db.test_table
    WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
    """)
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.sql(
            "SELECT * FROM org_catalog.test_db.test_table " +
            "WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))"
    );
    ```
  </Tab>
</Tabs>

Or using the DataFrame API:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.table("org_catalog.test_db.test_table")\
        .where("ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    df = sedona.table("org_catalog.test_db.test_table")\
        .where("ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.table("org_catalog.test_db.test_table")
        .where("ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))");
    ```
  </Tab>
</Tabs>

Users can also load a Havasu table by specifying the name of the data source explicitly using `.format("havasu.iceberg")`, this will load an isolated table reference that will not automatically refresh tables used by queries.

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    df = sedona.read.format("havasu.iceberg").load("org_catalog.test_db.test_table")\
        .where("ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    val df = sedona.read.format("havasu.iceberg").load("org_catalog.test_db.test_table")\
        .where("ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    Dataset<Row> df = sedona.read().format("havasu.iceberg").load("org_catalog.test_db.test_table")
        .where("ST_Intersects(geom, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))");
    ```
  </Tab>
</Tabs>

Spatial range queries are very efficient in Havasu. Havasu supports spatial filter based data skipping. This feature allows user to skip reading data files that don't contain data that satisfy the spatial filter.
Please refer to [Cluster by geospatial fields for faster queries](/reference/havasu-table/geometry-data/cluster-geometry-table/) for more information.

## Working with Geometry Data

Data in columns with `geometry` type will be loaded as `GeometryUDT` values in Sedona, user can use any ST\_ functions provided by WherobotsDB to manipulate the geospatial data. For example, user can use `ST_Buffer` to create a buffer around the geometry column:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table").show()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table").show();
    ```
  </Tab>
</Tabs>

```
+----------------------+
|st_buffer(geom, 0.001)|
+----------------------+
|  POLYGON ((3.001 4...|
|  POLYGON ((1.001 2...|
|  POLYGON ((2.001 3...|
|  POLYGON ((1.001 2...|
|  POLYGON ((2.001 3...|
+----------------------+
```

The resulting DataFrame can also be written back to a Havasu table using a CTAS statement:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("""
    CREATE TABLE org_catalog.test_db.test_table_buf AS
    SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table
    """)
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("""
    CREATE TABLE org_catalog.test_db.test_table_buf AS
    SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table
    """)
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql(
            "CREATE TABLE org_catalog.test_db.test_table_buf AS " +
            "SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table"
    );
    ```
  </Tab>
</Tabs>

Or simply call `writeTo` function on the resulting DataFrame of the query:

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    sedona.sql("SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table")\
        .writeTo("org_catalog.test_db.test_table_buf").create()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    sedona.sql("SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table")
        .writeTo("org_catalog.test_db.test_table_buf").create()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    sedona.sql("SELECT ST_Buffer(geom, 0.001) FROM org_catalog.test_db.test_table")
        .writeTo("org_catalog.test_db.test_table_buf").create();
    ```
  </Tab>
</Tabs>

## Further Reading

Havasu is based on Apache Iceberg, all features of Apache Iceberg except MOR tables are supported in Havasu. Please refer to [Apache Iceberg documentation for Spark](https://iceberg.apache.org/docs/latest/getting-started/) for more information.

If you have spatial data stored in Iceberg table or parquet files as WKT or WKB, you
can migrate your data to Havasu very efficiently without scanning or rewriting your data files.
Please refer to [Convert Existing Table to Havasu Table](/reference/havasu-table/geometry-data/convert-geometry-fields/)
and [Migrating Parquet Files to Havasu](/reference/havasu/geometry/migrating-parquet-files/) for more information.

Havasu supports spatial filter push down and can optimize spatial
range queries.

Please refer to [Cluster by geospatial fields for faster queries](/reference/havasu-table/geometry-data/cluster-geometry-table/) for how to organize your spatial data for better performance.