Skip to content

Havasu (Iceberg) table management

Create Table using CREATE TABLE Command

To create a table with raster column in WherobotsDB, use sedona.sql(...) to run a CREATE TABLE command:

sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)")
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)")
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db");
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)");

This will create an empty table. Notice that Havasu introduced a new data type raster to represent raster data. Data in columns with rast type will be loaded as RasterUDT values in WherobotsDB, user can use any RS_ functions provided by WherobotsDB to manipulate the raster data.

Create Table using DataFrame

User can also create a table using a DataFrame. The raster column in the DataFrame will go into the raster column in the created geospatial table, as described in Load Raster Data.

Then we can write this DataFrame to a Havasu table:

df_geotiff.writeTo("wherobots.test_db.test_table").create()
df_geotiff.writeTo("wherobots.test_db.test_table").create()
df_geotiff.writeTo("wherobots.test_db.test_table").create();

Havasu supports a full range of SQL DDL and DML commands which Apache Iceberg supports, such as ALTER TABLE, DROP TABLE, etc.

Writing Raster Data

Raster data in Havasu is similar to PostGIS. It provides 2 options for storing raster data:

  • in-db raster: raster data completely stored in a Havasu table.
  • out-db raster: raster data is stored in an external file (such as GeoTIFF file on S3) and the Havasu table stores the path to the file and geo-referencing metadata of that raster.

In-DB Raster

To construct in-db raster data and write them to a Havasu table, user can use RS_FromGeoTiff to construct a raster value and write the DataFrame using .writeTo(...).append()

# Read GeoTIFF files using RS_FromGeoTiff
df_binary = sedona.read.format("binaryFile")\
    .option("pathGlobFilter", "*.tif")\
    .option("recursiveFileLookup", "true")\
    .load('s3://wherobots-examples/data/eurosat_small')
df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))\
    .selectExpr("rast", "path AS data")

# Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append()
// Read GeoTIFF files using RS_FromGeoTiff
val df_binary = sedona.read.format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3://wherobots-examples/data/eurosat_small")

val df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data")

// Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append()
// Read GeoTIFF files using RS_FromGeoTiff
Dataset<Row> df_binary = sedona.read().format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3://wherobots-examples/data/eurosat_small");

Dataset<Row> df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data");

// Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append();

Or user can first load the GeoTIFF file as out-db raster using RS_FromPath, then use RS_AsInDB to convert the out-db raster to in-db raster:

df_paths = spark.createDataFrame([
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",),
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif",)
], ["path"])

# Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"))

# Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
val df_paths = Seq(
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif"
).toDF("path")

// Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
val df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"))

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
List<Row> data = Arrays.asList(
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif"),
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif")
);
List<StructField> fields = Arrays.asList(
    DataTypes.createStructField("path", DataTypes.StringType, true)
);
StructType schema = DataTypes.createStructType(fields);
Dataset<Row> df_paths = spark.createDataFrame(data, schema);

// Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
Dataset<Row> df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"));

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append();

Out-DB Raster

To construct out-db raster data and write them to a Havasu table, user can use RS_FromPath to construct a raster value and insert it into the table:

df_paths = spark.createDataFrame([
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",),
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif",)
], ["path"])

# Read GeoTIFF files as out-db rasters using RS_FromPath
df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"))

# Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
val df_paths = Seq(
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif"
).toDF("path")

// Read GeoTIFF files as out-db rasters using RS_FromPath
val df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"))

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()

List<Row> data = Arrays.asList(
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif"),
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif")
);
List<StructField> fields = Arrays.asList(
    DataTypes.createStructField("path", DataTypes.StringType, true)
);
StructType schema = DataTypes.createStructType(fields);
Dataset<Row> df_paths = spark.createDataFrame(data, schema);

// Read GeoTIFF files as out-db rasters using RS_FromPath
Dataset<Row> df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"));

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append();
```

Havasu also supports a range of table update operations, such as UPDATE, DELETE, MERGE INTO, etc. Please refer to Apache Iceberg - Spark Writes for details.

Reading Raster Table

Havasu tables can be queried using SQL SELECT statements:

df = sedona.sql("SELECT * FROM wherobots.db_name.table_name")
df = sedona.sql("SELECT * FROM wherobots.db_name.table_name")
Dataset<Row> df = sedona.sql("SELECT * FROM wherobots.db_name.table_name");

Or using DataFrame API:

df = sedona.table("wherobots.db_name.table_name")
df = sedona.table("wherobots.db_name.table_name")
Dataset<Row> df = sedona.table("wherobots.db_name.table_name");

Havasu also supports time travel and inspecting metadata of table. Please refer to Apache Iceberg - Queries for details.

Processing Raster Data using Spatial SQL

Users can use any RT_ functions provided by WherobotsDB to manipulate raster data read from Havasu.

# Read and manipulate geospatial data in a Havasu table
df = sedona.table("wherobots.test_db.test_table")
df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"))

# Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create()
// Read and manipulate geospatial data in a Havasu table
val df = sedona.table("wherobots.test_db.test_table")
val df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"))

// Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create()
// Read and manipulate geospatial data in a Havasu table
Dataset<Row> df = sedona.table("wherobots.test_db.test_table");
Dataset<Row> df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"));

// Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create();

Further Reading

For more information, please refer to Raster Support in Havasu.


Last update: May 20, 2024 16:46:41