Havasu (Iceberg) table management
Create Table using CREATE TABLE Command¶
To create a table with a raster column in WherobotsDB, use sedona.sql(...) to run a CREATE TABLE command:
Python:
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)")
Scala:
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)")
Java:
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db");
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)");
This creates an empty table. Note that Havasu introduces a new data type, raster, to represent raster data. Data in columns of raster type is loaded as RasterUDT values in WherobotsDB, and users can use any RS_ functions provided by WherobotsDB to manipulate the raster data.
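For example, once the table contains data, RS_ functions can be applied directly to the raster column through SQL. A minimal sketch reusing the table above (RS_Width and RS_Height are existing WherobotsDB raster functions; the query itself is illustrative):
# Query per-raster dimensions from the Havasu table
sedona.sql("""
    SELECT data, RS_Width(rast) AS width, RS_Height(rast) AS height
    FROM wherobots.test_db.test_table
""").show()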
Create Table using DataFrame¶
Users can also create a table from a DataFrame. The raster column in the DataFrame becomes the raster column of the created geospatial table. Load the raster data into a DataFrame as described in Load Raster Data, then write the DataFrame to a Havasu table:
Python:
df_geotiff.writeTo("wherobots.test_db.test_table").create()
Scala:
df_geotiff.writeTo("wherobots.test_db.test_table").create()
Java:
df_geotiff.writeTo("wherobots.test_db.test_table").create();
Havasu supports the full range of SQL DDL and DML commands that Apache Iceberg supports, such as ALTER TABLE and DROP TABLE.
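For example, schema changes can be issued through sedona.sql(...) just like any other Iceberg DDL. A minimal sketch against the table created above (the note column is purely illustrative):
# Add a column to the Havasu table, then remove it again
sedona.sql("ALTER TABLE wherobots.test_db.test_table ADD COLUMN note string")
sedona.sql("ALTER TABLE wherobots.test_db.test_table DROP COLUMN note")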
Writing Raster Data¶
Raster storage in Havasu is similar to PostGIS: it provides two options for storing raster data:
- In-db raster: the raster data is stored entirely in the Havasu table.
- Out-db raster: the raster data is stored in an external file (such as a GeoTIFF file on S3), and the Havasu table stores only the path to that file and the geo-referencing metadata of the raster.
In-DB Raster¶
To construct in-db raster data and write it to a Havasu table, users can use RS_FromGeoTiff to construct raster values and write the DataFrame using .writeTo(...).append():
Python:
# Read GeoTIFF files using RS_FromGeoTiff
df_binary = sedona.read.format("binaryFile")\
    .option("pathGlobFilter", "*.tif")\
    .option("recursiveFileLookup", "true")\
    .load('s3://wherobots-examples/data/eurosat_small')
df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))\
    .selectExpr("rast", "path AS data")
# Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append()
Scala:
// Read GeoTIFF files using RS_FromGeoTiff
val df_binary = sedona.read.format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3://wherobots-examples/data/eurosat_small")
val df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data")
// Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append()
Java:
// Read GeoTIFF files using RS_FromGeoTiff
Dataset<Row> df_binary = sedona.read().format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3://wherobots-examples/data/eurosat_small");
Dataset<Row> df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data");
// Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append();
Alternatively, users can first load the GeoTIFF files as out-db rasters using RS_FromPath, then use RS_AsInDB to convert the out-db rasters to in-db rasters:
Python:
df_paths = sedona.createDataFrame([
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",),
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif",)
], ["path"])
# Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"))
# Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
Scala:
val df_paths = Seq(
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif"
).toDF("path")
// Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
val df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"))
// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
Java:
List<Row> data = Arrays.asList(
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif"),
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif")
);
List<StructField> fields = Arrays.asList(
    DataTypes.createStructField("path", DataTypes.StringType, true)
);
StructType schema = DataTypes.createStructType(fields);
Dataset<Row> df_paths = sedona.createDataFrame(data, schema);
// Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
Dataset<Row> df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"));
// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append();
Out-DB Raster¶
To construct out-db raster data and write it to a Havasu table, users can use RS_FromPath to construct raster values and insert them into the table:
Python:
df_paths = sedona.createDataFrame([
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",),
    ("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif",)
], ["path"])
# Read GeoTIFF files as out-db rasters using RS_FromPath
df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"))
# Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
Scala:
val df_paths = Seq(
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",
    "s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif"
).toDF("path")
// Read GeoTIFF files as out-db rasters using RS_FromPath
val df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"))
// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
Java:
List<Row> data = Arrays.asList(
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif"),
    RowFactory.create("s3://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif")
);
List<StructField> fields = Arrays.asList(
    DataTypes.createStructField("path", DataTypes.StringType, true)
);
StructType schema = DataTypes.createStructType(fields);
Dataset<Row> df_paths = sedona.createDataFrame(data, schema);
// Read GeoTIFF files as out-db rasters using RS_FromPath
Dataset<Row> df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"));
// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append();
Havasu also supports a range of table update operations, such as UPDATE, DELETE, and MERGE INTO. Please refer to Apache Iceberg - Spark Writes for details.
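For example, a row-level delete on the table created above can be expressed directly in SQL. A minimal sketch (the WHERE predicate is illustrative):
# Delete rows whose data column matches a given file name
sedona.sql("DELETE FROM wherobots.test_db.test_table WHERE data LIKE '%AnnualCrop_10.tif'")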
Reading Raster Table¶
Havasu tables can be queried using SQL SELECT statements:
Python:
df = sedona.sql("SELECT * FROM wherobots.db_name.table_name")
Scala:
val df = sedona.sql("SELECT * FROM wherobots.db_name.table_name")
Java:
Dataset<Row> df = sedona.sql("SELECT * FROM wherobots.db_name.table_name");
Or using the DataFrame API:
Python:
df = sedona.table("wherobots.db_name.table_name")
Scala:
val df = sedona.table("wherobots.db_name.table_name")
Java:
Dataset<Row> df = sedona.table("wherobots.db_name.table_name");
Havasu also supports time travel and inspecting table metadata. Please refer to Apache Iceberg - Queries for details.
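For example, Iceberg metadata tables and snapshot-based time travel are available through the same session. A minimal sketch (the snapshot id is a placeholder):
# List the table's snapshots via the Iceberg snapshots metadata table
sedona.sql("SELECT committed_at, snapshot_id, operation FROM wherobots.test_db.test_table.snapshots").show()
# Read the table as of a specific snapshot (placeholder id)
df_old = sedona.read.option("snapshot-id", 1234567890123456789).table("wherobots.test_db.test_table")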
Processing Raster Data using Spatial SQL¶
Users can use any RS_ functions provided by WherobotsDB to manipulate raster data read from Havasu:
Python:
# Read and manipulate geospatial data in a Havasu table
df = sedona.table("wherobots.test_db.test_table")
df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"))
# Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create()
Scala:
// Read and manipulate geospatial data in a Havasu table
val df = sedona.table("wherobots.test_db.test_table")
val df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"))
// Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create()
Java:
// Read and manipulate geospatial data in a Havasu table
Dataset<Row> df = sedona.table("wherobots.test_db.test_table");
Dataset<Row> df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"));
// Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create();
Further Reading¶
For more information, please refer to Raster Support in Havasu.