Convert Existing Table to Havasu Table¶
By default, values in GEOMETRY
columns were stored as EWKB (Extended Well-Known Binary) format in the data files. The GEOMETRY
column is defined as BINARY
column with some Havasu specific metadata in the table metadata, so that other implementations of Apache Iceberg could still read the data as binary values. The benefit of using GEOMETRY
column to store geospatial data is that it enforces the data in the geometry column be to valid geospatial data, and Havasu will collect statistics on the geometry data in the column, which could be used to optimize spatial range queries (see Cluster by geospatial fields for faster queries).
Havasu also supports storing geometry data in other formats, such as WKT, WKB, GeoJSON, etc. Though user cannot explicitly specify the format of the geometry column to formats other than EWKB. These geometry formats are designed solely for easier data migration from already existing Iceberg tables or parquet files to Havasu. If user has an Iceberg table containing serialized geometry data, they can convert the data to GEOMETRY
column without re-writing the table. Plain parquet files containing serialized geometry data can also be directly loaded into Havasu without scanning the data files. Please refer to Migrating Parquet Files to Havasu for details.
If you have an existing Iceberg table containing spatial data, you can upgrade it to use GEOMETRY
column by running the following command:
ALTER TABLE <table_name> SET GEOMETRY FIELDS <column_name> AS <geometry_format>;
Supported column types and geometry formats are:
Column Type | Geometry Format |
---|---|
BINARY | wkb |
BINARY | ewkb |
STRING | wkt |
STRING | geojson |
For example, we can upgrade a table containing WKT values in geom
column to use GEOMETRY
column:
ALTER TABLE wherobots.db.table_name SET GEOMETRY FIELDS wkt AS 'wkt';
This command will only attach Havasu-specific metadata to the converted column and won’t rewrite the data. The data will remain in its original format (STRING
column storing WKT in this case), and any geospatial data written into this table will be serialized as WKT, which make the table backward compatible. Again, the benefit is that user get strong enforcements on the data in the geometry column. Havasu will raise an error if invalid data was encountered when reading the data, and writing plain string values into the geometry column will not be allowed. If user want to write data to the upgraded Havasu table, they need to write geometry objects into the geometry column.
INSERT INTO wherobots.db.table_name (`wkt`, `id`) VALUES (ST_GeomFromWKT('POINT (1 1)'), 1);
Havasu will write new data to migrated table in its original serialized geometry format. In this example, new geometry values written to wherobots.db.table_name
will also be serialized as WKT. This is for being compatible with the original data type of the geometry column.
If you want to revert the upgrade, you can run the following command:
ALTER TABLE wherobots.db.table_name SET GEOMETRY FIELDS wkt AS 'none';
The wkt
column will be converted back to STRING
column, and user could write any plain string values into this column.
Note
The GEOMETRY
column is converted in this way does not have any Havasu-specific spatial statistics metadata, so Havasu will not be able to optimize spatial range queries on this column. If you want Havasu to be able to optimize spatial range queries on this column, you need to rewrite the table using Iceberg rewrite_data_files
procedure or CREATE SPATIAL INDEX FOR
statement. Please refer to Cluster by geospatial fields for faster queries for details.