GEOMETRY columns were stored as EWKB (Extended Well-Known Binary) format in the data files.
The GEOMETRY column is defined as BINARY column with some Havasu specific metadata in the table metadata, so that other
implementations of Apache Iceberg could still read the data as binary values. The benefit of using GEOMETRY column to store
geospatial data is that it enforces the data in the geometry column be to valid geospatial data, and Havasu will collect
statistics on the geometry data in the column, which could be used to optimize spatial range
queries (see Cluster by geospatial fields for faster queries.
Havasu also supports storing geometry data in other formats, such as WKT, WKB, GeoJSON, etc. Though user cannot explicitly specify the
format of the geometry column to formats other than EWKB. These geometry formats are designed solely for easier data migration from
already existing Iceberg tables or parquet files to Havasu. If user has an Iceberg table containing serialized geometry
data, they can convert the data to GEOMETRY column without re-writing the table. Plain parquet files containing serialized geometry
data can also be directly loaded into Havasu without scanning the data files.
Please refer to Migrating Parquet Files to Havasu for details.
If you have an existing Iceberg table containing spatial data, you can upgrade it to use GEOMETRY column by running the following command:
| Column Type | Geometry Format |
|---|---|
| BINARY | wkb |
| BINARY | ewkb |
| STRING | wkt |
| STRING | geojson |
geom column to use GEOMETRY column:
STRING column storing WKT in this case), and any geospatial data written into this table will be serialized as WKT, which make the table backward compatible. Again, the benefit is that user get strong enforcements on the data in the geometry column. Havasu will raise an error if invalid data was encountered when reading the data, and writing plain string values into the geometry column will not be allowed. If user want to write data to the upgraded Havasu table, they need to write geometry objects into the geometry column.
wherobots.db.table_name will also be serialized as WKT. This is for being compatible with the original data type of the geometry column.
If you want to revert the upgrade, you can run the following command:
wkt column will be converted back to STRING column, and user could write any plain string values into this column.
The
GEOMETRY column is converted in this way does not have any Havasu-specific spatial statistics metadata, so Havasu will not be able to optimize spatial range queries on this column. If you want Havasu to be able to
optimize spatial range queries on this column, you need to rewrite the table using Iceberg rewrite_data_files procedure or CREATE SPATIAL INDEX FOR statement. Please refer to Cluster by geospatial fields for faster queries for details.
