
Compatibility with Apache Iceberg

This page describes how to read and write Havasu tables using open source Apache Iceberg today, and outlines the upcoming native spatial data support in the broader Iceberg ecosystem.

Background

Apache Iceberg is a general-purpose open table format for large-scale data analytics.

Havasu, Wherobots' specialized extension of Apache Iceberg, adds support and optimizations for geospatial data on top of Iceberg.

Havasu is interoperable, ensuring that data tables remain compatible with Iceberg's table format while introducing specific mechanisms for geospatial data.

You can use Apache Iceberg APIs to read and write Havasu tables.

The Evolving Landscape: Native Spatial Types in Apache Iceberg

The Apache Iceberg community, with contributions from Wherobots, Planet, CARTO, and many others, has introduced native GEOMETRY and GEOGRAPHY data types directly into the Iceberg specification. This is a major development that will bring standardized and enhanced spatial capabilities to the entire Iceberg ecosystem.

What this means for Wherobots:

  • Future Native Support: Wherobots is actively involved in this development and is committed to supporting these native spatial types in Apache Iceberg.
  • Simplified Data Handling: With the planned implementation and integration of these changes into Wherobots, the intention is to allow direct definition and interaction with GEOMETRY columns via standard Iceberg APIs and tools, without the need for the workarounds described below.
  • Enhanced Optimizations & Interoperability: Native support is expected to pave the way for more deeply integrated spatial indexing and query optimizations within Iceberg, and broader compatibility with other Iceberg-compatible engines.
  • Migration Path: Wherobots plans to provide clear guidance and tools to help customers migrate existing Havasu spatial tables to leverage the new native Iceberg spatial types, ensuring a smooth transition.

As mentioned in our blog post, these native Apache Iceberg spatial features are currently under active development and are not yet generally available for Wherobots customers to use directly in their everyday workflows with all Iceberg tools.

Wherobots is closely involved in this evolution of the Iceberg specification and will update our platform and documentation as these features mature and become ready for production use.

Since these native Iceberg spatial features are still being finalized and rolled out across the ecosystem, the following sections describe the current best practices for interacting with Havasu tables using standard Apache Iceberg libraries that do not yet support the upcoming native spatial types.

Reading GEOMETRY Data Using Apache Iceberg (Current Method)

Currently, Iceberg libraries without native spatial type support interpret a Havasu table's GEOMETRY column as a BINARY column.

You can use any standard Apache Iceberg API to read the data. The geometry data will be read as binary values, typically in Well-Known Binary (WKB) or Extended WKB (EWKB) format. To work with these values as geometry objects in your application or query engine, you'll need to deserialize them.

WherobotsDB provides functions like ST_GeomFromWKB (or similar functions for EWKB parsing) to convert these binary values back into geometry objects:

-- Example: Reading and deserializing a geometry column in WherobotsDB
SELECT id, ST_AsText(ST_GeomFromWKB(geom_binary)) AS geom_wkt
FROM wherobots.test_db.havasu_spatial_table;
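Outside WherobotsDB, an application that reads the BINARY column through a standard Iceberg library has to decode the bytes itself. The following is a minimal sketch in Python, using only the standard library, of how a 2D POINT encoded as WKB or EWKB could be decoded. The function name `decode_point` is hypothetical, the flag constants follow the PostGIS EWKB convention, and the sketch assumes the column holds points only. In practice a geometry library (for example Shapely's `shapely.wkb.loads`) would handle all geometry types.

```python
import struct

# EWKB flag bits stored in the high bits of the geometry type word
# (PostGIS convention; assumed here, not taken from the Havasu docs).
EWKB_Z_FLAG = 0x80000000
EWKB_M_FLAG = 0x40000000
EWKB_SRID_FLAG = 0x20000000
WKB_POINT = 1


def decode_point(blob: bytes):
    """Decode a 2D WKB or EWKB POINT into (srid, x, y).

    srid is None when the value carries no SRID (plain WKB).
    """
    # Byte 0 is the byte-order marker: 1 = little-endian, 0 = big-endian.
    endian = "<" if blob[0] == 1 else ">"

    # Bytes 1-4 hold the geometry type, plus optional EWKB flags.
    (type_word,) = struct.unpack_from(endian + "I", blob, 1)
    offset = 5

    if type_word & (EWKB_Z_FLAG | EWKB_M_FLAG):
        raise ValueError("this sketch only handles 2D geometries")

    srid = None
    if type_word & EWKB_SRID_FLAG:
        # EWKB inserts a 4-byte SRID right after the type word.
        (srid,) = struct.unpack_from(endian + "i", blob, offset)
        offset += 4

    if type_word & ~EWKB_SRID_FLAG != WKB_POINT:
        raise ValueError("this sketch only handles POINT geometries")

    # A 2D point is two 8-byte IEEE 754 doubles: x then y.
    x, y = struct.unpack_from(endian + "dd", blob, offset)
    return srid, x, y


# Hand-built sample values standing in for bytes read from the BINARY column.
wkb_point = struct.pack("<BIdd", 1, WKB_POINT, 1.0, 2.0)
ewkb_point = struct.pack("<BIidd", 1, WKB_POINT | EWKB_SRID_FLAG, 4326, 1.0, 2.0)

print(decode_point(wkb_point))
print(decode_point(ewkb_point))
```

The same function accepts both encodings because EWKB only differs from WKB in the flag bits of the type word and the optional SRID field that follows it.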

Writing GEOMETRY Data Using Apache Iceberg (Current Method)

Havasu currently stores geometry data in BINARY columns using EWKB format by default. This allows users to employ standard Apache Iceberg APIs to write serialized geometry data into Havasu tables. For example, you can prepare your geometry data as EWKB byte arrays and write them using Iceberg:

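-- Example: Inserting serialized geometry data into a Havasu table using an Iceberg-compatible engine
-- ST_AsBinary emits WKB, which is byte-identical to EWKB for 2D geometries without an SRID.
-- The exact name and syntax of ST_AsBinary might vary based on the SQL engine preparing the data.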
INSERT INTO wherobots.test_db.havasu_spatial_table (id, attribute, geom_binary)
VALUES (1, 'feature_a', ST_AsBinary(ST_GeomFromText('POINT (1 2)'))),
       (2, 'feature_b', ST_AsBinary(ST_GeomFromText('POLYGON ((0 0, 1 1, 1 0, 0 0))')));
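When the writer is an application rather than a SQL engine, the EWKB byte arrays have to be prepared in code before they are handed to an Iceberg writer. The sketch below shows, under stated assumptions, how a 2D POINT could be serialized with Python's standard library. The function name `encode_point` is hypothetical, the SRID flag follows the PostGIS EWKB convention, and the step that actually appends the bytes to the table through an Iceberg API is deliberately left out.

```python
import struct

# PostGIS-style EWKB flag marking that an SRID follows the type word
# (an assumption of this sketch, not taken from the Havasu docs).
EWKB_SRID_FLAG = 0x20000000
WKB_POINT = 1


def encode_point(x: float, y: float, srid=None) -> bytes:
    """Serialize a 2D POINT as little-endian WKB, or EWKB when srid is given."""
    if srid is None:
        # Byte order (1 = little-endian), geometry type, then x and y doubles.
        return struct.pack("<BIdd", 1, WKB_POINT, x, y)
    # EWKB sets the SRID flag on the type word and inserts the SRID before x, y.
    return struct.pack("<BIidd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, x, y)


# Bytes for POINT (1 2), the first geometry in the INSERT example above.
plain = encode_point(1.0, 2.0)
with_srid = encode_point(1.0, 2.0, srid=4326)

print(plain.hex())
print(with_srid.hex())
```

Without an SRID the output is ordinary WKB, which is why a WKB-producing function can feed a column that expects EWKB for simple 2D data.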

Read Havasu-Iceberg tables with Iceberg APIs; write with Havasu APIs

You can read Havasu-Iceberg tables using Apache Iceberg APIs. If your Iceberg version lacks native geometry type support, use Havasu APIs for writing Havasu-Iceberg tables.