Skip to content

Getting started

WherobotsDB provides various SQL APIs to work with vector and raster data. For vector data, it follows SQL/MM Part3 Spatial SQL Standard. WherobotsDB maintains support for the existing public APIs found in Apache Sedona. To make existing Sedona users feel comfortable, our documentation still uses the word Sedona in the context of WherobotsDB.

Create SedonaContext

To start using WherobotsDB, we need to launch a SedonaContext in Wherobots cloud:

from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
import org.apache.sedona.spark.SedonaContext

val config = SedonaContext.builder.getOrCreate()
val sedona = SedonaContext(config)
import org.apache.spark.sql.SparkSession;
import org.apache.sedona.spark.SedonaContext;

SparkSession config = SedonaContext.builder().getOrCreate();
SparkSession sedona = SedonaContext.create(config);

All the SQL APIs can be accessed via:

myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")
var myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")
Dataset<Row> myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")

Alternatively, expr and selectExpr can be used:

myDataFrame.withColumn("geometry", expr("ST_*")).selectExpr("ST_*")
myDataFrame.withColumn("raster", expr("RS_*")).selectExpr("RS_*")

Detailed Spatial SQL APIs for vector and raster can be found at Vector SQL APIs and Raster SQL APIs respectively. References of Havasu table format can be found at Havasu APIs

Quick start

After starting SedonaContext, users can start working with WherobotsDB on Wherobots cloud without any setup.

Create a WherobotsDB Havasu table

Wherobots cloud provides a set of default configurations for the users to store data under the S3 prefix allocated for the users. Simply use wherobots as the catalog name when creating a Havasu table.

sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (
  pickup GEOMETRY,
  Trip_Pickup_DateTime STRING,
  Payment_Type STRING,
  Fare_Amt DECIMAL
) USING havasu.iceberg
""")
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (
  pickup GEOMETRY,
  Trip_Pickup_DateTime STRING,
  Payment_Type STRING,
  Fare_Amt DECIMAL
)
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db");
sedona.sql(
  "CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (" +
  "pickup GEOMETRY," +
  "Trip_Pickup_DateTime STRING," +
  "Payment_Type STRING," +
  "Fare_Amt DECIMAL" +
")");

Read data from external storage

User can ingest and query data using WherobotsDB. For example, users can load an example CSV file:

taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3://wherobots-examples/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'CAST(Fare_Amt AS DECIMAL)')
taxidf = taxidf.filter(col("pickup").isNotNull())
val taxidf = sedona.read.format("csv").option("header","true").option("delimiter", ",").load("s3://wherobots-examples/data/nyc-taxi-data.csv")
val taxidf = taxidf.selectExpr("ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup", "Trip_Pickup_DateTime", "Payment_Type", "CAST(Fare_Amt AS DECIMAL)")
val taxidf = taxidf.filter(col("pickup").isNotNull)
Dataset<Row> taxidf = sedona.read().format("csv").option("header","true").option("delimiter", ",").load("s3://wherobots-examples/data/nyc-taxi-data.csv");
Dataset<Row> taxidf = taxidf.selectExpr("ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup", "Trip_Pickup_DateTime", "Payment_Type", "CAST(Fare_Amt AS DECIMAL)");
Dataset<Row> taxidf = taxidf.filter(col("pickup").isNotNull());

Insert data to a WherobotsDB Havasu table

User can insert the data above to an existing WherobotsDB table.

taxidf.writeTo("wherobots.test_db.taxi").append()
taxidf.writeTo("wherobots.test_db.taxi").append()
taxidf.writeTo("wherobots.test_db.taxi").append();

Read from a WherobotsDB Havasu table

User can immediately query the data stored in Havasu table.

df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))")
df_result.show()
val df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))")
df_result.show()
Dataset<Row> df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))");
df_result.show();

Limitations

  • MOR Tables: Havasu does not support MOR tables. We will add MOR support to Havasu in the future.
  • File Formats: Havasu only supports storing geometry or raster values in parquet files. Other formats such as ORC or Avro are not supported.

Last update: May 20, 2024 07:48:29