Getting started
WherobotsDB provides various SQL APIs to work with vector and raster data. For vector data, it follows SQL/MM Part3 Spatial SQL Standard. WherobotsDB maintains support for the existing public APIs found in Apache Sedona. To make existing Sedona users feel comfortable, our documentation still uses the word Sedona
in the context of WherobotsDB.
Create SedonaContext¶
To start using WherobotsDB, we need to launch a SedonaContext in Wherobots cloud:
from sedona.spark import SedonaContext
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
import org.apache.sedona.spark.SedonaContext
val config = SedonaContext.builder.getOrCreate()
val sedona = SedonaContext(config)
import org.apache.spark.sql.SparkSession;
import org.apache.sedona.spark.SedonaContext;
SparkSession config = SedonaContext.builder().getOrCreate();
SparkSession sedona = SedonaContext.create(config);
All the SQL APIs can be accessed via:
myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")
var myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")
Dataset<Row> myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")
Alternatively, expr
and selectExpr
can be used:
myDataFrame.withColumn("geometry", expr("ST_*")).selectExpr("ST_*")
myDataFrame.withColumn("raster", expr("RS_*")).selectExpr("RS_*")
Detailed Spatial SQL APIs for vector and raster can be found at Vector SQL APIs and Raster SQL APIs respectively. References of Havasu table format can be found at Havasu APIs
Quick start¶
After starting SedonaContext, users can start working with WherobotsDB on Wherobots cloud without any setup.
Create a WherobotsDB Havasu table¶
Wherobots cloud provides a set of default configurations for the users to store data under the S3 prefix allocated for the users. Simply use wherobots
as the catalog name when creating a Havasu table.
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (
pickup GEOMETRY,
Trip_Pickup_DateTime STRING,
Payment_Type STRING,
Fare_Amt DECIMAL
) USING havasu.iceberg
""")
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (
pickup GEOMETRY,
Trip_Pickup_DateTime STRING,
Payment_Type STRING,
Fare_Amt DECIMAL
)
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db");
sedona.sql(
"CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (" +
"pickup GEOMETRY," +
"Trip_Pickup_DateTime STRING," +
"Payment_Type STRING," +
"Fare_Amt DECIMAL" +
")");
Read data from external storage¶
User can ingest and query data using WherobotsDB. For example, users can load an example CSV file:
taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3://wherobots-examples/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'CAST(Fare_Amt AS DECIMAL)')
taxidf = taxidf.filter(col("pickup").isNotNull())
val taxidf = sedona.read.format("csv").option("header","true").option("delimiter", ",").load("s3://wherobots-examples/data/nyc-taxi-data.csv")
val taxidf = taxidf.selectExpr("ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup", "Trip_Pickup_DateTime", "Payment_Type", "CAST(Fare_Amt AS DECIMAL)")
val taxidf = taxidf.filter(col("pickup").isNotNull)
Dataset<Row> taxidf = sedona.read().format("csv").option("header","true").option("delimiter", ",").load("s3://wherobots-examples/data/nyc-taxi-data.csv");
Dataset<Row> taxidf = taxidf.selectExpr("ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup", "Trip_Pickup_DateTime", "Payment_Type", "CAST(Fare_Amt AS DECIMAL)");
Dataset<Row> taxidf = taxidf.filter(col("pickup").isNotNull());
Insert data to a WherobotsDB Havasu table¶
User can insert the data above to an existing WherobotsDB table.
taxidf.writeTo("wherobots.test_db.taxi").append()
taxidf.writeTo("wherobots.test_db.taxi").append()
taxidf.writeTo("wherobots.test_db.taxi").append();
Read from a WherobotsDB Havasu table¶
User can immediately query the data stored in Havasu table.
df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))")
df_result.show()
val df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))")
df_result.show()
Dataset<Row> df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))");
df_result.show();
Limitations¶
- MOR Tables: Havasu does not support MOR tables. We will add MOR support to Havasu in the future.
- File Formats: Havasu only supports storing geometry or raster values in parquet files. Other formats such as ORC or Avro are not supported.