What is Apache Sedona?

Apache Sedona (https://sedona.apache.org/) is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines.

Relationship between Apache Sedona and Wherobots

Wherobots uses Apache Sedona as an open core on which it builds a cloud data platform optimized for geospatial analytics and AI workloads. Built on Sedona, Wherobots offers a streamlined experience for running geospatial workloads.

Wherobots Inc. was founded by the original creators of Apache Sedona. As an open-source project, Apache Sedona attracts contributions from various leading companies, including Wherobots.

Wherobots actively contributes to the development and improvement of Apache Sedona. It also develops additional optimizations and proprietary features that extend Sedona's capabilities, such as WherobotsDB, a full-fledged spatial analytics database system.

Differences between Apache Sedona and Wherobots

| Feature | Apache Sedona | Wherobots |
| --- | --- | --- |
| Spatial Functions | 200+ | 300+ |
| Data Model | Geometry, In-DB raster | Geometry, In-DB raster, Out-DB raster |
| Distributed ETL and Analytics Performance | The fastest open-source engine [1] | 20x faster than Apache Sedona (formerly GeoSpark) |
| Spatial Data Lake Storage | No ❌ | Yes ✅ |
| Distributed Vector Tiles | No ❌ | Yes ✅ |
| Spatial AI | No ❌ | Yes ✅ |
| Distributed Raster Inference | No ❌ | Yes ✅ |
| Distributed Map Matching | No ❌ | Yes ✅ |
| Deployment Mode | Unmanaged | Fully managed and highly optimized |
| Supported Languages | SQL, Scala, Java, Python | SQL, Scala, Java, Python |

  1. From third-party evaluations cited in the Apache Sedona documentation:

    • SIGMOD 2020: Tahboub, R. Y., & Rompf, T. (2020). Architecting a query compiler for spatial workloads. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2481–2496.

    In Figure 16a, the GeoSpark distance join query runs about 7x to 9x faster than Simba, a spatial extension for Spark, on machines with 1 to 24 cores.

    • PVLDB 2018: Pandey, V., Kipf, A., Neumann, T., & Kemper, A. (2018). How good are modern spatial analytics systems? Proceedings of the VLDB Endowment, 11(11), 1632–1645.

    GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.