What is Apache Sedona?
Apache Sedona (https://sedona.apache.org/) is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines.
Relationship between Apache Sedona and Wherobots
Wherobots uses Apache Sedona as its open core and builds on it a cloud data platform optimized for geospatial analytics and AI workloads, offering a streamlined experience for running geospatial workloads.
Wherobots Inc. was founded by the original creators of Apache Sedona. As an open-source project, Apache Sedona attracts contributions from various leading companies, including Wherobots.
Wherobots actively contributes to the development of Apache Sedona. It also develops additional optimizations and proprietary features that extend Sedona's capabilities, such as WherobotsDB, a full-fledged spatial analytics database system.
Differences between Apache Sedona and Wherobots
Feature | Apache Sedona | Wherobots |
---|---|---|
Spatial Functions | 200+ | 300+ |
Data Model | Geometry, In-DB raster | Geometry, In-DB raster, Out-DB raster |
Distributed ETL and Analytics Performance | The fastest open-source engine¹ | 20X faster than Apache Sedona (formerly GeoSpark) |
Spatial Data Lake Storage | No ❌ | Yes ✅ |
Distributed Vector Tiles | No ❌ | Yes ✅ |
Spatial AI | No ❌ | Yes ✅ |
Distributed Raster Inference | No ❌ | Yes ✅ |
Distributed Map Matching | No ❌ | Yes ✅ |
Deployment Mode | Unmanaged | Fully managed and highly optimized |
Supported Languages | SQL, Scala, Java, Python | SQL, Scala, Java, Python |
1. From third-party evaluations cited in the Apache Sedona documentation:
    - SIGMOD 2020: Tahboub, R. Y., & Rompf, T. (2020). Architecting a query compiler for spatial workloads. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2481–2496.
      In Figure 16a, the GeoSpark distance join query runs around 7x to 9x faster than Simba, a spatial extension for Spark, on machines with 1 to 24 cores.
    - PVLDB 2018: Pandey, V., Kipf, A., Neumann, T., & Kemper, A. (2018). How good are modern spatial analytics systems? Proceedings of the VLDB Endowment, 11(11), 1632–1645.
      GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.