Spatial SQL functions can be used in a DataFrame-style API similar to Spark functions. The exposed functions live in the following objects:
  • org.apache.spark.sql.sedona_sql.expressions.st_functions
  • org.apache.spark.sql.sedona_sql.expressions.st_constructors
  • org.apache.spark.sql.sedona_sql.expressions.st_predicates
  • org.apache.spark.sql.sedona_sql.expressions.st_aggregates
Every function has a form that takes only Column arguments. Additionally, overloaded forms commonly take a mix of String and other Scala types (such as Double) as arguments. In general the following rules apply (although check the documentation of specific functions for any exceptions):
  1. Every function returns a Column so that it can be used interchangeably with Spark functions as well as DataFrame methods such as DataFrame.select or DataFrame.join.
  2. Every function has a form that takes all Column arguments. These are the most versatile of the forms.
  3. Most functions have a form that takes a mix of String arguments with other Scala types.
The exact mixture of argument types allowed is function specific. In these mixed forms, all String arguments are assumed to be the names of columns and are wrapped in a Column automatically; non-String arguments are assumed to be literals that are passed to the Sedona function. If you need to pass a String literal, use the all-Column form of the Sedona function and wrap the String literal in a Column with the lit Spark function. A short example of using this API (it also uses the array_min and array_max Spark functions):
import org.apache.spark.sql.functions.{array_max, array_min, col}
import org.apache.spark.sql.sedona_sql.expressions.st_constructors.ST_Point

val values_df = spark.sql("SELECT array(0.0, 1.0, 2.0) AS values")
val min_value = array_min(col("values"))
val max_value = array_max(col("values"))
val point_df = values_df.select(ST_Point(min_value, max_value).as("point"))
The above code will generate the following DataFrame:
+-----------+
|point      |
+-----------+
|POINT (0 2)|
+-----------+
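As noted above, String arguments in the mixed forms are treated as column names, so a String literal must be wrapped in a Column with Spark's lit function and passed to the all-Column form. A minimal Python sketch of this rule using Sedona's ST_GeomFromWKT constructor (it assumes a Sedona-enabled SparkSession named spark is already available):

```python
from pyspark.sql.functions import lit
from sedona.spark import *

# "POINT (1 2)" is a String literal, not a column name, so it must be
# wrapped in a Column with lit() before being passed to ST_GeomFromWKT.
df = spark.range(1).select(ST_GeomFromWKT(lit("POINT (1 2)")).alias("geom"))
```

Without lit(), the string would be interpreted as the name of a column called "POINT (1 2)" and the query would fail to resolve.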
Some functions will take native Python values and infer them as literals. For example:
from sedona.spark import *

df = spark.range(1)  # any single-row DataFrame works here
df = df.select(ST_Point(1.0, 3.0).alias("point"))
This will generate a DataFrame with a constant point in a column:
+-----------+
|point      |
+-----------+
|POINT (1 3)|
+-----------+
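Because every function returns a Column (rule 1 above), spatial predicates can be used anywhere a boolean Column is accepted, including as a join condition. A hedged sketch, assuming two hypothetical DataFrames polygons_df and points_df with geometry columns named geom and point:

```python
from sedona.spark import *

# ST_Contains returns a boolean Column, so it can be passed directly as
# the join condition; Sedona's optimizer can plan such predicates as
# spatial joins rather than a full cross product.
joined_df = polygons_df.join(
    points_df,
    ST_Contains(polygons_df["geom"], points_df["point"]),
)
```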