> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wherobots.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Save Vector Data to External Storage

## Save as txt files

To save a Spatial DataFrame to some permanent storage such as Hive tables and HDFS, you can simply convert each geometry in the Geometry type column back to a plain String and save the plain DataFrame to wherever you want.

Use the following code to convert the Geometry column in a DataFrame back to a WKT string column:

```sql theme={"system"}
SELECT ST_AsText(countyshape)
FROM polygondf
```

<Note>
  ST\_AsGeoJSON is also available. We would like to invite you to contribute more functions
</Note>

## Save GeoParquet

WherobotsDB can directly save a DataFrame with the Geometry column as a GeoParquet file. You need to specify `geoparquet` as the write format. The Geometry type will be preserved in the GeoParquet file.

```scala theme={"system"}
df.write.format("geoparquet").save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet")
```

### CRS Metadata

WherobotsDB supports writing GeoParquet files with custom GeoParquet spec version and crs.
The default GeoParquet spec version is `1.0.0` and the default crs is `null`. You can specify the GeoParquet spec version and crs as follows:

```scala theme={"system"}
val projjson = "{...}" // PROJJSON string for all geometry columns
df.write.format("geoparquet")
        .option("geoparquet.version", "1.0.0")
        .option("geoparquet.crs", projjson)
        .save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet")
```

If you have multiple geometry columns written to the GeoParquet file, you can specify the CRS for each column.

```scala theme={"system"}
val projjson_g0 = "{...}" // PROJJSON string for g0
val projjson_g1 = "{...}" // PROJJSON string for g1
df.write.format("geoparquet")
        .option("geoparquet.version", "1.0.0")
        .option("geoparquet.crs.g0", projjson_g0)
        .option("geoparquet.crs.g1", projjson_g1)
        .save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet")
```

The value of `geoparquet.crs` and `geoparquet.crs.<column_name>` can be one of the following:

* `"null"`: Explicitly setting `crs` field to `null`. This is the default behavior.
* `""` (empty string): Omit the `crs` field. This implies that the CRS is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware implementations.
* `"{...}"` (PROJJSON string): The `crs` field will be set as the PROJJSON object representing the Coordinate Reference System (CRS) of the geometry.

Please note that WherobotsDB currently cannot set/get a projjson string to/from a CRS. Its geoparquet reader will ignore the projjson metadata and you will have to set your CRS via [`ST_SetSRID`](/reference/wherobots-db/geometry-data/spatial-reference/ST_SetSRID) after reading the file.
Its geoparquet writer will not leverage the SRID field of a geometry so you will have to always set the `geoparquet.crs` option manually when writing the file, if you want to write a meaningful CRS field.

Due to the same reason, WherobotsDB geoparquet reader and writer do NOT check the axis order (lon/lat or lat/lon) and assume they are handled by the users themselves when writing / reading the files. You can always use [`ST_FlipCoordinates`](/reference/wherobots-db/geometry-data/editors/ST_FlipCoordinates) to swap the axis order of your geometries.

### Covering Metadata

WherobotsDB supports writing the [`covering` field](https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md#covering) to geometry column metadata. The `covering` field specifies a bounding box column to help accelerate spatial data retrieval. The bounding box column should be a top-level struct column containing `xmin`, `ymin`, `xmax`, `ymax` columns. If the DataFrame you are writing contains such columns, you can specify `.option("geoparquet.covering.<geometryColumnName>", "<coveringColumnName>")` option to write `covering` metadata to GeoParquet files:

```scala theme={"system"}
df.write.format("geoparquet")
  .option("geoparquet.covering.geometry", "bbox")
  .save("/path/to/saved_geoparquet.parquet")
```

If the DataFrame has only one geometry column, you can simply specify the `geoparquet.covering` option and omit the geometry column name:

```scala theme={"system"}
df.write.format("geoparquet")
  .option("geoparquet.covering", "bbox")
  .save("/path/to/saved_geoparquet.parquet")
```

If the DataFrame does not have a covering column, you can construct one using Sedona's SQL functions:

```scala theme={"system"}
val df_bbox = df.withColumn("bbox", expr("struct(ST_XMin(geometry) AS xmin, ST_YMin(geometry) AS ymin, ST_XMax(geometry) AS xmax, ST_YMax(geometry) AS ymax)"))
df_bbox.write.format("geoparquet").option("geoparquet.covering.geometry", "bbox").save("/path/to/saved_geoparquet.parquet")
```

## Sort then Save GeoParquet

To maximize the performance of WherobotsDB GeoParquet filter pushdown, we suggest that you sort the data by their geohash values (see [ST\_GeoHash](/reference/wherobots-db/geometry-data/output/ST_GeoHash)) and then save as a GeoParquet file. An example is as follows:

```
SELECT col1, col2, geom, ST_GeoHash(geom, 5) as geohash
FROM spatialDf
ORDER BY geohash
```

## Save as GeoJSON

The GeoJSON data source in WherobotsDB can be used to save a Spatial DataFrame to a single-line JSON file, with geometries written in GeoJSON format.

```sparksql theme={"system"}
df.write.format("geojson").save("YOUR/PATH.json")
```

The structure of the generated file will be like this:

```json theme={"system"}
{"type":"Feature","geometry":{"type":"Point","coordinates":[102.0,0.5]},"properties":{"prop0":"value0"}}
{"type":"Feature","geometry":{"type":"LineString","coordinates":[[102.0,0.0],[103.0,1.0],[104.0,0.0],[105.0,1.0]]},"properties":{"prop0":"value1"}}
{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[100.0,0.0],[101.0,0.0],[101.0,1.0],[100.0,1.0],[100.0,0.0]]]},"properties":{"prop0":"value2"}}
```

Note that the DataFrame must contain at least one column with geometry type for the write operation to work. WherobotsDB will use the following rules to determine which column to use as the geometry:

1. If there's a column named "geometry" with geometry type, Sedona will use this column
2. Otherwise, Sedona will use the first geometry column found in the root schema

You can also manually specify which geometry column to use with the "geometry.column" option:

```python theme={"system"}
df.write.format("geojson").option("geometry.column", "geometry").save("/tmp/a_thing")
```

## Save to PostGIS

Unfortunately, the Spark SQL JDBC data source doesn't support creating geometry types in PostGIS using the 'createTableColumnTypes' option.
Only the Spark built-in types are recognized.
This means that you'll need to manage your PostGIS schema separately from Spark.
One way to do this is to create the table with the correct geometry column before writing data to it with Spark.
Alternatively, you can write your data to the table using Spark and then manually alter the column to be a geometry type afterward.

Postgis uses EWKB to serialize geometries.
If you convert your geometries to EWKB format in WherobotsDB you don't have to do any additional conversion in Postgis.

```sql theme={"system"}
-- Example: create table in PostGIS first
CREATE TABLE my_table (id int8, geom geometry);

-- Write EWKB payload from Spark (pseudo-code)
df.withColumn("geom", expr("ST_AsEWKB(geom)"))
  .write.format("jdbc")
  .option("truncate","true") -- Don't let Spark recreate the table.
  -- Other options omitted for brevity
  .save()

-- If you didn't create the table before writing you can change the type afterward.
-- In PostGIS:
ALTER TABLE my_table ALTER COLUMN geom TYPE geometry;
```

## Save to GeoPandas

WherobotsDB DataFrame can be directly converted to a GeoPandas DataFrame. Use a single, valid Tabs block for language variants.

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    import geopandas as gpd

    df = spatialDf.toPandas()
    gdf = gpd.GeoDataFrame(df, geometry="geometry")
    ```
  </Tab>
</Tabs>

To leverage Arrow optimization and speed up the conversion, we recommend the `dataframe_to_arrow` method in Sedona.

```python theme={"system"}
import geopandas as gpd
from sedona.spark import *

sedona_df = sedona.createDataFrame(zip(test_wkt), ["wkt"]).selectExpr(
    "ST_GeomFromText(wkt) as geom"
)

gdf = gpd.GeoDataFrame.from_arrow(dataframe_to_arrow(sedona_df))
```

You can then plot the GeoPandas DataFrame using many tools in the GeoPandas ecosystem.

```python theme={"system"}
gdf.plot(
    figsize=(10, 8),
    column="value",
    legend=True
)
```

## Save to Snowflake / other data warehouses

Below are concise, single Tabs blocks showing code per language. Tabs are not nested inside code blocks.

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    import geopandas as gpd

    df = spatialDf.toPandas()
    gdf = gpd.GeoDataFrame(df, geometry="geometry")
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    // snowflakeUrl is https://<accountIdentifier>.snowflakecomputing.com
    val sfOptions = Map("sfUrl" -> snowflakeUrl, "sfUser" -> username, "sfPassword" -> password, "sfDatabase" -> database, "sfSchema" -> schema)
    val dest_table_name = "<DESTINATION_TABLE_NAME>"
    val save_mode = "append" // Append data to the table if the table already exists with some data. Other possible values are: errorifexists, ignore, overwrite.
    df_dist_from_ny.write.format(SNOWFLAKE_SOURCE_NAME)
            .options(sfOptions)
            .option("dbtable", dest_table_name)
            .mode(save_mode)
            .save()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    import java.util.HashMap;
    HashMap<String, String> sfOptions = new HashMap<>();
    sfOptions.put("sfUrl", snowflakeUrl); // snowflakeUrl is https://<accountIdentifier>.snowflakecomputing.com
    sfOptions.put("sfUser", username);
    sfOptions.put("sfPassword", password);
    sfOptions.put("sfDatabase", database);
    sfOptions.put("sfSchema", schema);
    String dest_table_name = "<DESTINATION_TABLE_NAME>";
    String save_mode = "append"; // Append data to the table if the table already exists with some data.
    df_dist_from_ny.write.format(SNOWFLAKE_SOURCE_NAME)
        .options(sfOptions)
        .option("dbtable", dest_table_name)
        .mode(save_mode)
        .save();
    ```
  </Tab>
</Tabs>

## Save to an AWS RDS PostGIS instance

WherobotsDB dataframes can be saved to PostGIS tables hosted in an AWS RDS instance for persistent storage.

A map of configuration and context options must be passed to establish connection with the RDS instance.

If you're unable to establish connection with the RDS instance, double check if the instance is accessible by the server running this code.
For more information on intra or inter VPC connection with the RDS instance, consult [here](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html).

The SedonaContext object also needs to be passed a [`SaveMode`](https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/SaveMode.html) parameter using `mode` which specifies how to handle collisions with existing tables if any.

<Tabs>
  <Tab title="Python">
    ```python theme={"system"}
    url = '<URL>' # jdbc:postgresql://ENDPOINT/DATABASE_NAME
    driver = 'org.postgresql.Driver'
    user = '<USERNAME>'
    password = '<PASSWORD>'
    options = {"url": url, "driver": driver, "user": user, "password": password}
    dest_table_name = "<DESTINATION_TABLE_NAME>"
    save_mode = "append" # Append data to the table if the table already exists with some data.
    df_final.write.format("jdbc") \
        .options(**options) \
        .option("dbtable", dest_table_name) \
        .mode(save_mode) \
        .save()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={"system"}
    val url = "<URL>" // jdbc:postgresql://ENDPOINT/DATABASE_NAME
    val driver = "org.postgresql.Driver"
    val user = "<USERNAME>"
    val password = "<PASSWORD>"
    val options = Map("url" -> url, "user" -> user, "password" -> password, "driver" -> driver)
    val save_mode = "append" // Append data to the table if the table already exists with some data.
    df_dist_from_ny.write.format("jdbc")
        .options(options)
        .option("dbtable", dest_table_name)
        .mode(save_mode)
        .save()
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={"system"}
    import java.util.HashMap;
    HashMap<String, String> options = new HashMap<>();
    options.put("url", url); // jdbc:postgresql://ENDPOINT/DATABASE_NAME
    options.put("user", username);
    options.put("password", password);
    options.put("driver", "org.postgresql.Driver");
    String dest_table_name = "<DESTINATION_TABLE_NAME>";
    String save_mode = "append"; // Append data to the table if the table already exists with some data.
    df_dist_from_ny.write.format("jdbc")
        .options(options)
        .option("dbtable", dest_table_name)
        .mode(save_mode)
        .save();
    ```
  </Tab>
</Tabs>

## Save to Havasu (Iceberg) tables

Please read the [Havasu documentation](/tutorials/wherobotsdb/vector-data/havasu/) for more information on how to save a Spatial DataFrame to Havasu tables.
