# WherobotsRunOperator - Job Run Constructor

> Use WherobotsRunOperator to run your Python or JAR scripts on Wherobots Cloud directly from your Airflow DAGs to create Job Runs.

You can submit your scripts for execution on Wherobots Cloud.
A Job Run tracks the status of that script's execution.
When you add `WherobotsRunOperator` to a DAG,
an Airflow `TaskInstance` executes a Job Run and waits for it to complete.

For a comprehensive list of the parameters associated with Job Runs,
see [Runs REST API](/reference/runs/).

## Benefits

You can use Airflow to streamline, automate, and manage complex ETL
workloads that run on your vector-based or raster-based data.
By using `WherobotsRunOperator`, you can incorporate Wherobots'
geospatial data processing features into your Airflow workflows.

With `WherobotsRunOperator`, you no longer need to manage and
configure your own cluster to complete your geospatial data processing.
Wherobots can handle that resource allocation for you, while providing
analytics capabilities that are optimized for geospatial concerns.

## Before you start

Before using `WherobotsRunOperator`, ensure that you
have the following required resources:

* An account within a Professional or Enterprise Edition Organization. For more information, see [Create a Wherobots Account](/get-started/wherobots-cloud/create-account).
* Wherobots API key. For more information, see
  [API keys](/get-started/wherobots-cloud/api-keys/) in the Wherobots documentation.
* The Wherobots Apache Airflow Provider. For installation information, see [Wherobots Apache Airflow Provider](/develop/airflow-provider).
* An Airflow Connection. For more information, see [Create a new Connection in Airflow Server](/develop/airflow-provider#create-a-new-connection-in-airflow-server).
* Python version ≥ 3.8
* Apache Airflow. For more information, see
  [Installation of Airflow](https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html) in the Apache Airflow documentation.

## DAG files

Directed Acyclic Graph (DAG) files are Python scripts that
define your Airflow workflows. DAG files need to be accessible
to the Airflow scheduler so that your tasks can be executed. DAG files
are commonly stored in `$AIRFLOW_HOME/dags`. For more information, see
[Setting Configuration Options](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html)
in the Apache Airflow Documentation.

### WherobotsRunOperator constructor

The following example details how to integrate `WherobotsRunOperator` into your DAG file.

In this documentation, we discuss the `WherobotsRunOperator`-related lines of code which are
highlighted below. For more information on the general formatting of Airflow DAG files, see
[Best Practices](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html) in
the Airflow Documentation.

```py title="Example DAG file" linenums="1" hl_lines="5 7 16 18-22 24" theme={"system"}
import datetime
import pendulum

from airflow import DAG
from wherobots.db.region import Region
from airflow_providers_wherobots.operators.run import WherobotsRunOperator

from wherobots.db.runtime import Runtime

with DAG(
    dag_id="test_run_operator",
    schedule="@once",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as test_run_dag:
    operator = WherobotsRunOperator(
        # region parameter establishes a connection to a specified AWS cloud provider region.
        # Replace 'Region.AWS_US_WEST_2' with the desired AWS region, for example:
        # - For AWS US East (N. Virginia): region=Region.AWS_US_EAST_1
        region=Region.AWS_US_WEST_2,
        task_id="test_run_smoke",
        name="airflow_operator_test_run_{{ ts_nodash }}",
        # runtime parameter specifies the compute resources allocated for the runtime environment.
        # Replace 'Runtime.TINY' with the desired runtime size, for example:
        # - For a small runtime: runtime=Runtime.SMALL
        # - For a medium runtime: runtime=Runtime.MEDIUM
        runtime=Runtime.TINY,
        run_python={
            "uri": "S3-PATH-TO-YOUR-FILE"
        },
        dag=test_run_dag,
        poll_logs=True,
    )
```

<Tip>
  **Can I use any of the runtimes listed in Accepted values?**

  You might not have access to all of the available runtimes.

  For example, Community Edition Organizations can only use the Tiny and Micro runtimes. You can see the runtimes available to your Organization within the **Start a Notebook** dropdown in [Wherobots Cloud](https://cloud.wherobots.com/).
</Tip>

| Parameter         | Type   | Description                                                                                                                                                                                                                                                                                   | Accepted values                                                                                                                                                                                                                             |
| :---------------- | :----- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `operator`        |        | `WherobotsRunOperator` instantiates `WherobotsRunOperator`.                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                             |
| `name`            | `str`  | The name of the run. If not specified, a default name will be generated.                                                                                                                                                                                                                      |                                                                                                                                                                                                                                             |
| `runtime`         | `enum` | Specifies the Wherobots runtime. <br /> `runtime` is optional. Defaults to `Runtime.SMALL` for Professional Edition. <br /> See [Runs REST API](/reference/runs/) for more information.                                                                                                       | See [Runtime `enum` values](/develop/runtimes/#runtime-enum-values). <br /><br />You can see the runtimes <br />available to your Organization within the **Start a Notebook** dropdown in [Wherobots Cloud](https://cloud.wherobots.com/). |
| `region`          | `enum` | The compute region where your workload is running.                                                                                                                                                                                                                                            | See [Region `enum` values](/develop/runtimes/#region-enum-values) for supported values.                                                                                                                                                     |
| `run_python`      | `dict` | Provide the Python file's S3 path. [Wherobots Managed Storage](/develop/storage-management/storage/#wherobots-managed-storage) and [Storage Integration](/develop/storage-management/s3-storage-integration/) S3 paths are supported. You can use either `run_python` or `run_jar`, not both. | Takes the following keys: `uri:`(`str`) and `args:` (`list[str]`).                                                                                                                                                                          |
| `poll_logs`       | `bool` | Enables Log polling when set to `True`.                                                                                                                                                                                                                                                       | `True`, `False`                                                                                                                                                                                                                             |
| `timeout_seconds` | `int`  | The total duration of the Job Run's execution in seconds, starting from when its status changes to `RUNNING`. The default value is `3600`.                                                                                                                                                    |                                                                                                                                                                                                                                             |
| `do_xcom_push`    | `bool` | The default value is `True`. In this case the operator will push the `run_id: {value}` into XComs during the Airflow Task execution.                                                                                                                                                          | `True`, `False`                                                                                                                                                                                                                             |

#### Other parameters

The following parameters aren't included in the example but are also compatible with `WherobotsRunOperator`.

| Parameter          | Type   | Description                                                                                                                                                                                                                                                      | Accepted values                                                                       |
| :----------------- | :----- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------ |
| `run_jar`          | `dict` | Provide the JAR file's S3 path. [Wherobots Managed Storage](/develop/storage-management/storage/#wherobots-managed-storage) and [Storage Integration](/develop/storage-management/s3-storage-integration/) S3 paths are supported.<br /> You can use either `run_python` or `run_jar`, not both. | Takes the following keys: `uri:`(`str`), `args:` (`list[str]`), `mainClass:` (`str`). |
| `polling_interval` | `int`  | The interval in seconds to poll the status of the run. <br />The default value is `30`.                                                                                                                                                                          |                                                                                       |
| `environment`      | `dict` | The model for runtime environment configs, including Spark cluster configs and dependencies. Defaults to `{}`. For more information see [Environment keys](/develop/run-operator#environment-keys).                                                              |                                                                                       |
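
The parameters in the two tables above can be gathered into a set of keyword arguments before they are passed to the operator. The following is a minimal sketch, not part of the provider: `build_run_kwargs` is a hypothetical helper that collects the documented parameters into a plain dictionary and enforces the rule that `run_python` and `run_jar` cannot be used together. The S3 URI, arguments, and main class are placeholder values.

```python
from typing import Any, Dict, Optional


def build_run_kwargs(
    run_python: Optional[dict] = None,
    run_jar: Optional[dict] = None,
    polling_interval: int = 30,   # documented default
    timeout_seconds: int = 3600,  # documented default
    environment: Optional[dict] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect documented WherobotsRunOperator parameters."""
    # The comparison is True when both are missing or both are provided.
    if (run_python is None) == (run_jar is None):
        raise ValueError("Provide exactly one of run_python or run_jar.")

    kwargs: Dict[str, Any] = {
        "polling_interval": polling_interval,
        "timeout_seconds": timeout_seconds,
        "environment": environment or {},  # documented default is {}
    }
    if run_python is not None:
        kwargs["run_python"] = run_python
    else:
        kwargs["run_jar"] = run_jar
    return kwargs


# Placeholder values for illustration only.
jar_kwargs = build_run_kwargs(
    run_jar={
        "uri": "S3-PATH-TO-YOUR-JAR",
        "args": ["--input", "example"],
        "mainClass": "com.example.Main",
    },
    polling_interval=60,
)
```

Assuming the operator accepts these names as keyword arguments, as the tables describe, the dictionary could be unpacked into the constructor alongside `task_id`, `region`, and the other parameters from the example DAG.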

##### Environment keys

| Environment parameter | Type            | Required or Optional | Parameter description                                                                                                                                                                                                                       |
| :-------------------- | :-------------- | :------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `sparkDriverDiskGB`   | `int`           | Optional             | The driver disk size of the Spark cluster.                                                                                                                                                                                                  |
| `sparkExecutorDiskGB` | `int`           | Optional             | The executor disk size of the Spark cluster.                                                                                                                                                                                                |
| `sparkConfigs`        | `dict{str:str}` | Optional             | The user specified Spark configs.                                                                                                                                                                                                           |
| `dependencies`        | `list[dict]`    | Optional             | The third-party dependencies to add to the runtime. Required if adding `sparkConfigs`. Must be an array (list) of objects (dictionaries). Each object must represent either a Python Package Index (PyPI) dependency or a file dependency. |

##### Dependencies

The following table describes the third-party dependencies that can be added to the runtime environment.

| Parameter        | Type   | Details                                                                                                                                                                                                                                                                                                                                                                                                                                 | Rules for accepted values       |
| :--------------- | :----- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------ |
| `sourceType`     | string | Enum for the source type of dependency                                                                                                                                                                                                                                                                                                                                                                                                  | Valid values: `PYPI` or `FILE`. |
| `filePath`       | string | The file path to the dependency file. Must be a valid file path accessible within the runtime environment. <br /> The file extension must be one of: `.jar`, `.whl`, `.zip`, or `.json`. Can only be used with the `FILE` `sourceType`.<br /><br />JSON files can be used to place a configuration<br />file at a predetermined file-system location on<br />all nodes of the compute cluster<br />(`/opt/wherobots/<file name>.json`). |                                 |
| `libraryName`    | string | The python package name. Can only be used with the `PYPI` `sourceType`.                                                                                                                                                                                                                                                                                                                                                                 |                                 |
| `libraryVersion` | string | The Python package version. Can only be used with the `PYPI` `sourceType`.                                                                                                                                                                                                                                                                                                                                                              |                                 |
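
As an illustration of how the environment keys and dependency rules fit together, the sketch below builds an `environment` dictionary and checks each dependency against the rules in the table above. `validate_dependency` is a hypothetical helper, not part of the provider, and the disk sizes, Spark setting, package, and file path are placeholder values.

```python
import os
from typing import Any, Dict

ALLOWED_FILE_EXTENSIONS = (".jar", ".whl", ".zip", ".json")


def validate_dependency(dep: Dict[str, str]) -> None:
    """Hypothetical check of one dependency against the documented rules."""
    source_type = dep.get("sourceType")
    if source_type not in ("PYPI", "FILE"):
        raise ValueError(f"sourceType must be PYPI or FILE, got {source_type!r}")

    if source_type == "PYPI" and "filePath" in dep:
        raise ValueError("filePath can only be used with sourceType FILE")

    if source_type == "FILE":
        if "libraryName" in dep or "libraryVersion" in dep:
            raise ValueError("libraryName and libraryVersion require sourceType PYPI")
        if "filePath" in dep:
            extension = os.path.splitext(dep["filePath"])[1]
            if extension not in ALLOWED_FILE_EXTENSIONS:
                raise ValueError(f"unsupported file extension: {extension!r}")


# Placeholder values for illustration only.
environment: Dict[str, Any] = {
    "sparkDriverDiskGB": 50,
    "sparkExecutorDiskGB": 50,
    "sparkConfigs": {"spark.sql.shuffle.partitions": "200"},  # dict{str:str}
    "dependencies": [
        {"sourceType": "PYPI", "libraryName": "shapely", "libraryVersion": "2.0.4"},
        {"sourceType": "FILE", "filePath": "S3-PATH-TO-YOUR-FILE.whl"},
    ],
}

for dependency in environment["dependencies"]:
    validate_dependency(dependency)
```

Checking the dictionary locally like this is optional. It only surfaces rule violations before the DAG is deployed rather than when the Job Run is submitted.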

## JAR and Python Scripts

You can use Python or JAR files to further leverage Wherobots'
geospatial features and optimizations in conjunction with an Airflow DAG.

### Tile generation

This example uses Sedona to process and generate vector tiles. The example reads
buildings and roads data from Wherobots open datasets, filters
the data based on a specified region, and then generates
vector tiles using the `wherobots.vtiles` library. It writes the tiles to a
PMTiles file in the user's S3 storage and displays a sample of the tiles.

```py title="Tile generation example Python script" linenums="1" theme={"system"}
from sedona.spark import *
from wherobots import vtiles
import pyspark.sql.functions as f
import os

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Set to False to generate tiles for the entire dataset, True to generate only for region_wkt area
filter = True
region_wkt = "POLYGON ((-122.097931 47.538528, -122.048836 47.566566, -121.981888 47.510012, -122.057076 47.506302, -122.097931 47.538528))"
filter_expression = ST_Intersects(f.col("geometry"), ST_GeomFromText(f.lit(region_wkt)))

buildings_df = (
    sedona.table("wherobots_open_data.overture_maps_foundation.buildings_building")
    .select(
        f.col("geometry"),
        f.lit("buildings").alias("layer"),
        f.element_at(f.col("sources"), 1).dataset.alias("source")
    )
)

buildings_df.show()

roads_df = (
    sedona.table("wherobots_open_data.overture_maps_foundation.transportation_segment")
    .select(
        f.col("geometry"),
        f.lit("roads").alias("layer"),
        f.element_at(f.col("sources"), 1).dataset.alias("source")
    )
)

roads_df.show()

features_df = roads_df.union(buildings_df)

if filter:
    features_df = features_df.filter(filter_expression)

features_df.count()

tiles_df = vtiles.generate(features_df)

tiles_df.show(3, 150, True)

full_tiles_path = os.getenv("USER_S3_PATH") + "tiles.pmtiles"
vtiles.write_pmtiles(tiles_df, full_tiles_path, features_df=features_df)

vtiles.show_pmtiles(full_tiles_path)

sample_tiles_path = os.getenv("USER_S3_PATH") + "sampleTiles.pmtiles"
vtiles.generate_quick_pmtiles(features_df, sample_tiles_path)
```
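
The script above concatenates `os.getenv("USER_S3_PATH")` directly with a file name, which raises a `TypeError` if the variable is unset and only yields a valid path when the value ends with a slash. A more defensive variant could look like the following sketch, where `user_s3_file` is a hypothetical helper and not part of `wherobots.vtiles`.

```python
import os


def user_s3_file(filename: str) -> str:
    """Hypothetical helper: build a path for filename under USER_S3_PATH."""
    base = os.getenv("USER_S3_PATH")
    if not base:
        # Fail with a clear message instead of a TypeError on None + str.
        raise RuntimeError("USER_S3_PATH is not set")
    # Normalize the separator so the result has exactly one slash before the name.
    return base.rstrip("/") + "/" + filename
```

With this helper, the two output paths in the script could be written as `user_s3_file("tiles.pmtiles")` and `user_s3_file("sampleTiles.pmtiles")`.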

## Reviewing logs

`WherobotsRunOperator` removes the need to write your own logic to poll the
Wherobots Job Run logs. Airflow users can access the logs by
specifying `poll_logs=True` in a DAG. Wherobots polls the logs
and streams them into the Airflow Task logs.

### Airflow UI

When you start your Airflow DAG from the Airflow server, logs can be seen in
the Airflow Server's **Logs** tab.
Relevant logs begin with the line `INFO - === Logs for Run <run_id> Start:`. The `run_id`
can be found in the Airflow Server's **XCom** tab.
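
If you export Airflow Task logs as text, the marker line described above can also be used to locate Job Runs programmatically. The following is a small sketch that assumes the marker appears exactly as quoted and that run IDs contain no whitespace. `find_run_ids` is a hypothetical helper, and the sample text is fabricated for illustration.

```python
import re
from typing import List

# Pattern based on the marker line quoted above; the run ID format is an assumption.
RUN_START = re.compile(r"=== Logs for Run (\S+) Start:")


def find_run_ids(log_text: str) -> List[str]:
    """Return the run ID of every Job Run whose log stream starts in log_text."""
    return RUN_START.findall(log_text)


# Fabricated sample text; real Airflow Task logs may carry additional prefixes.
sample = (
    "INFO - starting task\n"
    "INFO - === Logs for Run example-run-id Start:\n"
    "INFO - hello from the Job Run\n"
)
print(find_run_ids(sample))
```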

### Monitor Job Runs in Wherobots Cloud

The [**Job Runs**](https://cloud.wherobots.com/job-runs) section of Wherobots Cloud lets you monitor the progress of Job Runs,
understand resource consumption, review the execution timeline, and verify configuration settings.

For more information, see the [Workload History Documentation](/develop/workload-history/).

## Feature Limitations

`WherobotsRunOperator` has the following limitations:

* Wherobots only stores logs for 30 days.
* The number of concurrent Job Runs is limited by your Organization's maximum concurrent Spatial Unit consumption quota. For more information on Spatial Units and quotas, see [Runtimes](/develop/runtimes/).
