This guide walks through configuring AWS Glue Catalog as an Iceberg catalog for Wherobots. This integration allows you to manage Iceberg tables in AWS Glue and query them from Wherobots Spark workloads.

Before you start

Before starting, ensure you have the following:
  • An AWS Account with an existing Glue database
  • An S3 bucket for Glue table data

Integration workflow

This integration requires configuring both the AWS Console and Wherobots Cloud. We recommend opening both in separate browser tabs. Complete these steps in order:
  1. AWS Console: Add Glue permissions to your Storage Integration IAM role
  2. Wherobots Cloud: Configure your Spark session and test the connection

AWS Console

The Storage Integration creates an IAM role with S3 permissions. To access Glue Catalog, you must add Glue permissions to this same role.
Step 1: Locate the IAM role

  1. Open the AWS Console and navigate to IAM > Roles.
  2. Find the role created by your Storage Integration (the role name you specified during setup).
  3. Click on the role to view its details.
Step 2: Update the permissions policy

Edit the role’s permissions policy to include both S3 and Glue access. Replace the existing policy with the following, substituting your values for the placeholders:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3BucketAccess",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>"
        },
        {
            "Sid": "S3ObjectAccess",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
        },
        {
            "Sid": "GlueCatalogAccess",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetPartitions",
                "glue:BatchCreatePartition",
                "glue:BatchDeletePartition",
                "glue:BatchGetPartition"
            ],
            "Resource": [
                "arn:aws:glue:<YOUR_REGION>:<YOUR_AWS_ACCOUNT_ID>:catalog",
                "arn:aws:glue:<YOUR_REGION>:<YOUR_AWS_ACCOUNT_ID>:database/<YOUR_GLUE_DATABASE_NAME>",
                "arn:aws:glue:<YOUR_REGION>:<YOUR_AWS_ACCOUNT_ID>:table/<YOUR_GLUE_DATABASE_NAME>/*"
            ]
        }
    ]
}

Placeholder reference

  • <YOUR_AWS_ACCOUNT_ID>: Your 12-digit AWS account ID
  • <YOUR_REGION>: AWS region (e.g., us-west-2)
  • <YOUR_BUCKET_NAME>: S3 bucket containing Glue table data
  • <YOUR_GLUE_DATABASE_NAME>: Name of your Glue database
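If you maintain several environments, filling the policy template by hand is error-prone. The following sketch shows one way to substitute the placeholders programmatically in Python; the `fill_policy` helper and the abbreviated template are illustrative only (use the full policy above in practice), not part of any Wherobots or AWS API.

```python
import json

# Abbreviated copy of the policy template above, for illustration only.
POLICY_TEMPLATE = """{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3BucketAccess",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>"
        },
        {
            "Sid": "GlueCatalogAccess",
            "Effect": "Allow",
            "Action": ["glue:GetDatabase", "glue:GetTables"],
            "Resource": ["arn:aws:glue:<YOUR_REGION>:<YOUR_AWS_ACCOUNT_ID>:catalog"]
        }
    ]
}"""

def fill_policy(template: str, values: dict) -> dict:
    # Replace each <PLACEHOLDER> and parse the result to verify valid JSON.
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return json.loads(template)

policy = fill_policy(POLICY_TEMPLATE, {
    "<YOUR_BUCKET_NAME>": "my-bucket",
    "<YOUR_REGION>": "us-west-2",
    "<YOUR_AWS_ACCOUNT_ID>": "123456789012",
})
print(policy["Statement"][0]["Resource"])  # arn:aws:s3:::my-bucket
```

Parsing the filled template with `json.loads` also catches stray placeholders or typos before you paste the policy into the IAM console.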

Wherobots Cloud

After completing the IAM role configuration in the AWS Console, continue here to configure and test the Glue Catalog connection in a Wherobots notebook.
Step 1: Configure Spark session variables

Understanding CATALOG_NAME: The CATALOG_NAME variable is a local alias for your Spark session. It does not need to match your Glue database name. For example, you can set CATALOG_NAME = "glue_catalog" even if your Glue database is named my_production_db. The catalog name is simply how you reference the connection in Spark SQL queries.
Set the following variables at the top of your notebook or script:
from sedona.spark import SedonaContext
import os

# CATALOG_NAME: Local alias for this Spark session.
# This is NOT your Glue database name - it can be any valid identifier.
# Avoid hyphens; use underscores instead.
CATALOG_NAME = "glue_catalog"

# AWS account ID that owns the Glue Catalog (12-digit number)
ACCOUNT_ID = "<YOUR_AWS_ACCOUNT_ID>"

# Wherobots Organization ID
# Found at: https://cloud.wherobots.com/organization
ORG_ID = os.environ['USER_ORG_ID']

# IAM role name created by the Storage Integration
ROLE_NAME = "<YOUR_ROLE_NAME>"

# AWS region where your Glue Catalog and S3 bucket reside
REGION = "<YOUR_REGION>"
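Because a hyphen in CATALOG_NAME later surfaces as an "Invalid table identifier" error (see Troubleshooting), it can be worth validating the alias up front. The `is_valid_catalog_name` helper below is an illustrative sketch, not part of the Sedona or Wherobots API.

```python
import re

# Illustrative check (not a Sedona/Wherobots API): a safe catalog alias
# contains only letters, digits, and underscores, and does not start
# with a digit. Hyphens in particular break table identifier parsing.
def is_valid_catalog_name(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None

assert is_valid_catalog_name("glue_catalog")      # valid alias
assert not is_valid_catalog_name("glue-catalog")  # hyphen: rejected
```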
Step 2: Build the Spark session

# Derived values (do not modify)
iam_role_arn = f"arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}"
external_id = f"{ORG_ID}:wherobots-workloads"

# Build Spark session with Glue Catalog
config = (
    SedonaContext.builder()
        .config("spark.sql.defaultCatalog", CATALOG_NAME)
        .config(f"spark.sql.catalog.{CATALOG_NAME}", "org.apache.iceberg.spark.SparkCatalog")
        .config(f"spark.sql.catalog.{CATALOG_NAME}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config(f"spark.sql.catalog.{CATALOG_NAME}.warehouse", "s3://<YOUR_BUCKET_NAME>/<PATH_TO_TABLES>/")
        .config(f"spark.sql.catalog.{CATALOG_NAME}.client.factory", "com.wherobots.iceberg.aws.WherobotsStIntCredentialsFactory")
        .config(f"spark.sql.catalog.{CATALOG_NAME}.client.assume-role.arn", iam_role_arn)
        .config(f"spark.sql.catalog.{CATALOG_NAME}.client.assume-role.region", REGION)
        .config(f"spark.sql.catalog.{CATALOG_NAME}.client.credentials-provider.external-id", external_id)
        .config(f"spark.sql.catalog.{CATALOG_NAME}.client.assume-role.external-id", external_id)
        .config(f"spark.sql.catalog.{CATALOG_NAME}.glue.account-id", ACCOUNT_ID)
        .getOrCreate()
)

sedona = SedonaContext.create(config)

Configuration parameters explained

  • catalog-impl: Specifies Glue as the catalog backend
  • client.factory: Wherobots credential factory for Storage Integration
  • client.assume-role.arn: IAM role ARN to assume for AWS access
  • credentials-provider.external-id: Security token for role assumption
  • glue.account-id: Directs queries to your AWS account’s Glue Catalog
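If you prefer to keep the catalog settings in one place, the repeated .config() calls can equivalently be expressed as a dict and applied in a loop. This sketch lists the core keys only; the assume-role and external-id entries follow the same pattern, and the bracketed values are placeholders exactly as in the builder above.

```python
# Sketch: the catalog properties from this guide collected in one dict.
# Keys mirror the .config() calls above; bracketed values are placeholders.
CATALOG_NAME = "glue_catalog"

catalog_conf = {
    f"spark.sql.catalog.{CATALOG_NAME}": "org.apache.iceberg.spark.SparkCatalog",
    f"spark.sql.catalog.{CATALOG_NAME}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    f"spark.sql.catalog.{CATALOG_NAME}.warehouse": "s3://<YOUR_BUCKET_NAME>/<PATH_TO_TABLES>/",
    f"spark.sql.catalog.{CATALOG_NAME}.client.factory": "com.wherobots.iceberg.aws.WherobotsStIntCredentialsFactory",
    f"spark.sql.catalog.{CATALOG_NAME}.glue.account-id": "<YOUR_AWS_ACCOUNT_ID>",
}

# Applying them is then a loop over builder.config(key, value):
#     for key, value in catalog_conf.items():
#         builder = builder.config(key, value)
for key, value in catalog_conf.items():
    print(key, "=", value)
```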
Step 3: Test the connection

Run the following commands in your Wherobots notebook to verify the configuration is working correctly.
List available databases:
sedona.sql("SHOW DATABASES").show()
You should see your Glue database in the output.
Select your database:
# Replace with your actual Glue database name
sedona.sql("USE <YOUR_GLUE_DATABASE_NAME>")
Step 4: Create a test table with geometry (Iceberg V3)

Iceberg V3 supports geometry and geography types natively. Create a table with a geometry column:
sedona.sql("""
    CREATE TABLE IF NOT EXISTS test_geom_table (
        id BIGINT,
        name STRING,
        location GEOMETRY
    ) USING iceberg
    TBLPROPERTIES (
        'format-version' = '3'
    )
""")
Insert test data with geometry:
sedona.sql("""
    INSERT INTO test_geom_table VALUES
        (1, 'P1', ST_GeomFromWKT('POINT(-122.4194 37.7749)')),
        (2, 'P2', ST_GeomFromWKT('POINT(-118.2437 34.0522)'))
""")
Query the data:
sedona.sql("SELECT * FROM test_geom_table").show()
Verify table creation:
sedona.sql("SHOW TABLES").show()
Step 5: Write and read back a DataFrame

You can write DataFrames directly to the Glue Catalog using writeTo():
# Load sample data from Wherobots Open Data
df = sedona.table("wherobots_open_data.overture_maps_foundation.buildings_building").limit(100)

# Write to Glue Catalog (creates or replaces the table)
df.writeTo("glue_catalog.<YOUR_GLUE_DATABASE_NAME>.buildings_sample").createOrReplace()

# Verify the schema shows geometry column
sedona.sql("DESCRIBE glue_catalog.<YOUR_GLUE_DATABASE_NAME>.buildings_sample").show()
The DESCRIBE output should show the geometry column with its proper type, confirming that spatial data is preserved when writing to Glue.
Read data back and verify geometry:
# Read the table back
df_from_glue = sedona.table("glue_catalog.<YOUR_GLUE_DATABASE_NAME>.buildings_sample")

# Check the schema
df_from_glue.printSchema()

# Verify geometry operations work
df_from_glue.selectExpr("ST_Area(geometry) as area").show(5)

Troubleshooting

Invalid table identifier

Error: IllegalArgumentException: Invalid table identifier: my-table
Cause: Hyphens in the catalog name cause parsing issues.
Solution: Use underscores instead of hyphens in CATALOG_NAME. For example, use glue_catalog instead of glue-catalog.

Database not found

Error: EntityNotFoundException: Database default not found
Cause: Attempting to use a database that doesn’t exist in Glue.
Solution: Run SHOW DATABASES to see available databases, then USE <database_name> with the correct name.

AccessDenied on glue:GetDatabases

Error: AccessDeniedException: User is not authorized to perform: glue:GetDatabases
Cause: The IAM policy is missing Glue permissions or has incorrect resource ARNs.
Solution:
  • Verify the IAM policy includes all Glue actions listed in the AWS Console section.
  • Check that resource ARNs match your account ID, region, and database name.
  • Ensure glue.account-id is set in the Spark config.

sts:AssumeRole not authorized

Error: StsException: User is not authorized to perform: sts:AssumeRole
Cause: The IAM role’s trust policy doesn’t allow Wherobots to assume it.
Solution:
  • Verify the Storage Integration is properly configured.
  • Check that the trust policy matches what Wherobots provided during Storage Integration setup.
  • Ensure the External ID format is correct: <ORG_ID>:wherobots-workloads.
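A malformed External ID is easy to catch locally before the role assumption ever reaches AWS. The `is_valid_external_id` helper below is an illustrative sketch based on the format stated in this guide, not part of any SDK.

```python
import re

# Illustrative check: per this guide, the External ID must be your
# Wherobots Org ID followed by the literal suffix ":wherobots-workloads".
def is_valid_external_id(external_id: str) -> bool:
    return re.fullmatch(r"[^:\s]+:wherobots-workloads", external_id) is not None

assert is_valid_external_id("org-abc123:wherobots-workloads")  # well-formed
assert not is_valid_external_id("org-abc123")                  # missing suffix
```

(The Org ID "org-abc123" is a made-up example; use the value from your Organization page.)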

Glue queries hitting wrong AWS account

Error: Queries fail with access denied errors referencing an unexpected AWS account ID.
Cause: Missing glue.account-id configuration.
Solution: Ensure .config(f"spark.sql.catalog.{CATALOG_NAME}.glue.account-id", ACCOUNT_ID) is included in the Spark session builder.

Quick reference

The following tables summarize key placeholders and configuration values used in this integration guide for quick reference.

Placeholders

  • <YOUR_AWS_ACCOUNT_ID>: 12-digit AWS account ID
  • <YOUR_WHEROBOTS_ORG_ID>: From the Wherobots console Organization page
  • <YOUR_ROLE_NAME>: IAM role name from Storage Integration setup
  • <YOUR_REGION>: AWS region (e.g., us-west-2)
  • <YOUR_BUCKET_NAME>: S3 bucket for Glue table data
  • <YOUR_GLUE_DATABASE_NAME>: Name of your Glue database
  • <PATH_TO_TABLES>: S3 path prefix for table storage

Key values

  • Wherobots Credentials Factory: com.wherobots.iceberg.aws.WherobotsStIntCredentialsFactory
  • External ID Format: <ORG_ID>:wherobots-workloads
  • Glue Catalog Implementation: org.apache.iceberg.aws.glue.GlueCatalog
  • Storage Integration Setup: S3 Storage Integration