Before you start
Before starting, ensure you have the following:
Wherobots Requirements
- A Wherobots Account with access to your Organization ID (found at cloud.wherobots.com/organization)
- A Storage Integration configured in Wherobots, pointing to the S3 bucket that contains your Glue table data. See S3 Storage Integration for setup instructions.
AWS Requirements
- An AWS Account with an existing Glue database
- An S3 bucket for Glue table data
Integration workflow
This integration requires configuring both the AWS Console and Wherobots Cloud. We recommend opening both in separate browser tabs. Complete these steps in order:
- AWS Console: Add Glue permissions to your Storage Integration IAM role
- Wherobots Cloud: Configure your Spark session and test the connection
AWS Console
Before starting, make sure you have:
- Your Wherobots Organization ID (found at cloud.wherobots.com/organization)
- A Storage Integration already configured in Wherobots, pointing to the S3 bucket that contains your Glue table data. See S3 Storage Integration for setup instructions.
Locate the IAM role
- Open the AWS Console and navigate to IAM > Roles.
- Find the role created by your Storage Integration (the role name you specified during setup).
- Click on the role to view its details.
Update the permissions policy
Edit the role’s permissions policy to include both S3 and Glue access. Replace the existing policy with the following, substituting your values for the placeholders:
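The policy body is not reproduced in this section. As a rough sketch only, the Python snippet below assembles a policy document of the usual IAM shape from the four placeholders and prints it as JSON for pasting into the policy editor. The helper name `build_policy`, the example values, and the specific S3 and Glue action lists are assumptions made for illustration, not the official Wherobots policy; compare the result with the policy published in the Wherobots documentation before applying it.

```python
import json


def build_policy(account_id, region, bucket, database):
    """Assemble an illustrative IAM policy granting S3 and Glue access.

    The action lists are assumptions, not the official Wherobots policy.
    """
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3TableData",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                # Bucket ARN for listing, object ARN for reads and writes.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "GlueCatalog",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:CreateTable",
                    "glue:UpdateTable",
                    "glue:DeleteTable",
                ],
                # Glue authorizes against the catalog, the database, and its tables.
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/{database}",
                    f"{glue_prefix}:table/{database}/*",
                ],
            },
        ],
    }


# Hypothetical example values; substitute your own.
policy = build_policy("123456789012", "us-west-2", "my-bucket", "my_database")
print(json.dumps(policy, indent=2))
```

Generating the document from the placeholders keeps the account ID, region, and database name consistent across every resource ARN, which is the most common source of the access errors described under Troubleshooting.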
Placeholder reference
| Placeholder | Description |
|---|---|
| <YOUR_AWS_ACCOUNT_ID> | Your 12-digit AWS account ID |
| <YOUR_REGION> | AWS region (e.g., us-west-2) |
| <YOUR_BUCKET_NAME> | S3 bucket containing Glue table data |
| <YOUR_GLUE_DATABASE_NAME> | Name of your Glue database |
Wherobots Cloud
After completing the IAM role configuration in the AWS Console, continue here to configure and test the Glue Catalog connection in a Wherobots notebook.
Configure Spark session variables
Set the following variables at the top of your notebook or script:
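The variable listing is not reproduced in this section, so the following is a sketch of what such a block might look like. `CATALOG_NAME` and `ACCOUNT_ID` are the names used elsewhere in this guide; every other variable name and all of the example values are hypothetical. The External ID follows the `<ORG_ID>:wherobots-workloads` format given in the quick reference.

```python
# Name of the Spark catalog; use underscores, not hyphens.
CATALOG_NAME = "glue_catalog"

# <YOUR_AWS_ACCOUNT_ID>: 12-digit AWS account ID (example value).
ACCOUNT_ID = "123456789012"

# The names below are hypothetical; the values are examples to replace.
ORG_ID = "my-org-id"                         # <YOUR_WHEROBOTS_ORG_ID>
ROLE_NAME = "my-storage-integration-role"    # <YOUR_ROLE_NAME>
REGION = "us-west-2"                         # <YOUR_REGION>
DATABASE = "my_database"                     # <YOUR_GLUE_DATABASE_NAME>
WAREHOUSE = "s3://my-bucket/path/to/tables"  # s3://<YOUR_BUCKET_NAME>/<PATH_TO_TABLES>

# Derived values: the role to assume and the External ID presented with it.
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}"
EXTERNAL_ID = f"{ORG_ID}:wherobots-workloads"
```

Deriving `ROLE_ARN` and `EXTERNAL_ID` from the other values avoids retyping the account ID and Organization ID in several places.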
Build the Spark session
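The session-building code is not reproduced in this section. The sketch below shows one way it could be assembled, under several assumptions: that every parameter sits under the `spark.sql.catalog.<name>.` prefix (the pattern shown for `glue.account-id` under Troubleshooting), that the catalog is registered with Iceberg's standard `org.apache.iceberg.spark.SparkCatalog` class, that a `warehouse` path is set, and that the session is created through Apache Sedona's `SedonaContext`. The function names and example values are hypothetical.

```python
def glue_catalog_conf(catalog_name, account_id, role_arn, external_id, warehouse):
    """Return Spark config entries for an Iceberg catalog backed by AWS Glue."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Assumption: standard Iceberg Spark catalog class.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.client.factory": "com.wherobots.iceberg.aws.WherobotsStIntCredentialsFactory",
        f"{prefix}.client.assume-role.arn": role_arn,
        f"{prefix}.credentials-provider.external-id": external_id,
        f"{prefix}.glue.account-id": account_id,
        # Assumption: default S3 location for new tables.
        f"{prefix}.warehouse": warehouse,
    }


def build_session(conf):
    """Create a Sedona-enabled Spark session carrying the given config entries."""
    # Imported here so the config helper works without Sedona installed.
    from sedona.spark import SedonaContext

    builder = SedonaContext.builder()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return SedonaContext.create(builder.getOrCreate())


# Hypothetical example values; substitute your own.
conf = glue_catalog_conf(
    catalog_name="glue_catalog",
    account_id="123456789012",
    role_arn="arn:aws:iam::123456789012:role/my-storage-integration-role",
    external_id="my-org-id:wherobots-workloads",
    warehouse="s3://my-bucket/path/to/tables",
)
# In a Wherobots notebook: sedona = build_session(conf)
```

Keeping the configuration in a plain dictionary makes it easy to print and inspect before the session is created, which helps when diagnosing the errors listed under Troubleshooting.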
Configuration parameters explained
| Parameter | Purpose |
|---|---|
| catalog-impl | Specifies Glue as the catalog backend |
| client.factory | Wherobots credential factory for Storage Integration |
| client.assume-role.arn | IAM role ARN to assume for AWS access |
| credentials-provider.external-id | Security token for role assumption |
| glue.account-id | Directs queries to your AWS account’s Glue |
Test the connection
To verify that the configuration works, run two checks in your Wherobots notebook. First, list the available databases with SHOW DATABASES; your Glue database should appear in the output. Then select your database with USE <database_name>.
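The two checks can be sketched as follows. This assumes a Spark or Sedona session object with a `sql` method is already available; the helper names and the example catalog and database names are hypothetical.

```python
def connection_check_statements(catalog_name, database):
    """Return the SQL statements used to check the Glue catalog connection."""
    return [
        # Lists the Glue databases visible through the catalog.
        f"SHOW DATABASES IN {catalog_name}",
        # Makes the chosen Glue database the current one.
        f"USE {catalog_name}.{database}",
    ]


def run_statements(session, statements):
    """Run each statement on the session and print its result."""
    for statement in statements:
        print(f">>> {statement}")
        session.sql(statement).show()


# Hypothetical example names; substitute your own.
statements = connection_check_statements("glue_catalog", "my_database")
# In a Wherobots notebook: run_statements(sedona, statements)
```

Qualifying the database with the catalog name avoids depending on which catalog happens to be the session default.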
Create a test table with geometry (Iceberg V3)
Iceberg V3 supports geometry and geography types natively. To exercise this, create a table with a geometry column, insert test data with geometry, query the data, and verify that the table was created.
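The four steps might look like the sketch below. The table name, column names, and sample point are hypothetical, and the exact DDL is an assumption: it uses a `GEOMETRY` column type, the Iceberg `format-version` table property set to 3, and the Sedona functions `ST_GeomFromText` and `ST_AsText`. Adjust it to whatever syntax your Wherobots runtime accepts.

```python
def geometry_test_statements(table="test_geometry_table"):
    """Return SQL statements that create, fill, query, and list a geometry table."""
    return [
        # Create an Iceberg V3 table with a geometry column.
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id BIGINT, name STRING, geom GEOMETRY) "
        "USING iceberg TBLPROPERTIES ('format-version' = '3')",
        # Insert one test row, building the geometry from WKT.
        f"INSERT INTO {table} VALUES "
        "(1, 'point_a', ST_GeomFromText('POINT (-122.33 47.61)'))",
        # Read the row back, rendering the geometry as WKT.
        f"SELECT id, name, ST_AsText(geom) AS wkt FROM {table}",
        # Confirm the table is registered in the current database.
        "SHOW TABLES",
    ]


statements = geometry_test_statements()
for statement in statements:
    print(statement)
    # In a Wherobots notebook: sedona.sql(statement).show()
```

After these run, the new table should also appear in the AWS Glue console under the selected database.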
Troubleshooting
Invalid table identifier
Error: IllegalArgumentException: Invalid table identifier: my-table
Cause: Hyphens in the catalog name cause parsing issues.
Solution: Use underscores instead of hyphens in CATALOG_NAME. For example, use glue_catalog instead of glue-catalog.
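A small guard can catch this before the session is built. The sketch below is an assumption-based check rather than the parser's exact rule: it accepts only letters, digits, and underscores, which is stricter than simply rejecting hyphens. The function name is hypothetical.

```python
import re


def is_safe_catalog_name(name):
    """Return True if the name uses only letters, digits, and underscores."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None


print(is_safe_catalog_name("glue_catalog"))  # True
print(is_safe_catalog_name("glue-catalog"))  # False
```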
Database not found
Error: EntityNotFoundException: Database default not found
Cause: Attempting to use a database that doesn’t exist in Glue.
Solution: Run SHOW DATABASES to see available databases, then USE <database_name> with the correct name.
AccessDenied on glue:GetDatabases
Error: AccessDeniedException: User is not authorized to perform: glue:GetDatabases
Cause: The IAM policy is missing Glue permissions or has incorrect resource ARNs.
Solution:
- Verify the IAM policy includes all Glue actions listed in the AWS Console section.
- Check that resource ARNs match your account ID, region, and database name.
- Ensure glue.account-id is set in the Spark config.
sts:AssumeRole not authorized
Error: StsException: User is not authorized to perform: sts:AssumeRole
Cause: The IAM role’s trust policy doesn’t allow Wherobots to assume it.
Solution:
- Verify the Storage Integration is properly configured.
- Check that the trust policy matches what Wherobots provided during Storage Integration setup.
- Ensure the External ID format is correct: <ORG_ID>:wherobots-workloads.
Glue queries hitting wrong AWS account
Error: Queries fail with access denied errors referencing an unexpected AWS account ID.
Cause: Missing glue.account-id configuration.
Solution: Ensure .config(f"spark.sql.catalog.{CATALOG_NAME}.glue.account-id", ACCOUNT_ID) is included in the Spark session builder.
Quick reference
The following tables summarize the key placeholders and configuration values used in this integration guide.
Placeholders
| Placeholder | Description |
|---|---|
| <YOUR_AWS_ACCOUNT_ID> | 12-digit AWS account ID |
| <YOUR_WHEROBOTS_ORG_ID> | From Wherobots console Organization page |
| <YOUR_ROLE_NAME> | IAM role name from Storage Integration setup |
| <YOUR_REGION> | AWS region (e.g., us-west-2) |
| <YOUR_BUCKET_NAME> | S3 bucket for Glue table data |
| <YOUR_GLUE_DATABASE_NAME> | Name of your Glue database |
| <PATH_TO_TABLES> | S3 path prefix for table storage |
Key values
| Item | Value |
|---|---|
| Wherobots Credentials Factory | com.wherobots.iceberg.aws.WherobotsStIntCredentialsFactory |
| External ID Format | <ORG_ID>:wherobots-workloads |
| Glue Catalog Implementation | org.apache.iceberg.aws.glue.GlueCatalog |
| Storage Integration Setup | S3 Storage Integration |

