Benefits
- Ease of access to data: Integrating your S3 buckets allows Wherobots Organization members to seamlessly access and work with data stored in Amazon S3 buckets without having to manually transfer or duplicate the data.
- Self-service setup: Administrators can configure the integration themselves through a user interface.
- Secure authentication: Wherobots’ S3 integration supports secure authentication methods, including Amazon Web Services (AWS) access keys and AWS Identity and Access Management (IAM) role-based access.
- Control data access: Administrators can select specific buckets to be accessed through the integration, providing granular control over data access.
- Supports Requester Pays buckets: Wherobots’ S3 integration works with Amazon S3 Requester Pays buckets. For more information, see Using Requester Pays buckets for storage transfers and usage in the Amazon S3 documentation.
Before you start
Wherobots Requirements
- Admin account within a Professional or Enterprise Edition Organization
Organization members with the User role can use existing integrations set up by Admins but cannot create new ones. See Organization Roles.
- Community Edition is not supported. See Organization Editions or Upgrade Organization.
AWS Requirements
- An AWS account
- An existing public or private AWS S3 bucket
- An IAM role with permissions to modify trust and role policies (or permissions to create one). This role will be used by Wherobots to access your S3 bucket.
Required AWS IAM Actions
The following IAM Actions are needed to create or manage IAM roles in AWS. These typically require AdministratorAccess. For a complete list of IAM Actions, see Actions defined by AWS Identity and Access Management in the AWS Documentation.
- AttachRolePolicy: Attaches a managed policy to the role
- CreateRole: Creates a new IAM role
- DeleteRolePolicy: Removes an inline policy from the role
- DetachRolePolicy: Detaches a managed policy from the role
- PutRolePolicy: Creates a new inline policy and attaches it to the role
- UpdateAssumeRolePolicy: Modifies the trust policy of the role
- UpdateRole: Modifies the role’s description or maximum session duration
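As an illustration of the IAM Actions listed above, the sketch below builds an identity policy document that allows exactly those actions on a single role. The role ARN, account ID, and function name are hypothetical, and whether a grant this narrow is sufficient in your AWS account is an assumption to check with your administrator.

```python
import json

# The seven IAM actions listed above, written with the "iam:" prefix
# that policy documents use.
ROLE_MANAGEMENT_ACTIONS = [
    "iam:AttachRolePolicy",
    "iam:CreateRole",
    "iam:DeleteRolePolicy",
    "iam:DetachRolePolicy",
    "iam:PutRolePolicy",
    "iam:UpdateAssumeRolePolicy",
    "iam:UpdateRole",
]


def build_role_management_policy(role_arn: str) -> dict:
    """Return a policy document allowing the listed actions on one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ROLE_MANAGEMENT_ACTIONS,
                "Resource": role_arn,
            }
        ],
    }


# Hypothetical role ARN, for illustration only.
example_policy = build_role_management_policy(
    "arn:aws:iam::123456789012:role/example-wherobots-s3-role"
)
print(json.dumps(example_policy, indent=2))
```

The printed JSON can be reviewed alongside your organization's existing policies before anyone is granted these permissions.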
Bucket types
The following Amazon S3 bucket types can be integrated with Wherobots:
Public Bucket
A public bucket is an Amazon S3 bucket with Amazon S3’s default Block all public access option turned off.
Private Bucket
A private bucket on Amazon S3 is a bucket that keeps the default Block all public access option enabled. This is the recommended option for Managed Catalogs.
Requester Pays Bucket
In Amazon S3, a Requester Pays bucket shifts the responsibility for the cost of the request and the data download from the bucket owner to the person accessing the data.
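To check whether an existing bucket counts as public or private in the sense described above, you can inspect its Block Public Access configuration. The sketch below classifies a configuration dictionary shaped like the `PublicAccessBlockConfiguration` returned by boto3's `get_public_access_block`. Treating a bucket with only some of the four settings enabled as "public" is an assumption made here for simplicity, and the helper name is hypothetical.

```python
# The four settings that together make up "Block all public access" in Amazon S3.
BLOCK_PUBLIC_ACCESS_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def classify_bucket(public_access_block: dict) -> str:
    """Return "private" when all four settings are enabled, otherwise "public"."""
    all_blocked = all(
        public_access_block.get(flag, False) for flag in BLOCK_PUBLIC_ACCESS_FLAGS
    )
    return "private" if all_blocked else "public"


# With boto3 installed and credentials configured, the configuration could be
# fetched like this (the bucket name is hypothetical; the call may raise an
# error if the bucket has no Block Public Access configuration at all):
#   import boto3
#   response = boto3.client("s3").get_public_access_block(Bucket="my-bucket-name")
#   print(classify_bucket(response["PublicAccessBlockConfiguration"]))

print(classify_bucket({flag: True for flag in BLOCK_PUBLIC_ACCESS_FLAGS}))
print(classify_bucket({"BlockPublicAcls": True}))
```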
Integrate a public or private bucket
This integration requires switching between Wherobots Cloud and AWS Console. We recommend opening both in separate browser tabs and following the workflow below.
Integration workflow
This integration involves configuring an IAM role in AWS and setting up the storage integration in Wherobots Cloud. Complete these steps in order, switching between platforms as indicated:
- Wherobots Cloud: Start the integration and copy policy JSON (Steps 1-3)
- AWS Console: Configure your IAM role with the copied policies
- Wherobots Cloud: Submit and verify the integration (Step 4)
S3 Path Restriction: Bucket paths cannot contain periods. For example, s3://my.bucket.name is not allowed. Acceptable paths can consist of alphanumeric characters, underscores, equal signs, and dashes.
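The S3 path restriction can be checked before you fill in the form. The following is a minimal sketch, assuming that paths start with `s3://`, that `/` separates path segments, and that "alphanumeric" means ASCII letters and digits. The function name is hypothetical, and the check performed by Wherobots Cloud itself may differ.

```python
import re

# Assumed grammar: "s3://" followed by one or more "/"-separated segments,
# each made only of letters, digits, underscores, equal signs, and dashes.
# A single trailing slash is tolerated.
_S3_PATH = re.compile(r"s3://[A-Za-z0-9_=-]+(?:/[A-Za-z0-9_=-]+)*/?")


def is_valid_integration_path(path: str) -> bool:
    """Return True if the whole path matches the assumed grammar."""
    return _S3_PATH.fullmatch(path) is not None


for candidate in (
    "s3://my-bucket-name",
    "s3://my-bucket-name/year=2024/raw_data",
    "s3://my.bucket.name",
):
    print(candidate, is_valid_integration_path(candidate))
```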
Start the integration
- Log in to Wherobots Cloud
- Click Storage
- Click Create Storage Integration

Enter integration details
On the Add New Storage Integration page, enter the following:
- Name: Alphanumeric characters, spaces, special characters, or underscores (must include at least one letter)
- S3 Path: Your bucket path prefaced by s3:// (e.g., s3://my-bucket-name)
- Role ARN: Your IAM role’s Amazon Resource Name
Don’t have a Role ARN yet? You’ll get this from AWS. For now, keep this page open and switch to the AWS Console tab to create or find your IAM role.

Copy the policy JSON
Copy both the Sample Role Policy and Trust Relationship JSON from this page. You’ll need these in the next step.
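If you would rather script the AWS side than use the AWS Console, the copied JSON could be applied to an existing role with boto3's IAM client, whose `update_assume_role_policy` and `put_role_policy` methods correspond to the UpdateAssumeRolePolicy and PutRolePolicy actions. The sketch below is an assumption-laden illustration: the role name, policy name, and helper names are hypothetical, the two policy dictionaries are empty placeholders for the JSON you copied, and attaching the role policy inline (rather than as a managed policy) is a choice made here, not something the guide prescribes. A recording stand-in client is used so the example runs without AWS credentials.

```python
import json


def apply_copied_policies(iam_client, role_name, trust_policy, role_policy,
                          policy_name="wherobots-s3-access"):
    """Apply the copied Trust Relationship and Sample Role Policy to a role."""
    # UpdateAssumeRolePolicy replaces the role's trust policy.
    iam_client.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(trust_policy),
    )
    # PutRolePolicy creates (or overwrites) an inline policy on the role.
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(role_policy),
    )


class DryRunIamClient:
    """Stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def update_assume_role_policy(self, **kwargs):
        self.calls.append(("update_assume_role_policy", kwargs))

    def put_role_policy(self, **kwargs):
        self.calls.append(("put_role_policy", kwargs))


# Placeholders: replace these with the JSON copied from Wherobots Cloud.
trust_policy = {"Version": "2012-10-17", "Statement": []}
role_policy = {"Version": "2012-10-17", "Statement": []}

# Swap DryRunIamClient() for boto3.client("iam") to apply the changes for real.
client = DryRunIamClient()
apply_copied_policies(client, "example-wherobots-s3-role", trust_policy, role_policy)
for name, kwargs in client.calls:
    print(name, kwargs["RoleName"])
```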

Submit and verify
After completing the IAM role configuration in AWS, return here to finish the integration.
- Click Submit on the Create Storage Integration page
- You’ll be taken to Organization Settings. Scroll to Storage
- Find your bucket and click … > Verify Access
If unsuccessful, wait a few seconds and click Retry.
Manage storage integrations
After creating S3 storage integrations, Admins can manage them from Organization Settings.
Verify Access
To verify an integration, go to:
Organization Settings > Storage > … (the ellipsis button) > Verify Access
View All
To view all storage integrations, go to:
Organization Settings > Storage
Delete
To delete an integration, go to:
Organization Settings > Storage > … (the ellipsis button) > Delete
View a specific integration’s contents
Open the Storage Source dropdown
Log in to Wherobots Cloud and click the Storage Source dropdown at the top of the screen.

Access integrated storage in a notebook
After creating a Managed Catalog from your S3 storage integration, access your data using the catalog reference format.
To use new storage integrations or catalogs in your notebooks, you must start a new runtime. Notebooks can only access integrations created before the runtime started.
Start a notebook
Log in to Wherobots Cloud and start a Notebook with a Python Kernel. See Notebook instance management and Jupyter Notebook Management for details.
Managed Catalog
A Managed Catalog can be created from a private bucket storage integration at any time, allowing for multiple catalogs per integration.
What is a Managed Catalog?
A Managed Catalog is a metadata repository that is created, owned, and controlled directly within your Wherobots Organization. When you connect a data source like an S3 bucket and register it as a managed catalog, Wherobots takes on the following responsibilities:
- Source of Truth: Wherobots becomes the authoritative source for all metadata, including schemas, table definitions, file locations, and partition information.
- Data Discovery: Wherobots actively scans the underlying storage (e.g., S3) to discover new data and automatically update the catalog.
- Lifecycle Management: Wherobots handles all metadata operations, such as creating, updating, and deleting tables. Changes in the underlying data are automatically synced to the catalog.
- Optimization: Because Wherobots has full control, it can build and manage advanced spatial indexes and perform other performance optimizations directly on the metadata.
Create a Managed Catalog from an S3 bucket
To create a Managed Catalog from an S3 bucket storage integration, complete the following steps:
Open Data Hub
Log in to Wherobots Cloud and click Data Hub.
Configure the catalog
- Name: Alphanumeric characters, spaces, special characters, or underscores (must include at least one letter)
- Storage: Select a private bucket from the dropdown
- Path (Optional): Enter the sub-folder where you’d like to store this Managed Catalog

To use new storage integrations or catalogs in your notebooks, you must start a new runtime.
Notebooks can only access storage integrations or catalogs that were created before the runtime started.
Limitations
The following limitations apply to S3 storage integrations:
Current limitations
- Bucket paths cannot contain periods (e.g., s3://my.bucket.name is not allowed)
- A bucket can only be configured with a single storage integration
- Public buckets are not recommended for Managed Catalogs due to the write access requirements



