Wherobots’ integration with Amazon Simple Storage Service (S3) allows Amazon S3 customers to utilize Wherobots as the spatial engine that operates on their data while still using Amazon S3 for data storage. Accelerate your creation of spatial data products by using data directly from Amazon S3 public or private buckets, bypassing the need for time-consuming data transfers.

Benefits

  • Ease of access to data: Integrating your S3 buckets allows Wherobots Organization members to seamlessly access and work with data stored in Amazon S3 buckets without having to manually transfer or duplicate the data.
  • Self-service setup: Administrators can configure the integration themselves through a user interface.
  • Secure authentication: Wherobots’ S3 integration supports secure authentication methods, including Amazon Web Services (AWS) access keys and AWS Identity and Access Management (IAM) role-based access.
  • Control data access: Administrators can select specific buckets to be accessed through the integration, providing granular control over data access.
  • Supports Requester Pays buckets: Wherobots’ S3 integration works with Amazon S3 Requester Pays buckets. For more information, see Using Requester Pays buckets for storage transfers and usage in the Amazon S3 documentation.

Before you start

  • Admin account within a Professional or Enterprise Edition Organization
    Wherobots Organization members with the User role can use existing integrations set up by Admins but cannot create new ones. See Organization Roles.
  • Community Edition is not supported. See Organization Editions or Upgrade Organization.
  • An AWS account
  • An IAM role with permissions to modify trust and role policies (or permissions to create one). This role will be used by Wherobots to access your S3 bucket.
    The following IAM Actions are needed to create or manage IAM roles in AWS. These typically require AdministratorAccess.
    IAM Action               Description
    AttachRolePolicy         Attaches a managed policy to the role
    CreateRole               Creates a new IAM role
    DeleteRolePolicy         Removes an inline policy from the role
    DetachRolePolicy         Detaches a managed policy from the role
    PutRolePolicy            Creates a new inline policy and attaches it to the role
    UpdateAssumeRolePolicy   Modifies the trust policy of the role
    UpdateRole               Modifies the role’s description or maximum session duration
    For a complete list of IAM Actions, see Actions defined by AWS Identity and Access Management in the AWS Documentation.
  • An existing public or private AWS S3 bucket
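
The IAM actions listed in the prerequisites above could be collected into a policy document for whoever manages the role. The following is a minimal sketch, not a policy supplied by Wherobots: the `iam:` prefix is how AWS namespaces these actions, while the function name, account ID, and role name are placeholders chosen for illustration.

```python
import json

# IAM actions listed in the prerequisites, with the AWS "iam:" service prefix.
ROLE_MANAGEMENT_ACTIONS = [
    "iam:AttachRolePolicy",
    "iam:CreateRole",
    "iam:DeleteRolePolicy",
    "iam:DetachRolePolicy",
    "iam:PutRolePolicy",
    "iam:UpdateAssumeRolePolicy",
    "iam:UpdateRole",
]


def build_role_management_policy(role_arn: str) -> dict:
    """Return an IAM policy document allowing the actions above on one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ROLE_MANAGEMENT_ACTIONS,
                "Resource": role_arn,
            }
        ],
    }


# Hypothetical account ID and role name, for illustration only.
policy = build_role_management_policy(
    "arn:aws:iam::123456789012:role/example-wherobots-role"
)
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to a single role ARN is a design choice of this sketch; it keeps the grant narrower than AdministratorAccess.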

Bucket types

The following Amazon S3 bucket types can be integrated with Wherobots:
  • Public: A public bucket on Amazon S3 is a bucket that has turned off Amazon S3’s default Block all public access option.
    Granting external write access to a public S3 bucket is strongly discouraged. Use a private bucket for Managed Catalogs.
  • Private: A private bucket on Amazon S3 is a bucket that keeps the default Block all public access option enabled. This is the recommended option for Managed Catalogs.
  • Requester Pays: In Amazon S3, a Requester Pays bucket shifts the responsibility for the cost of the request and the data download from the bucket owner to the person accessing the data.
    Accessing data from Requester Pays buckets will result in additional fees charged to you, not the bucket owner.
For more information on Amazon S3 buckets, see Creating a bucket in the Amazon S3 documentation.

Integrate a public or private bucket

This integration requires switching between Wherobots Cloud and the AWS Console. We recommend opening both in separate browser tabs and following the workflow below.

Integration workflow

This integration involves configuring an IAM role in AWS and setting up the storage integration in Wherobots Cloud. Complete these steps in order, switching between platforms as indicated:
  1. Wherobots Cloud: Start the integration and copy the policy JSON (Steps 1-3)
  2. AWS Console: Configure your IAM role with the copied policies
  3. Wherobots Cloud: Submit and verify the integration (Step 4)
S3 Path Restriction: Bucket paths cannot contain periods. For example, s3://my.bucket.name is not allowed. Acceptable paths can consist of alphanumeric characters, underscores, equal signs, and dashes.
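The path restriction above can be checked before you fill in the form. The following is a minimal sketch under stated assumptions: the allowed characters come from the restriction itself, but treating `/` as a separator between path segments and allowing a trailing slash are assumptions, and the function name is hypothetical. It does not check Amazon S3's own bucket naming rules.

```python
import re

# Characters the restriction lists as acceptable: alphanumerics, underscores,
# equal signs, and dashes. Treating "/" as a separator between path segments
# is an assumption of this sketch.
_SEGMENT = r"[A-Za-z0-9_=-]+"
_S3_PATH = re.compile(rf"s3://{_SEGMENT}(?:/{_SEGMENT})*/?")


def is_valid_integration_path(path: str) -> bool:
    """Return True if `path` satisfies the restriction described above."""
    return _S3_PATH.fullmatch(path) is not None


print(is_valid_integration_path("s3://my-bucket-name"))       # True
print(is_valid_integration_path("s3://my-bucket/year=2024"))  # True
print(is_valid_integration_path("s3://my.bucket.name"))       # False
```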
1

Start the integration

  1. Log in to Wherobots Cloud
  2. Click Storage
  3. Click Create Storage Integration
2

Enter integration details

On the Add New Storage Integration page, enter the following:
  • Name: Alphanumeric characters, spaces, special characters, or underscores (must include at least one letter)
  • S3 Path: Your bucket path prefaced by s3:// (e.g., s3://my-bucket-name)
  • Role ARN: Your IAM role’s Amazon Resource Name
Don’t have a Role ARN yet? You’ll get this from AWS. For now, keep this page open and switch to the AWS Console tab to create or find your IAM role.
3

Copy the policy JSON

Copy both the Sample Role Policy and Trust Relationship JSON from this page. You’ll need these in the next step.
Switch to the AWS Console tab above to configure your IAM role with these policies. Return here after completing the AWS configuration.
4

Submit and verify

After completing the IAM role configuration in AWS, return here to finish the integration.
  1. Click Submit on the Create Storage Integration page
  2. You’ll be taken to Organization Settings — scroll to Storage
  3. Find your bucket and click … > Verify Access. If verification is unsuccessful, wait a few seconds and click Retry.
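
The workflow above configures the IAM role in the AWS Console. As an alternative, the same two documents could be applied with a script. The sketch below assumes an existing role and takes the Trust Relationship and Sample Role Policy JSON as strings pasted from Wherobots Cloud; it does not reproduce their contents. The function name and the inline policy name are hypothetical, and `iam_client` is expected to behave like `boto3.client("iam")`, whose `update_assume_role_policy` and `put_role_policy` methods correspond to the IAM actions listed in the prerequisites.

```python
import json


def apply_wherobots_policies(iam_client, role_name, trust_json, role_policy_json,
                             policy_name="wherobots-s3-access"):
    """Apply JSON copied from Wherobots Cloud to an existing IAM role.

    `iam_client` is expected to behave like `boto3.client("iam")`.
    `policy_name` is a hypothetical inline-policy name chosen for this sketch.
    """
    # Fail early if either pasted document is not valid JSON.
    json.loads(trust_json)
    json.loads(role_policy_json)

    # Replace the role's trust policy with the Trust Relationship JSON.
    iam_client.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=trust_json,
    )
    # Add the Sample Role Policy JSON as an inline policy on the role.
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=role_policy_json,
    )
```

With boto3 installed and AWS credentials configured, this could be called as `apply_wherobots_policies(boto3.client("iam"), "example-role", trust_json, role_policy_json)`, where `example-role` is a placeholder. Verify access in Wherobots Cloud afterwards, as in Step 4.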

Manage storage integrations

After creating S3 storage integrations, Admins can manage them from Organization Settings.
To verify an integration, go to Organization Settings > Storage, click … (the ellipsis button), and select Verify Access.
To view all storage integrations, go to Organization Settings > Storage.
To delete an integration, go to Organization Settings > Storage, click … (the ellipsis button), and select Delete.

View a specific integration’s contents

1

Open the Storage Source dropdown

Log in to Wherobots Cloud and click the Storage Source dropdown at the top of the screen.
2

Select your bucket

Select your desired bucket to view its contents.

Access integrated storage in a notebook

After creating a Managed Catalog from your S3 storage integration, access your data using the catalog reference format:
CATALOG_NAME.DATABASE_NAME.TABLE_NAME
To use new storage integrations or catalogs in your notebooks, you must start a new runtime. Notebooks can only access integrations created before the runtime started.
1

Start a notebook

Log in to Wherobots Cloud and start a Notebook with a Python Kernel. See Notebook instance management and Jupyter Notebook Management for details.
2

Load your data from the catalog

# Replace `CATALOG_NAME`, `DATABASE_NAME`, and `TABLE_NAME` with your specific names.

from sedona.spark import *

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Access data using catalog.database.table format
df = sedona.table("CATALOG_NAME.DATABASE_NAME.TABLE_NAME")
df.printSchema()
df.show()
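
When table names are assembled in code, a small helper can keep the three-part catalog reference format consistent. This is a minimal sketch: the helper name and the example names are hypothetical, and rejecting dots inside a component is an assumption made because an unquoted dot would be read as a separator.

```python
def table_reference(catalog: str, database: str, table: str) -> str:
    """Build a CATALOG_NAME.DATABASE_NAME.TABLE_NAME reference."""
    parts = [catalog, database, table]
    for part in parts:
        # An empty component or an embedded dot would change how the
        # reference is split, so this sketch rejects both.
        if not part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    return ".".join(parts)


# Hypothetical names, for illustration only.
ref = table_reference("my_catalog", "my_database", "my_table")
print(ref)  # my_catalog.my_database.my_table
```

In a notebook, the result can be passed to `sedona.table(ref)` in place of the literal string used above.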

Managed Catalog

A Managed Catalog can be created from a private bucket storage integration at any time, allowing for multiple catalogs per integration.
Granting external write access to a public S3 bucket is strongly discouraged. Use a private S3 bucket for your Managed Catalog.

What is a Managed Catalog?

A Managed Catalog is a metadata repository that is created, owned, and controlled directly within your Wherobots Organization. When you connect a data source like an S3 bucket and register it as a managed catalog, Wherobots takes on the following responsibilities:
  • Source of Truth: Wherobots becomes the authoritative source for all metadata, including schemas, table definitions, file locations, and partition information.
  • Data Discovery: Wherobots actively scans the underlying storage (e.g., S3) to discover new data and automatically update the catalog.
  • Lifecycle Management: Wherobots handles all metadata operations, such as creating, updating, and deleting tables. Changes in the underlying data are automatically synced to the catalog.
  • Optimization: Because Wherobots has full control, it can build and manage advanced spatial indexes and perform other performance optimizations directly on the metadata.
You typically use a managed catalog when your raw spatial data files reside in an AWS S3 private bucket and you want Wherobots to handle all aspects of data management, query optimization, and spatial ETL.

Create a Managed Catalog from an S3 bucket

To create a Managed Catalog from an S3 bucket storage integration, complete the following steps:
1

Open Data Hub

Log in to Wherobots Cloud and click Data Hub.
2

Add a new catalog

Click Add Catalog.
3

Configure the catalog

  • Name: Alphanumeric characters, spaces, special characters, or underscores (must include at least one letter)
  • Storage: Select a private bucket from the dropdown
  • Path (Optional): Enter the sub-folder where you’d like to store this Managed Catalog
To use new storage integrations or catalogs in your notebooks, you must start a new runtime. Notebooks can only access storage integrations or catalogs that were created before the runtime started.

Limitations

The following limitations apply to S3 storage integrations:

Current limitations

  • Bucket paths cannot contain periods (e.g., s3://my.bucket.name is not allowed)
  • A bucket can only be configured with a single storage integration
  • Public buckets are not recommended for Managed Catalogs due to the write access requirements
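
Because a bucket can only be configured with a single storage integration, it can help to check a list of planned S3 paths for repeated buckets before creating integrations. The following is a minimal sketch; the function name and the example paths are hypothetical, and it reads the bucket name as the part of the path between `s3://` and the first `/`.

```python
from collections import Counter
from urllib.parse import urlparse


def buckets_used_more_than_once(paths):
    """Return bucket names that appear in more than one planned S3 path."""
    # For "s3://bucket/prefix", urlparse puts the bucket name in `netloc`.
    counts = Counter(urlparse(p).netloc for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical planned integrations.
planned = [
    "s3://example-bucket-a/raw",
    "s3://example-bucket-a/curated",
    "s3://example-bucket-b",
]
print(buckets_used_more_than_once(planned))  # ['example-bucket-a']
```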