S3 storage integration

Wherobots’ integration with Amazon Simple Storage Service (S3) allows Amazon S3 customers to utilize Wherobots as the spatial engine that operates on their data while still using Amazon S3 for their data storage.

Accelerate your creation of spatial data products by using data directly from your S3 storage, bypassing the need for time-consuming data transfers.

Benefits

  • Ease of access to data: Integrating your S3 buckets allows Wherobots Organization Administrators to seamlessly access and work with data stored in their own Amazon S3 buckets without having to manually transfer or duplicate the data.
  • Self-service setup: Administrators can configure the integration themselves through a user interface.
  • Secure authentication: Wherobots’ S3 integration supports secure authentication methods, including AWS access keys and IAM role-based access.
  • Control data access: Administrators can select specific buckets to be accessed through the integration, providing granular control over data access.

Before you start

The following is required to integrate your Amazon S3 storage with Wherobots:

  • A Professional or Enterprise Edition Organization.
    • This feature is not available for Community Edition organizations. For more information on Organization tiers, see Pricing Plans.
  • An account with Administrator privileges.
    • Those in the User role are not able to integrate external storage in Wherobots.
  • An Amazon S3 account.
  • An S3 bucket, either private or public.

Integrate a public bucket

This section discusses how to integrate an existing Amazon S3 public bucket with Wherobots. A public bucket is a bucket that has granted public read access.

To implement a storage integration between Wherobots and an existing public S3 bucket, you need access to that public bucket's Amazon S3 path and Wherobots Cloud.

Wherobots Cloud (public bucket workflow)

  1. Log in to Wherobots Cloud.
  2. Click Storage.
  3. Click Create Storage Integration.
  4. On the Add New Storage Integration page, do the following:
    1. Create a Name for your storage integration. The name must start with a letter and can only contain letters, numbers, and underscores.
    2. Select the Public Bucket checkbox to confirm that this is a publicly accessible Amazon S3 bucket.
    3. In the S3 Path field, add the path to the bucket. The S3 Path is the name of the public Amazon S3 bucket, prefaced by s3://.
    4. Click Submit.
    5. Complete the steps in Amazon S3 Dashboard.
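
The naming and path rules in step 4 can be checked locally before you submit the form. The following is a minimal sketch and not part of Wherobots: the function names `is_valid_integration_name` and `is_valid_s3_path` are hypothetical, "letters" is assumed to mean ASCII letters, and the path check only verifies the `s3://` prefix and a non-empty bucket name, which may be looser or stricter than what the form accepts.

```python
import re

# Rule from step 4: the name starts with a letter and contains only
# letters, numbers, and underscores.
# Assumption: "letters" means ASCII letters.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_integration_name(name: str) -> bool:
    """Return True if the name follows the storage integration naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_s3_path(path: str) -> bool:
    """Return True if the path is a bucket name prefaced by s3://.

    Assumption: an optional key prefix after the bucket name is allowed.
    """
    prefix = "s3://"
    if not path.startswith(prefix):
        return False
    bucket = path[len(prefix):].split("/", 1)[0]
    return bucket != ""
```

For example, `is_valid_integration_name("my_bucket_1")` returns `True`, while `"1bucket"` and `"my-bucket"` are rejected because of the leading digit and the hyphen.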

Amazon S3 Dashboard

To integrate a public bucket, you must have that bucket's Amazon S3 path. In Amazon S3, a public bucket is one where the default Block all public access setting has been turned off.

Verify your public bucket storage integration

To verify that your storage integration has been created successfully, do the following:

  1. Log in to Wherobots Cloud.
  2. Go to Organization Settings.
  3. Scroll down to Storage.
  4. Click > Verify Access

If you have successfully integrated a public bucket, a pop-up window will appear and confirm that Wherobots has read access to your storage but not write access.

Note: Write access is not permitted for public buckets. As a result, public buckets cannot be used as a Spatial Catalog.

Integrate a private bucket

This section discusses how to integrate a private Amazon S3 bucket with Wherobots. In S3, you can keep a bucket private while giving specific Roles access to it.

To integrate a private bucket, you must have already created a private bucket in Amazon S3 that leaves S3's default option to Block all public access enabled. For more information, see Creating a bucket in the Amazon S3 documentation.

To implement a storage integration between Wherobots and a private S3 bucket, you will need access to Wherobots Cloud and the Amazon IAM Dashboard.

Wherobots Cloud (private bucket workflow)

  1. Log in to Wherobots Cloud.
  2. Click Storage.
  3. Click Create Storage Integration.
  4. On the Add New Storage Integration page, do the following:

    1. Create a Name for your storage integration. The name must start with a letter and can only contain letters, numbers, and underscores.
    2. Leave the Public Bucket checkbox unchecked.
    3. Click Copy to copy the Trust Relationship JSON.
    4. In the Role ARN field, add the role’s Amazon Resource Name (ARN), if you have already created a Role. If you haven’t created a role yet, complete the steps in Grant a role access to a private S3 bucket and then return to this step after creating a Role and copying the ARN.
    5. In the S3 Path field, add the path to your private bucket. The S3 path is the name of your private Amazon S3 bucket, prefaced by s3://.
    6. (Optional) Select the Would you like to create a Wherobots Spatial Catalog in this location? checkbox to create a Spatial Catalog.
    7. Click Submit. You will be taken to Organization Settings.
    8. Scroll to Storage.
    9. Find your private bucket and click > Verify Access.
    10. Click Sample Role Policy.
    11. Click Copy. Paste the Role Policy in the AWS Policy Editor field and then return to this section. For more information, see step 14 in Grant a role access to a private S3 bucket.
    12. If your storage integration has been created successfully, a confirmation appears after you click Organization Settings > Storage > > Verify Access for your private bucket.
    13. If verification is unsuccessful, wait a few seconds and then click Retry.

    To complete your S3 storage integration, you must finish the steps in Amazon IAM Dashboard - Grant role access to a private S3 bucket.

Amazon IAM Dashboard

Grant role access to a private S3 bucket

You can configure an IAM role to enable access to your private S3 Bucket. Creating a Role for Wherobots' S3 storage integration streamlines access management by granting permissions to the Role itself, rather than individual users.

To create and configure a role, do the following:

  1. Sign in to the AWS console and access your AWS IAM Dashboard.
    1. Once you're in the console, you'll see a search bar at the top. Type "IAM" and select IAM from the dropdown menu.
  2. Once you’re on the IAM Dashboard, click Create role.
  3. Select Custom trust policy.
  4. In the Custom trust policy field, paste the Trust Relationship JSON from step 4.c in Wherobots Cloud (Private bucket workflow).
  5. Click Next.
  6. (Optional) On the Add permissions screen, add any required permissions.
  7. Click Next.
  8. Enter a Role name and Description.
  9. Click Create role.
  10. On the IAM Dashboard, click the Role you just created.
  11. Copy the ARN for that Role and return to step 4.d in Wherobots Cloud (Private bucket workflow).
  12. On the Permissions tab, click Add Permissions > Create inline policy.
  13. Click JSON.
  14. Paste the Role Policy from step 4.k in Wherobots Cloud (Private bucket workflow) into the Policy Editor field. Note: At minimum, the role should have List and Get privileges. If you wish to mutate the data, you should also include Put and Delete privileges.
  15. Click Next.
  16. Enter a Policy Name.
  17. Click Create policy.

    To complete your S3 storage integration, you must finish the steps in Wherobots Cloud (Private bucket workflow).

    For additional information on creating a role, see IAM role creation in the Amazon Web Services (AWS) documentation.
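
The note in step 14 about minimum privileges can be illustrated with a small policy builder. This is a sketch only: the policy to paste is the Sample Role Policy copied from Wherobots Cloud, and the function name `build_role_policy`, the statement layout, and the bucket name `my-private-bucket` are hypothetical. It assumes that the List, Get, Put, and Delete privileges correspond to the standard IAM actions `s3:ListBucket`, `s3:GetObject`, `s3:PutObject`, and `s3:DeleteObject`.

```python
import json


def build_role_policy(bucket: str, allow_writes: bool = False) -> dict:
    """Build an illustrative IAM policy document for one S3 bucket.

    Read-only by default (List and Get); set allow_writes=True to add
    Put and Delete so that data in the bucket can be modified.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_actions = ["s3:GetObject"]
    if allow_writes:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": bucket_arn},
            # Object-level actions apply to the keys inside the bucket.
            {"Effect": "Allow", "Action": object_actions, "Resource": f"{bucket_arn}/*"},
        ],
    }


# Hypothetical bucket name, used only for illustration.
print(json.dumps(build_role_policy("my-private-bucket", allow_writes=True), indent=2))
```

Comparing this output with the Sample Role Policy is a quick way to confirm which privileges the role grants before you click Verify Access.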

Verify your private bucket storage integration

To verify that your storage integration has been created successfully, do the following:

  1. Log in to Wherobots Cloud.
  2. Go to Organization Settings.
  3. Scroll down to Storage.
  4. Click > Verify Access next to your desired private bucket.

If you have successfully integrated a private bucket, a pop-up window will appear and confirm that Wherobots has read and write access to your storage.

Delete storage integration

To delete a storage integration, do the following:

  1. Log in to Wherobots Cloud.
  2. Go to Organization Settings.
  3. Scroll down to Storage.
  4. Click > Delete next to the storage integration that you want to remove.
  5. Click Delete to confirm that you want to delete this storage integration.

Access integrated storage in a notebook

Once you create a storage integration, you can read your data in a Wherobots Notebook. This works for both private and public S3 buckets.

To use storage integration in a notebook, do the following:

  1. Log in to Wherobots Cloud.
  2. Start a notebook.
    For more information on how to start and manage a notebook, see Notebook instance management.
  3. Create a Notebook with a Python Kernel.
    For more information, see Jupyter Notebook Management.
  4. In the Notebook, include the following Python code, replacing s3-bucket-name and file-path with the name of your bucket and the file path to your bucket resource, respectively:

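    from sedona.spark import *
    # Create the Sedona context.
    config = SedonaContext.builder().getOrCreate()
    sedona = SedonaContext.create(config)
    # Replace s3-bucket-name and file-path with your own values.
    path = "s3://s3-bucket-name/file-path"
    # Read the files as binary and print the schema.
    rawDf = sedona.read.format("binaryFile").load(path)
    rawDf.printSchema()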
    
  5. Run the cells. You should see the following output:

    root  
    |-- path: string (nullable = true)  
    |-- modificationTime: timestamp (nullable = true)  
    |-- length: long (nullable = true)  
    |-- content: binary (nullable = true)  
    

Spatial Catalog

Spatial catalogs can be created from a private bucket storage integration at any time, allowing for multiple catalogs per integration and addressing situations where a catalog wasn't created during the initial setup.

To create a spatial catalog from a storage integration, do the following:

  1. Log in to Wherobots Cloud.
  2. Click Spatial Catalog.
  3. Click Create Catalog.
  4. In the Name field, enter a name for your Spatial Catalog.
  5. In the Storage dropdown, select a private bucket.
  6. (Optional) In the Path field, enter the sub-folder where you’d like to store this Spatial Catalog.

Limitations

Currently, Wherobots' Amazon S3 integration has the following limitations:

  • A bucket can only be configured with a single storage integration.
  • Write access is not permitted for public buckets. As a result, public buckets cannot be used as a Spatial Catalog.

Last update: October 11, 2024 22:31:31