Uploading data

You can upload files to Wherobots Cloud for later use within your Jupyter notebooks and jobs. You can upload individual files directly through your browser using the file browser, or use the AWS CLI and the aws s3 cp command to copy files directly into Wherobots Cloud's data warehouse. The AWS CLI is recommended if you need to upload multiple files, large files, or more complex folder structures such as partitioned Parquet files.

To upload a file from your browser, navigate to the destination folder in the file browser, click the "Upload" button, and select the file to upload.

To upload data using the AWS CLI:

  • Click the "Upload" button and follow the steps to request temporary AWS ingest credentials. A short-lived set of AWS credentials (access key ID, secret access key, and session token) will be generated for you.
  • Configure your local environment with those credentials in your ~/.aws/credentials file, as shown in the example below. Refer to the AWS CLI documentation for more information on configuring AWS credentials for the CLI.
  • In the file browser, navigate to the target folder you want to upload into, and copy its full S3 path by clicking the "Copy" icon on its right-hand side. The path should look like s3://wbts-wbc-XXXX/XXXX/data/customer-XXXX/.
  • From your command line, upload your files with aws s3 cp:
$ cat ~/.aws/credentials
[wherobots]
aws_access_key_id = ...
aws_secret_access_key = ...
aws_session_token = ...
$ aws --profile=wherobots s3 cp --recursive my-data/ s3://wbts-wbc-XXXX/XXXX/data/customer-XXXX/
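If you prefer to script the upload in Python rather than use the AWS CLI, the same temporary credentials work with boto3. The following is a minimal sketch, not an official Wherobots tool: the wherobots profile name must match your ~/.aws/credentials entry, and the bucket, prefix, and my-data directory are placeholders to replace with the S3 path copied from the file browser and your own local data.

import os
import boto3

# Use the temporary "wherobots" profile configured in ~/.aws/credentials.
session = boto3.Session(profile_name="wherobots")
s3 = session.client("s3")

bucket = "wbts-wbc-XXXX"                     # bucket from the copied S3 path (placeholder)
prefix = "XXXX/data/customer-XXXX/my-data/"  # destination prefix (placeholder)

# Recursively upload a local directory, mirroring aws s3 cp --recursive.
for root, _dirs, files in os.walk("my-data"):
    for name in files:
        local_path = os.path.join(root, name)
        relative = os.path.relpath(local_path, "my-data").replace(os.sep, "/")
        s3.upload_file(local_path, bucket, prefix + relative)

Either way, you can verify the upload afterwards by listing the same S3 path with aws --profile=wherobots s3 ls.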

The data directory is accessible from within the Jupyter notebook environment via predefined environment variables (see the usage sketch after this list):

  • USER_S3_PATH - pointing to /data/customer-XXXX
  • USER_S3_SHARED_PATH - pointing to /data/shared
  • USER_WAREHOUSE_PATH - pointing to /data/customer-XXXX/warehouse
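As a quick illustration of how these variables are used, the following sketch builds the path to a previously uploaded file from a notebook cell; the my-data/points.parquet file name is a placeholder for whatever you uploaded.

import os

# Predefined in the Wherobots notebook environment.
user_path = os.environ["USER_S3_PATH"]
shared_path = os.environ["USER_S3_SHARED_PATH"]
warehouse_path = os.environ["USER_WAREHOUSE_PATH"]

# Build the full path to a file uploaded earlier (placeholder name).
dataset_path = f"{user_path}/my-data/points.parquet"
print(dataset_path)

The resulting path can then be passed to whatever reader you use in the notebook, for example a Spark or Sedona DataFrame reader.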
