Jupyter Notebook Management

When you open a Notebook in Wherobots Cloud, you will see the JupyterLab Launcher:

Note

The introductory notebook appears when you first open a Wherobots Cloud Notebook. After that, you'll see the Launcher.

Jupyter Dashboard

Note

A warning about a lost server connection may appear in your Jupyter notebook after extended use. Wherobots uses cookies for authentication, and these expire after an hour for security. Refresh the page to be redirected to the login screen, and after logging back in, you'll return to your previous page.

For more information on starting a runtime and opening a Notebook in Wherobots Cloud, see Notebook Instance Management.

Before you start

The following is required to manage a Wherobots Notebook:

  • An account within a Community, Professional, or Enterprise Edition Organization. For more information, see Create an Account.

Execute a Jupyter Notebook

There are two types of kernels available for your Jupyter Notebook:

  • Python kernel (ipykernel)
  • Scala kernel (Scala)

To create a notebook with either kernel, click File > New Launcher in JupyterLab.

Spark Web UI

The Spark Web UI lets you monitor jobs, analyze performance, and tune resource usage, which helps you find bottlenecks and improve application efficiency. To access the Spark Web UI, click Sedona Spark and select the correct port number.

Spark UI

To obtain the port number, run the following code snippet:

spark_ui_port = sedona.sparkContext.uiWebUrl.split(":")[-1]
print(spark_ui_port)
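
As a rough illustration of what the snippet does, the sketch below extracts the port from a URL string with the standard library's urllib.parse instead of splitting on colons. The sample URL is a made-up value standing in for sedona.sparkContext.uiWebUrl, and the helper name ui_port is hypothetical.

```python
from urllib.parse import urlsplit


def ui_port(ui_web_url: str) -> int:
    """Return the port component of a URL such as http://host:4040."""
    port = urlsplit(ui_web_url).port
    if port is None:
        raise ValueError(f"no port found in {ui_web_url!r}")
    return port


# Hypothetical value; in a notebook you would pass sedona.sparkContext.uiWebUrl.
sample_url = "http://10.0.0.12:4040"
port = ui_port(sample_url)
print(port)
```

Parsing the URL instead of splitting the string returns the port as an integer and raises a clear error if the URL has no port.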

For more information on the Spark Web UI, see Web UI in the Apache Spark documentation.

Execute all cells

To execute all the code cells in the Jupyter notebook, do the following within JupyterLab:

  1. Locate the toolbar at the top-left of the notebook and click Run.
  2. Click Run All Cells to execute each cell in the notebook.

    Execute all cells

Note

When you first execute a WherobotsDB code cell, you might see the following warning:

<TIMESTAMP> WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

This behavior is normal. Executors take between 1 and 5 minutes to start, depending on the number of Executors provided.

Python zip library file support

In JupyterLab, you can import your own custom Python modules from a zip file.

  1. On your local computer or within a virtual environment, create a directory called zipmoduletest.

  2. In the zipmoduletest directory, create a file named hellosedona.py with the following contents:

    def hello(name):
        return 'hello ' + str(name)
    
  3. In the same directory, add an empty __init__.py file.

  4. Run ls zipmoduletest. The output should look similar to the following:

    __init__.py       hellosedona.py

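  5. From the directory that contains zipmoduletest, use the zip command to package the directory into a file called zipmoduletest.zip. You can also use any other file compression tool. The archive must contain the zipmoduletest directory itself, because the import in step 7 refers to the zipmoduletest package.

    zip -r9 zipmoduletest.zip zipmoduletest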
    
  6. Upload the zip file to Wherobots Managed Storage or to your integrated Amazon S3 bucket. For more information, see Notebook and Data Storage.

  7. Within a Notebook, use the following code to load the zip file and import the module.

    sedona.sparkContext.addPyFile('s3://<Your-Bucket>/path-to-file/zipmoduletest.zip')
    from zipmoduletest.hellosedona import hello
    hello_str = hello("Sedona")
    print(hello_str)
    

    The output will be:

    hello Sedona
    

    Note

    You can also include this code to import custom Python modules in job submissions.
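
The packaging steps above can be checked locally before you upload anything. The following is a minimal sketch, assuming only the Python standard library: it builds zipmoduletest.zip with the zipfile module, puts the archive on sys.path (roughly what happens to a file passed to addPyFile), and imports the module from it. The helper name build_module_zip and the temporary directory are illustrative and are not part of Wherobots.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_module_zip(workdir: Path) -> Path:
    """Create zipmoduletest.zip with the package directory at the archive root."""
    zip_path = workdir / "zipmoduletest.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The zipmoduletest/ prefix is what makes
        # `from zipmoduletest.hellosedona import hello` resolvable.
        zf.writestr("zipmoduletest/__init__.py", "")
        zf.writestr(
            "zipmoduletest/hellosedona.py",
            "def hello(name):\n    return 'hello ' + str(name)\n",
        )
    return zip_path


# Build the archive in a throwaway directory (illustrative location).
zip_path = build_module_zip(Path(tempfile.mkdtemp()))

# Python can import packages directly from a zip file that is on sys.path.
sys.path.insert(0, str(zip_path))
importlib.invalidate_caches()

from zipmoduletest.hellosedona import hello

greeting = hello("Sedona")
print(greeting)
```

If the import fails here, the usual cause is an archive that holds hellosedona.py at its root instead of inside a zipmoduletest/ directory.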

Use an Amazon S3 Bucket with your Wherobots Notebook

For more information on accessing data from within an Amazon S3 bucket, see Access Integrated Storage in a Notebook.

Info

To use new storage integrations or catalogs in your notebooks, you must start a new runtime. Notebooks can only access storage integrations or catalogs that were created before the runtime started.

Open a specific Wherobots Notebook

To open a specific notebook, do the following in JupyterLab:

  1. Click File > Open from path.
  2. Enter the notebook path.
  3. Click Open.

Export a Python Notebook

You can export Notebooks as executable Python files in order to create Jobs.

To export your Python Notebook, do the following:

  1. In the JupyterLab toolbar, click File.
  2. Hover over Save and Export Notebook As...

    Create Python executable file

  3. Select Executable Script.

    The file will download to your machine.

    Create Python executable file continued

Once you have the Python executable file, refer to WherobotsRunOperator to create a job.
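
To give a sense of what an exported script contains, the sketch below pulls the code cells out of a notebook's JSON and joins them into one script. It relies only on the standard .ipynb layout (a top-level cells list whose entries have cell_type and source). It is not how JupyterLab performs the export, and the real Executable Script output may differ, for example in how it handles Markdown cells. The function name notebook_to_script and the sample notebook are illustrative.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of an .ipynb document into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip Markdown and raw cells
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# Illustrative notebook with one Markdown cell and two code cells.
sample_notebook = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {"cell_type": "code", "source": ["x = 1\n"]},
            {"cell_type": "code", "source": "print(x + 1)"},
        ],
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

script = notebook_to_script(sample_notebook)
print(script)
```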

Export a Scala Notebook

To export your Notebook as an executable Scala file, do the following:

  1. In the JupyterLab toolbar, click File.
  2. Hover over Save and Export Notebook As...

    Create Scala executable file

  3. Select Executable Script.

    Create Scala executable file continued

    The file will download to your machine.

    You can add the Scala executable file to sedona-maven-example/src/main/scala/com/wherobots/sedona/ for job submission.

Note

The Scala executable file that you create won't have a main class. To add one, wrap all of the code except the import statements in object <class-name-you-want> extends App { all-of-code }.
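
The wrapping described in this note can be scripted. The following Python sketch is one possible approach, not a Wherobots tool: it keeps unindented lines that start with import at the top of the file and places everything else inside object <name> extends App { ... }. The function name wrap_in_app, the object name ExampleJob, and the sample Scala source are all hypothetical.

```python
def wrap_in_app(scala_source: str, object_name: str) -> str:
    """Wrap exported Scala code in an App object, leaving imports at the top."""
    imports, body = [], []
    for line in scala_source.splitlines():
        # Only unindented lines beginning with "import " count as imports.
        if line.startswith("import "):
            imports.append(line)
        else:
            body.append(line)

    # Trim blank lines at the start and end of the body.
    while body and not body[0].strip():
        body.pop(0)
    while body and not body[-1].strip():
        body.pop()

    indented = ["  " + line if line.strip() else "" for line in body]
    parts = imports + ([""] if imports else [])
    parts += [f"object {object_name} extends App {{"] + indented + ["}"]
    return "\n".join(parts) + "\n"


# Illustrative stand-in for an exported .scala file.
exported = (
    "import org.apache.spark.sql.SparkSession\n"
    "\n"
    'val greeting = "hello"\n'
    "println(greeting)\n"
)

wrapped = wrap_in_app(exported, "ExampleJob")
print(wrapped)
```

Review the result before building it, because imports nested inside other blocks stay where they are.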

Note

You cannot execute a .scala file within the Jupyter Python environment. To execute Scala code, use a Jupyter Scala notebook.

Create a JAR file

  1. Navigate to File on the toolbar in JupyterLab.
  2. Click New Launcher.

    Open launcher

  3. Open Terminal.

    Open terminal

  4. Go to the sedona-maven-example directory.

  5. Run mvn clean package.
  6. Locate the target folder.

    Target folder

  7. Right-click on sedonadb-example-0.0.1.jar.

  8. Select Download.

    Download jar file

Note

You may add any dependency to the pom.xml located at notebook-example/scala/sedona-maven-example.

Once you have the jar file, refer to WherobotsRunOperator to create a job.