This is the second part of the tutorial series How I work from home, in which I explain different ways to access your data remotely. In this part, I talk about remote data analysis using the excellent Jupyter Notebook environment.

Part 2 - Table of Contents

  1. Install Jupyter Notebook
  2. Link the remote server to your local machine
  3. Connect from your browser

Remote data analysis with Jupyter Notebook

You can use the methods listed in pt. 1 to access any files you want remotely, but in the end that always involves downloading something. If you simply intend to analyse the data (interactively or not), you might just want to run the analysis remotely as well. You won’t have to download anything, and the pipeline will have faster access to the files.

Notice: Keep in mind that the data processing will run on the remote computer. If this is a shared computer and you run numerically intensive processes, all the other users will notice the performance loss. In that case, consider turning the interactive analysis into an automated script so you can submit it to a compute node.

01: Install Jupyter Notebook

Jupyter Notebooks are language-agnostic documents composed of cells that allow you to run code interactively while documenting the process. They were created for Python (in the IPython era) but now support more languages (Julia, R, Scala…). Behind the scenes, they are JSON files that are rendered by a web application. This means that you access them through the browser, which is perfect for remote access. I mean, that’s what browsers were designed for!
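To make the "JSON behind the scenes" point concrete, here is a minimal sketch that builds a tiny notebook document with Python's standard json module and reads it back. The field names follow version 4 of the notebook format as I understand it, and the cell contents are made up for illustration.

```python
import json

# A minimal notebook document (notebook format v4); contents are illustrative.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My remote analysis"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello from the remote PC')"],
        },
    ],
}

# This string is what would be stored in an .ipynb file.
serialized = json.dumps(notebook, indent=1)

# Reading it back is plain JSON parsing; no Jupyter installation is required.
loaded = json.loads(serialized)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain JSON, any tool that can parse JSON can inspect a notebook, even on a machine where Jupyter is not installed.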

Since I mostly use Python for data analysis, I will install Jupyter alongside a Python environment with conda. conda is the package manager of Anaconda, a freely available Python distribution for data science. Anaconda is distributed in two flavours:

  • Anaconda Distribution: The full pack with hundreds of packages for data science, Jupyter included.
  • Miniconda: A stripped down version with only the basics (Python + conda, mainly), which you can then use to install what you want.

I normally prefer a smaller installation size, so I choose Miniconda and then install Jupyter on the REMOTE computer. You don’t need it locally (you will just connect to a webserver). Instructions for Miniconda:

# Connect to remote PC (jump through BASTION if necessary, see part 1)
[henry@home] $ ssh
# Download Miniconda Python 3.7 for Linux 64bit
[robert@remote] $ wget
# Install Miniconda
[robert@remote] $ bash Miniconda*.sh
# Configure .bashrc
# If present, remove the PATH modification line added by the Miniconda installer
[robert@remote] $ echo ". ~/miniconda3/etc/profile.d/" >> ~/.bashrc
[robert@remote] $ source ~/.bashrc
# Activate conda base environment and install Jupyter
[robert@remote] $ conda activate
[robert@remote] $ conda install jupyter notebook ipython

Now you can run Jupyter Notebook on the remote PC. Use the --no-browser flag to skip the automatic $BROWSER launch.

# If not already activated
[robert@remote] $ conda activate
# Run Jupyter server
[robert@remote] $ jupyter notebook --no-browser

You will see something like this in the output:

[I 16:43:24.055 NotebookApp] Serving notebooks from local directory: /home/user/data
[I 16:43:24.056 NotebookApp] The Jupyter Notebook is running at:
[I 16:43:24.056 NotebookApp] http://localhost:8888/?token=fbccf41667a91f907253cb653654651356623741d947
[I 16:43:24.056 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:43:24.056 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:

If you Ctrl+Click on that URL now, a new tab will open in the browser, but it will fail to connect. We have to fix that by forwarding the connection to our home PC.

You need to keep this process running, so don’t close the ssh connection! If you really need to do that, consider using screen or tmux to create a virtual session before launching the Jupyter server. Remember to copy the token code in that case!

[robert@remote] $ screen -R notebook
[robert@remote+screen] $ conda activate
[robert@remote+screen] $ jupyter notebook --no-browser
# <Ctrl+A><D> to exit screen and leave it running
# Re-enter screen with: screen -R notebook
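If you saved the server output somewhere before detaching, the token can be pulled out of the URL programmatically. The following is only a sketch using Python's standard library; extract_token is a hypothetical helper name, and the sample line is the one shown in the output above.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(log_line):
    """Return the token from a log line containing a Jupyter URL, or None."""
    for word in log_line.split():
        if word.startswith(("http://", "https://")):
            # parse_qs maps each query parameter to a list of values.
            query = parse_qs(urlparse(word).query)
            tokens = query.get("token")
            return tokens[0] if tokens else None
    return None


line = (
    "[I 16:43:24.056 NotebookApp] "
    "http://localhost:8888/?token=fbccf41667a91f907253cb653654651356623741d947"
)
print(extract_token(line))
```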

02: Forward the server connection

At this point, the Jupyter server will be running, but we cannot access it from our home PC. We need to forward the remote IP and port (8888 by default) to our localhost address with SSH tunneling, aka the -NfL combo: -N runs no remote command, -f sends ssh to the background, and -L sets up the local port forwarding.

If you have direct access to the remote PC running the Jupyter server, it is easy. We will use the same port at both sides for simplicity:

[henry@home] $ ssh -NfL 8888:localhost:8888

If you have to jump through a bastion server, you can use -J like before and it will behave as expected:

[henry@home] $ ssh -J -NfL 8888:localhost:8888
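If you open these tunnels often, the command can be assembled by a small helper. This is a sketch under stated assumptions: tunnel_command is a hypothetical function, and robert@remote and henry@bastion are placeholder addresses that you would replace with your own.

```python
import shlex


def tunnel_command(target, local_port=8888, remote_port=8888, jump=None):
    """Build the argument list for an SSH local port forward.

    target and jump are user@host strings (placeholders in this example).
    """
    args = ["ssh"]
    if jump is not None:
        args += ["-J", jump]  # hop through the bastion first
    # -N: no remote command, -f: go to background, -L: local port forward
    args += ["-N", "-f", "-L", f"{local_port}:localhost:{remote_port}", target]
    return args


direct = tunnel_command("robert@remote")
jumped = tunnel_command("robert@remote", jump="henry@bastion")
print(shlex.join(direct))
print(shlex.join(jumped))
```

The returned list can be passed to subprocess.run to actually open the tunnel; here it is only printed so that you can inspect the command first.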

Without -J (older OpenSSH builds), you can do it with two forwardings:

# The first connection lets us bypass the bastion
[henry@home] $ ssh -NfL -p 22522
# We can now access REMOTE through localhost at port 2222!
# Bind the remote 8888 to our local 8888.
# Notice how the username matches REMOTE, not HOME
[henry@home] $ ssh -NfL 8888:localhost:8888 -p 2222 robert@localhost
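Before opening the browser, it can help to confirm that something is actually listening on the local end of the tunnel. A minimal sketch with Python's socket module follows; port_is_open is a hypothetical helper, and it only checks that a TCP connection is accepted, not that Jupyter is the one answering.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


print(port_is_open())
```

If this prints False, the forwarding is probably not active and the ssh commands above are the first thing to re-check.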

03: Connect!

If the 2nd step was successful, you can now open a new tab in your browser and enter http://localhost:8888. It will ask for the token from step 1. In fact, you can also Ctrl+Click on that link (http://localhost:8888/?token=fbccf41667a91f907253cb653654651356623741d947) to open the webapp directly in your local browser. It will load correctly now!

Jupyter Notebook web-app

You can create a new Notebook by using the New dropdown menu in the top right corner and begin your data analysis. You will be able to type your code without lag because the commands will only be sent to the remote PC when you execute the cell (Ctrl+Enter or Shift+Enter).

04: Additional benefits of the Notebook environment

The Notebook environment can assist your remote work in more ways than interactive analysis:

Explore and view your files

The Notebook web-app provides a tree view for all the directories and files listed under the working directory. This is the first thing you see when you connect to the server. So if you ran jupyter notebook from ~/Documents, the dashboard will list everything under ~/Documents/*.

From this view you can also:

  • Edit text files: The builtin editor supports syntax highlighting, shortcuts…
  • View/download other files: If your browser natively supports the format (images, PDF documents…), you will preview it. Otherwise, it will be downloaded.
  • Create, copy, rename and delete generic directories and files
  • Manage running notebooks: Pause, resume and stop them, even several at the same time.

Execute shell code

You might already know that, inside a notebook, you can prepend an exclamation mark to a shell command (!ls -alh), and when you execute the cell that command will be run in a subprocess. Its stdout will appear as the output of the cell. You can even save it to a Python variable: directories = !ls -1 */
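The exclamation mark syntax only works inside IPython and notebooks. In a plain Python script, a roughly comparable result can be obtained with the standard subprocess module. This is a sketch, not what the notebook does internally, and shell_lines is a hypothetical helper name.

```python
import subprocess


def shell_lines(command):
    """Run a command (given as a list of arguments) and return stdout as lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Comparable in spirit to: listing = !ls -1
listing = shell_lines(["ls", "-1"])
print(listing)
```

Passing the command as a list avoids shell interpretation, so wildcards such as */ are not expanded; use glob or pathlib for that part.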

A lesser-known feature is the ability to launch a full terminal in your browser. From the New dropdown in the dashboard, choose Terminal and you will get a new tab with an interactive shell console:

Jupyter Notebook terminal

Wrapping up

That’s all for now. Jupyter can really accelerate your analysis workflows on the way to automating everything. However, there are areas where a graphical interface is needed no matter what… Check the third part to learn how to use one remotely!

All parts

  1. Remote file access
  2. Remote data analysis
  3. Remote graphical interfaces and desktop