This is the first part of the tutorial series How I work from home, in which I explain how to access your data remotely in different ways. In this part, I talk about remote data access via `ssh` and `sshfs`, tunneling included.
Access remote data on demand with sshfs
You probably know that you can connect to a remote computer using the `ssh` command. If the computer is exposed outside the local network, then you can simply grab its IP or domain name and do `ssh [-p <port>] <user>@<REMOTE>` and you are in! You can then work from the terminal as if you were sitting there.
For example, let’s say your home PC is called `home` and its user is `henry`, who wants to connect to `remote.example.org` as `robert` (the address before `$` is the PC where you type the command):
# If you have a public address assigned
[henry@home] $ ssh robert@remote.example.org
# Now you are using the remote PC
[robert@remote.example.org] $ hostname
remote.example.org
Then, if you want to copy some files to your home computer, you use `scp` like this:
# Copy a file from REMOTE to HOME. Syntax is always ORIGIN -> DESTINATION
# The ending dot means "current directory", i.e. the destination
[henry@home] $ scp robert@remote.example.org:some_file.txt .
# Copy a file from HOME to REMOTE (the trailing colon means robert's home directory).
[henry@home] $ scp some_file.txt robert@remote.example.org:
This is normally fine and sometimes inevitable, but you will have to wait until the download finishes to read the contents. With large files and slow connections, this can take a long time. However, if the files you want to read support random access (like HDF5 or DCD), you won’t need to download them fully, only the parts you are interested in!
In Linux, you can mount the remote path at a local mount point thanks to FUSE and `sshfs`. This means that the remote data will be magically available on your computer and will only be downloaded when it is needed, on demand.
First, install `sshfs` on your home computer. Depending on your Linux distribution, you need to use a different package manager:
# Ubuntu
[henry@home] $ sudo apt install sshfs
# Arch Linux
[henry@home] $ sudo pacman -S sshfs
Then, create a local directory to be used as the mount point. You can use any name, but I normally use `~/mnt`.
[henry@home] $ mkdir ~/mnt
Finally, use `sshfs` to mount the remote path on the newly created directory:
[henry@home] $ sshfs <user>@<REMOTE>:<path/to/the/remote/directory> ~/mnt
You can now access `~/mnt` and it will display the contents of the remote directory you have chosen, both in the terminal and in the graphical file explorer. You can open files, edit them and create new ones as if they were local files. That’s the magic of FUSE!
For example:
[henry@home] $ cd ~/mnt
[henry@home] $ ls -alh
[henry@home] $ less README.md
[henry@home] $ echo "Example text" > new_document.txt
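To get a feel for what "on demand" means, here is a minimal sketch that reads only a slice of a file. `read_slice` is a hypothetical helper (not part of sshfs) and the demo runs on a throwaway local file. On a mounted path the same call should only need to fetch the requested bytes, assuming your `tail` seeks on regular files as the GNU coreutils one does.

```shell
# read_slice FILE OFFSET COUNT: print COUNT bytes starting at byte OFFSET (0-based).
# Hypothetical helper for illustration only.
read_slice() {
    local file="$1" offset="$2" count="$3"
    # tail -c +N starts output at byte N (1-based), hence the +1
    tail -c "+$((offset + 1))" "$file" | head -c "$count"
}

# Local demonstration with a temporary file
demo_file=$(mktemp)
printf 'abcdefghij' > "$demo_file"
read_slice "$demo_file" 3 4   # prints: defg
echo
rm -f "$demo_file"
```

On a mount you would call it the same way, for example `read_slice ~/mnt/<some_file> 0 1024`.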
Depending on the number of items in the directory and your connection speed, it can take a few seconds to first display some contents. If you are on a trusted network, you can squeeze out more performance by disabling compression and choosing a cheaper cipher with these `-o` flags. Older guides suggest `arcfour`, but OpenSSH 7.6 removed it, so pick one of the ciphers listed by `ssh -Q cipher`:
[henry@home] $ sshfs -o Ciphers=aes128-ctr -o Compression=no <user>@<REMOTE>:<path/to/the/remote/directory> ~/mnt
When you are done, you will be able to unmount the remote path using:
[henry@home] $ fusermount3 -u ~/mnt
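If you mount and unmount often, a small wrapper can avoid mounting twice on the same directory. This is only a sketch: `is_mounted`, `mount_remote` and `unmount_remote` are made-up names, and it assumes the `mountpoint` tool (from util-linux) is installed.

```shell
# Succeeds only if DIR is currently a mount point
is_mounted() {
    mountpoint -q "$1"
}

# mount_remote SOURCE [DIR]: mount with sshfs unless DIR is already in use
mount_remote() {
    local source="$1" target="${2:-$HOME/mnt}"
    mkdir -p "$target"
    if is_mounted "$target"; then
        echo "$target is already mounted"
    else
        sshfs "$source" "$target"
    fi
}

# unmount_remote [DIR]: unmount only if something is mounted there
unmount_remote() {
    local target="${1:-$HOME/mnt}"
    if is_mounted "$target"; then
        fusermount3 -u "$target"
    fi
}

# Usage:
# mount_remote robert@remote.example.org:some/directory
# unmount_remote
```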
Get access through a bastion
Most computers in research or industry environments sit behind a firewall that prevents direct access to your desktop, so you have to jump over an intermediary or bastion server. That is, first you `ssh` into the bastion, and then you `ssh` into your PC.
In ideal conditions:
[HOME] --------------(ssh)---------------> [REMOTE]
With a bastion server:
                        |
                        |
[HOME] ---(ssh)---> [BASTION] ---(ssh)---> [REMOTE]
                        |
                        | firewall
Disclaimer
- This guide provides some hints to work around limitations in the network access provided at your workplace (academic or industrial). Ask your IT team first if they are happy about it!
- SSH tunneling can get heavy in the bastion. Use it responsibly.
Let’s assign hostnames, users and custom ports to each computer for further reference (notice how each username starts with the same letter as its hostname).
Computer | Network address | SSH port | Username | Public access |
---|---|---|---|---|
HOME | (not available) | 22 | henry | No |
BASTION | bastion.example.org | 22522 | bianca | Yes |
REMOTE | remote.example.org | 22 | robert | No |
In order to reach `REMOTE` from `HOME`, your current approach probably involves two separate commands:
# In HOME, connect to BASTION using custom PORT
[henry@home] $ ssh -p 22522 bianca@bastion.example.org
# Now, from BASTION, connect to REMOTE
[bianca@bastion] $ ssh robert@remote.example.org
Can we do it in one command? Yes! In several ways, depending on your OpenSSH version (check it with `ssh -V`). I will list them from newer (preferred) to older builds.
OpenSSH 7.3+ - With `-J` (equivalent to `-o ProxyJump`)
[henry@home] $ ssh -J bianca@bastion.example.org:22522 robert@remote.example.org
OpenSSH 5.4 to 7.2 - With ProxyCommand
[henry@home] $ ssh -o ProxyCommand="ssh -p 22522 -W %h:%p bianca@bastion.example.org" robert@remote.example.org
OpenSSH before 5.4 - With ssh -t ... ssh
[henry@home] $ ssh -t -p 22522 bianca@bastion.example.org ssh robert@remote.example.org
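If you are unsure which of the three variants applies to a machine, the version check can be scripted. The following is a rough sketch rather than an official tool: `jump_method` is a hypothetical function that maps the banner printed by `ssh -V` to the version ranges listed above, and it only understands banners that start with `OpenSSH_`.

```shell
# jump_method BANNER: print which bastion technique fits the given `ssh -V` banner.
# Hypothetical helper; thresholds follow the version ranges listed above.
jump_method() {
    local banner="$1" version major minor
    # Extract "7.3" from e.g. "OpenSSH_7.3p1, OpenSSL 1.0.2"
    version=$(printf '%s\n' "$banner" | sed -n 's/^OpenSSH_\([0-9]*\.[0-9]*\).*/\1/p')
    if [ -z "$version" ]; then
        echo "unknown"
        return 1
    fi
    major=${version%%.*}
    minor=${version#*.}
    if [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 3 ]; }; then
        echo "ProxyJump"
    elif [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 4 ]; }; then
        echo "ProxyCommand"
    else
        echo "nested-ssh"
    fi
}

# Usage (ssh -V writes its banner to stderr):
# jump_method "$(ssh -V 2>&1)"
```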
SSHFS through a bastion
The cleanest way is to use `ProxyJump` again, thanks to the `-o` argument:
[henry@home]$ sshfs robert@remote.example.org:<path/in/remote/pc> ~/mnt -o ProxyJump="bianca@bastion.example.org:22522"
#                                                                 \___/
#                                                           local mount point
Alternatively, you can create an `ssh` tunnel (aka the `-NfL` combo) and then `sshfs` from the forwarded port:
# First, create the ssh tunnel to local port 2222
[henry@home]$ ssh -NfL 2222:remote.example.org:22 bianca@bastion.example.org -p 22522
#           local port_/    \__remote addr+port_/ \____bastion address_____/    \__bastion port
# Now, REMOTE is accessible through localhost at port 2222 (log in as robert, the REMOTE user)
[henry@home]$ sshfs -p 2222 robert@localhost:<path/in/remote/pc> ~/mnt
#                   \____localhost:2222_____/
#                    (redirects to REMOTE!)
Unmounting is the same as without a bastion:
[henry@home] $ fusermount3 -u ~/mnt
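Keep in mind that a tunnel started with `-f` keeps running in the background after you unmount. One way to keep it under control, sketched below, is an OpenSSH control socket (`-M`, `-S` and `-O` are standard `ssh` options). The socket path, the function names and the `DRY_RUN` switch are assumptions made for this example.

```shell
# Illustrative defaults; adjust to your own setup
TUNNEL_SOCKET="${TUNNEL_SOCKET:-$HOME/.ssh/tunnel-remote.sock}"
BASTION="bianca@bastion.example.org"

# Print the command instead of running it when DRY_RUN is set
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Open the tunnel; -M makes this process the master of the control socket
tunnel_up() {
    run ssh -M -S "$TUNNEL_SOCKET" -NfL 2222:remote.example.org:22 -p 22522 "$BASTION"
}

# Ask the master process whether it is still alive
tunnel_status() {
    run ssh -S "$TUNNEL_SOCKET" -O check "$BASTION"
}

# Tell the master process to exit, which closes local port 2222
tunnel_down() {
    run ssh -S "$TUNNEL_SOCKET" -O exit "$BASTION"
}

# Usage:
# tunnel_up; sshfs -p 2222 robert@localhost:some/directory ~/mnt
# fusermount3 -u ~/mnt; tunnel_down
```

Running any of the functions with `DRY_RUN=1` only prints the command, which is handy to double-check it before touching the bastion.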
From Bash aliases to the SSH config file
If you need to type these commands once in a while, it’s fine, but you are probably thinking of creating Bash aliases for each command, right? Well, that’s OK, but you might want to consider using `~/.ssh/config`.
This file can list all your common SSH connections together with their command-line options in a tidy way. Also, they will be available to other `ssh`-related commands, like `scp` or `sshfs`, and you will be able to use TAB-complete!
Let’s say you had this `alias` in your `~/.bashrc` so you can connect to `bastion.example.org` by simply typing `bastion`:
alias bastion="ssh -p 22522 bianca@bastion.example.org"
The (cleaner) `~/.ssh/config` equivalent would be:
Host bastion
Hostname bastion.example.org
User bianca
Port 22522
And you connect like this:
[henry@home] $ ssh bastion
It’s four keystrokes longer, but now it will also work with `scp`, `sshfs` and so on!
# Now you can stop worrying about whether it is -p or -P
[henry@home] $ scp some_file bastion:data/
Convinced now? Let’s edit the file with `mkdir -p ~/.ssh; vi ~/.ssh/config` and add the previous `Host bastion` block:
Host bastion
Hostname bastion.example.org
User bianca
Port 22522
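If you would rather script this step than open an editor, something like the following could work. It is a minimal sketch: `add_host_block` is a hypothetical helper that appends a block only when no `Host <alias>` line exists yet, so running it twice does not duplicate the entry.

```shell
# add_host_block CONFIG ALIAS HOSTNAME USER PORT
# Appends a Host block unless one with the same alias is already there.
add_host_block() {
    local config="$1" name="$2" hostname="$3" user="$4" port="$5"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"   # ssh rejects config files that other users can write to
    # -x matches the whole line, -F treats the pattern as a fixed string
    if grep -qxF "Host $name" "$config"; then
        return 0
    fi
    printf 'Host %s\n    Hostname %s\n    User %s\n    Port %s\n' \
        "$name" "$hostname" "$user" "$port" >> "$config"
}

# Usage:
# add_host_block ~/.ssh/config bastion bastion.example.org bianca 22522
```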
Of course, jumps and forwardings are also available. The `-J` equivalent in `ssh_config` parlance is `ProxyJump`:
# openssh 7.3+ supports -J
# We need `ProxyJump`, which btw supports more than one jump!
# Use comma-separated values for that
Host remote
Hostname remote.example.org
User robert
ProxyJump bianca@bastion.example.org:22522
Older versions without `-J` (under 7.3) need `ProxyCommand`:
# Older openssh builds do not support -J
# We need `ProxyCommand` directives with delegated ssh -W
Host remote
Hostname remote.example.org
User robert
ProxyCommand ssh -p 22522 bianca@bastion.example.org -W %h:%p
# Before openssh 5.4 (no -W), run nc on the bastion instead (it must be installed there)
Host remote
Hostname remote.example.org
User robert
ProxyCommand ssh -p 22522 bianca@bastion.example.org nc %h %p
With these entries defined, we now have an alias `remote` that grants direct access to `remote.example.org`, with bastion jumps and everything! We can now go and simply use it with all ssh-related commands.
[henry@home] $ ssh remote
# And also other ssh related commands!
[henry@home] $ scp myfile remote:some/directory/
[henry@home] $ sshfs remote:<path/to/directory> ~/mnt
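Once the file grows, it is easy to forget which aliases exist. As a small sketch (the function name `list_ssh_aliases` is made up), you can list every concrete alias with `awk`, skipping wildcard patterns such as `*`:

```shell
# list_ssh_aliases [CONFIG]: print each Host alias that is not a wildcard pattern.
# Hypothetical helper; defaults to the usual per-user config file.
list_ssh_aliases() {
    local config="${1:-$HOME/.ssh/config}"
    awk 'tolower($1) == "host" {
             for (i = 2; i <= NF; i++)
                 if ($i !~ /[*?!]/) print $i
         }' "$config"
}

# Usage:
# list_ssh_aliases
```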
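For SSH tunneling, the `-NfL` combo has no exact translation on older OpenSSH builds (8.7 added `SessionType none` and `ForkAfterAuthentication yes` as counterparts of `-N` and `-f`). The portable option is to specify only the `-L` part with `LocalForward`: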
Host remote2222
Hostname remote.example.org
User robert
ProxyJump bianca@bastion.example.org:22522
LocalForward 2222 localhost:22
So you will have to manually provide the `-Nf` part at runtime:
[henry@home] $ ssh -Nf remote2222
The `~/.ssh/config` file is incredibly useful! Check the official docs for full details on all the available options (wildcards, compression, X-forwarding, and so on).
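I like to have persistent, compressed SSH connections that do not hang after a few idle minutes. Since we can specify wildcard hosts to match all possible connections, we can add the lines below to the file. Put them at the end: ssh keeps the first value it finds for each option, so general defaults belong after the host-specific blocks.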
Host *
ServerAliveInterval 15
ServerAliveCountMax 3
Compression yes
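The two keepalive settings multiply: ssh sends a probe every `ServerAliveInterval` seconds and gives up after `ServerAliveCountMax` unanswered ones. A tiny worked example (the function name is just for illustration):

```shell
# keepalive_timeout INTERVAL COUNT: approximate seconds until ssh drops a silent server
keepalive_timeout() {
    echo $(( $1 * $2 ))
}

keepalive_timeout 15 3   # prints: 45
```

So with the values above, an unresponsive server is dropped after roughly 45 seconds instead of leaving the terminal frozen.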
Bonus: SSH keys
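For a better access experience, I recommend configuring SSH keys for login instead of passwords. That way, you won’t need to type the passwords, especially on bastioned hosts! You will have to do it for all connections involved; i.e., `HOME -> BASTION` and `BASTION -> REMOTE`. Note that with `ProxyJump` or `ProxyCommand` it is HOME that authenticates against REMOTE, so the key from HOME must be authorized there too (for example with `ssh-copy-id remote`, once that alias exists in your config).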
Direct access to remote PC
In your home PC:
[henry@home] $ ssh-keygen -t rsa
# Follow the assistant
[henry@home] $ ssh-copy-id robert@remote.example.org
Access through bastion
In your home PC:
[henry@home] $ ssh-keygen -t rsa
# Follow the assistant
[henry@home] $ ssh-copy-id -p 22522 bianca@bastion.example.org
Then, `ssh` into the bastion (you probably won’t need the password now) and repeat for the remote PC:
[bianca@bastion] $ ssh-keygen -t rsa
# Follow the assistant
[bianca@bastion] $ ssh-copy-id robert@remote.example.org
Done! No more passwords!
Wrapping up
That’s all for now. With this first post I hope I have covered how to access your remote data even if it is behind a bastion server. The next part will be devoted to remote data analysis, so the files stay where they are.