Google Colab(oratory) is an invaluable resource for data science. You get GPU/TPU computing power on Google's servers, driven from a Jupyter Notebook frontend… for free! Surprisingly, though, conda is not preinstalled in the default configuration. Learn how to fix it!


Update Aug 7th 2019:

The instructions below no longer work for Python 3 (I will look into this). Using Python 2.7 does work, though: you will have to download Anaconda for Python 2.7 and set up Google Colab accordingly. I cannot recommend this approach, since Python 2 reaches its end of life on January 1, 2020 and will not receive support after that.


TLDR: Just run these two cells at the beginning of your Colab notebook:

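# Cell 1 (shell): download Anaconda 5.2 (Python 3.6) and install it into /usr/local
# -b: batch mode, -f: allow the already existing prefix, -p: install prefix
!wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh && bash Anaconda3-5.2.0-Linux-x86_64.sh -bfp /usr/local

# Cell 2 (Python): let Colab's interpreter find the conda packages
import sys
sys.path.append('/usr/local/lib/python3.6/site-packages')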

Python has become the most popular language on Stack Overflow, and one could argue that its recent success is largely due to data science in general and machine learning in particular. Most data scientists rely on the Anaconda distribution or, at least, on its package manager, conda, to install the libraries they need.

While most popular projects offer *.deb packages and pip wheels (both methods officially supported by Google Colab), some are only distributed through conda (for example, OpenMM). However, conda is not preinstalled in the Colab environments! The good news is that you can install it manually for each notebook.

Install Anaconda or Miniconda

Google Colab uses Python 3.6, so we need an Anaconda distribution compiled for that version. More recent releases ship Python 3.7, so you have to use Anaconda v5.2 or Miniconda v4.5.4. Choose one below.

A - Using the full Anaconda distribution

The full Anaconda bundle contains a huge selection of data science packages (650, to be exact) ready to run.

!wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh && bash Anaconda3-5.2.0-Linux-x86_64.sh -bfp /usr/local

B - Using the Miniconda distribution

Miniconda only contains the basics: Python itself, conda, pip and some libraries. It’s up to you to install whatever you need afterwards.

!wget https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh && bash Miniconda3-4.5.4-Linux-x86_64.sh -bfp /usr/local
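If you prefer to drive the installation from Python rather than a `!` shell line, the sketch below performs the same download-and-install step with `urllib.request` and `subprocess`, and spells out what `-bfp` means. The installer URLs are the ones used above; the `INSTALLERS` table, the function names and the Python-version guard are illustrative assumptions, not part of the original instructions.

```python
import os
import subprocess
import sys
import urllib.request

# Installer URL and bundled Python (major, minor) for options A and B above.
INSTALLERS = {
    "anaconda": ("https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh", (3, 6)),
    "miniconda": ("https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh", (3, 6)),
}


def install_command(script, prefix="/usr/local"):
    """Build the same command as `bash <script> -bfp <prefix>`."""
    # -b: batch mode (no prompts, license accepted, ~/.bashrc untouched)
    # -f: do not fail because the prefix already exists (/usr/local always does)
    # -p: install into this prefix
    return ["bash", script, "-b", "-f", "-p", prefix]


def runtime_matches(kind, runtime=None):
    """True if the interpreter's major.minor matches the installer's bundled Python."""
    runtime = tuple(runtime) if runtime is not None else tuple(sys.version_info[:2])
    return runtime == INSTALLERS[kind][1]


def install_conda(kind="miniconda", prefix="/usr/local", dry_run=False):
    """Download the chosen installer and run it; with dry_run, only return the command."""
    url, _ = INSTALLERS[kind]
    script = os.path.basename(url)
    cmd = install_command(script, prefix)
    if dry_run:
        return cmd
    if not runtime_matches(kind):
        # Packages would land in a pythonX.Y folder the Colab interpreter never reads.
        raise RuntimeError("{} bundles Python {}.{}, but this runtime is {}.{}".format(
            kind, *INSTALLERS[kind][1], *sys.version_info[:2]))
    urllib.request.urlretrieve(url, script)
    subprocess.run(cmd, check=True)
    return cmd
```

In a Colab cell, `install_conda("anaconda")` would mirror option A and `install_conda("miniconda")` option B.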

Test the installation

Now you should be able to run some shell commands to check that everything is correct:

!conda info --all
!conda list
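The same check can be scripted. `conda info --json` prints the information from `conda info` as JSON; the sketch below parses it, assuming conda ends up on the `PATH` (with the `/usr/local` prefix, the executable lands in `/usr/local/bin`). `conda_info` is a hypothetical helper name, and the `conda_version` and `root_prefix` keys are read with `.get()` so a missing key prints `None` instead of failing.

```python
import json
import shutil
import subprocess


def conda_info(conda_exe="conda"):
    """Return `conda info --json` as a dict, or None if the executable is not on PATH."""
    exe = shutil.which(conda_exe)
    if exe is None:
        return None
    # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6.
    result = subprocess.run([exe, "info", "--json"], check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return json.loads(result.stdout)


info = conda_info()
if info is None:
    print("conda not found on PATH; did the installer run?")
else:
    print("conda", info.get("conda_version"), "installed in", info.get("root_prefix"))
```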

Patch sys.path and import your packages

To be able to import the Anaconda packages, you have to patch sys.path so Python can find the modules:

import sys
sys.path.append('/usr/local/lib/python3.6/site-packages')

Now you can just use Python’s import like usual.
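Hard-coding `python3.6` works as long as Colab stays on that version. A slightly more defensive sketch, assuming the `/usr/local` prefix used above, derives the folder name from the running interpreter and fails loudly if it does not exist, which is what happens when the runtime and the installer's Python disagree. It uses `site.addsitedir`, which appends the directory like `sys.path.append` but also processes any `.pth` files in it; that choice and the helper names are assumptions, not part of the original recipe.

```python
import os
import site
import sys


def conda_site_packages(prefix="/usr/local"):
    """site-packages folder under `prefix` for the running interpreter's major.minor."""
    version = "python{}.{}".format(*sys.version_info[:2])
    return os.path.join(prefix, "lib", version, "site-packages")


def patch_sys_path(prefix="/usr/local"):
    """Append the conda site-packages folder to sys.path (processing .pth files)."""
    path = conda_site_packages(prefix)
    if not os.path.isdir(path):
        raise FileNotFoundError(
            "{} not found: is conda installed under {} for this Python version?".format(path, prefix))
    if path not in sys.path:
        site.addsitedir(path)
    return path
```

Calling `patch_sys_path()` once per session, before the first import of a conda-installed package, plays the same role as the two-line patch above.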

Install more packages

Of course, you can install more packages if needed. Just remember to use -y to avoid interactive prompts and -q to suppress excessive output. Also, if a package needs CUDA, make sure it is compiled for CUDA 10. For example, to install openmmtools (which relies on openmm and several conda-only packages):

!conda install -y -q -c conda-forge -c omnia/label/cuda100 -c omnia openmmtools

Resolving the dependencies of the conda environment can sometimes take a while. In some tests, that last command took 15-20 minutes, but in other cases it finished in 2 minutes. ¯\_(ツ)_/¯
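Because the solve time varies so much, it can help to wrap the install in a small helper that always passes `-y -q`, keeps channels in the order given (conda searches `-c` channels in that order) and reports how long the command took. This is a sketch with hypothetical helper names; it only shells out to the real `conda install` command.

```python
import shlex
import subprocess
import time


def conda_install_cmd(packages, channels=()):
    """Build a non-interactive (-y), quiet (-q) `conda install` command."""
    cmd = ["conda", "install", "-y", "-q"]
    for channel in channels:
        cmd += ["-c", channel]  # order is preserved: earlier channels are searched first
    return cmd + list(packages)


def timed_install(packages, channels=()):
    """Run `conda install` and print the elapsed wall-clock time."""
    cmd = conda_install_cmd(packages, channels)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    print("Finished in {:.1f} minutes".format((time.monotonic() - start) / 60))


# Same command as the openmmtools example above:
# timed_install(["openmmtools"], ["conda-forge", "omnia/label/cuda100", "omnia"])
```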

Check GPU support with OpenMM's installation test:

import simtk.testInstallation
simtk.testInstallation.main()

The expected output lists 4 platforms:

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Successfully computed forces
4 OpenCL - Successfully computed forces

Median difference in forces between platforms:

Reference vs. CPU: 6.3031e-06
Reference vs. CUDA: 6.73543e-06
CPU vs. CUDA: 7.81258e-07
Reference vs. OpenCL: 6.75426e-06
CPU vs. OpenCL: 8.15821e-07
CUDA vs. OpenCL: 2.17776e-07
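Rather than reading the report by eye, a small parser can flag missing platforms (for example, no CUDA when the notebook is not on a GPU runtime) and unusually large force differences. This sketch assumes the report format shown above and that `simtk.testInstallation.main()` prints it through Python's `print`, so `contextlib.redirect_stdout` can capture it; the `1e-4` tolerance is an arbitrary illustrative threshold, not a value taken from OpenMM.

```python
import contextlib
import io
import re

EXPECTED_PLATFORMS = ("Reference", "CPU", "CUDA", "OpenCL")
PLATFORM_RE = re.compile(r"^\d+\s+(\S+)\s+-\s+Successfully computed forces$")
DIFF_RE = re.compile(r"^(\S+) vs\. (\S+):\s*([-+0-9.eE]+)$")


def capture_openmm_report():
    """Run OpenMM's installation test and return what it printed (needs OpenMM)."""
    import simtk.testInstallation  # imported lazily so the parser works without OpenMM
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        simtk.testInstallation.main()
    return buffer.getvalue()


def parse_report(text):
    """Return (platforms that computed forces, {(a, b): median difference})."""
    platforms, diffs = [], {}
    for line in text.splitlines():
        line = line.strip()
        match = PLATFORM_RE.match(line)
        if match:
            platforms.append(match.group(1))
            continue
        match = DIFF_RE.match(line)
        if match:
            diffs[(match.group(1), match.group(2))] = float(match.group(3))
    return platforms, diffs


def check_report(text, expected=EXPECTED_PLATFORMS, tol=1e-4):
    """Return (missing platforms, differences above tol)."""
    platforms, diffs = parse_report(text)
    missing = [name for name in expected if name not in platforms]
    too_large = {pair: d for pair, d in diffs.items() if d > tol}
    return missing, too_large
```

With a report like the one above, `check_report(capture_openmm_report())` would return two empty results.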

… And that’s it. Maybe in the future Google Colab will bundle conda as well and this won’t be needed. But as of Apr 2019, this is the way to go!

Bonus: Ready-to-run Miniconda-enabled Notebook.