Deploying machine learning models has always been a struggle. Most of the software industry has adopted container engines like Docker for deploying code to production, but accessing hardware resources like GPUs from Docker used to be difficult and required hacky, driver-specific workarounds, so the machine learning community has largely shied away from this option. With the recent release of NVIDIA’s nvidia-docker tool, however, accessing GPUs from within Docker is a breeze, and we’re already reaping the benefits here at indico. In this tutorial we’ll walk you through setting up nvidia-docker so you too can deploy machine learning models with ease.

Before we get into the details, however, let’s talk briefly about why Docker may be a good choice for your next data science project. There is certainly a learning curve for the tools in the Docker ecosystem, but the benefits are worth the effort.

  1. No inconsistencies between team environment configurations:

    Software configuration is always a pain. Docker’s configure-once, run-anywhere model means your teammates will have to worry less about environment setup and can focus more on writing code and building machine learning models.

  2. Reliable deployments:

    Fewer bugs crop up in production when you can be assured that your development environment is identical to your production environment.

  3. Git-like tool for environment configuration:

    If something does go wrong in production, reverting to a previous Docker image ensures you can quickly get back to a functional state.

Why is a special solution needed for using GPUs within Docker?

Docker is designed to be hardware and platform agnostic. GPUs are specialized hardware that is not necessarily available on every host. Because of this, the Docker binary does not include GPU support out of the box, and a fair amount of configuration is required to get things working properly. When we first started using Docker in production and needed to enable access to GPU devices from within the container, we had to roll our own solution. It was educational to have to understand the mechanisms by which hardware like GPUs is exposed to an operating system (primarily the device files under /dev), but we ended up with a solution that was not portable and required that the host’s NVIDIA driver be identical to a second copy of the driver installed within the container. Whenever we updated our NVIDIA drivers to support newer CUDA versions, we had to make a breaking change to our Docker image to ensure the drivers matched exactly.

Thankfully, the nice folks at NVIDIA have rectified this problem by releasing nvidia-docker, a tool for configuring Docker to allow GPU access from within containers.

How does nvidia-docker work?

nvidia-docker takes the following steps to get CUDA working within your container:

  • It attaches the GPU device files to your container (/dev/nvidia0, /dev/nvidiactl, etc.)
  • It mounts the host’s driver files into the Docker container as a Docker volume

This means that as long as you have a functional NVIDIA driver on your host and the CUDA version installed within your container is one that the driver supports, you should be able to execute CUDA code from your running Docker container. Importantly, the same Docker container can also be run on another host with a different driver version, making it easy to build once and then run anywhere.
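To make the device step more concrete, here is a minimal shell sketch of collecting the device flags by hand. It assumes the device files are handed to the container with Docker's standard --device flag. The helper name gpu_device_flags is made up for illustration and is not part of nvidia-docker, and the sketch does not cover the driver mount, so on its own it is not enough to run CUDA code.

```shell
# Hypothetical helper (not part of nvidia-docker): print one --device flag
# for every NVIDIA device file found in a directory (default: /dev).
gpu_device_flags() (
    dev_dir="${1:-/dev}"
    flags=""
    for dev in "$dev_dir"/nvidia*; do
        # When nothing matches, the glob stays unexpanded; skip it.
        [ -e "$dev" ] || continue
        flags="${flags:+$flags }--device=$dev"
    done
    printf '%s\n' "$flags"
)

# Show which flags would be generated on this host (empty if no GPU devices).
echo "GPU device flags on this host: $(gpu_device_flags)"
```

On a host with a single GPU you would expect to see entries such as /dev/nvidia0 and /dev/nvidiactl in the printed flags. nvidia-docker saves you from assembling this list and from matching the driver files yourself.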

How do I install nvidia-docker?

Use of nvidia-docker requires:

  • Linux kernel > 3.10
  • NVIDIA GPU with Architecture > Fermi (2.1)
  • NVIDIA drivers >= 340.29 with binary nvidia-modprobe
  • Docker >= 1.9
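The Docker and driver minimums above can be checked roughly with a few lines of shell. This is a sketch rather than an official check: version_ge and report are helpers written here for illustration, the comparison relies on the -V (version sort) option of GNU sort, and the kernel and GPU architecture requirements are not covered.

```shell
# Helper defined for this sketch: succeeds when version $1 >= version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a detected version ($2) of a component ($1) meets a minimum ($3).
report() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (needs >= $3)"
    else
        echo "$1 $2: too old (needs >= $3)"
    fi
}

# Only query tools that are actually installed on this machine.
if command -v docker >/dev/null 2>&1; then
    docker_version="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -n "$docker_version" ]; then
        report "Docker" "$docker_version" "1.9"
    fi
fi
if command -v nvidia-smi >/dev/null 2>&1; then
    driver_version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    if [ -n "$driver_version" ]; then
        report "NVIDIA driver" "$driver_version" "340.29"
    fi
fi
```

If a tool is missing, the script simply prints nothing for it, which is itself a hint that the requirement is not met.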

If you already meet these requirements, installation of nvidia-docker is as easy as installing a .deb file (on Ubuntu 14.04):

# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

Once nvidia-docker is installed on your host machine, you can try it out immediately by running the nvidia/cuda Docker image provided by NVIDIA:

# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi

Depending on your driver version, you may need to specify a different version of CUDA to run when testing your installation:

# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi

If all is well, you should see something like:

$ nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi
7.5: Pulling from nvidia/cuda
bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
261ad237e477: Already exists
83d2db6fdab9: Pull complete
e8e8d0e851cd: Pull complete
c0000b849c19: Pull complete
180b04fcdc2d: Pull complete
1e5b85df3d02: Pull complete
Digest: sha256:c601c6902928d62c79f2cbf90bf07477b666e28b51b094b3a10924ec7dacde8b
Status: Downloaded newer image for nvidia/cuda:7.5
Fri Nov  4 16:34:00 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.93     Driver Version: 352.93         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 760     Off  | 0000:01:00.0     N/A |                  N/A |
| 17%   31C    P8    N/A /  N/A |    172MiB /  4095MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

For distributions other than Ubuntu or to install nvidia-docker from source, check out the nvidia-docker quick start guide and installation documentation.

Now let’s use nvidia-docker for something more substantial. We’ll be setting up and running the “neural doodle” project from Alex J. Champandard (@alexjc). The project takes rough sketches and turns them into artistic masterpieces using techniques from the Semantic Style Transfer paper.

Alex has already done the hard work of providing us with a Docker image of his project, and has gone to the trouble of installing the necessary CUDA libraries in the Docker image as well. Normally we’d need a functioning installation of CUDA, Theano, and the lasagne library in order to run his code, but since he’s provided us with a Docker image we should be up and running in just a few minutes.

git clone https://github.com/alexjc/neural-doodle.git && cd neural-doodle
alias doodle="nvidia-docker run -v $(pwd)/samples:/nd/samples -v $(pwd)/frames:/nd/frames -it alexjc/neural-doodle:gpu"

# paint a photo of a coastline in the style of Monet
doodle --style samples/Monet.jpg --output samples/Coastline.png --device=gpu --iterations=40

 
This example takes this original Monet painting:
[Image: original Monet painting]

and this sketch of a similar coastline:
[Image: coastline sketch]

and creates a new work of art in a style similar to the original Monet:
[Image: new art, Monet style]

Pretty cool, huh?

Let’s walk through the neural-doodle Dockerfile and the doodle alias to remove some of the magic behind what we’ve just done.

The Dockerfile used to build the alexjc/neural-doodle:gpu image is below:

FROM nvidia/cuda:7.5-cudnn4-devel

# Install dependencies
RUN apt-get -qq update            && \
    apt-get -qq install --assume-yes \
        "module-init-tools"         \
        "build-essential"           \
        "cmake"                     \
        "git"                       \
        "wget"                      \
        "libopenjpeg2"              \
        "libopenblas-dev"           \
        "liblapack-dev"             \
        "libjpeg-dev"               \
        "libtiff5-dev"              \
        "zlib1g-dev"                \
        "libfreetype6-dev"          \
        "liblcms2-dev"              \
        "libwebp-dev"               \
        "gfortran"                  \
        "pkg-config"                \
        "python3"                   \
        "python3-dev"               \
        "python3-pip"               \
        "python3-numpy"             \
        "python3-scipy"             \
        "python3-matplotlib"        \
        "python3-six"               \
        "python3-networkx"          \
        "python3-tk"             &&  \
    rm -rf /var/lib/apt/lists/*  &&  \
    python3 -m pip -q install "cython"


# Install requirements before copying project files
WORKDIR /nd
COPY requirements.txt .
RUN python3 -m pip -q install -r "requirements.txt"


# Copy only required project files
COPY doodle.py .


# Get a pre-trained neural network (VGG19)
RUN wget -q "https://github.com/alexjc/neural-doodle/releases/download/v0.0/vgg19_conv.pkl.bz2"


# Set an entrypoint to the main doodle.py script
ENTRYPOINT ["python3", "doodle.py", "--device=gpu"]

Hey, this isn’t so bad. The Dockerfile Alex used is based on an official NVIDIA Docker image (nvidia/cuda:7.5-cudnn4-devel) that already includes the required CUDA libraries, so it only has to describe how to install a few system dependencies for working with image formats, install a few machine learning Python packages with pip (Theano, lasagne, etc.), and download some pre-trained model weights. It’s little more than a glorified bash setup script.

The doodle alias isn’t bad either. It simply specifies the Docker image we’ll be running (alexjc/neural-doodle:gpu) and lets Docker know that the ./samples and ./frames directories on the host should be accessible from the Docker container at /nd/samples and /nd/frames. This is done using Docker’s “volumes” feature, which the curious can read more about on the official Docker site.
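An alias defined in double quotes has its command substitutions expanded once, when the alias is created, so the mounted paths stay tied to the directory where you defined it. A shell function is one alternative that resolves the paths each time it runs. The sketch below is our own variation, not something shipped with the neural-doodle project, and the DOODLE_RUNNER variable is a made-up hook that lets you swap nvidia-docker for another command such as echo.

```shell
# Drop any existing doodle alias so it does not clash with the function name.
unalias doodle 2>/dev/null || true

# Variation on the doodle alias written as a function (our own sketch).
# DOODLE_RUNNER is a hypothetical override; it defaults to nvidia-docker.
doodle() {
    "${DOODLE_RUNNER:-nvidia-docker}" run \
        -v "$PWD/samples:/nd/samples" \
        -v "$PWD/frames:/nd/frames" \
        -it alexjc/neural-doodle:gpu "$@"
}
```

Calling it with DOODLE_RUNNER=echo, for example `DOODLE_RUNNER=echo doodle --style samples/Monet.jpg`, prints the arguments instead of starting a container, which is a quick way to double check the volume mounts.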

At indico, we now use a setup similar to the neural-doodle configuration to host the indico API on Amazon GPUs. Instead of using our own bash scripts, we let the nvidia-docker tool handle the process of ensuring device drivers within the Docker container match device drivers on the host. This means that when our customers wish to run our APIs on their local machines, deployment is as easy as providing them with access to our production Docker image and letting the nvidia-docker tool handle the rest.

Operating System Support

At the moment, nvidia-docker is only portable in the sense that it’s not reliant on a particular GPU model, NVIDIA driver version, or Linux distribution. Running nvidia-docker on OS X or Windows will likely not be supported anytime soon.

Where can I find more information on nvidia-docker?

NVIDIA has done an excellent job of keeping the wiki on their GitHub page up to date. If you have questions that aren’t answered in this blog post, you can probably find answers in the nvidia-docker GitHub wiki.

If you’re using a version of CUDA other than the one used in this demo (CUDA 7.5), you might also want to take a peek at the full list of base images that NVIDIA provides for you to work with.


I hope you’ve enjoyed this whirlwind tour of using nvidia-docker to build and run machine learning projects, and perhaps created a bit of original algorithmic art while you were at it. If you run into trouble trying out this tutorial, or want to learn more about how we’re using Docker in production at indico, feel free to reach out over our site chat and say hello. Happy hacking!
