Deploying machine learning models has always been a struggle. Most of the software industry has adopted container engines like Docker for deploying code to production, but because accessing hardware resources like GPUs from Docker was difficult and required hacky, driver-specific workarounds, the machine learning community has shied away from this option. With the recent release of NVIDIA's `nvidia-docker` tool, however, accessing GPUs from within Docker is a breeze, and we're already reaping the benefits here at indico. In this tutorial we'll walk you through setting up `nvidia-docker` so you too can deploy machine learning models with ease.
Before we get into the details, however, let's talk briefly about why using Docker for your next data science project may be a good choice. There is certainly a learning curve for the tools in the Docker ecosystem, but the benefits are worth the effort.
- No inconsistencies between team environment configurations: Software configuration is always a pain. Docker's configure-once, run-anywhere model means your teammates will have to worry less about environment setup and can focus more on writing code and building machine learning models.
- Reliable deployments: Fewer bugs crop up in production when you can be assured that your development environment is identical to your production environment.
- Git-like tool for environment configuration: If something does go wrong in production, reverting to a previous Docker image ensures you can quickly get back to a functional state.
Why is a special solution needed for using GPUs within Docker?
Docker is designed to be hardware and platform agnostic. GPUs are specialized hardware that is not necessarily available on every host. Because of this, the Docker binary does not include GPU support out of the box, and it takes a fair amount of configuration to get things working properly. When we first started using Docker in production and needed to enable access to GPU devices from within the container, we had to roll our own solution. It was educational to understand the mechanisms by which hardware like GPUs is exposed to an operating system (primarily the device files under /dev), but we ended up with a solution that was not portable and required that the host's NVIDIA driver be identical to a second copy of the driver installed within the container. Whenever we updated our NVIDIA drivers to support newer CUDA versions, we had to make a breaking change to our Docker image to ensure the drivers matched exactly.
Thankfully, the nice folks at NVIDIA have rectified this problem by releasing `nvidia-docker`, a tool for configuring Docker to allow GPU access from within containers.
How does `nvidia-docker` work?
`nvidia-docker` takes the following steps to get CUDA working within your container:
- It attaches the GPU device files on your host (/dev/nvidia0, /dev/nvidiactl, etc.) to your container
- It mounts your host's NVIDIA driver libraries into the container as a Docker volume
This means that as long as the host has a functional NVIDIA driver, and that driver is recent enough to support the CUDA version installed within your container, you should be able to execute CUDA code from your running Docker container. Importantly, the same container can also be run in another environment with a different driver version, making it easy to build once and then run anywhere.
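To make that concrete, `nvidia-docker run` amounts to a plain `docker run` with the right devices and driver files wired in. Below is a minimal sketch of an equivalent command; the device paths and the driver volume name (`nvidia_driver_352.93` here) are illustrative and depend on how many GPUs you have and which driver version is installed:

```bash
# Roughly what nvidia-docker arranges for you -- illustrative only.
# Device files vary with GPU count; the driver volume name tracks your driver version.
docker run --rm \
    --device=/dev/nvidiactl \
    --device=/dev/nvidia-uvm \
    --device=/dev/nvidia0 \
    --volume=nvidia_driver_352.93:/usr/local/nvidia:ro \
    nvidia/cuda nvidia-smi
```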
How do I install `nvidia-docker`?
Use of `nvidia-docker` requires the following (quick ways to check each are sketched after the list):
- Linux kernel > 3.10
- NVIDIA GPU with Architecture > Fermi (2.1)
- NVIDIA drivers >= 340.29 with binary nvidia-modprobe
- Docker >= 1.9
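If you're not sure whether your host meets these requirements, a few quick checks run directly on the host will tell you (adjust for your distribution as needed):

```bash
# Kernel version (needs to be > 3.10)
uname -r

# NVIDIA driver version (needs to be >= 340.29)
cat /proc/driver/nvidia/version

# Docker version (needs to be >= 1.9)
docker --version
```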
If you already meet these requirements, installing `nvidia-docker` is as easy as installing a .deb file (on Ubuntu 14.04):
```bash
# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
```
Once `nvidia-docker` is installed on your host machine, you can try it out immediately by running the `nvidia/cuda` Docker image provided by NVIDIA:
```bash
# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi
```
Depending on your driver version, you may need to specify a different version of CUDA to run when testing your installation:
```bash
# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi
```
If all is well, you should see something like:
```
$ nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi
7.5: Pulling from nvidia/cuda
bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
261ad237e477: Already exists
83d2db6fdab9: Pull complete
e8e8d0e851cd: Pull complete
c0000b849c19: Pull complete
180b04fcdc2d: Pull complete
1e5b85df3d02: Pull complete
Digest: sha256:c601c6902928d62c79f2cbf90bf07477b666e28b51b094b3a10924ec7dacde8b
Status: Downloaded newer image for nvidia/cuda:7.5
Fri Nov  4 16:34:00 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.93     Driver Version: 352.93         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 760     Off  | 0000:01:00.0     N/A |                  N/A |
| 17%   31C    P8    N/A /  N/A |    172MiB /  4095MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
```
For distributions other than Ubuntu, or to install `nvidia-docker` from source, check out the `nvidia-docker` quick start guide and installation documentation.
Now let's use `nvidia-docker` for something more substantial. We'll be setting up and running the "neural doodle" project from Alex Champandard (@alexjc). The project takes rough sketches and turns them into artistic masterpieces using techniques from the Semantic Style Transfer paper.
Alex has already done the hard work of providing us with a Docker image of his project, and has gone to the trouble of installing the necessary CUDA libraries in the Docker image as well. Normally we'd need a functioning installation of CUDA, Theano, and the lasagne library in order to run his code, but since he's provided us with a Docker image, we should be up and running in just a few minutes.
```bash
git clone https://github.com/alexjc/neural-doodle.git && cd neural-doodle
alias doodle="nvidia-docker run -v $(pwd)/samples:/nd/samples -v $(pwd)/frames:/nd/frames -it alexjc/neural-doodle:gpu"

# paint a photo of a coastline in the style of Monet
doodle --style samples/Monet.jpg --output samples/Coastline.png --device=gpu --iterations=40
```
This example takes this original Monet painting:
and this sketch of a similar coastline:
and creates a new work of art in a style similar to the original Monet:
Pretty cool, huh?
Let's walk through the `neural-doodle` Dockerfile and the `doodle` alias to remove some of the magic behind what we've just done.
The Dockerfile used to build the `alexjc/neural-doodle:gpu` image is below:
```dockerfile
FROM nvidia/cuda:7.5-cudnn4-devel

# Install dependencies
RUN apt-get -qq update && apt-get -qq install --assume-yes \
        "module-init-tools" "build-essential" "cmake" "git" "wget" \
        "libopenjpeg2" "libopenblas-dev" "liblapack-dev" "libjpeg-dev" \
        "libtiff5-dev" "zlib1g-dev" "libfreetype6-dev" "liblcms2-dev" \
        "libwebp-dev" "gfortran" "pkg-config" \
        "python3" "python3-dev" "python3-pip" "python3-numpy" "python3-scipy" \
        "python3-matplotlib" "python3-six" "python3-networkx" "python3-tk" \
    && rm -rf /var/lib/apt/lists/* \
    && python3 -m pip -q install "cython"

# Install requirements before copying project files
WORKDIR /nd
COPY requirements.txt .
RUN python3 -m pip -q install -r "requirements.txt"

# Copy only required project files
COPY doodle.py .

# Get a pre-trained neural network (VGG19)
RUN wget -q "https://github.com/alexjc/neural-doodle/releases/download/v0.0/vgg19_conv.pkl.bz2"

# Set an entrypoint to the main doodle.py script
ENTRYPOINT ["python3", "doodle.py", "--device=gpu"]
```
Hey, this isn't so bad. The Dockerfile Alex used is based on an official NVIDIA Docker image (`nvidia/cuda:7.5-cudnn4-devel`) that already includes the required CUDA libraries, so it only has to install a few system dependencies for working with image formats, install a few machine learning Python packages with `pip` (Theano, lasagne, etc.), and download some pre-trained model weights. It's little more than a glorified bash setup script.
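As an aside, if you'd rather rebuild the image yourself than pull Alex's pre-built one from Docker Hub, a build along the lines of the sketch below should work (the local tag name is our own choice, and we're assuming the file above is saved as `Dockerfile` in the repository root). Note that the build step uses plain `docker build`; `nvidia-docker` only matters when you run the container.

```bash
# Build the GPU image locally from the Dockerfile above (tag name is illustrative)
cd neural-doodle
docker build -t neural-doodle:gpu .
```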
The `doodle` alias isn't bad either. It simply specifies the Docker image we'll be running (`alexjc/neural-doodle:gpu`) and lets Docker know that the `./samples` and `./frames` directories should be accessible from the Docker container at `/nd/samples` and `/nd/frames`. This is done using Docker's "volumes" feature, which the curious can read more about on the official Docker site.
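To see exactly what the alias saves us from typing, here's the single `doodle` invocation from above with the alias expanded (assuming you're still in the cloned `neural-doodle` directory):

```bash
# Expansion of: doodle --style samples/Monet.jpg --output samples/Coastline.png --device=gpu --iterations=40
# The -v flags map the host's ./samples and ./frames into the container at /nd/samples and /nd/frames.
nvidia-docker run \
    -v $(pwd)/samples:/nd/samples \
    -v $(pwd)/frames:/nd/frames \
    -it alexjc/neural-doodle:gpu \
    --style samples/Monet.jpg --output samples/Coastline.png --device=gpu --iterations=40
```

Everything after the image name is passed straight through to the `doodle.py` entrypoint defined in the Dockerfile.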
At indico, we now use a setup similar to the `neural-doodle` configuration to host the indico API on Amazon GPUs. Instead of using our own bash scripts, we let the `nvidia-docker` tool handle the process of ensuring that the device drivers within the Docker container match the device drivers on the host. This means that when our customers wish to run our APIs on their local machines, deployment is as easy as providing them with access to our production Docker image and letting the `nvidia-docker` tool handle the rest.
Operating System Support
At the moment, `nvidia-docker` is only portable in the sense that it isn't tied to a particular GPU model, NVIDIA driver version, or Linux distribution. Running `nvidia-docker` on OS X or Windows will likely not be supported anytime soon.
Where can I find more information on `nvidia-docker`?
NVIDIA has done an excellent job of keeping the wiki on their GitHub page up to date. If you have questions that aren't answered in this blog post, chances are you'll find answers in the `nvidia-docker` GitHub wiki.
If you’re using a version of CUDA other than the one used in this demo (CUDA 7.5), you might also want to take a peek at the full list of base images that NVIDIA provides for you to work with.
I hope you've enjoyed this whirlwind tour of using `nvidia-docker` to build and run machine learning projects, and perhaps created a bit of original algorithmic art along the way. If you run into trouble trying out this tutorial, or want to learn more about how we're using Docker in production at indico, feel free to reach out over our site chat and say hello. Happy hacking!