By Jake Wheat
This is a technical article about how to get CUDA passthrough working in a particular Linux container implementation, LXC, demonstrated with a short CUDA example program.
Linux containers can be used for many things. We are going to set up something like a lightweight virtual machine, which can then be used to help with clean builds, testing, or deployment. Linux containers can also support limiting resource access, resource prioritization, resource accounting and process freezing.
Linux containers are built on two features in the Linux kernel, cgroups, https://en.wikipedia.org/wiki/Cgroups, and namespace isolation. There are several projects building on these kernel features in order to make them a bit easier to use. We are going to use one called LXC. You can read about it here: https://linuxcontainers.org/.
Here is how to get CUDA working in a container on Ubuntu Server 15.04. First install Ubuntu server 15.04 in the usual way.
We want to use the latest LXC:
sudo add-apt-repository ppa:ubuntu-lxc/lxc-stable
Then we can update the system like this:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install lxc
After doing this, it is probably a good idea to reboot, to avoid issues connected to the systemd upgrade bug mentioned in the sidebar. If you hit that bug, you may need to run
sudo apt-get install -f
or
sudo dpkg --configure -a
after rebooting to finish the apt-get install. I think this is because of a bug in the Ubuntu package upgrade for one of the dependencies of systemd. If ‘sudo reboot’ hangs, you can ctrl-z it into the background (it might eventually time out with an error after several minutes), then run ‘sync;sync;sync;’, then do a hard reset and the system should recover OK after you run the apt-get/dpkg recovery commands above.
Next we’ll install the Nvidia driver on the host operating system. We will install from the Nvidia driver .run installer. You will probably have an issue where the Nouveau kernel module has been loaded by Ubuntu. We don’t want this because it conflicts with the Nvidia driver kernel module. Let’s fix this issue. Create this file, ‘/etc/modprobe.d/nvidia-installer-disable-nouveau.conf’, with these contents:
blacklist nouveau
options nouveau modeset=0
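If you prefer to create the file from the shell, something like this works (a small sketch using printf and tee; the initramfs refresh is only needed on some setups):
printf 'blacklist nouveau\noptions nouveau modeset=0\n' | sudo tee /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
# on some systems you may also need to refresh the initramfs for the blacklist to take effect at boot:
sudo update-initramfs -u
# after the reboot in the next step, this should print nothing:
lsmod | grep -i nouveau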
Then we should reboot so that we are running without the Nouveau module loaded. Here is where you can get the driver from Nvidia: http://www.nvidia.com/object/unix.html. I used Linux x86_64/AMD64/EM64T – Latest Long Lived Branch version: 352.21, which has the filename ‘NVIDIA-Linux-x86_64-352.21.run’. This driver is compatible with CUDA 7.0. To build the driver’s kernel module during installation we’ll need gcc and make.
sudo apt-get install gcc make
Then install it:
sudo sh ./NVIDIA-Linux-x86_64-352.21.run
We don’t care about the Xorg stuff on a server; when the installer asks about it, just ignore it or tell the installer to do nothing. You can check that CUDA is working on the host machine at this point by installing the CUDA SDK and compiling and running a simple CUDA program. There is an example program at the bottom of this post. There is also a precompiled exe linked at the bottom of the post which might work on your system, so you can avoid having to install the CUDA SDK on the host at this stage.
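A quicker sanity check for the driver itself is nvidia-smi, which the .run installer puts on the host; it should list your GPUs and the driver version (352.21 here) without any errors:
# should list the installed GPUs and the loaded driver version
nvidia-smi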
There can be problems with installing the Nvidia driver from the Ubuntu packages, which is why we install from the .run file; these issues will sometimes also apply when installing the Nvidia drivers on other Linux distributions.
We are going to run an unprivileged container. This means our container will be created and run under our normal user and not under the root user. We need to do a little manual setup to make this work: edit /etc/lxc/lxc-usernet and add the line:
your-username veth lxcbr0 10
Replace your-username with the user you are using. This is to support networking to the container. Do the following:
mkdir -p ~/.config/lxc
cp /etc/lxc/default.conf ~/.config/lxc/default.conf
and add these lines to ‘~/.config/lxc/default.conf’:
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
These should match the numbers in /etc/subuid, /etc/subgid for your user.
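You can check which ranges were allocated to your user like this; each line is user:start:count and should agree with the lxc.id_map lines above:
grep your-username /etc/subuid /etc/subgid
Now create the container, using the download template to fetch an Ubuntu Vivid (15.04) image: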
lxc-create -t download -n mycontainer -- --dist ubuntu --release vivid --arch amd64
Add the Nvidia devices to the container, edit the file ‘~/.local/share/lxc/mycontainer/config’ and add these lines to the bottom:
lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
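These mounts are marked optional, so they will be silently skipped if the source files are missing; it is worth checking on the host that all three device files are actually there:
# on the host: all three device files should be present
ls -l /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm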
Set up the LXC container for access via ssh, with a normal user that can use sudo:
lxc-start -n mycontainer
lxc-attach -n mycontainer
After running lxc-attach, the console you are on is a root prompt in the container. Run
apt-get install openssh-server
adduser myuser
usermod -a -G sudo myuser
Use ctrl-d to exit the container back to the host system. We can get the IP address of the container so we can log in with ssh:
lxc-info -n mycontainer
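lxc-info prints the container's state, PID and IP address; if you just want the IP addresses, you can ask for only those:
lxc-info -n mycontainer -i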
Use the IP address from the output of lxc-info in the two following commands. First copy the Nvidia driver installer into the container:
scp NVIDIA-Linux-x86_64-352.21.run myuser@10.0.3.333:
Log into the container:
ssh 10.0.3.333 -l myuser
The basic driver setup includes adding a kernel module, and adding a bunch of .so files and a few extra bits. Inside the container we don’t want to try to add the kernel module since we are using the host kernel with the module already loaded. We can install without the kernel module like this:
sudo sh ./NVIDIA-Linux-x86_64-352.21.run --no-kernel-module
(We don’t need to install g++ or make in the container to run this because we are not installing the kernel module.) At this point, if you have a simple CUDA test exe, you can scp it into the container and check it runs OK. Now you can install the CUDA SDK using the .run from Nvidia inside the container, or copy your CUDA binaries into the container and you are good to go. If you install the CUDA SDK from the .run, make sure you don’t try to install/upgrade/replace the Nvidia driver during the CUDA SDK installation. You can use something like this:
# make sure we have the prerequisites to install the SDK, and g++ so
# we can use the SDK
sudo apt-get update
sudo apt-get install perl-modules g++
# install the sdk without the driver
./cuda_7.0.28_linux.run -toolkit -toolkitpath=~/cuda-7.0 -silent -override
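With the toolkit installed under ~/cuda-7.0, you can build and run the example program from the bottom of this post inside the container, for example:
# nvcc lives in the bin directory of the toolkit path chosen above
~/cuda-7.0/bin/nvcc hellocuda.cu -o hellocuda
./hellocuda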
You probably don’t want to do normal development in a container, and you definitely want to avoid leaving the CUDA SDK or g++, make, etc. installed on either the host or container for production. One good use for installing the CUDA SDK into a container is to create a convenient way to do repeatable production builds of your CUDA exes.
Sometimes the Nvidia device files inside the container end up with the wrong owners or permissions, or go missing entirely. Restarting the container will often fix this:
lxc-stop -n mycontainer
lxc-start -n mycontainer
This doesn’t always work though. You can check if the permissions have gone weird using this inside the container:
~$ ls /dev/nvidia* -l
crw-rw-rw- 1 nobody nogroup 195,   0 Jul 15 08:24 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Jul 15 08:24 /dev/nvidiactl
crw-rw-rw- 1 nobody nogroup 248,   0 Jul 15 08:24 /dev/nvidia-uvm
If the permissions, owners, or device major and minor numbers are different from the above, or any of the files are missing, then there is a problem. If restarting the container doesn’t fix the issue, you could try repeating the relevant setup steps above (re-adding the device mount entries, reinstalling the driver on the host and in the container) in various orders.
I didn’t see any problems like this following the instructions in this post directly, but only when experimenting and trying different things.
If you already have an LXC container, you can do the following:
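Roughly, condensing the steps above (adjust the config path for your container's name and location):
# on the host: append the Nvidia device files to the existing container's config,
# e.g. ~/.local/share/lxc/mycontainer/config
lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
# inside the container: install the same driver version as the host, skipping the kernel module
sudo sh ./NVIDIA-Linux-x86_64-352.21.run --no-kernel-module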
This should work for privileged containers also. For non-LXC containers, you will need to figure out how to make the Nvidia device files on the host available in the container, install the Nvidia driver on the host as usual, and then either install the driver in the container without the kernel module or just expose the host’s driver files inside the container.
Maybe you want to try running the container on something other than Ubuntu 15.04. You can install the latest stable LXC release from source on your distribution of choice, install the Nvidia driver on the host system, then create a container as above. On different systems, the big difference is likely to be in the networking setup for the container. Also, on some systems you will have to add some entries to the /etc/subuid and /etc/subgid files. One thing to be aware of: I think the Nvidia driver files (.so files, etc.) have to match the kernel module version, so you need to make sure the versions are exactly the same in the host and the container. This might be tricky if, for example, you install the Nvidia driver on the host using the host’s packaging system and then try to run a different Linux distribution in the container. The CUDA SDK version doesn’t need to match the Nvidia driver version; it just needs to be a compatible version, and running a CUDA program will tell you if the Nvidia driver you have is compatible with your CUDA exe or not. The other issues are the possible problems with Nvidia permissions on the host (easily solved), and the device/permissions issues mentioned in the sidebar above.
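A quick way to compare versions is to check the loaded kernel module on the host and then run nvidia-smi in the container, which will complain about a driver/library version mismatch if the container's driver files don't match:
# on the host: version of the loaded Nvidia kernel module
cat /proc/driver/nvidia/version
# inside the container: should report the same driver version
nvidia-smi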
Here is a small CUDA test program which can be used to check if CUDA is working on a system. The expected output is:
16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
If there is a problem, you will almost certainly get an error message, so you shouldn’t need to go through the output to make sure the numbers match! Paste the below into hellocuda.cu and then compile with
nvcc hellocuda.cu -o hellocuda
You have to have the CUDA SDK installed to compile this.
#include <iostream>

using namespace std;

// check a CUDA API return code and bail out with a message on error
void _cudaCheck(cudaError_t err, const char *file, int line) {
    if (err != cudaSuccess) {
        cerr << "cuda error: " << cudaGetErrorString(err)
             << " at " << file << ":" << line << endl;
        exit(-1);
    }
}
#define cudaCheck(ans) { _cudaCheck((ans), __FILE__, __LINE__); }

// one thread per element: c = a + b
__global__ void add(int *a, int *b, int *c) {
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}

int main() {
    const int N = 16;
    int a[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
    int b[N] = {16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
    const size_t sz = N * sizeof(int);

    // allocate device buffers and copy the inputs over
    int *da;
    cudaCheck(cudaMalloc(&da, sz));
    int *db;
    cudaCheck(cudaMalloc(&db, sz));
    int *dc;
    cudaCheck(cudaMalloc(&dc, sz));
    cudaCheck(cudaMemcpy(da, a, sz, cudaMemcpyHostToDevice));
    cudaCheck(cudaMemcpy(db, b, sz, cudaMemcpyHostToDevice));

    // launch one block of N threads
    add<<<1, N>>>(da, db, dc);
    cudaCheck(cudaGetLastError());

    // copy the result back and clean up
    int c[N];
    cudaCheck(cudaMemcpy(c, dc, sz, cudaMemcpyDeviceToHost));
    cudaCheck(cudaFree(da));
    cudaCheck(cudaFree(db));
    cudaCheck(cudaFree(dc));

    for (unsigned int i = 0; i < N; ++i) {
        cout << c[i] << " ";
    }
    cout << endl;
    return 0;
}