Build OpenCV 3.4.0 with CUDA and TBB support in Arch Linux

January 25, 2018

If you have an NVIDIA GPU and an Intel CPU, building OpenCV with CUDA and TBB support is a must. (TBB can probably be used with an AMD CPU too, I’m not sure, but I am sure you can’t use CUDA with AMD cards.) In this post I’ll show you how to properly compile OpenCV with both enabled.

I used OpenCV for HOG vehicle detection. According to this article from the OpenCV website, CUDA gives about 8x the performance for pedestrian detection, and I’ve experienced similar results.

There is an opencv package in the official repository, but I highly recommend not using it, because it’s a general-purpose build and doesn’t support CUDA and TBB. I’ve also had issues between it and protobuf, and I usually run into other problems after every update.

Installing CUDA

Installing CUDA for OpenCV is pretty straightforward: just install the nvidia and cuda packages. Both can be found in the official repositories, and as of this writing I have cuda 9.1.85-1 and nvidia 387.34-20 installed. Install these packages with:

sudo pacman -Syu
sudo pacman -S nvidia
sudo pacman -S cuda

CUDA doesn’t work with gcc 7, and the latest gcc is 7.2.1. Luckily for us, the cuda package comes with the gcc6 package. CUDA installs itself in /opt/cuda and its binaries in /opt/cuda/bin. You can see that it symlinks gcc and g++ to the ones that come with the gcc6 package:

$ ls -l /opt/cuda/bin
total 66260
-rwxr-xr-x 1 root root    66416 Dec 13 08:09 bin2c
lrwxrwxrwx 1 root root        4 Dec 13 08:09 computeprof -> nvvp
drwxr-xr-x 2 root root     4096 Dec 15 13:06 crt
-rwxr-xr-x 1 root root  4555144 Dec 13 08:09 cudafe
-rwxr-xr-x 1 root root  4139256 Dec 13 08:09 cudafe++
-rwxr-xr-x 1 root root  8878568 Dec 13 08:09 cuda-gdb
-rwxr-xr-x 1 root root   577848 Dec 13 08:09 cuda-gdbserver
-rwxr-xr-x 1 root root      781 Dec 13 08:10 cuda-install-samples-9.1.sh
-rwxr-xr-x 1 root root   286872 Dec 13 08:09 cuda-memcheck
-rwxr-xr-x 1 root root   298824 Dec 13 08:09 cuobjdump
-rwxr-xr-x 1 root root   130704 Dec 13 08:09 fatbinary
lrwxrwxrwx 1 root root       14 Dec 13 08:10 g++ -> /usr/bin/g++-6
lrwxrwxrwx 1 root root       14 Dec 13 08:10 gcc -> /usr/bin/gcc-6
-rwxr-xr-x 1 root root  1150688 Dec 13 08:09 gpu-library-advisor
-rwxr-xr-x 1 root root      219 Dec 13 08:09 nsight
-rwxr-xr-x 1 root root     1533 Dec 13 08:09 nsight_ee_plugins_manage.sh
-rwxr-xr-x 1 root root   190880 Dec 13 08:09 nvcc
-rw-r--r-- 1 root root      393 Dec 13 08:09 nvcc.profile
-rwxr-xr-x 1 root root 18919536 Dec 13 08:09 nvdisasm
-rwxr-xr-x 1 root root  8416976 Dec 13 08:09 nvlink
-rwxr-xr-x 1 root root 11792784 Dec 13 08:09 nvprof
-rwxr-xr-x 1 root root    86296 Dec 13 08:09 nvprune
-rwxr-xr-x 1 root root      215 Dec 13 08:09 nvvp
-rwxr-xr-x 1 root root  8293440 Dec 13 08:09 ptxas
-rwxr-xr-x 1 root root     7686 Dec 13 08:09 uninstall_cuda_toolkit_9.1.pl

We will use these binaries when compiling OpenCV. The package also installs an Eclipse-based CUDA IDE called Nsight, which is super cool. Although I hate Eclipse, I really liked Nsight.
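If you want to confirm the gcc and g++ symlinks without reading the whole listing, a small helper like the one below can print just those two entries. This is only a sketch: show_cuda_host_compilers is a made-up name, and the directory argument is an assumption that defaults to the /opt/cuda/bin location described above.

```shell
# Print where the gcc and g++ entries in the CUDA bin directory point.
# show_cuda_host_compilers is a hypothetical helper, not part of CUDA.
# The directory defaults to /opt/cuda/bin, the location used in this post.
show_cuda_host_compilers() {
    dir="${1:-/opt/cuda/bin}"
    for tool in gcc g++; do
        if [ -L "$dir/$tool" ]; then
            # readlink prints the target the symlink points to
            printf '%s -> %s\n' "$tool" "$(readlink "$dir/$tool")"
        else
            printf '%s: no symlink found in %s\n' "$tool" "$dir"
        fi
    done
}
```

Calling show_cuda_host_compilers with no argument checks /opt/cuda/bin; on a setup like mine both entries should point at the gcc6 binaries.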

Installing TBB

Intel Threading Building Blocks (TBB) is a threading library designed by Intel (duh). It’s super easy to work with once you get past a somewhat rough learning curve, and it has all sorts of thread-safe data structures like bounded concurrent queues. I implemented my own thread-safe queue before installing this library, but it didn’t work as expected; with TBB’s queues it’s super easy to store cv::Mat’s.

Compiling OpenCV with TBB won’t affect performance as much as CUDA does, but it’s still better than nothing.

Another point worth mentioning: the current TBB package has problems with gcc-libs. It’s compiled against gcc-libs 7.x, while we are building with gcc 6, so you will get this error if you try to build OpenCV with the newest TBB:

/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../lib/libtbb.so.2: undefined reference to `std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11'
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../lib/libtbb.so.2: undefined reference to `__cxa_init_primary_exception@CXXABI_1.3.11'
collect2: error: ld returned 1 exit status

This is also posted here. The workaround is to install TBB 2017_20170412-1 from the Arch Linux Archive. To do this, first install the newest TBB and then downgrade it:

sudo pacman -S intel-tbb
sudo pacman -U https://archive.archlinux.org/packages/i/intel-tbb/intel-tbb-2017_20170412-1-x86_64.pkg.tar.xz

If you get this error while installing:

loading packages...
error: '/var/cache/pacman/pkg/intel-tbb-2017_20170412-1-x86_64.pkg.tar.xz': package missing required signature

Then edit your /etc/pacman.conf with your favorite editor and change the SigLevel line so that it looks like this:

# By default, pacman accepts packages signed by keys that its local keyring
# trusts (see pacman-key and its man page), as well as unsigned packages.
#SigLevel = Required DatabaseOptional
SigLevel = Never

That is, comment out the existing SigLevel line and add the fourth line. This temporarily disables package signature checking, so revert your changes after installing.
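If you would rather not edit the file by hand, the change and the revert can be sketched as two small shell functions. The function names and the .bak suffix are my own choices for illustration. Note that this version replaces the active SigLevel line instead of commenting it out, and relies on the backup to bring the original back.

```shell
# Temporarily switch the active SigLevel setting to "Never", keeping a backup.
# disable_sig_check / restore_sig_check and the .bak suffix are hypothetical
# names; pass /etc/pacman.conf and run as root to apply this to the real file.
disable_sig_check() {
    conf_file="$1"
    cp "$conf_file" "$conf_file.bak"
    # Rewrite every uncommented SigLevel line; commented lines are left alone.
    sed 's/^SigLevel[[:space:]]*=.*/SigLevel = Never/' "$conf_file.bak" > "$conf_file"
}

# Put the original file back once the package is installed.
restore_sig_check() {
    conf_file="$1"
    mv "$conf_file.bak" "$conf_file"
}
```

The idea is to call disable_sig_check /etc/pacman.conf, run the pacman -U command, and then call restore_sig_check /etc/pacman.conf right away so signature checking is back on.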

Keep in mind that after building OpenCV you can install the latest TBB again without any errors. I’m using intel-tbb 2018_20171205-1 and it works just fine. The newer TBB only causes compile-time errors while building OpenCV.

Install OpenCV Dependencies

The opencv package in the official repository has some dependencies. You can see them here. We will install all of them with this command:

sudo pacman -S openexr xine-lib libdc1394 gtkglext cblas lapack libgphoto2 hdf5 python-numpy python2-numpy cmake eigen lapacke mesa

Notice that we didn’t install intel-tbb here, although it is on the dependency list.

Building OpenCV 3.4.0

Now comes the good part. We will first download opencv and opencv_contrib from GitHub. From the OpenCV Releases page and the opencv_contrib Releases page, grab the Source Code zips and extract them into your ~/Downloads folder.

One thing to do before building is to check which CUDA architecture your GPU supports. If you don’t specify it, OpenCV will compile against all CUDA architectures (there are 6 of them), which increases the build time 6x: instead of ~20 minutes, the build will run for ~2 hours (6 * 20 minutes = 2 hours, math checks out).

To do this, go to the NVIDIA CUDA GPUs page and look for your GPU there. For example, I have a GeForce 840M and the page says it has compute capability 5.0. The codename for 5.0 is Maxwell. You can find more info here. Here’s the table of codenames:

Compute capability	Codename
1.x	Tesla
2.x	Fermi
3.x	Kepler
5.x	Maxwell
6.x	Pascal
7.x	Volta

Note your codename from this table; we will use it later.
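The lookup in the table above can also be written as a tiny shell function, which is handy if you script the build. This is just a sketch that mirrors the table in this post; cuda_codename is a made-up helper name and not something CUDA or OpenCV provides.

```shell
# Map a compute capability such as "5.0" to the codename from the table above.
# cuda_codename is a hypothetical helper; it only knows the six rows listed
# in this post and looks at the major version (the part before the dot).
cuda_codename() {
    case "${1%%.*}" in
        1) echo Tesla ;;
        2) echo Fermi ;;
        3) echo Kepler ;;
        5) echo Maxwell ;;
        6) echo Pascal ;;
        7) echo Volta ;;
        *) echo "unknown compute capability: $1" >&2; return 1 ;;
    esac
}
```

For my GeForce 840M, cuda_codename 5.0 gives Maxwell, which is the value I pass to cmake later.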

Now make sure you extracted both archives into the same folder; I will assume they are in ~/Downloads. There should be two folders there, named opencv-3.4.0/ and opencv_contrib-3.4.0/.

cd ~/Downloads/opencv-3.4.0/
mkdir build
cd build
export CXXFLAGS="-std=c++11"
export CXX=/opt/cuda/bin/g++
export CC=/opt/cuda/bin/gcc
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local -DOPENCV_EXTRA_MODULES_PATH=../../opencv_contrib-3.4.0/modules -D WITH_CUDA=ON -D WITH_CUBLAS=ON -D WITH_TBB=ON -D WITH_V4L=ON -D WITH_QT=OFF -D WITH_OPENGL=ON -D BUILD_PERF_TESTS=OFF -D BUILD_TESTS=OFF -DCUDA_NVCC_FLAGS="-D_FORCE_INLINES -std=c++11 --expt-relaxed-constexpr" -D BUILD_opencv_java=OFF -DCUDA_GENERATION=Maxwell -DBUILD_opencv_python=OFF -DBUILD_opencv_python2=OFF -DWITH_OPENMP=ON -DBUILD_DOCS=OFF ..

The export lines tell cmake to use gcc 6.x. Remember, CUDA doesn’t support gcc 7.x, so we need these lines. We will also compile in C++11 mode.

That huge cmake line sets up the build files (makefiles). Here is what some of the options do:

-D CMAKE_BUILD_TYPE=Release: makes a release build, which gets rid of all debug features.
-D CMAKE_INSTALL_PREFIX=/usr/local: installs OpenCV in /usr/local.
-DOPENCV_EXTRA_MODULES_PATH=../../opencv_contrib-3.4.0/modules: needed for extra modules like bgsegm (background segmentation) and xobjdetect (extended object detection). You can see the full list in the OpenCV documentation.
-D WITH_CUDA=ON: enables CUDA (duh).
-D WITH_TBB=ON: enables Intel TBB support.
-D WITH_QT=OFF: disables Qt support. With Qt you get some buttons in the cv::imshow window, but it doesn’t play well with NVIDIA Optimus (if you have two GPUs and use, for example, bumblebee for Optimus support on Arch).
-D CUDA_NVCC_FLAGS="-D_FORCE_INLINES -std=c++11 --expt-relaxed-constexpr": this is needed, or else nvcc gives errors about constexpr. nvcc itself recommends setting that last flag.
-D BUILD_opencv_java=OFF: disables Java, because I don’t use it.
-D CUDA_GENERATION=Maxwell: this one is important. Change it to the codename of your GPU’s CUDA architecture; it will decrease your build time 6x.
-DBUILD_opencv_python=OFF: I included this because I only needed the C++ files. If you use OpenCV with Python, remove this and the other Python setting.

The other settings are not that important; you can leave them as is.
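Once cmake has run, one way to double-check that the options were picked up is to read them back from the CMakeCache.txt file that cmake writes into the build directory. Below is a minimal sketch; cache_value is a made-up helper name, and it assumes the usual KEY:TYPE=VALUE layout of cache entries.

```shell
# Print the cached value of a CMake option (e.g. WITH_CUDA) from a cache file.
# cache_value is a hypothetical helper. The cache path defaults to
# ./CMakeCache.txt, so run it from inside the build directory.
cache_value() {
    key="$1"
    cache="${2:-CMakeCache.txt}"
    # Cache entries look like KEY:TYPE=VALUE; print only the VALUE part.
    sed -n "s/^$key:[A-Z]*=//p" "$cache"
}
```

From the build directory, cache_value WITH_CUDA and cache_value CUDA_GENERATION should echo back the values given on the cmake line. If one of them prints nothing, that option never made it into the cache.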

Note: If the build fails with an identifier "__builtin_ia32_mwaitx" is undefined error, you can add -D_MWAITXINTRIN_H_INCLUDED -D__STRICT_ANSI__ right after -D_FORCE_INLINES in the cmake line.

After CMake finishes generating the build files, issue these commands:

make -j8
sudo make install

make -j8 builds OpenCV using 8 parallel jobs. You can change that number, but anything between 4 and 16 is fine. The last command installs OpenCV under /usr/local; after the installation (if it finished successfully) you should see two directories in /usr/local/include, named opencv and opencv2.
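If you don’t want to guess the job count or check the directories by hand, both steps can be sketched in shell. make_jobs and check_opencv_headers are made-up helper names, and the fallback of 4 jobs is an arbitrary choice of mine for systems without nproc.

```shell
# Pick a job count for make from the number of available cores.
# make_jobs is a hypothetical helper; it falls back to 4 (an arbitrary
# value) when the nproc command is not available.
make_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 4
    fi
}

# Report whether the opencv and opencv2 header directories exist under the
# given install prefix (defaults to /usr/local, the prefix used in this post).
# Returns non-zero if either directory is missing.
check_opencv_headers() {
    prefix="${1:-/usr/local}"
    status=0
    for d in opencv opencv2; do
        if [ -d "$prefix/include/$d" ]; then
            echo "found $prefix/include/$d"
        else
            echo "missing $prefix/include/$d"
            status=1
        fi
    done
    return "$status"
}
```

With these, the build step becomes make -j"$(make_jobs)", and running check_opencv_headers after sudo make install tells you whether both header directories are in place.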

That’s all folks.