Cross-posted from http://openterrain.tumblr.com/post/109330474336/resolved-gdal-on-aws-gpu-instances
g2.2xlarge instances with Nvidia GRID K520 GPUs running Amazon Linux.
As we kicked off the new Knight News Grant, it was clear early on that we were going to be processing quite a lot of raster data. Given that, I wanted to ensure that we'd be able to benefit from GDAL's OpenCL-accelerated warping (for reprojection and scaling).
Ubuntu is typically my weapon of choice for these sorts of things, however I wanted to minimize hardware-related compatibility problems, and Amazon publishes an Amazon Linux AMI with NVIDIA GRID GPU Driver on the AWS Marketplace.
Straightforward, right? Sadly, no.
It seemed thoroughly unlikely that GDAL from yum would include OpenCL support (rightly so), so I went about compiling GDAL from source, omitting everything I didn't care about (basically everything except GeoTIFF, zlib, curl, and OpenCL support):
# enable EPEL (for proj-devel) sudo yum-config-manager --enable epel sudo yum -y update sudo yum -y install make automake gcc gcc-c++ libcurl-devel proj-devel cd /tmp curl -L http://download.osgeo.org/gdal/1.11.1/gdal-1.11.1.tar.gz | tar zxf - cd gdal-1.11.1 ./configure --prefix=/opt/local \ --with-threads \ --with-ogr \ --with-geos \ --without-libtool \ --with-libz=internal \ --with-libtiff=internal \ --with-geotiff=internal \ --without-gif \ --without-pg \ --without-grass \ --without-libgrass \ --without-cfitsio \ --without-pcraster \ --without-netcdf \ --without-png \ --without-jpeg \ --without-gif \ --without-ogdi \ --without-fme \ --without-hdf4 \ --without-hdf5 \ --without-jasper \ --without-ecw \ --without-kakadu \ --without-mrsid \ --without-jp2mrsid \ --without-bsb \ --without-grib \ --without-mysql \ --without-ingres \ --without-xerces \ --without-expat \ --without-odbc \ --without-sqlite3 \ --without-dwgdirect \ --without-idb \ --without-sde \ --without-perl \ --without-php \ --without-ruby \ --without-python \ --with-hide-internal-symbols \ --with-opencl \ --with-opencl-include=/opt/nvidia/cuda/include sudo make install # allow GDAL to find necessary libraries export LD_LIBRARY_PATH=/opt/local/lib:$LD_LIBRARY_PATH export PATH=/opt/local/bin:$PATH
My first discovery was that GDAL wouldn't even try to use the GPU, even when
explicitly requested to (using
-wo "USE_OPENCL=TRUE"). Running as
solved the problem, allowing subsequent non-
root invocations (without
explicitly requesting OpenCL) to also work.
watch nvidia-smi is a good way to see whether tasks are being offloaded to
Once I got it to start using the GPU, I immediately ran into a problem:
ERROR 1: Error: Failed to build program executable! Build Log: :55:20: error: cannot decrement value of type 'float __attribute__((address_space(1)))' dstPtr[iDstOffset] --; ~~~~~~~~~~~~~~~~~~ ^ ERROR 1: Error at file gdalwarpkernel_opencl.c line 2325: CL_BUILD_PROGRAM_FAILURE ERROR 1: OpenCL routines reported failure (-11) on line 3250. ERROR 1: Error: Failed to build program executable! Build Log: :55:20: error: cannot decrement value of type 'float __attribute__((address_space(1)))' dstPtr[iDstOffset] --; ~~~~~~~~~~~~~~~~~~ ^ ERROR 1: Error at file gdalwarpkernel_opencl.c line 2325: CL_BUILD_PROGRAM_FAILURE ERROR 1: OpenCL routines reported failure (-11) on line 3250.
Fortunately, this turned out to be a quick fix for Even Roualt (thanks!!),
though it did reinforce my fear of GPU compatibility headaches (
<thing> = <thing> - 1):
Index: alg/gdalwarpkernel_opencl.c =================================================================== --- alg/gdalwarpkernel_opencl.c (révision 28173) +++ alg/gdalwarpkernel_opencl.c (copie de travail) @@ -593,7 +593,7 @@ "if (dstPtr[iDstOffset] == dstMinVal)\n" "dstPtr[iDstOffset] = dstMinVal + 1;\n" "else\n" - "dstPtr[iDstOffset] --;\n" + "dstPtr[iDstOffset] = dstPtr[iDstOffset] - 1;\n" "}\n" "}\n" "#endif\n"
Once patched, it works like a dream, happily churning through jobs at an impressive clip.
This fix will be included in GDAL-1.11.2.
The requirement to run GDAL as
root to initialize the GPU turned out to be
a configuration issue in the Amazon AMI. Running
modprobe nvidia_uvm to load
the kernel driver for the GPU on boot solved the problem, but only after adding
udev rule (
/etc/udev/rules.d/99-nvidia.rules) to create the necessary
# /etc/udev/rules.d/99-nvidia.rules KERNEL=="nvidia_uvm", RUN+="/bin/sh -c '/bin/mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \ -f 1) 0'"