Cross-posted from http://openterrain.tumblr.com/post/109330474336/resolved-gdal-on-aws-gpu-instances
Specifically, g2.2xlarge instances with Nvidia GRID K520 GPUs running Amazon Linux.
As we kicked off the new Knight News Grant, it was clear early on that we were going to be processing quite a lot of raster data. Given that, I wanted to ensure that we'd be able to benefit from GDAL's OpenCL-accelerated warping (for reprojection and scaling).
Ubuntu is typically my weapon of choice for these sorts of things. However, I wanted to minimize hardware-related compatibility problems, and Amazon publishes an Amazon Linux AMI with NVIDIA GRID GPU Driver on the AWS Marketplace.
Straightforward, right? Sadly, no.
It seemed thoroughly unlikely that GDAL from yum would include OpenCL support (rightly so), so I went about compiling GDAL from source, omitting everything I didn't care about (basically everything except GeoTIFF, zlib, curl, and OpenCL support):
# enable EPEL (for proj-devel)
sudo yum-config-manager --enable epel
sudo yum -y update
sudo yum -y install make automake gcc gcc-c++ libcurl-devel proj-devel
cd /tmp
curl -L http://download.osgeo.org/gdal/1.11.1/gdal-1.11.1.tar.gz | tar zxf -
cd gdal-1.11.1
./configure --prefix=/opt/local \
--with-threads \
--with-ogr \
--with-geos \
--without-libtool \
--with-libz=internal \
--with-libtiff=internal \
--with-geotiff=internal \
--without-gif \
--without-pg \
--without-grass \
--without-libgrass \
--without-cfitsio \
--without-pcraster \
--without-netcdf \
--without-png \
--without-jpeg \
--without-ogdi \
--without-fme \
--without-hdf4 \
--without-hdf5 \
--without-jasper \
--without-ecw \
--without-kakadu \
--without-mrsid \
--without-jp2mrsid \
--without-bsb \
--without-grib \
--without-mysql \
--without-ingres \
--without-xerces \
--without-expat \
--without-odbc \
--without-sqlite3 \
--without-dwgdirect \
--without-idb \
--without-sde \
--without-perl \
--without-php \
--without-ruby \
--without-python \
--with-hide-internal-symbols \
--with-opencl \
--with-opencl-include=/opt/nvidia/cuda/include
sudo make install
# allow GDAL to find necessary libraries
export LD_LIBRARY_PATH=/opt/local/lib:$LD_LIBRARY_PATH
export PATH=/opt/local/bin:$PATH
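Before going further, it can be worth confirming that the shell is picking up the freshly built binaries rather than some other copy. The following is only a minimal sketch, assuming the /opt/local prefix used above; tool_status is a hypothetical helper name, not something GDAL provides:

```shell
#!/bin/sh
# Prefix passed to ./configure above; override if you installed elsewhere.
PREFIX=${PREFIX:-/opt/local}

# tool_status is a hypothetical helper: report where a tool resolves on PATH
# and whether that location is under the given prefix.
tool_status() {
  tool_path=$(command -v "$1" 2>/dev/null || true)
  case "$tool_path" in
    "$2"/bin/*) echo "$1: ok ($tool_path)" ;;
    "")         echo "$1: not on PATH" ;;
    *)          echo "$1: found at $tool_path, not under $2" ;;
  esac
}

for tool in gdalinfo gdalwarp; do
  tool_status "$tool" "$PREFIX"
done
```

If both tools resolve under the prefix, gdalinfo --version should then report the 1.11.1 release.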
My first discovery was that GDAL wouldn't even try to use the GPU, even when explicitly requested to (using -wo "USE_OPENCL=TRUE"). Running as root solved the problem, allowing subsequent non-root invocations (without explicitly requesting OpenCL) to also work. Running watch nvidia-smi is a good way to see whether tasks are being offloaded to the GPU.
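For reference, a warp that explicitly asks for OpenCL might look like the sketch below. The file names, target projection, and resampling method are placeholders chosen for illustration, and warp_opencl is a hypothetical wrapper; only the -wo "USE_OPENCL=TRUE" option comes from the notes above.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own rasters.
SRC=${SRC:-input.tif}
DST=${DST:-output-3857.tif}

# warp_opencl is a hypothetical wrapper around gdalwarp.
# -wo passes a warp option; -t_srs sets the target SRS; -r sets the resampling.
warp_opencl() {
  gdalwarp -wo "USE_OPENCL=TRUE" -t_srs EPSG:3857 -r bilinear "$1" "$2"
}

# Only attempt the warp when gdalwarp and the source file are both present.
if command -v gdalwarp >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # In a second terminal, `watch -n 1 nvidia-smi` shows GPU activity meanwhile.
  warp_opencl "$SRC" "$DST"
else
  echo "gdalwarp or $SRC not found; skipping" >&2
fi
```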
Once I got it to start using the GPU, I immediately ran into a problem:
ERROR 1: Error: Failed to build program executable!
Build Log:
:55:20: error: cannot decrement value of type 'float
__attribute__((address_space(1)))'
dstPtr[iDstOffset] --;
~~~~~~~~~~~~~~~~~~ ^
ERROR 1: Error at file gdalwarpkernel_opencl.c line 2325:
CL_BUILD_PROGRAM_FAILURE
ERROR 1: OpenCL routines reported failure (-11) on line 3250.
Fortunately, this turned out to be a quick fix for Even Rouault (thanks!!), though it did reinforce my fear of GPU compatibility headaches (<thing>-- vs. <thing> = <thing> - 1):
Index: alg/gdalwarpkernel_opencl.c
===================================================================
--- alg/gdalwarpkernel_opencl.c (révision 28173)
+++ alg/gdalwarpkernel_opencl.c (copie de travail)
@@ -593,7 +593,7 @@
"if (dstPtr[iDstOffset] == dstMinVal)\n"
"dstPtr[iDstOffset] = dstMinVal + 1;\n"
"else\n"
- "dstPtr[iDstOffset] --;\n"
+ "dstPtr[iDstOffset] = dstPtr[iDstOffset] - 1;\n"
"}\n"
"}\n"
"#endif\n"
Once patched, it works like a dream, happily churning through jobs at an impressive clip.
This fix will be included in GDAL-1.11.2.
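For anyone still building 1.11.1, the one-line change in the diff above can be made before running make. This is a rough sketch rather than the official route: patch_opencl_kernel is a hypothetical helper, the source path assumes the /tmp/gdal-1.11.1 tree from the build steps, and saving the diff to a file and applying it with patch should work just as well.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the post-decrement in the embedded OpenCL
# kernel source, keeping a backup copy with an .orig suffix.
patch_opencl_kernel() {
  sed -i.orig \
    's/dstPtr\[iDstOffset\] --;/dstPtr[iDstOffset] = dstPtr[iDstOffset] - 1;/' \
    "$1"
}

# Assumes the 1.11.1 source tree from the build steps above.
SRC_FILE=${SRC_FILE:-/tmp/gdal-1.11.1/alg/gdalwarpkernel_opencl.c}
if [ -f "$SRC_FILE" ]; then
  patch_opencl_kernel "$SRC_FILE"
  # Show the rewritten line so the change can be checked by eye.
  grep -n 'dstPtr\[iDstOffset\] = dstPtr\[iDstOffset\] - 1;' "$SRC_FILE"
fi
```

Run it after unpacking the tarball and before make install, since the kernel source is compiled into the library as a string.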
The requirement to run GDAL as root to initialize the GPU turned out to be a configuration issue in the Amazon AMI. Running modprobe nvidia_uvm to load the kernel driver for the GPU on boot solved the problem, but only after adding a udev rule (/etc/udev/rules.d/99-nvidia.rules) to create the necessary device nodes:
# /etc/udev/rules.d/99-nvidia.rules
KERNEL=="nvidia_uvm", RUN+="/bin/sh -c '/bin/mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \  -f 1) 0'"
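To check that the rule did its job after a reboot, something along these lines may help. It is only a sketch: device_major and node_usable are hypothetical helpers, and it assumes the /dev/nvidia-uvm path and nvidia-uvm driver name used in the rule above.

```shell
#!/bin/sh
# Hypothetical helpers for checking the result of the udev rule above.

# Print the major number registered for a driver name in a
# /proc/devices-style file (lines look like "<major> <name>").
device_major() {
  awk -v name="$1" '$2 == name { print $1; exit }' "${2:-/proc/devices}"
}

# Succeed if the path is a character device that the current user
# can both read and write.
node_usable() {
  [ -c "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

major=$(device_major nvidia-uvm)
if [ -z "$major" ]; then
  echo "nvidia-uvm is not registered; has modprobe nvidia_uvm run?"
elif node_usable /dev/nvidia-uvm; then
  echo "/dev/nvidia-uvm is usable (major $major)"
else
  echo "/dev/nvidia-uvm is missing or not accessible (major $major)"
fi
```

Run it as the non-root user that will invoke GDAL, since the point of the 666 mode is that ordinary users can open the device.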