CUDA::Minimal and Error Handling
Over the last few days I've been introducing CUDA::Minimal, my CUDA bindings for Perl, which I've put on GitHub. CUDA is a framework for writing and running massively parallel code on the highly parallel computing architecture that is your video card (assuming your card is capable of CUDA in the first place). Today I am going to discuss error handling in CUDA.
Error handling is a boring topic, but it's important, so I'm going to motivate it a bit. Consider this statement from version 4.0 of the CUDA C Best Practices Guide (which you can find here):
Code samples throughout the guide omit error checking for conciseness. Production code should, however, systematically check the error code returned by each API call...
In CUDA::Minimal, you don't have to sacrifice conciseness for error-checking...
Example: Unable to Allocate Memory
For example, suppose you try to allocate memory on your device and you run into trouble. This error is hard to make in Perl because it allows using underscores in numbers like 10_000, but let's assume that you accidentally allocate far more memory than you meant to allocate. Here's a fully working script that should produce the error (and only the error), at least on current hardware:
    use strict;
    use warnings;
    use CUDA::Minimal;

    # Oops: 10e12 floats is roughly 40 terabytes of memory
    my $input_dev_ptr = Malloc( Sizeof f => 10e12);

    print "I've escaped the error!\n";
Although it may seem a little contrived, this will croak with an informative message:
Unable to allocate 4294967295 bytes on the device: out of memory at cuda-error.pl line 6
Malloc can croak for a handful of reasons, but this error comes to Malloc from CUDA itself: we're asking for too much memory. The system croaks with this message. Contrast that with the CUDA-C code, which would chug merrily along unless you checked the error condition.
The simple way to handle these sorts of errors is to use eval blocks and capture errors when you know how to respond.
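To make the eval idea concrete, here is a minimal sketch that runs without a GPU. It does not call CUDA::Minimal at all: fake_malloc is a hypothetical stand-in that croaks the way Malloc does when the request is too large, and the 1 GiB limit and allocate_with_fallback are assumptions made up for the illustration. In real code you would put the actual Malloc call inside the eval block.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for CUDA::Minimal's Malloc so the sketch runs
# without a GPU: it croaks when asked for more than a pretend 1 GiB device.
sub fake_malloc {
    my ($bytes) = @_;
    croak "Unable to allocate $bytes bytes on the device: out of memory"
        if $bytes > 2**30;
    return { bytes => $bytes };    # pretend device pointer
}

# Try the requested size first; fall back to a smaller size on failure.
sub allocate_with_fallback {
    my ($wanted, $fallback) = @_;

    # eval traps the croak and returns undef, leaving the message in $@.
    my $ptr = eval { fake_malloc($wanted) };
    return $ptr if defined $ptr;

    # Only handle the error we know how to respond to; rethrow the rest.
    die $@ unless $@ =~ /out of memory/;
    warn "Falling back to $fallback bytes: $@";
    return fake_malloc($fallback);
}

my $ptr = allocate_with_fallback(4 * 10e12, 4 * 1_000);
print "Allocated $ptr->{bytes} bytes\n";
```

The important design choice is rethrowing anything you don't recognize, so the eval block only swallows errors you actually know how to recover from.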
Thread Launch Problems
An interesting feature of CUDA is that kernel launches are non-blocking. When you launch a kernel, as demonstrated in the opening example, control returns to the CPU immediately after the kernel starts running. This is handy because it allows you to do other things on the CPU, such as logging, while the calculations run on the GPU.
However, non-blocking kernel launches cannot directly report run-time errors in your kernel, such as a segmentation fault. To make matters even more confusing, kernel launch failures will trip errors when you call other functions, such as memory allocations or transfers, because the system remains in an error state until you clear it. So if you get an "Unspecified launch failure," scan backward from that point in the code to find the offending kernel launch.
To be sure you've found the correct problematic kernel launch, place the following immediately after your kernel launch:
    ThreadSynchronize;
    croak("Found error") if ThereAreCudaErrors;
At the moment, there are four error-related functions you should know about:
ThereAreCudaErrors: Returns true when an error exists and false otherwise.
PeekAtLastError: Returns the string describing the last error, or 'no errors'.
GetLastError: Like PeekAtLastError, but also resets the error status.
DeviceReset: Resets the device so that future kernel launches do not fail from a previous "Unspecified launch failure".
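The difference between peeking and getting is easiest to see in a toy model. The sketch below is a pure-Perl mock, not CUDA::Minimal itself: every mock_ function and the 'example error' string are hypothetical, and the mock only mirrors the descriptions in the list above. It also ignores the case where an error sticks until you call DeviceReset.

```perl
use strict;
use warnings;

# Hypothetical mock of the device error state, for illustration only.
my $last_error = 'no errors';

# Pretend a kernel launch failed and left the system in an error state.
sub mock_failed_launch { $last_error = 'example error' }

# Mirrors ThereAreCudaErrors: true while an error is recorded.
sub mock_there_are_errors { return $last_error ne 'no errors' }

# Mirrors PeekAtLastError: report the error without clearing it.
sub mock_peek_at_last_error { return $last_error }

# Mirrors GetLastError: report the error and reset the status.
sub mock_get_last_error {
    my $message = $last_error;
    $last_error = 'no errors';
    return $message;
}

mock_failed_launch();
print "peek: ", mock_peek_at_last_error(), "\n";
print "error after peek? ", (mock_there_are_errors() ? 'yes' : 'no'), "\n";
print "get:  ", mock_get_last_error(), "\n";
print "error after get?  ", (mock_there_are_errors() ? 'yes' : 'no'), "\n";
```

In this model, peeking leaves the error in place so later calls would still trip over it, while getting clears it. That is why a check placed right after a kernel launch tells you which launch was responsible.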