When you launch a CUDA kernel that runs for more than two seconds on your GPU, you may notice a brief screen flicker followed by your CUDA kernel abruptly exiting. What happened? You may have tripped the Timeout Detection and Recovery (TDR) process, also known as the watchdog timer, in the Windows Display Driver Model (WDDM). This feature attempts to detect scenarios in which the computer appears to be unresponsive, and Windows tries to recover by reinitializing and resetting the display driver. Unfortunately, this process also terminates your kernel!
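A minimal way to reproduce the symptom, independent of any larger sample, is a kernel that deliberately spins past the watchdog threshold. The sketch below is our own illustration (the kernel name is hypothetical, and the cycle count assumes a GPU clock near 1.5 GHz); on a display GPU under WDDM, the synchronize call should report a timeout-related error rather than success:

```cuda
// Sketch: a deliberately long-running kernel that can trip the WDDM watchdog.
// clock64() counts SM clock cycles, so spinning for enough cycles exceeds the
// roughly 2-second TDR threshold on a GPU that is driving a display.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spinKernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // busy-wait: keeps the SM occupied well past the watchdog threshold
    }
}

int main() {
    // ~1.5 GHz * 3 s = 4.5e9 cycles: long enough to exceed a 2-second timeout.
    spinKernel<<<1, 1>>>(4500000000LL);

    // On a WDDM display GPU this typically reports a launch timeout/failure;
    // in TCC mode (or with a raised TdrDelay) it completes normally.
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel returned: %s\n", cudaGetErrorString(err));
    return 0;
}
```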
Symptoms of the GPU Display Timeout
To demonstrate the timeout, we will use the Monte Carlo option-pricing sample from NVIDIA.
You can download the complete code example, including Visual Studio projects, from NVIDIA; it is included in the CUDA Toolkit download as one of the CUDA samples.
In our test, we ran the Monte Carlo example with 8192 options on a single Titan X (Pascal) compiled in debug mode – release mode finishes too quickly to demonstrate the timeout. Without modifying any parameters, the executable completes in approximately 1 second. The output is shown below:
Figure 1 – The NVIDIA Monte Carlo Example Running under the TDR Threshold
Increasing the number of options increases the total run time. We increased the total number of options by a factor of 4, which results in a run time of approximately 3-4 seconds. This execution time exceeds the TDR limit defined in WDDM. The change to nOptions is shown below:
Figure 2 – Increasing the Number of Options in the NVIDIA Monte Carlo Example by a Factor of 4
The output of the new test run is shown below:
Figure 3 – Output when the TDR process is Tripped in WDDM
You will notice that a cudaErrorLaunchFailure (code 4) is reported after the timeout; in our case it is reported by cudaFree, because the NVIDIA reference code does not check for errors immediately after the kernel launch. The NVIDIA documentation suggests that the problematic kernel and all subsequent CUDA calls should instead fail with a cudaErrorLaunchTimeout (code 6).
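If the code checked for errors immediately after the launch, the failure would surface at the kernel that caused it rather than at a later, unrelated call such as cudaFree. A minimal sketch of that pattern (the kernel name and launch configuration below are hypothetical placeholders, not the actual Monte Carlo code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void monteCarloKernel() { /* long-running work here */ }

int main() {
    monteCarloKernel<<<256, 256>>>();

    // A launch-configuration problem is reported immediately by
    // cudaGetLastError()...
    cudaError_t err = cudaGetLastError();

    // ...while execution errors -- including a TDR reset, surfacing as
    // cudaErrorLaunchTimeout or cudaErrorLaunchFailure -- only become
    // observable once we synchronize with the device.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error %d: %s\n", (int)err, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Checking at this point pins the error to the offending launch, which makes a watchdog reset much easier to diagnose.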
In Windows 7, running a kernel longer than the 2 second timeout results in this system error message:
Figure 4 – Display Driver Stopped Responding in Windows 7
Avoiding the Timeout
You can avoid the timeout by reducing the runtime of your kernels to less than 2 seconds. Alternatively, there are two adjustments you can make to your system to prevent the TDR process from terminating long-running kernels:
- Switch the GPU to Tesla Compute Cluster (TCC) mode
- Modify the TDR registry settings
Solution #1: Tesla Compute Cluster (TCC) Mode
GPUs in TCC mode are not subject to the timeout restrictions that Windows applies to GPUs driving a display. TCC mode is only available on Tesla GPUs, most desktop Quadro GPUs, and most NVIDIA GeForce Titan GPUs. Laptop Quadro cards usually implement Optimus (a performance/power tradeoff feature), which makes them incompatible with TCC. Finally, other GeForce cards do not support TCC. If you are not able to use TCC mode, you will need to modify the registry, as described in the next section.
To enable TCC on a GPU, it cannot be driving the display – you will need a different GPU to display the graphics.
We will use the NVIDIA System Management Interface (nvidia-smi) command-line utility to switch modes. When we open a command prompt, navigate to the NVSMI directory, and run nvidia-smi.exe, we can see the current state of the GPU:
Figure 5 – Using nvidia-smi to Determine the State of the System
You can see that the Titan is set to WDDM mode. Using an administrative command prompt, we can use nvidia-smi to switch the mode from WDDM to TCC:

nvidia-smi -i 0 -dm 1

where the -i argument refers to the zero-based GPU index, and -dm refers to the driver model (0 for WDDM, 1 for TCC). This command will only work if it is run as an administrator and the GPU is not driving a display. The output of a successful configuration change is shown below:
Figure 6 – Output of nvidia-smi after Changing Driver Model Types
A reboot is required for the changes to take effect. After rebooting the machine, we run nvidia-smi.exe again and notice the updated output:
Figure 7 – Using nvidia-smi to Determine the Updated State of the System
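Rather than scanning the full nvidia-smi table, you can also query the driver model directly. The query fields below are documented nvidia-smi options (the driver model fields are reported on Windows only):

```shell
# Report the current and pending driver model for each GPU (Windows only).
# "pending" differs from "current" when a mode change awaits a reboot.
nvidia-smi --query-gpu=name,driver_model.current,driver_model.pending --format=csv
```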
TCC has now been configured properly. Running the application with the larger number of options passes successfully:
Figure 8 – Running the Monte Carlo Example with TCC Mode Enabled
If you would like to return to WDDM mode, enter the command:

nvidia-smi -i 0 -dm 0
Solution #2: TDR Registry Modifications
If TCC mode is not supported or your graphics card drives a display, you can extend or bypass the timeout by modifying the registry. As with all registry modifications, exercise caution. Navigate to the following registry entry:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
Now you will need to add a new value: right-click in the right-hand pane and select New > DWORD (32-bit) Value. Name the new value TdrDelay. Double-click the entry and set the desired timeout in seconds. For example, we have changed the delay to 8 seconds as shown below:
Figure 9 – Setting the TDR Delay Threshold to 8 Seconds in the Registry
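If you prefer the command line to the Registry Editor, the same value can be set with the built-in reg tool from an administrative command prompt. A sketch, assuming the same 8-second delay (adjust the /d value for your desired timeout):

```shell
# Set the TDR delay to 8 seconds (run from an administrative command prompt).
# /t REG_DWORD matches the value type the Registry Editor steps create.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 8 /f
```

As with the Registry Editor route, a reboot is required before the new delay takes effect.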
You will need to reboot the system for the new registry settings to take effect. Running the kernel with the longer delay results in a successful completion:
Figure 10 – Running the Monte Carlo Example with the TdrDelay Set to 8 Seconds
You can also disable TDR entirely by adding a TdrLevel value instead of TdrDelay. Set TdrLevel to 0 to disable the timeout; set it back to 3 (the default) to re-enable the timeout. Disabling the timeout entirely is not recommended, as a kernel that locks the display (for example, one containing an infinite loop) can leave the system unstable.
Additional information about the TDR registry keys can be found on the Microsoft website.
NVIDIA has added a Compute Preemption feature with the Pascal architecture. It allows the GPU to suspend currently executing blocks, yield their places on a streaming multiprocessor, and resume execution at a later point. NVIDIA may use this feature in the future to address the display timeout.
You can read more about compute preemption from NVIDIA.