CUDA runtime error (59) : device-side assert triggered

Question

I have access to a Tesla K20c and am running ResNet50 on the CIFAR10 dataset... Then I get the following error:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu line=265 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 109, in <module>
    train(loader_train, model, criterion, optimizer)
  File "main.py", line 54, in train
    optimizer.step()
  File "/usr/local/anaconda35/lib/python3.6/site-packages/torch/optim/sgd.py", line 93, in step
    d_p.add_(weight_decay, p.data)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:265
How can I resolve this error?

Try running your script with CUDA_LAUNCH_BLOCKING=1 python your_script.py to get a more accurate stack trace. — McLawrence
After running with CUDA_LAUNC...=1, I get the following error:
/opt/conda/.../THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
This appears around 20 times, and then the traceback follows:
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:116
How can I resolve this? — saichand
This is an error with your target labels: t >= 0 && t < n_classes. Print your labels and make sure that they are non-negative and smaller than the number of outputs of your last layer. — McLawrence
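One way to act on this advice is to validate each batch of labels on the CPU, before it is moved to the GPU, so that a bad label raises an ordinary Python error instead of a device-side assert. The helper below is a hypothetical sketch (`check_targets` is not a PyTorch function); it accepts a list of ints or a 1-D integer tensor, because iterating a tensor yields 0-d tensors that `int()` converts to Python ints.

```python
def check_targets(targets, n_classes):
    """Raise ValueError if any label lies outside [0, n_classes - 1].

    Hypothetical helper: `targets` may be a list of ints or a 1-D integer
    CPU tensor (iterating a tensor yields 0-d tensors that int() accepts).
    """
    bad = sorted({int(t) for t in targets if not 0 <= int(t) < n_classes})
    if bad:
        raise ValueError(
            f"labels {bad} are outside the valid range [0, {n_classes - 1}]"
        )


# Ten classes, as for CIFAR10: every label must be one of 0..9.
check_targets([0, 3, 9], n_classes=10)  # passes silently
```

In a training loop it would be called on each batch of targets right after it comes out of the loader and before the batch is sent to the GPU.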
So n_classes should be the same as the number of outputs of the last layer, is that right? — saichand
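To see where n_classes comes from, here is a pure-Python sketch of a simplified negative log-likelihood loss (no class weights, no ignore_index). `nll_loss` is an illustrative function, not the PyTorch implementation: n_classes is the width of each row of log-probabilities, i.e. the number of outputs of the last layer, and each target is used as a column index into its row.

```python
import math


def nll_loss(log_probs, targets):
    """Mean negative log-likelihood over a batch (simplified illustration).

    log_probs: one row per sample, one column per class, i.e. the outputs of
               the last layer after log_softmax; n_classes is the row width.
    targets:   one integer class index per sample.
    """
    n_classes = len(log_probs[0])
    total = 0.0
    for row, t in zip(log_probs, targets):
        # The same condition that the CUDA kernel asserts on.
        assert 0 <= t < n_classes, f"target {t} out of range for {n_classes} classes"
        total -= row[t]
    return total / len(targets)


# Two classes with probability 0.5 each: the loss is log(2) for either target.
uniform = [[math.log(0.5), math.log(0.5)]]
print(nll_loss(uniform, [1]))
```

Without the explicit check, a target of -1 would silently select the last column in Python; on the GPU an out-of-range target would index outside the row, which is why the kernel checks the range and asserts.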

McLawrence · Accepted Answer · 2018-08-06T06:28:34

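In general, when you encounter CUDA runtime errors, it is advisable to run your program again with the environment variable CUDA_LAUNCH_BLOCKING=1 set to obtain an accurate stack trace. CUDA kernels are launched asynchronously, so without it the error is often reported by a later, unrelated call (here optimizer.step()) instead of the operation that actually failed.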
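When the script is started from a notebook or an IDE, where prefixing the command line is awkward, the same variable can be set from Python instead. A minimal sketch, assuming these lines run before anything initialises CUDA; CUDA reads the variable when it is initialised, so the safest place is the very top of main.py, above `import torch`.

```python
import os

# Make CUDA kernel launches synchronous, so a failing kernel is reported by the
# Python line that launched it rather than by a later, unrelated call.
# This only takes effect if set before CUDA is initialised (keep it above `import torch`).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Synchronous launches slow training down, so the setting is best removed once the bug has been found.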

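In your specific case, the targets of your data were too high (or too low) for the specified number of classes: every target must be an integer from 0 to n_classes - 1, where n_classes is the number of outputs of your last layer.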
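If the dataset stores its labels in some other scheme, for example starting at 1 instead of 0, one possible fix (an assumption about the cause, not something confirmed in this thread) is to map the raw labels onto contiguous indices once, when the data is loaded. `make_label_map` is a hypothetical helper:

```python
def make_label_map(raw_labels):
    """Map each distinct raw label to a contiguous class index 0..K-1.

    K = len(mapping) is then the number of outputs the last layer needs.
    """
    return {raw: idx for idx, raw in enumerate(sorted(set(raw_labels)))}


raw = [1, 5, 10, 5, 1]  # raw labels that do not start at 0
mapping = make_label_map(raw)
targets = [mapping[r] for r in raw]
print(mapping, targets)
```

Build the map from the labels of the whole dataset rather than from a single batch; otherwise a class missing from one batch would shift the indices of all the classes after it.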
