I'm running into an error when I compile CUDA code with relocatable device code enabled (-rdc=true). I'm using Visual Studio 2013 as the host compiler with CUDA 7.5. Below is a small example that reproduces the error. To clarify, the code runs fine with -rdc=false; the error only shows up with -rdc=true.

The error simply says: CUDA error 11 [\cuda\detail\cub\device\dispatch/device_radix_sort_dispatch.cuh, 687]: invalid argument

Then I found this, which says:

When invoked with primitive data types, thrust::sort, thrust::sort_by_key, thrust::stable_sort, thrust::stable_sort_by_key may fail to link in some cases with nvcc -rdc=true.

Is there some workaround to allow separate compilation?

main.cpp:

#include <stdio.h>
#include <exception>   // std::exception, caught in main()
#include <vector>
#include "cuda_runtime.h"
#include "RadixSort.h"

typedef unsigned int uint;
typedef unsigned __int64 uint64;

int main()
{
   RadixSort sorter;

   uint n = 10;
   std::vector<uint64> test(n);
   for (uint i = 0; i < n; i++)
      test[i] = i + 1;

   uint64 * d_array;
   uint64 size = n * sizeof(uint64);

   cudaMalloc(&d_array, size);
   cudaMemcpy(d_array, test.data(), size, cudaMemcpyHostToDevice);

   try
   {
      sorter.Sort(d_array, n);
   }
   catch (const std::exception & ex)
   {
      printf("%s\n", ex.what());
   }

   cudaFree(d_array);   // release the device buffer
   return 0;
}

RadixSort.h:

#pragma once
typedef unsigned int uint;
typedef unsigned __int64 uint64;

class RadixSort
{
public:
   RadixSort() {}
   ~RadixSort() {}

   void Sort(uint64 * input, const uint n);
};

RadixSort.cu:

#include "RadixSort.h"

#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

void RadixSort::Sort(uint64 * input, const uint n)
{
    thrust::device_ptr<uint64> d_input = thrust::device_pointer_cast(input);
    thrust::stable_sort(d_input, d_input + n);
    cudaDeviceSynchronize();
}
Regarding "Is there some workaround to allow separate compilation?": which GPU are you running on? - Robert Crovella
Currently the GTX 760. - RobbinMarcus
Try compiling with the architecture set to match your GTX 760, which should be cc3.0 I believe. - Robert Crovella
Thanks, compiling with compute_30 and sm_30 was indeed the solution. Any idea as to why _20 is not supported? - RobbinMarcus
@Spectrallic: Could you please add a short answer describing your solution for the next person who comes along with the same problem? - talonmies

1 Answer

As mentioned in the comments by Robert Crovella:

Changing the CUDA architecture to match the GPU solves this problem. In my case (a GTX 760) I changed it to compute_30 and sm_30 under CUDA C++ -> Device -> Code Generation.

Edit:

The general recommendation is to select the architecture that best fits your specific GPU. See the link in comments for additional information.
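
To make "matching the architecture to the GPU" concrete, here is a minimal sketch of how a compute capability maps onto the two settings discussed here. The helper names gencodeFlag and vsCodeGeneration are hypothetical and not part of CUDA. The major/minor values are assumed to come from cudaGetDeviceProperties (the major and minor fields of cudaDeviceProp), and the Visual Studio string format is assumed to be the "compute_XX,sm_XX" pair entered in the Code Generation field.

```cpp
#include <string>

// Hypothetical helper: turns a compute capability such as 3.0 into "30".
// In a real program, major/minor would typically be read from
// cudaGetDeviceProperties() (cudaDeviceProp::major / cudaDeviceProp::minor).
static std::string ccSuffix(int major, int minor)
{
    return std::to_string(major) + std::to_string(minor);
}

// Hypothetical helper: builds the nvcc command-line option that targets
// exactly this compute capability.
std::string gencodeFlag(int major, int minor)
{
    const std::string cc = ccSuffix(major, minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

// Hypothetical helper: builds the value assumed for the Visual Studio
// "Code Generation" field (CUDA C++ -> Device -> Code Generation).
std::string vsCodeGeneration(int major, int minor)
{
    const std::string cc = ccSuffix(major, minor);
    return "compute_" + cc + ",sm_" + cc;
}
```

For the GTX 760 from the comments (compute capability 3.0), gencodeFlag(3, 0) yields "-gencode arch=compute_30,code=sm_30" and vsCodeGeneration(3, 0) yields "compute_30,sm_30", which corresponds to the setting that resolved the error above.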