
How can one make use of Nvidia's tensor cores (in a compute shader?!) using Vulkan?

There is an article by Nvidia, "Programming Tensor Cores in CUDA 9", but it obviously focuses on CUDA. I am not too familiar with CUDA, but it looks like some measures must be taken to enable computations on the Tensor Cores. For example, the algorithm must be set to some special type, and a math type must be set to the value CUDNN_TENSOR_OP_MATH. I am wondering whether Tensor Core acceleration could also be used from other APIs; I am especially interested in Vulkan.

More specifically, I'd like to dig a bit deeper into filters for denoising. As I understand it, filters mostly require exactly the operations that Tensor Cores accelerate: matrix multiply-and-accumulate operations.

Actually, NVIDIA is the one that should be answering this; there's no point in us speculating. The best place IMO to bring this up would be github.com/KhronosGroup/Vulkan-Ecosystem/issues. There's some tangential discussion at github.com/KhronosGroup/Vulkan-Docs/issues/686, but I suggest not polluting that issue further. – krOoze

2 Answers


Nvidia has recently added a few new extensions, one of them being VK_NV_cooperative_matrix, which allows the use of Tensor Cores inside Vulkan.

Support for this feature in glslang was, I believe, added only yesterday, which is why you haven't seen it until now (see here).


Here are some examples of it being used (glslang's test shaders):

https://github.com/KhronosGroup/glslang/blob/4605e2ed2b2b1acbe157d365c3c528367b8b168f/Test/spv.coopmat.comp

https://github.com/KhronosGroup/glslang/blob/4605e2ed2b2b1acbe157d365c3c528367b8b168f/Test/spv.1.3.coopmat.comp

#version 450 core
#extension GL_KHR_memory_scope_semantics : enable
#extension GL_NV_cooperative_matrix : enable
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable

#pragma use_variable_pointers

layout (local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0) coherent buffer Block {
    float y[1024*1024];
    float x[];
} block;


void main()
{
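    // A 16x8 matrix of 32-bit floats whose storage is spread across the invocations of the subgroup
    fcoopmatNV<32, gl_ScopeSubgroup, 16, 8> m = fcoopmatNV<32, gl_ScopeSubgroup, 16, 8>(0.0);

    // Element-wise arithmetic and scaling by a scalar are supported
    m = m + m;
    m = m - m;
    m = -m;
    m = 2.0*m;
    m = m*2.0;

    // Load/store starting at element 16 of block.x, 128 elements between rows, row-major (colMajor = false)
    coopMatLoadNV(m, block.x, 16, 128, false);
    coopMatStoreNV(m, block.x, 16, 128, false);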
}

This appears to be quite analogous to how it's done in CUDA (the WMMA API): the data has to be explicitly loaded from memory into matrix objects that the Tensor Cores operate on, and stored back afterwards.

So to use them, you need VK_NV_cooperative_matrix in Vulkan and GL_NV_cooperative_matrix in GLSL.
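The glslang test shader above only exercises the syntax and never performs an actual matrix multiplication. A minimal sketch of a compute shader that computes a single tile D = A·B + C with coopMatMulAddNV might look like the following. It assumes that the device reports a 16x8x16 configuration (fp16 A and B, fp32 C and D) through vkGetPhysicalDeviceCooperativeMatrixPropertiesNV, that the subgroup size is 32, and that the matrices sit row-major in four separate storage buffers; these sizes and bindings are hypothetical and should be checked against what your device actually reports. Reading float16_t from storage buffers also requires 16-bit storage support on the device.

```glsl
#version 450 core
#extension GL_KHR_memory_scope_semantics : enable
#extension GL_NV_cooperative_matrix : enable
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable

// Assumed tile shape: A is M x K, B is K x N, C and D are M x N.
// Verify with vkGetPhysicalDeviceCooperativeMatrixPropertiesNV before relying on these sizes.
const int M = 16;
const int N = 8;
const int K = 16;

// Assumption: one workgroup is exactly one subgroup of 32 invocations.
layout(local_size_x = 32, local_size_y = 1, local_size_z = 1) in;

// Hypothetical bindings: row-major A (M x K), B (K x N), C and D (M x N).
layout(set = 0, binding = 0) buffer BufA { float16_t a[]; };
layout(set = 0, binding = 1) buffer BufB { float16_t b[]; };
layout(set = 0, binding = 2) buffer BufC { float c[]; };
layout(set = 0, binding = 3) buffer BufD { float d[]; };

void main()
{
    fcoopmatNV<16, gl_ScopeSubgroup, M, K> matA;
    fcoopmatNV<16, gl_ScopeSubgroup, K, N> matB;
    fcoopmatNV<32, gl_ScopeSubgroup, M, N> matC;

    // Load each tile starting at element 0; the stride is the row length in elements.
    coopMatLoadNV(matA, a, 0, K, false);
    coopMatLoadNV(matB, b, 0, N, false);
    coopMatLoadNV(matC, c, 0, N, false);

    // D = A * B + C: the multiply-accumulate that the driver can map onto Tensor Cores.
    fcoopmatNV<32, gl_ScopeSubgroup, M, N> matD = coopMatMulAddNV(matA, matB, matC);

    coopMatStoreNV(matD, d, 0, N, false);
}
```

For a larger problem, each workgroup would typically compute one tile of D, offsetting the element argument by the tile's position and looping over K in steps of the supported K size; that tiling is left out of this sketch.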

EDIT:

j00hi has pointed out that there is now an NVIDIA blog post on how to use these Tensor Cores.


Tensor Cores are a niche feature that might not make it into a Vulkan extension. You could still use CUDA for your Tensor Core accelerated computation and share the data between the CUDA and Vulkan contexts.

Check this sample: cuda vulkan interop

Note that since synchronization would be necessary between launching the CUDA kernel and working with the result from the Vulkan side, performance might suffer. You will have to evaluate the costs in your application.