How to use Nvidia's Tensor Cores via Vulkan

Question

How can one make use of Nvidia's tensor cores (in a compute shader?!) using Vulkan?

There is this article by Nvidia, Programming Tensor Cores in CUDA 9, but it obviously focuses on CUDA. I am not too familiar with CUDA, but it looks like some measures must be taken to enable computations on the Tensor Cores: the algorithm must be set to some kind of special type, and some math type must be set to the value CUDNN_TENSOR_OP_MATH. I am wondering whether Tensor Core acceleration can also be used from other APIs; I am especially interested in Vulkan.

More specifically, I'd like to dig into denoising filters a bit more. To my understanding, such filters mostly require exactly the mathematical operations that Tensor Cores are able to accelerate, namely matrix-multiply-and-accumulate operations.
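
For reference, the matrix-multiply-and-accumulate operation mentioned above computes D = A * B + C on small tiles. A minimal CPU sketch in C++, purely to illustrate the arithmetic (the function name `tile_mma` and the row-major layout are assumptions made for this example, not part of any Nvidia or Vulkan API):

```cpp
#include <array>
#include <cstddef>

// Reference D = A * B + C for small, fixed-size, row-major tiles.
// A is M x K, B is K x N, C and D are M x N.
// Hypothetical helper for illustration; real tensor-core code runs on the GPU.
template <std::size_t M, std::size_t N, std::size_t K>
std::array<float, M * N> tile_mma(const std::array<float, M * K>& a,
                                  const std::array<float, K * N>& b,
                                  const std::array<float, M * N>& c) {
    std::array<float, M * N> d{};
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = c[i * N + j];            // start from the accumulator tile
            for (std::size_t k = 0; k < K; ++k) {
                acc += a[i * K + k] * b[k * N + j];
            }
            d[i * N + j] = acc;
        }
    }
    return d;
}
```

Tensor Cores perform this kind of tile operation in hardware, typically on low-precision inputs.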

Actually, NVIDIA is the one that should be answering this. No point for us to speculate. The best place IMO to bring this up would be at github.com/KhronosGroup/Vulkan-Ecosystem/issues. There's some tangential discussion at github.com/KhronosGroup/Vulkan-Docs/issues/686, but I suggest not to pollute that Issue further. — krOoze

Krupip · Accepted Answer · 2019-02-27T17:01:56

Nvidia has recently added a few new extensions, one of them being VK_NV_cooperative_matrix, which allows the use of tensor cores from within Vulkan.

I believe support for this new feature was added to glslang only yesterday, which is why you haven't seen it until now (see here).

Here are some examples of it being used:

https://github.com/KhronosGroup/glslang/blob/4605e2ed2b2b1acbe157d365c3c528367b8b168f/Test/spv.coopmat.comp

https://github.com/KhronosGroup/glslang/blob/4605e2ed2b2b1acbe157d365c3c528367b8b168f/Test/spv.1.3.coopmat.comp

#version 450 core
#extension GL_KHR_memory_scope_semantics : enable
#extension GL_NV_cooperative_matrix : enable
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable

#pragma use_variable_pointers

layout (local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0) coherent buffer Block {
    float y[1024*1024];
    float x[];
} block;


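void main()
{
    // 16x8 cooperative matrix of 32-bit floats, scoped to the subgroup, initialized to 0.0.
    fcoopmatNV<32, gl_ScopeSubgroup, 16, 8> m = fcoopmatNV<32, gl_ScopeSubgroup, 16, 8>(0.0);

    // Component-wise arithmetic on the whole matrix.
    m = m + m;
    m = m - m;
    m = -m;
    m = 2.0*m;
    m = m*2.0;

    // Load from and store to block.x: start element 16, stride 128, row-major (colMajor = false).
    coopMatLoadNV(m, block.x, 16, 128, false);
    coopMatStoreNV(m, block.x, 16, 128, false);
}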

This appears to be quite analogous to how it's done in CUDA: matrices must be explicitly loaded into, and stored back from, the memory on which the tensor cores operate.
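
To make the arguments of `coopMatLoadNV(m, block.x, 16, 128, false)` more concrete, here is a CPU sketch in C++ of the addressing as I understand it from the extension: `element` is the start index in the buffer, `stride` is the distance between consecutive rows (or columns, if `colMajor` is true), and the last argument selects the layout. `Tile`, `tile_load` and `tile_store` are hypothetical names invented for this illustration, and the indexing is my reading of the extension, not authoritative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a Rows x Cols cooperative matrix.
template <std::size_t Rows, std::size_t Cols>
struct Tile {
    float v[Rows][Cols];
};

// Buffer index of tile entry (r, c), assuming the stride separates
// consecutive rows (row-major) or consecutive columns (column-major).
inline std::size_t tile_index(std::size_t element, std::size_t stride,
                              bool col_major, std::size_t r, std::size_t c) {
    return col_major ? element + c * stride + r
                     : element + r * stride + c;
}

// Emulates the load: copy a strided window of the buffer into the tile.
template <std::size_t Rows, std::size_t Cols>
void tile_load(Tile<Rows, Cols>& t, const std::vector<float>& buf,
               std::size_t element, std::size_t stride, bool col_major) {
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c)
            t.v[r][c] = buf[tile_index(element, stride, col_major, r, c)];
}

// Emulates the store: write the tile back to the same strided window.
template <std::size_t Rows, std::size_t Cols>
void tile_store(const Tile<Rows, Cols>& t, std::vector<float>& buf,
                std::size_t element, std::size_t stride, bool col_major) {
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c)
            buf[tile_index(element, stride, col_major, r, c)] = t.v[r][c];
}
```

With a 16x8 tile, `element = 16` and `stride = 128`, entry (1, 0) maps to buffer index 144 and the last entry (15, 7) to index 1943, so the buffer must hold at least 1944 elements.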

So to use them, you need VK_NV_cooperative_matrix in Vulkan and GL_NV_cooperative_matrix in GLSL.

EDIT:

j00hi has mentioned that there is now an Nvidia blog post on how to use these tensor cores.
