
I have a question about Nvidia's NVENC API. I want to use the API to encode some OpenGL graphics. My problem is that the API reports no error throughout the whole program; everything seems to be fine. But the generated output is not readable by, e.g., VLC. If I try to play the generated file, VLC flashes a black screen for about 0.5 s and then ends the playback. The video has a length of 0, and the file size seems rather small too: the resolution is 1280*720 and a 5-second recording is only about 700 kB. Is this realistic?

The flow of the application is as follows:

  1. Render into a secondary framebuffer
  2. Download the framebuffer to one of two PBOs (glReadPixels())
  3. Map the PBO of the previous frame to get a pointer usable by CUDA.
  4. Call a simple CUDA kernel converting OpenGL's RGBA to ARGB, which should be understood by NVENC according to this (p. 18). The kernel reads the content of the PBO and writes the converted content into a CUDA array (created with cudaMalloc) which is registered as an input resource with NVENC. A sketch of such a kernel is shown after this list.
  5. The content of the converted array gets encoded. A completion event plus the corresponding output bitstream buffer are queued.
  6. A secondary thread listens on the queued output events; when an event is signaled, the output bitstream gets mapped and written to disk.
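
To make step 4 concrete, here is a minimal sketch of such a conversion kernel. The kernel name, buffer names and launch configuration are made up for illustration, and (as it turned out later, see the answer below) the byte order the encoder really wants for "ARGB" is B, G, R, A in memory:

// Hypothetical swizzle kernel (not my original code): reads the tightly packed
// RGBA pixels from the mapped PBO and writes them, in the byte order expected
// by NVENC, into the unpitched device buffer registered as input resource.
__global__ void SwizzleRgbaForNvenc(const uchar4* src, uchar4* dst,
                                    int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 p = src[y * width + x];                        // p.x=R, p.y=G, p.z=B, p.w=A
    dst[y * width + x] = make_uchar4(p.z, p.y, p.x, p.w); // bytes B, G, R, A
}

// Host-side launch (sketch): both buffers hold width * height * 4 bytes.
// dim3 block(16, 16);
// dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
// SwizzleRgbaForNvenc<<<grid, block>>>(pboDevPtr, nvencDevPtr, width, height);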

The initialization of the NVENC encoder:

InitParams* ip = new InitParams();
m_initParams = ip;
memset(ip, 0, sizeof(InitParams));
ip->version = NV_ENC_INITIALIZE_PARAMS_VER;
ip->encodeGUID = m_encoderGuid;  //Used Codec
ip->encodeWidth = width; // Frame Width
ip->encodeHeight = height; // Frame Height
ip->maxEncodeWidth = 0; // Zero means no dynamic res changes
ip->maxEncodeHeight = 0; 
ip->darWidth = width; // Aspect Ratio
ip->darHeight = height; 
ip->frameRateNum = 60; // 60 fps
ip->frameRateDen = 1; 
ip->reportSliceOffsets = 0; // According to programming guide
ip->enableSubFrameWrite = 0;
ip->presetGUID = m_presetGuid; // Used Preset for Encoder Config

NV_ENC_PRESET_CONFIG presetCfg; // Load the Preset Config
memset(&presetCfg, 0, sizeof(NV_ENC_PRESET_CONFIG));
presetCfg.version = NV_ENC_PRESET_CONFIG_VER;
presetCfg.presetCfg.version = NV_ENC_CONFIG_VER;
CheckApiError(m_apiFunctions.nvEncGetEncodePresetConfig(m_Encoder,
    m_encoderGuid, m_presetGuid, &presetCfg));
memcpy(&m_encodingConfig, &presetCfg.presetCfg, sizeof(NV_ENC_CONFIG));
// And add information about Bitrate etc
m_encodingConfig.rcParams.averageBitRate = 500000;
m_encodingConfig.rcParams.maxBitRate = 600000;
m_encodingConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_MODE::NV_ENC_PARAMS_RC_CBR;
ip->encodeConfig = &m_encodingConfig;
ip->enableEncodeAsync = 1; // Async Encoding
ip->enablePTD = 1; // Encoder handles picture ordering
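
This struct is then passed to the encoder initialization call; a one-line sketch assuming the same m_apiFunctions and m_Encoder members used in the rest of the code:

// Initialize the encode session with the parameters built above (sketch).
CheckApiError(m_apiFunctions.nvEncInitializeEncoder(m_Encoder, ip));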

Registration of the CUDA resource:

m_cuContext->SetCurrent(); // Make the clients cuCtx current
NV_ENC_REGISTER_RESOURCE res;
memset(&res, 0, sizeof(NV_ENC_REGISTER_RESOURCE));
NV_ENC_REGISTERED_PTR resPtr; // handle to the cuda resource for future use
res.bufferFormat = m_inputFormat; // Format is ARGB
res.height = m_height;
res.width = m_width;
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched 
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch; 
res.resourceToRegister = (void*) (uintptr_t) resourceToRegister; //CUdevptr to resource
res.resourceType = 
    NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
res.version = NV_ENC_REGISTER_RESOURCE_VER;
CheckApiError(m_apiFunctions.nvEncRegisterResource(m_Encoder, &res));
m_registeredInputResources.push_back(res.registeredResource);
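
For context, the resourceToRegister and pitch used above are not shown in the question; they would have been set up roughly like this (a sketch with illustrative variable names; as the answer below explains, the pitch has to be given in bytes):

// Sketch: allocate a plain, unpitched device buffer for the converted frame.
// 4 bytes per pixel, so the pitch in bytes is the frame width times 4.
void* devBuffer = nullptr;
unsigned int pitch = m_width * 4;                 // pitch in BYTES, not pixels
cudaMalloc(&devBuffer, static_cast<size_t>(pitch) * m_height);
uintptr_t resourceToRegister = reinterpret_cast<uintptr_t>(devBuffer);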

Encoding:

m_cuContext->SetCurrent(); // Make Clients context current
MapInputResource(id); //Map the CudaInputResource
NV_ENC_PIC_PARAMS temp;
memset(&temp, 0, sizeof(NV_ENC_PIC_PARAMS));
temp.version = NV_ENC_PIC_PARAMS_VER;
unsigned int currentBufferAndEvent = m_counter % m_registeredEvents.size(); //Counter is inc'ed in every Frame
temp.bufferFmt = m_currentlyMappedInputBuffer.mappedBufferFmt;
temp.inputBuffer = m_currentlyMappedInputBuffer.mappedResource; //got set by MapInputResource
temp.completionEvent = m_registeredEvents[currentBufferAndEvent];
temp.outputBitstream = m_registeredOutputBuffers[currentBufferAndEvent];
temp.inputWidth = m_width;
temp.inputHeight = m_height;
temp.inputPitch = m_width;
temp.inputTimeStamp = m_counter;
temp.pictureStruct = NV_ENC_PIC_STRUCT_FRAME; // According to samples
temp.qpDeltaMap = NULL;
temp.qpDeltaMapSize = 0;

EventWithId latestEvent(currentBufferAndEvent,
    m_registeredEvents[currentBufferAndEvent]);
PushBackEncodeEvent(latestEvent); // Store the Event with its ID in a Queue

CheckApiError(m_apiFunctions.nvEncEncodePicture(m_Encoder, &temp));
m_counter++;
UnmapInputResource(id); // Unmap
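
For completeness, the secondary thread from step 6 roughly does the following (a sketch with illustrative member names; nvEncLockBitstream/nvEncUnlockBitstream are the calls used to read back encoded data):

// Sketch: wait for the completion event queued above, then lock the matching
// output bitstream buffer and write the encoded data to disk.
WaitForSingleObject(event.handle, INFINITE);       // illustrative member name

NV_ENC_LOCK_BITSTREAM lock;
memset(&lock, 0, sizeof(NV_ENC_LOCK_BITSTREAM));
lock.version = NV_ENC_LOCK_BITSTREAM_VER;
lock.outputBitstream = m_registeredOutputBuffers[event.id]; // illustrative
CheckApiError(m_apiFunctions.nvEncLockBitstream(m_Encoder, &lock));

outFile.write(static_cast<const char*>(lock.bitstreamBufferPtr),
              lock.bitstreamSizeInBytes);          // e.g. an std::ofstream "out.h264"
CheckApiError(m_apiFunctions.nvEncUnlockBitstream(m_Encoder, lock.outputBitstream));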

Every little hint about where to look is very much appreciated. I'm running out of ideas about what might be wrong.

Thanks a lot!

You don't give many details, but this sounds like a common problem for VLC when dealing with raw bitstreams: VLC cannot play them back if it isn't told the codec. To do so, try giving the file the right extension, e.g. "filename.h264" for the H.264 codec. – kunzmi
Okay, if I do so, I get the following result: click. – Christoph
It seems that some issues have been sorted out in the cross-posting. – Robert Crovella
Yes indeed, when everything is solved I'll give a detailed answer. – Christoph

1 Answer


With the help of hall822 from the NVIDIA forums I managed to solve the issue.

The primary error was the pitch with which I registered my CUDA resource. I'm using a framebuffer renderbuffer to draw my content into; its data is a plain, unpitched array. My first thought, giving a pitch equal to zero, failed: the encoder did nothing. The next idea was to set it to the width of the frame; then only a quarter of the image was encoded.

// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched 
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch; 

To answer this question: yes, it is correct. But the pitch is measured in bytes. Because I'm encoding RGBA frames, i.e. 4 bytes per pixel, the correct pitch has to be FRAME_WIDTH * 4.
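
In code, the fix in the registration snippet from the question boils down to this (sketch):

// Pitch is measured in bytes: 4 bytes per pixel for 8-bit ARGB/BGRA frames.
res.pitch = m_width * 4;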

The second error was that my color channels were not right (see point 4 in my opening post). The NVIDIA enum says that the encoder expects the channels in ARGB format, but what is actually meant is BGRA byte order, so the alpha channel, which is always 255, polluted the blue channel.

Edit: This may be due to the fact that NVIDIA is using little-endian byte order internally. I'm writing my pixel data to a byte array; choosing another type like int32 may allow one to pass actual ARGB data.
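
To illustrate that point with a small hypothetical snippet: a 32-bit value written as A-R-G-B ends up as the bytes B, G, R, A in memory on a little-endian machine, which is exactly the order the encoder consumed.

#include <cstdint>

uint32_t pixel = 0xFF2040C0;  // logical A=0xFF, R=0x20, G=0x40, B=0xC0
const uint8_t* bytes = reinterpret_cast<const uint8_t*>(&pixel);
// On little-endian: bytes[0]==0xC0 (B), bytes[1]==0x40 (G),
//                   bytes[2]==0x20 (R), bytes[3]==0xFF (A)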