2
votes

How does live streaming, using any protocol/codec, work from end-to-end?

I have been searching Google, YouTube, the FFmpeg documentation, the OBS source code, and Stack Overflow, but I still cannot understand how live video streaming works end-to-end. I am trying to capture desktop screenshots and convert them into a live, H.264-encoded video stream.

What I know how to do:

  1. Capture screenshot images using Graphics.CopyFromScreen with C# in a loop
  2. Encode the bits and save each image as a JPEG file
  3. Encode each JPEG as base64 and write it, one frame at a time, to a named pipe server
  4. Read the image buffer from the named pipe in a Node.js server
  5. Send each base64 JPEG frame over a socket to the client, which displays it on a web page (see the sketch after this list)
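
From reading around, it seems like a frame pipeline such as the one above could be fed straight into ffmpeg instead of going through the named pipe and Node.js path. A rough sketch of what I mean (your_capture_program stands in for the C# loop and is a placeholder, as are the frame rate and output file; Unix-style line continuations):

    # hypothetical capture program that writes one JPEG frame after another to stdout
    your_capture_program | ffmpeg -f image2pipe -framerate 15 -i - \
        -c:v libx264 -preset veryfast -pix_fmt yuv420p output.mp4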

What I want to be able to do:

  1. Encode the captured images into an H.264 stream (in chunks, I assume) suitable for live streaming with one of the protocols (RTMP, RTSP, HLS, DASH)
  2. Push the encoded video chunks onto a server (such as an RTMP server) continuously (every 1-2 seconds, I assume? see the sketch after this list)
  3. Access the server from a client to play back and display the live video
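
From the FFmpeg documentation, it looks like steps 1 and 2 might even collapse into a single long-running ffmpeg process that captures the desktop, encodes H.264, and keeps pushing to the RTMP ingest point. A sketch of what I have in mind (the server URL and stream key are placeholders; gdigrab is the Windows desktop grabber):

    ffmpeg -f gdigrab -framerate 30 -i desktop \
        -c:v libx264 -preset veryfast -tune zerolatency -pix_fmt yuv420p \
        -g 60 -f flv rtmp://your-rtmp-server/live/streamkey

If I understand correctly, a command like this keeps one connection open and pushes encoded packets continuously, so nothing has to be cut into separate 1-2 second files on my side.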

I've tried using FFmpeg to continuously send .mp4 files to an RTMP server, but this doesn't seem to work because the connection is closed after each video. I have also looked into ffmpeg concat lists, but that just combines existing videos; to my understanding it can't append videos to a live stream as they are produced, and it probably wasn't made for that.

So my best lead is this Stack Overflow answer, which suggests:

  1. Encode into an FLV container and set the duration to be arbitrarily long (according to the answer, YouTube used this method)
  2. Encode the stream into RTMP, using ffmpeg or another open-source RTMP muxer
  3. Convert the stream into HLS

How is this encoding and converting done? Can this all be done with ffmpeg commands?
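
For example, if an RTMP server were already receiving the stream, would something like the following be the right way to repackage it into HLS? (The URLs, segment length, and output path here are just my guesses.)

    ffmpeg -i rtmp://your-rtmp-server/live/streamkey \
        -c copy -f hls -hls_time 2 -hls_list_size 6 \
        -hls_flags delete_segments /var/www/html/hls/stream.m3u8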

FFmpeg can be used for encoding different codec types, and it also supports several streaming protocols. - mail2subhajit

2 Answers

0
votes

Media Foundation would be the best solution for you, since you mentioned C#.

You may check my sample (fully tested), which is based on Desktop Duplication for desktop capture. The sample encodes the frames into H.264 using Media Foundation and live-streams the output video wrapped in RTP using Live555. I'm able to achieve up to 100 FPS through this approach. Live555 also supports RTSP, HLS & MPEG.

You can also check this one, which is based on the GDI approach for desktop capture; there the H.264-encoded video is streamed in an MPEG container format.

Here are some more reference links for you.

  1. https://github.com/ashumeow/webrtc4all/blob/master/gotham/MFT_WebRTC4All/test/test_encoder.cc
  2. DXGI Desktop Duplication: encoding frames to send them over the network
  3. Getting green screen in ffplay: Streaming desktop (DirectX surface) as H264 video over RTP stream using Live555
  4. Intel graphics hardware H264 MFT ProcessInput call fails after feeding few input samples, the same works fine with Nvidia hardware MFT
  5. Color conversion from DXGI_FORMAT_B8G8R8A8_UNORM to NV12 in GPU using DirectX11 pixel shaders
  6. GOP setting is not honored by Intel H264 hardware MFT
  7. Encoding a D3D Surface obtained through Desktop Duplication using Media Foundation
0
votes

If your requirement is screen capture, you can use any codec: H.264, HEVC, or MJPEG.

Based on your platform, select the input interface, such as v4l2 (Linux) or dshow (Windows).

Go through the links below for command-line usage in specific streaming scenarios.

RTMP/RTSP Streaming

ffmpeg - https://trac.ffmpeg.org/wiki/StreamingGuide
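
For example, pushing a desktop capture as H.264 over RTSP can look roughly like the command below (the gdigrab input and the RTSP URL are assumptions about your setup, and an RTSP server must already be listening at that address):

    ffmpeg -f gdigrab -framerate 30 -i desktop \
        -c:v libx264 -preset veryfast -tune zerolatency -pix_fmt yuv420p \
        -f rtsp -rtsp_transport tcp rtsp://localhost:8554/desktop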

HLS/DASH Streaming

MP4Box - https://gpac.wp.imt.fr/mp4box/
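
As an illustration only (the file names, durations, and profile are placeholders, and this packages an already-encoded H.264 file rather than a live input), DASH segmenting with MP4Box can look like:

    MP4Box -dash 4000 -frag 4000 -rap -profile dashavc264:live \
        -out manifest.mpd recording.mp4

The generated manifest and segments then go under the web root of the HTTP server mentioned in the note below.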

Note: for HLS/DASH streaming you need to set up an HTTP server such as nginx, IIS, or Apache.