Recently I was trying to stream live video captured by a webcam over UDP. The approach I took was to read one frame, send it over UDP, then read the data on the receiver side and display it.
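For reference, this is a simplified sketch of what I mean by the sender side (assuming OpenCV for capture; the receiver address and chunk size are placeholders, not my real values):

```python
# Simplified sketch of the sender: grab one frame and push it out as UDP datagrams.
# Assumes OpenCV; the address and chunk size below are placeholders.
import socket
import cv2

HOST, PORT = "127.0.0.1", 5000   # placeholder receiver address
CHUNK = 1400                     # payload per datagram, kept under a 1500-byte MTU

cap = cv2.VideoCapture(0)        # default webcam
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

ok, frame = cap.read()           # one raw BGR frame
if ok:
    data = frame.tobytes()       # raw pixel bytes, roughly 1 MB per frame in my case
    # split the frame into MTU-sized pieces and send each as its own datagram
    for i in range(0, len(data), CHUNK):
        sock.sendto(data[i:i + CHUNK], (HOST, PORT))

cap.release()
sock.close()
```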
Now, I understand that sending data over UDP/TCP results in fragmentation, which happens in a more or less arbitrary fashion depending on the MTU of the underlying link layer, and that UDP/IP does not guarantee that every fragment will actually be delivered, or in what order. The typical Ethernet MTU is 1500 bytes.
However, each of my frames is about 1 MB (~1,048,576 bytes). So, with fragmentation at 1500 bytes, a single frame gets split up and the receiver ends up with roughly 700 packets (1048576/1500). The receiver then has to accumulate all ~700 packets just to rebuild one frame, which is additional processing. Is that normal, 700 packets for a single frame?! Even at just 24 fps, the receiver has to process 700 * 24 = 16800 packets per second, which does not seem feasible.
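This is roughly the accumulation I mean on the receiver side (again a sketch: the 640x480 resolution is a placeholder, and it assumes no packets are lost or reordered, which UDP does not guarantee):

```python
# Sketch of the receiver: collect datagrams until one full raw frame has arrived.
# Assumes the chunked sender above; resolution is a placeholder, and loss/reordering
# are ignored for simplicity.
import socket
import numpy as np
import cv2

PORT = 5000
WIDTH, HEIGHT = 640, 480
FRAME_BYTES = WIDTH * HEIGHT * 3     # expected raw frame size for the placeholder resolution

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", PORT))

buf = bytearray()
while len(buf) < FRAME_BYTES:        # hundreds of datagrams just to rebuild one frame
    chunk, _ = sock.recvfrom(2048)
    buf.extend(chunk)

frame = np.frombuffer(bytes(buf[:FRAME_BYTES]), dtype=np.uint8).reshape(HEIGHT, WIDTH, 3)
cv2.imshow("frame", frame)
cv2.waitKey(0)
```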
I want to understand how streaming websites work; they surely don't process 16800 packets per second per stream. They presumably use streaming protocols like RTSP, but those are built on top of UDP/TCP, so they must also deal with fragmentation. These days streaming sites can deliver 4K video, where each frame is much bigger than 1 MB, yet the MTU is still 1500 bytes. They must also be compressing the data, but to what extent? Even if they somehow cut the frame size by 50% (which then has to be decompressed on the receiver side, i.e. more processing), that still leaves ~8000 packets per second for a low-quality 24 fps video. How do they handle it? How do they manage data fragmentation at these scales?
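To make the question concrete, here is a small sketch that recomputes the packet math with simple per-frame JPEG compression (assuming OpenCV; the quality setting and chunk size are arbitrary, and I'm not claiming this is what streaming services actually do):

```python
# Rough way to put numbers on the compression question: compare raw vs JPEG frame
# sizes and the resulting packets/frame and packets/second at 24 fps.
# Per-frame JPEG is only a stand-in here; quality and chunk size are arbitrary.
import math
import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("could not read a frame from the webcam")

raw_bytes = frame.nbytes                                        # raw frame size
ok, jpg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
jpg_bytes = len(jpg)                                            # compressed frame size

CHUNK = 1400
for name, size in (("raw", raw_bytes), ("jpeg", jpg_bytes)):
    pkts = math.ceil(size / CHUNK)
    print(f"{name}: {size} bytes -> {pkts} packets/frame, {pkts * 24} packets/s at 24 fps")
```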