Block based Motion Estimation in Video Compression

1

votes

As we know almosty all video encoders use some temporal coding. It uses block (Rectangular area) based motion estimation to find best macth of a block of pixels for a current frame in reference / previous frames. This gives the motion vector. This is fine if the motion is translational(i.e. if the block moves to left/right or up/down) What if the object rotates and if the object was rectangular in shape and it rotates, then motion estimation would not be so accurate and hence would not result in least presidual(original minus prediction).

So what methods does a video encoder adopt to deal with such rotational motions./movements.

Does it then handle such situation by coding that block as Intra block(Code as it is without any reference to any previous) within the P frame

or

are there any other tricks at hand to deal it while coding it as P macroblock itself?

video-encodingmotion

2

votes

As far as I know, video encoders don't have any special case for rotational movements. First, detection of rotational motion itself would consume a lot of time. Also, motion estimation is done at the macroblock level and therefore, there might be quite a few macroblocks in the frame that are not moving in a rotational manner, unless the whole frame itself is somehow rotating.

One "trick" that I can suggest is the following-

Calculate PSNR between predicted frame (P Frame) and actual frame. If PSNR is too low, it makes more sense to encode the frame as an information frame (I Frame). Note that this cannot be done for live transmissions because it would be time consuming. But it can be done when encoding time is not a factor. In that case you could simply use a Full Search.

1

votes

The point of motion estimation is that it is a computationally cheap way of reducing 'typical' videos.

If you were to use motion based coding on something like a video of a waterfall it would fail to reduce the size.

A similar concept applies to JPEG photos. The JPEG compression only works because it takes advantage of the particular sensitivity of the human eye.

Ultimately, data is data and you cannot losslessly reduce the amount of it. The best you can do is to make some guesses about the source and destination and then try to recreate something that will be indistinguishable to the viewer, but which uses less data. That is why motion estimation WORKS. 99.99 percent of movies that humans watch have humans in them, moving around like humans do...left and right...up and down. And by WORKS, I mean, can be done in a quick enough time to make it worthwhile to do it for millions of hours of footage produced every year.

This, of course has something to do with Shannon entropy http://en.wikipedia.org/wiki/Entropy_(information_theory) , but that article makes my brain start to seep out through my eye sockets a bit...

1

votes

First thing is the computational complexity which increases dramatically for every addition of a rotational direction. For example, the Motion estimation time is 'x' seconds. After adding say right hand 90 degrees, we have again 'x' seconds, since it needs to check the same reference frame search window again with the rotated block. Again after adding the left rotation 90 degrees, again it adds another x seconds to motion estimate, and so on. And the main issue here is that, in the entire encoder, typically, Motion Estimation is the block which consumes major part of encoding time.

Second issue is the complexity of motion compensation unit. If we have rotational block in estimation or prediction then we must generate the same transformation for generating the compensated frame, in the encoder and decoder too. The worst thing is that it adds much complexity in the decoder side also.

The third thing is the prediction unit for the support of variable block size. The standard always defines motion vectors for the block sizes which are fixed. If rotational block sizes are proposed, then the directions needs to be standardized in decoder also, where motion compensation unit, entropy encoder/decoder etc.

The fourth thing is the Motion Vector Coding. Since we add the rotational motion vectors, we need to add extra bits to motion vectors.So, put these things in a beam balance - "adding addition bits for each MV" and "improving compression efficiency using rotational Motion vectors", which one weighs more. If the balance is balanced, or if "adding extra bits for each MV" weighs more, then there is no use of using rotational MVs.

Fifth thing is about the deep understanding of the encoder block diagram. The encoder which we are using is analogous to adaptive Differential Pulse Code Modulator or any similar type with predictive coding. The video signal is always encoder differentially. When a video signal or any signal is coded differentially, the time difference between previous and the current sample is infinitesimally small (here 1/frame-rate), such that the individual blocks always follow translational direction.So, we get in use, the rotational MVs only if we are using multiple reference frame when reference frame if larger than frame-rate or at least larger than GOP-size. So, in this case rotational MVs could give very slight improvement in PSNR or increase Motion Estimation time dramatically.

Another thing is about the subjective and statistical study of the Motion direction.

Despite all these, there are some proposals in JCT-VC for implementing this, but finally not approved in current HEVC standard. May be they will try to figure it out and solve all the issues in future.

Block based Motion Estimation in Video Compression

3 Answers