First off, its not the BCC you deal with, but rather its first order approximation, often called the optical flow constraint (OFC).
The OFC will not hold for your example and the reason is that you are using in-appropriate estimations of derivatives.
The first hint that something is amiss should arise when you notice that you have two images, one for t=0, and one for t=1. Which one should you use for estimating I_x and I_y derivatives? You picked t=0, why not t=1? An approach could be to take both and then average, i.e.:
I_x = ( I_x(t=0) + I_y(t=1) )/2
Doing so would result in a derivative filter which is no longer 2-points, but rather 4-points with a kernel stretching into the t-domain. Many experts in optical flow use both kinds of I_x instead of averaging, and get two kinds of optical flow: one forward, and one backwards, which they can then post-process.
How you pick your derivate estimators will critcally effect the behaviour of your optical flow algorithm!
When working with numerical estimations of diff-equations, it is more common to use centralized differential estimations, that is:
I_x = (I(x+1,y) - I(x-1,y))/2
In optical flow, this is common to do, and then revert to the smaller single sided difference when near to the boundaries of the image.
However, this will still not perform as good as you would like in making the OFC hold. If you investigate the OFC, it can be re-written as:
I_t = - I_x v_x - I_y I_y
I_t = - |v| ( k grad(I))
where v = (v_x, v_y) = |v|(cos(a),sin(a)) is the motion vector, and k = (cos(a),sin(a)), the unit vector of the motion. The quantity ( k grad(I)) is what is called a directional derivative. In order for this to hold, you should not use 2-point centralized differences either. What you should do is to use estimates for I_t, I_x and I_y that share the same kernel/support region. This could for example be 3-by-3-by-2 size filters.
In Matlab, the code for this could be for example:
[dx, dy, dt] = grad3D(imNew,imPrev)
gg = [0.2163, 0.5674, 0.2163];
f = imNew + imPrev;
dx = f(:,[2:end end]) - f(:,[1 1:(end-1)]);
dx = conv2(dx,gg','same');
dy = f([2:end end],:) - f([1 1:(end-1)],:);
dy = conv2(dy,gg ,'same');
dt = 2*conv2(gg,gg,imNew - imPrev,'same');
The reason people still use smaller derivate filters sometimes is because you do not lose any resolution, which is the drawback of using larger kernels. Note that the above algorithm for derivative estimation is not the same as performing 2 point derivatives in coarser scales. Many believe so, but it is not true.
In the interactive tutorial and toolbox on optical flow that I wrote, you can get more intution about these issues. Run the toolbox with the option:
in.method = 'gradient';
and you will get an interactive option that might help you get more intuition.