This can be a healthy approach to speeding things up for some use cases of optical flow.
To be sure we are on the same page: the optical flow will not directly be used as the main source of information for tracking the car. That is handled by another module... probably some object detector based on deep learning. This object detector is likely computationally intense (unless you have some special hardware for it, like a dedicated GPU or VPU).
The dense flow can be a great complement for getting low-latency information about what is happening to the car and around it, and it can also be used to help bootstrap the detector (providing a prior for the detector in successive frames, as in a Kalman-filter-style tracking setup).
With this assumption, you can easily optimize your optical flow calculations as you mention. I would suggest you skip the notion of a dense mask here and just use the bounding box from your detector. This will be your region of interest for the flow calculation. If your detection is in frame t, and you want optical flow between two frames, t and t-1, then use the same bounding box in both frames. This gives you two ROIs, which are fed to your dense optical flow module.
Each optical flow algorithm has support regions for its low-level operations, and they don't all handle image boundaries well. Therefore, make sure you add ample extra space around your bounding boxes (as far as they can fit inside the original video frames).
For clarity: if the bounding box from your car detection is bbox(x,y,w,h) and you want a margin m, then the bounding box used for dense flow is bbox(x-m/2, y-m/2, w+m, h+m). The margin is something you want to set depending on your optical flow method (and its parameters).
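As a sketch, the expand-then-clamp step could look like this. This is a standalone illustration; the Box struct and expandAndClamp are my own stand-ins (not OpenCV types), just to show the arithmetic:

```cpp
#include <algorithm>

// Minimal stand-in for a cv::Rect-like box (hypothetical, for illustration).
struct Box { int x, y, w, h; };

// Expand a detection box by margin m, then clamp it to the frame
// so the ROI never reaches outside the original video frame.
Box expandAndClamp(Box b, int m, int frameW, int frameH) {
    Box r{b.x - m / 2, b.y - m / 2, b.w + m, b.h + m};
    int x0 = std::max(r.x, 0);
    int y0 = std::max(r.y, 0);
    int x1 = std::min(r.x + r.w, frameW);
    int y1 = std::min(r.y + r.h, frameH);
    return Box{x0, y0, x1 - x0, y1 - y0};
}
```

With OpenCV you get the clamping for free via `boxFlow &= Rect(0, 0, frame.cols, frame.rows)`, but it is worth seeing what it does.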
edit: here is some code for you. Note, I haven't tested this at all:
// This is C++, since we are talking optimizations.
// I am uncertain about the Python wrappers, but I am fairly certain the
// code below will not make any deep copies outside of the flow module
// (the ROI Mats are views into the frame data).
// Assume you have INew and IOld, the two (grayscale) frames of the video,
// and that the bounding box from your detector is a cv::Rect b.
int m = 4; // 4 pixels as an example margin...
Rect boxFlow(b.x - m / 2, b.y - m / 2, b.width + m, b.height + m);
boxFlow &= Rect(0, 0, INew.cols, INew.rows); // clamp to the frame
Mat roiNew(INew(boxFlow));
Mat roiOld(IOld(boxFlow));
Mat uflow;
calcOpticalFlowFarneback(roiOld, roiNew, uflow, 0.5, 3, 15, 3, 5, 1.2, 0);
However, depending on your setup, this may not be a significant speed-up in the end. For the code above, I don't know how memory is handled inside the flow module, and in your Python version I am even less certain how it will be handled.
Consider that in a streamlined dense optical flow approach (where you use the same size of data throughout), the memory buffers in your pipeline keep the same size. A naive implementation of the approach you mention would need to allocate memory dynamically, as the size and number of your bounding boxes will vary over time.
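One way to soften that allocation cost (a sketch only; the FixedRoiBuffer class and the cap-at-maximum-size policy are my own assumptions, not something from your pipeline) is to allocate one buffer at the largest ROI size you will accept and reuse it every frame:

```cpp
#include <vector>
#include <cstddef>

// Hypothetical reusable ROI buffer: allocated once at a maximum size,
// reused for every frame so there is no per-frame heap allocation.
class FixedRoiBuffer {
public:
    FixedRoiBuffer(int maxW, int maxH)
        : maxW_(maxW), maxH_(maxH),
          data_(static_cast<size_t>(maxW) * maxH) {}

    // Returns a pointer to storage for a w x h ROI, or nullptr if the
    // requested ROI exceeds the preallocated capacity (caller must clamp).
    unsigned char* acquire(int w, int h) {
        if (w > maxW_ || h > maxH_) return nullptr;
        return data_.data(); // same allocation every frame
    }

private:
    int maxW_, maxH_;
    std::vector<unsigned char> data_;
};
```

Whether this helps depends on how much of the cost is allocation versus the flow computation itself; I would profile before committing to it.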