The "pixel" disparity is defined in rectified image coordinates. However, as your real cameras will not normally be exactly parallel and row-aligned, there is a non-identity transformation that rectifies your input camera images. Therefore you need to "undo" the rectification in order to find the pixel in the other image corresponding to a given one. The procedure is as follows:
- User selects a point in, say, the left image, giving you a pair of image coordinates (xl, yl).
- Apply the left rectification transform to them, obtaining their corresponding left rectified image coordinates. If you are using one of the common linear rectification methods, this is (xlr, ylr, wlr)' = Hlr * (xl, yl, 1)', where Hlr is the left rectification homography.
- Look up the disparity map at (xlr / wlr, ylr / wlr), obtaining the pixel's disparity value d (here I assume that your stereo algorithm yields a left-to-right disparity map for the X coordinate).
- The matching point in the right rectified image is then (xrr, yrr) = (d + xlr / wlr, ylr / wlr)
- Apply the inverse of the right rectification transform to get the corresponding pixel in homogeneous right image coordinates, (xr, yr, wr)' = Hrr^-1 * (xrr, yrr, 1)', and dehomogenize to obtain the final right image pixel (xr / wr, yr / wr).
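To make the recipe concrete, here is a minimal NumPy sketch of the per-point lookup. It assumes 3x3 rectification homographies `H_lr` and `H_rr` (e.g. the H1/H2 returned by cv2.stereoRectifyUncalibrated) and a dense left-to-right disparity map in rectified left coordinates; all names here are placeholders, not a fixed API:

```python
import numpy as np

def left_to_right(x_l, y_l, H_lr, H_rr, disparity):
    """Map a pixel (x_l, y_l) of the original left image to its
    correspondence in the original right image."""
    # Rectify the left point: homogeneous multiply, then dehomogenize.
    p = H_lr @ np.array([x_l, y_l, 1.0])
    x_lr, y_lr = p[0] / p[2], p[1] / p[2]

    # Look up the left-to-right disparity at the rectified coordinates
    # (nearest neighbor here; bilinear sampling would be more accurate).
    d = disparity[int(round(y_lr)), int(round(x_lr))]

    # Offset by the disparity to get the rectified right-image point.
    x_rr, y_rr = x_lr + d, y_lr

    # Undo the right rectification and dehomogenize.
    q = np.linalg.inv(H_rr) @ np.array([x_rr, y_rr, 1.0])
    return q[0] / q[2], q[1] / q[2]
```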
Note that all these operations need only be performed once per pixel, and their results can be cached, as sketched below. In other words, you can pre-compute a 2-channel correspondence map that, for each pixel, yields the offset from its coordinates in one image to the corresponding pixel in the other image. The map itself can be stored as an image whose channel type depends on the disparity range; a short integer is usually enough, as it can represent offsets of ±32K pixels.
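Under the same assumptions as the sketch above, the precomputation can be vectorized over the whole image; `build_offset_map` is a hypothetical helper, not a library function:

```python
import numpy as np

def build_offset_map(H_lr, H_rr, disparity):
    """Precompute, for every pixel of the original left image, the
    (dx, dy) offset to its correspondence in the original right image."""
    h, w = disparity.shape
    H_rr_inv = np.linalg.inv(H_rr)

    # Homogeneous coordinates of every left-image pixel, shape (3, h*w).
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])

    # Rectify all points at once and dehomogenize.
    rect = H_lr @ pts
    x_lr, y_lr = rect[0] / rect[2], rect[1] / rect[2]

    # Sample the disparity map (nearest neighbor, clipped to the image).
    u = np.clip(np.rint(x_lr).astype(int), 0, w - 1)
    v = np.clip(np.rint(y_lr).astype(int), 0, h - 1)
    d = disparity[v, u]

    # Shift by the disparity, un-rectify, dehomogenize.
    right = H_rr_inv @ np.stack([x_lr + d, y_lr, np.ones(h * w)])
    x_r, y_r = right[0] / right[2], right[1] / right[2]

    # Store the offsets as a 2-channel int16 image (covers +-32K pixels).
    offsets = np.empty((h, w, 2), dtype=np.int16)
    offsets[..., 0] = np.rint(x_r.reshape(h, w) - xs)
    offsets[..., 1] = np.rint(y_r.reshape(h, w) - ys)
    return offsets
```

With the map in hand, each query reduces to a single lookup: the right-image pixel matching (x, y) is (x, y) + offsets[y, x].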