3 votes

I want to estimate the distance from the camera to a point on the ground (that is, Yw = 0), given the pixel coordinates of that point. For that I used camera calibration methods.

But the results are not meaningful.

I have the following details for calibration:

- focal length (x and y), principal point (x and y), effective pixel size in meters, yaw and pitch angles, camera height, etc.

- I have entered the focal length, principal point and translation vector in pixels for the calculation.

- I have multiplied the image point by the camera matrix and then by the rotation/translation matrix [R|t] to get the world point.

Is my procedure correct? What could be wrong?

Result:

image_point (x, y) = 400, 380
world_point z coordinate (distance) = 12.53

image_point (x, y) = 400, 180
world_point z coordinate (distance) = 5.93

Problem: I am getting only a few pixels for the z coordinate, which means the z coordinate is << 1 m (because the effective pixel size is 10^-5 m).

This is my MATLAB code:

%positive downward pitch
xR = 0.033;
yR = 0;
zR = pi;

% effective pixel size in meters = 1e-5; focal length x & y = 0.012 m
% principal point x & y = 320 and 240 (pixels)
intrinsic_params =[1200,0,320;0,1200,240;0,0,1];
Rx=[1,0,0 ; 0,cos(xR),sin(xR); 0,-sin(xR),cos(xR)];
Ry=[cos(yR),0,-sin(yR) ; 0,1,0 ; sin(yR),0,cos(yR)];
Rz=[cos(zR),sin(zR),0 ; -sin(zR),cos(zR),0 ; 0,0,1];

R = Rx * Ry * Rz;

% The camera is 1.17 m above the ground (117000 pixels at 1e-5 m per pixel)
t = [0; 117000; 0];

extrinsic_params = horzcat(R,t);

% extrinsic_params is a 3x4 matrix

P = intrinsic_params * extrinsic_params; % P is a 3x4 matrix

% make it square (4x4)
P_sq = [P; 0,0,0,1];

% image size is 640 x 480
% An arbitrary pixel (400, 380) is entered as input

image_point = [400, 380, 0, 1];
% world point will be in the form X Y Z 1
world_point = P_sq * image_point'
Welcome to SO. Although you have described your procedure, your Matlab code would be easier to use in an answer, because there is less chance of misunderstanding. Also, could you clarify "not meaningful" - perhaps give an example input and expected output? – Neil Slater

Your procedure looks correct, so you are probably making a mistake somewhere. As Neil proposed, post your code with explanations part by part, as that will be easier to debug. – Ander Biguri

I edited my post by adding the MATLAB code. Thanks. – sa1991

I also added the code's result and the problem. – sa1991

1 Answer

1 vote

Your procedure is kind of right, but it is going in the wrong direction. See this link. Using your intrinsic and extrinsic calibration matrices you can find the pixel-space position of a real-world point, NOT the other way around. The exception is when your camera is stationary in the global frame and you know the Z position of the feature in global space.
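
To see why, write the pinhole projection as s * [u; v; 1] = K * [R | t] * [Xw; Yw; Zw; 1]. The scale s is lost when the point is projected, so a single pixel only gives you a ray through the camera centre; you need one extra piece of information (a known Z as below, or your ground-plane condition Yw = 0) to pick out the point on that ray.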

Stationary camera, known feature Z case (see also this link):

%% First we simulate a camera feature measurement
K = [0.5 0 320;
    0 0.5 240;
    0 0 1]; % Example intrinsics
% NOTE: this assumes rotx/roty/rotz that take radians; MATLAB's toolbox
% versions of these functions expect degrees, so convert if you use them.
R = rotx(0)*roty(0)*rotz(pi/4); % orientation of camera in global frame
c = [1; 1; 1]; %Pos camera in global frame

rwPt = [ 10; 10; 5]; %position of a feature in global frame
imPtH = K*R*(rwPt - c); %Homogeneous image point
imPt = imPtH(1:2)/imPtH(3) %Actual image point


%% Now we use the simulated image point imPt and the knowledge of the
% feature's Z coordinate to determine the feature's X and Y coordinates
%% First determine the scaling term lambda
imPtH2 = [imPt; 1]; 
z = R.' * inv(K) * imPtH2;
lambda = (rwPt(3)-c(3))/z(3);

%% Now the RW position of the feature is:
rwPt2 = c + lambda*R.' * inv(K) * imPtH2 % Reconstructed RW point
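
For your specific question the missing coordinate is the ground plane rather than a known Z, but the same three reconstruction lines apply. Below is a minimal sketch, assuming a Z-up world frame with the ground at Zw = 0, a purely pitched camera, and the numbers from your post (focal length 1200 px, principal point (320, 240), pitch 0.033 rad, height 1.17 m); the rotation matrix is illustrative, not your exact extrinsics.

%% Minimal sketch for the ground-plane case (ground at Zw = 0)
K = [1200 0 320; 0 1200 240; 0 0 1];    % intrinsics in pixels
pitch = 0.033;                          % downward pitch [rad]
% Rows of R are the camera axes (right, image-down, optical axis) expressed
% in world coordinates; the camera looks along +Y and is pitched down.
R = [1 0 0; 0 -sin(pitch) -cos(pitch); 0 cos(pitch) -sin(pitch)];
c = [0; 0; 1.17];                       % camera centre 1.17 m above the ground

imPtH = [400; 380; 1];                  % pixel of interest (homogeneous)
ray = R.' * inv(K) * imPtH;             % back-projected ray in world coordinates
lambda = (0 - c(3)) / ray(3);           % scale so the ray hits the plane Zw = 0
groundPt = c + lambda * ray;            % point on the ground, in metres
dist = norm(groundPt - c)               % camera-to-point distance [m]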

Non-stationary camera case:

To find the real-world position or distance from the camera to a particular feature (given on the image plane) you have to employ some method of reconstructing the 3D data from the 2D image.

The two that come to mind immediately are OpenCV's solvePnP and stereo-vision depth estimation. solvePnP requires 4 co-planar (in RW space) features to be visible in the image, with their positions in RW space known. This may not sound useful, since you need to know the RW positions of the features, but you can simply define the 4 features with a known offset from each other rather than a position in the global frame; the result will then be the relative position of the camera in the frame the features are defined in. solvePnP gives very accurate pose estimation of the camera. See my example.
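
If you are working in MATLAB rather than OpenCV, estimateWorldCameraPose from the Computer Vision Toolbox plays roughly the role of solvePnP. The sketch below only illustrates the idea; the A4-sheet corners and the simulated pose are made-up assumptions, not values from your setup.

% Minimal PnP-style sketch: recover the camera pose from 4 coplanar points
% whose offsets from each other are known (Computer Vision Toolbox assumed).
K = [1200 0 320; 0 1200 240; 0 0 1];
camParams = cameraParameters('IntrinsicMatrix', K');  % MATLAB stores K transposed

% Corners of an A4 sheet lying on the ground, defined relative to one corner [m]
worldPoints = [0 0 0; 0.297 0 0; 0.297 0.210 0; 0 0.210 0];

% Simulate the pixels seen from a known pose so the example is self-consistent
% (in practice these come from your feature detector).
Rtrue = [1 0 0; 0 -1 0; 0 0 -1];   % camera looking straight down at the sheet
ctrue = [0.15; 0.10; 1.0];         % camera centre 1 m above the sheet [m]
imagePoints = zeros(4,2);
for i = 1:4
    pc = Rtrue * (worldPoints(i,:).' - ctrue);  % point in camera coordinates
    uv = K * pc;
    imagePoints(i,:) = uv(1:2).' / uv(3);       % perspective division
end

% PnP step: estimate where the camera is in the frame the features were defined in
[worldOrientation, worldLocation] = estimateWorldCameraPose( ...
    imagePoints, worldPoints, camParams);
% worldLocation should come back close to ctrue.'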

Stereo-vision depth estimation requires the same feature to be found in two spatially separated images, and the transformation between the images in RW space must be known very precisely.
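
As a rough illustration of the stereo relation (assuming an already rectified pair; the focal length, baseline and disparity below are made-up numbers):

% Depth from disparity for a rectified stereo pair (illustrative values only)
f = 1200;        % focal length [px]
B = 0.10;        % baseline between the two cameras [m]
d = 24;          % disparity of the matched feature between the two images [px]
Z = f * B / d    % depth of the feature along the optical axis: 5 m here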

There may be other methods but these are the two I am familiar with.