Adjust camera to match ray and screen point

Question

I'm working on a free-view tool for 360-degree panoramas made with three.js. I want the camera to rotate when the user drags a point on the screen, so that the point stays exactly under the mouse pointer.

The geometry is a simple box around the world origin, and the camera is a perspective camera located at the origin:

this.mesh = new THREE.Mesh(new THREE.BoxGeometry(2, 2, 2, 0, 0, 0),
    new THREE.MeshFaceMaterial(_array_of_THREE.MeshBasicMaterial_));
this.camera = new THREE.PerspectiveCamera(90, width / height, 0.1, 2);

I have a solution that works, but inaccurately. It is based on the following steps:

1. When the user starts dragging, I remember the ray in world coordinates that points to the screen point where the drag started.
2. Whenever I want to adjust the camera (currently on drag end only), I compute the ray in world coordinates that points to the current position of the mouse pointer.
3. I then compute the axis and angle of the rotation that brings the first ray to the second.
4. I rotate the camera direction vector by that angle around that axis.
5. Finally, I set the camera to look in the new direction.

Here is the code for steps 2-5:

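adjustCamera: function(cameraDirection, worldRay, screenPoint) {
    var angle;
    // Step 2: ray through the current mouse position, with the camera looking along cameraDirection.
    this.camera.lookAt(cameraDirection);
    this.raycaster.setFromCamera(screenPoint, this.camera);
    // Step 3: axis and angle of the rotation that takes the current ray to the remembered world ray.
    this.axis.copy(this.raycaster.ray.direction);
    angle = this.axis.angleTo(worldRay);
    this.axis.cross(worldRay).normalize();
    // Step 4: rotate the camera direction by that angle around that axis.
    this.ray.copy(cameraDirection);
    this.ray.applyAxisAngle(this.axis, angle).normalize();
    // Step 5: look along the new direction (lookAt discards any roll).
    this.camera.lookAt(this.ray);
}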

I realized why this scheme doesn't work. A camera orientation changed this way picks up some roll (when the rotation axis has a non-zero z coordinate), and lookAt eliminates it: it strips the roll away, leaving only pitch and yaw. This leads to some inaccuracy, which grows as the initial and final rays get further apart and as the initial camera pitch gets higher. I'm stuck here, with no idea how to compute a camera orientation without roll.

So, the question is: how can I accurately rotate the camera to bring a specific ray to a specific screen point? (Preferably in a way that fits the scheme I'm using.)

EDIT: There can actually be more than one correct (roll-free) camera orientation that projects the world ray to a specific point on the screen. It seems there are no more than two, provided the ray doesn't point to the nadir or zenith.

Imagine the following example: a ray close to the zenith should be matched with a point in the upper half of the screen. The first camera option is obvious. The second is rotated around the vertical axis by 180 degrees and has a higher pitch. In the first option the zenith is projected on the screen above the lock point, and in the second it shows below.

In this ambiguous case, the option closest to the initial camera direction should be chosen.

jjrv · Accepted Answer · 2016-10-13T11:23:03

Solving this for similar panorama purposes took quite a while.

The complication here is how a rotation matrix is constructed from the camera lookat direction and an up vector which fixes the roll orientation. The matrix ends up projecting your regular X, Y and Z axes so that:

Z matches lookat direction
X is perpendicular to Z and the up vector.
Y is perpendicular to X and Z to make the axes orthonormal.

Like this:

axisZ = direction;
axisX = normalize(cross(up, direction));
axisY = cross(direction, axisX);

Since up and direction are generally not perpendicular, their cross product is not unit length, so we need to normalize the X axis, which means dividing by a square root.
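As a minimal sketch of the construction above, here it is in plain JavaScript with arrays instead of three.js vectors. The helper names (cross, normalize, cameraBasis) are illustrative and not part of three.js, and the convention (Z along the lookat direction) is the one used in this answer, which may differ from what three.js uses internally.

```javascript
// Plain-array vector helpers; no three.js dependency.
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const normalize = (a) => {
  const len = Math.hypot(a[0], a[1], a[2]);
  return a.map((c) => c / len);
};

// Build the camera axes from a unit lookat direction and an up vector.
// Undefined when direction is parallel to up (the cross product is zero).
function cameraBasis(direction, up = [0, 1, 0]) {
  const axisZ = direction;
  const axisX = normalize(cross(up, direction)); // the square root lives here
  const axisY = cross(direction, axisX);         // already unit length
  return { axisX, axisY, axisZ };
}

const basis = cameraBasis(normalize([1, 2, 3]));
console.log(basis);
```

The three axes come out mutually perpendicular and unit length, which is what allows the transpose to act as the inverse later on.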

We put those in the rows or columns of a matrix (depending on which way you multiply and if vectors are rows or columns) and get an equation like:

v = M_rotation w or view = M_rotation * world

You can expand all the terms to get equations for the X, Y and Z components of view, and try to extract the components of direction. Because of the square root, you get a higher-degree polynomial system of 3 equations whose variables nearly all refer to each other. Since the axes are orthonormal (direction is a unit vector), you can use Z² = 1 - X² - Y² to eliminate one variable and one equation, but the two resulting polynomials are 4th degree and share two variables. I was unable to solve it at first, but also accidentally tried:

w = M^-1_rotation v = M^T_rotation v

Reversing the camera direction swaps input and output, and transposing the rotation matrix makes the equations look totally different.
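To make the two directions concrete, here is a small sketch, assuming the camera axes are the rows of M_rotation. The function names worldToView and viewToWorld and the sample basis are made up for illustration.

```javascript
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// view = M * world, with the camera axes as the rows of M.
function worldToView(w, { axisX, axisY, axisZ }) {
  return [dot(axisX, w), dot(axisY, w), dot(axisZ, w)];
}

// world = M^T * view: a weighted sum of the axes.
// The transpose equals the inverse because the axes are orthonormal.
function viewToWorld(v, { axisX, axisY, axisZ }) {
  return [0, 1, 2].map(
    (i) => v[0] * axisX[i] + v[1] * axisY[i] + v[2] * axisZ[i]
  );
}

// Axes for a camera looking along world +X with up = +Y,
// worked out by hand from the cross products in this answer.
const lookingAlongX = {
  axisX: [0, 0, -1],
  axisY: [0, 1, 0],
  axisZ: [1, 0, 0],
};

// View-space "forward" maps to the lookat direction, and back again.
const world = viewToWorld([0, 0, 1], lookingAlongX);
const back = worldToView(world, lookingAlongX);
console.log(world, back);
```

In the transposed form each world component is a sum of view components weighted by one coordinate of each axis, which is why the resulting equations look so different from the forward ones.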

If you then fix the up vector to the Y axis (you can always rotate the world before and after to allow arbitrary up directions later), you can eventually get to 2 reasonable equations:

(v_y * (1 - d_y²))² - (w_y - v_z * d_y)² * (1 - d_y²) = 0

(v_x * sqrt(1 - d_y² - d_x²) - v_y * d_y * d_x)² - (w_x - v_z * d_x)² * (1 - d_y²) = 0
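For readers who want to see where equations of this shape come from, here is a short derivation sketch. It assumes the axes defined earlier are the rows of M_rotation, that up = (0, 1, 0), and it introduces the shorthand s = sqrt(1 - d_y²), which is not a symbol used elsewhere in this answer.

```latex
\begin{aligned}
\mathrm{axisX} &= \tfrac{1}{s}\,(d_z,\ 0,\ -d_x), \qquad s = \sqrt{1 - d_y^2},\\
\mathrm{axisY} &= \tfrac{1}{s}\,(-d_x d_y,\ 1 - d_y^2,\ -d_y d_z),\\
\mathrm{axisZ} &= (d_x,\ d_y,\ d_z),\\[4pt]
w = M^{T} v &= v_x\,\mathrm{axisX} + v_y\,\mathrm{axisY} + v_z\,\mathrm{axisZ},\\[4pt]
w_y - v_z d_y &= v_y\, s,\\
(w_x - v_z d_x)\, s &= v_x d_z - v_y d_x d_y .
\end{aligned}
```

Squaring the last two lines, multiplying the first of them by (1 - d_y²), and substituting d_z = sqrt(1 - d_y² - d_x²) gives a pair of equations in d_y and d_x only, with v_z multiplying d_y in the first and d_x in the second.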

Now the first one is a biquadratic equation symbolically solvable by hand for d_y, the Y component of the camera lookat direction. Solving the second one for d_x with Wolfram Alpha resulted in 4 polynomials with about 300 terms each. Defining some helper variables then produced a reasonable algorithm. It's not short or fast (don't put it in a vertex shader), but definitely fulfills the purpose of reacting intuitively to mouse movements. None of the heuristics I came up with worked equally well.
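As a rough illustration of the first step only, the sketch below solves the first equation for d_y. It is not the linked TypeScript implementation. It relies on the observation that, after dividing out one factor of (1 - d_y²), the equation becomes an ordinary quadratic in d_y. The function name solveDirectionY and the tolerances are assumptions made for this example. Squaring can introduce spurious roots, so each candidate is checked against the unsquared relation w_y = v_y * sqrt(1 - d_y²) + v_z * d_y.

```javascript
// Solve v_y^2 (1 - d_y^2) = (w_y - v_z d_y)^2 for d_y.
// v is the unit view-space vector [v_x, v_y, v_z]; wY is the world ray's Y component.
function solveDirectionY(v, wY, eps = 1e-9) {
  const vY = v[1];
  const vZ = v[2];
  // Quadratic a d_y^2 + b d_y + c = 0 obtained by expanding the equation.
  const a = vY * vY + vZ * vZ;
  const b = -2 * wY * vZ;
  const c = wY * wY - vY * vY;
  if (a < eps) return []; // v lies along view X, so w_y does not constrain d_y
  const disc = b * b - 4 * a * c;
  if (disc < -eps) return []; // no roll-free camera can put the ray there
  const root = Math.sqrt(Math.max(disc, 0));
  const candidates =
    root === 0 ? [-b / (2 * a)] : [(-b + root) / (2 * a), (-b - root) / (2 * a)];
  // Keep only roots that satisfy the unsquared relation.
  return candidates.filter((dY) => {
    if (Math.abs(dY) > 1) return false;
    const residual = vY * Math.sqrt(1 - dY * dY) + vZ * dY - wY;
    return Math.abs(residual) < 1e-6;
  });
}

// A point in the upper half of the view matched with a ray close to the zenith:
// both roots (about 0.936 and 0.6) pass the check here.
const roots = solveDirectionY([0, 0.6, 0.8], 0.96);
console.log(roots);
```

When two roots survive, choosing between them (for example, by closeness to the current camera direction) is left to the caller. Recovering d_x from the second equation is the harder part and is not attempted here.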

Here it is in TypeScript.

Note that no lookat direction can rotate the zenith or nadir off the YZ plane of the view, or onto the wrong side of its XZ plane; such a direction simply doesn't exist. That means the user cannot drag points on the screen completely arbitrarily unless you allow changing the up vector as well.
