
I'm running face landmark detection using the front-facing camera on iPhone X, and am trying very hard to get 3D points of face landmarks (VNFaceLandmarkRegion2D gives image coordinates X, Y only).

I've been trying to use either ARSCNView.hitTest or ARFrame.hitTest, but so far without success. I think my error may be in converting the initial landmark points to the correct coordinate system. I've tried quite a few permutations; based on my research, this is what I've currently come up with:

let point = CGPoint(x: landmarkPt.x * faceBounds.width + faceBounds.origin.x,
                    y: (1.0 - landmarkPt.y) * faceBounds.height + faceBounds.origin.y)
let screenPoint = CGPoint(x: point.x * view.bounds.width, y: point.y * view.bounds.height)
let results = frame.hitTest(screenPoint, types: ARHitTestResult.ResultType.featurePoint)

I've also tried to do

let newPoint = CGPoint(x: point.x, y: 1.0 - point.y) 

after the conversion, but nothing seems to work. My frame.hitTest result is always empty. Am I missing anything in the conversion?

Does the front-facing camera add another layer of complexity to this? (I also tried inverting the initial X value at one point, in case the coordinate system was being mirrored.) It also seems that the initial landmark normalizedPoints are sometimes negative and sometimes greater than 1.0, which doesn't make any sense to me. I'm using ARSession.currentFrame?.capturedImage to capture the frame of the front-facing camera, in case that's important.

Any help would be very, very appreciated, thanks so much!

-- SOLVED --

For anyone with similar issues: I am finally getting hit test results!

if let points = observation.landmarks?.allPoints?.pointsInImage(imageSize: sceneView.bounds.size) {
    for point in points {
        // Hit test against the face mesh node in the SceneKit scene
        let results = sceneView.hitTest(point, options: [SCNHitTestOption.rootNode: faceNode])
    }
}

I use the face geometry as an occlusion node attached to the face node.
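In case it helps, here is roughly what that occlusion setup looks like; a minimal sketch, assuming sceneView is an ARSCNView and these ARSCNViewDelegate methods are wired up:

// Give the face anchor an ARSCNFaceGeometry that writes only to the
// depth buffer, so it occludes content behind the face without being drawn.
func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    guard anchor is ARFaceAnchor,
          let device = sceneView.device,
          let faceGeometry = ARSCNFaceGeometry(device: device) else { return nil }
    faceGeometry.firstMaterial?.colorBufferWriteMask = []  // depth-only: invisible but occluding
    let faceNode = SCNNode(geometry: faceGeometry)
    faceNode.renderingOrder = -1  // render before other virtual content
    // (Keep a reference to this node if you pass it as the hit test's rootNode.)
    return faceNode
}

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let faceAnchor = anchor as? ARFaceAnchor,
          let faceGeometry = node.geometry as? ARSCNFaceGeometry else { return }
    faceGeometry.update(from: faceAnchor.geometry)  // keep the mesh tracking the face
}

The depth-only material keeps the mesh invisible while still giving the SceneKit hit test real geometry to find.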

Thanks Rickster!

How are you reading the points from the result? – Bhumit Muchhadia
The result is of type [SCNHitTestResult]. Take a look at developer.apple.com/documentation/scenekit/scnhittestresult. You can use result[i].localCoordinates or result[i].worldCoordinates to get the position of a detected point, depending on which coordinate system you want it in. – miweinst
Just curious how accurate you've found this method to be? With the help given in this thread I've been experimenting with drawing some things on a face, using the landmark detection to guide placement, but it typically comes out a bit "off." I'm trying to figure out whether that's just a limitation of the Vision framework or whether I introduced some error in one of the steps (hit test, converting coordinate systems, etc.). – Zack Foster
Hi Zack, this has been my experience too. The Vision landmark detection hasn't been quite accurate enough for my needs on a given frame. I'm trying to find real-world feature measurements, and Vision can give outliers depending on lighting and any motion in the captured frame. However, pro tip: since Vision runs quite fast, I've found that averaging features over many frames (and throwing out outliers) can be accurate to within a couple of millimeters, so this is my current solution until I perhaps switch to a different feature-detection algorithm. – miweinst
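For reference, a minimal sketch of that averaging idea; the type name, window size, tolerance, and outlier rule here are hypothetical choices, not from the thread:

import simd

// Per-landmark temporal averaging with a simple outlier filter:
// keep samples within `tolerance` of the component-wise running median.
struct LandmarkAverager {
    private var samples: [simd_float3] = []
    let capacity = 30             // roughly one second of frames
    let tolerance: Float = 0.005  // 5 mm

    mutating func add(_ point: simd_float3) {
        samples.append(point)
        if samples.count > capacity { samples.removeFirst() }
    }

    // Average of inlier samples; nil until enough data has accumulated.
    var estimate: simd_float3? {
        guard samples.count >= 10 else { return nil }
        let median = simd_float3(medianOf(\.x), medianOf(\.y), medianOf(\.z))
        let inliers = samples.filter { simd_distance($0, median) < tolerance }
        guard !inliers.isEmpty else { return nil }
        return inliers.reduce(simd_float3(), +) / Float(inliers.count)
    }

    private func medianOf(_ axis: KeyPath<simd_float3, Float>) -> Float {
        let sorted = samples.map { $0[keyPath: axis] }.sorted()
        return sorted[sorted.count / 2]
    }
}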

1 Answer


You're using ARFaceTrackingConfiguration, correct? In that case, the featurePoint hit test type won't help you, because feature points are part of world tracking, not face tracking. In fact, just about all of the ARKit hit-testing machinery is specific to world tracking and not useful for face tracking.

What you can do instead is make use of the face mesh (ARFaceGeometry) and face pose tracking (ARFaceAnchor) to work your way from a 2D image point to a 3D world-space (or camera-space) point. There are at least a couple of paths you could go down for that:

  1. If you're already using SceneKit, you can use SceneKit's hit testing instead of ARKit's. That is, you're hit testing against "virtual" geometry modeled in SceneKit, not against a sparse estimate of the real-world environment modeled by ARKit; in this case, the "virtual" geometry of the face mesh comes into SceneKit via ARKit. So you want ARSCNView.hitTest(_:options:) (inherited from SCNSceneRenderer), not hitTest(_:types:). Of course, this means you'll need to be using ARSCNFaceGeometry to visualize the face mesh in your scene, and ARSCNView's node/anchor mapping to make it track the face pose (though if you want the video image to show through, you can make the mesh transparent); otherwise the SceneKit hit test won't have any SceneKit geometry to find.

  2. If you're not using SceneKit, or for some reason can't put the face mesh into your scene, you have all the information you need to reconstruct a hit test against the face mesh. ARCamera has view and projection matrices that tell you the relationship of your 2D screen projection to 3D world space, ARFaceAnchor tells you where the face is in world space, and ARFaceGeometry tells you where each point is on the face; so it's just a bunch of math to get from a screen point to a face-mesh point and vice versa (the first half of that math, building a ray from the screen point, is sketched below).
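Here's a hedged sketch of that ray construction using the ARCamera matrix APIs. The helper name and the viewport/orientation parameters are assumptions about your app; the remaining ray-triangle intersection against ARFaceGeometry (e.g. Möller-Trumbore over its vertices/triangleIndices, transformed by ARFaceAnchor.transform) is left out:

import ARKit
import UIKit
import simd

// Hypothetical helper: turn a screen point into a world-space ray.
// Intersecting this ray with the face mesh triangles yields the 3D face point.
func worldRay(from screenPoint: CGPoint,
              camera: ARCamera,
              viewportSize: CGSize,
              orientation: UIInterfaceOrientation) -> (origin: simd_float3, direction: simd_float3) {
    let projection = camera.projectionMatrix(for: orientation,
                                             viewportSize: viewportSize,
                                             zNear: 0.001, zFar: 1000)
    let view = camera.viewMatrix(for: orientation)
    let clipToWorld = (projection * view).inverse

    // Screen point -> normalized device coordinates (Metal convention: z in 0...1).
    let ndcX = Float(screenPoint.x / viewportSize.width) * 2 - 1
    let ndcY = 1 - Float(screenPoint.y / viewportSize.height) * 2

    // Unproject the same screen point on the near and far clip planes.
    let near = clipToWorld * simd_float4(ndcX, ndcY, 0, 1)
    let far  = clipToWorld * simd_float4(ndcX, ndcY, 1, 1)
    let nearWorld = simd_float3(near.x, near.y, near.z) / near.w
    let farWorld  = simd_float3(far.x, far.y, far.z) / far.w
    return (nearWorld, simd_normalize(farWorld - nearWorld))
}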