7
votes

We have several apps in the Store that use ARFaceTrackingConfiguration to detect the user's face on iOS devices with FaceID cameras.

As you might have seen, ARKit will also track pictures of faces you put in front of your iPad Pro/iPhone X, as if they were real faces. E.g. take a picture from one of our apps (to replicate, one can download & run Apple's example app for ARFaceTrackingConfiguration):

FaceMask on Musical Angel

Now I have noticed that internally ARKit treats real faces differently than it treats pictures of faces. Generally (for both ARWorldTrackingConfiguration and ARFaceTrackingConfiguration) ARKit tries to match real-world sizes and virtual object sizes, i.e. an object that is 10x10cm in your 3D editing software will match a real-world object of the same 10x10cm. BUT when face tracking is used and the phone detects an abnormally sized face (a small 4cm-wide face as in the picture above, or a poster of a person where the face is much bigger), it will scale the FaceGeometry as if the detected face were a normal-sized head, i.e. the measurements will be around ~14cm for the head width. All virtual objects will then be scaled accordingly, which makes them the wrong size in the real world. Cf. the next picture:

Aviator glasses on musical angel

The glasses' 3D model is about 14cm wide, yet here it is rendered as only a 4cm-wide object.

In comparison, if you put the glasses on a real 3D face, they will be the correct size; on a small person's head (say 12cm wide) they will be slightly too big, and on a big person's head (say 16cm wide) slightly too small (as they will keep their real 14cm width in both cases).
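One way to observe this rescaling directly is to read the extent of the canonical face mesh from the ARFaceAnchor. This is only a sketch (class and method names other than the ARKit APIs are mine, and I haven't verified it on-device): the mesh vertices are reported in metres, so a flat 4cm printed face that ARKit has rescaled should still measure roughly 0.14 m across.

```swift
import ARKit

// Sketch: log the width ARKit assigns to a detected face by taking the
// x-extent of the canonical face mesh's vertices (values are in metres).
final class FaceWidthLogger: NSObject, ARSCNViewDelegate {
    func renderer(_ renderer: SCNSceneRenderer,
                  didUpdate node: SCNNode,
                  for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor else { return }
        let xs = faceAnchor.geometry.vertices.map { $0.x }
        if let minX = xs.min(), let maxX = xs.max() {
            // Expect ~0.14 even for a small picture of a face, because
            // ARKit normalises flat faces to a standard-sized head.
            print("face mesh width: \(maxX - minX) m")
        }
    }
}
```

To use it, keep an instance alive and assign it as the ARSCNView's delegate.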

I can even see ARKit switching between:

  1. Flat Face-detection using just the camera image
  2. Face-Detection using the FaceID TrueDepth camera.

which is especially prominent when you hold a baby in front of the app. With a baby's head, ARKit will first scale everything up so that the baby's head is 14cm wide in the virtual scene and the glasses fit like they would on an adult. Then, usually 1-2s after the head appears in the camera, ARFaceTrackingConfiguration will switch from mode (1) to mode (2) and show the real size of the 3D object, which leads to super-cute pictures of small baby heads with adult-sized glasses (not shown here, as SO isn't for sharing baby pictures).

So, now for the question:

Is there a way of determining whether ARKit is in mode 1 or 2 ?

1
Hi, did you find anything about this? I have the same problem. I couldn't find anything but this question. – Ensar Bayhan
Sadly, there is still no official API; @Andy's answer below is still your best bet. – Bersaelor

1 Answer

2
votes

There's no way to do it in the ARKit 3.0 API at the moment.

An ARKit session running ARFaceTrackingConfiguration constantly gets data from the motion sensors at 1000 Hz, from the front RGB camera at 60 Hz, and from the IR camera at 15 Hz. The TrueDepth sensor is active the whole time the session is running; you can't manually stop it in ARKit.


The working distance of ARFaceTrackingConfiguration is approximately 15...100 cm, and within that range you can detect up to 3 faces in ARKit 3.0. But there's a logical quirk in ARKit's face detection – you can track your own face at the same time as a big face on a poster behind you (though the face on the poster is flat, because it has equidistant depth). So a canonical mask's scale depends on the size of the detected face (as you said before), but ARKit can't momentarily adapt the scale of that canonical mask (ARFaceGeometry), because face tracking is very CPU-intensive.
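To actually track several faces at once you have to opt in. A minimal sketch (assuming you already hold an ARSession named `session`):

```swift
import ARKit

// Sketch: ARKit 3 tracks one face by default; raise the limit to whatever
// the device supports (3 on current TrueDepth hardware).
let configuration = ARFaceTrackingConfiguration()
configuration.maximumNumberOfTrackedFaces =
    ARFaceTrackingConfiguration.supportedNumberOfTrackedFaces
session.run(configuration)
```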

Apple's TrueDepth module has such a narrow working-distance range because the 30K dots coming from the IR projector must have a definite brightness, blurriness, coverage and dot size to be effectively used by ARKit.

With this code you can test whether the TrueDepth module is involved in the process:

@available(iOS 13.0, *)
class ViewController: UIViewController {

    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()

        sceneView.session.delegate = self
    }
}

extension ViewController: ARSessionDelegate {

    // Called for every RGB frame (60 Hz); `capturedDepthData` is non-nil
    // only on the frames that also carry TrueDepth data (~15 Hz).
    func session(_ session: ARSession, didUpdate frame: ARFrame) {

        print(frame.capturedDepthData?.depthDataQuality as Any)
    }
}

Usually, every fourth frame is printed with depth data (but sometimes the gap is bigger than 4 frames):


There's only one case when the TrueDepth sensor doesn't contribute to the RGB data: when you move the smartphone too close to a poster or too close to your face – then you'll only see nil printed.
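Building on that observation, here's a heuristic sketch for the original question (the class and threshold are mine, not an official API): depth frames normally arrive about every 4th RGB frame, so if no depth data has been seen for much longer than that, assume ARKit has fallen back to the flat, camera-only face path (the asker's mode 1).

```swift
import ARKit

// Heuristic sketch: infer "mode 1 vs mode 2" from the cadence of
// capturedDepthData. Not an official API; the threshold is a guess.
final class DepthModeEstimator {
    private var framesSinceDepth = 0
    private(set) var likelyUsingTrueDepth = true

    // Feed every frame from session(_:didUpdate:) into this.
    func observe(_ frame: ARFrame) {
        if frame.capturedDepthData != nil {
            framesSinceDepth = 0
            likelyUsingTrueDepth = true
        } else {
            framesSinceDepth += 1
            // Depth normally arrives every ~4 frames; 15 without any
            // suggests the TrueDepth data isn't being used right now.
            if framesSinceDepth > 15 { likelyUsingTrueDepth = false }
        }
    }
}
```

Tune the threshold against your own session logs, since the gap between depth frames varies.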