0
votes

I'm learning all I can about machine learning, specifically on iOS. I found the OpenFace model converted into a .mlmodel, and I can successfully run it through Vision and get a 128-dimensional vector representation of each face.

First, I create the Vision model object from the Core ML model in my project. I also construct the VNCoreMLRequest from that model and assign a function as its completion handler.

let openFaceModel = try! VNCoreMLModel(for: OpenFace().model)
// lazy, because a stored property initializer cannot refer to self
lazy var request = VNCoreMLRequest(model: self.openFaceModel, completionHandler: self.visionResults)

Second, I get the CMSampleBuffer delivered to me from the camera. I use it to perform the request.

func stream(_ sampleBuffer: CMSampleBuffer) throws {
    // Extract the pixel buffer from the camera sample; fail if there is none.
    guard let cvBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        throw CMBufferProcessorError.cvPixelBufferConversionFailed
    }

    let handler = VNImageRequestHandler(cvPixelBuffer: cvBuffer, options: [:])

    do {
        try handler.perform([self.request])
    } catch {
        print(error)
    }
}

Finally, the function I assigned as the completion handler for the VNCoreMLRequest gets called with the results.

func visionResults(request: VNRequest, error: Error?) {
    guard let features = request.results as? [VNCoreMLFeatureValueObservation] else {
        print("No Results")
        return
    }

    print("Feature Count: \(features.count)")

    for feature in features {
        quickLog(title: "Feature Type", message: "\(feature.featureValue.type.rawValue)")
        quickLog(title: "Feature Value", message: "\(feature.featureValue.multiArrayValue)")
    }
}

I'm successfully retrieving the 128-dimensional multi array. Now I have two questions based on two observations.

I observed that I get a unique vector back even when there are no faces in the frame.

1) Is this desired behavior? If so how do I filter for a multi array result that represents the absence of a face?

I observed that I only ever get back one result, even if there are multiple faces in the frame.

2) Is this expected behavior for this model?

Thank you for the help!

2 Answers

1
votes

Not sure exactly which model you're using (link?) but if it has been trained only on single faces (and not multiple faces or the absence of faces) then using the model on more than one face at a time or on no faces at all will give useless predictions. In that case you're using the model on so-called out-of-distribution data, i.e. on things it was not trained to detect. Most deep learning models are not trustworthy when used on such OoD data.

You could combine this with Vision's face detection features: first run a face detection request on the image, then crop out that area of the image, and run your OpenFace model on each of these crops (once for each separate image). If there are no faces detected, you don't need to run OpenFace.
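A rough sketch of that two-step pipeline might look like the following. It assumes `embeddingRequest` is the VNCoreMLRequest from the question, and the function name `embedFaces` is made up for illustration. Instead of cropping the image by hand, it sets the request's `regionOfInterest` to each detected face rectangle, which restricts the model input to that region. It does not align the faces.

```swift
import Vision

// Sketch only: `embeddingRequest` is assumed to be the VNCoreMLRequest from the question;
// `embedFaces` is a hypothetical helper name.
func embedFaces(in pixelBuffer: CVPixelBuffer, using embeddingRequest: VNCoreMLRequest) throws {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])

    // Step 1: detect face rectangles in the full frame.
    let faceRequest = VNDetectFaceRectanglesRequest()
    try handler.perform([faceRequest])
    guard let faces = faceRequest.results as? [VNFaceObservation], !faces.isEmpty else {
        return  // no faces detected, so OpenFace is never run
    }

    // Step 2: run the embedding model once per detected face.
    let unitRect = CGRect(x: 0, y: 0, width: 1, height: 1)
    for face in faces {
        // boundingBox is in normalized image coordinates; clamp it to the unit
        // square, since regionOfInterest must lie inside the image.
        embeddingRequest.regionOfInterest = face.boundingBox.intersection(unitRect)
        try handler.perform([embeddingRequest])
    }
}
```

With this arrangement the completion handler fires once per face, so a frame with three faces yields three 128-dimensional vectors and a frame with none yields no call at all.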

0
votes

OpenFace works on a single face image; that's what it has been trained for. It doesn't check whether the input image contains a face. It also needs the cropped face images to be aligned on the eyes and nose, so that these landmarks are in the same place in every image.

OpenFace normalizes each face before training so each face image that goes into the model has the eyes and nose at the same location. This allows it to be trainable with fewer images. It has fewer parameters than FaceNet, which means it runs faster and requires less space on the disk.

The OpenFace model works like this: it takes a face image as input and produces a vector of 128 values as output. These vectors can be used to compare and identify faces, because each face maps to its own vector. Imagine placing every face at a unique location in a cube (3D), but in 128 dimensions. This way you can check the distance between two faces: if it is below a threshold (0.99), you can say the two pictures belong to the same person; if it is above the threshold, you can say the images belong to two different people. The distance measure is simply the sum of the squared differences between corresponding components (the squared Euclidean distance).
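As a minimal sketch of that comparison, assuming the embeddings have already been copied into plain `[Double]` arrays (the function names here are made up for illustration, and 0.99 is just the threshold quoted above, so expect to tune it on your own data):

```swift
// Squared Euclidean distance between two embeddings of equal length.
func squaredDistance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "embeddings must have the same length")
    var sum = 0.0
    for (x, y) in zip(a, b) {
        let d = x - y
        sum += d * d
    }
    return sum
}

// Treats two embeddings as the same person when their squared distance
// falls below the threshold (0.99 is the value quoted in this answer).
func isSamePerson(_ a: [Double], _ b: [Double], threshold: Double = 0.99) -> Bool {
    return squaredDistance(a, b) < threshold
}
```

An MLMultiArray result can be turned into such an array with something like `(0..<array.count).map { array[$0].doubleValue }` before calling `isSamePerson`.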