Face recognition in Tag That Photo is actually a combination of two tasks, face detection and face recognition, which work together. Suggestions come from the face recognition step.
In this post we discuss face recognition as a separate step. Refer to the post "Better face detection" for a discussion of planned face detection improvements and the factors that affect face detection.
As of June 2021, we are testing a new machine learning algorithm that is more accurate than our current method (an improvement of more than 20%). The downside is that it is slower and uses a lot of memory, so we are working to minimize those drawbacks while keeping the benefits before we roll it into production. We hope to do this by October 2021. We use face detection and recognition technology from Applied Recognition Corp., which does the advanced R&D to continuously improve the algorithms that ultimately drive the accuracy of Tag That Photo.
How recognition works: the face detection module passes each identified face that meets a certain quality threshold to the recognition module, and we extract the pixels around that face. The extracted pixels are sent through a deep learning algorithm that generates a sequence of numbers (technically a vector of approximately 100 floating-point numbers). In face recognition jargon, this sequence is called a face template. The template is then used for two purposes:
- To compare against all other known people (faces) to see if there is a match. The best match becomes a suggestion. There can be more than one match, but that is rare.
- If no matches are found, then the template is compared with all unknown faces to potentially find an existing cluster of unknown faces that look similar.
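The two-step flow above can be sketched as follows. This is a minimal sketch, not Tag That Photo's or Applied Recognition's actual code: it assumes cosine similarity as the comparison score, a hypothetical match threshold of 0.8, one reference template per known person or unknown cluster, and tiny 3-number templates in place of the roughly 100-number vectors described above. The names `similarity`, `best_match` and `suggest` are illustrative.

```python
import math
from typing import Dict, Hashable, Optional, Sequence, Tuple

Template = Sequence[float]

def similarity(a: Template, b: Template) -> float:
    """Cosine similarity: 1.0 when templates point the same way, lower as they diverge.
    A stand-in for the real comparison score, which the post does not describe."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(template: Template, candidates: Dict[Hashable, Template],
               threshold: float) -> Optional[Hashable]:
    """Return the key of the highest-scoring candidate, or None if it scores below threshold."""
    best_key, best_score = None, float("-inf")
    for key, reference in candidates.items():
        score = similarity(template, reference)
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= threshold else None

def suggest(template: Template,
            known_people: Dict[str, Template],
            unknown_clusters: Dict[int, Template],
            threshold: float = 0.8) -> Tuple[str, Optional[Hashable]]:
    # Step 1: compare with all known people; the best match becomes a suggestion.
    person = best_match(template, known_people, threshold)
    if person is not None:
        return ("person", person)
    # Step 2: no known match, so look for an existing cluster of similar unknown faces.
    cluster = best_match(template, unknown_clusters, threshold)
    if cluster is not None:
        return ("cluster", cluster)
    # Neither step found a match: no suggestion for this face.
    return ("none", None)
```

With `known = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}` and `clusters = {7: [0.0, 0.0, 1.0]}`, the template `[0.9, 0.1, 0.0]` yields `("person", "Alice")`, while `[0.1, 0.0, 0.95]` matches no known person and falls through to `("cluster", 7)`.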
The time-consuming parts of the process are (a) the face detection setup on images that can contain millions of pixels and (b) the generation of the face template. The actual comparison between two face templates is very fast, typically 3 ms (3 thousandths of a second) on current computers. So once the software has a new template, it can quickly compare that template with hundreds of thousands of known and unknown face templates.
The factors affecting face recognition accuracy include lighting, pose, resolution, expression, occlusion (anything blocking the face), weight, age, facial hair, and glasses. As you can see, there are a LOT of things that can prevent the recognition algorithm from identifying a face as a match for another face.
A match implies that the mathematical distance between two face templates is short. The template comparison module actually returns a score, and that score is higher for closer templates. The score is on a statistical scale: the higher the score, the higher the probability that the two faces belong to the same person.
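To illustrate "a higher score for closer templates", here is one hypothetical way to turn a distance into a score between 0 and 1. The Euclidean distance, the logistic curve, and its `midpoint` and `steepness` parameters are all assumptions for illustration; the post does not describe the actual statistical scale used by the comparison module.

```python
import math
from typing import Sequence

def template_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two templates (math.dist requires Python 3.8+)."""
    return math.dist(a, b)

def match_score(distance: float, midpoint: float = 1.0, steepness: float = 5.0) -> float:
    """Hypothetical logistic mapping: higher for shorter distances, exactly 0.5 at
    `midpoint`, approaching 0 as distance grows. The two branches compute the same
    curve but keep math.exp's argument non-positive, so it never overflows."""
    z = steepness * (distance - midpoint)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

The design choice here is simply that the score is monotonic in distance: any such mapping preserves the ranking of candidate matches, and only the choice of threshold depends on the exact curve.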
One thing that can "confuse" the recognition algorithm is having too many faces of a person that are poor representations of that person, for some of the reasons listed above. For example, the person may be wearing sunglasses while extreme sunlight washes out the image. The pixels for that face, and the resulting template, could be relatively distant from a "good" face for that person. When Tag That Photo does a face comparison, the entire group of tagged faces for a person (excluding faces tagged manually) is compared using something similar to an average. In general, the more varied a person's faces are, the less distinctive that person becomes.
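Since the post says the group is compared using "something similar to an average", the sketch below assumes a plain element-wise mean of a person's templates, skipping manually tagged faces as described. `TaggedFace` and `person_template` are hypothetical names, and two-number templates keep the effect easy to see: one poor face drags the average away from the person's good faces.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaggedFace:
    template: List[float]
    tagged_manually: bool = False

def person_template(faces: List[TaggedFace]) -> List[float]:
    """Element-wise mean of a person's templates, skipping manually tagged faces.
    A plain mean is an assumption; the real method is only 'similar to an average'."""
    usable = [f.template for f in faces if not f.tagged_manually]
    if not usable:
        raise ValueError("no usable faces for this person")
    dim = len(usable[0])
    return [sum(t[i] for t in usable) / len(usable) for i in range(dim)]
```

Three good faces at `[1.0, 0.0]` average to `[1.0, 0.0]`; adding one washed-out face at `[0.0, 1.0]` shifts the average to `[0.75, 0.25]`. A new, well-lit face at `[1.0, 0.0]` is then about 0.35 away from the person's average instead of 0, so it looks less like the same person.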
We are considering limiting the total number of "known" faces for a person to a maximum of 50, chosen by their quality scores from face detection. We may allow users to override those faces with other "good" faces that best represent that person in terms of pose, expression, lighting, etc. This will help recognition accuracy on a large database with hundreds of known people and potentially tens of thousands of tagged faces.
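A minimal sketch of the proposed cap, assuming each known face carries a detection-time quality score and a hypothetical `pinned` flag for faces a user has chosen as good representatives. User-chosen faces are kept first, and the remaining slots go to the highest-quality faces. `KnownFace` and `select_reference_faces` are illustrative names, not part of the product.

```python
from dataclasses import dataclass
from typing import List

MAX_KNOWN_FACES = 50  # the cap under consideration above

@dataclass
class KnownFace:
    face_id: int
    quality: float        # detection-time quality score (assumed: higher is better)
    pinned: bool = False  # hypothetical: the user picked this face as a good representative

def select_reference_faces(faces: List[KnownFace], limit: int = MAX_KNOWN_FACES) -> List[KnownFace]:
    """Keep user-pinned faces first, then fill the remaining slots by descending quality."""
    pinned = [f for f in faces if f.pinned]
    others = sorted((f for f in faces if not f.pinned), key=lambda f: f.quality, reverse=True)
    return pinned[:limit] + others[: max(0, limit - len(pinned))]
```

Given 100 faces with qualities 0.00 to 0.99, this keeps the 50 best; pinning a low-quality face keeps it and drops the weakest of the automatically chosen faces instead.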
We are also considering sorting suggestions by popularity, so that your close friends and family appear first. Making better use of screen real estate to show more thumbnails will also help with tagging efficiency.
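One simple way to sort suggestions by popularity, assuming popularity means the number of faces already tagged with each person (the post does not define it). `sort_by_popularity` is a hypothetical helper; ties are broken alphabetically so the order is stable from one session to the next.

```python
from collections import Counter
from typing import Iterable, List

def sort_by_popularity(suggested_names: Iterable[str], tagged_names: Iterable[str]) -> List[str]:
    """Order suggested people by how many faces are already tagged with them (most first)."""
    counts = Counter(tagged_names)  # missing names count as 0
    # Negative count puts the most-tagged people first; the name breaks ties alphabetically.
    return sorted(suggested_names, key=lambda name: (-counts[name], name))
```

For example, if `"Mom"` has 40 tagged faces, `"Sam"` 25, and `"Dr. Lee"` 2, then sorting the suggestions `["Dr. Lee", "Sam", "Mom", "Zoe"]` gives `["Mom", "Sam", "Dr. Lee", "Zoe"]`; people with no tags yet (`Zoe`) sort last.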