For a computer vision system, recognizing human faces from single images is made difficult by variations in face position, size, expression, and pose (frontal, profile, ...). We present an automatic system that recognizes human faces from single grey-level mug shots matched against a large data set containing one image per person (100-250 persons). The system's performance and robustness are assessed and compared with those of other systems.
2D views of faces are represented by labeled graphs: graph nodes are labeled with jets, and graph edges are labeled with distance vectors. Jets are 80-dimensional vectors based on a 2D Gabor-wavelet transform. We use 40 complex Gabor wavelets, i.e. localized filters, each tuned to a particular spatial frequency and orientation (8 orientations and 5 spatial frequencies). A jet is a concise and robust representation of the local grey-level distribution around an image point.
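The jet computation described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the kernel constants (`sigma`, the frequency spacing, the 33x33 support) are assumptions chosen to match the common DC-free Gabor form, and the jet is assembled as 40 magnitudes followed by 40 phases to give the 80 dimensions mentioned in the text.

```python
import numpy as np

def gabor_kernel(k, sigma=2 * np.pi, size=33):
    """Complex Gabor kernel with wave vector k = (kx, ky).

    DC-free Gabor form; the exact constants here are illustrative.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2 = k[0] ** 2 + k[1] ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    wave = np.exp(1j * (k[0] * x + k[1] * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * wave

def jet(image, px, py, n_freq=5, n_orient=8, size=33):
    """80-dimensional jet (40 magnitudes, 40 phases) of the complex
    Gabor responses at pixel (px, py); the pixel must lie far enough
    from the image border for the full kernel support to fit."""
    half = size // 2
    patch = image[py - half:py + half + 1, px - half:px + half + 1]
    coeffs = []
    for nu in range(n_freq):                      # 5 spatial frequencies
        k_mag = np.pi * 2.0 ** (-(nu + 2) / 2.0)  # assumed frequency spacing
        for mu in range(n_orient):                # 8 orientations
            phi = mu * np.pi / n_orient
            k = (k_mag * np.cos(phi), k_mag * np.sin(phi))
            coeffs.append(np.sum(patch * gabor_kernel(k, size=size)))
    coeffs = np.array(coeffs)                     # 40 complex coefficients
    return np.concatenate([np.abs(coeffs), np.angle(coeffs)])
```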
Since we want to compare faces across poses, we define a set of about 45 facial (or fiducial) points at which nodes are positioned. These points correspond across different poses (to the extent that they are visible) and include, for example, the tip of the nose, the corners of the mouth, the pupils, and the tip of the chin. The graphs thus form an object-adapted grid, and compatible nodes in different views can be compared with each other.
To be able to process a large number of new faces, we have introduced a special graph structure, the bunch graph. It is constructed from a representative set (typically 70) of model graphs having the same pose (e.g. frontal) and the same graph structure (i.e. the same set of fiducial points as nodes and the same set of edges between them). Each node of the bunch graph is then labeled with all the jets taken from the models at the corresponding fiducial point. For instance, in frontal view, the left-eye node of the face bunch graph carries the left-eye jets of all frontal-view model graphs. New faces can then be encoded by taking jets from different models at each node, e.g. the left-eye jet from model 3 and the nose jet from model 25. This exploits the full combinatorial power of the bunch graph and makes it possible to represent and process faces never seen before.
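The bunch-graph idea can be sketched as a simple data structure: each fiducial node holds a bunch of jets (one per model), and a new face is encoded by choosing, independently at each node, the best-fitting model jet. The class, the node names, and the magnitude-only jet similarity are assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

def jet_similarity(j1, j2):
    """Magnitude-based jet similarity: normalized dot product of the
    first 40 entries (the Gabor magnitudes); phases are ignored in
    this simplified variant."""
    a, b = j1[:40], j2[:40]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

class BunchGraph:
    """One pose-specific bunch graph: each fiducial node holds the
    jets of all model faces at that point (names are illustrative)."""

    def __init__(self, node_names):
        self.bunches = {name: [] for name in node_names}

    def add_model(self, model_jets):
        """model_jets: dict mapping fiducial name -> 80-dim jet."""
        for name, j in model_jets.items():
            self.bunches[name].append(j)

    def encode(self, image_jets):
        """Encode a new face by picking, independently per node, the
        index of the best-fitting model jet, e.g. the left-eye jet
        from one model and the nose jet from another."""
        choice = {}
        for name, j in image_jets.items():
            sims = [jet_similarity(j, m) for m in self.bunches[name]]
            choice[name] = int(np.argmax(sims))
        return choice
```

Because each node's choice is independent, a bunch of 70 models over 45 nodes spans 70^45 combinations, which is the combinatorial power the text refers to.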
For new pictures, new graphs are generated automatically by elastic bunch-graph matching, guided by a simple similarity function between a face bunch graph and an image graph. This similarity function accounts for spatial distortion and is based on the average similarity between image jets and the best-fitting jets in each bunch. Once generated, the new graph is compared with the stored graphs, and the model with the highest average similarity is taken as the recognized person. This procedure also works between different views (e.g. recognizing a half-profile against a database of frontal views), because jets representing the same fiducial point across views can be associated.
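A simplified version of such a similarity function can be sketched as follows: the mean best-fitting jet similarity per node, minus a penalty on the distortion of the edge distance vectors. The distortion weight `lam`, the magnitude-only jet similarity, and all names are assumptions of this sketch rather than the paper's exact cost function.

```python
import numpy as np

def jet_similarity(j1, j2):
    # Normalized dot product of the 40 Gabor magnitudes.
    a, b = j1[:40], j2[:40]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def bunch_similarity(image_jets, image_points, bunch_jets, bunch_points,
                     edges, lam=2.0):
    """Similarity between an image graph and a bunch graph.

    image_jets / bunch_jets: dict node -> jet (image) or list of jets
    (bunch); image_points / bunch_points: dict node -> (x, y);
    edges: list of (node_a, node_b) pairs. The value of lam is an
    assumed weighting of the spatial-distortion term.
    """
    # Per node, take the best-fitting jet in the bunch, then average.
    node_term = np.mean([
        max(jet_similarity(image_jets[n], m) for m in bunch_jets[n])
        for n in image_jets])

    def edge_vec(points, a, b):
        return np.asarray(points[b], float) - np.asarray(points[a], float)

    # Mean squared deviation of image edge vectors from bunch edge vectors.
    edge_term = np.mean([
        np.sum((edge_vec(image_points, a, b) - edge_vec(bunch_points, a, b)) ** 2)
        for a, b in edges])
    return node_term - lam * edge_term
```

During matching, candidate node positions are varied and the jets recomputed so as to maximize this similarity; the resulting image graph can then be compared node-by-node with each stored model graph, and the gallery entry with the highest average jet similarity is reported as the recognized person.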
On the ARPA/ARL FERET database, raw recognition rates against galleries of 250 persons are 98% for frontal poses, 84% for profiles, and 57% for half-profiles. Performance degrades to the 10-20% range for recognition across poses.
Keywords: face recognition, different poses, Gabor wavelets, elastic graph matching, bunch graph, ARPA/ARL FERET database