Age Estimation using Deep Learning

Figure 1. The age distribution of the IMDb-Wiki dataset [2]

Computer vision has made great strides in recent years, thanks to extensive research into applying neural networks to images. One of these applications is age estimation.

Several factors, such as face orientation, illumination quality, occluded faces, and black-and-white images, can significantly affect the outcome and make accurate age estimation difficult to achieve. However, recent advances in deep learning have achieved state-of-the-art performance on challenging datasets. The computer-vision software company Sighthound [1] claims to have achieved the lowest mean absolute error to date. The company also offers its neural-network expertise as a service to retail businesses, enabling them to gather statistical data about their customers, such as age and gender.

Age estimation is considered a challenging regression problem because of the wide range of possible ages (from 0 to over 100), and it has been researched extensively using different classification methodologies. Moreover, ageing can be viewed as a texture pattern on the face, which can be described using features such as local binary patterns, biologically inspired features, and those learned by convolutional neural networks (CNNs). CNNs have surged in popularity in recent years owing to their outstanding performance in face recognition.

Figure 2. Different examples of age estimation using the VGG-Face transfer learning model

This project investigated the use of deep learning for age estimation. Instead of creating a CNN and training it from scratch, the study draws on transfer learning, using a pre-trained model called VGG-Face, which has been trained on thousands of images for face recognition. A support-vector regression (SVR) model was then trained on the features output by the CNN to produce the final age estimate, thereby adapting the VGG-Face representation to the age-estimation task. The study used over 60,000 images to train and test the system, which achieved an accuracy of ±6 years. Although this number of images is not sufficient for training a neural network from scratch, it is adequate for the purpose of transfer learning.
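The regression stage described above can be illustrated with a minimal sketch. It is not the project's actual code: it assumes scikit-learn's SVR, replaces the VGG-Face descriptors with synthetic placeholder vectors, and picks a linear kernel, a 64-dimensional feature size, and hyperparameters purely for illustration. The helper names (make_placeholder_features, train_age_regressor, mean_absolute_error_years) are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_placeholder_features(n_samples=400, n_dims=64, seed=0):
    """Synthetic stand-in for CNN descriptors (assumption, not real data).

    Returns random vectors in which two dimensions carry an age signal,
    together with made-up age labels in the range 0 to 100.
    """
    rng = np.random.default_rng(seed)
    ages = rng.uniform(0, 100, size=n_samples)
    features = rng.normal(size=(n_samples, n_dims))
    features[:, 0] += ages / 10.0          # linear age signal
    features[:, 1] += np.sqrt(ages) / 2.0  # weaker, non-linear age signal
    return features, ages


def train_age_regressor(features, ages):
    """Fit a standardiser followed by an SVR on fixed feature vectors."""
    # Kernel and hyperparameters are illustrative choices, not the study's.
    model = make_pipeline(
        StandardScaler(),
        SVR(kernel="linear", C=10.0, epsilon=1.0),
    )
    model.fit(features, ages)
    return model


def mean_absolute_error_years(model, features, ages):
    """Average absolute gap, in years, between predicted and true ages."""
    predictions = model.predict(features)
    return float(np.mean(np.abs(predictions - ages)))


# In the real pipeline, `features` would hold one CNN descriptor per face.
features, ages = make_placeholder_features()
X_train, X_test, y_train, y_test = train_test_split(
    features, ages, test_size=0.25, random_state=0
)
regressor = train_age_regressor(X_train, y_train)
mae = mean_absolute_error_years(regressor, X_test, y_test)
print(f"MAE on held-out placeholder data: {mae:.1f} years")
```

Because the CNN is used only as a fixed feature extractor, only the comparatively small SVR has to be fitted, which is one reason a dataset of this size can suffice for transfer learning. The error printed here reflects the synthetic data only and says nothing about the results reported above.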

References/Bibliography:

[1] Sighthound https://www.sighthound.com/technology/face-detection/benchmarks/bao-and-afw

[2] IMDb-Wiki dataset https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

Student: Daniel Camilleri
Course: B.Sc. IT (Hons.) Computer Engineering
Supervisor: Dr. Ing. Reuben Farrugia