MegaPixels
Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition.
It includes 13,233 images of 5,749 people copied from the Internet during 2002-2004.
Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

Labeled Faces in the Wild

Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition" [1]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com [3], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002 and 2004. LFW is a subset of Names and Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

The Names and Faces dataset was the first face recognition dataset created entirely from online photos. However, Names and Faces and LFW are not the first face recognition datasets created entirely "in the wild". That title belongs to the UCD dataset. Obtaining images "in the wild" means using an image without the explicit consent or awareness of the subject or photographer.

Biometric Trade Routes

To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Each line indicates an organization (educational, commercial, or governmental) that has cited the LFW dataset in its research. Data is compiled from Semantic Scholar.

Synthetic Faces

To visualize the types of photos in the dataset without explicitly publishing individuals' identities, a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset.
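As a rough illustration of what "interpolating" in a latent space can mean, the sketch below blends linearly between two latent vectors. It is not the method used for the video: the 512-dimensional latent size, the random vectors, and the helper name `interpolate_latents` are all assumptions made for the example, and no generator network is included.

```python
import numpy as np


def interpolate_latents(z_start, z_end, n_steps):
    """Return n_steps latent vectors evenly spaced on the line from z_start to z_end."""
    # Column of blend weights running from 0.0 (all z_start) to 1.0 (all z_end).
    weights = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - weights) * z_start + weights * z_end


# Two stand-in latent vectors (hypothetical; a trained GAN would define its own latent size).
rng = np.random.default_rng(0)
z_a = rng.standard_normal(512)
z_b = rng.standard_normal(512)

# One latent vector per video frame; each would be passed to a trained generator.
frames = interpolate_latents(z_a, z_b, n_steps=30)
print(frames.shape)
```

Each row of `frames` would be fed to a trained generator to produce one synthetic face, so consecutive frames drift smoothly from one face to the next.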

Citations

Browse or download the geocoded citation data collected for the LFW dataset.

Additional Information


 former President George W. Bush
 Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
All 5,379 faces in the Labeled Faces in The Wild Dataset

Code

The LFW dataset is so widely used that scikit-learn, a popular Python machine learning library, includes a function called fetch_lfw_people that downloads the faces in the LFW dataset.

#!/usr/bin/python

import numpy as np
from sklearn.datasets import fetch_lfw_people
import imageio
import imutils

# download LFW dataset (first run takes a while)
lfw_people = fetch_lfw_people(min_faces_per_person=1, resize=1, color=True, funneled=False)

# introspect dataset
n_samples, h, w, c = lfw_people.images.shape
print(f'{n_samples:,} images at {w}x{h} pixels')
cols, rows = (176, 76)
n_ims = cols * rows

# build montages
im_scale = 0.5
ims = lfw_people.images[:n_ims]
montages = imutils.build_montages(ims, (int(w * im_scale), int(h * im_scale)), (cols, rows))
montage = montages[0]

# save full montage image
imageio.imwrite('lfw_montage_full.png', montage)

# make a smaller version
montage = imutils.resize(montage, width=960)
imageio.imwrite('lfw_montage_960.jpg', montage)
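The per-person image counts quoted on this page, such as Colin Powell (236), could be checked with a short helper like the one below. This is a minimal sketch: `top_identities` is a hypothetical helper name, and it assumes the `target` and `target_names` arrays that fetch_lfw_people returns alongside the images. The toy arrays stand in for the real dataset so the example runs without a download.

```python
import numpy as np


def top_identities(target, target_names, k=4):
    """Return the k most frequent identities as (name, image_count) pairs."""
    # Count how many images carry each integer identity label.
    counts = np.bincount(target, minlength=len(target_names))
    # Sort label indices by count, largest first, and keep the top k.
    order = np.argsort(counts)[::-1][:k]
    return [(str(target_names[i]), int(counts[i])) for i in order]


# Toy stand-in for the dataset: three identities with 1, 2 and 3 images.
toy_target = np.array([0, 1, 1, 2, 2, 2])
toy_names = np.array(['Person A', 'Person B', 'Person C'])
print(top_identities(toy_target, toy_names, k=2))
```

With the dataset loaded as in the script above, calling `top_identities(lfw_people.target, lfw_people.target_names)` would list the most photographed people in LFW.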

Supplementary Material

Text and graphics ©Adam Harvey / megapixels.cc