MegaPixels
The Darkside of Datasets

Labeled Faces in the Wild

Created
2007
Images
13,233
People
5,749
Created From
Yahoo News images
Search available
Searchable

Labeled Faces in The Wild (LFW) is among the most widely used facial recognition training datasets in the world and is the first of its kind to be created entirely from images posted online. The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002 and 2004. Use the tools below to check if you were included in this dataset, or scroll down to read the analysis.

Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

Intro

Three paragraphs describing the LFW dataset in a format that can be easily replicated for the other datasets. Nothing too custom. An analysis of the initial research papers with context relative to all the other dataset papers.

All 5,749 people in the LFW Dataset sorted from most to least images collected.

LFW by the Numbers

Facts

need citations

former President George W. Bush
Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)

People and Companies using the LFW Dataset

This section describes who is using the dataset and for what purposes. It should include specific examples of people or companies with citations and screenshots. This section is followed up by the graph, the map, and then the supplementary material.

The LFW dataset is used by numerous companies for benchmarking algorithms and, in some cases, for training. According to the benchmarking results page [^lfw_results] provided by the authors, more than two dozen companies have contributed their benchmark results.

According to BiometricUpdate.com [^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

According to researchers at the Baidu Research – Institute of Deep Learning, "LFW has been the most popular evaluation benchmark for face recognition, and played a very important role in facilitating the face recognition society to improve algorithm." [^lfw_baidu]

In addition to commercial use as an evaluation tool, all of the faces in the LFW dataset are prepackaged in scikit-learn, a popular machine learning code framework.

Company Country Industries
Aratek China Biometric sensors for telecom, civil identification, finance, education, POS, and transportation

Add 2-4 screenshots of companies mentioning LFW here

"PING AN Tech facial recognition receives high score in latest LFW test results"
"Face Recognition Performance in LFW benchmark"
"The 1st place in face verification challenge, LFW"

In benchmarking, companies use a dataset to evaluate their algorithms, which are typically trained on other data. After training, researchers use LFW as a benchmark to compare their results with those of other algorithms.
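The benchmarking step can be sketched in a few lines of Python. This is a minimal illustration, assuming a face recognition model has already turned each face image into an embedding vector; the function names, the threshold, and the toy vectors are hypothetical and are not taken from LFW or from any company's system. Each pair of faces is scored with cosine similarity, a pair is predicted to show the same person when the score reaches the threshold, and accuracy is the share of pairs predicted correctly.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_accuracy(pairs, threshold):
    """Share of pairs where the same/different prediction matches the label.

    pairs: list of (embedding_a, embedding_b, same_person) tuples.
    """
    correct = 0
    for emb_a, emb_b, same_person in pairs:
        predicted_same = cosine_similarity(emb_a, emb_b) >= threshold
        if predicted_same == same_person:
            correct += 1
    return correct / len(pairs)


# Toy 2-D embeddings, made up for illustration (not real LFW data).
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], True),   # same person, very similar vectors
    ([0.0, 1.0], [0.1, 0.9], True),   # same person, very similar vectors
    ([1.0, 0.0], [0.0, 1.0], False),  # different people, unrelated vectors
    ([1.0, 0.0], [0.9, 0.3], False),  # different people, but similar vectors
]

accuracy = verification_accuracy(toy_pairs, threshold=0.8)
print(accuracy)  # prints 0.75: the last pair is wrongly accepted as a match
```

On a real benchmark run, the pairs would come from LFW itself and the embeddings from the model under test; only the scoring logic is shown here.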

For example, Baidu (est. net worth $13B) uses LFW to report results in their paper "Targeting Ultimate Accuracy: Face Recognition via Deep Embedding". According to the three Baidu researchers who produced it:

Citations

Overall, LFW has at least 456 citations from 123 countries.

Distribution of citations per year per country for the top 5 countries with citations for the LFW Dataset
Geographic distributions of citations for the LFW Dataset

Conclusion

The LFW face recognition training and evaluation dataset is a historically important face dataset as it was the first popular dataset to be created entirely from Internet images, paving the way for a global trend towards downloading anyone’s face from the Internet and adding it to a dataset. As will be evident with other datasets, LFW’s approach has now become the norm.

For all 5,749 people in this dataset, their faces are forever a part of facial recognition history. It would be impossible to remove anyone from the dataset because it is so ubiquitous. For the rest of their lives and forever after, these 5,749 people will continue to be used for training facial recognition surveillance.

Right to Removal

If you are affected by the disclosure of your identity in this dataset, please contact the authors. Many have stated that they are willing to remove images upon request. The authors of the LFW dataset provide the following email for inquiries:

You can use the following message to request removal from the dataset:

To: Gary Huang <gbhuang@cs.umass.edu>

Subject: Request for Removal from LFW Face Dataset

Dear [researcher name],

I am writing to you about the "Labeled Faces in The Wild Dataset". Recently I discovered that your dataset includes my identity and I no longer wish to be included in your dataset.

The dataset is being used by thousands of companies around the world to improve facial recognition software, including usage by governments for the purposes of law enforcement, national security, tracking consumers in retail environments, and tracking individuals through public spaces.

My name as it appears in your dataset is [your name]. Please remove all images from your dataset and inform your newsletter subscribers to likewise update their copies.

- [your name]


Supplementary Data

Researchers, journ

Title Organization Country Type
3D-aided face recognition from videos University of Lyon France edu
A Community Detection Approach to Cleaning Extremely Large Face Database National University of Defense Technology, China China edu

Code

import numpy as np
from sklearn.datasets import fetch_lfw_people
import imageio
import imutils

# download LFW dataset (first run takes a while)
lfw_people = fetch_lfw_people(min_faces_per_person=1, resize=1, color=True, funneled=False)

# introspect dataset
n_samples, h, w, c = lfw_people.images.shape
print('{:,} images at {}x{}'.format(n_samples, w, h))
cols, rows = (176, 76)
n_ims = cols * rows

# build montages
im_scale = 0.5
ims = lfw_people.images[:n_ims]
montages = imutils.build_montages(ims, (int(w * im_scale), int(h * im_scale)), (cols, rows))
montage = montages[0]

# save full montage image
imageio.imwrite('lfw_montage_full.png', montage)

# make a smaller version
montage_960 = imutils.resize(montage, width=960)
imageio.imwrite('lfw_montage_960.jpg', montage_960)

Disclaimer

MegaPixels is an educational art project designed to encourage discourse about facial recognition datasets. Any ethical or legal issues should be directed to the researchers' parent organizations. Except where necessary for contact or clarity, the names of researchers have been substituted by their parent organization. In no way does this project aim to vilify researchers who produced the datasets.

Read more about MegaPixels Code of Conduct