|
------------
status: published
title: Labeled Faces in The Wild
desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition.
subdesc: It includes 13,233 images of 5,749 people copied from the Internet during 2002-2004.
image: assets/background.jpg
slug: lfw
color: #ff0000
published: 2019-2-23
updated: 2019-2-23
authors: Adam Harvey
------------
### sidebar
+ Created: 2002 – 2004
+ Images: 13,233
+ Identities: 5,749
+ Origin: Yahoo! News Images
+ Used by: Facebook, Google, Microsoft, Baidu, Tencent, SenseTime, Face++, CIA, NSA, IARPA
+ Website: <a href="http://vis-www.cs.umass.edu/lfw">umass.edu</a>
- There are about 3 men for every 1 woman in the LFW dataset[^lfw_www]
- The person with the most images is [George W. Bush](http://vis-www.cs.umass.edu/lfw/person/George_W_Bush_comp.html) with 530
- There are about 3 George W. Bush's for every 1 [Tony Blair](http://vis-www.cs.umass.edu/lfw/person/Tony_Blair.html)
- The LFW dataset includes over 500 actors, 30 models, 10 presidents, 124 basketball players, 24 football players, 11 kings, 7 queens, and 1 [Moby](http://vis-www.cs.umass.edu/lfw/person/Moby.html)
- In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times
- The word "future" appears 71 times
- \* denotes partial funding for related research
## Labeled Faces in the Wild
*Labeled Faces in The Wild* (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition"[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."
The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002 and 2004. LFW is a subset of *Names and Faces* and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...
The *Names and Faces* dataset was the first face recognition dataset created entirely from online photos. However, *Names and Faces* and *LFW* are not the first face recognition datasets created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Obtaining images "in the wild" means using an image without the explicit consent or awareness of the subject or photographer.
{% include 'map.html' %}
{% include 'supplementary_header.html' %}
{% include 'citations.html' %}
### Commercial Use
Add a paragraph about how usage extends far beyond academia into the research centers of the largest companies in the world, and even funnels into CIA-funded research in the US and defense industry usage in China.
```
load_file assets/lfw_commercial_use.csv
name_display, company_url, example_url, country, description
```
Research, text, and graphics ©Adam Harvey / megapixels.cc
-------
Ignore text below these lines
-------
### Research
- "In our experiments, we used 10000 images and associated captions from the Faces in the wild data set [3]."
- "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249."
- From: "People-LDA: Anchoring Topics to People using Face Recognition" <https://www.semanticscholar.org/paper/People-LDA%3A-Anchoring-Topics-to-People-using-Face-Jain-Learned-Miller/10f17534dba06af1ddab96c4188a9c98a020a459> and <https://ieeexplore.ieee.org/document/4409055>
- This paper was presented at IEEE 11th ICCV conference Oct 14-21 and the main LFW paper "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments" was also published that same year
- 10f17534dba06af1ddab96c4188a9c98a020a459
- This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010.
- From "Labeled Faces in the Wild: Updates and New Reporting Procedures"
- 70% of people in the dataset have only 1 image and 29% have 2 or more images
- The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu]
- The LFW dataset is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan]
- All images in LFW dataset were obtained "in the wild" meaning without any consent from the subject or from the photographer
- The faces in the LFW dataset were detected using the Viola-Jones Haar cascade face detector [^lfw_www] [^lfw_survey]
- The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]
- All images in the LFW dataset were copied from Yahoo News between 2002 and 2004
- In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their follow-up paper [Labeled Faces in the Wild: Updates and New Reporting Procedures](https://www.semanticscholar.org/paper/Labeled-Faces-in-the-Wild-%3A-Updates-and-New-Huang-Learned-Miller/2d3482dcff69c7417c7b933f22de606a0e8e42d4) via IARPA contract number 2014-14071600010
- The dataset includes 2 images of [George Tenet](http://vis-www.cs.umass.edu/lfw/person/George_Tenet.html), the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia
### Footnotes
[^lfw_www]: <http://vis-www.cs.umass.edu/lfw/results.html>
[^lfw_baidu]: Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, Chang Huang. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding. <https://arxiv.org/abs/1506.07310>
[^lfw_pingan]: Lee, Justin. "PING AN Tech facial recognition receives high score in latest LFW test results". BiometricUpdate.com. Feb 13, 2017. <https://www.biometricupdate.com/201702/ping-an-tech-facial-recognition-receives-high-score-in-latest-lfw-test-results>
|