<!doctype html>
<html>
<head>
<title>MegaPixels</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey" />
<meta name="description" content="LFW: Labeled Faces in The Wild" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<link rel='stylesheet' href='/assets/css/fonts.css' />
<link rel='stylesheet' href='/assets/css/css.css' />
</head>
<body>
<header>
<a class='slogan' href="/">
<div class='logo'></div>
<div class='site_name'>MegaPixels</div>
<span class='sub'>The Darkside of Datasets</span>
</a>
<div class='links'>
<a href="/search/">Face Search</a>
<a href="/datasets/">Datasets</a>
<a href="/research/">Research</a>
<a href="/about/">About</a>
</div>
</header>
<div class="content">
<section><h1>Labeled Faces in The Wild</h1>
</section><section><div class='meta'><div><div class='gray'>Created</div><div>2007</div></div><div><div class='gray'>Images</div><div>13,233</div></div><div><div class='gray'>People</div><div>5,749</div></div><div><div class='gray'>Created From</div><div>Yahoo News images</div></div><div><div class='gray'>Search available</div><div>Searchable</div></div></div></section><section><p>Labeled Faces in The Wild (LFW) is amongst the most widely used facial recognition training datasets in the world and is the first of its kind to be created entirely from images that were posted online. The LFW dataset includes 13,233 images of 5,749 people, collected between 2002 and 2004. Use the tools below to check whether you were included in this dataset, or scroll down to read the analysis.</p>
<p>{INSERT IMAGE SEARCH MODULE}</p>
<p>{INSERT TEXT SEARCH MODULE}</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/lfw_feature.jpg' alt='Eight out of 5,749 people in the Labeled Faces in the Wild dataset. The face recognition training dataset is created entirely from photos downloaded from the Internet.'><div class='caption'>Eight out of 5,749 people in the Labeled Faces in the Wild dataset. The face recognition training dataset is created entirely from photos downloaded from the Internet.</div></div></section><section><h2>INTRO</h2>
<p>It began in 2002. Researchers at the University of Massachusetts Amherst were developing facial recognition algorithms, and they needed more data. Between 2002 and 2004, they scraped Yahoo News for images of public figures. In 2007, they cleaned up the dataset and repackaged it as Labeled Faces in the Wild (LFW).</p>
<p>Since then, LFW has become one of the most widely used datasets for evaluating face recognition algorithms. The associated research paper, “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments,” has been cited 996 times by researchers in 45 countries.</p>
<p>The faces come from news stories and are mostly of celebrities from the entertainment industry, politicians, and villains. The dataset is a sampling of the current affairs and breaking news of its time. Detached from their original context, the images now serve a new purpose: to train, evaluate, and improve facial recognition.</p>
<p>Because LFW is the most widely used facial recognition dataset, each individual in it has, in a small way, contributed to the current state of the art in facial recognition surveillance. John Cusack, Julianne Moore, Barry Bonds, Osama bin Laden, and even Moby are amongst these biometric pillars: exemplar faces that provided the visual dimensions of a new computer vision future.</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/lfw_montage_1280.jpg' alt='The entire LFW dataset cropped to facial regions'><div class='caption'>The entire LFW dataset cropped to facial regions</div></div></section><section><p>In addition to its commercial use as an evaluation tool, all of the faces in the LFW dataset can be loaded with a single function call in scikit-learn, a popular machine learning library for Python.</p>
<h2>Facts</h2>
<p>The person with the most images is:
The person with the least images is:</p>
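<p>The person with the most images and the people with the fewest can be computed directly from the scikit-learn copy of the dataset. The sketch below is a minimal, hypothetical helper, not part of MegaPixels or scikit-learn. It assumes the labels come from <code>fetch_lfw_people</code>, whose result includes a per-image <code>target</code> array and a <code>target_names</code> array, and it runs on toy labels so that nothing is downloaded.</p>
<pre><code class="lang-python">from collections import Counter


def images_per_person(target, target_names):
    """Rank identities by how many images they have.

    target: one integer label per image (the `target` array returned by
        scikit-learn's fetch_lfw_people).
    target_names: maps each label to a person's name (`target_names`).
    Returns (name, count) pairs, most images first; ties sorted by name.
    """
    counts = Counter(int(label) for label in target)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(target_names[kv[0]])))
    return [(str(target_names[label]), n) for label, n in ranked]


def most_and_fewest(target, target_names):
    """Return the top (name, count) pair and every name tied for the fewest images."""
    ranked = images_per_person(target, target_names)
    fewest_count = ranked[-1][1]
    fewest = [name for name, n in ranked if n == fewest_count]
    return ranked[0], fewest


def load_lfw_labels():
    """Load the real labels. Not called below: the first call downloads LFW."""
    from sklearn.datasets import fetch_lfw_people

    lfw = fetch_lfw_people(min_faces_per_person=0)  # 0 keeps every identity
    return lfw.target, lfw.target_names


if __name__ == "__main__":
    # Toy labels standing in for the real dataset (hypothetical names).
    names = ["Person_A", "Person_B", "Person_C"]
    labels = [0, 1, 1, 2, 1, 0]
    print(most_and_fewest(labels, names))  # prints (('Person_B', 3), ['Person_C'])
</code></pre>
<p>With the real data, passing the two arrays returned by <code>load_lfw_labels()</code> to <code>most_and_fewest</code> would fill in both facts; many identities are likely to tie for the fewest images.</p>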
<h2>Commercial Use</h2>
<p>The LFW dataset is used by numerous companies for <a href="about/glossary#benchmarking">benchmarking</a> algorithms and, in some cases, <a href="about/glossary#training">training</a> them. According to the benchmarking results page [^lfw_results] provided by the authors, more than two dozen companies have contributed their benchmark results.</p>
<p>According to BiometricUpdate.com [^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition" and "attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."</p>
<p>According to researchers at Baidu Research – Institute of Deep Learning, "LFW has been the most popular evaluation benchmark for face recognition, and played a very important role in facilitating the face recognition society to improve algorithm" [^lfw_baidu].</p>
<pre><code>load file: lfw_commercial_use.csv
name_display,company_url,example_url,country,description
</code></pre>
<table>
<thead><tr>
<th style="text-align:left">Company</th>
<th style="text-align:left">Country</th>
<th style="text-align:left">Industries</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><a href="http://www.aratek.co">Aratek</a></td>
<td style="text-align:left">China</td>
<td style="text-align:left">Biometric sensors for telecom, civil identification, finance, education, POS, and transportation</td>
</tr>
</tbody>
</table>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/lfw_screenshot_01.jpg' alt=' "PING AN Tech facial recognition receives high score in latest LFW test results"'><div class='caption'> "PING AN Tech facial recognition receives high score in latest LFW test results"</div></div>
<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/lfw_screenshot_02.jpg' alt=' "Face Recognition Performance in LFW benchmark"'><div class='caption'> "Face Recognition Performance in LFW benchmark"</div></div>
<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/lfw_screenshot_03.jpg' alt=' "The 1st place in face verification challenge, LFW"'><div class='caption'> "The 1st place in face verification challenge, LFW"</div></div></section><section><p>In benchmarking, companies use a dataset to evaluate algorithms that were typically trained on other data. Once a model is trained, researchers report its results on LFW so that they can be compared directly with those of other algorithms.</p>
<p>For example, Baidu (est. net worth $13B) uses LFW to report results in its paper "Targeting Ultimate Accuracy: Face Recognition via Deep Embedding." According to the three Baidu researchers who produced the paper:</p>
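<p>One common way to score an algorithm on LFW's face verification benchmark is to compare pairs of face images, call a pair a match when a similarity score clears a threshold, and report mean accuracy over ten folds (the standard LFW protocol splits 6,000 pairs into ten sets of 300 matched and 300 mismatched pairs). The sketch below is a minimal illustration under those assumptions: the similarity scores would come from whatever model is being tested, the threshold for each held-out fold is tuned on the other folds, and the helper names are hypothetical rather than taken from any company's evaluation code.</p>
<pre><code class="lang-python">import operator
from statistics import mean


def accuracy(scores, labels, threshold):
    """Fraction of pairs classified correctly, calling a pair 'same person'
    when its similarity score is at least `threshold` (labels: 1 = same)."""
    correct = sum(operator.ge(s, threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)


def best_threshold(scores, labels):
    """Try each observed score as a threshold; keep the most accurate one
    (the lowest such threshold when several tie)."""
    return max(sorted(set(scores)), key=lambda t: accuracy(scores, labels, t))


def cross_validated_accuracy(folds):
    """folds: list of (scores, labels) tuples, one per fold.
    Each fold is held out in turn and scored with a threshold tuned
    on the remaining folds only; returns the mean held-out accuracy."""
    per_fold = []
    for i, (test_scores, test_labels) in enumerate(folds):
        train_scores, train_labels = [], []
        for j, (scores, labels) in enumerate(folds):
            if j != i:
                train_scores.extend(scores)
                train_labels.extend(labels)
        threshold = best_threshold(train_scores, train_labels)
        per_fold.append(accuracy(test_scores, test_labels, threshold))
    return mean(per_fold)


if __name__ == "__main__":
    # Two toy folds of made-up similarity scores (1 = same person, 0 = different).
    folds = [
        ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
        ([0.85, 0.4, 0.35, 0.1], [1, 1, 0, 0]),
    ]
    print(cross_validated_accuracy(folds))  # prints 0.875
</code></pre>
<p>Choosing each threshold on the other folds keeps the test fold unseen during tuning; picking it on the test fold itself would overstate accuracy.</p>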
<h2>Citations</h2>
<p>Overall, LFW has at least 456 citations from 123 countries.</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/temp_graph.jpg' alt='Distribution of citations per year per country for the top 5 countries with citations for the LFW Dataset'><div class='caption'>Distribution of citations per year per country for the top 5 countries with citations for the LFW Dataset</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/temp_map.jpg' alt='Geographic distributions of citations for the LFW Dataset'><div class='caption'>Geographic distributions of citations for the LFW Dataset</div></div></section><section><h2>Conclusion</h2>
<p>The LFW face recognition training and evaluation dataset is historically important because it was the first popular face dataset to be created entirely from Internet images, paving the way for a global trend of downloading anyone’s face from the Internet and adding it to a dataset. As other datasets will show, LFW’s approach has now become the norm.</p>
<p>The faces of all 5,749 people in this dataset are now a permanent part of facial recognition history. Because the dataset is so ubiquitous, it would be impossible to remove anyone from it. For the rest of their lives and forever after, these 5,749 people will continue to be used to train facial recognition surveillance.</p>
<h2>Right to Removal</h2>
<p>If you are affected by the disclosure of your identity in this dataset, please contact the authors; many dataset authors state that they are willing to remove images upon request. The authors of LFW can be reached at the email addresses listed in their paper:</p>
<p>You can use the following message to request removal from the dataset:</p>
<p>Dear [researcher name],</p>
<p>I am writing to you about the "LFW Dataset". I recently discovered that your dataset includes my identity, and I no longer wish to be included in it.</p>
<p>MegaPixels is an educational art project developed for academic purposes. In no way does this project aim to vilify the researchers who produced the datasets. The aim of this project is to encourage discourse around ethics and consent in artificial intelligence by providing information about these datasets that is otherwise difficult to obtain or inaccessible to other researchers.</p>
<h2>Supplementary Data</h2>
<table>
<thead><tr>
<th style="text-align:left">Title</th>
<th style="text-align:left">Organization</th>
<th style="text-align:left">Country</th>
<th style="text-align:left">Type</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">3D-aided face recognition from videos</td>
<td style="text-align:left">University of Lyon</td>
<td style="text-align:left">France</td>
<td style="text-align:left">edu</td>
</tr>
<tr>
<td style="text-align:left">A Community Detection Approach to Cleaning Extremely Large Face Database</td>
<td style="text-align:left">National University of Defense Technology, China</td>
<td style="text-align:left">China</td>
<td style="text-align:left">edu</td>
</tr>
</tbody>
</table>
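<p>Totals such as the number of citing papers and countries reported in the Citations section can be tallied from a table like this one. The sketch below assumes a hypothetical CSV export with the same four columns (title, organization, country, type) and uses only the two distinct rows above as sample data; it does not reproduce the full citation data behind this analysis.</p>
<pre><code class="lang-python">import csv
import io
from collections import Counter

# Hypothetical CSV export of the table above; only its two distinct rows are included.
SAMPLE_CSV = """title,organization,country,type
3D-aided face recognition from videos,University of Lyon,France,edu
A Community Detection Approach to Cleaning Extremely Large Face Database,"National University of Defense Technology, China",China,edu
"""


def summarize_citations(rows, top_n=5):
    """Summarize citing papers by country and by institution type."""
    countries = Counter(row["country"] for row in rows)
    types = Counter(row["type"] for row in rows)
    return {
        "papers": len(rows),
        "countries": len(countries),
        "top_countries": countries.most_common(top_n),
        "by_type": dict(types),
    }


if __name__ == "__main__":
    # DictReader handles the quoted organization name that contains a comma.
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    print(summarize_citations(rows))
</code></pre>
<p>The <code>top_n=5</code> default mirrors the "top 5 countries" used in the citation graph; grouping by a year column as well would require data that this table does not include.</p>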
<h2>Code</h2>
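<pre><code class="lang-python">#!/usr/bin/python
import numpy as np
from sklearn.datasets import fetch_lfw_people
import imageio
import imutils

# download LFW dataset (first run takes a while)
lfw_people = fetch_lfw_people(min_faces_per_person=1, resize=1, color=True, funneled=False)

# introspect dataset: color images have shape (n_samples, height, width, channels)
n_samples, h, w, c = lfw_people.images.shape
print('{:,} images at {}x{}'.format(n_samples, w, h))

# 176 x 76 = 13,376 cells, enough to fit all 13,233 faces in one montage
cols, rows = (176, 76)
n_ims = cols * rows

# build_montages draws onto a uint8 canvas, so convert the float32 pixels first,
# rescaling in case the loader returned values in [0, 1] instead of [0, 255]
ims = lfw_people.images[:n_ims]
if ims.max() &lt;= 1.0:
    ims = ims * 255.0
ims = np.clip(ims, 0, 255).astype(np.uint8)

# build montages: a list of images, a (width, height) cell size, and (cols, rows)
im_scale = 0.5
cell_size = (int(w * im_scale), int(h * im_scale))
montages = imutils.build_montages(list(ims), cell_size, (cols, rows))
montage = montages[0]

# save full montage image
imageio.imwrite('lfw_montage_full.png', montage)

# make a smaller version
montage_960 = imutils.resize(montage, width=960)
imageio.imwrite('lfw_montage_960.jpg', montage_960)
</code></pre>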
<div class="footnotes">
<hr>
<ol></ol>
</div>
</section>
</div>
<footer>
<div>
<a href="/">MegaPixels.cc</a>
<a href="/about/disclaimer/">Disclaimer</a>
<a href="/about/terms/">Terms of Use</a>
<a href="/about/privacy/">Privacy</a>
<a href="/about/">About</a>
<a href="/about/team/">Team</a>
</div>
<div>
MegaPixels ©2017-19 Adam R. Harvey /
<a href="https://ahprojects.com">ahprojects.com</a>
</div>
</footer>
</body>
<script src="/assets/js/app/site.js"></script>
</html>