<!doctype html>
<html>
<head>
  <title>MegaPixels</title>
  <meta charset="utf-8" />
  <meta name="author" content="Adam Harvey" />
  <meta name="description" content="Labeled Faces in The Wild (LFW) is the first facial recognition dataset created entirely from online photos" />
  <meta name="referrer" content="no-referrer" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <link rel='stylesheet' href='/assets/css/fonts.css' />
  <link rel='stylesheet' href='/assets/css/tabulator.css' />
  <link rel='stylesheet' href='/assets/css/css.css' />
  <link rel='stylesheet' href='/assets/css/leaflet.css' />
  <link rel='stylesheet' href='/assets/css/applets.css' />
</head>
<body>
  <header>
    <a class='slogan' href="/">
      <div class='logo'></div>
      <div class='site_name'>MegaPixels</div>
    </a>
    <div class='links'>
      <a href="/datasets/">Datasets</a>
      <a href="/about/">About</a>
    </div>
  </header>
  <div class="content content-">
    
  <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'><span style='color: #ff0000'>Labeled Faces in The Wild</span> (LFW) is the first facial recognition dataset created entirely from online photos</span></div><div class='hero_subdesc'><span class='bgpad'>It includes 13,233 images of 5,749 people copied from the Internet during 2002-2004 and is the most frequently used dataset in the world for benchmarking face recognition algorithms.
</span></div></div></section><section><div class='left-sidebar'><div class='meta'><div><div class='gray'>Created</div><div>2002 &ndash; 2004</div></div><div><div class='gray'>Images</div><div>13,233</div></div><div><div class='gray'>Identities</div><div>5,749</div></div><div><div class='gray'>Origin</div><div>Yahoo! News Images</div></div><div><div class='gray'>Used by</div><div>Facebook, Google, Microsoft, Baidu, Tencent, SenseTime, Face++, CIA, NSA, IARPA</div></div><div><div class='gray'>Website</div><div><a href="http://vis-www.cs.umass.edu/lfw">umass.edu</a></div></div></div><ul>
<li>There are about 3 men for every 1 woman in the LFW dataset<a class="footnote_shim" name="[^lfw_www]_1"> </a><a href="#[^lfw_www]" class="footnote" title="Footnote 1">1</a></li>
<li>The person with the most images is <a href="http://vis-www.cs.umass.edu/lfw/person/George_W_Bush_comp.html">George W. Bush</a> with 530</li>
<li>There are about 3 images of George W. Bush for every 1 image of <a href="http://vis-www.cs.umass.edu/lfw/person/Tony_Blair.html">Tony Blair</a></li>
<li>The LFW dataset includes over 500 actors, 30 models, 10 presidents, 124 basketball players, 24 football players, 11 kings, 7 queens, and 1 <a href="http://vis-www.cs.umass.edu/lfw/person/Moby.html">Moby</a></li>
<li>In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times</li>
<li>The word "future" appears 71 times</li>
<li>* denotes partial funding for related research</li>
</ul>
</div><h2>Labeled Faces in the Wild</h2>
<p>(PAGE UNDER DEVELOPMENT)</p>
<p><em>Labeled Faces in The Wild</em> (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition"<a class="footnote_shim" name="[^lfw_www]_2"> </a><a href="#[^lfw_www]" class="footnote" title="Footnote 1">1</a>. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com<a class="footnote_shim" name="[^lfw_pingan]_1"> </a><a href="#[^lfw_pingan]" class="footnote" title="Footnote 3">3</a>, LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."</p>
<p>The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002 and 2004. LFW is a subset of <em>Names and Faces</em> and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...</p>
<p>The <em>Names and Faces</em> dataset was the first face recognition dataset created entirely from online photos. However, <em>Names and Faces</em> and <em>LFW</em> are not the first face recognition datasets created entirely "in the wild". That title belongs to the <a href="/datasets/ucd_faces/">UCD dataset</a>. An image obtained "in the wild" is one used without explicit consent or awareness from the subject or photographer.</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/lfw_montage_all_crop.jpg' alt='All 5,749 people in the Labeled Faces in The Wild Dataset, showing one face per person'><div class='caption'>All 5,749 people in the Labeled Faces in The Wild Dataset, showing one face per person</div></div></section><section>
	
	<h3>Biometric Trade Routes (beta)</h3>
<!-- 
	<div class="map-sidebar right-sidebar">
	  <h3>Legend</h3>
	  <ul>
	    <li><span style="color: #f2f293">&#9632;</span> Industry</li>
	    <li><span style="color: #f30000">&#9632;</span> Academic</li>
	    <li><span style="color: #3264f6">&#9632;</span> Government</li>
	  </ul>
	</div>
	 -->
	<p>
		To understand how LFW has been used around the world and how it has affected global research on computer vision, surveillance, defense, and consumer technology, the map below shows the location of each organization that used or referenced the dataset.
	</p>
 
 </section>

<section class="applet_container">
 <div class="applet" data-payload="{&quot;command&quot;: &quot;map&quot;}"></div>
</section>

<div class="caption">
	<div class="map-legend-item edu">Academic</div>
	<div class="map-legend-item com">Industry</div>
	<div class="map-legend-item gov">Government</div> 
	Data is compiled from <a href="https://www.semanticscholar.org">Semantic Scholar</a> and not yet manually verified.
</div>

<section>
	<p class='subp'>
		The data is generated by collecting all citations for all original research papers associated with the dataset. The PDFs are then converted to text, and the organization names are extracted and geocoded. Because the data is extracted automatically, actual use of the dataset cannot yet be confirmed. This visualization is provided to help locate and confirm usage and will be updated as data noise is reduced.
	</p>
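	<p class='subp'>
		The extraction and geocoding steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for this page: the function names, the small lookup table standing in for a geocoding service, the approximate coordinates, and the sample texts are all hypothetical.
	</p>
<pre><code class="language-python"># Hypothetical sketch: names, coordinates, and sample texts are assumptions
# made for illustration and are not taken from the MegaPixels pipeline.
from collections import Counter

# Stand-in for a geocoding service: institution name -> approximate (lat, lng).
GAZETTEER = {
    "University of Massachusetts": (42.39, -72.53),
    "Chinese University of Hong Kong": (22.42, 114.21),
}


def extract_institutions(text, known_names):
    """Return the known institution names mentioned in the text (case-insensitive)."""
    lowered = text.lower()
    return [name for name in known_names if name.lower() in lowered]


def geocode_citations(paper_texts, gazetteer):
    """Count the papers mentioning each institution and attach its coordinates."""
    counts = Counter()
    for text in paper_texts:
        counts.update(extract_institutions(text, gazetteer))
    return [
        {"name": name, "lat": gazetteer[name][0], "lng": gazetteer[name][1], "papers": n}
        for name, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Each string stands in for the text converted from one citing PDF.
    texts = [
        "A. Example, University of Massachusetts, Amherst",
        "B. Example, Chinese University of Hong Kong",
        "C. Example, university of massachusetts",
    ]
    for row in geocode_citations(texts, GAZETTEER):
        print(row)
</code></pre>
	<p class='subp'>
		Simple substring matching of this kind counts any mention of an institution, whether or not its authors actually used the dataset, which is one reason the results still need manual verification.
	</p>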
</section><section>
  <h3>Who used LFW?</h3>

  <p>
    This bar chart presents a ranking of the top countries where citations originated.  Mouse over individual columns
    to see yearly totals.  Colors are only assigned to the top 10 overall countries.
  </p>
 
 </section>

<section class="applet_container">
 <div class="applet" data-payload="{&quot;command&quot;: &quot;chart&quot;}"></div>
</section><section>


  <div class="hr-wave-holder">
      <div class="hr-wave-line hr-wave-line1"></div>
      <div class="hr-wave-line hr-wave-line2"></div>
  </div>

  <h2>Supplementary Information</h2>
</section><section class="applet_container">

  <h3>Citations</h3>
  <p>
    Citations were collected from <a href="https://www.semanticscholar.org">Semantic Scholar</a>, a website which aggregates
    and indexes research papers.  Metadata was extracted from these papers, including the names of institutions, which were extracted automatically from the PDFs, and the addresses were then geocoded.  Data is not yet manually verified and reflects any time the paper was cited.  Some papers may only mention the dataset in passing, while others use it as part of their research methodology.
  </p>
  <p>
    Add button/link to download CSV
  </p>

  <div class="applet" data-payload="{&quot;command&quot;: &quot;citations&quot;}"></div>
</section><section><h3>Commercial Use</h3>
<p>Add a paragraph about how usage extends far beyond academia into research centers for largest companies in the world. And even funnels into CIA funded research in the US and defense industry usage in China.</p>
</section><section class='applet_container'><div class='applet' data-payload='{"command": "load_file assets/lfw_commercial_use.csv", "fields": ["name_display, company_url, example_url, country, description"]}'></div></section><section><p>Research, text, and graphics ©Adam Harvey / megapixels.cc</p>
</section><section><ul class="footnotes"><li><a name="[^lfw_www]" class="footnote_shim"></a><span class="backlinks"><a href="#[^lfw_www]_1">a</a><a href="#[^lfw_www]_2">b</a></span><p><a href="http://vis-www.cs.umass.edu/lfw/results.html">http://vis-www.cs.umass.edu/lfw/results.html</a></p>
</li><li><a name="[^lfw_baidu]" class="footnote_shim"></a><span class="backlinks"></span><p>Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, Chang Huang. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding. <a href="https://arxiv.org/abs/1506.07310">https://arxiv.org/abs/1506.07310</a></p>
</li><li><a name="[^lfw_pingan]" class="footnote_shim"></a><span class="backlinks"><a href="#[^lfw_pingan]_1">a</a></span><p>Lee, Justin. "PING AN Tech facial recognition receives high score in latest LFW test results". BiometricUpdate.com. Feb 13, 2017. <a href="https://www.biometricupdate.com/201702/ping-an-tech-facial-recognition-receives-high-score-in-latest-lfw-test-results">https://www.biometricupdate.com/201702/ping-an-tech-facial-recognition-receives-high-score-in-latest-lfw-test-results</a></p>
</li></ul></section>

  </div>
  <footer>
    <div>
      <a href="/">MegaPixels.cc</a>
      <a href="/about/disclaimer/">Disclaimer</a>
      <a href="/about/terms/">Terms of Use</a>
      <a href="/about/privacy/">Privacy</a>
      <a href="/about/">About</a>
      <a href="/about/team/">Team</a>
    </div>
    <div>
      MegaPixels &copy;2017-19 Adam R. Harvey /&nbsp;
      <a href="https://ahprojects.com">ahprojects.com</a>
    </div>
  </footer>
<script src="/assets/js/dist/index.js"></script>
</body>
</html>