<!doctype html>
<html>
<head>
<title>MegaPixels: HELEN</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey" />
<meta name="description" content="HELEN is a dataset of face images from Flickr used for training facial component localization algorithms" />
<meta property="og:title" content="MegaPixels: HELEN"/>
<meta property="og:type" content="website"/>
<meta property="og:description" content='MegaPixels is an art and research project about face recognition datasets created "in the wild"'/>
<meta property="og:image" content="https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/background.jpg" />
<meta property="og:url" content="https://megapixels.cc/datasets/helen/"/>
<meta property="og:site_name" content="MegaPixels" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="apple-mobile-web-app-capable" content="yes">
<link rel="apple-touch-icon" sizes="57x57" href="/assets/img/favicon/apple-icon-57x57.png">
<link rel="apple-touch-icon" sizes="60x60" href="/assets/img/favicon/apple-icon-60x60.png">
<link rel="apple-touch-icon" sizes="72x72" href="/assets/img/favicon/apple-icon-72x72.png">
<link rel="apple-touch-icon" sizes="76x76" href="/assets/img/favicon/apple-icon-76x76.png">
<link rel="apple-touch-icon" sizes="114x114" href="/assets/img/favicon/apple-icon-114x114.png">
<link rel="apple-touch-icon" sizes="120x120" href="/assets/img/favicon/apple-icon-120x120.png">
<link rel="apple-touch-icon" sizes="144x144" href="/assets/img/favicon/apple-icon-144x144.png">
<link rel="apple-touch-icon" sizes="152x152" href="/assets/img/favicon/apple-icon-152x152.png">
<link rel="apple-touch-icon" sizes="180x180" href="/assets/img/favicon/apple-icon-180x180.png">
<link rel="icon" type="image/png" sizes="192x192" href="/assets/img/favicon/android-icon-192x192.png">
<link rel="icon" type="image/png" sizes="32x32" href="/assets/img/favicon/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="96x96" href="/assets/img/favicon/favicon-96x96.png">
<link rel="icon" type="image/png" sizes="16x16" href="/assets/img/favicon/favicon-16x16.png">
<link rel="manifest" href="/assets/img/favicon/manifest.json">
<meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="/ms-icon-144x144.png">
<meta name="theme-color" content="#ffffff">
<link rel='stylesheet' href='/assets/css/fonts.css' />
<link rel='stylesheet' href='/assets/css/css.css' />
<link rel='stylesheet' href='/assets/css/leaflet.css' />
<link rel='stylesheet' href='/assets/css/applets.css' />
<link rel='stylesheet' href='/assets/css/mobile.css' />
</head>
<body>
<header>
<a class='slogan' href="/">
<div class='logo'></div>
<div class='site_name'>MegaPixels</div>
<div class='page_name'>Helen Dataset</div>
</a>
<div class='links'>
<a href="/datasets/">Datasets</a>
<a href="/about/">About</a>
<a href="/research">Research</a>
</div>
</header>
<div class="content content-dataset">
<section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>Example images from the HELEN dataset</div></div></section><section><h1>HELEN Dataset</h1>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2012</div>
</div><div class='meta'>
<div class='gray'>Images</div>
<div>2,330 </div>
</div><div class='meta'>
<div class='gray'>Purpose</div>
<div>Facial feature localization</div>
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://www.ifp.illinois.edu/~vuongle2/helen/' target='_blank' rel='nofollow noopener'>illinois.edu</a></div>
</div></div><p>Helen is a dataset of annotated face images used for facial component localization. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio".<a class="footnote_shim" name="[^orig_paper]_1"> </a><a href="#[^orig_paper]" class="footnote" title="Footnote 1">1</a></p>
<p>The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis."<a class="footnote_shim" name="[^orig_paper]_2"> </a><a href="#[^orig_paper]" class="footnote" title="Footnote 1">1</a></p>
<p>Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential components of most face recognition processing systems. Facial landmarks are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to a templated pose.</p>
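<p>Face alignment of this kind is commonly implemented as a least-squares similarity transform that maps detected landmark coordinates onto a canonical template. The sketch below illustrates the idea with made-up four-point coordinates; it is not code from the HELEN authors or from any particular library:</p>

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src landmark points onto dst, via Umeyama's method."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, d])                 # guards against reflections
    R = U @ D @ Vt                        # rotation
    scale = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = mu_d - scale * R @ mu_s           # translation
    return scale, R, t

# Hypothetical template: eye centers and mouth corners in a canonical crop
template = [(30, 40), (70, 40), (35, 75), (65, 75)]
# Hypothetical detected landmarks in a larger, offset source image
detected = [(120, 80), (200, 80), (130, 150), (190, 150)]

s, R, t = similarity_transform(detected, template)
aligned = (s * (R @ np.asarray(detected, float).T)).T + t
```

<p>Applying the recovered transform to the source image (or, as here, to the landmark points themselves) brings the face into the template's canonical pose before recognition features are computed.</p>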
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/montage_lms_21_14_14_14_26.png' alt=' An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic © 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.'><div class='caption'> An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic © 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.</div></div></section><section><p>This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition with the vast majority of research taking place in China.</p>
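<p>The 194-point annotations visualized above are distributed as plain-text files. The parser below is a sketch that assumes the layout of the published annotation archive (image name on the first line, then one comma-separated coordinate pair per line); verify against the actual files before relying on it:</p>

```python
def parse_helen_annotation(lines):
    """Parse one HELEN annotation file into (image_name, [(x, y), ...]).

    Assumes the layout of the published annotation archive: the image
    name on the first line, then one "x , y" pair per line (194 pairs
    in the full files).
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    name, points = lines[0], []
    for ln in lines[1:]:
        x, y = (float(v) for v in ln.split(","))
        points.append((x, y))
    return name, points

# Tiny synthetic example; real files carry 194 coordinate pairs
sample = "100040721_1\n10.5 , 20.25\n30.0 , 40.0\n"
name, points = parse_helen_annotation(sample.splitlines())
```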
<p>Commercial use includes IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook.</p>
<p>Military and defense usage includes the National University of Defense Technology (NUDT).</p>
<p><a href="http://eccv2012.unifi.it/">http://eccv2012.unifi.it/</a></p>
<table>
<thead><tr>
<th>Organization</th>
<th>Paper</th>
<th>Year</th>
<th>Used HELEN</th>
</tr>
</thead>
<tbody>
<tr>
<td>SenseTime, Amazon</td>
<td><a href="https://arxiv.org/pdf/1805.10483.pdf">Look at Boundary: A Boundary-Aware Face Alignment Algorithm</a></td>
<td>2018</td>
<td>✔</td>
</tr>
<tr>
<td>SenseTime</td>
<td><a href="https://arxiv.org/pdf/1807.11079.pdf">ReenactGAN: Learning to Reenact Faces via Boundary Transfer</a></td>
<td>2018</td>
<td>✔</td>
</tr>
</tbody>
</table>
<p>The dataset was used for training the OpenFace software: "we used the HELEN and LFPW training subsets for training and the rest for testing." <a href="https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets">https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets</a></p>
<p>The popular dlib facial landmark detector was also trained using HELEN.</p>
<p>In addition to the 200+ verified citations, the HELEN dataset was used in the following projects:</p>
<ul>
<li><a href="https://github.com/memoiry/face-alignment">https://github.com/memoiry/face-alignment</a></li>
<li><a href="http://www.dsp.toronto.edu/projects/face_analysis/">http://www.dsp.toronto.edu/projects/face_analysis/</a></li>
</ul>
<p>It has also been converted into new datasets, including:</p>
<ul>
<li><a href="https://github.com/JPlin/Relabeled-HELEN-Dataset">https://github.com/JPlin/Relabeled-HELEN-Dataset</a></li>
<li><a href="https://www.kaggle.com/kmader/helen-eye-dataset">https://www.kaggle.com/kmader/helen-eye-dataset</a></li>
</ul>
<p>The original dataset website:</p>
<ul>
<li><a href="http://www.ifp.illinois.edu/~vuongle2/helen/">http://www.ifp.illinois.edu/~vuongle2/helen/</a></li>
</ul>
<h3>Example Images</h3>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_outdoor_02.jpg' alt=' An image from the HELEN dataset "wedding" category used for training face recognition. 2839127417_1.jpg'><div class='caption'> An image from the HELEN dataset "wedding" category used for training face recognition. 2839127417_1.jpg</div></div>
<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_graduation.jpg' alt=' An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 '><div class='caption'> An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 </div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_wedding.jpg' alt=' An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 '><div class='caption'> An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 </div></div>
<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_wedding_02.jpg' alt=' An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 '><div class='caption'> An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 </div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_family.jpg' alt=' Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969'><div class='caption'> Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969</div></div>
<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_family_05.jpg' alt=' Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969'><div class='caption'> Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969</div></div></section><section>
<h3>Who used the HELEN dataset?</h3>
<p>
This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.
</p>
</section>
<section class="applet_container">
<!-- <div style="position: absolute;top: 0px;right: -55px;width: 180px;font-size: 14px;">Labeled Faces in the Wild Dataset<br><span class="numc" style="font-size: 11px;">20 citations</span>
</div> -->
<div class="applet" data-payload='{"command": "chart"}'></div>
</section>
<section class="applet_container">
<div class="applet" data-payload='{"command": "piechart"}'></div>
</section>
<section>
<h3>Information Supply Chain</h3>
<p>
To help understand how the HELEN dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the dataset was collected, verified, and geocoded to show how AI training data has proliferated globally. Click on the markers to reveal research projects at each location.
</p>
</section>
<section class="applet_container fullwidth">
<div class="applet" data-payload='{"command": "map"}'></div>
</section>
<div class="caption">
<ul class="map-legend">
<li class="edu">Academic</li>
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
<section class="applet_container">
<h3>Dataset Citations</h3>
<p>
The dataset citations used in the visualizations were collected from <a href="https://www.semanticscholar.org">Semantic Scholar</a>, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please <a href="/about/attribution">cite our work</a>.
</p>
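<p>The per-country ranking shown in these charts can be reproduced from geocoded citation records with a simple aggregation. The record fields below are hypothetical and stand in for the project's actual schema:</p>

```python
from collections import Counter

# Hypothetical geocoded citation records; the real data is derived from
# Semantic Scholar metadata as described above.
citations = [
    {"country": "China", "year": 2017},
    {"country": "China", "year": 2018},
    {"country": "USA", "year": 2018},
    {"country": "UK", "year": 2016},
    {"country": "China", "year": 2016},
]

def top_countries(records, n=10):
    """Rank countries by citation count, keeping at most the top n."""
    counts = Counter(r["country"] for r in records)
    return counts.most_common(n)

ranking = top_countries(citations)
```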
<div class="applet" data-payload='{"command": "citations"}'></div>
</section><section>
<div class="hr-wave-holder">
<div class="hr-wave-line hr-wave-line1"></div>
<div class="hr-wave-line hr-wave-line2"></div>
</div>
<h2>Supplementary Information</h2>
</section><section><h3>Age and Gender Distribution</h3>
</section><section>
<p>Age and gender distributions were estimated by analyzing all faces in the dataset images. This may include additional faces appearing next to an annotated face, or it may skip false positives that were erroneously included in the original dataset. These numbers are provided as estimates, not a factual representation of the exact gender and age of every face.</p>
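<p>The distributions below are aggregates of per-face predictions. A minimal sketch of that aggregation step, using made-up gender labels rather than output from the estimator actually used:</p>

```python
from collections import Counter

# Hypothetical per-face gender predictions from an estimator run over
# every detected face in the dataset images.
predictions = ["female", "male", "female", "female", "male", "unknown"]

def distribution(labels):
    """Convert per-face labels into percentage shares."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100.0 * c / total, 1) for label, c in counts.items()}

shares = distribution(predictions)
```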
</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/helen/assets/age.csv", "fields": ["Caption: HELEN dataset age distribution", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/helen/assets/gender.csv", "fields": ["Caption: HELEN dataset gender distribution", "Top: 10", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png' alt=' Visualization of the HELEN dataset 194-point facial landmark annotations. Credit: graphic © MegaPixels.cc 2019, data from HELEN dataset by Zhou, Brand, Lin 2013. If you use this image please credit both the graphic and data source.'><div class='caption'> Visualization of the HELEN dataset 194-point facial landmark annotations. Credit: graphic © MegaPixels.cc 2019, data from HELEN dataset by Zhou, Brand, Lin 2013. If you use this image please credit both the graphic and data source.</div></div></section><section>
<h4>Cite Our Work</h4>
<p>
If you find this analysis helpful, please cite our work:
</p>
<pre id="cite-bibtex">
@online{megapixels,
author = {Harvey, Adam and LaPlace, Jules},
title = {MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets},
year = 2019,
url = {https://megapixels.cc/},
urldate = {2019-04-18}
}</pre>
</section><section><h4>Cite the Original Authors' Work</h4>
<p>If you find the HELEN dataset useful or reference it in your work, please cite the authors' original paper:</p>
<pre>
@inproceedings{Le2012InteractiveFF,
title={Interactive Facial Feature Localization},
author={Vuong Le and Jonathan Brandt and Zhe L. Lin and Lubomir D. Bourdev and Thomas S. Huang},
booktitle={ECCV},
year={2012}
}
</pre></section><section><h3>References</h3><section><ul class="footnotes"><li>1 <a name="[^orig_paper]" class="footnote_shim"></a><span class="backlinks"><a href="#[^orig_paper]_1">a</a><a href="#[^orig_paper]_2">b</a></span>Le, Vuong et al. “Interactive Facial Feature Localization.” ECCV (2012).
</li></ul></section></section>
</div>
<footer>
<ul class="footer-left">
<li><a href="/">MegaPixels.cc</a></li>
<li><a href="/datasets/">Datasets</a></li>
<li><a href="/about/">About</a></li>
<li><a href="/about/news/">News</a></li>
<li><a href="/about/legal/">Legal & Privacy</a></li>
</ul>
<ul class="footer-right">
<li>MegaPixels ©2017-19 <a href="https://ahprojects.com">Adam R. Harvey</a></li>
<li>Made with support from <a href="https://mozilla.org">Mozilla</a></li>
</ul>
</footer>
</body>
<script src="/assets/js/dist/index.js"></script>
</html>