<!doctype html>
<html>
<head>
<title>MegaPixels</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey" />
<meta name="description" content="MS Celeb is a dataset of web images used for training and evaluating face recognition algorithms" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<link rel='stylesheet' href='/assets/css/fonts.css' />
<link rel='stylesheet' href='/assets/css/css.css' />
<link rel='stylesheet' href='/assets/css/leaflet.css' />
<link rel='stylesheet' href='/assets/css/applets.css' />
</head>
<body>
<header>
<a class='slogan' href="/">
<div class='logo'></div>
<div class='site_name'>MegaPixels</div>
<div class='splash'>Microsoft Celeb</div>
</a>
<div class='links'>
<a href="/datasets/">Datasets</a>
<a href="/about/">About</a>
</div>
</header>
<div class="content content-dataset">
<section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>MS Celeb is a dataset of web images used for training and evaluating face recognition algorithms</span></div><div class='hero_subdesc'><span class='bgpad'>The MS Celeb dataset includes over 10,000,000 images and 93,000 identities of semi-public figures collected using the Bing search engine
</span></div></div></section><section><h2>Microsoft Celeb Dataset (MS Celeb)</h2>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2016</div>
</div><div class='meta'>
<div class='gray'>Images</div>
<div>10,000,000</div>
</div><div class='meta'>
<div class='gray'>Identities</div>
<div>100,000 </div>
</div><div class='meta'>
<div class='gray'>Purpose</div>
<div>Large-scale face recognition</div>
</div><div class='meta'>
<div class='gray'>Created by</div>
<div>Microsoft Research</div>
</div><div class='meta'>
<div class='gray'>Funded by</div>
<div>Microsoft Research</div>
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://www.msceleb.org/' target='_blank' rel='nofollow noopener'>msceleb.org</a></div>
</div></div><p>The Microsoft Celeb dataset is a face recognition training set made entirely of images scraped from the Internet. According to Microsoft Research, which created and published the dataset in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of 100,000 individuals.</p>
<p>But Microsoft's ambition was bigger: it wanted to recognize 1 million individuals. As part of the dataset, Microsoft released a list of 1 million target identities for researchers to identify.</p>
<p><a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/</a></p>
<p>In 2019, Microsoft President Brad Smith called for government regulation of face recognition, an admission of his own company's inability to control its surveillance-driven business model. Yet since then, and over the last 4 years, Microsoft has willingly and actively played a significant role in accelerating growth in the very industry it asked the government to regulate. This investigation looks into the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">MS Celeb</a> dataset and Microsoft Research's role in creating and distributing the largest publicly available face recognition dataset in the world.</p>
<p>To spur growth and incentivize researchers, Microsoft released a dataset called <a href="https://msceleb.org">MS Celeb</a>, or Microsoft Celeb, in which they developed and published a list of exactly 1 million targeted people whose biometrics would go on to build</p>
</section><section>
<h3>Who used Microsoft Celeb?</h3>
<p>
This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.
</p>
</section>
<section class="applet_container">
<div class="applet" data-payload='{"command": "chart"}'></div>
</section>
<section class="applet_container">
<div class="applet" data-payload='{"command": "piechart"}'></div>
</section>
<section>
<h3>Biometric Trade Routes</h3>
<p>
To help understand how Microsoft Celeb has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the Microsoft Celeb dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
</p>
</section>
<section class="applet_container fullwidth">
<div class="applet" data-payload='{"command": "map"}'></div>
</section>
<div class="caption">
<ul class="map-legend">
<li class="edu">Academic</li>
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
<div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a>; dataset usage is then verified and geolocated.</div>
</div>
<section class="applet_container">
<h3>Dataset Citations</h3>
<p>
The dataset citations used in the visualizations were collected from <a href="https://www.semanticscholar.org">Semantic Scholar</a>, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please <a href="/about/attribution">cite our work</a>.
</p>
<div class="applet" data-payload='{"command": "citations"}'></div>
</section><section>
<div class="hr-wave-holder">
<div class="hr-wave-line hr-wave-line1"></div>
<div class="hr-wave-line hr-wave-line2"></div>
</div>
<h2>Supplementary Information</h2>
</section><section><h3>Additional Information</h3>
<ul>
<li>SenseTime <a href="https://www.semanticscholar.org/paper/The-Devil-of-Face-Recognition-is-in-the-Noise-Wang-Chen/9e31e77f9543ab42474ba4e9330676e18c242e72">https://www.semanticscholar.org/paper/The-Devil-of-Face-Recognition-is-in-the-Noise-Wang-Chen/9e31e77f9543ab42474ba4e9330676e18c242e72</a></li>
<li>Microsoft used it <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70</a></li>
<li><a href="https://www.hrw.org/news/2019/01/15/letter-microsoft-face-surveillance-technology">https://www.hrw.org/news/2019/01/15/letter-microsoft-face-surveillance-technology</a></li>
<li><a href="https://www.scmp.com/tech/science-research/article/3005733/what-you-need-know-about-sensenets-facial-recognition-firm">https://www.scmp.com/tech/science-research/article/3005733/what-you-need-know-about-sensenets-facial-recognition-firm</a></li>
</ul>
</section><section><h3>References</h3><section><ul class="footnotes"><li><a name="[^brad_smith]" class="footnote_shim"></a><span class="backlinks"></span><p>Brad Smith cite</p>
</li></ul></section></section>
</div>
<footer>
<ul class="footer-left">
<li><a href="/">MegaPixels.cc</a></li>
<li><a href="/datasets/">Datasets</a></li>
<li><a href="/about/">About</a></li>
<li><a href="/about/press/">Press</a></li>
<li><a href="/about/legal/">Legal and Privacy</a></li>
</ul>
<ul class="footer-right">
<li>MegaPixels ©2017-19 <a href="https://ahprojects.com">Adam R. Harvey</a></li>
<li>Made with support from <a href="https://mozilla.org">Mozilla</a></li>
</ul>
</footer>
<script src="/assets/js/dist/index.js"></script>
</body>
</html>