path: root/site/public/datasets/msceleb
author Adam Harvey <adam@ahprojects.com> 2019-04-21 09:40:57 +0200
committer Adam Harvey <adam@ahprojects.com> 2019-04-21 09:40:57 +0200
commit d6dc546c97a18669416f203c3cf30c066bb28cfb
tree dccdb299dd8417a51765bfaa0e4f9dccf363866f /site/public/datasets/msceleb
parent 234ef07325a4ff47c089d8b3d83dd3123d67a95f
update typos
Diffstat (limited to 'site/public/datasets/msceleb')
-rw-r--r-- site/public/datasets/msceleb/index.html | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html
index f9c184c8..89543f08 100644
--- a/site/public/datasets/msceleb/index.html
+++ b/site/public/datasets/msceleb/index.html
@@ -57,7 +57,7 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://www.msceleb.org/' target='_blank' rel='nofollow noopener'>msceleb.org</a></div>
- </div></div><p>Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research, who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' images, and to use this dataset to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p>
+ </div></div><p>Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research, who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' images, and to use this dataset to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p>
<p>These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people, including academics, policy makers, writers, artists, and especially journalists, maintaining an online presence is mandatory. This fact should not allow Microsoft or anyone else to use their biometrics for research and development of surveillance technology. Many names in the target list even include people critical of the very technology Microsoft is using their name and biometric information to build. The list includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, to name a few.</p>
<h3>Microsoft's 1 Million Target List</h3>
<p>Below is a selection of names from the full target list, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from <a href="https://www.msceleb.org">msceleb.org</a>. You can email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handing%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed images.</p>