diff options
Diffstat (limited to 'site/public/datasets')
| -rw-r--r-- | site/public/datasets/ijb_c/index.html | 9 | ||||
| -rw-r--r-- | site/public/datasets/index.html | 12 | ||||
| -rw-r--r-- | site/public/datasets/msceleb/index.html | 4 |
3 files changed, 6 insertions, 19 deletions
diff --git a/site/public/datasets/ijb_c/index.html b/site/public/datasets/ijb_c/index.html index b6a16bfe..3ebeb6ae 100644 --- a/site/public/datasets/ijb_c/index.html +++ b/site/public/datasets/ijb_c/index.html @@ -4,7 +4,7 @@ <title>MegaPixels</title> <meta charset="utf-8" /> <meta name="author" content="Adam Harvey" /> - <meta name="description" content="IJB-C is a datset ..." /> + <meta name="description" content="IARPA Janus Benchmark C is a dataset of web images used" /> <meta name="referrer" content="no-referrer" /> <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" /> <link rel='stylesheet' href='/assets/css/fonts.css' /> @@ -26,7 +26,7 @@ </header> <div class="content content-dataset"> - <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>IJB-C is a datset ...</span></div><div class='hero_subdesc'><span class='bgpad'>The IJB-C dataset contains... + <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>IARPA Janus Benchmark C is a dataset of web images used</span></div><div class='hero_subdesc'><span class='bgpad'>The IJB-C dataset contains 21,294 images and 11,779 videos of 3,531 identities </span></div></div></section><section><h2>IARPA Janus Benchmark C (IJB-C)</h2> </section><section><div class='right-sidebar'><div class='meta'> <div class='gray'>Published</div> @@ -46,9 +46,8 @@ </div><div class='meta'> <div class='gray'>Website</div> <div><a href='https://www.nist.gov/programs-projects/face-challenges' target='_blank' rel='nofollow noopener'>nist.gov</a></div> - </div></div><p>Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor</p> -<p>Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor Loren ipsum dolor</p> -</section><section> + </div></div><p>The IARPA Janus Benchmark C is a dataset created by</p> +</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/ijb_c_montage.jpg' alt=' A visualization of the IJB-C dataset'><div class='caption'> A visualization of the IJB-C dataset</div></div></section><section> <h3>Who used IJB-C?</h3> <p> diff --git a/site/public/datasets/index.html b/site/public/datasets/index.html index 6e43e73f..f1d17598 100644 --- a/site/public/datasets/index.html +++ b/site/public/datasets/index.html @@ -63,18 +63,6 @@ </div> </a> - <a href="/datasets/hrt_transgender/" style="background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/hrt_transgender/assets/index.jpg)"> - <div class="dataset"> - <span class='title'>HRT Transgender Dataset</span> - <div class='fields'> - <div class='year visible'><span>2013</span></div> - <div class='purpose'><span>Face recognition, gender transition biometrics</span></div> - <div class='images'><span>10,564 images</span></div> - <div class='identities'><span>38 </span></div> - </div> - </div> - </a> - <a href="/datasets/ijb_c/" style="background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/index.jpg)"> <div class="dataset"> <span class='title'>IJB-C</span> diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html index a00b3527..113a63d2 100644 --- a/site/public/datasets/msceleb/index.html +++ b/site/public/datasets/msceleb/index.html @@ -49,7 +49,7 @@ </div><div class='meta'> <div class='gray'>Website</div> <div><a href='http://www.msceleb.org/' target='_blank' rel='nofollow noopener'>msceleb.org</a></div> - </div></div><p>Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals images and use this to accelerate reserch into recognizing a target list of one million individuals from their face images "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p> + </div></div><p>Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals images and use this to accelerate research into recognizing a target list of one million individuals from their face images "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p> <p>These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people including academics, policy makers, writers, artists, and especially journalists maintaining an online presence is mandatory and should not allow Microsoft or anyone else to use their biometrics for research and development of surveillance technology. Many of names in target list even include people critical of the very technology Microsoft is using their name and biometric information to build. The list includes digital rights activists like Jillian York and [add more]; artists critical of surveillance including Trevor Paglen, Hito Steryl, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glen Greenwald; Data and Society founder danah boyd; and even Julie Brill the former FTC commissioner responsible for protecting consumer’s privacy to name a few.</p> <h3>Microsoft's 1 Million Target List</h3> <p>Below is a list of names that were included in list of 1 million individuals curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from <a href="https://msceleb.org">msceleb.org</a>. Email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handing%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed images.</p> @@ -166,7 +166,7 @@ <p>Earlier in 2019, Microsoft CEO <a href="https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/">Brad Smith</a> called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also <a href="https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV">announced</a> that Microsoft would seemingly take stand against potential misuse and decided to not sell face recognition to an unnamed United States law enforcement agency, citing that their technology was not accurate enough to be used on minorities because it was trained mostly on white male faces.</p> <p>What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition for faces they can't train on.</p> <p>Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly <a href="https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html">white</a> and <a href="https://gendershades.org">male</a>. Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, the powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service also would not be able to see at all.</p> -<p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>, Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb datset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org explicitly mentions that Microsoft Research tested their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."</p> +</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identity included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identity included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>, Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb datset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org explicitly mentions that Microsoft Research tested their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."</p> <p>We suggest that if Microsoft Research wants to make biometric data publicly available for surveillance research and development, they should start with releasing their researchers' own biometric data instead of scraping the Internet for journalists, artists, writers, actors, athletes, musicians, and academics.</p> </section><section> <h3>Who used Microsoft Celeb?</h3> |
