author Adam Harvey <adam@ahprojects.com> 2019-06-06 06:18:52 -0500
committer Adam Harvey <adam@ahprojects.com> 2019-06-06 06:18:52 -0500
commit f227833124adb7e0d871f702220de687d74d663c
tree 2e28bca4222473bb6b9a082fc519f007f61f340a /site/public
parent fe066d0b79a305731e9ad7286445f17073ef917d
fix msceleb typos
Diffstat (limited to 'site/public')
-rw-r--r-- site/public/assets/css/mobile.css 7
-rwxr-xr-x site/public/assets/css/tabulator.css 8
-rw-r--r-- site/public/datasets/msceleb/index.html 10
3 files changed, 10 insertions, 15 deletions
diff --git a/site/public/assets/css/mobile.css b/site/public/assets/css/mobile.css
index 4258f6b3..124b9d42 100644
--- a/site/public/assets/css/mobile.css
+++ b/site/public/assets/css/mobile.css
@@ -169,11 +169,4 @@ softbr {
.teaser {
display: none;
}
- .intro-mobile{
- font-size:12px;
- }
- .intro-mobile-cr{
- font-size:10px;
- color:#999;
- }
}
\ No newline at end of file
diff --git a/site/public/assets/css/tabulator.css b/site/public/assets/css/tabulator.css
index d26b5cfc..d7a3fab3 100755
--- a/site/public/assets/css/tabulator.css
+++ b/site/public/assets/css/tabulator.css
@@ -65,8 +65,8 @@
text-overflow: ellipsis;
vertical-align: bottom;
/* AH */
- font-weight: 500;
- font-size:14px;
+ font-weight: 400;
+ font-size:12px;
}
.tabulator .tabulator-header .tabulator-col .tabulator-col-content .tabulator-col-title .tabulator-title-editor {
@@ -408,6 +408,7 @@
position: relative;
box-sizing: border-box;
min-height: 22px;
+
}
.tabulator-row.tabulator-row-even {
@@ -494,7 +495,7 @@
padding-right: 10px;
}
-.tabulator-row .tabulator-cell {
+.tabulator .tabulator-row .tabulator-cell {
display: inline-block;
position: relative;
box-sizing: border-box;
@@ -504,6 +505,7 @@
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
+ font-size:12px;
}
.tabulator-row .tabulator-cell.tabulator-editing {
diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html
index 7c3ac86c..8816e7ea 100644
--- a/site/public/datasets/msceleb/index.html
+++ b/site/public/datasets/msceleb/index.html
@@ -205,18 +205,18 @@
<p>What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics policy, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition if they don't have enough face training data to build it.</p>
<p>Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly <a href="https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html">white</a> and <a href="https://gendershades.org">male</a>. Without balanced data, facial recognition contains blind spots. But without the large-scale datasets like MS Celeb, the powerful yet inaccurate facial recognition services like Microsoft Azure Cognitive would be even less usable.</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identities included in the MS-Celeb-1M dataset distributed by Microsoft Research. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identities included in the MS-Celeb-1M dataset distributed by Microsoft Research. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "<a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>," Microsoft used the MS Celeb face dataset to build their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb datset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org does. It states that Microsoft Research analyzed their algorithms using "the MS-Celeb-1M low-shot learning benchmark task."<a class="footnote_shim" name="[^one_shot]_1"> </a><a href="#[^one_shot]" class="footnote" title="Footnote 5">5</a></p>
-<p>Typically researchers will phrase this differently and say that they only use a dataset to validate their algorithm. But validation data can't be easily seperated from the training process. To develop a neural network model, image training datasets are split into three parts: train, test, and validation. Training data is used to fit a model, and the validation and test data are used to provide feedback about the hyperparameters, biases, and outputs. In reality, test and validation data steers and influences the final results of neural networks.</p>
+<p>Typically researchers will phrase this differently and say that they only use a dataset to validate their algorithm. But validation data can't be easily separated from the training process. To develop a neural network model, image training datasets are split into three parts: train, test, and validation. Training data is used to fit a model, and the validation and test data are used to provide feedback about the hyperparameters, biases, and outputs. In reality, test and validation data steers and influences the final results of neural networks.</p>
<h2>Runaway Data</h2>
<p>Despite the recent termination of the <a href="https://msceleb.org">msceleb.org</a> website, the dataset still exists in several repositories on GitHub, the hard drives of countless researchers, and will likely continue to be used in research projects around the world.</p>
<p>For example, on October 28, 2019, the MS Celeb dataset will be used for a new competition called "<a href="https://ibug.doc.ic.ac.uk/resources/lightweight-face-recognition-challenge-workshop/">Lightweight Face Recognition Challenge &amp; Workshop</a>" where the best face recognition entries will be awarded $5,000 from Huawei and $3,000 from DeepGlint. The competition is part of the <a href="http://iccv2019.thecvf.com/program/workshops">ICCV 2019 conference</a>. This time the challenge is no longer being organized by Microsoft, who created the dataset, but instead by Imperial College London (UK) and <a href="https://github.com/deepinsight/insightface">InsightFace</a> (CN).</p>
-<p>And earlier in 2019 images from the MS Celeb were repackaged into another face dataset called <em>Racial Faces in the Wild (RFW)</em>. To create it, the RFW authors uploaded face images from the MS Celeb dataset to the Face++ API and used the inferred racial scores to segregate people into four subsets: Caucasian, Asian, Indian, and African each with 3,000 subjects. That dataset then appeared in a subsequent research project from researchers affilliated with IIIT-Delhi and IBM TJ Watson called <a href="https://arxiv.org/abs/1904.01219">Deep Learning for Face Recognition: Pride or Prejudiced?</a>, which aims to reduce bias but also inadvertently furthers racist language and ideologies in the paper.</p>
-<p>The technology used to compute the estimated racial scores for the for the MS Celeb face images used in the RFW dataset is owned by Megvii Inc, who has been repeatedly linked to the oppressive surveillance of Uighur Muslims in Xinjiang, China. According to posts from the <a href="https://chinai.substack.com/p/chinai-newsletter-11-companies-involved-in-expanding-chinas-public-security-apparatus-in-xinjiang">ChinAI Newsletter</a> and <a href="https://www.buzzfeednews.com/article/ryanmac/us-money-funding-facial-recognition-sensetime-megvii">BuzzFeedNews</a>, Megvii announced in 2017 at the China-Eurasia Security Expo in Ürümqi, Xinjiang, that it would be the official technical support unit of the "Public Security Video Laboratory" in Xinjiang, China. If they didn't already, it's highly likely that Megvii has a copy of everyone's biometric faceprint from the MS Celeb dataset.</p>
+<p>And earlier in 2019 images from the MS Celeb were repackaged into another face dataset called <em>Racial Faces in the Wild (RFW)</em>. To create it, the RFW authors uploaded face images from the MS Celeb dataset to the Face++ API and used the inferred racial scores to segregate people into four subsets: Caucasian, Asian, Indian, and African each with 3,000 subjects. That dataset then appeared in a subsequent research project from researchers affiliated with IIIT-Delhi and IBM TJ Watson called <a href="https://arxiv.org/abs/1904.01219">Deep Learning for Face Recognition: Pride or Prejudiced?</a>, which aims to reduce bias but also inadvertently furthers racist language and ideologies in the paper.</p>
+<p>The technology that was used to compute the estimated racial scores for the MS Celeb face images used in the RFW dataset, Face++, is owned by Megvii Inc, who has been repeatedly linked to the oppressive surveillance of Uighur Muslims in Xinjiang, China. According to posts from the <a href="https://chinai.substack.com/p/chinai-newsletter-11-companies-involved-in-expanding-chinas-public-security-apparatus-in-xinjiang">ChinAI Newsletter</a> and <a href="https://www.buzzfeednews.com/article/ryanmac/us-money-funding-facial-recognition-sensetime-megvii">BuzzFeedNews</a>, Megvii announced in 2017 at the China-Eurasia Security Expo in Ürümqi, Xinjiang, that it would be the official technical support unit of the "Public Security Video Laboratory" in Xinjiang, China. If they didn't already, it's highly likely that Megvii has a copy of everyone's biometric faceprint from the MS Celeb dataset.</p>
<p>Megvii also publicly acknowledges using the MS Celeb face dataset in their 2018 research project called <a href="https://arxiv.org/pdf/1808.06210.pdf">GridFace: Face Rectification via Learning Local Homography Transformations</a>. The paper has three authors, all of whom were associated with Megvii.</p>
<h2>Commercial Usage</h2>
<p>The Microsoft Celeb dataset <a href="http://web.archive.org/web/20180218212120/http://www.msceleb.org/download/sampleset">website</a> says it was created for "non-commercial research purpose only." Publicly available research citations and competitions show otherwise.</p>
-<p>In 2017 Microsoft Research organized a face recognition competition at the International Conference on Computer Vision (ICCV), one of the top 2 computer vision conferences worldwide, where industry and academia used the MS Celeb dataset to compete for the higest performance scores. The winner was Beijing-based OrionStar Technology Co., Ltd.. In their <a href="https://www.prnewswire.com/news-releases/orionstar-wins-challenge-to-recognize-one-million-celebrity-faces-with-artificial-intelligence-300494265.html">press release</a>, OrionStar boast 13% increase on the difficult set over last year's winner. The prior year's competitors included Beijing-based Faceall Technology Co., Ltd., a company providing face recognition for "smart city" applications.</p>
+<p>In 2017 Microsoft Research organized a face recognition competition at the International Conference on Computer Vision (ICCV), one of the top 2 computer vision conferences worldwide, where industry and academia used the MS Celeb dataset to compete for the highest performance scores. The 2017 winner was Beijing-based OrionStar Technology Co., Ltd.. In their <a href="https://www.prnewswire.com/news-releases/orionstar-wins-challenge-to-recognize-one-million-celebrity-faces-with-artificial-intelligence-300494265.html">press release</a>, OrionStar boasted a 13% increase on the difficult set over last year's winner. The prior year's competitors included Beijing-based Faceall Technology Co., Ltd., a company providing face recognition for "smart city" applications.</p>
<p>Considering the multiple citations from commercial organizations (Canon, Hitachi, IBM, Megvii/Face++, Microsoft, Microsoft Asia, SenseTime), military use (National University of Defense Technology in China), and the proliferation of subset data (Racial Faces in the Wild) being used to develop face recognition technology for commercial or defense purposes it's fairly clear that Microsoft has lost control of their MS Celeb dataset and biometric data of nearly 100,000 individuals.</p>
-<p>To provide insight into where these 10 million faces images have traveled, over 100 research papers have been verified and geolocated to show who used the dataset and where it was used.</p>
+<p>To provide insight into where these 10 million faces images have traveled, over 100 research papers have been verified and geolocated to show who used the dataset and where they used it.</p>
</section><section>
<h3>Who used Microsoft Celeb?</h3>