| author | adamhrv <adam@ahprojects.com> | 2019-06-27 23:58:23 +0200 |
|---|---|---|
| committer | adamhrv <adam@ahprojects.com> | 2019-06-27 23:58:23 +0200 |
| commit | 852e4c1e36c38f57f80fc5d441da82d5991b2212 | |
| tree | 0c8bc3bbcb6c679e28ba387d0c1e47fb3d16830a /site/public/datasets/msceleb/index.html | |
| parent | ae165ef1235a6997d5791ca241fd3fd134202c92 | |
update public
Diffstat (limited to 'site/public/datasets/msceleb/index.html')
| -rw-r--r-- | site/public/datasets/msceleb/index.html | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html
index 2e326416..7109cc9b 100644
--- a/site/public/datasets/msceleb/index.html
+++ b/site/public/datasets/msceleb/index.html
@@ -206,7 +206,7 @@
 <p>Earlier in 2019, Microsoft President and Chief Legal Officer <a href="https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/">Brad Smith</a> called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also <a href="https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV">announced</a> that Microsoft would seemingly take a stand against such potential misuse, and had decided to not sell face recognition to an unnamed United States agency, citing a lack of accuracy. In effect, Microsoft's face recognition software was not suitable to be used on minorities because it was trained mostly on white male faces.</p>
 <p>What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics policy, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition if they don't have enough face training data to build it.</p>
 <p>Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly <a href="https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html">white</a> and <a href="https://gendershades.org">male</a>. Without balanced data, facial recognition contains blind spots. But without the large-scale datasets like MS Celeb, the powerful yet inaccurate facial recognition services like Microsoft Azure Cognitive would be even less usable.</p>
-</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identities included in the MS-Celeb-1M dataset distributed by Microsoft Research. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identities included in the MS-Celeb-1M dataset distributed by Microsoft Research. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "<a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>," Microsoft used the MS Celeb face dataset to build their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb datset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org does. It states that Microsoft Research analyzed their algorithms using "the MS-Celeb-1M low-shot learning benchmark task."<a class="footnote_shim" name="[^one_shot]_1"> </a><a href="#[^one_shot]" class="footnote" title="Footnote 5">5</a></p>
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identities included in the MS-Celeb-1M dataset distributed by Microsoft Research. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identities included in the MS-Celeb-1M dataset distributed by Microsoft Research. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "<a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>," Microsoft used the MS Celeb face dataset to build their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb datset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org does. It states that Microsoft analyzed their algorithms "on the MS-Celeb-1M low-shot learning <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">benchmark task</a>"<a class="footnote_shim" name="[^one_shot]_1"> </a><a href="#[^one_shot]" class="footnote" title="Footnote 5">5</a>, which is described as a refined version of the original MS-Celeb-1M face dataset.</p>
 <p>Typically researchers will phrase this differently and say that they only use a dataset to validate their algorithm. But validation data can't be easily separated from the training process. To develop a neural network model, image training datasets are split into three parts: train, test, and validation. Training data is used to fit a model, and the validation and test data are used to provide feedback about the hyperparameters, biases, and outputs. In reality, test and validation data steers and influences the final results of neural networks.</p>
 <h2>Runaway Data</h2>
 <p>Despite the recent termination of the <a href="https://msceleb.org">msceleb.org</a> website, the dataset still exists in several repositories on GitHub, the hard drives of countless researchers, and will likely continue to be used in research projects around the world.</p>
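The context paragraph in this hunk beginning "Typically researchers will phrase this differently" argues that validation data "steers and influences the final results" even when it is never used for fitting. A minimal Python sketch of that mechanism follows, assuming a toy k-nearest-neighbor classifier on synthetic 1-D data. Everything here is hypothetical and not taken from the site or from Microsoft's work: the helper names (`split_dataset`, `knn_predict`, `accuracy`, `select_k`), the 70/15/15 split, the candidate `k` values, and the data itself. The validation split contributes no training points, yet it alone decides which `k` is kept.

```python
import random
from collections import Counter


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle and split items into disjoint train, validation, and test lists.
    The 70/15/15 proportions are an arbitrary assumption for illustration."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def knn_predict(train, x, k):
    """Predict a label for x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of points in data classified correctly, using train as the reference set."""
    if not data:
        return 0.0
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)


def select_k(train, val, candidates=(1, 3, 5, 7, 9)):
    """Pick the hyperparameter k with the best validation accuracy.
    The validation set adds no training points, but it decides which
    model is kept, so it still shapes the final result."""
    scores = {k: accuracy(train, val, k) for k in candidates}
    best_k = max(candidates, key=lambda k: scores[k])
    return best_k, scores


# Hypothetical toy data: 1-D points labeled by which side of 0.5 they fall on,
# with roughly 10% of labels flipped to simulate noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

train, val, test = split_dataset(data)
best_k, val_scores = select_k(train, val)
print("validation accuracy per k:", val_scores)
print("chosen k:", best_k)
print("test accuracy:", accuracy(train, test, best_k))
```

Because the split is seeded, the run is reproducible. Re-running with a different `seed` for the validation split can change which `k` is chosen while the fitting procedure stays the same, which is the sense in which held-out data still influences the final model.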
