Diffstat (limited to 'site/content/pages/datasets/msceleb/index.md')
| -rw-r--r-- | site/content/pages/datasets/msceleb/index.md | 3 |
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/site/content/pages/datasets/msceleb/index.md b/site/content/pages/datasets/msceleb/index.md
index 5095da3d..453c1522 100644
--- a/site/content/pages/datasets/msceleb/index.md
+++ b/site/content/pages/datasets/msceleb/index.md
@@ -87,7 +87,8 @@ Until now, that data has been freely harvested from the Internet and packaged in
 
-Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "[One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/)," Microsoft used the MS Celeb face dataset to build their algorithms and advertise the results. Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb datset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org does. It states that Microsoft Research analyzed their algorithms using "the MS-Celeb-1M low-shot learning benchmark task."[^one_shot]
+Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "[One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/)," Microsoft used the MS Celeb face dataset to build their algorithms and advertise the results. 
+Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb datset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org does. It states that Microsoft analyzed their algorithms "on the MS-Celeb-1M low-shot learning [benchmark task](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/)"[^one_shot], which is described as a refined version of the original MS-Celeb-1M face dataset.
 
 Typically researchers will phrase this differently and say that they only use a dataset to validate their algorithm. But validation data can't be easily separated from the training process. To develop a neural network model, image training datasets are split into three parts: train, test, and validation. Training data is used to fit a model, and the validation and test data are used to provide feedback about the hyperparameters, biases, and outputs. In reality, test and validation data steers and influences the final results of neural networks.
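The train/test/validation split that the context paragraph describes can be sketched in Python. This is a minimal illustration only, not part of the diff above: the `split_dataset` helper and the 80/10/10 fractions are assumptions, not anything from the MS-Celeb-1M benchmark itself.

```python
import random

def split_dataset(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle samples and partition them into train/validation/test sets.

    Illustrative helper (assumed, not from any real benchmark): the
    validation and test sets are held out, and the remainder trains the model.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

# Example with 100 placeholder samples -> 80 train, 10 validation, 10 test.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

As the paragraph notes, even though only `train` fits the model directly, results measured on `val` and `test` feed back into hyperparameter choices, so the held-out partitions still steer the final model.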
