From e59b5e38a6dfcb61375686ec83a4606f50ab012d Mon Sep 17 00:00:00 2001
From: Adam Harvey
Date: Fri, 28 Jun 2019 18:35:49 +0200
Subject: msc ready v1

---
 site/public/datasets/brainwash/index.html            | 2 +-
 site/public/datasets/duke_mtmc/index.html            | 2 +-
 site/public/datasets/helen/index.html                | 2 +-
 site/public/datasets/hrt_transgender/index.html      | 2 +-
 site/public/datasets/ibm_dif/index.html              | 2 +-
 site/public/datasets/ijb_c/index.html                | 2 +-
 site/public/datasets/index.html                      | 2 +-
 site/public/datasets/megaface/index.html             | 2 +-
 site/public/datasets/msceleb/assets/notes/index.html | 2 +-
 site/public/datasets/msceleb/index.html              | 6 +++---
 site/public/datasets/oxford_town_centre/index.html   | 2 +-
 site/public/datasets/uccs/assets/notes/index.html    | 2 +-
 site/public/datasets/uccs/index.html                 | 2 +-
 site/public/datasets/who_goes_there/index.html       | 2 +-
 14 files changed, 16 insertions(+), 16 deletions(-)

(limited to 'site/public/datasets')

diff --git a/site/public/datasets/brainwash/index.html b/site/public/datasets/brainwash/index.html
index 3dacd6e1..18600b6f 100644
--- a/site/public/datasets/brainwash/index.html
+++ b/site/public/datasets/brainwash/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/duke_mtmc/index.html b/site/public/datasets/duke_mtmc/index.html
index 9a70a3f6..fc141450 100644
--- a/site/public/datasets/duke_mtmc/index.html
+++ b/site/public/datasets/duke_mtmc/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/helen/index.html b/site/public/datasets/helen/index.html
index a7ada42a..44ef462e 100644
--- a/site/public/datasets/helen/index.html
+++ b/site/public/datasets/helen/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/hrt_transgender/index.html b/site/public/datasets/hrt_transgender/index.html
index 02324a2f..2e5e9c62 100644
--- a/site/public/datasets/hrt_transgender/index.html
+++ b/site/public/datasets/hrt_transgender/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/ibm_dif/index.html b/site/public/datasets/ibm_dif/index.html
index 1c465f93..be5dbfe4 100644
--- a/site/public/datasets/ibm_dif/index.html
+++ b/site/public/datasets/ibm_dif/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/ijb_c/index.html b/site/public/datasets/ijb_c/index.html
index a36fac14..abe7d5ed 100644
--- a/site/public/datasets/ijb_c/index.html
+++ b/site/public/datasets/ijb_c/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/index.html b/site/public/datasets/index.html
index 1fb83352..a634b877 100644
--- a/site/public/datasets/index.html
+++ b/site/public/datasets/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/megaface/index.html b/site/public/datasets/megaface/index.html
index 33abf6c1..712af28a 100644
--- a/site/public/datasets/megaface/index.html
+++ b/site/public/datasets/megaface/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/msceleb/assets/notes/index.html b/site/public/datasets/msceleb/assets/notes/index.html
index cac21eef..36c32429 100644
--- a/site/public/datasets/msceleb/assets/notes/index.html
+++ b/site/public/datasets/msceleb/assets/notes/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html
index 7109cc9b..42a44571 100644
--- a/site/public/datasets/msceleb/index.html
+++ b/site/public/datasets/msceleb/index.html
@@ -50,7 +50,7 @@
@@ -212,8 +212,8 @@

Despite the recent termination of the msceleb.org website, the dataset still exists in several repositories on GitHub, the hard drives of countless researchers, and will likely continue to be used in research projects around the world.

For example, on October 28, 2019, the MS Celeb dataset will be used for a new competition called "Lightweight Face Recognition Challenge & Workshop" where the best face recognition entries will be awarded $5,000 from Huawei and $3,000 from DeepGlint. The competition is part of the ICCV 2019 conference. This time the challenge is no longer being organized by Microsoft, who created the dataset, but instead by Imperial College London (UK) and InsightFace (CN). The organizers provide a 25GB download of cropped faces from MS Celeb for anyone to download (in .rec format).

And in June, shortly after posting about the disappearance of the MS Celeb dataset, it reemerged on Academic Torrents. As of June 10, the MS Celeb dataset files have been redistributed in at least 9 countries and downloaded 44 times without any restrictions. The files were seeded and are mostly distributed by an AI company based in China called Hyper.ai, which states that it redistributes MS Celeb and other datasets for "teachers and students of service industry-related practitioners and research institutes." 6

-Earlier in 2019 images from the MS Celeb were also repackaged into another face dataset called Racial Faces in the Wild (RFW). To create it, the RFW authors uploaded face images from the MS Celeb dataset to the Face++ API and used the inferred racial scores to segregate people into four subsets: Caucasian, Asian, Indian, and African each with 3,000 subjects. That dataset then appeared in a subsequent research project from researchers affiliated with IIIT-Delhi and IBM TJ Watson called Deep Learning for Face Recognition: Pride or Prejudiced?, which aims to reduce bias but also inadvertently furthers racist language and ideologies that can not be repeated here.

-The estimated racial scores for the MS Celeb face images used in the RFW dataset were computed using the Face++ API, which is owned by Megvii Inc, a company that has been repeatedly linked to the oppressive surveillance of Uighur Muslims in Xinjiang, China. According to posts from the ChinAI Newsletter and BuzzFeedNews, Megvii announced in 2017 at the China-Eurasia Security Expo in Ürümqi, Xinjiang, that it would be the official technical support unit of the "Public Security Video Laboratory" in Xinjiang, China. If they didn't already, it's highly likely that Megvii has a copy of everyone's biometric faceprint from the MS Celeb dataset, either from uploads to the Face++ API or through the research projects explicitly referencing MS Celeb dataset usage, such as a 2018 paper called GridFace: Face Rectification via Learning Local Homography Transformations jointly published by 3 authors, all of whom worked for Megvii.

+Earlier in 2019 images from the MS Celeb were also repackaged into another face dataset called Racial Faces in the Wild (RFW). To create it, the RFW authors uploaded face images from the MS Celeb dataset to the Face++ API and used the inferred racial scores to segregate people into four subsets: Caucasian, Asian, Indian, and African each with 3,000 subjects. That dataset then appeared in a subsequent research project from researchers affiliated with IIIT-Delhi and IBM TJ Watson called Deep Learning for Face Recognition: Pride or Prejudiced?, which aims to reduce bias but also inadvertently furthers racist ideologies, using discredited racial terminology that cannot be repeated here.

+The estimated racial scores for the MS Celeb face images used in the RFW dataset were computed using the Face++ API, which is owned by Megvii Inc, a company that has been repeatedly linked to the oppressive surveillance of Uighur Muslims in Xinjiang, China. According to posts from the ChinAI Newsletter and BuzzFeedNews, Megvii announced in 2017 at the China-Eurasia Security Expo in Ürümqi, Xinjiang, that it would be the official technical support unit of the "Public Security Video Laboratory" in Xinjiang, China. If they didn't already, it's highly likely that Megvii has a copy of everyone's biometric faceprint from the MS Celeb dataset, either from uploads to the Face++ API or through research projects explicitly referencing MS Celeb dataset usage, such as a 2018 paper called GridFace: Face Rectification via Learning Local Homography Transformations jointly published by 3 authors, all of whom worked for Megvii.

Commercial Usage

Microsoft's MS Celeb website says it was created for "non-commercial research purpose only." Publicly available research citations and competitions show otherwise.

In 2017 Microsoft Research organized a face recognition competition at the International Conference on Computer Vision (ICCV), one of the top 2 computer vision conferences worldwide, where industry and academia used the MS Celeb dataset to compete for the highest performance scores. The 2017 winner was Beijing-based OrionStar Technology Co., Ltd. In their press release, OrionStar boasted a 13% increase on the difficult set over last year's winner. The prior year's competitors included Beijing-based Faceall Technology Co., Ltd., a company providing face recognition for "smart city" applications.

diff --git a/site/public/datasets/oxford_town_centre/index.html b/site/public/datasets/oxford_town_centre/index.html
index 40f8bbc6..11fb436f 100644
--- a/site/public/datasets/oxford_town_centre/index.html
+++ b/site/public/datasets/oxford_town_centre/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/uccs/assets/notes/index.html b/site/public/datasets/uccs/assets/notes/index.html
index c8daf796..ce36f3d9 100644
--- a/site/public/datasets/uccs/assets/notes/index.html
+++ b/site/public/datasets/uccs/assets/notes/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/uccs/index.html b/site/public/datasets/uccs/index.html
index 96ab1e09..2dcf88a1 100644
--- a/site/public/datasets/uccs/index.html
+++ b/site/public/datasets/uccs/index.html
@@ -50,7 +50,7 @@
diff --git a/site/public/datasets/who_goes_there/index.html b/site/public/datasets/who_goes_there/index.html
index 3db77ff7..a00fd151 100644
--- a/site/public/datasets/who_goes_there/index.html
+++ b/site/public/datasets/who_goes_there/index.html
@@ -50,7 +50,7 @@
-- cgit v1.2.3-70-g09d2