author    adamhrv <adam@ahprojects.com>  2019-04-19 03:16:32 +0200
committer adamhrv <adam@ahprojects.com>  2019-04-19 03:16:32 +0200
commit    cf0d2816acf0ef73ddffbf649677fafcc953c004 (patch)
tree      e9d46c77b308befbc7359e99d1c15bc0ea2842da
parent    b693f14a61b68f1cdf86255301e69e91084947e9 (diff)
fix more typos
-rw-r--r--  site/content/pages/datasets/brainwash/index.md   6
-rw-r--r--  site/content/pages/datasets/duke_mtmc/index.md   4
-rw-r--r--  site/content/pages/datasets/ijb_c/index.md       7
-rw-r--r--  site/content/pages/datasets/msceleb/index.md    12
-rw-r--r--  site/public/datasets/brainwash/index.html        6
-rw-r--r--  site/public/datasets/duke_mtmc/index.html        4
-rw-r--r--  site/public/datasets/msceleb/index.html         16
7 files changed, 26 insertions, 29 deletions
diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md
index 75b0c006..79294114 100644
--- a/site/content/pages/datasets/brainwash/index.md
+++ b/site/content/pages/datasets/brainwash/index.md
@@ -21,9 +21,9 @@ authors: Adam Harvey
Brainwash is a dataset of livecam images taken from San Francisco's Brainwash Cafe. It includes 11,918 images of "everyday life of a busy downtown cafe"[^readme] captured at 100 second intervals throughout the entire day. The Brainwash dataset includes 3 full days of webcam images taken on October 27, November 13, and November 24 in 2014. According to the author's [research paper](https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666) introducing the dataset, the images were acquired with the help of Angelcam.com[^end_to_end]
-The Brainwash dataset is unique because it uses images from a publicly available webcam that records people inside a privately owned business without any consent. No ordinary cafe custom could ever suspect there image would end up in dataset used for surveillance reserach and development, but that is exactly what happened to customers at Brainwash cafe in San Francisco.
+The Brainwash dataset is unique because it uses images from a publicly available webcam that records people inside a privately owned business without any consent. No ordinary cafe customer could ever suspect their image would end up in a dataset used for surveillance research and development, but that is exactly what happened to customers at Brainwash cafe in San Francisco.
-Although Brainwash appears to be a less popular dataset, it was used in 2016 and 2017 by researchers from the National University of Defense Technology in China took note of the dataset and used it for two [research](https://www.semanticscholar.org/paper/Localized-region-context-and-object-feature-fusion-Li-Dou/b02d31c640b0a31fb18c4f170d841d8e21ffb66c) [projects](https://www.semanticscholar.org/paper/A-Replacement-Algorithm-of-Non-Maximum-Suppression-Zhao-Wang/591a4bfa6380c9fcd5f3ae690e3ac5c09b7bf37b) on advancing the capabilities of object detection to more accurately isolate the target region in an image ([PDF](https://www.itm-conferences.org/articles/itmconf/pdf/2017/04/itmconf_ita2017_05006.pdf)). [^localized_region_context] [^replacement_algorithm]. The dataset also appears in a 2017 [research paper](https://ieeexplore.ieee.org/document/7877809) from Peking University for the purpose of improving surveillance capabilities for "people detection in the crowded scenes".
+Although Brainwash appears to be a less popular dataset, it was notably used in 2016 and 2017 by researchers affiliated with the National University of Defense Technology in China for two [research](https://www.semanticscholar.org/paper/Localized-region-context-and-object-feature-fusion-Li-Dou/b02d31c640b0a31fb18c4f170d841d8e21ffb66c) [projects](https://www.semanticscholar.org/paper/A-Replacement-Algorithm-of-Non-Maximum-Suppression-Zhao-Wang/591a4bfa6380c9fcd5f3ae690e3ac5c09b7bf37b) on advancing the capabilities of object detection to more accurately isolate the target region in an image ([PDF](https://www.itm-conferences.org/articles/itmconf/pdf/2017/04/itmconf_ita2017_05006.pdf)).[^localized_region_context] [^replacement_algorithm] The dataset also appears in a 2017 [research paper](https://ieeexplore.ieee.org/document/7877809) from Peking University for the purpose of improving surveillance capabilities for "people detection in the crowded scenes".
![caption: A visualization of 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)](assets/brainwash_grid.jpg)
@@ -31,7 +31,7 @@ Although Brainwash appears to be a less popular dataset, it was used in 2016 and
{% include 'supplementary_header.html' %}
-![caption: An sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The datset contains 11,916 more images like this one. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)](assets/brainwash_example.jpg)
+![caption: A sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The dataset contains 11,916 more images like this one. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)](assets/brainwash_example.jpg)
![caption: A visualization of the active regions for 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)](assets/brainwash_saliency_map.jpg)
diff --git a/site/content/pages/datasets/duke_mtmc/index.md b/site/content/pages/datasets/duke_mtmc/index.md
index 2420d042..9356823e 100644
--- a/site/content/pages/datasets/duke_mtmc/index.md
+++ b/site/content/pages/datasets/duke_mtmc/index.md
@@ -44,7 +44,7 @@ Despite [repeated](https://www.hrw.org/news/2017/11/19/china-police-big-data-sys
The reasons that companies in China use the Duke MTMC dataset for research are technically no different than the reasons it is used in the United States and Europe. In fact, the original creators of the dataset published a follow up report in 2017 titled [Tracking Social Groups Within and Across Cameras](https://www.semanticscholar.org/paper/Tracking-Social-Groups-Within-and-Across-Cameras-Solera-Calderara/9e644b1e33dd9367be167eb9d832174004840400) with specific applications to "automated analysis of crowds and social gatherings for surveillance and security applications". Their work, as well as the creation of the original dataset in 2014 were both supported in part by the United States Army Research Laboratory.
-Citations from the United States and Europe show a similar trend to that in China, including publicly acknowledged and verified usage of the Duke MTMC dataset supported or carried out by the United States Department of Homeland Security, IARPA, IBM, Microsoft (who provides surveillance to ICE), and Vision Semantics (who works with the UK Ministry of Defence). One [paper](https://pdfs.semanticscholar.org/59f3/57015054bab43fb8cbfd3f3dbf17b1d1f881.pdf) is even jointly published by researchers affiliated with both the University College of London and the National University of Defense Technology in China.
+Citations from the United States and Europe show a similar trend to that in China, including publicly acknowledged and verified usage of the Duke MTMC dataset supported or carried out by the United States Department of Homeland Security, IARPA, IBM, Microsoft (who has provided surveillance to ICE), and Vision Semantics (who has worked with the UK Ministry of Defence). One [paper](https://pdfs.semanticscholar.org/59f3/57015054bab43fb8cbfd3f3dbf17b1d1f881.pdf) is even jointly published by researchers affiliated with both University College London and the National University of Defense Technology in China.
| Organization | Paper | Link | Year | Used Duke MTMC |
|---|---|---|---|---|
@@ -79,7 +79,7 @@ For the approximately 2,000 students in Duke MTMC dataset there is unfortunately
#### Video Timestamps
-The video timestamps contain the likely, but not yet confirmed, date and times the video recorded. Because the video timestamps align with the start and stop [time sync data](http://vision.cs.duke.edu/DukeMTMC/details.html#time-sync) provided by the researchers, it at least confirms the relative timing. The [precipitous weather](https://www.wunderground.com/history/daily/KIGX/date/2014-3-19?req_city=Durham&req_state=NC&req_statename=North%20Carolina&reqdb.zip=27708&reqdb.magic=1&reqdb.wmo=99999) on March 14, 2014 in Durham, North Carolina supports, but does not confirm, that this day is a potential capture date.
+The video timestamps contain the likely, but not yet confirmed, dates and times the video was recorded. Because the video timestamps align with the start and stop [time sync data](http://vision.cs.duke.edu/DukeMTMC/details.html#time-sync) provided by the researchers, it at least confirms the relative timing. The [inclement weather](https://www.wunderground.com/history/daily/KIGX/date/2014-3-19?req_city=Durham&req_state=NC&req_statename=North%20Carolina&reqdb.zip=27708&reqdb.magic=1&reqdb.wmo=99999) on March 14, 2014 in Durham, North Carolina supports, but does not confirm, that this day is the likely capture date.
=== columns 2
diff --git a/site/content/pages/datasets/ijb_c/index.md b/site/content/pages/datasets/ijb_c/index.md
index e3d6a134..46cab323 100644
--- a/site/content/pages/datasets/ijb_c/index.md
+++ b/site/content/pages/datasets/ijb_c/index.md
@@ -19,6 +19,8 @@ authors: Adam Harvey
### sidebar
### end sidebar
+[ page under development ]
+
The IARPA Janus Benchmark C is a dataset created by
@@ -32,8 +34,3 @@ The IARPA Janus Benchmark C is a dataset created by
{% include 'cite_our_work.html' %}
### Footnotes
-
-[^readme]: "readme.txt" https://exhibits.stanford.edu/data/catalog/sx925dc9385.
-[^end_to_end]: Stewart, Russel. Andriluka, Mykhaylo. "End-to-end people detection in crowded scenes". 2016.
-[^localized_region_context]: Li, Y. and Dou, Y. and Liu, X. and Li, T. Localized Region Context and Object Feature Fusion for People Head Detection. ICIP16 Proceedings. 2016. Pages 594-598.
-[^replacement_algorithm]: Zhao. X, Wang Y, Dou, Y. A Replacement Algorithm of Non-Maximum Suppression Base on Graph Clustering.
diff --git a/site/content/pages/datasets/msceleb/index.md b/site/content/pages/datasets/msceleb/index.md
index c353eca0..c16016f8 100644
--- a/site/content/pages/datasets/msceleb/index.md
+++ b/site/content/pages/datasets/msceleb/index.md
@@ -19,13 +19,13 @@ authors: Adam Harvey
### sidebar
### end sidebar
-Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research who created and published the [dataset](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/) in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals images and use this to accelerate research into recognizing a target list of one million individuals from their face images "using all the possibly collected face images of this individual on the web as training data".[^msceleb_orig]
+Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research, which created and published the [dataset](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/) in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' images and use this to accelerate research into recognizing a target list of one million people from their face images "using all the possibly collected face images of this individual on the web as training data".[^msceleb_orig]
-These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people including academics, policy makers, writers, artists, and especially journalists maintaining an online presence is mandatory and should not allow Microsoft or anyone else to use their biometrics for research and development of surveillance technology. Many of names in target list even include people critical of the very technology Microsoft is using their name and biometric information to build. The list includes digital rights activists like Jillian York and [add more]; artists critical of surveillance including Trevor Paglen, Hito Steryl, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glen Greenwald; Data and Society founder danah boyd; and even Julie Brill the former FTC commissioner responsible for protecting consumer’s privacy to name a few.
+These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people exploits the current reality that for many people, including academics, policy makers, writers, artists, and especially journalists, maintaining an online presence is mandatory; that necessity should not allow Microsoft or anyone else to use their biometrics for research and development of surveillance technology. Many names in the target list even belong to people critical of the very technology Microsoft is using their names and biometric information to build. The list includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, to name a few.
### Microsoft's 1 Million Target List
-Below is a list of names that were included in list of 1 million individuals curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from [msceleb.org](https://msceleb.org). Email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handing%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed images.
+Below is a selection of names from the full target list, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from [msceleb.org](https://msceleb.org). You can email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handling%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed images.
=== columns 2
@@ -63,13 +63,13 @@ Below is a list of names that were included in list of 1 million individuals cur
After publishing this list, researchers from Microsoft Asia then worked with researchers affiliated with China's National University of Defense Technology (controlled by China's Central Military Commission) and used the MS Celeb dataset for their [research paper](https://www.semanticscholar.org/paper/Faces-as-Lighting-Probes-via-Unsupervised-Deep-Yi-Zhu/b301fd2fc33f24d6f75224e7c0991f4f04b64a65) on using "Faces as Lighting Probes via Unsupervised Deep Highlight Extraction" with potential applications in 3D face recognition.
-In an [article](https://www.ft.com/content/9378e7ee-5ae6-11e9-9dde-7aedca0a081a) published by Financial Times based on data surfaced during this investigation, Samm Sacks (a senior fellow at New America think tank) commented that this research raised "red flags because of the nature of the technology, the author's affiliations, combined with what we know about how this technology is being deployed in China right now". Adding, that "the [Chinese] government is using these technologies to biuld surveillance systems and to detain minorities [in Xinjiang]".[^madhu_ft]
+In an [article](https://www.ft.com/content/9378e7ee-5ae6-11e9-9dde-7aedca0a081a) published by the Financial Times based on data surfaced during this investigation, Samm Sacks (a senior fellow at the New America think tank) commented that this research raised "red flags because of the nature of the technology, the author's affiliations, combined with what we know about how this technology is being deployed in China right now," adding that "the [Chinese] government is using these technologies to build surveillance systems and to detain minorities [in Xinjiang]".[^madhu_ft]
Four more papers published by SenseTime which also use the MS Celeb dataset raise similar flags. SenseTime is a computer vision surveillance company who until [April 2019](https://uhrp.org/news-commentary/china%E2%80%99s-sensetime-sells-out-xinjiang-security-joint-venture) provided surveillance to Chinese authorities to monitor and track Uighur Muslims in Xinjiang province and had been [flagged](https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html) numerous times as having potential links to human rights violations.
One of the 4 SenseTime papers, "[Exploring Disentangled Feature Representation Beyond Face Identification](https://www.semanticscholar.org/paper/Exploring-Disentangled-Feature-Representation-Face-Liu-Wei/1fd5d08394a3278ef0a89639e9bfec7cb482e0bf)", shows how SenseTime was developing automated face analysis technology to infer race, narrow eyes, nose size, and chin size, all of which could be used to target vulnerable ethnic groups based on their facial appearances.
-Earlier in 2019, Microsoft CEO [Brad Smith](https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/) called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also [announced](https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV) that Microsoft would seemingly take stand against potential misuse and decided to not sell face recognition to an unnamed United States law enforcement agency, citing that their technology was not accurate enough to be used on minorities because it was trained mostly on white male faces.
+Earlier in 2019, Microsoft President [Brad Smith](https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/) called for governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearings. More recently, Smith also [announced](https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV) that Microsoft would take a stand against such potential misuse, declining to sell face recognition to an unnamed United States agency on the grounds that the technology was not accurate enough to be used on minorities because it was trained mostly on white male faces.
What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition for faces they can't train on.
@@ -77,7 +77,7 @@ Until now, that data has been freely harvested from the Internet and packaged in
![caption: A visualization of 2,000 of the 100,000 identities included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)](assets/msceleb_montage.jpg)
-Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called [One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/), Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb datset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org explicitly mentions that Microsoft Research tested their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."
+Microsoft didn't only create MS Celeb for other researchers to use; they also used it internally. In a publicly available 2017 Microsoft Research project called [One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/), Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb dataset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org explicitly mentions that Microsoft Research evaluated their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."
We suggest that if Microsoft Research wants to make biometric data publicly available for surveillance research and development, they should start with releasing their researchers' own biometric data instead of scraping the Internet for journalists, artists, writers, actors, athletes, musicians, and academics.
diff --git a/site/public/datasets/brainwash/index.html b/site/public/datasets/brainwash/index.html
index cf1f5e5e..453e4956 100644
--- a/site/public/datasets/brainwash/index.html
+++ b/site/public/datasets/brainwash/index.html
@@ -50,8 +50,8 @@
<div class='gray'>Website</div>
<div><a href='https://purl.stanford.edu/sx925dc9385' target='_blank' rel='nofollow noopener'>stanford.edu</a></div>
</div></div><p>Brainwash is a dataset of livecam images taken from San Francisco's Brainwash Cafe. It includes 11,918 images of "everyday life of a busy downtown cafe"<a class="footnote_shim" name="[^readme]_1"> </a><a href="#[^readme]" class="footnote" title="Footnote 1">1</a> captured at 100 second intervals throughout the entire day. The Brainwash dataset includes 3 full days of webcam images taken on October 27, November 13, and November 24 in 2014. According to the author's <a href="https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666">research paper</a> introducing the dataset, the images were acquired with the help of Angelcam.com<a class="footnote_shim" name="[^end_to_end]_1"> </a><a href="#[^end_to_end]" class="footnote" title="Footnote 2">2</a></p>
-<p>The Brainwash dataset is unique because it uses images from a publicly available webcam that records people inside a privately owned business without any consent. No ordinary cafe custom could ever suspect there image would end up in dataset used for surveillance reserach and development, but that is exactly what happened to customers at Brainwash cafe in San Francisco.</p>
-<p>Although Brainwash appears to be a less popular dataset, it was used in 2016 and 2017 by researchers from the National University of Defense Technology in China took note of the dataset and used it for two <a href="https://www.semanticscholar.org/paper/Localized-region-context-and-object-feature-fusion-Li-Dou/b02d31c640b0a31fb18c4f170d841d8e21ffb66c">research</a> <a href="https://www.semanticscholar.org/paper/A-Replacement-Algorithm-of-Non-Maximum-Suppression-Zhao-Wang/591a4bfa6380c9fcd5f3ae690e3ac5c09b7bf37b">projects</a> on advancing the capabilities of object detection to more accurately isolate the target region in an image (<a href="https://www.itm-conferences.org/articles/itmconf/pdf/2017/04/itmconf_ita2017_05006.pdf">PDF</a>). <a class="footnote_shim" name="[^localized_region_context]_1"> </a><a href="#[^localized_region_context]" class="footnote" title="Footnote 3">3</a> <a class="footnote_shim" name="[^replacement_algorithm]_1"> </a><a href="#[^replacement_algorithm]" class="footnote" title="Footnote 4">4</a>. The dataset also appears in a 2017 <a href="https://ieeexplore.ieee.org/document/7877809">research paper</a> from Peking University for the purpose of improving surveillance capabilities for "people detection in the crowded scenes".</p>
+<p>The Brainwash dataset is unique because it uses images from a publicly available webcam that records people inside a privately owned business without any consent. No ordinary cafe customer could ever suspect their image would end up in a dataset used for surveillance research and development, but that is exactly what happened to customers at Brainwash cafe in San Francisco.</p>
+<p>Although Brainwash appears to be a less popular dataset, it was notably used in 2016 and 2017 by researchers affiliated with the National University of Defense Technology in China for two <a href="https://www.semanticscholar.org/paper/Localized-region-context-and-object-feature-fusion-Li-Dou/b02d31c640b0a31fb18c4f170d841d8e21ffb66c">research</a> <a href="https://www.semanticscholar.org/paper/A-Replacement-Algorithm-of-Non-Maximum-Suppression-Zhao-Wang/591a4bfa6380c9fcd5f3ae690e3ac5c09b7bf37b">projects</a> on advancing the capabilities of object detection to more accurately isolate the target region in an image (<a href="https://www.itm-conferences.org/articles/itmconf/pdf/2017/04/itmconf_ita2017_05006.pdf">PDF</a>). <a class="footnote_shim" name="[^localized_region_context]_1"> </a><a href="#[^localized_region_context]" class="footnote" title="Footnote 3">3</a> <a class="footnote_shim" name="[^replacement_algorithm]_1"> </a><a href="#[^replacement_algorithm]" class="footnote" title="Footnote 4">4</a>. The dataset also appears in a 2017 <a href="https://ieeexplore.ieee.org/document/7877809">research paper</a> from Peking University for the purpose of improving surveillance capabilities for "people detection in the crowded scenes".</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_grid.jpg' alt=' A visualization of 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section>
<h3>Who used Brainwash Dataset?</h3>
@@ -112,7 +112,7 @@
<h2>Supplementary Information</h2>
-</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_example.jpg' alt=' An sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The datset contains 11,916 more images like this one. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> An sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The datset contains 11,916 more images like this one. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_saliency_map.jpg' alt=' A visualization of the active regions for 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of the active regions for 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section>
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_example.jpg' alt=' A sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The dataset contains 11,916 more images like this one. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The dataset contains 11,916 more images like this one. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_saliency_map.jpg' alt=' A visualization of the active regions for 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of the active regions for 81,973 head annotations from the Brainwash dataset training partition. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section>
<h4>Cite Our Work</h4>
<p>
diff --git a/site/public/datasets/duke_mtmc/index.html b/site/public/datasets/duke_mtmc/index.html
index 90c131b8..116fe770 100644
--- a/site/public/datasets/duke_mtmc/index.html
+++ b/site/public/datasets/duke_mtmc/index.html
@@ -140,7 +140,7 @@
</tbody>
</table>
<p>The reasons that companies in China use the Duke MTMC dataset for research are technically no different than the reasons it is used in the United States and Europe. In fact, the original creators of the dataset published a follow up report in 2017 titled <a href="https://www.semanticscholar.org/paper/Tracking-Social-Groups-Within-and-Across-Cameras-Solera-Calderara/9e644b1e33dd9367be167eb9d832174004840400">Tracking Social Groups Within and Across Cameras</a> with specific applications to "automated analysis of crowds and social gatherings for surveillance and security applications". Their work, as well as the creation of the original dataset in 2014 were both supported in part by the United States Army Research Laboratory.</p>
-<p>Citations from the United States and Europe show a similar trend to that in China, including publicly acknowledged and verified usage of the Duke MTMC dataset supported or carried out by the United States Department of Homeland Security, IARPA, IBM, Microsoft (who provides surveillance to ICE), and Vision Semantics (who works with the UK Ministry of Defence). One <a href="https://pdfs.semanticscholar.org/59f3/57015054bab43fb8cbfd3f3dbf17b1d1f881.pdf">paper</a> is even jointly published by researchers affiliated with both the University College of London and the National University of Defense Technology in China.</p>
+<p>Citations from the United States and Europe show a similar trend to that in China, including publicly acknowledged and verified usage of the Duke MTMC dataset supported or carried out by the United States Department of Homeland Security, IARPA, IBM, Microsoft (who has provided surveillance to ICE), and Vision Semantics (who has worked with the UK Ministry of Defence). One <a href="https://pdfs.semanticscholar.org/59f3/57015054bab43fb8cbfd3f3dbf17b1d1f881.pdf">paper</a> is even jointly published by researchers affiliated with both University College London and the National University of Defense Technology in China.</p>
<table>
<thead><tr>
<th>Organization</th>
@@ -260,7 +260,7 @@
<h2>Supplementary Information</h2>
</section><section><h4>Video Timestamps</h4>
-<p>The video timestamps contain the likely, but not yet confirmed, date and times the video recorded. Because the video timestamps align with the start and stop <a href="http://vision.cs.duke.edu/DukeMTMC/details.html#time-sync">time sync data</a> provided by the researchers, it at least confirms the relative timing. The <a href="https://www.wunderground.com/history/daily/KIGX/date/2014-3-19?req_city=Durham&amp;req_state=NC&amp;req_statename=North%20Carolina&amp;reqdb.zip=27708&amp;reqdb.magic=1&amp;reqdb.wmo=99999">precipitous weather</a> on March 14, 2014 in Durham, North Carolina supports, but does not confirm, that this day is a potential capture date.</p>
+<p>The video timestamps contain the likely, but not yet confirmed, dates and times the videos were recorded. Because the video timestamps align with the start and stop <a href="http://vision.cs.duke.edu/DukeMTMC/details.html#time-sync">time sync data</a> provided by the researchers, they at least confirm the relative timing. The <a href="https://www.wunderground.com/history/daily/KIGX/date/2014-3-19?req_city=Durham&amp;req_state=NC&amp;req_statename=North%20Carolina&amp;reqdb.zip=27708&amp;reqdb.magic=1&amp;reqdb.wmo=99999">precipitation</a> on March 14, 2014 in Durham, North Carolina supports, but does not confirm, that this day is the likely capture date.</p>
</section><section><div class='columns columns-2'><div class='column'><table>
<thead><tr>
<th>Camera</th>
diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html
index b62670c9..63f314bb 100644
--- a/site/public/datasets/msceleb/index.html
+++ b/site/public/datasets/msceleb/index.html
@@ -33,13 +33,13 @@
<div>2016</div>
</div><div class='meta'>
<div class='gray'>Images</div>
- <div>10,000,000 </div>
+ <div>1,000,000 </div>
</div><div class='meta'>
<div class='gray'>Identities</div>
<div>100,000 </div>
</div><div class='meta'>
<div class='gray'>Purpose</div>
- <div>Face recognition</div>
+ <div>Large-scale face recognition</div>
</div><div class='meta'>
<div class='gray'>Created by</div>
<div>Microsoft Research</div>
@@ -49,10 +49,10 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://www.msceleb.org/' target='_blank' rel='nofollow noopener'>msceleb.org</a></div>
- </div></div><p>Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals images and use this to accelerate research into recognizing a target list of one million individuals from their face images "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p>
-<p>These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people including academics, policy makers, writers, artists, and especially journalists maintaining an online presence is mandatory and should not allow Microsoft or anyone else to use their biometrics for research and development of surveillance technology. Many of names in target list even include people critical of the very technology Microsoft is using their name and biometric information to build. The list includes digital rights activists like Jillian York and [add more]; artists critical of surveillance including Trevor Paglen, Hito Steryl, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glen Greenwald; Data and Society founder danah boyd; and even Julie Brill the former FTC commissioner responsible for protecting consumer’s privacy to name a few.</p>
+ </div></div><p>Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research, who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of images of 100,000 individuals and use it to accelerate research into recognizing a target list of one million people from their face images "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p>
+<p>These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people exploits the current reality that for many people, including academics, policy makers, writers, artists, and especially journalists, maintaining an online presence is mandatory; that necessity should not give Microsoft or anyone else license to use their biometrics for research and development of surveillance technology. Many names in the target list even belong to people critical of the very technology Microsoft is using their names and biometric information to build. The list includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, to name a few.</p>
<h3>Microsoft's 1 Million Target List</h3>
-<p>Below is a list of names that were included in list of 1 million individuals curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from <a href="https://msceleb.org">msceleb.org</a>. Email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handing%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed images.</p>
+<p>Below is a selection of names from the full target list, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from <a href="https://msceleb.org">msceleb.org</a>. You can email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handling%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed images.</p>
</section><section><div class='columns columns-2'><div class='column'><table>
<thead><tr>
<th>Name</th>
@@ -160,13 +160,13 @@
</tbody>
</table>
</div></div></section><section><p>After publishing this list, researchers from Microsoft Asia then worked with researchers affiliated with China's National University of Defense Technology (controlled by China's Central Military Commission) and used the MS Celeb dataset for their <a href="https://www.semanticscholar.org/paper/Faces-as-Lighting-Probes-via-Unsupervised-Deep-Yi-Zhu/b301fd2fc33f24d6f75224e7c0991f4f04b64a65">research paper</a> on using "Faces as Lighting Probes via Unsupervised Deep Highlight Extraction" with potential applications in 3D face recognition.</p>
-<p>In an <a href="https://www.ft.com/content/9378e7ee-5ae6-11e9-9dde-7aedca0a081a">article</a> published by Financial Times based on data surfaced during this investigation, Samm Sacks (a senior fellow at New America think tank) commented that this research raised "red flags because of the nature of the technology, the author's affiliations, combined with what we know about how this technology is being deployed in China right now". Adding, that "the [Chinese] government is using these technologies to biuld surveillance systems and to detain minorities [in Xinjiang]".<a class="footnote_shim" name="[^madhu_ft]_1"> </a><a href="#[^madhu_ft]" class="footnote" title="Footnote 2">2</a></p>
+<p>In an <a href="https://www.ft.com/content/9378e7ee-5ae6-11e9-9dde-7aedca0a081a">article</a> published by the Financial Times based on data surfaced during this investigation, Samm Sacks (a senior fellow at the New America think tank) commented that this research raised "red flags because of the nature of the technology, the author's affiliations, combined with what we know about how this technology is being deployed in China right now", adding that "the [Chinese] government is using these technologies to build surveillance systems and to detain minorities [in Xinjiang]".<a class="footnote_shim" name="[^madhu_ft]_1"> </a><a href="#[^madhu_ft]" class="footnote" title="Footnote 2">2</a></p>
<p>Four more papers published by SenseTime which also use the MS Celeb dataset raise similar flags. SenseTime is a computer vision surveillance company who until <a href="https://uhrp.org/news-commentary/china%E2%80%99s-sensetime-sells-out-xinjiang-security-joint-venture">April 2019</a> provided surveillance to Chinese authorities to monitor and track Uighur Muslims in Xinjiang province and had been <a href="https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html">flagged</a> numerous times as having potential links to human rights violations.</p>
<p>One of the 4 SenseTime papers, "<a href="https://www.semanticscholar.org/paper/Exploring-Disentangled-Feature-Representation-Face-Liu-Wei/1fd5d08394a3278ef0a89639e9bfec7cb482e0bf">Exploring Disentangled Feature Representation Beyond Face Identification</a>", shows how SenseTime was developing automated face analysis technology to infer race, narrow eyes, nose size, and chin size, all of which could be used to target vulnerable ethnic groups based on their facial appearances.</p>
-<p>Earlier in 2019, Microsoft CEO <a href="https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/">Brad Smith</a> called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also <a href="https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV">announced</a> that Microsoft would seemingly take stand against potential misuse and decided to not sell face recognition to an unnamed United States law enforcement agency, citing that their technology was not accurate enough to be used on minorities because it was trained mostly on white male faces.</p>
+<p>Earlier in 2019, Microsoft CEO <a href="https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/">Brad Smith</a> called for governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also <a href="https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV">announced</a> that Microsoft would seemingly take a stand against such potential misuse and had decided not to sell face recognition to an unnamed United States agency, on the grounds that the technology was not accurate enough to be used on minorities because it was trained mostly on white male faces.</p>
<p>What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition for faces they can't train on.</p>
<p>Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly <a href="https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html">white</a> and <a href="https://gendershades.org">male</a>. Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, the powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service also would not be able to see at all.</p>
-</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identity included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identity included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>, Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb datset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org explicitly mentions that Microsoft Research tested their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."</p>
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identities included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identities included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>, Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb dataset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org explicitly mentions that Microsoft Research introspected their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."</p>
<p>We suggest that if Microsoft Research wants to make biometric data publicly available for surveillance research and development, they should start with releasing their researchers' own biometric data instead of scraping the Internet for journalists, artists, writers, actors, athletes, musicians, and academics.</p>
</section><section>
<h3>Who used Microsoft Celeb?</h3>