Diffstat (limited to 'site/content/pages/datasets')
-rw-r--r-- site/content/pages/datasets/adience/index.md 1
-rw-r--r-- site/content/pages/datasets/brainwash/index.md 14
-rw-r--r-- site/content/pages/datasets/duke_mtmc/index.md 9
-rw-r--r-- site/content/pages/datasets/helen/assets/_background.jpg bin 0 -> 554835 bytes
-rw-r--r-- site/content/pages/datasets/helen/assets/age.csv 10
-rwxr-xr-x site/content/pages/datasets/helen/assets/alpha.png bin 0 -> 7160 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/background.jpg bin 134927 -> 210197 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_bride.jpg bin 0 -> 39064 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_family.jpg bin 0 -> 74137 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_family_05.jpg bin 0 -> 74142 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_graduation.jpg bin 0 -> 46147 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_groom.jpg bin 0 -> 41000 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_outdoor.jpg bin 0 -> 46663 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg bin 0 -> 56328 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_wedding.jpg bin 0 -> 48999 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/feature_wedding_02.jpg bin 0 -> 43256 bytes
-rw-r--r-- site/content/pages/datasets/helen/assets/gender.csv 4
-rwxr-xr-x site/content/pages/datasets/helen/assets/ijb_c_montage.jpg bin 424821 -> 0 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/index.jpg bin 14856 -> 23135 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png bin 0 -> 3259144 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png bin 0 -> 23058 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png bin 0 -> 22435 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png bin 0 -> 14164 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png bin 0 -> 45864 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/single.png bin 0 -> 9001 bytes
-rwxr-xr-x site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png bin 0 -> 7160 bytes
-rw-r--r-- site/content/pages/datasets/helen/index.md 120
-rw-r--r-- site/content/pages/datasets/hrt_transgender/index.md 24
-rw-r--r-- site/content/pages/datasets/ibm_dif/index.md 16
-rw-r--r-- site/content/pages/datasets/index.md 1
-rw-r--r-- site/content/pages/datasets/lfpw/index.md 10
-rw-r--r-- site/content/pages/datasets/megaface/assets/age.csv 10
-rw-r--r-- site/content/pages/datasets/megaface/assets/gender.csv 4
-rw-r--r-- site/content/pages/datasets/megaface/index.md 27
-rw-r--r-- site/content/pages/datasets/msceleb/assets/age.csv 10
-rw-r--r-- site/content/pages/datasets/msceleb/assets/gender.csv 4
-rw-r--r-- site/content/pages/datasets/msceleb/index.md 52
-rw-r--r-- site/content/pages/datasets/oxford_town_centre/index.md 7
-rw-r--r-- site/content/pages/datasets/pipa/assets/age.csv 10
-rw-r--r-- site/content/pages/datasets/pipa/assets/gender.csv 4
-rw-r--r-- site/content/pages/datasets/pipa/index.md 2
-rw-r--r-- site/content/pages/datasets/uccs/index.md 9
-rwxr-xr-x site/content/pages/datasets/vgg_face/assets/background.jpg bin 134927 -> 0 bytes
-rwxr-xr-x site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg bin 424821 -> 0 bytes
-rwxr-xr-x site/content/pages/datasets/vgg_face/assets/index.jpg bin 14856 -> 0 bytes
-rw-r--r-- site/content/pages/datasets/vgg_face/index.md 30
-rw-r--r-- site/content/pages/datasets/who_goes_there/index.md 2
47 files changed, 292 insertions, 88 deletions
diff --git a/site/content/pages/datasets/adience/index.md b/site/content/pages/datasets/adience/index.md
index 12cf539a..675de813 100644
--- a/site/content/pages/datasets/adience/index.md
+++ b/site/content/pages/datasets/adience/index.md
@@ -7,7 +7,6 @@ subdesc: Adience ...
slug: adience
cssclass: dataset
image: assets/background.jpg
-year: 2016
published: 2019-4-18
updated: 2019-4-18
authors: Adam Harvey
diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md
index 2a5346b5..6d2279cb 100644
--- a/site/content/pages/datasets/brainwash/index.md
+++ b/site/content/pages/datasets/brainwash/index.md
@@ -4,6 +4,7 @@ status: published
title: Brainwash Dataset
desc: Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco
subdesc: It includes 11,917 images of "everyday life of a busy downtown cafe" and is used for training face and head detection algorithms
+caption: One of 11,917 images from the Brainwash dataset captured from the Brainwash Cafe in San Francisco
slug: brainwash
cssclass: dataset
image: assets/background.jpg
@@ -14,9 +15,14 @@ authors: Adam Harvey
------------
-## Brainwash Dataset
+# Brainwash Dataset
+
+*Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."*
### sidebar
+
++ Press coverage: <a href="https://www.nytimes.com/2019/07/13/technology/">New York Times</a>, <a href="https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html">De Tijd</a>
+
### end sidebar
Brainwash is a dataset of livecam images taken from San Francisco's Brainwash Cafe. It includes 11,917 images of "everyday life of a busy downtown cafe"[^readme] captured at 100 second intervals throughout the day. The Brainwash dataset includes 3 full days of webcam images taken on October 27, November 13, and November 24 in 2014. According the author's [research paper](https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666) introducing the dataset, the images were acquired with the help of Angelcam.com. [^end_to_end]
@@ -45,6 +51,12 @@ The two papers associated with the National University of Defense Technology in
![caption: Nine of 11,917 images from the the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Russel et. al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)](assets/brainwash_grid.jpg)
+### Press Coverage
+
+- New York Times: [Facial Recognition Tech Is Growing Stronger, Thanks to Your Face](https://www.nytimes.com/2019/07/13/technology/)
+- De Tijd: [Brainwash](https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html)
+
+
{% include 'cite_our_work.html' %}
#### Citing Brainwash Dataset
diff --git a/site/content/pages/datasets/duke_mtmc/index.md b/site/content/pages/datasets/duke_mtmc/index.md
index b5c6bf1a..cba08a3c 100644
--- a/site/content/pages/datasets/duke_mtmc/index.md
+++ b/site/content/pages/datasets/duke_mtmc/index.md
@@ -6,6 +6,7 @@ desc: <span class="dataset-name">Duke MTMC</span> is a dataset of surveillance c
subdesc: Duke MTMC contains over 2 million video frames and 2,700 unique identities collected from 8 HD cameras at Duke University campus in March 2014
slug: duke_mtmc
cssclass: dataset
+caption: A still frame from the Duke MTMC (Multi-Target-Multi-Camera) CCTV dataset captured on Duke University campus in 2014. The dataset has now been terminated by the author in response to this report.
image: assets/background.jpg
published: 2019-4-18
updated: 2019-05-22
@@ -13,12 +14,16 @@ authors: Adam Harvey
------------
-## Duke MTMC
+# Duke MTMC
+
+*Update: In response to this report and an [investigation](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) by the Financial Times, Duke University has terminated the Duke MTMC dataset.*
### sidebar
### end sidebar
-Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition. The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically setup to capture students "during periods between lectures, when pedestrian traffic is heavy".[^duke_mtmc_orig]
+Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition.
+
+The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically set up to capture students "during periods between lectures, when pedestrian traffic is heavy".[^duke_mtmc_orig]
For this analysis of the Duke MTMC dataset over 100 publicly available research papers that used the dataset were analyzed to find out who's using the dataset and where it's being used. The results show that the Duke MTMC dataset has spread far beyond its origins and intentions in academic research projects at Duke University. Since its publication in 2016, more than twice as many research citations originated in China as in the United States. Among these citations were papers links to the Chinese military and several of the companies known to provide Chinese authorities with the oppressive surveillance technology used to monitor millions of Uighur Muslims.
diff --git a/site/content/pages/datasets/helen/assets/_background.jpg b/site/content/pages/datasets/helen/assets/_background.jpg
new file mode 100644
index 00000000..5968da24
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/_background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/age.csv b/site/content/pages/datasets/helen/assets/age.csv
new file mode 100644
index 00000000..17121aac
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,31
+13 - 18,367
+19 - 24,567
+25 - 34,634
+35 - 44,362
+45 - 54,113
+55 - 64,56
+64 - 75,34
+75 - 100,10
diff --git a/site/content/pages/datasets/helen/assets/alpha.png b/site/content/pages/datasets/helen/assets/alpha.png
new file mode 100755
index 00000000..eb1defd0
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/alpha.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/background.jpg b/site/content/pages/datasets/helen/assets/background.jpg
index 6958a2b2..0288163e 100755
--- a/site/content/pages/datasets/helen/assets/background.jpg
+++ b/site/content/pages/datasets/helen/assets/background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_bride.jpg b/site/content/pages/datasets/helen/assets/feature_bride.jpg
new file mode 100755
index 00000000..5430f50b
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_bride.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_family.jpg b/site/content/pages/datasets/helen/assets/feature_family.jpg
new file mode 100755
index 00000000..a3fb833d
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_family.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_family_05.jpg b/site/content/pages/datasets/helen/assets/feature_family_05.jpg
new file mode 100755
index 00000000..57fb35bc
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_family_05.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_graduation.jpg b/site/content/pages/datasets/helen/assets/feature_graduation.jpg
new file mode 100755
index 00000000..f9f7d132
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_graduation.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_groom.jpg b/site/content/pages/datasets/helen/assets/feature_groom.jpg
new file mode 100755
index 00000000..31791987
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_groom.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_outdoor.jpg b/site/content/pages/datasets/helen/assets/feature_outdoor.jpg
new file mode 100755
index 00000000..375f5ae5
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_outdoor.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg b/site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg
new file mode 100755
index 00000000..4a02876d
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_wedding.jpg b/site/content/pages/datasets/helen/assets/feature_wedding.jpg
new file mode 100755
index 00000000..deed7061
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_wedding.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_wedding_02.jpg b/site/content/pages/datasets/helen/assets/feature_wedding_02.jpg
new file mode 100755
index 00000000..27489f7b
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_wedding_02.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/gender.csv b/site/content/pages/datasets/helen/assets/gender.csv
new file mode 100644
index 00000000..e51919bc
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+Male,1118
+Female,1184
+They,186
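The HELEN `age.csv` and `gender.csv` files added above are plain `label,faces` tables that feed the age and gender pie charts on the HELEN page. A minimal Python sketch, assuming only the two-column layout shown in this diff, for turning those counts into the percentages each chart displays. The helper names `load_counts` and `shares` are hypothetical and are not part of the MegaPixels codebase:

```python
import csv
import io

# HELEN age.csv and gender.csv contents, copied from the diff above.
HELEN_AGE_CSV = """age,faces
0 - 12,31
13 - 18,367
19 - 24,567
25 - 34,634
35 - 44,362
45 - 54,113
55 - 64,56
64 - 75,34
75 - 100,10
"""

HELEN_GENDER_CSV = """gender,faces
Male,1118
Female,1184
They,186
"""


def load_counts(text: str) -> dict[str, int]:
    """Parse a two-column `label,faces` CSV into an ordered {label: count} dict."""
    reader = csv.DictReader(io.StringIO(text))
    label_col = reader.fieldnames[0]  # "age" or "gender"
    return {row[label_col]: int(row["faces"]) for row in reader}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw face counts into percentages of that table's own total."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    for name, text in (("age", HELEN_AGE_CSV), ("gender", HELEN_GENDER_CSV)):
        counts = load_counts(text)
        print(f"{name}: {sum(counts.values())} faces")
        for label, pct in shares(counts).items():
            print(f"  {label:>10}: {pct:5.1f}%")
```

Summing the rows shows that the HELEN age table covers 2,174 faces while the gender table covers 2,488, so each chart's percentages are relative to its own total rather than to a shared face count.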
diff --git a/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg b/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg
deleted file mode 100755
index 3b5a0e40..00000000
--- a/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/index.jpg b/site/content/pages/datasets/helen/assets/index.jpg
index 7268d6ad..b9ce489d 100755
--- a/site/content/pages/datasets/helen/assets/index.jpg
+++ b/site/content/pages/datasets/helen/assets/index.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png b/site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png
new file mode 100755
index 00000000..86720be7
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png
new file mode 100755
index 00000000..3362f6bf
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png
new file mode 100755
index 00000000..450235d5
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png
new file mode 100755
index 00000000..490d44bb
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png b/site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png
new file mode 100755
index 00000000..6f1c85c5
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/single.png b/site/content/pages/datasets/helen/assets/single.png
new file mode 100755
index 00000000..5f7d23b0
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/single.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png b/site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png
new file mode 100755
index 00000000..eb1defd0
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/index.md b/site/content/pages/datasets/helen/index.md
index d44c9b98..da1dc33b 100644
--- a/site/content/pages/datasets/helen/index.md
+++ b/site/content/pages/datasets/helen/index.md
@@ -1,30 +1,134 @@
------------
-status: draft
+status: published
title: HELEN
-desc: HELEN Face Dataset
-subdesc: HELEN (under development)
+desc: HELEN is a dataset of face images from Flickr used for training facial component localization algorithms
+subdesc: HELEN includes 2,330 images from Flickr found by keyword searches for "portrait", "wedding", "outdoor", "boy", "studio", and "family"
+caption: Selected images from the HELEN dataset
slug: helen
cssclass: dataset
+caption: Example images from the HELEN dataset
image: assets/background.jpg
-year: 2000
-published: 2019-4-18
-updated: 2019-4-18
+published: 2019-9-23
+updated: 2019-9-23
authors: Adam Harvey
------------
-## HELEN
+
+# HELEN Dataset
### sidebar
### end sidebar
-[ page under development ]
+HELEN is a dataset of annotated face images used for facial component localization. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio".[^orig_paper]
+
+The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis."[^orig_paper]
+
+Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are an essential part of most facial recognition systems. Facial landmarks are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to a template pose.
+
+![caption: An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic &copy; 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.](assets/montage_lms_21_14_14_14_26.png)
+
+This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition, with the vast majority of research taking place in China.
+
+Commercial use includes IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook.
+
+Military and defense usage includes NUDT (National University of Defense Technology).
+
+http://eccv2012.unifi.it/
+
+TODO
+
+- add proof of use in dlib and openface
+- add proof of use in commercial use of dlib? ibm dif
+- make landmark over blurred images
+- add 6x6 grid for landmarks
+- highlight key findings
+- highlight key commercial usage
+- look for most interesting research papers to provide example of how it's used for face recognition
+- estimated time: 6 hours
+- add data to github repo?
+
+| Organization | Paper | Year | Used HELEN |
+|---|---|---|---|
+| SenseTime, Amazon | [Look at Boundary: A Boundary-Aware Face Alignment Algorithm](https://arxiv.org/pdf/1805.10483.pdf) | 2018 | &#x2714; |
+| SenseTime | [ReenactGAN: Learning to Reenact Faces via Boundary Transfer](https://arxiv.org/pdf/1807.11079.pdf) | 2018 | &#x2714; |
+
+
+
+The dataset was used to train the OpenFace software: "we used the HELEN and LFPW training subsets for training and the rest for testing" (<https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets>)
+
+The popular dlib facial landmark detector was trained using HELEN.
+
+In addition to the 200+ verified citations, the HELEN dataset was used in the following projects:
+- https://github.com/memoiry/face-alignment
+- http://www.dsp.toronto.edu/projects/face_analysis/
+
+It has been converted into new datasets, including:
+- https://github.com/JPlin/Relabeled-HELEN-Dataset
+- https://www.kaggle.com/kmader/helen-eye-dataset
+
+The original site:
+- http://www.ifp.illinois.edu/~vuongle2/helen/
+
+### Example Images
+
+
+
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2839127417_1.jpg for outdoor studio](assets/feature_outdoor_02.jpg)
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 ](assets/feature_graduation.jpg)
+
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 ](assets/feature_wedding.jpg)
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 ](assets/feature_wedding_02.jpg)
+
+![caption: Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969](assets/feature_family.jpg)
+![caption: Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969](assets/feature_family_05.jpg)
+
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}
+### Age and Gender Distribution
+
+{% include 'age_gender_disclaimer.html' %}
+
+=== columns 2
+
+```
+single_pie_chart /datasets/helen/assets/age.csv
+Caption: HELEN dataset age distribution
+Top: 10
+OtherLabel: Other
+```
+
+```
+single_pie_chart /datasets/helen/assets/gender.csv
+Caption: HELEN dataset gender distribution
+Top: 10
+OtherLabel: Other
+```
+
+=== end columns
+
+![caption: Visualization of the HELEN dataset 194-point facial landmark annotations. Credit: graphic &copy; MegaPixels.cc 2019, data from HELEN dataset by Zhou, Brandt, Lin 2013. If you use this image, please credit both the graphic and the data source.](assets/montage_lms_21_15_15_7_26_0.png)
+
{% include 'cite_our_work.html' %}
+
+#### Cite the Original Authors' Work
+
+If you find the HELEN dataset useful or reference it in your work, please cite the authors' original work as:
+
+<pre>
+@inproceedings{Le2012InteractiveFF,
+ title={Interactive Facial Feature Localization},
+ author={Vuong Le and Jonathan Brandt and Zhe L. Lin and Lubomir D. Bourdev and Thomas S. Huang},
+ booktitle={ECCV},
+ year={2012}
+}
+</pre>
+
### Footnotes
+
+[^orig_paper]: Le, Vuong et al. “Interactive Facial Feature Localization.” ECCV (2012).
\ No newline at end of file
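The HELEN page renders its charts with a `single_pie_chart` directive that takes `Top: 10` and `OtherLabel: Other`. The renderer itself is not part of this diff, so the following is only a guess at what those two options do: keep the `Top` largest slices and merge everything else into a single slice named by `OtherLabel`. `top_n_with_other` is a hypothetical helper written for illustration, and it assumes slices are ranked by face count:

```python
from collections.abc import Iterable


def top_n_with_other(
    rows: Iterable[tuple[str, int]],
    top: int = 10,
    other_label: str = "Other",
) -> list[tuple[str, int]]:
    """Keep the `top` largest (label, count) slices; merge the rest into one slice.

    Hypothetical stand-in for the Top/OtherLabel options of single_pie_chart.
    The real renderer may order or merge slices differently.
    """
    # Rank slices by face count, largest first (sorted() is stable for ties).
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    kept, rest = ranked[:top], ranked[top:]
    # Collapse any leftover slices into a single "Other" slice.
    if rest:
        kept.append((other_label, sum(count for _, count in rest)))
    return kept


if __name__ == "__main__":
    # HELEN gender.csv counts from the diff above.
    helen_gender = [("Male", 1118), ("Female", 1184), ("They", 186)]
    print(top_n_with_other(helen_gender, top=2))
    # [('Female', 1184), ('Male', 1118), ('Other', 186)]
```

Under this reading, `Top: 10` would merge nothing for the nine HELEN age bins or the three gender rows; the option would only matter for tables with more than ten rows.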
diff --git a/site/content/pages/datasets/hrt_transgender/index.md b/site/content/pages/datasets/hrt_transgender/index.md
deleted file mode 100644
index fb820593..00000000
--- a/site/content/pages/datasets/hrt_transgender/index.md
+++ /dev/null
@@ -1,24 +0,0 @@
-------------
-
-status: draft
-title: HRT Transgender Dataset
-desc: TBD
-subdesc: TBD
-slug: hrt_transgender
-cssclass: dataset
-image: assets/background.jpg
-year: 2015
-published: 2019-2-23
-updated: 2019-2-23
-authors: Adam Harvey
-
-------------
-
-## HRT Transgender Dataset
-
-### sidebar
-### end sidebar
-
-[ page under development ]
-
-{% include 'dashboard.html' }
\ No newline at end of file
diff --git a/site/content/pages/datasets/ibm_dif/index.md b/site/content/pages/datasets/ibm_dif/index.md
index 4c620e95..c5f25e1d 100644
--- a/site/content/pages/datasets/ibm_dif/index.md
+++ b/site/content/pages/datasets/ibm_dif/index.md
@@ -1,20 +1,20 @@
------------
status: draft
-title: MegaFace
-desc: MegaFace Dataset
-subdesc: MegaFace contains 670K identities and 4.7M images
-slug: megaface
+title: IBM DiF
+desc: Diversity in Faces Dataset
+subdesc: Lorem ipsum...
+slug: ibm_dif
cssclass: dataset
image: assets/background.jpg
-year: 2016
-published: 2019-4-18
-updated: 2019-4-18
+year: 2019
+published: 2019-9-18
+updated: 2019-9-18
authors: Adam Harvey
------------
-## MegaFace
+## IBM Diversity in Faces
### sidebar
### end sidebar
diff --git a/site/content/pages/datasets/index.md b/site/content/pages/datasets/index.md
index f56a3291..54912242 100644
--- a/site/content/pages/datasets/index.md
+++ b/site/content/pages/datasets/index.md
@@ -4,6 +4,7 @@ status: published
title: MegaPixels: Face Recognition Datasets
desc: Facial Recognition Datasets
slug: home
+cssclass: dataset-list
published: 2018-12-15
updated: 2019-04-24
authors: Adam Harvey
diff --git a/site/content/pages/datasets/lfpw/index.md b/site/content/pages/datasets/lfpw/index.md
index 1021d490..21f885d4 100644
--- a/site/content/pages/datasets/lfpw/index.md
+++ b/site/content/pages/datasets/lfpw/index.md
@@ -19,7 +19,15 @@ authors: Adam Harvey
### sidebar
### end sidebar
-[ page under development ]
+RESEARCH below this line
+
+> Release 1 of LFPW consists of 1,432 faces from images downloaded from the web using simple text queries on sites such as google.com, flickr.com, and yahoo.com. Each image was labeled by three MTurk workers, and 29 fiducial points, shown below, are included in dataset. LFPW was originally described in the following publication:
+
+> Due to copyright issues, we cannot distribute image files in any format to anyone. Instead, we have made available a list of image URLs where you can download the images yourself. We realize that this makes it impossible to exactly compare numbers, as image links will slowly disappear over time, but we have no other option. This seems to be the way other large web-based databases seem to be evolving.
+
+<https://neerajkumar.org/databases/lfpw/>
+
+> This research was performed at Kriegman-Belhumeur Vision Technologies and was funded by the CIA through the Office of the Chief Scientist. <https://www.cs.cmu.edu/~peiyunh/topdown/> (nk_cvpr2011\_faceparts.pdf)
{% include 'dashboard.html' %}
diff --git a/site/content/pages/datasets/megaface/assets/age.csv b/site/content/pages/datasets/megaface/assets/age.csv
new file mode 100644
index 00000000..52a86599
--- /dev/null
+++ b/site/content/pages/datasets/megaface/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,785
+13 - 18,52026
+19 - 24,254411
+25 - 34,452129
+35 - 44,341809
+45 - 54,193525
+55 - 64,65635
+64 - 75,22148
+75 - 100,3108
diff --git a/site/content/pages/datasets/megaface/assets/gender.csv b/site/content/pages/datasets/megaface/assets/gender.csv
new file mode 100644
index 00000000..a23e6548
--- /dev/null
+++ b/site/content/pages/datasets/megaface/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+male,884043
+female,580747
+they,94990
diff --git a/site/content/pages/datasets/megaface/index.md b/site/content/pages/datasets/megaface/index.md
index 4c620e95..2009e70e 100644
--- a/site/content/pages/datasets/megaface/index.md
+++ b/site/content/pages/datasets/megaface/index.md
@@ -1,9 +1,10 @@
------------
-status: draft
+status: published
title: MegaFace
desc: MegaFace Dataset
subdesc: MegaFace contains 670K identities and 4.7M images
+caption: Example images from the MegaFace dataset
slug: megaface
cssclass: dataset
image: assets/background.jpg
@@ -19,12 +20,34 @@ authors: Adam Harvey
### sidebar
### end sidebar
-[ page under development ]
+MegaFace is a dataset...
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}
+### Age and Gender Distribution
+
+=== columns 2
+
+```
+single_pie_chart /datasets/megaface/assets/age.csv
+Caption: MegaFace dataset age distribution
+Top: 10
+OtherLabel: Other
+```
+
+```
+single_pie_chart /datasets/megaface/assets/gender.csv
+Caption: MegaFace dataset gender distribution
+Top: 10
+OtherLabel: Other
+```
+
+=== end columns
+
+{% include 'age_gender_disclaimer.html' %}
+
{% include 'cite_our_work.html' %}
### Footnotes
diff --git a/site/content/pages/datasets/msceleb/assets/age.csv b/site/content/pages/datasets/msceleb/assets/age.csv
new file mode 100644
index 00000000..ce9238f8
--- /dev/null
+++ b/site/content/pages/datasets/msceleb/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,51
+13 - 18,3769
+19 - 24,25147
+25 - 34,58352
+35 - 44,57071
+45 - 54,35828
+55 - 64,15335
+64 - 75,6858
+75 - 100,1173
diff --git a/site/content/pages/datasets/msceleb/assets/gender.csv b/site/content/pages/datasets/msceleb/assets/gender.csv
new file mode 100644
index 00000000..ffa644ec
--- /dev/null
+++ b/site/content/pages/datasets/msceleb/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+Male,150310
+Female,67319
+They,9068
diff --git a/site/content/pages/datasets/msceleb/index.md b/site/content/pages/datasets/msceleb/index.md
index 0e457cd9..64584b31 100644
--- a/site/content/pages/datasets/msceleb/index.md
+++ b/site/content/pages/datasets/msceleb/index.md
@@ -4,6 +4,7 @@ status: published
title: Microsoft Celeb Dataset
desc: MS Celeb is a dataset of 10 million face images harvested from the Internet
subdesc: The MS Celeb dataset includes 10 million images of 100,000 people and an additional target list of 1,000,000 individuals
+caption: Example images from the MS-Celeb-1M dataset
slug: msceleb
cssclass: dataset
image: assets/background.jpg
@@ -14,12 +15,21 @@ authors: Adam Harvey
------------
-## Microsoft Celeb Dataset (MS Celeb)
+
+# Microsoft Celeb Dataset (MS Celeb)
+
+*Update: In response to this report and an [investigation](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) by the Financial Times, Microsoft has terminated their MS-Celeb website <https://msceleb.org>.*
### sidebar
+
++ Press coverage: <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">Financial Times</a>, <a href="https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html">New York Times</a>, <a href="https://www.bbc.com/news/technology-48555149">BBC</a>, <a href="https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html">Spiegel</a>, <a href="https://www.lesechos.fr/tech-medias/intelligence-artificielle/le-mariage-explosif-de-nos-donnees-et-de-lia-1031813">Les Echos</a>, <a href="https://www.lastampa.it/2019/06/22/tecnologia/microsoft-ha-cancellato-il-suo-database-per-il-riconoscimento-facciale-PWwLGmpO1fKQdykMZVBd9H/pagina.html">La Stampa</a>
+
### end sidebar
-Microsoft Celeb (MS-Celeb-1M) is a dataset of 10 million face images harvested from the Internet for the purpose of developing face recognition technologies. According to Microsoft Research, who created and published the [dataset](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/) in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' biometric data to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".[^msceleb_orig]
+Microsoft Celeb (MS-Celeb-1M) is a dataset of 10 million face images harvested from the Internet for the purpose of developing face recognition technologies.
+
+According to Microsoft Research, who created and published the [dataset](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/) in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' biometric data to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".[^msceleb_orig]
+
While the majority of people in this dataset are American and British actors, the exploitative use of the term "celebrity" extends far beyond Hollywood. Many of the names in the MS Celeb face recognition dataset are merely people who must maintain an online presence for their professional lives: journalists, artists, musicians, activists, policy makers, writers, and academics. Many people in the target list are even vocal critics of the very technology Microsoft is using their name and biometric information to build. It includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; Shoshana Zuboff, author of *Surveillance Capitalism*; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy.
@@ -115,15 +125,47 @@ Considering the multiple citations from commercial organizations (Canon, Hitachi
To provide insight into where these 10 million faces images have traveled, over 100 research papers have been verified and geolocated to show who used the dataset and where they used it.
+## GDPR and MS-Celeb
+
+[ in progress ]
+
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}
+### Age and Gender Distribution
+
+=== columns 2
+
+```
+single_pie_chart /datasets/msceleb/assets/age.csv
+Caption: MS-Celeb dataset age distribution
+Top: 10
+OtherLabel: Other
+```
+
+```
+single_pie_chart /datasets/msceleb/assets/gender.csv
+Caption: MS-Celeb dataset gender distribution
+Top: 10
+OtherLabel: Other
+```
+
+=== end columns
+
##### FAQs and Fact Check
-- **The MS Celeb images were not derived from Creative Commons sources**. They were obtained by "retriev[ing] approximately 100 images per celebrity from popular search engines"[^msceleb_orig]. The dataset actually includes many copyrighted images. Microsoft doesn't provide any image URLs, but manually reviewing a small portion of images from the dataset shows many images with watermarked "Copyright" text over the image. TinEye could be used to more accurately determine the image origins in aggregate
-- **Microsoft did not distribute images of all one million people.** They distributed images for about 100,000 and then encouraged other researchers to download the remaining 900,000 people "by using all the possibly collected face images of this individual on the web as training data."[^msceleb_orig]
-- **Microsoft had not deleted or stopped distribution of their MS Celeb at the time of most press reports on June 4.** Until at least June 6, 2019 the Microsoft Research data portal provided the MS Celeb dataset for download: <http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737>
+- **Despite several erroneous reports stating that the MS-Celeb images were derived from Creative Commons licensed media, the MS Celeb images were obtained from web search engines**. The authors state that the images were obtained by "retriev[ing] approximately 100 images per celebrity from popular search engines"[^msceleb_orig]. Many, if not the vast majority, are copyrighted images. Microsoft does not provide image URLs, but a manual review of a small portion of the dataset shows images with watermarked "Copyright" text and sources including stock photo agencies such as Getty. TinEye could be used to determine the image origins more accurately in aggregate.
+- **Most reports incorrectly stated that Microsoft distributed images of all one million people. As this analysis notes several times, Microsoft distributed images for 100,000 people and a separate target list of 900,000 more names.** Other researchers were then expected and encouraged to download images of the remaining 900,000 people "by using all the possibly collected face images of this individual on the web as training data."[^msceleb_orig]
+- **Microsoft claimed that it had deleted or stopped distribution of its MS Celeb dataset in April 2019 after the Financial Times investigation. This is false.** Until at least June 6, 2019, the Microsoft Research data portal freely provided the full MS Celeb dataset for download: <http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737>
+
+### Press Coverage
+
+- Financial Times (original story): [Who’s using your face? The ugly truth about facial recognition](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e)
+- New York Times (front page story): [Facial Recognition Tech Is Growing Stronger, Thanks to Your Face](https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html)
+- BBC: [Microsoft deletes massive face recognition database](https://www.bbc.com/news/technology-48555149)
+- Spiegel: [Microsoft löscht Datenbank mit zehn Millionen Fotos](https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html)
+
### Footnotes
diff --git a/site/content/pages/datasets/oxford_town_centre/index.md b/site/content/pages/datasets/oxford_town_centre/index.md
index c2e3e7a7..eb1e360b 100644
--- a/site/content/pages/datasets/oxford_town_centre/index.md
+++ b/site/content/pages/datasets/oxford_town_centre/index.md
@@ -4,6 +4,7 @@ status: published
title: Oxford Town Centre Dataset
desc: Oxford Town Centre is a dataset of surveillance camera footage from Cornmarket St Oxford, England
subdesc: The Oxford Town Centre dataset includes approximately 2,200 identities and is used for research and development of face recognition systems
+caption: A still frame from the Oxford Town Centre CCTV video dataset
slug: oxford_town_centre
cssclass: dataset
image: assets/background.jpg
@@ -14,12 +15,14 @@ authors: Adam Harvey
------------
-## Oxford Town Centre
+# Oxford Town Centre
### sidebar
### end sidebar
-The Oxford Town Centre dataset is a CCTV video of pedestrians in a busy downtown area in Oxford used for research and development of activity and face recognition systems.[^ben_benfold_orig] The CCTV video was obtained from a surveillance camera at the corner of Cornmarket and Market St. in Oxford, England and includes approximately 2,200 people. Since its publication in 2009[^guiding_surveillance] the [Oxford Town Centre dataset](http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html) has been used in over 80 verified research projects including commercial research by Amazon, Disney, OSRAM, and Huawei; and academic research in China, Israel, Russia, Singapore, the US, and Germany among dozens more.
+The Oxford Town Centre dataset is a CCTV video of pedestrians in a busy downtown area in Oxford used for research and development of activity and face recognition systems.[^ben_benfold_orig]
+
+The CCTV video was obtained from a surveillance camera at the corner of Cornmarket and Market St. in Oxford, England, and includes approximately 2,200 people. Since its publication in 2009,[^guiding_surveillance] the [Oxford Town Centre dataset](http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html) has been used in over 80 verified research projects, including commercial research by Amazon, Disney, OSRAM, and Huawei, and academic research in China, Israel, Russia, Singapore, the US, Germany, and dozens of other countries.
The Oxford Town Centre dataset is unique in that it uses footage from a public surveillance camera that would otherwise be designated for public safety. The video shows that the pedestrians act normally and unrehearsed indicating they neither knew of nor consented to participation in the research project.
diff --git a/site/content/pages/datasets/pipa/assets/age.csv b/site/content/pages/datasets/pipa/assets/age.csv
new file mode 100644
index 00000000..c742bcb3
--- /dev/null
+++ b/site/content/pages/datasets/pipa/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,6
+13 - 18,929
+19 - 24,3598
+25 - 34,6035
+35 - 44,5055
+45 - 54,2833
+55 - 64,741
+65 - 74,173
+75 - 100,17
diff --git a/site/content/pages/datasets/pipa/assets/gender.csv b/site/content/pages/datasets/pipa/assets/gender.csv
new file mode 100644
index 00000000..b128aaec
--- /dev/null
+++ b/site/content/pages/datasets/pipa/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+Male,10750
+Female,9423
+They,1741
diff --git a/site/content/pages/datasets/pipa/index.md b/site/content/pages/datasets/pipa/index.md
index ca30b693..dd59cafb 100644
--- a/site/content/pages/datasets/pipa/index.md
+++ b/site/content/pages/datasets/pipa/index.md
@@ -14,7 +14,7 @@ authors: Adam Harvey
------------
-## MegaFace
+## PIPA: People in Photo Albums
### sidebar
### end sidebar
diff --git a/site/content/pages/datasets/uccs/index.md b/site/content/pages/datasets/uccs/index.md
index b493c633..3b9bed8a 100644
--- a/site/content/pages/datasets/uccs/index.md
+++ b/site/content/pages/datasets/uccs/index.md
@@ -5,6 +5,7 @@ title: UnConstrained College Students Dataset
slug: uccs
desc: <span class="dataset-name">UnConstrained College Students</span> is a dataset of long-range surveillance photos of students on University of Colorado in Colorado Springs campus
subdesc: The UnConstrained College Students dataset includes 16,149 images of 1,732 students, faculty, and pedestrians and is used for developing face recognition and face detection algorithms
+caption: One of 16,149 images from the UnConstrained College Students face recognition dataset captured at the University of Colorado, Colorado Springs
image: assets/background.jpg
cssclass: dataset
image: assets/background.jpg
@@ -15,12 +16,16 @@ authors: Adam Harvey
------------
-## UnConstrained College Students
+# UnConstrained College Students
+
+*Update: In response to this report and its previous publication of metadata from UCCS dataset photos, UCCS has temporarily suspended its dataset, but plans to release a new version.*
### sidebar
### end sidebar
-UnConstrained College Students (UCCS) is a dataset of long-range surveillance photos captured at University of Colorado Colorado Springs developed primarily for research and development of "face detection and recognition research towards surveillance applications"[^uccs_vast]. According to the authors of [two](https://www.semanticscholar.org/paper/Unconstrained-Face-Detection-and-Open-Set-Face-G%C3%BCnther-Hu/d4f1eb008eb80595bcfdac368e23ae9754e1e745) [papers](https://www.semanticscholar.org/paper/Large-scale-unconstrained-open-set-face-database-Sapkota-Boult/07fcbae86f7a3ad3ea1cf95178459ee9eaf77cb1) associated with the dataset, over 1,700 students and pedestrians were "photographed using a long-range high-resolution surveillance camera without their knowledge".[^funding_uccs] This analysis examines the [UCCS dataset](http://vast.uccs.edu/Opensetface/) contents of the [dataset](), its funding sources, timestamp data, and information from publicly available research project citations.
+UnConstrained College Students (UCCS) is a dataset of long-range surveillance photos captured at the University of Colorado Colorado Springs, developed primarily for research and development of "face detection and recognition research towards surveillance applications"[^uccs_vast].
+
+According to the authors of [two](https://www.semanticscholar.org/paper/Unconstrained-Face-Detection-and-Open-Set-Face-G%C3%BCnther-Hu/d4f1eb008eb80595bcfdac368e23ae9754e1e745) [papers](https://www.semanticscholar.org/paper/Large-scale-unconstrained-open-set-face-database-Sapkota-Boult/07fcbae86f7a3ad3ea1cf95178459ee9eaf77cb1) associated with the dataset, over 1,700 students and pedestrians were "photographed using a long-range high-resolution surveillance camera without their knowledge".[^funding_uccs] This analysis examines the contents of the [UCCS dataset](http://vast.uccs.edu/Opensetface/), its funding sources, its timestamp data, and information from publicly available research project citations.
The UCCS dataset includes over 1,700 unique identities, most of which are students walking to and from class. In 2018, it was the "largest surveillance [face recognition] benchmark in the public domain."[^surv_face_qmul] The photos were taken during the spring semesters of 2012 &ndash; 2013 on the West Lawn of the University of Colorado Colorado Springs campus. The photographs were timed to capture students during breaks between their scheduled classes in the morning and afternoon during Monday through Thursday. "For example, a student taking Monday-Wednesday classes at 12:30 PM will show up in the camera on almost every Monday and Wednesday."[^sapkota_boult].
diff --git a/site/content/pages/datasets/vgg_face/assets/background.jpg b/site/content/pages/datasets/vgg_face/assets/background.jpg
deleted file mode 100755
index 6958a2b2..00000000
--- a/site/content/pages/datasets/vgg_face/assets/background.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg b/site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg
deleted file mode 100755
index 3b5a0e40..00000000
--- a/site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/vgg_face/assets/index.jpg b/site/content/pages/datasets/vgg_face/assets/index.jpg
deleted file mode 100755
index 7268d6ad..00000000
--- a/site/content/pages/datasets/vgg_face/assets/index.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/vgg_face/index.md b/site/content/pages/datasets/vgg_face/index.md
deleted file mode 100644
index 2424f1ff..00000000
--- a/site/content/pages/datasets/vgg_face/index.md
+++ /dev/null
@@ -1,30 +0,0 @@
-------------
-
-status: draft
-title: VGG Face
-desc: VGG Face Dataset
-subdesc: VGG Face ...
-slug: vgg_face
-cssclass: dataset
-image: assets/background.jpg
-year: 2016
-published: 2019-4-18
-updated: 2019-4-18
-authors: Adam Harvey
-
-------------
-
-## MegaFace
-
-### sidebar
-### end sidebar
-
-[ page under development ]
-
-{% include 'dashboard.html' %}
-
-{% include 'supplementary_header.html' %}
-
-{% include 'cite_our_work.html' %}
-
-### Footnotes
diff --git a/site/content/pages/datasets/who_goes_there/index.md b/site/content/pages/datasets/who_goes_there/index.md
index feb9896d..c6fe3806 100644
--- a/site/content/pages/datasets/who_goes_there/index.md
+++ b/site/content/pages/datasets/who_goes_there/index.md
@@ -3,7 +3,7 @@
status: draft
title: Who Goes There Dataset
desc: Who Goes There Dataset
-subdesc: Who Goes There (page under development)
+subdesc: Who Goes There
slug: who_goes_there
cssclass: dataset
image: assets/background.jpg