author    jules@lens <julescarbon@gmail.com>  2019-10-10 13:33:31 +0200
committer jules@lens <julescarbon@gmail.com>  2019-10-10 13:33:31 +0200
commit    7d72cbb935ec53ce66c6a0c5cdc68f157be1d35f (patch)
tree      a44049683c3c5e44449fe2698bb080329ecf7e61 /site
parent    488a65aa5caba91c1384e7bcb2023056e913fc22 (diff)
parent    cdc0c7ad21eb764cfe36d7583e126660d87fe02d (diff)
Merge branch 'master' of asdf.us:megapixels_dev
Diffstat (limited to 'site')
-rw-r--r--  site/assets/css/applets.css | 5
-rw-r--r--  site/assets/css/css.css | 148
-rwxr-xr-x  site/assets/css/tabulator.css | 2
-rw-r--r--  site/content/_drafts_/pipa/index.md | 23
-rw-r--r--  site/content/pages/about/index.md | 50
-rw-r--r--  site/content/pages/about/legal.md | 4
-rw-r--r--  site/content/pages/datasets/adience/index.md | 1
-rw-r--r--  site/content/pages/datasets/brainwash/index.md | 14
-rw-r--r--  site/content/pages/datasets/duke_mtmc/index.md | 9
-rw-r--r--  site/content/pages/datasets/helen/assets/_background.jpg | Bin 0 -> 554835 bytes
-rw-r--r--  site/content/pages/datasets/helen/assets/age.csv | 10
-rwxr-xr-x  site/content/pages/datasets/helen/assets/alpha.png | Bin 0 -> 7160 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/background.jpg | Bin 134927 -> 210197 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_bride.jpg | Bin 0 -> 39064 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_family.jpg | Bin 0 -> 74137 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_family_05.jpg | Bin 0 -> 74142 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_graduation.jpg | Bin 0 -> 46147 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_groom.jpg | Bin 0 -> 41000 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_outdoor.jpg | Bin 0 -> 46663 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg | Bin 0 -> 56328 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_wedding.jpg | Bin 0 -> 48999 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/feature_wedding_02.jpg | Bin 0 -> 43256 bytes
-rw-r--r--  site/content/pages/datasets/helen/assets/gender.csv | 4
-rwxr-xr-x  site/content/pages/datasets/helen/assets/ijb_c_montage.jpg | Bin 424821 -> 0 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/index.jpg | Bin 14856 -> 23135 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png | Bin 0 -> 3259144 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png | Bin 0 -> 23058 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png | Bin 0 -> 22435 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png | Bin 0 -> 14164 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png | Bin 0 -> 45864 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/single.png | Bin 0 -> 9001 bytes
-rwxr-xr-x  site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png | Bin 0 -> 7160 bytes
-rw-r--r--  site/content/pages/datasets/helen/index.md | 120
-rw-r--r--  site/content/pages/datasets/hrt_transgender/index.md | 24
-rw-r--r--  site/content/pages/datasets/ibm_dif/index.md | 16
-rw-r--r--  site/content/pages/datasets/index.md | 3
-rw-r--r--  site/content/pages/datasets/lfpw/index.md | 10
-rw-r--r--  site/content/pages/datasets/megaface/assets/age.csv | 10
-rw-r--r--  site/content/pages/datasets/megaface/assets/gender.csv | 4
-rw-r--r--  site/content/pages/datasets/megaface/index.md | 62
-rw-r--r--  site/content/pages/datasets/msceleb/assets/age.csv | 10
-rw-r--r--  site/content/pages/datasets/msceleb/assets/gender.csv | 4
-rw-r--r--  site/content/pages/datasets/msceleb/index.md | 52
-rw-r--r--  site/content/pages/datasets/oxford_town_centre/index.md | 7
-rw-r--r--  site/content/pages/datasets/pipa/assets/age.csv | 10
-rw-r--r--  site/content/pages/datasets/pipa/assets/gender.csv | 4
-rw-r--r--  site/content/pages/datasets/pipa/index.md | 2
-rw-r--r--  site/content/pages/datasets/uccs/index.md | 9
-rwxr-xr-x  site/content/pages/datasets/vgg_face/assets/background.jpg | Bin 134927 -> 0 bytes
-rwxr-xr-x  site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg | Bin 424821 -> 0 bytes
-rwxr-xr-x  site/content/pages/datasets/vgg_face/assets/index.jpg | Bin 14856 -> 0 bytes
-rw-r--r--  site/content/pages/datasets/vgg_face/index.md | 30
-rw-r--r--  site/content/pages/datasets/who_goes_there/index.md | 2
-rw-r--r--  site/content/pages/research/munich_security_conference/index.md | 23
-rw-r--r--  site/includes/age_gender_disclaimer.html | 3
-rw-r--r--  site/includes/chart.html | 14
-rw-r--r--  site/includes/dashboard.html | 6
-rw-r--r--  site/includes/map.html | 22
-rw-r--r--  site/public/about/index.html | 39
-rw-r--r--  site/public/about/legal/index.html | 4
-rw-r--r--  site/public/datasets/adience/index.html | 9
-rw-r--r--  site/public/datasets/brainwash/index.html | 19
-rw-r--r--  site/public/datasets/duke_mtmc/index.html | 13
-rw-r--r--  site/public/datasets/helen/index.html | 96
-rw-r--r--  site/public/datasets/ibm_dif/index.html | 31
-rw-r--r--  site/public/datasets/ijb_c/index.html | 9
-rw-r--r--  site/public/datasets/index.html | 32
-rw-r--r--  site/public/datasets/lfpw/index.html | 17
-rw-r--r--  site/public/datasets/megaface/index.html | 37
-rw-r--r--  site/public/datasets/msceleb/index.html | 31
-rw-r--r--  site/public/datasets/oxford_town_centre/index.html | 12
-rw-r--r--  site/public/datasets/pipa/index.html | 9
-rw-r--r--  site/public/datasets/uccs/index.html | 13
-rw-r--r--  site/public/datasets/who_goes_there/index.html | 9
-rw-r--r--  site/public/research/index.html | 2
-rw-r--r--  site/public/research/munich_security_conference/index.html | 26
76 files changed, 721 insertions, 404 deletions
diff --git a/site/assets/css/applets.css b/site/assets/css/applets.css
index daf36a19..70d8f51f 100644
--- a/site/assets/css/applets.css
+++ b/site/assets/css/applets.css
@@ -9,7 +9,7 @@
min-height: 0;
}
.applet {
- margin-bottom: 60px;
+ margin-bottom: 0px;
transition: opacity 0.2s cubic-bezier(0,0,1,1);
opacity: 0;
}
@@ -187,6 +187,7 @@
.tabulator {
font-family: 'Roboto', sans-serif;
+ font-size:10px;
}
.tabulator-row {
transition: background-color 100ms cubic-bezier(0,0,1,1);
@@ -247,7 +248,7 @@
stroke: rgba(64,64,64,0.3);
}
.chartCaption {
- color: #888;
+ color: #333;
font-size: 12px;
font-family: 'Roboto', sans-serif;
font-weight: 400;
diff --git a/site/assets/css/css.css b/site/assets/css/css.css
index 6b1f40cd..89ac8616 100644
--- a/site/assets/css/css.css
+++ b/site/assets/css/css.css
@@ -12,11 +12,11 @@ html, body {
min-height: 100%;
/*font-family: 'Roboto Mono', sans-serif;*/
font-family: 'Roboto', sans-serif;
- color: #eee;
+ color: #000;
overflow-x: hidden;
}
html {
- background: #181818;
+ background: #fff;
}
a { outline: none; }
img { border: 0; }
@@ -33,6 +33,7 @@ html.mobile .content{
}
/* header */
+/* header */
header {
position: fixed;
@@ -155,7 +156,7 @@ footer {
display: flex;
flex-direction: row;
justify-content: space-between;
- color: #666;
+ color: #ccc;
font-size: 13px;
/*line-height: 17px;*/
padding: 15px;
@@ -178,6 +179,9 @@ footer a {
padding-bottom: 1px;
text-decoration: none;
}
+.desktop footer a {
+ border-bottom:1px solid #999;
+}
.desktop footer a:hover {
color: #fff;
border-bottom:1px solid #999;
@@ -211,30 +215,35 @@ footer ul:last-child li {
/* headings */
h1 {
- color: #eee;
- font-weight: 400;
- font-size: 34pt;
+ color: #000;
+ font-weight: 500;
+ font-size: 28pt;
+ line-height: 38pt;
margin: 20px auto 10px auto;
padding: 0;
transition: color 0.1s cubic-bezier(0,0,1,1);
font-family: 'Roboto Mono', monospace;
+ text-transform: uppercase;
}
h2 {
- color: #eee;
- font-weight: 400;
+ color: #111;
+ font-weight: 500;
font-size: 34px;
line-height: 43px;
margin: 20px auto 20px auto;
padding: 0;
transition: color 0.1s cubic-bezier(0,0,1,1);
font-family: 'Roboto Mono', monospace;
+ text-transform: uppercase;
}
h3 {
+ color: #333;
margin: 20px auto 10px auto;
font-size: 28px;
font-weight: 400;
transition: color 0.1s cubic-bezier(0,0,1,1);
font-family: 'Roboto Mono', monospace;
+ text-transform: uppercase;
}
h4 {
margin: 6px auto 10px auto;
@@ -243,6 +252,7 @@ h4 {
font-weight: 400;
transition: color 0.1s cubic-bezier(0,0,1,1);
font-family: 'Roboto Mono', monospace;
+ text-transform: uppercase;
}
h5 {
margin: 6px auto 10px auto;
@@ -253,11 +263,11 @@ h5 {
font-family: 'Roboto Mono', monospace;
}
.content h3 a {
- color: #888;
+ color: #333;
text-decoration: none;
}
.desktop .content h3 a:hover {
- color: #fff;
+ color: #111;
text-decoration: underline;
}
.right-sidebar h3 {
@@ -272,12 +282,15 @@ h5 {
.right-sidebar ul li a {
border-bottom: 0;
}
+.right-sidebar ul li:last-child{
+ border-bottom: 0;
+}
th, .gray {
font-family: 'Roboto', monospace;
font-weight: 500;
text-transform: uppercase;
letter-spacing: .15rem;
- color: #777;
+ color: #333;
}
th, .gray {
font-size: 9pt;
@@ -354,10 +367,10 @@ section {
}
section p {
margin: 10px auto 20px auto;
- line-height: 1.9rem;
- font-size: 17px;
+ line-height: 1.95rem;
+ font-size: 16px;
font-weight: 400;
- color: #cdcdcd;
+ color: #111;
}
section ul {
margin: 10px auto 20px auto;
@@ -367,22 +380,44 @@ section h1, section h2, section h3, section h4, section h5, section h6, section
max-width: 720px;
}
-.content-dataset section:nth-child(2) p:first-child{
- font-size:19px;
+.content-dataset-list section:nth-child(1) p:nth-child(2){
+ font-size:22px;
+ line-height:34px;
+}
+.content-dataset section:nth-child(4) p:nth-child(2){
+ font-size:18px;
+ line-height: 34px;
+ color:#000;
+}
+.content-dataset section:nth-child(3) p:nth-child(2) {
+ /* highlight news text */
+ /*font-style: italic;*/
+ font-weight: 400;
+ color:#f00;
+}
+.content-dataset section:nth-child(3) p:nth-child(2) a{
+ /* highlight news text */
+ color:#f00;
+ border-bottom: 1px solid #f00;
+}
+.content-dataset section:nth-child(3) p:nth-child(2) a:hover{
+ /* highlight news text */
+ color:#f00;
+ border-bottom: 1px solid #f00;
}
p.subp{
font-size: 14px;
}
.content a {
- color: #dedede;
+ color: #333;
text-decoration: none;
- border-bottom: 2px solid #666;
+ border-bottom: 1px solid #333;
padding-bottom: 1px;
transition: color 0.1s cubic-bezier(0,0,1,1);
}
.desktop .content a:hover {
- color: #fff;
- border-bottom: 2px solid #ccc;
+ color: #111;
+ border-bottom: 1px solid #111;
}
/* top of post metadata */
@@ -393,7 +428,7 @@ p.subp{
justify-content: flex-start;
align-items: flex-start;
font-size: 12px;
- color: #ccc;
+ color: #111;
margin-bottom: 20px;
font-family: 'Roboto', sans-serif;
margin-right: 20px;
@@ -412,7 +447,6 @@ p.subp{
float: right;
width: 200px;
margin: 0px 20px 20px 20px;
- padding-top: 12px;
padding-left: 20px;
border-left: 1px solid #333;
font-family: 'Roboto';
@@ -442,7 +476,10 @@ p.subp{
border-bottom: 1px solid #333;
padding:10px 10px 10px 0;
margin: 0 4px 4px 0;
- color: #bbb;
+ color: #111;
+}
+.right-sidebar .meta:last-child{
+ border-bottom: 0;
}
.right-sidebar ul {
margin-bottom: 10px;
@@ -471,15 +508,14 @@ p.subp{
/* lists */
ul {
- list-style-type: none;
+ list-style-type: square;
margin: 0 0 30px 0;
padding: 0;
}
ul li {
margin-bottom: 8px;
- color: #dedede;
font-weight: 400;
- font-size: 14px;
+ font-size: 15px;
}
/* misc formatting */
@@ -497,8 +533,9 @@ pre {
border-radius: 2px;
padding: 10px;
display: block;
- background: #333;
+ background: #ddd;
overflow: auto
+ /*margin-bottom: 10px;*/
}
pre code {
display: block;
@@ -533,10 +570,10 @@ table tr td{
font-size:12px;
}
table tbody tr:nth-child(odd){
- background-color:#292929;
+ background-color:#ebebeb;
}
table tbody tr:nth-child(even){
- background-color:#333;
+ background-color:#ccc;
}
hr {
@@ -604,8 +641,8 @@ ul.footnotes p {
font-family: 'Roboto Mono', monospace;
font-weight: 400;
text-transform: uppercase;
- color: #666;
- font-size: 11pt;
+ color: #333;
+ font-size: 12pt;
}
/* images */
@@ -670,22 +707,24 @@ section.fullwidth .image {
}
.image .caption.intro-caption{
text-align: center;
+ color:#666;
}
.caption {
text-align: center;
font-size: 10pt;
- color: #999;
+ line-height: 14pt;
+ color: #333;
max-width: 960px;
margin: 10px auto 10px auto;
font-family: 'Roboto';
}
.caption a {
- color: #ccc;
- border: 0;
+ color: #333;
+ border-bottom: 1px solid #333;
}
.desktop .caption a:hover {
- color: #fff;
- border: 0;
+ color: #111;
+ border-bottom: 1px solid #111;
}
@@ -873,7 +912,7 @@ section.fullwidth .image {
.dataset-list .dataset {
width: 300px;
padding: 12px;
- color: white;
+ color: #000;
font-weight: 400;
font-family: 'Roboto';
position: relative;
@@ -884,21 +923,22 @@ section.fullwidth .image {
height: 178px;
}
.desktop .content .dataset-list a {
- border: 1px solid #333;
+ border: 1px solid #999;
}
.desktop .dataset-list a:hover {
- border: 1px solid #666;
+ border: 1px solid #000;
}
.dataset-list .fields {
font-size: 12px;
- color: #ccc;
+ line-height: 17px;
+ color: #333;
}
.dataset-list .dataset .title{
font-size: 16px;
line-height: 20px;
margin-bottom: 4px;
- font-weight: 400;
+ font-weight: 500;
display: block;
}
.dataset-list .fields div {
@@ -965,7 +1005,7 @@ section.intro_section {
justify-content: center;
align-items: center;
background-color: #111111;
- margin-bottom: 20px;
+ /*margin-bottom: 20px;*/
padding: 0;
}
.intro_section .inner {
@@ -1091,7 +1131,8 @@ ul.map-legend li:before {
}
ul.map-legend li.active {
text-decoration: underline;
- color: #fff;
+ color: #000;
+ font-weight: 500;
}
ul.map-legend li.edu:before {
background-color: #f2f293;
@@ -1118,7 +1159,7 @@ ul.map-legend li.source:before {
}
.content-about {
- color: #fff;
+ /*color: #fff;*/
}
.content-about p {
font-size: 16px;
@@ -1128,8 +1169,8 @@ ul.map-legend li.source:before {
font-weight: 300;
}
.content-about section:first-of-type > p:first-of-type {
- font-size: 22px;
- line-height: 40px;
+ font-size: 26px;
+ line-height: 42px;
}
.content-about .about-menu ul li {
display: inline-block;
@@ -1141,12 +1182,13 @@ ul.map-legend li.source:before {
}
.content-about .about-menu ul li a {
border-bottom: 0;
- color: #aaa;
+ color: #555;
}
.content-about .about-menu ul li a.current {
- border-bottom: 1px solid #ddd;
- color: #ddd;
+ border-bottom: 1px solid #000;
+ color: #000;
+ font-weight: 500;
}
/* columns */
@@ -1231,13 +1273,13 @@ ul.map-legend li.source:before {
/* footnotes */
a.footnote {
- font-size: 9px;
+ font-size: 10px;
line-height: 0px;
position: relative;
/*display: inline-block;*/
bottom: 7px;
text-decoration: none;
- color: #ff8;
+ color: #333;
border: 0;
left: -1px;
transition-duration: 0s;
@@ -1255,14 +1297,14 @@ a.footnote_shim {
}
.desktop a.footnote:hover {
/*background-color: #ff8;*/
- color: #fff;
+ color: #000;
border: 0;
}
.backlinks {
margin-right: 10px;
}
.content .backlinks a {
- color: #ff8;
+ color: #333;
font-size: 10px;
text-decoration: none;
border: 0;
diff --git a/site/assets/css/tabulator.css b/site/assets/css/tabulator.css
index d7a3fab3..baf44536 100755
--- a/site/assets/css/tabulator.css
+++ b/site/assets/css/tabulator.css
@@ -1,7 +1,7 @@
/* Tabulator v4.1.3 (c) Oliver Folkerd */
.tabulator {
position: relative;
- font-size: 13px;
+ font-size: 12px;
text-align: left;
overflow: hidden;
-ms-transform: translatez(0);
diff --git a/site/content/_drafts_/pipa/index.md b/site/content/_drafts_/pipa/index.md
deleted file mode 100644
index 250878ff..00000000
--- a/site/content/_drafts_/pipa/index.md
+++ /dev/null
@@ -1,23 +0,0 @@
-------------
-
-status: draft
-title: People in Photo Albums
-desc: <span class="dataset-name"> People in Photo Albums (PIPA)</span> is a dataset...
-subdesc: [ add subdescrition ]
-slug: pipa
-cssclass: dataset
-image: assets/background.jpg
-published: 2019-2-23
-updated: 2019-2-23
-authors: Adam Harvey
-
-------------
-
-## People in Photo Albums
-
-### sidebar
-### end sidebar
-
-[ PAGE UNDER DEVELOPMENT ]
-
-{% include 'dashboard.html' %}
diff --git a/site/content/pages/about/index.md b/site/content/pages/about/index.md
index d4172e81..f07a79ee 100644
--- a/site/content/pages/about/index.md
+++ b/site/content/pages/about/index.md
@@ -22,27 +22,10 @@ authors: Adam Harvey
</ul>
</section>
-MegaPixels is an independent art and research project by Adam Harvey and Jules LaPlace that investigates the ethics, origins, and individual privacy implications of face recognition image datasets and their role in the expansion of biometric surveillance technologies.
+MegaPixels is an independent art and research project by [Adam Harvey](https://ahprojects.com) and [Jules LaPlace](https://asdf.us) that investigates the ethics, origins, and individual privacy implications of face recognition image datasets and their role in the expansion of biometric surveillance technologies.
MegaPixels is made possible with support from <a href="http://mozilla.org">Mozilla</a>
-
-<div class="flex-container team-photos-container">
- <div class="team-member">
- <h3>Adam Harvey</h3>
- <p>is Berlin-based American artist and researcher. His previous projects (<a href="https://cvdazzle.com">CV Dazzle</a>, <a href="https://ahprojects.com/stealth-wear">Stealth Wear</a>, and <a href="https://github.com/adamhrv/skylift">SkyLift</a>) explore the potential for counter-surveillance as artwork. He is the founder of <a href="https://vframe.io">VFRAME</a> (visual forensics software for human rights groups) and is a currently researcher in residence at Karlsruhe HfG.</p>
- <p><a href="https://ahprojects.com">ahprojects.com</a></p>
- </p>
- </div>
- <div class="team-member">
- <h3>Jules LaPlace</h3>
- <p>is an American technologist and artist also based in Berlin. He was previously the CTO of a digital agency in NYC and now also works at VFRAME, developing computer vision and data analysis software for human rights groups. Jules also builds experimental software for artists and musicians.
- </p>
- <p><a href="https://asdf.us/">asdf.us</a></p>
- </div>
-</div>
-
-
MegaPixels is an art and research project first launched in 2017 for an [installation](https://ahprojects.com/megapixels-glassroom/) at Tactical Technology Collective's [GlassRoom](https://tacticaltech.org/pages/glass-room-london-press/) about face recognition datasets. In 2018 MegaPixels was extended to cover pedestrian analysis datasets for a [commission by Elevate Arts festival](https://esc.mur.at/de/node/2370) in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets, the first of which launched on this site in April 2019.
MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the same technology companies who created many of the datasets presented on this site.
@@ -51,7 +34,14 @@ MegaPixels is an independent project, designed as a public resource for educator
A dataset of verified geocoded citations and dataset statistics will be published in Fall 2019 along with a research paper as part of a research fellowship for [KIM (Critical Artificial Intelligence) Karlsruhe HfG](http://kim.hfg-karlsruhe.de/).
-### Selected News and Exhibitions
+
+#### Team
+
+- [Adam Harvey](https://ahprojects.com): Concept, research and analysis, design, computer vision
+- [Jules LaPlace](https://asdf.us): Information and systems architecture, data management, citation geocoding, web applications
+
+
+### News and Publications
- July 2019: New York Times writes about MegaPixels and how "[Facial Recognition Tech Is Growing Stronger, Thanks to Your Face](https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html)"
- June 2019 - 2020: MegaPixels installation at Ars Electronica Center (AT) exhibition ["Compass - Navigating the Future"](https://ars.electronica.art/center/en/megapixels)
@@ -60,24 +50,14 @@ A dataset of verified geocoded citations and dataset statistics will be publishe
Read more [news](/about/news)
-=== columns 3
-##### Team
-
-- Adam Harvey: Concept, research and analysis, design, computer vision
-- Jules LaPlace: Information and systems architecture, data management, web applications
-
-===========
-
-##### Contributing Researchers
+#### Contributing Researchers
- Beth (aka Ms. Celeb)
- Berit Gilma
- Mathana Stender
-===========
-
-##### Code and Libraries
+#### Code and Libraries
- [Semantic Scholar](https://semanticscholar.org) for citation aggregation
- Leaflet.js for maps
@@ -85,9 +65,8 @@ Read more [news](/about/news)
- ThreeJS for 3D visualizations
- PDFMiner.Six and Pandas for research paper analysis
-=== end columns
-##### Attribution
+#### Attribution
If you use MegaPixels or any data derived from it for your work, please cite our original work as follows:
@@ -100,8 +79,3 @@ If you use MegaPixels or any data derived from it for your work, please cite our
urldate = {2019-04-18}
}
</pre>
-
-
-##### Contact
-
-Please direct questions, comments, or feedback to [mastodon.social/@adamhrv](https://mastodon.social/@adamhrv) or contact via [https://ahprojects.com/about](https://ahprojects.com/about) \ No newline at end of file
diff --git a/site/content/pages/about/legal.md b/site/content/pages/about/legal.md
index 08538e9d..cde0f0c9 100644
--- a/site/content/pages/about/legal.md
+++ b/site/content/pages/about/legal.md
@@ -27,11 +27,11 @@ MegaPixels.cc Terms and Privacy
MegaPixels is an independent and academic art and research project about the origins and ethics of publicly available face analysis image datasets. By accessing MegaPixels (the *Service* or *Services*) you agree to the terms and conditions set forth below.
-## Privacy
+### Privacy
The MegaPixels site has been designed to minimize the amount of network requests to 3rd party services and therefore prioritize the privacy of the viewer. This site does not use any local or external analytics programs to monitor site viewers. In fact, the only data collected are the necessary server logs used only for preventing misuse, which are deleted at short-term intervals.
-## 3rd Party Services
+### 3rd Party Services
In order to provide certain features of the site, some 3rd party services are needed. Currently, the MegaPixels.cc site uses two 3rd party services: (1) Leaflet.js for the interactive map and (2) Digital Ocean Spaces as a content delivery network. Both services encrypt your requests to their server using HTTPS and neither service requires storing any cookies or authentication. However, both services will store files in your web browser's local cache (local storage) to improve loading performance. None of these local storage files are used for analytics, tracking, or any similar purpose.
diff --git a/site/content/pages/datasets/adience/index.md b/site/content/pages/datasets/adience/index.md
index 12cf539a..675de813 100644
--- a/site/content/pages/datasets/adience/index.md
+++ b/site/content/pages/datasets/adience/index.md
@@ -7,7 +7,6 @@ subdesc: Adience ...
slug: adience
cssclass: dataset
image: assets/background.jpg
-year: 2016
published: 2019-4-18
updated: 2019-4-18
authors: Adam Harvey
diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md
index 2a5346b5..a61c007c 100644
--- a/site/content/pages/datasets/brainwash/index.md
+++ b/site/content/pages/datasets/brainwash/index.md
@@ -4,6 +4,7 @@ status: published
title: Brainwash Dataset
desc: Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco
subdesc: It includes 11,917 images of "everyday life of a busy downtown cafe" and is used for training face and head detection algorithms
+caption: One of the 11,917 images in the Brainwash dataset captured from the Brainwash Cafe in San Francisco
slug: brainwash
cssclass: dataset
image: assets/background.jpg
@@ -14,9 +15,14 @@ authors: Adam Harvey
------------
-## Brainwash Dataset
+# Brainwash Dataset
+
+Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."
### sidebar
+
++ Press coverage: <a href="https://www.nytimes.com/2019/07/13/technology/">New York Times</a>, <a href="https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html">De Tijd</a>
+
### end sidebar
Brainwash is a dataset of livecam images taken from San Francisco's Brainwash Cafe. It includes 11,917 images of "everyday life of a busy downtown cafe"[^readme] captured at 100 second intervals throughout the day. The Brainwash dataset includes 3 full days of webcam images taken on October 27, November 13, and November 24 in 2014. According to the author's [research paper](https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666) introducing the dataset, the images were acquired with the help of Angelcam.com. [^end_to_end]
@@ -45,6 +51,12 @@ The two papers associated with the National University of Defense Technology in
![caption: Nine of 11,917 images from the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Russel et al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)](assets/brainwash_grid.jpg)
+### Press Coverage
+
+- New York Times: [Facial Recognition Tech Is Growing Stronger, Thanks to Your Face](https://www.nytimes.com/2019/07/13/technology/)
+- De Tijd: [Brainwash](https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html)
+
+
{% include 'cite_our_work.html' %}
#### Citing Brainwash Dataset
diff --git a/site/content/pages/datasets/duke_mtmc/index.md b/site/content/pages/datasets/duke_mtmc/index.md
index b5c6bf1a..e6a77269 100644
--- a/site/content/pages/datasets/duke_mtmc/index.md
+++ b/site/content/pages/datasets/duke_mtmc/index.md
@@ -6,6 +6,7 @@ desc: <span class="dataset-name">Duke MTMC</span> is a dataset of surveillance c
subdesc: Duke MTMC contains over 2 million video frames and 2,700 unique identities collected from 8 HD cameras at Duke University campus in March 2014
slug: duke_mtmc
cssclass: dataset
+caption: A still frame from the Duke MTMC (Multi-Target-Multi-Camera) CCTV dataset captured on Duke University campus in 2014. The dataset has now been terminated by the author in response to this report.
image: assets/background.jpg
published: 2019-4-18
updated: 2019-05-22
@@ -13,12 +14,16 @@ authors: Adam Harvey
------------
-## Duke MTMC
+# Duke MTMC
+
+Update: In response to this report and an [investigation](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) by the Financial Times, Duke University has terminated the Duke MTMC dataset.
### sidebar
### end sidebar
-Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition. The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically setup to capture students "during periods between lectures, when pedestrian traffic is heavy".[^duke_mtmc_orig]
+Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition.
+
+The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically set up to capture students "during periods between lectures, when pedestrian traffic is heavy".[^duke_mtmc_orig]
For this analysis of the Duke MTMC dataset, over 100 publicly available research papers that used the dataset were analyzed to find out who's using the dataset and where it's being used. The results show that the Duke MTMC dataset has spread far beyond its origins and intentions in academic research projects at Duke University. Since its publication in 2016, more than twice as many research citations originated in China as in the United States. Among these citations were papers with links to the Chinese military and several of the companies known to provide Chinese authorities with the oppressive surveillance technology used to monitor millions of Uighur Muslims.
diff --git a/site/content/pages/datasets/helen/assets/_background.jpg b/site/content/pages/datasets/helen/assets/_background.jpg
new file mode 100644
index 00000000..5968da24
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/_background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/age.csv b/site/content/pages/datasets/helen/assets/age.csv
new file mode 100644
index 00000000..17121aac
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,31
+13 - 18,367
+19 - 24,567
+25 - 34,634
+35 - 44,362
+45 - 54,113
+55 - 64,56
+64 - 75,34
+75 - 100,10
diff --git a/site/content/pages/datasets/helen/assets/alpha.png b/site/content/pages/datasets/helen/assets/alpha.png
new file mode 100755
index 00000000..eb1defd0
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/alpha.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/background.jpg b/site/content/pages/datasets/helen/assets/background.jpg
index 6958a2b2..0288163e 100755
--- a/site/content/pages/datasets/helen/assets/background.jpg
+++ b/site/content/pages/datasets/helen/assets/background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_bride.jpg b/site/content/pages/datasets/helen/assets/feature_bride.jpg
new file mode 100755
index 00000000..5430f50b
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_bride.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_family.jpg b/site/content/pages/datasets/helen/assets/feature_family.jpg
new file mode 100755
index 00000000..a3fb833d
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_family.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_family_05.jpg b/site/content/pages/datasets/helen/assets/feature_family_05.jpg
new file mode 100755
index 00000000..57fb35bc
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_family_05.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_graduation.jpg b/site/content/pages/datasets/helen/assets/feature_graduation.jpg
new file mode 100755
index 00000000..f9f7d132
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_graduation.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_groom.jpg b/site/content/pages/datasets/helen/assets/feature_groom.jpg
new file mode 100755
index 00000000..31791987
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_groom.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_outdoor.jpg b/site/content/pages/datasets/helen/assets/feature_outdoor.jpg
new file mode 100755
index 00000000..375f5ae5
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_outdoor.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg b/site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg
new file mode 100755
index 00000000..4a02876d
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_outdoor_02.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_wedding.jpg b/site/content/pages/datasets/helen/assets/feature_wedding.jpg
new file mode 100755
index 00000000..deed7061
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_wedding.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/feature_wedding_02.jpg b/site/content/pages/datasets/helen/assets/feature_wedding_02.jpg
new file mode 100755
index 00000000..27489f7b
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/feature_wedding_02.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/gender.csv b/site/content/pages/datasets/helen/assets/gender.csv
new file mode 100644
index 00000000..ef44b6bd
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+Male,1118
+Female,1184
+Overlap,186
diff --git a/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg b/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg
deleted file mode 100755
index 3b5a0e40..00000000
--- a/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/index.jpg b/site/content/pages/datasets/helen/assets/index.jpg
index 7268d6ad..b9ce489d 100755
--- a/site/content/pages/datasets/helen/assets/index.jpg
+++ b/site/content/pages/datasets/helen/assets/index.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png b/site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png
new file mode 100755
index 00000000..86720be7
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_20_2_2_40_15.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png
new file mode 100755
index 00000000..3362f6bf
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_22.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png
new file mode 100755
index 00000000..450235d5
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_25.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png
new file mode 100755
index 00000000..490d44bb
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_14_14_14_26.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png b/site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png
new file mode 100755
index 00000000..6f1c85c5
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/single.png b/site/content/pages/datasets/helen/assets/single.png
new file mode 100755
index 00000000..5f7d23b0
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/single.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png b/site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png
new file mode 100755
index 00000000..eb1defd0
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/single_21_15_15_7_43_19.png
Binary files differ
diff --git a/site/content/pages/datasets/helen/index.md b/site/content/pages/datasets/helen/index.md
index d44c9b98..da1dc33b 100644
--- a/site/content/pages/datasets/helen/index.md
+++ b/site/content/pages/datasets/helen/index.md
@@ -1,30 +1,134 @@
------------
-status: draft
+status: published
title: HELEN
-desc: HELEN Face Dataset
-subdesc: HELEN (under development)
+desc: HELEN is a dataset of face images from Flickr used for training facial component localization algorithms
+subdesc: HELEN includes 2,330 images from Flickr found by keyword searches for "portrait", "wedding", "outdoor", "boy", "studio", and "family"
+caption: Selected images from the HELEN dataset
slug: helen
cssclass: dataset
image: assets/background.jpg
-year: 2000
-published: 2019-4-18
-updated: 2019-4-18
+published: 2019-9-23
+updated: 2019-9-23
authors: Adam Harvey
------------
-## HELEN
+
+# HELEN Dataset
### sidebar
### end sidebar
-[ page under development ]
+HELEN is a dataset of annotated face images used for facial component localization. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio".[^orig_paper]
+
+The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis."[^orig_paper]
+
+Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential components of most facial recognition processing systems. Facial landmarks are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to match a templated pose.
+
+![caption: An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic &copy; 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.](assets/montage_lms_21_14_14_14_26.png)
+
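+As a rough illustration (a minimal sketch, not code from this project): dlib's pretrained 68-point shape predictor, which as noted below was trained using HELEN, is typically applied like this. The image and model filenames are placeholders:
+
+```python
+# Detect a face, locate its 68 landmarks, and warp the face to a canonical pose.
+import dlib
+
+detector = dlib.get_frontal_face_detector()
+predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
+
+img = dlib.load_rgb_image("face.jpg")
+for face in detector(img):
+    landmarks = predictor(img, face)              # eye, nose, jawline, mouth points
+    aligned = dlib.get_face_chip(img, landmarks)  # aligned crop matching a template pose
+```
+
+It is this alignment step that makes landmark datasets such as HELEN indirectly useful to face recognition pipelines, even when the stated motivation is portrait editing.
+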
+This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition, with the vast majority of the research taking place in China.
+
+Commercial use includes research by IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook.
+
+Military and defense use includes the National University of Defense Technology (NUDT).
+
+<http://eccv2012.unifi.it/>
+
+TODO
+
+- add proof of use in dlib and openface
+- add proof of use in commercial use of dlib? ibm dif
+- make landmark over blurred images
+- add 6x6 grid for landmarks
+- highlight key findings
+- highlight key commercial usage
+- look for most interesting research papers to provide example of how it's used for face recognition
+- estimated time: 6 hours
+- add data to github repo?
+
+| Organization | Paper | Year | Used HELEN |
+|---|---|---|---|
+| SenseTime, Amazon | [Look at Boundary: A Boundary-Aware Face Alignment Algorithm](https://arxiv.org/pdf/1805.10483.pdf) | 2018 | &#x2714; |
+| SenseTime | [ReenactGAN: Learning to Reenact Faces via Boundary Transfer](https://arxiv.org/pdf/1807.11079.pdf) | 2018 | &#x2714; |
+
+
+The dataset was used for training the OpenFace software: "we used the HELEN and LFPW training subsets for training and the rest for testing" (<https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets>).
+
+The popular dlib facial landmark detector was also trained using HELEN.
+
+In addition to the 200+ verified citations, the HELEN dataset was used for:
+- https://github.com/memoiry/face-alignment
+- http://www.dsp.toronto.edu/projects/face_analysis/
+
+It has also been converted into new datasets, including:
+- https://github.com/JPlin/Relabeled-HELEN-Dataset
+- https://www.kaggle.com/kmader/helen-eye-dataset
+
+The original site:
+- http://www.ifp.illinois.edu/~vuongle2/helen/
+
+### Example Images
+
+
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2839127417_1.jpg for outdoor studio](assets/feature_outdoor_02.jpg)
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 ](assets/feature_graduation.jpg)
+
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 ](assets/feature_wedding.jpg)
+![caption: An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1 ](assets/feature_wedding_02.jpg)
+
+![caption: Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969](assets/feature_family.jpg)
+![caption: Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969](assets/feature_family_05.jpg)
+
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}
+### Age and Gender Distribution
+
+{% include 'age_gender_disclaimer.html' %}
+
+=== columns 2
+
+```
+single_pie_chart /datasets/helen/assets/age.csv
+Caption: HELEN dataset age distribution
+Top: 10
+OtherLabel: Other
+```
+
+```
+single_pie_chart /datasets/helen/assets/gender.csv
+Caption: HELEN dataset gender distribution
+Top: 10
+OtherLabel: Other
+```
+
+=== end columns
+
+![caption: Visualization of the HELEN dataset 194-point facial landmark annotations. Credit: graphic &copy; MegaPixels.cc 2019, data from the HELEN dataset by Le, Vuong et al. 2012. If you use this image please credit both the graphic and data source.](assets/montage_lms_21_15_15_7_26_0.png)
+
{% include 'cite_our_work.html' %}
+
+#### Cite the Original Author's Work
+
+If you find the HELEN dataset useful or reference it in your work, please cite the authors' original work as:
+
+<pre>
+@inproceedings{Le2012InteractiveFF,
+ title={Interactive Facial Feature Localization},
+ author={Vuong Le and Jonathan Brandt and Zhe L. Lin and Lubomir D. Bourdev and Thomas S. Huang},
+ booktitle={ECCV},
+ year={2012}
+}
+</pre>
+
### Footnotes
+
+[^orig_paper]: Le, Vuong et al. “Interactive Facial Feature Localization.” ECCV (2012). \ No newline at end of file
diff --git a/site/content/pages/datasets/hrt_transgender/index.md b/site/content/pages/datasets/hrt_transgender/index.md
deleted file mode 100644
index fb820593..00000000
--- a/site/content/pages/datasets/hrt_transgender/index.md
+++ /dev/null
@@ -1,24 +0,0 @@
-------------
-
-status: draft
-title: HRT Transgender Dataset
-desc: TBD
-subdesc: TBD
-slug: hrt_transgender
-cssclass: dataset
-image: assets/background.jpg
-year: 2015
-published: 2019-2-23
-updated: 2019-2-23
-authors: Adam Harvey
-
-------------
-
-## HRT Transgender Dataset
-
-### sidebar
-### end sidebar
-
-[ page under development ]
-
-{% include 'dashboard.html' } \ No newline at end of file
diff --git a/site/content/pages/datasets/ibm_dif/index.md b/site/content/pages/datasets/ibm_dif/index.md
index 4c620e95..c5f25e1d 100644
--- a/site/content/pages/datasets/ibm_dif/index.md
+++ b/site/content/pages/datasets/ibm_dif/index.md
@@ -1,20 +1,20 @@
------------
status: draft
-title: MegaFace
-desc: MegaFace Dataset
-subdesc: MegaFace contains 670K identities and 4.7M images
-slug: megaface
+title: IBM DiF
+desc: Diversity in Faces Dataset
+subdesc: Lorem Ipsum...
+slug: ibm_dif
cssclass: dataset
image: assets/background.jpg
-year: 2016
-published: 2019-4-18
-updated: 2019-4-18
+year: 2019
+published: 2019-9-18
+updated: 2019-9-18
authors: Adam Harvey
------------
-## MegaFace
+## IBM Diversity in Faces
### sidebar
### end sidebar
diff --git a/site/content/pages/datasets/index.md b/site/content/pages/datasets/index.md
index f56a3291..f3d5fea0 100644
--- a/site/content/pages/datasets/index.md
+++ b/site/content/pages/datasets/index.md
@@ -4,6 +4,7 @@ status: published
title: MegaPixels: Face Recognition Datasets
desc: Facial Recognition Datasets
slug: home
+cssclass: dataset-list
published: 2018-12-15
updated: 2019-04-24
authors: Adam Harvey
@@ -15,4 +16,4 @@ sync: false
Explore face and person recognition datasets contributing to the growing crisis of biometric surveillance technologies. This first group of 5 datasets focuses on image usage connected to foreign surveillance and defense organizations.
-In response to the analyses below, the [Brainwash](https://purl.stanford.edu/sx925dc9385), [Duke MTMC](http://vision.cs.duke.edu/DukeMTMC/), and [MS Celeb](http://msceleb.org/) datasets have been taken down by their authors. The [UCCS](https://vast.uccs.edu/Opensetface/) dataset was temporarily deactivated due to metadata exposure. Read more [news](/about/news). A more complete list of datasets and research will be published in September 2019. These 5 are only a preview.
+In response to the analyses below, the [Brainwash](/datasets/brainwash), [Duke MTMC](/datasets/duke_mtmc), and [MS Celeb](/datasets/msceleb/) datasets have been taken down by their authors. The [UCCS](/datasets/uccs/) dataset was temporarily deactivated due to metadata exposure. Read more [news](/about/news). A more complete list of datasets and research will be published in September 2019. These 5 are only a preview.
diff --git a/site/content/pages/datasets/lfpw/index.md b/site/content/pages/datasets/lfpw/index.md
index 1021d490..21f885d4 100644
--- a/site/content/pages/datasets/lfpw/index.md
+++ b/site/content/pages/datasets/lfpw/index.md
@@ -19,7 +19,15 @@ authors: Adam Harvey
### sidebar
### end sidebar
-[ page under development ]
+RESEARCH below this line
+
+> Release 1 of LFPW consists of 1,432 faces from images downloaded from the web using simple text queries on sites such as google.com, flickr.com, and yahoo.com. Each image was labeled by three MTurk workers, and 29 fiducial points, shown below, are included in dataset. LFPW was originally described in the following publication:
+
+> Due to copyright issues, we cannot distribute image files in any format to anyone. Instead, we have made available a list of image URLs where you can download the images yourself. We realize that this makes it impossible to exactly compare numbers, as image links will slowly disappear over time, but we have no other option. This seems to be the way other large web-based databases seem to be evolving.
+
+<https://neerajkumar.org/databases/lfpw/>
+
+> This research was performed at Kriegman-Belhumeur Vision Technologies and was funded by the CIA through the Office of the Chief Scientist. <https://www.cs.cmu.edu/~peiyunh/topdown/> (nk_cvpr2011\_faceparts.pdf)
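+
+As a hypothetical sketch of the URL-list distribution model described above (the file name and one-URL-per-line format are assumptions, not LFPW's actual layout):
+
+```python
+# Rebuild a URL-distributed dataset from a list of image links. Links that
+# have died since publication are silently lost, which is why exact
+# comparisons on such datasets become impossible over time.
+import pathlib
+import urllib.request
+
+def fetch_images(url_list="lfpw_urls.txt", out_dir="images"):
+    out = pathlib.Path(out_dir)
+    out.mkdir(exist_ok=True)
+    for i, url in enumerate(pathlib.Path(url_list).read_text().splitlines()):
+        try:
+            urllib.request.urlretrieve(url, str(out / f"{i:05d}.jpg"))
+        except Exception:
+            pass  # dead link: this image can no longer be recovered
+```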
{% include 'dashboard.html' %}
diff --git a/site/content/pages/datasets/megaface/assets/age.csv b/site/content/pages/datasets/megaface/assets/age.csv
new file mode 100644
index 00000000..52a86599
--- /dev/null
+++ b/site/content/pages/datasets/megaface/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,785
+13 - 18,52026
+19 - 24,254411
+25 - 34,452129
+35 - 44,341809
+45 - 54,193525
+55 - 64,65635
+64 - 75,22148
+75 - 100,3108
diff --git a/site/content/pages/datasets/megaface/assets/gender.csv b/site/content/pages/datasets/megaface/assets/gender.csv
new file mode 100644
index 00000000..05ba9e43
--- /dev/null
+++ b/site/content/pages/datasets/megaface/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+Male,884043
+Female,580747
+Overlap,94990
diff --git a/site/content/pages/datasets/megaface/index.md b/site/content/pages/datasets/megaface/index.md
index 4c620e95..9c282cb2 100644
--- a/site/content/pages/datasets/megaface/index.md
+++ b/site/content/pages/datasets/megaface/index.md
@@ -1,11 +1,13 @@
------------
-status: draft
+status: published
title: MegaFace
desc: MegaFace Dataset
subdesc: MegaFace contains 670K identities and 4.7M images
slug: megaface
cssclass: dataset
+caption: Images from the MegaFace face recognition training and benchmarking dataset
image: assets/background.jpg
year: 2016
published: 2019-4-18
@@ -14,17 +16,71 @@ authors: Adam Harvey
------------
-## MegaFace
+# MegaFace
### sidebar
### end sidebar
-[ page under development ]
+MegaFace is a dataset of 4,700,000 face images of 672,000 individuals used for developing face recognition technologies. All images were downloaded from Flickr.
+
+#### How was it made?
+
+MegaFace was developed by the University of Washington for the purpose of training, validating, and benchmarking face recognition algorithms.
+
+The images are from Flickr, but are they all from YFCC100M?
+
+#### Who used it?
+
+MegaFace was used for research projects associated with SenseTime, Google, Mitsubishi, Vision Semantics Ltd, and Microsoft.
+
+#### Subsets
+
+MegaFace was also used to create the MegaFace Asian, MegaAge, and glasses subsets.
+
+#### A sample of the research projects
+
+Used for face recognition
+
+screenshots of papers
+
+#### Visuals
+
+- facial landmarks
+- bounding boxes
+- animation of all the titles of the paper
+
+
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}
+### Age and Gender Distribution
+
+=== columns 2
+
+```
+single_pie_chart /datasets/megaface/assets/age.csv
+Caption: MegaFace dataset age distribution
+Top: 10
+OtherLabel: Other
+```
+
+```
+single_pie_chart /datasets/megaface/assets/gender.csv
+Caption: MegaFace dataset gender distribution
+Top: 10
+OtherLabel: Other
+```
+
+=== end columns
+
+{% include 'age_gender_disclaimer.html' %}
+
{% include 'cite_our_work.html' %}
### Footnotes
diff --git a/site/content/pages/datasets/msceleb/assets/age.csv b/site/content/pages/datasets/msceleb/assets/age.csv
new file mode 100644
index 00000000..ce9238f8
--- /dev/null
+++ b/site/content/pages/datasets/msceleb/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,51
+13 - 18,3769
+19 - 24,25147
+25 - 34,58352
+35 - 44,57071
+45 - 54,35828
+55 - 64,15335
+64 - 75,6858
+75 - 100,1173
diff --git a/site/content/pages/datasets/msceleb/assets/gender.csv b/site/content/pages/datasets/msceleb/assets/gender.csv
new file mode 100644
index 00000000..ffa644ec
--- /dev/null
+++ b/site/content/pages/datasets/msceleb/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+Male,150310
+Female,67319
+They,9068
diff --git a/site/content/pages/datasets/msceleb/index.md b/site/content/pages/datasets/msceleb/index.md
index 0e457cd9..64584b31 100644
--- a/site/content/pages/datasets/msceleb/index.md
+++ b/site/content/pages/datasets/msceleb/index.md
@@ -4,6 +4,7 @@ status: published
title: Microsoft Celeb Dataset
desc: MS Celeb is a dataset of 10 million face images harvested from the Internet
subdesc: The MS Celeb dataset includes 10 million images of 100,000 people and an additional target list of 1,000,000 individuals
+caption: Example images from the MS-Celeb-1M dataset
slug: msceleb
cssclass: dataset
image: assets/background.jpg
@@ -14,12 +15,21 @@ authors: Adam Harvey
------------
-## Microsoft Celeb Dataset (MS Celeb)
+
+# Microsoft Celeb Dataset (MS Celeb)
+
+*Update: In response to this report and an [investigation](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) by the Financial Times, Microsoft has terminated their MS-Celeb website <https://msceleb.org>.*
### sidebar
+
++ Press coverage: <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">Financial Times</a>, <a href="https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html">New York Times</a>, <a href="https://www.bbc.com/news/technology-48555149">BBC</a>, <a href="https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html">Spiegel</a>, <a href="https://www.lesechos.fr/tech-medias/intelligence-artificielle/le-mariage-explosif-de-nos-donnees-et-de-lia-1031813">Les Echos</a>, <a href="https://www.lastampa.it/2019/06/22/tecnologia/microsoft-ha-cancellato-il-suo-database-per-il-riconoscimento-facciale-PWwLGmpO1fKQdykMZVBd9H/pagina.html">La Stampa</a>
+
### end sidebar
-Microsoft Celeb (MS-Celeb-1M) is a dataset of 10 million face images harvested from the Internet for the purpose of developing face recognition technologies. According to Microsoft Research, who created and published the [dataset](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/) in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' biometric data to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".[^msceleb_orig]
+Microsoft Celeb (MS-Celeb-1M) is a dataset of 10 million face images harvested from the Internet for the purpose of developing face recognition technologies.
+
+According to Microsoft Research, who created and published the [dataset](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/) in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' biometric data to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".[^msceleb_orig]
+
While the majority of people in this dataset are American and British actors, the exploitative use of the term "celebrity" extends far beyond Hollywood. Many of the names in the MS Celeb face recognition dataset are merely people who must maintain an online presence for their professional lives: journalists, artists, musicians, activists, policy makers, writers, and academics. Many people in the target list are even vocal critics of the very technology Microsoft is using their name and biometric information to build. It includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; Shoshana Zuboff, author of *Surveillance Capitalism*; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy.
@@ -115,15 +125,47 @@ Considering the multiple citations from commercial organizations (Canon, Hitachi
To provide insight into where these 10 million faces images have traveled, over 100 research papers have been verified and geolocated to show who used the dataset and where they used it.
+## GDPR and MS-Celeb
+
+[ in progress ]
+
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}
+### Age and Gender Distribution
+
+=== columns 2
+
+```
+single_pie_chart /datasets/msceleb/assets/age.csv
+Caption: MS-Celeb dataset age distribution
+Top: 10
+OtherLabel: Other
+```
+
+```
+single_pie_chart /datasets/msceleb/assets/gender.csv
+Caption: MS-Celeb dataset gender distribution
+Top: 10
+OtherLabel: Other
+```
+
+=== end columns
+
##### FAQs and Fact Check
-- **The MS Celeb images were not derived from Creative Commons sources**. They were obtained by "retriev[ing] approximately 100 images per celebrity from popular search engines"[^msceleb_orig]. The dataset actually includes many copyrighted images. Microsoft doesn't provide any image URLs, but manually reviewing a small portion of images from the dataset shows many images with watermarked "Copyright" text over the image. TinEye could be used to more accurately determine the image origins in aggregate
-- **Microsoft did not distribute images of all one million people.** They distributed images for about 100,000 and then encouraged other researchers to download the remaining 900,000 people "by using all the possibly collected face images of this individual on the web as training data."[^msceleb_orig]
-- **Microsoft had not deleted or stopped distribution of their MS Celeb at the time of most press reports on June 4.** Until at least June 6, 2019 the Microsoft Research data portal provided the MS Celeb dataset for download: <http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737>
+- **Despite several erroneous reports claiming that the MS-Celeb images were derived from Creative Commons licensed media, the MS Celeb images were obtained from web search engines**. The authors state that the images were obtained by "retriev[ing] approximately 100 images per celebrity from popular search engines"[^msceleb_orig]. Many, if not the vast majority, are copyrighted images. Microsoft doesn't provide image URLs, but manually reviewing a small portion of images from the dataset shows images with watermarked "Copyright" text over the image and sources including stock photo agencies such as Getty. TinEye could be used to more accurately determine the image origins in aggregate.
+- **Most reports incorrectly stated that Microsoft distributed images of all one million people. As this analysis mentions several times, Microsoft distributed images for 100,000 people and a separate target list of 900,000 more names.** Other researchers were then expected and encouraged to download the remaining 900,000 people "by using all the possibly collected face images of this individual on the web as training data."[^msceleb_orig]
+- **Microsoft claimed that it had deleted or stopped distribution of the MS Celeb dataset in April 2019 after the Financial Times investigation. This is false.** Until at least June 6, 2019 the Microsoft Research data portal freely provided the full MS Celeb dataset for download: <http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737>
+
+### Press Coverage
+
+- Financial Times (original story): [Who’s using your face? The ugly truth about facial recognition](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e)
+- New York Times (front page story): [Facial Recognition Tech Is Growing Stronger, Thanks to Your Face](https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html)
+- BBC: [Microsoft deletes massive face recognition database](https://www.bbc.com/news/technology-48555149)
+- Spiegel: [Microsoft löscht Datenbank mit zehn Millionen Fotos](https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html) ("Microsoft deletes database containing ten million photos")
+
### Footnotes
diff --git a/site/content/pages/datasets/oxford_town_centre/index.md b/site/content/pages/datasets/oxford_town_centre/index.md
index c2e3e7a7..eb1e360b 100644
--- a/site/content/pages/datasets/oxford_town_centre/index.md
+++ b/site/content/pages/datasets/oxford_town_centre/index.md
@@ -4,6 +4,7 @@ status: published
title: Oxford Town Centre Dataset
desc: Oxford Town Centre is a dataset of surveillance camera footage from Cornmarket St Oxford, England
subdesc: The Oxford Town Centre dataset includes approximately 2,200 identities and is used for research and development of face recognition systems
+caption: A still frame from the Oxford Town Centre CCTV video-dataset
slug: oxford_town_centre
cssclass: dataset
image: assets/background.jpg
@@ -14,12 +15,14 @@ authors: Adam Harvey
------------
-## Oxford Town Centre
+# Oxford Town Centre
### sidebar
### end sidebar
-The Oxford Town Centre dataset is a CCTV video of pedestrians in a busy downtown area in Oxford used for research and development of activity and face recognition systems.[^ben_benfold_orig] The CCTV video was obtained from a surveillance camera at the corner of Cornmarket and Market St. in Oxford, England and includes approximately 2,200 people. Since its publication in 2009[^guiding_surveillance] the [Oxford Town Centre dataset](http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html) has been used in over 80 verified research projects including commercial research by Amazon, Disney, OSRAM, and Huawei; and academic research in China, Israel, Russia, Singapore, the US, and Germany among dozens more.
+The Oxford Town Centre dataset is a CCTV video of pedestrians in a busy downtown area in Oxford used for research and development of activity and face recognition systems.[^ben_benfold_orig]
+
+The CCTV video was obtained from a surveillance camera at the corner of Cornmarket and Market St. in Oxford, England, and includes approximately 2,200 people. Since its publication in 2009[^guiding_surveillance], the [Oxford Town Centre dataset](http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html) has been used in over 80 verified research projects, including commercial research by Amazon, Disney, OSRAM, and Huawei, and academic research in China, Israel, Russia, Singapore, the US, and Germany, among dozens more.
The Oxford Town Centre dataset is unique in that it uses footage from a public surveillance camera that would otherwise be designated for public safety. The video shows that the pedestrians act naturally and unrehearsed, indicating they neither knew of nor consented to participation in the research project.
diff --git a/site/content/pages/datasets/pipa/assets/age.csv b/site/content/pages/datasets/pipa/assets/age.csv
new file mode 100644
index 00000000..c742bcb3
--- /dev/null
+++ b/site/content/pages/datasets/pipa/assets/age.csv
@@ -0,0 +1,10 @@
+age,faces
+0 - 12,6
+13 - 18,929
+19 - 24,3598
+25 - 34,6035
+35 - 44,5055
+45 - 54,2833
+55 - 64,741
+65 - 74,173
+75 - 100,17
diff --git a/site/content/pages/datasets/pipa/assets/gender.csv b/site/content/pages/datasets/pipa/assets/gender.csv
new file mode 100644
index 00000000..b128aaec
--- /dev/null
+++ b/site/content/pages/datasets/pipa/assets/gender.csv
@@ -0,0 +1,4 @@
+gender,faces
+Male,10750
+Female,9423
+They,1741
diff --git a/site/content/pages/datasets/pipa/index.md b/site/content/pages/datasets/pipa/index.md
index ca30b693..dd59cafb 100644
--- a/site/content/pages/datasets/pipa/index.md
+++ b/site/content/pages/datasets/pipa/index.md
@@ -14,7 +14,7 @@ authors: Adam Harvey
------------
-## MegaFace
+## PIPA: People in Photo Albums
### sidebar
### end sidebar
diff --git a/site/content/pages/datasets/uccs/index.md b/site/content/pages/datasets/uccs/index.md
index b493c633..3b9bed8a 100644
--- a/site/content/pages/datasets/uccs/index.md
+++ b/site/content/pages/datasets/uccs/index.md
@@ -5,6 +5,7 @@ title: UnConstrained College Students Dataset
slug: uccs
desc: <span class="dataset-name">UnConstrained College Students</span> is a dataset of long-range surveillance photos of students on the University of Colorado Colorado Springs campus
subdesc: The UnConstrained College Students dataset includes 16,149 images of 1,732 students, faculty, and pedestrians and is used for developing face recognition and face detection algorithms
+caption: One of 16,149 images from the UnConstrained College Students face recognition dataset captured at University of Colorado, Colorado Springs
image: assets/background.jpg
cssclass: dataset
@@ -15,12 +16,16 @@ authors: Adam Harvey
------------
-## UnConstrained College Students
+# UnConstrained College Students
+
+*Update: In response to this report's earlier publication of metadata from UCCS dataset photos, UCCS has temporarily suspended the dataset but plans to release a new version.*
### sidebar
### end sidebar
-UnConstrained College Students (UCCS) is a dataset of long-range surveillance photos captured at University of Colorado Colorado Springs developed primarily for research and development of "face detection and recognition research towards surveillance applications"[^uccs_vast]. According to the authors of [two](https://www.semanticscholar.org/paper/Unconstrained-Face-Detection-and-Open-Set-Face-G%C3%BCnther-Hu/d4f1eb008eb80595bcfdac368e23ae9754e1e745) [papers](https://www.semanticscholar.org/paper/Large-scale-unconstrained-open-set-face-database-Sapkota-Boult/07fcbae86f7a3ad3ea1cf95178459ee9eaf77cb1) associated with the dataset, over 1,700 students and pedestrians were "photographed using a long-range high-resolution surveillance camera without their knowledge".[^funding_uccs] This analysis examines the [UCCS dataset](http://vast.uccs.edu/Opensetface/) contents of the [dataset](), its funding sources, timestamp data, and information from publicly available research project citations.
+UnConstrained College Students (UCCS) is a dataset of long-range surveillance photos captured at University of Colorado Colorado Springs developed primarily for research and development of "face detection and recognition research towards surveillance applications"[^uccs_vast].
+
+According to the authors of [two](https://www.semanticscholar.org/paper/Unconstrained-Face-Detection-and-Open-Set-Face-G%C3%BCnther-Hu/d4f1eb008eb80595bcfdac368e23ae9754e1e745) [papers](https://www.semanticscholar.org/paper/Large-scale-unconstrained-open-set-face-database-Sapkota-Boult/07fcbae86f7a3ad3ea1cf95178459ee9eaf77cb1) associated with the dataset, over 1,700 students and pedestrians were "photographed using a long-range high-resolution surveillance camera without their knowledge".[^funding_uccs] This analysis examines the contents of the [UCCS dataset](http://vast.uccs.edu/Opensetface/), its funding sources, timestamp data, and information from publicly available research project citations.
The UCCS dataset includes over 1,700 unique identities, most of which are students walking to and from class. In 2018, it was the "largest surveillance [face recognition] benchmark in the public domain."[^surv_face_qmul] The photos were taken during the spring semesters of 2012 &ndash; 2013 on the West Lawn of the University of Colorado Colorado Springs campus. The photographs were timed to capture students during breaks between their scheduled classes in the morning and afternoon, Monday through Thursday. "For example, a student taking Monday-Wednesday classes at 12:30 PM will show up in the camera on almost every Monday and Wednesday."[^sapkota_boult]
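+
+For the timestamp analysis mentioned above, capture times can be read from image metadata where present. A minimal sketch using Pillow, assuming the distributed UCCS photos retain EXIF data (the file name below is hypothetical):
+
+```
+from PIL import Image, ExifTags
+
+def capture_time(path):
+    # Invert Pillow's {tag_id: name} map, then read the DateTime tag
+    tag_ids = {name: tag for tag, name in ExifTags.TAGS.items()}
+    return Image.open(path).getexif().get(tag_ids["DateTime"])
+
+print(capture_time("uccs_frame_0001.jpg"))  # e.g. '2013:02:18 12:31:04'
+```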
diff --git a/site/content/pages/datasets/vgg_face/assets/background.jpg b/site/content/pages/datasets/vgg_face/assets/background.jpg
deleted file mode 100755
index 6958a2b2..00000000
--- a/site/content/pages/datasets/vgg_face/assets/background.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg b/site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg
deleted file mode 100755
index 3b5a0e40..00000000
--- a/site/content/pages/datasets/vgg_face/assets/ijb_c_montage.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/vgg_face/assets/index.jpg b/site/content/pages/datasets/vgg_face/assets/index.jpg
deleted file mode 100755
index 7268d6ad..00000000
--- a/site/content/pages/datasets/vgg_face/assets/index.jpg
+++ /dev/null
Binary files differ
diff --git a/site/content/pages/datasets/vgg_face/index.md b/site/content/pages/datasets/vgg_face/index.md
deleted file mode 100644
index 2424f1ff..00000000
--- a/site/content/pages/datasets/vgg_face/index.md
+++ /dev/null
@@ -1,30 +0,0 @@
-------------
-
-status: draft
-title: VGG Face
-desc: VGG Face Dataset
-subdesc: VGG Face ...
-slug: vgg_face
-cssclass: dataset
-image: assets/background.jpg
-year: 2016
-published: 2019-4-18
-updated: 2019-4-18
-authors: Adam Harvey
-
-------------
-
-## MegaFace
-
-### sidebar
-### end sidebar
-
-[ page under development ]
-
-{% include 'dashboard.html' %}
-
-{% include 'supplementary_header.html' %}
-
-{% include 'cite_our_work.html' %}
-
-### Footnotes
diff --git a/site/content/pages/datasets/who_goes_there/index.md b/site/content/pages/datasets/who_goes_there/index.md
index feb9896d..c6fe3806 100644
--- a/site/content/pages/datasets/who_goes_there/index.md
+++ b/site/content/pages/datasets/who_goes_there/index.md
@@ -3,7 +3,7 @@
status: draft
title: Who Goes There Dataset
desc: Who Goes There Dataset
-subdesc: Who Goes There (page under development)
+subdesc: Who Goes There
slug: who_goes_there
cssclass: dataset
image: assets/background.jpg
diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md
index 29b278a9..75392dc3 100644
--- a/site/content/pages/research/munich_security_conference/index.md
+++ b/site/content/pages/research/munich_security_conference/index.md
@@ -5,7 +5,8 @@ title: Transnational Flows of Face Recognition Image Training Data
slug: munich-security-conference
desc: Transnational Flows of Face Recognition Image Training Data
subdesc: Where does face data originate and who's using it?
-cssclass: dataset
+caption: An image from the MegaFace face recognition training dataset taken from the U.S. Embassy in Madrid Flickr account
+cssclass: blog
image: assets/background.jpg
published: 2019-6-28
updated: 2019-6-29
@@ -13,6 +14,7 @@ authors: Adam Harvey
------------
+# Transnational Flows of Face Recognition Image Training Data
*A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report*
@@ -23,6 +25,7 @@ authors: Adam Harvey
+ Years: 2006 - 2018
+ Last Updated: July 7, 2019
+ Text and Research: Adam Harvey
++ Published in: <a href="https://tsr.securityconference.de/">Transnational Security Report</a>
### end sidebar
@@ -32,19 +35,13 @@ Our [earlier research](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d
In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets.
-<div style="display:inline;" class="columns columns-1"><div class="column"><div style="background:#202020;border-radius:6px;padding:20px;width:100%">
-<h4>Key Findings</h4>
-
-<ul>
- <li>24 million non-cooperative images were used in facial recognition research projects</li>
- <li>Most data originated from US-based search engines and Flickr, but most research citations found in China</li>
- <li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li>
- <li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li>
-</ul>
-
-</div></div></div>
+### Key Findings
+- 24 million non-cooperative images were used in facial recognition research projects
+- Most data originated from US-based search engines and Flickr, but most research citations were found in China
+- Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)
+- Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)
### 24 Million Photos
@@ -73,7 +70,7 @@ OtherLabel: Other
=== end columns
-![](assets/7118211377.jpg)
+![caption: A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset](assets/7118211377.jpg)
### 8,428 Embassy Photos Found in Facial Recognition Datasets
diff --git a/site/includes/age_gender_disclaimer.html b/site/includes/age_gender_disclaimer.html
new file mode 100644
index 00000000..f8dceb62
--- /dev/null
+++ b/site/includes/age_gender_disclaimer.html
@@ -0,0 +1,3 @@
+<section>
+  <p>Age and gender distribution estimates were calculated by analyzing all faces in the dataset images. This may include additional faces appearing next to an annotated face, or it may skip false faces that were erroneously included in the original dataset. These numbers are provided as estimates, not as a factual representation of the exact gender and age of every face.</p>
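+  <p>A minimal sketch of how such estimates might be computed; the <code>dataset_images</code>, <code>detect_faces</code>, and <code>estimate_age</code> names below are hypothetical stand-ins for whatever face analysis model is actually used:</p>
+  <pre>
+AGE_BINS = [(0, 12), (13, 18), (19, 24), (25, 34), (35, 44),
+            (45, 54), (55, 64), (65, 74), (75, 100)]
+
+def bin_label(age):
+    for lo, hi in AGE_BINS:
+        if lo <= age <= hi:
+            return "{} - {}".format(lo, hi)
+    return "unknown"
+
+counts = {}
+for image in dataset_images:          # hypothetical iterable of images
+    for face in detect_faces(image):  # may include bystander faces
+        label = bin_label(estimate_age(face))
+        counts[label] = counts.get(label, 0) + 1
+  </pre>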
+</section> \ No newline at end of file
diff --git a/site/includes/chart.html b/site/includes/chart.html
deleted file mode 100644
index 01c2e83b..00000000
--- a/site/includes/chart.html
+++ /dev/null
@@ -1,14 +0,0 @@
-<section>
- <h3>Who used {{ metadata.meta.dataset.name_display }}?</h3>
-
- <p>
- This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.
- </p>
-
- </section>
-
-<section class="applet_container">
-<!-- <div style="position: absolute;top: 0px;right: -55px;width: 180px;font-size: 14px;">Labeled Faces in the Wild Dataset<br><span class="numc" style="font-size: 11px;">20 citations</span>
-</div> -->
- <div class="applet" data-payload="{&quot;command&quot;: &quot;chart&quot;}"></div>
-</section>
diff --git a/site/includes/dashboard.html b/site/includes/dashboard.html
index d5e5693d..4c98189d 100644
--- a/site/includes/dashboard.html
+++ b/site/includes/dashboard.html
@@ -19,10 +19,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how {{ metadata.meta.dataset.name_display }} has been used around the world by commercial, military, and academic organizations; existing publicly available research citing {{ metadata.meta.dataset.name_full }} was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+    To help understand how {{ metadata.meta.dataset.name_display }} has been used around the world by commercial, military, and academic organizations, existing publicly available research citing {{ metadata.meta.dataset.name_full }} was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -37,7 +37,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+    <div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/includes/map.html b/site/includes/map.html
deleted file mode 100644
index 372bed8d..00000000
--- a/site/includes/map.html
+++ /dev/null
@@ -1,22 +0,0 @@
-<section>
-
- <h3>Information Supply Chain</h3>
-
- <p>
- To help understand how {{ metadata.meta.dataset.name_display }} has been used around the world by commercial, military, and academic organizations; existing publicly available research citing {{ metadata.meta.dataset.name_full }} was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the location markers to reveal research projects at that location.
- </p>
-
- </section>
-
-<section class="applet_container fullwidth">
- <div class="applet" data-payload="{&quot;command&quot;: &quot;map&quot;}"></div>
-</section>
-
-<div class="caption">
- <ul class="map-legend">
- <li class="edu">Academic</li>
- <li class="com">Commercial</li>
- <li class="gov">Military / Government</li>
- </ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> and then dataset usage verified and geolocated.</div >
-</div> \ No newline at end of file
diff --git a/site/public/about/index.html b/site/public/about/index.html
index ce2b6228..e5a120d1 100644
--- a/site/public/about/index.html
+++ b/site/public/about/index.html
@@ -63,26 +63,18 @@
<li><a href="/about/attribution/">Attribution</a></li>
<li><a href="/about/legal/">Legal / Privacy</a></li>
</ul>
-</section><p>MegaPixels is an independent art and research project by Adam Harvey and Jules LaPlace that investigates the ethics, origins, and individual privacy implications of face recognition image datasets and their role in the expansion of biometric surveillance technologies.</p>
+</section><p>MegaPixels is an independent art and research project by <a href="https://ahprojects.com">Adam Harvey</a> and <a href="https://asdf.us">Jules LaPlace</a> that investigates the ethics, origins, and individual privacy implications of face recognition image datasets and their role in the expansion of biometric surveillance technologies.</p>
<p>MegaPixels is made possible with support from <a href="http://mozilla.org">Mozilla</a></p>
-<div class="flex-container team-photos-container">
- <div class="team-member">
- <h3>Adam Harvey</h3>
- <p>is Berlin-based American artist and researcher. His previous projects (<a href="https://cvdazzle.com">CV Dazzle</a>, <a href="https://ahprojects.com/stealth-wear">Stealth Wear</a>, and <a href="https://github.com/adamhrv/skylift">SkyLift</a>) explore the potential for counter-surveillance as artwork. He is the founder of <a href="https://vframe.io">VFRAME</a> (visual forensics software for human rights groups) and is a currently researcher in residence at Karlsruhe HfG.</p>
- <p><a href="https://ahprojects.com">ahprojects.com</a></p>
- </p>
- </div>
- <div class="team-member">
- <h3>Jules LaPlace</h3>
- <p>is an American technologist and artist also based in Berlin. He was previously the CTO of a digital agency in NYC and now also works at VFRAME, developing computer vision and data analysis software for human rights groups. Jules also builds experimental software for artists and musicians.
- </p>
- <p><a href="https://asdf.us/">asdf.us</a></p>
- </div>
-</div><p>MegaPixels is an art and research project first launched in 2017 for an <a href="https://ahprojects.com/megapixels-glassroom/">installation</a> at Tactical Technology Collective's <a href="https://tacticaltech.org/pages/glass-room-london-press/">GlassRoom</a> about face recognition datasets. In 2018 MegaPixels was extended to cover pedestrian analysis datasets for a <a href="https://esc.mur.at/de/node/2370">commission by Elevate Arts festival</a> in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets, the first of which launched on this site in April 2019.</p>
+<p>MegaPixels is an art and research project first launched in 2017 for an <a href="https://ahprojects.com/megapixels-glassroom/">installation</a> at Tactical Technology Collective's <a href="https://tacticaltech.org/pages/glass-room-london-press/">GlassRoom</a> about face recognition datasets. In 2018 MegaPixels was extended to cover pedestrian analysis datasets for a <a href="https://esc.mur.at/de/node/2370">commission by Elevate Arts festival</a> in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets, the first of which launched on this site in April 2019.</p>
<p>MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the same technology companies who created many of the datasets presented on this site.</p>
<p>MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and citations. MegaPixels is a website-first research project, with an academic publication to follow in fall 2019.</p>
<p>A dataset of verified geocoded citations and dataset statistics will be published in Fall 2019 along with a research paper as part of a research fellowship for <a href="http://kim.hfg-karlsruhe.de/">KIM (Critical Artificial Intelligence) Karlsruhe HfG</a>.</p>
-<h3>Selected News and Exhibitions</h3>
+<h4>Team</h4>
+<ul>
+<li><a href="https://ahprojects.com">Adam Harvey</a>: Concept, research and analysis, design, computer vision</li>
+<li><a href="https://asdf.us">Jules LaPlace</a>: Information and systems architecture, data management, citation geocoding, web applications</li>
+</ul>
+<h3>News and Publications</h3>
<ul>
<li>July 2019: New York Times writes about MegaPixels and how "<a href="https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html">Facial Recognition Tech Is Growing Stronger, Thanks to Your Face</a>" </li>
<li>June 2019 - 2020: MegaPixels installation at Ars Electronica Center (AT) exhibition <a href="https://ars.electronica.art/center/en/megapixels">"Compass - Navigating the Future"</a> </li>
@@ -90,18 +82,13 @@
<li>June 26, 2019: The Atlantic writes about image training datasets "in the wild" and research ethics: <a href="https://www.theatlantic.com/technology/archive/2019/06/universities-record-students-campuses-research/592537/">Universities Record Students on Campuses for Research</a> by Sidney Fussell</li>
</ul>
<p>Read more <a href="/about/news">news</a></p>
-</section><section><div class='columns columns-3'><div class='column'><h5>Team</h5>
-<ul>
-<li>Adam Harvey: Concept, research and analysis, design, computer vision</li>
-<li>Jules LaPlace: Information and systems architecture, data management, web applications</li>
-</ul>
-</div><div class='column'><h5>Contributing Researchers</h5>
+<h4>Contributing Researchers</h4>
<ul>
<li>Beth (aka Ms. Celeb)</li>
<li>Berit Gilma</li>
<li>Mathana Stender</li>
</ul>
-</div><div class='column'><h5>Code and Libraries</h5>
+<h4>Code and Libraries</h4>
<ul>
<li><a href="https://semanticscholar.org">Semantic Scholar</a> for citation aggregation</li>
<li>Leaflet.js for maps</li>
@@ -109,7 +96,7 @@
<li>ThreeJS for 3D visualizations</li>
<li>PDFMiner.Six and Pandas for research paper analysis</li>
</ul>
-</div></div></section><section><h5>Attribution</h5>
+<h4>Attribution</h4>
<p>If you use MegaPixels or any data derived from it for your work, please cite our original work as follows:</p>
<pre>
@online{megapixels,
@@ -119,9 +106,7 @@
url = {https://megapixels.cc/},
urldate = {2019-04-18}
}
-</pre><h5>Contact</h5>
-<p>Please direct questions, comments, or feedback to <a href="https://mastodon.social/@adamhrv">mastodon.social/@adamhrv</a> or contact via <a href="https://ahprojects.com/about">https://ahprojects.com/about</a></p>
-</section>
+</pre></section>
</div>
<footer>
diff --git a/site/public/about/legal/index.html b/site/public/about/legal/index.html
index 8beafeea..0beebd43 100644
--- a/site/public/about/legal/index.html
+++ b/site/public/about/legal/index.html
@@ -65,9 +65,9 @@
</ul>
</section><p>MegaPixels.cc Terms and Privacy</p>
<p>MegaPixels is an independent and academic art and research project about the origins and ethics of publicly available face analysis image datasets. By accessing MegaPixels (the <em>Service</em> or <em>Services</em>) you agree to the terms and conditions set forth below.</p>
-<h2>Privacy</h2>
+<h3>Privacy</h3>
<p>The MegaPixels site has been designed to minimize the amount of network requests to 3rd party services and therefore prioritize the privacy of the viewer. This site does not use any local or external analytics programs to monitor site viewers. In fact, the only data collected are the necessary server logs used only for preventing misuse, which are deleted at short-term intervals.</p>
-<h2>3rd Party Services</h2>
+<h3>3rd Party Services</h3>
<p>In order to provide certain features of the site, some 3rd party services are needed. Currently, the MegaPixels.cc site uses two 3rd party services: (1) Leaflet.js for the interactive map and (2) Digital Ocean Spaces as a content delivery network. Both services encrypt your requests to their server using HTTPS and neither service requires storing any cookies or authentication. However, both services will store files in your web browser's local cache (local storage) to improve loading performance. None of these local storage files are used for analytics, tracking, or any similar purpose.</p>
<h3>Links To Other Web Sites</h3>
<p>The MegaPixels.cc site contains many links to 3rd party websites, especially in the list of citations that are provided for each dataset. This website has no control over and assumes no responsibility for the content, privacy policies, or practices of any third party web sites or services. You acknowledge and agree that megapixels.cc (and its creators) shall not be responsible or liable, directly or indirectly, for any damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content, goods or services available on or through any such web sites or services.</p>
diff --git a/site/public/datasets/adience/index.html b/site/public/datasets/adience/index.html
index b2aa2733..9f621441 100644
--- a/site/public/datasets/adience/index.html
+++ b/site/public/datasets/adience/index.html
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/adience/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Adience ...</span></div><div class='hero_subdesc'><span class='bgpad'>Adience ...
-</span></div></div></section><section><h2>Adience</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/adience/assets/background.jpg)'></section><section><h2>Adience</h2>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2014</div>
@@ -97,10 +96,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how Adience Benchmark Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Adience Benchmark was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+      To help understand how Adience Benchmark Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Adience Benchmark was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -115,7 +114,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+      <div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/brainwash/index.html b/site/public/datasets/brainwash/index.html
index 18600b6f..31390edf 100644
--- a/site/public/datasets/brainwash/index.html
+++ b/site/public/datasets/brainwash/index.html
@@ -55,8 +55,8 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco</span></div><div class='hero_subdesc'><span class='bgpad'>It includes 11,917 images of "everyday life of a busy downtown cafe" and is used for training face and head detection algorithms
-</span></div></div></section><section><h2>Brainwash Dataset</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>One of the 11,917 images in the Brainwash dataset captured from the Brainwash Cafe in San Francisco</div></div></section><section><h1>Brainwash Dataset</h1>
+<p>Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."</p>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2015</div>
@@ -78,7 +78,7 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='https://purl.stanford.edu/sx925dc9385' target='_blank' rel='nofollow noopener'>stanford.edu</a></div>
- </div></div><p>Brainwash is a dataset of livecam images taken from San Francisco's Brainwash Cafe. It includes 11,917 images of "everyday life of a busy downtown cafe"<a class="footnote_shim" name="[^readme]_1"> </a><a href="#[^readme]" class="footnote" title="Footnote 1">1</a> captured at 100 second intervals throughout the day. The Brainwash dataset includes 3 full days of webcam images taken on October 27, November 13, and November 24 in 2014. According the author's <a href="https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666">research paper</a> introducing the dataset, the images were acquired with the help of Angelcam.com. <a class="footnote_shim" name="[^end_to_end]_1"> </a><a href="#[^end_to_end]" class="footnote" title="Footnote 2">2</a></p>
+ </div><div class='meta'><div class='gray'>Press coverage</div><div><a href="https://www.nytimes.com/2019/07/13/technology/">New York Times</a>, <a href="https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html">De Tijd</a></div></div></div><p>Brainwash is a dataset of livecam images taken from San Francisco's Brainwash Cafe. It includes 11,917 images of "everyday life of a busy downtown cafe"<a class="footnote_shim" name="[^readme]_1"> </a><a href="#[^readme]" class="footnote" title="Footnote 1">1</a> captured at 100 second intervals throughout the day. The Brainwash dataset includes 3 full days of webcam images taken on October 27, November 13, and November 24 in 2014. According to the author's <a href="https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666">research paper</a> introducing the dataset, the images were acquired with the help of Angelcam.com. <a class="footnote_shim" name="[^end_to_end]_1"> </a><a href="#[^end_to_end]" class="footnote" title="Footnote 2">2</a></p>
<p>The Brainwash dataset is unique because it uses images from a publicly available webcam that records people inside a privately owned business without their consent. No ordinary cafe customer could ever suspect that their image would end up in a dataset used for surveillance research and development, but that is exactly what happened to customers at Brainwash Cafe in San Francisco.</p>
<p>Although Brainwash appears to be a less popular dataset, it was notably used in 2016 and 2017 by researchers affiliated with the National University of Defense Technology in China for two <a href="https://www.semanticscholar.org/paper/Localized-region-context-and-object-feature-fusion-Li-Dou/b02d31c640b0a31fb18c4f170d841d8e21ffb66c">research</a> <a href="https://www.semanticscholar.org/paper/A-Replacement-Algorithm-of-Non-Maximum-Suppression-Zhao-Wang/591a4bfa6380c9fcd5f3ae690e3ac5c09b7bf37b">projects</a> on advancing the capabilities of object detection to more accurately isolate the target region in an image. <a class="footnote_shim" name="[^localized_region_context]_1"> </a><a href="#[^localized_region_context]" class="footnote" title="Footnote 3">3</a> <a class="footnote_shim" name="[^replacement_algorithm]_1"> </a><a href="#[^replacement_algorithm]" class="footnote" title="Footnote 4">4</a> The <a href="https://en.wikipedia.org/wiki/National_University_of_Defense_Technology">National University of Defense Technology</a> is controlled by China's top military body, the Central Military Commission.</p>
<p>The Brainwash dataset also appears in a 2018 research paper affiliated with Megvii (Face++) that used images from Brainwash cafe "to validate the generalization ability of [their] CrowdHuman dataset for head detection."<a class="footnote_shim" name="[^crowdhuman]_1"> </a><a href="#[^crowdhuman]" class="footnote" title="Footnote 5">5</a> Megvii is the parent company of Face++, which has provided surveillance technology to <a href="https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html">monitor Uighur Muslims</a> in Xinjiang and may be <a href="https://www.bloomberg.com/news/articles/2019-05-22/trump-weighs-blacklisting-two-chinese-surveillance-companies">blacklisted</a> in the United States.</p>
@@ -106,10 +106,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how Brainwash Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Brainwash Dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+      To help understand how Brainwash Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Brainwash Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -124,7 +124,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+      <div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
@@ -145,7 +145,12 @@
<h2>Supplementary Information</h2>
-</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_grid.jpg' alt=' Nine of 11,917 images from the the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Russel et. al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)'><div class='caption'> Nine of 11,917 images from the the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Russel et. al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)</div></div></section><section>
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_grid.jpg' alt=' Nine of 11,917 images from the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Russell et al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)'><div class='caption'> Nine of 11,917 images from the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Russell et al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)</div></div></section><section><h3>Press Coverage</h3>
+<ul>
+<li>New York Times: <a href="https://www.nytimes.com/2019/07/13/technology/">Facial Recognition Tech Is Growing Stronger, Thanks to Your Face</a></li>
+<li>De Tijd: <a href="https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html">Brainwash</a></li>
+</ul>
+</section><section>
<h4>Cite Our Work</h4>
<p>
diff --git a/site/public/datasets/duke_mtmc/index.html b/site/public/datasets/duke_mtmc/index.html
index fc141450..e86afe63 100644
--- a/site/public/datasets/duke_mtmc/index.html
+++ b/site/public/datasets/duke_mtmc/index.html
@@ -55,8 +55,8 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/duke_mtmc/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'><span class="dataset-name">Duke MTMC</span> is a dataset of surveillance camera footage of students on Duke University campus</span></div><div class='hero_subdesc'><span class='bgpad'>Duke MTMC contains over 2 million video frames and 2,700 unique identities collected from 8 HD cameras at Duke University campus in March 2014
-</span></div></div></section><section><h2>Duke MTMC</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/duke_mtmc/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>A still frame from the Duke MTMC (Multi-Target-Multi-Camera) CCTV dataset captured on Duke University campus in 2014. The dataset has now been terminated by the author in response to this report.</div></div></section><section><h1>Duke MTMC</h1>
+<p>Update: In response to this report and an <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">investigation</a> by the Financial Times, Duke University has terminated the Duke MTMC dataset.</p>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2016</div>
@@ -75,7 +75,8 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://vision.cs.duke.edu/DukeMTMC/' target='_blank' rel='nofollow noopener'>duke.edu</a></div>
- </div></div><p>Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition. The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically setup to capture students "during periods between lectures, when pedestrian traffic is heavy".<a class="footnote_shim" name="[^duke_mtmc_orig]_1"> </a><a href="#[^duke_mtmc_orig]" class="footnote" title="Footnote 1">1</a></p>
+ </div></div><p>Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition.</p>
+<p>The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically set up to capture students "during periods between lectures, when pedestrian traffic is heavy".<a class="footnote_shim" name="[^duke_mtmc_orig]_1"> </a><a href="#[^duke_mtmc_orig]" class="footnote" title="Footnote 1">1</a></p>
<p>For this analysis of the Duke MTMC dataset, over 100 publicly available research papers that used the dataset were analyzed to find out who's using the dataset and where it's being used. The results show that the Duke MTMC dataset has spread far beyond its origins and intentions in academic research projects at Duke University. Since its publication in 2016, more than twice as many research citations originated in China as in the United States. Among these citations were papers with links to the Chinese military and several of the companies known to provide Chinese authorities with the oppressive surveillance technology used to monitor millions of Uighur Muslims.</p>
<p>In one 2018 <a href="http://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_Attention-Aware_Compositional_Network_CVPR_2018_paper.pdf">paper</a> jointly published by researchers from SenseNets and SenseTime (and funded by SenseTime Group Limited) entitled <a href="https://www.semanticscholar.org/paper/Attention-Aware-Compositional-Network-for-Person-Xu-Zhao/14ce502bc19b225466126b256511f9c05cadcb6e">Attention-Aware Compositional Network for Person Re-identification</a>, the Duke MTMC dataset was used for "extensive experiments" on improving person re-identification across multiple surveillance cameras with important applications in suspect tracking. Both SenseNets and SenseTime have been linked to providing surveillance technology to monitor Uighur Muslims in China. <a class="footnote_shim" name="[^xinjiang_nyt]_1"> </a><a href="#[^xinjiang_nyt]" class="footnote" title="Footnote 4">4</a><a class="footnote_shim" name="[^sensetime_qz]_1"> </a><a href="#[^sensetime_qz]" class="footnote" title="Footnote 2">2</a><a class="footnote_shim" name="[^sensenets_uyghurs]_1"> </a><a href="#[^sensenets_uyghurs]" class="footnote" title="Footnote 3">3</a></p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/duke_mtmc/assets/duke_mtmc_reid_montage.jpg' alt=' A collection of 1,600 out of the approximately 2,000 students and pedestrians in the Duke MTMC dataset. These students were also included in the Duke MTMC Re-ID dataset extension used for person re-identification, and eventually the QMUL SurvFace face recognition dataset. Open Data Commons Attribution License.'><div class='caption'> A collection of 1,600 out of the approximately 2,000 students and pedestrians in the Duke MTMC dataset. These students were also included in the Duke MTMC Re-ID dataset extension used for person re-identification, and eventually the QMUL SurvFace face recognition dataset. Open Data Commons Attribution License.</div></div></section><section><p>Despite <a href="https://www.hrw.org/news/2017/11/19/china-police-big-data-systems-violate-privacy-target-dissent">repeated</a> <a href="https://www.hrw.org/news/2018/02/26/china-big-data-fuels-crackdown-minority-region">warnings</a> by Human Rights Watch that the authoritarian surveillance used in China represents a humanitarian crisis, researchers at Duke University continued to provide open access to their dataset for anyone to use for any project. As the surveillance crisis in China grew, so did the number of citations with links to organizations complicit in the crisis. In 2018 alone there were over 90 research projects happening in China that publicly acknowledged using the Duke MTMC dataset. Amongst these were projects from CloudWalk, Hikvision, Megvii (Face++), SenseNets, SenseTime, Beihang University, China's National University of Defense Technology, and the PLA's Army Engineering University.</p>
@@ -268,10 +269,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how Duke MTMC Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Duke Multi-Target, Multi-Camera Tracking Project was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+      To help understand how Duke MTMC Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Duke Multi-Target, Multi-Camera Tracking Project was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -286,7 +287,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+      <div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/helen/index.html b/site/public/datasets/helen/index.html
index 44ef462e..08791d29 100644
--- a/site/public/datasets/helen/index.html
+++ b/site/public/datasets/helen/index.html
@@ -4,7 +4,7 @@
<title>MegaPixels: HELEN</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey" />
- <meta name="description" content="HELEN Face Dataset" />
+ <meta name="description" content="HELEN is a dataset of face images from Flickr used for training facial component localization algorithms" />
<meta property="og:title" content="MegaPixels: HELEN"/>
<meta property="og:type" content="website"/>
    <meta property="og:summary" content="MegaPixels is an art and research project about face recognition datasets created \"in the wild\""/>
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>HELEN Face Dataset</span></div><div class='hero_subdesc'><span class='bgpad'>HELEN (under development)
-</span></div></div></section><section><h2>HELEN</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>Example images from the HELEN dataset</div></div></section><section><h1>HELEN Dataset</h1>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2012</div>
@@ -69,8 +68,74 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://www.ifp.illinois.edu/~vuongle2/helen/' target='_blank' rel='nofollow noopener'>illinois.edu</a></div>
- </div></div><p>[ page under development ]</p>
-</section><section>
+ </div></div><p>HELEN is a dataset of annotated face images used for facial component localization. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio".<a class="footnote_shim" name="[^orig_paper]_1"> </a><a href="#[^orig_paper]" class="footnote" title="Footnote 1">1</a></p>
+<p>The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis."<a class="footnote_shim" name="[^orig_paper]_2"> </a><a href="#[^orig_paper]" class="footnote" title="Footnote 1">1</a></p>
+<p>Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential parts of most facial recognition processing systems. Facial landmarks are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to match a templated pose.</p>
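+<p>For illustration, the alignment step described above can be sketched with OpenCV: rotate the image so the line between the two eye centers is horizontal. This is a minimal sketch, not any particular system's implementation; the eye coordinates would come from a landmark detector:</p>
+<pre>
+import cv2
+import numpy as np
+
+def align_face(image, left_eye, right_eye):
+    # Angle of the line between the eye centers, in degrees
+    dx = right_eye[0] - left_eye[0]
+    dy = right_eye[1] - left_eye[1]
+    angle = np.degrees(np.arctan2(dy, dx))
+    # Rotate about the midpoint between the eyes to level the eye line
+    center = ((left_eye[0] + right_eye[0]) / 2.0,
+              (left_eye[1] + right_eye[1]) / 2.0)
+    M = cv2.getRotationMatrix2D(center, angle, 1.0)
+    return cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
+</pre>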
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/montage_lms_21_14_14_14_26.png' alt=' An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic &copy; 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.'><div class='caption'> An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic &copy; 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.</div></div></section><section><p>This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition with the vast majority of research taking place in China.</p>
+<p>Commercial use includes IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook. Military and defense usage includes China's National University of Defense Technology (NUDT).</p>
+<p>The dataset was originally presented at <a href="http://eccv2012.unifi.it/">ECCV 2012</a>.</p>
+<table>
+<thead><tr>
+<th>Organization</th>
+<th>Paper</th>
+<th>Year</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>SenseTime, Amazon</td>
+<td><a href="https://arxiv.org/pdf/1805.10483.pdf">Look at Boundary: A Boundary-Aware Face Alignment Algorithm</a></td>
+<td>2018</td>
+</tr>
+<tr>
+<td>SenseTime</td>
+<td><a href="https://arxiv.org/pdf/1807.11079.pdf">ReenactGAN: Learning to Reenact Faces via Boundary Transfer</a></td>
+<td>2018</td>
+</tr>
+</tbody>
+</table>
+<p>The dataset was used for training the OpenFace software: "we used the HELEN and LFPW training subsets for training and the rest for testing" (<a href="https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets">https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets</a>).</p>
+<p>The popular dlib facial landmark detector was trained using HELEN (see the sketch below).</p>
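+<p>As a concrete example of such a pipeline, the 68-point landmark model commonly distributed with dlib's examples can be used as follows (a minimal sketch; the input file name is hypothetical):</p>
+<pre>
+import dlib
+
+detector = dlib.get_frontal_face_detector()
+# Standard 68-point model file from dlib's example downloads
+predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
+
+img = dlib.load_rgb_image("portrait.jpg")
+for det in detector(img, 1):
+    shape = predictor(img, det)
+    points = [(shape.part(i).x, shape.part(i).y)
+              for i in range(shape.num_parts)]
+</pre>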
+<p>In addition to the 200+ verified citations, the HELEN dataset was also used in the following projects:</p>
+<ul>
+<li><a href="https://github.com/memoiry/face-alignment">https://github.com/memoiry/face-alignment</a></li>
+<li><a href="http://www.dsp.toronto.edu/projects/face_analysis/">http://www.dsp.toronto.edu/projects/face_analysis/</a></li>
+</ul>
+<p>It has also been converted into new datasets, including:</p>
+<ul>
+<li><a href="https://github.com/JPlin/Relabeled-HELEN-Dataset">https://github.com/JPlin/Relabeled-HELEN-Dataset</a></li>
+<li><a href="https://www.kaggle.com/kmader/helen-eye-dataset">https://www.kaggle.com/kmader/helen-eye-dataset</a></li>
+</ul>
+<p>The original project site:</p>
+<ul>
+<li><a href="http://www.ifp.illinois.edu/~vuongle2/helen/">http://www.ifp.illinois.edu/~vuongle2/helen/</a></li>
+</ul>
+<h3>Example Images</h3>
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_outdoor_02.jpg' alt='An image from the HELEN dataset "outdoor" category used for training face recognition (2839127417_1.jpg)'><div class='caption'>An image from the HELEN dataset "outdoor" category used for training face recognition (2839127417_1.jpg)</div></div>
+<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_graduation.jpg' alt='An image from the HELEN dataset used for training face recognition (2325274893_1)'><div class='caption'>An image from the HELEN dataset used for training face recognition (2325274893_1)</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_wedding.jpg' alt='An image from the HELEN dataset "wedding" category used for training face recognition (2325274893_1)'><div class='caption'>An image from the HELEN dataset "wedding" category used for training face recognition (2325274893_1)</div></div>
+<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_wedding_02.jpg' alt='An image from the HELEN dataset "wedding" category used for training face recognition (2325274893_1)'><div class='caption'>An image from the HELEN dataset "wedding" category used for training face recognition (2325274893_1)</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_family.jpg' alt='Original Flickr image used in the HELEN facial analysis and recognition dataset for the keyword "family" (296814969)'><div class='caption'>Original Flickr image used in the HELEN facial analysis and recognition dataset for the keyword "family" (296814969)</div></div>
+<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/feature_family_05.jpg' alt='Original Flickr image used in the HELEN facial analysis and recognition dataset for the keyword "family" (296814969)'><div class='caption'>Original Flickr image used in the HELEN facial analysis and recognition dataset for the keyword "family" (296814969)</div></div></section><section>
<h3>Who used Helen Dataset?</h3>
<p>
@@ -91,10 +156,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how Helen Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Helen Dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+      To help understand how Helen Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Helen Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -109,7 +174,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
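+<p>For readers curious about this pipeline, the sketch below shows one plausible way to list citing papers via the public Semantic Scholar paper API; the <code>geocode_affiliation</code> step is a hypothetical placeholder, since dataset usage and locations were verified manually.</p>
+<pre>
+# Plausible sketch: list papers citing a dataset's original paper via
+# the Semantic Scholar paper API, then attach a location to each
+# citation. geocode_affiliation() is hypothetical; in practice each
+# usage was verified and geolocated by hand.
+import requests
+
+API = "https://api.semanticscholar.org/v1/paper/"
+
+def citing_papers(paper_id):
+    data = requests.get(API + paper_id, timeout=30).json()
+    for cite in data.get("citations", []):
+        yield {"title": cite.get("title"),
+               "year": cite.get("year"),
+               "paperId": cite.get("paperId")}
+
+def located_citations(paper_id):
+    for c in citing_papers(paper_id):
+        c["location"] = geocode_affiliation(c)  # placeholder
+        yield c
+</pre>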
@@ -130,7 +195,10 @@
<h2>Supplementary Information</h2>
+</section><section><h3>Age and Gender Distribution</h3>
</section><section>
+	<p>Age and gender distributions were estimated by analyzing all faces in the dataset images. This may include additional faces appearing next to an annotated face, or may skip false-positive faces that were erroneously included in the original dataset. These numbers are provided as estimates, not as a factual representation of the exact gender and age of all faces.</p>
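+<p>As an illustration of how these estimates could be computed, the sketch below aggregates per-face predictions into the CSV files that drive the charts that follow. Note that <code>detect_faces</code> and <code>estimate_age_gender</code> are hypothetical placeholders, not any specific library's API; only the aggregation step is shown.</p>
+<pre>
+# Minimal sketch: aggregate per-face age/gender estimates into CSVs.
+# detect_faces() and estimate_age_gender() are hypothetical stand-ins
+# for whichever face analysis model is actually used.
+import csv
+from collections import Counter
+from pathlib import Path
+
+def age_bucket(age, width=10):
+    lo = int(age) // width * width
+    return "{}-{}".format(lo, lo + width - 1)
+
+def build_distributions(image_dir, out_dir):
+    ages, genders = Counter(), Counter()
+    for path in Path(image_dir).glob("*.jpg"):
+        for face in detect_faces(path):              # placeholder
+            age, gender = estimate_age_gender(face)  # placeholder
+            ages[age_bucket(age)] += 1
+            genders[gender] += 1
+    for name, counter in (("age.csv", ages), ("gender.csv", genders)):
+        with open(Path(out_dir) / name, "w", newline="") as f:
+            csv.writer(f).writerows(counter.most_common())
+</pre>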
+</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/helen/assets/age.csv", "fields": ["Caption: HELEN dataset age distribution", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/helen/assets/gender.csv", "fields": ["Caption: HELEN dataset gender distribution", "Top: 10", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/montage_lms_21_15_15_7_26_0.png' alt=' Visualization of the HELEN dataset 194-point facial landmark annotations. Credit: graphic &copy; MegaPixels.cc 2019, data from HELEN dataset by Zhou, Brand, Lin 2013. If you use this image please credit both the graphic and data source.'><div class='caption'> Visualization of the HELEN dataset 194-point facial landmark annotations. Credit: graphic &copy; MegaPixels.cc 2019, data from HELEN dataset by Zhou, Brand, Lin 2013. If you use this image please credit both the graphic and data source.</div></div></section><section>
<h4>Cite Our Work</h4>
<p>
@@ -147,7 +215,17 @@
}</pre>
</p>
-</section>
+</section><section><h4>Cite the Original Authors' Work</h4>
+<p>If you find the HELEN dataset useful or reference it in your work, please cite the authors' original work as:</p>
+<pre>
+@inproceedings{Le2012InteractiveFF,
+ title={Interactive Facial Feature Localization},
+ author={Vuong Le and Jonathan Brandt and Zhe L. Lin and Lubomir D. Bourdev and Thomas S. Huang},
+ booktitle={ECCV},
+ year={2012}
+}
+</pre></section><section><h3>References</h3><section><ul class="footnotes"><li>1 <a name="[^orig_paper]" class="footnote_shim"></a><span class="backlinks"><a href="#[^orig_paper]_1">a</a><a href="#[^orig_paper]_2">b</a></span>Le, Vuong et al. “Interactive Facial Feature Localization.” ECCV (2012).
+</li></ul></section></section>
</div>
<footer>
diff --git a/site/public/datasets/ibm_dif/index.html b/site/public/datasets/ibm_dif/index.html
index be5dbfe4..924194a7 100644
--- a/site/public/datasets/ibm_dif/index.html
+++ b/site/public/datasets/ibm_dif/index.html
@@ -1,11 +1,11 @@
<!doctype html>
<html>
<head>
- <title>MegaPixels: MegaFace</title>
+ <title>MegaPixels: IBM DiF</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey" />
- <meta name="description" content="MegaFace Dataset" />
- <meta property="og:title" content="MegaPixels: MegaFace"/>
+ <meta name="description" content="Diversity in Faces Dataset" />
+ <meta property="og:title" content="MegaPixels: IBM DiF"/>
<meta property="og:type" content="website"/>
<meta property="og:summary" content="MegaPixels is an art and research project about face recognition datasets created \"in the wild\"/>
<meta property="og:image" content="https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ibm_dif/assets/background.jpg" />
@@ -45,7 +45,7 @@
<a class='slogan' href="/">
<div class='logo'></div>
<div class='site_name'>MegaPixels</div>
- <div class='page_name'>MegaFace Dataset</div>
+ <div class='page_name'>IBM Diversity in Faces</div>
</a>
<div class='links'>
<a href="/datasets/">Datasets</a>
@@ -55,26 +55,19 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ibm_dif/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>MegaFace Dataset</span></div><div class='hero_subdesc'><span class='bgpad'>MegaFace contains 670K identities and 4.7M images
-</span></div></div></section><section><h2>MegaFace</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ibm_dif/assets/background.jpg)'></section><section><h2>IBM Diversity in Faces</h2>
</section><section><div class='right-sidebar'><div class='meta'>
- <div class='gray'>Published</div>
- <div>2016</div>
- </div><div class='meta'>
<div class='gray'>Images</div>
- <div>4,753,520 </div>
- </div><div class='meta'>
- <div class='gray'>Identities</div>
- <div>672,057 </div>
+ <div>1,070,000 </div>
</div><div class='meta'>
<div class='gray'>Purpose</div>
- <div>face recognition</div>
+      <div>Face recognition and craniofacial analysis</div>
</div><div class='meta'>
<div class='gray'>Website</div>
- <div><a href='http://megaface.cs.washington.edu/' target='_blank' rel='nofollow noopener'>washington.edu</a></div>
+ <div><a href='https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/' target='_blank' rel='nofollow noopener'>ibm.com</a></div>
</div></div><p>[ page under development ]</p>
</section><section>
- <h3>Who used MegaFace Dataset?</h3>
+ <h3>Who used IBM Diversity in Faces?</h3>
<p>
This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.
@@ -94,10 +87,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how MegaFace Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing MegaFace Dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how IBM Diversity in Faces has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the Diversity in Faces dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -112,7 +105,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/ijb_c/index.html b/site/public/datasets/ijb_c/index.html
index abe7d5ed..05826c3f 100644
--- a/site/public/datasets/ijb_c/index.html
+++ b/site/public/datasets/ijb_c/index.html
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>IARPA Janus Benchmark C is a dataset of web images used</span></div><div class='hero_subdesc'><span class='bgpad'>The IJB-C dataset contains 21,294 images and 11,779 videos of 3,531 identities
-</span></div></div></section><section><h2>IARPA Janus Benchmark C (IJB-C)</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/background.jpg)'></section><section><h2>IARPA Janus Benchmark C (IJB-C)</h2>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2017</div>
@@ -147,10 +146,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how IJB-C has been used around the world by commercial, military, and academic organizations; existing publicly available research citing IARPA Janus Benchmark C was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how IJB-C has been used around the world by commercial, military, and academic organizations, existing publicly available research citing IARPA Janus Benchmark C was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -165,7 +164,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/index.html b/site/public/datasets/index.html
index d38feb2e..a354a2d5 100644
--- a/site/public/datasets/index.html
+++ b/site/public/datasets/index.html
@@ -53,13 +53,13 @@
<a href="/research">Research</a>
</div>
</header>
- <div class="content content-">
+ <div class="content content-dataset-list">
<div class='dataset-heading'>
<section><h1>Dataset Analyses</h1>
<p>Explore face and person recognition datasets contributing to the growing crisis of biometric surveillance technologies. This first group of 5 datasets focuses on image usage connected to foreign surveillance and defense organizations.</p>
-<p>In response to the analyses below, the <a href="https://purl.stanford.edu/sx925dc9385">Brainwash</a>, <a href="http://vision.cs.duke.edu/DukeMTMC/">Duke MTMC</a>, and <a href="http://msceleb.org/">MS Celeb</a> datasets have been taken down by their authors. The <a href="https://vast.uccs.edu/Opensetface/">UCCS</a> dataset was temporarily deactivated due to metadata exposure. Read more <a href="/about/news">news</a>. A more complete list of datasets and research will be published in September 2019. These 5 are only a preview.</p>
+<p>In response to the analyses below, the <a href="/datasets/brainwash">Brainwash</a>, <a href="/datasets/duke_mtmc">Duke MTMC</a>, and <a href="/datasets/msceleb/">MS Celeb</a> datasets have been taken down by their authors. The <a href="/datasets/uccs/">UCCS</a> dataset was temporarily deactivated due to metadata exposure. Read more <a href="/about/news">news</a>. A more complete list of datasets and research will be published in September 2019. These 5 are only a preview.</p>
</section>
</div>
@@ -97,6 +97,34 @@
</div>
</a>
+ <a href="/datasets/helen/">
+ <div class="dataset-image" style="background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/helen/assets/index.jpg)"></div>
+ <div class="dataset">
+ <span class='title'>HELEN</span>
+ <div class='fields'>
+ <div class='year visible'><span>2012</span></div>
+ <div class='purpose'><span>facial feature localization algorithm</span></div>
+
+ <div class='images'><span>2,330 images</span></div>
+
+ </div>
+ </div>
+ </a>
+
+ <a href="/datasets/megaface/">
+ <div class="dataset-image" style="background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/megaface/assets/index.jpg)"></div>
+ <div class="dataset">
+ <span class='title'>MegaFace</span>
+ <div class='fields'>
+ <div class='year visible'><span>2016</span></div>
+ <div class='purpose'><span>face recognition</span></div>
+
+ <div class='images'><span>4,753,520 images</span></div>
+
+ </div>
+ </div>
+ </a>
+
<a href="/datasets/msceleb/">
<div class="dataset-image" style="background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/index.jpg)"></div>
<div class="dataset">
diff --git a/site/public/datasets/lfpw/index.html b/site/public/datasets/lfpw/index.html
index f2ddc636..cc2a2c3f 100644
--- a/site/public/datasets/lfpw/index.html
+++ b/site/public/datasets/lfpw/index.html
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfpw/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Labeled Face Parts in the Wild Dataset</span></div><div class='hero_subdesc'><span class='bgpad'>Labeled Face Parts in the Wild ...
-</span></div></div></section><section><h2>Labeled Face Parts in the Wild</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfpw/assets/background.jpg)'></section><section><h2>Labeled Face Parts in the Wild</h2>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2011</div>
@@ -69,7 +68,13 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://neerajkumar.org/databases/lfpw/' target='_blank' rel='nofollow noopener'>neerajkumar.org</a></div>
- </div></div><p>[ page under development ]</p>
+ </div></div><p>RESEARCH below this line</p>
+<blockquote><p>Release 1 of LFPW consists of 1,432 faces from images downloaded from the web using simple text queries on sites such as google.com, flickr.com, and yahoo.com. Each image was labeled by three MTurk workers, and 29 fiducial points, shown below, are included in dataset. LFPW was originally described in the following publication:</p>
+<p>Due to copyright issues, we cannot distribute image files in any format to anyone. Instead, we have made available a list of image URLs where you can download the images yourself. We realize that this makes it impossible to exactly compare numbers, as image links will slowly disappear over time, but we have no other option. This seems to be the way other large web-based databases seem to be evolving.</p>
+</blockquote>
+<p><a href="https://neerajkumar.org/databases/lfpw/">https://neerajkumar.org/databases/lfpw/</a></p>
+<blockquote><p>This research was performed at Kriegman-Belhumeur Vision Technologies and was funded by the CIA through the Office of the Chief Scientist. <a href="https://www.cs.cmu.edu/~peiyunh/topdown/">https://www.cs.cmu.edu/~peiyunh/topdown/</a> (nk_cvpr2011_faceparts.pdf)</p>
+</blockquote>
</section><section>
<h3>Who used LFPW?</h3>
@@ -91,10 +96,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how LFPW has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Labeled Face Parts in the Wild was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how LFPW has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Labeled Face Parts in the Wild was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -109,7 +114,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/megaface/index.html b/site/public/datasets/megaface/index.html
index 712af28a..d213293a 100644
--- a/site/public/datasets/megaface/index.html
+++ b/site/public/datasets/megaface/index.html
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/megaface/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>MegaFace Dataset</span></div><div class='hero_subdesc'><span class='bgpad'>MegaFace contains 670K identities and 4.7M images
-</span></div></div></section><section><h2>MegaFace</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/megaface/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>Images from the MegaFace face recognition training and benchmarking dataset</div></div></section><section><h1>MegaFace</h1>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2016</div>
@@ -68,11 +67,32 @@
<div>672,057 </div>
</div><div class='meta'>
<div class='gray'>Purpose</div>
- <div>face recognition</div>
+ <div>Face recognition</div>
+ </div><div class='meta'>
+ <div class='gray'>Created by</div>
+ <div>Ira Kemelmacher-Shlizerman</div>
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://megaface.cs.washington.edu/' target='_blank' rel='nofollow noopener'>washington.edu</a></div>
- </div></div><p>[ page under development ]</p>
+ </div></div><p>MegaFace is a dataset of 4,700,000 face images of 672,000 individuals used for developing face recognition technologies. All images were downloaded from Flickr.</p>
+<h4>How was it made</h4>
+<p>MegaFace was developed by the University of Washington for the purpose of training, validating, and benchmarking face recognition algorithms.</p>
+<p>The images are from Flickr, though it is not yet confirmed whether they are all drawn from the YFCC100M collection.</p>
+<h4>Who used it</h4>
+<p>MegaFace was used for research projects associated with SenseTime, Google, Mitsubishi, Vision Semantics Ltd, and Microsoft.</p>
+<h4>Subsets</h4>
+<p>MegaFace was also used for the MegaFace Asian, MegaAge, and glasses subsets.</p>
+<h4>A sample of the research projects</h4>
+<p>Used for face recognition</p>
+<p>screenshots of papers</p>
+<h4>Visuals</h4>
+<ul>
+<li>facial landmarks</li>
+<li>bounding boxes</li>
+<li>animation of all the titles of the paper</li>
+</ul>
</section><section>
<h3>Who used MegaFace Dataset?</h3>
@@ -94,10 +114,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how MegaFace Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing MegaFace Dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how MegaFace Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing MegaFace Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -112,7 +132,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
@@ -133,6 +153,9 @@
<h2>Supplementary Information</h2>
+</section><section><h3>Age and Gender Distribution</h3>
+</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/megaface/assets/age.csv", "fields": ["Caption: MegaFace dataset age distribution", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/megaface/assets/gender.csv", "fields": ["Caption: MegaFace dataset gender distribution", "Top: 10", "OtherLabel: Other"]}'></div></section></div></section><section>
+	<p>Age and gender distributions were estimated by analyzing all faces in the dataset images. This may include additional faces appearing next to an annotated face, or may skip false-positive faces that were erroneously included in the original dataset. These numbers are provided as estimates, not as a factual representation of the exact gender and age of all faces.</p>
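+<p>The charts above are configured entirely by the <code>data-payload</code> JSON visible in the markup. A minimal sketch of interpreting such a payload, assuming nothing beyond the structure shown here (a CSV path in <code>command</code> and <code>Key: value</code> strings in <code>fields</code>):</p>
+<pre>
+# Sketch of reading a single_pie_chart payload and folding small
+# categories into "Other". Assumes label,count rows in the CSV.
+import csv, json
+
+def parse_payload(payload_json):
+    payload = json.loads(payload_json)
+    _, csv_path = payload["command"].split(" ", 1)
+    fields = dict(f.split(": ", 1) for f in payload["fields"])
+    return csv_path, fields
+
+def pie_slices(csv_path, top=10, other_label="Other"):
+    with open(csv_path, newline="") as f:
+        rows = [(label, int(count)) for label, count in csv.reader(f)]
+    rows.sort(key=lambda r: r[1], reverse=True)
+    head, tail = rows[:top], rows[top:]
+    if tail:
+        head.append((other_label, sum(count for _, count in tail)))
+    return head
+</pre>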
</section><section>
<h4>Cite Our Work</h4>
diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html
index 42a44571..a664e99f 100644
--- a/site/public/datasets/msceleb/index.html
+++ b/site/public/datasets/msceleb/index.html
@@ -55,8 +55,8 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>MS Celeb is a dataset of 10 million face images harvested from the Internet</span></div><div class='hero_subdesc'><span class='bgpad'>The MS Celeb dataset includes 10 million images of 100,000 people and an additional target list of 1,000,000 individuals
-</span></div></div></section><section><h2>Microsoft Celeb Dataset (MS Celeb)</h2>
+	<section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>Example images from the MS-Celeb-1M dataset</div></div></section><section><h1>Microsoft Celeb Dataset (MS Celeb)</h1>
+<p><em>Update: In response to this report and an <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">investigation</a> by the Financial Times, Microsoft has terminated their MS-Celeb website <a href="https://msceleb.org">https://msceleb.org</a>.</em></p>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2016</div>
@@ -78,7 +78,8 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://www.msceleb.org/' target='_blank' rel='nofollow noopener'>msceleb.org</a></div>
- </div></div><p>Microsoft Celeb (MS-Celeb-1M) is a dataset of 10 million face images harvested from the Internet for the purpose of developing face recognition technologies. According to Microsoft Research, who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' biometric data to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p>
+ </div><div class='meta'><div class='gray'>Press coverage</div><div><a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">Financial Times</a>, <a href="https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html">New York Times</a>, <a href="https://www.bbc.com/news/technology-48555149">BBC</a>, <a href="https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html">Spiegel</a>, <a href="https://www.lesechos.fr/tech-medias/intelligence-artificielle/le-mariage-explosif-de-nos-donnees-et-de-lia-1031813">Les Echos</a>, <a href="https://www.lastampa.it/2019/06/22/tecnologia/microsoft-ha-cancellato-il-suo-database-per-il-riconoscimento-facciale-PWwLGmpO1fKQdykMZVBd9H/pagina.html">La Stampa</a></div></div></div><p>Microsoft Celeb (MS-Celeb-1M) is a dataset of 10 million face images harvested from the Internet for the purpose of developing face recognition technologies.</p>
+<p>According to Microsoft Research, who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' biometric data to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p>
<p>While the majority of people in this dataset are American and British actors, the exploitative use of the term "celebrity" extends far beyond Hollywood. Many of the names in the MS Celeb face recognition dataset are merely people who must maintain an online presence for their professional lives: journalists, artists, musicians, activists, policy makers, writers, and academics. Many people in the target list are even vocal critics of the very technology Microsoft is using their name and biometric information to build. It includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; Shoshana Zuboff, author of <em>Surveillance Capitalism</em>; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy.</p>
<h3>Microsoft's 1 Million Target List</h3>
<p>Microsoft Research distributed two main digital assets: a dataset of approximately 10,000,000 images of 100,000 individuals and a target list of exactly 1 million names. The 900,000 names without images are the target list, which is used to gather more images for each subject.</p>
@@ -219,6 +220,8 @@
<p>In 2017 Microsoft Research organized a face recognition competition at the International Conference on Computer Vision (ICCV), one of the top 2 computer vision conferences worldwide, where industry and academia used the MS Celeb dataset to compete for the highest performance scores. The 2017 winner was Beijing-based OrionStar Technology Co., Ltd.. In their <a href="https://www.prnewswire.com/news-releases/orionstar-wins-challenge-to-recognize-one-million-celebrity-faces-with-artificial-intelligence-300494265.html">press release</a>, OrionStar boasted a 13% increase on the difficult set over last year's winner. The prior year's competitors included Beijing-based Faceall Technology Co., Ltd., a company providing face recognition for "smart city" applications.</p>
<p>Considering the multiple citations from commercial organizations (Canon, Hitachi, IBM, Megvii/Face++, Microsoft, Microsoft Asia, SenseTime, OrionStar, Faceall), military use (National University of Defense Technology in China), the proliferation of subset data (Racial Faces in the Wild), and the real-time visible proliferation via Academic Torrents it's fairly clear that Microsoft has lost control of their MS Celeb dataset and the biometric data of nearly 100,000 individuals.</p>
<p>To provide insight into where these 10 million faces images have traveled, over 100 research papers have been verified and geolocated to show who used the dataset and where they used it.</p>
+<h2>GDPR and MS-Celeb</h2>
+<p>[ in progress ]</p>
</section><section>
<h3>Who used Microsoft Celeb?</h3>
@@ -240,10 +243,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how Microsoft Celeb has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Microsoft Celebrity Dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how Microsoft Celeb has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Microsoft Celebrity Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -258,7 +261,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
@@ -279,11 +282,19 @@
<h2>Supplementary Information</h2>
-</section><section><h5>FAQs and Fact Check</h5>
+</section><section><h3>Age and Gender Distribution</h3>
+</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/msceleb/assets/age.csv", "fields": ["Caption: MS-Celeb dataset age distribution", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /datasets/msceleb/assets/gender.csv", "fields": ["Caption: MS-Celeb dataset gender distribution", "Top: 10", "OtherLabel: Other"]}'></div></section></div></section><section><h5>FAQs and Fact Check</h5>
<ul>
-<li><strong>The MS Celeb images were not derived from Creative Commons sources</strong>. They were obtained by "retriev[ing] approximately 100 images per celebrity from popular search engines"<a class="footnote_shim" name="[^msceleb_orig]_2"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a>. The dataset actually includes many copyrighted images. Microsoft doesn't provide any image URLs, but manually reviewing a small portion of images from the dataset shows many images with watermarked "Copyright" text over the image. TinEye could be used to more accurately determine the image origins in aggregate</li>
-<li><strong>Microsoft did not distribute images of all one million people.</strong> They distributed images for about 100,000 and then encouraged other researchers to download the remaining 900,000 people "by using all the possibly collected face images of this individual on the web as training data."<a class="footnote_shim" name="[^msceleb_orig]_3"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></li>
-<li><strong>Microsoft had not deleted or stopped distribution of their MS Celeb at the time of most press reports on June 4.</strong> Until at least June 6, 2019 the Microsoft Research data portal provided the MS Celeb dataset for download: <a href="http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737">http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737</a></li>
+<li><strong>Despite several erroneous reports stating that the MS-Celeb images were derived from Creative Commons licensed media, the images were obtained from web search engines</strong>. The authors state that they were obtained by "retriev[ing] approximately 100 images per celebrity from popular search engines"<a class="footnote_shim" name="[^msceleb_orig]_2"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a>. Many, if not the vast majority, are copyrighted images. Microsoft doesn't provide image URLs, but manually reviewing a small portion of the dataset shows images with watermarked "Copyright" text and sources including stock photo agencies such as Getty. TinEye could be used to more accurately determine the image origins in aggregate.</li>
+<li><strong>Most reports incorrectly stated that Microsoft distributed images of all one million people. As this analysis mentions several times, Microsoft distributed images for 100,000 people and a separate target list of 900,000 more names.</strong> Other researchers were then expected and encouraged to gather images of the remaining 900,000 people "by using all the possibly collected face images of this individual on the web as training data."<a class="footnote_shim" name="[^msceleb_orig]_3"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></li>
+<li><strong>Microsoft claimed to have deleted or stopped distributing the MS Celeb dataset in April 2019 after the Financial Times investigation. This is false.</strong> Until at least June 6, 2019, the Microsoft Research data portal freely provided the full MS Celeb dataset for download: <a href="http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737">http://web.archive.org/web/20190606150005/https://msropendata.com/datasets/98fdfc70-85ee-5288-a69f-d859bbe9c737</a></li>
+</ul>
+<h3>Press Coverage</h3>
+<ul>
+<li>Financial Times (original story): <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">Who’s using your face? The ugly truth about facial recognition</a> </li>
+<li>New York Times (front page story): <a href="https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html">Facial Recognition Tech Is Growing Stronger, Thanks to Your Face</a></li>
+<li>BBC: <a href="https://www.bbc.com/news/technology-48555149">Microsoft deletes massive face recognition database</a></li>
+<li>Spiegel: <a href="https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html">Microsoft löscht Datenbank mit zehn Millionen Fotos</a></li>
</ul>
</section><section><h3>References</h3><section><ul class="footnotes"><li>1 <a name="[^msceleb_orig]" class="footnote_shim"></a><span class="backlinks"><a href="#[^msceleb_orig]_1">a</a><a href="#[^msceleb_orig]_2">b</a><a href="#[^msceleb_orig]_3">c</a></span>MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition. Accessed April 18, 2019. <a href="http://web.archive.org/web/20190418151913/http://msceleb.org/">http://web.archive.org/web/20190418151913/http://msceleb.org/</a>
</li><li>2 <a name="[^madhu_ft]" class="footnote_shim"></a><span class="backlinks"><a href="#[^madhu_ft]_1">a</a></span>Murgia, Madhumita. Microsoft worked with Chinese military university on artificial intelligence. Financial Times. April 10, 2019.
diff --git a/site/public/datasets/oxford_town_centre/index.html b/site/public/datasets/oxford_town_centre/index.html
index 11fb436f..3a7eabf0 100644
--- a/site/public/datasets/oxford_town_centre/index.html
+++ b/site/public/datasets/oxford_town_centre/index.html
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/oxford_town_centre/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Oxford Town Centre is a dataset of surveillance camera footage from Cornmarket St Oxford, England</span></div><div class='hero_subdesc'><span class='bgpad'>The Oxford Town Centre dataset includes approximately 2,200 identities and is used for research and development of face recognition systems
-</span></div></div></section><section><h2>Oxford Town Centre</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/oxford_town_centre/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>A still frame from the Oxford Town Centre CCTV video-dataset</div></div></section><section><h1>Oxford Town Centre</h1>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2009</div>
@@ -78,7 +77,8 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html' target='_blank' rel='nofollow noopener'>ox.ac.uk</a></div>
- </div></div><p>The Oxford Town Centre dataset is a CCTV video of pedestrians in a busy downtown area in Oxford used for research and development of activity and face recognition systems.<a class="footnote_shim" name="[^ben_benfold_orig]_1"> </a><a href="#[^ben_benfold_orig]" class="footnote" title="Footnote 1">1</a> The CCTV video was obtained from a surveillance camera at the corner of Cornmarket and Market St. in Oxford, England and includes approximately 2,200 people. Since its publication in 2009<a class="footnote_shim" name="[^guiding_surveillance]_1"> </a><a href="#[^guiding_surveillance]" class="footnote" title="Footnote 2">2</a> the <a href="http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html">Oxford Town Centre dataset</a> has been used in over 80 verified research projects including commercial research by Amazon, Disney, OSRAM, and Huawei; and academic research in China, Israel, Russia, Singapore, the US, and Germany among dozens more.</p>
+ </div></div><p>The Oxford Town Centre dataset is a CCTV video of pedestrians in a busy downtown area in Oxford used for research and development of activity and face recognition systems.<a class="footnote_shim" name="[^ben_benfold_orig]_1"> </a><a href="#[^ben_benfold_orig]" class="footnote" title="Footnote 1">1</a></p>
+<p>The CCTV video was obtained from a surveillance camera at the corner of Cornmarket and Market St. in Oxford, England and includes approximately 2,200 people. Since its publication in 2009<a class="footnote_shim" name="[^guiding_surveillance]_1"> </a><a href="#[^guiding_surveillance]" class="footnote" title="Footnote 2">2</a> the <a href="http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html">Oxford Town Centre dataset</a> has been used in over 80 verified research projects including commercial research by Amazon, Disney, OSRAM, and Huawei; and academic research in China, Israel, Russia, Singapore, the US, and Germany among dozens more.</p>
<p>The Oxford Town Centre dataset is unique in that it uses footage from a public surveillance camera that would otherwise be designated for public safety. The video shows the pedestrians acting normally and unrehearsed, indicating that they neither knew of nor consented to participation in the research project.</p>
</section><section>
<h3>Who used TownCentre?</h3>
@@ -101,10 +101,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how TownCentre has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Oxford Town Centre was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how TownCentre has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Oxford Town Centre was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -119,7 +119,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/pipa/index.html b/site/public/datasets/pipa/index.html
index 95b288fb..9c0a974a 100644
--- a/site/public/datasets/pipa/index.html
+++ b/site/public/datasets/pipa/index.html
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/pipa/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>PIPA ...</span></div><div class='hero_subdesc'><span class='bgpad'>PIPA ...
-</span></div></div></section><section><h2>MegaFace</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/pipa/assets/background.jpg)'></section><section><h2>PIPA: People in Photo Albums</h2>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2015</div>
@@ -97,10 +96,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how PIPA Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing People in Photo Albums Dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how PIPA Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing People in Photo Albums Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -115,7 +114,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/uccs/index.html b/site/public/datasets/uccs/index.html
index 2dcf88a1..8cc11c90 100644
--- a/site/public/datasets/uccs/index.html
+++ b/site/public/datasets/uccs/index.html
@@ -55,8 +55,8 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/uccs/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'><span class="dataset-name">UnConstrained College Students</span> is a dataset of long-range surveillance photos of students on University of Colorado in Colorado Springs campus</span></div><div class='hero_subdesc'><span class='bgpad'>The UnConstrained College Students dataset includes 16,149 images of 1,732 students, faculty, and pedestrians and is used for developing face recognition and face detection algorithms
-</span></div></div></section><section><h2>UnConstrained College Students</h2>
+	<section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/uccs/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>One of 16,149 images from the UnConstrained College Students face recognition dataset captured at University of Colorado, Colorado Springs</div></div></section><section><h1>UnConstrained College Students</h1>
+<p><em>Update: In response to this report and its previous publication of metadata from UCCS dataset photos, UCCS has temporarily suspended its dataset, but plans to release a new version.</em></p>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Images</div>
<div>16,149 </div>
@@ -75,7 +75,8 @@
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='http://vast.uccs.edu/Opensetface/' target='_blank' rel='nofollow noopener'>uccs.edu</a></div>
- </div></div><p>UnConstrained College Students (UCCS) is a dataset of long-range surveillance photos captured at University of Colorado Colorado Springs developed primarily for research and development of "face detection and recognition research towards surveillance applications"<a class="footnote_shim" name="[^uccs_vast]_1"> </a><a href="#[^uccs_vast]" class="footnote" title="Footnote 1">1</a>. According to the authors of <a href="https://www.semanticscholar.org/paper/Unconstrained-Face-Detection-and-Open-Set-Face-G%C3%BCnther-Hu/d4f1eb008eb80595bcfdac368e23ae9754e1e745">two</a> <a href="https://www.semanticscholar.org/paper/Large-scale-unconstrained-open-set-face-database-Sapkota-Boult/07fcbae86f7a3ad3ea1cf95178459ee9eaf77cb1">papers</a> associated with the dataset, over 1,700 students and pedestrians were "photographed using a long-range high-resolution surveillance camera without their knowledge".<a class="footnote_shim" name="[^funding_uccs]_1"> </a><a href="#[^funding_uccs]" class="footnote" title="Footnote 3">3</a> This analysis examines the <a href="http://vast.uccs.edu/Opensetface/">UCCS dataset</a> contents of the <a href="">dataset</a>, its funding sources, timestamp data, and information from publicly available research project citations.</p>
+ </div></div><p>UnConstrained College Students (UCCS) is a dataset of long-range surveillance photos captured at University of Colorado Colorado Springs developed primarily for research and development of "face detection and recognition research towards surveillance applications"<a class="footnote_shim" name="[^uccs_vast]_1"> </a><a href="#[^uccs_vast]" class="footnote" title="Footnote 1">1</a>.</p>
+<p>According to the authors of <a href="https://www.semanticscholar.org/paper/Unconstrained-Face-Detection-and-Open-Set-Face-G%C3%BCnther-Hu/d4f1eb008eb80595bcfdac368e23ae9754e1e745">two</a> <a href="https://www.semanticscholar.org/paper/Large-scale-unconstrained-open-set-face-database-Sapkota-Boult/07fcbae86f7a3ad3ea1cf95178459ee9eaf77cb1">papers</a> associated with the dataset, over 1,700 students and pedestrians were "photographed using a long-range high-resolution surveillance camera without their knowledge".<a class="footnote_shim" name="[^funding_uccs]_1"> </a><a href="#[^funding_uccs]" class="footnote" title="Footnote 3">3</a> This analysis examines the contents of the <a href="http://vast.uccs.edu/Opensetface/">UCCS dataset</a>, its funding sources, timestamp data, and information from publicly available research project citations.</p>
<p>The UCCS dataset includes over 1,700 unique identities, most of which are students walking to and from class. In 2018, it was the "largest surveillance [face recognition] benchmark in the public domain."<a class="footnote_shim" name="[^surv_face_qmul]_1"> </a><a href="#[^surv_face_qmul]" class="footnote" title="Footnote 4">4</a> The photos were taken during the spring semesters of 2012 &ndash; 2013 on the West Lawn of the University of Colorado Colorado Springs campus. The photographs were timed to capture students during breaks between their scheduled classes in the morning and afternoon during Monday through Thursday. "For example, a student taking Monday-Wednesday classes at 12:30 PM will show up in the camera on almost every Monday and Wednesday."<a class="footnote_shim" name="[^sapkota_boult]_1"> </a><a href="#[^sapkota_boult]" class="footnote" title="Footnote 2">2</a>.</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/uccs/assets/uccs_map_aerial.jpg' alt=' The location at University of Colorado Colorado Springs where students were surreptitiously photographed with a long-range surveillance camera for use in a defense and intelligence agency funded research project on face recognition. Image: Google Maps'><div class='caption'> The location at University of Colorado Colorado Springs where students were surreptitiously photographed with a long-range surveillance camera for use in a defense and intelligence agency funded research project on face recognition. Image: Google Maps</div></div></section><section><p>The long-range surveillance images in the UnConstrained College Students dataset were taken using a Canon 7D 18-megapixel digital camera fitted with a Sigma 800mm F5.6 EX APO DG HSM telephoto lens pointed out an office window across the university's West Lawn. The students were photographed from a distance of approximately 150 meters. "The camera [was] programmed to start capturing images at specific time intervals between classes to maximize the number of faces being captured."<a class="footnote_shim" name="[^sapkota_boult]_2"> </a><a href="#[^sapkota_boult]" class="footnote" title="Footnote 2">2</a>
Their setup made it impossible for students to know they were being photographed, providing the researchers with realistic surveillance images to help build face recognition systems for real world applications for defense, intelligence, and commercial partners.</p>
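+<p>To put these capture parameters in perspective, a rough back-of-the-envelope calculation (assuming the Canon 7D's published specifications of a 22.3 mm-wide APS-C sensor and a 5184-pixel image width, and an illustrative 0.16 m face width) suggests that faces 150 meters away would still span roughly 200 pixels:</p>
+<pre>
+# Approximate ground resolution of the UCCS capture setup (thin-lens
+# approximation). Sensor values are the Canon 7D's published specs;
+# the 0.16 m face width is an illustrative assumption.
+SENSOR_WIDTH_MM = 22.3   # Canon 7D APS-C sensor width
+IMAGE_WIDTH_PX = 5184    # Canon 7D horizontal resolution
+FOCAL_MM = 800           # Sigma 800mm telephoto lens
+DISTANCE_M = 150         # approximate distance to subjects
+
+fov_width_m = SENSOR_WIDTH_MM / FOCAL_MM * DISTANCE_M  # ~4.2 m
+px_per_meter = IMAGE_WIDTH_PX / fov_width_m            # ~1240 px/m
+face_px = px_per_meter * 0.16                          # ~200 px per face
+print("%.1f m field of view, ~%d px per face" % (fov_width_m, face_px))
+</pre>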
@@ -107,10 +108,10 @@ Their setup made it impossible for students to know they were being photographed
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how UCCS has been used around the world by commercial, military, and academic organizations; existing publicly available research citing UnConstrained College Students Dataset was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how UCCS has been used around the world by commercial, military, and academic organizations, existing publicly available research citing UnConstrained College Students Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -125,7 +126,7 @@ Their setup made it impossible for students to know they were being photographed
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/datasets/who_goes_there/index.html b/site/public/datasets/who_goes_there/index.html
index a00fd151..0d19da0b 100644
--- a/site/public/datasets/who_goes_there/index.html
+++ b/site/public/datasets/who_goes_there/index.html
@@ -55,8 +55,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/who_goes_there/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Who Goes There Dataset</span></div><div class='hero_subdesc'><span class='bgpad'>Who Goes There (page under development)
-</span></div></div></section><section><h2>Who Goes There</h2>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/who_goes_there/assets/background.jpg)'></section><section><h2>Who Goes There</h2>
</section><section><div class='right-sidebar'></div><p>[ page under development ]</p>
</section><section>
<h3>Who used Who Goes There Dataset?</h3>
@@ -79,10 +78,10 @@
<section>
- <h3>Information Supply chain</h3>
+ <h3>Information Supply Chain</h3>
<p>
- To help understand how Who Goes There Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing WhoGoesThere was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.
+	To help understand how Who Goes There Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing WhoGoesThere was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
@@ -97,7 +96,7 @@
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
- <div class="source">Citation data is collected using <a href="https://semanticscholar.org" target="_blank">SemanticScholar.org</a> then dataset usage verified and geolocated.</div >
+	<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
diff --git a/site/public/research/index.html b/site/public/research/index.html
index f4f90531..2fb87df3 100644
--- a/site/public/research/index.html
+++ b/site/public/research/index.html
@@ -60,7 +60,7 @@
<a href='/research/munich_security_conference/'><section class='wide' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg);' />
<section>
<h4><span class='bgpad'>28 June 2019</span></h4>
- <h2><span class='bgpad'>Analyzing Transnational Flows of Face Recognition Image Training Data</span></h2>
+ <h2><span class='bgpad'>Transnational Flows of Face Recognition Image Training Data</span></h2>
<h3><span class='bgpad'>Where does face data originate and who's using it?</span></h3>
<h4 class='readmore'><span class='bgpad'>Read more...</span></h4>
</section>
diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index fc44bfd8..3b18f1cd 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -53,28 +53,24 @@
<a href="/research">Research</a>
</div>
</header>
- <div class="content content-dataset">
+ <div class="content content-blog">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Transnational Flows of Face Recognition Image Training Data</span></div><div class='hero_subdesc'><span class='bgpad'>Where does face data originate and who's using it?
-</span></div></div></section><section><p><em>A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report</em></p>
-</section><section><div class='right-sidebar'><div class='meta'><div class='gray'>Images Analyzed</div><div>24,302,637</div></div><div class='meta'><div class='gray'>Datasets Analyzed</div><div>30</div></div><div class='meta'><div class='gray'>Years</div><div>2006 - 2018</div></div><div class='meta'><div class='gray'>Last Updated</div><div>July 7, 2019</div></div><div class='meta'><div class='gray'>Text and Research</div><div>Adam Harvey</div></div></div><p>National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.</p>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>An image from the MegaFace face recognition training dataset taken from the U.S. Embassy in Madrid Flickr account</div></div></section><section><h1>Transnational Flows of Face Recognition Image Training Data</h1>
+<p><em>A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report</em></p>
+</section><section><div class='right-sidebar'><div class='meta'><div class='gray'>Images Analyzed</div><div>24,302,637</div></div><div class='meta'><div class='gray'>Datasets Analyzed</div><div>30</div></div><div class='meta'><div class='gray'>Years</div><div>2006 - 2018</div></div><div class='meta'><div class='gray'>Last Updated</div><div>July 7, 2019</div></div><div class='meta'><div class='gray'>Text and Research</div><div>Adam Harvey</div></div><div class='meta'><div class='gray'>Published in</div><div><a href="https://tsr.securityconference.de/">Transnational Security Report</a></div></div></div><p>National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can realize quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.</p>
<p>Our <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">earlier research</a> on the <a href="/datasets/msceleb">MS Celeb</a> and <a href="/datasets/duke_mtmc">Duke</a> datasets, published with the Financial Times, revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++, all of which have been linked to oppressive surveillance in the Xinjiang region of China.</p>
<p>In this new research for the <a href="https://tsr.securityconference.de">Munich Security Conference's Transnational Security Report</a>, we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets.</p>
-<div style="display:inline;" class="columns columns-1"><div class="column"><div style="background:#202020;border-radius:6px;padding:20px;width:100%">
-
-<h4>Key Findings</h4>
-
+<h3>Key Findings</h3>
<ul>
- <li>24 million non-cooperative images were used in facial recognition research projects</li>
- <li>Most data originated from US-based search engines and Flickr, but most research citations found in China</li>
- <li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li>
- <li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li>
+<li>24 million non-cooperative images were used in facial recognition research projects</li>
+<li>Most data originated from US-based search engines and Flickr, but most research citations were found in China</li>
+<li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li>
+<li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li>
</ul>
-
-</div></div></div><h3>24 Million Photos</h3>
+<h3>24 Million Photos</h3>
<p><strong>Origins</strong>: In total, we found over 24 million non-cooperative, non-consensual photos in 30 publicly available face recognition and face analysis datasets. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent; researchers call this type of face image "in the wild". Every image contains at least one face, and many photos contain multiple faces. There are approximately 1 million unique identities across all 24 million images.</p>
<p><strong>Endpoints</strong>: To understand the geographic dimensions of the data, we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the face data and where it was being used. Even though the vast majority of the images originated in the United States or from US companies, publicly available research papers show that only about 25% of the citations are from the United States, while the majority are from China. Because only English-language research papers were analyzed, the actual number of foreign research papers is likely larger, reflecting even greater foreign usage.</p>
-</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Origins of 24.3 million photos in publicly available face analysis datasets 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Endpoints of 1,134 facial analysis research projects citing 30 face analysis datasets", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt=''></div></section><section><h3>8,428 Embassy Photos Found in Facial Recognition Datasets</h3>
+</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Origins of 24.3 million photos in publicly available face analysis datasets 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Endpoints of 1,134 facial analysis research projects citing 30 face analysis datasets", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt='A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset'><div class='caption'>A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset</div></div></section><section><h3>8,428 Embassy Photos Found in Facial Recognition Datasets</h3>
<p>Out of the 24 million images analyzed, at least 8,428 embassy images were found in face recognition and facial analysis datasets. These images were found by cross-referencing Flickr IDs and URLs between datasets to locate 5,667 images in the MegaFace dataset, 389 images in the IBM Diversity in Faces dataset, and 2,372 images in the Who Goes There dataset. MegaFace is one of the most widely used publicly available face recognition datasets for academic, commercial, and defense-related research.</p>
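The cross-referencing described above is essentially a set intersection on Flickr photo IDs. A minimal sketch under stated assumptions: the manifest file names and the ID-extraction pattern below are illustrative, not artifacts of this project, though Flickr-derived file names do conventionally place the numeric photo ID before an underscore.

```python
# Sketch only: cross-reference Flickr photo IDs between two dataset
# manifests. Flickr-style file names embed the numeric photo ID before
# an underscore; the manifest file names here are assumptions.
import re

FLICKR_ID = re.compile(r"(\d{5,})_")  # numeric photo ID before the secret hash

def load_ids(manifest_path):
    ids = set()
    with open(manifest_path) as f:
        for line in f:
            m = FLICKR_ID.search(line.strip())
            if m:
                ids.add(m.group(1))
    return ids

embassy_ids = load_ids("embassy_flickr_urls.txt")    # photos from embassy accounts
megaface_ids = load_ids("megaface_image_paths.txt")  # dataset image paths
matches = embassy_ids & megaface_ids
print(f"{len(matches)} embassy photos found in the dataset")
```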
<p>In total, these 8,428 images were found to be used in at least 42 countries, with most citations originating in China and most images originating from US embassies. The images appeared in research projects with links to commercial and defense organizations, including Google, Microsoft, the National University of Defense Technology in China, SenseTime, Tencent, Mitsubishi, ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain).</p>
</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv", "fields": ["Caption: Number of embassy photos included in each face recognition dataset", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/country_counts.csv", "fields": ["Caption: Number of photos per national embassy", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section></div></section><section><p>The embassy and consulate photos below were all found in either the MegaFace or IBM Diversity in Faces datasets. Consulates were only included if marked as "EMBASSY" by the <a href="https://www.state.gov/global-social-media-presence/">U.S. Department of State’s Social Media Presence List</a>. Photos below were chosen because they include an embassy logo. All photos originated on Flickr.com and were published with a Creative Commons license.</p>
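The "single_pie_chart" applets above take a CSV path plus "Top" and "OtherLabel" fields: the charts show the top-N categories and fold the remainder into an "Other" slice. A minimal sketch of that aggregation, assuming a two-column name,count CSV with no header row; the site's actual applet JavaScript is not published here.

```python
# Sketch only: aggregate a two-column name,count CSV (header-less is
# assumed) into the top-N slices plus an "Other" bucket, mirroring the
# pie-chart applets' "Top" and "OtherLabel" parameters above.
import csv

def top_n_plus_other(path, top=10, other_label="Other"):
    with open(path, newline="") as f:
        rows = [(name, int(count)) for name, count in csv.reader(f)]
    rows.sort(key=lambda r: r[1], reverse=True)
    head, tail = rows[:top], rows[top:]
    if tail:
        head.append((other_label, sum(count for _, count in tail)))
    return head

# e.g. the endpoints chart above uses Top: 14
for label, count in top_n_plus_other("summary_countries.csv", top=14):
    print(f"{label}: {count}")
```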