From 27340ac4cd43f8eec7414495b541a65566ae2656 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Tue, 8 Oct 2019 16:02:47 +0200 Subject: update site, white --- site/public/research/munich_security_conference/index.html | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'site/public/research/munich_security_conference/index.html') diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html index fc44bfd8..b43df151 100644 --- a/site/public/research/munich_security_conference/index.html +++ b/site/public/research/munich_security_conference/index.html @@ -55,9 +55,8 @@
-
Transnational Flows of Face Recognition Image Training Data
Where does face data originate and who's using it? -

A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report

-

National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.

+

A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report

+

National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to oppressive surveillance in the Xinjiang region of China.

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets.

-- cgit v1.2.3-70-g09d2 From 59fbaaa9eb5539b89a6ed43682a8074356b1366d Mon Sep 17 00:00:00 2001 From: adamhrv Date: Tue, 8 Oct 2019 18:39:21 +0200 Subject: styling --- TODO.md | 20 +++++++---- client/modalImage/modal.css | 1 + site/assets/css/css.css | 39 +++++++++++++++------- site/content/pages/about/index.md | 18 +++++----- site/content/pages/datasets/brainwash/index.md | 4 +-- site/content/pages/datasets/index.md | 2 +- site/content/pages/datasets/megaface/index.md | 37 ++++++++++++++++++-- .../research/munich_security_conference/index.md | 22 +++++------- site/public/about/index.html | 18 +++++----- site/public/datasets/brainwash/index.html | 4 +-- site/public/datasets/index.html | 2 +- site/public/datasets/megaface/index.html | 22 ++++++++++-- .../research/munich_security_conference/index.html | 23 ++++++------- 13 files changed, 140 insertions(+), 72 deletions(-) (limited to 'site/public/research/munich_security_conference/index.html') diff --git a/TODO.md b/TODO.md index 90de6790..da90ab57 100644 --- a/TODO.md +++ b/TODO.md @@ -1,14 +1,22 @@ # TODO -## CSS +## Updates for NYT October 13/14 -- change font size in Tabulator to 12px (can't find where to edit it) +### Charts -## Charts, JS +- fix background hover on bar graph chart? remove extra black bg -- can we make the age/gender all in one include? -- can we auto-add download links to age/gender csv? -- can the pie chart labels keep same order as in CSV? +### CSS +- change tabulator to white/gray +- add a markdown parser? to handle a new update blurb at top of dataset pages. For now I used CSS selectors, but seems brittle +- can you check mobile css? 
my white-style edits might have broken +### MegaFace Dataset + +- origin needs to be fixed (in the ocean) + +### Homepage + +- feel like we need a "face" diff --git a/client/modalImage/modal.css b/client/modalImage/modal.css index cc9a1f32..c8ef9b60 100644 --- a/client/modalImage/modal.css +++ b/client/modalImage/modal.css @@ -32,6 +32,7 @@ display: block; text-align: center; /*background: black;*/ + color: #FFF; padding: 10px; } .modal .prev span, diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 75f1ad3f..ae22fa1a 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -156,7 +156,7 @@ footer { display: flex; flex-direction: row; justify-content: space-between; - color: #000; + color: #ccc; font-size: 13px; /*line-height: 17px;*/ padding: 15px; @@ -179,6 +179,9 @@ footer a { padding-bottom: 1px; text-decoration: none; } +.desktop footer a { + border-bottom:1px solid #999; +} .desktop footer a:hover { color: #fff; border-bottom:1px solid #999; @@ -214,7 +217,8 @@ footer ul:last-child li { h1 { color: #000; font-weight: 500; - font-size: 30pt; + font-size: 28pt; + line-height: 38pt; margin: 20px auto 10px auto; padding: 0; transition: color 0.1s cubic-bezier(0,0,1,1); @@ -382,12 +386,24 @@ section h1, section h2, section h3, section h4, section h5, section h6, section } .content-dataset section:nth-child(4) p:nth-child(2){ font-size:20px; - line-height: 32px; + line-height: 34px; color:#000; } .content-dataset section:nth-child(3) p:nth-child(2) { + /* highlight news text */ + /*font-style: italic;*/ + font-weight: 500; + color:#f00; +} +.content-dataset section:nth-child(3) p:nth-child(2) a{ + /* highlight news text */ + color:#f00; + border-bottom: 1px solid #f00; +} +.content-dataset section:nth-child(3) p:nth-child(2) a:hover{ /* highlight news text */ color:#f00; + border-bottom: 1px solid #f00; } p.subp{ font-size: 14px; @@ -492,15 +508,14 @@ p.subp{ /* lists */ ul { - list-style-type: none; + list-style-type: square; margin: 
0 0 30px 0; padding: 0; } ul li { margin-bottom: 8px; - color: #333; font-weight: 400; - font-size: 14px; + font-size: 15px; } /* misc formatting */ @@ -626,8 +641,8 @@ ul.footnotes p { font-family: 'Roboto Mono', monospace; font-weight: 400; text-transform: uppercase; - color: #666; - font-size: 11pt; + color: #333; + font-size: 12pt; } /* images */ @@ -1154,8 +1169,8 @@ ul.map-legend li.source:before { font-weight: 300; } .content-about section:first-of-type > p:first-of-type { - font-size: 22px; - line-height: 40px; + font-size: 26px; + line-height: 42px; } .content-about .about-menu ul li { display: inline-block; @@ -1258,13 +1273,13 @@ ul.map-legend li.source:before { /* footnotes */ a.footnote { - font-size: 9px; + font-size: 10px; line-height: 0px; position: relative; /*display: inline-block;*/ bottom: 7px; text-decoration: none; - color: #666; + color: #333; border: 0; left: -1px; transition-duration: 0s; diff --git a/site/content/pages/about/index.md b/site/content/pages/about/index.md index 90072b37..f07a79ee 100644 --- a/site/content/pages/about/index.md +++ b/site/content/pages/about/index.md @@ -35,8 +35,13 @@ MegaPixels is an independent project, designed as a public resource for educator A dataset of verified geocoded citations and dataset statistics will be published in Fall 2019 along with a research paper as part of a research fellowship for [KIM (Critical Artificial Intelligence) Karlsruhe HfG](http://kim.hfg-karlsruhe.de/). 
+#### Team -### Selected News and Exhibitions +- [Adam Harvey](https://ahprojects.com): Concept, research and analysis, design, computer vision +- [Jules LaPlace](https://asdf.us): Information and systems architecture, data management, citation geocoding, web applications + + +### News and Publications - July 2019: New York Times writes about MegaPixels and how "[Facial Recognition Tech Is Growing Stronger, Thanks to Your Face](https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html)" - June 2019 - 2020: MegaPixels installation at Ars Electronica Center (AT) exhibition ["Compass - Navigating the Future"](https://ars.electronica.art/center/en/megapixels) @@ -46,18 +51,13 @@ A dataset of verified geocoded citations and dataset statistics will be publishe Read more [news](/about/news) -##### Team - -- Adam Harvey: Concept, research and analysis, design, computer vision -- Jules LaPlace: Information and systems architecture, data management, web applications - -##### Contributing Researchers +#### Contributing Researchers - Beth (aka Ms. 
Celeb) - Berit Gilma - Mathana Stender -##### Code and Libraries +#### Code and Libraries - [Semantic Scholar](https://semanticscholar.org) for citation aggregation - Leaflet.js for maps @@ -66,7 +66,7 @@ Read more [news](/about/news) - PDFMiner.Six and Pandas for research paper analysis -##### Attribution +#### Attribution If you use MegaPixels or any data derived from it for your work, please cite our original work as follows: diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md index 6d2279cb..a61c007c 100644 --- a/site/content/pages/datasets/brainwash/index.md +++ b/site/content/pages/datasets/brainwash/index.md @@ -4,7 +4,7 @@ status: published title: Brainwash Dataset desc: Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco subdesc: It includes 11,917 images of "everyday life of a busy downtown cafe" and is used for training face and head detection algorithms -caption: One of 11,917 images from the Brainwash dataset captured from the Brainwash Cafe in San Francisco +caption: One of the 11,917 images in the Brainwash dataset captured from the Brainwash Cafe in San Francisco slug: brainwash cssclass: dataset image: assets/background.jpg @@ -17,7 +17,7 @@ authors: Adam Harvey # Brainwash Dataset -*Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."* +Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor." ### sidebar diff --git a/site/content/pages/datasets/index.md b/site/content/pages/datasets/index.md index 54912242..f3d5fea0 100644 --- a/site/content/pages/datasets/index.md +++ b/site/content/pages/datasets/index.md @@ -16,4 +16,4 @@ sync: false Explore face and person recognition datasets contributing to the growing crisis of biometric surveillance technologies. 
This first group of 5 datasets focuses on image usage connected to foreign surveillance and defense organizations. -In response to the analyses below, the [Brainwash](https://purl.stanford.edu/sx925dc9385), [Duke MTMC](http://vision.cs.duke.edu/DukeMTMC/), and [MS Celeb](http://msceleb.org/) datasets have been taken down by their authors. The [UCCS](https://vast.uccs.edu/Opensetface/) dataset was temporarily deactivated due to metadata exposure. Read more [news](/about/news). A more complete list of datasets and research will be published in September 2019. These 5 are only a preview. +In response to the analyses below, the [Brainwash](/datasets/brainwash), [Duke MTMC](/datasets/duke_mtmc), and [MS Celeb](/datasets/msceleb/) datasets have been taken down by their authors. The [UCCS](/datasets/uccs/) dataset was temporarily deactivated due to metadata exposure. Read more [news](/about/news). A more complete list of datasets and research will be published in September 2019. These 5 are only a preview. diff --git a/site/content/pages/datasets/megaface/index.md b/site/content/pages/datasets/megaface/index.md index 2009e70e..9c282cb2 100644 --- a/site/content/pages/datasets/megaface/index.md +++ b/site/content/pages/datasets/megaface/index.md @@ -7,6 +7,7 @@ subdesc: MegaFace contains 670K identities and 4.7M images caption: Example images from the MegaFace dataset slug: megaface cssclass: dataset +caption: Images from the MegaFace face recognition training and benchmarking dataset image: assets/background.jpg year: 2016 published: 2019-4-18 @@ -15,12 +16,44 @@ authors: Adam Harvey ------------ -## MegaFace +# MegaFace ### sidebar ### end sidebar -MegaFace is a dataset... +MegaFace is a dataset of 4,700,000 face images of 672,000 individuals used for developing face recognition technologies. All images were downloaded from Flickr.
+ +#### How was it made + +MegaFace was developed by the University of Washington for the purpose of training, validating, and benchmarking face recognition algorithms. + +The images are from Flickr, but are they all from YFCC100M? + +#### Who used it + +MegaFace was used for research projects associated with SenseTime, Google, Mitsubishi, Vision Semantics Ltd, and Microsoft. + +#### Subsets + +MegaFace was also used for MegaFace Asian, MegaAge, and glasses. + +#### A sample of the research projects + +Used for face recognition + +screenshots of papers + +#### Visuals + +- facial landmarks +- bounding boxes +- animation of all the titles of the paper +- + +### + + + {% include 'dashboard.html' %} diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md index 365ee404..75392dc3 100644 --- a/site/content/pages/research/munich_security_conference/index.md +++ b/site/content/pages/research/munich_security_conference/index.md @@ -5,7 +5,8 @@ title: Transnational Flows of Face Recognition Image Training Data slug: munich-security-conference desc: Transnational Flows of Face Recognition Image Training Data subdesc: Where does face data originate and who's using it? -cssclass: dataset +caption: An image from the MegaFace face recognition training dataset taken from the U.S.
Embassy of Madrid Flickr account +cssclass: blog image: assets/background.jpg published: 2019-6-28 updated: 2019-6-29 @@ -13,6 +14,7 @@ authors: Adam Harvey ------------ +# Transnational Flows of Face Recognition Image Training Data *A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report* @@ -33,19 +35,13 @@ Our [earlier research](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets. -
-

Key Findings

- -
    -
  • 24 million non-cooperative images were used in facial recognition research projects
  • -
  • Most data originated from US-based search engines and Flickr, but most research citations found in China
  • -
  • Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)
  • -
  • Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)
  • -
- -
+### Key Findings +- 24 million non-cooperative images were used in facial recognition research projects +- Most data originated from US-based search engines and Flickr, but most research citations were found in China +- Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies) +- Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China) ### 24 Million Photos @@ -70,7 +70,7 @@ OtherLabel: Other === end columns -![](assets/7118211377.jpg) +![caption: A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset](assets/7118211377.jpg) ### 8,428 Embassy Photos Found in Facial Recognition Datasets diff --git a/site/public/about/index.html b/site/public/about/index.html index 427a97a2..e5a120d1 100644 --- a/site/public/about/index.html +++ b/site/public/about/index.html @@ -69,7 +69,12 @@

MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the same technology companies who created many of the datasets presented on this site.

MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and citations. MegaPixels is a website-first research project, with an academic publication to follow in fall 2019.

A dataset of verified geocoded citations and dataset statistics will be published in Fall 2019 along with a research paper as part of a research fellowship for KIM (Critical Artificial Intelligence) Karlsruhe HfG.

-

Selected News and Exhibitions

+

Team

+
    +
  • Adam Harvey: Concept, research and analysis, design, computer vision
  • +
  • Jules LaPlace: Information and systems architecture, data management, citation geocoding, web applications
  • +
+

News and Publications

Read more news

-
Team
-
    -
  • Adam Harvey: Concept, research and analysis, design, computer vision
  • -
  • Jules LaPlace: Information and systems architecture, data management, web applications
  • -
-
Contributing Researchers
+

Contributing Researchers

  • Beth (aka Ms. Celeb)
  • Berit Gilma
  • Mathana Stender
-
Code and Libraries
+

Code and Libraries

  • Semantic Scholar for citation aggregation
  • Leaflet.js for maps
  • @@ -96,7 +96,7 @@
  • ThreeJS for 3D visualizations
  • PDFMiner.Six and Pandas for research paper analysis
-
Attribution
+

Attribution

If you use MegaPixels or any data derived from it for your work, please cite our original work as follows:

 @online{megapixels,
diff --git a/site/public/datasets/brainwash/index.html b/site/public/datasets/brainwash/index.html
index d715d163..efd2f5a8 100644
--- a/site/public/datasets/brainwash/index.html
+++ b/site/public/datasets/brainwash/index.html
@@ -55,8 +55,8 @@
   
   
-
One of 11,917 images from the Brainwash dataset captured from the Brainwash Cafe in San Francisco

Brainwash Dataset

-

Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."

+
One of the 11,917 images in the Brainwash dataset captured from the Brainwash Cafe in San Francisco

Brainwash Dataset

+

Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."

Who used MegaFace Dataset?

diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html index b43df151..3b18f1cd 100644 --- a/site/public/research/munich_security_conference/index.html +++ b/site/public/research/munich_security_conference/index.html @@ -53,27 +53,24 @@ Research
-
+
-

A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report

+
An image from the MegaFace face recognition training dataset taken from the U.S. Embassy of Madrid Flickr account

Transnational Flows of Face Recognition Image Training Data

+

A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report

National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to oppressive surveillance in the Xinjiang region of China.

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets.

-
- -

Key Findings

- +

Key Findings

    -
  • 24 million non-cooperative images were used in facial recognition research projects
  • -
  • Most data originated from US-based search engines and Flickr, but most research citations found in China
  • -
  • Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)
  • -
  • Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)
  • +
  • 24 million non-cooperative images were used in facial recognition research projects
  • +
  • Most data originated from US-based search engines and Flickr, but most research citations were found in China
  • +
  • Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)
  • +
  • Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)
- -

24 Million Photos

+

24 Million Photos

Origins: In total, we found over 24 million non-cooperative, non-consensual photos in 30 publicly available face recognition and face analysis datasets. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image that researchers call "in the wild". Every image contains at least one face and many photos contain multiple faces. There are approximately 1 million unique identities across all 24 million images.

Endpoints:To understand the geographic dimensions of the data, we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the face data and where it was being used. Even though the vast majority of the images originated in the United States or from US companies, publicly available research papers show that only about 25% of the citations are from the United States while the majority are from China. Because only English research papers were analyzed the number of foreign research papers is likely to be larger and reflect increased foreign usage.

-

8,428 Embassy Photos Found in Facial Recognition Datasets

+
 A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset
A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset

8,428 Embassy Photos Found in Facial Recognition Datasets

Out of the 24 million images analyzed, at least 8,428 embassy images were found in face recognition and facial analysis datasets. These images were found by cross-referencing Flickr IDs and URLs between datasets to locate 5,667 images in the MegaFace dataset, 389 images in the IBM Diversity in Faces datasets, and 2,372 images in the Who Goes There dataset. MegaFace is one of the most widely used publicly available face recognition datasets for academic, commercial, and defense-related research.

In total, these 8,428 images were found to be used in at least 42 countries with most citations originating in China and most images originating from US embassies. The images were found to be used in research projects with links to commercial and defense organization including Google, Microsoft, National University of Defense Technology in China, SenseTime, Tencent, Mitsubishi, ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain).

The embassy and consulate photos below were all found in either the MegaFace or IBM Diversity in Faces datasets. Consulates were only included if marked as "EMBASSY" by the U.S. Department of State’s Social Media Presence List. Photos below were chosen because of inclusion of an embassy logo. All photos originated on Flickr.com and were published with a Creative Commons license.
