Diffstat (limited to 'site/content/pages')
-rw-r--r-- site/content/pages/datasets/msceleb/index.md | 4
-rw-r--r-- site/content/pages/research/_introduction/index.md | 49
-rwxr-xr-x site/content/pages/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv | 8
-rwxr-xr-x site/content/pages/research/munich_security_conference/assets/megapixels_origins_top.csv | 15
-rw-r--r-- site/content/pages/research/munich_security_conference/index.md | 138
-rw-r--r-- site/content/pages/test/csv.md | 2
6 files changed, 120 insertions, 96 deletions
diff --git a/site/content/pages/datasets/msceleb/index.md b/site/content/pages/datasets/msceleb/index.md
index 453c1522..0e457cd9 100644
--- a/site/content/pages/datasets/msceleb/index.md
+++ b/site/content/pages/datasets/msceleb/index.md
@@ -101,9 +101,9 @@ For example, on October 28, 2019, the MS Celeb dataset will be used for a new co
And in June, shortly after [posting](https://twitter.com/adamhrv/status/1134511293526937600) about the disappearance of the MS Celeb dataset, it reemerged on [Academic Torrents](https://academictorrents.com/details/9e67eb7cc23c9417f39778a8e06cca5e26196a97/tech). As of June 10, the MS Celeb dataset files have been redistributed in at least 9 countries and downloaded 44 times without any restrictions. The files were seeded and are mostly distributed by an AI company based in China called Hyper.ai, which states that it redistributes MS Celeb and other datasets for "teachers and students of service industry-related practitioners and research institutes."[^hyperai_readme]
-Earlier in 2019 images from the MS Celeb were also repackaged into another face dataset called *Racial Faces in the Wild (RFW)*. To create it, the RFW authors uploaded face images from the MS Celeb dataset to the Face++ API and used the inferred racial scores to segregate people into four subsets: Caucasian, Asian, Indian, and African each with 3,000 subjects. That dataset then appeared in a subsequent research project from researchers affiliated with IIIT-Delhi and IBM TJ Watson called [Deep Learning for Face Recognition: Pride or Prejudiced?](https://arxiv.org/abs/1904.01219), which aims to reduce bias but also inadvertently furthers racist language and ideologies that can not be repeated here.
+Earlier in 2019 images from the MS Celeb were also repackaged into another face dataset called *Racial Faces in the Wild (RFW)*. To create it, the RFW authors uploaded face images from the MS Celeb dataset to the Face++ API and used the inferred racial scores to segregate people into four subsets: Caucasian, Asian, Indian, and African each with 3,000 subjects. That dataset then appeared in a subsequent research project from researchers affiliated with IIIT-Delhi and IBM TJ Watson called [Deep Learning for Face Recognition: Pride or Prejudiced?](https://arxiv.org/abs/1904.01219), which aims to reduce bias but also inadvertently furthers racist ideologies, using discredited racial terminology that cannot be repeated here.
-The estimated racial scores for the MS Celeb face images used in the RFW dataset were computed using the Face++ API, which is owned by Megvii Inc, a company that has been repeatedly linked to the oppressive surveillance of Uighur Muslims in Xinjiang, China. According to posts from the [ChinAI Newsletter](https://chinai.substack.com/p/chinai-newsletter-11-companies-involved-in-expanding-chinas-public-security-apparatus-in-xinjiang) and [BuzzFeedNews](https://www.buzzfeednews.com/article/ryanmac/us-money-funding-facial-recognition-sensetime-megvii), Megvii announced in 2017 at the China-Eurasia Security Expo in Ürümqi, Xinjiang, that it would be the official technical support unit of the "Public Security Video Laboratory" in Xinjiang, China. If they didn't already, it's highly likely that Megvii has a copy of everyone's biometric faceprint from the MS Celeb dataset, either from uploads to the Face++ API or through the research projects explicitly referencing MS Celeb dataset usage, such as a 2018 paper called [GridFace: Face Rectification via Learning Local Homography Transformations](https://arxiv.org/pdf/1808.06210.pdf) jointly published by 3 authors, all of whom worked for Megvii.
+The estimated racial scores for the MS Celeb face images used in the RFW dataset were computed using the Face++ API, which is owned by Megvii Inc, a company that has been repeatedly linked to the oppressive surveillance of Uighur Muslims in Xinjiang, China. According to posts from the [ChinAI Newsletter](https://chinai.substack.com/p/chinai-newsletter-11-companies-involved-in-expanding-chinas-public-security-apparatus-in-xinjiang) and [BuzzFeedNews](https://www.buzzfeednews.com/article/ryanmac/us-money-funding-facial-recognition-sensetime-megvii), Megvii announced in 2017 at the China-Eurasia Security Expo in Ürümqi, Xinjiang, that it would be the official technical support unit of the "Public Security Video Laboratory" in Xinjiang, China. If they didn't already, it's highly likely that Megvii has a copy of everyone's biometric faceprint from the MS Celeb dataset, either from uploads to the Face++ API or through research projects explicitly referencing MS Celeb dataset usage, such as a 2018 paper called [GridFace: Face Rectification via Learning Local Homography Transformations](https://arxiv.org/pdf/1808.06210.pdf) jointly published by 3 authors, all of whom worked for Megvii.
## Commercial Usage
diff --git a/site/content/pages/research/_introduction/index.md b/site/content/pages/research/_introduction/index.md
new file mode 100644
index 00000000..bdf1c1b0
--- /dev/null
+++ b/site/content/pages/research/_introduction/index.md
@@ -0,0 +1,49 @@
+------------
+
+status: draft
+title: Introducing MegaPixels
+desc: Introduction to Megapixels
+slug: 00_introduction
+cssclass: dataset
+published: 2018-12-15
+updated: 2018-12-15
+authors: Adam Harvey
+
+------------
+
+# Introduction
+
+Face recognition has become the focal point for ...
+
+Add 68pt landmarks animation
+
+But biometric currency is ...
+
+Add rotation 3D head
+
+Inflationary...
+
+Add Theresa May 3D
+
+(commission for CPDP)
+
+Add info from the AI Traps talk
+
+
++ Posted: Dec. 15
++ Author: Adam Harvey
+
+
+
+```
+load_file /site/research/00_introduction/assets/summary_countries_top.csv
+Headings: country, Xcitations
+```
+
+Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting.
+
+
+
+[ page under development ]
+
+![caption: This is the caption](assets/test.png)
\ No newline at end of file
diff --git a/site/content/pages/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv b/site/content/pages/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv
index 89f3c226..3a439821 100755
--- a/site/content/pages/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv
+++ b/site/content/pages/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv
@@ -1,5 +1,5 @@
dataset,images
-ibm_dif,389
-megaface,5679
-vgg_face,1
-who_goes_there,2372
+IBM Diversity in Faces,389
+MegaFace,5679
+VGG Face,1
+Who Goes There,2372
\ No newline at end of file
diff --git a/site/content/pages/research/munich_security_conference/assets/megapixels_origins_top.csv b/site/content/pages/research/munich_security_conference/assets/megapixels_origins_top.csv
index 081b4636..ae6e8f11 100755
--- a/site/content/pages/research/munich_security_conference/assets/megapixels_origins_top.csv
+++ b/site/content/pages/research/munich_security_conference/assets/megapixels_origins_top.csv
@@ -1,9 +1,8 @@
source,images
-Search Engines,30127200
-Flickr.com,11783888
-IMDb.com,5251410
-CCTV,959312
-Wikimedia.org,183500
-Mugshots,113268
-YouTube.com,31888
-Other Sources Combined,37044
+Internet Search Engines,15063600
+Flickr.com,5891944
+Internet Movie Database (IMDB.com),2625705
+CCTV,479656
+Wikimedia.org,91750
+Mugshots,56634
+YouTube.com,15944
\ No newline at end of file
diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md
index aba39b1c..c4c6a70c 100644
--- a/site/content/pages/research/munich_security_conference/index.md
+++ b/site/content/pages/research/munich_security_conference/index.md
@@ -1,19 +1,19 @@
------------
status: published
-title: MSC
+title: Transnational Flows of Face Recognition Image Training Data
slug: munich-security-conference
-desc: Analyzing the Transnational Flow of Facial Recognition Training Data
+desc: Analyzing Transnational Flows of Face Recognition Image Training Data
subdesc: Where does face data originate and who's using it?
cssclass: dataset
image: assets/background.jpg
-published: 2019-4-18
-updated: 2019-4-19
+published: 2019-6-28
+updated: 2019-6-29
authors: Adam Harvey
------------
-## Analysis for the Munich Security Conference Transnational Security Report
+## Face Datasets and Information Supply Chains
### sidebar
@@ -21,21 +21,30 @@ authors: Adam Harvey
+ Datasets Analyzed: 30
+ Years: 2006 - 2018
+ Status: Ongoing Investigation
-+ Last Updated: June 27, 2019
++ Last Updated: June 28, 2019
### end sidebar
+National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.
-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
+Our [earlier research](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) on the [MS Celeb](/datasets/msceleb) and [Duke](/datasets/duke_mtmc) datasets, published with the Financial Times, revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++, all of which have been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.
+
+In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition datasets.
+
+
+### 24 Million Non-Cooperative Faces
+
+In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image that researchers call "in the wild".
+
+Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though the vast majority of the images originated in the United States, the publicly available research citations show that only about 25% of citations are from the country of origin, while the majority of citations are from China.
-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
=== columns 2
```
single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv
-Caption: Sources of Publicly Available Face Training Data 2006 - 2018
+Caption: Sources of Publicly Available Non-Cooperative Face Image Training Data 2006 - 2018
Top: 10
OtherLabel: Other
```
@@ -44,106 +53,73 @@ OtherLabel: Other
```
single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv
-Caption: Locations Where Face Data Is Used
+Caption: Locations Where Face Data Is Used Based on Public Research Citations
Top: 14
OtherLabel: Other
```
=== end columns
+### 6,000 Embassy Photos Being Used To Train Facial Recognition
-=== columns 2
+Of the 5.8 million Flickr images, we found that over 6,000 public photos from embassy Flickr accounts were used to train facial recognition technologies. These images were used in the MegaFace and IBM Diversity in Faces datasets. Over 2,000 more images were included in the Who Goes There dataset, used for facial ethnicity analysis research. A few of the embassy images found in facial recognition datasets are shown below.
-#### Sources of Face Data
-
-Add text
-
-| Source | Images |
-| --- | --- |
-|Search Engines | 30,127,200 |
-|Flickr.com | 11,783,888 |
-|IMDb.com | 5,251,410 |
-|CCTV | 959,312 |
-|Wikimedia.org | 183,500 |
-|Mugshots | 113,268 |
-|Other Sources Combined | 37,044 |
-|YouTube.com | 31,888 |
-
-===
+=== columns 2
-#### Locations Where Face Data Is Used
+```
+single_pie_chart /site/research/munich_security_conference/assets/country_counts.csv
+Caption: Photos from these embassies are being used to train face recognition software
+Top: 4
+OtherLabel: Other
+Colors: categoryRainbow
+```
-Add text
+=====
-|country | citations|
-| --- | --- |
-|China | 327|
-|United States | 302|
-|United Kingdom | 187|
-|Australia | 38|
-|Germany | 35|
-|Singapore | 27|
-|Canada | 25|
-|Netherlands | 25|
-|Italy | 22|
-|France | 17|
-|India | 14|
-|South Korea | 12|
-|Spain | 10|
-|Switzerland | 9|
+```
+single_pie_chart /site/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv
+Caption: Embassy images were found in these datasets
+Top: 4
+OtherLabel: Other
+Colors: categoryRainbow
+```
=== end columns
+![caption: An image in the MegaFace dataset obtained from the United Kingdom's Embassy in Italy](assets/4606260362.jpg)
+![caption: An image in the MegaFace dataset obtained from the Flickr account of the United States Embassy in Kabul, Afghanistan](assets/4749096858.jpg)
-
-## Over 6,000 Embassy Images on Flickr Found in Face Recognition Datasets
-
-Including over 2,000 more for racial analysis
+![caption: An image in the MegaFace dataset obtained from U.S. Embassy Canberra](assets/4730007024.jpg)
-![caption: MegaFace from U.S. Embassy Canberra](assets/4730007024.jpg)
+This brief research aims to shed light on the emerging politics of data. A photo is no longer just a photo when it can also be surveillance training data, and datasets can no longer be separated from the development of software when software is now built with data. "Our relationship to computers has changed", says Geoffrey Hinton, one of the founders of modern-day neural networks and deep learning. "Instead of programming them, we now show them and they figure it out."[^hinton]
+As data becomes more political, national AI strategies might also want to include transnational dataset strategies.
-=== columns 2
-
-![caption: An image from the MegaFace dataset obtained from United Kingdom's Embassy in Italy https://flickr.com/photos/ukinitaly](assets/4606260362.jpg)
-
-====
+*This research post is ongoing and will be updated during July and August, 2019.*
-![caption: An imgae from the MegaFace dataset obtained from the Flick account of the United States Embassy in Kabul Afghanistan https://flickr.com/photos/kabulpublicdiplomacy](assets/4749096858.jpg)
-
-
-=== end columns
+### Further Reading
+- [MS Celeb Dataset Analysis](/datasets/msceleb)
+- [Brainwash Dataset Analysis](/datasets/brainwash)
+- [Duke MTMC Dataset Analysis](/datasets/duke_mtmc)
+- [Unconstrained College Students Dataset Analysis](/datasets/uccs)
+- [Duke MTMC dataset author apologizes to students](https://www.dukechronicle.com/article/2019/06/duke-university-facial-recognition-data-set-study-surveillance-video-students-china-uyghur)
+- [BBC coverage of MS Celeb dataset takedown](https://www.bbc.com/news/technology-48555149)
+- [Spiegel coverage of MS Celeb dataset takedown](https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html)
-=== columns 2
-
-```
-single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv
-Caption: Sources of Face Training Data
-Top: 5
-OtherLabel: Other Countries
-```
-===========
+{% include 'supplementary_header.html' %}
```
-single_pie_chart /site/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv
-Caption: Dataset sources
-Top: 4
-OtherLabel: Other
+load_file /site/research/munich_security_conference/assets/embassy_counts_public.csv
+Headings: Images, Dataset, Embassy, Flickr ID, URL, Guest, Host
```
-=== end columns
+{% include 'cite_our_work.html' %}
-{% include 'supplementary_header.html' %}
-
-[ add a download button for CSV data ]
+### Footnotes
-```
-load_file /site/research/munich_security_conference/assets/embassy_counts_public.csv
-Images, Dataset, Embassy, Flickr ID, URL, Guest, Host
-```
+[^hinton]: "Heroes of Deep Learning: Andrew Ng interviews Geoffrey Hinton". Published on Aug 8, 2017. <https://www.youtube.com/watch?v=-eyhCTvrEtE>
-{% include 'cite_our_work.html' %}
\ No newline at end of file
diff --git a/site/content/pages/test/csv.md b/site/content/pages/test/csv.md
index 85f714b4..ef3327f8 100644
--- a/site/content/pages/test/csv.md
+++ b/site/content/pages/test/csv.md
@@ -16,5 +16,5 @@ authors: Megapixels
```
load_file /site/test/assets/test.csv
-Name, Images, Year, Gender, Description, URL
+Headings: Name, Images, Year, Gender, Description, URL
```
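
Reviewer note: the rewritten munich_security_conference/index.md quotes several totals ("over 24 million" images, "over 6,000" embassy photos) that should follow from the CSV values changed in this same commit. The sketch below is a quick arithmetic check, not part of the commit. The variable names are illustrative, and the numbers are copied by hand from the two CSV hunks above. The "Other Sources Combined" row, which the commit removes, is left out of the comparison.

```python
# Row values copied by hand from the megapixels_origins_top.csv hunk.
# Labels were renamed in the commit, so rows are compared by position.
old_origins = {
    "Search Engines": 30127200,
    "Flickr.com": 11783888,
    "IMDb.com": 5251410,
    "CCTV": 959312,
    "Wikimedia.org": 183500,
    "Mugshots": 113268,
    "YouTube.com": 31888,
}
new_origins = {
    "Internet Search Engines": 15063600,
    "Flickr.com": 5891944,
    "Internet Movie Database (IMDB.com)": 2625705,
    "CCTV": 479656,
    "Wikimedia.org": 91750,
    "Mugshots": 56634,
    "YouTube.com": 15944,
}

# Row values copied by hand from the embassy_counts_summary_dataset.csv hunk.
embassy_counts = {
    "IBM Diversity in Faces": 389,
    "MegaFace": 5679,
    "VGG Face": 1,
    "Who Goes There": 2372,
}

# Is every remaining row exactly half of its previous value?
halved = all(
    old == 2 * new
    for old, new in zip(old_origins.values(), new_origins.values())
)

# Total behind the "24 million non-cooperative faces" claim.
total_origins = sum(new_origins.values())

# Embassy photos in the two datasets the text names as recognition training data.
recognition_embassy = (
    embassy_counts["MegaFace"] + embassy_counts["IBM Diversity in Faces"]
)

print(f"all rows halved: {halved}")
print(f"total origin images: {total_origins:,}")
print(f"embassy photos (MegaFace + IBM DiF): {recognition_embassy:,}")
print(f"embassy photos (Who Goes There): {embassy_counts['Who Goes There']:,}")
```

With the values as copied, every remaining origin row is exactly half of its previous value, the new rows sum to 24,225,233, and MegaFace plus IBM Diversity in Faces account for 6,068 embassy photos, which is consistent with the figures quoted in the new text.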