From e59b5e38a6dfcb61375686ec83a4606f50ab012d Mon Sep 17 00:00:00 2001
From: Adam Harvey
Date: Fri, 28 Jun 2019 18:35:49 +0200
Subject: msc ready v1

---
 .../research/_from_1_to_100_pixels/index.html | 20 ++------
 site/public/research/_introduction/index.html | 22 ++-------
 .../research/_what_computers_can_see/index.html | 20 ++------
 site/public/research/index.html | 13 ++++-
 .../research/munich_security_conference/index.html | 55 ++++++++++++----------
 5 files changed, 51 insertions(+), 79 deletions(-)

(limited to 'site/public/research')

diff --git a/site/public/research/_from_1_to_100_pixels/index.html b/site/public/research/_from_1_to_100_pixels/index.html
index 74f334cc..a978b264 100644
--- a/site/public/research/_from_1_to_100_pixels/index.html
+++ b/site/public/research/_from_1_to_100_pixels/index.html
@@ -50,27 +50,13 @@
-
-

From 1 to 100 Pixels

-
-
-
Posted
-
2018-12-04
-
-
-
By
-
Adam Harvey
-
- -
-
- -

High resolution insights from low resolution data

+

From 1 to 100 Pixels

+

High resolution insights from low resolution data

This post will be about the meaning of "face". How do people define it? How to biometrics researchers define it? How has it changed during the last decade.

What can you know from a very small amount of information?

    diff --git a/site/public/research/_introduction/index.html b/site/public/research/_introduction/index.html
    index 66905247..8b17c016 100644
    --- a/site/public/research/_introduction/index.html
    +++ b/site/public/research/_introduction/index.html
    @@ -50,27 +50,13 @@
    -
    -

    Introducing MegaPixels

    -
    -
    -
    Posted
    -
    2018-12-15
    -
    -
    -
    By
    -
    Adam Harvey
    -
    - -
    -
    - -

    Face recognition has become the focal point for ...

    +

    Introduction

    +

    Face recognition has become the focal point for ...

    Add 68pt landmarks animation

    But biometric currency is ...

    Add rotation 3D head

    @@ -82,7 +68,7 @@
  • Posted: Dec. 15
  • Author: Adam Harvey
-

Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting.

+

Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting. Paragraph text to test css formatting.

[ page under development ]

 This is the caption
This is the caption
diff --git a/site/public/research/_what_computers_can_see/index.html b/site/public/research/_what_computers_can_see/index.html
index 003dd733..35f6d47d 100644
--- a/site/public/research/_what_computers_can_see/index.html
+++ b/site/public/research/_what_computers_can_see/index.html
@@ -50,27 +50,13 @@
-
-

What Computers Can See

-
-
-
Posted
-
2018-12-15
-
-
-
By
-
Adam Harvey
-
- -
-
- -

Rosalind Picard on Affective Computing Podcast with Lex Fridman

+

What Computers Can See About Your Face

+

Rosalind Picard on Affective Computing Podcast with Lex Fridman

diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index 499d8e9f..b0503f84 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -4,7 +4,7 @@
 MegaPixels: MSC
-
+
@@ -50,31 +50,36 @@
-
-

MSC

-
-
-
Posted
-
2019-4-18
-
-
-
By
-
Adam Harvey
-
- -
-
- -
Analyzing the Transnational Flow of Facial Recognition Data
Where does face data originate and who's using it? -

[page under devlopment]

-

Intro paragraph.

-

[ add montage of extracted faces here]

-
 Placeholder caption
Placeholder caption
 Placeholder caption
Placeholder caption
 Placeholder caption
Placeholder caption
 Placeholder caption
Placeholder caption
+
Analyzing Transnational Flows of Face Recognition Image Training Data
Where does face data originate and who's using it? +

Face Datasets and Information Supply Chains

+

National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.

+

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.

+

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.

+

24 Million Non-Cooperative Faces

+

In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image researchers call "in the wild".

+

Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though all of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

+

6,000 Embassy Photos Being Used To Train Facial Recognition

+

Of the 5.8 million Flickr images we found over 6,000 public photos from Embassy Flickr accounts were used to train facial recognition technologies. These images were used in the MegaFace, IBM Diversity in Faces datasets. Over 2,000 more images were used in the Who Goes There datasets used for facial ethnicity analysis research. A few of the embassy images found in facial recognition datasets are shown below.

+
 An image in the MegaFace dataset obtained from United Kingdoms Embassy in Italy
An image in the MegaFace dataset obtained from United Kingdom's Embassy in Italy
+
 An image in the MegaFace dataset obtained from the Flickr account of the United States Embassy in Kabul, Afghanistan
An image in the MegaFace dataset obtained from the Flickr account of the United States Embassy in Kabul, Afghanistan
 An image in the MegaFace dataset obtained from U.S. Embassy Canberra
An image in the MegaFace dataset obtained from U.S. Embassy Canberra

This brief research aims to shed light on the emerging politics of data. A photo is no longer just a photo when it can also be surveillance training data, and datasets can no longer be separated from the development of software when software is now built with data. "Our relationship to computers has changed", says Geoffrey Hinton, one of the founders of modern day neural networks and deep learning. "Instead of programming them, we now show them and they figure it out." 1.

+

National AI strategies might also want to include transnational dataset strategies.

+

This research post is going and will updated during July and August, 2019.

+

Further Reading

+ +
@@ -83,8 +88,7 @@

Supplementary Information

-

[ add a download button for CSV data ]

-
+

Cite Our Work

@@ -101,7 +105,8 @@ }

-
+

References

--
cgit v1.2.3-70-g09d2

From 16159a8aaa290b5a499d1b5a503a463fdbabff52 Mon Sep 17 00:00:00 2001
From: Adam Harvey
Date: Fri, 28 Jun 2019 18:41:18 +0200
Subject: fix typos

---
 site/content/pages/research/munich_security_conference/index.md | 4 ++--
 site/public/research/munich_security_conference/index.html | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

(limited to 'site/public/research')

diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md
index 0f8a5bda..e232df46 100644
--- a/site/content/pages/research/munich_security_conference/index.md
+++ b/site/content/pages/research/munich_security_conference/index.md
@@ -27,9 +27,9 @@ authors: Adam Harvey
 National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.
-Our earlier research on the [MS Celeb](/datsets) and [Duke](/datsets/duke_mtmc) datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.
+Our [earlier research](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) on the [MS Celeb](/datasets/msceleb) and [Duke](/datasets/duke_mtmc) datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.
-In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.
+In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.
 ### 24 Million Non-Cooperative Faces
diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index b0503f84..c88be9db 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -58,8 +58,8 @@
Analyzing Transnational Flows of Face Recognition Image Training Data
Where does face data originate and who's using it?

Face Datasets and Information Supply Chains

National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.

-

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.

-

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.

+

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.

+

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.

24 Million Non-Cooperative Faces

In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image researchers call "in the wild".

Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though all of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

--
cgit v1.2.3-70-g09d2

From d7692193d14be01a0068749f3cd0a4402b2645ac Mon Sep 17 00:00:00 2001
From: Adam Harvey
Date: Fri, 28 Jun 2019 18:45:42 +0200
Subject: tpyos

---
 site/content/pages/research/munich_security_conference/index.md | 4 ++--
 site/public/research/munich_security_conference/index.html | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

(limited to 'site/public/research')

diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md
index e232df46..e0c28d49 100644
--- a/site/content/pages/research/munich_security_conference/index.md
+++ b/site/content/pages/research/munich_security_conference/index.md
@@ -29,14 +29,14 @@ National AI strategies often rely on transnational data sources to capitalize on
 Our [earlier research](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) on the [MS Celeb](/datasets/msceleb) and [Duke](/datasets/duke_mtmc) datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.
-In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.
+In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition datasets.
 ### 24 Million Non-Cooperative Faces
 In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image researchers call "in the wild".
-Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though all of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.
+Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though the vast majority of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.
diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index c88be9db..5665daa1 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -59,10 +59,10 @@

Face Datasets and Information Supply Chains

National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.

-

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.

+

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition datasets.

24 Million Non-Cooperative Faces

In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image researchers call "in the wild".

-

Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though all of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

+

Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though the vast majority of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

6,000 Embassy Photos Being Used To Train Facial Recognition

Of the 5.8 million Flickr images we found over 6,000 public photos from Embassy Flickr accounts were used to train facial recognition technologies. These images were used in the MegaFace, IBM Diversity in Faces datasets. Over 2,000 more images were used in the Who Goes There datasets used for facial ethnicity analysis research. A few of the embassy images found in facial recognition datasets are shown below.

 An image in the MegaFace dataset obtained from United Kingdoms Embassy in Italy
An image in the MegaFace dataset obtained from United Kingdom's Embassy in Italy
--
cgit v1.2.3-70-g09d2

From ef773b16f2d25e068933ef192561c3e00b9a4e44 Mon Sep 17 00:00:00 2001
From: Adam Harvey
Date: Fri, 28 Jun 2019 19:13:43 +0200
Subject: typos

---
 site/content/pages/research/munich_security_conference/index.md | 6 +++---
 site/public/research/munich_security_conference/index.html | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

(limited to 'site/public/research')

diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md
index e0c28d49..8627439d 100644
--- a/site/content/pages/research/munich_security_conference/index.md
+++ b/site/content/pages/research/munich_security_conference/index.md
@@ -94,9 +94,9 @@ Colors: categoryRainbow
 This brief research aims to shed light on the emerging politics of data. A photo is no longer just a photo when it can also be surveillance training data, and datasets can no longer be separated from the development of software when software is now built with data. "Our relationship to computers has changed", says Geoffrey Hinton, one of the founders of modern day neural networks and deep learning. "Instead of programming them, we now show them and they figure it out."[^hinton].
-National AI strategies might also want to include transnational dataset strategies.
+As data becomes more political, national AI strategies might also want to include transnational dataset strategies.
-*This research post is going and will updated during July and August, 2019.*
+*This research post is ongoing and will updated during July and August, 2019.*
 ### Further Reading
@@ -106,7 +106,7 @@ National AI strategies might also want to include transnational dataset strategi
 - [Unconstrained College Students Dataset Analysis](/datasets/uccs)
 - [Duke MTMC dataset author apologies to students](https://www.dukechronicle.com/article/2019/06/duke-university-facial-recognition-data-set-study-surveillance-video-students-china-uyghur)
 - [BBC coverage of MS Celeb dataset takedown](https://www.bbc.com/news/technology-48555149)
-- [Spiegel coverage of MS Celeb dataset takdown](https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html)
+- [Spiegel coverage of MS Celeb dataset takedown](https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html)
 {% include 'supplementary_header.html' %}
diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index 5665daa1..9263c772 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -67,8 +67,8 @@

Of the 5.8 million Flickr images we found over 6,000 public photos from Embassy Flickr accounts were used to train facial recognition technologies. These images were used in the MegaFace, IBM Diversity in Faces datasets. Over 2,000 more images were used in the Who Goes There datasets used for facial ethnicity analysis research. A few of the embassy images found in facial recognition datasets are shown below.

 An image in the MegaFace dataset obtained from United Kingdoms Embassy in Italy
An image in the MegaFace dataset obtained from United Kingdom's Embassy in Italy
 An image in the MegaFace dataset obtained from the Flickr account of the United States Embassy in Kabul, Afghanistan
An image in the MegaFace dataset obtained from the Flickr account of the United States Embassy in Kabul, Afghanistan
 An image in the MegaFace dataset obtained from U.S. Embassy Canberra
An image in the MegaFace dataset obtained from U.S. Embassy Canberra

This brief research aims to shed light on the emerging politics of data. A photo is no longer just a photo when it can also be surveillance training data, and datasets can no longer be separated from the development of software when software is now built with data. "Our relationship to computers has changed", says Geoffrey Hinton, one of the founders of modern day neural networks and deep learning. "Instead of programming them, we now show them and they figure it out." 1.

-

National AI strategies might also want to include transnational dataset strategies.

-

This research post is going and will updated during July and August, 2019.

+

As data becomes more political, national AI strategies might also want to include transnational dataset strategies.

+

This research post is ongoing and will updated during July and August, 2019.

Further Reading

--
cgit v1.2.3-70-g09d2

From 44deb13ac175d8e3eb843875e6be820d71940ac4 Mon Sep 17 00:00:00 2001
From: Adam Harvey
Date: Fri, 28 Jun 2019 19:15:23 +0200
Subject: title

---
 site/content/pages/research/munich_security_conference/index.md | 2 +-
 site/public/research/munich_security_conference/index.html | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

(limited to 'site/public/research')

diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md
index 8627439d..2b68a8e9 100644
--- a/site/content/pages/research/munich_security_conference/index.md
+++ b/site/content/pages/research/munich_security_conference/index.md
@@ -1,7 +1,7 @@
 ------------
 status: published
-title: MSC
+title: Transnational Flows of Face Recognition Image Training Data
 slug: munich-security-conference
 desc: Analyzing Transnational Flows of Face Recognition Image Training Data
 subdesc: Where does face data originate and who's using it?
diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index 66d8a190..02b0fcfe 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -1,11 +1,11 @@
- MegaPixels: MSC
+ MegaPixels: Transnational Flows of Face Recognition Image Training Data
-
+
--
cgit v1.2.3-70-g09d2

From 411aa602b9cf886758c4ff5ca5550c43ae7b7804 Mon Sep 17 00:00:00 2001
From: Jules Laplace
Date: Fri, 28 Jun 2019 21:42:01 -0400
Subject: copy

---
 site/public/research/munich_security_conference/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'site/public/research')

diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index 02b0fcfe..0b625f53 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -61,10 +61,10 @@

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.

In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition datasets.

24 Million Non-Cooperative Faces

-

In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image researchers call "in the wild".

+

In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image that researchers call "in the wild".

Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though the vast majority of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

6,000 Embassy Photos Being Used To Train Facial Recognition

-

Of the 5.8 million Flickr images we found over 6,000 public photos from Embassy Flickr accounts were used to train facial recognition technologies. These images were used in the MegaFace, IBM Diversity in Faces datasets. Over 2,000 more images were used in the Who Goes There datasets used for facial ethnicity analysis research. A few of the embassy images found in facial recognition datasets are shown below.

+

Of the 5.8 million Flickr images we found over 6,000 public photos from Embassy Flickr accounts were used to train facial recognition technologies. These images were used in the MegaFace and IBM Diversity in Faces datasets. Over 2,000 more images were included in the Who Goes There dataset, used for facial ethnicity analysis research. A few of the embassy images found in facial recognition datasets are shown below.

 An image in the MegaFace dataset obtained from United Kingdoms Embassy in Italy
An image in the MegaFace dataset obtained from United Kingdom's Embassy in Italy
 An image in the MegaFace dataset obtained from the Flickr account of the United States Embassy in Kabul, Afghanistan
An image in the MegaFace dataset obtained from the Flickr account of the United States Embassy in Kabul, Afghanistan
 An image in the MegaFace dataset obtained from U.S. Embassy Canberra
An image in the MegaFace dataset obtained from U.S. Embassy Canberra

This brief research aims to shed light on the emerging politics of data. A photo is no longer just a photo when it can also be surveillance training data, and datasets can no longer be separated from the development of software when software is now built with data. "Our relationship to computers has changed", says Geoffrey Hinton, one of the founders of modern day neural networks and deep learning. "Instead of programming them, we now show them and they figure it out." 1.

As data becomes more political, national AI strategies might also want to include transnational dataset strategies.

-- cgit v1.2.3-70-g09d2