From d7692193d14be01a0068749f3cd0a4402b2645ac Mon Sep 17 00:00:00 2001
From: Adam Harvey
Date: Fri, 28 Jun 2019 18:45:42 +0200
Subject: tpyos

---
 site/content/pages/research/munich_security_conference/index.md | 4 ++--
 site/public/research/munich_security_conference/index.html | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md
index e232df46..e0c28d49 100644
--- a/site/content/pages/research/munich_security_conference/index.md
+++ b/site/content/pages/research/munich_security_conference/index.md
@@ -29,14 +29,14 @@ National AI strategies often rely on transnational data sources to capitalize on
 Our [earlier research](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e) on the [MS Celeb](/datasets/msceleb) and [Duke](/datasets/duke_mtmc) datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.
-In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.
+In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition datasets.
 ### 24 Million Non-Cooperative Faces
 In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image researchers call "in the wild".
-Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though all of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.
+Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though the vast majority of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index c88be9db..5665daa1 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -59,10 +59,10 @@

Face Datasets and Information Supply Chains

National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.

Our earlier research on the MS Celeb and Duke datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.

-In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from Embassies that are currently being used in facial recognition datasets.

+In this new research for the Munich Security Conference's Transnational Security Report we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition datasets.

24 Million Non-Cooperative Faces

In total, we analyzed 30 publicly available face recognition and face analysis datasets that collectively include over 24 million non-cooperative images. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image researchers call "in the wild".

-Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though all of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

+Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the data and where it was being used. Even though the vast majority of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the country of the origin while the majority of citations are from China.

6,000 Embassy Photos Being Used To Train Facial Recognition

Of the 5.8 million Flickr images we found over 6,000 public photos from Embassy Flickr accounts were used to train facial recognition technologies. These images were used in the MegaFace, IBM Diversity in Faces datasets. Over 2,000 more images were used in the Who Goes There datasets used for facial ethnicity analysis research. A few of the embassy images found in facial recognition datasets are shown below.

 An image in the MegaFace dataset obtained from United Kingdoms Embassy in Italy
An image in the MegaFace dataset obtained from United Kingdom's Embassy in Italy
-- cgit v1.2.3-70-g09d2