Diffstat (limited to 'site/public/research/munich_security_conference/index.html')
-rw-r--r-- site/public/research/munich_security_conference/index.html 23
1 files changed, 10 insertions, 13 deletions
diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html
index b43df151..3b18f1cd 100644
--- a/site/public/research/munich_security_conference/index.html
+++ b/site/public/research/munich_security_conference/index.html
@@ -53,27 +53,24 @@
<a href="/research">Research</a>
</div>
</header>
- <div class="content content-dataset">
+ <div class="content content-blog">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'></section><section><p><em>A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report</em></p>
+ <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>An image from the MegaFace face recognition training dataset, taken from the Flickr account of the U.S. Embassy in Madrid</div></div></section><section><h1>Transnational Flows of Face Recognition Image Training Data</h1>
+<p><em>A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report</em></p>
</section><section><div class='right-sidebar'><div class='meta'><div class='gray'>Images Analyzed</div><div>24,302,637</div></div><div class='meta'><div class='gray'>Datasets Analyzed</div><div>30</div></div><div class='meta'><div class='gray'>Years</div><div>2006 - 2018</div></div><div class='meta'><div class='gray'>Last Updated</div><div>July 7, 2019</div></div><div class='meta'><div class='gray'>Text and Research</div><div>Adam Harvey</div></div><div class='meta'><div class='gray'>Published in</div><div><a href="https://tsr.securityconference.de/">Transnational Security Report</a></div></div></div><p>National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can achieve quick and significant gains across diverse sectors, from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.</p>
<p>Our <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">earlier research</a> on the <a href="/datasets/msceleb">MS Celeb</a> and <a href="/datasets/duke_mtmc">Duke</a> datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to oppressive surveillance in the Xinjiang region of China.</p>
<p>In this new research for the <a href="https://tsr.securityconference.de">Munich Security Conference's Transnational Security Report</a> we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets.</p>
-<div style="display:inline;" class="columns columns-1"><div class="column"><div style="background:#202020;border-radius:6px;padding:20px;width:100%">
-
-<h4>Key Findings</h4>
-
+<h3>Key Findings</h3>
<ul>
- <li>24 million non-cooperative images were used in facial recognition research projects</li>
- <li>Most data originated from US-based search engines and Flickr, but most research citations found in China</li>
- <li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li>
- <li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li>
+<li>24 million non-cooperative images were used in facial recognition research projects</li>
+<li>Most data originated from US-based search engines and Flickr, but most research citations were found in China</li>
+<li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li>
+<li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li>
</ul>
-
-</div></div></div><h3>24 Million Photos</h3>
+<h3>24 Million Photos</h3>
<p><strong>Origins</strong>: In total, we found over 24 million non-cooperative, non-consensual photos in 30 publicly available face recognition and face analysis datasets. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image that researchers call "in the wild". Every image contains at least one face and many photos contain multiple faces. There are approximately 1 million unique identities across all 24 million images.</p>
<p><strong>Endpoints</strong>: To understand the geographic dimensions of the data, we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the face data and where it was being used. Even though the vast majority of the images originated in the United States or from US companies, publicly available research papers show that only about 25% of the citations are from the United States, while the majority are from China. Because only English-language research papers were analyzed, the number of foreign research papers is likely to be larger, reflecting even greater foreign usage.</p>
-</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Origins of 24.3 million photos in publicly available face analysis datasets 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Endpoints of 1,134 facial analysis research projects citing 30 face analysis datasets", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt=''></div></section><section><h3>8,428 Embassy Photos Found in Facial Recognition Datasets</h3>
+</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Origins of 24.3 million photos in publicly available face analysis datasets 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Endpoints of 1,134 facial analysis research projects citing 30 face analysis datasets", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt='A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset'><div class='caption'>A photo from the U.S. Embassy in Tokyo found in a facial recognition training dataset</div></div></section><section><h3>8,428 Embassy Photos Found in Facial Recognition Datasets</h3>
<p>Out of the 24 million images analyzed, at least 8,428 embassy images were found in face recognition and facial analysis datasets. These images were found by cross-referencing Flickr IDs and URLs between datasets to locate 5,667 images in the MegaFace dataset, 389 images in the IBM Diversity in Faces dataset, and 2,372 images in the Who Goes There dataset. MegaFace is one of the most widely used publicly available face recognition datasets for academic, commercial, and defense-related research.</p>
<p>In total, these 8,428 images were found to be used in at least 42 countries, with most citations originating in China and most images originating from US embassies. The images were found to be used in research projects with links to commercial and defense organizations, including Google, Microsoft, the National University of Defense Technology in China, SenseTime, Tencent, Mitsubishi, ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain).</p>
</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv", "fields": ["Caption: Number of embassy photos included in each face recognition dataset", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/country_counts.csv", "fields": ["Caption: Number of photos per national embassy", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section></div></section><section><p>The embassy and consulate photos below were all found in either the MegaFace or IBM Diversity in Faces datasets. Consulates were only included if marked as "EMBASSY" by the <a href="https://www.state.gov/global-social-media-presence/">U.S. Department of State’s Social Media Presence List</a>. Photos below were chosen because they include an embassy logo. All photos originated on Flickr.com and were published with a Creative Commons license.</p>