| author | adamhrv <adam@ahprojects.com> | 2019-07-07 12:49:52 +0200 |
|---|---|---|
| committer | adamhrv <adam@ahprojects.com> | 2019-07-07 12:49:52 +0200 |
| commit | 89c4cfb95ce8a0b885c3145a33f22cea178f8ca8 (patch) | |
| tree | f99382908e3d05b1f3867e290342144018be08f4 /site/public/research | |
| parent | b17a04a669b0fed7cf6910ecb4886be9a29ee6b5 (diff) | |
update msc
Diffstat (limited to 'site/public/research')
| -rw-r--r-- | site/public/research/munich_security_conference/index.html | 82 |
1 files changed, 50 insertions, 32 deletions
diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html index e1f74482..0146508e 100644 --- a/site/public/research/munich_security_conference/index.html +++ b/site/public/research/munich_security_conference/index.html @@ -4,7 +4,7 @@ <title>MegaPixels: Transnational Flows of Face Recognition Image Training Data</title> <meta charset="utf-8" /> <meta name="author" content="Adam Harvey" /> - <meta name="description" content="Analyzing Transnational Flows of Face Recognition Image Training Data" /> + <meta name="description" content="Transnational Flows of Face Recognition Image Training Data" /> <meta property="og:title" content="MegaPixels: Transnational Flows of Face Recognition Image Training Data"/> <meta property="og:type" content="website"/> <meta property="og:summary" content="MegaPixels is an art and research project about face recognition datasets created \"in the wild\"/> @@ -55,55 +55,73 @@ </header> <div class="content content-dataset"> - <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Analyzing Transnational Flows of Face Recognition Image Training Data</span></div><div class='hero_subdesc'><span class='bgpad'>Where does face data originate and who's using it? -</span></div></div></section><section><h2>Face Datasets and Information Supply Chains</h2> + <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Transnational Flows of Face Recognition Image Training Data</span></div><div class='hero_subdesc'><span class='bgpad'>Where does face data originate and who's using it? 
+</span></div></div></section><section><p><em>A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report</em></p> </section><section><div class='right-sidebar'><div class='meta'><div class='gray'>Images Analyzed</div><div>24,302,637</div></div><div class='meta'><div class='gray'>Datasets Analyzed</div><div>30</div></div><div class='meta'><div class='gray'>Years</div><div>2006 - 2018</div></div><div class='meta'><div class='gray'>Status</div><div>Ongoing Investigation</div></div><div class='meta'><div class='gray'>Last Updated</div><div>June 28, 2019</div></div></div><p>National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. But new challenges emerge when national AI strategies collide with national interests.</p> -<p>Our <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">earlier research</a> on the <a href="/datasets/msceleb">MS Celeb</a> and <a href="/datasets/duke_mtmc">Duke</a> datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to the oppressive surveillance of Uighur Muslims in Xinjiang.</p> -<p>In this new research for the <a href="https://tsr.securityconference.de">Munich Security Conference's Transnational Security Report</a> we provide summary statistics about the origins and endpoints of facial recognition information supply chains. 
To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition datasets.</p> -<h3>24 Million Non-Cooperative Faces</h3> -<p>In total, we found over 24 million non-cooperative, non-consensual face images in 30 publicly available face recognition and face analysis datasets. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image that researchers call "in the wild".</p> -<p>Next we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the face data and where it was being used. Even though the vast majority of the images originated in the United States, the publicly available research citations show that only about 25% citations are from the United States while the majority of citations are from China.</p> -</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Sources of Publicly Available Face Image Training Data 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Locations Where Face Data Is Used Based on Public Research Citations", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section><h3>Over 6,000 Embassy Photos Found in Facial Recognition Training Datasets</h3> -<p>Out of the 24 million images analyzed, over 6,000 embassies images were found in face 
recognition training datasets. These images were found by cross-referencing the Flickr IDs between datasets to locate 5,667 images in the MegaFace dataset, 389 images in the IBM Diversity in Faces datasets. Both of these datasets are widely used in academic, industry, and defense research projects. An additional 2,372 more images were found in the Who Goes There dataset, which is used for facial ethnicity analysis research.</p> -<p>In total at least 8,428 embassy images are being used in facial recognition and facial analysis studies in at least 42 countries.</p> -</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/country_counts.csv", "fields": ["Caption: Photos from these embassies are being used to train face recognition software", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv", "fields": ["Caption: Embassy images were found in these datasets", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section></div></section><section><h4>Embassy Photos in Face Recognition Datasets</h4> -<p>The embassy and consulate photos below were all found in facial recognition training datasets MegaFace or IBM Diversity in Faces. Consulates were only included if marked as "EMBASSY" by the <a href="https://www.state.gov/global-social-media-presence/">U.S. Department of State’s Social Media Presence List</a>. 
Photos were chosen because of their inclusion of an embassy logo.</p> -</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/9607407530.jpg' alt=' US Embassy Yaounde, Cameroon'><div class='caption'> US Embassy Yaounde, Cameroon</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/4350550797.jpg' alt=' US Embassy Madrid'><div class='caption'> US Embassy Madrid</div></div> +<p>Our <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">earlier research</a> on the <a href="/datasets/msceleb">MS Celeb</a> and <a href="/datasets/duke_mtmc">Duke</a> datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to oppressive surveillance in the Xinjiang region of China.</p> +<p>In this new research for the <a href="https://tsr.securityconference.de">Munich Security Conference's Transnational Security Report</a> we provide summary statistics about the origins and endpoints of facial recognition information supply chains. 
To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets.</p> +<div style="display:inline;" class="columns columns-1"><div class="column"><div style="background:#202020;border-radius:6px;padding:20px;width:100%"> + +<h4>Key Findings</h4> + +<ul> + <li>24 million non-cooperative images were used in facial recognition research projects</li> + <li>Most data originated from US-based search engines and Flickr, but most research citations found in China</li> + <li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li> + <li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li> +</ul> + +</div></div></div><h3>24 Million Photos</h3> +<p><strong>Origins</strong>: In total, we found over 24 million non-cooperative, non-consensual photos in 30 publicly available face recognition and face analysis datasets. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image that researchers call "in the wild". Every image contains at least one face and many photos contain multiple faces. There are approximately 1 million unique identities across all 24 million images.</p> +<p><strong>Endpoints</strong>:To understand the geographic dimensions of the data, we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the face data and where it was being used. 
Even though the vast majority of the images originated in the United States or from US companies, publicly available research papers show that only about 25% of the citations are from the United States while the majority are from China. Because only English research papers were analyzed the number of foreign research papers is likely to be larger and reflect increased foreign usage.</p> +</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Origins of 24.3 million photos in publicly available face analysis datasets 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Endpoints of 1,134 facial analysis research projects citing 30 face analysis datasets", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt=''></div></section><section><h3>8,428 Embassy Photos Found in Facial Recognition Datasets</h3> +<p>Out of the 24 million images analyzed, at least 8,428 embassy images were found in face recognition and facial analysis datasets. These images were found by cross-referencing Flickr IDs and URLs between datasets to locate 5,667 images in the MegaFace dataset, 389 images in the IBM Diversity in Faces datasets, and 2,372 images in the Who Goes There dataset. 
MegaFace is one of the most widely used publicly available face recognition datasets for academic, commercial, and defense-related research.</p> +<p>In total, these 8,428 images were found to be used in at least 42 countries with most citations originating in China and most images originating from US embassies.</p> +</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv", "fields": ["Caption: Number of embassy photos incluced in each face recognition dataset", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/country_counts.csv", "fields": ["Caption: Number of photos per national embassy", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section></div></section><section><p>The embassy and consulate photos below were all found in either the MegaFace or IBM Diversity in Faces datasets. Consulates were only included if marked as "EMBASSY" by the <a href="https://www.state.gov/global-social-media-presence/">U.S. Department of State’s Social Media Presence List</a>. Photos below were chosen because of inclusion of an embassy logo. 
All photos originated on Flickr.com and were published with a Creative Commons license.</p> +<p>All images were found to be used in research projects with links to commercial and defense organization including Google, Microsoft, National University of Defense Technology in China, SenseTime, Tencent, Mitsubishi, ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain).</p> +</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/4730007024.jpg' alt=' US Embassy Canberra'><div class='caption'> US Embassy Canberra</div></div> +<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7645865468.jpg' alt=' US Embassy Kingston'><div class='caption'> US Embassy Kingston</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/4350550797.jpg' alt=' US Embassy Madrid'><div class='caption'> US Embassy Madrid</div></div> <div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/4625883763.jpg' alt=' US Embassy Kabul'><div class='caption'> US Embassy Kabul</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/5906549160.jpg' alt=' US Embassy San Jose'><div class='caption'> US Embassy San Jose</div></div> <div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/6862454118.jpg' alt=' US Embassy Romania'><div class='caption'> US Embassy Romania</div></div></section><section class='images'><div class='image'><img 
src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/8225846629.jpg' alt=' US Embassy Stockholm'><div class='caption'> US Embassy Stockholm</div></div> -<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/9246033391.jpg' alt=' US Embassy Malta'><div class='caption'> US Embassy Malta</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/4749096858.jpg' alt=' US Embassy Kabul Flickr photo found in the MegaFace dataset'><div class='caption'> US Embassy Kabul Flickr photo found in the MegaFace dataset</div></div> -<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/4730007024.jpg' alt=' US Embassy Canberra Flickr photo found in the MegaFace dataset'><div class='caption'> US Embassy Canberra Flickr photo found in the MegaFace dataset</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt=' US Embassy Tokyo Flickr photo in the MegaFace dataset'><div class='caption'> US Embassy Tokyo Flickr photo in the MegaFace dataset</div></div> -<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7645865468.jpg' alt=' US Embassy Kingston Flickr photo in MegaFace dataset'><div class='caption'> US Embassy Kingston Flickr photo in MegaFace dataset</div></div></section><section><p>To make this analysis slightly more personal for Munich Security Conference readers, several photos from the US Consulate in Munich were found. 
Coincidentally, one of the images is from the Deutsch-amerikanischer Datenschutztag.</p> -</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7208430726.jpg' alt=' US Consulate Munich Deutsch-amerikanischer Datenschutztag (data protection day) . Photo found in the MegaFace face recognition training dataset '><div class='caption'> US Consulate Munich Deutsch-amerikanischer Datenschutztag (data protection day) . Photo found in the MegaFace face recognition training dataset </div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7241284424.jpg' alt=' US Consulate Munich Flickr image in the MegaFace dataset'><div class='caption'> US Consulate Munich Flickr image in the MegaFace dataset</div></div></section><section><p>This brief research aims to shed light on the emerging politics of data. A photo is no longer just a photo when it can also be surveillance training data, and datasets can no longer be separated from the development of software when software is now built with data. "Our relationship to computers has changed", says Geoffrey Hinton, one of the founders of modern day neural networks and deep learning. 
"Instead of programming them, we now show them and they figure it out."<a class="footnote_shim" name="[^hinton]_1"> </a><a href="#[^hinton]" class="footnote" title="Footnote 1">1</a>.</p> +<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/9246033391.jpg' alt=' US Embassy Malta'><div class='caption'> US Embassy Malta</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/4749096858.jpg' alt=' US Embassy Kabul'><div class='caption'> US Embassy Kabul</div></div> +<div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/9607407530.jpg' alt=' US Embassy Yaounde, Cameroon'><div class='caption'> US Embassy Yaounde, Cameroon</div></div></section><section><p>To make this analysis slightly more personal for Munich Security Conference readers, several photos from the US Consulate in Munich were located. Coincidentally, one of the images is from the Deutsch-amerikanischer Datenschutztag symposium (data protection day).</p> +</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7208430726.jpg' alt=' US Consulate Munich Deutsch-amerikanischer Datenschutztag (data protection day). Photo found in the MegaFace face recognition training dataset '><div class='caption'> US Consulate Munich Deutsch-amerikanischer Datenschutztag (data protection day). 
Photo found in the MegaFace face recognition training dataset </div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7241284424.jpg' alt=' US Consulate Munich image found in the MegaFace dataset'><div class='caption'> US Consulate Munich image found in the MegaFace dataset</div></div></section><section><p>This brief research aims to shed light on the emerging politics of data. A photo is no longer just a photo when it can also be surveillance training data, and datasets can no longer be separated from the development of software when software is now built with data. "Our relationship to computers has changed", says Geoffrey Hinton, one of the founders of modern day neural networks and deep learning. "Instead of programming them, we now show them and they figure it out."<a class="footnote_shim" name="[^hinton]_1"> </a><a href="#[^hinton]" class="footnote" title="Footnote 1">1</a>. Data is a new kind of code.</p> <p>As data becomes more political, national AI strategies might also want to include transnational dataset strategies.</p> -<p><em>This research post is ongoing and will updated during July and August, 2019.</em></p> +<p><em>Research and text: © Adam Harvey</em></p> +</section><section> + + <div class="hr-wave-holder"> + <div class="hr-wave-line hr-wave-line1"></div> + <div class="hr-wave-line hr-wave-line2"></div> + </div> + + <h2>Supplementary Information</h2> + +</section><section><!-- ``` +load_file /path/to/embassy_counts_public +Headings: Images, Dataset, Embassy, Flickr ID, URL, Guest, Host +``` --> + <h3>FAQ</h3> <ul> -<li><strong>Why are most photos from US Embassies?</strong> Most Flickr accounts cross-referenced are from the US State Department's social media account list. 
</li> +<li><strong>Why are most photos from US Embassies?</strong> Most Flickr accounts cross-referenced are from the US State Department's social media account list. But also because Flickr is a US-based, English-formatted site.</li> <li><strong>Why are most photos from the MegaFace dataset?</strong> Probably because MegaFace is such a large dataset. It includes about 4.7 million images from Flickr. IBM's Diversity in Faces contains far fewer, around 1 million. Only the photos with embassy logos were displayed on this page.</li> +<li><strong>Why is the Who Goes There dataset included if it's not explicitly for "face recognition"?</strong> Ethnicity analysis is part of a broader group of facial analysis algorithms that include recognition of identity, age, gender, pose, emotion, and facial attributes. Ethnicity analysis can be used to recognize ethnic affiliations, which contributes to identity analysis. Who Goes There dataset is included because it contributes to remote biometric identification analysis research.</li> </ul> +<h3>Data Sources</h3> +<p>The list of of embassies used for this analysis are from the <a href="https://www.state.gov/global-social-media-presence/">U.S. Department of State’s Social Media Presence List</a> combined with manual search results. In some cases, the official U.S. Dept. of State list describes consulates and missions as embassies. For example, the US Consulate Munich and the US Mission Canada are marked as "EMBASSY". Consulates and missions listed as embassies by the U.S. Dept. 
of State list are included in this analysis.</p> +<p>The Who Goes There dataset is used for ethnicity analysis and is included because ethnicity analysis can be used as part of facial recognition.</p> +<p>Citations are gathered from <a href="https://SemanticScholar.com">SemanticScholar.com</a>.</p> <h3>Further Reading</h3> <ul> <li><a href="/datasets/msceleb">MS Celeb Dataset Analysis</a></li> -<li><a href="/datasets/brainwash">Brainwash Dataset Analysis</a></li> <li><a href="/datasets/duke_mtmc">Duke MTMC Dataset Analysis</a></li> -<li><a href="/datasets/uccs">Unconstrained College Students Dataset Analysis</a></li> <li><a href="https://www.dukechronicle.com/article/2019/06/duke-university-facial-recognition-data-set-study-surveillance-video-students-china-uyghur">Duke MTMC dataset author apologies to students</a></li> <li><a href="https://www.bbc.com/news/technology-48555149">BBC coverage of MS Celeb dataset takedown</a></li> <li><a href="https://www.spiegel.de/netzwelt/web/microsoft-gesichtserkennung-datenbank-mit-zehn-millionen-fotos-geloescht-a-1271221.html">Spiegel coverage of MS Celeb dataset takedown</a></li> </ul> </section><section> - <div class="hr-wave-holder"> - <div class="hr-wave-line hr-wave-line1"></div> - <div class="hr-wave-line hr-wave-line2"></div> - </div> - - <h2>Supplementary Information</h2> - -</section><section class='applet_container'><div class='applet' data-payload='{"command": "load_file /site/research/munich_security_conference/assets/embassy_counts_public.csv", "fields": ["Headings: Images, Dataset, Embassy, Flickr ID, URL, Guest, Host"]}'></div></section><section><p>The list of of embassies used for this analysis are from the <a href="https://www.state.gov/global-social-media-presence/">U.S. Department of State’s Social Media Presence List</a> combined with manual search results. In some cases, the official U.S. Dept. of State list describes consulates and missions as embassies. 
For example, the US Consulate Munich and the US Mission Canada are marked as "EMBASSY". Consulates and missions listed as embassies by the U.S. Dept. of State list are included in this analysis.</p> -</section><section> - <h4>Cite Our Work</h4> <p> |
