| Field | Value | Date |
|---|---|---|
| author | adamhrv <adam@ahprojects.com> | 2019-10-08 18:39:21 +0200 |
| committer | adamhrv <adam@ahprojects.com> | 2019-10-08 18:39:21 +0200 |
| commit | 59fbaaa9eb5539b89a6ed43682a8074356b1366d | |
| tree | 92b5aad51c0e6af27f99b68af528d6ddc3854c9e | |
| parent | 3e8e7b476dbf8f0e3ab9347a078494a16d0cdfb6 | |

styling

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | TODO.md | 20 |
| -rw-r--r-- | client/modalImage/modal.css | 1 |
| -rw-r--r-- | site/assets/css/css.css | 39 |
| -rw-r--r-- | site/content/pages/about/index.md | 18 |
| -rw-r--r-- | site/content/pages/datasets/brainwash/index.md | 4 |
| -rw-r--r-- | site/content/pages/datasets/index.md | 2 |
| -rw-r--r-- | site/content/pages/datasets/megaface/index.md | 37 |
| -rw-r--r-- | site/content/pages/research/munich_security_conference/index.md | 22 |
| -rw-r--r-- | site/public/about/index.html | 18 |
| -rw-r--r-- | site/public/datasets/brainwash/index.html | 4 |
| -rw-r--r-- | site/public/datasets/index.html | 2 |
| -rw-r--r-- | site/public/datasets/megaface/index.html | 22 |
| -rw-r--r-- | site/public/research/munich_security_conference/index.html | 23 |

13 files changed, 140 insertions, 72 deletions
@@ -1,14 +1,22 @@ # TODO -## CSS +## Updates for NYT October 13/14 -- change font size in Tabulator to 12px (can't find where to edit it) +### Charts -## Charts, JS +- fix background hover on bar graph chart? remove extra black bg -- can we make the age/gender all in one include? -- can we auto-add download links to age/gender csv? -- can the pie chart labels keep same order as in CSV? +### CSS +- change tabulator to white/gry +- add a markdown parser? to handle a new update blurb at top of dataset pages. For now I used CSS selectors, but seems brittle +- can you check mobile css? my white-style edits might have broken +### MegaFace Dataset + +- origin needs to be fixed (in the ocean) + +### Homepage + +- feel like we need a "face" diff --git a/client/modalImage/modal.css b/client/modalImage/modal.css index cc9a1f32..c8ef9b60 100644 --- a/client/modalImage/modal.css +++ b/client/modalImage/modal.css @@ -32,6 +32,7 @@ display: block; text-align: center; /*background: black;*/ + color: #FFF; padding: 10px; } .modal .prev span, diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 75f1ad3f..ae22fa1a 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -156,7 +156,7 @@ footer { display: flex; flex-direction: row; justify-content: space-between; - color: #000; + color: #ccc; font-size: 13px; /*line-height: 17px;*/ padding: 15px; @@ -179,6 +179,9 @@ footer a { padding-bottom: 1px; text-decoration: none; } +.desktop footer a { + border-bottom:1px solid #999; +} .desktop footer a:hover { color: #fff; border-bottom:1px solid #999; @@ -214,7 +217,8 @@ footer ul:last-child li { h1 { color: #000; font-weight: 500; - font-size: 30pt; + font-size: 28pt; + line-height: 38pt; margin: 20px auto 10px auto; padding: 0; transition: color 0.1s cubic-bezier(0,0,1,1); @@ -382,12 +386,24 @@ section h1, section h2, section h3, section h4, section h5, section h6, section } .content-dataset section:nth-child(4) p:nth-child(2){ font-size:20px; - line-height: 
32px; + line-height: 34px; color:#000; } .content-dataset section:nth-child(3) p:nth-child(2) { /* highlight news text */ + /*font-style: italic;*/ + font-weight: 500; + color:#f00; +} +.content-dataset section:nth-child(3) p:nth-child(2) a{ + /* highlight news text */ + color:#f00; + border-bottom: 1px solid #f00; +} +.content-dataset section:nth-child(3) p:nth-child(2) a:hover{ + /* highlight news text */ color:#f00; + border-bottom: 1px solid #f00; } p.subp{ font-size: 14px; @@ -492,15 +508,14 @@ p.subp{ /* lists */ ul { - list-style-type: none; + list-style-type: square; margin: 0 0 30px 0; padding: 0; } ul li { margin-bottom: 8px; - color: #333; font-weight: 400; - font-size: 14px; + font-size: 15px; } /* misc formatting */ @@ -626,8 +641,8 @@ ul.footnotes p { font-family: 'Roboto Mono', monospace; font-weight: 400; text-transform: uppercase; - color: #666; - font-size: 11pt; + color: #333; + font-size: 12pt; } /* images */ @@ -1154,8 +1169,8 @@ ul.map-legend li.source:before { font-weight: 300; } .content-about section:first-of-type > p:first-of-type { - font-size: 22px; - line-height: 40px; + font-size: 26px; + line-height: 42px; } .content-about .about-menu ul li { display: inline-block; @@ -1258,13 +1273,13 @@ ul.map-legend li.source:before { /* footnotes */ a.footnote { - font-size: 9px; + font-size: 10px; line-height: 0px; position: relative; /*display: inline-block;*/ bottom: 7px; text-decoration: none; - color: #666; + color: #333; border: 0; left: -1px; transition-duration: 0s; diff --git a/site/content/pages/about/index.md b/site/content/pages/about/index.md index 90072b37..f07a79ee 100644 --- a/site/content/pages/about/index.md +++ b/site/content/pages/about/index.md @@ -35,8 +35,13 @@ MegaPixels is an independent project, designed as a public resource for educator A dataset of verified geocoded citations and dataset statistics will be published in Fall 2019 along with a research paper as part of a research fellowship for [KIM (Critical Artificial 
Intelligence) Karlsruhe HfG](http://kim.hfg-karlsruhe.de/). +#### Team -### Selected News and Exhibitions +- [Adam Harvey](https://ahprojects.com): Concept, research and analysis, design, computer vision +- [Jules LaPlace](https://asdf.us): Information and systems architecture, data management, citation geocoding, web applications + + +### News and Publications - July 2019: New York Times writes about MegaPixels and how "[Facial Recognition Tech Is Growing Stronger, Thanks to Your Face](https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html)" - June 2019 - 2020: MegaPixels installation at Ars Electronica Center (AT) exhibition ["Compass - Navigating the Future"](https://ars.electronica.art/center/en/megapixels) @@ -46,18 +51,13 @@ A dataset of verified geocoded citations and dataset statistics will be publishe Read more [news](/about/news) -##### Team - -- Adam Harvey: Concept, research and analysis, design, computer vision -- Jules LaPlace: Information and systems architecture, data management, web applications - -##### Contributing Researchers +#### Contributing Researchers - Beth (aka Ms. 
Celeb) - Berit Gilma - Mathana Stender -##### Code and Libraries +#### Code and Libraries - [Semantic Scholar](https://semanticscholar.org) for citation aggregation - Leaflet.js for maps @@ -66,7 +66,7 @@ Read more [news](/about/news) - PDFMiner.Six and Pandas for research paper analysis -##### Attribution +#### Attribution If you use MegaPixels or any data derived from it for your work, please cite our original work as follows: diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md index 6d2279cb..a61c007c 100644 --- a/site/content/pages/datasets/brainwash/index.md +++ b/site/content/pages/datasets/brainwash/index.md @@ -4,7 +4,7 @@ status: published title: Brainwash Dataset desc: Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco subdesc: It includes 11,917 images of "everyday life of a busy downtown cafe" and is used for training face and head detection algorithms -caption: One of 11,917 images from the Brainwash dataset captured from the Brainwash Cafe in San Francisco +caption: One of the 11,917 images in the Brainwash dataset captured from the Brainwash Cafe in San Francisco slug: brainwash cssclass: dataset image: assets/background.jpg @@ -17,7 +17,7 @@ authors: Adam Harvey # Brainwash Dataset -*Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."* +Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor." ### sidebar diff --git a/site/content/pages/datasets/index.md b/site/content/pages/datasets/index.md index 54912242..f3d5fea0 100644 --- a/site/content/pages/datasets/index.md +++ b/site/content/pages/datasets/index.md @@ -16,4 +16,4 @@ sync: false Explore face and person recognition datasets contributing to the growing crisis of biometric surveillance technologies. 
This first group of 5 datasets focuses on image usage connected to foreign surveillance and defense organizations. -In response to the analyses below, the [Brainwash](https://purl.stanford.edu/sx925dc9385), [Duke MTMC](http://vision.cs.duke.edu/DukeMTMC/), and [MS Celeb](http://msceleb.org/) datasets have been taken down by their authors. The [UCCS](https://vast.uccs.edu/Opensetface/) dataset was temporarily deactivated due to metadata exposure. Read more [news](/about/news). A more complete list of datasets and research will be published in September 2019. These 5 are only a preview. +In response to the analyses below, the [Brainwash](/datasets/brainwash), [Duke MTMC](/datasets/duke_mtmc), and [MS Celeb](/datasets/msceleb/) datasets have been taken down by their authors. The [UCCS](/dataests/uccs/) dataset was temporarily deactivated due to metadata exposure. Read more [news](/about/news). A more complete list of datasets and research will be published in September 2019. These 5 are only a preview. diff --git a/site/content/pages/datasets/megaface/index.md b/site/content/pages/datasets/megaface/index.md index 2009e70e..9c282cb2 100644 --- a/site/content/pages/datasets/megaface/index.md +++ b/site/content/pages/datasets/megaface/index.md @@ -7,6 +7,7 @@ subdesc: MegaFace contains 670K identities and 4.7M images caption: Example images from the MegaFace dataset slug: megaface cssclass: dataset +caption: Images from the MegaFace face recognition training and benchmarking dataset image: assets/background.jpg year: 2016 published: 2019-4-18 @@ -15,12 +16,44 @@ authors: Adam Harvey ------------ -## MegaFace +# MegaFace ### sidebar ### end sidebar -MegaFace is a dataset... +MegaFace is a dataset of 4,700,000 face images of 672,000 individuals used for developing face recognition technologies. All images were downloaded from Flickr. 
+ +#### How was it made + +MegaFace was developed by the University of Washington for the purpose of trainng, validating, and benchmarking face recognition algorithms. + +The images are from Flickr, but are they all from YFCC100M? + +#### Who used it + +MegaFace was used for research projects associated with SenseTime, Google, Mitsubishi, Vision Semantics Ltd, Microsoft. + +#### Subsets + +MegaFace was also used for MegaFace Asian, and MegaAge, and glasses. + +#### A sample of the research projects + +Used for face recognition + +screenshots of papers + +#### Visuals + +- facial landmarks +- bounding boxes +- animation of all the titles of the paper +- + +### + + + {% include 'dashboard.html' %} diff --git a/site/content/pages/research/munich_security_conference/index.md b/site/content/pages/research/munich_security_conference/index.md index 365ee404..75392dc3 100644 --- a/site/content/pages/research/munich_security_conference/index.md +++ b/site/content/pages/research/munich_security_conference/index.md @@ -5,7 +5,8 @@ title: Transnational Flows of Face Recognition Image Training Data slug: munich-security-conference desc: Transnational Flows of Face Recognition Image Training Data subdesc: Where does face data originate and who's using it? -cssclass: dataset +caption: An image from the MegaFace face recognition training dataset taken from the U.S. 
Embassy of Madrid Flickr account +cssclass: blog image: assets/background.jpg published: 2019-6-28 updated: 2019-6-29 @@ -13,6 +14,7 @@ authors: Adam Harvey ------------ +# Transnational Flows of Face Recognition Image Training Data *A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report* @@ -33,19 +35,13 @@ Our [earlier research](https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d In this new research for the [Munich Security Conference's Transnational Security Report](https://tsr.securityconference.de) we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets. -<div style="display:inline;" class="columns columns-1"><div class="column"><div style="background:#202020;border-radius:6px;padding:20px;width:100%"> -<h4>Key Findings</h4> - -<ul> - <li>24 million non-cooperative images were used in facial recognition research projects</li> - <li>Most data originated from US-based search engines and Flickr, but most research citations found in China</li> - <li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li> - <li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li> -</ul> - -</div></div></div> +### Key Findings +- 24 million non-cooperative images were used in facial recognition research prects +- Most data originated from US-based search engines and Flickr, but most research citations found in China +- Over 6,000 of the images were from US, British, Italian, and French embassies 
(mostly US embassies) +- Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China) ### 24 Million Photos @@ -74,7 +70,7 @@ OtherLabel: Other === end columns - + ### 8,428 Embassy Photos Found in Facial Recognition Datasets diff --git a/site/public/about/index.html b/site/public/about/index.html index 427a97a2..e5a120d1 100644 --- a/site/public/about/index.html +++ b/site/public/about/index.html @@ -69,7 +69,12 @@ <p>MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the same technology companies who created many of the datasets presented on this site.</p> <p>MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and citations. 
MegaPixels is a website-first research project, with an academic publication to follow in fall 2019.</p> <p>A dataset of verified geocoded citations and dataset statistics will be published in Fall 2019 along with a research paper as part of a research fellowship for <a href="http://kim.hfg-karlsruhe.de/">KIM (Critical Artificial Intelligence) Karlsruhe HfG</a>.</p> -<h3>Selected News and Exhibitions</h3> +<h4>Team</h4> +<ul> +<li><a href="https://ahprojects.com">Adam Harvey</a>: Concept, research and analysis, design, computer vision</li> +<li><a href="https://asdf.us">Jules LaPlace</a>: Information and systems architecture, data management, citation geocoding, web applications</li> +</ul> +<h3>News and Publications</h3> <ul> <li>July 2019: New York Times writes about MegaPixels and how "<a href="https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html">Facial Recognition Tech Is Growing Stronger, Thanks to Your Face</a>" </li> <li>June 2019 - 2020: MegaPixels installation at Ars Electronica Center (AT) exhibition <a href="https://ars.electronica.art/center/en/megapixels">"Compass - Navigating the Future"</a> </li> @@ -77,18 +82,13 @@ <li>June 26, 2019: The Atlantic writes about image training datasets "in the wild" and research ethics: <a href="https://www.theatlantic.com/technology/archive/2019/06/universities-record-students-campuses-research/592537/">Universities Record Students on Campuses for Research</a> by Sidney Fussell</li> </ul> <p>Read more <a href="/about/news">news</a></p> -<h5>Team</h5> -<ul> -<li>Adam Harvey: Concept, research and analysis, design, computer vision</li> -<li>Jules LaPlace: Information and systems architecture, data management, web applications</li> -</ul> -<h5>Contributing Researchers</h5> +<h4>Contributing Researchers</h4> <ul> <li>Beth (aka Ms. 
Celeb)</li> <li>Berit Gilma</li> <li>Mathana Stender</li> </ul> -<h5>Code and Libraries</h5> +<h4>Code and Libraries</h4> <ul> <li><a href="https://semanticscholar.org">Semantic Scholar</a> for citation aggregation</li> <li>Leaflet.js for maps</li> @@ -96,7 +96,7 @@ <li>ThreeJS for 3D visualizations</li> <li>PDFMiner.Six and Pandas for research paper analysis</li> </ul> -<h5>Attribution</h5> +<h4>Attribution</h4> <p>If you use MegaPixels or any data derived from it for your work, please cite our original work as follows:</p> <pre> @online{megapixels, diff --git a/site/public/datasets/brainwash/index.html b/site/public/datasets/brainwash/index.html index d715d163..efd2f5a8 100644 --- a/site/public/datasets/brainwash/index.html +++ b/site/public/datasets/brainwash/index.html @@ -55,8 +55,8 @@ </header> <div class="content content-dataset"> - <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>One of 11,917 images from the Brainwash dataset captured from the Brainwash Cafe in San Francisco</div></div></section><section><h1>Brainwash Dataset</h1> -<p><em>Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."</em></p> + <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>One of the 11,917 images in the Brainwash dataset captured from the Brainwash Cafe in San Francisco</div></div></section><section><h1>Brainwash Dataset</h1> +<p>Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."</p> </section><section><div class='right-sidebar'><div class='meta'> <div 
class='gray'>Published</div> <div>2015</div> diff --git a/site/public/datasets/index.html b/site/public/datasets/index.html index c17caeb0..a354a2d5 100644 --- a/site/public/datasets/index.html +++ b/site/public/datasets/index.html @@ -59,7 +59,7 @@ <div class='dataset-heading'> <section><h1>Dataset Analyses</h1> <p>Explore face and person recognition datasets contributing to the growing crisis of biometric surveillance technologies. This first group of 5 datasets focuses on image usage connected to foreign surveillance and defense organizations.</p> -<p>In response to the analyses below, the <a href="https://purl.stanford.edu/sx925dc9385">Brainwash</a>, <a href="http://vision.cs.duke.edu/DukeMTMC/">Duke MTMC</a>, and <a href="http://msceleb.org/">MS Celeb</a> datasets have been taken down by their authors. The <a href="https://vast.uccs.edu/Opensetface/">UCCS</a> dataset was temporarily deactivated due to metadata exposure. Read more <a href="/about/news">news</a>. A more complete list of datasets and research will be published in September 2019. These 5 are only a preview.</p> +<p>In response to the analyses below, the <a href="/datasets/brainwash">Brainwash</a>, <a href="/datasets/duke_mtmc">Duke MTMC</a>, and <a href="/datasets/msceleb/">MS Celeb</a> datasets have been taken down by their authors. The <a href="/dataests/uccs/">UCCS</a> dataset was temporarily deactivated due to metadata exposure. Read more <a href="/about/news">news</a>. A more complete list of datasets and research will be published in September 2019. 
These 5 are only a preview.</p> </section> </div> diff --git a/site/public/datasets/megaface/index.html b/site/public/datasets/megaface/index.html index 78f6a0cc..7ef1842f 100644 --- a/site/public/datasets/megaface/index.html +++ b/site/public/datasets/megaface/index.html @@ -55,7 +55,7 @@ </header> <div class="content content-dataset"> - <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/megaface/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>Example images from the MegaFace dataset</div></div></section><section><h2>MegaFace</h2> + <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/megaface/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>Images from the MegaFace face recognition training and benchmarking dataset</div></div></section><section><h1>MegaFace</h1> </section><section><div class='right-sidebar'><div class='meta'> <div class='gray'>Published</div> <div>2016</div> @@ -71,7 +71,25 @@ </div><div class='meta'> <div class='gray'>Website</div> <div><a href='http://megaface.cs.washington.edu/' target='_blank' rel='nofollow noopener'>washington.edu</a></div> - </div></div><p>MegaFace is a dataset...</p> + </div></div><p>MegaFace is a dataset of 4,700,000 face images of 672,000 individuals used for developing face recognition technologies. 
All images were downloaded from Flickr.</p> +<h4>How was it made</h4> +<p>MegaFace was developed by the University of Washington for the purpose of trainng, validating, and benchmarking face recognition algorithms.</p> +<p>The images are from Flickr, but are they all from YFCC100M?</p> +<h4>Who used it</h4> +<p>MegaFace was used for research projects associated with SenseTime, Google, Mitsubishi, Vision Semantics Ltd, Microsoft.</p> +<h4>Subsets</h4> +<p>MegaFace was also used for MegaFace Asian, and MegaAge, and glasses.</p> +<h4>A sample of the research projects</h4> +<p>Used for face recognition</p> +<p>screenshots of papers</p> +<h4>Visuals</h4> +<ul> +<li>facial landmarks</li> +<li>bounding boxes</li> +<li>animation of all the titles of the paper</li> +<li></li> +</ul> +<h2>#</h2> </section><section> <h3>Who used MegaFace Dataset?</h3> diff --git a/site/public/research/munich_security_conference/index.html b/site/public/research/munich_security_conference/index.html index b43df151..3b18f1cd 100644 --- a/site/public/research/munich_security_conference/index.html +++ b/site/public/research/munich_security_conference/index.html @@ -53,27 +53,24 @@ <a href="/research">Research</a> </div> </header> - <div class="content content-dataset"> + <div class="content content-blog"> - <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'></section><section><p><em>A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report</em></p> + <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>An image from the MegaFace face recognition training dataset taken from the U.S. 
Embassy of Madrid Flickr account</div></div></section><section><h1>Transnational Flows of Face Recognition Image Training Data</h1> +<p><em>A case study on publicly available facial recognition datasets for the Munich Security Conference's Transnational Security Report</em></p> </section><section><div class='right-sidebar'><div class='meta'><div class='gray'>Images Analyzed</div><div>24,302,637</div></div><div class='meta'><div class='gray'>Datasets Analyzed</div><div>30</div></div><div class='meta'><div class='gray'>Years</div><div>2006 - 2018</div></div><div class='meta'><div class='gray'>Last Updated</div><div>July 7, 2019</div></div><div class='meta'><div class='gray'>Text and Research</div><div>Adam Harvey</div></div><div class='meta'><div class='gray'>Published in</div><div><a href="https://tsr.securityconference.de/">Transnational Security Report</a></div></div></div><p>National AI strategies often rely on transnational data sources to capitalize on recent advancements in deep learning and neural networks. Researchers benefiting from these transnational data flows can yield quick and significant gains across diverse sectors from health care to biometrics. 
But new challenges emerge when national AI strategies collide with national interests.</p> <p>Our <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">earlier research</a> on the <a href="/datasets/msceleb">MS Celeb</a> and <a href="/datasets/duke_mtmc">Duke</a> datasets published with the Financial Times revealed that several computer vision image datasets created by US companies and universities were unexpectedly also used for research by the National University of Defense Technology in China, along with top Chinese surveillance firms including SenseTime, SenseNets, CloudWalk, Hikvision, and Megvii/Face++ which have all been linked to oppressive surveillance in the Xinjiang region of China.</p> <p>In this new research for the <a href="https://tsr.securityconference.de">Munich Security Conference's Transnational Security Report</a> we provide summary statistics about the origins and endpoints of facial recognition information supply chains. To make it more personal, we gathered additional data on the number of public photos from embassies that are currently being used in facial recognition training datasets.</p> -<div style="display:inline;" class="columns columns-1"><div class="column"><div style="background:#202020;border-radius:6px;padding:20px;width:100%"> - -<h4>Key Findings</h4> - +<h3>Key Findings</h3> <ul> - <li>24 million non-cooperative images were used in facial recognition research projects</li> - <li>Most data originated from US-based search engines and Flickr, but most research citations found in China</li> - <li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li> - <li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li> +<li>24 million 
non-cooperative images were used in facial recognition research prects</li> +<li>Most data originated from US-based search engines and Flickr, but most research citations found in China</li> +<li>Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies)</li> +<li>Images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain); and military research by National University of Defense Technology (China)</li> </ul> - -</div></div></div><h3>24 Million Photos</h3> +<h3>24 Million Photos</h3> <p><strong>Origins</strong>: In total, we found over 24 million non-cooperative, non-consensual photos in 30 publicly available face recognition and face analysis datasets. Of these 24 million images, over 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage. All 24 million images were collected without any explicit consent, a type of face image that researchers call "in the wild". Every image contains at least one face and many photos contain multiple faces. There are approximately 1 million unique identities across all 24 million images.</p> <p><strong>Endpoints</strong>:To understand the geographic dimensions of the data, we manually verified 1,134 publicly available research papers that cite these datasets to determine who was using the face data and where it was being used. Even though the vast majority of the images originated in the United States or from US companies, publicly available research papers show that only about 25% of the citations are from the United States while the majority are from China. 
Because only English research papers were analyzed the number of foreign research papers is likely to be larger and reflect increased foreign usage.</p> -</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Origins of 24.3 million photos in publicly available face analysis datasets 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Endpoints of 1,134 facial analysis research projects citing 30 face analysis datasets", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt=''></div></section><section><h3>8,428 Embassy Photos Found in Facial Recognition Datasets</h3> +</section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/megapixels_origins_top.csv", "fields": ["Caption: Origins of 24.3 million photos in publicly available face analysis datasets 2006 - 2018", "Top: 10", "OtherLabel: Other"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/summary_countries.csv", "fields": ["Caption: Endpoints of 1,134 facial analysis research projects citing 30 face analysis datasets", "Top: 14", "OtherLabel: Other"]}'></div></section></div></section><section class='images'><div class='image'><img 
src='https://nyc3.digitaloceanspaces.com/megapixels/v1/site/research/munich_security_conference/assets/7118211377.jpg' alt=' A photo from the U.S Embassy in Tokyo found in a facial recognition training dataset'><div class='caption'> A photo from the U.S Embassy in Tokyo found in a facial recognition training dataset</div></div></section><section><h3>8,428 Embassy Photos Found in Facial Recognition Datasets</h3> <p>Out of the 24 million images analyzed, at least 8,428 embassy images were found in face recognition and facial analysis datasets. These images were found by cross-referencing Flickr IDs and URLs between datasets to locate 5,667 images in the MegaFace dataset, 389 images in the IBM Diversity in Faces datasets, and 2,372 images in the Who Goes There dataset. MegaFace is one of the most widely used publicly available face recognition datasets for academic, commercial, and defense-related research.</p> <p>In total, these 8,428 images were found to be used in at least 42 countries with most citations originating in China and most images originating from US embassies. 
The images were found to be used in research projects with links to commercial and defense organization including Google, Microsoft, National University of Defense Technology in China, SenseTime, Tencent, Mitsubishi, ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain).</p> </section><section><div class='columns columns-2'><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/embassy_counts_summary_dataset.csv", "fields": ["Caption: Number of embassy photos incluced in each face recognition dataset", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section><section class='applet_container'><div class='applet' data-payload='{"command": "single_pie_chart /site/research/munich_security_conference/assets/country_counts.csv", "fields": ["Caption: Number of photos per national embassy", "Top: 4", "OtherLabel: Other", "Colors: categoryRainbow"]}'></div></section></div></section><section><p>The embassy and consulate photos below were all found in either the MegaFace or IBM Diversity in Faces datasets. Consulates were only included if marked as "EMBASSY" by the <a href="https://www.state.gov/global-social-media-presence/">U.S. Department of State’s Social Media Presence List</a>. Photos below were chosen because of inclusion of an embassy logo. All photos originated on Flickr.com and were published with a Creative Commons license.</p> |
