-rw-r--r--security.md6
-rw-r--r--site/content/_drafts_/adience/index.md32
-rw-r--r--site/content/_drafts_/ibm_dif/index.md28
-rw-r--r--site/content/_drafts_/megaface/index.md49
-rw-r--r--site/content/pages/about/attribution.md2
-rw-r--r--site/content/pages/about/index.md2
-rw-r--r--site/content/pages/about/legal.md4
-rw-r--r--site/content/pages/about/press.md2
-rw-r--r--site/content/pages/datasets/ijb_c/index.md9
-rw-r--r--site/content/pages/datasets/msceleb/assets/notes.md3
-rw-r--r--site/content/pages/datasets/msceleb/index.md40
-rw-r--r--site/content/pages/datasets/uccs/assets/notes.md5
-rw-r--r--site/content/pages/research/00_introduction/index.md11
-rw-r--r--site/content/pages/research/01_from_1_to_100_pixels/index.md5
-rw-r--r--site/public/about/legal/index.html2
-rw-r--r--site/public/datasets/ijb_c/index.html6
-rw-r--r--site/public/datasets/msceleb/assets/notes/index.html75
-rw-r--r--site/public/datasets/msceleb/index.html46
-rw-r--r--site/public/datasets/uccs/assets/notes/index.html75
-rw-r--r--site/public/index.html6
-rw-r--r--site/public/research/00_introduction/index.html9
-rw-r--r--site/public/research/01_from_1_to_100_pixels/index.html4
-rw-r--r--site/templates/home.html6
-rw-r--r--todo.md35
24 files changed, 393 insertions, 69 deletions
diff --git a/security.md b/security.md
new file mode 100644
index 00000000..d0bffdb4
--- /dev/null
+++ b/security.md
@@ -0,0 +1,6 @@
+# MegaPixels
+
+### Potential Blacklist
+
+- 103.213.248.154
+ - 5,000 hits in April from Hong Kong with an unknown browser, around April 22
\ No newline at end of file
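The spike noted above was presumably spotted in server logs. A minimal sketch of that kind of check, assuming an nginx/Apache-style access log where each line begins with the client IP (the function name and threshold are illustrative):

```python
import re
from collections import Counter

# Count requests per client IP in an access log whose lines begin with the
# IP (nginx/Apache "combined" format), and flag any IP at or above a
# threshold. This is a sketch, not the site's actual monitoring code.
IP_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")

def flag_heavy_hitters(log_lines, threshold=5000):
    hits = Counter()
    for line in log_lines:
        m = IP_RE.match(line)
        if m:
            hits[m.group(1)] += 1
    # keep only IPs whose request count meets or exceeds the threshold
    return {ip: n for ip, n in hits.items() if n >= threshold}
```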
diff --git a/site/content/_drafts_/adience/index.md b/site/content/_drafts_/adience/index.md
new file mode 100644
index 00000000..60a6cd1f
--- /dev/null
+++ b/site/content/_drafts_/adience/index.md
@@ -0,0 +1,32 @@
+------------
+
+status: draft
+title: Adience
+desc: <span class="dataset-name">Adience</span> is a ...
+subdesc: Adience contains ...
+slug: Adience
+cssclass: dataset
+image: assets/background.jpg
+year: 2007
+published: 2019-2-23
+updated: 2019-2-23
+authors: Adam Harvey
+
+------------
+
+## Adience Dataset
+
+### sidebar
+### end sidebar
+
+[ page under development ]
+
+- Deep Age Estimation: From Classification to Ranking
+ - https://verify.megapixels.cc/paper/adience/verify/4f1249369127cc2e2894f6b2f1052d399794919a
+ - funded by the Ford Motor Company University Research Program
+- Unconstrained Age Estimation with Deep Convolutional Neural Networks
+ - https://verify.megapixels.cc/paper/adience/verify/31f1e711fcf82c855f27396f181bf5e565a2f58d
+ - "we augment our data by sampling 1000 images for the age group of 0-20 from Adience [3]"
+ - the work was supported by IARPA and ODNI
+
+{% include 'dashboard.html' %} \ No newline at end of file
diff --git a/site/content/_drafts_/ibm_dif/index.md b/site/content/_drafts_/ibm_dif/index.md
new file mode 100644
index 00000000..5d72193b
--- /dev/null
+++ b/site/content/_drafts_/ibm_dif/index.md
@@ -0,0 +1,28 @@
+------------
+
+status: draft
+title: IBM Diversity in Faces
+desc: <span class="dataset-name">IBM Diversity in Faces</span> is a dataset of facial annotations for 1 million face images sampled from Flickr
+subdesc: IBM Diversity in Faces contains facial annotations for 1 million images drawn from the YFCC-100M Flickr dataset and is used for face recognition research
+slug: IBM Diversity in Faces
+cssclass: dataset
+image: assets/background.jpg
+year: 2007
+published: 2019-2-23
+updated: 2019-2-23
+authors: Adam Harvey
+
+------------
+
+## IBM Diversity in Faces Dataset
+
+### sidebar
+### end sidebar
+
+[ page under development ]
+
+In "Understanding Unequal Gender Classification Accuracy from Face Images", researchers affiliated with IBM created a new version of PPB so they didn't have to agree to the terms of the original PPB.
+
+> We use an approximation of the PPB dataset for the experiments in this paper. This dataset contains images of parliament members from the six countries identified in [4] and were manually labeled by us into the categories dark-skinned and light-skinned. Our approximation to the PPB dataset, which we call PPB*, is very similar to PPB and satisfies the relevant characteristics for the study we perform. Table 1 compares the decomposition of the original PPB dataset and our PPB* approximation according to skin type and gender.
+
+{% include 'dashboard.html' %} \ No newline at end of file
diff --git a/site/content/_drafts_/megaface/index.md b/site/content/_drafts_/megaface/index.md
new file mode 100644
index 00000000..4c7bb309
--- /dev/null
+++ b/site/content/_drafts_/megaface/index.md
@@ -0,0 +1,49 @@
+------------
+
+status: draft
+title: MegaFace
+desc: <span class="dataset-name">MegaFace</span> is a face recognition dataset created by scraping Flickr photo albums
+subdesc: MegaFace contains over 1 million images of 690,572 identities scraped from Flickr and is used to train and benchmark large-scale face recognition algorithms
+slug: MegaFace
+cssclass: dataset
+image: assets/background.jpg
+year: 2007
+published: 2019-2-23
+updated: 2019-2-23
+authors: Adam Harvey
+
+------------
+
+## MegaFace Dataset
+
+### sidebar
+### end sidebar
+
+[ page under development ]
+
+*MegaFace (Viewpoint Invariant Pedestrian Recognition)* is a dataset of pedestrian images captured at the University of California, Santa Cruz in 2007. According to the researchers, two "cameras were placed in different locations in an academic setting and subjects were notified of the presence of cameras, but were not coached or instructed in any way."
+
+MegaFace is amongst the most widely used publicly available person re-identification datasets. In 2017 the MegaFace dataset was combined into a larger person re-identification dataset created by the Chinese University of Hong Kong called PETA (PEdesTrian Attribute).
+
+{% include 'dashboard.html' %}
+
+
+### Research notes
+
+Dataset was used in research paper funded by SenseTime
+
+- https://verify.megapixels.cc/paper/megaface/verify/380d5138cadccc9b5b91c707ba0a9220b0f39271
+- x
+
+From "On Low-Resolution Face Recognition in the Wild: Comparisons and New Techniques"
+
+- Says 130,154 Flickr accounts, but I got 48,382
+- https://verify.megapixels.cc/paper/megaface/verify/841855205818d3a6d6f85ec17a22515f4f062882
+
+> 2) MegaFace Challenge 2 LR subset: The MegaFace challenge 2 (MF2) training dataset [48] is the largest (in the number of identities) publicly available facial recognition dataset, with 4.7 million face images and over 672,000 identities. The MF2 dataset is obtained by running the Dlib [29] face detector on images from Flickr [68], yielding 40 million unlabeled faces across 130,154 distinct Flickr accounts. Automatic identity labeling is performed using a clustering algorithm. We performed a subset selection from the MegaFace Challenge 2 training set with tight bounding boxes to generate a LR subset of this dataset. Faces smaller than 50x50 pixels are gathered for each identity, and then we eliminated identities with fewer than five images available. This subset selection approach produced 6,700 identities and 85,344 face images in total. The extraction process does yield some non-face images, as does the original dataset processing. No further data cleaning is conducted on this subset.
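The subset-selection procedure quoted above (gather faces smaller than 50x50 pixels, then eliminate identities with fewer than five such images) can be sketched as follows; the function and data layout are illustrative, not the paper's actual code:

```python
# Minimal sketch of the low-resolution subset selection described in the
# quote above. Faces are (identity, width, height) tuples; only faces
# smaller than 50x50 pixels are kept, and identities with fewer than five
# qualifying faces are dropped.
def select_lr_subset(faces, max_size=50, min_images=5):
    by_identity = {}
    for identity, w, h in faces:
        if w < max_size and h < max_size:  # keep only sub-50x50 faces
            by_identity.setdefault(identity, []).append((w, h))
    # eliminate identities with fewer than `min_images` qualifying faces
    return {i: imgs for i, imgs in by_identity.items() if len(imgs) >= min_images}
```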
+
+UHDB31: A Dataset for Better Understanding Face Recognition across Pose and Illumination Variation
+
+- http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w37/Le_UHDB31_A_Dataset_ICCV_2017_paper.pdf
+- MegaFace 1 used 690,572 identities and 1,027,060 images
+- MegaFace 2 used 672,057 identities and 4,753,320 images
\ No newline at end of file
diff --git a/site/content/pages/about/attribution.md b/site/content/pages/about/attribution.md
index 180d87f0..5060b2d9 100644
--- a/site/content/pages/about/attribution.md
+++ b/site/content/pages/about/attribution.md
@@ -16,7 +16,7 @@ authors: Adam Harvey
<section class="about-menu">
<ul>
<li><a href="/about/">About</a></li>
-<li><a href="/press/">Press</a></li>
+<li><a href="/about/press/">Press</a></li>
<li><a class="current" href="/about/attribution/">Attribution</a></li>
<li><a href="/about/legal/">Legal / Privacy</a></li>
</ul>
diff --git a/site/content/pages/about/index.md b/site/content/pages/about/index.md
index 0d9246ca..4cf390fc 100644
--- a/site/content/pages/about/index.md
+++ b/site/content/pages/about/index.md
@@ -16,7 +16,7 @@ authors: Adam Harvey
<section class="about-menu">
<ul>
<li><a class="current" href="/about/">About</a></li>
-<li><a href="/press/">Press</a></li>
+<li><a href="/about/press/">Press</a></li>
<li><a href="/about/attribution/">Attribution</a></li>
<li><a href="/about/legal/">Legal / Privacy</a></li>
</ul>
diff --git a/site/content/pages/about/legal.md b/site/content/pages/about/legal.md
index 53cbca9e..e88fbb17 100644
--- a/site/content/pages/about/legal.md
+++ b/site/content/pages/about/legal.md
@@ -16,7 +16,7 @@ authors: Adam Harvey
<section class="about-menu">
<ul>
<li><a href="/about/">About</a></li>
-<li><a href="/press/">Press</a></li>
+<li><a href="/about/press/">Press</a></li>
<li><a href="/about/attribution/">Attribution</a></li>
<li><a class="current" href="/about/legal/">Legal / Privacy</a></li>
</ul>
@@ -37,7 +37,7 @@ In order to provide certain features of the site, some 3rd party services are ne
### Links To Other Web Sites
-The MegaPixels.cc contains many links to 3rd party websites, especially in the list of citations that are provided for each dataset. This website has no control over and assumes no responsibility for, the content, privacy policies, or practices of any third party web sites or services. You acknowledge and agree that megapixels.cc shall not be responsible or liable, directly or indirectly, for any damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content, goods or services available on or through any such web sites or services.
+The MegaPixels.cc site contains many links to 3rd party websites, especially in the list of citations that are provided for each dataset. This website has no control over and assumes no responsibility for the content, privacy policies, or practices of any third party web sites or services. You acknowledge and agree that megapixels.cc (and its creators) shall not be responsible or liable, directly or indirectly, for any damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content, goods or services available on or through any such web sites or services.
We advise you to read the terms and conditions and privacy policies of any third-party web sites or services that you visit.
diff --git a/site/content/pages/about/press.md b/site/content/pages/about/press.md
index a66f231d..2839bf20 100644
--- a/site/content/pages/about/press.md
+++ b/site/content/pages/about/press.md
@@ -16,7 +16,7 @@ authors: Adam Harvey
<section class="about-menu">
<ul>
<li><a href="/about/">About</a></li>
-<li><a class="current" href="/press/">Press</a></li>
+<li><a class="current" href="/about/press/">Press</a></li>
<li><a href="/about/attribution/">Attribution</a></li>
<li><a href="/about/legal/">Legal / Privacy</a></li>
</ul>
diff --git a/site/content/pages/datasets/ijb_c/index.md b/site/content/pages/datasets/ijb_c/index.md
index 46cab323..9e3f1808 100644
--- a/site/content/pages/datasets/ijb_c/index.md
+++ b/site/content/pages/datasets/ijb_c/index.md
@@ -27,6 +27,15 @@ The IARPA Janus Benchmark C is a dataset created by
![caption: A visualization of the IJB-C dataset](assets/ijb_c_montage.jpg)
+## Research notes
+
+From original papers: https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf
+
+> Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. "interview") were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation, were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject's existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject's presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames
+
+
+IARPA funds Italian researcher https://www.micc.unifi.it/projects/glaivejanus/
+
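One step of the collection pipeline described above, extracting candidate subject names from translated video titles via pattern matching, might look roughly like this; the regex (runs of two or more capitalized words) and the sample title are assumptions for illustration, not what the IJB-C authors actually used:

```python
import re

# Hedged sketch of the "pattern matching" step: pull candidate names
# (runs of two or more capitalized words) out of a translated video title.
# Real titles would need far more robust handling than this.
NAME_RE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)+)\b")

def candidate_names(title):
    # Candidates would then be checked against the Wikidata API
    # to verify public-figure status, per the passage above.
    return NAME_RE.findall(title)
```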
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}
diff --git a/site/content/pages/datasets/msceleb/assets/notes.md b/site/content/pages/datasets/msceleb/assets/notes.md
new file mode 100644
index 00000000..0d8900d1
--- /dev/null
+++ b/site/content/pages/datasets/msceleb/assets/notes.md
@@ -0,0 +1,3 @@
+## Derivative Datasets
+
+- Racial Faces in the Wild http://whdeng.cn/RFW/index.html \ No newline at end of file
diff --git a/site/content/pages/datasets/msceleb/index.md b/site/content/pages/datasets/msceleb/index.md
index f0b07557..5f48ebfd 100644
--- a/site/content/pages/datasets/msceleb/index.md
+++ b/site/content/pages/datasets/msceleb/index.md
@@ -2,8 +2,8 @@
status: published
title: Microsoft Celeb Dataset
-desc: Microsoft Celeb 1M is a target list and dataset of web images used for research and development of face recognition
-subdesc: The MS Celeb dataset includes over 10 million images of about 100K people and a target list of 1 million individuals
+desc: Microsoft Celeb 1M is a dataset of 10 million face images harvested from the Internet
+subdesc: The MS Celeb dataset includes 100K people and a target list of 1 million individuals
slug: msceleb
cssclass: dataset
image: assets/background.jpg
@@ -21,66 +21,66 @@ authors: Adam Harvey
Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research, who created and published the [dataset](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/) in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' images to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".[^msceleb_orig]
-These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people, including academics, policy makers, writers, artists, and especially journalists; maintaining an online presence is mandatory. This fact should not allow Microsoft nor anyone else to use their biometrics for research and development of surveillance technology. Many names in the target list even include people critical of the very technology Microsoft is using their name and biometric information to build. The list includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, to name a few.
+These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people, including academics, policy makers, writers, artists, activists, and journalists, maintaining an online presence is mandatory. This fact should not allow Microsoft nor anyone else to use their biometrics for research and development of surveillance technology. The target list even includes many people critical of the very technology that Microsoft is using their names and biometric information to build: digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, to name only 8 out of 1 million.
### Microsoft's 1 Million Target List
-Below is a selection of 24 names from the full target list, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from [msceleb.org](https://www.msceleb.org). You can email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handing%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed your images.
+Below is a selection of 24 names from the full target list, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from [msceleb.org](https://www.msceleb.org). You can email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handling%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Subjects whose images were distributed by Microsoft are indicated with the total image count; no number indicates the name exists only in the target list.
=== columns 2
-| Name | Profession |
+| Name (images) | Profession |
| --- | --- | --- |
| Adrian Chen | Journalist |
-| Ai Weiwei* | Artist |
-| Aram Bartholl | Internet artist |
+| Ai Weiwei (220) | Artist, activist |
+| Aram Bartholl | Conceptual artist |
| Astra Taylor | Author, director, activist |
-| Alexander Madrigal | Journalist |
-| Bruce Schneier* | Cryptologist |
+| Bruce Schneier (107) | Cryptologist |
+| Cory Doctorow (104) | Blogger, journalist |
| danah boyd | Data &amp; Society founder |
| Edward Felten | Former FTC Chief Technologist |
-| Evgeny Morozov* | Tech writer, researcher |
-| Glenn Greenwald* | Journalist, author |
+| Evgeny Morozov (108) | Tech writer, researcher |
+| Glenn Greenwald (86) | Journalist, author |
| Hito Steyerl | Artist, writer |
| James Risen | Journalist |
====
-| Name | Profession |
+| Name (images) | Profession |
| --- | --- | --- |
-| Jeremy Scahill* | Journalist |
+| Jeremy Scahill (200) | Journalist |
| Jill Magid | Artist |
| Jillian York | Digital rights activist |
| Jonathan Zittrain | EFF board member |
| Julie Brill | Former FTC Commissioner|
| Kim Zetter | Journalist, author |
-| Laura Poitras* | Filmmaker |
+| Laura Poitras (104) | Filmmaker |
| Luke DuBois | Artist |
| Michael Anti | Political blogger |
-| Manal al-Sharif* | Womens's rights activist |
+| Manal al-Sharif (101) | Women's rights activist |
| Shoshana Zuboff | Author, academic |
| Trevor Paglen | Artist, researcher |
=== end columns
-After publishing this list, researchers affiliated with Microsoft Asia then worked with researchers affiliated with China's National University of Defense Technology (controlled by China's Central Military Commission) and used the the MS Celeb dataset for their [research paper](https://www.semanticscholar.org/paper/Faces-as-Lighting-Probes-via-Unsupervised-Deep-Yi-Zhu/b301fd2fc33f24d6f75224e7c0991f4f04b64a65) on using "Faces as Lighting Probes via Unsupervised Deep Highlight Extraction" with potential applications in 3D face recognition.
+After publishing this list, researchers affiliated with Microsoft Asia then worked with researchers affiliated with China's [National University of Defense Technology](https://en.wikipedia.org/wiki/National_University_of_Defense_Technology) (controlled by China's Central Military Commission) and used the MS Celeb image dataset for their research paper "[Faces as Lighting Probes via Unsupervised Deep Highlight Extraction](https://www.semanticscholar.org/paper/Faces-as-Lighting-Probes-via-Unsupervised-Deep-Yi-Zhu/b301fd2fc33f24d6f75224e7c0991f4f04b64a65)", which has potential applications in 3D face recognition.
In an April 10, 2019 [article](https://www.ft.com/content/9378e7ee-5ae6-11e9-9dde-7aedca0a081a) published by Financial Times based on data surfaced during this investigation, Samm Sacks (a senior fellow at the New America think tank) commented that this research raised "red flags because of the nature of the technology, the author's affiliations, combined with what we know about how this technology is being deployed in China right now". Adding, that "the [Chinese] government is using these technologies to build surveillance systems and to detain minorities [in Xinjiang]".[^madhu_ft]
-Four more papers published by SenseTime, which also use the MS Celeb dataset, raise similar flags. SenseTime is a computer vision surveillance company that until [April 2019](https://uhrp.org/news-commentary/china%E2%80%99s-sensetime-sells-out-xinjiang-security-joint-venture) provided surveillance to Chinese authorities to monitor and track Uighur Muslims in Xinjiang province, and had been [flagged](https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html) numerous times as having potential links to human rights violations.
+Four more papers published by SenseTime that also use the MS Celeb dataset raise similar flags. SenseTime is a computer vision surveillance company that until [April 2019](https://uhrp.org/news-commentary/china%E2%80%99s-sensetime-sells-out-xinjiang-security-joint-venture) provided surveillance to Chinese authorities to monitor and track Uighur Muslims in Xinjiang province, and had been [flagged](https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html) numerous times as having potential links to human rights violations.
One of the 4 SenseTime papers, "[Exploring Disentangled Feature Representation Beyond Face Identification](https://www.semanticscholar.org/paper/Exploring-Disentangled-Feature-Representation-Face-Liu-Wei/1fd5d08394a3278ef0a89639e9bfec7cb482e0bf)", shows how SenseTime was developing automated face analysis technology to infer race, narrow eyes, nose size, and chin size, all of which could be used to target vulnerable ethnic groups based on their facial appearances.
-Earlier in 2019, Microsoft President and Chief Legal Officer [Brad Smith](https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/) called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also [announced](https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV) that Microsoft would seemingly take a stand against such potential misuse, and had decided to not sell face recognition to an unnamed United States agency, citing a lack of accuracy. The software was not suitable to be used on minorities, because it was trained mostly on white male faces.
+Earlier in 2019, Microsoft President and Chief Legal Officer [Brad Smith](https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/) called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also [announced](https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV) that Microsoft would seemingly take a stand against such potential misuse, and had decided to not sell face recognition to an unnamed United States agency, citing a lack of accuracy. In effect, Microsoft's face recognition software was not suitable to be used on minorities because it was trained mostly on white male faces.
What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition for faces they can't train on.
-Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly [white](https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html) and [male](https://gendershades.org). Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, the powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service also would not be able to see at all.
+Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly [white](https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html) and [male](https://gendershades.org). Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service might not exist at all.
![caption: A visualization of 2,000 of the 100,000 identities included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)](assets/msceleb_montage.jpg)
-Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "[One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/)," Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb datset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org explicitly mentions that Microsoft Research introspected their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."
+Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "[One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/)," Microsoft leveraged the MS Celeb dataset to build their algorithms and advertise the results. Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb dataset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org explicitly mentions that Microsoft Research introspected their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."
We suggest that if Microsoft Research wants to make biometric data publicly available for surveillance research and development, they should start with releasing their researchers' own biometric data, instead of scraping the Internet for journalists, artists, writers, actors, athletes, musicians, and academics.
diff --git a/site/content/pages/datasets/uccs/assets/notes.md b/site/content/pages/datasets/uccs/assets/notes.md
new file mode 100644
index 00000000..d248573d
--- /dev/null
+++ b/site/content/pages/datasets/uccs/assets/notes.md
@@ -0,0 +1,5 @@
+
+## Additional papers that used UCCS
+
+- https://verify.megapixels.cc/paper/megaface/verify/841855205818d3a6d6f85ec17a22515f4f062882
+- "we use the database subset that has assigned identities (180 identities total)."
diff --git a/site/content/pages/research/00_introduction/index.md b/site/content/pages/research/00_introduction/index.md
index 477679d4..ad8e2200 100644
--- a/site/content/pages/research/00_introduction/index.md
+++ b/site/content/pages/research/00_introduction/index.md
@@ -32,6 +32,17 @@ There is only biased feature vector clustering and probabilistic thresholding.
Yesterday's [decision](https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV) by Brad Smith, President of Microsoft, to not sell facial recognition to a US law enforcement agency is not an about-face by Microsoft to become more humane; it's simply a perfect illustration of the value of training data. Without data, you don't have a product to sell. Microsoft realized that it doesn't have enough training data to sell.
+## Cost of Faces
+
+Univ Houston paid subjects $20/ea
+http://web.archive.org/web/20170925053724/http://cbl.uh.edu/index.php/pages/research/collecting_facial_images_from_multiples_in_texas
+
+FaceMeta facedataset.com
+
+- BASIC: 15,000 images for $6,000 USD
+- RECOMMENDED: 50,000 images for $12,000 USD
+- ADVANCED: 100,000 images for $18,000 USD*
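For comparison, the tier prices above work out to a per-image cost (a quick sanity check using the prices as listed):

```python
# Per-image cost for the FaceMeta tiers listed in the notes above.
tiers = {
    "BASIC": (15_000, 6_000),        # (images, price in USD)
    "RECOMMENDED": (50_000, 12_000),
    "ADVANCED": (100_000, 18_000),
}
per_image = {name: price / images for name, (images, price) in tiers.items()}
# Bulk pricing: the per-image cost falls as the tier size grows.
print(per_image)
```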
+
## Use Your Own Biometrics First
diff --git a/site/content/pages/research/01_from_1_to_100_pixels/index.md b/site/content/pages/research/01_from_1_to_100_pixels/index.md
index b219dffb..ddffdf91 100644
--- a/site/content/pages/research/01_from_1_to_100_pixels/index.md
+++ b/site/content/pages/research/01_from_1_to_100_pixels/index.md
@@ -40,6 +40,11 @@ What can you know from a very small amount of information?
- 100x100 is all you need for medical diagnosis
- 100x100 is 0.5% of one Instagram photo
+
+Notes:
+
+- Google FaceNet used images with (face?) sizes: "Input sizes range from 96x96 pixels to 224x224 pixels in our experiments." (FaceNet: A Unified Embedding for Face Recognition and Clustering, https://arxiv.org/pdf/1503.03832.pdf)
+
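A rough sanity check on the "0.5% of one Instagram photo" figure above (the assumed upload dimensions are mine, not from the source; the exact fraction depends on the photo size):

```python
# What fraction of a photo does a 100x100-pixel face crop occupy?
face = 100 * 100
for w, h in [(1080, 1350), (1080, 1920)]:  # portrait post, story-size
    print(f"{w}x{h}: {face / (w * h):.2%}")
# 1080x1350: 0.69%
# 1080x1920: 0.48%
```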
Ideas:
- Find specific cases of facial resolution being used in legal cases, forensic investigations, or military footage
diff --git a/site/public/about/legal/index.html b/site/public/about/legal/index.html
index 5b34c319..d8d81d04 100644
--- a/site/public/about/legal/index.html
+++ b/site/public/about/legal/index.html
@@ -68,7 +68,7 @@
<h2>3rd Party Services</h2>
<p>In order to provide certain features of the site, some 3rd party services are needed. Currently, the MegaPixels.cc site uses two 3rd party services: (1) Leaflet.js for the interactive map and (2) Digital Ocean Spaces as a content delivery network. Both services encrypt your requests to their server using HTTPS and neither service requires storing any cookies or authentication. However, both services will store files in your web browser's local cache (local storage) to improve loading performance. None of these local storage files are used for analytics, tracking, or any similar purpose.</p>
<h3>Links To Other Web Sites</h3>
-<p>The MegaPixels.cc contains many links to 3rd party websites, especially in the list of citations that are provided for each dataset. This website has no control over and assumes no responsibility for, the content, privacy policies, or practices of any third party web sites or services. You acknowledge and agree that megapixels.cc shall not be responsible or liable, directly or indirectly, for any damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content, goods or services available on or through any such web sites or services.</p>
+<p>The MegaPixels.cc site contains many links to 3rd party websites, especially in the list of citations that are provided for each dataset. This website has no control over and assumes no responsibility for the content, privacy policies, or practices of any third party web sites or services. You acknowledge and agree that megapixels.cc (and its creators) shall not be responsible or liable, directly or indirectly, for any damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content, goods or services available on or through any such web sites or services.</p>
<p>We advise you to read the terms and conditions and privacy policies of any third-party web sites or services that you visit.</p>
<h3>Information We Collect</h3>
<p>When you access the Service, we record your visit to the site in a server log file for the purposes of maintaining site security and preventing misuse. This includes your IP address and the header information sent by your web browser which includes the User Agent, referrer, and the requested page on our site.</p>
diff --git a/site/public/datasets/ijb_c/index.html b/site/public/datasets/ijb_c/index.html
index 3bc23ca5..f58be23f 100644
--- a/site/public/datasets/ijb_c/index.html
+++ b/site/public/datasets/ijb_c/index.html
@@ -75,7 +75,11 @@
<div><a href='https://www.nist.gov/programs-projects/face-challenges' target='_blank' rel='nofollow noopener'>nist.gov</a></div>
</div></div><p>[ page under development ]</p>
<p>The IARPA Janus Benchmark C is a dataset created by</p>
-</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/ijb_c_montage.jpg' alt=' A visualization of the IJB-C dataset'><div class='caption'> A visualization of the IJB-C dataset</div></div></section><section>
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/ijb_c_montage.jpg' alt=' A visualization of the IJB-C dataset'><div class='caption'> A visualization of the IJB-C dataset</div></div></section><section><h2>Research notes</h2>
+<p>From original papers: <a href="https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf">https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf</a></p>
+<p>Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. "interview") were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation, were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject's existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject's presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames […]</p>
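The name-extraction step the paper describes (pattern matching on translated video titles) can be sketched roughly as follows. This is a simplified illustration; the regex and title formats are assumptions, not the authors' actual code:

```python
import re

# Extract candidate person names from a translated video title, as in the
# IJB-C collection pipeline: capitalized word sequences, optionally preceded
# by a lead-in like "Interview with". The pattern is a guess at the approach.
NAME = re.compile(r"(?:Interview with\s+)?([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)")

def candidate_names(title):
    return NAME.findall(title)

print(candidate_names("Interview with Jane Doe at the World Economic Forum"))
# ['Jane Doe', 'World Economic Forum'] -- false positives like the second
# match are why the pipeline then verifies each name against Wikidata.
```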
+<p>IARPA funds Italian researcher <a href="https://www.micc.unifi.it/projects/glaivejanus/">https://www.micc.unifi.it/projects/glaivejanus/</a></p>
+</section><section>
<h3>Who used IJB-C?</h3>
<p>
diff --git a/site/public/datasets/msceleb/assets/notes/index.html b/site/public/datasets/msceleb/assets/notes/index.html
new file mode 100644
index 00000000..a249f08b
--- /dev/null
+++ b/site/public/datasets/msceleb/assets/notes/index.html
@@ -0,0 +1,75 @@
+<!doctype html>
+<html>
+<head>
+ <title>MegaPixels</title>
+ <meta charset="utf-8" />
+ <meta name="author" content="Adam Harvey" />
+ <meta name="description" content="" />
+ <meta property="og:title" content="MegaPixels: Untitled Page"/>
+ <meta property="og:type" content="website"/>
+ <meta property="og:image" content="https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg" />
+ <meta property="og:url" content="https://megapixels.cc/datasets/msceleb/assets/"/>
+ <meta property="og:site_name" content="MegaPixels" />
+ <meta name="referrer" content="no-referrer" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
+ <meta name="apple-mobile-web-app-status-bar-style" content="black">
+ <meta name="apple-mobile-web-app-capable" content="yes">
+
+ <link rel="apple-touch-icon" sizes="57x57" href="/assets/img/favicon/apple-icon-57x57.png">
+ <link rel="apple-touch-icon" sizes="60x60" href="/assets/img/favicon/apple-icon-60x60.png">
+ <link rel="apple-touch-icon" sizes="72x72" href="/assets/img/favicon/apple-icon-72x72.png">
+ <link rel="apple-touch-icon" sizes="76x76" href="/assets/img/favicon/apple-icon-76x76.png">
+ <link rel="apple-touch-icon" sizes="114x114" href="/assets/img/favicon/apple-icon-114x114.png">
+ <link rel="apple-touch-icon" sizes="120x120" href="/assets/img/favicon/apple-icon-120x120.png">
+ <link rel="apple-touch-icon" sizes="144x144" href="/assets/img/favicon/apple-icon-144x144.png">
+ <link rel="apple-touch-icon" sizes="152x152" href="/assets/img/favicon/apple-icon-152x152.png">
+ <link rel="apple-touch-icon" sizes="180x180" href="/assets/img/favicon/apple-icon-180x180.png">
+ <link rel="icon" type="image/png" sizes="192x192" href="/assets/img/favicon/android-icon-192x192.png">
+ <link rel="icon" type="image/png" sizes="32x32" href="/assets/img/favicon/favicon-32x32.png">
+ <link rel="icon" type="image/png" sizes="96x96" href="/assets/img/favicon/favicon-96x96.png">
+ <link rel="icon" type="image/png" sizes="16x16" href="/assets/img/favicon/favicon-16x16.png">
+ <link rel="manifest" href="/assets/img/favicon/manifest.json">
+ <meta name="msapplication-TileColor" content="#ffffff">
+ <meta name="msapplication-TileImage" content="/ms-icon-144x144.png">
+ <meta name="theme-color" content="#ffffff">
+
+ <link rel='stylesheet' href='/assets/css/fonts.css' />
+ <link rel='stylesheet' href='/assets/css/css.css' />
+ <link rel='stylesheet' href='/assets/css/leaflet.css' />
+ <link rel='stylesheet' href='/assets/css/applets.css' />
+ <link rel='stylesheet' href='/assets/css/mobile.css' />
+</head>
+<body>
+ <header>
+ <a class='slogan' href="/">
+ <div class='logo'></div>
+ <div class='site_name'>MegaPixels</div>
+
+ </a>
+ <div class='links'>
+ <a href="/datasets/">Datasets</a>
+ <a href="/about/">About</a>
+ </div>
+ </header>
+ <div class="content content-">
+
+
+
+ </div>
+ <footer>
+ <ul class="footer-left">
+ <li><a href="/">MegaPixels.cc</a></li>
+ <li><a href="/datasets/">Datasets</a></li>
+ <li><a href="/about/">About</a></li>
+ <li><a href="/about/press/">Press</a></li>
+ <li><a href="/about/legal/">Legal and Privacy</a></li>
+ </ul>
+ <ul class="footer-right">
+ <li>MegaPixels &copy;2017-19 &nbsp;<a href="https://ahprojects.com">Adam R. Harvey</a></li>
+ <li>Made with support from &nbsp;<a href="https://mozilla.org">Mozilla</a></li>
+ </ul>
+ </footer>
+</body>
+
+<script src="/assets/js/dist/index.js"></script>
+</html> \ No newline at end of file
diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html
index f1d59366..aabda46c 100644
--- a/site/public/datasets/msceleb/index.html
+++ b/site/public/datasets/msceleb/index.html
@@ -4,7 +4,7 @@
<title>MegaPixels</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey" />
- <meta name="description" content="Microsoft Celeb 1M is a target list and dataset of web images used for research and development of face recognition" />
+  <meta name="description" content="Microsoft Celeb 1M is a dataset of 10 million face images harvested from the Internet" />
<meta property="og:title" content="MegaPixels: Microsoft Celeb Dataset"/>
<meta property="og:type" content="website"/>
<meta property="og:image" content="https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg" />
@@ -53,7 +53,7 @@
</header>
<div class="content content-dataset">
- <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Microsoft Celeb 1M is a target list and dataset of web images used for research and development of face recognition</span></div><div class='hero_subdesc'><span class='bgpad'>The MS Celeb dataset includes over 10 million images of about 100K people and a target list of 1 million individuals
+  <section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg)'><div class='inner'><div class='hero_desc'><span class='bgpad'>Microsoft Celeb 1M is a dataset of 10 million face images harvested from the Internet</span></div><div class='hero_subdesc'><span class='bgpad'>The MS Celeb dataset includes 100K people and a target list of 1 million individuals
</span></div></div></section><section><h2>Microsoft Celeb Dataset (MS Celeb)</h2>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
@@ -77,12 +77,12 @@
<div class='gray'>Website</div>
<div><a href='http://www.msceleb.org/' target='_blank' rel='nofollow noopener'>msceleb.org</a></div>
</div></div><p>Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research, who created and published the <a href="https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/">dataset</a> in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals' images to accelerate research into recognizing a larger target list of one million people "using all the possibly collected face images of this individual on the web as training data".<a class="footnote_shim" name="[^msceleb_orig]_1"> </a><a href="#[^msceleb_orig]" class="footnote" title="Footnote 1">1</a></p>
-<p>These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people, including academics, policy makers, writers, artists, and especially journalists; maintaining an online presence is mandatory. This fact should not allow Microsoft nor anyone else to use their biometrics for research and development of surveillance technology. Many names in the target list even include people critical of the very technology Microsoft is using their name and biometric information to build. The list includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, to name a few.</p>
+<p>These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people, including academics, policy makers, writers, artists, activists, and journalists, maintaining an online presence is mandatory. This fact should not allow Microsoft nor anyone else to use their biometrics for research and development of surveillance technology. The target list even includes many people critical of the very technology Microsoft is using their name and biometric information to build. The list includes digital rights activists like Jillian York; artists critical of surveillance including Trevor Paglen, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, to name only 8 out of 1 million.</p>
<h3>Microsoft's 1 Million Target List</h3>
-<p>Below is a selection of 24 names from the full target list, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from <a href="https://www.msceleb.org">msceleb.org</a>. You can email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handing%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Names appearing with * indicate that Microsoft also distributed your images.</p>
+<p>Below is a selection of 24 names from the full target list, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from <a href="https://www.msceleb.org">msceleb.org</a>. You can email <a href="mailto:msceleb@microsoft.com?subject=MS-Celeb-1M Removal Request&body=Dear%20Microsoft%2C%0A%0AI%20recently%20discovered%20that%20you%20use%20my%20identity%20for%20commercial%20use%20in%20your%20MS-Celeb-1M%20dataset%20used%20for%20research%20and%20development%20of%20face%20recognition.%20I%20do%20not%20wish%20to%20be%20included%20in%20your%20dataset%20in%20any%20format.%20%0A%0APlease%20remove%20my%20name%20and%2For%20any%20associated%20images%20immediately%20and%20send%20a%20confirmation%20once%20you've%20updated%20your%20%22Top1M_MidList.Name.tsv%22%20file.%0A%0AThanks%20for%20promptly%20handling%20this%2C%0A%5B%20your%20name%20%5D">msceleb@microsoft.com</a> to have your name removed. Subjects whose images were distributed by Microsoft are indicated with the total image count; no number means the name only exists in the target list.</p>
</section><section><div class='columns columns-2'><div class='column'><table>
<thead><tr>
-<th>Name</th>
+<th>Name (images)</th>
<th>Profession</th>
</tr>
</thead>
@@ -92,24 +92,24 @@
<td>Journalist</td>
</tr>
<tr>
-<td>Ai Weiwei*</td>
-<td>Artist</td>
+<td>Ai Weiwei (220)</td>
+<td>Artist, activist</td>
</tr>
<tr>
<td>Aram Bartholl</td>
-<td>Internet artist</td>
+<td>Conceptual artist</td>
</tr>
<tr>
<td>Astra Taylor</td>
<td>Author, director, activist</td>
</tr>
<tr>
-<td>Alexander Madrigal</td>
-<td>Journalist</td>
+<td>Bruce Schneier (107)</td>
+<td>Cryptologist</td>
</tr>
<tr>
-<td>Bruce Schneier*</td>
-<td>Cryptologist</td>
+<td>Cory Doctorow (104)</td>
+<td>Blogger, journalist</td>
</tr>
<tr>
<td>danah boyd</td>
@@ -120,11 +120,11 @@
<td>Former FTC Chief Technologist</td>
</tr>
<tr>
-<td>Evgeny Morozov*</td>
+<td>Evgeny Morozov (108)</td>
<td>Tech writer, researcher</td>
</tr>
<tr>
-<td>Glenn Greenwald*</td>
+<td>Glenn Greenwald (86)</td>
<td>Journalist, author</td>
</tr>
<tr>
@@ -139,13 +139,13 @@
</table>
</div><div class='column'><table>
<thead><tr>
-<th>Name</th>
+<th>Name (images)</th>
<th>Profession</th>
</tr>
</thead>
<tbody>
<tr>
-<td>Jeremy Scahill*</td>
+<td>Jeremy Scahill (200)</td>
<td>Journalist</td>
</tr>
<tr>
@@ -169,7 +169,7 @@
<td>Journalist, author</td>
</tr>
<tr>
-<td>Laura Poitras*</td>
+<td>Laura Poitras (104)</td>
<td>Filmmaker</td>
</tr>
<tr>
@@ -181,7 +181,7 @@
<td>Political blogger</td>
</tr>
<tr>
-<td>Manal al-Sharif*</td>
+<td>Manal al-Sharif (101)</td>
<td>Women's rights activist</td>
</tr>
<tr>
@@ -194,14 +194,14 @@
</tr>
</tbody>
</table>
-</div></div></section><section><p>After publishing this list, researchers affiliated with Microsoft Asia then worked with researchers affiliated with China's National University of Defense Technology (controlled by China's Central Military Commission) and used the the MS Celeb dataset for their <a href="https://www.semanticscholar.org/paper/Faces-as-Lighting-Probes-via-Unsupervised-Deep-Yi-Zhu/b301fd2fc33f24d6f75224e7c0991f4f04b64a65">research paper</a> on using "Faces as Lighting Probes via Unsupervised Deep Highlight Extraction" with potential applications in 3D face recognition.</p>
+</div></div></section><section><p>After publishing this list, researchers affiliated with Microsoft Asia then worked with researchers affiliated with China's <a href="https://en.wikipedia.org/wiki/National_University_of_Defense_Technology">National University of Defense Technology</a> (controlled by China's Central Military Commission) and used the MS Celeb image dataset for their research paper on using "<a href="https://www.semanticscholar.org/paper/Faces-as-Lighting-Probes-via-Unsupervised-Deep-Yi-Zhu/b301fd2fc33f24d6f75224e7c0991f4f04b64a65">Faces as Lighting Probes via Unsupervised Deep Highlight Extraction</a>" with potential applications in 3D face recognition.</p>
<p>In an April 10, 2019 <a href="https://www.ft.com/content/9378e7ee-5ae6-11e9-9dde-7aedca0a081a">article</a> published by Financial Times based on data surfaced during this investigation, Samm Sacks (a senior fellow at the New America think tank) commented that this research raised "red flags because of the nature of the technology, the author's affiliations, combined with what we know about how this technology is being deployed in China right now". Adding that "the [Chinese] government is using these technologies to build surveillance systems and to detain minorities [in Xinjiang]".<a class="footnote_shim" name="[^madhu_ft]_1"> </a><a href="#[^madhu_ft]" class="footnote" title="Footnote 2">2</a></p>
-<p>Four more papers published by SenseTime, which also use the MS Celeb dataset, raise similar flags. SenseTime is a computer vision surveillance company that until <a href="https://uhrp.org/news-commentary/china%E2%80%99s-sensetime-sells-out-xinjiang-security-joint-venture">April 2019</a> provided surveillance to Chinese authorities to monitor and track Uighur Muslims in Xinjiang province, and had been <a href="https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html">flagged</a> numerous times as having potential links to human rights violations.</p>
+<p>Four more papers published by SenseTime that also use the MS Celeb dataset raise similar flags. SenseTime is a computer vision surveillance company that until <a href="https://uhrp.org/news-commentary/china%E2%80%99s-sensetime-sells-out-xinjiang-security-joint-venture">April 2019</a> provided surveillance to Chinese authorities to monitor and track Uighur Muslims in Xinjiang province, and had been <a href="https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html">flagged</a> numerous times as having potential links to human rights violations.</p>
<p>One of the 4 SenseTime papers, "<a href="https://www.semanticscholar.org/paper/Exploring-Disentangled-Feature-Representation-Face-Liu-Wei/1fd5d08394a3278ef0a89639e9bfec7cb482e0bf">Exploring Disentangled Feature Representation Beyond Face Identification</a>", shows how SenseTime was developing automated face analysis technology to infer race, narrow eyes, nose size, and chin size, all of which could be used to target vulnerable ethnic groups based on their facial appearances.</p>
-<p>Earlier in 2019, Microsoft President and Chief Legal Officer <a href="https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/">Brad Smith</a> called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also <a href="https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV">announced</a> that Microsoft would seemingly take a stand against such potential misuse, and had decided to not sell face recognition to an unnamed United States agency, citing a lack of accuracy. The software was not suitable to be used on minorities, because it was trained mostly on white male faces.</p>
+<p>Earlier in 2019, Microsoft President and Chief Legal Officer <a href="https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/">Brad Smith</a> called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearing. More recently Smith also <a href="https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV">announced</a> that Microsoft would seemingly take a stand against such potential misuse, and had decided to not sell face recognition to an unnamed United States agency, citing a lack of accuracy. In effect, Microsoft's face recognition software was not suitable to be used on minorities because it was trained mostly on white male faces.</p>
<p>What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition for faces they can't train on.</p>
-<p>Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly <a href="https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html">white</a> and <a href="https://gendershades.org">male</a>. Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, the powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service also would not be able to see at all.</p>
-</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identity included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identity included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "<a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>," Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb datset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org explicitly mentions that Microsoft Research introspected their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."</p>
+<p>Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly <a href="https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html">white</a> and <a href="https://gendershades.org">male</a>. Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service might not exist at all.</p>
+</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/msceleb_montage.jpg' alt=' A visualization of 2,000 of the 100,000 identities included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of 2,000 of the 100,000 identities included in the image dataset distributed by Microsoft Research. Credit: megapixels.cc. License: Open Data Commons Public Domain Dedication (PDDL)</div></div></section><section><p>Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "<a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">One-shot Face Recognition by Promoting Underrepresented Classes</a>," Microsoft leveraged the MS Celeb dataset to build their algorithms and advertise the results. Interestingly, Microsoft's <a href="https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/">corporate version</a> of the paper does not mention they used the MS Celeb dataset, but the <a href="https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70">open-access version</a> published on arxiv.org explicitly mentions that Microsoft Research introspected their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."</p>
<p>We suggest that if Microsoft Research wants to make biometric data publicly available for surveillance research and development, they should start with releasing their researchers' own biometric data, instead of scraping the Internet for journalists, artists, writers, actors, athletes, musicians, and academics.</p>
</section><section>
<h3>Who used Microsoft Celeb?</h3>
diff --git a/site/public/datasets/uccs/assets/notes/index.html b/site/public/datasets/uccs/assets/notes/index.html
new file mode 100644
index 00000000..0218d1b2
--- /dev/null
+++ b/site/public/datasets/uccs/assets/notes/index.html
@@ -0,0 +1,75 @@
+<!doctype html>
+<html>
+<head>
+ <title>MegaPixels</title>
+ <meta charset="utf-8" />
+ <meta name="author" content="Adam Harvey" />
+ <meta name="description" content="" />
+ <meta property="og:title" content="MegaPixels: Untitled Page"/>
+ <meta property="og:type" content="website"/>
+ <meta property="og:image" content="https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/msceleb/assets/background.jpg" />
+ <meta property="og:url" content="https://megapixels.cc/datasets/uccs/assets/"/>
+ <meta property="og:site_name" content="MegaPixels" />
+ <meta name="referrer" content="no-referrer" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
+ <meta name="apple-mobile-web-app-status-bar-style" content="black">
+ <meta name="apple-mobile-web-app-capable" content="yes">
+
+ <link rel="apple-touch-icon" sizes="57x57" href="/assets/img/favicon/apple-icon-57x57.png">
+ <link rel="apple-touch-icon" sizes="60x60" href="/assets/img/favicon/apple-icon-60x60.png">
+ <link rel="apple-touch-icon" sizes="72x72" href="/assets/img/favicon/apple-icon-72x72.png">
+ <link rel="apple-touch-icon" sizes="76x76" href="/assets/img/favicon/apple-icon-76x76.png">
+ <link rel="apple-touch-icon" sizes="114x114" href="/assets/img/favicon/apple-icon-114x114.png">
+ <link rel="apple-touch-icon" sizes="120x120" href="/assets/img/favicon/apple-icon-120x120.png">
+ <link rel="apple-touch-icon" sizes="144x144" href="/assets/img/favicon/apple-icon-144x144.png">
+ <link rel="apple-touch-icon" sizes="152x152" href="/assets/img/favicon/apple-icon-152x152.png">
+ <link rel="apple-touch-icon" sizes="180x180" href="/assets/img/favicon/apple-icon-180x180.png">
+ <link rel="icon" type="image/png" sizes="192x192" href="/assets/img/favicon/android-icon-192x192.png">
+ <link rel="icon" type="image/png" sizes="32x32" href="/assets/img/favicon/favicon-32x32.png">
+ <link rel="icon" type="image/png" sizes="96x96" href="/assets/img/favicon/favicon-96x96.png">
+ <link rel="icon" type="image/png" sizes="16x16" href="/assets/img/favicon/favicon-16x16.png">
+ <link rel="manifest" href="/assets/img/favicon/manifest.json">
+ <meta name="msapplication-TileColor" content="#ffffff">
+ <meta name="msapplication-TileImage" content="/ms-icon-144x144.png">
+ <meta name="theme-color" content="#ffffff">
+
+ <link rel='stylesheet' href='/assets/css/fonts.css' />
+ <link rel='stylesheet' href='/assets/css/css.css' />
+ <link rel='stylesheet' href='/assets/css/leaflet.css' />
+ <link rel='stylesheet' href='/assets/css/applets.css' />
+ <link rel='stylesheet' href='/assets/css/mobile.css' />
+</head>
+<body>
+ <header>
+ <a class='slogan' href="/">
+ <div class='logo'></div>
+ <div class='site_name'>MegaPixels</div>
+
+ </a>
+ <div class='links'>
+ <a href="/datasets/">Datasets</a>
+ <a href="/about/">About</a>
+ </div>
+ </header>
+ <div class="content content-">
+
+
+
+ </div>
+ <footer>
+ <ul class="footer-left">
+ <li><a href="/">MegaPixels.cc</a></li>
+ <li><a href="/datasets/">Datasets</a></li>
+ <li><a href="/about/">About</a></li>
+ <li><a href="/about/press/">Press</a></li>
+ <li><a href="/about/legal/">Legal and Privacy</a></li>
+ </ul>
+ <ul class="footer-right">
+ <li>MegaPixels &copy;2017-19 &nbsp;<a href="https://ahprojects.com">Adam R. Harvey</a></li>
+ <li>Made with support from &nbsp;<a href="https://mozilla.org">Mozilla</a></li>
+ </ul>
+ </footer>
+</body>
+
+<script src="/assets/js/dist/index.js"></script>
+</html> \ No newline at end of file
diff --git a/site/public/index.html b/site/public/index.html
index 81e48a1b..9cb30060 100644
--- a/site/public/index.html
+++ b/site/public/index.html
@@ -1,16 +1,16 @@
<!doctype html>
<html>
<head>
- <title>MegaPixels</title>
+ <title>MegaPixels: Face Recognition Datasets</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey, ahprojects.com" />
- <meta name="description" content="MegaPixels: Facial Recognition Datasets" />
+ <meta name="description" content="MegaPixels: Investigating Face Recognition Datasets" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<meta property="og:title" content="MegaPixels"/>
<meta property="og:type" content="website"/>
<meta property="og:url" content="https://megapixels.cc/"/>
- <meta property="og:site_name" content="MegaPixels" />
+ <meta property="og:site_name" content="MegaPixels: Face Recognition Datasets" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
<meta name="apple-mobile-web-app-status-bar-style" content="black">
diff --git a/site/public/research/00_introduction/index.html b/site/public/research/00_introduction/index.html
index 64635c55..cb6ff7a7 100644
--- a/site/public/research/00_introduction/index.html
+++ b/site/public/research/00_introduction/index.html
@@ -76,6 +76,15 @@
<p>There is only biased feature vector clustering and probabilistic thresholding.</p>
<h2>If you don't have data, you don't have a product.</h2>
<p>Yesterday's <a href="https://www.reuters.com/article/us-microsoft-ai/microsoft-turned-down-facial-recognition-sales-on-human-rights-concerns-idUSKCN1RS2FV">decision</a> by Brad Smith, President of Microsoft, to not sell facial recognition to a US law enforcement agency is not an about-face toward becoming more humane; it is simply a perfect illustration of the value of training data. Without data, you don't have a product to sell. Microsoft realized it does not yet have enough training data to sell.</p>
+<h2>Cost of Faces</h2>
+<p>The University of Houston paid subjects $20 each for their facial images:
+<a href="http://web.archive.org/web/20170925053724/http://cbl.uh.edu/index.php/pages/research/collecting_facial_images_from_multiples_in_texas">http://web.archive.org/web/20170925053724/http://cbl.uh.edu/index.php/pages/research/collecting_facial_images_from_multiples_in_texas</a></p>
+<p>FaceMeta (facedataset.com) sells face image datasets in three pricing tiers:</p>
+<ul>
+<li>BASIC: 15,000 images for $6,000 USD</li>
+<li>RECOMMENDED: 50,000 images for $12,000 USD</li>
+<li>ADVANCED: 100,000 images for $18,000 USD*</li>
+</ul>
<h2>Use Your Own Biometrics First</h2>
<p>If researchers want faces, they should take selfies and create their own dataset. If researchers want images of families to build surveillance software, they should use and distribute their own family portraits.</p>
<h3>Motivation</h3>
diff --git a/site/public/research/01_from_1_to_100_pixels/index.html b/site/public/research/01_from_1_to_100_pixels/index.html
index 7b86f5ef..cc9d3f94 100644
--- a/site/public/research/01_from_1_to_100_pixels/index.html
+++ b/site/public/research/01_from_1_to_100_pixels/index.html
@@ -92,6 +92,10 @@
<li>100x100 all you need for medical diagnosis</li>
<li>100x100 0.5% of one Instagram photo</li>
</ul>
+<p>Notes:</p>
+<ul>
+<li>Google FaceNet used input (face crop) sizes ranging from 96x96 to 224x224 pixels: &quot;Input sizes range from 96x96 pixels to 224x224 pixels in our experiments.&quot; FaceNet: A Unified Embedding for Face Recognition and Clustering, <a href="https://arxiv.org/pdf/1503.03832.pdf">https://arxiv.org/pdf/1503.03832.pdf</a></li>
+</ul>
<p>Ideas:</p>
<ul>
<li>Find specific cases of facial resolution being used in legal cases, forensic investigations, or military footage</li>
diff --git a/site/templates/home.html b/site/templates/home.html
index 81e48a1b..9cb30060 100644
--- a/site/templates/home.html
+++ b/site/templates/home.html
@@ -1,16 +1,16 @@
<!doctype html>
<html>
<head>
- <title>MegaPixels</title>
+ <title>MegaPixels: Face Recognition Datasets</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey, ahprojects.com" />
- <meta name="description" content="MegaPixels: Facial Recognition Datasets" />
+ <meta name="description" content="MegaPixels: Investigating Face Recognition Datasets" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<meta property="og:title" content="MegaPixels"/>
<meta property="og:type" content="website"/>
<meta property="og:url" content="https://megapixels.cc/"/>
- <meta property="og:site_name" content="MegaPixels" />
+ <meta property="og:site_name" content="MegaPixels: Face Recognition Datasets" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
<meta name="apple-mobile-web-app-status-bar-style" content="black">
diff --git a/todo.md b/todo.md
index 4586611e..dc7ebaad 100644
--- a/todo.md
+++ b/todo.md
@@ -2,27 +2,27 @@
## Global
-- JL/AH:U tidy up desktop css
- - dataset index page
- JL: mobile CSS
+ - lightbox/modal on mobile, close button not visible
+ - decrease font-size of intro header
+- AH: change intro heads to better match Twitter word counts
+- AH: ensure one good graphic per dataset page for social sharing
+- AH: add social share graphic for homepage
+- AH: add press kit/downloads
## Splash
-- JL: add scripted slow-slow-zoom out effect + intro anim
-- AH: about 50 render heads for homepage + name list for word cloud
-- AH: work on CTA overlay design intro anim
-- AH: add mozilla to footer
+- AH: create high quality 3d heads
+- JL/AH: add IJB-C names to word cloud
## Datasets
- JL: this paper isn't appearing in the UCCS list of verified papers but should be included https://arxiv.org/pdf/1708.02337.pdf
-- AH: add dataset analysis for MS Celeb, IJB-C
-- AH: fix dataset analysis for UCCS, brainwahs graphics
-- AH: add license information to each dataset page
+- AH: add dataset analysis for IJB-C, HRT Transgender, MegaFace, PIPA
## About
-- x
+- ok
## Flickr Analysis
@@ -53,28 +53,37 @@ Collect Flickr IDs and metadata for:
- yfcc_100m
-## Analysis:
+## FT Analysis:
- [x] Brainwash
- [x] Duke MTMC
- [x] UCCS
-- [ ] MSCeleb
+- [x] MSCeleb
- [ ] IJB-C (and IJB-A/B?)
- [ ] HRT Transgender
- [x] Town Centre
+## NYT Analysis:
+
+- [ ] Helen
+- [ ] MegaFace
+- [ ] PIPA
+
## Verifications
- [x] Brainwash
- [x] Duke MTMC
+- [ ] Helen
- [x] UCCS
+- [ ] MegaFace
- [x] MSCeleb
+- [ ] PIPA
- [x] IJB-C (and IJB-A/B?)
- [x] HRT Transgender
- [x] Town Centre
-------------------
+-----------
## Datasets for next launch: