From f701e07f3a47e10c66eef831442b623df88c4597 Mon Sep 17 00:00:00 2001
From: adamhrv MegaPixels is an independent art and research project by Adam Harvey and Jules LaPlace that investigates the ethics, origins, and individual privacy implications of face recognition image datasets and their role in the expansion of biometric surveillance technologies. MegaPixels is made possible with support from Mozilla, our primary funding partner. Additional support for MegaPixels is provided by the European ARTificial Intelligence Network (AI LAB) at the Ars Electronica Center, 1-year research-in-residence grant from Karlsruhe HfG, and sales from the Privacy Gift Shop. This project is made possible with support from Mozilla. The MegaPixels website is based on an earlier installation from 2017 and ongoing research and lectures (TedX, CPDP) about facial recognition datasets. Over the last several years this project has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets. MegaPixels aims to provide a critical perspective on machine learning image datsets, one that might otherwise escape academia and the industry funded artificial intelligence think tanks that are often supported by the same technology companies who have created many of the datasets presented on this site. MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing a public academic paper, MegaPixels is a website-first reserch project aligns closley with the goals of pre-print academic publications. As such we welcome feedback and ways to improve this site and the clarity of the research. Because this project surfaces many funding issues with datasets (from datasets funded by the C.I.A. to the National Unviversity of Defense and Technology in China), it is important that we are transparent about own funding. 
The original MegaPixels installation in 2017 was built as a commission for and with support from Tactical Technology Collective and Mozilla. The bulk of the research and web-development during 2018 - 2018 was supported by a grant from Mozilla. Continued development in 2019 is partially supported by a 1-year Reseacher-in-Residence grant from Karlsruhe HfG, lecture and workshop fees, and from commissions and sales from the Privacy Gift Shop. Please get in touch if you are interested in supporting this project. MegaPixels is an art and research project first launched in 2017 for an installation at Tactical Technology Collective's GlassRoom about facial recognition datasets. In 2018 it was extended to cover pedestrian analysis datasets for a commission by Elevate Arts festival in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets. MegaPixels aims to provide a critical perspective on machine learning image datsets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the same technology companies who have created many of the datasets presented on this site. MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing a public academic paper, MegaPixels is a website-first research project. One of the main focuses of the dataset investigations is uncovering where funding originated. Because of our empahasis on other researchers' funding sources, it is important that we are transparent about our own. This site and the past year of reserach have been primarily funded by a privacy art grant from Mozilla in 2018. 
The original MegaPixels installation in 2017 was built as a commission for and with support from Tactical Technology Collective and Mozilla. Continued development in 2019 is partially supported by a 1-year Reseacher-in-Residence grant from Karlsruhe HfG and lecture and workshop fees. Explore publicly available facial recognition datasets feeding into research and development of biometric surveillance technologies at the largest technology companies and defense contractors in the world. Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research who created and published the dataset in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute the initial training dataset of 100,000 individuals images and use this to accelerate reserch into recognizing a target list of one million individuals from their face images "using all the possibly collected face images of this individual on the web as training data". 2 These one million people, defined as Micrsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people including academics, policy makers, writers, artists, and especially journalists maintaining an online presence is mandatory and should not allow Microsoft (or anyone else) to use their biometrics for reserach and development of surveillance technology. Many of names in target list even include people critical of the very technology Microsoft is using their name and biometric information to build. 
The list includes digital rights activists like Jillian York and [add more]; artists critical of surveillance including Trevor Paglen, Hito Steryl, Kyle McDonald, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glen Greenwald; Data and Society founder danah boyd; and even Julie Brill the former FTC commissioner responsible for protecting consumer’s privacy to name a few. Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research who created and published the dataset in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of 100,000 individuals images and use this to accelerate reserch into recognizing a target list of one million individuals from their face images "using all the possibly collected face images of this individual on the web as training data". 1 These one million people, defined by Micrsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people including academics, policy makers, writers, artists, and especially journalists maintaining an online presence is mandatory and should not allow Microsoft or anyone else to use their biometrics for reserach and development of surveillance technology. Many of names in target list even include people critical of the very technology Microsoft is using their name and biometric information to build. 
The list includes digital rights activists like Jillian York and [add more]; artists critical of surveillance including Trevor Paglen, Hito Steryl, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glen Greenwald; Data and Society founder danah boyd; and even Julie Brill the former FTC commissioner responsible for protecting consumer’s privacy to name a few. Below is a list of names that were included in list of 1 million individuals curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from msceleb.org. Names appearing with * indicate that Microsoft also distributed imaged. [ cleaning this up ] Below is a list of names that were included in list of 1 million individuals curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from msceleb.org. Email msceleb@microsoft.com to have your name removed. Names appearing with * indicate that Microsoft also distributed images. After publishing this list, researchers from Microsoft Asia then worked with researchers affilliated with China's National University of Defense Technology (controlled by China's Central Military Commission) and used the the MS Celeb dataset for their research paper on using "Faces as Lighting Probes via Unsupervised Deep Highlight Extraction" with potential applications in 3D face recognition. In an article published by the Financial Times based on data discovered during this investigation, Samm Sacks (senior fellow at New American and China tech policy expert) commented that this research raised "red flags because of the nature of the technology, the authors affilliations, combined with the what we know about how this technology is being deployed in China right now". 
3 Four more papers published by SenseTime which also use the MS Celeb dataset raise similar flags. SenseTime is Beijing based company providing surveillance to Chinese authorities including [ add context here ] has been flagged as complicity in potential human rights violations. One of the 4 SenseTime papers, "Exploring Disentangled Feature Representation Beyond Face Identification", shows how SenseTime is developing automated face analysis technology to infer race, narrow eyes, nose size, and chin size, all of which could be used to target vulnerable ethnic groups based on their facial appearances. 4 After publishing this list, researchers from Microsoft Asia then worked with researchers affiliated with China's National University of Defense Technology (controlled by China's Central Military Commission) and used the MS Celeb dataset for their research paper on using "Faces as Lighting Probes via Unsupervised Deep Highlight Extraction" with potential applications in 3D face recognition. In an article published by Financial Times based on data surfaced during this investigation, Samm Sacks (a senior fellow at New America think tank) commented that this research raised "red flags because of the nature of the technology, the author's affiliations, combined with what we know about how this technology is being deployed in China right now". She added that "the [Chinese] government is using these technologies to build surveillance systems and to detain minorities [in Xinjiang]". 2 Four more papers published by SenseTime which also use the MS Celeb dataset raise similar flags. SenseTime is a computer vision surveillance company that until April 2019 provided surveillance to Chinese authorities to monitor and track Uighur Muslims in Xinjiang province and had been flagged numerous times as having potential links to human rights violations. 
One of the 4 SenseTime papers, "Exploring Disentangled Feature Representation Beyond Face Identification", shows how SenseTime was developing automated face analysis technology to infer race, narrow eyes, nose size, and chin size, all of which could be used to target vulnerable ethnic groups based on their facial appearances. Earlier in 2019, Microsoft President Brad Smith called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearings. More recently Smith also announced that Microsoft would seemingly take a stand against potential misuse and decided to not sell face recognition to an unnamed United States law enforcement agency, citing that their technology was not accurate enough to be used on minorities because it was trained mostly on white male faces. What the decision to block the sale announces is not so much that Microsoft has upgraded their ethics, but that it publicly acknolwedged it can't sell a data-driven product without data. Microsoft can't sell face recognition for faces they can't train on. Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly white and male. Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, the powerful yet innaccurate facial recognition services like Microsoft's Azure Cognitive Service also would not be able to see at all. Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "(One-shot Face Recognition by Promoting Underrepresented Classes)", Microsoft leveraged the MS Celeb dataset to analyse their algorithms and advertise the results. 
Interestingly, the Microsoft's corporate version does not mention they used the MS Celeb dataset, but the open-acess version of the paper published on arxiv.org that same year explicity mentions that Microsoft Research tested their algorithms "on the MS-Celeb-1M low-shot learning benchmark task." We suggest that if Microsoft Research wants biometric data for surveillance research and development, they should start with own researcher's biometric data instead of scraping the Internet for journalists, artists, writers, and academics. What the decision to block the sale announces is not so much that Microsoft had upgraded their ethics, but that Microsoft publicly acknowledged it can't sell a data-driven product without data. In other words, Microsoft can't sell face recognition for faces they can't train on. Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly white and male. Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, the powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service also would not be able to see at all. Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "(One-shot Face Recognition by Promoting Underrepresented Classes)", Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's corporate version of the paper does not mention they used the MS Celeb dataset, but the open-access version published on arxiv.org explicitly mentions that Microsoft Research tested their algorithms "on the MS-Celeb-1M low-shot learning benchmark task." 
We suggest that if Microsoft Research wants to make biometric data publicly available for surveillance research and development, they should start with releasing their researchers' own biometric data instead of scraping the Internet for journalists, artists, writers, actors, athletes, musicians, and academics.
Adam Harvey
@@ -50,11 +49,10 @@
Team
Facial Recognition Datasets
+ Face Recognition Datasets
Microsoft Celeb Dataset (MS Celeb)
Microsoft's 1 Million Target List
-
Name
-ID
Profession
-Images
-
Jeremy Scahill
-/m/02p_8_n
+Adrian Chen
Journalist
-x
-
-Jillian York
-/m/0g9_3c3
-Digital rights activist
-x
-
-
-Astra Taylor
-/m/05f6_39
-Author, activist
-x
-
-
-Jonathan Zittrain
-/m/01f75c
-EFF board member
-no
-
-
Julie Brill
-x
-x
-x
+Ai Weiwei*
+Artist
-
Jonathan Zittrain
-x
-x
-x
+Aram Bartholl
+Internet artist
-
Bruce Schneier
-m.095js
-Cryptologist and author
-yes
+Astra Taylor
+Author, director, activist
-
Julie Brill
-m.0bs3s9g
-x
-x
+Alexander Madrigal
+Journalist
-
Kim Zetter
-/m/09r4j3
-x
-x
+Bruce Schneier*
+Cryptologist
-
Ethan Zuckerman
-x
-x
-x
+danah boyd
+Data & Society founder
-
Jill Magid
-x
-x
-x
+Edward Felten
+Former FTC Chief Technologist
-
Kyle McDonald
-x
-x
-x
+Evgeny Morozov*
+Tech writer, researcher
-
Trevor Paglen
-x
-x
-x
+Glenn Greenwald*
+Journalist, author
-
R. Luke DuBois
-x
-x
-x
+Hito Steyerl
+Artist, writer
-
Name
-ID
Profession
-Images
-
-Trevor Paglen
-x
-x
-x
-
-
-Ai Weiwei
-/m/0278dyq
-x
-x
-
-
-Jer Thorp
-/m/01h8lg
-x
-x
-
-
Edward Felten
-/m/028_7k
-x
-x
+James Risen
+Journalist
-
Evgeny Morozov
-/m/05sxhgd
-Scholar and technology critic
-yes
+Jeremy Scahill*
+Journalist
-
danah boyd
-/m/06zmx5
-Data and Society founder
-x
+Jill Magid
+Artist
-
Bruce Schneier
-x
-x
-x
+Jillian York
+Digital rights activist
-
Laura Poitras
-x
-x
-x
+Jonathan Zittrain
+EFF board member
-
Trevor Paglen
-x
-x
-x
+Julie Brill
+Former FTC Commissioner
-
Astra Taylor
-x
-x
-x
+Kim Zetter
+Journalist, author
-
Shoshanaa Zuboff
-x
-x
-x
+Laura Poitras*
+Filmmaker
-
Eyal Weizman
-m.0g54526
-x
-x
+Luke DuBois
+Artist
-
Aram Bartholl
-m.06_wjyc
-x
-x
+Shoshana Zuboff
+Author, academic
-
James Risen
-m.09pk6b
-x
-x
+Trevor Paglen
+Artist, researcher
Who used Microsoft Celeb?
@@ -313,10 +228,8 @@
Supplementary Information
-References
References
Explore publicly available facial recognition datasets feeding into research and development of biometric surveillance technologies at the largest technology companies and defense contractors in the world.
+diff --git a/site/public/about/attribution/index.html b/site/public/about/attribution/index.html index d3d38d3c..7b09e5b4 100644 --- a/site/public/about/attribution/index.html +++ b/site/public/about/attribution/index.html @@ -60,17 +60,17 @@ To Adapt: To modify, transform and build upon the database
diff --git a/site/public/about/index.html b/site/public/about/index.html index b0cb3436..48d1bb1c 100644 --- a/site/public/about/index.html +++ b/site/public/about/index.html @@ -50,9 +50,9 @@
MegaPixels is an art and research project first launched in 2017 for an installation at Tactical Technology Collective's GlassRoom about facial recognition datasets. In 2018 it was extended to cover pedestrian analysis datasets for a commission by Elevate Arts festival in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets.
-
MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the same technology companies who have created many of the datasets presented on this site.
-
MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing a public academic paper, MegaPixels is a website-first research project.
-
One of the main focuses of the dataset investigations is uncovering where funding originated. Because of our emphasis on other researchers' funding sources, it is important that we are transparent about our own. This site and the past year of research have been primarily funded by a privacy art grant from Mozilla in 2018. The original MegaPixels installation in 2017 was built as a commission for and with support from Tactical Technology Collective and Mozilla. Continued development in 2019 is partially supported by a 1-year Researcher-in-Residence grant from Karlsruhe HfG and lecture and workshop fees.
+
MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by several of the same technology companies who have created datasets presented on this site.
+
MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing an academic paper, MegaPixels is a website-first research project, with an academic paper to follow.
+
One of the main focuses of the dataset investigations presented on this site is to uncover where funding originated. Because of our emphasis on other researchers' funding sources, it is important that we are transparent about our own. This site and the past year of research have been primarily funded by a privacy art grant from Mozilla in 2018. The original MegaPixels installation in 2017 was built as a commission for and with support from Tactical Technology Collective and Mozilla. The research into pedestrian analysis datasets was funded by a commission from Elevate Arts, and continued development in 2019 is supported in part by a 1-year Researcher-in-Residence grant from Karlsruhe HfG and lecture and workshop fees.
Please direct questions, comments, or feedback to mastodon.social/@adamhrv
-The MegaPixels website, research, and development is made possible with support form Mozilla, our primary funding partner.
-[ add logos ]
-Additional support is provided by the European ARTificial Intelligence Network (AI LAB) at the Ars Electronica Center and a 1-year research-in-residence grant from Karlsruhe HfG.
-[ add logos ]
-If you use MegaPixels or any data derived from it for your work, please cite our original work as follows:
@online{megapixels,
@@ -87,23 +81,25 @@ You are free:
title = {MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets},
year = 2019,
url = {https://megapixels.cc/},
- urldate = {2019-04-20}
+ urldate = {2019-04-18}
}
-+
+
Please direct questions, comments, or feedback to mastodon.social/@adamhrv
+ diff --git a/site/public/about/legal/index.html b/site/public/about/legal/index.html index 9eb5dd5a..ce10014a 100644 --- a/site/public/about/legal/index.html +++ b/site/public/about/legal/index.html @@ -90,17 +90,17 @@ To Adapt: To modify, transform and build upon the database
diff --git a/site/public/about/press/index.html b/site/public/about/press/index.html index 7b0a3e87..70caf03c 100644 --- a/site/public/about/press/index.html +++ b/site/public/about/press/index.html @@ -41,17 +41,17 @@ diff --git a/site/public/datasets/50_people_one_question/index.html b/site/public/datasets/50_people_one_question/index.html index dc7919f7..76d5b92f 100644 --- a/site/public/datasets/50_people_one_question/index.html +++ b/site/public/datasets/50_people_one_question/index.html @@ -88,7 +88,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -96,17 +96,17 @@ diff --git a/site/public/datasets/afad/index.html b/site/public/datasets/afad/index.html index f2b0a5ba..a3ff00cf 100644 --- a/site/public/datasets/afad/index.html +++ b/site/public/datasets/afad/index.html @@ -90,7 +90,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -109,17 +109,17 @@ Motivation
diff --git a/site/public/datasets/brainwash/index.html b/site/public/datasets/brainwash/index.html index b17617a6..cf1f5e5e 100644 --- a/site/public/datasets/brainwash/index.html +++ b/site/public/datasets/brainwash/index.html @@ -99,7 +99,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -137,17 +137,17 @@ diff --git a/site/public/datasets/caltech_10k/index.html b/site/public/datasets/caltech_10k/index.html index 04d63ee3..e86c5ca3 100644 --- a/site/public/datasets/caltech_10k/index.html +++ b/site/public/datasets/caltech_10k/index.html @@ -96,7 +96,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -106,17 +106,17 @@ diff --git a/site/public/datasets/celeba/index.html b/site/public/datasets/celeba/index.html index c72f3798..0236b91c 100644 --- a/site/public/datasets/celeba/index.html +++ b/site/public/datasets/celeba/index.html @@ -94,7 +94,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -108,17 +108,17 @@ diff --git a/site/public/datasets/cofw/index.html b/site/public/datasets/cofw/index.html index eef8cf5e..b0e73dac 100644 --- a/site/public/datasets/cofw/index.html +++ b/site/public/datasets/cofw/index.html @@ -87,7 +87,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -138,7 +138,7 @@ To increase the number of training images, and since COFW has the exact same la
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -161,17 +161,17 @@ To increase the number of training images, and since COFW has the exact same la diff --git a/site/public/datasets/duke_mtmc/index.html b/site/public/datasets/duke_mtmc/index.html index 14e6bee0..90c131b8 100644 --- a/site/public/datasets/duke_mtmc/index.html +++ b/site/public/datasets/duke_mtmc/index.html @@ -246,7 +246,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -369,17 +369,17 @@ diff --git a/site/public/datasets/feret/index.html b/site/public/datasets/feret/index.html index 387826b0..09abaee2 100644 --- a/site/public/datasets/feret/index.html +++ b/site/public/datasets/feret/index.html @@ -90,7 +90,7 @@
- The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
@@ -119,17 +119,17 @@ diff --git a/site/public/datasets/hrt_transgender/index.html b/site/public/datasets/hrt_transgender/index.html index 6b9ae7be..4e566a4a 100644 --- a/site/public/datasets/hrt_transgender/index.html +++ b/site/public/datasets/hrt_transgender/index.html @@ -49,17 +49,17 @@ diff --git a/site/public/datasets/index.html b/site/public/datasets/index.html new file mode 100644 index 00000000..6e43e73f --- /dev/null +++ b/site/public/datasets/index.html @@ -0,0 +1,147 @@ + + +
+
+ + + + + + + + + + +
+ + +