Diffstat (limited to 'site/content/pages/datasets')
20 files changed, 133 insertions, 29 deletions
diff --git a/site/content/pages/datasets/duke_mtmc/index.md b/site/content/pages/datasets/duke_mtmc/index.md
index d766e258..b5c6bf1a 100644
--- a/site/content/pages/datasets/duke_mtmc/index.md
+++ b/site/content/pages/datasets/duke_mtmc/index.md
@@ -18,7 +18,7 @@ authors: Adam Harvey
 
 ### sidebar
 ### end sidebar
 
-Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition. The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically setup to capture students "during periods between lectures, when pedestrian traffic is heavy"[^duke_mtmc_orig].
+Duke MTMC (Multi-Target, Multi-Camera) is a dataset of surveillance video footage taken on Duke University's campus in 2014 and is used for research and development of video tracking systems, person re-identification, and low-resolution facial recognition. The dataset contains over 14 hours of synchronized surveillance video from 8 cameras at 1080p and 60 FPS, with over 2 million frames of 2,000 students walking to and from classes. The 8 surveillance cameras deployed on campus were specifically set up to capture students "during periods between lectures, when pedestrian traffic is heavy".[^duke_mtmc_orig]
 
 For this analysis of the Duke MTMC dataset, over 100 publicly available research papers that used the dataset were analyzed to find out who is using the dataset and where it is being used. The results show that the Duke MTMC dataset has spread far beyond its origins and intentions in academic research projects at Duke University. Since its publication in 2016, more than twice as many research citations originated in China as in the United States. Among these citations were papers with links to the Chinese military and several of the companies known to provide Chinese authorities with the oppressive surveillance technology used to monitor millions of Uighur Muslims.
diff --git a/site/content/pages/datasets/helen/assets/background.jpg b/site/content/pages/datasets/helen/assets/background.jpg
new file mode 100755
index 00000000..6958a2b2
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg b/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg
new file mode 100755
index 00000000..3b5a0e40
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/ijb_c_montage.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/assets/index.jpg b/site/content/pages/datasets/helen/assets/index.jpg
new file mode 100755
index 00000000..7268d6ad
--- /dev/null
+++ b/site/content/pages/datasets/helen/assets/index.jpg
Binary files differ
diff --git a/site/content/pages/datasets/helen/index.md b/site/content/pages/datasets/helen/index.md
new file mode 100644
index 00000000..d44c9b98
--- /dev/null
+++ b/site/content/pages/datasets/helen/index.md
@@ -0,0 +1,30 @@
+------------
+
+status: draft
+title: HELEN
+desc: HELEN Face Dataset
+subdesc: HELEN (under development)
+slug: helen
+cssclass: dataset
+image: assets/background.jpg
+year: 2012
+published: 2019-4-18
+updated: 2019-4-18
+authors: Adam Harvey
+
+------------
+
+## HELEN
+
+### sidebar
+### end sidebar
+
+[ page under development ]
+
+{% include 'dashboard.html' %}
+
+{% include 'supplementary_header.html' %}
+
+{% include 'cite_our_work.html' %}
+
+### Footnotes
diff --git a/site/content/pages/datasets/ibm_dif/assets/background.jpg b/site/content/pages/datasets/ibm_dif/assets/background.jpg
new file mode 100755
index 00000000..6958a2b2
--- /dev/null
+++ b/site/content/pages/datasets/ibm_dif/assets/background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/ibm_dif/assets/ijb_c_montage.jpg b/site/content/pages/datasets/ibm_dif/assets/ijb_c_montage.jpg
new file mode 100755
index 00000000..3b5a0e40
--- /dev/null
+++ b/site/content/pages/datasets/ibm_dif/assets/ijb_c_montage.jpg
Binary files differ
diff --git a/site/content/pages/datasets/ibm_dif/assets/index.jpg b/site/content/pages/datasets/ibm_dif/assets/index.jpg
new file mode 100755
index 00000000..7268d6ad
--- /dev/null
+++ b/site/content/pages/datasets/ibm_dif/assets/index.jpg
Binary files differ
diff --git a/site/content/pages/datasets/ibm_dif/index.md b/site/content/pages/datasets/ibm_dif/index.md
new file mode 100644
index 00000000..4c620e95
--- /dev/null
+++ b/site/content/pages/datasets/ibm_dif/index.md
@@ -0,0 +1,30 @@
+------------
+
+status: draft
+title: IBM Diversity in Faces
+desc: IBM Diversity in Faces Dataset
+subdesc: IBM Diversity in Faces (under development)
+slug: ibm_dif
+cssclass: dataset
+image: assets/background.jpg
+year: 2019
+published: 2019-4-18
+updated: 2019-4-18
+authors: Adam Harvey
+
+------------
+
+## IBM Diversity in Faces
+
+### sidebar
+### end sidebar
+
+[ page under development ]
+
+{% include 'dashboard.html' %}
+
+{% include 'supplementary_header.html' %}
+
+{% include 'cite_our_work.html' %}
+
+### Footnotes
diff --git a/site/content/pages/datasets/ijb_c/index.md b/site/content/pages/datasets/ijb_c/index.md
index d1ac769b..70c71f19 100644
--- a/site/content/pages/datasets/ijb_c/index.md
+++ b/site/content/pages/datasets/ijb_c/index.md
@@ -21,36 +21,19 @@ authors: Adam Harvey
 
 [ page under development ]
 
-The IARPA Janus Benchmark C (IJB–C) is a dataset of web images used for face recognition research and development. The IJB–C dataset contains 3,531 people
+The IARPA Janus Benchmark C (IJB–C) is a dataset of web images used for face recognition research and development. The IJB–C dataset contains 3,531 people from 21,294 images and 3,531 videos. The list of 3,531 names includes activists, artists, journalists, foreign politicians, and public speakers.
-Among the target list of 3,531 names are activists, artists, journalists, foreign politicians,
+Key Findings:
-
-
-- Subjects 3531
-- Templates: 140739
-- Genuine Matches: 7819362
-- Impostor Matches: 39584639
-
-
-Why not include US Soliders instead of activists?
-
-
-was creted by Nobilis, a United States Government contractor is used to develop software for the US intelligence agencies as part of the IARPA Janus program.
-
-The IARPA Janus program is
-
-these representations must address the challenges of Aging, Pose, Illumination, and Expression (A-PIE) by exploiting all available imagery.
-
-
-- metadata annotations were created using crowd annotations
-- created by Nobilis
-- used mechanical turk
+- metadata annotations were created using crowd workers on Mechanical Turk
+- the dataset was created by Nobilis, a United States Government contractor
 - made for intelligence analysts
 - improve performance of face recognition tools
 - by fusing the rich spatial, temporal, and contextual information available from the multiple views captured by today’s "media in the wild"
+The dataset includes Creative Commons images.
+
 The name list includes
 
@@ -92,7 +75,7 @@ The first 777 are non-alphabetical. From 777-3531 is alphabetical
 
 From original papers: https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf
 
-Collection for the dataset began by identifying CreativeCommons subject videos, which are often more scarce thanCreative Commons subject images. Search terms that re-sulted in large quantities of person-centric videos (e.g. “in-terview”) were generated and translated into numerous lan-guages including Arabic, Korean, Swahili, and Hindi to in-crease diversity of the subject pool. Certain YouTube userswho upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified. Titles of videos per-taining to these search terms and usernames were scrapedusing the YouTube Data API and translated into English us-ing the Yandex Translate API4. Pattern matching was per-formed to extract potential names of subjects from the trans-lated titles, and these names were searched using the Wiki-data API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons im-agery. Age, gender, and geographic region were collectedusing the Wikipedia API.Using the candidate subject names, Creative Commonsimages were scraped from Google and Wikimedia Com-mons, and Creative Commons videos were scraped fromYouTube. After images and videos of the candidate subjectwere identified, AMT Workers were tasked with validat-ing the subject’s presence throughout the video. The AMTWorkers marked segments of the video in which the subjectwas present, and key frames
+Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. “interview”) were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation, were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject’s presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames
 
 IARPA funds Italian researcher https://www.micc.unifi.it/projects/glaivejanus/
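The collection workflow quoted above (search, translate, scrape, verify, validate) can be summarized in code. The following Python outline is only a sketch of the described workflow, not code from the IJB–C project; every helper in it (`yandex_translate`, `youtube_search_titles`, `extract_person_names`, `wikidata_lookup`, `scrape_creative_commons_images`, `scrape_creative_commons_videos`, `amt_validate`) is a hypothetical stand-in for a call to the named API or crowd-work step.

```python
# Hypothetical sketch of the IJB-C collection workflow described above.
# None of these helpers are from the original codebase; each stands in for
# a call to the named public service (YouTube Data API, Yandex Translate
# API, Wikidata API) or for the Amazon Mechanical Turk validation step.

SEARCH_TERMS = ["interview"]          # person-centric search terms
LANGUAGES = ["ar", "ko", "sw", "hi"]  # Arabic, Korean, Swahili, Hindi

def collect_candidate_subjects():
    """Find public figures by scraping and translating video titles."""
    candidates = []
    for term in SEARCH_TERMS:
        for lang in LANGUAGES:
            localized = yandex_translate(term, target=lang)     # translate term
            for title in youtube_search_titles(localized):      # YouTube Data API
                english = yandex_translate(title, target="en")  # back to English
                for name in extract_person_names(english):      # pattern matching
                    entity = wikidata_lookup(name)              # verify public figure
                    if entity is not None and entity.is_public_figure:
                        candidates.append(entity)
    return candidates

def collect_media(subject):
    """Scrape Creative Commons media, then have AMT workers validate it."""
    images = scrape_creative_commons_images(subject.name)  # Google, Wikimedia Commons
    videos = scrape_creative_commons_videos(subject.name)  # YouTube
    # AMT workers confirm the subject's presence and mark key frames.
    return [m for m in images + videos if amt_validate(subject, m)]
```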
diff --git a/site/content/pages/datasets/megaface/assets/background.jpg b/site/content/pages/datasets/megaface/assets/background.jpg
new file mode 100755
index 00000000..6958a2b2
--- /dev/null
+++ b/site/content/pages/datasets/megaface/assets/background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/megaface/assets/ijb_c_montage.jpg b/site/content/pages/datasets/megaface/assets/ijb_c_montage.jpg
new file mode 100755
index 00000000..3b5a0e40
--- /dev/null
+++ b/site/content/pages/datasets/megaface/assets/ijb_c_montage.jpg
Binary files differ
diff --git a/site/content/pages/datasets/megaface/assets/index.jpg b/site/content/pages/datasets/megaface/assets/index.jpg
new file mode 100755
index 00000000..7268d6ad
--- /dev/null
+++ b/site/content/pages/datasets/megaface/assets/index.jpg
Binary files differ
diff --git a/site/content/pages/datasets/megaface/index.md b/site/content/pages/datasets/megaface/index.md
new file mode 100644
index 00000000..4c620e95
--- /dev/null
+++ b/site/content/pages/datasets/megaface/index.md
@@ -0,0 +1,30 @@
+------------
+
+status: draft
+title: MegaFace
+desc: MegaFace Dataset
+subdesc: MegaFace contains 670K identities and 4.7M images
+slug: megaface
+cssclass: dataset
+image: assets/background.jpg
+year: 2016
+published: 2019-4-18
+updated: 2019-4-18
+authors: Adam Harvey
+
+------------
+
+## MegaFace
+
+### sidebar
+### end sidebar
+
+[ page under development ]
+
+{% include 'dashboard.html' %}
+
+{% include 'supplementary_header.html' %}
+
+{% include 'cite_our_work.html' %}
+
+### Footnotes
diff --git a/site/content/pages/datasets/msceleb/index.md b/site/content/pages/datasets/msceleb/index.md
index 5095da3d..453c1522 100644
--- a/site/content/pages/datasets/msceleb/index.md
+++ b/site/content/pages/datasets/msceleb/index.md
@@ -87,7 +87,8 @@ Until now, that data has been freely harvested from the Internet and packaged in
 
-Microsoft didn't only create MS Celeb for other researchers to use, they also used it internally. In a publicly available 2017 Microsoft Research project called "[One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/)," Microsoft used the MS Celeb face dataset to build their algorithms and advertise the results. Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb datset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org does. It states that Microsoft Research analyzed their algorithms using "the MS-Celeb-1M low-shot learning benchmark task."[^one_shot]
+Microsoft didn't only create MS Celeb for other researchers to use; they also used it internally. In a publicly available 2017 Microsoft Research project called "[One-shot Face Recognition by Promoting Underrepresented Classes](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/)," Microsoft used the MS Celeb face dataset to build their algorithms and advertise the results. Interestingly, Microsoft's [corporate version](https://www.microsoft.com/en-us/research/publication/one-shot-face-recognition-promoting-underrepresented-classes/) of the paper does not mention they used the MS Celeb dataset, but the [open-access version](https://www.semanticscholar.org/paper/One-shot-Face-Recognition-by-Promoting-Classes-Guo/6cacda04a541d251e8221d70ac61fda88fb61a70) published on arxiv.org does. It states that Microsoft analyzed their algorithms "on the MS-Celeb-1M low-shot learning [benchmark task](https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/)"[^one_shot], which is described as a refined version of the original MS-Celeb-1M face dataset.
+
 Typically researchers will phrase this differently and say that they only use a dataset to validate their algorithm. But validation data can't be easily separated from the training process. To develop a neural network model, image training datasets are split into three parts: train, test, and validation. Training data is used to fit a model, and the validation and test data are used to provide feedback about the hyperparameters, biases, and outputs. In reality, test and validation data steer and influence the final results of neural networks.
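To make the three-way split concrete, here is a minimal, self-contained example using scikit-learn's `train_test_split`. This illustrates standard practice only; it is not Microsoft's pipeline, and the 60/20/20 ratio and placeholder data are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1,000 fake "images" with identity labels (illustration only).
images = np.arange(1000).reshape(-1, 1)
labels = np.random.randint(0, 50, size=1000)

# 60/20/20 split: first hold out 20% for testing, then take 25% of the
# remaining 80% for validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    images, labels, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0)

# A model is fit only on X_train, but hyperparameters are tuned by watching
# scores on X_val (and final numbers are reported on X_test), which is why
# validation and test data still steer the resulting model.
```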
diff --git a/site/content/pages/datasets/oxford_town_centre/index.md b/site/content/pages/datasets/oxford_town_centre/index.md
index bd340113..c2e3e7a7 100644
--- a/site/content/pages/datasets/oxford_town_centre/index.md
+++ b/site/content/pages/datasets/oxford_town_centre/index.md
@@ -29,11 +29,11 @@ The Oxford Town Centre dataset is unique in that it uses footage from a public s
 
 ### Location
 
-The street location of the camera used for the Oxford Town Centre dataset was confirmed by matching the road, benches, and store signs [source](https://www.google.com/maps/@51.7528162,-1.2581152,3a,50.3y,310.59h,87.23t/data=!3m7!1e1!3m5!1s3FsGN-PqYC-VhQGjWgmBdQ!2e0!5s20120601T000000!7i13312!8i6656). At that location, two public CCTV cameras exist mounted on the side of the Northgate House building at 13-20 Cornmarket St. Because of the lower camera's mounting pole directionality, a view from a private camera in the building across the street can be ruled out because it would have to show more of silhouette of the lower camera's mounting pole. Two options remain: either the public CCTV camera mounted to the side of the building was used or the researchers mounted their own camera to the side of the building in the same location. Because the researchers used many other existing public CCTV cameras for their [research projects](http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html) it is increasingly likely that they would also be able to access to this camera.
+The street location of the camera used for the Oxford Town Centre dataset was confirmed by matching the road, benches, and store signs ([source](https://www.google.com/maps/@51.7528162,-1.2581152,3a,50.3y,310.59h,87.23t/data=!3m7!1e1!3m5!1s3FsGN-PqYC-VhQGjWgmBdQ!2e0!5s20120601T000000!7i13312!8i6656)). At that location, two public CCTV cameras are mounted on the side of the Northgate House building at 13-20 Cornmarket St. The upper camera, a public CCTV camera installed for security, is most likely the camera used to create this dataset.
 
-Next, to discredit the theory that this public CCTV is only seen pointing the other way in Google Street View images, at least one public photo shows the upper CCTV camera [pointing in the same direction](https://www.oxcivicsoc.org.uk/northgate-house-cornmarket/) as the Oxford Town Centre dataset, proving the camera can and has been rotated before.
+The camera can be seen pointing in the same direction as the dataset's view in this [public image](https://www.oxcivicsoc.org.uk/northgate-house-cornmarket/), and the researchers used other existing public CCTV cameras for additional [research projects](http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html), increasing the likelihood that they could have had access to this camera.
 
-As for the capture date, the text on the storefront display shows a sale happening from December 2nd – 7th indicating the capture date was between or just before those dates. The capture year is either 2008 or 2007, since prior to 2007 the Carphone Warehouse ([photo](https://www.flickr.com/photos/katieportwin/364492063/in/photolist-4meWFE-yd7rw-yd7X6-5sDHuc-yd7DN-59CpEK-5GoHAc-yd7Zh-3G2uJP-yd7US-5GomQH-4peYpq-4bAEwm-PALEr-58RkAp-5pHEkf-5v7fGq-4q1J9W-4kypQ2-5KX2Eu-yd7MV-yd7p6-4McgWb-5pJ55w-24N9gj-37u9LK-4FVcKQ-a81Enz-5qNhTG-59CrMZ-2yuwYM-5oagH5-59CdsP-4FVcKN-4PdxhC-5Lhr2j-2PAd2d-5hAwvk-zsQSG-4Cdr4F-3dUPEi-9B1RZ6-2hv5NY-4G5qwP-HCHBW-4JiuC4-4Pdr9Y-584aEV-2GYBEc-HCPkp/), [history](http://www.oxfordhistory.org.uk/cornmarket/west/47_51.html)) did not exist at this location. Since the sweaters in the GAP window display are more similar to those in a [GAP website snapshot](web.archive.org/web/20081201002524/http://www.gap.com/) from November 2007, our guess is that the footage was obtained during late November or early December 2007. The lack of street vendors and slight waste residue near the bench suggests that it was probably a weekday after rubbish removal.
+The capture date is estimated to be late November or early December of 2007 or 2008. The text on the storefront display shows a sale happening from December 2nd–7th, indicating the capture date was likely around this time. Prior to 2007, the Carphone Warehouse ([photo](https://www.flickr.com/photos/katieportwin/364492063/in/photolist-4meWFE-yd7rw-yd7X6-5sDHuc-yd7DN-59CpEK-5GoHAc-yd7Zh-3G2uJP-yd7US-5GomQH-4peYpq-4bAEwm-PALEr-58RkAp-5pHEkf-5v7fGq-4q1J9W-4kypQ2-5KX2Eu-yd7MV-yd7p6-4McgWb-5pJ55w-24N9gj-37u9LK-4FVcKQ-a81Enz-5qNhTG-59CrMZ-2yuwYM-5oagH5-59CdsP-4FVcKN-4PdxhC-5Lhr2j-2PAd2d-5hAwvk-zsQSG-4Cdr4F-3dUPEi-9B1RZ6-2hv5NY-4G5qwP-HCHBW-4JiuC4-4Pdr9Y-584aEV-2GYBEc-HCPkp/), [history](http://www.oxfordhistory.org.uk/cornmarket/west/47_51.html)) did not exist at this location. Since the sweaters in the GAP window display are more similar to those in a [GAP website snapshot](https://web.archive.org/web/20081201002524/http://www.gap.com/) from November 2007, the footage was probably recorded in 2007. The slight waste residue near the bench and the lack of street vendors, which typically appear on a weekend, suggest that it was perhaps a weekday after rubbish removal.
 
diff --git a/site/content/pages/datasets/who_goes_there/assets/background.jpg b/site/content/pages/datasets/who_goes_there/assets/background.jpg
new file mode 100755
index 00000000..6958a2b2
--- /dev/null
+++ b/site/content/pages/datasets/who_goes_there/assets/background.jpg
Binary files differ
diff --git a/site/content/pages/datasets/who_goes_there/assets/ijb_c_montage.jpg b/site/content/pages/datasets/who_goes_there/assets/ijb_c_montage.jpg
new file mode 100755
index 00000000..3b5a0e40
--- /dev/null
+++ b/site/content/pages/datasets/who_goes_there/assets/ijb_c_montage.jpg
Binary files differ
diff --git a/site/content/pages/datasets/who_goes_there/assets/index.jpg b/site/content/pages/datasets/who_goes_there/assets/index.jpg
new file mode 100755
index 00000000..7268d6ad
--- /dev/null
+++ b/site/content/pages/datasets/who_goes_there/assets/index.jpg
Binary files differ
diff --git a/site/content/pages/datasets/who_goes_there/index.md b/site/content/pages/datasets/who_goes_there/index.md
new file mode 100644
index 00000000..feb9896d
--- /dev/null
+++ b/site/content/pages/datasets/who_goes_there/index.md
@@ -0,0 +1,30 @@
+------------
+
+status: draft
+title: Who Goes There Dataset
+desc: Who Goes There Dataset
+subdesc: Who Goes There (page under development)
+slug: who_goes_there
+cssclass: dataset
+image: assets/background.jpg
+year: 2016
+published: 2019-4-18
+updated: 2019-4-18
+authors: Adam Harvey
+
+------------
+
+## Who Goes There
+
+### sidebar
+### end sidebar
+
+[ page under development ]
+
+{% include 'dashboard.html' %}
+
+{% include 'supplementary_header.html' %}
+
+{% include 'cite_our_work.html' %}
+
+### Footnotes
