Diffstat (limited to 'site/content/pages/datasets')
21 files changed, 155 insertions, 70 deletions
diff --git a/site/content/pages/datasets/afad/index.md b/site/content/pages/datasets/afad/index.md index c5d20d6e..3b0ca3c3 100644 --- a/site/content/pages/datasets/afad/index.md +++ b/site/content/pages/datasets/afad/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: Asian Face Age Dataset desc: AFAD: Asian Face Age Dataset slug: afad @@ -18,8 +18,6 @@ authors: Adam Harvey + Origin: RenRen -**Unconstrained College Students** is a large-scale, unconstrained face detection and recognition dataset. It includes - ----- ## Research diff --git a/site/content/pages/datasets/aflw/index.md b/site/content/pages/datasets/aflw/index.md index 36a713fc..66f0f806 100644 --- a/site/content/pages/datasets/aflw/index.md +++ b/site/content/pages/datasets/aflw/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: Annotated Facial Landmarks in The Wild desc: AFLW: Annotated Facial Landmarks in The Wild slug: aflw diff --git a/site/content/pages/datasets/brainwash/assets/00425000_640x480.jpg b/site/content/pages/datasets/brainwash/assets/00425000_640x480.jpg Binary files differnew file mode 100644 index 00000000..de62175a --- /dev/null +++ b/site/content/pages/datasets/brainwash/assets/00425000_640x480.jpg diff --git a/site/content/pages/datasets/brainwash/assets/00818000_640x480.jpg b/site/content/pages/datasets/brainwash/assets/00818000_640x480.jpg Binary files differnew file mode 100644 index 00000000..30c0fcb1 --- /dev/null +++ b/site/content/pages/datasets/brainwash/assets/00818000_640x480.jpg diff --git a/site/content/pages/datasets/brainwash/assets/background.jpg b/site/content/pages/datasets/brainwash/assets/background.jpg Binary files differindex eada1779..f6efb253 100644 --- a/site/content/pages/datasets/brainwash/assets/background.jpg +++ b/site/content/pages/datasets/brainwash/assets/background.jpg diff --git a/site/content/pages/datasets/brainwash/assets/index.jpg b/site/content/pages/datasets/brainwash/assets/index.jpg 
Binary files differindex c903baea..e85f75c2 100644 --- a/site/content/pages/datasets/brainwash/assets/index.jpg +++ b/site/content/pages/datasets/brainwash/assets/index.jpg diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md new file mode 100644 index 00000000..a99dce3a --- /dev/null +++ b/site/content/pages/datasets/brainwash/index.md @@ -0,0 +1,80 @@ +------------ + +status: published +title: Brainwash +desc: <span style="color:#ffaa00">Brainwash</span> is a dataset of people from webcams the Brainwash Cafe in San Francisco being used to train face detection algorithms +subdesc: Brainwash dataset includes 11,918 images of people getting coffee at the Brainwash cafe during 2014 +caption: An sample image from the Brainwash dataset used for training face detection algorithms for surveillance. License: Open Data Commons Public Domain Dedication (PDDL) +slug: brainwash +image: assets/background.jpg +published: 2019-2-23 +updated: 2019-2-23 +authors: Adam Harvey + +------------ + +### Statistics + ++ Collected: 2014 ++ Published: 2015 ++ Location: 1122 Folsom Street San Franscisco ++ Images: 11,917 ++ Faces: 91,146 ++ Created by: Stanford Department of Computer Science ++ Funding: Max Planck Center for Visual Computing and Communication ++ Resolution: 640x480px ++ Origin: Angelcam IP Cam ++ Purpose: Training face detection + +- more info1 +- more info2 +- more info3 + +## Brainwash Dataset + +*Brainwash* is a face detection dataset created from the Brainwash Cafe's livecam footage. The stream is It was published in 2015 by researchers at the Stanford University and has been used 1122 Folsom Street | USA + +The photos were collected on +- Oct 27, 2014 +- Nov 11, 2014 +- Nov 245, 2017 + +Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. 
Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet consectetur adipisci[ng] velit, sed quia non-numquam [do] eius modi tempora inci[di]dunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam. + + + + +porro quisquam est, qui dolorem ipsum, quia dolor sit amet consectetur adipisci[ng] velit, sed quia non-numquam [do] eius modi tempora inci[di]dunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum[d] exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur? + + +{% include 'map.html' %} + +<hr class="supp"> + +## Supplementary Information for Brainwash Dataset + +{% include 'citations.html' %} + + +-------- + +RESEARCH below this line + +--- + +The file is 4.1GB +- add sha256 hash +- the images were taken from Dropcam which was runnign on https://www.angelcam.com/ "Angelcam’s Real-time Surveillance takes the weight of keeping your home or business secure off your shoulders." + + +> This package contains the "Brainwash" dataset. The dataset consists of images capturing the everyday life of a busy downtown cafe and is split into the following subsets: +> training set: 10769 with 81975 annotated people +> validation set: 500 images with 3318 annotated people +> test set: 500 images with 5007 annotated people + +> Bounding box annotations are provided in a simple text file format. Each line in the file contains +image name followed by the list of annotation rectangles in the \[xmin, ymin, max, ymax\] format. 
+ +> We refer to the following arXiv submission for details on the dataset and the evaluation procedure: + +http://arxiv.org/abs/1506.04878
\ No newline at end of file diff --git a/site/content/pages/datasets/caltech_10k/index.md b/site/content/pages/datasets/caltech_10k/index.md index 8f49f2d1..6b37c1f9 100644 --- a/site/content/pages/datasets/caltech_10k/index.md +++ b/site/content/pages/datasets/caltech_10k/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: Caltech 10K Faces Dataset desc: Caltech 10K Faces Dataset slug: caltech_10k diff --git a/site/content/pages/datasets/facebook/index.md b/site/content/pages/datasets/facebook/index.md index 6e3857fd..84d2b572 100644 --- a/site/content/pages/datasets/facebook/index.md +++ b/site/content/pages/datasets/facebook/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: Facebook desc: TBD subdesc: TBD diff --git a/site/content/pages/datasets/feret/index.md b/site/content/pages/datasets/feret/index.md index 9a60deaf..da64f9fb 100644 --- a/site/content/pages/datasets/feret/index.md +++ b/site/content/pages/datasets/feret/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: FERET: FacE REcognition desc: LFW: Labeled Faces in The Wild slug: lfw diff --git a/site/content/pages/datasets/index.md b/site/content/pages/datasets/index.md index fa012758..47d0bce2 100644 --- a/site/content/pages/datasets/index.md +++ b/site/content/pages/datasets/index.md @@ -13,10 +13,3 @@ sync: false # Facial Recognition Datasets -+ Found: 275 datasets -+ Created between: 1993-2018 -+ Smallest dataset: 20 images -+ Largest dataset: 10,000,000 images - -+ Highest resolution faces: 450x500 (Unconstrained College Students) -+ Lowest resolution faces: 16x20 pixels (QMUL SurvFace) diff --git a/site/content/pages/datasets/lfpw/index.md b/site/content/pages/datasets/lfpw/index.md index 2fb343ca..80b20647 100644 --- a/site/content/pages/datasets/lfpw/index.md +++ b/site/content/pages/datasets/lfpw/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: Labeled Face Parts in The Wild desc: 
LFPW: Labeled Face Parts in The Wild slug: lfpw @@ -18,7 +18,7 @@ authors: Adam Harvey + Funding: CIA - + -------- diff --git a/site/content/pages/datasets/lfw/assets/render_pgan.mp4 b/site/content/pages/datasets/lfw/assets/render_pgan.mp4 Binary files differnew file mode 100644 index 00000000..b573e0b1 --- /dev/null +++ b/site/content/pages/datasets/lfw/assets/render_pgan.mp4 diff --git a/site/content/pages/datasets/lfw/assets/render_pgan_poster.jpg b/site/content/pages/datasets/lfw/assets/render_pgan_poster.jpg Binary files differnew file mode 100644 index 00000000..003de9f9 --- /dev/null +++ b/site/content/pages/datasets/lfw/assets/render_pgan_poster.jpg diff --git a/site/content/pages/datasets/lfw/assets/synthetic_01.jpg b/site/content/pages/datasets/lfw/assets/synthetic_01.jpg Binary files differnew file mode 100644 index 00000000..93ee8312 --- /dev/null +++ b/site/content/pages/datasets/lfw/assets/synthetic_01.jpg diff --git a/site/content/pages/datasets/lfw/assets/synthetic_02.jpg b/site/content/pages/datasets/lfw/assets/synthetic_02.jpg Binary files differnew file mode 100644 index 00000000..f2ddc681 --- /dev/null +++ b/site/content/pages/datasets/lfw/assets/synthetic_02.jpg diff --git a/site/content/pages/datasets/lfw/assets/synthetic_03.jpg b/site/content/pages/datasets/lfw/assets/synthetic_03.jpg Binary files differnew file mode 100644 index 00000000..40bb3001 --- /dev/null +++ b/site/content/pages/datasets/lfw/assets/synthetic_03.jpg diff --git a/site/content/pages/datasets/lfw/index.md b/site/content/pages/datasets/lfw/index.md index 4161561d..1af263dc 100644 --- a/site/content/pages/datasets/lfw/index.md +++ b/site/content/pages/datasets/lfw/index.md @@ -2,34 +2,33 @@ status: published title: Labeled Faces in The Wild -desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. 
-subdesc: It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. +desc: <span style="color:#ff0000">Labeled Faces in The Wild (LFW)</span> is a database of face photographs designed for studying the problem of unconstrained face recognition. +subdesc: It includes 13,456 images of 4,432 people's images copied from the Internet during 2002-2004. image: assets/background.jpg -caption: A few of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms. +caption: A few of the 5,749 people in the Labeled Faces in the Wild Dataset, thee most widely used face dataset for benchmarking face recognition algorithms. slug: lfw published: 2019-2-23 updated: 2019-2-23 -color: #ff0000 authors: Adam Harvey ------------ -### Statistics +### sidebar -+ Years: 2002-2004 ++ Created: 2002-2004 + Images: 13,233 + Identities: 5,749 -+ Origin: Yahoo News Images -+ Funding: (Possibly, partially CIA) ++ Origin: Yahoo! News Images ++ Used by: Facebook, Google, Microsoft, Baidu, Tencent, SenseTime, Face++, CIA, NSA, IARPA ++ Website: <a href="http://vis-www.cs.umass.edu/lfw">vis-www.cs.umass.edu/lfw</a> -### INSIGHTS - -- There are about 3 men for every 1 woman (4,277 men and 1,472 women) in the LFW dataset[^lfw_www] +- There are about 3 men for every 1 woman in the LFW dataset[^lfw_www] - The person with the most images is [George W. Bush](http://vis-www.cs.umass.edu/lfw/person/George_W_Bush_comp.html) with 530 - There are about 3 George W. 
Bush's for every 1 [Tony Blair](http://vis-www.cs.umass.edu/lfw/person/Tony_Blair.html) - The LFW dataset includes over 500 actors, 30 models, 10 presidents, 124 basketball players, 24 football players, 11 kings, 7 queens, and 1 [Moby](http://vis-www.cs.umass.edu/lfw/person/Moby.html) - In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times - The word "future" appears 71 times +- \* denotes partial funding for related research ## Labeled Faces in the Wild @@ -39,49 +38,44 @@ The LFW dataset includes 13,233 images of 5,749 people that were collected betwe The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. -### Biometric Trade Routes +The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. -To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [Semantic Scholar](https://www.semanticscholar.org). + -``` -map -``` +The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. 
However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. -### Synthetic Faces +The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. -To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset. +{% include 'map.html' %} - + Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia. -### Citations - -Browse or download the geocoded citation data collected for the LFW dataset. 
+<hr class="supp"> -``` -citations -``` +## Supplementary Information for Labeled Faces in The Wild -### Additional Information +{% include 'citations.html' %} -(tweet-sized snippets go here) +{% include 'synthetic_faces_intro.html' %} -- The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu] -- The LFW dataset is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan] -- All images in LFW dataset were obtained "in the wild" meaning without any consent from the subject or from the photographer -- The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw-survey] -- The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan] -- All images in the LFW dataset were copied from Yahoo News between 2002 - 2004 -- In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper [Labeled Faces in the Wild: Updates and New Reporting Procedures](https://www.semanticscholar.org/paper/Labeled-Faces-in-the-Wild-%3A-Updates-and-New-Huang-Learned-Miller/2d3482dcff69c7417c7b933f22de606a0e8e42d4) via IARPA contract number 2014-14071600010 -- The dataset includes 2 images of [George Tenet](http://vis-www.cs.umass.edu/lfw/person/George_Tenet.html), the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia + + + + - - - +### Commercial Use of Labeled Faces in The Wild -## Code +Add a paragraph about how usage extends far beyond academia into research centers for largest companies in the world. And even funnels into CIA funded research in the US and defense industry usage in China. 
-The LFW dataset is so widely used that a popular code library called Sci-Kit Learn includes a function called `fetch_lfw_people` to download the faces in the LFW dataset. +``` +load_file assets/lfw_commercial_use.csv +name_display, company_url, example_url, country, description +``` + +### Code + +The LFW dataset is so widely used that access to the facial data has built directly into a popular code library called Sci-Kit Learn. It includes a function called `fetch_lfw_people` to download the faces in the LFW dataset. ```python #!/usr/bin/python @@ -109,7 +103,7 @@ n_ims = cols * rows # build montages im_scale = 0.5 ims = lfw_people.images[:n_ims] -montages = imutils.build_montages(ims, (int(w * im_scale, int(h * im_scale)), (cols, rows)) +montages = imutils.build_montages(ims, (int(w * im_scale, int(h * im_scale)), (cols, rows)) montage = montages[0] # save full montage image @@ -120,14 +114,7 @@ montage = imutils.resize(montage, width=960) imageio.imwrite('lfw_montage_960.jpg', montage) ``` -### Supplementary Material - -``` -load_file assets/lfw_commercial_use.csv -name_display, company_url, example_url, country, description -``` - -Text and graphics ©Adam Harvey / megapixels.cc +Research, text, and graphics ©Adam Harvey / megapixels.cc ------- @@ -143,10 +130,17 @@ Ignore text below these lines - From: "People-LDA: Anchoring Topics to People using Face Recognition" <https://www.semanticscholar.org/paper/People-LDA%3A-Anchoring-Topics-to-People-using-Face-Jain-Learned-Miller/10f17534dba06af1ddab96c4188a9c98a020a459> and <https://ieeexplore.ieee.org/document/4409055> - This paper was presented at IEEE 11th ICCV conference Oct 14-21 and the main LFW paper "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments" was also published that same year - 10f17534dba06af1ddab96c4188a9c98a020a459 - - This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence 
Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010. - From "Labeled Faces in the Wild: Updates and New Reporting Procedures" - 70% of people in the dataset have only 1 image and 29% have 2 or more images +- The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu] +- The LFW dataset is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan] +- All images in LFW dataset were obtained "in the wild" meaning without any consent from the subject or from the photographer +- The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw-survey] +- The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan] +- All images in the LFW dataset were copied from Yahoo News between 2002 - 2004 +- In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper [Labeled Faces in the Wild: Updates and New Reporting Procedures](https://www.semanticscholar.org/paper/Labeled-Faces-in-the-Wild-%3A-Updates-and-New-Huang-Learned-Miller/2d3482dcff69c7417c7b933f22de606a0e8e42d4) via IARPA contract number 2014-14071600010 +- The dataset includes 2 images of [George Tenet](http://vis-www.cs.umass.edu/lfw/person/George_Tenet.html), the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia ### Footnotes diff --git a/site/content/pages/datasets/uccs/index.md b/site/content/pages/datasets/uccs/index.md index be1d2474..b41aaeb1 100644 --- a/site/content/pages/datasets/uccs/index.md +++ b/site/content/pages/datasets/uccs/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: 
Unconstrained College Students desc: UCCS: Unconstrained College Students slug: uccs diff --git a/site/content/pages/datasets/vgg_face2/index.md b/site/content/pages/datasets/vgg_face2/index.md index 6ee72b0a..718b879b 100644 --- a/site/content/pages/datasets/vgg_face2/index.md +++ b/site/content/pages/datasets/vgg_face2/index.md @@ -1,6 +1,6 @@ ------------ -status: published +status: draft title: VGG Face 2 Dataset desc: VGG Face 2 Dataset slug: vgg_face2 @@ -25,3 +25,23 @@ authors: Adam Harvey - The VGG Face 2 dataset includes approximately 1,331 actresses, 139 presidents, 16 wives, 3 husbands, 2 snooker player, and 1 guru + +### Names and descriptions + +- The original VGGF2 name list has been updated with the results returned from Google Knowledge +- Names with a similarity score greater than 0.75 where automatically updated. Scores computed using `import difflib; seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower()); score = seq.ratio()` +- The 97 names with a score of 0.75 or lower were manually reviewed and includes name changes validating using Wikipedia.org results for names such as "Bruce Jenner" to "Caitlyn Jenner", spousal last-name changes, and discretionary changes to improve search results such as combining nicknames with full name when appropriate, for example changing "Aleksandar Petrović" to "Aleksandar 'Aco' Petrović" and minor changes such as "Mohammad Ali" to "Muhammad Ali" +- The 'Description' text was automatically added when the Knowledge Graph score was greater than 250 + +## TODO + +- create name list, and populate with Knowledge graph information like LFW +- make list of interesting number stats, by the numbers +- make list of interesting important facts +- write intro abstract +- write analysis of usage +- find examples, citations, and screenshots of useage +- find list of companies using it for table +- create montages of the dataset, like LFW +- create right to removal information + diff --git 
a/site/content/pages/datasets/youtube_celebrities/index.md b/site/content/pages/datasets/youtube_celebrities/index.md index f5a7128d..a81974b5 100644 --- a/site/content/pages/datasets/youtube_celebrities/index.md +++ b/site/content/pages/datasets/youtube_celebrities/index.md @@ -1,9 +1,9 @@ ------------ -status: published +status: draft title: YouTube Celebrities desc: YouTube Celebrities -slug: lfw +slug: youtube_celebrities published: 2019-2-23 updated: 2019-2-23 authors: Adam Harvey
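
Reviewer note on the `vgg_face2/index.md` hunk: the name-matching rule described there (case-insensitive `difflib.SequenceMatcher` ratio, automatic update above 0.75, manual review otherwise) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `name_similarity` and `triage` and the sample name pairs are hypothetical and do not appear in the commit; only the `difflib` one-liner and the 0.75 threshold come from the diff.

```python
import difflib


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1], per the one-liner quoted in the diff."""
    return difflib.SequenceMatcher(a=a.lower(), b=b.lower()).ratio()


def triage(pairs, threshold=0.75):
    """Split (original, candidate) name pairs into auto-update and manual-review lists.

    Hypothetical helper: pairs scoring strictly above `threshold` are treated as
    automatic updates; everything else is queued for manual review.
    """
    auto, manual = [], []
    for original, candidate in pairs:
        score = name_similarity(original, candidate)
        bucket = auto if score > threshold else manual
        bucket.append((original, candidate, score))
    return auto, manual


if __name__ == "__main__":
    # Illustrative pairs only; not taken from the VGG Face 2 name list itself.
    sample = [
        ("Ada Lovelace", "ada lovelace"),
        ("Bruce Jenner", "Caitlyn Jenner"),
    ]
    auto, manual = triage(sample)
    print("auto-updated:", [(o, c) for o, c, _ in auto])
    print("manual review:", [(o, c) for o, c, _ in manual])
```

Keeping the threshold as a parameter makes it easy to check how many names would fall into manual review at a different cutoff before committing to one.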
