Diffstat (limited to 'site/content/pages/datasets')
| -rw-r--r-- | site/content/pages/datasets/brainwash/index.md | 40 |
| -rw-r--r-- | site/content/pages/datasets/lfw/index.md | 2 |
| -rw-r--r-- | site/content/pages/datasets/vgg_face2/index.md | 20 |
3 files changed, 50 insertions, 12 deletions
diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md
index 04747729..7c43e008 100644
--- a/site/content/pages/datasets/brainwash/index.md
+++ b/site/content/pages/datasets/brainwash/index.md
@@ -2,10 +2,11 @@
 status: published
 title: Brainwash
-desc: Brainwash is a dataset of people appearing in a publicy exposed Dropcam footage at a cafe in
-caption: A pixelated image of
-slug: Brainwash
-color: #ffff00
+desc: Brainwash is a dataset of people from Dropcam footage of the Brainwash Cafe in San Francisco being used to train face detection algorithms
+subdesc: The Brainwash dataset includes 11,918 images of people getting coffee at the Brainwash cafe during 2014
+caption: A pixelated sample image from the Brainwash dataset used for training face detection algorithms for surveillance
+slug: brainwash
+color: #ffaa00
 image: assets/background.jpg
 published: 2019-2-23
 updated: 2019-2-23
@@ -13,20 +14,27 @@ authors: Adam Harvey
 ------------
-# Brainwash
+### Statistics
-+ Year: 2015
++ Published: 2015
++ Collected: 2014
++ Location: San Franscisco
 + Images: 11,917
 + Faces: 91,146
++ Resolution: 640x480px
 + Origin: Dropcam footage
-+ Created by: Stanford
++ Created by: Stanford Department of Computer Science
+### INSIGHTS
-<!--header-->
+- facts about Brainwash 1
+- facts about Brainwash 2
+
+## Brainwash Dataset
+
+
 Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet consectetur.
-
-COFW is "is designed to benchmark face landmark algorithms in realistic conditions, which include heavy occlusions and large shape variations" [Robust face landmark estimation under occlusion].
 --------
@@ -35,4 +43,14 @@ RESEARCH below this line
 ---
-add research about Brainwash here
\ No newline at end of file
+> This package contains the "Brainwash" dataset. The dataset consists of images capturing the everyday life of a busy downtown cafe and is split into the following subsets:
+> training set: 10769 with 81975 annotated people
+> validation set: 500 images with 3318 annotated people
+> test set: 500 images with 5007 annotated people
+
+> Bounding box annotations are provided in a simple text file format. Each line in the file contains
+image name followed by the list of annotation rectangles in the \[xmin, ymin, max, ymax\] format.
+
+> We refer to the following arXiv submission for details on the dataset and the evaluation procedure:
+
+http://arxiv.org/abs/1506.04878
\ No newline at end of file
diff --git a/site/content/pages/datasets/lfw/index.md b/site/content/pages/datasets/lfw/index.md
index 83245470..b7abbb68 100644
--- a/site/content/pages/datasets/lfw/index.md
+++ b/site/content/pages/datasets/lfw/index.md
@@ -41,7 +41,7 @@ The *Names and Faces* dataset was the first face recognition dataset created ent
 ### Biometric Trade Routes
-To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [Semantic Scholar](https://www.semanticscholar.org).
+[convert to template] To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [Semantic Scholar](https://www.semanticscholar.org).
 ``` map
diff --git a/site/content/pages/datasets/vgg_face2/index.md b/site/content/pages/datasets/vgg_face2/index.md
index 448adde7..718b879b 100644
--- a/site/content/pages/datasets/vgg_face2/index.md
+++ b/site/content/pages/datasets/vgg_face2/index.md
@@ -25,3 +25,23 @@ authors: Adam Harvey
 - The VGG Face 2 dataset includes approximately 1,331 actresses, 139 presidents, 16 wives, 3 husbands, 2 snooker player, and 1 guru
+
+### Names and descriptions
+
+- The original VGGF2 name list has been updated with the results returned from Google Knowledge
+- Names with a similarity score greater than 0.75 where automatically updated. Scores computed using `import difflib; seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower()); score = seq.ratio()`
+- The 97 names with a score of 0.75 or lower were manually reviewed and includes name changes validating using Wikipedia.org results for names such as "Bruce Jenner" to "Caitlyn Jenner", spousal last-name changes, and discretionary changes to improve search results such as combining nicknames with full name when appropriate, for example changing "Aleksandar Petrović" to "Aleksandar 'Aco' Petrović" and minor changes such as "Mohammad Ali" to "Muhammad Ali"
+- The 'Description' text was automatically added when the Knowledge Graph score was greater than 250
+
+## TODO
+
+- create name list, and populate with Knowledge graph information like LFW
+- make list of interesting number stats, by the numbers
+- make list of interesting important facts
+- write intro abstract
+- write analysis of usage
+- find examples, citations, and screenshots of useage
+- find list of companies using it for table
+- create montages of the dataset, like LFW
+- create right to removal information
+
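Note on the similarity score quoted in the vgg_face2 hunk: the one-liner there can be unpacked roughly as below. This is only a sketch of that scoring step, not the script the commit author ran. The function names `name_similarity` and `needs_manual_review` and the constant `AUTO_UPDATE_THRESHOLD` are hypothetical; the `difflib` calls and the 0.75 cutoff are taken from the diff text.

```python
import difflib

# Cutoff stated in the diff: scores greater than 0.75 were updated
# automatically; scores of 0.75 or lower were reviewed by hand.
AUTO_UPDATE_THRESHOLD = 0.75


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names, in [0.0, 1.0]."""
    seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower())
    return seq.ratio()


def needs_manual_review(original: str, candidate: str) -> bool:
    """Hypothetical helper: True when the pair is at or below the cutoff."""
    return name_similarity(original, candidate) <= AUTO_UPDATE_THRESHOLD


if __name__ == "__main__":
    # Names taken from the examples mentioned in the diff.
    for old, new in [("Bruce Jenner", "Caitlyn Jenner"),
                     ("Mohammad Ali", "Muhammad Ali")]:
        print(old, "->", new, round(name_similarity(old, new), 3))
```

`SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so identical names (ignoring case) score 1.0 and names with no characters in common score 0.0.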
