From 5366253cc74b6df84cd0923220d288dc7385e111 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Sun, 31 Mar 2019 18:55:38 +0200 Subject: change gov/mil to mil/gov --- client/chart/constants.js | 4 ++-- site/includes/citations.html | 3 +-- site/includes/map.html | 6 +++--- site/includes/piechart.html | 7 ------- todo.md | 34 +++++++++++++--------------------- 5 files changed, 19 insertions(+), 35 deletions(-) diff --git a/client/chart/constants.js b/client/chart/constants.js index 70375ba3..b916cbd2 100644 --- a/client/chart/constants.js +++ b/client/chart/constants.js @@ -59,6 +59,6 @@ export const institutionOrder = { export const institutionLabels = { 'edu': 'Academic', 'company': 'Commercial', - 'gov': 'Government / Military', - 'mil': 'Government / Military', + 'gov': 'Military / Government', + 'mil': 'Military / Government', } \ No newline at end of file diff --git a/site/includes/citations.html b/site/includes/citations.html index 058a1834..f15c5148 100644 --- a/site/includes/citations.html +++ b/site/includes/citations.html @@ -2,8 +2,7 @@

Citations

- Citations were collected from Semantic Scholar, a website which aggregates - and indexes research papers. The citations were geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train and/or test machine learning algorithms. + Citations were collected from Semantic Scholar, a website which aggregates and indexes research papers. The citations were geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train and/or test machine learning algorithms.

Add [button/link] to download CSV. Add search input field to filter. Expand number of rows to 10. Reduce URL text to show only the domain (ie https://arxiv.org/pdf/123456 --> arxiv.org) diff --git a/site/includes/map.html b/site/includes/map.html index 74771768..867ada4c 100644 --- a/site/includes/map.html +++ b/site/includes/map.html @@ -18,15 +18,15 @@ -

+
  • Academic
  • -
  • Industry
  • -
  • Government / Military
  • +
  • Commercial
  • +
  • Military / Government
  • Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated.
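The legend above groups institutions into Academic, Commercial, and Military / Government, mirroring the `institutionLabels` map edited in `client/chart/constants.js` at the top of this patch. Below is a minimal sketch, not the project's actual code, of how such a map could be used to tally geocoded citations per sector; the `institution` field on each citation record is an assumption made for illustration.

```js
// Illustrative only: reuse a label map like the one in client/chart/constants.js
// to collapse 'gov' and 'mil' records into a single "Military / Government" bucket.
// The { institution: 'edu' | 'company' | 'gov' | 'mil' } record shape is assumed.
const institutionLabels = {
  'edu': 'Academic',
  'company': 'Commercial',
  'gov': 'Military / Government',
  'mil': 'Military / Government',
}

function countBySector(citations) {
  const counts = {}
  for (const citation of citations) {
    const label = institutionLabels[citation.institution] || 'Unknown'
    counts[label] = (counts[label] || 0) + 1
  }
  return counts
}

// Example usage with made-up records:
console.log(countBySector([
  { institution: 'edu' },
  { institution: 'mil' },
  { institution: 'gov' },
  { institution: 'company' },
]))
// { Academic: 1, 'Military / Government': 2, Commercial: 1 }
```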
diff --git a/site/includes/piechart.html b/site/includes/piechart.html index e739bb28..94c8aae7 100644 --- a/site/includes/piechart.html +++ b/site/includes/piechart.html @@ -1,10 +1,3 @@ -
-

- These pie charts show overall totals based on country and institution type. -

- -
-
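The piechart include removed above was only a placeholder, and the todo list that follows mentions a C3 pie graph for citations by country and by sector. A hypothetical sketch of such a chart is shown below; it assumes the c3 and d3 libraries are already loaded on the page, that an element with id="piechart" exists, and that the totals are placeholder numbers rather than real counts.

```js
// Hypothetical C3 pie chart for citations by sector; not the project's include.
// Assumes c3/d3 are loaded globally and <div id="piechart"></div> exists.
const sectorTotals = {
  'Academic': 120,
  'Commercial': 45,
  'Military / Government': 18, // placeholder values only
}

const chart = c3.generate({
  bindto: '#piechart',
  data: {
    // c3 expects columns as [name, value] arrays
    columns: Object.entries(sectorTotals),
    type: 'pie',
  },
  pie: {
    label: {
      format: (value, ratio, id) => id + ': ' + value,
    },
  },
})
```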
diff --git a/todo.md b/todo.md index d6f76c85..c2941ff0 100644 --- a/todo.md +++ b/todo.md @@ -1,43 +1,35 @@ # TODO -## Paper Review - -- build and deploy paper verification tool to publicly (but password protected) URL -- add field for name (no site registration needed) -- user can maually add their name, so it can be used for double-verification, accountability -- top priority datasets will probably be: DukeMTMC, UCCS, MegaFace, Brainwash, HRT Transgender, IJB-C, VGG Face 2, MS Celeb, Pipa, - ## Splash - AH: work on CTA overlay design -- AH: render one head from each activate dataset +- AH: render heads from IJB-C and MS Celeb +- AH: create pseudo-randomized list of names from combined datasets for word/name cloud - JL: add "Name \n Dataset Name" below head? and make linkable to dataset? - change animation to be only colored vertices <---> colored landmarks - add scripted slow-slow-zoom out effect ## Datasets Index -- AH: add more datasets -- AH: finalize intro text +- AH: add dataset analysis for MS Celeb, Duke, UCCS, IJB-C, Brainwash, HRT Transgender +- AH: increase sizes of dataset thumbnails +- AH: add license information to each dataset page ## Datasets -- AH: Try creating another google doc to manually review each citation and send to FT to maybe help with review +Higher priority: + - AH: finalize text for map include, beta disclaimer - JL: add download (button) and search option for CSV? or link to github - JL: remove pointer rollover on tabulators - JL: change PDF url to only show domain (ie https:/arxiv.org/12345/ --> arxiv.org) -- JL: check footnotes (it shows an 'a' next to the numbers on bottom. is this right?) -- JL: Add 'sticky' title appear in header zone when scrolling down page (like NYT) -- JL: add total number of citations next to country "China (1,234)" -- JL: possible to add country with most citations in the "Who Used Dataset" paragraph? -- JL: time permitting, add C3 Pie Graph include: - - one pie graph for citations by country - - one pie graph for citations by sector (academic, commericial, military) -- Integrate verified citations and show only verified citations -- JL/AH: integrate new sidebar JSON or CSV data (AH, working on this...) -- NB: skipping synthetic faces for now +- JL/AH: integrate new sidebar JSON or CSV data (AH, working on this...) 
to show dataset statistics + +Lower priority: + +- JL: Add 'sticky' title with Dataset appear in header zone when scrolling down page (like NYT) + ## About - AH: update bio images -- cgit v1.2.3-70-g09d2 From ccdc0705c2d06122144755d21c6d2156f045304d Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 12:05:53 +0200 Subject: cosmetics --- site/assets/css/css.css | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/site/assets/css/css.css b/site/assets/css/css.css index c3800315..129d6090 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -4,7 +4,8 @@ html, body { padding: 0; width: 100%; min-height: 100%; - font-family: 'Roboto Mono', sans-serif; + /*font-family: 'Roboto Mono', sans-serif;*/ + font-family: 'Roboto', sans-serif; color: #eee; overflow-x: hidden; } @@ -163,7 +164,7 @@ h1 { margin: 75px 0 10px; padding: 0; transition: color 0.1s cubic-bezier(0,0,1,1); - font-family: 'Roboto'; + font-family: 'Roboto Mono', monospace; } h2 { color: #eee; @@ -173,23 +174,23 @@ h2 { margin: 20px 0 10px; padding: 0; transition: color 0.1s cubic-bezier(0,0,1,1); - font-family: 'Roboto'; + font-family: 'Roboto Mono', monospace; } h3 { margin: 0 0 20px 0; padding: 20px 0 0 0; font-size: 22pt; - font-weight: 500; + font-weight: 400; transition: color 0.1s cubic-bezier(0,0,1,1); - font-family: 'Roboto'; + font-family: 'Roboto Mono', monospace; } h4 { margin: 0 0 10px 0; padding: 0; font-size: 11pt; - font-weight: 500; + font-weight: 400; transition: color 0.1s cubic-bezier(0,0,1,1); - font-family: 'Roboto'; + font-family: 'Roboto Mono', monospace; } .content h3 a { color: #888; @@ -212,11 +213,11 @@ h4 { border-bottom: 0; } th, .gray { - font-family: 'Roboto Mono', monospace; + font-family: 'Roboto', monospace; font-weight: 500; text-transform: uppercase; letter-spacing: .15rem; - color: #999; + color: #777; } th, .gray { font-size: 9pt; @@ -248,7 +249,7 @@ section { p { margin: 0 10px 20px 0; line-height: 2; - font-size: 16px; + font-size: 18px; font-weight: 300; } p.subp{ @@ -272,18 +273,19 @@ p.subp{ flex-direction: row; justify-content: flex-start; align-items: flex-start; - font-size: 14px; + font-size: 12px; + color: #ccc; margin-bottom: 20px; font-family: 'Roboto', sans-serif; } .meta > div { margin-right: 20px; - line-height: 19px + line-height: 17px /*font-size:11px;*/ } .meta .gray { font-size: 9pt; - padding-bottom: 4px; + padding-bottom: 5px; line-height: 14px } .right-sidebar { @@ -303,7 +305,7 @@ p.subp{ padding-top: 10px; padding-right: 20px; /*margin-right: 20px;*/ - margin-bottom: 30px; + margin-bottom: 10px; /*border-right: 1px solid #444;*/ font-family: 'Roboto'; font-size: 14px; -- cgit v1.2.3-70-g09d2 From 1369a597f2f564bf305e0918f91e2a96a4fded82 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 12:06:14 +0200 Subject: brainwash, example sidebar --- .../datasets/brainwash/assets/00818000_640x480.jpg | Bin 33112 -> 0 bytes .../datasets/brainwash/assets/background_540.jpg | Bin 83594 -> 0 bytes .../datasets/brainwash/assets/background_600.jpg | Bin 86425 -> 0 bytes .../brainwash/assets/brainwash_mean_overlay.jpg | Bin 0 -> 150399 bytes .../brainwash/assets/brainwash_mean_overlay_wm.jpg | Bin 0 -> 151713 bytes site/content/pages/datasets/brainwash/index.md | 41 ++++++++++++++++----- 6 files changed, 31 insertions(+), 10 deletions(-) delete mode 100644 site/content/pages/datasets/brainwash/assets/00818000_640x480.jpg delete mode 100644 site/content/pages/datasets/brainwash/assets/background_540.jpg 
delete mode 100755 site/content/pages/datasets/brainwash/assets/background_600.jpg create mode 100755 site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay.jpg create mode 100755 site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay_wm.jpg diff --git a/site/content/pages/datasets/brainwash/assets/00818000_640x480.jpg b/site/content/pages/datasets/brainwash/assets/00818000_640x480.jpg deleted file mode 100644 index 30c0fcb1..00000000 Binary files a/site/content/pages/datasets/brainwash/assets/00818000_640x480.jpg and /dev/null differ diff --git a/site/content/pages/datasets/brainwash/assets/background_540.jpg b/site/content/pages/datasets/brainwash/assets/background_540.jpg deleted file mode 100644 index 5c8c0ad4..00000000 Binary files a/site/content/pages/datasets/brainwash/assets/background_540.jpg and /dev/null differ diff --git a/site/content/pages/datasets/brainwash/assets/background_600.jpg b/site/content/pages/datasets/brainwash/assets/background_600.jpg deleted file mode 100755 index 8f2de697..00000000 Binary files a/site/content/pages/datasets/brainwash/assets/background_600.jpg and /dev/null differ diff --git a/site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay.jpg b/site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay.jpg new file mode 100755 index 00000000..2f5917e3 Binary files /dev/null and b/site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay.jpg differ diff --git a/site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay_wm.jpg b/site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay_wm.jpg new file mode 100755 index 00000000..790dbb79 Binary files /dev/null and b/site/content/pages/datasets/brainwash/assets/brainwash_mean_overlay_wm.jpg differ diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md index 0bf67455..d9bffb39 100644 --- a/site/content/pages/datasets/brainwash/index.md +++ b/site/content/pages/datasets/brainwash/index.md @@ -19,28 +19,24 @@ authors: Adam Harvey + Published: 2015 + Images: 11,918 + Faces: 91,146 -+ Created by: Stanford Department of Computer Science ++ Created by: Stanford University (US)
Max Planck Institute for Informatics (DE) + Funded by: Max Planck Center for Visual Computing and Communication -+ Location: Brainwash Cafe, San Franscisco -+ Purpose: Training face detection ++ Purpose: Face detection + Website: stanford.edu -+ Paper: End-to-End People Detection in Crowded Scenes -+ Explicit Consent: No ## Brainwash Dataset (PAGE UNDER DEVELOPMENT) -*Brainwash* is a face detection dataset created from the Brainwash Cafe's livecam footage including 11,918 images of "everyday life of a busy downtown cafe[^readme]". The images are used to develop face detection algorithms for the "challenging task of detecting people in crowded scenes" and tracking them. +*Brainwash* is a head detection dataset created from San Francisco's Brainwash Cafe livecam footage. It includes 11,918 images of "everyday life of a busy downtown cafe[^readme]". The images are used to train and validate algorithms for detecting people in crowded scenes. -Before closing in 2017, Brainwash Cafe was a "cafe and laundromat" located in San Francisco's SoMA district. The cafe published a publicy available livestream from the cafe with a view of the cash register, performance stage, and seating area. +Before closing in 2017, The Brainwash Cafe was a combination cafe, laundromat, and performance venue located in San Francisco's SoMA district. The images used for Brainwash dataset were captured on 3 days: October 27, November 13, and November 24 in 2014. According the author's reserach paper introducing the dataset, the images were acquired with the help of Angelcam.com [cite orig paper]. -Since it's publication by Stanford in 2015, the Brainwash dataset has appeared in several notable research papers. In September 2016 four researchers from the National University of Defense Technology in Changsha, China used the Brainwash dataset for a research study on "people head detection in crowded scenes", concluding that their algorithm "achieves superior head detection performance on the crowded scenes dataset[^localized_region_context]". And again in 2017 three researchers at the National University of Defense Technology used Brainwash for a study on object detection noting "the data set used in our experiment is shown in Table 1, which includes one scene of the brainwash dataset[^replacement_algorithm]". +Brainwash is not a widely used dataset but since it's publication by Stanford in 2015, it has notably appeared in several research papers from the National University of Defense Technology in Changsha, China. In 2016 and in 2017 researchers there conducted studies on "people head detection in crowded scenes" [^localized_region_context] [^replacement_algorithm]. -![caption: An sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The datset contains about 12,000 images. License: Open Data Commons Public Domain Dedication (PDDL)](assets/00425000_960.jpg) +![caption: The pixel-averaged image of all Brainwash dataset images is shown with 81,973 head annotations drawn from the Brainwash training partition. (c) Adam Harvey](assets/brainwash_mean_overlay.jpg) -![caption: 49 of the 11,918 images included in the Brainwash dataset. License: Open Data Commons Public Domain Dedication (PDDL)](assets/brainwash_montage.jpg) {% include 'chart.html' %} @@ -55,12 +51,37 @@ Add more analysis here {% include 'citations.html' %} +![caption: An sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. 
The datset contains about 12,000 images. License: Open Data Commons Public Domain Dedication (PDDL)](assets/00425000_960.jpg) + +![caption: 49 of the 11,918 images included in the Brainwash dataset. License: Open Data Commons Public Domain Dedication (PDDL)](assets/brainwash_montage.jpg) ### Additional Information - The dataset author spoke about his research at the CVPR conference in 2016 +To evaluate the performance of our approach, we collected a +large dataset of images from busy scenes using video footage available from public webcams. In +total, we collect 11917 images with 91146 labeled people. We extract images from video footage at +a fixed interval of 100 seconds to ensure a large variation in images. We allocate 1000 images for +testing and validation, and leave the remaining images for training, making sure that no temporal +overlaps exist between training and test splits. The resulting training set contains 82906 instances. +Test and validation sets contain 4922 and 3318 people instances respectively. Images were labeled +using Amazon Mechanical Turk by a handful of workers pre-selected through their performance on +an example task. We label each person’s head to avoid ambiguity in bounding box locations. The +annotator labels any person she is able to recognize, even if a substantial part of the person is not +visible. Images and annotations will be made available 1 . +Examples of collected images are shown in Fig. 6, and in the video included in the supplemental +material. Images in our dataset include challenges such as people at small scales, strong partial +occlusions, and a large variability in clothing and appearance. + +TODO + +- add bounding boxes to the header image +- remake montage with randomized images, with bboxes +- clean up intro text + + ### Footnotes [^readme]: "readme.txt" https://exhibits.stanford.edu/data/catalog/sx925dc9385. -- cgit v1.2.3-70-g09d2 From 1d261333895cb9305c73d02170e61c5100a39358 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 12:49:57 +0200 Subject: add dataset size --- site/content/pages/datasets/brainwash/index.md | 38 +++++++------------------- 1 file changed, 10 insertions(+), 28 deletions(-) diff --git a/site/content/pages/datasets/brainwash/index.md b/site/content/pages/datasets/brainwash/index.md index d9bffb39..6d90e78f 100644 --- a/site/content/pages/datasets/brainwash/index.md +++ b/site/content/pages/datasets/brainwash/index.md @@ -2,8 +2,8 @@ status: published title: Brainwash -desc: Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco -subdesc: The Brainwash dataset includes 11,918 images of "everyday life of a busy downtown cafe" and is used for training head detection algorithms +desc: Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco in 2014 +subdesc: The Brainwash dataset includes 11,918 images of "everyday life of a busy downtown cafe" and is used for training head detection surveillance algorithms slug: brainwash cssclass: dataset image: assets/background.jpg @@ -21,19 +21,18 @@ authors: Adam Harvey + Faces: 91,146 + Created by: Stanford University (US)
Max Planck Institute for Informatics (DE) + Funded by: Max Planck Center for Visual Computing and Communication -+ Purpose: Face detection ++ Purpose: Head detection ++ Download Size: 4.1GB + Website: stanford.edu ## Brainwash Dataset -(PAGE UNDER DEVELOPMENT) +*Brainwash* is a head detection dataset created from San Francisco's Brainwash Cafe livecam footage. It includes 11,918 images of "everyday life of a busy downtown cafe"[^readme] captured at 100-second intervals throughout the entire day. The Brainwash dataset was captured during 3 days in 2014: October 27, November 13, and November 24. According to the author's research paper introducing the dataset, the images were acquired with the help of Angelcam.com [cite orig paper]. -*Brainwash* is a head detection dataset created from San Francisco's Brainwash Cafe livecam footage. It includes 11,918 images of "everyday life of a busy downtown cafe[^readme]". The images are used to train and validate algorithms for detecting people in crowded scenes. +Brainwash is not a widely used dataset, but since its publication by Stanford University in 2015, it has notably appeared in several research papers from the National University of Defense Technology in Changsha, China. In 2016 and in 2017 researchers there conducted studies on detecting people's heads in crowded scenes for the purpose of surveillance [^localized_region_context] [^replacement_algorithm]. -Before closing in 2017, The Brainwash Cafe was a combination cafe, laundromat, and performance venue located in San Francisco's SoMA district. The images used for Brainwash dataset were captured on 3 days: October 27, November 13, and November 24 in 2014. According the author's reserach paper introducing the dataset, the images were acquired with the help of Angelcam.com [cite orig paper]. -Brainwash is not a widely used dataset but since it's publication by Stanford in 2015, it has notably appeared in several research papers from the National University of Defense Technology in Changsha, China. In 2016 and in 2017 researchers there conducted studies on "people head detection in crowded scenes" [^localized_region_context] [^replacement_algorithm]. +If you happen to have been at the Brainwash Cafe in San Francisco at any time on October 27, November 13, or November 24 in 2014, you are most likely included in the Brainwash dataset. ![caption: The pixel-averaged image of all Brainwash dataset images is shown with 81,973 head annotations drawn from the Brainwash training partition. (c) Adam Harvey](assets/brainwash_mean_overlay.jpg) @@ -44,42 +43,25 @@ Brainwash is not a widely used dataset but since it's publication by Stanford in {% include 'map.html' %} -Add more analysis here - +{% include 'citations.html' %} {% include 'supplementary_header.html' %} -{% include 'citations.html' %} - ![caption: An sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The datset contains about 12,000 images. License: Open Data Commons Public Domain Dedication (PDDL)](assets/00425000_960.jpg) ![caption: 49 of the 11,918 images included in the Brainwash dataset. License: Open Data Commons Public Domain Dedication (PDDL)](assets/brainwash_montage.jpg) -### Additional Information +#### Additional Resources - The dataset author spoke about his research at the CVPR conference in 2016 -To evaluate the performance of our approach, we collected a -large dataset of images from busy scenes using video footage available from public webcams.
In -total, we collect 11917 images with 91146 labeled people. We extract images from video footage at -a fixed interval of 100 seconds to ensure a large variation in images. We allocate 1000 images for -testing and validation, and leave the remaining images for training, making sure that no temporal -overlaps exist between training and test splits. The resulting training set contains 82906 instances. -Test and validation sets contain 4922 and 3318 people instances respectively. Images were labeled -using Amazon Mechanical Turk by a handful of workers pre-selected through their performance on -an example task. We label each person’s head to avoid ambiguity in bounding box locations. The -annotator labels any person she is able to recognize, even if a substantial part of the person is not -visible. Images and annotations will be made available 1 . -Examples of collected images are shown in Fig. 6, and in the video included in the supplemental -material. Images in our dataset include challenges such as people at small scales, strong partial -occlusions, and a large variability in clothing and appearance. - TODO - add bounding boxes to the header image - remake montage with randomized images, with bboxes - clean up intro text +- verify quote citations ### Footnotes -- cgit v1.2.3-70-g09d2 From 5fd122dfdada96d4857fa48d2f7d1df8f787fcd5 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 12:51:19 +0200 Subject: caption --- client/chart/countriesByYear.chart.js | 1 + 1 file changed, 1 insertion(+) diff --git a/client/chart/countriesByYear.chart.js b/client/chart/countriesByYear.chart.js index 4257748c..2284f774 100644 --- a/client/chart/countriesByYear.chart.js +++ b/client/chart/countriesByYear.chart.js @@ -158,6 +158,7 @@ class CountriesByYearChart extends Component { } }} /> +
{paper.name}{' dataset citations by country per year'}
) } -- cgit v1.2.3-70-g09d2 From e61935270fe6bf34658865800c203334047be4a7 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 12:51:34 +0200 Subject: cosmetics --- site/assets/css/applets.css | 1 + site/assets/css/css.css | 7 ++++--- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/site/assets/css/applets.css b/site/assets/css/applets.css index 41d04783..7fac3e27 100644 --- a/site/assets/css/applets.css +++ b/site/assets/css/applets.css @@ -3,6 +3,7 @@ .applet_container { min-height: 340px; clear: left; + margin: 20px auto 40px auto; } .applet_container.autosize { min-height: 0; diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 129d6090..cd16409a 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -169,9 +169,9 @@ h1 { h2 { color: #eee; font-weight: 400; - font-size: 32pt; - line-height: 43pt; - margin: 20px 0 10px; + font-size: 32px; + line-height: 43px; + margin: 20px 0 20px; padding: 0; transition: color 0.1s cubic-bezier(0,0,1,1); font-family: 'Roboto Mono', monospace; @@ -251,6 +251,7 @@ p { line-height: 2; font-size: 18px; font-weight: 300; + color: #dedede; } p.subp{ font-size: 14px; -- cgit v1.2.3-70-g09d2 From 7aaa8b8cd68d3eb09c68da2b0a64cbe635fdb8d5 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 12:51:52 +0200 Subject: updating datasets --- .../assets/duke_mtmc_cam5_average_comp.jpg | Bin 0 -> 195172 bytes site/content/pages/datasets/duke_mtmc/index.md | 28 ++++++++++++------ .../datasets/uccs/assets/uccs_bboxes_clr_fill.jpg | Bin 146050 -> 0 bytes .../datasets/uccs/assets/uccs_bboxes_grayscale.jpg | Bin 299802 -> 0 bytes .../datasets/uccs/assets/uccs_mean_bboxes_comp.jpg | Bin 0 -> 253215 bytes site/content/pages/datasets/uccs/index.md | 32 +++++++++++++++------ 6 files changed, 43 insertions(+), 17 deletions(-) create mode 100755 site/content/pages/datasets/duke_mtmc/assets/duke_mtmc_cam5_average_comp.jpg delete mode 100644 site/content/pages/datasets/uccs/assets/uccs_bboxes_clr_fill.jpg delete mode 100644 site/content/pages/datasets/uccs/assets/uccs_bboxes_grayscale.jpg create mode 100644 site/content/pages/datasets/uccs/assets/uccs_mean_bboxes_comp.jpg diff --git a/site/content/pages/datasets/duke_mtmc/assets/duke_mtmc_cam5_average_comp.jpg b/site/content/pages/datasets/duke_mtmc/assets/duke_mtmc_cam5_average_comp.jpg new file mode 100755 index 00000000..3cd64df1 Binary files /dev/null and b/site/content/pages/datasets/duke_mtmc/assets/duke_mtmc_cam5_average_comp.jpg differ diff --git a/site/content/pages/datasets/duke_mtmc/index.md b/site/content/pages/datasets/duke_mtmc/index.md index de1fa14c..c626ef4e 100644 --- a/site/content/pages/datasets/duke_mtmc/index.md +++ b/site/content/pages/datasets/duke_mtmc/index.md @@ -2,8 +2,8 @@ status: published title: Duke Multi-Target, Multi-Camera Tracking -desc: Duke MTMC is a dataset of CCTV footage of students at Duke University -subdesc: Duke MTMC contains over 2 million video frames and 2,000 unique identities collected from 8 cameras at Duke University campus in March 2014 +desc: Duke MTMC is a dataset of surveillance camera footage of students on Duke University campus +subdesc: Duke MTMC contains over 2 million video frames and 2,000 unique identities collected from 8 HD cameras at Duke University campus in March 2014 slug: duke_mtmc cssclass: dataset image: assets/background.jpg @@ -15,17 +15,27 @@ authors: Adam Harvey ### sidebar -+ Collected: March 19, 2014 -+ Cameras: 8 -+ Video Frames: 2,000,000 -+ Identities: Over 2,000 -+ Used for: Person 
re-identification,
face recognition -+ Sector: Academic ++ Created: 2014 ++ Identities: Over 2,700 ++ Used for: Face recognition, person re-identification ++ Created by: Computer Science Department, Duke University, Durham, US + Website: duke.edu ## Duke Multi-Target, Multi-Camera Tracking Dataset (Duke MTMC) -(PAGE UNDER DEVELOPMENT) +[ PAGE UNDER DEVELOPMENT ] + +Duke MTMC is a dataset of video recorded on Duke University campus for the purpose of training, evaluating, and improving *multi-target multi-camera tracking*. The videos were recorded during February and March 2014 and include + +Includes a total of 888.8 minutes of video (ind. verified) + +"We make available a new data set that has more than 2 million frames and more than 2,700 identities. It consists of 8×85 minutes of 1080p video recorded at 60 frames per second from 8 static cameras deployed on the Duke University campus during periods between lectures, when pedestrian traffic is heavy." + +The dataset includes approximately 2,000 annotated identities appearing in 85 hours of video from 8 cameras located throughout Duke University's campus. + +![caption: Duke MTMC pixel-averaged image of camera #5 is shown with the bounding boxes for each student drawn in white. (c) Adam Harvey](assets/duke_mtmc_cam5_average_comp.jpg) + +According to the dataset authors, {% include 'map.html' %} diff --git a/site/content/pages/datasets/uccs/assets/uccs_bboxes_clr_fill.jpg b/site/content/pages/datasets/uccs/assets/uccs_bboxes_clr_fill.jpg deleted file mode 100644 index c8002bb9..00000000 Binary files a/site/content/pages/datasets/uccs/assets/uccs_bboxes_clr_fill.jpg and /dev/null differ diff --git a/site/content/pages/datasets/uccs/assets/uccs_bboxes_grayscale.jpg b/site/content/pages/datasets/uccs/assets/uccs_bboxes_grayscale.jpg deleted file mode 100644 index 6e2833dd..00000000 Binary files a/site/content/pages/datasets/uccs/assets/uccs_bboxes_grayscale.jpg and /dev/null differ diff --git a/site/content/pages/datasets/uccs/assets/uccs_mean_bboxes_comp.jpg b/site/content/pages/datasets/uccs/assets/uccs_mean_bboxes_comp.jpg new file mode 100644 index 00000000..18f4c5ec Binary files /dev/null and b/site/content/pages/datasets/uccs/assets/uccs_mean_bboxes_comp.jpg differ diff --git a/site/content/pages/datasets/uccs/index.md b/site/content/pages/datasets/uccs/index.md index 092638c0..b3d16c2e 100644 --- a/site/content/pages/datasets/uccs/index.md +++ b/site/content/pages/datasets/uccs/index.md @@ -2,8 +2,8 @@ status: published title: Unconstrained College Students -desc: Unconstrained College Students (UCCS) is a dataset of images ... -subdesc: The UCCS dataset includes ... +desc: Unconstrained College Students (UCCS) is a dataset of long-range surveillance photos of students taken without their knowledge +subdesc: The UCCS dataset includes 16,149 images and 1,732 identities, is used for face recognition and face detection, and funded was several US defense agences slug: uccs cssclass: dataset image: assets/background.jpg @@ -15,16 +15,22 @@ authors: Adam Harvey ### sidebar -+ Collected: TBD -+ Published: TBD -+ Images: TBD -+ Faces: TBD ++ Published: 2018 ++ Images: 16,149 ++ Identities: 1,732 ++ Used for: Face recognition, face detection ++ Created by: University of Colorado Colorado Springs (US) ++ Funded by: ODNI, IARPA, ONR MURI, Army SBIR, SOCOM SBIR ++ Website: vast.uccs.edu ## Unconstrained College Students ... 
(PAGE UNDER DEVELOPMENT) +![caption: The pixel-average of all Uconstrained College Students images is shown with all 51,838 face annotations. (c) Adam Harvey](assets/uccs_mean_bboxes_comp.jpg) + + {% include 'map.html' %} {% include 'chart.html' %} @@ -36,7 +42,6 @@ authors: Adam Harvey {% include 'citations.html' %} -![Bounding box visualization](assets/uccs_bboxes_grayscale.jpg) ### Research Notes @@ -55,4 +60,15 @@ The more recent UCCS version of the dataset received funding from [^funding_uccs [^funding_sb]: Sapkota, Archana and Boult, Terrance. "Large Scale Unconstrained Open Set Face Database." 2013. -[^funding_uccs]: Günther, M. et. al. "Unconstrained Face Detection and Open-Set Face Recognition Challenge," 2018. Arxiv 1708.02337v3. \ No newline at end of file +[^funding_uccs]: Günther, M. et. al. "Unconstrained Face Detection and Open-Set Face Recognition Challenge," 2018. Arxiv 1708.02337v3. + + +" In most face detection/recognition datasets, the majority of images are “posed”, i.e. the subjects know they are being photographed, and/or the images are selected for publication in public media. Hence, blurry, occluded and badly illuminated images are generally uncommon in these datasets. In addition, most of these challenges are close-set, i.e. the list of subjects in the gallery is the same as the one used for testing. + +This challenge explores more unconstrained data, by introducing the new UnConstrained College Students (UCCS) dataset, where subjects are photographed using a long-range high-resolution surveillance camera without their knowledge. Faces inside these images are of various poses, and varied levels of blurriness and occlusion. The challenge also creates an open set recognition problem, where unknown people will be seen during testing and must be rejected. + +With this challenge, we hope to foster face detection and recognition research towards surveillance applications that are becoming more popular and more required nowadays, and where no automatic recognition algorithm has proven to be useful yet. + +UnConstrained College Students (UCCS) Dataset + +The UCCS dataset was collected over several months using Canon 7D camera fitted with Sigma 800mm F5.6 EX APO DG HSM lens, taking images at one frame per second, during times when many students were walking on the sidewalk. " \ No newline at end of file -- cgit v1.2.3-70-g09d2 From c9c353296dff4b4f0afa770e106d67eb8fe80c70 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 12:52:06 +0200 Subject: txt tweaks --- site/includes/chart.html | 3 +-- site/includes/citations.html | 4 ++-- site/includes/map.html | 9 +++++---- site/includes/supplementary_header.html | 3 ++- 4 files changed, 10 insertions(+), 9 deletions(-) diff --git a/site/includes/chart.html b/site/includes/chart.html index 45c13493..01c2e83b 100644 --- a/site/includes/chart.html +++ b/site/includes/chart.html @@ -2,8 +2,7 @@

Who used {{ metadata.meta.dataset.name_display }}?

- This bar chart presents a ranking of the top countries where citations originated. Mouse over individual columns - to see yearly totals. These charts show at most the top 10 countries. + This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.
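A rough sketch of the "top countries" ranking described above is shown below; this is not the project's chart code, and the geocoded `country` field on each citation record is an assumption made for illustration.

```js
// Illustrative ranking of countries by number of verified citations.
// Assumes each citation record carries a geocoded `country` string.
function topCountries(citations, limit = 10) {
  const totals = {}
  for (const { country } of citations) {
    totals[country] = (totals[country] || 0) + 1
  }
  return Object.entries(totals)
    .sort((a, b) => b[1] - a[1]) // highest citation count first
    .slice(0, limit)
}

// Example usage with made-up records:
console.log(topCountries([
  { country: 'China' },
  { country: 'China' },
  { country: 'United States' },
]))
// [ [ 'China', 2 ], [ 'United States', 1 ] ]
```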

diff --git a/site/includes/citations.html b/site/includes/citations.html index f15c5148..74ac5cdc 100644 --- a/site/includes/citations.html +++ b/site/includes/citations.html @@ -1,8 +1,8 @@
-

Citations

+

Dataset Citations

- Citations were collected from Semantic Scholar, a website which aggregates and indexes research papers. The citations were geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train and/or test machine learning algorithms. + The dataset citations used in the visualizations were collected from Semantic Scholar, a website that aggregates and indexes research papers. Each citation has been geocoded using names of institutions found in the PDF front matter, or as listed on other resources, then manually verified to show that researchers downloaded and used the dataset to train and/or test machine learning algorithms.

Add [button/link] to download CSV. Add search input field to filter. Expand number of rows to 10. Reduce URL text to show only the domain (ie https://arxiv.org/pdf/123456 --> arxiv.org) diff --git a/site/includes/map.html b/site/includes/map.html index 867ada4c..31d577cd 100644 --- a/site/includes/map.html +++ b/site/includes/map.html @@ -1,6 +1,6 @@

-

Information Supply Chain

+

Biometric Trade Routes

- To understand how {{ metadata.meta.dataset.name_display }} has been used around the world... - affected global research on computer vision, surveillance, defense, and consumer technology, the and where this dataset has been used the locations of each organization that used or referenced the datast + To help understand how {{ metadata.meta.dataset.name_display }} has been used around the world for commercial, military and academic research; publicly available research citations {{ metadata.meta.dataset.name_display }} are collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal reserach projects at that location.

+
@@ -31,8 +31,9 @@
-
+ \ No newline at end of file diff --git a/site/includes/supplementary_header.html b/site/includes/supplementary_header.html index 5fd4b2b4..bcd84223 100644 --- a/site/includes/supplementary_header.html +++ b/site/includes/supplementary_header.html @@ -6,5 +6,6 @@
-

Supplementary Information

+

Supplementary Information

+
-- cgit v1.2.3-70-g09d2 From 4a11e59f991c8ca12ef4ca20a3b01741f311a0e4 Mon Sep 17 00:00:00 2001 From: adamhrv Date: Mon, 1 Apr 2019 13:10:52 +0200 Subject: updates, broke smth --- site/assets/css/css.css | 4 +- site/content/pages/datasets/index.md | 2 +- site/content/pages/datasets/uccs/index.md | 3 +- .../research/01_from_1_to_100_pixels/index.md | 52 ++++++++++++++++++++++ .../research/02_what_computers_can_see/index.md | 25 ++++++++++- site/includes/map.html | 2 +- 6 files changed, 81 insertions(+), 7 deletions(-) diff --git a/site/assets/css/css.css b/site/assets/css/css.css index cd16409a..0ee8a4f3 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -884,7 +884,7 @@ ul.map-legend li.source:before { font-family: Roboto, sans-serif; font-weight: 400; background: #202020; - padding: 15px; + padding: 20px; margin: 10px; } .columns .column:first-of-type { @@ -937,7 +937,7 @@ ul.map-legend li.source:before { margin:0 0 0 40px; } .content-about .team-member p{ - font-size:14px; + font-size:16px; } .content-about .team-member img{ margin:0; diff --git a/site/content/pages/datasets/index.md b/site/content/pages/datasets/index.md index 2e943fbe..c0373d60 100644 --- a/site/content/pages/datasets/index.md +++ b/site/content/pages/datasets/index.md @@ -13,4 +13,4 @@ sync: false # Facial Recognition Datasets -### Survey +Explore publicly available facial recognition datasets. More datasets will be added throughout 2019. diff --git a/site/content/pages/datasets/uccs/index.md b/site/content/pages/datasets/uccs/index.md index b3d16c2e..e0925e07 100644 --- a/site/content/pages/datasets/uccs/index.md +++ b/site/content/pages/datasets/uccs/index.md @@ -3,8 +3,7 @@ status: published title: Unconstrained College Students desc: Unconstrained College Students (UCCS) is a dataset of long-range surveillance photos of students taken without their knowledge -subdesc: The UCCS dataset includes 16,149 images and 1,732 identities, is used for face recognition and face detection, and funded was several US defense agences -slug: uccs +subdesc: The UCCS dataset includes 16,149 images and 1,732 identities of students at University of Colorado Colorado Springs campus and is used for face recognition and face detection cssclass: dataset image: assets/background.jpg published: 2019-2-23 diff --git a/site/content/pages/research/01_from_1_to_100_pixels/index.md b/site/content/pages/research/01_from_1_to_100_pixels/index.md index a7b863a9..b219dffb 100644 --- a/site/content/pages/research/01_from_1_to_100_pixels/index.md +++ b/site/content/pages/research/01_from_1_to_100_pixels/index.md @@ -56,3 +56,55 @@ Ideas: - "Note that we only keep the images with a minimal side length of 80 pixels." and "a face will be labeled as “Ignore” if it is very difficult to be detected due to blurring, severe deformation and unrecognizable eyes, or the side length of its bounding box is less than 32 pixels." Ge_Detecting_Masked_Faces_CVPR_2017_paper.pdf - IBM DiF: "Faces with region size less than 50x50 or inter-ocular distance of less than 30 pixels were discarded. Faces with non-frontal pose, or anything beyond being slightly tilted to the left or the right, were also discarded." + + + + +As the resolution +formatted as rectangular databases of 16 bit RGB-tuples or 8 bit grayscale values + + +To consider how visual privacy applies to real world surveillance situations, the first + +A single 8-bit grayscale pixel with 256 values is enough to represent the entire alphabet `a-Z0-9` with room to spare. 
+ +A 2x2 pixels contains + +Using no more than a 42 pixel (6x7 image) face image researchers [cite] were able to correctly distinguish between a group of 50 people. Yet + +The likely outcome of face recognition research is that more data is needed to improve. Indeed, resolution is the determining factor for all biometric systems, both as training data to increase + +Pixels, typically considered the buiding blocks of images and vidoes, can also be plotted as a graph of sensor values corresponding to the intensity of RGB-calibrated sensors. + + +Wi-Fi and cameras presents elevated risks for transmitting videos and image documentation from conflict zones, high-risk situations, or even sharing on social media. How can new developments in computer vision also be used in reverse, as a counter-forensic tool, to minimize an individual's privacy risk? + +As the global Internet becomes increasingly effecient at turning the Internet into a giant dataset for machine learning, forensics, and data analysing, it would be prudent to also consider tools for decreasing the resolution. The Visual Defense module is just that. What are new ways to minimize the adverse effects of surveillance by dulling the blade. For example, a researcher paper showed that by decreasing a face size to 12x16 it was possible to do 98% accuracy with 50 people. This is clearly an example of + +This research module, tentatively called Visual Defense Tools, aims to explore the + + +### Prior Research + +- MPI visual privacy advisor +- NIST: super resolution +- YouTube blur tool +- WITNESS: blur tool +- Pixellated text +- CV Dazzle +- Bellingcat guide to geolocation +- Peng! magic passport + +### Notes + +- In China, out of the approximately 200 million surveillance cameras only about 15% have enough resolution for face recognition. +- In Apple's FaceID security guide, the probability of someone else's face unlocking your phone is 1 out of 1,000,000. +- In England, the Metropolitan Police reported a false-positive match rate of 98% when attempting to use face recognition to locate wanted criminals. +- In a face recognition trial at Berlin's Sudkreuz station, the false-match rate was 20%. + + +What all 3 examples illustrate is that face recognition is anything but absolute. In a 2017 talk, Jason Matheny the former directory of IARPA, admitted the face recognition is so brittle it can be subverted by using a magic marker and drawing "a few dots on your forehead". In fact face recognition is a misleading term. Face recognition is search engine for faces that can only ever show you the mos likely match. 
This presents real a real threat to privacy and lends + + +Globally, iPhone users unwittingly agree to 1/1,000,000 probably +relying on FaceID and TouchID to protect their information agree to a \ No newline at end of file diff --git a/site/content/pages/research/02_what_computers_can_see/index.md b/site/content/pages/research/02_what_computers_can_see/index.md index ab4c7884..51621f46 100644 --- a/site/content/pages/research/02_what_computers_can_see/index.md +++ b/site/content/pages/research/02_what_computers_can_see/index.md @@ -100,6 +100,7 @@ A list of 100 things computer vision can see, eg: - Wearing Necktie - Wearing Necklace +for i in {1..9};do wget http://visiond1.cs.umbc.edu/webpage/codedata/ADLdataset/ADL_videos/P_0$i.MP4;done;for i in {10..20}; do wget http://visiond1.cs.umbc.edu/webpage/codedata/ADLdataset/ADL_videos/P_$i.MP4;done ## From Market 1501 @@ -149,4 +150,26 @@ Visibility boolean for each keypoint Region annotations (upper clothes, lower clothes, dress, socks, shoes, hands, gloves, neck, face, hair, hat, sunglasses, bag, occluder) Body type (male, female or child) -source: https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/shape/h3d/ \ No newline at end of file +source: https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/shape/h3d/ + +## From Leeds Sports Pose + +=INDEX(A2:A9,MATCH(datasets!D1,B2:B9,0)) +=VLOOKUP(A2, datasets!A:J, 7, FALSE) + +Right ankle +Right knee +Right hip +Left hip +Left knee +Left ankle +Right wrist +Right elbow +Right shoulder +Left shoulder +Left elbow +Left wrist +Neck +Head top + +source: http://web.archive.org/web/20170915023005/sam.johnson.io/research/lsp.html \ No newline at end of file diff --git a/site/includes/map.html b/site/includes/map.html index 31d577cd..30c248a6 100644 --- a/site/includes/map.html +++ b/site/includes/map.html @@ -12,7 +12,7 @@ -->

- To help understand how {{ metadata.meta.dataset.name_display }} has been used around the world for commercial, military and academic research; publicly available research citations {{ metadata.meta.dataset.name_display }} are collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal reserach projects at that location. + To help understand how {{ metadata.meta.dataset.name_display }} has been used around the world for commercial, military, and academic research, publicly available research citing {{ metadata.meta.dataset.name_full }} is collected, verified, and geocoded to show the biometric trade routes of people appearing in the images. Click on the markers to reveal research projects at that location.

-- cgit v1.2.3-70-g09d2
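Both versions of the citations include keep a note to reduce each citation URL to its domain (e.g. https://arxiv.org/pdf/123456 shown as arxiv.org). A minimal client-side sketch of that behavior, not the project's implementation, could look like this:

```js
// Illustrative only: display just the hostname of a citation URL,
// falling back to the raw string if it cannot be parsed.
function displayDomain(urlString) {
  try {
    const { hostname } = new URL(urlString)
    return hostname.replace(/^www\./, '') // drop a leading "www." for display
  } catch (err) {
    return urlString
  }
}

console.log(displayDomain('https://arxiv.org/pdf/123456')) // "arxiv.org"
```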