From 55e2c02d983dbea026effa7decbcd3e46b6ae9c0 Mon Sep 17 00:00:00 2001
From: adamhrv
Date: Mon, 27 May 2019 20:13:59 +0200
Subject: remove UCCS exif data

---
 site/public/datasets/ijb_c/index.html | 57 ++++++++++++++++++++++++++++++++++-
 site/public/datasets/uccs/index.html  |  4 ---
 2 files changed, 56 insertions(+), 5 deletions(-)

(limited to 'site/public/datasets')

diff --git a/site/public/datasets/ijb_c/index.html b/site/public/datasets/ijb_c/index.html
index 83d572ab..06232f72 100644
--- a/site/public/datasets/ijb_c/index.html
+++ b/site/public/datasets/ijb_c/index.html
@@ -74,7 +74,62 @@
Website
nist.gov

[ page under development ]

-

The IARPA Janus Benchmark C is a dataset created by

+

The IARPA Janus Benchmark C (IJB-C) is a dataset of web images used for face recognition research and development. The IJB-C dataset contains images and videos of 3,531 people.

+

The target list of 3,531 names includes activists, artists, journalists, and foreign politicians.

+ +

Why not include US soldiers instead of activists?

+

The dataset was created by Noblis, a United States Government contractor, and is used to develop software for US intelligence agencies as part of the IARPA Janus program.

+

The IARPA Janus program is a United States intelligence research program aimed at improving automated face recognition.

+

According to the program description, the face representations developed under Janus must address the challenges of Aging, Pose, Illumination, and Expression (A-PIE) by exploiting all available imagery.

+ +

The name list includes

+ +

The first 777 names in the list are in non-alphabetical order; names 777-3531 are alphabetical.
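This ordering can be checked directly against the released name list. Below is a minimal sketch, assuming the subject names are available as a plain text file with one name per line (the file name ijbc_subject_names.txt is hypothetical); it locates the point after which the list stays in alphabetical order.

    # Minimal sketch: find the split between the unordered and alphabetical
    # portions of the IJB-C subject name list. The file name is hypothetical.
    with open("ijbc_subject_names.txt", encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]

    # Walk backwards from the end to find the longest alphabetically sorted suffix.
    split = len(names) - 1
    while split > 0 and names[split - 1].lower() <= names[split].lower():
        split -= 1

    print(f"Names 1-{split} are unordered; names {split + 1}-{len(names)} are alphabetical.")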

A visualization of the IJB-C dataset

Research notes

From the original paper: https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf

Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. "interview") were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation, were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject's existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject's presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames
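As a rough illustration of the name-verification step described in the excerpt above, the sketch below queries the public Wikidata API for a candidate name extracted from a translated video title. This is not the authors' code; the endpoint and the wbsearchentities action are part of the standard Wikidata API, while the helper function name is hypothetical.

    # Sketch of the Wikidata lookup step: given a candidate name, check whether
    # a matching Wikidata entity exists, using the public wbsearchentities action.
    import requests

    WIKIDATA_API = "https://www.wikidata.org/w/api.php"

    def find_wikidata_entity(candidate_name, language="en"):
        """Return the top Wikidata match for a candidate name, or None."""
        params = {
            "action": "wbsearchentities",
            "search": candidate_name,
            "language": language,
            "type": "item",
            "format": "json",
            "limit": 1,
        }
        response = requests.get(WIKIDATA_API, params=params, timeout=10)
        response.raise_for_status()
        results = response.json().get("search", [])
        return results[0] if results else None

    # Example: verify a name extracted from a video title.
    match = find_wikidata_entity("Angela Merkel")
    print(match["id"] if match else "no match")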

diff --git a/site/public/datasets/uccs/index.html b/site/public/datasets/uccs/index.html
index c4be8af0..5044af2a 100644
--- a/site/public/datasets/uccs/index.html
+++ b/site/public/datasets/uccs/index.html
@@ -253,10 +253,6 @@ Their setup made it impossible for students to know they were being photographed
  • Please direct any questions about the ethics of the dataset to the University of Colorado Colorado Springs Ethics and Compliance Office
  • For further technical information about the UnConstrained College Students dataset, visit the UCCS dataset project page.
  • -

    Downloads

    -

    Cite Our Work

    -- cgit v1.2.3-70-g09d2