author     adamhrv <adam@ahprojects.com>    2019-06-27 23:58:12 +0200
committer  adamhrv <adam@ahprojects.com>    2019-06-27 23:58:12 +0200
commit     ae165ef1235a6997d5791ca241fd3fd134202c92 (patch)
tree       c258e6837c579d4d4baa42d85dca78b036ca022b /site/content/pages/datasets/ijb_c/index.md
parent     5e6803d488b2ea7379d608932214b201b80d9eac (diff)
editing tyupos
Diffstat (limited to 'site/content/pages/datasets/ijb_c/index.md')
-rw-r--r--  site/content/pages/datasets/ijb_c/index.md  31
1 file changed, 7 insertions, 24 deletions
diff --git a/site/content/pages/datasets/ijb_c/index.md b/site/content/pages/datasets/ijb_c/index.md
index d1ac769b..70c71f19 100644
--- a/site/content/pages/datasets/ijb_c/index.md
+++ b/site/content/pages/datasets/ijb_c/index.md
@@ -21,36 +21,19 @@ authors: Adam Harvey
[ page under development ]
-The IARPA Janus Benchmark C (IJB&ndash;C) is a dataset of web images used for face recognition research and development. The IJB&ndash;C dataset contains 3,531 people
+The IARPA Janus Benchmark C (IJB&ndash;C) is a dataset of web images used for face recognition research and development. The IJB&ndash;C dataset contains 3,531 people appearing in 21,294 images and 3,531 videos. The list of 3,531 names includes activists, artists, journalists, foreign politicians, and public speakers.
-Among the target list of 3,531 names are activists, artists, journalists, foreign politicians,
+Key Findings:
-
-
-- Subjects 3531
-- Templates: 140739
-- Genuine Matches: 7819362
-- Impostor Matches: 39584639
-
-
-Why not include US Soliders instead of activists?
-
-
-was creted by Nobilis, a United States Government contractor is used to develop software for the US intelligence agencies as part of the IARPA Janus program.
-
-The IARPA Janus program is
-
-these representations must address the challenges of Aging, Pose, Illumination, and Expression (A-PIE) by exploiting all available imagery.
-
-
-- metadata annotations were created using crowd annotations
-- created by Nobilis
-- used mechanical turk
+- metadata annotations were created by crowd workers on Amazon Mechanical Turk
+- the dataset was created by Noblis
- made for intelligence analysts
- improve performance of face recognition tools
- by fusing the rich spatial, temporal, and contextual information available from the multiple views captured by today’s "media in the wild"
+The dataset includes Creative Commons licensed images.
+
The name list includes
@@ -92,7 +75,7 @@ The first 777 are non-alphabetical. From 777-3531 is alphabetical
From the original paper: https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf
-Collection for the dataset began by identifying CreativeCommons subject videos, which are often more scarce thanCreative Commons subject images. Search terms that re-sulted in large quantities of person-centric videos (e.g. “in-terview”) were generated and translated into numerous lan-guages including Arabic, Korean, Swahili, and Hindi to in-crease diversity of the subject pool. Certain YouTube userswho upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified. Titles of videos per-taining to these search terms and usernames were scrapedusing the YouTube Data API and translated into English us-ing the Yandex Translate API4. Pattern matching was per-formed to extract potential names of subjects from the trans-lated titles, and these names were searched using the Wiki-data API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons im-agery. Age, gender, and geographic region were collectedusing the Wikipedia API.Using the candidate subject names, Creative Commonsimages were scraped from Google and Wikimedia Com-mons, and Creative Commons videos were scraped fromYouTube. After images and videos of the candidate subjectwere identified, AMT Workers were tasked with validat-ing the subject’s presence throughout the video. The AMTWorkers marked segments of the video in which the subjectwas present, and key frames
+Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. “interview”) were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject’s presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames
IARPA funds Italian researcher https://www.micc.unifi.it/projects/glaivejanus/
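
The quoted paragraph describes the collection pipeline only in prose. As a rough illustration of a single step, the Wikidata lookup used to verify that a candidate name extracted from a translated video title corresponds to an existing entity, a minimal Python sketch might look like the following. It assumes the standard Wikidata `wbsearchentities` endpoint; the function name and example query are hypothetical, and the public-figure and Wikimedia Commons imagery checks mentioned in the paper are omitted.

```python
# Minimal sketch (not the authors' code) of the Wikidata lookup step
# described in the quoted collection paragraph: given a candidate name
# extracted from a translated video title, check whether a matching
# Wikidata entity exists. The public-figure heuristic and the Wikimedia
# Commons imagery check mentioned in the paper are not implemented here.
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def lookup_candidate(name: str):
    """Return the first matching Wikidata entity ID for `name`, or None."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": "en",
        "format": "json",
    }
    resp = requests.get(WIKIDATA_API, params=params, timeout=10)
    resp.raise_for_status()
    results = resp.json().get("search", [])
    return results[0]["id"] if results else None


if __name__ == "__main__":
    # Hypothetical example query; any candidate name string would do.
    print(lookup_candidate("World Economic Forum"))
```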