path: root/site/public/datasets/ijb_c/index.html
author    adamhrv <adam@ahprojects.com>    2019-06-27 23:58:23 +0200
committer adamhrv <adam@ahprojects.com>    2019-06-27 23:58:23 +0200
commit    852e4c1e36c38f57f80fc5d441da82d5991b2212 (patch)
tree      0c8bc3bbcb6c679e28ba387d0c1e47fb3d16830a /site/public/datasets/ijb_c/index.html
parent    ae165ef1235a6997d5791ca241fd3fd134202c92 (diff)
update public
Diffstat (limited to 'site/public/datasets/ijb_c/index.html')
-rw-r--r--    site/public/datasets/ijb_c/index.html    | 22
1 file changed, 6 insertions(+), 16 deletions(-)
diff --git a/site/public/datasets/ijb_c/index.html b/site/public/datasets/ijb_c/index.html
index ccb7d90d..a36fac14 100644
--- a/site/public/datasets/ijb_c/index.html
+++ b/site/public/datasets/ijb_c/index.html
@@ -76,26 +76,16 @@
<div class='gray'>Website</div>
<div><a href='https://www.nist.gov/programs-projects/face-challenges' target='_blank' rel='nofollow noopener'>nist.gov</a></div>
</div></div><p>[ page under development ]</p>
-<p>The IARPA Janus Benchmark C (IJB&ndash;C) is a dataset of web images used for face recognition research and development. The IJB&ndash;C dataset contains 3,531 people</p>
-<p>Among the target list of 3,531 names are activists, artists, journalists, foreign politicians,</p>
+<p>The IARPA Janus Benchmark C (IJB&ndash;C) is a dataset of web images used for face recognition research and development. The IJB&ndash;C dataset contains 3,531 people drawn from 21,294 images and 3,531 videos. The list of 3,531 names includes activists, artists, journalists, foreign politicians, and public speakers.</p>
+<p>Key Findings:</p>
<ul>
-<li>Subjects 3531</li>
-<li>Templates: 140739</li>
-<li>Genuine Matches: 7819362</li>
-<li>Impostor Matches: 39584639</li>
-</ul>
-<p>Why not include US Soliders instead of activists?</p>
-<p>was creted by Nobilis, a United States Government contractor is used to develop software for the US intelligence agencies as part of the IARPA Janus program.</p>
-<p>The IARPA Janus program is</p>
-<p>these representations must address the challenges of Aging, Pose, Illumination, and Expression (A-PIE) by exploiting all available imagery.</p>
-<ul>
-<li>metadata annotations were created using crowd annotations</li>
-<li>created by Nobilis</li>
-<li>used mechanical turk</li>
+<li>metadata annotations were created by crowd workers on Mechanical Turk</li>
+<li>the dataset was created by Noblis</li>
<li>made for intelligence analysts</li>
<li>improve performance of face recognition tools</li>
<li>by fusing the rich spatial, temporal, and contextual information available from the multiple views captured by today’s "media in the wild"</li>
</ul>
+<p>The dataset includes Creative Commons images.</p>
<p>The name list includes</p>
<ul>
<li>2 videos from CCC<ul>
@@ -134,7 +124,7 @@
<p>The first 777 entries are in non-alphabetical order; entries 777&ndash;3531 are alphabetical.</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/ijb_c/assets/ijb_c_montage.jpg' alt=' A visualization of the IJB-C dataset'><div class='caption'> A visualization of the IJB-C dataset</div></div></section><section><h2>Research notes</h2>
<p>From original papers: <a href="https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf">https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf</a></p>
-<p>Collection for the dataset began by identifying CreativeCommons subject videos, which are often more scarce thanCreative Commons subject images. Search terms that re-sulted in large quantities of person-centric videos (e.g. “in-terview”) were generated and translated into numerous lan-guages including Arabic, Korean, Swahili, and Hindi to in-crease diversity of the subject pool. Certain YouTube userswho upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified. Titles of videos per-taining to these search terms and usernames were scrapedusing the YouTube Data API and translated into English us-ing the Yandex Translate API4. Pattern matching was per-formed to extract potential names of subjects from the trans-lated titles, and these names were searched using the Wiki-data API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons im-agery. Age, gender, and geographic region were collectedusing the Wikipedia API.Using the candidate subject names, Creative Commonsimages were scraped from Google and Wikimedia Com-mons, and Creative Commons videos were scraped fromYouTube. After images and videos of the candidate subjectwere identified, AMT Workers were tasked with validat-ing the subject’s presence throughout the video. The AMTWorkers marked segments of the video in which the subjectwas present, and key frames</p>
+<p>Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. “interview”) were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation, were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject’s presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames</p>
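<p>The paper describes extracting candidate subject names from translated video titles via pattern matching, but does not publish the patterns used. A minimal sketch of that step, where the regex and the example title are assumptions, could look like:</p>

```python
import re

# Hedged sketch: the IJB-C paper says candidate names were extracted
# from translated video titles by pattern matching, without publishing
# the patterns. This regex (an assumption) grabs runs of two or more
# capitalized words as candidate person names.
NAME_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+(?:\s|$)){2,}")

def extract_candidate_names(title):
    """Return possible person names found in a translated video title."""
    return [m.group(0).strip() for m in NAME_PATTERN.finditer(title)]

# Example title, like those scraped via the YouTube Data API
print(extract_candidate_names("Interview with Angela Merkel at Davos"))
# → ['Angela Merkel']
```

<p>Per the paper, each candidate name would then be checked against the Wikidata API to confirm the subject's existence and public-figure status before any imagery was scraped.</p>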
<p>IARPA funds Italian researcher <a href="https://www.micc.unifi.it/projects/glaivejanus/">https://www.micc.unifi.it/projects/glaivejanus/</a></p>
</section><section>
<h3>Who used IJB-C?</h3>