author Adam Harvey <adam@ahprojects.com> 2019-05-23 18:37:06 +0200
committer Adam Harvey <adam@ahprojects.com> 2019-05-23 18:37:06 +0200
commit b2b2c7d7816baa7d6de36c1de3576a31aa92a209
tree 9105ef39a3bfcd78e9cf4b8c183ee21e7149bf66 /site/content/pages/datasets/ijb_c
parent 4559cf6cccfb6f6d8b8e59e95984044fdf5a5610
parent 84b286e1bd85feba12174a2a480d2be404e7b9c5
merge
Diffstat (limited to 'site/content/pages/datasets/ijb_c')
-rw-r--r-- site/content/pages/datasets/ijb_c/index.md | 9
1 files changed, 9 insertions, 0 deletions
diff --git a/site/content/pages/datasets/ijb_c/index.md b/site/content/pages/datasets/ijb_c/index.md
index 0671252b..d1ac769b 100644
--- a/site/content/pages/datasets/ijb_c/index.md
+++ b/site/content/pages/datasets/ijb_c/index.md
@@ -88,6 +88,15 @@ The first 777 are non-alphabetical. From 777-3531 is alphabetical
![caption: A visualization of the IJB-C dataset](assets/ijb_c_montage.jpg)
+## Research notes
+
+From original papers: https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf
+
+Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. “interview”) were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject’s presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames
+
+
+IARPA funds Italian researcher https://www.micc.unifi.it/projects/glaivejanus/
+
{% include 'dashboard.html' %}
{% include 'supplementary_header.html' %}