| author | jules@lens <julescarbon@gmail.com> | 2019-05-03 16:02:03 +0200 |
|---|---|---|
| committer | jules@lens <julescarbon@gmail.com> | 2019-05-03 16:02:03 +0200 |
| commit | d0bc27630c13c4649eb394a49525f4150e4b82f2 (patch) | |
| tree | 71fbf167457dcbdeff44f223b7dbb8aa6302947f /site/content/pages/datasets/ijb_c/index.md | |
| parent | 8b0408ab56c687352228e8ec50a71ad48bdd6d18 (diff) | |
| parent | f7b1c28108143eaf99df37c2bb5d8e711733b40e (diff) | |
Merge branch 'master' of asdf.us:megapixels_dev
Diffstat (limited to 'site/content/pages/datasets/ijb_c/index.md')
| -rw-r--r-- | site/content/pages/datasets/ijb_c/index.md | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/site/content/pages/datasets/ijb_c/index.md b/site/content/pages/datasets/ijb_c/index.md
index 46cab323..9e3f1808 100644
--- a/site/content/pages/datasets/ijb_c/index.md
+++ b/site/content/pages/datasets/ijb_c/index.md
@@ -27,6 +27,15 @@ The IARPA Janus Benchmark C is a dataset created by
+## Research notes
+
+From original papers: https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf
+
+Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. “interview”) were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation, were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject’s presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames
+
+
+IARPA funds Italian researcher https://www.micc.unifi.it/projects/glaivejanus/
+
 {% include 'dashboard.html' %}
 {% include 'supplementary_header.html' %}
