MegaPixels
The IJB-C dataset contains 21,294 images and 11,779 videos of 3,531 identities

IARPA Janus Benchmark C (IJB-C)

[ page under development ]

The IARPA Janus Benchmark C (IJB-C) is a dataset of web images used for face recognition research and development. The IJB-C dataset contains 21,294 images and 11,779 videos of 3,531 people.

Among the target list of 3,531 names are activists, artists, journalists, foreign politicians,

Why not include US soldiers instead of activists?

IJB-C was created by Noblis, a United States government contractor, and is used to develop software for US intelligence agencies as part of the IARPA Janus program.

The IARPA Janus program is

These representations must address the challenges of Aging, Pose, Illumination, and Expression (A-PIE) by exploiting all available imagery.

The name list includes

The first 777 names are not in alphabetical order; names 777 through 3,531 are alphabetical.
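The ordering observation can be checked against a copy of the name list by finding where the longest alphabetically sorted run at the end of the list begins. This is a minimal sketch: it assumes the names have already been loaded as a Python list of strings (the format of the IJB-C name list is not described here), and it compares names case-insensitively. `sorted_suffix_start` is a hypothetical helper.

```python
def sorted_suffix_start(names: list[str]) -> int:
    """Return the index where the longest alphabetically sorted tail begins.

    Assumption: "alphabetical" is judged case-insensitively via str.casefold.
    """
    if not names:
        return 0
    start = len(names) - 1
    # Walk backwards while each name sorts at or before the one after it.
    while start > 0 and names[start - 1].casefold() <= names[start].casefold():
        start -= 1
    return start


# Toy example (not IJB-C data): only the first name falls outside the sorted tail.
print(sorted_suffix_start(["Zed", "Amy", "Bob", "Carl"]))
```

Because the returned index is zero-based, a result of 777 on the real list would match the note that the first 777 names are out of order.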

A visualization of the IJB-C dataset

Research notes

From the original paper: https://noblis.org/wp-content/uploads/2018/03/icb2018.pdf

Collection for the dataset began by identifying Creative Commons subject videos, which are often more scarce than Creative Commons subject images. Search terms that resulted in large quantities of person-centric videos (e.g. “interview”) were generated and translated into numerous languages including Arabic, Korean, Swahili, and Hindi to increase diversity of the subject pool. Certain YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified. Titles of videos pertaining to these search terms and usernames were scraped using the YouTube Data API and translated into English using the Yandex Translate API. Pattern matching was performed to extract potential names of subjects from the translated titles, and these names were searched using the Wikidata API to verify the subject’s existence and status as a public figure, and to check for Wikimedia Commons imagery. Age, gender, and geographic region were collected using the Wikipedia API. Using the candidate subject names, Creative Commons images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube. After images and videos of the candidate subject were identified, AMT Workers were tasked with validating the subject’s presence throughout the video. The AMT Workers marked segments of the video in which the subject was present, and key frames
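The paper does not publish the patterns it used to pull candidate names out of translated video titles. As a rough illustration only, the sketch below assumes a regular expression matching two or three consecutive capitalized words, a small hand-written stoplist, and an in-memory set standing in for the Wikidata lookup. `NAME_PATTERN`, `NOT_NAMES`, `candidate_names`, and `verify` are hypothetical names, not part of the Noblis pipeline.

```python
import re

# Hypothetical pattern: two or three consecutive capitalized words, e.g.
# "Angela Merkel". ASCII-only, so accented names would be missed.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:[ -][A-Z][a-z]+){1,2}\b")

# Hypothetical stoplist of capitalized phrases that are not person names.
NOT_NAMES = {"World Economic Forum", "Full Interview", "Press Conference"}


def candidate_names(title: str) -> list[str]:
    """Extract possible person names from one (already translated) title."""
    found = []
    for match in NAME_PATTERN.finditer(title):
        phrase = match.group(0)
        if phrase not in NOT_NAMES:
            found.append(phrase)
    return found


def verify(candidates: list[str], known_public_figures: set[str]) -> list[str]:
    """Keep only candidates found in a lookup set.

    The paper verified names with the Wikidata API; this offline set lookup
    is only a stand-in for that step.
    """
    return [name for name in candidates if name in known_public_figures]


title = "Interview with Angela Merkel at the World Economic Forum"
print(verify(candidate_names(title), {"Angela Merkel"}))
```

A title such as "Angela Merkel Full Interview" yields the over-long phrase "Angela Merkel Full", which illustrates why naive pattern matching needs a verification step against an external knowledge base.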

IARPA funds Italian researcher https://www.micc.unifi.it/projects/glaivejanus/

Who used IJB-C?

This bar chart ranks the countries where citations of the dataset originated and shows yearly totals for each country. The charts show at most the top 10 countries.
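A ranking like the one in the chart can be computed from a list of verified citations. This sketch assumes each citing paper has already been reduced to a hypothetical `(country, year)` pair. It counts citations per country, keeps at most the top 10, and attaches a per-year breakdown for each country. `rank_countries` is an illustrative name, not MegaPixels code.

```python
from collections import Counter, defaultdict


def rank_countries(citations, limit=10):
    """Rank countries by citation count, with a per-year breakdown.

    `citations` is an iterable of (country, year) pairs, one per verified
    citing paper (a hypothetical record shape).
    """
    citations = list(citations)  # allow a generator to be read twice
    totals = Counter(country for country, _ in citations)
    by_year = defaultdict(Counter)
    for country, year in citations:
        by_year[country][year] += 1
    ranking = []
    # most_common keeps first-seen order among countries with equal totals.
    for country, total in totals.most_common(limit):
        ranking.append((country, total, dict(sorted(by_year[country].items()))))
    return ranking
```

Keeping the yearly counts alongside each total lets one sorted list drive both the bars and the per-year totals.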

Information Supply Chain

To help understand how IJB-C has been used around the world by commercial, military, and academic organizations, existing publicly available research citing IARPA Janus Benchmark C was collected, verified, and geocoded to show the biometric trade routes of people appearing in the images.

Citation data is collected using SemanticScholar.org, and dataset usage is then verified and geolocated.

Dataset Citations

The dataset citations used in the visualizations were collected from Semantic Scholar, a website that aggregates and indexes research papers. Each citation was geocoded using the names of institutions found in the PDF front matter or listed in other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
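Geocoding by institution name can be sketched as a lookup against a table of known institutions, with unmatched papers set aside for manual resolution. This is a minimal sketch under stated assumptions: `papers` maps a paper title to the institution names taken from its front matter, and `gazetteer` maps normalized institution names to coordinates. Both structures, along with `Place`, `normalize`, and `geocode_citations`, are hypothetical and do not describe how MegaPixels actually stored its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    lat: float
    lon: float
    country: str


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so minor formatting differences match."""
    return " ".join(name.lower().split())


def geocode_citations(papers, gazetteer):
    """Attach a location to each paper via the first institution in the gazetteer.

    Papers whose institutions are all unknown are returned separately so they
    can be resolved by hand from other resources.
    """
    located, unresolved = {}, []
    for title, institutions in papers.items():
        place = None
        for inst in institutions:
            place = gazetteer.get(normalize(inst))
            if place is not None:
                break
        if place is None:
            unresolved.append(title)
        else:
            located[title] = place
    return located, unresolved
```

Returning the unresolved titles, rather than silently dropping them, keeps the manual verification step explicit.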

Supplementary Information

Cite Our Work

If you find this analysis helpful, please cite our work:

@online{megapixels,
  author = {Harvey, Adam and LaPlace, Jules},
  title = {MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets},
  year = 2019,
  url = {https://megapixels.cc/},
  urldate = {2019-04-18}
}