Helen is a dataset of annotated face images used for facial component localization. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio". 1
The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis." 1
Irregardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential parts of most facial recogntion processing systems. Facial landmarking are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to match a templated pose.
This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition with the vast majority of research taking place in China.
Commercial use includes IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, Facebook
Military and Defense Usage includes NUDT
TODO
| Organization | Paper | Link | Year | Used Duke MTMC |
|---|---|---|---|---|
| SenseTime, Amazon | Look at Boundary: A Boundary-Aware Face Alignment Algorithm | |||
| 2018 | year | ✔ | ||
| SenseTime | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | 2018 | year | ✔ |
The dataset was used for training the OpenFace software "we used the HELEN and LFPW training subsets for training and the rest for testing" https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets
The popular dlib facial landmark detector was trained using HELEN
In addition to the 200+ verified citations, the HELEN dataset was used for
It's been converted into new datasets including
The original site
This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.
To help understand how Helen Dataset has been used around the world by commercial, military, and academic organizations; existing publicly available research citing Helen Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
Age and gender estimation distribution were calculated by anlayzing all faces in the dataset images. This may include additional faces appearing next to an annotated face, or this may skip false faces that were erroneously included as part of the original dataset. These numbers are provided as an estimation and not a factual representation of the exact gender and age of all faces.
If you find this analysis helpful, please cite our work:
@online{megapixels,
author = {Harvey, Adam. LaPlace, Jules.},
title = {MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets},
year = 2019,
url = {https://megapixels.cc/},
urldate = {2019-04-18}
}
If you find the HELEN dataset useful or reference it in your work, please cite the author's original work as:
@inproceedings{Le2012InteractiveFF,
title={Interactive Facial Feature Localization},
author={Vuong Le and Jonathan Brandt and Zhe L. Lin and Lubomir D. Bourdev and Thomas S. Huang},
booktitle={ECCV},
year={2012}
}