MegaPixels
Helen Dataset
Example images from the HELEN dataset

HELEN Dataset

HELEN is a dataset of annotated face images used for facial component localization. It includes 2,330 images from Flickr, found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio". [1]

The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis." [1]

Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential parts of most facial recognition processing systems. Facial landmarks are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to match a templated pose.
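As a rough illustration of how landmarks can be used to align a face to a templated pose, the sketch below maps two eye centers onto fixed template positions with a 2D similarity transform (rotation, uniform scale, translation). The coordinates, the template, and the function names are made up for this example and are not taken from HELEN or from any particular library.

```python
# Hypothetical sketch: align landmarks to a template using two eye centers.
# Points are stored as complex numbers (x + y*1j), so a 2D similarity
# transform (rotation + uniform scale + translation) is simply z -> a*z + b.

def similarity_from_eyes(src_left, src_right, dst_left, dst_right):
    """Return (a, b) such that a*src + b maps the source eyes onto the template eyes."""
    if src_left == src_right:
        raise ValueError("eye centers must be distinct")
    a = (dst_right - dst_left) / (src_right - src_left)  # rotation and scale
    b = dst_left - a * src_left                          # translation
    return a, b

def align(points, a, b):
    """Apply the transform to a list of (x, y) landmark tuples."""
    out = []
    for x, y in points:
        z = a * complex(x, y) + b
        out.append((z.real, z.imag))
    return out

# Made-up landmarks for a tilted face: left eye, right eye, nose tip.
landmarks = [(120.0, 140.0), (200.0, 160.0), (165.0, 210.0)]
# Made-up template: eyes level and 60 px apart.
template_left, template_right = complex(34, 48), complex(94, 48)

a, b = similarity_from_eyes(complex(*landmarks[0]), complex(*landmarks[1]),
                            template_left, template_right)
aligned = align(landmarks, a, b)
```

After the transform, the two eye landmarks sit on the template positions and every other landmark moves with them. Production systems typically fit the transform to many landmarks at once rather than to two points.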

An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic © 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.
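A minimal sketch of reading one 194-point annotation, assuming a plain-text layout with the image name on the first line followed by one "x , y" pair per line. That layout, the sample text, and the function name are assumptions for illustration; check the files you actually download before relying on it.

```python
EXPECTED_POINTS = 194  # landmark count stated for HELEN annotations

def parse_annotation(text):
    """Parse one annotation: image name on line 1, then 'x , y' per line (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    image_name = lines[0]
    points = []
    for ln in lines[1:]:
        x_str, y_str = ln.split(",")
        points.append((float(x_str), float(y_str)))  # float() tolerates padding spaces
    return image_name, points

# Shortened, made-up sample (three points instead of 194).
sample = "example_image_1\n10.5 , 20.25\n11.0 , 21.75\n12.5 , 23.0\n"
name, pts = parse_annotation(sample)

# A real file would be expected to pass this check; the short sample does not.
is_complete = len(pts) == EXPECTED_POINTS
```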

This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition with the vast majority of research taking place in China.

Commercial users include IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook.

Military and defense users include the National University of Defense Technology (NUDT).

http://eccv2012.unifi.it/


Organization | Paper | Year
SenseTime, Amazon | Look at Boundary: A Boundary-Aware Face Alignment Algorithm | 2018
SenseTime | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | 2018

The dataset was used to train the OpenFace software. According to the project wiki, "we used the HELEN and LFPW training subsets for training and the rest for testing" (https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets).

The popular dlib facial landmark detector was also trained using HELEN.

In addition to the 200+ verified citations, the HELEN dataset was used for

It's been converted into new datasets including

The original site

Example Images

An image from the HELEN dataset "wedding" category used for training face recognition 2839127417_1.jpg for outdoor studio
An image from the HELEN dataset "wedding" category used for training face recognition 2325274893_1
Original Flickr image used in HELEN facial analysis and recognition dataset for the keyword "family". 296814969

Who used Helen Dataset?

This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.

Information Supply Chain

To help understand how the HELEN dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.

Citation data is collected using SemanticScholar.org; dataset usage is then verified and geolocated.

Dataset Citations

The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
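The aggregation described above can be sketched roughly as follows: each verified citation is mapped to a country through its institution, and the countries are ranked by citation count, capped at ten. The institution names, the lookup table, and the records are invented placeholders; this is not the project's actual pipeline or data.

```python
from collections import Counter

# Hypothetical institution -> country lookup. A real pipeline would need a
# much larger table or a geocoding service.
INSTITUTION_COUNTRY = {
    "Example University A": "China",
    "Example Institute B": "United States",
    "Example Lab C": "China",
}

# Hypothetical verified citations: (paper title, institution, year).
citations = [
    ("Paper 1", "Example University A", 2016),
    ("Paper 2", "Example Institute B", 2017),
    ("Paper 3", "Example Lab C", 2017),
    ("Paper 4", "Unknown Org", 2018),
]

def top_countries(records, lookup, limit=10):
    """Count citations per country and return at most `limit` (country, count) pairs."""
    counts = Counter()
    for _title, institution, _year in records:
        country = lookup.get(institution)
        if country is not None:  # skip records that cannot be geocoded
            counts[country] += 1
    return counts.most_common(limit)

ranking = top_countries(citations, INSTITUTION_COUNTRY)
```

Records whose institution cannot be resolved are skipped here rather than guessed, which mirrors the manual verification step in spirit only.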

Supplementary Information

Age and Gender Distribution

Age and gender distributions were estimated by analyzing all faces in the dataset images. This may include additional faces appearing next to an annotated face, and it may skip false faces that were erroneously included as part of the original dataset. These numbers are provided as an estimate and not as a factual representation of the exact gender and age of all faces.

Visualization of the HELEN dataset 194-point facial landmark annotations. Credit: graphic © MegaPixels.cc 2019, data from HELEN dataset by Le, Vuong et al. 2012. If you use this image, please credit both the graphic and data source.

Cite Our Work

If you find this analysis helpful, please cite our work:

@online{megapixels,
  author = {Harvey, Adam and LaPlace, Jules},
  title = {MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets},
  year = 2019,
  url = {https://megapixels.cc/},
  urldate = {2019-04-18}
}

Cite the Original Authors' Work

If you find the HELEN dataset useful or reference it in your work, please cite the authors' original work as:

@inproceedings{Le2012InteractiveFF,
 title={Interactive Facial Feature Localization},
 author={Vuong Le and Jonathan Brandt and Zhe L. Lin and Lubomir D. Bourdev and Thomas S. Huang},
 booktitle={ECCV},
 year={2012}
}

References

  • [1] Le, Vuong et al. “Interactive Facial Feature Localization.” ECCV (2012).