| author | Adam Harvey <adam@ahprojects.com> | 2019-02-24 15:09:53 +0100 |
|---|---|---|
| committer | Adam Harvey <adam@ahprojects.com> | 2019-02-24 15:09:53 +0100 |
| commit | 1fa504df707246cf1bd8489d2f95a41867b0e1b4 (patch) | |
| tree | c0d4da8a100c602f66ff1b509785a65c0b6057c5 /site/content/pages/datasets_v0/lfw/index.md | |
| parent | 7e33aa7731ffbad5108bb514b635f2bee0daef96 (diff) | |
add site .md, assets
Diffstat (limited to 'site/content/pages/datasets_v0/lfw/index.md')
| -rw-r--r-- | site/content/pages/datasets_v0/lfw/index.md | 160 |
1 files changed, 160 insertions, 0 deletions
diff --git a/site/content/pages/datasets_v0/lfw/index.md b/site/content/pages/datasets_v0/lfw/index.md
new file mode 100644
index 00000000..ce76a521
--- /dev/null
+++ b/site/content/pages/datasets_v0/lfw/index.md
@@ -0,0 +1,160 @@
+------------
+
+status: published
+title: Labeled Faces in The Wild
+desc: LFW: Labeled Faces in The Wild
+slug: lfw
+published: 2018-12-15
+updated: 2018-12-15
+authors: Adam Harvey
+
+------------
+
+# Labeled Faces in the Wild
+
++ Created: 2007
++ Images: 13,233
++ People: 5,749
++ Created From: Yahoo News images
++ Search available: Searchable
+
+```
+face_search
+```
+
+```
+name_search
+```
+
+
+
+### Intro
+
+Labeled Faces in the Wild (LFW) is among the most widely used facial recognition training datasets in the world and is the first of its kind to be created entirely from images posted online. The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002 and 2004. Use the tools below to check whether you were included in this dataset, or scroll down to read the analysis.
+
+Three paragraphs describing the LFW dataset in a format that can be easily replicated for the other datasets. Nothing too custom. An analysis of the initial research papers with context relative to all the other dataset papers.
+
+
+
+
+
+### LFW by the Numbers
+
+- Was first published in 2007
+- Developed out of a prior dataset from Berkeley called "Faces in the Wild" or "Names and Faces" [^lfw_original_paper]
+- Includes 13,233 images and 5,749 different people [^lfw_website]
+- There are about 3 men for every 1 woman (4,277 men and 1,472 women) [^lfw_website]
+- The person with the most images is George W. Bush with 530
+- Most people (70%) in the dataset have only 1 image
+- There are 1,680 people in the dataset with 2 or more images [^lfw_website]
+- Two of the four original authors received funding from the Office of the Director of National Intelligence and IARPA for their 2016 follow-up LFW survey report
+- The LFW dataset includes over 500 actors, 30 models, 10 presidents, 24 football players, 124 basketball players, 11 kings, and 2 queens
+- In all the LFW publications provided by the authors, the words "ethics", "consent", and "privacy" appear 0 times [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report], [^lfw_website]
+- The word "future" appears 71 times
+
+### Facts
+
+- Was created for the purpose of improving "unconstrained face recognition" [^lfw_original_paper]
+- All images in LFW were obtained "in the wild", meaning without any consent from the subject or from the photographer
+- The faces were detected using the Viola-Jones Haar cascade face detector [^lfw_website] [^lfw_survey]
+- Is considered the "most popular benchmark for face recognition" [^lfw_baidu]
+- Is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan]
+- Is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]
+
+- All images were copied from Yahoo News between 2002 and 2004 [^lfw_original_paper]
+- SenseTime, which has relied on LFW to benchmark its facial recognition performance, is the leading provider of surveillance technology to the Chinese Government
+
+
+
+
+
+### People and Companies using the LFW Dataset
+
+This section describes who is using the dataset and for what purposes. It should include specific examples of people or companies with citations and screenshots. This section is followed up by the graph, the map, and then the supplementary material.
+
+The LFW dataset is used by numerous companies for [benchmarking](about/glossary#benchmarking) algorithms and in some cases [training](about/glossary#training). According to the benchmarking results page [^lfw_results] provided by the authors, more than two dozen companies have contributed their benchmark results.
+
+
+According to BiometricUpdate.com [^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."
+
+According to researchers at the Baidu Research – Institute of Deep Learning, "LFW has been the most popular evaluation benchmark for face recognition, and played a very important role in facilitating the face recognition society to improve algorithm." [^lfw_baidu]
+
+In addition to commercial use as an evaluation tool, all of the faces in the LFW dataset are prepackaged into a popular machine learning code framework called scikit-learn.
+
+
+
+
+
+
+
+In benchmarking, companies use a dataset to evaluate their algorithms, which are typically trained on other data. After training, researchers use LFW as a benchmark to compare their results with those of other algorithms.
+
+For example, Baidu (est. net worth $13B) uses LFW to report results for their paper "Targeting Ultimate Accuracy: Face Recognition via Deep Embedding". According to the Baidu researchers who produced the paper:
+
+
+### Citations
+
+Overall, LFW has at least 116 citations from 11 countries.
+
+```
+map
+```
+
+```
+citations
+```
+
+### Conclusion
+
+The LFW face recognition training and evaluation dataset is a historically important face dataset as it was the first popular dataset to be created entirely from Internet images, paving the way for a global trend towards downloading anyone’s face from the Internet and adding it to a dataset. As will be evident with other datasets, LFW’s approach has now become the norm.
+
+For all 5,749 people in this dataset, their faces are forever a part of facial recognition history. It would be impossible to remove anyone from the dataset because it is so ubiquitous. For the rest of their lives and forever after, these people will continue to be used for training facial recognition surveillance.
+
+
+## Code
+
+```python
+#!/usr/bin/python
+
+# ------------------------------------------------------------
+#
+# Script to generate montage of LFW faces used in scikit-learn
+#
+# ------------------------------------------------------------
+
+import numpy as np
+from sklearn.datasets import fetch_lfw_people
+import imageio
+import imutils
+
+# download LFW dataset (first run takes a while)
+lfw_people = fetch_lfw_people(min_faces_per_person=1, resize=1, color=True, funneled=False)
+
+# introspect dataset
+n_samples, h, w, c = lfw_people.images.shape
+print(f'{n_samples:,} images at {w}x{h} pixels')
+cols, rows = (176, 76)
+n_ims = cols * rows
+
+# build montages: each tile is (width, height), the grid is (cols, rows)
+im_scale = 0.5
+ims = lfw_people.images[:n_ims]
+montages = imutils.build_montages(ims, (int(w * im_scale), int(h * im_scale)), (cols, rows))
+montage = montages[0]
+
+# save full montage image
+imageio.imwrite('lfw_montage_full.png', montage)
+
+# make a smaller version
+montage_960 = imutils.resize(montage, width=960)
+imageio.imwrite('lfw_montage_960.jpg', montage_960)
+```
+
+[^lfw_baidu]: Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, Chang Huang. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding. <https://arxiv.org/abs/1506.07310>
+[^lfw_readsense]: ReadSense.ai <https://readsense.ai>
+[^lfw_easen]: Easen Electron <http://english.easen-electron.com/news/gsxw/2017lfw.html>
+[^lfw_pingan]: Lee, Justin. "PING AN Tech facial recognition receives high score in latest LFW test results". BiometricUpdate.com. Feb 13, 2017. <https://www.biometricupdate.com/201702/ping-an-tech-facial-recognition-receives-high-score-in-latest-lfw-test-results>
+[^biometric_update_lfw]: "PING AN Tech facial recognition receives high score in latest LFW test results". <https://www.biometricupdate.com/201702/ping-an-tech-facial-recognition-receives-high-score-in-latest-lfw-test-results>
+[^lfw_baidu_wuzhen]: "Chinese tourist town uses face recognition as an entry pass". New Scientist. November 17, 2016. <https://www.newscientist.com/article/2113176-chinese-tourist-town-uses-face-recognition-as-an-entry-pass/>
+[^lfw_results]: "LFW Results". Accessed Dec 3, 2018. <http://vis-www.cs.umass.edu/lfw/results.html>
+[^feret_website]: "Face Recognition Technology (FERET)". <https://www.nist.gov/programs-projects/face-recognition-technology-feret>
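
The ratios quoted in the page's "LFW by the Numbers" list ("about 3 men for every 1 woman", "Most people (70%) ... have only 1 image") can be re-derived from the counts stated in the same list. The following is a minimal sketch that uses only those stated figures; the variable names are illustrative and are not part of the commit.

```python
# Counts as stated in the "LFW by the Numbers" list of the added page.
total_people = 5749
men = 4277
women = 1472
people_with_multiple_images = 1680

# The two gender counts should add up to the total number of people.
assert men + women == total_people

# "About 3 men for every 1 woman"
men_per_woman = men / women

# "Most people (70%) in the dataset have only 1 image":
# everyone who does not have 2 or more images has exactly 1.
single_image_people = total_people - people_with_multiple_images
single_image_share = single_image_people / total_people

print(f"men per woman: {men_per_woman:.2f}")
print(f"people with one image: {single_image_people:,} ({single_image_share:.1%})")
```

Under these figures the gender ratio comes out slightly below 3 to 1 and the single-image share slightly above 70%, which is consistent with the rounded values on the page.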
