Disclaimer

MegaPixels is an educational art project designed to encourage discourse about facial recognition datasets. Any ethical or legal issues should be directed to the researchers' parent organizations. Except where necessary for contact or clarity, the names of researchers have been substituted with their parent organizations. In no way does this project aim to vilify the researchers who produced the datasets.


VGG Face2

Created

2018

Images

3.3M

People

9,000

Created From

Scraping search engines

Search available

[Searchable](#)

VGG Face2 is the updated version of the VGG Face dataset and now includes over 3.3M face images of over 9K people. The identities were selected by taking the top 500K identities in Google's Knowledge Graph of celebrities and then keeping only the names that yielded enough training images. The dataset was created in the UK but funded by the Office of the Director of National Intelligence (ODNI) in the United States.


load file: lfw_names_gender_kg_min.csv
Name, Images, Gender, Description
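The identity table is loaded from lfw_names_gender_kg_min.csv with the columns Name, Images, Gender, and Description. A minimal sketch of reading such a file with Python's standard csv module follows. It assumes a comma-separated file whose header row is spelled exactly as listed and whose Images column holds integers; the Identity class and the load_identities and count_matching helpers are hypothetical names for illustration, and count_matching is not necessarily how any figure on this page was produced.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Identity:
    """One row of the identity table (hypothetical record type)."""
    name: str
    images: int
    gender: str
    description: str


def load_identities(lines: Iterable[str]) -> List[Identity]:
    """Parse CSV text whose header row is: Name, Images, Gender, Description.

    `lines` can be an open file or any iterable of CSV lines.
    skipinitialspace=True tolerates the space after each comma in the header.
    """
    reader = csv.DictReader(lines, skipinitialspace=True)
    return [
        Identity(
            name=row["Name"],
            images=int(row["Images"]),  # assumption: Images is an integer count
            gender=row["Gender"],
            # A short row leaves Description as None; normalise it to "".
            description=row["Description"] or "",
        )
        for row in reader
    ]


def count_matching(identities: Iterable[Identity], keyword: str) -> int:
    """Count identities whose Description contains `keyword`, ignoring case."""
    needle = keyword.lower()
    return sum(1 for person in identities if needle in person.description.lower())
```

To read the actual file, open it the way the csv module recommends, for example `with open("lfw_names_gender_kg_min.csv", newline="", encoding="utf-8") as f: identities = load_identities(f)`; the UTF-8 encoding is an assumption. Rows without a Description (see the Knowledge Graph score rule below) come back with an empty string rather than None, so keyword matching never fails on them.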

VGG Face2 by the Numbers

1,331 actresses, 139 presidents
3 husbands and 16 wives
The original VGG Face2 name list has been updated with the results returned by Google's Knowledge Graph.
Names with a similarity score greater than 0.75 were updated automatically. Scores were computed by comparing each original name (a) with the name returned by the Knowledge Graph (b): `import difflib; seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower()); score = seq.ratio()`
The 97 names with a score of 0.75 or lower were reviewed manually. Changes include name changes validated against Wikipedia.org results, such as "Bruce Jenner" to "Caitlyn Jenner"; spousal last-name changes; discretionary changes to improve search results, such as combining a nickname with the full name where appropriate (for example, changing "Aleksandar Petrović" to "Aleksandar 'Aco' Petrović"); and minor corrections such as "Mohammad Ali" to "Muhammad Ali".
The 'Description' text was added automatically when the Knowledge Graph score was greater than 250.
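The two automatic rules in these notes (accept the Knowledge Graph name when the difflib similarity exceeds 0.75, and attach a description when the Knowledge Graph score exceeds 250) can be sketched as follows. This is a minimal sketch under stated assumptions: it takes the top Knowledge Graph match for each name (its name, score, and description) as given inputs rather than fetching it, and it assumes the "Knowledge Graph score" is the `resultScore` value that Google's Knowledge Graph Search API reports for each result. NameUpdate and update_entry are hypothetical names, not part of the original pipeline.

```python
import difflib
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the notes above.
NAME_SIMILARITY_THRESHOLD = 0.75  # auto-update only when the score is above this
KG_SCORE_THRESHOLD = 250          # attach a description only above this score


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive difflib ratio between two names, in [0.0, 1.0]."""
    seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower())
    return seq.ratio()


@dataclass
class NameUpdate:
    """Hypothetical record of the outcome for one name."""
    original: str
    name: str                   # name kept in the list after this step
    score: float                # similarity between original and KG name
    needs_review: bool          # True when score <= NAME_SIMILARITY_THRESHOLD
    description: Optional[str]  # set only when kg_score > KG_SCORE_THRESHOLD


def update_entry(original: str, kg_name: str, kg_score: float,
                 kg_description: Optional[str]) -> NameUpdate:
    """Apply the automatic-update rules to one name.

    kg_name, kg_score and kg_description describe the top Knowledge Graph
    match for `original`. Assumption: kg_score is the `resultScore` that the
    Knowledge Graph Search API attaches to each result.
    """
    score = name_similarity(original, kg_name)
    if score > NAME_SIMILARITY_THRESHOLD:
        name, needs_review = kg_name, False
    else:
        # Keep the original name and flag it for manual review.
        name, needs_review = original, True
    description = kg_description if kg_score > KG_SCORE_THRESHOLD else None
    return NameUpdate(original, name, score, needs_review, description)
```

Comparing "Bruce Jenner" with "Caitlyn Jenner" gives a ratio of about 0.62, so that entry keeps its original spelling and is flagged, which is why a change like this one needs the manual Wikipedia check rather than the automatic rule. Lower-casing both names first means a pure capitalisation difference scores a perfect 1.0.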


Labeled Faces in the Wild

Created

2007

Images

13,233

People

5,749

Created From

Yahoo News images

Search available

[Searchable](#)

Labeled Faces in the Wild is amongst the most widely used facial recognition training datasets in the world and was the first dataset of its kind to be created entirely from Internet photos. It includes 13,233 images of 5,749 people downloaded from the Internet, which researchers refer to as “The Wild”.

INTRO

It began in 2002. Researchers at the University of Massachusetts Amherst were developing facial recognition algorithms and needed more data. Between 2002 and 2004 they scraped Yahoo News for images of public figures. Two years later they cleaned up the dataset and repackaged it as Labeled Faces in the Wild (LFW).

Since then, LFW has become one of the most widely used datasets for evaluating face recognition algorithms. The associated research paper, “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments”, has been cited 996 times, with citations coming from 45 different countries.

The faces come from news stories and are mostly of celebrities from the entertainment industry, politicians, and villains. Together they form a sample of the current affairs and breaking news of their time. Detached from their original context, the images now serve a new purpose: to train, evaluate, and improve facial recognition.

Because LFW is one of the most widely used facial recognition datasets, each individual in it has, in a small way, contributed to the current state of the art in facial recognition surveillance. John Cusack, Julianne Moore, Barry Bonds, Osama bin Laden, and even Moby are among these biometric pillars: exemplar faces that provided the visual dimensions of a new computer vision future.

Commercial Use

The dataset is used by numerous companies to benchmark their algorithms. According to the benchmarking results page¹ provided by the authors, there are over two dozen commercial uses of the LFW dataset.

¹ "LFW Results". Accessed Dec 3, 2018. http://vis-www.cs.umass.edu/lfw/results.html

