From c8e7a10be948c2405d46d8c3caf4a8c6675eee29 Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 19:35:54 +0100 Subject: rebuild --- site/public/datasets/lfw/index.html | 118 +++++++++++++++++------------- site/public/datasets/vgg_face2/index.html | 33 +-------- 2 files changed, 72 insertions(+), 79 deletions(-) (limited to 'site/public/datasets') diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index a6226720..f83d8a66 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -4,7 +4,7 @@ MegaPixels - + @@ -27,54 +27,60 @@
-

Labeled Faces in the Wild

-
Created
2007
Images
13,233
People
5,749
Created From
Yahoo News images
Search available
Searchable
Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.
Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

Intro

-

Labeled Faces in The Wild (LFW) is among the most widely used facial recognition training datasets in the world and is the first of its kind to be created entirely from images posted online. The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. Use the tools below to check if you were included in this dataset or scroll down to read the analysis.

-

Three paragraphs describing the LFW dataset in a format that can be easily replicated for the other datasets. Nothing too custom. An analysis of the initial research papers with context relative to all the other dataset papers.

-
 From George W. Bush to Jamie Lee Curtis: all 5,749 people in the LFW Dataset sorted from most to least images collected.
From George W. Bush to Jamie Lee Curtis: all 5,749 people in the LFW Dataset sorted from most to least images collected.

LFW by the Numbers

+

LFW

+
Years
2002-2004
Images
13,233
Identities
5,749
Origin
Yahoo News Images
Funding
(Possibly, partially CIA*)
Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.
Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition"[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

+

The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names and Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

+

The Names and Faces dataset was the first face recognition dataset created entirely from online photos. However, Names and Faces and LFW are not the first face recognition datasets created entirely "in the wild". That title belongs to the UCD dataset. Obtaining images "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

+

Analysis

    -
  • Was first published in 2007
  • -
  • Developed out of a prior dataset from Berkely called "Faces in the Wild" or "Names and Faces" [^lfw_original_paper]
  • -
  • Includes 13,233 images and 5,749 different people [^lfw_website]
  • -
  • There are about 3 men for every 1 woman (4,277 men and 1,472 women)[^lfw_website]
  • -
  • The person with the most images is George W. Bush with 530
  • -
  • Most people (70%) in the dataset have only 1 image
  • -
  • Thre are 1,680 people in the dataset with 2 or more images [^lfw_website]
  • -
  • Two out of 4 of the original authors received funding from the Office of Director of National Intelligence and IARPA for their 2016 LFW survey follow up report
  • -
  • The LFW dataset includes over 500 actors, 30 models, 10 presidents, 24 football players, 124 basketball players, 11 kings, and 2 queens
  • -
  • In all the LFW publications provided by the authors the words "ethics", "consent", and "privacy" appear 0 times [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] , [^lfw_website]
  • +
  • There are about 3 men for every 1 woman (4,277 men and 1,472 women) in the LFW dataset[^lfw_www]
  • +
  • The person with the most images is George W. Bush with 530
  • +
  • There are about 3 images of George W. Bush for every 1 of Tony Blair
  • +
  • 70% of people in the dataset have only 1 image and 29% have 2 or more images
  • +
  • The LFW dataset includes over 500 actors, 30 models, 10 presidents, 124 basketball players, 24 football players, 11 kings, 7 queens, and 1 Moby
  • +
  • In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times
  • The word "future" appears 71 times
-

Facts

+

Synthetic Faces

+

To visualize the types of photos in the dataset without explicitly publishing individuals' identities, a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset.

+

Biometric Trade Routes

+

To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (educational, commercial, or governmental) that has cited the LFW dataset in its research. Data is compiled from SemanticScholar.
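As a rough sketch of the geocoding step (assuming the geopy package and an affiliation name parsed from a Semantic Scholar citation record; the production pipeline may differ), each citing organization can be resolved to coordinates before being drawn on the map:

from geopy.geocoders import Nominatim

# resolve a citing organization's name to latitude/longitude for the trade-route map
geolocator = Nominatim(user_agent='megapixels-citation-map')
location = geolocator.geocode('University of Massachusetts Amherst')
if location is not None:
    print(location.latitude, location.longitude)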

+

[add map here]

+

Citations

+

Browse or download the geocoded citation data collected for the LFW dataset.

+

[add citations table here]

+

Additional Information

+

(tweet-sized snippets go here)

    -
  • Was created for the purpose of improving "unconstrained face recognition" [^lfw_original_paper]
  • -
  • All images in LFW were obtained "in the wild" meaning without any consent from the subject or from the photographer
  • -
  • The faces were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw_survey]
  • -
  • Is considered the "most popular benchmark for face recognition" [^lfw_baidu]
  • -
  • Is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan]
  • -
  • Is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]

    -
  • -
  • All images were copied from Yahoo News between 2002 - 2004 [^lfw_original_paper]

    +
  • The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu]
  • +
  • The LFW dataset is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan]
  • +
  • All images in the LFW dataset were obtained "in the wild", meaning without any consent from the subject or from the photographer
  • +
  • The faces in the LFW dataset were detected using the Viola-Jones Haar cascade face detector [^lfw_website] [^lfw_survey]
  • +
  • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]
  • +
  • All images in the LFW dataset were copied from Yahoo News between 2002 - 2004 +<<<<<<< HEAD
  • +
  • In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper Labeled Faces in the Wild: Updates and New Reporting Procedures via IARPA contract number 2014-14071600010
  • +
  • The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia

  • -
  • SenseTime, who has relied on LFW for benchmarking their facial recognition performance, is the leading provider of surveillance to the Chinese Government
  • +
  • In 2014, 2/4 of the original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper "Labeled Faces in the Wild: Updates and New Reporting Procedures" via IARPA contract number 2014-14071600010
  • +
  • The LFW dataset was used Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National
+

TODO (need citations for the following)

+
    +
  • SenseTime, who has relied on LFW for benchmarking their facial recognition performance, is one the leading provider of surveillance to the Chinese Government [need citation for this fact. is it the most? or is that Tencent?]
  • +
  • Two out of 4 of the original authors received funding from the Office of Director of National Intelligence and IARPA for their 2016 LFW survey follow up report
  • +
+

> 13d7a450affe8ea4f368a97ea2014faa17702a4c

+
+
+
+
+
+
+
 former President George W. Bush
former President George W. Bush
-
 Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)

People and Companies using the LFW Dataset

-

This section describes who is using the dataset and for what purposes. It should include specific examples of people or companies with citations and screenshots. This section is followed up by the graph, the map, and then the supplementary material.

-

The LFW dataset is used by numerous companies for benchmarking algorithms and in some cases training. According to the benchmarking results page [^lfw_results] provided by the authors, over 2 dozen companies have contributed their benchmark results.

-

According to BiometricUpdate.com [^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

-

According to researchers at the Baidu Research – Institute of Deep Learning "LFW has been the most popular evaluation benchmark for face recognition, and played a very important role in facilitating the face recognition society to improve algorithm. [^lfw_baidu]."

-

In addition to commercial use as an evaluation tool, all of the faces in the LFW dataset are prepackaged into a popular machine learning framework called scikit-learn.
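For example, a minimal sketch (assuming scikit-learn and its Pillow dependency are installed) is enough to pull the entire dataset onto a laptop; a longer montage-building script appears in the Code section below:

from sklearn.datasets import fetch_lfw_people

# downloads the LFW images on first call (roughly 200 MB) and caches them locally
lfw_people = fetch_lfw_people(min_faces_per_person=1, resize=0.5)
print(lfw_people.images.shape)      # (n_samples, height, width)
print(lfw_people.target_names[:5])  # names of the first few identities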

-
 "PING AN Tech facial recognition receives high score in latest LFW test results"
"PING AN Tech facial recognition receives high score in latest LFW test results"
-
 "Face Recognition Performance in LFW benchmark"
"Face Recognition Performance in LFW benchmark"
-
 "The 1st place in face verification challenge, LFW"
"The 1st place in face verification challenge, LFW"

In benchmarking, companies use a dataset to evaluate their algorithms, which are typically trained on other data. After training, researchers will use LFW as a benchmark to compare results with other algorithms.
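A minimal sketch of what such a benchmark run looks like (assuming scikit-learn is installed; the pixel-distance verifier and threshold here are purely illustrative and not any vendor's actual method): the LFW verification protocol supplies labeled image pairs, and an algorithm is scored on how often it correctly decides whether both images show the same person.

import numpy as np
from sklearn.datasets import fetch_lfw_pairs

# the 'test' subset provides 1,000 image pairs labeled same person / different person
lfw_pairs = fetch_lfw_pairs(subset='test', resize=0.5)
first = lfw_pairs.pairs[:, 0].reshape(len(lfw_pairs.target), -1)
second = lfw_pairs.pairs[:, 1].reshape(len(lfw_pairs.target), -1)

# toy verifier: declare "same person" when the raw pixel distance is below the median
distances = np.linalg.norm(first - second, axis=1)
predictions = (distances < np.median(distances)).astype(int)
accuracy = (predictions == lfw_pairs.target).mean()
print(f'verification accuracy: {accuracy:.1%}')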

-

For example, Baidu (est. net worth $13B) uses LFW to report results for their "Targeting Ultimate Accuracy: Face Recognition via Deep Embedding". According to the three Baidu researchers who produced the paper:

-

Citations

-

Overall, LFW has at least 116 citations from 11 countries.

-

Conclusion

-

The LFW face recognition training and evaluation dataset is a historically important face dataset as it was the first popular dataset to be created entirely from Internet images, paving the way for a global trend towards downloading anyone’s face from the Internet and adding it to a dataset. As will be evident with other datasets, LFW’s approach has now become the norm.

-

For all the 5,000 people in this dataset, their faces are forever a part of facial recognition history. It would be impossible to remove anyone from the dataset because it is so ubiquitous. For the rest of their lives and forever after, these 5,000 people will continue to be used to train facial recognition surveillance systems.

-

Code

+
 Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
All 5,379 faces in the Labeled Faces in The Wild Dataset
All 5,379 faces in the Labeled Faces in The Wild Dataset

Code

+

The LFW dataset is so widely used that a popular code library called scikit-learn includes a function called fetch_lfw_people to download the faces in the LFW dataset.

#!/usr/bin/python
 
 import numpy as np
@@ -87,26 +93,38 @@ lfw_people = fetch_lfw_people(min_faces_per_person=1, resize=1, color=True, funn
 
 # introspect dataset
 n_samples, h, w, c = lfw_people.images.shape
-print('{:,} images at {}x{}'.format(n_samples, w, h))
+print(f'{n_samples:,} images at {w}x{h} pixels')
 cols, rows = (176, 76)
 n_ims = cols * rows
 
 # build montages
 im_scale = 0.5
-ims = lfw_people.images[:n_ims
-montages = imutils.build_montages(ims, (int(w*im_scale, int(h*im_scale)), (cols, rows))
+ims = lfw_people.images[:n_ims]
+montages = imutils.build_montages(ims, (int(w * im_scale), int(h * im_scale)), (cols, rows))
 montage = montages[0]
 
 # save full montage image
 imageio.imwrite('lfw_montage_full.png', montage)
 
 # make a smaller version
-montage_960 = imutils.resize(montage, width=960)
-imageio.imwrite('lfw_montage_960.jpg', montage_960)
+montage = imutils.resize(montage, width=960)
+imageio.imwrite('lfw_montage_960.jpg', montage)
 
-

Disclaimer

-

MegaPixels is an educational art project designed to encourage discourse about facial recognition datasets. Any ethical or legal issues should be directed to the researchers' parent organizations. Except where necessary for contact or clarity, the names of researchers have been substituted with those of their parent organizations. In no way does this project aim to vilify the researchers who produced the datasets.

-

Read more about MegaPixels Code of Conduct

+

Supplementary Material

+

Text and graphics ©Adam Harvey / megapixels.cc

+

Ignore text below these lines

+

Research

+
    +
  • "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]."
  • +
  • "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249."
  • +
  • From: "People-LDA: Anchoring Topics to People using Face Recognition" https://www.semanticscholar.org/paper/People-LDA%3A-Anchoring-Topics-to-People-using-Face-Jain-Learned-Miller/10f17534dba06af1ddab96c4188a9c98a020a459 and https://ieeexplore.ieee.org/document/4409055
  • +
  • This paper was presented at IEEE 11th ICCV conference Oct 14-21 and the main LFW paper "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments" was also published that same year
  • +
  • 10f17534dba06af1ddab96c4188a9c98a020a459

    +
  • +
  • This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010.

    +
  • +
  • From "Labeled Faces in the Wild: Updates and New Reporting Procedures"
  • +

    diff --git a/site/public/datasets/vgg_face2/index.html b/site/public/datasets/vgg_face2/index.html index b7ba5a4c..08b02cc7 100644 --- a/site/public/datasets/vgg_face2/index.html +++ b/site/public/datasets/vgg_face2/index.html @@ -4,7 +4,7 @@ MegaPixels - + @@ -27,35 +27,10 @@
    -

    VGG Faces2

    -
    Created
    2018
    Images
    3.3M
    People
    9,000
    Created From
    Scraping search engines
    Search available
    [Searchable](#)

    VGG Face2 is the updated version of the VGG Face dataset and now includes over 3.3M face images from over 9K people. The identities were selected by taking the top 500K identities in Google's Knowledge Graph of celebrities and then selecting only the names that yielded enough training images. The dataset was created in the UK but funded by the Office of the Director of National Intelligence in the United States.
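    A rough sketch of how such a celebrity name list could be assembled with the Google Knowledge Graph Search API (illustrative only: the API key is a placeholder and the dataset authors' exact queries and thresholds are not documented here):

    import requests

    # look up a candidate identity and its popularity score in the Knowledge Graph
    params = {
        'query': 'Aamir Khan',   # example name, not necessarily in the dataset
        'types': 'Person',
        'limit': 1,
        'key': 'YOUR_API_KEY',   # placeholder
    }
    resp = requests.get('https://kgsearch.googleapis.com/v1/entities:search', params=params)
    for item in resp.json().get('itemListElement', []):
        print(item['result']['name'], item.get('resultScore'))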

    -

    VGG Face2 by the Numbers

    +

    VGG Face 2

    +
    Years
    TBD
    Images
    TBD
    Identities
    TBD
    Origin
    TBD
    Funding
    IARPA
    ...
    ...

    Analysis

      -
    • 1,331 actresses, 139 presidents
    • -
    • 3 husbands and 16 wives
    • -
    • 2 snooker player
    • -
    • 1 guru
    • -
    • 1 pornographic actress
    • -
    • 3 computer programmer
    • -
    -

    Names and descriptions

    -
      -
    • The original VGGF2 name list has been updated with the results returned from Google Knowledge
    • -
    • Names with a similarity score greater than 0.75 where automatically updated. Scores computed using import difflib; seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower()); score = seq.ratio()
    • -
    • The 97 names with a score of 0.75 or lower were manually reviewed and includes name changes validating using Wikipedia.org results for names such as "Bruce Jenner" to "Caitlyn Jenner", spousal last-name changes, and discretionary changes to improve search results such as combining nicknames with full name when appropriate, for example changing "Aleksandar Petrović" to "Aleksandar 'Aco' Petrović" and minor changes such as "Mohammad Ali" to "Muhammad Ali"
    • -
    • The 'Description' text was automatically added when the Knowledge Graph score was greater than 250
    • -
    -

    TODO

    -
      -
    • create name list, and populate with Knowledge graph information like LFW
    • -
    • make list of interesting number stats, by the numbers
    • -
    • make list of interesting important facts
    • -
    • write intro abstract
    • -
    • write analysis of usage
    • -
    • find examples, citations, and screenshots of useage
    • -
    • find list of companies using it for table
    • -
    • create montages of the dataset, like LFW
    • -
    • create right to removal information
    • +
    • The VGG Face 2 dataset includes approximately 1,331 actresses, 139 presidents, 16 wives, 3 husbands, 2 snooker players, and 1 guru
    -- cgit v1.2.3-70-g09d2 From 67896d3cdde877de940a282bebacd10ca1c56499 Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 20:29:08 +0100 Subject: site watcher / loader --- README.md | 2 +- megapixels/app/site/builder.py | 22 ++-- megapixels/app/site/loader.py | 123 +++++++++++++++++++ megapixels/app/site/parser.py | 204 ++++++++----------------------- megapixels/commands/site/watch.py | 44 +++++++ site/assets/css/css.css | 1 + site/content/pages/datasets/lfw/index.md | 55 ++++----- site/public/datasets/lfw/index.html | 43 ++----- 8 files changed, 266 insertions(+), 228 deletions(-) create mode 100644 megapixels/app/site/loader.py create mode 100644 megapixels/commands/site/watch.py (limited to 'site/public/datasets') diff --git a/README.md b/README.md index e1a2c1d0..e46a6289 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ pip install numpy Pillow pip install dlib pip install requests simplejson click pdfminer.six pip install urllib3 flask flask_sqlalchemy mysql-connector -pip install pymediainfo tqdm opencv-python imutils +pip install pymediainfo tqdm opencv-python imutils watchdog pip install scikit-image python-dotenv imagehash scikit-learn colorlog pip install celery keras tensorflow pip install python.app # OSX only! needed for matplotlib diff --git a/megapixels/app/site/builder.py b/megapixels/app/site/builder.py index 188fbc25..15055110 100644 --- a/megapixels/app/site/builder.py +++ b/megapixels/app/site/builder.py @@ -7,6 +7,7 @@ from jinja2 import Environment, FileSystemLoader, select_autoescape import app.settings.app_cfg as cfg import app.site.s3 as s3 +import app.site.loader as loader import app.site.parser as parser env = Environment( @@ -21,7 +22,7 @@ def build_page(fn, research_posts, datasets): - syncs any assets with s3 - handles certain index pages... """ - metadata, sections = parser.read_metadata(fn) + metadata, sections = loader.read_metadata(fn) if metadata is None: print("{} has no metadata".format(fn)) @@ -55,7 +56,7 @@ def build_page(fn, research_posts, datasets): if 'index.md' in fn: s3.sync_directory(dirname, s3_dir, metadata) - content = parser.parse_markdown(sections, s3_path, skip_h1=skip_h1) + content = parser.parse_markdown(metadata, sections, s3_path, skip_h1=skip_h1) html = template.render( metadata=metadata, @@ -73,11 +74,11 @@ def build_index(key, research_posts, datasets): """ build the index of research (blog) posts """ - metadata, sections = parser.read_metadata(os.path.join(cfg.DIR_SITE_CONTENT, key, 'index.md')) + metadata, sections = loader.read_metadata(os.path.join(cfg.DIR_SITE_CONTENT, key, 'index.md')) template = env.get_template("page.html") s3_path = s3.make_s3_path(cfg.S3_SITE_PATH, metadata['path']) - content = parser.parse_markdown(sections, s3_path, skip_h1=False) - content += parser.parse_research_index(research_posts) + content = parser.parse_markdown(metadata, sections, s3_path, skip_h1=False) + content += loader.parse_research_index(research_posts) html = template.render( metadata=metadata, content=content, @@ -93,8 +94,8 @@ def build_site(): """ build the site! 
=^) """ - research_posts = parser.read_research_post_index() - datasets = parser.read_datasets_index() + research_posts = loader.read_research_post_index() + datasets = loader.read_datasets_index() for fn in glob.iglob(os.path.join(cfg.DIR_SITE_CONTENT, "**/*.md"), recursive=True): build_page(fn, research_posts, datasets) build_index('research', research_posts, datasets) @@ -103,7 +104,8 @@ def build_file(fn): """ build just one page from a filename! =^) """ - research_posts = parser.read_research_post_index() - datasets = parser.read_datasets_index() - fn = os.path.join(cfg.DIR_SITE_CONTENT, fn) + research_posts = loader.read_research_post_index() + datasets = loader.read_datasets_index() + if cfg.DIR_SITE_CONTENT not in fn: + fn = os.path.join(cfg.DIR_SITE_CONTENT, fn) build_page(fn, research_posts, datasets) diff --git a/megapixels/app/site/loader.py b/megapixels/app/site/loader.py new file mode 100644 index 00000000..691efb25 --- /dev/null +++ b/megapixels/app/site/loader.py @@ -0,0 +1,123 @@ +import os +import re +import glob +import simplejson as json + +import app.settings.app_cfg as cfg + +def read_metadata(fn): + """ + Read in read a markdown file and extract the metadata + """ + with open(fn, "r") as file: + data = file.read() + data = data.replace("\n ", "\n") + if "\n" in data: + data = data.replace("\r", "") + else: + data = data.replace("\r", "\n") + sections = data.split("\n\n") + return parse_metadata(fn, sections) + + +default_metadata = { + 'status': 'published', + 'title': 'Untitled Page', + 'desc': '', + 'slug': '', + 'published': '2018-12-31', + 'updated': '2018-12-31', + 'authors': 'Adam Harvey', + 'sync': 'true', + 'tagline': '', +} + +def parse_metadata(fn, sections): + """ + parse the metadata headers in a markdown file + (everything before the second ---------) + also generates appropriate urls for this page :) + """ + found_meta = False + metadata = {} + valid_sections = [] + for section in sections: + if not found_meta and ': ' in section: + found_meta = True + parse_metadata_section(metadata, section) + continue + if '-----' in section: + continue + if found_meta: + valid_sections.append(section) + + if 'title' not in metadata: + print('warning: {} has no title'.format(fn)) + for key in default_metadata: + if key not in metadata: + metadata[key] = default_metadata[key] + + basedir = os.path.dirname(fn.replace(cfg.DIR_SITE_CONTENT, '')) + basename = os.path.basename(fn) + if basedir == '/': + metadata['path'] = '/' + metadata['url'] = '/' + elif basename == 'index.md': + metadata['path'] = basedir + '/' + metadata['url'] = metadata['path'] + else: + metadata['path'] = basedir + '/' + metadata['url'] = metadata['path'] + basename.replace('.md', '') + '/' + + if metadata['status'] == 'published|draft|private': + metadata['status'] = 'published' + + metadata['sync'] = metadata['sync'] != 'false' + + metadata['author_html'] = '
    '.join(metadata['authors'].split(',')) + + return metadata, valid_sections + +def parse_metadata_section(metadata, section): + """ + parse a metadata key: value pair + """ + for line in section.split("\n"): + if ': ' not in line: + continue + key, value = line.split(': ', 1) + metadata[key.lower()] = value + + +def read_research_post_index(): + """ + Generate an index of the research (blog) posts + """ + return read_post_index('research') + + +def read_datasets_index(): + """ + Generate an index of the datasets + """ + return read_post_index('datasets') + + +def read_post_index(basedir): + """ + Generate an index of posts + """ + posts = [] + for fn in sorted(glob.glob(os.path.join(cfg.DIR_SITE_CONTENT, basedir, '*/index.md'))): + metadata, valid_sections = read_metadata(fn) + if metadata is None or metadata['status'] == 'private' or metadata['status'] == 'draft': + continue + posts.append(metadata) + if not len(posts): + posts.append({ + 'title': 'Placeholder', + 'slug': 'placeholder', + 'date': 'Placeholder', + 'url': '/', + }) + return posts diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index d6705214..3792e6f1 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -10,6 +10,49 @@ import app.site.s3 as s3 renderer = mistune.Renderer(escape=False) markdown = mistune.Markdown(renderer=renderer) +def parse_markdown(metadata, sections, s3_path, skip_h1=False): + """ + parse page into sections, preprocess the markdown to handle our modifications + """ + groups = [] + current_group = [] + for section in sections: + if skip_h1 and section.startswith('# '): + continue + elif section.strip().startswith('```'): + groups.append(format_section(current_group, s3_path)) + current_group = [] + current_group.append(section) + if section.strip().endswith('```'): + groups.append(format_applet("\n\n".join(current_group), s3_path)) + current_group = [] + elif section.strip().endswith('```'): + current_group.append(section) + groups.append(format_applet("\n\n".join(current_group), s3_path)) + current_group = [] + elif section.startswith('+ '): + groups.append(format_section(current_group, s3_path)) + groups.append(format_metadata(section)) + current_group = [] + elif '![fullwidth:' in section: + groups.append(format_section(current_group, s3_path)) + groups.append(format_section([section], s3_path, type='fullwidth')) + current_group = [] + elif '![wide:' in section: + groups.append(format_section(current_group, s3_path)) + groups.append(format_section([section], s3_path, type='wide')) + current_group = [] + elif '![' in section: + groups.append(format_section(current_group, s3_path)) + groups.append(format_section([section], s3_path, type='images')) + current_group = [] + else: + current_group.append(section) + groups.append(format_section(current_group, s3_path)) + content = "".join(groups) + return content + + def fix_images(lines, s3_path): """ do our own tranformation of the markdown around images to handle wide images etc @@ -32,6 +75,7 @@ def fix_images(lines, s3_path): real_lines.append(line) return "\n".join(real_lines) + def format_section(lines, s3_path, type=''): """ format a normal markdown section @@ -44,6 +88,7 @@ def format_section(lines, s3_path, type=''): return "
    " + markdown(lines) + "
    " return "" + def format_metadata(section): """ format a metadata section (+ key: value pairs) @@ -54,7 +99,11 @@ def format_metadata(section): meta.append("
    {}
    {}
    ".format(key, value)) return "
    {}
    ".format(''.join(meta)) + def format_applet(section, s3_path): + """ + Format the applets, which load javascript modules like the map and CSVs + """ # print(section) payload = section.strip('```').strip().strip('```').strip().split('\n') applet = {} @@ -79,47 +128,6 @@ def format_applet(section, s3_path): applet['fields'] = payload[1:] return "
    ".format(json.dumps(applet)) -def parse_markdown(sections, s3_path, skip_h1=False): - """ - parse page into sections, preprocess the markdown to handle our modifications - """ - groups = [] - current_group = [] - for section in sections: - if skip_h1 and section.startswith('# '): - continue - elif section.strip().startswith('```'): - groups.append(format_section(current_group, s3_path)) - current_group = [] - current_group.append(section) - if section.strip().endswith('```'): - groups.append(format_applet("\n\n".join(current_group), s3_path)) - current_group = [] - elif section.strip().endswith('```'): - current_group.append(section) - groups.append(format_applet("\n\n".join(current_group), s3_path)) - current_group = [] - elif section.startswith('+ '): - groups.append(format_section(current_group, s3_path)) - groups.append(format_metadata(section)) - current_group = [] - elif '![fullwidth:' in section: - groups.append(format_section(current_group, s3_path)) - groups.append(format_section([section], s3_path, type='fullwidth')) - current_group = [] - elif '![wide:' in section: - groups.append(format_section(current_group, s3_path)) - groups.append(format_section([section], s3_path, type='wide')) - current_group = [] - elif '![' in section: - groups.append(format_section(current_group, s3_path)) - groups.append(format_section([section], s3_path, type='images')) - current_group = [] - else: - current_group.append(section) - groups.append(format_section(current_group, s3_path)) - content = "".join(groups) - return content def parse_research_index(research_posts): """ @@ -141,117 +149,3 @@ def parse_research_index(research_posts): content += row content += '
    ' return content - -def read_metadata(fn): - """ - Read in read a markdown file and extract the metadata - """ - with open(fn, "r") as file: - data = file.read() - data = data.replace("\n ", "\n") - if "\n" in data: - data = data.replace("\r", "") - else: - data = data.replace("\r", "\n") - sections = data.split("\n\n") - return parse_metadata(fn, sections) - -default_metadata = { - 'status': 'published', - 'title': 'Untitled Page', - 'desc': '', - 'slug': '', - 'published': '2018-12-31', - 'updated': '2018-12-31', - 'authors': 'Adam Harvey', - 'sync': 'true', - 'tagline': '', -} - -def parse_metadata_section(metadata, section): - """ - parse a metadata key: value pair - """ - for line in section.split("\n"): - if ': ' not in line: - continue - key, value = line.split(': ', 1) - metadata[key.lower()] = value - -def parse_metadata(fn, sections): - """ - parse the metadata headers in a markdown file - (everything before the second ---------) - also generates appropriate urls for this page :) - """ - found_meta = False - metadata = {} - valid_sections = [] - for section in sections: - if not found_meta and ': ' in section: - found_meta = True - parse_metadata_section(metadata, section) - continue - if '-----' in section: - continue - if found_meta: - valid_sections.append(section) - - if 'title' not in metadata: - print('warning: {} has no title'.format(fn)) - for key in default_metadata: - if key not in metadata: - metadata[key] = default_metadata[key] - - basedir = os.path.dirname(fn.replace(cfg.DIR_SITE_CONTENT, '')) - basename = os.path.basename(fn) - if basedir == '/': - metadata['path'] = '/' - metadata['url'] = '/' - elif basename == 'index.md': - metadata['path'] = basedir + '/' - metadata['url'] = metadata['path'] - else: - metadata['path'] = basedir + '/' - metadata['url'] = metadata['path'] + basename.replace('.md', '') + '/' - - if metadata['status'] == 'published|draft|private': - metadata['status'] = 'published' - - metadata['sync'] = metadata['sync'] != 'false' - - metadata['author_html'] = '
    '.join(metadata['authors'].split(',')) - - return metadata, valid_sections - -def read_research_post_index(): - """ - Generate an index of the research (blog) posts - """ - return read_post_index('research') - -def read_datasets_index(): - """ - Generate an index of the datasets - """ - return read_post_index('datasets') - -def read_post_index(basedir): - """ - Generate an index of posts - """ - posts = [] - for fn in sorted(glob.glob(os.path.join(cfg.DIR_SITE_CONTENT, basedir, '*/index.md'))): - metadata, valid_sections = read_metadata(fn) - if metadata is None or metadata['status'] == 'private' or metadata['status'] == 'draft': - continue - posts.append(metadata) - if not len(posts): - posts.append({ - 'title': 'Placeholder', - 'slug': 'placeholder', - 'date': 'Placeholder', - 'url': '/', - }) - return posts - diff --git a/megapixels/commands/site/watch.py b/megapixels/commands/site/watch.py new file mode 100644 index 00000000..7fd3ba7c --- /dev/null +++ b/megapixels/commands/site/watch.py @@ -0,0 +1,44 @@ +""" +Watch for changes in the static site and build them +""" + +import click +import time +from watchdog.observers import Observer +from watchdog.events import PatternMatchingEventHandler + +import app.settings.app_cfg as cfg +from app.site.builder import build_site, build_file + +class SiteBuilder(PatternMatchingEventHandler): + """ + Handler for filesystem changes to the content path + """ + patterns = ["*.md"] + + def on_modified(self, event): + print(event.src_path, event.event_type) + build_file(event.src_path) + + def on_created(self, event): + print(event.src_path, event.event_type) + build_file(event.src_path) + +@click.command() +@click.pass_context +def cli(ctx): + """ + Run the observer and start watching for changes + """ + print("{} is now being watched for changes.".format(cfg.DIR_SITE_CONTENT)) + observer = Observer() + observer.schedule(SiteBuilder(), path=cfg.DIR_SITE_CONTENT, recursive=True) + observer.start() + + try: + while True: + time.sleep(1) + except KeyboardInterrupt: + observer.stop() + + observer.join() diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 858d98eb..7b2e19fc 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -346,6 +346,7 @@ section.wide .image { } section.fullwidth { width: 100%; + background-size: contain; } section.fullwidth .image { max-width: 100%; diff --git a/site/content/pages/datasets/lfw/index.md b/site/content/pages/datasets/lfw/index.md index 8b37f035..48d86e1f 100644 --- a/site/content/pages/datasets/lfw/index.md +++ b/site/content/pages/datasets/lfw/index.md @@ -4,6 +4,8 @@ status: published title: Labeled Faces in The Wild desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition subdesc: It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. +image: lfw_index.gif +caption: Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms. slug: lfw published: 2019-2-23 updated: 2019-2-23 @@ -12,22 +14,13 @@ authors: Adam Harvey ------------ -# LFW +### Statistics + Years: 2002-2004 + Images: 13,233 + Identities: 5,749 + Origin: Yahoo News Images -+ Funding: (Possibly, partially CIA*) - -![fullwidth:Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. 
The most widely used face dataset for benchmarking commercial face recognition algorithms.](assets/lfw_index.gif) - -*Labeled Faces in The Wild* (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." - -The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of *Names of Faces* and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are... - -The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. - ++ Funding: (Possibly, partially CIA) ### Analysis @@ -39,25 +32,35 @@ The *Names and Faces* dataset was the first face recognition dataset created ent - In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times - The word "future" appears 71 times +## Labeled Faces in the Wild + +*Labeled Faces in The Wild* (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." + +The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of *Names of Faces* and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are... + +The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. + ### Synthetic Faces To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset. 
![fullwidth:](assets/lfw_synthetic.jpg) - ### Biometric Trade Routes To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [SemanticScholar](https://www.semanticscholar.org). -[add map here] +``` +map +``` ### Citations Browse or download the geocoded citation data collected for the LFW dataset. -[add citations table here] - +``` +citations +``` ### Additional Information @@ -69,27 +72,14 @@ Browse or download the geocoded citation data collected for the LFW dataset. - The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw-survey] - The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan] - All images in the LFW dataset were copied from Yahoo News between 2002 - 2004 -<<<<<<< HEAD -- In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper [Labeled Faces in the Wild: Updates and New Reporting Procedures](https://www.semanticscholar.org/paper/Labeled-Faces-in-the-Wild-%3A-Updates-and-New-Huang-Learned-Miller/2d3482dcff69c7417c7b933f22de606a0e8e42d4) via IARPA contract number 2014-14071600010 +- In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper [Labeled Faces in the Wild: Updates and New Reporting Procedures](https://www.semanticscholar.org/paper/Labeled-Faces-in-the-Wild-%3A-Updates-and-New-Huang-Learned-Miller/2d3482dcff69c7417c7b933f22de606a0e8e42d4) via IARPA contract number 2014-14071600010 - The dataset includes 2 images of [George Tenet](http://vis-www.cs.umass.edu/lfw/person/George_Tenet.html), the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia -======= -- In 2014, 2/4 of the original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper "Labeled Faces in the Wild: Updates and New Reporting Procedures" via IARPA contract number 2014-14071600010 -- The LFW dataset was used Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National - -TODO (need citations for the following) - -- SenseTime, who has relied on LFW for benchmarking their facial recognition performance, is one the leading provider of surveillance to the Chinese Government [need citation for this fact. is it the most? or is that Tencent?] -- Two out of 4 of the original authors received funding from the Office of Director of National Intelligence and IARPA for their 2016 LFW survey follow up report - ->>>>>>> 13d7a450affe8ea4f368a97ea2014faa17702a4c ![Person with the most face images in LFW: former President George W. 
Bush](assets/lfw_montage_top1_640.jpg) ![Persons with the next most face images in LFW: Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)](assets/lfw_montage_top2_4_640.jpg) ![All 5,379 faces in the Labeled Faces in The Wild Dataset](assets/lfw_montage_all_crop.jpg) - - ## Code The LFW dataset is so widely used that a popular code library called Sci-Kit Learn includes a function called `fetch_lfw_people` to download the faces in the LFW dataset. @@ -133,7 +123,6 @@ imageio.imwrite('lfw_montage_960.jpg', montage) ### Supplementary Material - ``` load_file assets/lfw_commercial_use.csv name_display, company_url, example_url, country, description @@ -141,14 +130,13 @@ name_display, company_url, example_url, country, description Text and graphics ©Adam Harvey / megapixels.cc - ------- Ignore text below these lines ------- -Research +### Research - "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]." - "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249." @@ -159,6 +147,9 @@ Research - This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010. - From "Labeled Faces in the Wild: Updates and New Reporting Procedures" +### Footnotes + [^lfw_www]: [^lfw_baidu]: Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, Chang Huang. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding. [^lfw_pingan]: Lee, Justin. "PING AN Tech facial recognition receives high score in latest LFW test results". BiometricUpdate.com. Feb 13, 2017. + diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index f83d8a66..86f49c52 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -27,11 +27,8 @@
    -

    LFW

    -
    Years
    2002-2004
    Images
    13,233
    Identities
    5,749
    Origin
    Yahoo News Images
    Funding
    (Possibly, partially CIA*)
    Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.
    Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

    Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition"[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

    -

    The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names and Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

    -

    The Names and Faces dataset was the first face recognition dataset created entirely from online photos. However, Names and Faces and LFW are not the first face recognition datasets created entirely "in the wild". That title belongs to the UCD dataset. Obtaining images "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

    -

    Analysis

    +

    Statistics

    +
    Years
    2002-2004
    Images
    13,233
    Identities
    5,749
    Origin
    Yahoo News Images
    Funding
    (Possibly, partially CIA)

    Analysis

    • There are about 3 men for every 1 woman (4,277 men and 1,472 women) in the LFW dataset[^lfw_www]
    • The person with the most images is George W. Bush with 530
    • @@ -41,15 +38,17 @@
    • In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times
    • The word "future" appears 71 times
    +

    Labeled Faces in the Wild

    +

    Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition"[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

    +

    The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names and Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

    +

    The Names and Faces dataset was the first face recognition dataset created entirely from online photos. However, Names and Faces and LFW are not the first face recognition datasets created entirely "in the wild". That title belongs to the UCD dataset. Obtaining images "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

    Synthetic Faces

    To visualize the types of photos in the dataset without explicitly publishing individuals' identities, a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset.

    Biometric Trade Routes

    To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from SemanticScholar.

    -

    [add map here]

    -

    Citations

    +

    Citations

    Browse or download the geocoded citation data collected for the LFW dataset.

    -

    [add citations table here]

    -

    Additional Information

    +

    Additional Information

    (tweet-sized snippets go here)

    • The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu]
    • @@ -57,27 +56,10 @@
    • All images in the LFW dataset were obtained "in the wild", meaning without any consent from the subject or from the photographer
    • The faces in the LFW dataset were detected using the Viola-Jones Haar cascade face detector [^lfw_website] [^lfw_survey]
    • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]
    • -
    • All images in the LFW dataset were copied from Yahoo News between 2002 - 2004 -<<<<<<< HEAD
    • -
    • In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper Labeled Faces in the Wild: Updates and New Reporting Procedures via IARPA contract number 2014-14071600010
    • -
    • The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia

      -
    • -
    • In 2014, 2/4 of the original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper "Labeled Faces in the Wild: Updates and New Reporting Procedures" via IARPA contract number 2014-14071600010
    • -
    • The LFW dataset was used Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National
    • -
    -

    TODO (need citations for the following)

    -
      -
    • SenseTime, who has relied on LFW for benchmarking their facial recognition performance, is one the leading provider of surveillance to the Chinese Government [need citation for this fact. is it the most? or is that Tencent?]
    • -
    • Two out of 4 of the original authors received funding from the Office of Director of National Intelligence and IARPA for their 2016 LFW survey follow up report
    • +
    • All images in the LFW dataset were copied from Yahoo News between 2002 - 2004
    • +
    • In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper Labeled Faces in the Wild: Updates and New Reporting Procedures via IARPA contract number 2014-14071600010
    • +
    • The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia
    -

    > 13d7a450affe8ea4f368a97ea2014faa17702a4c

    -
    -
    -
    -
    -
    -
    -
     former President George W. Bush
    former President George W. Bush
     Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
    Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
    All 5,379 faces in the Labeled Faces in The Wild Dataset
    All 5,379 faces in the Labeled Faces in The Wild Dataset

    Code

    The LFW dataset is so widely used that a popular code library called scikit-learn includes a function called fetch_lfw_people to download the faces in the LFW dataset.

    @@ -113,7 +95,7 @@ imageio.imwrite('lfw_montage_960.jpg', montage)

    Supplementary Material

    Text and graphics ©Adam Harvey / megapixels.cc

    Ignore text below these lines

    -

    Research

    +

    Research

    • "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]."
    • "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249."
    • @@ -125,6 +107,7 @@ imageio.imwrite('lfw_montage_960.jpg', montage)
    • From "Labeled Faces in the Wild: Updates and New Reporting Procedures"
    +

    Footnotes


      -- cgit v1.2.3-70-g09d2 From 9bac173e85865e4f0d1dba5071b40eb7ebe3dd1a Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 22:15:03 +0100 Subject: new intro header for datasets page and sidebar --- client/index.js | 6 +-- megapixels/app/site/parser.py | 70 ++++++++++++++++++++++++++---- megapixels/commands/site/watch.py | 2 + site/assets/css/css.css | 72 ++++++++++++++++++++++++++----- site/assets/css/tabulator.css | 2 +- site/content/pages/datasets/lfw/index.md | 25 +++++------ site/content/pages/datasets/uccs/index.md | 2 +- site/public/datasets/lfw/index.html | 36 ++++------------ 8 files changed, 152 insertions(+), 63 deletions(-) (limited to 'site/public/datasets') diff --git a/client/index.js b/client/index.js index c9335f14..37906f30 100644 --- a/client/index.js +++ b/client/index.js @@ -110,9 +110,9 @@ function runApplets() { function main() { const paras = document.querySelectorAll('section p') - if (paras.length) { - paras[0].classList.add('first_paragraph') - } + // if (paras.length) { + // paras[0].classList.add('first_paragraph') + // } toArray(document.querySelectorAll('header .links a')).forEach(tag => { if (window.location.href.match(tag.href)) { tag.classList.add('active') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index 3792e6f1..dc53177b 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -16,9 +16,30 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): """ groups = [] current_group = [] + in_stats = False + + if 'desc' in metadata and 'subdesc' in metadata: + groups.append(intro_section(metadata, s3_path)) + for section in sections: if skip_h1 and section.startswith('# '): continue + elif section.strip().startswith('---'): + continue + elif section.lower().strip().startswith('ignore text'): + break + elif '### Statistics' in section: + if len(current_group): + groups.append(format_section(current_group, s3_path)) + current_group = [] + current_group.append(section) + in_stats = True + elif in_stats and not section.strip().startswith('## '): + current_group.append(section) + elif in_stats and section.strip().startswith('## '): + current_group = [format_section(current_group, s3_path, 'right-sidebar', tag='div')] + current_group.append(section) + in_stats = False elif section.strip().startswith('```'): groups.append(format_section(current_group, s3_path)) current_group = [] @@ -32,7 +53,7 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): current_group = [] elif section.startswith('+ '): groups.append(format_section(current_group, s3_path)) - groups.append(format_metadata(section)) + groups.append('
      ' + format_metadata(section) + '
      ') current_group = [] elif '![fullwidth:' in section: groups.append(format_section(current_group, s3_path)) @@ -52,6 +73,32 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): content = "".join(groups) return content +def intro_section(metadata, s3_path): + """ + Build the intro section for datasets + """ + + section = "
      ".format(s3_path + metadata['image']) + section += "
      " + + parts = [] + if 'desc' in metadata: + desc = metadata['desc'] + if 'color' in metadata and metadata['title'] in desc: + desc = desc.replace(metadata['title'], "{}".format(metadata['color'], metadata['title'])) + section += "
      {}
      ".format(desc, desc) + + if 'subdesc' in metadata: + subdesc = markdown(metadata['subdesc']).replace('

      ', '').replace('

      ', '') + section += "
      {}
      ".format(subdesc, subdesc) + + section += "
      " + section += "
      " + + if 'caption' in metadata: + section += "
      {}
      ".format(metadata['caption']) + + return section def fix_images(lines, s3_path): """ @@ -75,19 +122,26 @@ def fix_images(lines, s3_path): real_lines.append(line) return "\n".join(real_lines) - -def format_section(lines, s3_path, type=''): +def format_section(lines, s3_path, type='', tag='section'): """ format a normal markdown section """ if len(lines): + lines = fix_meta(lines) lines = fix_images(lines, s3_path) if type: - return "
      {}
      ".format(type, markdown(lines)) + return "<{} class='{}'>{}".format(tag, type, markdown(lines), tag) else: - return "
      " + markdown(lines) + "
      " + return "<{}>{}".format(tag, markdown(lines), tag) return "" +def fix_meta(lines): + new_lines = [] + for line in lines: + if line.startswith('+ '): + line = format_metadata(line) + new_lines.append(line) + return new_lines def format_metadata(section): """ @@ -97,8 +151,7 @@ def format_metadata(section): for line in section.split('\n'): key, value = line[2:].split(': ', 1) meta.append("
      {}
      {}
      ".format(key, value)) - return "
      {}
      ".format(''.join(meta)) - + return "
      {}
      ".format(''.join(meta)) def format_applet(section, s3_path): """ @@ -107,12 +160,13 @@ def format_applet(section, s3_path): # print(section) payload = section.strip('```').strip().strip('```').strip().split('\n') applet = {} - print(payload) + # print(payload) if ': ' in payload[0]: command, opt = payload[0].split(': ') else: command = payload[0] opt = None + print(command) if command == 'python' or command == 'javascript' or command == 'code': return format_section([ section ], s3_path) if command == '': diff --git a/megapixels/commands/site/watch.py b/megapixels/commands/site/watch.py index 7fd3ba7c..7bd71038 100644 --- a/megapixels/commands/site/watch.py +++ b/megapixels/commands/site/watch.py @@ -35,6 +35,8 @@ def cli(ctx): observer.schedule(SiteBuilder(), path=cfg.DIR_SITE_CONTENT, recursive=True) observer.start() + build_file(cfg.DIR_SITE_CONTENT + "/datasets/lfw/index.md") + try: while True: time.sleep(1) diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 7b2e19fc..fed381a7 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -4,12 +4,12 @@ html, body { padding: 0; width: 100%; min-height: 100%; - font-family: 'Roboto', sans-serif; - color: #b8b8b8; + font-family: 'Roboto Mono', sans-serif; + color: #eee; overflow-x: hidden; } html { - background: #191919; + background: #111111; } .content { @@ -146,8 +146,8 @@ h2 { h3 { margin: 0 0 20px 0; padding: 0; - font-size: 11pt; - font-weight: 500; + font-size: 14pt; + font-weight: 600; transition: color 0.2s cubic-bezier(0,0,1,1); } h4 { @@ -165,8 +165,15 @@ h4 { color: #fff; text-decoration: underline; } +.right-sidebar h3 { + margin: 0; + padding: 0 0 10px 0; + font-family: 'Roboto Mono'; + text-transform: uppercase; + letter-spacing: 2px; +} -th, .gray, h3, h4 { +th, .gray { font-family: 'Roboto Mono', monospace; font-weight: 400; text-transform: uppercase; @@ -201,6 +208,7 @@ section { } p { margin: 0 0 20px 0; + line-height: 2; } .content a { color: #ddd; @@ -229,10 +237,13 @@ p { } .right-sidebar { float: right; - width: 200px; + width: 240px; margin-left: 20px; + padding-top: 10px; padding-left: 20px; border-left: 1px solid #444; + font-family: 'Roboto'; + font-size: 14px; } .right-sidebar .meta { flex-direction: column; @@ -240,6 +251,9 @@ p { .right-sidebar .meta > div { margin-bottom: 10px; } +.right-sidebar ul { + margin-bottom: 10px; +} /* lists */ @@ -346,17 +360,17 @@ section.wide .image { } section.fullwidth { width: 100%; - background-size: contain; } section.fullwidth .image { max-width: 100%; } .caption { - text-align: center; + text-align: left; font-size: 9pt; - color: #888; - max-width: 620px; + color: #bbb; + max-width: 960px; margin: 10px auto 0 auto; + font-family: 'Roboto'; } /* blog index */ @@ -499,3 +513,39 @@ section.fullwidth .image { .dataset-list a:nth-child(3n+3) { background-color: rgba(255, 255, 0, 0.1); } .desktop .dataset-list .dataset:nth-child(3n+3):hover { background-color: rgba(255, 255, 0, 0.2); } + + +/* intro section for datasets */ + +section.intro_section { + font-family: 'Roboto Mono'; + width: 100%; + background-size: cover; + background-position: bottom left; + padding: 50px 0; + min-height: 60vh; + display: flex; + justify-content: center; + align-items: center; + background-color: #111111; +} +.intro_section .inner { + max-width: 960px; + margin: 0 auto; +} +.intro_section .hero_desc { + font-size: 38px; + line-height: 60px; + margin-bottom: 30px; + color: #fff; +} +.intro_section .hero_subdesc { + font-size: 18px; + line-height: 36px; + max-width: 
640px; + color: #ddd; +} +.intro_section span { + box-shadow: -10px -10px #000, 10px -10px #000, 10px 10px #000, -10px 10px #000; + background: #000; +} \ No newline at end of file diff --git a/site/assets/css/tabulator.css b/site/assets/css/tabulator.css index 200f0c5c..63abf050 100755 --- a/site/assets/css/tabulator.css +++ b/site/assets/css/tabulator.css @@ -493,7 +493,7 @@ display: inline-block; position: relative; box-sizing: border-box; - padding: 4px; + padding: 10px; border-right: 1px solid #333; vertical-align: middle; white-space: nowrap; diff --git a/site/content/pages/datasets/lfw/index.md b/site/content/pages/datasets/lfw/index.md index 48d86e1f..1995e1f9 100644 --- a/site/content/pages/datasets/lfw/index.md +++ b/site/content/pages/datasets/lfw/index.md @@ -2,14 +2,14 @@ status: published title: Labeled Faces in The Wild -desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition +desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. subdesc: It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. -image: lfw_index.gif +image: assets/lfw_feature.jpg caption: Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms. slug: lfw published: 2019-2-23 updated: 2019-2-23 -color: #00FF00 +color: #ff0000 authors: Adam Harvey ------------ @@ -22,12 +22,11 @@ authors: Adam Harvey + Origin: Yahoo News Images + Funding: (Possibly, partially CIA) -### Analysis +### INSIGHTS - There are about 3 men for every 1 woman (4,277 men and 1,472 women) in the LFW dataset[^lfw_www] - The person with the most images is [George W. Bush](http://vis-www.cs.umass.edu/lfw/person/George_W_Bush_comp.html) with 530 - There are about 3 George W. Bush's for every 1 [Tony Blair](http://vis-www.cs.umass.edu/lfw/person/Tony_Blair.html) -- 70% of people in the dataset have only 1 image and 29% have 2 or more images - The LFW dataset includes over 500 actors, 30 models, 10 presidents, 124 basketball players, 24 football players, 11 kings, 7 queens, and 1 [Moby](http://vis-www.cs.umass.edu/lfw/person/Moby.html) - In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times - The word "future" appears 71 times @@ -40,20 +39,20 @@ The LFW dataset includes 13,233 images of 5,749 people that were collected betwe The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. -### Synthetic Faces - -To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset. 
- -![fullwidth:](assets/lfw_synthetic.jpg) - ### Biometric Trade Routes -To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [SemanticScholar](https://www.semanticscholar.org). +To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [Semantic Scholar](https://www.semanticscholar.org). ``` map ``` +### Synthetic Faces + +To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset. + +![fullwidth:](assets/lfw_synthetic.jpg) + ### Citations Browse or download the geocoded citation data collected for the LFW dataset. @@ -136,6 +135,7 @@ Ignore text below these lines ------- + ### Research - "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]." @@ -146,6 +146,7 @@ Ignore text below these lines - This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010. - From "Labeled Faces in the Wild: Updates and New Reporting Procedures" +- 70% of people in the dataset have only 1 image and 29% have 2 or more images ### Footnotes diff --git a/site/content/pages/datasets/uccs/index.md b/site/content/pages/datasets/uccs/index.md index d40dce22..be1d2474 100644 --- a/site/content/pages/datasets/uccs/index.md +++ b/site/content/pages/datasets/uccs/index.md @@ -68,7 +68,7 @@ The more recent UCCS version of the dataset received funding from [^funding_uccs - You are welcomed to use these images for academic and journalistic use including for research papers, news stories, presentations. - Please use the following citation: -```MegaPixels.cc Adam Harvey 2013-2109.``` +```MegaPixels.cc Adam Harvey 2013-2019.``` [^funding_sb]: Sapkota, Archana and Boult, Terrance. "Large Scale Unconstrained Open Set Face Database." 2013. [^funding_uccs]: Günther, M. et. al. "Unconstrained Face Detection and Open-Set Face Recognition Challenge," 2018. Arxiv 1708.02337v3. \ No newline at end of file diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index 86f49c52..1242df0c 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -4,7 +4,7 @@ MegaPixels - + @@ -27,26 +27,26 @@
      -

      Statistics

      -
      Years
      2002-2004
      Images
      13,233
      Identities
      5,749
      Origin
      Yahoo News Images
      Funding
      (Possibly, partially CIA)

      Analysis

      +
      Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition.
It includes 13,233 images of 5,749 people copied from the Internet during 2002-2004. +
      Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

      Labeled Faces in the Wild

      Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002 and 2004. LFW is a subset of Names and Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

The Names and Faces dataset was the first face recognition dataset created entirely from online photos. However, Names and Faces and LFW are not the first face recognition datasets created entirely "in the wild". That title belongs to the UCD dataset. Obtaining images "in the wild" means using an image without the explicit consent or awareness of the subject or photographer.

      -

      Synthetic Faces

      +

      Biometric Trade Routes

      +

To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate organizations (educational, commercial, or governmental) that have cited the LFW dataset in their research. Data is compiled from Semantic Scholar.
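The mapping pipeline itself is not included in this patch. As a rough illustration only, a sketch like the following could turn geocoded citation records into line features for such a map; the citations_geocoded.csv filename, its columns, and the origin coordinates are assumptions, not part of the MegaPixels code.

```python
# Hypothetical sketch only: the MegaPixels mapping code is not part of this patch.
# Assumes a CSV of geocoded citations (citations_geocoded.csv) with columns
# institution, lat, lon; field names and the origin coordinates are illustrative.
import csv
import json

ORIGIN = {"name": "UMass Amherst", "lat": 42.39, "lon": -72.53}  # where LFW was assembled

features = []
with open("citations_geocoded.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        features.append({
            "type": "Feature",
            "properties": {"institution": row["institution"]},
            "geometry": {
                "type": "LineString",
                # one line per citing organization: dataset origin -> citing institution
                "coordinates": [
                    [ORIGIN["lon"], ORIGIN["lat"]],
                    [float(row["lon"]), float(row["lat"])],
                ],
            },
        })

with open("lfw_trade_routes.geojson", "w", encoding="utf-8") as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)
```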

      +

      Synthetic Faces

To visualize the types of photos in the dataset without explicitly publishing individuals' identities, a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset.
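The GAN and its training code are not part of this patch. The interpolation idea can be sketched as follows, where generate() is only a stand-in for a trained generator network and the latent size is an assumption.

```python
# Minimal sketch of latent-space interpolation, assuming a trained GAN generator.
# The model used for the LFW visualization is not included in this patch;
# generate() below is only a placeholder for a real generator network.
import numpy as np

LATENT_DIM = 512  # assumed latent size

def generate(z):
    """Placeholder generator: maps a latent code to a fake 64x64 RGB 'image'."""
    rng = np.random.default_rng(0)                        # fixed random projection
    projection = rng.standard_normal((LATENT_DIM, 64 * 64 * 3))
    return ((np.tanh(z @ projection) + 1.0) / 2.0).reshape(64, 64, 3)

def interpolate(z_a, z_b, steps=8):
    """Decode evenly spaced points on the line between two latent codes."""
    return [generate((1 - t) * z_a + t * z_b) for t in np.linspace(0.0, 1.0, steps)]

rng = np.random.default_rng(42)
z_start, z_end = rng.standard_normal(LATENT_DIM), rng.standard_normal(LATENT_DIM)
frames = interpolate(z_start, z_end)
print(len(frames), frames[0].shape)  # 8 (64, 64, 3)
```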

      -

      Biometric Trade Routes

      -

      To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from SemanticScholar.

      -

      Citations

      +

      Citations

      Browse or download the geocoded citation data collected for the LFW dataset.

      Additional Information

      (tweet-sized snippets go here)

      @@ -94,24 +94,6 @@ imageio.imwrite('lfw_montage_960.jpg', montage)

      Supplementary Material

      Text and graphics ©Adam Harvey / megapixels.cc

      -

      Ignore text below these lines

      -

      Research

      -
        -
      • "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]."
      • -
      • "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249."
      • -
      • From: "People-LDA: Anchoring Topics to People using Face Recognition" https://www.semanticscholar.org/paper/People-LDA%3A-Anchoring-Topics-to-People-using-Face-Jain-Learned-Miller/10f17534dba06af1ddab96c4188a9c98a020a459 and https://ieeexplore.ieee.org/document/4409055
      • -
      • This paper was presented at IEEE 11th ICCV conference Oct 14-21 and the main LFW paper "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments" was also published that same year
      • -
      • 10f17534dba06af1ddab96c4188a9c98a020a459

        -
      • -
      • This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010.

        -
      • -
      • From "Labeled Faces in the Wild: Updates and New Reporting Procedures"
      • -
      -

      Footnotes

      -
      -
      -
        -
        -- cgit v1.2.3-70-g09d2 From 1b008e4b4d11def9b13dc0a800b0d068624d43ae Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 23:48:35 +0100 Subject: half of a footnote implementation --- megapixels/app/site/parser.py | 35 +++++++++++++++++++++++++++++------ site/assets/css/css.css | 34 ++++++++++++++++++++++++++++++++++ site/public/datasets/lfw/index.html | 15 +++++++++------ 3 files changed, 72 insertions(+), 12 deletions(-) (limited to 'site/public/datasets') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index 98d9f284..ef83b655 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -18,6 +18,7 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): current_group = [] footnotes = [] in_stats = False + in_footnotes = False ignoring = False if 'desc' in metadata and 'subdesc' in metadata: @@ -33,6 +34,7 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): continue elif section.strip().startswith('### Footnotes'): groups.append(format_section(current_group, s3_path)) + current_group = [] footnotes = [] in_footnotes = True elif in_footnotes: @@ -82,10 +84,18 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): current_group.append(section) groups.append(format_section(current_group, s3_path)) + footnote_txt = '' + footnote_lookup = {} + if len(footnotes): - groups.append(format_footnotes(footnotes, s3_path)) + footnote_txt, footnote_lookup = format_footnotes(footnotes, s3_path) content = "".join(groups) + + if footnote_lookup: + for key, index in footnote_lookup.items(): + content = content.replace(key, '{}'.format(key, index, index)) + content += footnote_txt return content @@ -153,8 +163,10 @@ def format_section(lines, s3_path, type='', tag='section'): return "<{}>{}".format(tag, markdown(lines), tag) return "" - def fix_meta(lines): + """ + Format metadata sections before passing to markdown + """ new_lines = [] for line in lines: if line.startswith('+ '): @@ -162,7 +174,6 @@ def fix_meta(lines): new_lines.append(line) return new_lines - def format_metadata(section): """ format a metadata section (+ key: value pairs) @@ -173,12 +184,24 @@ def format_metadata(section): meta.append("
        {}
        {}
        ".format(key, value)) return "
        {}
        ".format(''.join(meta)) -def format_footnotes(footnotes): +def format_footnotes(footnotes, s3_path): + """ + Format the footnotes section separately and produce a lookup we can use to update the main site + """ footnotes = '\n'.join(footnotes).split('\n') + index = 1 + footnote_index_lookup = {} + footnote_list = [] for footnote in footnotes: if not len(footnote) or '[^' not in footnote: continue - key, footnote = footnotes.split(': ') + key, note = footnote.split(': ', 1) + footnote_index_lookup[key] = index + footnote_list.append('^'.format(key) + markdown(note)) + index += 1 + + footnote_txt = '
        • ' + '
        • '.join(footnote_list) + '
        ' + return footnote_txt, footnote_index_lookup def format_applet(section, s3_path): """ @@ -189,7 +212,7 @@ def format_applet(section, s3_path): applet = {} # print(payload) if ': ' in payload[0]: - command, opt = payload[0].split(': ') + command, opt = payload[0].split(': ', 1) else: command = payload[0] opt = None diff --git a/site/assets/css/css.css b/site/assets/css/css.css index fed381a7..8b4241ea 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -548,4 +548,38 @@ section.intro_section { .intro_section span { box-shadow: -10px -10px #000, 10px -10px #000, 10px 10px #000, -10px 10px #000; background: #000; +} + +/* footnotes */ + +a.footnote { + font-size: 10px; + position: relative; + display: inline-block; + bottom: 10px; + text-decoration: none; + color: #ff0; + left: 2px; +} +.right-sidebar a.footnote { + bottom: 8px; +} +.desktop a.footnote:hover { + background-color: #ff0; + color: #000; +} +a.footnote_anchor { + font-weight: bold; + color: #ff0; + margin-right: 10px; + text-decoration: underline; + cursor: pointer; +} +ul.footnotes { + list-style-type: decimal; + margin-left: 30px; +} +li p { + margin: 0; padding: 0; + display: inline; } \ No newline at end of file diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index 1242df0c..54b6aa22 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -31,7 +31,7 @@
        Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

        Labeled Faces in the Wild

        -

        Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

        +

        Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition1. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com3, LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002 and 2004. LFW is a subset of Names and Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

The Names and Faces dataset was the first face recognition dataset created entirely from online photos. However, Names and Faces and LFW are not the first face recognition datasets created entirely "in the wild". That title belongs to the UCD dataset. Obtaining images "in the wild" means using an image without the explicit consent or awareness of the subject or photographer.

        Biometric Trade Routes

        @@ -51,11 +51,11 @@

        Additional Information

        (tweet-sized snippets go here)

          -
        • The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu]
        • -
        • The LFW dataset is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan]
        • +
        • The LFW dataset is considered the "most popular benchmark for face recognition" 2
        • +
        • The LFW dataset is "the most widely used evaluation set in the field of facial recognition" 3
• All images in the LFW dataset were obtained "in the wild", meaning without any consent from the subject or from the photographer
• The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw_survey] (see the detection sketch after this list)
        • -
        • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]
        • +
        • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." 3
• All images in the LFW dataset were copied from Yahoo News between 2002 and 2004
• In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their follow-up paper "Labeled Faces in the Wild: Updates and New Reporting Procedures" via IARPA contract number 2014-14071600010
• The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) at the Central Intelligence Agency, whose facial biometrics were eventually used to help train facial recognition software in China and Russia
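As referenced in the face detector item above, Viola-Jones haar-cascade detection of the kind used to crop LFW faces can be sketched with OpenCV's stock frontal-face cascade. The exact cascade file and parameters used by the LFW authors are not documented in this patch, and the input filename below only mimics LFW's naming convention.

```python
# Sketch of Viola-Jones haar-cascade face detection with OpenCV's stock cascade.
# The exact detector settings used to build LFW are not documented in this patch;
# the input filename is hypothetical and only mimics LFW's naming convention.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

img = cv2.imread("George_W_Bush_0001.jpg")
if img is None:
    raise SystemExit("image not found")

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# scaleFactor/minNeighbors are common defaults, not the original authors' values
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("lfw_detected.jpg", img)
```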
        • @@ -94,7 +94,10 @@ imageio.imwrite('lfw_montage_960.jpg', montage)

        Supplementary Material

        Text and graphics ©Adam Harvey / megapixels.cc

        -
        +