From c8e7a10be948c2405d46d8c3caf4a8c6675eee29 Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 19:35:54 +0100 Subject: rebuild --- megapixels/app/site/parser.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'megapixels/app/site/parser.py') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index f739315a..d6705214 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -127,6 +127,7 @@ def parse_research_index(research_posts): """ content = "
" for post in research_posts: + print(post) s3_path = s3.make_s3_path(cfg.S3_SITE_PATH, post['path']) if 'image' in post: post_image = s3_path + post['image'] @@ -240,7 +241,7 @@ def read_post_index(basedir): Generate an index of posts """ posts = [] - for fn in sorted(glob.glob('../site/content/{}/*/index.md'.format(basedir))): + for fn in sorted(glob.glob(os.path.join(cfg.DIR_SITE_CONTENT, basedir, '*/index.md'))): metadata, valid_sections = read_metadata(fn) if metadata is None or metadata['status'] == 'private' or metadata['status'] == 'draft': continue -- cgit v1.2.3-70-g09d2 From 67896d3cdde877de940a282bebacd10ca1c56499 Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 20:29:08 +0100 Subject: site watcher / loader --- README.md | 2 +- megapixels/app/site/builder.py | 22 ++-- megapixels/app/site/loader.py | 123 +++++++++++++++++++ megapixels/app/site/parser.py | 204 ++++++++----------------------- megapixels/commands/site/watch.py | 44 +++++++ site/assets/css/css.css | 1 + site/content/pages/datasets/lfw/index.md | 55 ++++----- site/public/datasets/lfw/index.html | 43 ++----- 8 files changed, 266 insertions(+), 228 deletions(-) create mode 100644 megapixels/app/site/loader.py create mode 100644 megapixels/commands/site/watch.py (limited to 'megapixels/app/site/parser.py') diff --git a/README.md b/README.md index e1a2c1d0..e46a6289 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ pip install numpy Pillow pip install dlib pip install requests simplejson click pdfminer.six pip install urllib3 flask flask_sqlalchemy mysql-connector -pip install pymediainfo tqdm opencv-python imutils +pip install pymediainfo tqdm opencv-python imutils watchdog pip install scikit-image python-dotenv imagehash scikit-learn colorlog pip install celery keras tensorflow pip install python.app # OSX only! 
needed for matplotlib diff --git a/megapixels/app/site/builder.py b/megapixels/app/site/builder.py index 188fbc25..15055110 100644 --- a/megapixels/app/site/builder.py +++ b/megapixels/app/site/builder.py @@ -7,6 +7,7 @@ from jinja2 import Environment, FileSystemLoader, select_autoescape import app.settings.app_cfg as cfg import app.site.s3 as s3 +import app.site.loader as loader import app.site.parser as parser env = Environment( @@ -21,7 +22,7 @@ def build_page(fn, research_posts, datasets): - syncs any assets with s3 - handles certain index pages... """ - metadata, sections = parser.read_metadata(fn) + metadata, sections = loader.read_metadata(fn) if metadata is None: print("{} has no metadata".format(fn)) @@ -55,7 +56,7 @@ def build_page(fn, research_posts, datasets): if 'index.md' in fn: s3.sync_directory(dirname, s3_dir, metadata) - content = parser.parse_markdown(sections, s3_path, skip_h1=skip_h1) + content = parser.parse_markdown(metadata, sections, s3_path, skip_h1=skip_h1) html = template.render( metadata=metadata, @@ -73,11 +74,11 @@ def build_index(key, research_posts, datasets): """ build the index of research (blog) posts """ - metadata, sections = parser.read_metadata(os.path.join(cfg.DIR_SITE_CONTENT, key, 'index.md')) + metadata, sections = loader.read_metadata(os.path.join(cfg.DIR_SITE_CONTENT, key, 'index.md')) template = env.get_template("page.html") s3_path = s3.make_s3_path(cfg.S3_SITE_PATH, metadata['path']) - content = parser.parse_markdown(sections, s3_path, skip_h1=False) - content += parser.parse_research_index(research_posts) + content = parser.parse_markdown(metadata, sections, s3_path, skip_h1=False) + content += loader.parse_research_index(research_posts) html = template.render( metadata=metadata, content=content, @@ -93,8 +94,8 @@ def build_site(): """ build the site! 
=^) """ - research_posts = parser.read_research_post_index() - datasets = parser.read_datasets_index() + research_posts = loader.read_research_post_index() + datasets = loader.read_datasets_index() for fn in glob.iglob(os.path.join(cfg.DIR_SITE_CONTENT, "**/*.md"), recursive=True): build_page(fn, research_posts, datasets) build_index('research', research_posts, datasets) @@ -103,7 +104,8 @@ def build_file(fn): """ build just one page from a filename! =^) """ - research_posts = parser.read_research_post_index() - datasets = parser.read_datasets_index() - fn = os.path.join(cfg.DIR_SITE_CONTENT, fn) + research_posts = loader.read_research_post_index() + datasets = loader.read_datasets_index() + if cfg.DIR_SITE_CONTENT not in fn: + fn = os.path.join(cfg.DIR_SITE_CONTENT, fn) build_page(fn, research_posts, datasets) diff --git a/megapixels/app/site/loader.py b/megapixels/app/site/loader.py new file mode 100644 index 00000000..691efb25 --- /dev/null +++ b/megapixels/app/site/loader.py @@ -0,0 +1,123 @@ +import os +import re +import glob +import simplejson as json + +import app.settings.app_cfg as cfg + +def read_metadata(fn): + """ + Read in read a markdown file and extract the metadata + """ + with open(fn, "r") as file: + data = file.read() + data = data.replace("\n ", "\n") + if "\n" in data: + data = data.replace("\r", "") + else: + data = data.replace("\r", "\n") + sections = data.split("\n\n") + return parse_metadata(fn, sections) + + +default_metadata = { + 'status': 'published', + 'title': 'Untitled Page', + 'desc': '', + 'slug': '', + 'published': '2018-12-31', + 'updated': '2018-12-31', + 'authors': 'Adam Harvey', + 'sync': 'true', + 'tagline': '', +} + +def parse_metadata(fn, sections): + """ + parse the metadata headers in a markdown file + (everything before the second ---------) + also generates appropriate urls for this page :) + """ + found_meta = False + metadata = {} + valid_sections = [] + for section in sections: + if not found_meta and ': ' in 
section: + found_meta = True + parse_metadata_section(metadata, section) + continue + if '-----' in section: + continue + if found_meta: + valid_sections.append(section) + + if 'title' not in metadata: + print('warning: {} has no title'.format(fn)) + for key in default_metadata: + if key not in metadata: + metadata[key] = default_metadata[key] + + basedir = os.path.dirname(fn.replace(cfg.DIR_SITE_CONTENT, '')) + basename = os.path.basename(fn) + if basedir == '/': + metadata['path'] = '/' + metadata['url'] = '/' + elif basename == 'index.md': + metadata['path'] = basedir + '/' + metadata['url'] = metadata['path'] + else: + metadata['path'] = basedir + '/' + metadata['url'] = metadata['path'] + basename.replace('.md', '') + '/' + + if metadata['status'] == 'published|draft|private': + metadata['status'] = 'published' + + metadata['sync'] = metadata['sync'] != 'false' + + metadata['author_html'] = '
'.join(metadata['authors'].split(',')) + + return metadata, valid_sections + +def parse_metadata_section(metadata, section): + """ + parse a metadata key: value pair + """ + for line in section.split("\n"): + if ': ' not in line: + continue + key, value = line.split(': ', 1) + metadata[key.lower()] = value + + +def read_research_post_index(): + """ + Generate an index of the research (blog) posts + """ + return read_post_index('research') + + +def read_datasets_index(): + """ + Generate an index of the datasets + """ + return read_post_index('datasets') + + +def read_post_index(basedir): + """ + Generate an index of posts + """ + posts = [] + for fn in sorted(glob.glob(os.path.join(cfg.DIR_SITE_CONTENT, basedir, '*/index.md'))): + metadata, valid_sections = read_metadata(fn) + if metadata is None or metadata['status'] == 'private' or metadata['status'] == 'draft': + continue + posts.append(metadata) + if not len(posts): + posts.append({ + 'title': 'Placeholder', + 'slug': 'placeholder', + 'date': 'Placeholder', + 'url': '/', + }) + return posts diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index d6705214..3792e6f1 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -10,6 +10,49 @@ import app.site.s3 as s3 renderer = mistune.Renderer(escape=False) markdown = mistune.Markdown(renderer=renderer) +def parse_markdown(metadata, sections, s3_path, skip_h1=False): + """ + parse page into sections, preprocess the markdown to handle our modifications + """ + groups = [] + current_group = [] + for section in sections: + if skip_h1 and section.startswith('# '): + continue + elif section.strip().startswith('```'): + groups.append(format_section(current_group, s3_path)) + current_group = [] + current_group.append(section) + if section.strip().endswith('```'): + groups.append(format_applet("\n\n".join(current_group), s3_path)) + current_group = [] + elif section.strip().endswith('```'): + current_group.append(section) + 
groups.append(format_applet("\n\n".join(current_group), s3_path)) + current_group = [] + elif section.startswith('+ '): + groups.append(format_section(current_group, s3_path)) + groups.append(format_metadata(section)) + current_group = [] + elif '![fullwidth:' in section: + groups.append(format_section(current_group, s3_path)) + groups.append(format_section([section], s3_path, type='fullwidth')) + current_group = [] + elif '![wide:' in section: + groups.append(format_section(current_group, s3_path)) + groups.append(format_section([section], s3_path, type='wide')) + current_group = [] + elif '![' in section: + groups.append(format_section(current_group, s3_path)) + groups.append(format_section([section], s3_path, type='images')) + current_group = [] + else: + current_group.append(section) + groups.append(format_section(current_group, s3_path)) + content = "".join(groups) + return content + + def fix_images(lines, s3_path): """ do our own tranformation of the markdown around images to handle wide images etc @@ -32,6 +75,7 @@ def fix_images(lines, s3_path): real_lines.append(line) return "\n".join(real_lines) + def format_section(lines, s3_path, type=''): """ format a normal markdown section @@ -44,6 +88,7 @@ def format_section(lines, s3_path, type=''): return "
" + markdown(lines) + "
" return "" + def format_metadata(section): """ format a metadata section (+ key: value pairs) @@ -54,7 +99,11 @@ def format_metadata(section): meta.append("
{}
{}
".format(key, value)) return "
{}
".format(''.join(meta)) + def format_applet(section, s3_path): + """ + Format the applets, which load javascript modules like the map and CSVs + """ # print(section) payload = section.strip('```').strip().strip('```').strip().split('\n') applet = {} @@ -79,47 +128,6 @@ def format_applet(section, s3_path): applet['fields'] = payload[1:] return "
".format(json.dumps(applet)) -def parse_markdown(sections, s3_path, skip_h1=False): - """ - parse page into sections, preprocess the markdown to handle our modifications - """ - groups = [] - current_group = [] - for section in sections: - if skip_h1 and section.startswith('# '): - continue - elif section.strip().startswith('```'): - groups.append(format_section(current_group, s3_path)) - current_group = [] - current_group.append(section) - if section.strip().endswith('```'): - groups.append(format_applet("\n\n".join(current_group), s3_path)) - current_group = [] - elif section.strip().endswith('```'): - current_group.append(section) - groups.append(format_applet("\n\n".join(current_group), s3_path)) - current_group = [] - elif section.startswith('+ '): - groups.append(format_section(current_group, s3_path)) - groups.append(format_metadata(section)) - current_group = [] - elif '![fullwidth:' in section: - groups.append(format_section(current_group, s3_path)) - groups.append(format_section([section], s3_path, type='fullwidth')) - current_group = [] - elif '![wide:' in section: - groups.append(format_section(current_group, s3_path)) - groups.append(format_section([section], s3_path, type='wide')) - current_group = [] - elif '![' in section: - groups.append(format_section(current_group, s3_path)) - groups.append(format_section([section], s3_path, type='images')) - current_group = [] - else: - current_group.append(section) - groups.append(format_section(current_group, s3_path)) - content = "".join(groups) - return content def parse_research_index(research_posts): """ @@ -141,117 +149,3 @@ def parse_research_index(research_posts): content += row content += '
' return content - -def read_metadata(fn): - """ - Read in read a markdown file and extract the metadata - """ - with open(fn, "r") as file: - data = file.read() - data = data.replace("\n ", "\n") - if "\n" in data: - data = data.replace("\r", "") - else: - data = data.replace("\r", "\n") - sections = data.split("\n\n") - return parse_metadata(fn, sections) - -default_metadata = { - 'status': 'published', - 'title': 'Untitled Page', - 'desc': '', - 'slug': '', - 'published': '2018-12-31', - 'updated': '2018-12-31', - 'authors': 'Adam Harvey', - 'sync': 'true', - 'tagline': '', -} - -def parse_metadata_section(metadata, section): - """ - parse a metadata key: value pair - """ - for line in section.split("\n"): - if ': ' not in line: - continue - key, value = line.split(': ', 1) - metadata[key.lower()] = value - -def parse_metadata(fn, sections): - """ - parse the metadata headers in a markdown file - (everything before the second ---------) - also generates appropriate urls for this page :) - """ - found_meta = False - metadata = {} - valid_sections = [] - for section in sections: - if not found_meta and ': ' in section: - found_meta = True - parse_metadata_section(metadata, section) - continue - if '-----' in section: - continue - if found_meta: - valid_sections.append(section) - - if 'title' not in metadata: - print('warning: {} has no title'.format(fn)) - for key in default_metadata: - if key not in metadata: - metadata[key] = default_metadata[key] - - basedir = os.path.dirname(fn.replace(cfg.DIR_SITE_CONTENT, '')) - basename = os.path.basename(fn) - if basedir == '/': - metadata['path'] = '/' - metadata['url'] = '/' - elif basename == 'index.md': - metadata['path'] = basedir + '/' - metadata['url'] = metadata['path'] - else: - metadata['path'] = basedir + '/' - metadata['url'] = metadata['path'] + basename.replace('.md', '') + '/' - - if metadata['status'] == 'published|draft|private': - metadata['status'] = 'published' - - metadata['sync'] = metadata['sync'] != 
'false' - - metadata['author_html'] = '
'.join(metadata['authors'].split(',')) - - return metadata, valid_sections - -def read_research_post_index(): - """ - Generate an index of the research (blog) posts - """ - return read_post_index('research') - -def read_datasets_index(): - """ - Generate an index of the datasets - """ - return read_post_index('datasets') - -def read_post_index(basedir): - """ - Generate an index of posts - """ - posts = [] - for fn in sorted(glob.glob(os.path.join(cfg.DIR_SITE_CONTENT, basedir, '*/index.md'))): - metadata, valid_sections = read_metadata(fn) - if metadata is None or metadata['status'] == 'private' or metadata['status'] == 'draft': - continue - posts.append(metadata) - if not len(posts): - posts.append({ - 'title': 'Placeholder', - 'slug': 'placeholder', - 'date': 'Placeholder', - 'url': '/', - }) - return posts - diff --git a/megapixels/commands/site/watch.py b/megapixels/commands/site/watch.py new file mode 100644 index 00000000..7fd3ba7c --- /dev/null +++ b/megapixels/commands/site/watch.py @@ -0,0 +1,44 @@ +""" +Watch for changes in the static site and build them +""" + +import click +import time +from watchdog.observers import Observer +from watchdog.events import PatternMatchingEventHandler + +import app.settings.app_cfg as cfg +from app.site.builder import build_site, build_file + +class SiteBuilder(PatternMatchingEventHandler): + """ + Handler for filesystem changes to the content path + """ + patterns = ["*.md"] + + def on_modified(self, event): + print(event.src_path, event.event_type) + build_file(event.src_path) + + def on_created(self, event): + print(event.src_path, event.event_type) + build_file(event.src_path) + +@click.command() +@click.pass_context +def cli(ctx): + """ + Run the observer and start watching for changes + """ + print("{} is now being watched for changes.".format(cfg.DIR_SITE_CONTENT)) + observer = Observer() + observer.schedule(SiteBuilder(), path=cfg.DIR_SITE_CONTENT, recursive=True) + observer.start() + + try: + while True: + 
time.sleep(1) + except KeyboardInterrupt: + observer.stop() + + observer.join() diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 858d98eb..7b2e19fc 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -346,6 +346,7 @@ section.wide .image { } section.fullwidth { width: 100%; + background-size: contain; } section.fullwidth .image { max-width: 100%; diff --git a/site/content/pages/datasets/lfw/index.md b/site/content/pages/datasets/lfw/index.md index 8b37f035..48d86e1f 100644 --- a/site/content/pages/datasets/lfw/index.md +++ b/site/content/pages/datasets/lfw/index.md @@ -4,6 +4,8 @@ status: published title: Labeled Faces in The Wild desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition subdesc: It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. +image: lfw_index.gif +caption: Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms. slug: lfw published: 2019-2-23 updated: 2019-2-23 @@ -12,22 +14,13 @@ authors: Adam Harvey ------------ -# LFW +### Statistics + Years: 2002-2004 + Images: 13,233 + Identities: 5,749 + Origin: Yahoo News Images -+ Funding: (Possibly, partially CIA*) - -![fullwidth:Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.](assets/lfw_index.gif) - -*Labeled Faces in The Wild* (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. 
According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." - -The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of *Names of Faces* and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are... - -The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. - ++ Funding: (Possibly, partially CIA) ### Analysis @@ -39,25 +32,35 @@ The *Names and Faces* dataset was the first face recognition dataset created ent - In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times - The word "future" appears 71 times +## Labeled Faces in the Wild + +*Labeled Faces in The Wild* (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." 
+ +The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of *Names of Faces* and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are... + +The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. + ### Synthetic Faces To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset. ![fullwidth:](assets/lfw_synthetic.jpg) - ### Biometric Trade Routes To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [SemanticScholar](https://www.semanticscholar.org). -[add map here] +``` +map +``` ### Citations Browse or download the geocoded citation data collected for the LFW dataset. -[add citations table here] - +``` +citations +``` ### Additional Information @@ -69,27 +72,14 @@ Browse or download the geocoded citation data collected for the LFW dataset. 
- The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw-survey] - The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan] - All images in the LFW dataset were copied from Yahoo News between 2002 - 2004 -<<<<<<< HEAD -- In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper [Labeled Faces in the Wild: Updates and New Reporting Procedures](https://www.semanticscholar.org/paper/Labeled-Faces-in-the-Wild-%3A-Updates-and-New-Huang-Learned-Miller/2d3482dcff69c7417c7b933f22de606a0e8e42d4) via IARPA contract number 2014-14071600010 +- In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper [Labeled Faces in the Wild: Updates and New Reporting Procedures](https://www.semanticscholar.org/paper/Labeled-Faces-in-the-Wild-%3A-Updates-and-New-Huang-Learned-Miller/2d3482dcff69c7417c7b933f22de606a0e8e42d4) via IARPA contract number 2014-14071600010 - The dataset includes 2 images of [George Tenet](http://vis-www.cs.umass.edu/lfw/person/George_Tenet.html), the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia -======= -- In 2014, 2/4 of the original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper "Labeled Faces in the Wild: Updates and New Reporting Procedures" via IARPA contract number 2014-14071600010 -- The LFW dataset was used Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National - -TODO (need citations for the following) - -- SenseTime, who has relied on LFW for benchmarking their 
facial recognition performance, is one the leading provider of surveillance to the Chinese Government [need citation for this fact. is it the most? or is that Tencent?] -- Two out of 4 of the original authors received funding from the Office of Director of National Intelligence and IARPA for their 2016 LFW survey follow up report - ->>>>>>> 13d7a450affe8ea4f368a97ea2014faa17702a4c ![Person with the most face images in LFW: former President George W. Bush](assets/lfw_montage_top1_640.jpg) ![Persons with the next most face images in LFW: Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)](assets/lfw_montage_top2_4_640.jpg) ![All 5,379 faces in the Labeled Faces in The Wild Dataset](assets/lfw_montage_all_crop.jpg) - - ## Code The LFW dataset is so widely used that a popular code library called Sci-Kit Learn includes a function called `fetch_lfw_people` to download the faces in the LFW dataset. @@ -133,7 +123,6 @@ imageio.imwrite('lfw_montage_960.jpg', montage) ### Supplementary Material - ``` load_file assets/lfw_commercial_use.csv name_display, company_url, example_url, country, description @@ -141,14 +130,13 @@ name_display, company_url, example_url, country, description Text and graphics ©Adam Harvey / megapixels.cc - ------- Ignore text below these lines ------- -Research +### Research - "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]." - "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249." @@ -159,6 +147,9 @@ Research - This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010. 
- From "Labeled Faces in the Wild: Updates and New Reporting Procedures" +### Footnotes + [^lfw_www]: [^lfw_baidu]: Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, Chang Huang. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding. [^lfw_pingan]: Lee, Justin. "PING AN Tech facial recognition receives high score in latest LFW test results". BiometricUpdate.com. Feb 13, 2017. + diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index f83d8a66..86f49c52 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -27,11 +27,8 @@
-

LFW

-
Years
2002-2004
Images
13,233
Identities
5,749
Origin
Yahoo News Images
Funding
(Possibly, partially CIA*)
Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.
Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

-

The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names of Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

-

The Names and Faces dataset was the first face recognition dataset created entire from online photos. However, Names and Faces and LFW are not the first face recognition dataset created entirely "in the wild". That title belongs to the UCD dataset. Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

-

Analysis

+

Statistics

+
Years
2002-2004
Images
13,233
Identities
5,749
Origin
Yahoo News Images
Funding
(Possibly, partially CIA)

Analysis

  • There are about 3 men for every 1 woman (4,277 men and 1,472 women) in the LFW dataset[^lfw_www]
  • The person with the most images is George W. Bush with 530
  • @@ -41,15 +38,17 @@
  • In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times
  • The word "future" appears 71 times
+

Labeled Faces in the Wild

+

Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

+

The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names of Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

+

The Names and Faces dataset was the first face recognition dataset created entire from online photos. However, Names and Faces and LFW are not the first face recognition dataset created entirely "in the wild". That title belongs to the UCD dataset. Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

Synthetic Faces

To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset.

Biometric Trade Routes

To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from SemanticScholar.

-

[add map here]

-

Citations

+

Citations

Browse or download the geocoded citation data collected for the LFW dataset.

-

[add citations table here]

-

Additional Information

+

Additional Information

(tweet-sized snippets go here)

  • The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu]
  • @@ -57,27 +56,10 @@
  • All images in LFW dataset were obtained "in the wild" meaning without any consent from the subject or from the photographer
  • The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw-survey]
  • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]
  • -
  • All images in the LFW dataset were copied from Yahoo News between 2002 - 2004 -<<<<<<< HEAD
  • -
  • In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper Labeled Faces in the Wild: Updates and New Reporting Procedures via IARPA contract number 2014-14071600010
  • -
  • The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia

    -
  • -
  • In 2014, 2/4 of the original authors of the LFW dataset received funding from IARPA and ODNI for their follow up paper "Labeled Faces in the Wild: Updates and New Reporting Procedures" via IARPA contract number 2014-14071600010
  • -
  • The LFW dataset was used Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National
  • -
-

TODO (need citations for the following)

-
    -
  • SenseTime, who has relied on LFW for benchmarking their facial recognition performance, is one the leading provider of surveillance to the Chinese Government [need citation for this fact. is it the most? or is that Tencent?]
  • -
  • Two out of 4 of the original authors received funding from the Office of Director of National Intelligence and IARPA for their 2016 LFW survey follow up report
  • +
  • All images in the LFW dataset were copied from Yahoo News between 2002 - 2004
  • +
  • In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper Labeled Faces in the Wild: Updates and New Reporting Procedures via IARPA contract number 2014-14071600010
  • +
  • The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia
-

> 13d7a450affe8ea4f368a97ea2014faa17702a4c

-
-
-
-
-
-
-
 former President George W. Bush
former President George W. Bush
 Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
Colin Powell (236), Tony Blair (144), and Donald Rumsfeld (121)
All 5,379 faces in the Labeled Faces in The Wild Dataset
All 5,379 faces in the Labeled Faces in The Wild Dataset

Code

The LFW dataset is so widely used that a popular code library called Sci-Kit Learn includes a function called fetch_lfw_people to download the faces in the LFW dataset.

@@ -113,7 +95,7 @@ imageio.imwrite('lfw_montage_960.jpg', montage)

Supplementary Material

Text and graphics ©Adam Harvey / megapixels.cc

Ignore text below these lines

-

Research

+

Research

  • "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]."
  • "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249."
  • @@ -125,6 +107,7 @@ imageio.imwrite('lfw_montage_960.jpg', montage)
  • From "Labeled Faces in the Wild: Updates and New Reporting Procedures"
+

Footnotes


    -- cgit v1.2.3-70-g09d2 From 9bac173e85865e4f0d1dba5071b40eb7ebe3dd1a Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 22:15:03 +0100 Subject: new intro header for datasets page and sidebar --- client/index.js | 6 +-- megapixels/app/site/parser.py | 70 ++++++++++++++++++++++++++---- megapixels/commands/site/watch.py | 2 + site/assets/css/css.css | 72 ++++++++++++++++++++++++++----- site/assets/css/tabulator.css | 2 +- site/content/pages/datasets/lfw/index.md | 25 +++++------ site/content/pages/datasets/uccs/index.md | 2 +- site/public/datasets/lfw/index.html | 36 ++++------------ 8 files changed, 152 insertions(+), 63 deletions(-) (limited to 'megapixels/app/site/parser.py') diff --git a/client/index.js b/client/index.js index c9335f14..37906f30 100644 --- a/client/index.js +++ b/client/index.js @@ -110,9 +110,9 @@ function runApplets() { function main() { const paras = document.querySelectorAll('section p') - if (paras.length) { - paras[0].classList.add('first_paragraph') - } + // if (paras.length) { + // paras[0].classList.add('first_paragraph') + // } toArray(document.querySelectorAll('header .links a')).forEach(tag => { if (window.location.href.match(tag.href)) { tag.classList.add('active') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index 3792e6f1..dc53177b 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -16,9 +16,30 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): """ groups = [] current_group = [] + in_stats = False + + if 'desc' in metadata and 'subdesc' in metadata: + groups.append(intro_section(metadata, s3_path)) + for section in sections: if skip_h1 and section.startswith('# '): continue + elif section.strip().startswith('---'): + continue + elif section.lower().strip().startswith('ignore text'): + break + elif '### Statistics' in section: + if len(current_group): + groups.append(format_section(current_group, s3_path)) + current_group = [] + 
current_group.append(section) + in_stats = True + elif in_stats and not section.strip().startswith('## '): + current_group.append(section) + elif in_stats and section.strip().startswith('## '): + current_group = [format_section(current_group, s3_path, 'right-sidebar', tag='div')] + current_group.append(section) + in_stats = False elif section.strip().startswith('```'): groups.append(format_section(current_group, s3_path)) current_group = [] @@ -32,7 +53,7 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): current_group = [] elif section.startswith('+ '): groups.append(format_section(current_group, s3_path)) - groups.append(format_metadata(section)) + groups.append('
    ' + format_metadata(section) + '
    ') current_group = [] elif '![fullwidth:' in section: groups.append(format_section(current_group, s3_path)) @@ -52,6 +73,32 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): content = "".join(groups) return content +def intro_section(metadata, s3_path): + """ + Build the intro section for datasets + """ + + section = "
    ".format(s3_path + metadata['image']) + section += "
    " + + parts = [] + if 'desc' in metadata: + desc = metadata['desc'] + if 'color' in metadata and metadata['title'] in desc: + desc = desc.replace(metadata['title'], "{}".format(metadata['color'], metadata['title'])) + section += "
    {}
    ".format(desc, desc) + + if 'subdesc' in metadata: + subdesc = markdown(metadata['subdesc']).replace('

    ', '').replace('

    ', '') + section += "
    {}
    ".format(subdesc, subdesc) + + section += "
    " + section += "
    " + + if 'caption' in metadata: + section += "
    {}
    ".format(metadata['caption']) + + return section def fix_images(lines, s3_path): """ @@ -75,19 +122,26 @@ def fix_images(lines, s3_path): real_lines.append(line) return "\n".join(real_lines) - -def format_section(lines, s3_path, type=''): +def format_section(lines, s3_path, type='', tag='section'): """ format a normal markdown section """ if len(lines): + lines = fix_meta(lines) lines = fix_images(lines, s3_path) if type: - return "
    {}
    ".format(type, markdown(lines)) + return "<{} class='{}'>{}".format(tag, type, markdown(lines), tag) else: - return "
    " + markdown(lines) + "
    " + return "<{}>{}".format(tag, markdown(lines), tag) return "" +def fix_meta(lines): + new_lines = [] + for line in lines: + if line.startswith('+ '): + line = format_metadata(line) + new_lines.append(line) + return new_lines def format_metadata(section): """ @@ -97,8 +151,7 @@ def format_metadata(section): for line in section.split('\n'): key, value = line[2:].split(': ', 1) meta.append("
    {}
    {}
    ".format(key, value)) - return "
    {}
    ".format(''.join(meta)) - + return "
    {}
    ".format(''.join(meta)) def format_applet(section, s3_path): """ @@ -107,12 +160,13 @@ def format_applet(section, s3_path): # print(section) payload = section.strip('```').strip().strip('```').strip().split('\n') applet = {} - print(payload) + # print(payload) if ': ' in payload[0]: command, opt = payload[0].split(': ') else: command = payload[0] opt = None + print(command) if command == 'python' or command == 'javascript' or command == 'code': return format_section([ section ], s3_path) if command == '': diff --git a/megapixels/commands/site/watch.py b/megapixels/commands/site/watch.py index 7fd3ba7c..7bd71038 100644 --- a/megapixels/commands/site/watch.py +++ b/megapixels/commands/site/watch.py @@ -35,6 +35,8 @@ def cli(ctx): observer.schedule(SiteBuilder(), path=cfg.DIR_SITE_CONTENT, recursive=True) observer.start() + build_file(cfg.DIR_SITE_CONTENT + "/datasets/lfw/index.md") + try: while True: time.sleep(1) diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 7b2e19fc..fed381a7 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -4,12 +4,12 @@ html, body { padding: 0; width: 100%; min-height: 100%; - font-family: 'Roboto', sans-serif; - color: #b8b8b8; + font-family: 'Roboto Mono', sans-serif; + color: #eee; overflow-x: hidden; } html { - background: #191919; + background: #111111; } .content { @@ -146,8 +146,8 @@ h2 { h3 { margin: 0 0 20px 0; padding: 0; - font-size: 11pt; - font-weight: 500; + font-size: 14pt; + font-weight: 600; transition: color 0.2s cubic-bezier(0,0,1,1); } h4 { @@ -165,8 +165,15 @@ h4 { color: #fff; text-decoration: underline; } +.right-sidebar h3 { + margin: 0; + padding: 0 0 10px 0; + font-family: 'Roboto Mono'; + text-transform: uppercase; + letter-spacing: 2px; +} -th, .gray, h3, h4 { +th, .gray { font-family: 'Roboto Mono', monospace; font-weight: 400; text-transform: uppercase; @@ -201,6 +208,7 @@ section { } p { margin: 0 0 20px 0; + line-height: 2; } .content a { color: #ddd; @@ -229,10 
+237,13 @@ p { } .right-sidebar { float: right; - width: 200px; + width: 240px; margin-left: 20px; + padding-top: 10px; padding-left: 20px; border-left: 1px solid #444; + font-family: 'Roboto'; + font-size: 14px; } .right-sidebar .meta { flex-direction: column; @@ -240,6 +251,9 @@ p { .right-sidebar .meta > div { margin-bottom: 10px; } +.right-sidebar ul { + margin-bottom: 10px; +} /* lists */ @@ -346,17 +360,17 @@ section.wide .image { } section.fullwidth { width: 100%; - background-size: contain; } section.fullwidth .image { max-width: 100%; } .caption { - text-align: center; + text-align: left; font-size: 9pt; - color: #888; - max-width: 620px; + color: #bbb; + max-width: 960px; margin: 10px auto 0 auto; + font-family: 'Roboto'; } /* blog index */ @@ -499,3 +513,39 @@ section.fullwidth .image { .dataset-list a:nth-child(3n+3) { background-color: rgba(255, 255, 0, 0.1); } .desktop .dataset-list .dataset:nth-child(3n+3):hover { background-color: rgba(255, 255, 0, 0.2); } + + +/* intro section for datasets */ + +section.intro_section { + font-family: 'Roboto Mono'; + width: 100%; + background-size: cover; + background-position: bottom left; + padding: 50px 0; + min-height: 60vh; + display: flex; + justify-content: center; + align-items: center; + background-color: #111111; +} +.intro_section .inner { + max-width: 960px; + margin: 0 auto; +} +.intro_section .hero_desc { + font-size: 38px; + line-height: 60px; + margin-bottom: 30px; + color: #fff; +} +.intro_section .hero_subdesc { + font-size: 18px; + line-height: 36px; + max-width: 640px; + color: #ddd; +} +.intro_section span { + box-shadow: -10px -10px #000, 10px -10px #000, 10px 10px #000, -10px 10px #000; + background: #000; +} \ No newline at end of file diff --git a/site/assets/css/tabulator.css b/site/assets/css/tabulator.css index 200f0c5c..63abf050 100755 --- a/site/assets/css/tabulator.css +++ b/site/assets/css/tabulator.css @@ -493,7 +493,7 @@ display: inline-block; position: relative; box-sizing: 
border-box; - padding: 4px; + padding: 10px; border-right: 1px solid #333; vertical-align: middle; white-space: nowrap; diff --git a/site/content/pages/datasets/lfw/index.md b/site/content/pages/datasets/lfw/index.md index 48d86e1f..1995e1f9 100644 --- a/site/content/pages/datasets/lfw/index.md +++ b/site/content/pages/datasets/lfw/index.md @@ -2,14 +2,14 @@ status: published title: Labeled Faces in The Wild -desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition +desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. subdesc: It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. -image: lfw_index.gif +image: assets/lfw_feature.jpg caption: Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms. slug: lfw published: 2019-2-23 updated: 2019-2-23 -color: #00FF00 +color: #ff0000 authors: Adam Harvey ------------ @@ -22,12 +22,11 @@ authors: Adam Harvey + Origin: Yahoo News Images + Funding: (Possibly, partially CIA) -### Analysis +### INSIGHTS - There are about 3 men for every 1 woman (4,277 men and 1,472 women) in the LFW dataset[^lfw_www] - The person with the most images is [George W. Bush](http://vis-www.cs.umass.edu/lfw/person/George_W_Bush_comp.html) with 530 - There are about 3 George W. 
Bush's for every 1 [Tony Blair](http://vis-www.cs.umass.edu/lfw/person/Tony_Blair.html) -- 70% of people in the dataset have only 1 image and 29% have 2 or more images - The LFW dataset includes over 500 actors, 30 models, 10 presidents, 124 basketball players, 24 football players, 11 kings, 7 queens, and 1 [Moby](http://vis-www.cs.umass.edu/lfw/person/Moby.html) - In all 3 of the LFW publications [^lfw_original_paper], [^lfw_survey], [^lfw_tech_report] the words "ethics", "consent", and "privacy" appear 0 times - The word "future" appears 71 times @@ -40,20 +39,20 @@ The LFW dataset includes 13,233 images of 5,749 people that were collected betwe The *Names and Faces* dataset was the first face recognition dataset created entire from online photos. However, *Names and Faces* and *LFW* are not the first face recognition dataset created entirely "in the wild". That title belongs to the [UCD dataset](/datasets/ucd_faces/). Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer. -### Synthetic Faces - -To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset. - -![fullwidth:](assets/lfw_synthetic.jpg) - ### Biometric Trade Routes -To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [SemanticScholar](https://www.semanticscholar.org). +To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. 
Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from [Semantic Scholar](https://www.semanticscholar.org). ``` map ``` +### Synthetic Faces + +To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset. + +![fullwidth:](assets/lfw_synthetic.jpg) + ### Citations Browse or download the geocoded citation data collected for the LFW dataset. @@ -136,6 +135,7 @@ Ignore text below these lines ------- + ### Research - "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]." @@ -146,6 +146,7 @@ Ignore text below these lines - This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010. - From "Labeled Faces in the Wild: Updates and New Reporting Procedures" +- 70% of people in the dataset have only 1 image and 29% have 2 or more images ### Footnotes diff --git a/site/content/pages/datasets/uccs/index.md b/site/content/pages/datasets/uccs/index.md index d40dce22..be1d2474 100644 --- a/site/content/pages/datasets/uccs/index.md +++ b/site/content/pages/datasets/uccs/index.md @@ -68,7 +68,7 @@ The more recent UCCS version of the dataset received funding from [^funding_uccs - You are welcomed to use these images for academic and journalistic use including for research papers, news stories, presentations. - Please use the following citation: -```MegaPixels.cc Adam Harvey 2013-2109.``` +```MegaPixels.cc Adam Harvey 2013-2019.``` [^funding_sb]: Sapkota, Archana and Boult, Terrance. 
"Large Scale Unconstrained Open Set Face Database." 2013. [^funding_uccs]: Günther, M. et. al. "Unconstrained Face Detection and Open-Set Face Recognition Challenge," 2018. Arxiv 1708.02337v3. \ No newline at end of file diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index 86f49c52..1242df0c 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -4,7 +4,7 @@ MegaPixels - + @@ -27,26 +27,26 @@
    -

    Statistics

    -
    Years
    2002-2004
    Images
    13,233
    Identities
    5,749
    Origin
    Yahoo News Images
    Funding
    (Possibly, partially CIA)

    Analysis

    +
    Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition.
    It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. +
    Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

    Labeled Faces in the Wild

    Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

    The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names of Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

    The Names and Faces dataset was the first face recognition dataset created entire from online photos. However, Names and Faces and LFW are not the first face recognition dataset created entirely "in the wild". That title belongs to the UCD dataset. Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

    -

    Synthetic Faces

    +

    Biometric Trade Routes

    +

    To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from Semantic Scholar.

    +

    Synthetic Faces

    To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset.

    -

    Biometric Trade Routes

    -

    To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from SemanticScholar.

    -

    Citations

    +

    Citations

    Browse or download the geocoded citation data collected for the LFW dataset.

    Additional Information

    (tweet-sized snippets go here)

    @@ -94,24 +94,6 @@ imageio.imwrite('lfw_montage_960.jpg', montage)

    Supplementary Material

    Text and graphics ©Adam Harvey / megapixels.cc

    -

    Ignore text below these lines

    -

    Research

    -
      -
    • "In our experiments, we used 10000 images and associated captions from the Faces in the wilddata set [3]."
    • -
    • "This work was supported in part by the Center for Intelligent Information Retrieval, the Central Intelligence Agency, the National Security Agency and National Science Foundation under CAREER award IIS-0546666 and grant IIS-0326249."
    • -
    • From: "People-LDA: Anchoring Topics to People using Face Recognition" https://www.semanticscholar.org/paper/People-LDA%3A-Anchoring-Topics-to-People-using-Face-Jain-Learned-Miller/10f17534dba06af1ddab96c4188a9c98a020a459 and https://ieeexplore.ieee.org/document/4409055
    • -
    • This paper was presented at IEEE 11th ICCV conference Oct 14-21 and the main LFW paper "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments" was also published that same year
    • -
    • 10f17534dba06af1ddab96c4188a9c98a020a459

      -
    • -
    • This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010.

      -
    • -
    • From "Labeled Faces in the Wild: Updates and New Reporting Procedures"
    • -
    -

    Footnotes

    -
    -
    -
      -
      -- cgit v1.2.3-70-g09d2 From 421adbea75c5a4282630a7399f8b1018c4f0dd90 Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 23:02:17 +0100 Subject: parser --- megapixels/app/site/parser.py | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) (limited to 'megapixels/app/site/parser.py') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index dc53177b..98d9f284 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -16,7 +16,9 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): """ groups = [] current_group = [] + footnotes = [] in_stats = False + ignoring = False if 'desc' in metadata and 'subdesc' in metadata: groups.append(intro_section(metadata, s3_path)) @@ -27,7 +29,16 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): elif section.strip().startswith('---'): continue elif section.lower().strip().startswith('ignore text'): - break + ignoring = True + continue + elif section.strip().startswith('### Footnotes'): + groups.append(format_section(current_group, s3_path)) + footnotes = [] + in_footnotes = True + elif in_footnotes: + footnotes.append(section) + elif ignoring: + continue elif '### Statistics' in section: if len(current_group): groups.append(format_section(current_group, s3_path)) @@ -70,9 +81,14 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): else: current_group.append(section) groups.append(format_section(current_group, s3_path)) + + if len(footnotes): + groups.append(format_footnotes(footnotes, s3_path)) + content = "".join(groups) return content + def intro_section(metadata, s3_path): """ Build the intro section for datasets @@ -100,6 +116,7 @@ def intro_section(metadata, s3_path): return section + def fix_images(lines, s3_path): """ do our own tranformation of the markdown around images to handle wide images etc @@ -122,6 +139,7 @@ def fix_images(lines, s3_path): real_lines.append(line) return 
"\n".join(real_lines) + def format_section(lines, s3_path, type='', tag='section'): """ format a normal markdown section @@ -135,6 +153,7 @@ def format_section(lines, s3_path, type='', tag='section'): return "<{}>{}".format(tag, markdown(lines), tag) return "" + def fix_meta(lines): new_lines = [] for line in lines: @@ -143,6 +162,7 @@ def fix_meta(lines): new_lines.append(line) return new_lines + def format_metadata(section): """ format a metadata section (+ key: value pairs) @@ -153,6 +173,13 @@ def format_metadata(section): meta.append("
      {}
      {}
      ".format(key, value)) return "
      {}
      ".format(''.join(meta)) +def format_footnotes(footnotes): + footnotes = '\n'.join(footnotes).split('\n') + for footnote in footnotes: + if not len(footnote) or '[^' not in footnote: + continue + key, footnote = footnotes.split(': ') + def format_applet(section, s3_path): """ Format the applets, which load javascript modules like the map and CSVs -- cgit v1.2.3-70-g09d2 From 1b008e4b4d11def9b13dc0a800b0d068624d43ae Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Wed, 27 Feb 2019 23:48:35 +0100 Subject: half of a footnote implementation --- megapixels/app/site/parser.py | 35 +++++++++++++++++++++++++++++------ site/assets/css/css.css | 34 ++++++++++++++++++++++++++++++++++ site/public/datasets/lfw/index.html | 15 +++++++++------ 3 files changed, 72 insertions(+), 12 deletions(-) (limited to 'megapixels/app/site/parser.py') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index 98d9f284..ef83b655 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -18,6 +18,7 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): current_group = [] footnotes = [] in_stats = False + in_footnotes = False ignoring = False if 'desc' in metadata and 'subdesc' in metadata: @@ -33,6 +34,7 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): continue elif section.strip().startswith('### Footnotes'): groups.append(format_section(current_group, s3_path)) + current_group = [] footnotes = [] in_footnotes = True elif in_footnotes: @@ -82,10 +84,18 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): current_group.append(section) groups.append(format_section(current_group, s3_path)) + footnote_txt = '' + footnote_lookup = {} + if len(footnotes): - groups.append(format_footnotes(footnotes, s3_path)) + footnote_txt, footnote_lookup = format_footnotes(footnotes, s3_path) content = "".join(groups) + + if footnote_lookup: + for key, index in footnote_lookup.items(): + content = content.replace(key, 
'{}'.format(key, index, index)) + content += footnote_txt return content @@ -153,8 +163,10 @@ def format_section(lines, s3_path, type='', tag='section'): return "<{}>{}".format(tag, markdown(lines), tag) return "" - def fix_meta(lines): + """ + Format metadata sections before passing to markdown + """ new_lines = [] for line in lines: if line.startswith('+ '): @@ -162,7 +174,6 @@ def fix_meta(lines): new_lines.append(line) return new_lines - def format_metadata(section): """ format a metadata section (+ key: value pairs) @@ -173,12 +184,24 @@ def format_metadata(section): meta.append("
      {}
      {}
      ".format(key, value)) return "
      {}
      ".format(''.join(meta)) -def format_footnotes(footnotes): +def format_footnotes(footnotes, s3_path): + """ + Format the footnotes section separately and produce a lookup we can use to update the main site + """ footnotes = '\n'.join(footnotes).split('\n') + index = 1 + footnote_index_lookup = {} + footnote_list = [] for footnote in footnotes: if not len(footnote) or '[^' not in footnote: continue - key, footnote = footnotes.split(': ') + key, note = footnote.split(': ', 1) + footnote_index_lookup[key] = index + footnote_list.append('^'.format(key) + markdown(note)) + index += 1 + + footnote_txt = '
      • ' + '
      • '.join(footnote_list) + '
      ' + return footnote_txt, footnote_index_lookup def format_applet(section, s3_path): """ @@ -189,7 +212,7 @@ def format_applet(section, s3_path): applet = {} # print(payload) if ': ' in payload[0]: - command, opt = payload[0].split(': ') + command, opt = payload[0].split(': ', 1) else: command = payload[0] opt = None diff --git a/site/assets/css/css.css b/site/assets/css/css.css index fed381a7..8b4241ea 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -548,4 +548,38 @@ section.intro_section { .intro_section span { box-shadow: -10px -10px #000, 10px -10px #000, 10px 10px #000, -10px 10px #000; background: #000; +} + +/* footnotes */ + +a.footnote { + font-size: 10px; + position: relative; + display: inline-block; + bottom: 10px; + text-decoration: none; + color: #ff0; + left: 2px; +} +.right-sidebar a.footnote { + bottom: 8px; +} +.desktop a.footnote:hover { + background-color: #ff0; + color: #000; +} +a.footnote_anchor { + font-weight: bold; + color: #ff0; + margin-right: 10px; + text-decoration: underline; + cursor: pointer; +} +ul.footnotes { + list-style-type: decimal; + margin-left: 30px; +} +li p { + margin: 0; padding: 0; + display: inline; } \ No newline at end of file diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index 1242df0c..54b6aa22 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -31,7 +31,7 @@
      Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

      Labeled Faces in the Wild

      -

      Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition[^lfw_www]. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com[^lfw_pingan], LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

      +

      Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition1. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com3, LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

      The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names of Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

      The Names and Faces dataset was the first face recognition dataset created entire from online photos. However, Names and Faces and LFW are not the first face recognition dataset created entirely "in the wild". That title belongs to the UCD dataset. Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

      Biometric Trade Routes

      @@ -51,11 +51,11 @@

      Additional Information

      (tweet-sized snippets go here)

        -
      • The LFW dataset is considered the "most popular benchmark for face recognition" [^lfw_baidu]
      • -
      • The LFW dataset is "the most widely used evaluation set in the field of facial recognition" [^lfw_pingan]
      • +
      • The LFW dataset is considered the "most popular benchmark for face recognition" 2
      • +
      • The LFW dataset is "the most widely used evaluation set in the field of facial recognition" 3
      • All images in LFW dataset were obtained "in the wild" meaning without any consent from the subject or from the photographer
      • The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw-survey]
      • -
      • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." [^lfw_pingan]
      • +
      • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." 3
      • All images in the LFW dataset were copied from Yahoo News between 2002 - 2004
      • In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper Labeled Faces in the Wild: Updates and New Reporting Procedures via IARPA contract number 2014-14071600010
      • The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia
      • @@ -94,7 +94,10 @@ imageio.imwrite('lfw_montage_960.jpg', montage)

      Supplementary Material

      Text and graphics ©Adam Harvey / megapixels.cc

      -
      +
      -- cgit v1.2.3-70-g09d2 From 6711fb0c58e969284e3fcf94bb163c77445e2e13 Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Thu, 28 Feb 2019 15:56:04 +0100 Subject: footnote back and forth navigation --- client/util/index.js | 4 ++ megapixels/app/site/parser.py | 17 +++++++- site/assets/css/css.css | 66 +++++++++++++++++++++++--------- site/content/pages/datasets/lfw/index.md | 2 +- site/public/datasets/lfw/index.html | 18 ++++----- 5 files changed, 77 insertions(+), 30 deletions(-) (limited to 'megapixels/app/site/parser.py') diff --git a/client/util/index.js b/client/util/index.js index d0db0d98..0792e24e 100644 --- a/client/util/index.js +++ b/client/util/index.js @@ -5,12 +5,16 @@ export const isiPad = !!(navigator.userAgent.match(/iPad/i)) export const isAndroid = !!(navigator.userAgent.match(/Android/i)) export const isMobile = isiPhone || isiPad || isAndroid export const isDesktop = !isMobile +export const isFirefox = typeof InstallTrigger !== 'undefined' export const toArray = a => Array.prototype.slice.apply(a) export const choice = a => a[Math.floor(Math.random() * a.length)] const htmlClassList = document.body.parentNode.classList htmlClassList.add(isDesktop ? 
'desktop' : 'mobile') +if (isFirefox) { + htmlClassList.add('firefox') +} /* Default image dimensions */ diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index ef83b655..9e904e00 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -10,6 +10,8 @@ import app.site.s3 as s3 renderer = mistune.Renderer(escape=False) markdown = mistune.Markdown(renderer=renderer) +footnote_count = 0 + def parse_markdown(metadata, sections, s3_path, skip_h1=False): """ parse page into sections, preprocess the markdown to handle our modifications @@ -94,7 +96,18 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): if footnote_lookup: for key, index in footnote_lookup.items(): - content = content.replace(key, '{}'.format(key, index, index)) + global footnote_count + footnote_count = 0 + letters = "abcdefghijklmnopqrstuvwxyz" + footnote_backlinks = [] + def footnote_tag(match): + global footnote_count + footnote_count += 1 + footnote_backlinks.append('{}'.format(key, footnote_count, letters[footnote_count-1])) + return ' {}'.format(key, footnote_count, key, index, index) + key_regex = re.compile(key.replace('[', '\\[').replace('^', '\\^').replace(']', '\\]')) + content = key_regex.sub(footnote_tag, content) + footnote_txt = footnote_txt.replace("{}_BACKLINKS".format(index), "".join(footnote_backlinks)) content += footnote_txt return content @@ -197,7 +210,7 @@ def format_footnotes(footnotes, s3_path): continue key, note = footnote.split(': ', 1) footnote_index_lookup[key] = index - footnote_list.append('^'.format(key) + markdown(note)) + footnote_list.append('{}_BACKLINKS'.format(key, index) + markdown(note)) index += 1 footnote_txt = '
      • ' + '
      • '.join(footnote_list) + '
      ' diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 0afa3725..4b42657b 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -16,7 +16,8 @@ html { opacity: 0; transition: opacity 0.2s cubic-bezier(0,1,1,1); } -html.desktop .content, html.mobile .content { +html.desktop .content, +html.mobile .content { opacity: 1; } @@ -28,7 +29,7 @@ header { left: 0; width: 100%; height: 70px; - z-index: 2; + z-index: 9999; background: #1e1e1e; display: flex; flex-direction: row; @@ -53,8 +54,10 @@ header .logo { height: 30px; } header .site_name { + font-family: 'Roboto', sans-serif; font-weight: bold; color: #fff; + font-size: 14px; } header .sub { margin-left: 4px; @@ -148,7 +151,7 @@ h3 { margin: 0 0 20px 0; padding: 0; font-size: 14pt; - font-weight: 600; + font-weight: 500; transition: color 0.2s cubic-bezier(0,0,1,1); } h4 { @@ -170,6 +173,8 @@ h4 { margin: 0; padding: 0 0 10px 0; font-family: 'Roboto Mono'; + font-weight: 400; + font-size: 11px; text-transform: uppercase; letter-spacing: 2px; } @@ -210,13 +215,17 @@ section { p { margin: 0 0 20px 0; line-height: 2; + font-size: 15px; + font-weight: 400; } .content a { - color: #ff0; + color: #fff; + text-decoration: none; + border-bottom: 1px dashed; transition: color 0.2s cubic-bezier(0,0,1,1); } -.content a:hover { - color: #fff; +.desktop .content a:hover { + color: #ff8; } /* top of post metadata */ @@ -368,7 +377,7 @@ section.fullwidth .image { .caption { text-align: left; font-size: 9pt; - color: #bbb; + color: #999; max-width: 960px; margin: 10px auto 0 auto; font-family: 'Roboto'; @@ -538,17 +547,22 @@ section.intro_section { font-size: 38px; line-height: 60px; margin-bottom: 30px; - color: #fff; + color: #ddd; + font-weight: 300; } .intro_section .hero_subdesc { font-size: 18px; line-height: 36px; max-width: 640px; + font-weight: 300; color: #ddd; } -.intro_section span { - box-shadow: -10px -10px #000, 10px -10px #000, 10px 10px #000, -10px 10px #000; - background: #000; 
+.intro_section div > span { + box-shadow: -10px -10px #1e1e1e, 10px -10px #1e1e1e, 10px 10px #1e1e1e, -10px 10px #1e1e1e; + background: #1e1e1e; +} +.firefox .intro_section div > span { + box-decoration-break: clone; } /* footnotes */ @@ -559,22 +573,38 @@ a.footnote { display: inline-block; bottom: 10px; text-decoration: none; - color: #ff0; + color: #ff8; + border: 0; left: 2px; + transition-duration: 0s; +} +a.footnote_shim { + display: inline-block; + width: 1px; height: 1px; + overflow: hidden; + position: relative; + top: -90px; + visibility: hidden; } .right-sidebar a.footnote { bottom: 8px; } .desktop a.footnote:hover { - background-color: #ff0; + background-color: #ff8; color: #000; } -a.footnote_anchor { - font-weight: bold; - color: #ff0; +.backlinks { margin-right: 10px; - text-decoration: underline; - cursor: pointer; +} +.content .backlinks a { + color: #ff8; + font-size: 10px; + text-decoration: none; + border: 0; + font-weight: bold; + position: relative; + bottom: 5px; + margin-right: 2px; } ul.footnotes { list-style-type: decimal; diff --git a/site/content/pages/datasets/lfw/index.md b/site/content/pages/datasets/lfw/index.md index 1995e1f9..972fafe2 100644 --- a/site/content/pages/datasets/lfw/index.md +++ b/site/content/pages/datasets/lfw/index.md @@ -5,7 +5,7 @@ title: Labeled Faces in The Wild desc: Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. subdesc: It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. image: assets/lfw_feature.jpg -caption: Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms. +caption: A few of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms. 
slug: lfw published: 2019-2-23 updated: 2019-2-23 diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html index 54b6aa22..08ec8ee3 100644 --- a/site/public/datasets/lfw/index.html +++ b/site/public/datasets/lfw/index.html @@ -28,10 +28,10 @@
      Labeled Faces in The Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition.
      It includes 13,456 images of 4,432 people’s images copied from the Internet during 2002-2004. -
      Eighteen of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.
      A few of the 5,749 people in the Labeled Faces in the Wild Dataset. The most widely used face dataset for benchmarking commercial face recognition algorithms.

      Labeled Faces in the Wild

      -

      Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition1. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com3, LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

      +

      Labeled Faces in The Wild (LFW) is "a database of face photographs designed for studying the problem of unconstrained face recognition 1. It is used to evaluate and improve the performance of facial recognition algorithms in academic, commercial, and government research. According to BiometricUpdate.com 3, LFW is "the most widely used evaluation set in the field of facial recognition, LFW attracts a few dozen teams from around the globe including Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong."

      The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of Names of Faces and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...

      The Names and Faces dataset was the first face recognition dataset created entire from online photos. However, Names and Faces and LFW are not the first face recognition dataset created entirely "in the wild". That title belongs to the UCD dataset. Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer.

      Biometric Trade Routes

      @@ -51,11 +51,11 @@

      Additional Information

      (tweet-sized snippets go here)

        -
      • The LFW dataset is considered the "most popular benchmark for face recognition" 2
      • -
      • The LFW dataset is "the most widely used evaluation set in the field of facial recognition" 3
      • +
      • The LFW dataset is considered the "most popular benchmark for face recognition" 2
      • +
      • The LFW dataset is "the most widely used evaluation set in the field of facial recognition" 3
      • All images in LFW dataset were obtained "in the wild" meaning without any consent from the subject or from the photographer
      • The faces in the LFW dataset were detected using the Viola-Jones haarcascade face detector [^lfw_website] [^lfw-survey]
      • -
      • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." 3
      • +
      • The LFW dataset is used by several of the largest tech companies in the world including "Google, Facebook, Microsoft Research Asia, Baidu, Tencent, SenseTime, Face++ and Chinese University of Hong Kong." 3
      • All images in the LFW dataset were copied from Yahoo News between 2002 - 2004
      • In 2014, two of the four original authors of the LFW dataset received funding from IARPA and ODNI for their followup paper Labeled Faces in the Wild: Updates and New Reporting Procedures via IARPA contract number 2014-14071600010
      • The dataset includes 2 images of George Tenet, the former Director of Central Intelligence (DCI) for the Central Intelligence Agency whose facial biometrics were eventually used to help train facial recognition software in China and Russia
      • @@ -94,9 +94,9 @@ imageio.imwrite('lfw_montage_960.jpg', montage)

      Supplementary Material

      Text and graphics ©Adam Harvey / megapixels.cc

      -
      -- cgit v1.2.3-70-g09d2 From c33406d4da0f03a986db62b0d6b75c5a70114abe Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Thu, 28 Feb 2019 17:29:16 +0100 Subject: sidebar on about pages --- megapixels/app/site/parser.py | 12 +++++++----- site/assets/css/css.css | 4 ++++ site/content/pages/about/credits.md | 11 +++++++++++ site/content/pages/about/disclaimer.md | 11 +++++++++++ site/content/pages/about/index.md | 24 +++++++++++++++++++----- site/content/pages/about/press.md | 11 +++++++++++ site/content/pages/about/privacy.md | 10 ++++++++++ site/content/pages/about/terms.md | 12 +++++++++++- site/public/about/credits/index.html | 10 +++++++++- site/public/about/disclaimer/index.html | 10 +++++++++- site/public/about/index.html | 15 ++++++++++++--- site/public/about/press/index.html | 10 +++++++++- site/public/about/privacy/index.html | 11 +++++++++-- site/public/about/terms/index.html | 12 ++++++++++-- 14 files changed, 142 insertions(+), 21 deletions(-) (limited to 'megapixels/app/site/parser.py') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index 9e904e00..b8bbf289 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -43,17 +43,19 @@ def parse_markdown(metadata, sections, s3_path, skip_h1=False): footnotes.append(section) elif ignoring: continue - elif '### Statistics' in section: + elif '### statistics' in section.lower() or '### sidebar' in section.lower(): if len(current_group): groups.append(format_section(current_group, s3_path)) current_group = [] - current_group.append(section) + if 'sidebar' not in section.lower(): + current_group.append(section) in_stats = True - elif in_stats and not section.strip().startswith('## '): + elif in_stats and not section.strip().startswith('## ') and 'end sidebar' not in section.lower(): current_group.append(section) - elif in_stats and section.strip().startswith('## '): + elif in_stats and section.strip().startswith('## ') or 'end sidebar' in section.lower(): 
current_group = [format_section(current_group, s3_path, 'right-sidebar', tag='div')] - current_group.append(section) + if 'end sidebar' not in section.lower(): + current_group.append(section) in_stats = False elif section.strip().startswith('```'): groups.append(format_section(current_group, s3_path)) diff --git a/site/assets/css/css.css b/site/assets/css/css.css index d710b3a8..ee99e13e 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -265,6 +265,10 @@ p { margin-bottom: 10px; color: #aaa; } +.right-sidebar ul:first-child a { + text-decoration: none; + border-bottom: 1px solid; +} /* lists */ diff --git a/site/content/pages/about/credits.md b/site/content/pages/about/credits.md index 2d16155c..3cd0b05b 100644 --- a/site/content/pages/about/credits.md +++ b/site/content/pages/about/credits.md @@ -12,6 +12,17 @@ authors: Adam Harvey # Credits +### Sidebar + +- [About](/about/) +- [Press](/about/press/) +- [Credits](/about/credits/) +- [Disclaimer](/about/disclaimer/) +- [Terms and Conditions](/about/terms/) +- [Privacy Policy](/about/privacy/) + +## End Sidebar + - MegaPixels by Adam Harvey - Made with support from Mozilla - Site developed by Jules Laplace diff --git a/site/content/pages/about/disclaimer.md b/site/content/pages/about/disclaimer.md index 64ce9f21..27cf6760 100644 --- a/site/content/pages/about/disclaimer.md +++ b/site/content/pages/about/disclaimer.md @@ -12,6 +12,17 @@ authors: Adam Harvey # Disclaimer +### Sidebar + +- [About](/about/) +- [Press](/about/press/) +- [Credits](/about/credits/) +- [Disclaimer](/about/disclaimer/) +- [Terms and Conditions](/about/terms/) +- [Privacy Policy](/about/privacy/) + +## End Sidebar + Last updated: December 04, 2018 The information contained on MegaPixels.cc website (the "Service") is for academic and artistic purposes only. 
diff --git a/site/content/pages/about/index.md b/site/content/pages/about/index.md index f9c6f83a..d3f5874d 100644 --- a/site/content/pages/about/index.md +++ b/site/content/pages/about/index.md @@ -12,18 +12,32 @@ authors: Adam Harvey # About MegaPixels -MegaPixels aims to answers to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla. +### Sidebar -MegaPixels aims to answers to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla. +- [Press](/about/press/) +- [Credits](/about/credits/) +- [Disclaimer](/about/disclaimer/) +- [Terms and Conditions](/about/terms/) +- [Privacy Policy](/about/privacy/) -+ Years: 2002-2004 ++ Years: 2002-2019 + Datasets Analyzed: 325 + Author: Adam Harvey + Development: Jules LaPlace + Research Assistance: Berit Gilma -![Adam Harvey](assets/adam-harvey.jpg) **Adam Harvey** is an American artist and researcher based in Berlin. His previous projects (CV Dazzle, Stealth Wear, and SkyLift) explore the potential for countersurveillance as artwork. He is the founder of VFRAME (visual forensics software for human rights groups), the recipient of 2 PrototypeFund awards, and is currently a researcher in residence at Karlsruhe HfG studying artifical intelligence and datasets. +## End Sidebar -![Adam Harvey](assets/jules-laplace.jpg) **Jules LaPlace** is an American technologist and artist also based in Berlin. 
He was previously the CTO for a NYC digital agency and currently works at VFRAME, developing computer vision for human rights groups, and as a freelance technologists for artists. +MegaPixels aims to answer to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla. + +MegaPixels aims to answer to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla. + +![Adam Harvey](assets/adam-harvey.jpg) + +**Adam Harvey** is an American artist and researcher based in Berlin. His previous projects (CV Dazzle, Stealth Wear, and SkyLift) explore the potential for countersurveillance as artwork. He is the founder of VFRAME (visual forensics software for human rights groups), the recipient of 2 PrototypeFund awards, and is currently a researcher in residence at Karlsruhe HfG studying artifical intelligence and datasets. + +![Jules LaPlace](assets/jules-laplace.jpg) + +**Jules LaPlace** is an American artist and technologist also based in Berlin. He was previously the CTO of a NYC digital agency and currently works at VFRAME, developing computer vision for human rights groups, and building creative software for artists. **Mozilla** is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, spreads and supports Mozilla products, thereby promoting exclusively free software and open standards, with only minor exceptions. The community is supported institutionally by the not-for-profit Mozilla Foundation and its tax-paying subsidiary, the Mozilla Corporation. 
\ No newline at end of file diff --git a/site/content/pages/about/press.md b/site/content/pages/about/press.md index 2e3fa9a7..0e3124d0 100644 --- a/site/content/pages/about/press.md +++ b/site/content/pages/about/press.md @@ -13,6 +13,17 @@ authors: Adam Harvey # Press +### Sidebar + +- [About](/about/) +- [Press](/about/press/) +- [Credits](/about/credits/) +- [Disclaimer](/about/disclaimer/) +- [Terms and Conditions](/about/terms/) +- [Privacy Policy](/about/privacy/) + +## End Sidebar + ![alt text](assets/test.jpg) - Aug 22, 2018: "Transgender YouTubers had their videos grabbed to train facial recognition software" by James Vincent diff --git a/site/content/pages/about/privacy.md b/site/content/pages/about/privacy.md index 17d1b707..9685a189 100644 --- a/site/content/pages/about/privacy.md +++ b/site/content/pages/about/privacy.md @@ -12,6 +12,16 @@ authors: Adam Harvey # Privacy Policy +### Sidebar + +- [About](/about/) +- [Press](/about/press/) +- [Credits](/about/credits/) +- [Disclaimer](/about/disclaimer/) +- [Terms and Conditions](/about/terms/) +- [Privacy Policy](/about/privacy/) + +## End Sidebar A summary of our privacy policy is as follows: diff --git a/site/content/pages/about/terms.md b/site/content/pages/about/terms.md index 3735ff08..6ad03bc1 100644 --- a/site/content/pages/about/terms.md +++ b/site/content/pages/about/terms.md @@ -11,8 +11,18 @@ authors: Adam Harvey ------------ -Terms and Conditions ("Terms") +# Terms and Conditions ("Terms") +### Sidebar + +- [About](/about/) +- [Press](/about/press/) +- [Credits](/about/credits/) +- [Disclaimer](/about/disclaimer/) +- [Terms and Conditions](/about/terms/) +- [Privacy Policy](/about/privacy/) + +## End Sidebar Last updated: December 04, 2018 diff --git a/site/public/about/credits/index.html b/site/public/about/credits/index.html index fecc6c7b..6e4f06c1 100644 --- a/site/public/about/credits/index.html +++ b/site/public/about/credits/index.html @@ -28,7 +28,15 @@

      Credits

      -
        +
      • MegaPixels by Adam Harvey
      • Made with support from Mozilla
      • Site developed by Jules Laplace
      • diff --git a/site/public/about/disclaimer/index.html b/site/public/about/disclaimer/index.html index a108baa0..b93194fa 100644 --- a/site/public/about/disclaimer/index.html +++ b/site/public/about/disclaimer/index.html @@ -28,7 +28,15 @@

        Disclaimer

        -

        Last updated: December 04, 2018

        +

        Last updated: December 04, 2018

        The information contained on MegaPixels.cc website (the "Service") is for academic and artistic purposes only.

        MegaPixels.cc assumes no responsibility for errors or omissions in the contents on the Service.

        In no event shall MegaPixels.cc be liable for any special, direct, indirect, consequential, or incidental damages or any damages whatsoever, whether in an action of contract, negligence or other tort, arising out of or in connection with the use of the Service or the contents of the Service. MegaPixels.cc reserves the right to make additions, deletions, or modification to the contents on the Service at any time without prior notice.

        diff --git a/site/public/about/index.html b/site/public/about/index.html index 4a5ca926..2a0bc6c3 100644 --- a/site/public/about/index.html +++ b/site/public/about/index.html @@ -28,9 +28,18 @@

        About MegaPixels

        -

        MegaPixels aims to answers to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla.

        -

        MegaPixels aims to answers to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla.

        -
        Years
        2002-2004
        Datasets Analyzed
        325
        Author
        Adam Harvey
        Development
        Jules LaPlace
        Research Assistance
        Berit Gilma
        Adam Harvey
        Adam Harvey
        Adam Harvey
        Adam Harvey

        Mozilla is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, spreads and supports Mozilla products, thereby promoting exclusively free software and open standards, with only minor exceptions. The community is supported institutionally by the not-for-profit Mozilla Foundation and its tax-paying subsidiary, the Mozilla Corporation.

        +

        MegaPixels aims to answer to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla.

        +

        MegaPixels aims to answer to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla.

        +
        Adam Harvey
        Adam Harvey

        Adam Harvey is an American artist and researcher based in Berlin. His previous projects (CV Dazzle, Stealth Wear, and SkyLift) explore the potential for countersurveillance as artwork. He is the founder of VFRAME (visual forensics software for human rights groups), the recipient of 2 PrototypeFund awards, and is currently a researcher in residence at Karlsruhe HfG studying artifical intelligence and datasets.

        +
        Jules LaPlace
        Jules LaPlace

        Jules LaPlace is an American artist and technologist also based in Berlin. He was previously the CTO of a NYC digital agency and currently works at VFRAME, developing computer vision for human rights groups, and building creative software for artists.

        +

        Mozilla is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, spreads and supports Mozilla products, thereby promoting exclusively free software and open standards, with only minor exceptions. The community is supported institutionally by the not-for-profit Mozilla Foundation and its tax-paying subsidiary, the Mozilla Corporation.

        diff --git a/site/public/about/press/index.html b/site/public/about/press/index.html index a1d9d4f5..d36b6bc6 100644 --- a/site/public/about/press/index.html +++ b/site/public/about/press/index.html @@ -28,7 +28,15 @@

        Press

        -
        alt text
        alt text
          +
        alt text
        alt text
        • Aug 22, 2018: "Transgender YouTubers had their videos grabbed to train facial recognition software" by James Vincent https://www.theverge.com/2017/8/22/16180080/transgender-youtubers-ai-facial-recognition-dataset
        • Aug 22, 2018: "Transgender YouTubers had their videos grabbed to train facial recognition software" by James Vincent https://www.theverge.com/2017/8/22/16180080/transgender-youtubers-ai-facial-recognition-dataset
        • Aug 22, 2018: "Transgender YouTubers had their videos grabbed to train facial recognition software" by James Vincent https://www.theverge.com/2017/8/22/16180080/transgender-youtubers-ai-facial-recognition-dataset diff --git a/site/public/about/privacy/index.html b/site/public/about/privacy/index.html index 92a1b9a8..1b3b9d2f 100644 --- a/site/public/about/privacy/index.html +++ b/site/public/about/privacy/index.html @@ -28,10 +28,17 @@

          Privacy Policy

          -

          A summary of our privacy policy is as follows:

          +

          A summary of our privacy policy is as follows:

          The MegaPixels site does not use any analytics programs or collect any data besides the necessary IP address of your connection, which are deleted every 30 days and used only for security and to prevent misuse.

          The image processing sections of the site do not collect any data whatsoever. All processing takes place in temporary memory (RAM) and then is displayed back to the user over a SSL secured HTTPS connection. It is the sole responsibility of the user whether they discard, by closing the page, or share their analyzed information and any potential consequences that may arise from doing so.

          -

          A more complete legal version is below:

          This is a boilerplate Privacy policy from https://termsfeed.com/

          Needs to be reviewed

          diff --git a/site/public/about/terms/index.html b/site/public/about/terms/index.html index fd17b4d9..8bd6e738 100644 --- a/site/public/about/terms/index.html +++ b/site/public/about/terms/index.html @@ -27,8 +27,16 @@
          -

          Terms and Conditions ("Terms")

          -

          Last updated: December 04, 2018

          +

          Terms and Conditions ("Terms")

          +

          Last updated: December 04, 2018

          Please read these Terms and Conditions ("Terms", "Terms and Conditions") carefully before using the MegaPixels website (the "Service") operated by megapixels.cc ("us", "we", or "our").

          Your access to and use of the Service is conditioned on your acceptance of and compliance with these Terms.

          By accessing or using the Service you agree to be bound by these Terms. If you disagree with any part of the terms then you may not access the Service.

          -- cgit v1.2.3-70-g09d2 From 18e595bdaf64417622d12fcbe9b5af96ac935ab3 Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Thu, 28 Feb 2019 17:39:22 +0100 Subject: special case adam/jules sideimages --- megapixels/app/site/parser.py | 9 ++++++--- site/assets/css/css.css | 11 ++++++++++- site/content/pages/about/index.md | 8 +++----- site/public/about/index.html | 6 +++--- 4 files changed, 22 insertions(+), 12 deletions(-) (limited to 'megapixels/app/site/parser.py') diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index b8bbf289..c17d3b8a 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -144,7 +144,7 @@ def intro_section(metadata, s3_path): def fix_images(lines, s3_path): """ - do our own tranformation of the markdown around images to handle wide images etc + do our own transformation of the markdown around images to handle wide images etc lines: markdown lines """ real_lines = [] @@ -154,10 +154,13 @@ def fix_images(lines, s3_path): line = line.replace('![', '') alt_text, tail = line.split('](', 1) url, tail = tail.split(')', 1) + tag = '' if ':' in alt_text: - tail, alt_text = alt_text.split(':', 1) + tag, alt_text = alt_text.split(':', 1) img_tag = "{}".format(s3_path + url, alt_text.replace("'", "")) - if len(alt_text): + if 'sideimage' in tag: + line = "
          {}
          {}
          ".format(img_tag, markdown(tail)) + elif len(alt_text): line = "
          {}
          {}
          ".format(img_tag, alt_text) else: line = "
          {}
          ".format(img_tag, alt_text) diff --git a/site/assets/css/css.css b/site/assets/css/css.css index ee99e13e..29833be7 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -387,7 +387,16 @@ section.fullwidth .image { margin: 10px auto 0 auto; font-family: 'Roboto'; } - +.sideimage { + margin: 10px 0; + display: flex; + flex-direction: row; + justify-content: flex-start; + align-items: flex-start; +} +.sideimage img { + margin-right: 10px; +} /* blog index */ .research_index { diff --git a/site/content/pages/about/index.md b/site/content/pages/about/index.md index d3f5874d..861cfd07 100644 --- a/site/content/pages/about/index.md +++ b/site/content/pages/about/index.md @@ -32,12 +32,10 @@ MegaPixels aims to answer to these questions and reveal the stories behind the m MegaPixels aims to answer to these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla. -![Adam Harvey](assets/adam-harvey.jpg) +![sideimage:Adam Harvey](assets/adam-harvey.jpg) **Adam Harvey** is an American artist and researcher based in Berlin. His previous projects (CV Dazzle, Stealth Wear, and SkyLift) explore the potential for countersurveillance as artwork. He is the founder of VFRAME (visual forensics software for human rights groups), the recipient of 2 PrototypeFund awards, and is currently a researcher in residence at Karlsruhe HfG studying artifical intelligence and datasets. -**Adam Harvey** is an American artist and researcher based in Berlin. His previous projects (CV Dazzle, Stealth Wear, and SkyLift) explore the potential for countersurveillance as artwork. 
He is the founder of VFRAME (visual forensics software for human rights groups), the recipient of 2 PrototypeFund awards, and is currently a researcher in residence at Karlsruhe HfG studying artifical intelligence and datasets. +![sideimage:Jules LaPlace](assets/jules-laplace.jpg) **Jules LaPlace** is an American artist and technologist also based in Berlin. He was previously the CTO of a NYC digital agency and currently works at VFRAME, developing computer vision for human rights groups, and building creative software for artists. -![Jules LaPlace](assets/jules-laplace.jpg) +**Mozilla** is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, spreads and supports Mozilla products, thereby promoting exclusively free software and open standards, with only minor exceptions. The community is supported institutionally by the not-for-profit Mozilla Foundation and its tax-paying subsidiary, the Mozilla Corporation. -**Jules LaPlace** is an American artist and technologist also based in Berlin. He was previously the CTO of a NYC digital agency and currently works at VFRAME, developing computer vision for human rights groups, and building creative software for artists. -**Mozilla** is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, spreads and supports Mozilla products, thereby promoting exclusively free software and open standards, with only minor exceptions. The community is supported institutionally by the not-for-profit Mozilla Foundation and its tax-paying subsidiary, the Mozilla Corporation. \ No newline at end of file diff --git a/site/public/about/index.html b/site/public/about/index.html index 2a0bc6c3..8583fd96 100644 --- a/site/public/about/index.html +++ b/site/public/about/index.html @@ -37,9 +37,9 @@
        Years
        2002-2019
        Datasets Analyzed
        325
        Author
        Adam Harvey
        Development
        Jules LaPlace
        Research Assistance
        Berit Gilma

        MegaPixels aims to answer these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla.

        MegaPixels aims to answer these questions and reveal the stories behind the millions of images used to train, evaluate, and power the facial recognition surveillance algorithms used today. MegaPixels is authored by Adam Harvey, developed in collaboration with Jules LaPlace, and produced in partnership with Mozilla.

        -
        Adam Harvey
        Adam Harvey

        Adam Harvey is an American artist and researcher based in Berlin. His previous projects (CV Dazzle, Stealth Wear, and SkyLift) explore the potential for countersurveillance as artwork. He is the founder of VFRAME (visual forensics software for human rights groups), the recipient of 2 PrototypeFund awards, and is currently a researcher in residence at Karlsruhe HfG studying artificial intelligence and datasets.

        -
        Jules LaPlace
        Jules LaPlace

        Jules LaPlace is an American artist and technologist also based in Berlin. He was previously the CTO of a NYC digital agency and currently works at VFRAME, developing computer vision for human rights groups, and building creative software for artists.

        -

        Mozilla is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, spreads and supports Mozilla products, thereby promoting exclusively free software and open standards, with only minor exceptions. The community is supported institutionally by the not-for-profit Mozilla Foundation and its tax-paying subsidiary, the Mozilla Corporation.

        +
        Adam Harvey

        Adam Harvey is an American artist and researcher based in Berlin. His previous projects (CV Dazzle, Stealth Wear, and SkyLift) explore the potential for countersurveillance as artwork. He is the founder of VFRAME (visual forensics software for human rights groups), the recipient of 2 PrototypeFund awards, and is currently a researcher in residence at Karlsruhe HfG studying artificial intelligence and datasets.

        +
        Jules LaPlace

        Jules LaPlace is an American artist and technologist also based in Berlin. He was previously the CTO of a NYC digital agency and currently works at VFRAME, developing computer vision for human rights groups, and building creative software for artists.

        +

        Mozilla is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, spreads and supports Mozilla products, thereby promoting exclusively free software and open standards, with only minor exceptions. The community is supported institutionally by the not-for-profit Mozilla Foundation and its tax-paying subsidiary, the Mozilla Corporation.

        -- cgit v1.2.3-70-g09d2 From 406d857c61fb128a48281a52899ddf77b68201be Mon Sep 17 00:00:00 2001 From: Jules Laplace Date: Thu, 28 Feb 2019 18:32:39 +0100 Subject: threejs splash page on the index --- client/splash/index.js | 14 +-- megapixels/app/site/parser.py | 2 + site/assets/css/css.css | 15 +++- site/content/pages/datasets/index.md | 5 -- site/content/pages/index.md | 26 ++---- site/content/pages/info/index.md | 2 +- site/public/datasets/index.html | 2 +- site/public/index.html | 166 +++-------------------------------- site/public/info/index.html | 2 +- site/templates/home.html | 113 ++++++++---------------- 10 files changed, 74 insertions(+), 273 deletions(-) (limited to 'megapixels/app/site/parser.py') diff --git a/client/splash/index.js b/client/splash/index.js index e247b7f5..a21110f0 100644 --- a/client/splash/index.js +++ b/client/splash/index.js @@ -31,12 +31,14 @@ function build() { function bind() { document.querySelector('.slogan').addEventListener('click', modal.close) - toArray(document.querySelectorAll('.aboutLink')).forEach(el => { - el.addEventListener('click', modal.toggle) - }) - document.querySelector('.about .inner').addEventListener('click', e => e.stopPropagation()) - document.querySelector('.about').addEventListener('click', modal.close) - document.querySelector('.close').addEventListener('click', modal.close) + if (document.querySelector('.about')) { + toArray(document.querySelectorAll('.aboutLink')).forEach(el => { + el.addEventListener('click', modal.toggle) + }) + document.querySelector('.about .inner').addEventListener('click', e => e.stopPropagation()) + document.querySelector('.about').addEventListener('click', modal.close) + document.querySelector('.close').addEventListener('click', modal.close) + } } function animate() { diff --git a/megapixels/app/site/parser.py b/megapixels/app/site/parser.py index c17d3b8a..ad4256ad 100644 --- a/megapixels/app/site/parser.py +++ b/megapixels/app/site/parser.py @@ -198,6 +198,8 @@ def 
format_metadata(section): """ meta = [] for line in section.split('\n'): + if ': ' not in line: + continue key, value = line[2:].split(': ', 1) meta.append("
        {}
        {}
        ".format(key, value)) return "
        {}
        ".format(''.join(meta)) diff --git a/site/assets/css/css.css b/site/assets/css/css.css index 3bd09f23..732386bd 100644 --- a/site/assets/css/css.css +++ b/site/assets/css/css.css @@ -112,13 +112,19 @@ footer { justify-content: space-between; color: #888; font-size: 9pt; - padding: 20px 75px 20px; + padding: 20px 0 20px; font-family: "Roboto", sans-serif; } footer > div { display: flex; flex-direction: row; } +footer > div:nth-child(1) { + padding-left: 75px; +} +footer > div:nth-child(2) { + padding-right: 75px; +} footer a { display: inline-block; color: #888; @@ -237,6 +243,7 @@ p { align-items: flex-start; font-size: 10pt; margin-bottom: 20px; + font-family: 'Roboto', sans-serif; } .meta > div { margin-right: 30px; @@ -540,11 +547,11 @@ section.fullwidth .image { .desktop .dataset-list a:nth-child(3n+2):hover { background-color: rgba(255, 128, 0, 0.2); } .dataset-list a:nth-child(3n+3) { background-color: rgba(255, 255, 0, 0.1); } -.desktop .dataset-list .dataset:nth-child(3n+3):hover { background-color: rgba(255, 255, 0, 0.2); } +.desktop .dataset-list a:nth-child(3n+3):hover { background-color: rgba(255, 255, 0, 0.2); } .dataset-list span { - box-shadow: -3px -3px black, 3px -3px black, -3px 3px black, 3px 3px black; - background-color: black; + box-shadow: -3px -3px #181818, 3px -3px #181818, -3px 3px #181818, 3px 3px #181818; + background-color: #181818; box-decoration-break: clone; } diff --git a/site/content/pages/datasets/index.md b/site/content/pages/datasets/index.md index c408fba4..fa012758 100644 --- a/site/content/pages/datasets/index.md +++ b/site/content/pages/datasets/index.md @@ -13,8 +13,6 @@ sync: false # Facial Recognition Datasets -### Sidebar - + Found: 275 datasets + Created between: 1993-2018 + Smallest dataset: 20 images @@ -22,6 +20,3 @@ sync: false + Highest resolution faces: 450x500 (Unconstrained College Students) + Lowest resolution faces: 16x20 pixels (QMUL SurvFace) - -## End Sidebar - diff --git 
a/site/content/pages/index.md b/site/content/pages/index.md index d63cf9fa..1cf47aac 100644 --- a/site/content/pages/index.md +++ b/site/content/pages/index.md @@ -1,30 +1,14 @@ ------------ status: published -title: MegaPixels -desc: -slug: home +title: Megapixels +desc: The Darkside of Datasets +slug: analysis published: 2018-12-15 updated: 2018-12-15 authors: Adam Harvey sync: false - ------------- - -## Facial Recognition Datasets - -Regular Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. -### Summary - -+ Found: 275 datasets -+ Created between: 1993-2018 -+ Smallest dataset: 20 images -+ Largest dataset: 10,000,000 images - -+ Highest resolution faces: 450x500 (Unconstrained College Students) -+ Lowest resolution faces: 16x20 pixels (QMUL SurvFace) +------------ -``` -load_file https://megapixels.nyc3.digitaloceanspaces.com/v1/citations/datasets.csv -``` +## diff --git a/site/content/pages/info/index.md b/site/content/pages/info/index.md index 9cbb219e..090783d9 100644 --- a/site/content/pages/info/index.md +++ b/site/content/pages/info/index.md @@ -11,7 +11,7 @@ sync: false ------------ -## +## Face Analysis ``` face_analysis diff --git a/site/public/datasets/index.html b/site/public/datasets/index.html index 17c938ac..7398da17 100644 --- a/site/public/datasets/index.html +++ b/site/public/datasets/index.html @@ -29,7 +29,7 @@

        Facial Recognition Datasets

        -
        +
      Found
      275 datasets
      Created between
      1993-2018
      Smallest dataset
      20 images
      Largest dataset
      10,000,000 images
      Highest resolution faces
      450x500 (Unconstrained College Students)
      Lowest resolution faces
      16x20 pixels (QMUL SurvFace)

      diff --git a/site/public/index.html b/site/public/index.html index 8775f22d..d5a2e59f 100644 --- a/site/public/index.html +++ b/site/public/index.html @@ -3,15 +3,13 @@ MegaPixels - - + + - - - +

      @@ -20,166 +18,22 @@
      MegaPixels
      -
      - -
      -
      -
      -
      -
      -
      -
      - MegaPixels is an art project that explores the dark side of face recognition datasets and the future of computer vision. -
      - - - -
      - Made by Adam Harvey in collaboration with Jules Laplace, and in partnership with Mozilla.
      - Read more about MegaPixels -
      -
      -
      -
      - -
      -

      Face Recognition Datasets

      - - -

      - MegaPixels is an online art project that explores the history of face recognition from the perspective of datasets. MegaPixels aims to unravel the meanings behind the data and expose the darker corners of the biometric industry that have contributed to its growth. -

      -

      - Through a mix of case studies, visualizations, and interactive tools, Megapixels will use face recognition datasets to tell the history of modern biometrics. Many people have contributed to the development of face recognition technology, both wittingly and unwittingly. Not only scientists, but also celebrities and regular internet users have played a part. -

      -

      - Face recognition is a mess of contradictions. It works, yet it doesn't actually work. It's cheap and accessible, but also expensive and out of control. Face recognition research has achieved headline-grabbing superhuman accuracies over 99.9%, yet in practice it's also dangerously inaccurate. -

      -

      - During a trial installation at Südkreuz station in Berlin in 2018, 20% of the matches were wrong, an error rate so high that it should not have any connection to law enforcement or justice. And in London, the Metropolitan Police had been using face recognition software that mistakenly identified an alarming 98% of people as criminals, which perhaps is a crime itself. -

      -
      - -
      -

      Dataset Portraits

      -

      - We have prepared detailed case studies of some of the more noteworthy datasets, including tools to help you learn what is contained in these datasets, and even whether your own face has been used to train these algorithms. -

      - - -
      - - +
      +
      - - - - - - - - - - + \ No newline at end of file diff --git a/site/public/info/index.html b/site/public/info/index.html index 65510255..0b59e647 100644 --- a/site/public/info/index.html +++ b/site/public/info/index.html @@ -27,7 +27,7 @@
      -

      +

      Face Analysis

      Results are only stored for the duration of the analysis and are deleted when you leave this page.

      diff --git a/site/templates/home.html b/site/templates/home.html index 9756e21f..d5a2e59f 100644 --- a/site/templates/home.html +++ b/site/templates/home.html @@ -1,82 +1,39 @@ -{% extends 'layout.html' %} - -{% block content %} -
      -
      -
      -
      -
      -
      -
      - MegaPixels is an art project that explores the dark side of face recognition datasets and the future of computer vision. -
      - - - -
      - Made by Adam Harvey in collaboration with Jules Laplace, and in partnership with Mozilla.
      - Read more about MegaPixels -
      -
      + + + + MegaPixels + + + + + + + + + + +
      + + +
      MegaPixels
      +
      + +
      +
      +
      - -
      -

      Face Recognition Datasets

      -
      - -
      -

      Dataset Portraits

      -

      - We have prepared detailed case studies of some of the more noteworthy datasets, including tools to help you learn what is contained in these datasets, and even whether your own face has been used to train these algorithms. -

      - -
      - {% for dataset in datasets %} - -
      - {{ dataset.title }} -
      -
      - {% endfor %} +
      + MegaPixels ©2017-19 Adam R. Harvey /  + ahprojects.com
      -
      - -{% endblock %} - -{% block scripts %} - - - - - - - -{% endblock %} +
      + + + \ No newline at end of file -- cgit v1.2.3-70-g09d2