Diffstat (limited to 'site/public/datasets')
-rw-r--r--  site/public/datasets/lfw/index.html        2
-rw-r--r--  site/public/datasets/vgg_face2/index.html  19
2 files changed, 20 insertions, 1 deletions
diff --git a/site/public/datasets/lfw/index.html b/site/public/datasets/lfw/index.html
index a8a0aa4b..9657b866 100644
--- a/site/public/datasets/lfw/index.html
+++ b/site/public/datasets/lfw/index.html
@@ -43,7 +43,7 @@
 <p>The LFW dataset includes 13,233 images of 5,749 people that were collected between 2002-2004. LFW is a subset of <em>Names of Faces</em> and is part of the first facial recognition training dataset created entirely from images appearing on the Internet. The people appearing in LFW are...</p>
 <p>The <em>Names and Faces</em> dataset was the first face recognition dataset created entire from online photos. However, <em>Names and Faces</em> and <em>LFW</em> are not the first face recognition dataset created entirely "in the wild". That title belongs to the <a href="/datasets/ucd_faces/">UCD dataset</a>. Images obtained "in the wild" means using an image without explicit consent or awareness from the subject or photographer.</p>
 <h3>Biometric Trade Routes</h3>
-<p>To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from <a href="https://www.semanticscholar.org">Semantic Scholar</a>.</p>
+<p>[convert to template] To understand how this dataset has been used, its citations have been geocoded to show an approximate geographic digital trade route of the biometric data. Lines indicate an organization (education, commercial, or governmental) that has cited the LFW dataset in their research. Data is compiled from <a href="https://www.semanticscholar.org">Semantic Scholar</a>.</p>
 </section><section class='applet_container'><div class='applet' data-payload='{"command": "map"}'></div></section><section><h3>Synthetic Faces</h3>
 <p>To visualize the types of photos in the dataset without explicitly publishing individual's identities a generative adversarial network (GAN) was trained on the entire dataset. The images in this video show a neural network learning the visual latent space and then interpolating between archetypical identities within the LFW dataset.</p>
 </section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/lfw/assets/synthetic_01.jpg' alt='Synthetically generated face from the visual space of LFW dataset'><div class='caption'>Synthetically generated face from the visual space of LFW dataset</div></div>
diff --git a/site/public/datasets/vgg_face2/index.html b/site/public/datasets/vgg_face2/index.html
index f06bc879..d0a161cb 100644
--- a/site/public/datasets/vgg_face2/index.html
+++ b/site/public/datasets/vgg_face2/index.html
@@ -32,6 +32,25 @@
 <ul>
 <li>The VGG Face 2 dataset includes approximately 1,331 actresses, 139 presidents, 16 wives, 3 husbands, 2 snooker player, and 1 guru</li>
 </ul>
+<h3>Names and descriptions</h3>
+<ul>
+<li>The original VGGF2 name list has been updated with the results returned from Google Knowledge</li>
+<li>Names with a similarity score greater than 0.75 where automatically updated. Scores computed using <code>import difflib; seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower()); score = seq.ratio()</code></li>
+<li>The 97 names with a score of 0.75 or lower were manually reviewed and includes name changes validating using Wikipedia.org results for names such as "Bruce Jenner" to "Caitlyn Jenner", spousal last-name changes, and discretionary changes to improve search results such as combining nicknames with full name when appropriate, for example changing "Aleksandar Petrović" to "Aleksandar 'Aco' Petrović" and minor changes such as "Mohammad Ali" to "Muhammad Ali"</li>
+<li>The 'Description' text was automatically added when the Knowledge Graph score was greater than 250</li>
+</ul>
+<h2>TODO</h2>
+<ul>
+<li>create name list, and populate with Knowledge graph information like LFW</li>
+<li>make list of interesting number stats, by the numbers</li>
+<li>make list of interesting important facts</li>
+<li>write intro abstract</li>
+<li>write analysis of usage</li>
+<li>find examples, citations, and screenshots of useage</li>
+<li>find list of companies using it for table</li>
+<li>create montages of the dataset, like LFW</li>
+<li>create right to removal information</li>
+</ul>
 </section>
 </div>
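Note on the name-matching rule in the vgg_face2 hunk: the added text describes scoring each original name against a returned name with `difflib.SequenceMatcher` and updating automatically above 0.75. A minimal sketch of that rule follows, assuming the comparison is between the original list entry and the candidate name; the function names and the `AUTO_UPDATE_THRESHOLD` constant are hypothetical and not part of the commit.

```python
import difflib

# Threshold quoted in the diff: scores greater than 0.75 were updated
# automatically; 0.75 or lower went to manual review.
AUTO_UPDATE_THRESHOLD = 0.75


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1], mirroring the one-liner in the diff."""
    seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower())
    return seq.ratio()


def needs_manual_review(original: str, candidate: str) -> bool:
    """Hypothetical helper: True when the pair falls at or below the threshold."""
    return name_similarity(original, candidate) <= AUTO_UPDATE_THRESHOLD


if __name__ == "__main__":
    # Illustrative pair taken from the text of the diff.
    print(needs_manual_review("Bruce Jenner", "Caitlyn Jenner"))
```

Lower-casing both inputs before matching, as the original snippet does, keeps capitalization differences from reducing the score.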
