From 2813b772c8a088307f7a1ab9df167875d320162d Mon Sep 17 00:00:00 2001 From: adamhrv Date: Wed, 17 Apr 2019 22:46:34 +0200 Subject: update duke --- site/public/research/02_what_computers_can_see/index.html | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'site/public/research') diff --git a/site/public/research/02_what_computers_can_see/index.html b/site/public/research/02_what_computers_can_see/index.html index d139e83e..aac0b723 100644 --- a/site/public/research/02_what_computers_can_see/index.html +++ b/site/public/research/02_what_computers_can_see/index.html @@ -52,6 +52,10 @@
  • tired, drowsiness in car
  • affectiva: interest in product, intent to buy
  • +

    From SenseTime paper

    +

    Exploring Disentangled Feature Representation Beyond Face Identification

    +

    From https://arxiv.org/pdf/1804.03487.pdf +The attribute IDs from 1 to 40 correspond to: ‘5 o Clock Shadow’, ‘Arched Eyebrows’, ‘Attractive’, ‘Bags Under Eyes’, ‘Bald’, ‘Bangs’, ‘Big Lips’, ‘Big Nose’, ‘Black Hair’, ‘Blond Hair’, ‘Blurry’, ‘Brown Hair’, ‘Bushy Eyebrows’, ‘Chubby’, ‘Double Chin’, ‘Eyeglasses’, ‘Goatee’, ‘Gray Hair’, ‘Heavy Makeup’, ‘High Cheekbones’, ‘Male’, ‘Mouth Slightly Open’, ‘Mustache’, ‘Narrow Eyes’, ‘No Beard’, ‘Oval Face’, ‘Pale Skin’, ‘Pointy Nose’, ‘Receding Hairline’, ‘Rosy Cheeks’, ‘Sideburns’, ‘Smiling’, ‘Straight Hair’, ‘Wavy Hair’, ‘Wearing Earrings’, ‘Wearing Hat’, ‘Wearing Lipstick’, ‘Wearing Necklace’, ‘Wearing Necktie’ and ‘Young’. It’
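The ID-to-name mapping in the passage quoted above can be written out as a small lookup table. This is only an illustrative sketch: the names `ATTRIBUTE_NAMES` and `attribute_name` are hypothetical and do not come from the paper, and the list simply transcribes the 40 attribute names from the quoted text (with line-break hyphens removed), assuming IDs start at 1.

```python
# Attribute names transcribed from the quoted passage, in ID order.
# Assumption: ID 1 is the first name and ID 40 is the last.
ATTRIBUTE_NAMES = [
    "5 o Clock Shadow", "Arched Eyebrows", "Attractive", "Bags Under Eyes",
    "Bald", "Bangs", "Big Lips", "Big Nose", "Black Hair", "Blond Hair",
    "Blurry", "Brown Hair", "Bushy Eyebrows", "Chubby", "Double Chin",
    "Eyeglasses", "Goatee", "Gray Hair", "Heavy Makeup", "High Cheekbones",
    "Male", "Mouth Slightly Open", "Mustache", "Narrow Eyes", "No Beard",
    "Oval Face", "Pale Skin", "Pointy Nose", "Receding Hairline",
    "Rosy Cheeks", "Sideburns", "Smiling", "Straight Hair", "Wavy Hair",
    "Wearing Earrings", "Wearing Hat", "Wearing Lipstick",
    "Wearing Necklace", "Wearing Necktie", "Young",
]


def attribute_name(attr_id: int) -> str:
    """Return the attribute name for a 1-based attribute ID (hypothetical helper)."""
    if not 1 <= attr_id <= len(ATTRIBUTE_NAMES):
        raise ValueError(
            f"attribute ID must be between 1 and {len(ATTRIBUTE_NAMES)}"
        )
    return ATTRIBUTE_NAMES[attr_id - 1]
```

Written this way, it is easy to see which IDs describe physical facial traits (for example 8, 24, and 28) as opposed to accessories or image quality.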

    From PubFig Dataset

    -

    References

    References

    diff --git a/site/public/datasets/msceleb/index.html b/site/public/datasets/msceleb/index.html index 1f037bae..c42a2767 100644 --- a/site/public/datasets/msceleb/index.html +++ b/site/public/datasets/msceleb/index.html @@ -4,7 +4,7 @@ MegaPixels - + @@ -26,7 +26,7 @@
    -
    MS Celeb is a dataset of web images used for training and evaluating face recognition algorithms
    The MS Celeb dataset includes over 10,000,000 images and 93,000 identities of semi-public figures collected using the Bing search engine +
    Microsoft Celeb 1M is a target list and dataset of web images used for research and development of face recognition technologies
    The MS Celeb dataset includes over 10 million images of about 100K people and a target list of 1 million individuals

    Microsoft Celeb Dataset (MS Celeb)

    The Microsoft Celeb dataset is a face recognition training set made entirely of images scraped from the Internet. According to Microsoft Research, which created and published the dataset in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of 100,000 individuals.

    -

    But Microsoft's ambition was bigger. They wanted to recognize 1 million individuals. As part of their dataset they released a list of 1 million target identities for researchers to identify. The identities

    -

    https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/

    -

    In 2019, Microsoft President Brad Smith called for the governmental regulation of face recognition, an admission of his own company's inability to control its surveillance-driven business model. Yet since then, and for the last 4 years, Microsoft has willingly and actively played a significant role in accelerating growth in the very same industry they called for the government to regulate. This investigation looks into the MS Celeb dataset and Microsoft Research's role in creating and distributing the largest publicly available face recognition dataset in the world to both.

    -

    To spur growth and incentivize researchers, Microsoft released a dataset called MS Celeb, or Microsoft Celeb, in which they developed and published a list of exactly 1 million targeted people whose biometrics would go on to build

    +

    Microsoft Celeb (MS Celeb) is a dataset of 10 million face images scraped from the Internet and used for research and development of large-scale biometric recognition systems. According to Microsoft Research, which created and published the dataset in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images of nearly 100,000 individuals. Microsoft's goal in building this dataset was to distribute an initial training dataset of images of 100,000 individuals and use it to accelerate research into recognizing a target list of one million individuals from their face images "using all the possibly collected face images of this individual on the web as training data". 2

    +

    These one million people, defined by Microsoft Research as "celebrities", are often merely people who must maintain an online presence for their professional lives. Microsoft's list of 1 million people is an expansive exploitation of the current reality that for many people, including academics, policy makers, writers, artists, and especially journalists, maintaining an online presence is mandatory. That necessity should not allow Microsoft (or anyone else) to use their biometrics for research and development of surveillance technology. Many of the names in the target list even belong to people critical of the very technology Microsoft is using their names and biometric information to build. The list includes digital rights activists like Jillian York and [add more]; artists critical of surveillance including Trevor Paglen, Hito Steyerl, Kyle McDonald, Jill Magid, and Aram Bartholl; Intercept founders Laura Poitras, Jeremy Scahill, and Glenn Greenwald; Data and Society founder danah boyd; and even Julie Brill, the former FTC commissioner responsible for protecting consumers' privacy, to name a few.

    +

    Microsoft's 1 Million Target List

    +

    Below is a list of names included in the list of 1 million individuals, curated to illustrate Microsoft's expansive and exploitative practice of scraping the Internet for biometric training data. The entire name file can be downloaded from msceleb.org. Names appearing with * indicate that Microsoft also distributed images.

    +

    [ cleaning this up ]

    +
    Name | ID | Profession | Images
    Jeremy Scahill | /m/02p_8_n | Journalist | x
    Jillian York | /m/0g9_3c3 | Digital rights activist | x
    Astra Taylor | /m/05f6_39 | Author, activist | x
    Jonathan Zittrain | /m/01f75c | EFF board member | no
    Julie Brill | x | x | x
    Jonathan Zittrain | x | x | x
    Bruce Schneier | m.095js | Cryptologist and author | yes
    Julie Brill | m.0bs3s9g | x | x
    Kim Zetter | /m/09r4j3 | x | x
    Ethan Zuckerman | x | x | x
    Jill Magid | x | x | x
    Kyle McDonald | x | x | x
    Trevor Paglen | x | x | x
    R. Luke DuBois | x | x | x
    +
    Name | ID | Profession | Images
    Trevor Paglen | x | x | x
    Ai Weiwei | /m/0278dyq | x | x
    Jer Thorp | /m/01h8lg | x | x
    Edward Felten | /m/028_7k | x | x
    Evgeny Morozov | /m/05sxhgd | Scholar and technology critic | yes
    danah boyd | /m/06zmx5 | Data and Society founder | x
    Bruce Schneier | x | x | x
    Laura Poitras | x | x | x
    Trevor Paglen | x | x | x
    Astra Taylor | x | x | x
    Shoshana Zuboff | x | x | x
    Eyal Weizman | m.0g54526 | x | x
    Aram Bartholl | m.06_wjyc | x | x
    James Risen | m.09pk6b | x | x
    +

    After publishing this list, researchers from Microsoft Asia worked with researchers affiliated with China's National University of Defense Technology (controlled by China's Central Military Commission) and used the MS Celeb dataset for their research paper, "Faces as Lighting Probes via Unsupervised Deep Highlight Extraction", which has potential applications in 3D face recognition.

    +

    In an article published by the Financial Times based on data discovered during this investigation, Samm Sacks (senior fellow at New America and China tech policy expert) commented that this research raised "red flags because of the nature of the technology, the authors' affiliations, combined with what we know about how this technology is being deployed in China right now". 3

    +

    Four more papers published by SenseTime, which also use the MS Celeb dataset, raise similar flags. SenseTime is a Beijing-based company providing surveillance technology to Chinese authorities, including [ add context here ], and has been flagged as complicit in potential human rights violations.

    +

    One of the 4 SenseTime papers, "Exploring Disentangled Feature Representation Beyond Face Identification", shows how SenseTime is developing automated face analysis technology to infer race, narrow eyes, nose size, and chin size, all of which could be used to target vulnerable ethnic groups based on their facial appearances. 4

    +

    Earlier in 2019, Microsoft President Brad Smith called for the governmental regulation of face recognition, citing the potential for misuse, a rare admission that Microsoft's surveillance-driven business model had lost its bearings. More recently, Smith also announced that Microsoft would seemingly take a stand against potential misuse and had decided not to sell face recognition to an unnamed United States law enforcement agency, citing that their technology was not accurate enough to be used on minorities because it was trained mostly on white male faces.

    +

    What the decision to block the sale announces is not so much that Microsoft has upgraded its ethics, but that it has publicly acknowledged it can't sell a data-driven product without data. Microsoft can't sell face recognition for faces they can't train on.

    +

    Until now, that data has been freely harvested from the Internet and packaged in training sets like MS Celeb, which are overwhelmingly white and male. Without balanced data, facial recognition contains blind spots. And without datasets like MS Celeb, powerful yet inaccurate facial recognition services like Microsoft's Azure Cognitive Service would not be able to see at all.

    +

    Microsoft didn't only create MS Celeb for other researchers to use; they also used it internally. In a publicly available 2017 Microsoft Research project called "One-shot Face Recognition by Promoting Underrepresented Classes", Microsoft leveraged the MS Celeb dataset to analyze their algorithms and advertise the results. Interestingly, Microsoft's corporate version does not mention that they used the MS Celeb dataset, but the open-access version of the paper published on arxiv.org that same year explicitly mentions that Microsoft Research tested their algorithms "on the MS-Celeb-1M low-shot learning benchmark task."

    +

    We suggest that if Microsoft Research wants biometric data for surveillance research and development, they should start with their own researchers' biometric data instead of scraping the Internet for journalists, artists, writers, and academics.

    Who used Microsoft Celeb?

    @@ -114,14 +313,10 @@

    Supplementary Information

    -

    Additional Information

    - -

    References

    • Brad Smith cite

      +

    References

    • 1 Brad Smith cite +
    • 2 MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition +
    • 3 Microsoft worked with Chinese military university on artificial intelligence +
    • 4 "Exploring Disentangled Feature Representation Beyond Face Identification"
    diff --git a/site/public/datasets/oxford_town_centre/index.html b/site/public/datasets/oxford_town_centre/index.html index cf81e2ef..fabcae6b 100644 --- a/site/public/datasets/oxford_town_centre/index.html +++ b/site/public/datasets/oxford_town_centre/index.html @@ -132,8 +132,8 @@ }

    -

    References

    • a

      Benfold, Ben and Reid, Ian. "Stable Multi-Target Tracking in Real-Time Surveillance Video". CVPR 2011. Pages 3457-3464.

      -
    • a

      "Guiding Visual Surveillance by Tracking Human Attention". 2009.

      +

    References

    • 1 Benfold, Ben and Reid, Ian. "Stable Multi-Target Tracking in Real-Time Surveillance Video". CVPR 2011. Pages 3457-3464. +
    • 2 "Guiding Visual Surveillance by Tracking Human Attention". 2009.
    diff --git a/site/public/datasets/uccs/index.html b/site/public/datasets/uccs/index.html index 5fdde7e1..3296cabc 100644 --- a/site/public/datasets/uccs/index.html +++ b/site/public/datasets/uccs/index.html @@ -51,7 +51,7 @@
    uccs.edu

    UnConstrained College Students (UCCS) is a dataset of long-range surveillance photos captured at University of Colorado Colorado Springs developed primarily for research and development of "face detection and recognition research towards surveillance applications" 1. According to the authors of two papers associated with the dataset, over 1,700 students and pedestrians were "photographed using a long-range high-resolution surveillance camera without their knowledge". 3 In this investigation, we examine the contents of the dataset, its funding sources, photo EXIF data, and information from publicly available research project citations.

    The UCCS dataset includes over 1,700 unique identities, most of which are students walking to and from class. As of 2018, it was the "largest surveillance [face recognition] benchmark in the public domain." 4 The photos were taken during the spring semesters of 2012 – 2013 on the West Lawn of the University of Colorado Colorado Springs campus. The photographs were timed to capture students during breaks between their scheduled morning and afternoon classes, Monday through Thursday. "For example, a student taking Monday-Wednesday classes at 12:30 PM will show up in the camera on almost every Monday and Wednesday." 2.

    -
    The location at University of Colorado Colorado Springs where students were surreptitiously photographed with a long-range surveillance camera for use in a defense and intelligence agency funded research project on face recognition. Image: Google Maps

    The long-range surveillance images in the UnConstrained College Students dataset were taken using a Canon 7D 18-megapixel digital camera fitted with a Sigma 800mm F5.6 EX APO DG HSM telephoto lens and pointed out an office window across the university's West Lawn. The students were photographed from a distance of approximately 150 meters. "The camera [was] programmed to start capturing images at specific time intervals between classes to maximize the number of faces being captured." 2 +

    The location at University of Colorado Colorado Springs where students were surreptitiously photographed with a long-range surveillance camera for use in a defense and intelligence agency funded research project on face recognition. Image: Google Maps

    The long-range surveillance images in the UnConstrained College Students dataset were taken using a Canon 7D 18-megapixel digital camera fitted with a Sigma 800mm F5.6 EX APO DG HSM telephoto lens and pointed out an office window across the university's West Lawn. The students were photographed from a distance of approximately 150 meters. "The camera [was] programmed to start capturing images at specific time intervals between classes to maximize the number of faces being captured." 2 Their setup made it impossible for students to know they were being photographed, providing the researchers with realistic surveillance images to help build face recognition systems for real-world applications for defense, intelligence, and commercial partners.

    Example images from the UnConstrained College Students Dataset.

    The EXIF data embedded in the images shows that the photo capture times follow a similar pattern to that outlined by the researchers, but also highlights that the vast majority of photos (over 7,000) were taken on Tuesdays around noon during students' lunch break. The lack of any photos taken from Friday through Sunday shows that the researchers were only interested in capturing images of students during the peak campus hours.
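A per-weekday tally like the one described above could be computed roughly as follows. This is a minimal sketch, not the investigation's actual script: it assumes the capture times have already been extracted from the images as strings in the standard EXIF `DateTimeOriginal` layout (`YYYY:MM:DD HH:MM:SS`), and the function name `photos_per_weekday` and the sample timestamps are hypothetical, made up purely for illustration.

```python
from datetime import datetime

# Layout used by the EXIF DateTimeOriginal tag, e.g. "2012:02:28 12:05:10".
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample values, NOT taken from the UCCS dataset.
SAMPLE_TIMESTAMPS = [
    "2012:02:28 12:05:10",
    "2012:02:29 09:15:00",
    "2013:04:02 12:10:45",
    "2013:04:04 15:30:00",
]


def photos_per_weekday(timestamps):
    """Count photos per weekday from EXIF-style timestamp strings."""
    counts = {day: 0 for day in WEEKDAYS}  # days with no photos stay at 0
    for raw in timestamps:
        taken = datetime.strptime(raw, EXIF_FORMAT)
        counts[WEEKDAYS[taken.weekday()]] += 1  # weekday(): Monday == 0
    return counts


if __name__ == "__main__":
    for day, count in photos_per_weekday(SAMPLE_TIMESTAMPS).items():
        print(f"{day}: {count}")
```

Keeping zero-count days in the result makes gaps such as an empty Friday-to-Sunday range visible directly in the output, rather than leaving them implied by missing keys.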

    UCCS photos captured per weekday © megapixels.cc

    The two research papers associated with the release of the UCCS dataset ("Unconstrained Face Detection and Open-Set Face Recognition Challenge" and "Large Scale Unconstrained Open Set Face Database") acknowledge that the primary funding sources for their work were United States defense and intelligence agencies. Specifically, development of the UnConstrained College Students dataset was funded by the Intelligence Advanced Research Projects Activity (IARPA), Office of Director of National Intelligence (ODNI), Office of Naval Research and The Department of Defense Multidisciplinary University Research Initiative (ONR MURI), and the Special Operations Command and Small Business Innovation Research (SOCOM SBIR) amongst others. UCCS's VAST site also explicitly states their involvement in the IARPA Janus face recognition project developed to serve the needs of national intelligence, establishing that the immediate beneficiaries of this dataset include United States defense and intelligence agencies, but it would go on to benefit other similar organizations.

    @@ -250,10 +250,10 @@ Their setup made it impossible for students to know they were being photographed }

    -

    References

    • a

      "2nd Unconstrained Face Detection and Open Set Recognition Challenge." https://vast.uccs.edu/Opensetface/. Accessed April 15, 2019.

      -
    • ab

      Sapkota, Archana and Boult, Terrance. "Large Scale Unconstrained Open Set Face Database." 2013.

      -
    • a

      Günther, M. et al. "Unconstrained Face Detection and Open-Set Face Recognition Challenge," 2018. Arxiv 1708.02337v3.

      -
    • a

      "Surveillance Face Recognition Challenge". SemanticScholar

      +

    References

    • 1 "2nd Unconstrained Face Detection and Open Set Recognition Challenge." https://vast.uccs.edu/Opensetface/. Accessed April 15, 2019. +
    • 2 Sapkota, Archana and Boult, Terrance. "Large Scale Unconstrained Open Set Face Database." 2013. +
    • 3 Günther, M. et al. "Unconstrained Face Detection and Open-Set Face Recognition Challenge," 2018. Arxiv 1708.02337v3. +
    • 4 "Surveillance Face Recognition Challenge". SemanticScholar
    diff --git a/site/public/research/00_introduction/index.html b/site/public/research/00_introduction/index.html index 535958cc..ef8a5316 100644 --- a/site/public/research/00_introduction/index.html +++ b/site/public/research/00_introduction/index.html @@ -42,10 +42,15 @@
    Posted
    Dec. 15
    Author
    Adam Harvey

    Facial recognition is a scam.

    +

    It's an extractive and damaging industry that's built on the biometric backbone of the Internet.

    During the last 20 years, commercial, academic, and governmental agencies have promoted the false dream of a future with face recognition. This essay debunks the popular myth that such a thing ever existed.

    There is no such thing as face recognition. For the last 20 years, government agencies, commercial organizations, and academic institutions have played the public for a fool, selling a roadmap of the future that simply does not exist. Facial recognition, as it is currently defined, promoted, and sold to the public, government, and commercial sector, is a scam.

    Committed to developing robust solutions with superhuman accuracy, the industry has repeatedly undermined itself by never actually developing anything close to "face recognition".

    There is only biased feature vector clustering and probabilistic thresholding.

    +

    If you don't have data, you don't have a product.

    +

    Yesterday's decision by Brad Smith, President of Microsoft, to not sell facial recognition to a US law enforcement agency is not an about-face by Microsoft to become more humane; it's simply a perfect illustration of the value of training data. Without data, you don't have a product to sell. Microsoft realized that it doesn't have enough training data to sell

    +

    Use Your Own Biometrics First

    +

    If researchers want faces, they should take selfies and create their own dataset. If researchers want images of families to build surveillance software, they should use and distribute their own family portraits.

    Motivation

    Ever since government agencies began developing face recognition in the early 1960s, datasets of face images have been central to developing and validating face recognition technologies. Today, these datasets no longer originate in labs, but instead from family photo albums posted on photo sharing sites, surveillance camera footage from college campuses, search engine queries for celebrities, cafe livestreams, or videos on YouTube.

    During the last year, hundreds of these facial analysis datasets created "in the wild" have been collected to understand how they contribute to a global supply chain of biometric data that is powering the global facial recognition industry.

    -- cgit v1.2.3-70-g09d2