<!doctype html>
<html>
<head>
<title>MegaPixels: Brainwash Dataset</title>
<meta charset="utf-8" />
<meta name="author" content="Adam Harvey" />
<meta name="description" content="Brainwash is a dataset of webcam images taken from the Brainwash Cafe in San Francisco" />
<meta property="og:title" content="MegaPixels: Brainwash Dataset"/>
<meta property="og:type" content="website"/>
<meta property="og:summary" content="MegaPixels is an art and research project about face recognition datasets created &quot;in the wild&quot;"/>
<meta property="og:image" content="https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/background.jpg" />
<meta property="og:url" content="https://megapixels.cc/datasets/brainwash/"/>
<meta property="og:site_name" content="MegaPixels" />
<meta name="referrer" content="no-referrer" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="apple-mobile-web-app-capable" content="yes">
<link rel="apple-touch-icon" sizes="57x57" href="/assets/img/favicon/apple-icon-57x57.png">
<link rel="apple-touch-icon" sizes="60x60" href="/assets/img/favicon/apple-icon-60x60.png">
<link rel="apple-touch-icon" sizes="72x72" href="/assets/img/favicon/apple-icon-72x72.png">
<link rel="apple-touch-icon" sizes="76x76" href="/assets/img/favicon/apple-icon-76x76.png">
<link rel="apple-touch-icon" sizes="114x114" href="/assets/img/favicon/apple-icon-114x114.png">
<link rel="apple-touch-icon" sizes="120x120" href="/assets/img/favicon/apple-icon-120x120.png">
<link rel="apple-touch-icon" sizes="144x144" href="/assets/img/favicon/apple-icon-144x144.png">
<link rel="apple-touch-icon" sizes="152x152" href="/assets/img/favicon/apple-icon-152x152.png">
<link rel="apple-touch-icon" sizes="180x180" href="/assets/img/favicon/apple-icon-180x180.png">
<link rel="icon" type="image/png" sizes="192x192" href="/assets/img/favicon/android-icon-192x192.png">
<link rel="icon" type="image/png" sizes="32x32" href="/assets/img/favicon/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="96x96" href="/assets/img/favicon/favicon-96x96.png">
<link rel="icon" type="image/png" sizes="16x16" href="/assets/img/favicon/favicon-16x16.png">
<link rel="manifest" href="/assets/img/favicon/manifest.json">
<meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="/ms-icon-144x144.png">
<meta name="theme-color" content="#ffffff">
<link rel='stylesheet' href='/assets/css/fonts.css' />
<link rel='stylesheet' href='/assets/css/css.css' />
<link rel='stylesheet' href='/assets/css/leaflet.css' />
<link rel='stylesheet' href='/assets/css/applets.css' />
<link rel='stylesheet' href='/assets/css/mobile.css' />
</head>
<body>
<header>
<a class='slogan' href="/">
<div class='logo'></div>
<div class='site_name'>MegaPixels</div>
<div class='page_name'>Brainwash Dataset</div>
</a>
<div class='links'>
<a href="/datasets/">Datasets</a>
<a href="/about/">About</a>
<a href="/research">Research</a>
</div>
</header>
<div class="content content-dataset">
<section class='intro_section' style='background-image: url(https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/background.jpg)'></section><section><div class='image'><div class='intro-caption caption'>One of the 11,917 images in the Brainwash dataset captured from the Brainwash Cafe in San Francisco</div></div></section><section><h1>Brainwash Dataset</h1>
<p>Update: In response to the publication of this report, the Brainwash dataset has been "removed from access at the request of the depositor."</p>
</section><section><div class='right-sidebar'><div class='meta'>
<div class='gray'>Published</div>
<div>2015</div>
</div><div class='meta'>
<div class='gray'>Images</div>
<div>11,917 </div>
</div><div class='meta'>
<div class='gray'>Purpose</div>
<div>Head detection</div>
</div><div class='meta'>
<div class='gray'>Created by</div>
<div>Stanford University (US), Max Planck Institute for Informatics (DE)</div>
</div><div class='meta'>
<div class='gray'>Funded by</div>
<div>Max Planck Center for Visual Computing and Communication</div>
</div><div class='meta'>
<div class='gray'>Download Size</div>
<div>4.1 GB</div>
</div><div class='meta'>
<div class='gray'>Website</div>
<div><a href='https://purl.stanford.edu/sx925dc9385' target='_blank' rel='nofollow noopener'>stanford.edu</a></div>
</div><div class='meta'><div class='gray'>Press coverage</div><div><a href="https://www.nytimes.com/2019/07/13/technology/">New York Times</a>, <a href="https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html">De Tijd</a></div></div></div><p>Brainwash is a dataset of livecam images taken from San Francisco's Brainwash Cafe. It includes 11,917 images of "everyday life of a busy downtown cafe"<a class="footnote_shim" name="[^readme]_1"> </a><a href="#[^readme]" class="footnote" title="Footnote 1">1</a> captured at 100-second intervals throughout the day. The Brainwash dataset includes 3 full days of webcam images taken on October 27, November 13, and November 24, 2014. According to the authors' <a href="https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666">research paper</a> introducing the dataset, the images were acquired with the help of Angelcam.com.<a class="footnote_shim" name="[^end_to_end]_1"> </a><a href="#[^end_to_end]" class="footnote" title="Footnote 2">2</a></p>
<p>The Brainwash dataset is unique because it uses images from a publicly available webcam that records people inside a privately owned business without their consent. No ordinary cafe customer could ever suspect that their image would end up in a dataset used for surveillance research and development, but that is exactly what happened to customers at Brainwash Cafe in San Francisco.</p>
<p>Although Brainwash appears to be a less popular dataset, it was notably used in 2016 and 2017 by researchers affiliated with the National University of Defense Technology in China for two <a href="https://www.semanticscholar.org/paper/Localized-region-context-and-object-feature-fusion-Li-Dou/b02d31c640b0a31fb18c4f170d841d8e21ffb66c">research</a> <a href="https://www.semanticscholar.org/paper/A-Replacement-Algorithm-of-Non-Maximum-Suppression-Zhao-Wang/591a4bfa6380c9fcd5f3ae690e3ac5c09b7bf37b">projects</a> on advancing the capabilities of object detection to more accurately isolate the target region in an image. <a class="footnote_shim" name="[^localized_region_context]_1"> </a><a href="#[^localized_region_context]" class="footnote" title="Footnote 3">3</a> <a class="footnote_shim" name="[^replacement_algorithm]_1"> </a><a href="#[^replacement_algorithm]" class="footnote" title="Footnote 4">4</a> The <a href="https://en.wikipedia.org/wiki/National_University_of_Defense_Technology">National University of Defense Technology</a> is controlled by China's top military body, the Central Military Commission.</p>
<p>The Brainwash dataset also appears in a 2018 research paper affiliated with Megvii (Face++) that used images from Brainwash Cafe "to validate the generalization ability of [their] CrowdHuman dataset for head detection."<a class="footnote_shim" name="[^crowdhuman]_1"> </a><a href="#[^crowdhuman]" class="footnote" title="Footnote 5">5</a> Megvii is the parent company of Face++, which has provided surveillance technology to <a href="https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html">monitor Uighur Muslims</a> in Xinjiang and may be <a href="https://www.bloomberg.com/news/articles/2019-05-22/trump-weighs-blacklisting-two-chinese-surveillance-companies">blacklisted</a> in the United States.</p>
<h4>Updates</h4>
<p>Since <a href="https://twitter.com/adamhrv/status/1132201604999000065">posting</a> about this dataset and <a href="https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e">showing</a> its connections to the National University of Defense Technology in China, the Brainwash dataset is no longer available for download. As of June 2, 2019, it has been "removed from access at the request of the depositor."</p>
<p>The two papers associated with the National University of Defense Technology in China have also been affected. The citations linking back to the Brainwash dataset paper no longer appear in the Semantic Scholar API search results. The citation references on the pages for <a href="https://www.semanticscholar.org/paper/A-Replacement-Algorithm-of-Non-Maximum-Suppression-Zhao-Wang/591a4bfa6380c9fcd5f3ae690e3ac5c09b7bf37b">NUDT citation 1</a> and <a href="https://www.semanticscholar.org/paper/Localized-region-context-and-object-feature-fusion-Li-Dou/b02d31c640b0a31fb18c4f170d841d8e21ffb66c">NUDT citation 2</a> now display the text "Sorry, this paper is not in our corpus", no longer linking back to the <a href="https://www.semanticscholar.org/paper/End-to-End-People-Detection-in-Crowded-Scenes-Stewart-Andriluka/1bd1645a629f1b612960ab9bba276afd4cf7c666">original Brainwash paper</a>, effectively censoring the NUDT connections from API search results.</p>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_example.jpg' alt='A sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The dataset contains a total of 11,917 images and 81,973 annotated heads. Graphic by megapixels.cc based on Brainwash dataset by Stewart et al. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The dataset contains a total of 11,917 images and 81,973 annotated heads. Graphic by megapixels.cc based on Brainwash dataset by Stewart et al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)</div></div></section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_saliency_map.jpg' alt='A visualization of the active regions for 81,973 head annotations in the Brainwash dataset training partition. Graphic by megapixels.cc based on Brainwash dataset by Stewart et al. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> A visualization of the active regions for 81,973 head annotations in the Brainwash dataset training partition. Graphic by megapixels.cc based on Brainwash dataset by Stewart et al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)</div></div></section><section>
<h3>Who used Brainwash Dataset?</h3>
<p>
This bar chart presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.
</p>
</section>
<section class="applet_container">
<div class="applet" data-payload='{"command": "chart"}'></div>
</section>
<section class="applet_container">
<div class="applet" data-payload='{"command": "piechart"}'></div>
</section>
<section>
<h3>Information Supply Chain</h3>
<p>
To help understand how the Brainwash Dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the Brainwash Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.
</p>
</section>
<section class="applet_container fullwidth">
<div class="applet" data-payload='{"command": "map"}'></div>
</section>
<div class="caption">
<ul class="map-legend">
<li class="edu">Academic</li>
<li class="com">Commercial</li>
<li class="gov">Military / Government</li>
</ul>
<div class="source">Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an overview of how and where images were used.</div>
</div>
<section class="applet_container">
<h3>Dataset Citations</h3>
<p>
The dataset citations used in the visualizations were collected from <a href="https://www.semanticscholar.org">Semantic Scholar</a>, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please <a href="/about/attribution">cite our work</a>.
</p>
<div class="applet" data-payload='{"command": "citations"}'></div>
</section><section>
<div class="hr-wave-holder">
<div class="hr-wave-line hr-wave-line1"></div>
<div class="hr-wave-line hr-wave-line2"></div>
</div>
<h2>Supplementary Information</h2>
</section><section class='images'><div class='image'><img src='https://nyc3.digitaloceanspaces.com/megapixels/v1/datasets/brainwash/assets/brainwash_grid.jpg' alt='Nine of 11,917 images from the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Stewart et al. License: Open Data Commons Public Domain Dedication (PDDL)'><div class='caption'> Nine of 11,917 images from the Brainwash dataset. Graphic: megapixels.cc based on Brainwash dataset by Stewart et al. License: <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">Open Data Commons Public Domain Dedication</a> (PDDL)</div></div></section><section><h3>Press Coverage</h3>
<ul>
<li>New York Times: <a href="https://www.nytimes.com/2019/07/13/technology/">Facial Recognition Tech Is Growing Stronger, Thanks to Your Face</a></li>
<li>De Tijd: <a href="https://www.tijd.be/dossier/legrandinconnu/brainwash/10136670.html">Brainwash</a></li>
</ul>
</section><section>
<h4>Cite Our Work</h4>
<p>
If you find this analysis helpful, please cite our work:
</p>
<pre id="cite-bibtex">
@online{megapixels,
author = {Harvey, Adam and LaPlace, Jules},
title = {MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets},
year = 2019,
url = {https://megapixels.cc/},
urldate = {2019-04-18}
}</pre>
</section><section><h4>Citing Brainwash Dataset</h4>
<p>If you use any data from the Brainwash dataset, please follow their <a href="https://opendatacommons.org/licenses/pddl/summary/index.html">license</a> and cite their work as:</p>
<pre>
@article{Stewart2016EndtoEndPD,
title={End-to-End People Detection in Crowded Scenes},
author={Russell Stewart and Mykhaylo Andriluka and Andrew Y. Ng},
journal={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2016},
pages={2325-2333}
}
</pre></section><section><h3>References</h3><section><ul class="footnotes"><li>1 <a name="[^readme]" class="footnote_shim"></a><span class="backlinks"><a href="#[^readme]_1">a</a></span>"readme.txt" <a href="https://exhibits.stanford.edu/data/catalog/sx925dc9385">https://exhibits.stanford.edu/data/catalog/sx925dc9385</a>.
</li><li>2 <a name="[^end_to_end]" class="footnote_shim"></a><span class="backlinks"><a href="#[^end_to_end]_1">a</a></span>Stewart, Russell. Andriluka, Mykhaylo. Ng, Andrew Y. "End-to-End People Detection in Crowded Scenes." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. Pages 2325-2333.
</li><li>3 <a name="[^localized_region_context]" class="footnote_shim"></a><span class="backlinks"><a href="#[^localized_region_context]_1">a</a></span>Li, Y. and Dou, Y. and Liu, X. and Li, T. Localized Region Context and Object Feature Fusion for People Head Detection. ICIP16 Proceedings. 2016. Pages 594-598.
</li><li>4 <a name="[^replacement_algorithm]" class="footnote_shim"></a><span class="backlinks"><a href="#[^replacement_algorithm]_1">a</a></span>Zhao, X. and Wang, Y. and Dou, Y. A Replacement Algorithm of Non-Maximum Suppression Base on Graph Clustering.
</li><li>5 <a name="[^crowdhuman]" class="footnote_shim"></a><span class="backlinks"><a href="#[^crowdhuman]_1">a</a></span>Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, and Jian Sun. CrowdHuman: Benchmark for Detecting Human in a Crowd. 2018. <a href="http://arxiv.org/abs/1805.00123">http://arxiv.org/abs/1805.00123</a>
</li></ul></section></section>
</div>
<footer>
<ul class="footer-left">
<li><a href="/">MegaPixels.cc</a></li>
<li><a href="/datasets/">Datasets</a></li>
<li><a href="/about/">About</a></li>
<li><a href="/about/news/">News</a></li>
<li><a href="/about/legal/">Legal & Privacy</a></li>
</ul>
<ul class="footer-right">
<li>MegaPixels ©2017-19 <a href="https://ahprojects.com">Adam R. Harvey</a></li>
<li>Made with support from <a href="https://mozilla.org">Mozilla</a></li>
</ul>
</footer>
<script src="/assets/js/dist/index.js"></script>
</body>
</html>