MegaPixels

Brainwash is a dataset of images of people captured by webcams at the Brainwash Cafe in San Francisco, used to train face detection algorithms.

The Brainwash dataset includes 11,918 images of "everyday life of a busy downtown cafe".

Brainwash Dataset

Brainwash is a face detection dataset created from the Brainwash Cafe's livecam footage. It includes 11,918 images of "everyday life of a busy downtown cafe" [1]. The images are used to develop face detection algorithms for the "challenging task of detecting people in crowded scenes" and tracking them.

Before closing in 2017, Brainwash Cafe was a "cafe and laundromat" located in San Francisco's SoMa district. The cafe published a publicly available livestream with a view of the cash register, performance stage, and seating area.

Since its publication by Stanford in 2015, the Brainwash dataset has appeared in several notable research papers. In September 2016, four researchers from the National University of Defense Technology in Changsha, China used the Brainwash dataset for a research study on "people head detection in crowded scenes", concluding that their algorithm "achieves superior head detection performance on the crowded scenes dataset" [2]. In 2017, three researchers at the National University of Defense Technology again used Brainwash, this time for a study on object detection, noting that "the data set used in our experiment is shown in Table 1, which includes one scene of the brainwash dataset" [3].

A sample image from the Brainwash dataset used for training face and head detection algorithms for surveillance. The dataset contains about 12,000 images. License: Open Data Commons Public Domain Dedication (PDDL)

49 of the 11,918 images included in the Brainwash dataset. License: Open Data Commons Public Domain Dedication (PDDL)

Information Supply Chain

To understand how and where this dataset has been used, organizations using the dataset are plotted below. The data is generated by collecting all citations for all the original research papers associated with the dataset. The PDFs are then converted to text and the organization names are extracted and geocoded. Because of the automated approach to extracting data, not all organizations have been confirmed as using the dataset. This visualization is provided to help locate and confirm usage and will be updated as data noise is reduced.

Map legend: Academic, Industry, Government

Data is compiled from Semantic Scholar and not yet manually verified.

Supplementary Information

Citations

Citations were collected from Semantic Scholar, a website that aggregates and indexes research papers. Metadata was extracted from these papers, including institution names extracted automatically from the PDFs, and the addresses were then geocoded. The data is not yet manually verified and reflects any time the paper was cited. Some papers may only mention the dataset in passing, while others use it as part of their research methodology.


This bar chart presents a ranking of the top countries where citations originated. Mouse over individual columns to see yearly totals. Colors are only assigned to the top 10 overall countries.
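The ranking behind a chart like this can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the MegaPixels implementation: the `citations` records are invented sample values, and the function name `rank_countries` and its `top_n` parameter are hypothetical. It counts citations per country, keeps the yearly totals that a mouse-over would display, and flags which countries fall in the top 10 and would therefore receive a color.

```python
from collections import Counter, defaultdict

# Hypothetical citation records: one (country, year) pair per citing paper.
# These values are invented for illustration and are not MegaPixels data.
citations = [
    ("United States", 2016),
    ("China", 2016),
    ("China", 2017),
    ("United States", 2017),
    ("China", 2018),
    ("Germany", 2018),
]


def rank_countries(records, top_n=10):
    """Rank countries by total citations and keep their yearly totals."""
    totals = Counter(country for country, _ in records)

    # Per-country breakdown by year (the values shown on mouse-over).
    yearly = defaultdict(Counter)
    for country, year in records:
        yearly[country][year] += 1

    # Sort by total (descending), then by name so ties are deterministic.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))

    ranking = []
    for position, (country, total) in enumerate(ordered):
        ranking.append({
            "country": country,
            "total": total,
            "by_year": dict(sorted(yearly[country].items())),
            # Only the top_n countries overall are assigned a color.
            "colored": position < top_n,
        })
    return ranking


ranking = rank_countries(citations)
for row in ranking:
    print(row["country"], row["total"], row["by_year"])
```

Sorting on the country name as a secondary key is a design choice made here so that countries with equal totals always appear in the same order.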

Additional Information

The dataset author spoke about his research at the CVPR conference in 2016: https://www.youtube.com/watch?v=Nl2fBKxwusQ

[1] "readme.txt" https://exhibits.stanford.edu/data/catalog/sx925dc9385.
[2] Li, Y., Dou, Y., Liu, X., and Li, T. Localized Region Context and Object Feature Fusion for People Head Detection. ICIP16 Proceedings. 2016. Pages 594-598.
[3] Zhao, X., Wang, Y., and Dou, Y. A Replacement Algorithm of Non-Maximum Suppression Base on Graph Clustering.