Google Research Blog
The latest news from Research at Google
Exploring and Visualizing an Open Global Dataset
Friday, August 25, 2017
Posted by Reena Jana, Creative Lead, Business Inclusion, and Josh Lovejoy, UX Designer, Google Research
Machine learning systems are increasingly influencing many aspects of everyday life, and are used by both the hardware and software products that serve people globally. As such, researchers and designers seeking to create products that are useful and accessible for
everyone
often face the challenge of finding data sets that reflect the variety and backgrounds of users around the world. In order to train these machine learning systems, open, global — and growing — datasets are needed.
Over the last six months, we’ve seen such a dataset emerge from users of
Quick, Draw!
, Google’s latest approach to helping wide, international audiences understand how neural networks work. A
group of Googlers
designed Quick, Draw! as a way for anyone to interact with a machine learning system in a fun way, drawing everyday objects like trees and mugs. The system will try to guess what their drawing depicts, within 20 seconds. While the goal of Quick, Draw! was simply to create a fun game that runs on machine learning, it has resulted in 800 million drawings from twenty million people in 100 nations, from Brazil to Japan to the U.S. to South Africa.
And now we are releasing an open dataset based on these drawings so that people around the world can contribute to, analyze, and inform product design with this data. The dataset currently includes 50 million drawings Quick Draw! players have generated (we will continue to release more of the 800 million drawings over time).
It’s a considerable amount of data; and it’s also a fascinating lens into how to engage a wide variety of people to participate in (1) training machine learning systems, no matter what their technical background; and (2) the creation of open data sets that reflect a wide spectrum of cultures and points of view.
Seeing national — and global — patterns in one glance
To understand visual patterns within the dataset quickly and efficiently, we worked with artist Kyle McDonald to overlay thousands of drawings from around the world. This helped us create composite images and identify trends in each nation, as well across all nations. We made animations of 1000 layered international drawings of cats and chairs, below, to share how we searched for visual trends with this data:
Cats, made from 1000 drawings from around the world:
Chairs, made from 1,000 drawings around the world:
Doodles of naturally recurring objects, like cats (or trees, rainbows, or skulls) often look alike across cultures:
However, for objects that might be familiar to some cultures, but not others, we saw notable differences. Sandwiches took defined forms or were a jumbled set of lines; mug handles pointed in opposite directions; and chairs were drawn facing forward or sideways, depending on the nation or region of the world:
One size doesn’t fit all
These composite drawings, we realized, could reveal how perspectives and preferences differ between audiences from different regions, from the type of bread used in sandwiches to the shape of a coffee cup, to the aesthetic of how to depict objects so they are visually appealing. For example, a more straightforward, head-on view was more consistent in some nations; side angles in others.
Overlaying the images also revealed how to improve how we train neural networks when we lack a variety of data — even within a large, open, and international data set. For example, when we analyzed 115,000+ drawings of shoes in the Quick, Draw! dataset, we discovered that a single style of shoe, which resembles a sneaker, was overwhelmingly represented. Because it was so frequently drawn, the neural network learned to recognize only
this
style as a “shoe.”
But just as in the physical world, in the realm of training data, one size does not fit all. We asked, how can we consistently and efficiently analyze datasets for clues that could point toward latent bias? And what would happen if a team built a classifier based on a non-varied set of data?
Diagnosing data for inclusion
With the open source tool
Facets
, released last month as part of Google’s
PAIR
initiative, one can see patterns across a large dataset quickly. The goal is to efficiently, and visually, diagnose how representative large datasets, like the Quick, Draw! Dataset, may be.
Here’s a screenshot from the Quick,Draw! dataset within the Facets tool. The tool helped us position thousands of drawings by "faceting" them in multiple dimensions by their feature values, such as country, up to 100 countries. You, too, can filter for for features such as “random faces” in a 10-country view, which can then be expanded to 100 countries. At a glance, you can see proportions of country representations. You can also zoom in and see details of each individual drawing, allowing you to dive deeper into single data points. This is especially helpful when working with a large visual data set like Quick, Draw!, allowing researchers to explore for subtle differences or anomalies, or to begin flagging small-scale visual trends that might emerge later as patterns within the larger data set.
Here’s the same Quick, Draw! data for “random faces,” faceted for 94 countries and seen from another view. It’s clear in the few seconds that Facets loads the drawings in this new visualization that the data is overwhelmingly representative of the United States and European countries. This is logical given that the Quick, Draw! game is currently only available in English. We plan to add more languages over time. However, the visualization shows us that Brazil and Thailand seem to be non-English-speaking nations that are relatively well-represented within the data. This suggested to us that designers could potentially research what elements of the interface design may have worked well in these countries. Then, we could use that information to improve Quick,Draw! in its next iteration for other global, non-English-speaking audiences. We’re also using the faceted data to help us figure out how prioritize local languages for future translations.
Another outcome of using Facets to diagnose the Quick, Draw! data for inclusion was to identify concrete ways that anyone can improve the variety of data, as well as check for potential biases. Improvements could include:
Changing protocols for human rating of data or content generation, so that the data is more accurately representative of local or global populations
Analyzing subgroups of data and identify the database equivalent of "intersectionality" surfaced within visual patterns
Augmenting and reweighting data so that it is more inclusive
By releasing this dataset, and tools like Facets, we hope to facilitate the exploration of more inclusive approaches to machine learning, and to turn those observations into opportunities for innovation. We’re just beginning to draw insights from both Quick, Draw! and Facets. And we invite you to draw more with us, too.
Acknowledgements
Jonas Jongejan, Henry Rowley, Takashi Kawashima, Jongmin Kim, Nick Fox-Gieg, built Quick, Draw! in collaboration with Google Creative Lab and Google’s Data Arts Team. The video about fairness in machine learning was created by Teo Soares, Alexander Chen, Bridget Prophet, Lisa Steinman, and JR Schmidt from Google Creative Lab. James Wexler, Jimbo Wilson, and Mahima Pushkarna, of PAIR, designed Facets, a project led by Martin Wattenberg and Fernanda Viégas, Senior Staff Research Scientists on the Google Brain team, and UX Researcher Jess Holbrook. Ian Johnson from the Google Cloud team contributed to the visualizations of overlaid drawings.
Neural Network-Generated Illustrations in Allo
Thursday, May 11, 2017
Posted by Jennifer Daniel, Expressions Creative Director, Allo
Taking, sharing, and viewing selfies has become a daily habit for many — the car selfie, the cute-outfit selfie, the travel selfie, the I-woke-up-like-this selfie. Apart from a social capacity, self-portraiture has long served as a means for self and identity exploration. For some, it’s about figuring out who they are. For others it’s about projecting how they want to be perceived. Sometimes it’s both.
Photography in the form of a selfie is a very direct form of expression. It comes with a set of rules bounded by reality. Illustration, on the other hand, empowers people to define themselves - it’s warmer and less fraught than reality.
Today, Google is introducing a feature in
Allo
that uses a combination of neural networks and the work of artists to turn your selfie into a personalized sticker pack.
Simply snap a selfie
, and it’ll return an automatically generated illustrated version of you, on the fly, with customization options to help you personalize the stickers even further.
What makes you,
you
?
The traditional computer vision approach to mapping selfies to art would be to analyze the pixels of an image and algorithmically determine attribute values by looking at pixel values to measure color, shape, or texture. However, people today take selfies in all types of lighting conditions and poses. And while people can easily pick out and recognize qualitative features, like eye color, regardless of the lighting condition, this is a very complex task for computers. When people look at eye color, they don’t just interpret the pixel values of blue or green, but take into account the
surrounding visual context
.
In order to account for this, we explored how we could enable an algorithm to pick out qualitative features in a manner similar to the way people do, rather than the traditional approach of hand coding how to interpret every permutation of lighting condition, eye color, etc. While we could have trained a large convolutional neural network from scratch to attempt to accomplish this, we wondered if there was a more efficient way to get results, since we expected that learning to interpret a face into an illustration would be a very iterative process.
That led us to run some experiments, similar to
DeepDream
, on some of Google's existing more general-purpose computer vision neural networks. We discovered that a few neurons among the millions in these networks were good at focusing on things they weren’t explicitly trained to look at that seemed useful for creating personalized stickers. Additionally, by virtue of being large general-purpose neural networks they had already figured out how to abstract away things they didn’t need. All that was left to do was to provide a much smaller number of human labeled examples to teach the classifiers to isolate out the qualities that the neural network already knew about the image.
To create an illustration of you that captures the qualities that would make it recognizable to your friends, we worked alongside an artistic team to create illustrations that represented a wide variety of features. Artists initially designed a set of hairstyles, for example, that they thought would be representative, and with the help of human raters we used these hairstyles to train the network to match the right illustration to the right selfie. We then asked human raters to judge the sticker output against the input image to see how well it did. In some instances, they determined that some styles were not well represented, so the artists created more that the neural network could learn to identify as well.
Raters were asked to classify hairstyles that the icon on the left resembled closest. Then, once consensus was reached, resident artist
Lamar Abrams
drew a representation of what they had in common.
Avoiding the uncanny valley
In the study of aesthetics, a well-known problem is the
uncanny valley
- the hypothesis that human replicas which appear almost, but not exactly, like real human beings can feel repulsive. In machine learning, this could be compounded if were confronted by a computer’s perception of you, versus how you may think of yourself, which can be at odds.
Rather than aim to replicate a person’s appearance exactly, pursuing a lower resolution model, like emojis and stickers, allows the team to explore expressive representation by returning an image that is less about reproducing reality and more about breaking the rules of representation.
The team worked with artist
Lamar Abrams
to design the features that make up more than 563 quadrillion combinations.
Translating pixels to artistic illustrations
Reconciling how the computer perceives you with how you perceive yourself and what you want to project is truly an artistic exercise. This makes a customization feature that includes different hairstyles, skin tones, and nose shapes, essential. After all, illustration by its very nature can be subjective. Aesthetics are defined by race, culture, and class which can lead to creating zones of exclusion without consciously trying. As such, we strove to create a space for a range of race, age, masculinity, femininity, and/or androgyny. Our teams continue to evaluate the research results to help prevent against incorporating biases while training the system.
Creating a broad palette for identity and sentiment
There is no such thing as a ‘universal aesthetic’ or ‘a singular you’. The way people talk to their parents is different than how they talk to their friends which is different than how they talk to their colleagues. It’s not enough to make an avatar that is a literal representation of yourself when there are many versions of you. To address that, the Allo team is working with a range of artistic voices to help others extend their own voice. This first style that launched today speaks to your sarcastic side but the next pack might be more cute for those sincere moments. Then after that, maybe they’ll turn you into a dog. If emojis broadened the world of communication it’s not hard to imagine how this technology and language evolves. What will be most exciting is listening to what people say with it.
This feature is starting to roll out in
Allo today for Android
, and will come soon to Allo on iOS.
Acknowledgements
This work was made possible through a collaboration of the Allo Team and
Machine Perception
researchers at Google. We additionally thank Lamar Abrams, Koji Ashida, Forrester Cole, Jennifer Daniel, Shiraz Fuman, Dilip Krishnan, Inbar Mosseri, Aaron Sarna, Aaron Maschinot and Bhavik Singh.
Exploring the Intersection of Art and Machine Intelligence
Monday, February 22, 2016
Posted by Mike Tyka, Software Engineer
In June of last year, we
published a story
about a visualization technique that helped to understand how neural networks carried out difficult visual classification tasks. In addition to helping us gain a deeper understanding of how NNs worked, these techniques also produced
strange, wonderful and oddly compelling images
.
Following that blog post, and especially after
we released the source code
, dubbed DeepDream, we
witnessed a tremendous interest
not only from the machine learning community but also from the creative coding community. Additionally, several artists such as
Amanda Peterson
(aka Gucky),
Memo Akten
,
Samim Winiger
,
Kyle McDonald
,
Gene Kogan
and many others immediately started experimenting with the technique as a new way to create art.
“
GCHQ
”, 2015, Memo Akten, used with permission.
Soon after, the paper
A Neural Algorithm of Artistic Style
by Leon Gatys in Tuebingen was released. Their technique used a convolutional neural network to factor images into their separate style and content components. This in turn allowed the creation, by using a neural network as a generic image parser, of new images that combined the style of one with the content of another. Once again it took the creative coding community by storm and immediately many artists and coders began
experimenting
with the new algorithm, resulting in
Twitter bots
and other
explorations
and
experiments
.
The
style transfer algorithm
crosses a photo with a painting style; for example
Neil deGrasse Tyson
in the style of
Kadinsky’s Jane Rouge Bleu
. Photo by
Guillaume Piolle
, used with permission.
The open-source deep-learning community, especially projects such as
GitXiv
, hugely contributed to the spread, accessibility and development of these algorithms. Both DeepDream and style transfer were rapidly implemented in a plethora of different languages and deep learning packages. Immediately others took the techniques and developed them
further
.
“Saxophone dreams” - Mike Tyka.
With machine learning as field moving forward at a breakneck pace and rapidly becoming part of many -- if not most -- online products, the opportunities for artistic uses are as wide as they are unexplored and perhaps overlooked. However the interest is growing rapidly: the University of London is now offering a course on
Machine learning and art
. NYU ITP offers
a similar program
this year. The Tate Modern’s IK Prize 2016 topic:
Artificial Intelligence
.
These are exciting early days, and we want to continue to stimulate artistic interest in these emerging technologies. To that end, we are announcing a two day DeepDream event in San Francisco at the
Gray Area Foundation for the Arts
, aimed at showcasing some of the latest exploration of the intersection of Machine Intelligence and Art, and spurring discussion focused around future directions:
Friday Feb 26th:
DeepDream: The Art of Neural Networks
, an exhibit consisting of 29 neural network generated artworks, created by artists at Google and from around the world. The works will be auctioned, with all proceeds going to the Gray Area Foundation, which has been active in supporting the intersection between arts and technology for over 10 years.
On Saturday Feb 27th
:
Art and Machine Learning Symposium
, an open one-day symposium on Machine Learning and Art, aiming to bring together the neural network and the creative coding communities to exchange ideas, learn and discuss. Videos of all the talks will be posted online after the event.
We look forward to sharing some of the interesting works of art generated by the art and machine learning community, and being part of the discussion of how art and technology can be combined.
Labels
accessibility
ACL
ACM
Acoustic Modeling
Adaptive Data Analysis
ads
adsense
adwords
Africa
AI
Algorithms
Android
Android Wear
API
App Engine
App Inventor
April Fools
Art
Audio
Augmented Reality
Australia
Automatic Speech Recognition
Awards
Cantonese
Chemistry
China
Chrome
Cloud Computing
Collaboration
Computational Imaging
Computational Photography
Computer Science
Computer Vision
conference
conferences
Conservation
correlate
Course Builder
crowd-sourcing
CVPR
Data Center
Data Discovery
data science
datasets
Deep Learning
DeepDream
DeepMind
distributed systems
Diversity
Earth Engine
economics
Education
Electronic Commerce and Algorithms
electronics
EMEA
EMNLP
Encryption
entities
Entity Salience
Environment
Europe
Exacycle
Expander
Faculty Institute
Faculty Summit
Flu Trends
Fusion Tables
gamification
Gboard
Gmail
Google Accelerated Science
Google Books
Google Brain
Google Cloud Platform
Google Docs
Google Drive
Google Genomics
Google Maps
Google Photos
Google Play Apps
Google Science Fair
Google Sheets
Google Translate
Google Trips
Google Voice Search
Google+
Government
grants
Graph
Graph Mining
Hardware
HCI
Health
High Dynamic Range Imaging
ICLR
ICML
ICSE
Image Annotation
Image Classification
Image Processing
Inbox
India
Information Retrieval
internationalization
Internet of Things
Interspeech
IPython
Journalism
jsm
jsm2011
K-12
KDD
Keyboard Input
Klingon
Korean
Labs
Linear Optimization
localization
Low-Light Photography
Machine Hearing
Machine Intelligence
Machine Learning
Machine Perception
Machine Translation
Magenta
MapReduce
market algorithms
Market Research
Mixed Reality
ML
MOOC
Moore's Law
Multimodal Learning
NAACL
Natural Language Processing
Natural Language Understanding
Network Management
Networks
Neural Networks
Nexus
Ngram
NIPS
NLP
On-device Learning
open source
operating systems
Optical Character Recognition
optimization
osdi
osdi10
patents
Peer Review
ph.d. fellowship
PhD Fellowship
PhotoScan
Physics
PiLab
Pixel
Policy
Professional Development
Proposals
Public Data Explorer
publication
Publications
Quantum AI
Quantum Computing
renewable energy
Research
Research Awards
resource optimization
Robotics
schema.org
Search
search ads
Security and Privacy
Semantic Models
Semi-supervised Learning
SIGCOMM
SIGMOD
Site Reliability Engineering
Social Networks
Software
Speech
Speech Recognition
statistics
Structured Data
Style Transfer
Supervised Learning
Systems
TensorBoard
TensorFlow
TPU
Translate
trends
TTS
TV
UI
University Relations
UNIX
User Experience
video
Video Analysis
Virtual Reality
Vision Research
Visiting Faculty
Visualization
VLDB
Voice Search
Wiki
wikipedia
WWW
YouTube
Archive
2018
May
Apr
Mar
Feb
Jan
2017
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2016
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Jul
May
Apr
Mar
Feb
2007
Oct
Sep
Aug
Jul
Jun
Feb
2006
Dec
Nov
Sep
Aug
Jul
Jun
Apr
Mar
Feb
Feed
Google
on
Follow @googleresearch
Give us feedback in our
Product Forums
.