Scalable Deep Reinforcement Learning for Robotic Manipulation
Thursday, June 28, 2018
Posted by Alex Irpan, Software Engineer, Google Brain Team, and Peter Pastor, Senior Roboticist, X
How can robots acquire skills that generalize effectively to diverse, real-world objects and situations? While designing robotic systems that effectively perform repetitive tasks in controlled environments, like building products on an assembly line, is fairly routine, designing robots that can observe their surroundings and decide the best course of action while reacting to unexpected outcomes is exceptionally difficult. However, there are two tools that can help robots acquire such skills from experience: deep learning, which is excellent at handling unstructured real-world scenarios, and reinforcement learning, which enables longer-term reasoning while exhibiting more complex and robust sequential decision making. Combining these two techniques has the potential to enable robots to learn continuously from their experience, allowing them to master basic sensorimotor skills using data rather than manual engineering.
Designing reinforcement learning algorithms for robot learning introduces its own set of challenges: real-world objects span a wide variety of visual and physical properties, subtle differences in contact forces can make predicting object motion difficult, and objects of interest can be obstructed from view. Furthermore, robotic sensors are inherently noisy, adding to the complexity. All of these factors make it incredibly difficult to learn a general solution unless there is enough variety in the training data, which takes time to collect. This motivates exploring learning algorithms that can effectively reuse past experience, similar to our previous work on grasping, which benefited from large datasets. However, this previous work could not reason about the long-term consequences of its actions, which is important for learning how to grasp. For example, if multiple objects are clumped together, pushing one of them apart (called “singulation”) will make the grasp easier, even if doing so does not directly result in a successful grasp.
Examples of singulation.
To be more efficient, we need to use off-policy reinforcement learning, which can learn from data that was collected hours, days, or weeks ago. To design such an off-policy reinforcement learning algorithm that can benefit from large amounts of diverse experience from past interactions, we combined large-scale distributed optimization with a new fitted deep Q-learning algorithm that we call QT-Opt. A preprint is available on arXiv.
QT-Opt is a distributed Q-learning algorithm that supports continuous action spaces, making it well-suited to robotics problems. To use QT-Opt, we first train a model entirely offline, using whatever data we’ve already collected. This doesn’t require running the real robot, making it easier to scale. We then deploy and finetune that model on the real robot, further training it on newly collected data. As we run QT-Opt, we accumulate more offline data, letting us train better models, which lets us collect better data, and so on.
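To make the offline phase of this loop concrete, here is a minimal, illustrative sketch of fitted Q-learning with a continuous action space, in the spirit of QT-Opt. The tiny linear "network", the replay format, and the random-shooting action maximizer are simplifying assumptions for illustration (the QT-Opt paper optimizes actions with a sampling-based optimizer, the cross-entropy method); this is not the actual implementation.

```python
# Illustrative sketch only: offline fitted Q-learning over logged transitions
# with a continuous action space. Everything here is a stand-in for the real
# system (which uses a deep CNN over camera images and distributed training).
import numpy as np

def q_value(params, state, action):
    # Hypothetical Q-function: fixed random features plus a learned linear readout.
    features = np.tanh(params["proj"] @ np.concatenate([state, action]))
    return float(params["w"] @ features)

def best_action(params, state, action_dim, n_samples=64):
    # Approximate max_a Q(s, a) by scoring random candidate actions
    # (random shooting; the paper uses the cross-entropy method instead).
    candidates = np.random.uniform(-1.0, 1.0, size=(n_samples, action_dim))
    scores = [q_value(params, state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

def fitted_q_iteration(params, replay, gamma=0.9, lr=1e-3, steps=1000):
    # Train entirely from logged (state, action, reward, next_state, done)
    # tuples -- no real robot is needed for this offline phase.
    for _ in range(steps):
        s, a, r, s2, done = replay[np.random.randint(len(replay))]
        target = r
        if not done:
            a2 = best_action(params, s2, action_dim=len(a))
            target += gamma * q_value(params, s2, a2)
        # One SGD step on the squared Bellman error (readout weights only).
        features = np.tanh(params["proj"] @ np.concatenate([s, a]))
        td_error = q_value(params, s, a) - target
        params["w"] -= lr * td_error * features
    return params
```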
To apply this approach to robotic grasping, we used 7 real-world robots, which ran for 800 total robot hours over the course of 4 months. To bootstrap collection, we started with a hand-designed policy that succeeded 15-30% of the time. Data collection switched to the learned model when it started performing better. The policy takes a camera image and returns how the arm and gripper should move. The offline data contained grasps on over 1000 different objects.
Some of the training objects used.
In the past, we’ve seen that sharing experience across robots can accelerate learning. We scaled this training and data gathering process to ten GPUs, seven robots, and many CPUs, allowing us to collect and process a large dataset of over 580,000 grasp attempts. At the end of this process, we successfully trained a grasping policy that runs on a real world robot and generalizes to a diverse set of challenging objects that were not seen at training time.
Seven robots collecting grasp data.
Quantitatively, the QT-Opt approach succeeded in 96% of the grasp attempts across 700 trial grasps on previously unseen objects. Compared to our previous supervised-learning-based grasping approach, which had a 78% success rate, our method reduced the error rate by more than a factor of five (from a 22% failure rate to 4%).
The objects used at evaluation time. To make the task challenging, we aimed for a large variety of object sizes, textures, and shapes.
Notably, the policy exhibits a variety of closed-loop, reactive behaviors that are often not found in standard robotic grasping systems:
When presented with a set of interlocking blocks that cannot be picked up together, the policy separates one of the blocks from the rest before picking it up.
When presented with a difficult-to-grasp object, the policy figures out it should reposition the gripper and regrasp it until it has a firm hold.
When grasping in clutter, the policy probes different objects until the fingers hold one of them firmly, before lifting.
When we perturbed the robot by intentionally swatting the object out of the gripper -- something it had not seen during training -- it automatically repositioned the gripper for another attempt.
Crucially, none of these behaviors were engineered manually. They emerged automatically from self-supervised training with QT-Opt, because they improve the model’s long-term grasp success.
Examples of the learned behaviors. In the left GIF, the policy corrects for the moved ball. In the right GIF, the policy tries several grasps until it succeeds at picking up the tricky object.
Additionally, we’ve found that QT-Opt reaches this higher success rate using less training data, albeit taking longer to converge. This is especially exciting for robotics, where the bottleneck is usually collecting real robot data rather than training time. Combining this with other data efficiency techniques (such as our prior work on domain adaptation for grasping) could open several interesting avenues in robotics. We’re also interested in combining QT-Opt with recent work on learning how to self-calibrate, which could further improve generality.
Overall, the QT-Opt algorithm is a general reinforcement learning approach that’s giving us good results on real world robots. Besides the reward definition, nothing about QT-Opt is specific to robot grasping. We see this as a strong step towards more general robot learning algorithms, and are excited to see what other robotics tasks we can apply it to. You can learn more about this work in the short video below.
Acknowledgements
This research was conducted by Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, and Sergey Levine. We’d also like to give special thanks to Iñaki Gonzalo and John-Michael Burke for overseeing the robot operations, Chelsea Finn, Timothy Lillicrap, and Arun Nair for valuable discussions, and other people at Google and X who’ve contributed their expertise and time towards this research. A preprint is available on arXiv.
Self-Supervised Tracking via Video Colorization
Wednesday, June 27, 2018
Posted by Carl Vondrick, Research Scientist, Machine Perception
Tracking objects in video is a fundamental problem in computer vision, essential to applications such as activity recognition, object interaction, or video stylization. However, teaching a machine to visually track objects is challenging partly because it requires large, labeled tracking datasets for training, which are impractical to annotate at scale.
In “Tracking Emerges by Colorizing Videos”, we introduce a convolutional network that colorizes grayscale videos, but is constrained to copy colors from a single reference frame. In doing so, the network learns to visually track objects automatically without supervision. Importantly, although the model was never trained explicitly for tracking, it can follow multiple objects, track through occlusions, and remain robust over deformations without requiring any labeled training data.
Example tracking predictions on the publicly-available, academic dataset DAVIS 2017. After learning to colorize videos, a mechanism for tracking automatically emerges without supervision. We specify regions of interest (indicated by different colors) in the first frame, and our model propagates them forward without any additional learning or supervision.
Learning to Recolorize Video
Our hypothesis is that the temporal coherency of color provides excellent large-scale training data for teaching machines to track regions in video. Clearly, there are exceptions when color is not temporally coherent (such as lights turning on suddenly), but in general color is stable over time. Furthermore, most videos contain color, providing a scalable self-supervised learning signal. We decolor videos, and then add the colorization step because there may be multiple objects with the same color, but by colorizing we can teach machines to track specific objects or regions.
In order to train our system, we use videos from the Kinetics dataset, which is a large public collection of videos depicting everyday activities. We convert all video frames except the first frame into grayscale, and train a convolutional network to predict the original colors in the subsequent frames. We expect the model to learn to follow regions in order to accurately recover the original colors. Our main observation is that the need to follow objects in order to colorize them causes a model for object tracking to be learned automatically.
We illustrate the video recolorization task using video from the DAVIS 2017 dataset. The model receives as input one color frame and a grayscale video, and predicts the colors for the rest of the video. The model learns to copy colors from the reference frame, which enables a mechanism for tracking to be learned without human supervision.
Learning to copy colors from the single reference frame requires the model to learn to internally point to the right region in order to copy the right colors. This forces the model to learn an explicit mechanism that we can use for tracking. To see how the video colorization model works, we show some predicted colorizations from videos in the Kinetics dataset below.
Examples of predicted colors from a colorized reference frame applied to input video, using the publicly-available Kinetics dataset.
Although the network is trained without ground-truth identities, our model learns to track any visual region specified in the first frame of a video. We can track outlined objects or a single point in the video. The only change we make is that, instead of propagating colors throughout the video, we now propagate labels representing the regions of interest.
Analyzing the Tracker
Since the model is trained on large amounts of unlabeled video, we want to gain insight into what the model learns. The videos below show a standard trick to visualize the embeddings learned by our model by projecting them down to three dimensions using Principal Component Analysis (PCA) and plotting the result as an RGB movie. The results show that nearest neighbors in the learned embedding space tend to correspond to object identity, even over deformations and viewpoint changes.
Top Row: We show videos from the DAVIS 2017 dataset. Bottom Row: We visualize the internal embeddings from the colorization model. Similar embeddings will have a similar color in this visualization. This suggests the learned embedding is grouping pixels by object identity.
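For reference, the visualization trick described above amounts to the following sketch, where per-pixel embeddings are reduced to their top three principal components and rescaled for display as RGB. The array shapes and function name are assumptions for illustration.

```python
# Project embeddings onto 3 principal components and scale them to [0, 1] as RGB.
import numpy as np

def embeddings_to_rgb(embeddings):
    """embeddings: (num_pixels, dim) array from the colorization model."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # PCA via SVD: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:3].T                     # top three components
    coords -= coords.min(axis=0, keepdims=True)      # rescale each channel
    coords /= coords.max(axis=0, keepdims=True) + 1e-8
    return coords
```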
Tracking Pose
We found the model can also track human poses given key-points in an initial frame. We show results on the publicly-available, academic dataset JHMDB, where we track a human joint skeleton.
Examples of using the model to track movements of the human skeleton. In this case the input was a human pose for the first frame and subsequent movement is automatically tracked. The model can track human poses even though it was never explicitly trained for this task.
While we do not yet outperform heavily supervised models, the colorization model learns to track video segments and human pose well enough to outperform the latest methods based on optical flow. Breaking down performance by motion type suggests that our model is a more robust tracker than optical flow for many natural complexities, such as dynamic backgrounds, fast motion, and occlusions. Please see the paper for details.
Future Work
Our results show that video colorization provides a signal that can be used for learning to track objects in videos without supervision. Moreover, we found that the failures from our system are correlated with failures to colorize the video, which suggests that further improving the video colorization model can advance progress in self-supervised tracking.
Acknowledgements
This project was only possible thanks to several collaborations at Google. The core team includes Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama and Kevin Murphy. We also thank David Ross, Bryan Seybold, Chen Sun and Rahul Sukthankar.
Teaching Uncalibrated Robots to Visually Self-Adapt
Friday, June 22, 2018
Posted by Fereshteh Sadeghi, Student Researcher, Google Brain Team
People are remarkably proficient at manipulating objects without needing to adjust their viewpoint to a fixed or specific pose. This capability (referred to as visual motor integration) is learned during childhood from manipulating objects in various situations, and is governed by a self-adaptation and mistake-correction mechanism that uses rich sensory cues and vision as feedback. However, this capability is quite difficult for vision-based controllers in robotics, which until now have been built on a rigid setup for reading visual input data from a fixed-mounted camera that should not be moved or repositioned at train and test time. The ability to quickly acquire visual motor control skills under large viewpoint variation would have substantial implications for autonomous robotic systems — for example, this capability would be particularly desirable for robots that can help rescue efforts in emergency or disaster zones.
In “Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control”, presented at CVPR 2018 this week, we study a novel deep network architecture (consisting of two fully convolutional networks and a long short-term memory unit) that learns from a past history of actions and observations to self-calibrate. Using diverse simulated data consisting of demonstrated trajectories and reinforcement learning objectives, our visually-adaptive network is able to control a robotic arm to reach a diverse set of visually-indicated goals, from various viewpoints and independent of camera calibration.
Viewpoint invariant manipulation for visually indicated goal reaching with a physical robotic arm. We learn a single policy that can reach diverse goals from sensory input captured from drastically different camera viewpoints. First row shows the visually indicated goals.
The Challenge
Discovering how the controllable degrees of freedom (DoF) affect visual motion can be ambiguous and underspecified from a single image captured from an unknown viewpoint. Identifying the effect of actions on image-space motion and successfully performing the desired task requires a robust perception system augmented with the ability to maintain a memory of past actions. To be able to tackle this challenging problem, we had to address the following essential questions:
How can we make it feasible to provide the right amount of experience for the robot to learn the self-adaptation behavior based on pure visual observations that simulate a lifelong learning paradigm?
How can we design a model that integrates robust perception and self-adaptive control such that it can quickly transfer to unseen environments?
To do so, we devised a new manipulation task where a seven-DoF robot arm is provided with an image of an object and is directed to reach that particular goal amongst a set of distractor objects, while viewpoints change drastically from one trial to another. In doing so, we were able to simulate both the learning of complex behaviors and the transfer to unseen environments.
Visually indicated goal reaching task with a physical robotic arm and diverse camera viewpoints.
Harnessing Simulation to Learn Complex Behaviors
Collecting robot experience data is difficult and time-consuming. In a previous post, we showed how to scale up learning skills by distributing the data collection and trials to multiple robots. Although this approach expedited learning, it is still not feasibly extendable to learning complex behaviors such as visual self-calibration, where we need to expose robots to a huge space of various viewpoints. Instead, we opt to learn such complex behavior in simulation, where we can collect unlimited robot trials and easily move the camera to various random viewpoints. In addition to fast data collection, simulation also lets us avoid the hardware limitations that would otherwise require installing multiple cameras around the robot.
We use the domain randomization technique to learn generalizable policies in simulation.
To learn visually robust features that transfer to unseen environments, we used a technique known as domain randomization (a.k.a. simulation randomization), introduced by Sadeghi & Levine (2017), which enables robots to learn vision-based policies entirely in simulation such that they can generalize to the real world. This technique was shown to work well for various robotic tasks, such as indoor navigation, object localization, pick and placing, etc. In addition, to learn complex behaviors like self-calibration, we harnessed the simulation capabilities to generate synthetic demonstrations and combined them with reinforcement learning objectives to learn a robust controller for the robotic arm.
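The data-generation side of this recipe can be sketched as a loop that re-randomizes the camera and scene appearance for every simulated episode. The parameter ranges, the `simulator` object, and its `render_episode` method below are hypothetical placeholders standing in for a physics simulator such as pybullet, not an actual API.

```python
# Domain randomization sketch: sample a fresh viewpoint and appearance per
# episode so the learned policy cannot rely on a calibrated, fixed camera.
import random

def sample_randomized_scene():
    return {
        # Camera placed anywhere on a shell around the workspace.
        "camera_yaw": random.uniform(-180, 180),
        "camera_pitch": random.uniform(-60, -10),
        "camera_distance": random.uniform(0.7, 1.5),
        # Appearance randomization: textures, object colors, lighting.
        "table_texture": random.choice(["wood", "checker", "noise"]),
        "object_colors": [[random.random() for _ in range(3)] for _ in range(5)],
        "light_direction": [random.uniform(-1, 1) for _ in range(3)],
    }

def collect_simulated_trajectories(simulator, policy, num_episodes):
    data = []
    for _ in range(num_episodes):
        scene = sample_randomized_scene()
        # Each episode uses a different random viewpoint, so the recurrent
        # controller must infer the camera from its history of actions and
        # observations rather than from a known calibration.
        data.append(simulator.render_episode(scene, policy))
    return data
```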
Viewpoint invariant manipulation for visually indicated goal reaching with a simulated seven-DoF robotic arm. We learn a single policy that can reach diverse goals from sensory input captured from dramatically different camera viewpoints.
Disentangling Perception from Control
To enable fast transfer to unseen environments, we devised a deep neural network that combines perception and control, trained end-to-end simultaneously while also allowing each to be learned independently if needed. This disentanglement between perception and control eases transfer to unseen environments, and makes the model both flexible and efficient in that each of its parts (i.e. 'perception' or 'control') can be independently adapted to new environments with small amounts of data. Additionally, while the control portion of the network was trained entirely with simulated data, the perception part of our network was complemented by collecting a small number of static images with object bounding boxes, without needing to collect whole action-sequence trajectories with a physical robot. In practice, we fine-tuned the perception part of our network with only 76 object bounding boxes coming from 22 images.
Real-world robot and moving camera setup. First row shows the scene arrangements and the second row shows the visual sensory input to the robot.
Early Results
We tested the visually-adapted version of our network on a physical robot and on real objects with drastically different appearances than the ones used in simulation. Experiments were performed with either one or two objects on a table — “seen objects” (as labeled in the figure below) were used for visual adaptation using a small collection of real static images, while “unseen objects” had not been seen during visual adaptation. During the test, the robot arm was directed to reach a visually indicated object from various viewpoints. In the two-object experiments, the second object was there to "fool" the robotic arm. While the simulation-only network has good generalization capability (due to being trained with the domain randomization technique), the very small amount of static visual data used to visually adapt the controller boosted performance further, thanks to the flexible architecture of our network.
After adapting the visual features with the small amount of real images, performance was boosted by more than 10%. All used real objects are drastically different from the objects seen in simulation.
We believe that learning online visual self-adaptation is an important yet challenging problem, with the goal of learning generalizable policies for robots that can act in diverse and unstructured real-world setups. Our approach can be extended to any sort of automatic self-calibration. See the video below for more information on this work.
Acknowledgements
This research was conducted by Fereshteh Sadeghi, Alexander Toshev, Eric Jang and Sergey Levine. We would also like to thank Erwin Coumans and Yunfei Bai for providing pybullet, and Vincent Vanhoucke for insightful discussions.
How Can Neural Network Similarity Help Us Understand Training and Generalization?
Thursday, June 21, 2018
Posted by Maithra Raghu, Google Brain Team and Ari S. Morcos, DeepMind
In order to solve tasks, deep neural networks (DNNs) progressively transform input data into a sequence of complex representations (i.e., patterns of activations across individual neurons). Understanding these representations is critically important, not only for interpretability, but also so that we can more intelligently design machine learning systems. However, understanding these representations has proven quite difficult, especially when comparing representations across networks. In a previous post, we outlined the benefits of Canonical Correlation Analysis (CCA) as a tool for understanding and comparing the representations of convolutional neural networks (CNNs), showing that they converge in a bottom-up pattern, with early layers converging to their final representations before later layers over the course of training.
In “Insights on Representational Similarity in Neural Networks with Canonical Correlation”, we develop this work further to provide new insights into the representational similarity of CNNs, including differences between networks which memorize (e.g., networks which can only classify images they have seen before) and those which generalize (e.g., networks which can correctly classify previously unseen images). Importantly, we also extend this method to provide insights into the dynamics of recurrent neural networks (RNNs), a class of models that are particularly useful for sequential data, such as language. Comparing RNNs is difficult in many of the same ways as CNNs, but RNNs present the additional challenge that their representations change over the course of a sequence. This makes CCA, with its helpful invariances, an ideal tool for studying RNNs in addition to CNNs. As such, we have additionally open sourced the code used for applying CCA on neural networks, with the hope that it will help the research community better understand network dynamics.
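To give a sense of the comparison involved, here is a simplified, unweighted CCA distance between two layers' activations in NumPy. The paper uses a weighted variant of this measure, so treat this as an assumption-laden approximation rather than the released code.

```python
# Unweighted CCA distance between two activation matrices recorded on the
# same inputs; smaller values mean more similar representations.
import numpy as np

def cca_distance(acts_a, acts_b):
    """acts_a, acts_b: (num_datapoints, num_neurons) activation matrices."""
    a = acts_a - acts_a.mean(axis=0, keepdims=True)
    b = acts_b - acts_b.mean(axis=0, keepdims=True)
    # Orthonormal bases for each set of neurons (thin QR decomposition).
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    # Singular values of qa^T qb are the canonical correlations.
    corrs = np.clip(np.linalg.svd(qa.T @ qb, compute_uv=False), 0.0, 1.0)
    return 1.0 - corrs.mean()
```

For example, evaluating this distance on the same layer of two independently trained networks, over a shared batch of inputs, is the kind of comparison the results below summarize at the group level.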
Representational Similarity of Memorizing and Generalizing CNNs
Ultimately, a machine learning system is only useful if it can generalize to new situations it has never seen before. Understanding the factors which differentiate between networks that generalize and those that don’t is therefore essential, and may lead to new methods to improve generalization performance. To investigate whether representational similarity is predictive of generalization, we studied two types of CNNs:
generalizing networks: CNNs trained on data with unmodified, accurate labels and which learn solutions which generalize to novel data.
memorizing networks: CNNs trained on datasets with randomized labels such that they must memorize the training data and cannot, by definition, generalize (as in Zhang et al., 2017).
We trained multiple instances of each network, differing only in the initial randomized values of the network weights and the order of the training data, and used a new weighted approach to calculate the CCA distance measure (see our paper for details) to compare the representations within each group of networks and between memorizing and generalizing networks.
We found that groups of different generalizing networks consistently converged to more similar representations (especially in later layers) than groups of memorizing networks (see figure below). At the softmax, which denotes the network’s ultimate prediction, the CCA distance for each group of generalizing and memorizing networks decreases substantially, as the networks in each separate group make similar predictions.
Groups of generalizing networks (blue) converge to more similar solutions than groups of memorizing networks (red). CCA distance was calculated between groups of networks trained on real CIFAR-10 labels (“Generalizing”) or randomized CIFAR-10 labels (“Memorizing”) and between pairs of memorizing and generalizing networks (“Inter”).
Perhaps most surprisingly, in later hidden layers, the representational distance between any given pair of memorizing networks was about the same as the representational distance between a memorizing and generalizing network (“Inter” in the plot above), despite the fact that these networks were trained on data with entirely different labels. Intuitively, this result suggests that while there are many different ways to memorize the training data (resulting in greater CCA distances), there are fewer ways to learn generalizable solutions. In future work, we plan to explore whether this insight can be used to regularize networks to learn more generalizable solutions.
Understanding the Training Dynamics of Recurrent Neural Networks
So far, we have only applied CCA to CNNs trained on image data. However, CCA can also be applied to calculate representational similarity in RNNs, both over the course of training and over the course of a sequence. Applying CCA to RNNs, we first asked whether the RNNs exhibit the same bottom-up convergence pattern we observed in our previous work for CNNs. To test this, we measured the CCA distance between the representation at each layer of the RNN over the course of training with its final representation at the end of training. We found that the CCA distance for layers closer to the input dropped earlier in training than for deeper layers, demonstrating that, like CNNs, RNNs also converge in a bottom-up pattern (see figure below).
Convergence dynamics for RNNs over the course of training exhibit bottom up convergence, as layers closer to the input converge to their final representations earlier in training than later layers. For example, layer 1 converges to its final representation earlier in training than layer 2 than layer 3 and so on. Epoch designates the number of times the model has seen the entire training set while different colors represent the convergence dynamics of different layers.
Additional findings in our paper show that wider networks (e.g., networks with more neurons at each layer) converge to more similar solutions than narrow networks. We also found that trained networks with identical structures but different learning rates converge to distinct clusters with similar performance, but highly dissimilar representations. We also apply CCA to RNN dynamics over the course of a single sequence, rather than simply over the course of training, providing some initial insights into the various factors which influence RNN representations over time.
Conclusions
These findings reinforce the utility of analyzing and comparing DNN representations in order to provide insights into network function, generalization, and convergence. However, there are still many open questions: in future work, we hope to uncover which aspects of the representation are conserved across networks, both in CNNs and RNNs, and whether these insights can be used to improve network performance. We encourage others to try out the code used for the paper to investigate what CCA can tell us about other neural networks!
Acknowledgements
Special thanks to Samy Bengio, who is a co-author on this work. We also thank Martin Wattenberg, Jascha Sohl-Dickstein and Jon Kleinberg for helpful comments.
Google at CVPR 2018
Monday, June 18, 2018
Posted by Christian Howard, Editor-in-Chief, Google AI Communications
This week, Salt Lake City hosts the 2018 Conference on Computer Vision and Pattern Recognition (CVPR 2018), the premier annual computer vision event comprising the main conference and several co-located workshops and tutorials. As a leader in computer vision research and a Diamond Sponsor, Google will have a strong presence at CVPR 2018 — over 200 Googlers will be in attendance to present papers and invited talks at the conference, and to organize and participate in multiple workshops.
If you are attending CVPR this year, please stop by our booth and chat with our researchers who are actively pursuing the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind portrait mode on the Pixel 2 and Pixel 2 XL smartphones, the Open Images V4 dataset and much more.
You can learn more about our research being presented at CVPR 2018 in the list below (Googlers highlighted in blue).
Organization
Finance Chair: Ramin Zabih
Area Chairs include: Sameer Agarwal, Aseem Agrawala, Jon Barron, Abhinav Shrivastava, Carl Vondrick, Ming-Hsuan Yang
Orals/Spotlights
Unsupervised Discovery of Object Landmarks as Structural Representations
Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee
DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor
Tao Yu, Zerong Zheng, Kaiwen Guo, Jianhui Zhao, Qionghai Dai, Hao Li, Gerard Pons-Moll, Yebin Liu
Neural Kinematic Networks for Unsupervised Motion Retargetting
Ruben Villegas, Jimei Yang, Duygu Ceylan, Honglak Lee
Burst Denoising with Kernel Prediction Networks
Ben Mildenhall, Jiawen Chen, Jonathan Barron, Robert Carroll, Dillon Sharlet, Ren Ng
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob, Skirmantas Kligys, Bo Chen, Matthew Tang, Menglong Zhu, Andrew Howard, Dmitry Kalenichenko, Hartwig Adam
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu, Chen Sun, David Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
Focal Visual-Text Attention for Visual Question Answering
Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander G. Hauptmann
Inferring Light Fields from Shadows
Manel Baradad, Vickie Ye, Adam Yedida, Fredo Durand, William Freeman, Gregory Wornell, Antonio Torralba
Modifying Non-Local Variations Across Multiple Views
Tal Tlusty, Tomer Michaeli, Tali Dekel, Lihi Zelnik-Manor
Iterative Visual Reasoning Beyond Convolutions
Xinlei Chen, Li-jia Li, Fei-Fei Li, Abhinav Gupta
Unsupervised Training for 3D Morphable Model Regression
Kyle Genova, Forrester Cole, Aaron Maschinot, Daniel Vlasic, Aaron Sarna, William Freeman
Learning Transferable Architectures for Scalable Image Recognition
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc Le
The iNaturalist Species Classification and Detection Dataset
Grant van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie
Learning Intrinsic Image Decomposition from Watching the World
Zhengqi Li, Noah Snavely
Learning Intelligent Dialogs for Bounding Box Annotation
Ksenia Konyushkova, Jasper Uijlings, Christoph Lampert, Vittorio Ferrari
Posters
Revisiting Knowledge Transfer for Training Object Class Detectors
Jasper Uijlings, Stefan Popov, Vittorio Ferrari
Rethinking the Faster R-CNN Architecture for Temporal Action Localization
Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David Ross, Jia Deng, Rahul Sukthankar
Hierarchical Novelty Detection for Visual Object Recognition
Kibok Lee, Kimin Lee, Kyle Min, Yuting Zhang, Jinwoo Shin, Honglak Lee
COCO-Stuff: Thing and Stuff Classes in Context
Holger Caesar, Jasper Uijlings, Vittorio Ferrari
Appearance-and-Relation Networks for Video Classification
Limin Wang, Wei Li, Wen Li, Luc Van Gool
MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
Ariel Gordon, Elad Eban, Bo Chen, Ofir Nachum, Tien-Ju Yang, Edward Choi
Deformable Shape Completion with Graph Convolutional Autoencoders
Or Litany, Alex Bronstein, Michael Bronstein, Ameesh Makadia
MegaDepth: Learning Single-View Depth Prediction from Internet Photos
Zhengqi Li, Noah Snavely
Unsupervised Discovery of Object Landmarks as Structural Representations
Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee
Burst Denoising with Kernel Prediction Networks
Ben Mildenhall, Jiawen Chen, Jonathan Barron, Robert Carroll, Dillon Sharlet, Ren Ng
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Tianfan Xue, Joshua Tenenbaum, William Freeman
Sparse, Smart Contours to Represent and Edit Images
Tali Dekel, Dilip Krishnan, Chuang Gan, Ce Liu, William Freeman
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, Hartwig Adam
Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
Yin Cui, Yang Song, Chen Sun, Andrew Howard, Serge Belongie
Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks
Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Sung Jin Hwang, George Toderici, Troy Chinen, Joel Shor
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
Angela Dai, Daniel Ritchie, Martin Bokeloh, Scott Reed, Juergen Sturm, Matthias Nießner
Sim2Real View Invariant Visual Servoing by Recurrent Control
Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine
Alternating-Stereo VINS: Observability Analysis and Performance Evaluation
Mrinal Kanti Paul, Stergios Roumeliotis
Soccer on Your Tabletop
Konstantinos Rematas, Ira Kemelmacher, Brian Curless, Steve Seitz
Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
Reza Mahjourian, Martin Wicke, Anelia Angelova
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu, Chen Sun, David Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
Inferring Light Fields from Shadows
Manel Baradad, Vickie Ye, Adam Yedida, Fredo Durand, William Freeman, Gregory Wornell, Antonio Torralba
Modifying Non-Local Variations Across Multiple Views
Tal Tlusty, Tomer Michaeli, Tali Dekel, Lihi Zelnik-Manor
Aperture Supervision for Monocular Depth Estimation
Pratul Srinivasan, Rahul Garg, Neal Wadhwa, Ren Ng, Jonathan Barron
Instance Embedding Transfer to Unsupervised Video Object Segmentation
Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo
Frame-Recurrent Video Super-Resolution
Mehdi S. M. Sajjadi, Raviteja Vemulapalli, Matthew Brown
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han
Iterative Visual Reasoning Beyond Convolutions
Xinlei Chen, Li-jia Li, Fei-Fei Li, Abhinav Gupta
Learning and Using the Arrow of Time
Donglai Wei, Andrew Zisserman, William Freeman, Joseph Lim
HydraNets: Specialized Dynamic Architectures for Efficient Inference
Ravi Teja Mullapudi, Noam Shazeer, William Mark, Kayvon Fatahalian
Thoracic Disease Identification and Localization with Limited Supervision
Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei Wei, Li-jia Li, Fei-Fei Li
Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
Seunghoon Hong, Dingdong Yang, Jongwook Choi, Honglak Lee
Deep Semantic Face Deblurring
Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang
Unsupervised Training for 3D Morphable Model Regression
Kyle Genova, Forrester Cole, Aaron Maschinot, Daniel Vlasic, Aaron Sarna, William Freeman
Learning Transferable Architectures for Scalable Image Recognition
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc Le
Learning Intrinsic Image Decomposition from Watching the World
Zhengqi Li, Noah Snavely
PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection
Nian Liu, Junwei Han, Ming-Hsuan Yang
Mobile Video Object Detection with Temporally-Aware Feature Maps
Mason Liu, Menglong Zhu
Tutorials
Computer Vision for Robotics and Driving
Anelia Angelova, Sanja Fidler
Unsupervised Visual Learning
Pierre Sermanet, Anelia Angelova
UltraFast 3D Sensing, Reconstruction and Understanding of People, Objects and Environments
Sean Fanello, Julien Valentin, Jonathan Taylor, Christoph Rhemann, Adarsh Kowdle, Jürgen Sturm, Christine Kaeser-Chen, Pavel Pidlypenskyi, Rohit Pandey, Andrea Tagliasacchi, Sameh Khamis, David Kim, Mingsong Dou, Kaiwen Guo, Danhang Tang, Shahram Izadi
Generative Adversarial Networks
Jun-Yan Zhu, Taesung Park, Mihaela Rosca, Phillip Isola, Ian Goodfellow
Google at NAACL
Friday, June 8, 2018
Posted by Kenton Lee, Research Scientist and Slav Petrov, Principal Scientist, Language Team, Google AI
This week, New Orleans, LA hosted the North American Chapter of the Association for Computational Linguistics (NAACL) conference, a venue for the latest research on computational approaches to understanding natural language. Google once again had a strong presence, presenting our research on a diverse set of topics, including dialog, summarization, machine translation, and linguistic analysis. In addition to contributing publications, Googlers were also involved as committee members, workshop organizers, panelists and presented one of the conference keynotes. We also provided telepresence robots, which enabled researchers who couldn’t attend in person to present their work remotely at the Widening Natural Language Processing Workshop (WiNLP) and several other workshops.
Googler Margaret Mitchell setting up our telepresence robots for Diana Gonzalez and Gibran Fuentes Pineda of the Universidad Nacional Autonoma de Mexico to remotely present their first-place work on visual storytelling.
This year NAACL also introduced a new Test of Time Award recognizing influential papers published between 2002 and 2012. We are happy and honored to recognize that all three papers receiving the award (listed below with a short summary) were co-authored by researchers who are now at Google (in blue):
BLEU: a Method for Automatic Evaluation of Machine Translation (2002)
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu
Before the introduction of the BLEU metric, comparing Machine Translation (MT) models required expensive human evaluation. While human evaluation is still the gold standard, the strong correlation of BLEU with human judgment has permitted much faster experiment cycles. BLEU has been a reliable measure of progress, persisting through multiple paradigm shifts in MT.
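For readers unfamiliar with the metric, here is a minimal sketch of sentence-level BLEU with modified (clipped) n-gram precisions and a brevity penalty. It omits smoothing and multiple references, so it is an illustration of the idea rather than a reference implementation.

```python
# Minimal BLEU sketch: clipped n-gram precisions combined with a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """candidate, reference: token lists. Returns a score in [0, 1]."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(max(overlap, 1e-9) / max(sum(cand.values()), 1))
    # Penalize candidates that are shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```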
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms (2002)
Michael Collins
The structured perceptron is a generalization of the classical perceptron to structured prediction problems, where the number of possible "labels" for each input is a very large set, and each label has rich internal structure. Canonical examples are speech recognition, machine translation, and syntactic parsing. The structured perceptron was one of the first algorithms proposed for structured prediction, and has been shown to be effective in spite of its simplicity.
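The core of the algorithm fits in a few lines. In the sketch below, `decode` (e.g., Viterbi search for a tagger) and `features` (feature extraction over an input/output pair) are task-specific stand-ins assumed for illustration; only the weight-update rule is the structured perceptron itself.

```python
# Structured perceptron sketch: predict the best-scoring structure under the
# current weights, then move the weights toward the gold structure's features
# and away from the predicted structure's features.
def structured_perceptron(train_data, decode, features, num_epochs=5):
    weights = {}  # sparse feature name -> weight
    for _ in range(num_epochs):
        for x, gold_y in train_data:
            pred_y = decode(x, weights)  # argmax over candidate structures
            if pred_y != gold_y:
                for f, count in features(x, gold_y).items():
                    weights[f] = weights.get(f, 0.0) + count
                for f, count in features(x, pred_y).items():
                    weights[f] = weights.get(f, 0.0) - count
    return weights
```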
Thumbs up?: Sentiment Classification using Machine Learning Techniques (2002)
Bo Pang, Lillian Lee, Shivakumar Vaithyanathan
This paper is amongst the first works in sentiment analysis and helped define the subfield of sentiment and opinion analysis and review mining. The paper introduced a new way to look at document classification, developed the first solutions to it using supervised machine learning methods, and discussed insights and challenges. This paper also had significant data impact -- the movie review dataset has supported much of the early work in this area and is still one of the commonly used benchmark evaluation datasets.
If you attended NAACL 2018, we hope that you stopped by the booth to check out some demos, meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. You can learn more about Google research presented at NAACL 2018 below (Googlers highlighted in blue), and visit the Google AI Language Team page.
Area Chairs include: Dan Bikel, Dilek Hakkani-Tur, Zornitsa Kozareva, Marius Pasca, Emily Pitler, Idan Szpektor, Taro Watanabe
Publications Co-Chair: Margaret Mitchell
Keynote
Google Assistant or My Assistant? Towards Personalized Situated Conversational Agents
Dilek Hakkani-Tür
Publications
Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning
Pararth Shah, Dilek Hakkani-Tür, Bing Liu, Gokhan Tür
SHAPED: Shared-Private Encoder-Decoder for Text Style Adaptation
Ye Zhang, Nan Ding, Radu Soricut
Olive Oil is Made of Olives, Baby Oil is Made for Babies: Interpreting Noun Compounds Using Paraphrases in a Neural Model
Vered Schwartz, Chris Waterson
Are All Languages Equally Hard to Language-Model?
Ryan Cotterell, Sebastian J. Mielke, Jason Eisner, Brian Roark
Self-Attention with Relative Position Representations
Peter Shaw, Jakob Uszkoreit, Ashish Vaswani
Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Bing Liu, Gokhan Tür, Dilek Hakkani-Tür, Pararth Shah, Larry Heck
Workshops
Subword & Character Level Models in NLP
Organizers: Manaal Faruqui, Hinrich Schütze, Isabel Trancoso, Yulia Tsvetkov, Yadollah Yaghoobzadeh
Storytelling Workshop
Organizers: Margaret Mitchell, Ishan Misra, Ting-Hao 'Kenneth' Huang, Frank Ferraro
Ethics in NLP
Organizers: Michael Strube, Dirk Hovy, Margaret Mitchell, Mark Alfano
NAACL HLT Panels
Careers in Industry
Participants: Philip Resnik (moderator), Jason Baldridge, Laura Chiticariu, Marie Mateer, Dan Roth
Ethics in NLP
Participants: Dirk Hovy (moderator), Margaret Mitchell, Vinodkumar Prabhakaran, Mark Yatskar, Barbara Plank
Realtime tSNE Visualizations with TensorFlow.js
Thursday, June 7, 2018
Posted by Nicola Pezzotti, Software Engineering Intern, Google Zürich
In recent years, the t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. Used to interpret deep neural network outputs in tools such as the TensorFlow Embedding Projector and TensorBoard, a powerful feature of tSNE is that it reveals clusters of high-dimensional data points at different scales while requiring only minimal tuning of its parameters. Despite these advantages, the computational complexity of the tSNE algorithm limits its application to relatively small datasets. While several evolutions of tSNE have been developed to address this issue (mainly focusing on the scalability of the similarity computations between data points), they have so far not been enough to provide a truly interactive experience when visualizing the evolution of the tSNE embedding for large datasets.
In “Linear tSNE Optimization for the Web”, we present a novel approach to tSNE that heavily relies on modern graphics hardware. Given the linear complexity of the new approach, our method generates embeddings faster than comparable techniques and can even be executed on the client side in a web browser by leveraging GPU capabilities through WebGL. The combination of these two factors allows for real-time interactive visualization of large, high-dimensional datasets. Furthermore, we are releasing this work as an open source library in the TensorFlow.js family in the hopes that the broader research community finds it useful.
Real-time evolution of the tSNE embedding for the complete MNIST dataset with our technique. The dataset contains images of 60,000 handwritten digits. You can find a live demo here.
The aim of tSNE is to cluster small “neighborhoods” of similar data points while also reducing the overall dimensionality of the data so it is more easily visualized. In other words, the tSNE objective function measures how well these neighborhoods of similar data are preserved in the 2- or 3-dimensional space, and arranges them into clusters accordingly.
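For reference, the standard tSNE objective (from the original tSNE formulation, not anything specific to this post) is the Kullback-Leibler divergence between the high-dimensional similarities P and the low-dimensional similarities Q defined with a Student-t kernel:

C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \qquad q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},

where the y_i are the 2D (or 3D) embedding coordinates being optimized.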
In previous work, the minimization of the tSNE objective was performed as an N-body simulation problem, in which points are randomly placed in the embedding space and two different types of forces are applied to each point. Attractive forces bring the points closer to the points that are most similar in the high-dimensional space, while repulsive forces push them away from all the neighbors in the embedding.
While the attractive forces act on only a small subset of points (i.e., similar neighbors), repulsive forces are in effect between all pairs of points. Due to this, tSNE requires significant computation and many iterations of the objective function, which limits the possible dataset size to just a few hundred data points. To improve over a brute force solution, the Barnes-Hut algorithm was used to approximate the repulsive forces and the gradient of the objective function. This allows scaling of the computation to tens of thousands of data points, but it requires more than 15 minutes to compute the MNIST embedding in a C++ implementation.
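To make the N-body formulation concrete, the sketch below computes the exact, brute-force tSNE gradient in NumPy: the attractive term comes from the high-dimensional similarities P, the repulsive term from all pairs of embedded points. This O(N²) computation is what Barnes-Hut, and the WebGL texture approach described next, approximate; it is an illustration, not the released library.

```python
# Brute-force tSNE gradient: attractive forces from P, repulsive forces from
# every pair of points in the 2D embedding (the O(N^2) bottleneck).
import numpy as np

def tsne_gradient(P, Y):
    """P: (N, N) symmetric high-dimensional similarities; Y: (N, 2) embedding."""
    diff = Y[:, None, :] - Y[None, :, :]        # (N, N, 2) pairwise y_i - y_j
    inv = 1.0 / (1.0 + (diff ** 2).sum(-1))     # Student-t kernel on distances
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()                         # low-dimensional similarities
    # (p_ij - q_ij) > 0 attracts point i toward j; < 0 repels it away.
    forces = 4.0 * ((P - Q) * inv)[:, :, None] * diff
    return forces.sum(axis=1)                   # (N, 2) gradient per point

# One gradient-descent step: Y -= learning_rate * tsne_gradient(P, Y)
```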
In our paper, we propose a solution to this scaling problem by approximating the gradient of the objective function using textures that are generated in WebGL. Our technique draws a “repulsive field” at every minimization iteration using a three channel texture, with the 3 components treated as colors and drawn in the RGB channels. The repulsive field is obtained for every point to represent both the horizontal and vertical repulsive force created by the point, and a third component used for normalization. Intuitively, the normalization term ensures that the magnitude of the shifts matches the similarity measure in the high-dimensional space. In addition, the resolution of the texture is adaptively changed to keep the number of pixels drawn constant.
Rendering of the three functions used to approximate the repulsive effect created by a single point. In the above figure, the repulsive forces show that a point in a blue area is pushed to the left/bottom, a point in a red area is pushed to the right/top, and a point in the white region does not move.
The contribution of every point is then added on the GPU, resulting in a texture similar to those presented in the GIF below, that approximate the repulsive fields. This innovative repulsive field approach turns out to be much more GPU friendly than the more commonly used calculation of point-to-point interactions. This is because repulsion for multiple points can be computed at once, and very quickly, on the GPU. In addition, we implemented the computation of the attraction between points on the GPU.
This animation shows the evolution of the tSNE embedding (upper left) and of the scalar fields used to approximate its gradient with normalization term (upper right), horizontal shift (bottom left) and vertical shift (bottom right).
We additionally revised the update of the embedding from an ad-hoc implementation to a series of standard tensor operations that are computed in TensorFlow.js, a JavaScript library to perform tensor computations in the web browser. Our approach, which is released as an open source library in the TensorFlow.js family, allows us to compute the evolution of the tSNE embedding entirely on the GPU while having better computational complexity.
With this implementation, what used to take 15 minutes to calculate (on the MNIST dataset) can now be visualized in real-time and in the web browser. Furthermore, this allows real-time visualizations of much larger datasets, a feature that is particularly useful when deep neural output is analyzed. One main limitation of our work is that this technique currently only works for 2D embeddings. However, 2D visualizations are often preferred over 3D ones, as 3D embeddings require more interaction to effectively understand cluster results.
Future Work
We believe that having a fast and interactive tSNE implementation that runs in the browser will empower developers of data analytics systems. We are particularly interested in exploring how our implementation can be used for the interpretation of deep neural networks. Additionally, our implementation shows how lateral thinking in using GPU computations (approximating the gradient using RGB textures) can be used to significantly speed up algorithmic computations. In the future we will be exploring how this kind of gradient approximation can be applied not only to speed up other dimensionality reduction algorithms, but also to implement other N-body simulations in the web browser using TensorFlow.js.
Acknowledgements
We would like to thank Alexander Mordvintsev, Yannick Assogba, Matt Sharifi, Anna Vilanova, Elmar Eisemann, Nikhil Thorat, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, Alessio Bazzica, Boudewijn Lelieveldt, Thomas Höllt, Baldur van Lew, Julian Thijssen and Marvin Ritter.
Announcing an updated YouTube-8M, and the 2nd YouTube-8M Large-Scale Video Understanding Challenge and Workshop
Tuesday, June 5, 2018
Posted by Joonseok Lee, Software Engineer, Google AI
Last year, we organized the first YouTube-8M Large-Scale Video Understanding Challenge with Kaggle, in which 742 teams consisting of 946 individuals from 60 countries used the YouTube-8M dataset (2017 edition) to develop classification algorithms which accurately assign video-level labels. The purpose of the competition was to accelerate improvements in large-scale video understanding, representation learning, noisy data modeling, transfer learning and domain adaptation approaches that can help improve the machine-learning models that classify video. In addition to the competition, we hosted an affiliated workshop at CVPR’17, inviting competition top-performers and researchers to share their ideas on how to advance the state-of-the-art in video understanding.
As a continuation of these efforts to accelerate video understanding, we are excited to announce another update to the YouTube-8M dataset, a new Kaggle video understanding challenge and an affiliated 2nd Workshop on YouTube-8M Large-Scale Video Understanding, to be held at the 2018 European Conference on Computer Vision (ECCV'18).
An Updated YouTube-8M Dataset (2018 Edition)
Our YouTube-8M (2018 edition) features a major improvement in the quality of annotations, obtained using a machine learning system that combines audio-visual content with title, description and other metadata to provide more accurate ground truth annotations. The updated version contains 6.1 million URLs, labeled with a vocabulary of 3,862 visual entities, with each video annotated with one or more labels and an average of 3 labels per video. We have also updated the starter code, with updated instructions for downloading and training TensorFlow video annotation models on the dataset.
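As a rough sketch of how such records can be consumed, the snippet below parses video-level tfrecords with TensorFlow 1.x. The feature names ("id", "labels", "mean_rgb", "mean_audio") and the example file path are assumptions based on the dataset's starter-code conventions; please defer to the official starter code linked above.

```python
# Hedged sketch: parse video-level YouTube-8M examples (TF 1.x style).
import tensorflow as tf

def parse_video_level_example(serialized):
    features = tf.parse_single_example(
        serialized,
        features={
            "id": tf.FixedLenFeature([], tf.string),
            "labels": tf.VarLenFeature(tf.int64),            # subset of 3,862 entities
            "mean_rgb": tf.FixedLenFeature([1024], tf.float32),
            "mean_audio": tf.FixedLenFeature([128], tf.float32),
        })
    # Average-pooled visual and audio features form the model input; the
    # sparse label set is the multi-label classification target.
    inputs = tf.concat([features["mean_rgb"], features["mean_audio"]], axis=0)
    return features["id"], inputs, features["labels"]

dataset = tf.data.TFRecordDataset(["/path/to/train_shard.tfrecord"])  # placeholder path
dataset = dataset.map(parse_video_level_example)
```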
The 2nd YouTube-8M Video Understanding Challenge
The 2nd YouTube-8M Video Understanding Challenge invites participants to build audio-visual content classification models using YouTube-8M as training data, and then to label an unknown subset of test videos. Unlike last year, we strictly impose a hard limit on model size, encouraging participants to advance a single model within a tight budget rather than assembling as many models as possible. Each of the top 5 teams will be awarded $5,000 to support their travel to Munich to attend ECCV’18. For details, please visit the Kaggle competition page.
The 2nd Workshop on YouTube-8M Large-Scale Video Understanding
To be held at ECCV’18, the workshop will consist of invited talks by distinguished researchers, as well as presentations by top-performing challenge participants in order to facilitate the exchange of ideas. We encourage those who wish to attend to submit papers describing their research, experiments, or applications based on the YouTube-8M dataset, including papers summarizing their participation in the challenge above. Please refer to the workshop page for more details.
It is our hope that this update to the dataset, along with the new challenge and workshop, will continue to advance the research in large-scale video understanding. We hope you will join us again!
Acknowledgements
This post reflects the work of many machine perception researchers including Sami Abu-El-Haija, Ke Chen, Nisarg Kothari, Joonseok Lee, Hanhan Li, Paul Natsev, Sobhan Naderi Parizi, Rahul Sukthankar, George Toderici, Balakrishnan Varadarajan, as well as Sohier Dane, Julia Elliott, Wendy Kan and Walter Reade from Kaggle. We are also grateful for the support and advice from our partners at YouTube.