Research Blog: February 2017

Google Research Awards 2016

Thursday, February 23, 2017

Posted by Maggie Johnson, Director of Education and University Relations, Google

The subject areas that received the most support were machine learning, machine perception, networking and systems.

Proposals related to machine learning represented 20% of the total submissions received, up from 12% in 2015.

Proportionally, proposals from Europe had a 4% higher acceptance rate, attributed to our increased research presence in Zürich.


Preprocessing for Machine Learning with tf.Transform

Wednesday, February 22, 2017

Posted by Kester Tong, David Soergel, and Gus Katsiapis, Software Engineers

tf.Transform allows users to define a preprocessing pipeline. Users can materialize the preprocessed data for use in TensorFlow training, and also export a tf.Transform graph that encodes the transformations as a TensorFlow graph. This transformation graph can then be incorporated into the model graph used for inference.
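The split between a full-pass analysis and a per-example transformation can be illustrated without TensorFlow. The sketch below is plain Python, not the tf.Transform API: the function names `analyze` and `make_transform` and the `_scaled` column suffix are hypothetical. It only shows why statistics computed once over the training data should be frozen into the function that is reused at inference time.

```python
import statistics


def analyze(rows, column):
    """Full pass over the dataset: compute the statistics the transform needs."""
    values = [row[column] for row in rows]
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}


def make_transform(stats, column):
    """Return a per-example function with the statistics baked in as constants."""
    mean, stdev = stats["mean"], stats["stdev"]

    def transform(row):
        out = dict(row)
        # Guard against a constant column, where the standard deviation is zero.
        out[column + "_scaled"] = (row[column] - mean) / stdev if stdev else 0.0
        return out

    return transform


# Hypothetical toy data standing in for a training set.
training_rows = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
transform = make_transform(analyze(training_rows, "x"), "x")

# "Materialize" the preprocessed training data.
materialized = [transform(row) for row in training_rows]

# Reuse the very same function on a request seen at inference time.
served = transform({"x": 2.0})
```

Because `served` is produced by the same function that produced `materialized`, the training and serving paths cannot drift apart, which is the property the exported transformation graph is meant to provide.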


Headset “Removal” for Virtual and Mixed Reality

Tuesday, February 21, 2017

Posted by Vivek Kwatra, Research Scientist and Christian Frueh, Avneesh Sud, Software Engineers

A VR user captured in front of a green screen is blended with the virtual environment to generate the MR output. Traditional MR output has the user's face occluded, while our result reveals the face. Note how the headset is modified with a marker to aid tracking.

Dynamic face model capture

On the left, the user’s face is captured by a camera as she tracks a marker on the monitor with her eyes. On the right, we show the dynamic nature of the reconstructed 3D face model: by moving or clicking the mouse, we are able to simulate both apparent eye gaze and blinking.

Calibration and Alignment

Compositing and Rendering

Results and Extensions

An artist creates 3D art using Google Tilt Brush, shown in Mixed Reality. On the top is the traditional MR result where the face is hidden behind the headset. On the bottom is our result, which reveals the entire face and eyes for a more natural and engaging experience.

The CS Capacity Program - New Tools and SIGCSE 2017

Thursday, February 16, 2017

Posted by Chris Stephenson, Head of Computer Science Education Strategy

MaGE Program Students and Faculty from Mount Holyoke College


An updated YouTube-8M, a video understanding challenge, and a CVPR workshop. Oh my!

Wednesday, February 15, 2017

Posted by Paul Natsev, Software Engineer

An Updated YouTube-8M

A tree-map visualization of the updated YouTube-8M dataset, organized into 24 high-level verticals, including the top-200 most frequent entities, plus the top-5 entities for each vertical.

Sample videos from the top-18 high-level verticals in the YouTube-8M dataset.

The Google Cloud & YouTube-8M Video Understanding Challenge

The CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding

Announcing TensorFlow 1.0

Wednesday, February 15, 2017

Posted by Amy McDonald Sandjideh, Technical Program Manager, TensorFlow

Python APIs have been changed to resemble NumPy more closely. For this and other backwards-incompatible changes made to support API stability going forward, please use our handy migration guide and conversion script.

Experimental APIs for Java and Go

Higher-level API modules tf.layers, tf.metrics, and tf.losses - brought over from tf.contrib.learn after incorporating skflow and TF Slim

Experimental release of XLA, a domain-specific compiler for TensorFlow graphs, that targets CPUs and GPUs. XLA is rapidly evolving - expect to see more progress in upcoming releases.

Introduction of the TensorFlow Debugger (tfdbg), a command-line interface and API for debugging live TensorFlow programs.

New Android demos for object detection and localization, and camera-based image stylization.

Installation improvements: Python 3 docker images have been added, and TensorFlow’s pip packages are now PyPI compliant. This means TensorFlow can now be installed with a simple invocation of pip install tensorflow.


Click here for a link to the livestream and video playlist (individual talks will be posted online later in the day).


On-Device Machine Intelligence

Thursday, February 09, 2017

Posted by Sujith Ravi, Staff Research Scientist, Google Research

Learning with Projections

Projection step: Similar messages are grouped together and projected to nearby vectors. For example, the messages "hey, how's it going?" and "How's it going buddy?" share similar content and might be projected to the same vector 11100011. Another related message “Howdy, everything going well?” is mapped to a nearby vector 11100110 that differs only in 2 bits.
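To make the bit-vector idea concrete, here is a minimal sketch of one well-known locality sensitive hashing scheme (a SimHash-style signature over character trigrams). It is an assumption chosen for illustration, not the projection used in Smart Reply; the helper names `char_ngrams`, `project`, and `hamming` are hypothetical. Messages that share many trigrams tend to agree on more bits, though with only 8 bits the signatures are coarse.

```python
import hashlib


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def char_ngrams(text, n=3):
    """Set of character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return {normalized}
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def project(text, bits=8):
    """SimHash-style signature: every n-gram votes +1 or -1 on every bit."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256 (SHA-256 digest size)")
    totals = [0] * bits
    for gram in char_ngrams(text):
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        for i in range(bits):
            bit = (digest[i // 8] >> (i % 8)) & 1
            totals[i] += 1 if bit else -1
    # A bit is set when the n-grams voting for it outnumber those voting against.
    return "".join("1" if total > 0 else "0" for total in totals)


# The two vectors quoted in the caption differ in exactly two positions.
caption_distance = hamming("11100011", "11100110")
```

Calling `project("hey, how's it going?")` returns an 8-character string of 0s and 1s; comparing two such strings with `hamming` gives a cheap similarity signal that needs no vocabulary or embedding table on the device.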


Learning step: (Top) Messages along with projections and corresponding replies, if available, are used in a machine learning framework to jointly learn a “message projection model”. (Bottom) The message projection model learns to associate replies with the projections of the corresponding incoming messages. For example, the model projects two different messages “Howdy, everything going well?” and “How’s it going buddy?” (bottom center) to nearby bit vectors and learns to map these to relevant replies (bottom right).

Inference step: The model applies the learned projections to an incoming message (or sequence of messages) and suggests relevant and diverse replies. Inference is performed on the device, allowing the model to adapt to user data and personal writing styles.

Converse from Your Wrist

You can now use this feature to respond to your messages directly from your Google watches or any watch that runs Android Wear 2.0. It is already enabled on Google Hangouts, Google Messenger, and many third-party messaging apps. We also provide an API for developers of third-party Wear apps.

Announcing TensorFlow Fold: Deep Learning With Dynamic Computation Graphs

Tuesday, February 07, 2017

Posted by Moshe Looks, Marcello Herreshoff and DeLesley Hutchins, Software Engineers

This animation shows a recursive neural network run with dynamic batching. Operations with the same color are batched together, which lets TensorFlow run them faster. The Embed operation converts words to vector representations. The fully connected (FC) operation combines word vectors to form vector representations of phrases. The output of the network is a vector representation of an entire sentence. Although only a single parse tree of a sentence is shown, the same network can run, and batch together operations, over multiple parse trees of arbitrary shapes and sizes.
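The grouping described in the caption can be sketched in a few lines of plain Python. This is not the TensorFlow Fold API: the tree encoding (a string is a word, a 2-tuple is a phrase) and the function name `schedule` are assumptions made for illustration. The sketch only computes which operations could share a batch, by grouping nodes from several parse trees by depth and operation type.

```python
from collections import defaultdict


def schedule(trees):
    """Group nodes of binary parse trees into batches keyed by (depth, op).

    A leaf (str) is an "embed" op at depth 0. An internal node (2-tuple) is an
    "fc" op whose depth is one more than its deepest child, so every input it
    needs is produced by an earlier batch.
    """
    batches = defaultdict(list)

    def visit(node):
        if isinstance(node, str):
            batches[(0, "embed")].append(node)
            return 0
        left, right = node
        depth = 1 + max(visit(left), visit(right))
        batches[(depth, "fc")].append(node)
        return depth

    for tree in trees:
        visit(tree)
    # Sort by depth so batches can be executed in dependency order.
    return dict(sorted(batches.items()))


# Two hypothetical parse trees of different shapes.
trees = [(("the", "cat"), "sat"), ("dogs", "bark")]
plan = schedule(trees)
```

For these two trees, all five word embeddings land in one batch, the two depth-1 phrase combinations share a second batch, and the remaining depth-2 combination runs alone, so three batched steps replace eight separate operations.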


Advancing Research on Video Understanding with the YouTube-BoundingBoxes Dataset

Monday, February 06, 2017

Posted by Esteban Real, Vincent Vanhoucke, Jonathon Shlens, Google Brain team and Stefano Mazzocchi, Google Research

Summary of dataset statistics. Bar Chart: Relative number of detections in existing image (red) and video (blue) datasets. The YouTube-BoundingBoxes dataset (YT-BB) is at the bottom. Table: The three columns are counts for: classification annotations, bounding boxes, and unique videos with bounding boxes. Full details on the dataset can be found in the preprint.


Three video segments, sampled at 1 frame per second. The final frame of each example shows how it is visually challenging to recognize the bounded object, due to blur or occlusion (train example, blue arrow). However, temporally-related frames, where the object has been more clearly identified, can allow object classes to be inferred. Note how only visible parts are included in the box: the orange arrow in the bear example (middle row) points to the hidden head. The dog example illustrates tight bounding boxes that track the tail (orange arrows) and foot (blue arrows). The airplane example illustrates how partial objects are annotated (first frame) and tracked across changes in perspective, occlusions and camera cuts.


Using Machine Learning to Predict Parking Difficulty

Friday, February 03, 2017

Posted by James Cook, Yechen Li, Software Engineers and Ravi Kumar, Research Scientist

"When Solomon said there was a time and a place for everything he had not encountered the problem of parking his automobile."
- Bob Edwards, Broadcast Journalist

Parking availability is highly variable, based on factors like the time, day of week, weather, special events, holidays, and so on. Compounding the problem, there is almost no real-time information about free parking spots.

Even in areas with internet-connected parking meters providing information on availability, this data doesn’t account for those who park illegally, park with a permit, or depart early from still-paid meters.

Roads form a mostly-planar graph, but parking structures may be more complex, with traffic flows across many levels, possibly with different layouts.

Both the supply and the demand for parking are in constant flux, so even the best system is at risk of being outdated as soon as it’s built.

Ground Truth Data

Model Features

Model Selection & Training

Results

Output of our parking difficulty model in the Financial District and Union Square areas of San Francisco. Red denotes a higher confidence that parking is difficult. Top row: a typical Monday at ~8am (left) and ~9pm (right). Bottom row: the same times but on a typical Saturday.
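A confidence score like the one shaded in these maps is the kind of output a logistic regression model produces. The sketch below shows how such a score is computed from features. Everything specific in it is hypothetical: the feature names, the weights, and the bias are illustrative values, not those of the production model.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def parking_difficulty(features, weights, bias):
    """Logistic regression: probability that parking is difficult."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights and features, chosen only to illustrate the arithmetic.
WEIGHTS = {"circling_fraction": 3.0, "is_weekday_morning": 1.0}
BIAS = -2.0

quiet = parking_difficulty(
    {"circling_fraction": 0.1, "is_weekday_morning": 0.0}, WEIGHTS, BIAS
)
busy = parking_difficulty(
    {"circling_fraction": 0.8, "is_weekday_morning": 1.0}, WEIGHTS, BIAS
)
```

With these made-up numbers, `quiet` falls below 0.5 and `busy` above it, so a threshold at 0.5 would label the first location easy and the second difficult.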
