Google Research Blog
The latest news from Research at Google
Understanding Bias in Peer Review
Thursday, November 30, 2017
Posted by Andrew Tomkins, Director of Engineering and William D. Heavlin, Statistician, Google Research
In the 1600s, a series of practices came into being known collectively as the “scientific method.” These practices encoded verifiable experimentation as a path to establishing scientific fact. Scientific literature arose as a mechanism to validate and disseminate findings, and standards of scientific peer review developed as a means to control the quality of entrants into this literature. Over the course of the development of peer review, one key structural question remains unresolved to the current day: should the reviewers of a piece of scientific work be made aware of the identity of the authors? Those in favor argue that such additional knowledge may allow the reviewer to set the work in perspective and evaluate it more completely. Those opposed argue instead that the reviewer may form an opinion based on past performance rather than the merit of the work at hand.
Existing academic literature on this subject describes specific forms of bias that may arise when reviewers are aware of the authors. In 1968, Merton proposed the Matthew effect, whereby credit goes to the best established researchers. More recently, Knobloch-Westerwick et al. proposed a Matilda effect, whereby papers from male-first authors were considered to have greater scientific merit than those from female-first authors. But with the exception of one classical study performed by Rebecca Blank in 1991 at the American Economic Review, there have been few controlled experimental studies of such effects on reviews of academic papers.
Last year we had the opportunity to explore this question experimentally, resulting in “Reviewer bias in single- versus double-blind peer review,” a paper that just appeared in the Proceedings of the National Academy of Sciences. Working with Professor Min Zhang of Tsinghua University, we performed an experiment during the peer review process of the 10th ACM Web Search and Data Mining Conference (WSDM 2017) to compare the behavior of reviewers under single-blind and double-blind review. Our experiment ran as follows:
We invited a number of experts to join the conference Program Committee (PC).
We randomly split these PC members into a single-blind cadre and a double-blind cadre.
We asked all PC members to “bid” for papers they were qualified to review, but only the single-blind cadre had access to the names and institutions of the paper authors.
Based on the resulting bids, we then allocated two single-blind and two double-blind PC members to each paper.
Each PC member read his or her assigned papers and entered reviews, again with only single-blind PC members able to see the authors and institutions.
At this point, we closed our experiment and performed the remainder of the conference reviewing process under the single-blind model. As a result, we were able to assess the difference in bidding and reviewing behavior of single-blind and double-blind PC members on the same papers. We discovered a number of surprises.
Our first finding shows that compared to their double-blind counterparts, single-blind PC members tend to enter higher scores for papers from top institutions (the finding holds for both universities and companies) and for papers written by well-known authors. This suggests that a paper authored by an up-and-coming researcher might be reviewed more negatively (by a single-blind PC member) than exactly the same paper written by an established star of the field.
Digging a little deeper, we show some additional findings related to the “bidding process,” in which PC members indicate which papers they would like to review. We found that single-blind PC members (a) bid for about 22% fewer papers than their double-blind counterparts, and (b) bid preferentially for papers from top schools and companies. Finding (a) is especially intriguing; with no author information, double-blind reviewers have less to go on, arguably making the job of weighing the merit of each paper more difficult. Yet the double-blind reviewers bid for more work, not less, than their single-blind counterparts. This suggests that double-blind reviewers become more engaged in the review process. Finding (b) is less surprising, but nonetheless enlightening: in the presence of author names and institutions, this information is incorporated into the reviewers’ bids. All else being equal, the odds that single-blind reviewers bid on papers from top institutions are about 15 percent above parity.
We also studied whether the actual or perceived gender of authors influenced the behavior of single-blind versus double-blind reviewers. Here the results are a little more nuanced. Compared to double-blind reviewers, we saw about a 22% decrease in the odds that a single-blind reviewer would give a female-authored paper a favorable review, but due to the smaller count of female-authored papers this result was not statistically significant. In an extended version of our paper, we consider our study as well as a range of other studies in the literature and perform a “meta-analysis” of all these results. From this larger pool of observations, the combined results do show a significant finding for the gender effect.
To conclude, we see that the practice of double-blind reviewing yields a denser landscape of bids, which may result in a better allocation of papers to qualified reviewers. We also see that reviewers who see author and institution information tend to bid more for papers from top institutions, and are more likely to vote to accept papers from top institutions or famous authors than their double-blind counterparts. This offers some evidence to suggest that a particular piece of work might be accepted under single-blind review if the authors are famous or come from top institutions, but rejected otherwise. Of course, the situation remains complex: double-blind review imposes an administrative burden on conference organizers, reduces the opportunity to detect several varieties of conflict of interest, and may in some cases be difficult to implement due to the existence of pre-prints or long-running research agendas that are well-known to experts in the field. Nonetheless, we recommend that journal editors and conference chairs carefully consider the merits of double-blind review.
Please take a look at our full paper for more details of our study.
Interpreting Deep Neural Networks with SVCCA
Tuesday, November 28, 2017
Posted by Maithra Raghu, Google Brain Team
Deep Neural Networks (DNNs) have driven unprecedented advances in areas such as vision, language understanding and speech recognition. But these successes also bring new challenges. In particular, contrary to many previous machine learning methods, DNNs can be susceptible to adversarial examples in classification, catastrophic forgetting of tasks in reinforcement learning, and mode collapse in generative modelling. In order to build better and more robust DNN-based systems, it is critically important to be able to interpret these models. In particular, we would like a notion of representational similarity for DNNs: can we effectively determine when the representations learned by two neural networks are the same?
In our paper, “SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability,” we introduce a simple and scalable method to address these points. Two specific applications we look at are comparing the representations learned by different networks, and interpreting representations learned by hidden layers in DNNs. Furthermore, we are open sourcing the code so that the research community can experiment with this method.
Key to our setup is the interpretation of each neuron in a DNN as an activation vector. As shown in the figure below, the activation vector of a neuron collects the scalar outputs it produces on the input data. For example, for 50 input images, a neuron in a DNN will output 50 scalar values, encoding how much it responds to each input. These 50 scalar values then make up the activation vector for the neuron. (Of course, in practice, we take many more than 50 inputs.)
Here a DNN is given three inputs, x₁, x₂, x₃. Looking at a neuron inside the DNN (bolded in red, right pane), this neuron produces a scalar output zᵢ corresponding to each input xᵢ. These values form the activation vector of the neuron.
With this basic observation and a little more formulation, we introduce Singular Vector Canonical Correlation Analysis (SVCCA), a technique for taking in two sets of neurons and outputting aligned feature maps learned by both of them. Critically, this technique accounts for superficial differences such as permutations in neuron orderings (crucial for comparing different networks), and can detect similarities where other, more straightforward comparisons fail.
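For readers who want to experiment before diving into the paper or the released code, here is a minimal numpy sketch of the two stages the name suggests — a singular-vector (SVD) step per network followed by CCA across networks — on activation matrices of shape (num_datapoints, num_neurons). It is only illustrative; the open-sourced implementation handles convolutional layers, variance thresholds, and numerical details omitted here.

```python
import numpy as np

def svcca_similarity(acts1, acts2, var_kept=0.99):
    """Toy SVCCA: acts1/acts2 are (num_datapoints, num_neurons) activation matrices."""
    def svd_reduce(acts):
        acts = acts - acts.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(acts, full_matrices=False)
        # Singular Vector step: keep the top directions explaining `var_kept` of the variance.
        k = int(np.searchsorted(np.cumsum(s ** 2) / np.sum(s ** 2), var_kept)) + 1
        return u[:, :k] * s[:k]

    x, y = svd_reduce(acts1), svd_reduce(acts2)
    # CCA step: canonical correlations are the singular values of the product of
    # orthonormal bases for the two reduced subspaces.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    canonical_corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(canonical_corrs.mean())  # single similarity score in [0, 1]
```

Comparing, say, a layer of net1 with a layer of net2 then amounts to calling svcca_similarity on their activations recorded over the same set of inputs.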
As an example, consider training two convolutional neural nets (net1 and net2, below) on CIFAR-10, a medium scale image classification task. To visualize the results of our method, we compare activation vectors of neurons with the aligned features output by SVCCA. Recall that the activation vector of a neuron is its raw scalar outputs on the input images. The x-axis of the plot consists of images sorted by class (gray dotted lines showing class boundaries), and the y-axis shows the output value of the neuron.
On the left pane, we show the two highest activation (largest euclidean norm) neurons in net1 and net2. Examining the highest activation neurons has been a popular method to interpret DNNs in computer vision, but in this case, the highest activation neurons in net1 and net2 have no clear correspondence, despite both being trained on the same task. However, after applying SVCCA (right pane), we see that the latent representations learned by both networks do indeed share some very similar features. Note that the top two rows representing aligned feature maps are close to identical, as are the second highest aligned feature maps (bottom two rows). Furthermore, these aligned mappings in the right pane also show a clear correspondence with the class boundaries, e.g. we see the top pair give negative outputs for Class 8, with the bottom pair giving a positive output for Class 2 and Class 7.
While SVCCA can be applied across networks, it can also be applied to the same network across time, enabling the study of how different layers in a network converge to their final representations. Below, we show panes that compare the representation of layers in net1 during training (y-axes) with the layers at the end of training (x-axes). For example, in the top left pane (titled “0% trained”), the x-axis shows layers of increasing depth of net1 at 100% trained, and the y-axis shows layers of increasing depth at 0% trained. Each (i,j) square then tells us how similar the representation of layer i at 100% trained is to layer j at 0% trained. The input layer is at the bottom left, and is (as expected) identical at 0% and 100% trained. We make this comparison at several points through training, at 0%, 35%, 75% and 100%, for convolutional (top row) and residual (bottom row) nets on CIFAR-10.
Plots showing learning dynamics of convolutional and residual networks on CIFAR-10. Note the additional structure also visible: the 2x2 blocks in the top row are due to batch norm layers, and the checkered pattern in the bottom row due to residual connections.
We find evidence of bottom-up convergence, with layers closer to the input converging first, and layers higher up taking longer to converge. This suggests a faster training method, Freeze Training — see our paper for details. Furthermore, this visualization also helps highlight properties of the network. In the top row, there are a couple of 2x2 blocks. These correspond to batch normalization layers, which are representationally identical to their preceding layers. On the bottom row, towards the end of training, we can see a checkerboard-like pattern appear, which is due to the residual connections of the network having greater similarity to previous layers.
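For readers who want to reproduce this kind of plot, the grids above can be assembled from pairwise SVCCA scores. The sketch below assumes you have saved per-layer activation matrices (over a fixed set of inputs) at a training checkpoint and at the end of training, and reuses the svcca_similarity function sketched earlier.

```python
import numpy as np

def layer_similarity_grid(acts_at_checkpoint, acts_at_end):
    """Each argument is a list of (num_datapoints, num_neurons) arrays, one per layer."""
    grid = np.zeros((len(acts_at_end), len(acts_at_checkpoint)))
    for i, final_layer in enumerate(acts_at_end):
        for j, partial_layer in enumerate(acts_at_checkpoint):
            # grid[i, j]: similarity of layer i (100% trained) to layer j (checkpoint).
            grid[i, j] = svcca_similarity(final_layer, partial_layer)
    return grid
```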
So far, we’ve concentrated on applying SVCCA to CIFAR-10. But by applying preprocessing techniques such as the Discrete Fourier transform, we can scale this method to ImageNet-sized models. We applied this technique to the ImageNet ResNet, comparing the similarity of latent representations to representations corresponding to different classes:
SVCCA similarity of latent representations with different classes. We take different layers in the ImageNet ResNet, with 0 indicating input and 74 indicating output, and compare representational similarity of the hidden layer and the output class. Interestingly, different classes are learned at different speeds: the firetruck class is learned faster than the different dog breeds. Furthermore, the two pairs of dog breeds (a husky-like pair and a terrier-like pair) are learned at the same rate, reflecting the visual similarity between them.
Our paper gives further details on the results we’ve explored so far, and also touches on different applications, e.g. compressing DNNs by projecting onto the SVCCA outputs, and Freeze Training, a computationally cheaper method for training deep networks. There are many follow-ups we’re excited about exploring with SVCCA — moving on to different kinds of architectures, comparing across datasets, and better visualizing the aligned directions are just a few ideas we’re eager to try out. We look forward to presenting these results next week at NIPS 2017 in Long Beach, and we hope the code will also encourage many people to apply SVCCA to their network representations to interpret and understand what their networks are learning.
Understanding Medical Conversations
Tuesday, November 21, 2017
Posted by Katherine Chou, Product Manager and Chung-Cheng Chiu, Software Engineer, Google Brain Team
Good documentation helps create good clinical care by communicating a doctor's thinking, their concerns, and their plans to the rest of the team. Unfortunately, physicians routinely spend more time doing documentation than doing what they love most — caring for patients. Doctors often spend ~6 hours of an 11-hour workday in the Electronic Health Records (EHR) system on documentation.[1] Consequently, one study found that more than half of surveyed doctors report at least one symptom of burnout.[2]
In order to help offload note-taking, many doctors have started using medical scribes as a part of their workflow. These scribes listen to the patient-doctor conversations and create notes for the EHR. According to a recent study, introducing scribes not only improved physician satisfaction, but also medical chart quality and accuracy.[3] But the number of doctor-patient conversations that need a scribe is far beyond the capacity of the people who are available for medical scribing.
We wondered: could the voice recognition technologies already available in Google Assistant, Google Home, and Google Translate be used to document patient-doctor conversations and help doctors and scribes summarize notes more quickly?
In “Speech Recognition for Medical Conversations”, we show that it is possible to build Automatic Speech Recognition (ASR) models for transcribing medical conversations. While most current ASR solutions in the medical domain focus on transcribing doctor dictations (i.e., single-speaker speech consisting of predictable medical terminology), our research shows that it is possible to build an ASR model that can handle multi-speaker conversations covering everything from the weather to complex medical diagnoses.
Using this technology, we will start working with physicians and researchers at Stanford University, who have done extensive research on how scribes can improve physician satisfaction, to understand how deep learning techniques such as ASR can facilitate the scribing process of physician notes. In our pilot study, we investigate what types of clinically relevant information can be extracted from medical conversations to assist physicians in reducing their interactions with the EHR. The study is fully patient-consented and the content of the recordings will be de-identified to protect patient privacy.
We hope these technologies will not only help return joy to practice by assisting doctors and scribes with their everyday workload, but also help patients get more dedicated and thorough medical attention, ideally leading to better care.
1. http://www.annfammed.org/content/15/5/419.full
2. http://www.mayoclinicproceedings.org/article/S0025-6196%2815%2900716-8/abstract
3. http://www.annfammed.org/content/15/5/427.full
SLING: A Natural Language Frame Semantic Parser
Wednesday, November 15, 2017
Posted by Michael Ringgaard, Software Engineer and Rahul Gupta, Research Scientist
Until recently, most practical natural language understanding (NLU) systems used a pipeline of analysis stages, from part-of-speech tagging and dependency parsing to steps that computed a semantic representation of the input text. While this facilitated easy modularization of the different analysis stages, errors in earlier stages would have cascading effects on later stages and on the final representation, and the intermediate stage outputs might not be relevant on their own. For example, a typical pipeline might perform the task of dependency parsing in an early stage and the task of coreference resolution towards the end. If one were only interested in the output of coreference resolution, it would still be affected by cascading errors from the dependency parsing stage.
Today we are announcing SLING, an experimental system for parsing natural language text directly into a representation of its meaning as a semantic frame graph. The output frame graph directly captures the semantic annotations of interest to the user, while avoiding the pitfalls of pipelined systems by not running any intermediate stages, which additionally avoids unnecessary computation. SLING uses a special-purpose recurrent neural network model to compute the output representation of input text through incremental editing operations on the frame graph. The frame graph, in turn, is flexible enough to capture many semantic tasks of interest (more on this below). SLING's parser is trained using only the input words, bypassing the need to produce any intermediate annotations (e.g. dependency parses).
SLING provides fast parsing at inference time by providing (a) an efficient and scalable frame store implementation and (b) a JIT compiler that generates efficient code to execute the recurrent neural network. Although SLING is experimental, it achieves a parsing speed of >2,500 tokens/second on a desktop CPU, thanks to its efficient frame store and neural network compiler. SLING is implemented in C++ and is available for download on GitHub. The entire system is also described in detail in a technical report.
Frame Semantic Parsing
Frame Semantics [1] represents the meaning of text — such as a sentence — as a set of formal statements. Each formal statement is called a frame, which can be seen as a unit of knowledge or meaning that also contains interactions with concepts or other frames typically associated with it. SLING organizes each frame as a list of slots, where each slot has a name (role) and a value, which could be a literal or a link to another frame. As an example, consider the sentence:
“Many people now claim to have predicted Black Monday.”
The figure below illustrates SLING recognizing mentions of entities (e.g. people, places, or events), measurements (e.g. dates or distances), and other concepts (e.g. verbs), and placing them in the correct semantic roles for the verbs in the input. The word predicted evokes the most dominant sense of the verb "predict", denoted as a PREDICT-01 frame. Additionally, this frame also has interactions (slots) with who made the prediction (denoted via the ARG0 slot, which points to the PERSON frame for people) and what was being predicted (denoted via ARG1, which links to the EVENT frame for Black Monday). Frame semantic parsing is the task of producing a directed graph of such frames linked through slots.
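To make the slot structure concrete, here is a minimal sketch of the frame graph for the example sentence, written as plain Python dictionaries. This is purely illustrative; SLING's actual frame store is an efficient C++ data structure, not Python objects.

```python
# Hypothetical, simplified frames for "Many people now claim to have predicted Black Monday."
person = {"type": "PERSON", "mention": "people"}
event = {"type": "EVENT", "mention": "Black Monday"}
predict = {
    "type": "PREDICT-01",  # frame evoked by the verb "predicted"
    "ARG0": person,        # slot: who made the prediction
    "ARG1": event,         # slot: what was predicted
}
```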
Although the example above is fairly simple, frame graphs are powerful enough to model a variety of complex semantic annotation tasks. For starters, frames provide a convenient way to bring together language-internal and external information types (e.g. knowledge bases). This can then be used to address complex language understanding problems such as reference, metaphor, metonymy, and perspective. The frame graphs for these tasks only differ in the inventory of frame types, roles, and any linking constraints.
SLING
SLING trains a recurrent neural network by optimizing for the semantic frames of interest.
The internal learned representations in the network’s hidden layers replace the hand-crafted feature combinations and intermediate representations in pipelined systems. Internally, SLING uses an encoder-decoder architecture where each input word is encoded into a vector using simple lexical features like the raw word, its suffix(es), punctuation, etc. The decoder uses that representation, along with recurrent features from its own history, to compute a sequence of transitions that update the frame graph to obtain the intended frame semantic representation of the input sentence. SLING trains its model using TensorFlow and DRAGNN.
The animation below shows how frames and roles are incrementally added to the under-construction frame graph using individual transitions. As discussed earlier with our simple example sentence, SLING connects the VERB and EVENT frames using the role ARG1, signifying that the EVENT frame is the concept being predicted. The EVOKE transition evokes a frame of a specified type from the next few tokens in the text (e.g. EVENT from Black Monday). Similarly, the CONNECT transition links two existing frames with a specified role. When the input is exhausted and the last transition (denoted as STOP) is executed, the frame graph is deemed complete and returned to the user, who can inspect the graph to get the semantic meaning behind the sentence.
One key aspect of our transition system is the presence of a small, fixed-size attention buffer of frames that represents the most recent frames to be evoked or modified, shown with the orange boxes in the figure above. This buffer captures the intuition that we tend to remember knowledge that was recently evoked, referred to, or enhanced. If a frame is no longer in use, it eventually gets flushed out of this buffer as new frames come into the picture. We found this simple mechanism to be surprisingly effective at capturing a large fraction of inter-frame links.
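The following sketch shows one way the EVOKE/CONNECT/STOP transitions and the attention buffer could be executed to build the example graph. The class, method names, and buffer semantics are our own simplified approximation, not SLING's actual transition system.

```python
from collections import deque

class FrameGraphBuilder:
    """Toy executor for EVOKE / CONNECT / STOP transitions with an attention buffer."""
    def __init__(self, attention_size=5):
        self.frames = []
        self.attention = deque(maxlen=attention_size)  # most recently evoked/modified frames

    def evoke(self, frame_type, span):
        frame = {"type": frame_type, "span": span}
        self.frames.append(frame)
        self.attention.appendleft(frame)               # newly evoked frame becomes most recent
        return frame

    def connect(self, source_pos, role, target_pos):
        # Positions index into the attention buffer, not the full frame list.
        source, target = self.attention[source_pos], self.attention[target_pos]
        source[role] = target
        self.attention.remove(source)
        self.attention.appendleft(source)              # a modified frame moves to the front

    def stop(self):
        return self.frames                             # graph is complete; return it

builder = FrameGraphBuilder()
builder.evoke("PERSON", "people")
builder.evoke("PREDICT-01", "predicted")
builder.evoke("EVENT", "Black Monday")
builder.connect(1, "ARG1", 0)   # PREDICT-01 --ARG1--> EVENT (Black Monday)
builder.connect(0, "ARG0", 2)   # PREDICT-01 --ARG0--> PERSON (people)
graph = builder.stop()
```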
Next Steps
The illustrative experiment above is just a launchpad for research in semantic parsing for tasks such as knowledge extraction, resolving complex references, and dialog understanding. The SLING release on GitHub comes with a pre-trained model for the task we illustrated, as well as examples and recipes to train your own parser on either the supplied synthetic data or your own data. We hope the community finds SLING useful and we look forward to engaging conversations about applying and extending SLING to other semantic parsing tasks.
Acknowledgements
The research described in this post was done by Michael Ringgaard, Rahul Gupta, and Fernando Pereira. We thank the TensorFlow and DRAGNN teams for open-sourcing their packages, and various colleagues at DRAGNN who helped us with multiple aspects of SLING's training setup.
1. Charles J. Fillmore. 1982. Frame semantics. Linguistics in the Morning Calm, pages 111–138.
On-Device Conversational Modeling with TensorFlow Lite
Tuesday, November 14, 2017
Posted by Sujith Ravi, Research Scientist, Google Expander Team
Earlier this year, we launched Android Wear 2.0, which featured the first "on-device" machine learning technology for smart messaging. This enabled cloud-based technologies like Smart Reply, previously available in Gmail, Inbox and Allo, to be used directly within any application for the first time, including third-party messaging apps, without ever having to connect to the cloud. So you can respond to incoming chat messages on the go, directly from your smartwatch.
Today, we announce TensorFlow Lite, TensorFlow’s lightweight solution for mobile and embedded devices. This framework is optimized for low-latency inference of machine learning models, with a focus on small memory footprint and fast performance. As part of the library, we have also released an on-device conversational model and a demo app that provides an example of a natural language application powered by TensorFlow Lite, in order to make it easier for developers and researchers to build new machine intelligence features powered by on-device inference. This model generates reply suggestions to input conversational chat messages, with efficient inference that can be easily plugged in to your chat application to power on-device conversational intelligence.
The on-device conversational model we have released uses a new ML architecture for training compact neural networks (as well as other machine learning models) based on a joint optimization framework, originally presented in ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections. This architecture can run efficiently on mobile devices with limited computing power and memory, by using efficient “projection” operations that transform any input to a compact bit vector representation — similar inputs are projected to nearby vectors that are dense or sparse depending on the type of projection. For example, the messages “hey, how's it going?” and “How's it going buddy?” might be projected to the same vector representation.
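To give a flavor of what such a projection operation can look like, here is a toy locality-sensitive sketch that hashes a message's tokens and projects them onto fixed random hyperplanes to get a compact bit vector. The feature hashing, dimensions, and thresholding here are illustrative assumptions, not the projection functions actually used by ProjectionNet.

```python
import numpy as np

def project_text(text, num_bits=64, num_features=4096, seed=0):
    """Toy projection: similar token sets tend to agree on more output bits."""
    rng = np.random.RandomState(seed)
    planes = rng.randn(num_bits, num_features)       # fixed random hyperplanes (never trained)
    features = np.zeros(num_features)
    for token in text.lower().split():
        features[hash(token) % num_features] += 1.0  # hashed bag-of-words features
    return (planes @ features > 0).astype(np.uint8)  # compact bit-vector representation

a = project_text("hey, how's it going?")
b = project_text("How's it going buddy?")
print((a == b).mean())  # fraction of matching bits; higher for more similar messages
```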
Using this idea, the conversational model combines these efficient operations at a low computation and memory footprint. We trained this on-device model end-to-end using an ML framework that jointly trains two types of models — a compact projection model (as described above) combined with a trainer model. The two models are trained in a joint fashion, where the projection model learns from the trainer model — the trainer is characteristic of an expert and modeled using larger and more complex ML architectures, whereas the projection model resembles a student that learns from the expert. During training, we can also stack other techniques such as quantization or distillation to achieve further compression or selectively optimize certain portions of the objective function. Once trained, the smaller projection model can be used directly for inference on device.
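A minimal sketch of what such a joint objective can look like is below: the trainer network fits the labels, while the projection network fits both the labels and the trainer's soft predictions, distillation-style. The exact loss terms and weighting used in ProjectionNet may differ; this only illustrates the student/expert coupling.

```python
import numpy as np

def joint_loss(trainer_logits, projection_logits, labels, distill_weight=0.5):
    """Sketch of a joint trainer/projection objective over a batch of examples."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def cross_entropy(target_probs, logits):
        return -np.mean(np.sum(target_probs * np.log(softmax(logits) + 1e-9), axis=-1))

    one_hot = np.eye(trainer_logits.shape[-1])[labels]
    trainer_loss = cross_entropy(one_hot, trainer_logits)        # expert fits the labels
    projection_loss = cross_entropy(one_hot, projection_logits)  # student fits the labels
    distill_loss = cross_entropy(softmax(trainer_logits), projection_logits)  # student mimics expert
    return trainer_loss + projection_loss + distill_weight * distill_loss
```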
For inference, the trained projection model is compiled into a set of TensorFlow Lite operations that have been optimized for fast execution on mobile platforms and executed directly on device. The TensorFlow Lite inference graph for the on-device conversational model is shown here.
TensorFlow Lite execution for the On-Device Conversational Model.
The open-source conversational model released today (along with code) was trained end-to-end using the joint ML architecture described above. Today’s release also includes a demo app, so you can easily download and try out one-touch smart replies on your mobile device. The architecture enables easy configuration for model size and prediction quality based on application needs. You can find a list of sample messages where this model does well here. The system can also fall back to suggesting replies from a fixed set that was learned and compiled from popular response intents observed in chat conversations. The underlying model is different from the ones Google uses for Smart Reply responses in its apps.[1]
Beyond Conversational Models
Interestingly, the ML architecture described above permits flexible choices for the underlying model. We also designed the architecture to be compatible with different machine learning approaches — for example, when used with TensorFlow deep learning, we learn a lightweight neural network (ProjectionNet) for the underlying model, whereas a different architecture (ProjectionGraph) represents the model using a graph framework instead of a neural network.
The joint framework can also be used to train lightweight on-device models for other tasks using different ML modeling architectures. As an example, we derived a ProjectionNet architecture that uses a complex feed-forward or recurrent architecture (like an LSTM) for the trainer model coupled with a simple projection architecture comprised of dynamic projection operations and a few narrow fully-connected layers. The whole architecture is trained end-to-end using backpropagation in TensorFlow and, once trained, the compact ProjectionNet is used directly for inference. Using this method, we have successfully trained tiny ProjectionNet models that achieve significant reductions in model size (up to several orders of magnitude) together with high accuracy on multiple visual and language classification tasks (a few examples here). Similarly, we trained other lightweight models using our graph learning framework, even in semi-supervised settings.
ML architecture for training on-device models: ProjectionNet trained using deep learning (left), and ProjectionGraph trained using graph learning (right).
We will continue to improve and release updated TensorFlow Lite models in open-source. We think that the released model (as well as future models) learned using these ML architectures may be reused for many natural language and computer vision applications or plugged into existing apps for enabling machine intelligence. We hope that the machine learning and natural language processing communities will be able to build on these to address new problems and use-cases we have not yet conceived.
Acknowledgments
Yicheng Fan and Gaurav Nemade contributed immensely to this effort. Special thanks to Rajat Monga, Andre Hentz, Andrew Selle, Sarah Sirajuddin, and Anitha Vijayakumar from the TensorFlow team; Robin Dua, Patrick McGregor, Andrei Broder, Andrew Tomkins and the Google Expander team.
1. The released on-device model was trained to optimize for small size and low-latency applications on mobile phones and wearables. Smart Reply predictions in Google apps, however, are generated using larger, more complex models. In production systems, we also use multiple classifiers that are trained to detect inappropriate content and apply further filtering and tuning to optimize the user experience and quality levels. We recommend that developers using the open-source TensorFlow Lite version also follow such practices for their end applications.
Fused Video Stabilization on the Pixel 2 and Pixel 2 XL
Friday, November 10, 2017
Posted by Chia-Kai Liang, Senior Staff Software Engineer and Fuhao Shi, Android Camera Team
One of the most important aspects of current smartphones is easily capturing and sharing videos. With the Pixel 2 and Pixel 2 XL smartphones, the videos you capture are smoother and clearer than ever before, thanks to our Fused Video Stabilization technique based on both optical image stabilization (OIS) and electronic image stabilization (EIS). Fused Video Stabilization delivers highly stable footage with minimal artifacts, and the Pixel 2 is currently rated as the leader in DxO's video ranking (also earning the highest overall rating for a smartphone camera). But how does it work?
A key principle in videography is keeping the camera motion smooth and steady. A stable video is free of distraction, so the viewer can focus on the subject of interest. But videos taken with smartphones are subject to many conditions that make taking a high-quality video a significant challenge:
Camera Shake
Most people hold their mobile phones in their hands to record videos - you pull the phone from your pocket, record the video, and the video is ready to share right after recording. However, that means your videos shake as much as your hands do -- and they shake a lot! Moreover, if you are walking or running while recording, the camera motion can make videos almost unwatchable:
Motion Blur
If the camera or the subject moves during exposure, the resulting photo or video will appear blurry. Even if we stabilize the motion in between consecutive frames, the motion blur in each individual frame cannot be easily restored in practice, especially on a mobile device. One typical video artifact due to motion blur is sharpness inconsistency: the video may rapidly alternate between blurry and sharp, which is very distracting even after the video is stabilized:
Rolling Shutter
The CMOS image sensor collects one row of pixels, or “scanline”, at a time, and it takes tens of milliseconds to go from the top scanline to the bottom. Therefore, anything moving during this period can appear distorted. This is called rolling shutter distortion. Even if you have a steady hand, the rolling shutter distortion will appear when you move quickly:
A simulated rendering of a video with global (left) and rolling (right) shutter.
Focus Breathing
When there are objects at varying distances in a video, the angle of view can change significantly due to objects “jumping” in and out of the foreground. As a result, everything shrinks or expands like the video below, which professionals call “breathing”:
A good stabilization system should address all of these issues: the video should look sharp, the motion should be smooth, and the rolling shutter and focus breathing should be corrected.
Many professionals mount the camera on a mechanical stabilizer to entirely isolate hand motion. These devices actively sense and compensate for the camera’s movement to remove all unwanted motions. However, they are usually expensive and cumbersome; you wouldn’t want to carry one every day. There are also handheld gimbal mounts available for mobile phones. However, they are usually larger than the phone itself, and you have to put the phone on one before you start recording. You’d need to do it fast before the interesting moment vanishes.
Optical Image Stabilization (OIS) is the most well-known method for suppressing handshake artifacts. Typically, in mobile camera modules with OIS, the lens is suspended in the middle of the module by a number of springs, and electromagnets are used to move the lens within its enclosure. The lens module actively senses and compensates for handshake motion at very high speeds. Because OIS responds to motion rapidly, it can greatly suppress handshake blur. However, the range of correctable motion is fairly limited (usually around 1-2 degrees), which is not enough to correct the unwanted motion between consecutive video frames, or to correct excessive motion blur during walking. Moreover, OIS cannot correct some kinds of motion, such as in-plane rotation. Sometimes it can even introduce a “jello” artifact:
This video was taken by a Pixel 2 with only OIS enabled. You can see the frame center is stabilized, but the boundaries have some jello-like artifacts.
Electronic Image Stabilization (EIS) analyzes the camera motion, filters out the unwanted parts, and synthesizes a new video by transforming each frame. The final stabilization quality depends on the algorithm design and implementation optimization of these stages. In general, software-based EIS is more flexible than OIS, so it can correct larger motions and more kinds of motion. However, EIS has some common limitations. First, to prevent undefined regions in the synthesized frame, it needs to reduce the field of view or resolution. Second, compared to OIS or an external stabilizer, EIS requires more computation, which is a limited resource on mobile phones.
Making a Better Video: Fused Video Stabilization
With Fused Video Stabilization, both OIS and EIS are enabled simultaneously during video recording to address all the issues mentioned above. Our solution has three processing stages as shown in the system diagram below. The first processing stage, motion analysis, extracts the gyroscope signal, the OIS motion, and other properties to estimate the camera motion precisely. Then, the motion filtering stage combines machine learning and signal processing to predict a person’s intention in moving the camera. Finally, in the frame synthesis stage, we model and remove the rolling shutter and focus breathing distortion. With Fused Video Stabilization, the videos from Pixel 2 have less motion blur and look more natural. The solution is efficient enough to run in all video modes, such as 60fps or 4K recording.
Motion Analysis
In the motion analysis stage, we use the phone’s high-speed gyroscope to estimate the rotational component of the hand motion (roll, pitch, and yaw). By sensing the motion at 200 Hz, we have dense motion vectors for each scanline, enough to model the rolling shutter distortion. We also measure lens motions that are not sensed by the gyroscope, including both the focus adjustment (z) and the OIS movement (x and y) at high speed. Because we need high temporal precision to model the rolling shutter effect, we carefully optimize the system to ensure perfect timestamp alignment between the CMOS image sensor, the gyroscope, and the lens motion readouts. A misalignment of merely a few milliseconds can introduce noticeable jittering artifacts:
Left: The stabilized video of a “running” motion with a 3ms timing error. Note the occasional jittering. Right: The stabilized video with correct timestamps. The bottom right corner shows the original shaky video.
Motion Filtering
The motion filtering stage takes the real camera motion from motion analysis and creates the stabilized virtual camera motion. Note that we push the incoming frames into a queue to defer the processing. This enables us to look ahead at future camera motions, using machine learning to accurately predict the user’s intention. Lookahead filtering is not feasible for OIS or any mechanical stabilizer, which can only react to previous or present motions. We discuss this in more detail below.
Frame Synthesis
At the final stage, we derive how the frame is transformed based on the real and virtual camera motions. To handle the rolling shutter distortion, we use multiple transformations for each frame. We split the input frame into a mesh and warp each part separately:
Left: The input video with mesh overlay. Right: The warped frame; the red rectangle is the final stabilized output. Note how the non-rigid warping corrects the rolling shutter distortion.
Lookahead Motion Filtering
One key feature in Fused Video Stabilization is our new lookahead filtering algorithm. It analyzes future motions to recognize the user-intended motion patterns, and creates a smooth virtual camera motion. The lookahead filtering has multiple stages that incrementally improve the virtual camera motion for each frame. In the first step, a Gaussian filter is applied to the real camera motions of both past and future frames to obtain a smoothed camera motion:
Left: The input unstabilized video. Right: The smoothed result after Gaussian filtering.
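A minimal sketch of this first smoothing step is below, operating on per-frame rotation angles with a centered Gaussian window over both past and (buffered) future frames. The real pipeline works with full rotations and many additional signals, so treat the window size, sigma, and angle representation here as illustrative assumptions.

```python
import numpy as np

def gaussian_smooth_motion(angles, sigma=8.0, radius=30):
    """angles: (num_frames, 3) per-frame camera yaw/pitch/roll from motion analysis.
    Frames are queued so that `radius` future frames are available for each output."""
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    weights /= weights.sum()
    padded = np.pad(angles, ((radius, radius), (0, 0)), mode="edge")
    smoothed = np.stack([padded[i:i + 2 * radius + 1].T @ weights
                         for i in range(len(angles))])
    return smoothed  # virtual (smoothed) camera orientation per frame
```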
You’ll notice that it’s still not very stable. To further improve the quality, we trained a model to extract intentional motions from the noisy real camera motions. We then apply additional filters given the predicted motion. For example, if we predict the camera is panning horizontally, we would reject more vertical motions. The result is shown below.
Left: The Gaussian-filtered result. Right: Our lookahead result. We predict that the user is panning to the right, and suppress more vertical motions.
In practice, the process above does not guarantee that there are no undefined “bad” regions, which can appear when the virtual camera is over-stabilized and the warped frame falls outside the original field of view. We predict the likelihood of this issue in the next couple of frames and adjust the virtual camera motion to get the final result.
Left: Our lookahead result. The undefined area at the bottom-left is shown in cyan. Right: The final result with the bad region removed.
As we mentioned earlier, even with OIS enabled, sometimes the motions are too large and cause motion blur in a single frame. When EIS is then applied to further smooth the camera motion, the motion blur leads to distracting sharpness variations:
Left: Pixel 2 with OIS only. Right: Pixel 2 with the basic Fused Video Stabilization. Note the sharpness variation around the “Exit” label.
This is a very common problem in EIS solutions. To address this issue, we exploit the “masking” property in the human visual system. Motion blur usually blurs the frame along a specific direction, and if the overall frame motion follows that direction, the human eye will not notice it. Instead, our brain treats the blur as a natural part of the motion, and masks it away from our perception.
With the high-frequency gyroscope and OIS signals, we can accurately estimate the motion blur for each frame. We compute where the camera pointed to at both the beginning and end of exposure, and the movement in-between is the motion blur. After that, we apply a machine learning algorithm (trained on a set of videos with and without motion blur) to map the motion blurs in past and future frames to the amount of real camera motion we want to keep, and blend the weighted real camera motion with the virtual one. As you can see below, with the motion blur masking, the distracting sharpness variation is greatly reduced and the camera motion is still stabilized.
Left: Pixel 2 with the basic Fused Video Stabilization. Right: The full Fused Video Stabilization solution with motion blur masking.
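The sketch below illustrates the motion blur masking idea described above: estimate per-frame blur from the change in camera orientation between the start and end of the exposure, then keep more of the real camera motion when the blur is large. The actual system learns this mapping from data; the linear ramp and threshold here are stand-in assumptions.

```python
import numpy as np

def blend_for_blur_masking(real_angles, virtual_angles,
                           exposure_start_angles, exposure_end_angles,
                           blur_threshold_deg=0.3):
    """All inputs are (num_frames, 3) arrays of camera yaw/pitch/roll in degrees."""
    # Estimated motion blur: how far the camera rotated during each frame's exposure.
    blur = np.linalg.norm(exposure_end_angles - exposure_start_angles, axis=-1)
    keep = np.clip(blur / blur_threshold_deg, 0.0, 1.0)[:, None]  # 0 = fully virtual camera
    # Blend real and virtual motion so the blur stays aligned with the perceived motion.
    return keep * real_angles + (1.0 - keep) * virtual_angles
```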
Results
We have seen many amazing videos from Pixel 2 with Fused Video Stabilization. Here are some for you to check out:
Videos taken by two Pixel 2 phones mounted on a single hand grip. Fused Video Stabilization is disabled in the left one.
Videos taken by two Pixel 2 phones mounted on a single hand grip. Fused Video Stabilization is disabled in the left one. Note that the videographer jumped together with the subject.
Fused Video Stabilization combines the best of OIS and EIS, shows great results in camera motion smoothing and motion blur reduction, and corrects both rolling shutter and focus breathing. With Fused Video Stabilization on the Pixel 2 and Pixel 2 XL, you no longer have to carefully place the phone before recording, hold it firmly over the entire recording session, or carry a gimbal mount everywhere. The recorded video will always be stable, sharp, and ready to share.
Acknowledgements
Fused Video Stabilization is a large-scale effort across multiple teams in Google, including the camera algorithm team, sensor algorithm team, camera hardware team, and sensor hardware team.
Seamless Google Street View Panoramas
Thursday, November 09, 2017
Posted by Mike Krainin, Software Engineer and Ce Liu, Research Scientist, Machine Perception
In 2007, we introduced Google Street View, enabling you to explore the world through panoramas of neighborhoods, landmarks, museums and more, right from your browser or mobile device. The creation of these panoramas is a complicated process, involving capturing images from a multi-camera rig called a rosette, and then using image blending techniques to carefully stitch them all together. However, many things can thwart the creation of a "successful" panorama, such as mis-calibration of the rosette camera geometry, timing differences between adjacent cameras, and parallax. And while we attempt to address these issues by using approximate scene geometry to account for parallax and by frequent camera re-calibration, visible seams in image overlap regions can still occur.
Left: A Street View car carrying a multi-camera rosette. Center: A close-up of the rosette, which is made up of 15 cameras. Right: A visualization of the spatial coverage of each camera. Overlap between adjacent cameras is shown in darker gray.
Left: The Sydney Opera House with stitching seams along its iconic shells. Right: The same Street View panorama after optical flow seam repair.
In order to provide more seamless Street View images, we’ve developed a new algorithm based on optical flow to help solve these challenges. The idea is to subtly warp each input image such that the image content lines up within regions of overlap. This needs to be done carefully to avoid introducing new types of visual artifacts. The approach must also be robust to varying scene geometry, lighting conditions, calibration quality, and many other conditions. To simplify the task of aligning the images and to satisfy computational requirements, we’ve broken it into two steps.
Optical Flow
The first step is to find corresponding pixel locations for each pair of images that overlap. Using techniques described in our PhotoScan blog post, we compute optical flow from one image to the other. This provides a smooth and dense correspondence field. We then downsample the correspondences for computational efficiency. We also discard correspondences where there isn’t enough visual structure to be confident in the results of optical flow.
The boundaries of a pair of constituent images from the rosette camera rig that need to be stitched together.
An illustration of optical flow within the pair’s overlap region.
Extracted correspondences in the pair of images. For each colored dot in the overlap region of the left image, there is an equivalently-colored dot in the overlap region of the right image, indicating how the optical flow algorithm has aligned the point. These pairs of corresponding points are used as input to the global optimization stage. Notice that the overlap covers only a small portion of each image.
Global Optimization
The second step is to warp the rosette’s images to simultaneously align all of the corresponding points from the overlap regions (as seen in the figure above). When stitched into a panorama, the set of warped images will then properly align. This is challenging because the overlap regions cover only a small fraction of each image, resulting in an under-constrained problem. To generate visually pleasing results across the whole image, we formulate the warping as a spline-based flow field with spatial regularization. The spline parameters are solved for in a non-linear optimization using Google’s open source Ceres Solver.
A visualization of the final warping process. Left: A section of the panorama covering 180 degrees horizontally. Notice that the overall effect of warping is intentionally quite subtle. Right: A close-up, highlighting how warping repairs the seams.
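To illustrate the shape of this optimization, here is a much-simplified sketch: a coarse grid of flow control points is fit so that corresponding points align (data term) while neighboring control points stay similar (regularization term). It uses a bilinear control grid and SciPy's least_squares in place of the spline parameterization and Ceres Solver used in production, so all parameters here are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_warp_field(points_a, points_b, grid_shape=(8, 8), smooth_weight=10.0):
    """points_a/points_b: (N, 2) corresponding locations in overlap regions, in [0, 1].
    Solves for a per-control-point flow that moves image A's points onto image B's."""
    gh, gw = grid_shape

    def bilinear_flow(flow, pts):
        flow = flow.reshape(gh, gw, 2)
        x = np.clip(pts[:, 0] * (gw - 1), 0, gw - 1 - 1e-6)
        y = np.clip(pts[:, 1] * (gh - 1), 0, gh - 1 - 1e-6)
        x0, y0 = x.astype(int), y.astype(int)
        fx, fy = (x - x0)[:, None], (y - y0)[:, None]
        return ((1 - fx) * (1 - fy) * flow[y0, x0] + fx * (1 - fy) * flow[y0, x0 + 1]
                + (1 - fx) * fy * flow[y0 + 1, x0] + fx * fy * flow[y0 + 1, x0 + 1])

    def residuals(flow):
        # Data term: warped points from A should land on their matches in B.
        data = (points_a + bilinear_flow(flow, points_a) - points_b).ravel()
        f = flow.reshape(gh, gw, 2)
        # Regularization: neighboring control points should move similarly.
        smooth = np.concatenate([(f[1:] - f[:-1]).ravel(),
                                 (f[:, 1:] - f[:, :-1]).ravel()]) * smooth_weight
        return np.concatenate([data, smooth])

    result = least_squares(residuals, np.zeros(gh * gw * 2))
    return result.x.reshape(gh, gw, 2)
```

The regularization term is what propagates the correction from the narrow overlap regions into the rest of the image, which is why the problem stays well-behaved despite being under-constrained.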
Our approach has many similarities to previously published work by Shum & Szeliski on “deghosting” panoramas. Key differences include that our approach estimates dense, smooth correspondences (rather than patch-wise, independent correspondences), and that we solve a nonlinear optimization for the final warping. The result is a better-behaved warping that is less likely to introduce new visual artifacts than the kernel-based approach.
Left: A close-up of the un-repaired panorama. Middle: Result of kernel-based interpolation. This fixes discontinuities but at the expense of strong wobbling artifacts due to the small image overlap and limited footprint of kernels. Right: Result of our global optimization.
This is important because our algorithm needs to be robust to the enormous diversity in content in Street View’s billions of panoramas. You can see how effective the algorithm is in the following examples:
Tower Bridge, London
Christ the Redeemer, Rio de Janeiro
An SUV on the streets of Seattle
This new algorithm was recently added to the Street View stitching pipeline. It is now being used to restitch existing panoramas on an ongoing basis. Keep an eye out for improved Street View near you!
Acknowledgements
Special thanks to Bryan Klingner for helping to integrate this feature with the Street View infrastructure.
Feature Visualization
Tuesday, November 07, 2017
Posted by Christopher Olah, Research Scientist, Google Brain Team and Alex Mordvintsev, Research Scientist, Google Research
Have you ever wondered what goes on inside neural networks? Feature visualization is a powerful tool for digging into neural networks and seeing how they work.
Our new article, published in Distill, does a deep exploration of feature visualization, introducing a few new tricks along the way!
Building on our work in DeepDream, and lots of work by others since, we are able to visualize what every neuron of a strong vision model (GoogLeNet [1]) detects. Over the course of multiple layers, the network gradually builds up abstractions: first it detects edges, then it uses those edges to detect textures, the textures to detect patterns, and the patterns to detect parts of objects….
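At its core, this kind of feature visualization optimizes an input image to strongly activate a chosen neuron or channel. Below is a bare-bones sketch of that gradient-ascent loop for a Keras-style model; the results in the Distill article depend heavily on additional regularization and transformation tricks that are omitted here, and the layer and channel names are placeholders.

```python
import tensorflow as tf

def visualize_channel(model, layer_name, channel, steps=256, lr=0.05):
    """Gradient-ascent feature visualization for one channel of an intermediate layer."""
    feature_extractor = tf.keras.Model(model.input, model.get_layer(layer_name).output)
    image = tf.Variable(tf.random.uniform((1, 224, 224, 3), 0.4, 0.6))  # start near gray
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(feature_extractor(image)[..., channel])
        grads = tape.gradient(loss, image)
        image.assign_add(lr * grads / (tf.norm(grads) + 1e-8))  # normalized ascent step
    return image.numpy()[0]
```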
But neurons don’t understand the world by themselves — they work together. So we also need to understand how they interact with each other. One approach is to explore interpolations between them. What images can make them both fire, to different extents?
Here we interpolate from a neuron that seems to detect artistic patterns to a neuron that seems to detect lizard eyes:
We can also let you try adding different pairs of neurons together, to explore the possibilities for yourself:
In addition to allowing you to play around with visualizations, we explore a variety of techniques for getting feature visualization to work, and let you experiment with using them.
Techniques for visualizing and understanding neural networks are becoming more powerful. We hope our article will help other researchers apply these techniques, and give people a sense of their potential. Check it out on Distill.
Acknowledgement
We're extremely grateful to our co-author, Ludwig Schubert, who made incredible contributions to our paper and especially to the interactive visualizations.
Tangent: Source-to-Source Debuggable Derivatives
Monday, November 06, 2017
Posted by Alex Wiltschko, Research Scientist, Google Brain Team
(Crossposted on the Google Open Source Blog)
Tangent is a new, free, and open-source Python library for automatic differentiation. In contrast to existing machine learning libraries, Tangent is a source-to-source system, consuming a Python function f and emitting a new Python function that computes the gradient of f. This allows much better user visibility into gradient computations, as well as easy user-level editing and debugging of gradients. Tangent comes with many more features for debugging and designing machine learning models:
Easily debug your backward pass
Fast gradient surgery
Forward mode automatic differentiation
Efficient Hessian-vector products
Code optimizations
This post gives an overview of the Tangent API. It covers how to use Tangent to generate gradient code in Python that is easy to interpret, debug and modify.
Neural networks (NNs) have led to great advances in machine learning models for images, video, audio, and text. The fundamental abstraction that lets us train NNs to perform well at these tasks is a 30-year-old idea called reverse-mode automatic differentiation (also known as backpropagation), which comprises two passes through the NN. First, we run a “forward pass” to calculate the output value of each node. Then we run a “backward pass” to calculate a series of derivatives to determine how to update the weights to increase the model’s accuracy.
Training NNs, and doing research on novel architectures, requires us to compute these derivatives correctly, efficiently, and easily. We also need to be able to debug these derivatives when our model isn’t training well, or when we’re trying to build something new that we do not yet understand. Automatic differentiation, or just “autodiff,” is a technique to calculate the derivatives of computer programs that denote some mathematical function, and nearly every machine learning library implements it.
Existing libraries implement automatic differentiation by tracing a program’s execution (at runtime, like TF Eager, PyTorch and Autograd) or by building a dynamic data-flow graph and then differentiating the graph (ahead-of-time, like TensorFlow). In contrast, Tangent performs ahead-of-time autodiff on the Python source code itself, and produces Python source code as its output.
As a result, you can finally read your automatic derivative code just like the rest of your program. Tangent is useful to researchers and students who not only want to write their models in Python, but also read and debug automatically-generated derivative code without sacrificing speed and flexibility.
You can easily inspect and debug your models written in Tangent, without special tools or indirection. Tangent works on a large and growing subset of Python, provides extra autodiff features other Python ML libraries don’t have, is high-performance, and is compatible with TensorFlow and NumPy.
Automatic differentiation of Python code
How do we automatically generate derivatives of plain Python code? Math functions like tf.exp or tf.log have derivatives, which we can compose to build the backward pass. Similarly, pieces of syntax, such as subroutines, conditionals, and loops, also have backward-pass versions. Tangent contains recipes for generating derivative code for each piece of Python syntax, along with many NumPy and TensorFlow function calls.
Tangent has a one-function API:
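The original post illustrates this API with a short snippet that is not reproduced above; a minimal sketch of what a call looks like, assuming the tangent.grad entry point described here, is:

```python
import tangent

def f(x):
    return x * x + 3.0 * x

df = tangent.grad(f)  # df is ordinary generated Python code that computes df/dx
print(df(2.0))        # derivative 2*x + 3 evaluated at x = 2.0, i.e. 7.0
```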
Here’s an animated graphic of what happens when we call tangent.grad on a Python function:
If you want to print out your derivatives, you can run:
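The snippet that accompanied this step is likewise not reproduced above; based on the library's documented options it presumably looks something like the following (the verbose flag is our assumption here):

```python
# Ask Tangent to print the generated derivative source code as it is produced.
df = tangent.grad(f, verbose=1)
```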
Under the hood, tangent.grad first grabs the source code of the Python function you pass it. Tangent has a large library of recipes for the derivatives of Python syntax, as well as TensorFlow Eager functions. The function tangent.grad then walks your code in reverse order, looks up the matching backward-pass recipe, and adds it to the end of the derivative function. This reverse-order processing gives the technique its name: reverse-mode automatic differentiation.
The function df above only works for scalar (non-array) inputs. Tangent also supports:
Using TensorFlow Eager functions, for processing arrays of numbers
Subroutines
Control flow
Although we started with TensorFlow Eager support, Tangent isn’t tied to one numeric library or another—we would gladly welcome pull requests adding PyTorch or MXNet derivative recipes.
Next Steps
Tangent is open source now at github.com/google/tangent. Go check it out for download and installation instructions. Tangent is still an experiment, so expect some bugs. If you report them to us on GitHub, we will do our best to fix them quickly.
We are working to add support in Tangent for more aspects of the Python language (e.g., closures, inline function definitions, classes, more NumPy and TensorFlow functions). We also hope to add more advanced automatic differentiation and compiler functionality in the future, such as automatic trade-offs between memory and compute (Griewank and Walther 2000; Gruslys et al., 2016), more aggressive optimizations, and lambda lifting.
We intend to develop Tangent together as a community. We welcome pull requests with fixes and features. Happy differentiating!
Acknowledgments
Bart van Merriënboer contributed immensely to all aspects of Tangent during his internship, and Dan Moldovan led TF Eager integration, infrastructure and benchmarking. Also, thanks to the Google Brain team for their support of this post and special thanks to Sanders Kleinfeld, Matt Johnson and Aleks Haecky for their valuable contribution for the technical aspects of the post.
AutoML for large scale image classification and object detection
Thursday, November 02, 2017
Posted by Barret Zoph, Vijay Vasudevan, Jonathon Shlens and Quoc Le, Research Scientists, Google Brain Team
A few months ago, we introduced our AutoML project, an approach that automates the design of machine learning models. While we found that AutoML can design small neural networks that perform on par with neural networks designed by human experts, these results were constrained to small academic datasets like CIFAR-10 and Penn Treebank. We became curious how this method would perform on larger, more challenging datasets, such as ImageNet image classification and COCO object detection. Many state-of-the-art machine learning architectures have been invented by humans to tackle these datasets in academic competitions.
In Learning Transferable Architectures for Scalable Image Recognition, we apply AutoML to the ImageNet image classification and COCO object detection datasets -- two of the most respected large-scale academic datasets in computer vision. These two datasets pose a great challenge for us because they are orders of magnitude larger than the CIFAR-10 and Penn Treebank datasets. For instance, naively applying AutoML directly to ImageNet would require many months of training our method.
To apply our method to ImageNet, we altered the AutoML approach to make it more tractable for large-scale datasets:
We redesigned the search space so that AutoML could find the best layer, which can then be stacked many times in a flexible manner to create a final network.
We performed architecture search on CIFAR-10 and transferred the best learned architecture to ImageNet image classification and COCO object detection.
With this method, AutoML was able to find layers that work well on CIFAR-10 but also work well on ImageNet classification and COCO object detection. These two layers are combined to form a novel architecture, which we called “NASNet”.
Our NASNet architecture is composed of two types of layers: Normal Layer (left), and Reduction Layer (right). These two layers are designed by AutoML.
On ImageNet image classification, NASNet achieves a prediction accuracy of 82.7% on the validation set, surpassing all previous Inception models that we built [2, 3, 4]. Additionally, NASNet performs 1.2% better than all previously published results and is on par with the best unpublished result reported on arxiv.org [5]. Furthermore, NASNet may be resized to produce a family of models that achieve good accuracies while having very low computational costs. For example, a small version of NASNet achieves 74% accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. The large NASNet achieves state-of-the-art accuracy while halving the computational cost of the best reported result on arxiv.org (i.e., SENet) [5].
Accuracies of NASNet and state-of-the-art, human-invented models at various model sizes on ImageNet image classification.
We also transferred the learned features from ImageNet to object detection. In our experiments, combining the features learned from ImageNet classification with the Faster-RCNN framework [6] surpassed previously published, state-of-the-art predictive performance on the COCO object detection task in both the largest and the mobile-optimized models. Our largest model achieves 43.1% mAP, which is 4% better than the previously published state-of-the-art.
Example object detection using Faster-RCNN with NASNet.
We suspect that the image features learned by NASNet on ImageNet and COCO may be reused for many computer vision applications. Thus, we have open-sourced NASNet for inference on image classification and for object detection in the Slim and Object Detection TensorFlow repositories. We hope that the larger machine learning community will be able to build on these models to address multitudes of computer vision problems we have not yet imagined.
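For readers who want to try the released checkpoints, here is a hedged sketch of building the mobile-sized NASNet classifier with the TF-Slim code in the tensorflow/models repository; the module path, function names, and 1001-class output are assumptions based on that repository rather than guarantees:

```python
import tensorflow as tf
from nets.nasnet import nasnet  # assumes research/slim from tensorflow/models is on PYTHONPATH

slim = tf.contrib.slim

# Placeholder batch of 224x224 RGB images.
images = tf.placeholder(tf.float32, [None, 224, 224, 3])

# Build the mobile-sized NASNet image classifier (checkpoint restoring omitted).
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
    logits, end_points = nasnet.build_nasnet_mobile(
        images, num_classes=1001, is_training=False)

predictions = end_points['Predictions']
```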
Special thanks to Jeff Dean, Yifeng Lu, Jonathan Huang, Vivek Rathod, Sergio Guadarrama, Chen Sun, Jonathan Shen, Vishy Tirumalashetty, Xiaoqiang Zheng, Christian Sigg and the Google Brain team for their help with the project.
References
[1] Learning Transferable Architectures for Scalable Image Recognition, Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. arXiv, 2017.
[2] Going Deeper with Convolutions, Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. CVPR, 2015.
[3] Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. CVPR, 2016.
[4] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. AAAI, 2017.
[5] Squeeze-and-Excitation Networks, Jie Hu, Li Shen, and Gang Sun. arXiv, 2017.
[6] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. NIPS, 2015.
Latest Innovations in TensorFlow Serving
Thursday, November 02, 2017
Posted by Chris Olston, Research Scientist, and Noah Fiedel, Software Engineer, TensorFlow Serving
Since initially open-sourcing TensorFlow Serving in February 2016, we’ve made some major enhancements. Let’s take a look back at where we started, review our progress, and share where we are headed next.
Before TensorFlow Serving, users of TensorFlow inside Google had to create their own serving system from scratch. Although serving might appear easy at first, one-off serving solutions quickly grow in complexity. Machine Learning (ML) serving systems need to support model versioning (for model updates with a rollback option) and multiple models (for experimentation via A/B testing), while ensuring that concurrent models achieve high throughput on hardware accelerators (GPUs and TPUs) with low latency. So we set out to create a single, general TensorFlow Serving software stack.
We decided to make it open-sourceable from the get-go, and development started in September 2015. Within a few months, we had created the initial end-to-end working system, leading to our open-source release in February 2016.
Over the past year and a half, with the help of our users and partners inside and outside our company, TensorFlow Serving has advanced performance, best practices, and standards:
Out-of-the-box optimized serving and customizability: We now offer a pre-built canonical serving binary, optimized for modern CPUs with AVX, so developers don't need to assemble their own binary from our libraries unless they have exotic needs. At the same time, we added a registry-based framework, allowing our libraries to be used for custom (or even non-TensorFlow) serving scenarios.
Multi-model serving: Going from one model to multiple concurrently-served models presents several performance obstacles. We serve multiple models smoothly by (1) loading in isolated thread pools to avoid incurring latency spikes on other models taking traffic; (2) accelerating initial loading of all models in parallel upon server start-up; (3) multi-model batch interleaving to multiplex hardware accelerators (GPUs/TPUs).
Standardized model format: We added SavedModel to TensorFlow 1.0, giving the community a single standard model format that works across training and serving (a short export sketch follows this list).
Easy-to-use inference APIs: We released easy-to-use APIs for common inference tasks (classification, regression) that we know work for a wide swathe of our applications. To support more advanced use-cases, we support a lower-level tensor-based API (predict) and a new multi-inference API that enables multi-task modeling.
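As a rough illustration of the export side (not the only way to do it), here is a minimal TensorFlow 1.x sketch using SavedModelBuilder; the toy graph, tensor names, and export path are placeholders:

```python
import tensorflow as tf

# Toy graph: y = Wx + b
x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
w = tf.Variable(tf.ones([3, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.matmul(x, w) + b

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Write a versioned SavedModel directory that TensorFlow Serving can load.
    builder = tf.saved_model.builder.SavedModelBuilder('/tmp/my_model/1')
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'x': x}, outputs={'y': y})
    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature,
        })
    builder.save()
```

To make the predict API concrete, here is a hedged sketch of a gRPC client in the style current at the time of writing; the server address, model name, signature name, and tensor keys are illustrative placeholders matching the export sketch above:

```python
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

# Connect to a locally running tensorflow_model_server (gRPC on port 8500).
channel = implementations.insecure_channel('localhost', 8500)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Build a PredictRequest against the model exported above.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'
request.inputs['x'].CopyFrom(
    tf.contrib.util.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32))

# Issue the RPC with a 10-second deadline and read back the output tensor.
response = stub.Predict(request, 10.0)
print(response.outputs['y'])
```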
All of our work has been informed by close collaborations with: (a) Google’s ML SRE team, which helps ensure we are robust and meet internal SLAs; (b) other Google machine learning infrastructure teams including ads serving and TFX; (c) application teams such as Google Play; (d) our partners at the UC Berkeley RISE Lab, who explore complementary research problems with the Clipper serving system; (e) our open-source user base and contributors.
TensorFlow Serving is currently handling tens of millions of inferences per second for 1100+ of our own projects including Google’s Cloud ML Prediction. Our core serving code is available to all via our open-source releases.
Looking forward, our work is far from done and we are exploring several avenues of innovation. Today we are excited to share early progress in two experimental areas:
Granular batching: A key technique we employ to achieve high throughput on specialized hardware (GPUs and TPUs) is "batching": processing multiple examples jointly for efficiency. We are developing technology and best practices to improve batching to: (a) enable batching to target just the GPU/TPU portion of the computation, for maximum efficiency; (b) enable batching within recursive neural networks, used to process sequence data, e.g., text and event sequences. We are experimenting with batching arbitrary sub-graphs using the Batch/Unbatch op pair.
Distributed model serving: We are looking at model sharding techniques as a means of handling models that are too large to fit on one server node or sharing sub-models in a memory-efficient way. We recently launched a 1TB+ model in production with good results, and hope to open-source this capability soon.
Thanks again to all of our users and partners who have contributed feedback, code and ideas. Join the project at: github.com/tensorflow/serving.