Google AI Blog

Open Sourcing Active Question Reformulation with Reinforcement Learning

Wednesday, October 10, 2018

Posted by Michelle Chen Huebscher, Software Engineer, and Rodrigo Nogueira, New York University PhD Student and Software Engineering Intern, Google AI Language

Build Your Own ActiveQA System

A pretrained sequence-to-sequence model that takes a question as input and returns its reformulations. This task is similar to machine translation, translating from English to English, and indeed the initial model can be used for general paraphrasing. For its implementation we use and customize the TensorFlow Neural Machine Translation Tutorial code. We adapted the code to support training with reinforcement learning, using policy gradient methods.*

An answer selection model. The answer selector uses a convolutional neural network and assigns a score to each triplet of original question, reformulation and answer. The selector uses pre-trained, publicly available word embeddings (GloVe).

A question answering system (the environment). For this purpose we use BiDAF, a popular question answering system, described in Seo et al. (2017).
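The three components above can be wired together roughly as follows. This is a minimal sketch, not the released ActiveQA code: `active_qa` and the three `toy_*` functions are hypothetical stand-ins for the reformulation model, the BiDAF environment, and the CNN answer selector, and including the original question among the candidates is an assumption made for illustration.

```python
from typing import Callable, List, Tuple


def active_qa(
    question: str,
    reformulate: Callable[[str], List[str]],
    environment: Callable[[str], str],
    score: Callable[[str, str, str], float],
) -> Tuple[str, str]:
    """Return the (rewrite, answer) pair that the selector scores highest."""
    # Assumption: the original question competes with its reformulations.
    candidates = [question] + reformulate(question)
    # Query the black-box QA environment once per candidate question.
    answered = [(rewrite, environment(rewrite)) for rewrite in candidates]
    # The selector scores each (original question, reformulation, answer) triplet.
    return max(answered, key=lambda pair: score(question, pair[0], pair[1]))


# Toy stand-ins (hypothetical), only to make the loop runnable.
def toy_reformulate(question: str) -> List[str]:
    return ["Which year was Tesla born?", "When is Tesla's birthday?"]


def toy_environment(question: str) -> str:
    return "July 10 1856" if "year" in question.lower() else "unknown"


def toy_score(question: str, rewrite: str, answer: str) -> float:
    return 0.0 if answer == "unknown" else 1.0


best = active_qa("When was Tesla born?", toy_reformulate, toy_environment, toy_score)
print(best)
```

In a real system the reformulator would be trained with policy gradient, using the quality of the environment's answer as the reward; that training step is omitted here.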

Acknowledgments

Contributors to this research and release include Alham Fikri Aji, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Alexey Gronskiy, Neil Houlsby, Yannic Kilcher, and Wei Wang.
* The system we reported on in our paper used the TensorFlow sequence-to-sequence code from Britz et al. (2017). Later, an open source version of the Google Translation model (GNMT) was published as a tutorial. The ActiveQA version released today is based on this more recent and actively developed implementation. For this reason the released system varies slightly from the paper’s. Nevertheless, the performance and behavior are qualitatively and quantitatively comparable.

Highlights from the Google AI Residency Program

Tuesday, October 9, 2018

Posted by Phing Lee, Program Manager, Google AI Residency

Some of our 2017 Google AI residents at the 2017 Neural Information Processing Systems Conference, hosted in Long Beach, California.


A study on the effect of adversarial examples on human visual perception.

An algorithm that enables robots to learn more safely by avoiding states from which they cannot reset.

Initialization methods that enable training of neural networks with unprecedented depths of 10K+ layers.

A method to make training more scalable by using larger mini-batches, which when applied to ResNet-50 on ImageNet reduced training time without compromising test accuracy.

And many more...

This experiment demonstrated (for the first time) the susceptibility of human time-limited vision to adversarial examples. For more details, see “Adversarial Examples that Fool both Computer Vision and Time-Limited Humans” (accepted at NIPS 2018).

An algorithm for safe reinforcement learning prevents robots from taking actions they cannot undo. For more details, see “Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning” (accepted at ICLR 2018).

Extremely deep CNNs can be trained without the use of any special tricks simply by using a specially designed (Delta-Orthogonal) initialization. Test (solid) and training (dashed) curves on MNIST (top) and CIFAR10 (bottom). For more details, see “Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks” accepted at ICML 2018.
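For readers who want to see what such an initialization can look like, here is a minimal NumPy sketch. It reflects our reading of the idea (a convolution kernel that is zero everywhere except its spatial center, which holds a matrix with orthonormal rows); the function name, the kernel layout `(k, k, c_in, c_out)`, and the restriction to `c_in <= c_out` with an odd kernel size are assumptions of this sketch, not details taken from the post.

```python
import numpy as np


def delta_orthogonal(kernel_size: int, c_in: int, c_out: int, seed: int = 0) -> np.ndarray:
    """Kernel of shape (k, k, c_in, c_out) that is zero except at its spatial center."""
    if kernel_size % 2 == 0 or c_in > c_out:
        raise ValueError("this sketch assumes an odd kernel size and c_in <= c_out")
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix yields an orthogonal c_out x c_out matrix.
    q, r = np.linalg.qr(rng.standard_normal((c_out, c_out)))
    # Flip column signs so the result does not depend on the QR sign convention.
    q = q * np.sign(np.diag(r))
    kernel = np.zeros((kernel_size, kernel_size, c_in, c_out))
    center = kernel_size // 2
    # The first c_in rows of an orthogonal matrix are orthonormal, so the
    # center tap preserves the norm of every input channel vector.
    kernel[center, center] = q[:c_in, :]
    return kernel


kernel = delta_orthogonal(kernel_size=3, c_in=4, c_out=8)
```

At initialization, a layer built this way acts on each pixel independently and preserves its norm, which is the property the caption's "without any special tricks" claim relies on.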

Applying a sequence of simple scaling rules, we increase the SGD batch size and reduce the number of parameter updates required to train our model by an order of magnitude, without sacrificing test set accuracy. This enables us to dramatically reduce model training time. For more details, see “Don’t Decay the Learning Rate, Increase the Batch Size”, accepted at ICLR 2018.
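One of these scaling rules can be sketched as a schedule: wherever a conventional step schedule would divide the learning rate by some factor, multiply the batch size by that factor instead. The function below is a hypothetical illustration; the milestone epochs, the factor of 5, the batch-size cap, and the fallback to learning-rate decay once the cap is reached are assumed values, not the settings reported in the paper.

```python
from typing import Sequence, Tuple


def batch_size_schedule(
    epoch: int,
    base_lr: float = 0.1,
    base_batch: int = 128,
    factor: int = 5,
    milestones: Sequence[int] = (60, 120, 160),
    max_batch: int = 8192,
) -> Tuple[float, int]:
    """Return (learning_rate, batch_size) to use at the given epoch."""
    lr, batch = base_lr, base_batch
    for milestone in milestones:
        if epoch >= milestone:
            if batch * factor <= max_batch:
                # Grow the batch instead of decaying the learning rate.
                batch *= factor
            else:
                # Hardware limit reached: fall back to ordinary decay.
                lr /= factor
    return lr, batch


for epoch in (0, 60, 120, 160):
    print(epoch, batch_size_schedule(epoch))
```

Either branch shrinks the ratio of learning rate to batch size by the same factor at each milestone, so the schedule follows the same decay curve while taking fewer, larger steps.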


Introducing the Kaggle “Quick, Draw!” Doodle Recognition Challenge

Friday, September 28, 2018

Posted by Thomas Deselaers, Senior Staff Software Engineer, and Jake Walker, Product Manager, Machine Perception

The Dataset

Correct: the user drew the prompted category and the computer only recognized it correctly after the user was done drawing.

Correct, but incomplete: the user drew the prompted category and the computer recognized it correctly before the user had finished. Incompleteness can vary from nearly ready to only a fraction of the category drawn. This is probably fairly common in images that are marked as recognized correctly.

Correct, but not recognized correctly: The player drew the correct category but the AI never recognized it. Some players react to this by adding more details. Others scribble out and try again.

Incorrect: some players have different concepts in mind when they see a word - e.g. in the category seesaw, we have observed a number of saw drawings.

Get Started

Acknowledgements

We'd like to thank everyone who worked with us on this, particularly Jonas Jongejan and Brenda Fogg from the Creative Lab team, Julia Elliott and Walter Reade from the Kaggle team, and the handwriting recognition team.

Building Google Dataset Search and Fostering an Open Data Ecosystem

Wednesday, September 26, 2018

Posted by Matthew Burgess and Natasha Noy, Google AI

An Overview

An overview of the technology behind Google Dataset Search

Using Structured Metadata from Data Providers

Connecting Replicas of Datasets

Reconciling to the Google Knowledge Graph

Linking to other Google Resources

It provides a valuable signal about the importance and prominence of a dataset.

It gives dataset authors an easy place to see citations to their data and to get credit.

Search and Ranking of Results

A Better Open Data Ecosystem

Widespread adoption of open metadata formats to describe published data.

Further development of open metadata formats to describe more types of data and in more detail.

The culture of citing data the way we cite research publications, giving those who create and publish the data the credit that they deserve.

The development of tools that leverage this metadata to enable more discovery or better use of data.
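As an illustration of what an open metadata format looks like in practice, the sketch below builds a schema.org/Dataset description and serializes it as JSON-LD. The helper name `dataset_jsonld` and all example values (names, `example.org` URLs) are hypothetical; only the schema.org property names (`name`, `description`, `url`, `sameAs`, `creator`, `license`) come from the vocabulary itself.

```python
import json
from typing import List, Optional


def dataset_jsonld(
    name: str,
    description: str,
    url: str,
    same_as: Optional[List[str]] = None,
    creator: Optional[str] = None,
    license_url: Optional[str] = None,
) -> str:
    """Serialize a minimal schema.org/Dataset record as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if same_as:
        # sameAs points at other pages describing the same dataset (replicas).
        record["sameAs"] = same_as
    if creator:
        record["creator"] = {"@type": "Organization", "name": creator}
    if license_url:
        record["license"] = license_url
    return json.dumps(record, indent=2)


# Hypothetical example values.
markup = dataset_jsonld(
    name="Example rainfall measurements",
    description="Daily rainfall totals from a hypothetical weather station.",
    url="https://example.org/datasets/rainfall",
    same_as=["https://example.org/mirror/rainfall"],
    creator="Example Weather Lab",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(markup)
```

A publisher would typically embed the resulting string in the dataset's landing page inside a `<script type="application/ld+json">` element.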

So, Where is Your Dataset?

Acknowledgements

We would like to thank Xiaomeng Ban, Dan Brickley, Lee Butler, Thomas Chen, Corinna Cortes, Kevin Espinoza, Archana Jain, Mike Jones, Kishore Papineni, Chris Sater, Gokhan Turhan, Shubin Zhao and Andi Vajda for their work on the project and all our partners, collaborators, and early adopters for their help.

Google’s Next Generation Music Recognition

Friday, September 14, 2018

Posted by James Lyon, Google AI, Zürich

Now Playing versus Sound Search

The Core Matching Process of Now Playing

Matching, phase 1: Finding good candidates: For every embedding, Now Playing performs a nearest neighbor search on the on-device database of songs for similar embeddings. The database uses a hybrid of spatial partitioning and vector quantization to efficiently search through millions of embedding vectors. Because the audio buffer is noisy, this search is approximate, and not every embedding will find a nearby match in the database for the correct song. However, over the whole clip, the chances of finding several nearby embeddings for the correct song are very high, so the search is narrowed to a small set of songs which got multiple hits.

Matching, phase 2: Final matching: Because the database search used above is approximate, Now Playing may not find song embeddings which are nearby to some embeddings in our query. Therefore, in order to calculate an accurate similarity score, Now Playing retrieves all embeddings for each song in the database which might be relevant to fill in the “gaps”. Then, given the sequence of embeddings from the audio buffer and another sequence of embeddings from a song in the on-device database, Now Playing estimates their similarity pairwise and adds up the estimates to get the final matching score.
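The two phases can be sketched in a few lines of NumPy. This is a simplified illustration, not the production algorithm: it swaps the approximate search (spatial partitioning plus vector quantization) for a brute-force nearest neighbor search, uses cosine similarity, and ignores the temporal order of the embeddings when summing similarities. The function name and the `min_hits` parameter are hypothetical.

```python
import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def match_song(query, database, min_hits=2):
    """query: (n, d) embeddings; database: dict of song id -> (m, d) embeddings."""
    query = _normalize(np.asarray(query, dtype=float))
    songs = {sid: _normalize(np.asarray(emb, dtype=float)) for sid, emb in database.items()}

    # Phase 1: for every query embedding, find the nearest database embedding
    # (brute force here) and count how many hits each song receives.
    ids = list(songs)
    stacked = np.vstack([songs[sid] for sid in ids])
    owner = np.concatenate([np.full(len(songs[sid]), i) for i, sid in enumerate(ids)])
    nearest = np.argmax(query @ stacked.T, axis=1)
    hits = np.bincount(owner[nearest], minlength=len(ids))
    candidates = [ids[i] for i in np.flatnonzero(hits >= min_hits)]

    # Phase 2: score each candidate against all of its embeddings, summing the
    # best similarity found for every query embedding.
    scores = {sid: float((query @ songs[sid].T).max(axis=1).sum()) for sid in candidates}
    best = max(scores, key=scores.get) if scores else None
    return best, scores


# Toy data: three random "songs" and a noisy clip cut from song "b".
rng = np.random.default_rng(0)
database = {sid: rng.standard_normal((20, 16)) for sid in ("a", "b", "c")}
clip = database["b"][5:12] + 0.05 * rng.standard_normal((7, 16))
best, scores = match_song(clip, database)
print(best)
```

Phase 1 keeps phase 2 cheap: only songs that collected several hits are scored in full.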

Scaling up Now Playing for the Sound Search server

We quadrupled the size of the neural network used, and increased each embedding from 96 to 128 dimensions, which reduces the amount of work the neural network has to do to pack the high-dimensional input audio into a low-dimensional embedding. This is critical in improving the quality of phase two, which is very dependent on the accuracy of the raw neural network output.

We doubled the density of our embeddings — it turns out that fingerprinting audio every 0.5s instead of every 1s doesn’t reduce the quality of the individual embeddings very much, and gives us a huge boost by doubling the number of embeddings we can use for the match.

Conclusion

Acknowledgements

We would like to thank Micha Riser, Mihajlo Velimirovic, Marvin Ritter, Ruiqi Guo, Sanjiv Kumar, Stephen Wu, Diego Melendo Casado, Katia Naliuka, Jason Sanders, Beat Gfeller, Julian Odell, Christian Frank, Dominik Roblek, Matt Sharifi and Blaise Aguera y Arcas.

Introducing the Unrestricted Adversarial Examples Challenge

Thursday, September 13, 2018

Posted by Tom B. Brown and Catherine Olsson, Research Engineers, Google Brain Team

Adversarial examples can be generated through a variety of means, including by making small modifications to the input pixels, but also using spatial transformations, or simple guess-and-check to find misclassified inputs.

Structure of the Challenge

Is this an unambiguous picture of a bird, a bicycle, or is it ambiguous / not obvious?

Examples of ambiguous and unambiguous images. Defenders must make no confident mistakes on unambiguous bird or bicycle images. We discard all images that humans find ambiguous or not obvious. All images under CC licenses 1, 2, 3, 4.

How to Participate

Acknowledgements

The team behind the Unrestricted Adversarial Examples Challenge includes Tom Brown, Catherine Olsson, Nicholas Carlini, Chiyuan Zhang, and Ian Goodfellow from Google, and Paul Christiano from OpenAI.

The What-If Tool: Code-Free Probing of Machine Learning Models

Tuesday, September 11, 2018

Posted by James Wexler, Software Engineer, Google AI

How would changes to a datapoint affect my model’s prediction? Does it perform differently for various groups, for example, historically marginalized people? How diverse is the dataset I am testing my model on?

The What-If Tool, showing a set of 250 face pictures and their results from a model that detects smiles.

Facets

Exploring what-if scenarios on a datapoint.

Counterfactuals

Comparing counterfactuals.

Analysis of Performance and Algorithmic Fairness

Comparing the performance of two slices of data on a smile detection model, with their classification thresholds set to satisfy the “equal opportunity” constraint.
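To make the “equal opportunity” constraint concrete: it asks that the true positive rate be the same for every slice, which can be achieved by giving each slice its own classification threshold. The sketch below is a hypothetical illustration of that idea, not code from the What-If Tool; the function name, the target true positive rate, and the toy scores are all assumptions.

```python
import numpy as np


def equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.5):
    """Pick one threshold per group so each group's true positive rate reaches target_tpr.

    Assumes every group has at least one positive example and 0 < target_tpr <= 1.
    An example is classified positive when its score is >= the group's threshold.
    """
    scores, labels, groups = (np.asarray(a) for a in (scores, labels, groups))
    thresholds = {}
    for group in sorted(set(groups.tolist())):
        # Scores of this group's true positives, highest first.
        positives = np.sort(scores[(groups == group) & (labels == 1)])[::-1]
        # Number of positives that must be accepted to reach the target rate.
        needed = int(np.ceil(target_tpr * len(positives)))
        thresholds[group] = float(positives[needed - 1])
    return thresholds


def true_positive_rate(scores, labels, threshold):
    scores, labels = np.asarray(scores), np.asarray(labels)
    return float(np.mean(scores[labels == 1] >= threshold))


# Toy model scores: the model is systematically less confident on group "b".
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.6, 0.5, 0.4, 0.3, 0.1]
labels = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
groups = ["a"] * 5 + ["b"] * 5
thresholds = equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.5)
print(thresholds)
```

A single shared threshold of 0.8 would accept half of group "a"'s positives but none of group "b"'s; per-group thresholds equalize the two rates.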

Demos

Detecting misclassifications: A multiclass classification model, which predicts plant type from four measurements of a flower from the plant. The tool is helpful in showing the decision boundary of the model and what causes misclassifications. This model is trained with the UCI iris dataset.

Assessing fairness in binary classification models: The image classification model for smile detection mentioned above. The tool is helpful in assessing algorithmic fairness across different subgroups. The model was purposefully trained without providing any examples from a specific subset of the population, in order to show how the tool can help uncover such biases in models. Assessing fairness requires careful consideration of the overall context — but this is a useful quantitative starting point.

Investigating model performance across different subgroups: A regression model that predicts a subject’s age from census information. The tool is helpful in showing relative performance of the model across subgroups and how the different features individually affect the prediction. This model is trained with the UCI census dataset.

What-If in Practice

Acknowledgments

The What-If Tool was a collaborative effort, with UX design by Mahima Pushkarna, Facets updates by Jimbo Wilson, and input from many others. We would like to thank the Google teams that piloted the tool and provided valuable feedback and the TensorBoard team for all their help.

Text-to-Speech for Low-Resource Languages (Episode 4): One Down, 299 to Go

Friday, September 7, 2018

Posted by Alexander Gutkin, Software Engineer, Google AI

This is the fourth episode in the series of posts reporting on the work we are doing to build text-to-speech (TTS) systems for low-resource languages. In the first episode, we described the crowdsourced acoustic data collection effort for Project Unison. In the second episode, we described how we built parametric voices based on that data. In the third episode, we described the compilation of a pronunciation lexicon for a TTS system. In this episode, we describe how to make a single TTS system speak many languages.

Exploring the Closely Related Languages of Indonesia

Joint phoneme inventory of Indonesian, Javanese, and Sundanese in International Phonetic Alphabet notation.

Expanding to the More Diverse Language Families of South Asia

Descendants of Sanskrit word for “culture” across languages.


Diagram illustrating our multilingual text-to-speech approach. The input text queries are processed by language-specific linguistic front-ends to generate pronunciations in a shared phonemic representation serving as input to the language-agnostic acoustic model. The model then generates audio for the respective queries.
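The data flow in the diagram can be sketched as a small pipeline: each language gets its own front-end, all front-ends emit symbols from one shared phoneme inventory, and a single acoustic model consumes only those symbols. Everything below is a hypothetical toy: the language codes, lexicon entries, phoneme symbols, and the stand-in acoustic model are invented placeholders, not real pronunciations or the system described in the post.

```python
from typing import Callable, Dict, List

FrontEnd = Callable[[str], List[str]]


def make_lexicon_front_end(lexicon: Dict[str, List[str]]) -> FrontEnd:
    """Build a language-specific front-end from a word -> phonemes lexicon."""
    def front_end(text: str) -> List[str]:
        phonemes: List[str] = []
        for word in text.lower().split():
            phonemes.extend(lexicon[word])
        return phonemes
    return front_end


def synthesize(
    text: str,
    language: str,
    front_ends: Dict[str, FrontEnd],
    acoustic_model: Callable[[List[str]], List[str]],
) -> List[str]:
    # Only the front-end depends on the language; the acoustic model never
    # sees the language code, just phonemes from the shared inventory.
    phonemes = front_ends[language](text)
    return acoustic_model(phonemes)


# Placeholder lexicons for two made-up languages sharing one inventory.
front_ends = {
    "lang_x": make_lexicon_front_end({"foo": ["f", "u"], "bar": ["b", "a", "r"]}),
    "lang_y": make_lexicon_front_end({"baz": ["b", "a", "s"]}),
}


def toy_acoustic_model(phonemes: List[str]) -> List[str]:
    # Stand-in for audio generation: one labelled "frame" per phoneme.
    return [f"frame({p})" for p in phonemes]


audio_x = synthesize("foo bar", "lang_x", front_ends, toy_acoustic_model)
audio_y = synthesize("baz", "lang_y", front_ends, toy_acoustic_model)
print(audio_x, audio_y)
```

Because the acoustic model is shared, adding a language in this setup means adding a front-end rather than training a separate model from scratch.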


Introducing the Inclusive Images Competition

Thursday, September 6, 2018

Posted by Tulsee Doshi, Product Manager, Google AI

Wedding photographs (donated by Googlers), labeled by a classifier trained on the Open Images dataset. The classifier’s label predictions are recorded below each image.


The three geographical distributions of data in this competition. Competitors will train their models on Open Images, a widely used publicly available benchmark dataset for image classification which happens to be drawn mostly from North America and Western Europe. Models are then evaluated first on Challenge Stage 1 and finally on Challenge Stage 2, each with different un-revealed geographical distributions. In this way, models are stress-tested for their ability to operate inclusively beyond their training data.


Examples of labeled images from the challenge dataset. Clockwise from top left, image donation by Peter Tester, Mukesh Kumhar, HeeYoung Moon, Sudipta Pramanik, jaturan amnatbuddee, Tomi Familoni and Anu Subhi

Acknowledgements

We would like to thank the following individuals for making the Inclusive Images Competition and dataset possible: James Atwood, Pallavi Baljekar, Parker Barnes, Anurag Batra, Eric Breck, Peggy Chi, Tulsee Doshi, Julia Elliott, Gursheesh Kour, Akshay Gaur, Yoni Halpern, Henry Jicha, Matthew Long, Jigyasa Saxena, Richa Singh and D. Sculley.

Conceptual Captions: A New Dataset and Challenge for Image Captioning

Wednesday, September 5, 2018

Posted by Piyush Sharma, Software Engineer, and Radu Soricut, Research Scientist, Google AI

Image captioning can help millions with visual impairments by converting images to text captions. Image by Francis Vallance (Heritage Warrior), used under CC BY 2.0 license.


Illustration of images and captions in the Conceptual Captions dataset.
Clockwise from top left, images by Jonny Hunter, SigNote Cloud, Tony Hisgett, ResoluteSupportMedia. All images used under CC BY 2.0 license

Generating the Dataset

From Specific Names to General Concepts¹

Illustration of text modification. Image by Rockoleando used under CC BY 2.0 license.

Dataset Impact

Get Involved

Acknowledgements

Thanks to Nan Ding, Sebastian Goodman and Bo Pang for training models with the Conceptual Captions dataset, and to Amol Wankhede for driving the public release efforts for the dataset.
¹ In our paper, we posit that if automatic determination of names, locations, brands, etc. from the image is needed, it should be done as a separate task that may leverage image meta-information (e.g. GPS info), or complementary techniques such as OCR.
