Google Research Blog
The latest news from Research at Google
Google at ACL 2017
Sunday, July 30, 2017
Posted by Christian Howard, Editor-in-Chief, Research Communications
This week, Vancouver, Canada hosts the 2017 Annual Meeting of the Association for Computational Linguistics (ACL 2017), the premier conference in the field of natural language understanding, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language.
As a leader in natural language processing & understanding and a Platinum sponsor of ACL 2017, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better systems using labeled and unlabeled data, state-of-the-art modeling, and learning from indirect supervision.
If you’re attending ACL 2017, we hope that you’ll stop by the Google booth to check out some demos, meet our researchers, and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Learn more about the Google research being presented at ACL 2017 below (Googlers highlighted in blue).
Organizing Committee
Area Chairs include: Sujith Ravi (Machine Learning), Thang Luong (Machine Translation)
Publication Chairs include: Margaret Mitchell (Advisory)
Accepted Papers
A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit
Yin-Wen Chang, Michael Collins
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-Tau Yih
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
Coarse-to-Fine Question Answering for Long Documents
Eunsol Choi, Daniel Hewlett, Jakob Uszkoreit, Illia Polosukhin, Alexandre Lacoste, Jonathan Berant
Automatic Compositor Attribution in the First Folio of Shakespeare
Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor Berg-Kirkpatrick
A Nested Attention Neural Hybrid Model for Grammatical Error Correction
Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong, Jianfeng Gao
Get To The Point: Summarization with Pointer-Generator Networks
Abigail See, Peter J. Liu, Christopher D. Manning
Identifying 1950s American Jazz Composers: Fine-Grained IsA Extraction via Modifier Composition
Ellie Pavlick*, Marius Pasca
Learning to Skim Text
Adams Wei Yu, Hongrae Lee, Quoc Le
Workshops
2017 ACL Student Research Workshop
Program Committee includes: Emily Pitler, Brian Roark, Richard Sproat
WiNLP: Women and Underrepresented Minorities in Natural Language Processing
Organizers include: Margaret Mitchell
Gold Sponsor
BUCC: 10th Workshop on Building and Using Comparable Corpora
Scientific Committee includes: Richard Sproat
CLPsych: Computational Linguistics and Clinical Psychology – From Linguistic Signal to Clinical Reality
Program Committee includes: Brian Roark, Richard Sproat
Repl4NLP: 2nd Workshop on Representation Learning for NLP
Program Committee includes: Ankur Parikh, John Platt
RoboNLP: Language Grounding for Robotics
Program Committee includes: Ankur Parikh, Tom Kwiatkowski
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Management Group includes: Slav Petrov
CoNLL-SIGMORPHON-2017 Shared Task: Universal Morphological Reinflection
Organizing Committee includes: Manaal Faruqui
Invited Speaker: Chris Dyer
SemEval: 11th International Workshop on Semantic Evaluation
Organizers include: Daniel Cer
ALW1: 1st Workshop on Abusive Language Online
Panelists include: Margaret Mitchell
EventStory: Events and Stories in the News
Program Committee includes: Silvia Pareti
NMT: 1st Workshop on Neural Machine Translation
Organizing Committee includes: Thang Luong
Program Committee includes: Hieu Pham, Taro Watanabe
Invited Speaker: Quoc Le
Tutorials
Natural Language Processing for Precision Medicine
Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-tau Yih
Deep Learning for Dialogue Systems
Yun-Nung Chen, Asli Celikyilmaz, Dilek Hakkani-Tur
* Contributed during an internship at Google.
Meet Parsey’s Cousins: Syntax for 40 languages, plus new SyntaxNet capabilities
Monday, August 08, 2016
Posted by Chris Alberti, Dave Orr & Slav Petrov, Google Natural Language Understanding Team
Just in time for ACL 2016, we are pleased to announce that Parsey McParseface, released in May as part of SyntaxNet and the basis for the Cloud Natural Language API, now has 40 cousins! Parsey’s Cousins is a collection of pretrained syntactic models for 40 languages, capable of analyzing the native language of more than half of the world’s population at often unprecedented accuracy. To better address the linguistic phenomena occurring in these languages, we have endowed SyntaxNet with new abilities for text segmentation and morphological analysis.
When we released Parsey, we were already planning to expand to more languages, and it soon became clear that this was both urgent and important, because researchers were having trouble creating top-notch SyntaxNet models for other languages.
The reason for that is a little bit subtle. SyntaxNet, like other TensorFlow models, has a lot of knobs to turn that affect accuracy and speed. These knobs are called hyperparameters, and they control things like the learning rate and its decay, momentum, and random initialization. Because neural networks are more sensitive to the choice of these hyperparameters than many other machine learning algorithms, picking the right hyperparameter setting is very important. Unfortunately, there is no tested and proven way of doing this; picking good hyperparameters is mostly an empirical science -- we try a bunch of settings and see what works best.
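To make that concrete, here is a minimal sketch of the kind of random hyperparameter sweep described above. The knob names and value ranges are illustrative assumptions, not SyntaxNet’s actual training configuration.
```python
import random

# The "knobs" mentioned above, expressed as a small search space.
# Names and ranges are illustrative assumptions, not SyntaxNet's real config.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),       # log-uniform
    "decay_steps":   lambda: random.choice([1000, 2500, 5000]),  # learning-rate decay
    "momentum":      lambda: random.uniform(0.7, 0.95),
    "init_scale":    lambda: random.uniform(0.01, 0.2),          # random initialization
}

def sample_setting():
    """Draw one random hyperparameter configuration from the space."""
    return {name: draw() for name, draw in SEARCH_SPACE.items()}

# "Try a bunch of settings and see what works best": roughly 70 trials per language.
trials = [sample_setting() for _ in range(70)]
```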
An additional challenge is that training these models can take a long time, several days on very fast hardware. Our solution is to train many models in parallel via MapReduce, and when one looks promising, train a bunch more models with similar settings to fine-tune the results. This can really add up -- on average, we train more than 70 models per language. The plot below shows how the accuracy varies depending on the hyperparameters as training progresses. The best models are up to 4% absolute more accurate than ones trained without hyperparameter tuning.
Held-out set accuracy for various English parsing models with different hyperparameters (each line corresponds to one training run with specific hyperparameters). In some cases training is a lot slower and in many cases a suboptimal choice of hyperparameters leads to significantly lower accuracy. We are releasing the best model that we were able to train for each language.
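A minimal sketch of that coarse-to-fine strategy follows: score every trial on held-out data, keep the most promising setting, then launch more runs with small perturbations around it. Here train_and_eval is a hypothetical stand-in for a full (and slow) parser training run, not a real SyntaxNet API.
```python
import random

def refine(setting, jitter=0.1):
    """Perturb a promising setting to explore its neighborhood."""
    return {k: v * random.uniform(1 - jitter, 1 + jitter) if isinstance(v, float) else v
            for k, v in setting.items()}

def coarse_to_fine(trials, train_and_eval, n_refine=10):
    # Coarse sweep: in practice these runs happen in parallel (e.g. via MapReduce).
    scored = [(train_and_eval(t), t) for t in trials]
    best_acc, best = max(scored, key=lambda pair: pair[0])
    # Fine sweep: a bunch more models with similar settings around the best one so far.
    refined = [(train_and_eval(t), t) for t in (refine(best) for _ in range(n_refine))]
    return max(refined + [(best_acc, best)], key=lambda pair: pair[0])
```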
In order to do a good job at analyzing the grammar of other languages, it was not sufficient to just fine-tune our English setup. We also had to expand the capabilities of SyntaxNet. The first extension is a model for text segmentation, which is the task of identifying word boundaries. In languages like English, this isn’t very hard -- you can mostly look for spaces and punctuation. In Chinese, however, this can be very challenging, because words are not separated by spaces. To correctly analyze dependencies between Chinese words, SyntaxNet needs to understand text segmentation -- and now it does.
Analysis of a Chinese string into a parse tree showing dependency labels, word tokens, and parts of speech (read top to bottom for each word token).
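To illustrate why segmentation matters -- and only to illustrate the task, since SyntaxNet uses a learned neural model rather than a dictionary -- here is a toy greedy maximum-match segmenter over a tiny hand-made lexicon. The lexicon and example sentence are illustrative assumptions.
```python
# Toy illustration of the segmentation *task*, not SyntaxNet's approach:
# greedy forward maximum matching against a tiny hand-made lexicon.
LEXICON = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}
MAX_WORD_LEN = 4

def max_match(text):
    """Greedily take the longest lexicon entry at each position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_WORD_LEN), i, -1):
            if text[i:j] in LEXICON or j == i + 1:  # fall back to a single character
                tokens.append(text[i:j])
                i = j
                break
    return tokens

print(max_match("我喜欢自然语言处理"))
# ['我', '喜欢', '自然语言', '处理'] -- word boundaries that a downstream
# dependency parser can then attach relations to.
```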
The second extension is a model for morphological analysis. Morphology is a language feature that is poorly represented in English. It describes inflection: i.e., how the grammatical function and meaning of a word change as its spelling changes. In English, we add an -s to a word to indicate plurality. In Russian, a heavily inflected language, morphology can indicate number, gender, whether the word is the subject or object of a sentence, possessives, prepositional phrases, and more. To understand the syntax of a sentence in Russian, SyntaxNet needs to understand morphology -- and now it does.
Parse trees showing dependency labels, parts of speech, and morphology.
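As a small illustration of what a downstream consumer of such an analysis sees, the sketch below parses a morphological feature string in the Universal Dependencies FEATS format. The specific word and feature values are an illustrative assumption, not actual SyntaxNet output.
```python
def parse_feats(feats):
    """Turn a FEATS string like 'Case=Ins|Number=Sing' into a dict."""
    if feats == "_":          # CoNLL-U uses "_" when a token has no features
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

# 'издателем' ("by the publisher") is the instrumental singular of 'издатель'.
features = parse_feats("Case=Ins|Gender=Masc|Number=Sing")
if features.get("Case") == "Ins":
    print("instrumental case -- in a passive clause this often marks the agent")
```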
As you might have noticed, the parse trees for all of the sentences above look very similar. This is because we follow the content-head principle, under which dependencies are drawn between content words, with function words becoming leaves in the parse tree. This idea was developed by the Universal Dependencies project in order to increase parallelism between languages. Parsey’s Cousins are trained on treebanks provided by this project and are designed to be cross-linguistically consistent and thus easier to use in multi-lingual language understanding applications.
Using the same set of labels across languages can help us understand how sentences in different languages, or variations in the same language, convey the same meaning. In all of the above examples, the root indicates the main verb of the sentence, and there is a passive nominal subject (indicated by the arc labeled ‘nsubjpass’) and a passive auxiliary (‘auxpass’). If you look closely, you will also notice some differences because the grammar of each language differs. For example, English uses the preposition ‘by,’ where Russian uses morphology to mark that the phrase ‘the publisher (издателем)’ is in instrumental case -- the meaning is the same, it is just expressed differently.
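Here is a minimal sketch of how that cross-linguistic consistency can be exploited downstream: read a parse in a simplified, whitespace-separated column format (loosely modeled on CoNLL-U) and extract (dependent, relation, head) triples. The tiny English parse is hand-made for illustration, not real SyntaxNet output.
```python
# Columns (simplified): index, form, part of speech, head index, dependency label.
parse = """
1 the       DET  2 det
2 book      NOUN 4 nsubjpass
3 was       AUX  4 auxpass
4 published VERB 0 root
"""

def triples(block):
    rows = [line.split() for line in block.strip().splitlines()]
    forms = {row[0]: row[1] for row in rows}
    forms["0"] = "ROOT"
    for index, form, pos, head, label in rows:
        yield form, label, forms[head]

for dep, label, head in triples(parse):
    print(f"{dep} --{label}--> {head}")
# Output: the --det--> book, book --nsubjpass--> published,
#         was --auxpass--> published, published --root--> ROOT
# Because the label set is shared across languages, the same loop would work
# unchanged on the Russian and Chinese parses shown above.
```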
Google has been involved in the Universal Dependencies project since its inception, and we are very excited to be able to bring together our efforts on datasets and modeling. We hope that this release will facilitate research progress in building computer systems that can understand all of the world’s languages.
Parsey's Cousins can be found on GitHub, along with Parsey McParseface and SyntaxNet.
ACL 2016 & Research at Google
Sunday, August 07, 2016
Posted by Slav Petrov, Research Scientist
This week, Berlin hosts the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), the premier conference of the field of computational linguistics, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. As a leader in Natural Language Processing (NLP) and a Platinum Sponsor of the conference, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better learners using labeled and unlabeled data, state-of-the-art modeling, and learning from indirect supervision.
Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more. Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems.
Our researchers are experts in natural language processing and machine learning who combine methodological research with applied science, and our engineers are equally involved in long-term research efforts and in driving immediate applications of our technology.
If you’re attending ACL 2016, we hope that you’ll stop by the booth to check out some demos, meet our researchers, and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Learn more about Google research being presented at ACL 2016 below (Googlers highlighted in blue), and visit the Natural Language Understanding Team page at g.co/NLUTeam.
Papers
Generalized Transition-based Dependency Parsing via Control Parameters
Bernd Bohnet, Ryan McDonald, Emily Pitler, Ji Ma
Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning
Yulia Tsvetkov, Manaal Faruqui, Wang Ling (Google DeepMind), Chris Dyer (Google DeepMind)
Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning (TACL)
Manaal Faruqui, Ryan McDonald, Radu Soricut
Many Languages, One Parser (TACL)
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer (Google DeepMind)*, Noah A. Smith
Latent Predictor Networks for Code Generation
Wang Ling (Google DeepMind), Phil Blunsom (Google DeepMind), Edward Grefenstette (Google DeepMind), Karl Moritz Hermann (Google DeepMind), Tomáš Kočiský (Google DeepMind), Fumin Wang (Google DeepMind), Andrew Senior (Google DeepMind)
Collective Entity Resolution with Multi-Focal Attention
Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira
Plato: A Selective Context Model for Entity Resolution (TACL)
Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira
WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, David Berthelot
Stack-propagation: Improved Representation Learning for Syntax
Yuan Zhang, David Weiss
Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer (Google DeepMind), Dan Roth
Globally Normalized Transition-Based Neural Networks (Outstanding Papers Session)
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins
Posters
Cross-lingual projection for class-based language models
Beat Gfeller, Vlad Schogol, Keith Hall
Synthesizing Compound Words for Machine Translation
Austin Matthews, Eva Schlinger*, Alon Lavie, Chris Dyer (Google DeepMind)*
Cross-Lingual Morphological Tagging for Low-Resource Languages
Jan Buys, Jan A. Botha
Workshops
1st Workshop on Representation Learning for NLP
Keynote Speakers include: Raia Hadsell (Google DeepMind)
Workshop Organizers include: Edward Grefenstette (Google DeepMind), Phil Blunsom (Google DeepMind), Karl Moritz Hermann (Google DeepMind)
Program Committee members include: Tomáš Kočiský (Google DeepMind), Wang Ling (Google DeepMind), Ankur Parikh (Google), John Platt (Google), Oriol Vinyals (Google DeepMind)
1st Workshop on Evaluating Vector-Space Representations for NLP
Contributed Papers:
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer (Google DeepMind)*
Correlation-based Intrinsic Evaluation of Word Vector Representations
Yulia Tsvetkov, Manaal Faruqui, Chris Dyer (Google DeepMind)
SIGFSM Workshop on Statistical NLP and Weighted Automata
Contributed Papers:
Distributed representation and estimation of WFST-based n-gram models
Cyril Allauzen, Michael Riley, Brian Roark
Pynini: A Python library for weighted finite-state grammar compilation
Kyle Gorman
* Work completed at CMU.
Google at ACL 2011
Wednesday, May 18, 2011
Posted by Ryan McDonald and Fernando Pereira, Research Team
The Annual Meeting of the Association for Computational Linguistics is one of the premier conferences for language and text technologies. Many employees at Google have strong roots in the community of researchers that attend this meeting, including many of our researchers working on machine translation and speech.
At this year’s conference, Google is particularly well represented. The General Chair is Dekang Lin, and a few Googlers are serving as technical Area Chairs (in addition to the plethora of Googlers who reviewed papers for the conference). Google is also a Platinum Sponsor of ACL this year.
Research advances at Google can be seen throughout the conference’s technical content. Below is a complete list of Googler-authored or co-authored papers in the main conference. We want to give special emphasis to this year’s best paper award, given to “Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections” by CMU graduate student and Google intern Dipanjan Das and his internship advisor Slav Petrov. ACL is an extremely selective conference, and this award speaks volumes to the importance of syntactic analysis and of using bilingual corpora to project syntactic resources from resource-rich languages (like English) to other languages. Congratulations Dipanjan and Slav!
Googlers are also involved in two of this year’s tutorials. Marius Pasca will present “Web Search Queries as a Corpus” and Kuzman Ganchev and his colleagues will teach about “Rich Prior Knowledge in Learning for Natural Language Processing”. Finally, Katja Filippova and her colleagues are running a workshop on “Monolingual Text-to-Text Generation”.
ACL will take place this year in Portland from June 19th to June 24th.
Papers by Googlers (a * indicates a paper that will be linked to after the conference):
Ranking Class Labels Using Query Sessions*
Marius Pasca
Fine-Grained Class Label Markup of Search Queries*
Joseph Reisinger and Marius Pasca
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Dipanjan Das and Slav Petrov
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Sameer Singh, Amarnag Subramanya, Fernando Pereira and Andrew McCallum
Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
Stefan Rüd, Massimiliano Ciaramita, Jens Müller and Hinrich Schütze
Beam-Width Prediction for Efficient Context-Free Parsing
Nathan Bodenstab, Aaron Dunlop, Keith Hall and Brian Roark
Language-independent compound splitting with morphological operations
Klaus Macherey, Andrew Dai, David Talbot, Ashok Popat and Franz Och
Model-Based Aligner Combination Using Dual Decomposition
John DeNero and Klaus Macherey
Binarized Forest to String Translation
Hao Zhang, Licheng Fang, Peng Xu and Xiaoyun Wu
Semi-supervised Latent Variable Models for Fine-grained Sentiment Analysis
Oscar Tackstrom and Ryan McDonald