Google Research Blog
The latest news from Research at Google
A Billion Words: Because today's language modeling standard should be higher
Wednesday, April 30, 2014
Posted by Dave Orr, Product Manager, and Ciprian Chelba, Research Scientist
Language is chock full of ambiguity, and it can turn up in surprising places. Many words are hard to tell apart without context: most Americans
pronounce “ladder” and “latter” identically
, for instance. Keyboard inputs on mobile devices have a similar problem, especially for
IME keyboards
. For example, the input patterns for “Yankees” and “takes” look very similar:
Photo credit: Kurt Partridge
But in this context -- the previous two words, “New York” -- “Yankees” is much more likely.
One key way computers use context is with
language models
. These are used for predictive keyboards, but also speech recognition, machine translation, spelling correction, query suggestions, and so on. Often those are specialized: word order for queries versus web pages can be very different. Either way, having an accurate language model with wide coverage drives the quality of all these applications.
Due to interactions between components, one thing that can be tricky when evaluating the quality of such complex systems is error attribution. Good engineering practice is to evaluate the quality of each module separately, including the language model. We believe that the field could benefit from a large, standard set with benchmarks for easy comparison and experiments with new modeling techniques.
To that end,
we are releasing scripts
that convert a set of public data into a language model consisting of over a billion words, with standardized training and test splits, described in an
arXiv paper
. Along with the scripts, we’re releasing the processed data in one convenient location, along with the training and test data. This will make it much easier for the research community to quickly reproduce results, and we hope will speed up progress on these tasks.
The benchmark scripts and data are freely available, and can be found here:
http://www.statmt.org/lm-benchmark/
The field needs a new and better standard benchmark. Currently, researchers report from a set of their choice, and results are very hard to reproduce because of a lack of a standard in preprocessing. We hope that this will solve both those problems, and become the standard benchmark for language modeling experiments. As more researchers use the new benchmark, comparisons will be easier and more accurate, and progress will be faster.
For all the researchers out there, try out this model, run your experiments, and let us know how it goes -- or publish, and we’ll enjoy finding your results at conferences and in journals.
Lens Blur in the new Google Camera app
Wednesday, April 16, 2014
Posted by Carlos Hernández, Software Engineer
One of the biggest advantages of
SLR cameras
over camera phones is the ability to achieve shallow depth of field and
bokeh
effects. Shallow depth of field makes the object of interest "pop" by bringing the foreground into focus and de-emphasizing the background. Achieving this optical effect has traditionally required a big lens and aperture, and therefore hasn’t been possible using the camera on your mobile phone or tablet.
That all changes with
Lens Blur,
a new mode in the
Google Camera
app. It lets you take a photo with a shallow depth of field using just your Android phone or tablet. Unlike a regular photo, Lens Blur lets you change the point or level of focus
after
the photo is taken. You can choose to make any object come into focus simply by tapping on it in the image. By changing the depth-of-field slider, you can simulate different aperture sizes, to achieve bokeh effects ranging from subtle to surreal (e.g.,
tilt-shift
). The new image is rendered instantly, allowing you to see your changes in real time.
Lens Blur replaces the need for a large optical system with algorithms that
simulate
a larger lens and aperture. Instead of capturing a single photo, you move the camera in an upward sweep to capture a whole series of frames. From these photos, Lens Blur uses computer vision algorithms to create a 3D model of the world, estimating the
depth
(distance) to every point in the scene. Here’s an example -- on the left is a raw input photo, in the middle is a “depth map” where darker things are close and lighter things are far away, and on the right is the result blurred by distance:
Here’s how we do it. First, we pick out visual features in the scene and track them over time, across the series of images. Using computer vision algorithms known as Structure-from-Motion (SfM) and
bundle adjustment
, we compute the camera’s 3D position and orientation and the 3D positions of all those image features throughout the series.
Once we’ve got the 3D pose of each photo, we compute the depth of each pixel in the reference photo using
Multi-View Stereo
(MVS) algorithms. MVS works the way human stereo vision does: given the location of the same object in two different images, we can
triangulate
the 3D position of the object and compute the distance to it. How do we figure out which pixel in one image corresponds to a pixel in another image? MVS measures how similar they are -- on mobile devices, one particularly simple and efficient way is computing the Sum of Absolute Differences (SAD) of the RGB colors of the two pixels.
Now it’s an optimization problem: we try to build a depth map where all the corresponding pixels are most similar to each other. But that’s typically not a well-posed optimization problem -- you can get the same similarity score for different depth maps. To address this ambiguity, the optimization also incorporates assumptions about the 3D geometry of a scene, called a "prior,” that favors reasonable solutions. For example, you can often assume two pixels near each other are at a similar depth. Finally, we use
Markov Random Field
inference methods to solve the optimization problem.
Having computed the depth map, we can re-render the photo, blurring pixels by differing amounts depending on the pixel’s depth, aperture and location relative to the focal plane. The focal plane determines which pixels to blur, with the amount of blur increasing proportionally with the distance of each pixel to that focal plane. This is all achieved by simulating a physical lens using the
thin lens
approximation.
The algorithms used to create the 3D photo run entirely on the mobile device, and are closely related to the computer vision algorithms used in 3D mapping features like Google Maps
Photo Tours
and
Google Earth
. We hope you have fun with your
bokeh
experiments!
Sawasdeee ka Voice Search
Wednesday, April 02, 2014
Posted by Keith Hall and Richard Sproat, Staff Research Scientists, Speech
Typing on mobile devices can be difficult, especially when you're on the go. Google Voice Search gives you a fast, easy, and natural way to search by speaking your queries instead of typing them. In Thailand, Voice Search has been one of the most requested services, so we’re excited to now offer users there the ability to speak queries in Thai, adding to over 75 languages and accents in which you can talk to Google.
To power Voice Search, we teach computers to understand the sounds and words that build spoken language. We trained our speech recognizer to understand Thai by collecting speech samples from hundreds of volunteers in Bangkok, which enabled us to build this recognizer in just a fraction of the time it took to build other models. Our helpers are asked to read popular queries in their native tongue, in a variety of acoustic conditions such as in restaurants, out on busy streets, and inside cars.
Each new language for voice recognition often requires our research team to tackle new challenges, including Thai.
Segmentation is a major challenge in Thai, as the Thai script has no spaces between words, so it is harder to know when a word begins and ends. Therefore, we created a Thai segmenter to help our system recognize words better. For example: ตากลม can be segmented to ตาก ลม or ตา กลม. We collected a large corpus of text and asked Thai speakers to manually annotate plausible segmentations. We then trained a sequence segmenter on this data allowing it to generalize beyond the annotated data.
Numbers are an important part of any language: the string “87” appears on a web page and we need to know how people would say that. As with over 40 other languages, we included a number grammar for Thai, that tells you that “87” would be read as แปดสิบเจ็ด.
Thai users often mix English words with Thai, such as brand or artist names, in both spoken and written Thai which adds complexity to our acoustic models, lexicon models, and segmentation models. We addressed this by introducing ‘code switching’, which allows Voice Search to recognize when different languages are being spoken interchangeably and adjust phonetic transliteration accordingly.
Many Thai users frequently leave out accents and tone markers when they search (eg โน๊ตบุก instead of โน้ตบุ๊ก OR หมูหยอง instead of หมูหย็อง) so we had to create a special algorithm to ensure accents and tones were restored in search results provided and our Thai users would see properly formatted text in the majority of cases.
We’re particularly excited that Voice Search can help people find locally relevant information, ranging from travel directions to the nearest restaurant, without having to type long phrases in Thai.
Voice Search is available for Android devices running Jelly Bean and above. It will be available for older Android releases and iOS users soon.
Making Blockly Universally Accessible
Tuesday, April 01, 2014
Posted by Neil Fraser, Chief Interplanetary Liaison
We work hard to make our products accessible to people everywhere, in every culture. Today we’re expanding our outreach efforts to support a traditionally underserved community -- those who call themselves "tlhIngan."
Google's Blockly programming environment is used in K-12 classrooms around the world to teach programming. But the world is not enough. Students on
Qo'noS
have had difficulty learning to code because most of the teaching tools aren't available in their native language. Additionally, many existing tools are too fragile for their pedagogical approach. As a result, Klingons have found it challenging to enter computer science. This is reflected in the fact that less than 2% of Google engineers are Klingon.
Today we launch a full translation of Blockly in Klingon. It incorporates Klingon cultural norms to facilitate learning in this unique population:
Blockly has no syntax errors. This reduces frustration, and reduces the number of computers thrown through bulkheads.
Variables are untyped. Type errors can too easily be perceived as a challenge to the honor of a student's family (and we’ve seen where that ends).
Debugging and bug reports have been omitted, our research indicates that in the event of a bug, they prefer the entire program to just blow up.
Get a little keyboard dirt under your fingernails. Learn that although
ghargh
is delicious, code structure should not resemble it. And above all, be proud that
tlhIngan maH
. Qapla'!
You can try out the demo
here
or get involved
here
.
Labels
accessibility
ACL
ACM
Acoustic Modeling
Adaptive Data Analysis
ads
adsense
adwords
Africa
AI
Algorithms
Android
API
App Engine
App Inventor
April Fools
Art
Audio
Australia
Automatic Speech Recognition
Awards
Cantonese
China
Chrome
Cloud Computing
Collaboration
Computational Imaging
Computational Photography
Computer Science
Computer Vision
conference
conferences
Conservation
correlate
Course Builder
crowd-sourcing
CVPR
Data Center
Data Discovery
data science
datasets
Deep Learning
DeepDream
DeepMind
distributed systems
Diversity
Earth Engine
economics
Education
Electronic Commerce and Algorithms
electronics
EMEA
EMNLP
Encryption
entities
Entity Salience
Environment
Europe
Exacycle
Expander
Faculty Institute
Faculty Summit
Flu Trends
Fusion Tables
gamification
Gmail
Google Books
Google Brain
Google Cloud Platform
Google Docs
Google Drive
Google Genomics
Google Play Apps
Google Science Fair
Google Sheets
Google Translate
Google Trips
Google Voice Search
Google+
Government
grants
Graph
Graph Mining
Hardware
HCI
Health
High Dynamic Range Imaging
ICLR
ICML
ICSE
Image Annotation
Image Classification
Image Processing
Inbox
Information Retrieval
internationalization
Internet of Things
Interspeech
IPython
Journalism
jsm
jsm2011
K-12
KDD
Klingon
Korean
Labs
Linear Optimization
localization
Machine Hearing
Machine Intelligence
Machine Learning
Machine Perception
Machine Translation
MapReduce
market algorithms
Market Research
ML
MOOC
Multimodal Learning
NAACL
Natural Language Processing
Natural Language Understanding
Network Management
Networks
Neural Networks
Ngram
NIPS
NLP
open source
operating systems
Optical Character Recognition
optimization
osdi
osdi10
patents
ph.d. fellowship
PhD Fellowship
PiLab
Policy
Professional Development
Proposals
Public Data Explorer
publication
Publications
Quantum Computing
renewable energy
Research
Research Awards
resource optimization
Robotics
schema.org
Search
search ads
Security and Privacy
Semi-supervised Learning
SIGCOMM
SIGMOD
Site Reliability Engineering
Social Networks
Software
Speech
Speech Recognition
statistics
Structured Data
Style Transfer
Supervised Learning
Systems
TensorFlow
Translate
trends
TTS
TV
UI
University Relations
UNIX
User Experience
video
Video Analysis
Vision Research
Visiting Faculty
Visualization
VLDB
Voice Search
Wiki
wikipedia
WWW
YouTube
Archive
2017
Jan
2016
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Jul
May
Apr
Mar
Feb
2007
Oct
Sep
Aug
Jul
Jun
Feb
2006
Dec
Nov
Sep
Aug
Jul
Jun
Apr
Mar
Feb
Feed
Google
on
Follow @googleresearch
Give us feedback in our
Product Forums
.