Google Research Blog
The latest news from Research at Google
Sawasdee ka Voice Search
Wednesday, April 02, 2014
Posted by Keith Hall and Richard Sproat, Staff Research Scientists, Speech
Typing on mobile devices can be difficult, especially when you're on the go. Google Voice Search gives you a fast, easy, and natural way to search by speaking your queries instead of typing them. In Thailand, Voice Search has been one of the most requested services, so we’re excited to now offer users there the ability to speak queries in Thai, adding to over 75 languages and accents in which you can talk to Google.
To power Voice Search, we teach computers to understand the sounds and words that make up spoken language. We trained our speech recognizer to understand Thai by collecting speech samples from hundreds of volunteers in Bangkok, which enabled us to build this recognizer in just a fraction of the time it took to build other models. The volunteers were asked to read popular queries in their native tongue in a variety of acoustic conditions, such as in restaurants, out on busy streets, and inside cars.
Each new language for voice recognition usually requires our research team to tackle new challenges, and Thai was no exception.
Segmentation is a major challenge in Thai: the Thai script has no spaces between words, so it is harder to know where a word begins and ends. We therefore created a Thai segmenter to help our system recognize words better. For example, ตากลม can be segmented as ตาก ลม or as ตา กลม. We collected a large corpus of text and asked Thai speakers to manually annotate plausible segmentations. We then trained a sequence segmenter on this data, allowing it to generalize beyond the annotated data.
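To make the ambiguity concrete, here is a minimal sketch that lists every way to split a string into words from a small lexicon. It is a plain dictionary lookup, not the trained sequence segmenter described above, and the four-word `LEXICON` and the function name `segmentations` are assumptions made for illustration.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one valid segmentation of the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this prefix and segment whatever is left over.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical toy lexicon containing only the words from the example.
LEXICON = {"ตา", "ตาก", "ลม", "กลม"}

for words in segmentations("ตากลม", LEXICON):
    print(" ".join(words))
```

Both splits come back as valid, which is why a lexicon alone is not enough: something has to rank the candidates, and that is the job the annotated data and the trained segmenter take on.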
Numbers are an important part of any language: when the string “87” appears on a web page, we need to know how people would say it. As with over 40 other languages, we included a number grammar for Thai, which tells the recognizer that “87” is read as แปดสิบเจ็ด.
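As a rough illustration of what a number grammar has to encode, the sketch below spells out Thai numbers from 0 to 99. It is a hand-written toy, not the grammar used in Voice Search; the function name `read_thai_number` and the 0 to 99 limit are assumptions. It covers the two irregularities in that range: 20 uses ยี่สิบ, and a final 1 after a tens word is read เอ็ด.

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
TEN = "สิบ"


def read_thai_number(text):
    """Spell out a digit string between "0" and "99" in Thai words."""
    n = int(text)
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0 to 99")
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    if tens == 1:
        tens_word = TEN                  # 10 is just "sip"
    elif tens == 2:
        tens_word = "ยี่" + TEN           # 20 is irregular
    else:
        tens_word = DIGITS[tens] + TEN
    if ones == 0:
        return tens_word
    # A trailing 1 is read "et" rather than "nueng".
    ones_word = "เอ็ด" if ones == 1 else DIGITS[ones]
    return tens_word + ones_word


print(read_thai_number("87"))
```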
Thai users often mix English words, such as brand or artist names, into both spoken and written Thai, which adds complexity to our acoustic models, lexicon models, and segmentation models. We addressed this by supporting ‘code switching’, which allows Voice Search to recognize when different languages are being spoken interchangeably and adjust phonetic transliteration accordingly.
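For written text, one small ingredient of handling mixed input is telling Thai-script spans apart from Latin-script ones. The sketch below does only that, using Unicode ranges. It says nothing about how the acoustic or lexicon models handle mixed speech, and the names `script_of` and `split_by_script` and the sample query are assumptions.

```python
from itertools import groupby


def script_of(ch):
    """Classify one character as 'thai', 'latin', or 'other'."""
    if "\u0e00" <= ch <= "\u0e7f":  # the Thai Unicode block
        return "thai"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def split_by_script(text):
    """Group consecutive characters of the same script into spans."""
    spans = []
    for script, chars in groupby(text, key=script_of):
        span = "".join(chars).strip()
        if span:  # skip runs that were only whitespace
            spans.append((script, span))
    return spans


# Hypothetical mixed query: Thai words around an English brand name.
print(split_by_script("ซื้อ iPhone ใหม่"))
```

Each span could then be routed to the matching lexicon or segmenter. This trick does not help when both languages share a script, as with Filipino and English.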
Many Thai users leave out accents and tone markers when they search (e.g., โน๊ตบุก instead of โน้ตบุ๊ก, or หมูหยอง instead of หมูหย็อง), so we created a special algorithm to restore accents and tones in the search results, so that our Thai users see properly formatted text in the majority of cases.
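One simple way to picture restoration is a lookup keyed on the word with its marks removed. The sketch below is that idea only, not the algorithm used in Voice Search. The two-word `VOCAB`, the function names, and the choice of which marks to strip (mai taikhu, the four tone marks, and thanthakhat, U+0E47 to U+0E4C) are assumptions.

```python
import re

# Marks that are often dropped or swapped when typing:
# U+0E47 (mai taikhu), U+0E48 to U+0E4B (tone marks), U+0E4C (thanthakhat).
_MARKS = re.compile("[\u0e47-\u0e4c]")


def strip_marks(word):
    """Remove the marks above so that variant spellings share one key."""
    return _MARKS.sub("", word)


def build_index(vocabulary):
    """Map each mark-free skeleton to its properly written form."""
    return {strip_marks(word): word for word in vocabulary}


def restore(query, index):
    """Return the canonical spelling if the skeleton is known, else the query."""
    return index.get(strip_marks(query), query)


# Hypothetical canonical vocabulary, taken from the examples in the text.
VOCAB = ["โน้ตบุ๊ก", "หมูหย็อง"]
INDEX = build_index(VOCAB)

print(restore("โน๊ตบุก", INDEX))
print(restore("หมูหยอง", INDEX))
```

A real system also has to choose between several canonical words that share a skeleton. This sketch silently keeps the last one it sees.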
We’re particularly excited that Voice Search can help people find locally relevant information, ranging from travel directions to the nearest restaurant, without having to type long phrases in Thai.
Voice Search is available for Android devices running Jelly Bean and above. It will be available for older Android releases and iOS users soon.
Under the hood of Croatian, Filipino, Ukrainian, and Vietnamese in Google Voice Search
Thursday, July 25, 2013
Posted by Eugene Weinstein and Pedro Moreno, Google Speech Team
Although we’ve been working on speech recognition for several years, every new language requires our engineers and scientists to tackle unique challenges. Our most recent additions - Croatian, Filipino, Ukrainian, and Vietnamese - required creative solutions to reflect how each language is used across devices and in everyday conversations.
For example, since Vietnamese is a tonal language, we had to explore how to take tones into consideration. One simple technique is to model the tone and vowel combinations (tonemes) directly in our lexicons. This, however, has the side effect of a larger phonetic inventory. As a result we had to come up with special algorithms to handle the increased complexity. Additionally, Vietnamese is a heavily diacritized language, with tone markers on a majority of syllables. Since Google Search is very good at returning valid results even when diacritics are omitted, our Vietnamese users frequently omit the diacritics when typing their queries. This creates difficulties for the speech recognizer, which selects its vocabulary from typed queries. For this purpose, we created a special diacritic restoration algorithm which enables us to present properly formatted text to our users in the majority of cases.
Filipino also presented interesting challenges. Much like in other multilingual societies such as Hong Kong, India, South Africa, etc., Filipinos often mix several languages in their daily life. This is called code switching. Code switching complicates the design of pronunciation, language, and acoustic models. Speech scientists are effectively faced with a dilemma: should we build one system per language, or should we combine all languages into one?
In such situations we prefer to model the reality of daily language use in our speech recognizer design. If users mix several languages, our recognizers should do their best in modeling this behavior. Hence our Filipino voice search system, while mainly focused on the Filipino language, also allows users to mix in English terms.
The algorithms we’re using to model how speech sounds are spoken in each language make use of our distributed large-scale neural network learning infrastructure (yes, the same one that spontaneously discovered cats on YouTube!). By partitioning the gigantic parameter set of the model, and by evaluating each partition on a separate computation server, we’re able to achieve unprecedented levels of parallelism in training acoustic models.
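The partitioning idea can be sketched on a single machine. Below, a weight vector is split into shards, each shard computes its part of a dot product on its own worker thread, and the partial results are summed. The threads stand in for separate servers, and the function names and toy data are assumptions. The real infrastructure is far more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_shards):
    """Split a list into `num_shards` contiguous, near-equal slices."""
    size, extra = divmod(len(items), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < extra else 0)
        shards.append(items[start:end])
        start = end
    return shards


def partial_dot(weights, inputs):
    """Work done by one shard: a dot product over its slice only."""
    return sum(w * x for w, x in zip(weights, inputs))


def sharded_dot(weights, inputs, num_shards=4):
    """Evaluate each partition on its own worker, then combine the results."""
    weight_shards = partition(weights, num_shards)
    input_shards = partition(inputs, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(partial_dot, weight_shards, input_shards))
    return sum(partials)


# Toy parameters and inputs, chosen as integers so the sums are exact.
weights = list(range(1, 11))
inputs = list(range(10, 0, -1))
print(sharded_dot(weights, inputs))
```

No worker ever holds the whole parameter set, which is the property that lets a model grow beyond what one machine can store.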
The more people use Google speech recognition products, the more accurate the technology becomes. These new neural network technologies will help us bring you lots of improvements and many more languages in the future.
Google Correlate expands to 49 additional countries
Tuesday, January 03, 2012
Posted by Matt Mohebbi, Software Engineer
In May of this year we launched Google Correlate on Google Labs. This system enables a correlation search between a user-provided time series and millions of time series of Google search traffic. Since our initial launch, we've graduated to Google Trends and we've seen a number of great applications of Correlate in several domains, including economics (consumer spending, unemployment rate and housing inventory), sociology and meteorology. The correspondence of gas prices and search activity for fuel efficient cars was even briefly discussed in a Fox News presidential debate and NPR recently covered correlations related to political commentators.
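At its simplest, a correlation search scores every candidate series against the target and keeps the best matches. The sketch below does this by brute force with the Pearson correlation coefficient. It is not how Correlate scales to millions of series, and the query names and numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def top_correlated(target, candidates, k=2):
    """Return the k candidate names with the highest correlation to target."""
    scored = [(pearson(target, series), name) for name, series in candidates.items()]
    scored.sort(reverse=True)
    return [(name, r) for r, name in scored[:k]]


# Hypothetical weekly series: a user-provided target and three search queries.
target = [1, 2, 3, 4, 5]
candidates = {
    "query_a": [2, 4, 6, 8, 10],
    "query_b": [5, 4, 3, 2, 1],
    "query_c": [1, 3, 2, 5, 4],
}
print(top_correlated(target, candidates))
```

Comparing the target against every series costs time proportional to the number of candidates, so a system searching millions of series needs something faster than this loop.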
Health has always been an area of particular interest to our team (Matt Mohebbi, Julia Kodysh, Rob Schonberger and Dan Vanderkam). Correlate was inspired by Google Flu Trends and many of us worked on both systems. So we were very excited when the BioSense division at the CDC published a page which shows correlations between some of their national trends in patient diagnosis activity and Google search activity. With just three years of weekly data, relevant search terms are surfaced. For example, the time series for bloody nose surfaces "bloody snot" and "blood in snot".
While these terms shouldn't come as a surprise, there are others which are more interesting, including searches related to static electricity, dry skin, and red cheeks. Of course, correlation is not causation but we hope that Correlate can be used as a method for researchers to generate new hypotheses with their data.
To help researchers outside the United States, we're pleased to announce support for 49 additional countries in Google Correlate. It's now possible to see correlations like "snorkeling" in Australia, "cherry blossoms" in Japan, and "beer garden" in Germany. We look forward to seeing what new correlations researchers can find with this data!