Research Blog: April 2014

A Billion Words: Because today's language modeling standard should be higher

Wednesday, April 30, 2014

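Posted by Dave Orr, Product Manager, and Ciprian Chelba, Research Scientist

Language models help a system choose between word sequences that sound or look alike. Many people pronounce “ladder” and “latter” identically, so a speech recognizer must rely on context to pick the right word, and IME keyboards face similar choices among candidate words.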

Photo credit: Kurt Partridge

To raise the standard for evaluating language models, we are releasing scripts for a common benchmark, described in an arXiv paper and available at http://www.statmt.org/lm-benchmark/.
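As a toy illustration of what such a benchmark measures (not the released scripts), here is a bigram language model with add-one smoothing. Language models are commonly compared by perplexity on held-out text: the exponential of the average negative log probability per token, where lower is better. The tiny corpus is made up for illustration; it also shows a model ranking two transcriptions that sound alike.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams over whitespace-tokenized sentences with <s> and </s> markers."""
    history_counts, bigram_counts = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])                     # every token that can be predicted
        history_counts.update(tokens[:-1])           # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return history_counts, bigram_counts, vocab


def log_prob(sentence, model):
    """Natural-log probability under an add-one smoothed bigram model.
    Returns (log probability, number of predicted tokens)."""
    history_counts, bigram_counts, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigram_counts[(prev, word)] + 1) / (history_counts[prev] + v))
    return total, len(tokens) - 1


def perplexity(sentences, model):
    """exp of the average negative log probability per predicted token."""
    total_lp, total_tokens = 0.0, 0
    for sentence in sentences:
        lp, n = log_prob(sentence, model)
        total_lp += lp
        total_tokens += n
    return math.exp(-total_lp / total_tokens)


# Hypothetical training corpus, for illustration only.
corpus = [
    "she climbed the ladder",
    "he climbed up the ladder",
    "the ladder was tall",
    "of the two the latter is cheaper",
]
model = train_bigram(corpus)
lp_ladder, _ = log_prob("climbed the ladder", model)
lp_latter, _ = log_prob("climbed the latter", model)
print(lp_ladder > lp_latter, perplexity(["he climbed the ladder"], model))
```

A real benchmark uses far more data and stronger smoothing; add-one smoothing is chosen here only because it keeps every probability nonzero in a few lines.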

Lens Blur in the new Google Camera app

Wednesday, April 16, 2014

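Posted by Carlos Hernández, Software Engineer

SLR cameras with large lenses can produce a shallow depth of field, keeping the subject sharp while the background dissolves into a smooth blur known as bokeh. Lens Blur, a new mode in the Google Camera app, approximates this effect on a phone and lets you change the point of focus after the photo is taken.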

To simulate this effect, Lens Blur needs to know the depth of each point in the scene.

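Lens Blur estimates depth from multiple frames captured as the camera moves. Bundle adjustment recovers the camera pose of each frame along with a sparse set of 3D scene points; Multi-View Stereo then matches pixels across frames to triangulate their depth, and a Markov Random Field formulation turns these noisy estimates into a smooth, dense depth map. Finally, a thin lens model uses the depth map to compute how much each pixel should be blurred for a chosen aperture and focal distance.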
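A simplified sketch of two of the geometric steps named above, under stated assumptions. Triangulation is shown for a single rectified pair of views (depth Z = f·B/d, with focal length f in pixels, baseline B and disparity d in pixels), whereas Multi-View Stereo combines many frames. The blur size uses the standard thin-lens circle-of-confusion formula c = A·|S2 − S1|/S2 · f/(S1 − f), for aperture diameter A, focal length f, focus distance S1 and object distance S2. Function names and numbers are hypothetical; this is not the app's implementation.

```python
def depth_from_disparity(focal_px: float, baseline: float, disparity_px: float) -> float:
    """Triangulated depth for one rectified image pair: Z = f * B / d.
    Depth comes out in the same units as the baseline."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


def blur_diameter(aperture: float, focal_length: float,
                  focus_dist: float, object_dist: float) -> float:
    """Thin-lens circle-of-confusion diameter on the sensor:
    c = A * |S2 - S1| / S2 * f / (S1 - f), all lengths in the same unit."""
    if focus_dist <= focal_length:
        raise ValueError("focus distance must exceed the focal length")
    return (aperture * abs(object_dist - focus_dist) / object_dist
            * focal_length / (focus_dist - focal_length))


# Hypothetical numbers: 1000 px focal length, 5 cm baseline, 25 px disparity.
z = depth_from_disparity(1000.0, 0.05, 25.0)  # 2 m
# Simulated 10 mm aperture on a 50 mm lens focused at z: farther objects blur more.
for s2 in (z, 3.0, 4.0):
    print(s2, blur_diameter(0.010, 0.050, z, s2))
```

Objects exactly at the focus distance get zero blur, which is what lets the user refocus simply by choosing a different S1 over the same depth map.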

See also: Photo Tours, Google Earth, bokeh.

Sawasdeee ka Voice Search

Wednesday, April 02, 2014

Posted by Keith Hall and Richard Sproat, Staff Research Scientists, Speech

Segmentation is a major challenge in Thai: the script has no spaces between words, so it is hard to tell where one word ends and the next begins. We therefore created a Thai segmenter to help our system recognize words. For example, ตากลม can be segmented as ตาก ลม (“to be out in the wind”) or ตา กลม (“round eyes”). We collected a large corpus of text and asked Thai speakers to manually annotate plausible segmentations, then trained a sequence segmenter on this data so that it generalizes beyond the annotated examples.
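The post does not say which model the sequence segmenter uses. As a much simpler stand-in, here is a dictionary-based segmenter that uses dynamic programming (a Viterbi-style search) to pick the split with the highest total unigram log probability. The word probabilities are invented for illustration, so which reading of ตากลม wins says nothing about real Thai usage.

```python
import math


def segment(text, word_logprob, max_word_len=10):
    """Return the highest-scoring split of text into dictionary words, or None."""
    n = len(text)
    # best[i] = (score of best split of text[:i], start index of its last word)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_logprob and best[start][0] > -math.inf:
                score = best[start][0] + word_logprob[word]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # no split uses only known words
    words, end = [], n
    while end > 0:  # follow back-pointers from the end of the string
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return list(reversed(words))


# Hypothetical unigram probabilities for the four candidate words.
probs = {w: math.log(p) for w, p in
         {"ตาก": 0.002, "ลม": 0.004, "ตา": 0.01, "กลม": 0.001}.items()}
print(segment("ตากลม", probs))
```

With these made-up numbers ตา กลม scores 0.01 × 0.001 = 1e-5 versus 0.002 × 0.004 = 8e-6 for ตาก ลม, so the search returns ["ตา", "กลม"]; a trained model would learn such preferences from the annotated data instead.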

Numbers are an important part of any language: when the string “87” appears on a web page, we need to know how people would say it. As for over 40 other languages, we included a number grammar for Thai, which tells us that “87” is read as แปดสิบเจ็ด (paet sip chet, literally “eight ten seven”).
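A minimal sketch of the kind of rules a Thai number grammar encodes, written as a plain Python function rather than in a grammar formalism and limited to 0 to 99. It covers the irregular cases of Thai cardinals: 10 is just สิบ, 20 is ยี่สิบ rather than สองสิบ, and a final 1 after a tens word is เอ็ด (so 21 is ยี่สิบเอ็ด). The function name is hypothetical.

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]


def thai_number_0_99(n: int) -> str:
    """Spell out an integer from 0 to 99 in Thai words."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0 to 99")
    if n < 10:
        return DIGITS[n]
    tens, units = divmod(n, 10)
    if tens == 1:
        words = "สิบ"              # 10 is "sip", not "one ten"
    elif tens == 2:
        words = "ยี่สิบ"            # 20 uses the special form "yi sip"
    else:
        words = DIGITS[tens] + "สิบ"
    if units == 1:
        words += "เอ็ด"             # a final 1 after tens is "et"
    elif units > 1:
        words += DIGITS[units]
    return words


print(thai_number_0_99(87))
```

Extending this to hundreds, thousands and millions adds more place words (ร้อย, พัน, หมื่น, แสน, ล้าน) and more special cases, which is why production systems express these rules as a grammar.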

Thai users often mix English words, such as brand or artist names, into both spoken and written Thai, which adds complexity to our acoustic, lexicon, and segmentation models. We addressed this by introducing ‘code switching’, which allows Voice Search to recognize when different languages are being spoken interchangeably and to adjust phonetic transliteration accordingly.
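The post describes code switching in spoken queries, which involves acoustic and lexicon modeling beyond the scope of a short example. For typed mixed-language text, though, a natural first step is to split a query into runs of Thai and Latin script so each run can be routed to the right models. The sketch below does this using the Unicode Thai block (U+0E00 to U+0E7F); the function names and the rule of attaching spaces, digits and punctuation to the preceding run are illustrative assumptions.

```python
def char_script(ch):
    """Return 'thai', 'latin', or None for script-neutral characters."""
    if "\u0e00" <= ch <= "\u0e7f":
        return "thai"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # spaces, digits, punctuation


def split_by_script(text):
    """Split text into (script, run) pairs of consecutive same-script characters."""
    runs = []  # list of [script, list_of_chars]
    for ch in text:
        script = char_script(ch)
        if script is None:
            if runs:
                runs[-1][1].append(ch)  # neutral chars stay with the preceding run
            continue                    # leading neutral chars are dropped
        if runs and runs[-1][0] == script:
            runs[-1][1].append(ch)
        else:
            runs.append([script, [ch]])
    return [(script, "".join(chars).strip()) for script, chars in runs]


print(split_by_script("เพลง Taylor Swift"))
```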

Many Thai users leave out or mistype tone marks and other diacritics when they search (e.g., โน๊ตบุก instead of โน้ตบุ๊ก, “notebook”, or หมูหยอง instead of หมูหย็อง, “shredded dried pork”), so we created an algorithm that restores these marks in the search results, ensuring that Thai users see properly written text in the majority of cases.
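The post does not describe the restoration algorithm. One simple approach, sketched here as an assumption rather than Google's method, is to strip tone marks and related diacritics (U+0E47 to U+0E4C: mai taikhu, the four tone marks, and thanthakhat) to form a lookup key, then map each key to its most frequent correctly marked spelling in a corpus. The counts below are hypothetical.

```python
from collections import Counter, defaultdict

# U+0E47 (mai taikhu), U+0E48..U+0E4B (tone marks), U+0E4C (thanthakhat)
MARKS = {chr(cp) for cp in range(0x0E47, 0x0E4D)}


def strip_marks(word):
    """Remove tone marks and related diacritics, keeping consonants and vowels."""
    return "".join(ch for ch in word if ch not in MARKS)


def build_restorer(corpus_counts):
    """Map each stripped key to its most frequent spelling in the corpus."""
    by_key = defaultdict(Counter)
    for word, count in corpus_counts.items():
        by_key[strip_marks(word)][word] += count
    return {key: forms.most_common(1)[0][0] for key, forms in by_key.items()}


def restore(word, table):
    """Return the preferred spelling for word, or word itself if unknown."""
    return table.get(strip_marks(word), word)


# Hypothetical corpus counts, for illustration only.
table = build_restorer({"โน้ตบุ๊ก": 900, "หมูหย็อง": 500, "หมูหยอง": 20})
print(restore("โน๊ตบุก", table), restore("หมูหยอง", table))
```

Because the key ignores which mark was typed as well as whether one was typed, the same lookup handles both a missing mark and a wrong one, as in the โน๊ตบุก example.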

Making Blockly Universally Accessible

Tuesday, April 01, 2014

Posted by Neil Fraser, Chief Interplanetary Liaison
Qo'noS

Blockly has no syntax errors. This reduces frustration, and reduces the number of computers thrown through bulkheads.

Variables are untyped. Type errors can too easily be perceived as a challenge to the honor of a student's family (and we’ve seen where that ends).

Debugging and bug reports have been omitted; our research indicates that, in the event of a bug, students prefer the entire program to just blow up.


Google Research Blog
