Google Research Blog
The latest news from Research at Google
Conference Report: Workshop on Internet and Network Economics (WINE) 2012
Wednesday, December 19, 2012
Posted by Vahab Mirrokni, Research Scientist, Google Research New York
Google regularly participates in the
WINE
conference: Workshop on Internet & Network Economics. WINE’12 just happened last week in Liverpool, UK, where there is a strong
economics and computation group
. WINE provides a forum for researchers across various disciplines to examine interesting algorithmic and economic problems of mutual interest that have emerged from the Internet over the past decade. For Google, the exchange of ideas at this selective workshop has resulted in innovation and improvements in algorithms and economic auctions, such as our display ad allocation.
Googlers co-authored three papers this year; here’s a synopsis of each, as well as some highlights from invited talks at the conference:
Budget Optimization for Online Campaigns with Positive Carryover Effects
This paper first argues that ad impressions may have a long-term impact on user behavior, citing an older
WWW ’10 paper
. Based on this motivation, the paper presents a scalable budget optimization algorithm for online advertising campaigns in the presence of Markov user behavior. In such settings, showing an ad to a user may change their future actions through a Markov model, so the probability of conversion depends not only on the last ad shown but also on earlier user activities. The main contribution is a simpler algorithm for solving a constrained Markov Decision Process, validated via simulations on advertising data sets. The paper was written while Nikolay Archak, a PhD student at NYU's business school, was an intern with the New York market algorithms research team.
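The Markov user-behavior setting can be sketched with a toy example. Everything below (states, transition probabilities, conversion rates) is invented for illustration; the paper's actual constrained-MDP formulation and algorithm are more involved.

```python
# States a user can be in; an ad impression may move the user "forward".
states = ["unaware", "aware", "interested"]

# Row = current state, column = next state; each row sums to 1.
P_show_ad = [[0.6, 0.3, 0.1],
             [0.0, 0.5, 0.5],
             [0.0, 0.2, 0.8]]
P_no_ad   = [[0.9, 0.1, 0.0],
             [0.2, 0.7, 0.1],
             [0.1, 0.3, 0.6]]
convert_prob = [0.0, 0.01, 0.10]  # chance of converting in each state

def step(dist, P):
    """Propagate a distribution over states through one transition matrix."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

user = [1.0, 0.0, 0.0]  # starts "unaware"
for P in (P_show_ad, P_show_ad, P_no_ad):  # one possible 3-step ad schedule
    user = step(user, P)
expected_conversion = sum(p * c for p, c in zip(user, convert_prob))
```

In this picture, the budget optimization problem amounts to choosing the schedule (how often to show the ad) that maximizes expected conversions subject to a spend constraint, which is what makes the decision process "constrained".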
On Fixed-Price Marketing for Goods with Positive Network Externalities
This paper presents an approximation algorithm for marketing “networked goods” and services that exhibit positive network externalities; that is, a buyer's value for the goods or service increases when other buyers own the goods or use the service. Such positive network externalities arise in many products, like operating systems or smartphone services. While most previous research is concerned with influence maximization, this paper attempts to identify a revenue-maximizing marketing strategy for such networked goods, as follows: the seller selects a set (S) of buyers and gives them the goods for free, then sets a fixed per-unit price (p) at which other consumers can buy the item. The strategy is consistent with practice and easy to implement. The authors use ideas from non-negative submodular maximization to find the optimal revenue-maximizing fixed-price marketing strategy.
The AND-OR game: Equilibrium Characterization
Yishay Mansour, former Visiting Faculty in Google New York, presented the results; he first noted that the existence and uniqueness of market equilibria is only known for markets with divisible goods and concave or convex utilities. Then he described a simple AND-OR market game for divisible goods. To my surprise, he showed that a class of mixed strategies is essentially the unique set of randomized equilibria for this market (up to minor changes in the outcome). At the end, Yishay challenged the audience to give such a characterization for more general markets with indivisible goods.
Kamal Jain of eBay Research gave an interesting talk about mechanism design problems inspired by applications in companies like eBay and Google. In one part, Kamal proposed "
coopetitive ad auctions
" for settings in which the auctioneer runs an auction among buyers who may cooperate with some advertisers, and at the same time compete with others for sealing advertising slots. He gave context around "product ads"; for example, a retailer like Best Buy may cooperate with a manufacturer like HP to put out a product ad for an HP computer sold at Best Buy. Kamal argued that if the cooperation is not an explicit part of the auction, an advertiser may implicitly end up competing with itself, thus decreasing the social welfare. By making the cooperation an explicit part of the auction, he was able to design a mechanism with better social welfare and revenue properties, compared to both first-price and second-price auctions. Kamal also discussed optimal mechanisms for intermediaries, and “surplus auctions” to avoid cyclic bidding behavior resulted from running naive variants of first-price auctions in repeated settings.
David Parkes of Harvard University discussed techniques to combine mechanism design with machine learning or heuristic search algorithms. At one point David discussed how to implement a
branch-and-bound search algorithm
in a way that results in a "monotone" allocation rule, so that if we implement a VCG-type allocation and pricing rule based on this allocation algorithm, the resulting mechanism becomes truthful. David also presented ways to compute a set of prices for any allocation, respecting incentive compatibility constraints as much as possible. Both of these topics appeared in ACM EC 2012 papers that he had co-authored.
At the business meeting, there was a proposal to change the title of the conference from “workshop” to “conference” or “symposium” to reflect its fully peer-reviewed and archival nature, keeping the same acronym of WINE. (Changing the title to “Symposium on the Web, Internet, and Network Economics” was rejected: SWINE!) WINE 2013 will be held at Harvard University in Boston, MA, and we look forward to reconnecting with fellow researchers in the field and continuing to nurture new developments and research topics.
Using online courses in Spain to teach entrepreneurship
Tuesday, December 18, 2012
Posted by Francisco Ruiz Anton, Policy Manager, Google Spain
Cross-posted with the
Policy by the Numbers Blog
At the end of the third quarter in 2012,
roughly 25%
of adults in Spain were out of work. More than half of adults under 24 years old are unemployed. Recent graduates and young adults preparing to enter the workforce face the toughest job market in decades.
The Internet presents an opportunity for growth and economic development. According to
recent research
, more than 100,000 jobs in Spain originate from the Internet, which contributes 26.7 billion euros (2.5%) directly to GDP. That impact could triple by 2015 under the right conditions.
One of those conditions is making high-quality education accessible, a point echoed by a
recent OECD
report on the youth labor market in Spain. This is no easy task. University degrees are in high demand, straining the reach of our existing institutions.
The web has become a way for learners to develop new skills when traditional institutions aren’t an option. Recent courses on platforms like
Udacity
,
Coursera
and
edX
have seen hundreds of thousands of students enroll and participate in courses taught by prestigious professors and lecturers.
Google is partnering with numerous organizations and universities in Spain to organize
UniMOOC
, an online course intended to educate citizens in Spain and the rest of the Spanish-speaking world about entrepreneurship. It was built with
Course Builder
, Google’s new open source toolkit for constructing online courses.
To date nearly 10,000 students have registered for the course; over two-thirds of them are from Spain, with the remaining third coming from 93 other countries. It recently
won an award
for the “Most innovative project” in 2012 from the newspaper El Mundo.
Spain’s situation is not entirely unique in Europe. Policymakers across the continent are asking themselves how best to create economic opportunity for their citizens, and how to ensure that their best and brightest students are on a path toward financial success. Our hope is that the people taking this course will be more empowered with the right skills and tools to start their own businesses that can create jobs. They will push not only Spain, but Europe and the rest of the world towards economic recovery and growth.
The course is still running, and you’re able to
join today
.
Millions of Core-Hours Awarded to Science
Monday, December 17, 2012
Posted by Andrea Held, Program Manager, University Relations
In 2011 Google University Relations
launched
a new academic research awards program,
Google Exacycle for Visiting Faculty
, offering up to one billion core-hours to qualifying proposals. We were looking for projects that would consume 100M+ core-hours each and be of critical benefit to society. Not surprisingly, there was no shortage of applications.
Since then, the following seven scientists have been working on-site at Google offices in Mountain View and Seattle. They are here to run large computing experiments on Google’s infrastructure to change the future. Their projects include exploring antibiotic drug resistance, protein folding and structural modeling, drug discovery, and last but not least, the dynamic universe.
Today, we would like to introduce the Exacycle award recipients and their work. Please stay tuned for updates next year.
Simulating a Dynamic Universe with the Large Synoptic Sky Survey
Jeff Gardner
, University of Washington, Seattle, WA
Collaborators:
Andrew Connolly
, University of Washington, Seattle, WA, and
John Peterson
, Purdue University, West Lafayette, IN
Research subject
:
The Large Synoptic Survey Telescope
(LSST) is one of the most ambitious astrophysical research programs ever undertaken. Starting in 2019, the LSST’s 3.2 Gigapixel camera will repeatedly survey the southern sky, generating tens of petabytes of data every year. The images and catalogs from the LSST have the potential to transform both our understanding of the universe and the way that we engage in science in general.
Exacycle impact
: In order to design the telescope to yield the best possible science, the LSST collaboration has undertaken a formidable computational campaign to simulate the telescope itself. This will optimize how the LSST surveys the sky and provide realistic datasets for the development of analysis pipelines that can operate on hundreds of petabytes. Using Exacycle, we are reducing the time required to simulate one night of LSST observing, roughly 5 million images, from 3 months down to a few days. This rapid turnaround will enable the LSST engineering teams to test new designs and new algorithms with unprecedented precision, which will ultimately lead to bigger and better science from the LSST.
Designing and Defeating Antibiotic Drug Resistance
Peter Kasson, Assistant Professor, Departments of Molecular Physiology and Biological Physics and of Biomedical Engineering, University of Virginia
Research subject
: Antibiotics have made most bacterial infections routinely treatable. As antibiotic use has become common, bacterial resistance to these drugs has also increased. Recently, some bacteria have arisen that are resistant to almost all antibiotics. We are studying the basis for this resistance, in particular the enzyme that acts to break down many antibiotics. Identifying the critical changes required for pan-resistance will aid surveillance and prevention; it will also help elucidate targets for the development of new therapeutic agents.
Exacycle impact
: Exacycle allows us to simulate the structure and dynamics of several thousand enzyme variants in great detail. The structural differences between enzymes from resistant and non-resistant bacteria are subtle, so we have developed methods to compare structural "fingerprints" of the enzymes and identify distinguishing characteristics. The complexity of this calculation and large number of potential bacterial sequences mean that this is a computationally intensive task; the massive computing power offered by Exacycle in combination with some novel sampling strategies make this calculation tractable.
Sampling the conformational space of G protein-coupled receptors
Kai Kohlhoff, Research Scientist at Google
Collaborators: Research labs of
Vijay Pande
and
Russ Altman
at Stanford University
Research subject
: G protein-coupled receptors (
GPCRs
) are proteins that act as signal transducers in the cell membrane and influence the response of a cell to a variety of external stimuli. GPCRs play a role in many human diseases, such as asthma and hypertension, and are well established as a primary drug target.
Exacycle impact
: Exacycle let us perform many tens of thousands of molecular simulations of membrane-bound GPCRs in parallel using the
Gromacs
software. With
MapReduce
,
Dremel
, and other technologies, we analyzed the hundreds of terabytes of generated data and built
Markov State Models
. The information contained in these models can help scientists design drugs that have higher potency and specificity than those presently available.
Results
: Our models let us explore kinetically meaningful receptor states and transition rates, which improved our understanding of the structural changes that take place during activation of a signaling receptor. In addition, we used Exacycle to study the affinity of drug molecules when binding to different receptor states.
Modeling transport through the nuclear pore complex
Daniel Russel, post doc in structural biology, University of California, San Francisco
Research subject
: Our goal is to develop a predictive model of transport through the nuclear pore complex (NPC). Developing the model requires understanding how the behavior of the NPC varies as we change the parameters governing the components of the system. Such a model will allow us to understand how transportins, the unstructured domains, and the rest of the cellular milieu interact to determine the efficiency and specificity of macromolecular transport into and out of the nucleus.
Exacycle impact
: Since data describing the microscopic behavior of most parts of the nuclear transport process is incomplete and contradictory, we have to explore a larger parameter space than would be feasible with traditional computational resources.
Status
: We are currently modeling various experimental measurements of aspects of the nuclear transport process. These experiments range from simple ones containing only a few components of the transport process to measurements on the whole nuclear pore with transportins and cellular milieu.
Large scale screening for new drug leads that modulate the activity of disease-relevant proteins
James Swetnam, Scientific Software Engineer, drugable.org, NYU School of Medicine
Collaborators: Tim Cardozo, MD, PhD - NYU School of Medicine.
Research subject
: We are using a high throughput, CPU-bound procedure known as virtual ligand screening to ‘dock’, or produce rough estimates of binding energy, for a large sample of bioactive chemical space to the entirety of known protein structures. Our goal is the first computational picture of how bioactive chemistry with therapeutic potential can affect human and pathogen biology.
Exacycle Impact
: Typically, using our academic lab’s resources, we could screen a few tens of thousands of compounds against a single protein to try to find modulators of its function. To date, Exacycle has enabled us to screen 545,130 compounds against 8,535 protein structures involved in important and underserved diseases such as cancer, diabetes, malaria, and HIV, looking for new leads toward future drugs.
Status
: We are currently expanding our screens to an additional 206,190 models from
ModBase. We aim to have a public dataset for the research community in the first half of 2013.
Protein Structure Prediction and Design
Michael Tyka, Research Fellow, University of Washington, Seattle, WA
Research subject
: The precise relationship between the primary sequence and the
three dimensional structure of proteins
is one of the unsolved grand challenges of computational biochemistry. The
Baker Lab
has made significant progress in recent years by developing more powerful protein prediction and design algorithms using the
Rosetta Protein Modelling suite
.
Exacycle impact
: Limitations in the accuracy of the
physical model
and lack of sufficient computational power have prevented solutions to broader classes of medically relevant problems. Exacycle allows us to improve model quality by conducting large parameter optimization sweeps with a very large dataset of experimental protein structural data. The improved energy functions will benefit the entire theoretical protein research community.
We are also using Exacycle to conduct simultaneous
docking
and
one-sided protein design
to develop novel protein binders for a number of medically relevant targets. For the first time, we are able to aggressively redesign backbone conformations at the binding site. This allows for a much greater flexibility in possible binding shapes but also hugely increases the space of possibilities that have to be sampled. Very promising designs have already been found using this method.
Continuing the quest for future computer scientists with CS4HS
Thursday, December 13, 2012
Erin Mindell, Program Manager, Google Education
Computer Science for High School (CS4HS) began five years ago with a simple question: How can we help create a much-needed influx of CS majors into universities and the workforce? We took our question to three of our university partners--University of Washington, Carnegie Mellon, and UCLA--and together we came up with CS4HS. The model was based on a “train the trainer” technique. By focusing our efforts on teachers and bringing them the skills they need to implement CS into their classrooms, we would be able to reach even more students. With grants from Google, our partner universities created curricula and put together hands-on, community-based workshops for their local area teachers.
Since the initial experiment, CS4HS has exploded into a worldwide program, reaching more than 4,000 teachers and 200,000 students either directly or indirectly in more than 34 countries. These hands-on, in-person workshops are a hallmark of our program, and we will continue to fund these projects going forward. (For information on
how to apply
, please see our
website
.) The success of this popular program speaks for itself, as we receive more quality proposals each year. But success comes at a price, and we have found that the current format of the workshops is not infinitely scalable.
This is where Research at Google comes in. This year, we are experimenting with a new model for CS4HS workshops. By harnessing the success of online courses such as
Power Searching with Google
, and utilizing open-source platforms like the one found in
Course Builder
, we are hoping to put the
“M” in “MOOC”
and reach a broader audience of educators, eager to learn how to teach CS in their classrooms.
For this pilot, we are looking to sponsor two online workshops to go live in 2013: one geared toward CS teachers, and one geared toward teaching CS to non-CS teachers. This is a way for a university (or several colleges working together) to create one incredible workshop that has the potential to reach thousands of enthusiastic teachers. Just as with our in-person workshops, applications will be open to college, university, and technical schools of higher learning only, as we depend on their curriculum expertise to put together the most engaging programs. For this pilot, we will be looking for MOOC proposals in the US and Canada only.
We are really excited about the possibilities of this new format, and we are looking for quality applications to fund. While applications don’t have to run on our
Course Builder platform
, we may be able to offer some additional support to funded projects that do. If you are interested in joining our experiment or just learning more, you can find information on how to apply on our
CS4HS website
(or click
here
).
Applications are open until February 16, 2013; we can’t wait to see what you come up with. If you have questions, please email us at
cs4hs@google.com
.
Large Scale Language Modeling in Automatic Speech Recognition
Wednesday, October 31, 2012
Posted by Ciprian Chelba, Research Scientist
At Google, we’re able to use the large amounts of data made available by the Web’s fast growth. Two such data sources are the anonymized queries on google.com and the web itself. They help improve automatic speech recognition through large language models: Voice Search makes use of the former, whereas YouTube speech transcription benefits significantly from the latter.
The language model is the component of a speech recognizer that assigns a probability to the next word in a sentence given the previous ones. As an example, if the previous words are “new york”, the model would assign a higher probability to “pizza” than to, say, “granola”. The n-gram approach to language modeling (predicting the next word based on the previous n-1 words) is particularly well-suited to such large amounts of data: it scales gracefully, and the non-parametric nature of the model allows it to grow with more data. For example, for Voice Search we were able to train and evaluate 5-gram language models consisting of 12 billion n-grams, built using large vocabularies (1 million words) and trained on as many as 230 billion words.
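The core of an n-gram model is just counting: a maximum-likelihood estimate of P(word | previous n-1 words) is the ratio of two counts. Here is a minimal sketch in Python with made-up training sentences; production models like the ones described here add smoothing, backoff, and distributed storage for billions of n-grams.

```python
from collections import defaultdict

class NgramModel:
    """Maximum-likelihood n-gram model (no smoothing), for illustration only."""

    def __init__(self, n):
        self.n = n
        self.context_counts = defaultdict(int)  # counts of (n-1)-word contexts
        self.ngram_counts = defaultdict(int)    # counts of full n-grams

    def train(self, tokens):
        padded = ["<s>"] * (self.n - 1) + tokens
        for i in range(len(tokens)):
            context = tuple(padded[i:i + self.n - 1])
            word = padded[i + self.n - 1]
            self.context_counts[context] += 1
            self.ngram_counts[(context, word)] += 1

    def prob(self, context, word):
        """P(word | context) as a relative frequency."""
        context = tuple(context)
        total = self.context_counts[context]
        return self.ngram_counts[(context, word)] / total if total else 0.0

model = NgramModel(3)  # a 5-gram model works the same way, with n=5
model.train("new york pizza is great".split())
model.train("new york pizza is cheap".split())
model.train("new york granola".split())
# P(pizza | new york) = 2/3 > P(granola | new york) = 1/3
```

The "non-parametric" growth mentioned above is visible here: the model has no fixed parameter vector, just count tables that grow with the data.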
The computational effort pays off, as highlighted by the plot above: both word error rate (a measure of speech recognition accuracy) and search error rate (a metric we use to evaluate the output of the speech recognition system when used in a search engine) decrease significantly with larger language models.
A more detailed
summary of results on Voice Search and a few YouTube speech transcription tasks
(authors: Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar) presents our results when increasing both the amount of training data, and the size of the language model estimated from such data. Depending on the task, availability and amount of training data used, as well as language model size and the performance of the underlying speech recognizer, we observe reductions in word error rate between 6% and 10% relative, for systems on a wide range of operating points.
Cross-posted with the Research at Google G+ Page
Ngram Viewer 2.0
Thursday, October 18, 2012
Posted by Jon Orwant, Engineering Manager
Since launching the
Google Books Ngram Viewer
, we’ve been overjoyed by the public reception. Co-creator Will Brockman and I hoped that the ability to track the usage of phrases across time would be of interest to professional linguists, historians, and bibliophiles. What we didn’t expect was its popularity among casual users. Since the launch in 2010, the Ngram Viewer has been used about 50 times every minute to explore how phrases have been used in books spanning the centuries. That’s over 45 million graphs created, each one a glimpse into the history of the written word. For instance, comparing
flapper
,
hippie
, and
yuppie
, you can see when each word peaked:
Meanwhile, Google Books reached a milestone, having scanned 20 million books. That’s approximately one-seventh of all the books published since Gutenberg invented the printing press. We’ve updated the Ngram Viewer datasets to include a lot of those new books we’ve scanned, as well as improvements our engineers made in OCR and in hammering out inconsistencies between library and publisher metadata. (We’ve kept the old dataset around for scientists pursuing empirical, replicable language experiments such as the ones Jean-Baptiste Michel and Erez Lieberman Aiden conducted for our
Science paper
.)
At Google, we’re also trying to understand the meaning behind what people write, and to do that it helps to understand grammar. Last summer Slav Petrov of Google’s Natural Language Processing group and his intern Yuri Lin (who’s since joined Google full-time) built a system that identified parts of speech—nouns, adverbs, conjunctions and so forth—for all of the words in the millions of Ngram Viewer books. Now, for instance, you can compare the verb and noun forms of “cheer” to see how the frequencies have converged over time:
Some users requested the ability to combine Ngrams, and Googler Matthew Gray generalized that notion into what we’re calling Ngram compositions: the ability to add, subtract, multiply, and divide Ngram counts. For instance, you can see how “record player” rose at the expense of “Victrola”:
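Conceptually, a composition just combines two frequency series year by year with an arithmetic operator. A minimal sketch (the frequency values below are invented; the real Ngram Viewer computes them from the scanned-book corpus):

```python
# Hypothetical per-year frequencies for two phrases.
victrola = {1920: 8e-7, 1940: 3e-7, 1960: 5e-8}
record_player = {1920: 1e-8, 1940: 2e-7, 1960: 9e-7}

def compose(a, b, op):
    """Combine two ngram frequency series year by year."""
    years = sorted(set(a) | set(b))
    return {y: op(a.get(y, 0.0), b.get(y, 0.0)) for y in years}

# Addition: combined frequency of either phrase.
total = compose(victrola, record_player, lambda x, y: x + y)
# Division: how many times more frequent one phrase is than the other.
ratio = compose(record_player, victrola, lambda x, y: x / y if y else 0.0)
# By 1960 in this toy data, "record player" is about 18x more frequent.
```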
Our
info page
explains all the details about this curious notion of treating phrases like components of a mathematical expression. We’re guessing these compositions will only be of interest to lexicographers, but then again that’s what we thought about Ngram Viewer 1.0.
Oh, and we added Italian too, supplementing our current languages: English, Chinese, Spanish, French, German, Hebrew, and Russian. Buon divertimento!
ReFr: A New Open-Source Framework for Building Reranking Models
Thursday, October 04, 2012
Posted by
Dan Bikel
and
Keith Hall
, Research Scientists at Google
We are pleased to announce the release of an open source, general-purpose framework designed for reranking problems, ReFr (Reranker Framework), now available at:
http://code.google.com/p/refr/
.
Many types of systems capable of processing speech and human language text produce multiple hypothesized outputs for a given input, each with a score. In the case of machine translation systems, these hypotheses correspond to possible translations from some sentence in a source language to a target language. In the case of speech recognition, the hypotheses are possible word sequences of what was said derived from the input audio. The goal of such systems is usually to produce a single output for a given input, and so they almost always just pick the highest-scoring hypothesis.
A
reranker
is a system that uses a trained model to rerank these scored hypotheses, possibly inducing a different ranked order. The goal is that by employing a second model after the fact, one can make use of additional information not available to the original model, and produce better overall results. This approach has been shown to be useful for a wide variety of speech and natural language processing problems, and was the
subject of one of the groups
at the 2011 summer workshop at Johns Hopkins’ Center for Language and Speech Processing. At that workshop, led by Professor Brian Roark of Oregon Health & Science University, we began building a general-purpose framework for training and using reranking models. The result of all this work is
ReFr
.
From the outset, we designed ReFr with both speed and flexibility in mind. The core implementation is entirely in C++, with a flexible architecture allowing rich experimentation with both features and learning methods. The framework also employs a powerful runtime configuration mechanism to make experimentation even easier. Finally, ReFr leverages the parallel processing power of Hadoop to train and use large-scale reranking models in a distributed computing environment.
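The basic reranking idea is easy to sketch. Below is a toy linear reranker over hypothesis features; the feature names and weights are invented for illustration, and ReFr's actual interfaces and learning methods differ.

```python
def rerank(hypotheses, weights):
    """Pick the best hypothesis after rescoring with a second, linear model.

    hypotheses: list of (text, base_score, features) tuples, where features
    is a dict of extra information unavailable to the original model.
    """
    def rescore(hyp):
        text, base_score, features = hyp
        return base_score + sum(weights.get(f, 0.0) * v
                                for f, v in features.items())
    return max(hypotheses, key=rescore)

# Two hypotheses from a (hypothetical) speech recognizer, with base scores
# and a made-up second-pass language model feature.
hyps = [
    ("recognize speech", -1.2, {"second_pass_lm": 0.5}),
    ("wreck a nice beach", -1.0, {"second_pass_lm": 0.1}),
]
weights = {"second_pass_lm": 2.0}
best = rerank(hyps, weights)
# The base model preferred "wreck a nice beach" (-1.0 > -1.2); the extra
# feature flips the decision: -1.2 + 2.0*0.5 = -0.2 beats -1.0 + 2.0*0.1 = -0.8.
```

Training a reranker then amounts to learning the weight vector so that the rescored ordering agrees with some downstream quality metric, which is the part ReFr's learning methods handle.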
EMEA Faculty Summit 2012
Tuesday, October 02, 2012
Michel Benard, University Relations Manager
Last week we held our fifth Europe, Middle East and Africa (EMEA) Faculty Summit in London, bringing together 94 of EMEA’s foremost computer science academics from 65 universities across 25 countries, along with more than 60 Googlers.
This year’s jam-packed agenda included a welcome reception at the
Science Museum
(plus a tour of the special exhibition: “
Codebreaker - Alan Turing’s life and legacy
”), a keynote on “Research at Google” by
Alfred Spector
, Vice President of Research and Special Initiatives, and a welcome address by Nelson Mattos, Vice President of Engineering and Products in EMEA, covering Google’s engineering activity and recent innovations in the region.
The Faculty Summit is a chance for us to meet with academics in Computer Science and other areas to discuss the latest exciting developments in research and education, and to explore ways in which we can collaborate via our
University Relations programs
.
The two-and-a-half-day program consisted of tech talks, breakout sessions, a panel on online education, and demos. The program covered a variety of computer science topics including Infrastructure, Cloud Computing Applications, Information Retrieval, Machine Translation, Audio/Video, Machine Learning, User Interface, e-Commerce, Digital Humanities, Social Media, and Privacy. For example,
Ed H. Chi
summarized how researchers use
data analysis to understand the ways users share content with their audiences
using the
Circle feature in Google+
.
Jens Riegelsberger
summarized how UI design and user experience research is essential to creating a seamless experience on Google Maps.
John Wilkes
discussed some of the research challenges - and opportunities - associated with building, managing, and using computer systems at massive scale. Breakout sessions ranged from technical follow-ups on the talk topics to discussing ways to increase the presence of women in computer science.
We also held one-on-one sessions where academics and Googlers could meet privately and discuss topics of personal interest, such as how to develop a compelling research award proposal, how to apply for a sabbatical at Google or how to gain Google support for a conference in a particular research area.
The Summit provides a great opportunity to build and strengthen research and academic collaborations. Our hope is to drive research and education forward by fostering mutually beneficial relationships with our academic colleagues and their universities.
Running Continuous Geo Experiments to Assess Ad Effectiveness
Tuesday, September 18, 2012
Posted by Jon Vaver, Research Scientist and Lizzy Van Alstine, Marketing Manager
Advertisers have a fundamental need to measure the effectiveness of their advertising campaigns. In a
previous paper
, we described the application of geo experiments to measuring the impact of advertising on consumer behavior (e.g. clicks, conversions, downloads). This method involves randomly assigning experimental units to control and test conditions and measuring the subsequent impact on consumer behavior. It is a practical way of incorporating the gold standard of randomized experiments into the analysis of marketing effectiveness. However, advertising decisions are not static, and the original method is most applicable to a one-time analysis. In a follow-up
paper
, we generalize the approach to accommodate periodic (ongoing) measurement of ad effectiveness.
In this expanded approach, the test and control assignments of each geographic region rotate across multiple test periods, and these rotations provide the opportunity to generate a sequence of measurements of campaign effectiveness. The data across test periods can also be pooled to create a single aggregate measurement of campaign effectiveness. These sequential and pooled measurements have smaller confidence intervals than measurements from a series of geo experiments with a single test period. Alternatively, the same confidence interval can be achieved with a reduced magnitude or duration of ad spend change, thereby lowering the cost of measurement. The net result is a better method for periodic
and
isolated measurement of ad effectiveness.
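One standard way to see why pooling across test periods shrinks the confidence interval is inverse-variance weighting of the per-period estimates. This sketch is only an illustration of that statistical effect (with invented numbers), not the papers' actual estimation procedure.

```python
def pool_estimates(estimates):
    """Inverse-variance weighted pooling of (estimate, std_error) pairs.

    The pooled standard error is always smaller than any single period's,
    which is why pooled measurements have tighter confidence intervals.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * e for (e, _), w in zip(estimates, weights)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Hypothetical per-period lift estimates from three test-period rotations.
estimates = [(0.10, 0.04), (0.14, 0.04), (0.12, 0.08)]
pooled, pooled_se = pool_estimates(estimates)
# pooled = 0.12, with a standard error below the best single period's 0.04
```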
Power Searching with Google is back
Tuesday, September 11, 2012
Posted by Dan Russell, Uber Tech Lead, Search Quality & User Happiness
If you missed
Power Searching with Google
a few months ago
or were unable to complete the course the first time around, now’s your chance to
sign up again
for our free online course that aims to empower our users with the tools and knowledge to find what they’re looking for more quickly and easily.
The community-based course features six 50-minute classes along with interactive activities and the opportunity to hear from search experts and Googlers about how search works. Beginning September 24, you can take the classes over a two-week period, share what you learn with other students in a community forum, and complete the course assessments to earn a certificate of completion.
During the course’s first run in July, people told us how they not only liked learning about new features and more efficient ways to use Google, but they also enjoyed sharing tips and learning from one another through the forums and Hangouts. Ninety-six percent of people who completed the course also said they liked the format and would be interested in taking similar courses, so we plan to offer a suite of upcoming courses in the coming months, including Advanced Power Searching.
Stay tuned for further announcements on those upcoming courses, and don’t forget to
register now for Power Searching with Google
. You’ll learn about things like how to search by color, image, and time and how to solve harder trivia questions like our
A Google a Day
questions. We’ll see you when we start up in two weeks!
Helping the World to Teach
Tuesday, September 11, 2012
Posted by Peter Norvig, Director of Research
In July,
Research at Google
ran a large open online course,
Power Searching with Google
, taught by search expert, Dan Russell. The course was
successful
, with 155,000 registered students. Through this experiment, we learned that Google technologies can help bring education to a global audience. So we packaged up the technology we used to build Power Searching and are providing it as an open source project called
Course Builder
. We want to make this technology available so that others can experiment with online learning.
The Course Builder open source project is an experimental early step for us in the world of online education. It is a snapshot of an approach we found useful and an indication of our future direction. We hope to continue development along these lines, but we wanted to make this limited code base available now, to see what early adopters will do with it, and to explore the future of learning technology. We will be hosting a community building event in the upcoming months to help more people get started using this software.
edX
shares in the open source vision for online learning platforms, and Google and the
edX
team are in discussions about open standards and technology sharing for course platforms.
We are excited that
Stanford University
,
Indiana University
,
UC San Diego
,
Saylor.org
,
LearningByGivingFoundation.org
,
Swiss Federal Institute of Technology in Lausanne (EPFL)
, and a group of universities in Spain led by
Universia
,
CRUE
, and
Banco Santander-Universidades
are considering how this experimental technology might work for some of their online courses. Sebastian Thrun at
Udacity
welcomes this new option for instructors who would like to create an online class, while Daphne Koller at Coursera notes that the educational landscape is changing and it is exciting to see new avenues for teaching and learning emerge. We believe Google’s preliminary efforts here may be useful to those looking to scale online education through the cloud.
Along with releasing the experimental open source code, we’ve provided documentation and forums for anyone to learn how to develop and deploy an online course like
Power Searching
. In addition, over the next two weeks we will provide educators the opportunity to connect with the Google team working on the code via Google Hangouts. For access to the code, documentation, user forum, and information about the Hangouts, visit the
Course Builder Open Source Project Page
. To see what is possible with the Course Builder technology register for Google’s next version of
Power Searching
. We invite you to explore this brave new world of online learning with us.
Users love simple and familiar designs – Why websites need to make a great first impression
Wednesday, August 29, 2012
Posted by Javier Bargas-Avila, Senior User Experience Researcher at YouTube UX Research
I’m sure you’ve experienced this at some point: You click on a link to a website, and after a quick glance you already know you’re not interested, so you click ‘back’ and head elsewhere. How did you make that snap judgment? Did you really read and process enough information to know that this website wasn’t what you were looking for? Or was it something more immediate?
We form first impressions of the people and things we encounter in our daily lives in an extraordinarily short timeframe. We know the first impression a website’s design creates
is crucial
in capturing users’ interest. In less than 50 milliseconds, users build an initial “gut feeling” that helps them decide whether they’ll stay or leave. This first impression depends on many factors: structure, colors, spacing, symmetry, amount of text, fonts, and more.
In
our study
we investigated how users' first impressions of websites are influenced by two design factors:
Visual complexity -- how complex the visual design of a website looks
Prototypicality -- how representative a design looks for a certain category of websites
We presented screenshots of existing websites that varied in both of these factors -- visual complexity and prototypicality -- and asked users to rate their beauty.
The results show that both visual complexity and prototypicality play crucial roles in the process of forming an aesthetic judgment. It happens within incredibly short timeframes between 17 and 50 milliseconds. By comparison, the average blink of an eye takes 100 to 400 milliseconds.
And these two factors are interrelated: if the visual complexity of a website is high, users perceive it as less beautiful, even if the design is familiar. And if the design is unfamiliar -- i.e., the site has low prototypicality -- users judge it as uglier, even if it’s simple.
In other words, users strongly prefer website designs that look both
simple
(low complexity) and
familiar
(high prototypicality). That means if you’re designing a website, you’ll want to consider both factors. Designs that contradict what users typically expect of a website may hurt users’ first impression and damage their expectations.
Recent research
shows that negative product expectations lead to lower satisfaction in product interaction -- a downward spiral you’ll want to avoid. Go for simple and familiar if you want to appeal to your users’ sense of beauty.
Google at UAI 2012
Tuesday, August 28, 2012
Posted by Kevin Murphy, Research Scientist
The conference on
Uncertainty in Artificial Intelligence
(UAI) is one of the premier venues for research related to probabilistic models and reasoning under uncertainty. This year's conference (the 28th) set several new records: the largest number of submissions (304 papers, last year 285), the largest number of participants (216, last year 191), the largest number of tutorials (4, last year 3), and the largest number of workshops (4, last year 1). We interpret this as a sign that the conference is growing, perhaps as part of the larger trend of increasing interest in machine learning and data analysis.
There were many interesting presentations. A couple of my favorites included:
"
Video In Sentences Out
," by Andrei Barbu et al. This demonstrated an impressive system that is able to create grammatically correct sentences describing the objects and actions occurring in a variety of different videos.
"
Exploiting Compositionality to Explore a Large Space of Model Structures
," by Roger Grosse et al. This paper (which won the Best Student Paper Award) proposed a way to view many different latent variable models for matrix decomposition - including PCA, ICA, NMF, Co-Clustering, etc. - as special cases of a general grammar. The paper then showed ways to automatically select the right kind of model for a dataset by performing greedy search over grammar productions, combined with Bayesian inference for model fitting.
A strong theme this year was causality. In fact, we had an
invited talk
on the topic by
Judea Pearl
, winner of the 2011 Turing Award, in addition to a one-day workshop. Although causality is sometimes regarded as something of an academic curiosity, its relevance to important practical problems (e.g., to medicine, advertising, social policy, etc.) is becoming more clear. There is still a large gap between theory and practice when it comes to making causal predictions, but it was pleasing to see that researchers in the UAI community are making steady progress on this problem.
There were two presentations at UAI by Googlers. The first, "
Latent Structured Ranking
," by Jason Weston and John Blitzer, described an extension to a ranking model called Wsabie, which was published at ICML 2011 and is widely used within Google. The Wsabie model embeds a pair of items (say, a query and a document) into a low-dimensional space and uses distance in that space as a measure of semantic similarity. The UAI paper extends this to the setting where there are multiple candidate documents in response to a given query. In such a context, we can get improved performance by leveraging similarities between documents in the set.
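As a rough illustration of the embed-and-rank idea (not Google's implementation: the vectors here are random, whereas Wsabie learns them with a ranking loss, so the resulting order is arbitrary), scoring candidate documents against a query in a shared low-dimensional space might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding tables. In Wsabie these vectors are learned with a ranking
# loss; here they are random, so the ranking itself is meaningless.
EMBED_DIM = 8
query_vecs = {"jaguar car": rng.normal(size=EMBED_DIM)}
doc_vecs = {f"doc{i}": rng.normal(size=EMBED_DIM) for i in range(5)}

def score(query, doc):
    # Semantic similarity as an inner product in the shared embedding space.
    return float(query_vecs[query] @ doc_vecs[doc])

def rank(query, candidates):
    # Order candidate documents by similarity to the query embedding.
    return sorted(candidates, key=lambda d: score(query, d), reverse=True)

ranked = rank("jaguar car", list(doc_vecs))
```

The extension described in the paper would additionally exploit similarities among the candidate documents themselves, which this sketch ignores.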
The second paper by Googlers, "
Hokusai - Sketching Streams in Real Time
," was presented by Sergiy Matusevych, Alex Smola and Amr Ahmed. (Amr recently joined Google from Yahoo, and Alex is a visiting faculty member at Google.) This paper extends the Count-Min sketch method for storing approximate counts to the streaming context. This extension allows one to compute approximate counts of events (such as the number of visitors to a particular website) aggregated over different temporal extents. The method can also be extended to store approximate n-gram statistics in a very compact way.
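For readers unfamiliar with the underlying data structure, here is a minimal Count-Min sketch in Python. The hash scheme and table sizes are illustrative only, and Hokusai's actual contribution, aggregating counts over multiple temporal extents, is not shown:

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One hash function per row, derived here from MD5 for simplicity.
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a row's count, so the minimum over
        # rows is the tightest available upper bound on the true count.
        return min(self.table[row][col] for row, col in self._buckets(item))

# Count visits in a stream of (hypothetical) website hits.
stream = ["a.com", "b.com", "a.com", "c.com", "a.com"]
cms = CountMinSketch()
for site in stream:
    cms.add(site)
```

Because estimates are one-sided (never too low), sketches for different time windows can be combined by summation, which is the property the streaming extension builds on.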
In addition to these presentations, Google was involved in UAI in several other ways: I held a program co-chair position on the
organizing committee
, several of the referees and attendees work at Google, and Google provided some sponsorship for the conference.
Overall, this was a very successful conference, in an idyllic setting (Catalina Island, an hour off the coast of Los Angeles). We believe UAI and its techniques will grow in importance as various organizations -- including Google -- start combining structured prior knowledge with raw, noisy, unstructured data.
Better table search through Machine Learning and Knowledge
Thursday, August 23, 2012
Posted By Johnny Chen, Product Manager, Google Research
The Web offers a trove of structured data in the form of tables. Organizing this collection of information and helping users find the most useful tables is a key mission of
Table Search
from Google Research. While we are still a long way from perfect table search, we made a few steps forward recently by revamping how we determine which tables are "good" (those that contain meaningful structured data) and which are "bad" (for example, a table that merely holds the layout of a Web page). In particular, we switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid quality improvement iterations. This new classifier is a
support vector machine
(SVM) that makes use of multiple
kernel functions
which are automatically combined and optimized using training examples. Several of these kernel combining techniques were in fact studied and developed within Google Research [1,2].
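To make the idea concrete, here is a hedged sketch of classifying tables with an SVM over a fixed convex combination of two base kernels, using scikit-learn's precomputed-kernel interface. The features, labels, and kernel weight are invented; the cited papers learn the combination weights from data (e.g. by centered kernel alignment) rather than fixing them by hand:

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVC

# Invented per-table features: [rows, columns, fraction of numeric cells].
X = np.array([[10, 4, 0.9], [50, 6, 0.8], [2, 2, 0.0],
              [1, 3, 0.1], [30, 5, 0.7], [2, 1, 0.05]])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = data table, 0 = layout table

# Fixed convex combination of two base kernels. A convex combination of
# valid kernels is itself a valid kernel, which is what makes this legal.
w = 0.5
K = w * linear_kernel(X, X) + (1 - w) * rbf_kernel(X, X, gamma=0.1)

clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)  # predicting back on the training tables
```

A real deployment would compute the test-versus-training kernel matrix for unseen tables and tune both the weight and the kernel parameters on held-out data.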
We are also able to achieve a better understanding of the tables by leveraging the
Knowledge Graph
. In particular, we improved our algorithms for identifying the context and topics of each table, the entities represented in the table and the properties they have. This knowledge not only helps our classifier make a better decision on the quality of the table, but also enables better matching of the table to the user query.
Finally, you will notice that we added an easy way for our users to import Web tables found through Table Search into their
Google Drive
account as
Fusion Tables
. Now that we can better identify good tables, the import feature enables our users to further explore the data. Once in Fusion Tables, the data can be visualized, updated, and accessed programmatically using the
Fusion Tables API
.
These enhancements are just the start. We are continually updating the quality of our Table Search and adding features to it.
Stay tuned for more from Boulos Harb, Afshin Rostamizadeh, Fei Wu, Cong Yu and the rest of the Structured Data Team.
[1]
Algorithms for Learning Kernels Based on Centered Alignment
[2]
Generalization Bounds for Learning Kernels
Machine Learning Book for Students and Researchers
Wednesday, August 22, 2012
Posted by Afshin Rostamizadeh, Google Research
Our machine learning book,
The Foundations of Machine Learning
, is now published! The book, with authors from both Google Research and academia, covers a large variety of fundamental machine learning topics in depth, including the theoretical basis of many learning algorithms and key aspects of their applications. The material presented takes its origin in a machine learning graduate course, "Foundations of Machine Learning", taught by
Mehryar Mohri
over the past seven years and has considerably benefited from comments and suggestions from students and colleagues at Google.
The book can serve as a textbook for graduate students and advanced undergraduates and as a reference manual for researchers in machine learning, statistics, and many other related areas. It includes supplementary introductory material on topics such as linear algebra and optimization and other useful conceptual tools, as well as a large number of exercises at the end of each chapter, with full solutions provided online.
Faculty Summit 2012: Online Education Panel
Monday, August 20, 2012
Posted by
Peter Norvig
, Director of Research
On July 26th, Google's 2012
Faculty Summit
hosted computer science professors from around the world for a chance to talk and hear about some of the work done by Google and by our faculty partners. One of the sessions was a panel on Online Education. Daphne Koller's presentation on "
Education at Scale
" describes how a talk about YouTube at the 2009 Google Faculty Summit was an early inspiration for her, as she was formulating her approach that led to the founding of
Coursera
. Koller started with the goal of allowing Stanford professors to have more time for meaningful interaction with their students, rather than just lecturing, and ended up with a model based on the flipped classroom, where students watch videos out of class, and then come together to discuss what they have learned. She then refined the flipped classroom to work when there is no classroom, when the interactions occur in online discussion forums rather than in person. She described some fascinating experiments that allow for more flexible types of questions (beyond multiple choice and fill-in-the-blank) by using peer grading of exercises.
In my
talk
, I describe how I arrived at a similar approach but starting with a different motivation: I wanted a textbook that was more interactive and engaging than a static paper-based book, so I too incorporated short videos and frequent interactions for the
Intro to AI class
I taught with Sebastian Thrun.
Finally, Bradley Horowitz, Vice President of Product Management for Google+ gave a
talk
describing the goals of Google+. It is not to build the largest social network; rather it is to understand our users better, so that we can serve them better, while respecting their privacy, and keeping each of their conversations within the appropriate circle of friends. This allows people to have more meaningful conversations, within a limited context, and turns out to be very appropriate to education.
By bringing people together at events like the Faculty Summit, we hope to spark the conversations and ideas that will lead to the next breakthroughs, perhaps in online education, or perhaps in other fields. We'll find out a few years from now what ideas took root at this year's Summit.
Improving Google Patents with European Patent Office patents and the Prior Art Finder
Tuesday, August 14, 2012
Posted by Jon Orwant, Engineering Manager
Cross-posted with the
US Public Policy Blog
, the
European Public Policy Blog
, and
Inside Search Blog
At Google, we're constantly trying to make important collections of information more useful to the world. Since 2006, we’ve let people discover, search, and read United States patents online. Starting this week, you can do the same for the millions of ideas that have been submitted to the European Patent Office, such as
this one
.
Typically, patents are granted only if an invention is new and not obvious. To explain why an invention is new, inventors will usually cite prior art such as earlier patent applications or journal articles. Determining the novelty of a patent can be difficult, requiring a laborious search through many sources, and so we’ve built a Prior Art Finder to make this process easier. With a single click, it searches multiple sources for related content that existed at the time the patent was filed.
Patent pages now feature a “Find prior art” button that instantly pulls together information relevant to the patent application.
The Prior Art Finder identifies key phrases from the text of the patent, combines them into a search query, and displays relevant results from Google Patents, Google Scholar, Google Books, and the rest of the web. You’ll start to see the blue “Find prior art” button on individual patent pages starting today.
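A toy version of the key-phrase step might look like the following. This is purely illustrative: the real Prior Art Finder's phrase extraction and query construction are not public, and a production system would weight terms by corpus rarity and extract multiword phrases rather than counting single words:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "with", "said"}

def key_phrases(patent_text, n=5):
    # Crude proxy: the most frequent non-stopword terms in the patent text.
    words = re.findall(r"[a-z]+", patent_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]

def build_query(patent_text):
    # Combine the extracted terms into a single search query string.
    return " ".join(key_phrases(patent_text))

# Hypothetical patent abstract for demonstration.
abstract = ("A method for training a neural network to classify images, "
            "the neural network comprising convolutional layers, the "
            "network trained with labeled images of objects.")
query = build_query(abstract)
```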
Our hope is that this tool will give patent searchers another way to discover information relevant to a patent application, supplementing the search techniques they use today. We’ll be refining and extending the Prior Art Finder as we develop a better understanding of how to analyze patent claims and how to integrate the results into the workflow of patent searchers.
These are small steps toward making this collection of important but complex documents better understood. Sometimes language can be a barrier to understanding, which is why earlier this year we
released an update to Google Translate
that incorporates the European Patent Office’s parallel patent texts, allowing the EPO to provide translation between English, French, German, Spanish, Italian, Portuguese, and Swedish, with more languages scheduled for the future. And with the help of the United States Patent & Trademark Office, we’ve continued to add to
our repository of USPTO bulk data
, making it easier for researchers and law firms to analyze the entire corpus of US patents. More to come!
Teaching the World to Search
Wednesday, August 08, 2012
Posted by Maggie Johnson, Director of Education and University Relations
For two weeks in July, we ran Power Searching with Google, a
MOOC
(Massive Open Online Course) similar to those
pioneered by Stanford and MIT
. We blended this format with our social and communication tools to create a community learning experience around search. The course covered tips and tricks for Google Search, like using the search box as a calculator, or color filtering to find images.
The course had interactive activities to practice new skills and reinforce learning, and many opportunities to connect with other students using tools such as Google Groups, Moderator and Google+. Two of our search experts, Dan Russell and Matt Cutts, moderated Hangouts on Air, answering dozens of questions from students in the course. There were pre-, mid- and post-class assessments that students were required to pass to receive a certificate of completion. The
course content
is still available.
We had 155,000 students register for the course, from 196 countries. Of these, 29% of those who completed the first assessment passed the course and received a certificate. What was especially surprising was that 96% of the students who completed the course liked the format and would be interested in taking other MOOCs.
This learning format is not new, as anyone who has worked in eLearning over the past 20 years knows. But what makes it different now is the large, global cohort of students who go through the class together. The discussion forums and Google+ streams were very active, with students asking and answering questions and providing additional ideas and content beyond what’s offered by the instructor. This learning interaction, enabled by a massive “classroom,” is truly a new experience for students and teachers in an online environment.
Going forward, we will be offering Power Searching with Google again, so if you missed the first opportunity to get your certificate, you’ll have a second chance. Watch here for news about Power Searching as well as some educational ideas that we are exploring.
Speech Recognition and Deep Learning
Monday, August 06, 2012
Posted by Vincent Vanhoucke, Research Scientist, Speech Team
The New York Times recently published
an article
about Google’s large scale deep learning project, which learns to discover patterns in large datasets, including... cats on YouTube!
What’s the point of building a gigantic cat detector, you might ask? When you combine large amounts of data, large-scale distributed computing, and powerful machine learning algorithms, you can apply the technology to address a large variety of practical problems.
With the launch of the latest Android platform release, Jelly Bean, we’ve taken a significant step towards making that technology useful: when you speak to your Android phone, chances are, you are talking to a neural network trained to recognize your speech.
Using neural networks for speech recognition is nothing new: the first proofs of concept were developed in the late 1980s
(1)
, and after what can only be described as a 20-year dry-spell, evidence that the technology could scale to modern computing resources has recently begun to emerge
(2)
. What changed? Access to larger and larger databases of speech, advances in computing power, including GPUs and fast distributed computing clusters such as the
Google Compute Engine
, unveiled at
Google I/O
this year, and a better understanding of how to scale the algorithms to make them effective learners.
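As a schematic of what "a neural network trained to recognize your speech" computes per frame, here is a toy one-hidden-layer acoustic model in NumPy. The layer sizes are invented and the weights are random (untrained); real systems are far deeper and are trained on large speech corpora:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented sizes: stacked acoustic feature frames in, per-frame phone-state
# posteriors out. Weights are random here, i.e. untrained.
N_IN, N_HID, N_OUT = 440, 256, 41
W1, b1 = rng.normal(scale=0.01, size=(N_IN, N_HID)), np.zeros(N_HID)
W2, b2 = rng.normal(scale=0.01, size=(N_HID, N_OUT)), np.zeros(N_OUT)

def posteriors(frames):
    """One-hidden-layer forward pass: sigmoid hidden units, softmax output."""
    h = 1.0 / (1.0 + np.exp(-(frames @ W1 + b1)))
    logits = h @ W2 + b2
    # Numerically stable softmax over the output classes for each frame.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

p = posteriors(rng.normal(size=(3, N_IN)))  # posteriors for 3 frames
```

In an actual recognizer, these per-frame posteriors would feed a decoder that searches for the most likely word sequence.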
The research, which reduces the error rate by over 20%, will be presented
(3)
at a conference this September, but true to our
philosophy of integrated research
, we’re delighted to bring the bleeding edge to our users first.
--
1 Phoneme recognition using time-delay neural networks, A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K.J. Lang. IEEE Transactions on Acoustics, Speech and Signal Processing, vol.37, no.3, pp.328-339, Mar 1989.
2 Acoustic Modeling using Deep Belief Networks, A. Mohamed, G. Dahl and G. Hinton. Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing.
3 Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition, N. Jaitly, P. Nguyen, A. Senior and V. Vanhoucke, Accepted for publication in the Proceedings of Interspeech 2012.
Reflections on Digital Interactions: Thoughts from the 2012 NA Faculty Summit
Thursday, August 02, 2012
Posted by Alfred Spector, Vice President of Research and Special Initiatives
Last week, we held our eighth annual North America
Computer Science Faculty Summit
at our headquarters in Mountain View. Over 100 leading faculty joined us from 65 universities located in North America, Asia Pacific and Latin America to attend the two-day Summit, which focused on new interactions in our increasingly digital world.
In my introductory remarks, I shared some themes that are shaping our research agenda. The first relates to the amazing scale of systems we now can contemplate. How can we get to computational clouds of, perhaps, a billion cores (or processing elements)? How can such clouds be efficient and manageable, and what will they be capable of? Google is actively working on most aspects of large scale systems, and we continue to look for opportunities to collaborate with our academic colleagues. I note that we announced a cloud-based
program
to support Education based on Google App Engine technology.
Another theme in my introduction was semantic understanding. With the introduction of our
Knowledge Graph
and other work, we are making great progress toward data-driven analysis of the meaning of information. Users, who provide a continual stream of subtle feedback, drive continuous improvement in the quality of our systems, whether about a celebrity, the meaning of a word in context, or a historical event. In addition, we have found that the combination of information from multiple sources helps us understand meaning more efficiently. When multiple signals are aggregated, particularly with different types of analysis, we have fewer errors and improved semantic understanding. Applying the “combination hypothesis” makes systems more intelligent.
Finally, I talked about User Experience. Our field is developing ever more creative user interfaces (which both present information to users and accept information from them), partly due to the revolution in mobile computing and partly due to the availability of large-scale processing in the cloud and deeper semantic understanding. There is no doubt that our interactions with computers will be vastly different 10 years from now, and they will be significantly more fluid, or natural.
This page
lists the Googler and Faculty presentations at the summit.
One of the highest intensity sessions we had was the panel on online learning with Daphne Koller from Stanford/Coursera, and Peter Norvig and Bradley Horowitz from Google. While there is a long way to go, I am so pleased that academicians are now thinking seriously about how information technology can be used to make education more effective and efficient. The infrastructure and user-device building blocks are there, and I think the community can now quickly get creative and provide the experiences we want for our students. Certainly, our own recent experience with our online
Power Searching Course
shows that the baseline approach works, but it also illustrates how much more can be done.
I asked Elliot Solloway (University of Michigan) and Cathleen Norris (University of North Texas), two faculty attendees, to provide their perspective on the panel and they have posted their reflections on
their blog
.
The digital era is changing the human experience. The summit talks and sessions exemplified the new ways in which we interact with devices, each other, and the world around us, and revealed the vast potential for further innovation in this space. Events such as these keep ideas flowing, and it’s immensely fun to be part of a very broadly based computer science community.
Natural Language in Voice Search
Tuesday, July 31, 2012
Posted by Jakob Uszkoreit, Software Engineer
On July 26 and 27, we held our eighth annual
Computer Science Faculty Summit
on our Mountain View Campus. During the event, we brought you a series of blog posts dedicated to sharing the Summit's talks, panels and sessions, and we continue with this glimpse into natural language in voice search. --Ed
At this year’s Faculty Summit, I had the opportunity to showcase the newest version of
Google Voice Search
. This version hints at how Google Search, in particular on mobile devices and by voice, will become increasingly capable of responding to natural language queries.
I first outlined the trajectory of Google Voice Search, which was initially released in 2007.
Voice actions
, launched in 2010 for Android devices, made it possible to control your device by speaking to it. For example, if you wanted to set your device alarm for 10:00 AM, you could say “set alarm for 10:00 AM. Label: meeting on voice actions.” To indicate the subject of the alarm, a meeting about voice actions, you would have to use the keyword “label”! Certainly not everyone would think to frame the requested action this way. What if you could speak to your device in a more natural way and have it understand you?
At last month’s
Google I/O 2012
, we announced a version of voice actions that supports much more natural commands. For instance, your device will now set an alarm if you say “my meeting is at 10:00 AM, remind me”. This makes even previously existing functionality, such as sending a text message or calling someone, more discoverable on the device -- that is, if you express a voice command in whatever way feels natural to you, whether it be “let David know I’ll be late via text” or “make sure I buy milk by 3 pm”, there is now a good chance that your device will respond as you anticipated.
I then discussed some of the possibly unexpected decisions we made when designing the system we now use for interpreting natural language queries or requests. For example, as you would expect from Google, our approach to interpreting natural language queries is data-driven and relies heavily on machine learning. In complex machine learning systems, however, it is often difficult to figure out the underlying cause of an error: after supplying them with training and test data, you merely obtain a set of metrics that hopefully give a reasonable indication of the system’s quality, but they fail to provide an explanation for why a certain input led to a given, possibly wrong, output.
As a result, even understanding why some mistakes were made requires experts in the field and detailed analysis, rendering it nearly impossible to harness non-experts in analyzing and improving such systems. To avoid this, we aim to make every partial decision of the system as interpretable as possible. In many cases, any random speaker of English could look at its possibly erroneous behavior in response to some input and quickly identify the underlying issue - and in some cases even fix it!
We are especially interested in working with our academic colleagues on some of the many fascinating research and engineering challenges in building large-scale, yet interpretable natural language understanding systems and devising the machine learning algorithms this requires.
New Challenges in Computer Science Research
Friday, July 27, 2012
Posted by Jeff Walz, Head of University Relations
Yesterday afternoon at the
2012 Computer Science Faculty Summit
, there was a round of lightning talks addressing some of the research problems faced by Google across several domains. The talks pointed out some of the biggest challenges emerging from increasing digital interaction, which is this year’s Faculty Summit theme.
Research Scientist
Vivek Kwatra
kicked things off with a talk about video stabilization on YouTube. The popularity of mobile devices with cameras has led to an explosion in the amount of video people capture, which can often be shaky. Vivek and his team have found algorithmic approaches to make casual videos look more professional by simulating professional camera moves. Their stabilization technology vastly improves the quality of amateur footage.
Next,
Ed Chi
(Research Scientist) talked about social media focusing on the experimental circle model that characterizes Google+. Ed is particularly interested in how social interaction on the web can be designed to mimic live communication. Circles on Google+ allow a user to manage their audience and share content in a targeted fashion, which reflects face-to-face interaction. Ed discussed how, from an HCI perspective, the challenge going forward is the need to consider the trinity of social media: context, audience, content.
John Wilkes
, Principal Software Engineer, talked about cluster management at Google and the challenges of building a new cluster manager -- that is, an operating system for a fleet of machines. Everything at Google is big, and a consequence of operating at such tremendous scale is that machines are bound to fail. John’s team is working to make things easier for internal users, improving our ability to respond to more system requests. There are several hard problems in this domain, such as issues with configuration, making it as easy as possible to run a binary, increasing failure tolerance, and helping internal users understand their own needs as well as the behavior and performance of their system in our complicated distributed environment.
Research Scientist and coffee connoisseur
Alon Halevy
took to the podium to confirm that he did indeed author an empirical book on coffee, and also talked with attendees about structured data on the web. Structured data on the web comprises hundreds of millions of (relatively small) tables of data, and Alon’s work focuses on enabling data enthusiasts to discover and visualize those data sets. Great possibilities open up when people start combining data sets in meaningful ways, which inspired the creation of
Fusion Tables
. An example is a map made in the aftermath of the 2011 earthquake and tsunami in Japan that shows natural disaster data alongside the locations of the world’s nuclear plants. Moving forward, Alon’s team will continue to think about interesting things that can be done with data, and the techniques needed to distinguish good data from bad data.
To wrap up the session, Praveen Paritosh did a brief, but deep dive into the
Knowledge Graph
, an intelligent model that understands real-world entities and their relationships to one another-- things, not strings-- which
launched
earlier this year.
The Google Faculty Summit continued today with more talks, and breakout sessions centered on our theme of digital interaction. Check back for additional blog posts in the coming days.
Education in the Cloud
Friday, July 27, 2012
Posted by Andrea Held, University Relations
In the last 10 years, we’ve seen a major transition from stand-alone applications that run on desktop computers to applications running in the cloud. Unfortunately, many computer science students don’t have the opportunity to learn and work in the cloud due to a lack of resources in traditional undergrad programs. Without this access, students are limited to the resources their school can provide.
So today, we’re announcing a new award program: the
Google App Engine Education Awards
. We are excited because
Google App Engine
can teach students how to build sophisticated large-scale systems in the cloud without needing access to a large physical network.
Google App Engine can be used to build mobile or social applications, traditional browser-based applications, or stand-alone web services that scale to millions of users with ease. The Google App Engine infrastructure and storage tools are useful for collecting and analyzing educational data, building a learning management system to organize courses, or implementing a teacher forum for exchanging ideas and practices. All of these adaptations of the Google App Engine platform will use the same infrastructure that powers Google.
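App Engine's Python runtime serves ordinary WSGI applications (webapp2, the framework App Engine shipped with at the time, follows the same interface). The handler below is a minimal, framework-free sketch using only the standard library; the greeting logic is invented for illustration:

```python
# A minimal WSGI application of the kind App Engine hosts: read a
# query parameter, build a response body, and set the status/headers.

from urllib.parse import parse_qs

def application(environ, start_response):
    """Toy handler: greet the user named in the query string."""
    query = parse_qs(environ.get('QUERY_STRING', ''))
    name = query.get('name', ['world'])[0]
    body = ('Hello, %s!' % name).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

On App Engine, an `app.yaml` file maps URL patterns to handlers like this one, and the platform takes care of spinning instances up and down with load.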
We invite teachers at universities across the United States to submit a proposal describing how to use Google App Engine for their course development, educational research or tools, or for student projects. Selected proposals will receive $1,000 in App Engine credits.
If you teach at an accredited college, university or community college in the US, we encourage you to apply. You can submit a proposal by filling out
this form
. The application deadline is midnight PST August 31, 2012.
Big Pictures with Big Messages
Thursday, July 26, 2012
Posted by Maggie Johnson, Director of Education and University Relations
Google’s Eighth Annual
Computer Science Faculty Summit
opened today in Mountain View with a fascinating talk by Fernanda Viégas and Martin Wattenberg, leaders of the data visualization group at our Cambridge office. They provided insight into their design process in visualizing big data, by highlighting Google+ Ripples and a map of the wind they created.
To preface his explanation of the design process, Martin shared that his team “wants visualization to be ‘G-rated,’ showing the full detail of the data - there’s no need to simplify it, if complexity is done right.” Martin discussed how their
wind map
started as a personal art project, but has gained interest particularly among groups that are interested in information on the wind (sailors, surfers, firefighters). The map displays surface wind data from the
US National Digital Forecast Database
and updates hourly. You can zoom around the United States looking for where the winds are fastest - often around lakes or just offshore - or check out the
gallery
to see snapshots of the wind from days past.
Fernanda discussed the development of
Google+ Ripples
, a visualization that shows
how news spreads
on Google+. The visualization shows spheres of influence and different patterns of spread. For example, someone might post a video to their Google+ page and if it goes viral, we’ll see several circles in the visualization. This depicts the influence of different individuals sharing content, both in terms of the number of their followers and the re-shares of the video, and has revealed that individuals are at times more influential than organizations in the social media domain.
Martin and Fernanda closed with two important lessons in data visualization: first, don’t “dumb down” the data. If complexity is handled correctly and in interesting ways, our users find the details appealing and find their own ways to interact with and expand upon the data. Second, users like to see their personal world in a visualization. Being able to see the spread of a Google+ post, or zoom in to see the wind around one’s town is what makes a visualization personal and compelling-- we call this the “I can see my house from here” feature.
The
Faculty Summit
will continue through Friday, July 27 with talks by Googlers and faculty guests as well as breakout sessions on specific topics related to this year’s theme of digital interactions. We will be looking closely at how computation and bits have permeated our everyday experiences via smart phones, wearable computing, social interactions, and education.
We will be posting here throughout the summit with updates and news as it happens.
Site Reliability Engineers: “solving the most interesting problems”
Wednesday, July 25, 2012
Posted by Chris Reid, Sydney Staffing team
I recently sat down with Ben Appleton, a Senior Staff Software Engineer, to talk about his recent move from Software Engineer (SWE) on the Maps team to Site Reliability Engineering (SRE). In the interview, Ben explains why he transitioned from a pure development role to a role in production, and how his work has changed:
Chris
: Tell us about your path to Google.
Ben
: Before I joined Google I didn’t consider myself a “software engineer”. I went to the University of Queensland and graduated with a Bachelor’s Degree in Electrical Engineering and Mathematics, before going on to complete a Ph.D. My field of research was image segmentation, extending graph cuts to continuous space for analyzing X-rays and MRIs. At a conference in France I met a friend of my Ph.D. advisor’s, and he raved about Google, commenting that they were one of the only companies that really understood technology. I’d already decided academia wasn’t for me, so I interviewed for a general Software Engineering role at Google. I enjoyed the interviews, met some really smart people, and learned about some interesting stuff they were working on. I joined the Maps team in Sydney in 2005 and spent the next 6 years working on the
Maps API
.
Chris
: Tell us about some of the coolest work you did for Google Maps, and how you applied your research background.
Ben
: My background in algorithms and computational geometry was really useful. We were basically making browsers do stuff they’re not designed to do, such as rendering millions of vectors or warping images, inventing techniques as we went. On the server-side we focused on content distribution, pushing tiles or vectors from Google servers down through caches to the user’s browser, optimizing for load and latency at every stage. On the client-side, we had to make the most of limited processors with new geometric algorithms and clever prefetching to hide network latency. It was really interesting work.
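The tile-caching layer Ben describes can be sketched as a small LRU cache keyed by (zoom, x, y); the class name, capacity, and fetch callback below are illustrative, not Maps internals:

```python
# Toy LRU tile cache: serve repeated tile requests from memory and
# evict the least recently used tile when over capacity.

from collections import OrderedDict

class TileCache:
    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # called on a cache miss
        self.tiles = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key in self.tiles:
            self.tiles.move_to_end(key)     # mark as recently used
            return self.tiles[key]
        tile = self.fetch(key)
        self.tiles[key] = tile
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)  # evict least recently used
        return tile

fetches = []
cache = TileCache(2, lambda k: fetches.append(k) or ('tile %s' % (k,)))
cache.get((0, 0, 0)); cache.get((0, 0, 1)); cache.get((0, 0, 0))
cache.get((0, 1, 1))   # evicts (0, 0, 1), the least recently used
```

The same structure appears at every level of the serving path he mentions: browser cache, edge caches, and server-side caches all trade memory for latency this way.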
Chris
: I understand you received company-wide recognition when you were managing the Maps API team. Tell us more about what that entailed.
Ben
: In September 2008, when I was managing the Maps API, my team received a Google-wide award, which is a big honor. My main contributions were latency optimizations, stability, enterprise support, and Street View integration. The award recognized the strong sustained growth of the Maps API, both in the number of sites using it and in total views per day. Currently the Google Maps API is serving more than 600,000 websites.
Chris
: So what prompted the move to Site Reliability Engineering (SRE)?
Ben
: In my experience, a lot of software engineers don’t understand what SREs do. I’d worked closely with SREs, particularly those in Sydney supporting Maps, and had formed a high opinion of them. They’re a very strong team - they’re smart and they get things done. After 6 years working on the Maps API I felt it was time for a change. In Sydney there are SWE teams covering most of the product areas, including Chrome and Apps, Social and Blogger, Infrastructure Networking and the Go programming language, as well as Maps and GeoCommerce. I talked to all of them, but chose SRE because in my opinion, they’re solving the most interesting problems.
Chris
: How would you describe SRE?
Ben
: It really depends on the individual. At one end are the Systems Administrator types, sustaining ridiculously large systems. But at the other end are the Software Engineers like me. As SREs get more experienced this distinction tends to blur. The best SREs think programmatically even if they don’t do the programming. For me, I don’t see a difference in my day-to-day role. When I was working on the Maps API I was the primary on-call one week in three, whereas in SRE the typical on-call roster is one week in six. When you’re primary on-call it just means you’re the go-to person for the team, responsible when something breaks or when new code is pushed into production. I was spending 50% of my time doing coding and development work, and as an SRE this has increased to 80%.
Chris
: Wow! So as an SRE in Production, you’re spending less time on-call and more time writing code than you were as a SWE on the Maps team?
Ben
: Yes! I’m not managing a team now, but I’m definitely spending more time coding than I was before. I guess the average SRE spends 50% of their time doing development work, but as I said, it depends on the person and it ranges from 20-80%.
Chris
: What does your team do?
Ben
: In Sydney there are SRE teams supporting Maps, Blogger, App Engine, as well as various parts of the infrastructure and storage systems. I’m working on Blobstore, an infrastructure storage service based on Bigtable which simplifies building and deploying applications that store users' binary data (BLOBs, or "Binary Large OBjects"). Example BLOBs include images, videos, or email attachments - any data objects that are immutable and long-lived. The fact that we're storing user data means that Blobstore must be highly available for reads and writes, be extremely reliable (so that we never lose data), and be efficient in terms of storage usage (so that we can provide large amounts of storage to users at low cost).
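The Blobstore properties Ben lists (immutable, long-lived blobs that must never be lost, stored efficiently) can be illustrated with a toy content-addressed store. The class and method names are invented; real Blobstore sits on Bigtable and replicates data across machines, which this sketch does not attempt:

```python
# Toy content-addressed blob store: keys are content hashes, so blobs
# are immutable by construction and identical blobs are stored once.

import hashlib

class ToyBlobStore:
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store an immutable blob and return its content-hash key."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # deduplicate identical blobs
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ToyBlobStore()
key = store.put(b'attachment bytes')
```

Content addressing gives deduplication for free, which matters for efficiency when many users store the same attachment; durability and availability come from replication layers this sketch omits.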
Chris
: Tell us more about some of the problems you’re solving, and how they differ from those you faced as a SWE in a development role.
Ben
: With the massive expansion in online data storage, we’re solving problems at a scale never before seen. Due to the global nature of our infrastructure, we think in terms of load balancing at many levels: across regions, across data centers within a region, and across machines within a data center. The problems we’re facing in SRE are much closer to the metal. We’re constantly optimizing the resource allocation, efficiency, and scalability of Google’s massive computer systems, as opposed to developing new features for a product like Maps. So the nature of the work is very similar to SWE, but the problems are bigger and there is a strong focus on scale.
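The multi-level balancing Ben describes can be sketched as a greedy descent through the topology: pick the least loaded region, then the least loaded data center inside it, then the least loaded machine. The topology and load numbers below are made up:

```python
# Toy three-level load balancer: region -> data center -> machine,
# choosing the least loaded option at each level.

def pick_machine(topology):
    """topology: {region: {datacenter: {machine: load}}} -> machine name."""
    region = min(topology, key=lambda r: sum(
        sum(dc.values()) for dc in topology[r].values()))
    dc = min(topology[region], key=lambda d: sum(topology[region][d].values()))
    return min(topology[region][dc], key=topology[region][dc].get)

topology = {
    'us-east': {'dc1': {'m1': 9, 'm2': 4}},
    'eu-west': {'dc2': {'m3': 3, 'm4': 2}, 'dc3': {'m5': 1}},
}
```

Production balancers weigh far more than load (latency to the user, capacity headroom, failure domains), but the hierarchical decision structure is the same.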
Chris
: Are you planning on staying in SRE for a while?
Ben
: Yeah. I signed up for a six month rotation program called “Mission Control,” the goal of which is to teach engineers to understand the challenges of building and operating a high reliability service at Google scale. In other words, it’s an SRE training program. In my first three months of Mission Control I’ve been on-call twice, and always during office hours so there were SREs to help me when I got into trouble...which I did. I’ve got no intention of going back to SWE at the end of the six months and plan to stay in SRE for at least a few years. Right now the problems seem more interesting. For example, last year’s storage solutions are facing additional strain from the growth of Gmail, Google+ and Google Drive. So you’re constantly reinventing.
Chris
: What advice do you have for Software Engineers contemplating a role in SRE?
Ben
: SRE gives you the opportunity to work on infrastructure at a really big scale in a way you don’t get to in SWE. Whereas SWE is more about developing new features, SRE is dealing with bigger problems and more complex engineering due to the sheer scale. SRE is a great way to learn how systems really work in order to become a great engineer.
If you’re interested in applying for a Site Reliability Engineering role, please note that we advertise the roles in several different ways to reflect the diversity of the team. The two main roles are “Software Engineer, Google.com” and “Systems Engineer, Google.com”. We use the term “Google.com” to signify that the roles are in Production as opposed to R&D. You can find all the openings listed on the
Google jobs site
. We’re currently hiring across many regions, including Sydney in Australia, and of course Mountain View in California.
Google at SIGMOD/PODS 2012
Friday, July 13, 2012
Posted by
Anish Das Sarma
, Research Scientist and Jeff Shute, Software Engineer
Over the years,
SIGMOD
has expanded beyond a traditional "database" conference to include several areas related to information management. This year’s
ACM SIGMOD/PODS conference (on Management of Data, and Principles of Database Systems)
, held in Scottsdale, Arizona, was no different. We were impressed by the wide variety of researchers, from industry and academia alike, that the conference attracted, and enjoyed learning how others are pushing the limits of scalability in data storage and processing. In addition to an excellent set of papers on a large number of topics, we saw a couple of recurring themes:
1)
Data Visualization
Pat Hanrahan
from Stanford gave a keynote on some of the challenges involved in building systems to enable "data enthusiasts" to manage and visualize data.
Google’s
Fusion Tables
group also had a paper on this topic:
Efficient Spatial Sampling of Large Geographical Tables
, by Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy. (This paper has been invited to a TODS special issue on best papers of SIGMOD 2012).
A similar effort from the University of Washington was presented as a demo:
VizDeck: Self-Organizing Dashboards for Visual Analytics
, by Alicia Key, Bill Howe, Daniel Perry, Cecilia Aragon.
2)
Big Data
As has been the case for the last couple of years, “Big Data" has been of ever-growing interest to the entire community, particularly from industry. Google presented a talk on
F1
, a new distributed database system we’ve built to power the AdWords system. A complex business application like AdWords has different requirements than many systems at Google that often use storage systems like Bigtable. We have a single database shared by hundreds of developers and systems, so we need the robustness and ease of use we’re used to from traditional databases. F1 is built to scale like Bigtable, without giving up the database features we also need, like strong consistency, ACID transactions, schema enforcement, and, most importantly, SQL queries.
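The programming model F1 preserves (SQL with ACID transactions) is the familiar one shown below; sqlite3 here only illustrates that model on a single machine and says nothing about F1's distributed implementation. The table and values are invented:

```python
# The database programming model F1 keeps at scale: declarative SQL
# plus atomic transactions that either fully apply or fully roll back.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE campaigns (id INTEGER PRIMARY KEY, budget INTEGER)')
conn.execute('INSERT INTO campaigns VALUES (1, 100)')
conn.commit()

# An atomic budget split: both statements apply, or neither does.
with conn:  # commits on success, rolls back on any exception
    conn.execute('UPDATE campaigns SET budget = budget - 30 WHERE id = 1')
    conn.execute('INSERT INTO campaigns VALUES (2, 30)')

rows = conn.execute('SELECT id, budget FROM campaigns ORDER BY id').fetchall()
```

Giving this model up is exactly the cost of many "NoSQL" systems: application code must then reimplement consistency and query logic by hand.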
There’s been a widespread trend over the last several years away from databases, towards highly scalable “NoSQL” systems. We don’t think that trade-off is necessary, and were happy to see several other speakers advocate a similar theme -- yes, databases are useful, and developers shouldn’t need to give up database features and ease of use in the name of scalability.
This theme was supported by an industry session on Big Data featuring talks from other companies: Facebook (TAO: How Facebook Serves the Social Graph), Twitter (Large-Scale Machine Learning at Twitter), and Microsoft (Recurring Job Optimization in Scope). Googler
Kristen LeFevre
was a panelist on the "Perspectives on Big Data" panel organized by
Surajit Chaudhuri
from Microsoft, and also featuring
Donald Kossmann
from ETHZ,
Sam Madden
from MIT, and Anand Rajaraman from Walmart Labs. Last but not least, Surajit Chaudhuri also gave an excellent keynote outlining some of the research challenges that the new era of "Big Data and Cloud" poses.
As has been the practice for several years now, to continue generating great interest in data management research, SIGMOD has been organizing panels such as this year's "New Research Symposium" (which included
Anish Das Sarma
from Google as a panelist).
In addition to sponsoring the conference, many Googlers attended, contributing to a robust presence and affording us the opportunity to interact with the broader information management community. We've been pushing the frontiers of science with cutting-edge research in many aspects of data management, and we were eager to share our innovations and see what others have been working on. We found
Amin Vahdat's
keynote on the intersection of Networking and Databases to be a highlight of Google’s participation, which also included presenting papers, participating on panels, and taking part in planning and program committees:
Program Committee Members
Anish Das Sarma
, Venkatesh Ganti,
Zoltan Gyongyi
,
Alon Halevy
(Tutorials Chair),
Kristen LeFevre
,
Cong Yu
Talks
Symbiosis in Scale Out Networking and Data Management
Amin Vahdat, Google (Keynote)
F1-The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, Phoenix Tong (Googlers)
Finding Related Tables
Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu (Googlers)
Papers
CloudRAMSort: Fast and Efficient Large-Scale Distributed RAM Sort on Shared-Nothing Cluster
Changkyu Kim, Jongsoo Park, Nadathur Satish, Hongrae Lee (Google), Pradeep Dubey, Jatin Chhugani
Efficient Spatial Sampling of Large Geographical Tables
Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy (Googlers)
Panels
Perspectives on Big Data Plenary Session: Privacy and Big Data
Kristen LeFevre, Google
SIGMOD New Researcher Symposium - How to be a good advisor/advisee?
Anish Das Sarma, Google
Overall, this year’s SIGMOD was a great conference, widely attended by researchers from industry and academia, and comprised of a very interesting mix of research presentations and discussions. Google had a good showing at the conference, and we look forward to continuing this trend in the coming years.