Google Research Blog
The latest news from Research at Google
Two Googlers elected to the American Academy of Arts and Sciences
Thursday, April 25, 2013
Posted by Alfred Spector, Vice President, Engineering
Cross-posted with the
Official Google Blog
On Wednesday, the American Academy of Arts and Sciences announced its list of 2013 elected members. We’re proud to congratulate Peter Norvig, director of research, and Arun Majumdar, vice president for energy, two Googlers who are among the new members elected this year.
Membership in the American Academy of Arts and Sciences is considered one of the nation’s highest honors, with those elected recognized as leaders in the arts, public affairs, business, and academic disciplines. With more than 250 Nobel Prize laureates and 60 Pulitzer Prize winners among its fellows, the American Academy celebrates the exceptional contributions of the elected members to critical social and intellectual issues.
With their election, Peter and Arun join seven other Googlers as American Academy members: Eric Schmidt, Vint Cerf, Alfred Spector, Hal Varian, Ray Kurzweil, and founders Sergey Brin and Larry Page, all of whom embody our commitment to innovation and real-world impact. You can read more detailed summaries of Peter and Arun’s achievements below.
Dr. Peter Norvig, currently director of research at Google, is best known for his broad expertise in computer science and artificial intelligence, exemplified by his co-authorship (with Stuart Russell) of the leading college text, Artificial Intelligence: A Modern Approach. With more than 50 publications and a plethora of webpages, essays and software programs on a wide variety of CS topics, Peter is a catalyst of fundamental research across a wide range of disciplines while remaining a hands-on scientist who writes his own code. Recently, he has taught courses on artificial intelligence and the design of computer programs via massive open online courses (MOOCs). Learn more about Peter and his research on norvig.com.
Dr. Arun Majumdar leads Google.org’s energy initiatives and advises Google on its broader energy strategy. Prior to joining Google last year, he was the founding director of the U.S. Department of Energy's Advanced Research Projects Agency-Energy (ARPA-E), where he served from October 2009 until June 2012. Earlier, he was a professor of mechanical engineering as well as materials science and engineering at the University of California, Berkeley, and headed the Environmental Energy Technologies Division at the Lawrence Berkeley National Laboratory. He has published several hundred papers, patents, and conference proceedings. Find out more about Arun.
50,000 Lessons on How to Read: a Relation Extraction Corpus
Thursday, April 11, 2013
Posted by Dave Orr, Product Manager, Google Research
One of the most difficult tasks in NLP is called relation extraction. It’s an example of information extraction, one of the goals of natural language understanding. A relation is a semantic connection between (at least) two entities. For instance, you could say that Jim Henson was in a spouse relation with Jane Henson (and in a creator relation with many beloved characters and shows).
The goal of relation extraction is to learn relations from unstructured natural language text. The relations can be used to answer questions (“Who created Kermit?”), to learn which proteins interact in the biomedical literature, or to build a database of hundreds of millions of entities and billions of relations to try to help people explore the world’s information.
To help researchers investigate relation extraction, we’re releasing a human-judged dataset of two relations about public figures on Wikipedia: nearly 10,000 examples of “place of birth”, and over 40,000 examples of “attended or graduated from an institution”. Each of these was judged by at least 5 raters, and can be used to train or evaluate relation extraction systems. We also plan to release more relations of new types in the coming months.
(Update: you can find additional relations here.)
Each relation is in the form of a triple: the relation in question, called a predicate; the subject of the relation; and the object of the relation. In the relation “Stephen Hawking graduated from Oxford,” Stephen Hawking is the subject, “graduated from” is the predicate, and Oxford University is the object. Subjects and objects are represented by their Freebase MIDs, and the relation is defined as a Freebase property. So in this case, the triple would be represented as:
"pred":"/education/education/institution"
"sub":"/m/01tdnyh"
"obj":"/m/07tgn"
Just having the triples is interesting enough if you want a database of entities and relations, but it doesn’t make much progress towards training or evaluating a relation extraction system. So we’ve also included the evidence for the relation, in the form of a URL and an excerpt from the web page that our raters judged. We’re also including examples where the evidence does not support the relation, so you have negative examples for use in training better extraction systems. Finally, we included the IDs and actual judgments of individual raters, so that you can filter triples by agreement.
Gory Details
The corpus itself, extracted from Wikipedia, can be found here:
https://code.google.com/p/relation-extraction-corpus/
The files are in JSON format. Each line is a triple with the following fields:
pred: predicate of the triple
sub: subject of the triple
obj: object of the triple
evidences: an array of evidence items for this triple
  url: the web page from which this evidence was obtained
  snippet: short piece of text supporting the triple
judgments: an array of judgments from human annotators
  rater: hash code of the identity of the annotator
  judgment: judgment of the annotator; it can take the values "yes" or "no"
Here’s an example:
{"pred":"/people/person/place_of_birth","sub":"/m/026_tl9","obj":"/m/02_286","evidences":[{"url":"http://en.wikipedia.org/wiki/Morris_S._Miller","snippet":"Morris Smith Miller (July 31, 1779 -- November 16, 1824) was a United States Representative from New York. Born in New York City, he graduated from Union College in Schenectady in 1798. He studied law and was admitted to the bar. Miller served as private secretary to Governor Jay, and subsequently, in 1806, commenced the practice of his profession in Utica. He was president of the village of Utica in 1808 and judge of the court of common pleas of Oneida County from 1810 until his death."}],"judgments":[{"rater":"11595942516201422884","judgment":"yes"},{"rater":"16169597761094238409","judgment":"yes"},{"rater":"1014448455121957356","judgment":"yes"},{"rater":"16651790297630307764","judgment":"yes"},{"rater":"1855142007844680025","judgment":"yes"}]}
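Filtering triples by rater agreement can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the release: the function names, the 0.8 agreement threshold, and the abbreviated sample line (the snippet is cut to one sentence of the example above) are all choices made here for the sketch.

```python
import json
from collections import Counter

# One corpus line, abbreviated from the example above: the snippet is cut to a
# single sentence. Built with json.dumps so the sketch is self-contained.
SAMPLE_LINE = json.dumps({
    "pred": "/people/person/place_of_birth",
    "sub": "/m/026_tl9",
    "obj": "/m/02_286",
    "evidences": [{
        "url": "http://en.wikipedia.org/wiki/Morris_S._Miller",
        "snippet": "Born in New York City, he graduated from Union College "
                   "in Schenectady in 1798.",
    }],
    "judgments": [
        {"rater": "11595942516201422884", "judgment": "yes"},
        {"rater": "16169597761094238409", "judgment": "yes"},
        {"rater": "1014448455121957356", "judgment": "yes"},
        {"rater": "16651790297630307764", "judgment": "yes"},
        {"rater": "1855142007844680025", "judgment": "yes"},
    ],
})


def agreement(record):
    """Return (majority_label, fraction_of_raters_who_gave_it).

    Hypothetical helper. With an even split, the label seen first wins.
    """
    votes = Counter(j["judgment"] for j in record["judgments"])
    if not votes:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


def filter_by_agreement(lines, label="yes", min_agreement=0.8):
    """Yield parsed records whose majority label matches `label` and whose
    agreement is at least `min_agreement` (an assumed threshold)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        majority, fraction = agreement(record)
        if majority == label and fraction >= min_agreement:
            yield record


if __name__ == "__main__":
    for record in filter_by_agreement([SAMPLE_LINE]):
        print(record["sub"], record["pred"], record["obj"], agreement(record))
```

Passing label="no" would collect the negative examples instead, and an open file object can be passed in place of the list since the function only iterates over lines.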
The web is chock full of information, put there to be read and learned from. Our hope is that this corpus is a small step towards computational understanding of the wealth of relations to be found everywhere you look.
This dataset is licensed by Google Inc. under the Creative Commons Attribution-Sharealike 3.0 license.
Thanks to Shaohua Sun, Ni Lao, and Rahul Gupta for putting this dataset together.
Thanks also to Michael Ringgaard, Fernando Pereira, Amar Subramanya, Evgeniy Gabrilovich, and John Giannandrea for making this data release possible.
Advanced Power Searching with Google: Lessons Learned
Tuesday, April 09, 2013
Posted by Dan Russell, Uber Tech Lead, Search Quality & User Happiness and Maggie Johnson, Director of Education and University Relations
Large classes are something you normally want to avoid like the plague, so being in a class with tens of thousands of students sounds completely crazy.
But in January 2013, Google offered a free “MOOC” (a Massive Open Online Course) to teach Advanced Power Searching (APS) to a wide variety of information professionals.
The wholly online class ran for two weeks and covered advanced research skills in a challenge-based format. Slightly more than 35,000 students signed up.
In this case, the large class size was a boon to the students. Not only was there a vigorous discussion of the material on social media, but with a class this large, anytime you had a question, someone else in the class had almost certainly asked the same question and had an answer ready. As in many MOOCs, the large online class size did not stress any lecture hall capacities, but it did give the students the benefit of multicultural classmates who were effectively always present in the social spaces of the MOOC.
A typical Massive Open Online Course (MOOC) is a simple progression through a series of mini-lectures, usually a short video followed by reflective questions, problem sets and a few assessments. MOOCs can have huge numbers of students; dozens have been offered with over 150,000 students enrolled. Based on our experiments with Power Searching with Google in 2012, we wanted to do something different. When we offered Advanced Power Searching with Google (APS) in January of 2013, we decided to try out a number of new ideas.
Through this course, we wanted to enable our students to solve complex research questions using a variety of tools, such as Google Scholar, Patents, Books, and Google+. We defined complex problems that had more than one right answer and more than one way to find those answers.
Unlike a traditional MOOC, the APS course had twelve challenges that students could tackle in any order they liked. There were four easy, four medium and four difficult challenges. Part of the design of the class was to have students discover the skills they’d need to solve the challenges and select appropriate video or text lessons. Students could also access case studies that showed how others had solved similar problems.
We called our MOOC design “Choose your own adventure.” Each challenge presented a research question like this:
“You are in the city that is home to the House of Light. Nearby there is a museum in a converted school featuring paintings from the far-away Forest of Honey.
What traditional festival are you visiting?”
In this class, the large cohort of 35,000 students worked through the materials together, using online forums to ask questions as well as Google+ Hangouts to attend office hours and collaborate on solving challenges. Instructor Dan Russell and a group of teaching assistants monitored students’ activities and provided support as needed.
If they needed additional help, students could post a question on the forum or see how others solved the challenge. Students could post their solutions to challenges in a special “Peer explanations” section, a feature that many students appreciated as it let them see how others in the class approached the problem in their own ways.
In analyzing the data, we found that the number of views decreased with each successive challenge page, indicating that students most likely tried the challenges in the order given. While some liked the ability to jump around, most tended to go through the content linearly. Most students who completed the course tried (or at least looked at) all twelve challenges. Many students who did not complete the course tried three or fewer challenges.
To earn a certificate of completion, students submitted two detailed case studies of how they solved a complex search challenge. Students provided great examples of how they used Google tools to research their family’s history, the origins of common objects, or trips they anticipate taking. In addition to listing their queries, they wrote details about how they knew websites were credible and what they learned along the way.
To assess their work, we experimented with letting the students grade their assignments based on a rubric. We collected their scores and compared them with a random sample of assignments graded by TAs. There was a moderate yet statistically significant correlation (r=0.44) between student scores and TA scores. In fact, the majority of students graded themselves within two points of how an expert grader assessed their work. This is a positive result since it suggests that self-graded project work in a MOOC can be valuable as a source of insight into student performance.
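For readers who want to run the same kind of comparison on their own course data, here is a minimal sketch of the two statistics mentioned above: a Pearson correlation between paired scores, and the share of students whose self-grade falls within two points of the expert grade. The function names are hypothetical, and the scores in the demo are made up purely for illustration; they are not the course data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


def share_within(xs, ys, tolerance=2):
    """Fraction of pairs whose scores differ by at most `tolerance` points."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    close = sum(1 for x, y in zip(xs, ys) if abs(x - y) <= tolerance)
    return close / len(xs)


if __name__ == "__main__":
    # Made-up scores for illustration only (assumed rubric scale).
    student_scores = [8, 6, 9, 7, 5, 10]
    ta_scores = [7, 7, 6, 8, 4, 9]
    print("r =", round(pearson_r(student_scores, ta_scores), 2))
    print("within 2 points:", share_within(student_scores, ta_scores))
```

A significance test on r would need the sample size as well, which is why the sketch stops at the coefficient itself.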
The challenge format seemed to be effective and motivating for a small, dedicated population of students. We had 35,000 registrants for this advanced course, and 12% earned a certificate of completion. This rate is somewhat lower than what we saw for Power Searching with Google, a more traditional MOOC. Students who did not complete the course reported lack of time and the difficulty of the content as barriers.
One interesting point was that labeling the challenges as easy, medium or difficult likely had an unintentional effect. The first challenge was marked as “easy,” but many people found it difficult. This may have de-motivated students from attempting more difficult challenges. Next time, we plan to ask students if the first challenge was too easy, or too challenging, and then send them to a challenge at an appropriate level of difficulty.
Watch for more MOOCs on our products and services in the coming months. And watch for more experimentation as we apply what we have learned, and try more ideas and new approaches in future online courses.