Google Research Blog
The latest news from Research at Google
Education Awards on Google App Engine
Wednesday, March 27, 2013
Posted by Andrea Held, Google University Relations
Cross-posted with the Google Developers Blog
Last year we invited proposals for innovative projects built on Google’s infrastructure. Today we are pleased to announce the 11 recipients of a Google App Engine Education Award. Professors and their students are using the award in cloud computing courses to study databases, distributed systems and web mashups, and to build educational applications. Each selected project received $1000 in Google App Engine credits.
Awarding computational resources to classroom projects is always gratifying. It is impressive to see the creative ideas students and educators bring to these programs.
Below is a brief introduction to each project. Congratulations to the recipients!
John David N. Dionisio, Loyola Marymount University
Project description: The objective of this undergraduate database systems course is for students to implement one database application on two technology stacks: a traditional relational database and Google App Engine. Students are asked to study both models and provide concrete points of comparison.
Xiaohui (Helen) Gu, North Carolina State University
Project description: Advanced Distributed Systems Class. The goal of the project is to let students learn distributed system concepts by developing real distributed system management systems and testing them on real-world cloud computing infrastructures such as Google App Engine.
Shriram Krishnamurthi, Brown University
Project description: WeScheme is a programming environment that runs in the Web browser and supports interactive development. WeScheme uses App Engine to handle user accounts, server-side compilation, and file management.
Feifei Li, University of Utah
Project description: A graduate-level course, to be offered in Fall 2013, on the design and implementation of large data management system kernels. The objective is to integrate features of a relational database engine with some of the newer features of NoSQL systems to enable efficient, scalable data management over a cluster of commodity machines.
Mark Liffiton, Illinois Wesleyan University
Project description: TeacherTap is a free, simple classroom-response system built on Google App Engine. It lets students give instant, anonymous feedback to teachers about a lecture or discussion from any computer or mobile device with a web browser, facilitating more adaptive class sessions.
Eni Mustafaraj, Wellesley College
Project description: Topics in Computer Science: Web Mashups. A CS2 course that combines Google App Engine and MIT App Inventor. Students will learn to build apps with App Inventor to collect data about their life on campus. They will use Google App Engine to build web services and apps to host the data and remix it to create web mashups. Offered in the Spring 2013 semester.
Manish Parashar, Rutgers University
Project description: Cloud Computing for Scientific Applications: Autonomic Cloud Computing teaches students how a hybrid HPC/Grid + Cloud cyberinfrastructure can be used effectively to support real-world science and engineering applications. The goal of our efforts is to explore application formulations, and usage modes of Cloud and hybrid HPC/Grid + Cloud infrastructure, that are meaningful for various classes of science and engineering application workflows.
Orit Shaer, Wellesley College
Project description: GreenTouch is a collaborative environment that enables novice users to engage in authentic scientific inquiry. It consists of a mobile user interface for capturing data in the field, a web application for data curation in the cloud, and a tabletop user interface for exploratory analysis of heterogeneous data.
Elliot Soloway, University of Michigan
Project description: WeLearn Mobile Platform: Making Mobile Devices Effective Tools for K-12. The platform makes mobile devices (Android, iOS, WP8) effective, essential tools for all-the-time, everywhere learning. WeLearn’s suite of productivity and communication apps enables learners to work collaboratively; WeLearn’s portal, hosted on Google App Engine, enables teachers to send assignments and to review and grade student artifacts. WeLearn is available to educators at no charge.
Jonathan White, Harding University
Project description: Teaching Cloud Computing in an Introduction to Engineering class for freshmen. We explore how well-designed systems are built to withstand unpredictable stresses, whether that system is a building, a piece of software or even the human body. The grant from Google is allowing us to add an overview of cloud computing as a platform that is robust under diverse loads.
Dr. Jiaofei Zhong, University of Central Missouri
Project description: By building an online Course Management System, students will be able to work on their team projects in the cloud. The system allows instructors and students to manage course materials, including the syllabus, slides, assignments and tests, in the cloud; the tool can be shared with educational institutions worldwide.
Scaling Computer Science Education
Wednesday, March 13, 2013
Posted by Maggie Johnson, Director of Education and University Relations
Last week, I attended the annual SIGCSE (Special Interest Group on Computer Science Education) conference in Denver, CO. Google has been a platinum sponsor of SIGCSE for many years, and the conference gives hundreds of computer science (CS) educators an opportunity to share ideas and work on strategies to bring high-quality CS education to K-12 and undergraduate students.
Significant accomplishments over the last few years have laid a strong foundation for scaling CS curriculum, professional development (PD) and related programs in this country. The NSF has been funding curriculum and PD around the new CS Principles Advanced Placement course. The CSTA has published standards for K-12 CS and a report on the limited extent to which schools, districts and states provide CS instruction to their students. The CS advocacy group Computing in the Core even provides a toolkit for communities to follow as they urge legislators to integrate computer science education into the core K-12 curriculum.
All of this work has made an impact, but there is still more to do.
I see our priorities in CS education as awareness and access. As CS educators, we must continue to raise awareness about the tremendous demand for jobs in the computing sector, and balance misconceptions with accurate data. Many students, parents, teachers and administrators remember the hype and disillusionment of the dot-com period, along with myths about outsourcing and dwindling jobs, yet the US Bureau of Labor Statistics (BLS) reports that ⅔ of all job growth in science and engineering over the next decade will be in computer science employment. (See the 2010 BLS report here.) Clearing up this misconception is essential if we hope to satisfy US labor needs with recent graduates over the next several years.
Source: Gianchandani, Erwin. Revisiting ‘Where the Jobs Are’. The Computing Community Consortium Blog, post of 23 May 2012. Link accessed on 8 March 2013.
Another misconception concerns the range of CS-focused occupations that exist. The world of CS is expanding rapidly, and we should celebrate the diversity of CS applications that are gaining momentum. Instead of reinforcing the archetype of the sun-starved computer scientist, or of software engineers working in isolation with little opportunity for teamwork or communication, educators can encourage project-based learning, video game development, robotics, and graphic design as more concrete representations of abstract computational thinking.
Google believes that computing and CS are critical to our future, not only in the high tech sector, but for everyone. Our economy is becoming more and more dependent on technology-based solutions, which will require a future workforce with significant levels of CS knowledge and experience. In addition, we anticipate new career opportunities opening up in the next 3-5 years as more businesses move into the cloud and shift the way they run their IT departments.
Help us get the word out about the great opportunities in computing through organizations such as code.org, ACM, and NCWIT. Google is doing its part to support CS education and outreach through many programs, including CS4HS, our Exploring Computational Thinking curriculum, and several student and teacher programs. So much opportunity, so little time!
Our Commitment to Social Computing Research: Social Interactions Focused Awards Announcement
Tuesday, March 12, 2013
Posted by Ed H. Chi, Staff Research Scientist
Social interactions have always been an important part of the human experience. Social interaction research has produced results ranging from the influence of social networks on our behavior [Aral2012] and the effect of social belonging on health [Walton2011] to how conflict and coordination play out in Wikipedia [Kittur2007]. Social scientists have studied social interactions for many years, but only very recently have researchers been able to study these mechanisms through the explosion of services and data available on web-based social systems.
From information dissemination and the spread of innovation and ideas to scientific discovery, we are seeing how a deep understanding of social interactions is affecting many different fields, such as health and education. For instance, scientists now have strong evidence that social interactions underlie many fundamental learning mechanisms from infancy well into adulthood [Meltzoff2009], and that peer discussions are critical to conceptual learning in college classes [Smith2009]. How might these learning science findings be built into social systems and products so that users maximize what they learn on the Web?
We know that interactions on the Web are diverse and people-centered. Google now enables social interactions across many of our products, from Google+ to Search to YouTube. To understand the future of this socially connected web, we need to investigate the fundamental patterns, design principles, and laws that shape and govern these social interactions.
We envision research at the intersection of disciplines including Computer Science, Human-Computer Interaction (HCI), Social Science, Social Psychology, Machine Learning, Big Data Analytics, Statistics and Economics. These fields are central to the study of how social interactions work, a study increasingly driven by new sources of data: open datasets from Web 2.0 and social media sites, government databases, crowdsourcing, new survey techniques, and crisis management data collections. New techniques from network science and computational modeling, social network and sentiment analysis, and statistical and machine learning, as well as ideas from evolutionary theory, physics, and information theory, are actively being used in social interaction research.
We’re pleased to announce that Google has awarded over $1.2 million to support the Social Interactions Research Awards, which are given to university research groups doing work in social computing and interactions. Research topics include crowdsourcing, social annotations, a behavioral study of social media, social learning, conversation curation, and scientific studies of how to start online communities.
The awards go to 15 researchers at 7 universities. We selected these proposals after a rigorous internal review. We believe the results will be broadly useful to product development and will further scientific research.
Joseph Konstan, Loren Terveen, and John Riedl from University of Minnesota. Precision Crowdsourcing: Closing the Loop to turn Information Consumers into Information Contributors.
Mor Naaman from Rutgers University, and Oded Nov from Polytechnic Institute of New York University. Examining the Impact of Social Traces on Page Visitors’ Opinions and Engagement.
Paul Resnick, Eytan Adar, and Cliff Lampe from University of Michigan. MTogether: A Living Lab for Social Media Research.
Marti Hearst from UC Berkeley. Understanding Social Learning Among Subgroups Within Large Online Learning Environments.
David Karger and Rob Miller from MIT. Crowdsourced Curation of Conversations.
Robert Kraut, Laura Dabbish, Jason Hong, and Aniket Kittur from CMU. Successfully Starting Online Groups.
We look forward to working with these researchers, and we hope that we will jointly push the frontier of social interactions research to the next level.
References
[Aral2012] Aral, S., & Walker, D. (2012). Identifying Influential and Susceptible Members of Social Networks. Science, 337(6092), 337–341. doi:10.1126/science.1215842
[Walton2011] Walton, G. M., & Cohen, G. L. (2011). A Brief Social-Belonging Intervention Improves Academic and Health Outcomes of Minority Students. Science, 331(6023), 1447–1451. doi:10.1126/science.1198364
[Kittur2007] Kittur, A., Suh, B., Pendleton, B., & Chi, E. H. (2007). He Says, She Says: Conflict and Coordination in Wikipedia. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2007), 453–462. San Jose, CA: ACM Press.
[Meltzoff2009] Meltzoff, A. N., Kuhl, P. K., Movellan, J., & Sejnowski, T. J. (2009). Foundations for a New Science of Learning. Science, 325(5938), 284–288. doi:10.1126/science.1175626
[Smith2009] Smith, M. K., Wood, W. B., Adams, W. K., Wieman, C., Knight, J. K., Guild, N., & Su, T. T. (2009). Why Peer Discussion Improves Student Performance on In-Class Concept Questions. Science, 323(5910), 122–124. doi:10.1126/science.1165919
Learning from Big Data: 40 Million Entities in Context
Friday, March 08, 2013
Posted by Dave Orr, Amar Subramanya, and Fernando Pereira, Google Research
When someone mentions Mercury, are they talking about the planet, the god, the car, the element, Freddie, or one of some 89 other possibilities? This problem is called disambiguation (a word that is itself ambiguous). Resolving it is necessary for communication, and humans are amazingly good at it (when was the last time you confused a fruit with a giant tech company?), but computers need help.
To provide that help, we are releasing the Wikilinks Corpus: 40 million total disambiguated mentions within over 10 million web pages, over 100 times bigger than the next largest corpus (about 100,000 documents; see the table below for mention and entity counts). The mentions are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If we think of each page on Wikipedia as an entity (an idea we’ve discussed before), then the anchor text can be thought of as a mention of the corresponding entity.
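The post does not say how "closely matches" is measured. Purely as an illustration, the sketch below assumes a case-insensitive string-similarity test using Python's `difflib.SequenceMatcher` with an arbitrary 0.8 threshold. The helper names (`title_from_url`, `anchor_matches_title`) and the threshold are hypothetical; this is not the criterion used to build the corpus.

```python
from difflib import SequenceMatcher
from urllib.parse import unquote


def title_from_url(wiki_url: str) -> str:
    """Recover a readable page title from an English Wikipedia URL."""
    # e.g. ".../wiki/Buick_Riviera" -> "Buick Riviera"
    slug = wiki_url.rsplit("/wiki/", 1)[-1]
    return unquote(slug).replace("_", " ")


def normalize(text: str) -> str:
    # Case-fold and collapse whitespace so trivial differences don't count.
    return " ".join(text.casefold().split())


def anchor_matches_title(anchor: str, wiki_url: str, threshold: float = 0.8) -> bool:
    """Hypothetical filter: treat a link as a labeled mention only if its
    anchor text is similar enough to the target page title."""
    a = normalize(anchor)
    t = normalize(title_from_url(wiki_url))
    return SequenceMatcher(None, a, t).ratio() >= threshold


if __name__ == "__main__":
    url = "http://en.wikipedia.org/wiki/Buick_Riviera"
    print(anchor_matches_title("Buick Riviera", url))  # True
    print(anchor_matches_title("click here", url))     # False
```

A similarity ratio rather than exact equality is used here only so that small variations in case or spacing still count; the real pipeline may well use a different rule.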
| Dataset | Number of Mentions | Number of Entities |
|---|---|---|
| Bentivogli et al. (2008) | 43,704 | 709 |
| Day et al. (2008) | less than 55,000 | 3,660 |
| Artiles et al. (2010) | 57,357 | 300 |
| Wikilinks Corpus | 40,323,863 | 2,933,659 |
What might you do with this data? Well, we’ve already written one ACL paper on cross-document coreference (and received lots of requests for the underlying data, which partly motivates this release). And really, we look forward to seeing what you are going to do with it! But here are a few ideas:
Look into coreference (when different mentions refer to the same entity) or entity resolution (matching a mention to the underlying entity)
Work on the bigger problem of cross-document coreference, which is how to find out whether different web pages are talking about the same person or other entity
Learn things about entities by aggregating information across all the documents they’re mentioned in
Work on type tagging, which tries to assign types (which could be broad, like person or location, or specific, like amusement park ride) to entities. To the extent that the Wikipedia pages contain the type information you’re interested in, it would be easy to construct a training set that annotates the Wikilinks entities with types from Wikipedia.
Work on any of the above, or more, on subsets of the data. With existing datasets, it wasn’t possible to work on just musicians or chefs or train stations, because the sample sizes would be too small. But with 10 million web pages, you can find a decent sampling of almost anything.
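Several of these ideas start from the same step: grouping labeled mentions by the Wikipedia page they link to. The sketch below assumes the mentions have already been loaded as hypothetical `(page_url, mention_text, wikipedia_url)` tuples; `entity_types` stands in for whatever type labels you extract from Wikipedia yourself. None of these names come from the released tools.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical row layout: (page_url, mention_text, wikipedia_url).
MentionRow = Tuple[str, str, str]


def pages_by_entity(rows: Iterable[MentionRow]) -> Dict[str, Set[str]]:
    """Map each entity (Wikipedia URL) to the set of pages that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_url, _mention, entity in rows:
        index[entity].add(page_url)
    return dict(index)


def select_subset(rows: Iterable[MentionRow],
                  keep_entity: Callable[[str], bool]) -> List[MentionRow]:
    """Keep only mentions whose target entity passes a filter, e.g. one
    built from a list of musicians, chefs or train stations."""
    return [row for row in rows if keep_entity(row[2])]


def type_training_set(rows: Iterable[MentionRow],
                      entity_types: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each mention string with its entity's type label; mentions of
    entities without a known type are skipped."""
    return [(mention, entity_types[entity])
            for _page, mention, entity in rows
            if entity in entity_types]


if __name__ == "__main__":
    rows = [
        ("http://a.example/1", "Buick Riviera", "http://en.wikipedia.org/wiki/Buick_Riviera"),
        ("http://b.example/2", "Riviera", "http://en.wikipedia.org/wiki/Buick_Riviera"),
        ("http://b.example/2", "Mercury", "http://en.wikipedia.org/wiki/Mercury_(planet)"),
    ]
    print(pages_by_entity(rows))
    print(select_subset(rows, lambda url: url.endswith("_(planet)")))
    print(type_training_set(rows, {"http://en.wikipedia.org/wiki/Buick_Riviera": "automobile"}))
```

Keying everything on the Wikipedia URL keeps the different surface forms of one entity ("Buick Riviera", "Riviera") together, which is what makes aggregation and subsetting straightforward.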
Gory Details
How do you actually get the data? It’s right here: Google’s Wikilinks Corpus. Tools and data with extra context can be found on our partners’ page: UMass Wiki-links. Understanding the corpus, however, is a little bit involved.
For copyright reasons, we cannot distribute actual annotated web pages. Instead, we’re providing an index of URLs, and the tools to create the dataset, or whichever slice of it you care about, yourself. Specifically, we’re providing:
The URLs of all the pages that contain labeled mentions, which are links to English Wikipedia
The anchor text of the link (the mention string), the Wikipedia link target, and the byte offset of the link for every page in the set
The byte offsets of the 10 least frequent words on the page, which act as a signature to ensure that the underlying text hasn’t changed (think of this as a version, or fingerprint, of the page)
Software tools (on the UMass site) to download the web pages; extract the mentions, with ways to recover if the byte offsets don’t match; select the text around the mentions as local context; and compute evaluation metrics over predicted entities.
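The released software on the UMass site handles these steps; the sketch below is not that code, only an illustration of the fingerprint check, offset recovery and context extraction under stated assumptions. It assumes each recorded offset points at the first byte of the UTF-8 encoded word or anchor text in the raw downloaded page, and that a stale mention offset can be repaired by searching a hypothetical ±2,000-byte window for the nearest copy of the mention string. The function names are illustrative.

```python
from typing import List, Optional, Tuple


def fingerprint_matches(page: bytes, tokens: List[Tuple[str, int]]) -> bool:
    """True if every signature word is still found at its recorded byte offset."""
    for word, offset in tokens:
        encoded = word.encode("utf-8")
        if page[offset:offset + len(encoded)] != encoded:
            return False
    return True


def locate_mention(page: bytes, mention: str, offset: int,
                   window: int = 2000) -> Optional[int]:
    """Return the mention's byte offset, or the nearest copy of it within
    `window` bytes of the recorded offset, or None if it is not found."""
    needle = mention.encode("utf-8")
    if page[offset:offset + len(needle)] == needle:
        return offset  # recorded offset is still valid
    lo = max(0, offset - window)
    hi = min(len(page), offset + window + len(needle))
    best: Optional[int] = None
    start = page.find(needle, lo, hi)
    while start != -1:
        if best is None or abs(start - offset) < abs(best - offset):
            best = start
        start = page.find(needle, start + 1, hi)
    return best


def local_context(page: bytes, offset: int, length: int, width: int = 50) -> str:
    """Decode up to `width` bytes on either side of a mention as local context."""
    lo = max(0, offset - width)
    hi = min(len(page), offset + length + width)
    return page[lo:hi].decode("utf-8", errors="replace")


if __name__ == "__main__":
    page = b"xx Buick Riviera yy seen zz"
    print(fingerprint_matches(page, [("seen", 20)]))   # True
    pos = locate_mention(page, "Buick Riviera", 0)      # stale offset
    print(pos, local_context(page, pos, 13, width=3))   # 3 xx Buick Riviera yy
```

If the offsets in the corpus actually point at the opening link tag rather than the anchor text, the exact check fails but the windowed search still finds the nearby text, so the same approach degrades gracefully.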
The format looks like this:
URL http://1967mercurycougar.blogspot.com/2009_10_01_archive.html
MENTION Lincoln Continental Mark IV 40110 http://en.wikipedia.org/wiki/Lincoln_Continental_Mark_IV
MENTION 1975 MGB roadster 41481 http://en.wikipedia.org/wiki/MG_MGB
MENTION Buick Riviera 43316 http://en.wikipedia.org/wiki/Buick_Riviera
MENTION Oldsmobile Toronado 43397 http://en.wikipedia.org/wiki/Oldsmobile_Toronado
TOKEN seen 58190
TOKEN crush 63118
TOKEN owners 69290
TOKEN desk 59772
TOKEN relocate 70683
TOKEN promote 35016
TOKEN between 70846
TOKEN re 52821
TOKEN getting 68968
TOKEN felt 41508
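A minimal reader for records in this layout might look like the following. It assumes, as the sample suggests, that each page starts with a `URL` line followed by its `MENTION` and `TOKEN` lines, and that fields are separated by whitespace (the released files may use tabs, which `str.split()` with no separator also handles). Because mention strings can contain spaces, `MENTION` lines are split from the right. The class and function names are illustrative, not part of the released tools.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Mention:
    text: str    # anchor text of the link (the mention string)
    offset: int  # byte offset of the link in the page
    target: str  # English Wikipedia URL the link points to


@dataclass
class PageRecord:
    url: str
    mentions: List[Mention] = field(default_factory=list)
    # Signature words and their byte offsets, used as a page fingerprint.
    tokens: List[Tuple[str, int]] = field(default_factory=list)


def parse_wikilinks(lines: Iterable[str]) -> Iterator[PageRecord]:
    """Group URL / MENTION / TOKEN lines into one record per page."""
    page: Optional[PageRecord] = None
    for raw in lines:
        parts = raw.strip().split(None, 1)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        kind, rest = parts
        if kind == "URL":
            if page is not None:
                yield page
            page = PageRecord(url=rest)
        elif kind == "MENTION" and page is not None:
            # The mention text may contain spaces, so peel off the last two fields.
            text, offset, target = rest.rsplit(None, 2)
            page.mentions.append(Mention(text, int(offset), target))
        elif kind == "TOKEN" and page is not None:
            word, offset = rest.rsplit(None, 1)
            page.tokens.append((word, int(offset)))
    if page is not None:
        yield page


if __name__ == "__main__":
    sample = """URL http://1967mercurycougar.blogspot.com/2009_10_01_archive.html
MENTION Lincoln Continental Mark IV 40110 http://en.wikipedia.org/wiki/Lincoln_Continental_Mark_IV
MENTION Buick Riviera 43316 http://en.wikipedia.org/wiki/Buick_Riviera
TOKEN seen 58190
TOKEN crush 63118"""
    for record in parse_wikilinks(sample.splitlines()):
        print(record.url, len(record.mentions), len(record.tokens))
```

Parsing lazily with a generator keeps memory flat even when streaming through the full index of over 10 million pages.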
We’d love to hear what you’re working on, and look forward to what you can do with 40 million mentions across over 10 million web pages!
Thanks to our collaborators at UMass Amherst: Sameer Singh and Andrew McCallum.