Research Blog: March 2013

Education Awards on Google App Engine

Wednesday, March 27, 2013

Posted by Andrea Held, Google University Relations
Cross-posted with the Google Developers Blog

Google App Engine Education Award recipients:

John David N. Dionisio

Xiaohui (Helen) Gu: Advanced Distributed Systems Class

Shriram Krishnamurthi: WeScheme

Feifei Li

Mark Liffiton: TeacherTap

Eni Mustafaraj

Manish Parashar

Orit Shaer: GreenTouch

Elliot Soloway

Jonathan White

Dr. Jiaofei Zhong

Scaling Computer Science Education

Wednesday, March 13, 2013

Posted by Maggie Johnson, Director of Education and University Relations

SIGCSE, NSF, CS Principles, CSTA standards, Computing in the Core

Source: Erwin Gianchandani, "Revisiting ‘Where the Jobs Are’," Computing Community Consortium Blog, 23 May 2012 (accessed 8 March 2013).

code.org, ACM, NCWIT, CS4HS, Exploring Computational Thinking (student and teacher resources)

Our Commitment to Social Computing Research: Social Interactions Focused Awards Announcement

Tuesday, March 12, 2013

Posted by Ed H. Chi, Staff Research Scientist

Research has documented the influence of social networks on our behavior [1], the effect of a sense of social belonging on health [2], how conflict and coordination play out in Wikipedia [3], how social interactions underlie many fundamental learning mechanisms [4], and how peer discussion is critical to conceptual learning in college classes [5].

The award recipients are:

Joseph Konstan, Loren Terveen, and John Riedl from University of Minnesota. Precision Crowdsourcing: Closing the Loop to turn Information Consumers into Information Contributors.

Mor Naaman from Rutgers University, and Oded Nov from Polytechnic Institute of New York University. Examining the Impact of Social Traces on Page Visitors’ Opinions and Engagement.

Paul Resnick, Eytan Adar, and Cliff Lampe from University of Michigan. MTogether: A Living Lab for Social Media Research.

Marti Hearst from UC Berkeley. Understanding Social Learning Among Subgroups Within Large Online Learning Environments.

David Karger and Rob Miller from MIT. Crowdsourced Curation of Conversations.

Robert Kraut, Laura Dabbish, Jason Hong, and Aniket Kittur from CMU. Successfully Starting Online Groups.

References
[1] Aral, S., & Walker, D. (2012). Identifying Influential and Susceptible Members of Social Networks. Science, 337(6092), 337–341. doi:10.1126/science.1215842
[2] Walton, G. M., & Cohen, G. L. (2011). A Brief Social-Belonging Intervention Improves Academic and Health Outcomes of Minority Students. Science, 331(6023), 1447–1451. doi:10.1126/science.1198364
[3] Kittur, A., Suh, B., Pendleton, B., & Chi, E. H. (2007). He Says, She Says: Conflict and Coordination in Wikipedia. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2007), San Jose, CA, April 2007 (pp. 453–462). ACM Press.
[4] Meltzoff, A. N., Kuhl, P. K., Movellan, J., & Sejnowski, T. J. (2009). Foundations for a New Science of Learning. Science, 325(5938), 284–288. doi:10.1126/science.1175626
[5] Smith, M. K., Wood, W. B., Adams, W. K., Wieman, C., Knight, J. K., Guild, N., & Su, T. T. (2009). Why Peer Discussion Improves Student Performance on In-Class Concept Questions. Science, 323(5910), 122–124. doi:10.1126/science.1165919

Learning from Big Data: 40 Million Entities in Context

Friday, March 08, 2013

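Posted by Dave Orr, Amar Subramanya, and Fernando Pereira, Google Research

When someone mentions Mercury, do they mean the planet, the god, the car, the element, Freddie Mercury, or one of 89 other possibilities? Deciding is the problem of disambiguation, and it arises whenever a name is ambiguous: Apple might be a fruit or a giant tech company.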


Dataset                     Number of Mentions    Number of Entities
Bentivogli et al. (2008)    43,704                709
Day et al. (2008)           fewer than 55,000     3,660
Artiles et al. (2010)       57,357                300
Wikilinks Corpus            40,323,863            2,933,659
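The scale gap in the comparison can be checked with a few divisions. The sketch below is plain Python; the names `DATASETS`, `scale_vs_largest_prior`, and `mentions_per_entity` are illustrative and not part of any release. It compares Wikilinks with the largest earlier dataset and computes mentions per entity.

```python
# Counts copied from the dataset comparison. Day et al. (2008) give only an
# upper bound on mentions ("fewer than 55,000"), so their mention count is
# recorded as None and skipped in the mention comparison.
from typing import Dict, Optional, Tuple

DATASETS: Dict[str, Tuple[Optional[int], int]] = {
    "Bentivogli et al. (2008)": (43_704, 709),
    "Day et al. (2008)": (None, 3_660),
    "Artiles et al. (2010)": (57_357, 300),
    "Wikilinks Corpus": (40_323_863, 2_933_659),
}


def scale_vs_largest_prior(name: str = "Wikilinks Corpus") -> Tuple[float, float]:
    """Ratio of `name`'s mentions and entities to the largest other dataset."""
    mentions, entities = DATASETS[name]
    others = [v for k, v in DATASETS.items() if k != name]
    prior_mentions = max(m for m, _ in others if m is not None)
    prior_entities = max(e for _, e in others)
    return mentions / prior_mentions, entities / prior_entities


def mentions_per_entity(name: str) -> Optional[float]:
    """Average mentions per entity, or None if the mention count is unknown."""
    mentions, entities = DATASETS[name]
    return None if mentions is None else mentions / entities


if __name__ == "__main__":
    m_ratio, e_ratio = scale_vs_largest_prior()
    print(f"mentions: {m_ratio:.0f}x, entities: {e_ratio:.0f}x the largest prior set")
    for name in DATASETS:
        ratio = mentions_per_entity(name)
        print(name, "n/a" if ratio is None else f"{ratio:.1f} mentions/entity")
```

Run directly, it reports roughly 700 times as many mentions and 800 times as many entities as the largest earlier dataset. Leaving out Day et al. does not affect the mention comparison, since their upper bound of 55,000 is below Artiles et al.'s 57,357. Mentions per entity are actually lower for Wikilinks (about 14, versus about 62 and 191), so much of the gain is in the breadth of entities covered.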

See also the ACL paper on cross-document coreference.

Look into coreference (when different mentions refer to the same entity) or entity resolution (matching a mention to the underlying entity).

Work on the bigger problem of cross-document coreference: finding out whether different web pages are talking about the same person or other entity.

Learn things about entities by aggregating information across all the documents they’re mentioned in.

Tag entities with types, which can be broad (person, location) or specific (amusement park ride). To the extent that the Wikipedia pages contain the type information you’re interested in, it should be straightforward to construct a training set that annotates the Wikilinks entities with types from Wikipedia.

Work on any of the above, or more, on subsets of the data. With existing datasets, it wasn’t possible to work on just musicians or chefs or train stations, because the sample sizes would be too small. But with 10 million web pages, you can find a decent sampling of almost anything.
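To make the first few ideas concrete: because every mention in the corpus is labeled by its Wikipedia link, one common entity-resolution baseline (a generic technique, not necessarily the authors' approach) is to link each mention string to the target it most often points to. The same (page, target) pairs can be inverted to gather every page about an entity for cross-document aggregation. The function names and toy data are hypothetical.

```python
# Most-frequent-target baseline for entity resolution, plus an inverted
# index from entity to pages. Illustrative only.
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'Buick  Riviera' == 'buick riviera'."""
    return " ".join(text.lower().split())


def build_link_prior(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each normalized anchor string links to each target."""
    prior: Dict[str, Counter] = defaultdict(Counter)
    for anchor, target in pairs:
        prior[normalize(anchor)][target] += 1
    return prior


def resolve(prior: Dict[str, Counter], mention: str) -> Optional[str]:
    """Most frequent target for this mention string, or None if unseen."""
    counts = prior.get(normalize(mention))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def pages_by_entity(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group page URLs by the entity (link target) they mention."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_url, target in links:
        index[target].add(page_url)
    return index


if __name__ == "__main__":
    # Hypothetical (anchor text, target) pairs.
    prior = build_link_prior([
        ("Mercury", "Mercury_(planet)"),
        ("mercury", "Mercury_(element)"),
        ("MERCURY", "Mercury_(planet)"),
    ])
    print(resolve(prior, "Mercury"))
```

Restricting `pages_by_entity` to a chosen set of targets (say, train stations) yields the kind of topical subset described in the last item.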

Gory Details

Google’s Wikilinks Corpus, together with the UMass Wiki-links release, includes:

The URLs of all pages that contain labeled mentions, where each mention is a link to an English Wikipedia page

For every link on those pages: the anchor text (the mention string), the Wikipedia link target, and the link’s byte offset on the page

The byte offsets of the 10 least frequent words on each page, which act as a signature for checking that the underlying text hasn’t changed; think of this as a version, or fingerprint, of the page

Software tools (on the UMass site) to: download the web pages; extract the mentions, with ways to recover if the byte offsets don’t match; select the text around the mentions as local context; and compute evaluation metrics over predicted entities.
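The fingerprint check and offset recovery can be sketched as follows. This assumes, without confirmation from the post, that offsets are byte positions into the raw page as downloaded and point at the first byte of the UTF-8 word or anchor text. The helper names and the 200-byte search window are hypothetical, not the interface of the UMass tools.

```python
# Fingerprint check, mention relocation, and local-context extraction.
# Assumption: offsets index the raw page bytes at the start of the text.
from typing import Iterable, Optional, Tuple


def bytes_at(page: bytes, text: str, offset: int) -> bool:
    """True if `text` (UTF-8) starts exactly at byte `offset` of `page`."""
    needle = text.encode("utf-8")
    return page[offset:offset + len(needle)] == needle


def page_unchanged(page: bytes, tokens: Iterable[Tuple[str, int]]) -> bool:
    """Fingerprint check: every recorded rare word is still at its offset."""
    return all(bytes_at(page, word, offset) for word, offset in tokens)


def locate_mention(page: bytes, anchor: str, offset: int,
                   slack: int = 200) -> Optional[int]:
    """Return the mention's byte offset, recovering from small shifts.

    Uses the recorded offset if it matches; otherwise returns the first
    occurrence of the anchor within `slack` bytes of it, or None.
    """
    if bytes_at(page, anchor, offset):
        return offset
    needle = anchor.encode("utf-8")
    start = max(0, offset - slack)
    end = min(len(page), offset + slack + len(needle))
    hit = page.find(needle, start, end)
    return None if hit == -1 else hit


def local_context(page: bytes, offset: int, length: int, width: int = 50) -> str:
    """Up to `width` bytes on each side of a mention of `length` bytes."""
    start = max(0, offset - width)
    return page[start:offset + length + width].decode("utf-8", errors="replace")
```

If `page_unchanged` fails, the page has likely been edited since it was crawled, and offsets for that page should be treated as approximate, which is where a search such as `locate_mention` helps.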

URL http://1967mercurycougar.blogspot.com/2009_10_01_archive.html
MENTION Lincoln Continental Mark IV 40110 http://en.wikipedia.org/wiki/Lincoln_Continental_Mark_IV
MENTION 1975 MGB roadster 41481 http://en.wikipedia.org/wiki/MG_MGB
MENTION Buick Riviera 43316 http://en.wikipedia.org/wiki/Buick_Riviera
MENTION Oldsmobile Toronado 43397 http://en.wikipedia.org/wiki/Oldsmobile_Toronado
TOKEN seen 58190
TOKEN crush 63118
TOKEN owners 69290
TOKEN desk 59772
TOKEN relocate 70683
TOKEN promote 35016
TOKEN between 70846
TOKEN re 52821
TOKEN getting 68968
TOKEN felt 41508
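A minimal reader for records laid out like the sample, assuming whitespace-separated fields. Because anchor text can contain spaces, the offset and target URL are peeled off the right end of each MENTION line. The class and function names are illustrative, and the exact field separator in the release should be checked against its own documentation.

```python
# Parse one page record of URL / MENTION / TOKEN lines (illustrative sketch).
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Mention:
    anchor: str   # mention string (link anchor text)
    offset: int   # byte offset of the link on the page
    target: str   # English Wikipedia URL the link points to


@dataclass
class PageRecord:
    url: str = ""
    mentions: List[Mention] = field(default_factory=list)
    tokens: List[Tuple[str, int]] = field(default_factory=list)  # fingerprint words


def parse_record(lines: Iterable[str]) -> PageRecord:
    record = PageRecord()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        tag, rest = line.split(None, 1)
        if tag == "URL":
            record.url = rest
        elif tag == "MENTION":
            # Anchor text may contain spaces; offset and URL never do.
            anchor, offset, target = rest.rsplit(None, 2)
            record.mentions.append(Mention(anchor, int(offset), target))
        elif tag == "TOKEN":
            word, offset = rest.rsplit(None, 1)
            record.tokens.append((word, int(offset)))
        else:
            raise ValueError(f"unexpected tag {tag!r}")
    return record


SAMPLE = """\
URL http://1967mercurycougar.blogspot.com/2009_10_01_archive.html
MENTION Lincoln Continental Mark IV 40110 http://en.wikipedia.org/wiki/Lincoln_Continental_Mark_IV
MENTION 1975 MGB roadster 41481 http://en.wikipedia.org/wiki/MG_MGB
TOKEN seen 58190
TOKEN crush 63118
"""

if __name__ == "__main__":
    rec = parse_record(SAMPLE.splitlines())
    for m in rec.mentions:
        print(m.anchor, "->", m.target)
```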

This work was done with Sameer Singh and Andrew McCallum at UMass Amherst.
