Google Research Blog
The latest news from Research at Google
Research from VLDB 2016: Improved Friend Suggestion using Ego-Net Analysis
Thursday, September 15, 2016
Posted by Alessandro Epasto, Research Scientist, Google Research NY
On September 5 - 9, New Delhi, India hosted the
42nd International Conference on Very Large Data Bases
(VLDB), a premier annual forum for academic and industry research on databases, data management, data mining and data analytics. Over the past several years, Google has actively participated in VLDB, both as official sponsor and with numerous contributions to the research and industrial tracks. In this post, we would like to share the research presented in one of the Google papers from VLDB 2016.
In
Ego-net Community Mining Applied to Friend Suggestion
, co-authored by Googlers
Silvio Lattanzi
,
Vahab Mirrokni
, Ismail Oner Sebe,
Ahmed Taei,
Sunita Verma and
myself,
we explore how social networks can provide better friend suggestions to users, a challenging practical problem faced by all social network platforms
Friend suggestion – the task of suggesting to a user the contacts she might already know in the network but that she hasn’t added yet – is major driver of user engagement and social connection in all online social networks. Designing a high quality system that can provide relevant and useful friend recommendations is very challenging, and requires state-of-the-art machine learning algorithms based on a multitude of parameters.
An effective family of features for friend suggestion consist of
graph
features such as the
number of common friends
between two users. While widely used, the number of common friends has some major drawbacks, including the following which is shown in Figure 1.
Figure 1: Ego-net of Sally.
In this figure we represent the social connections of Sally and her friends – the
ego-net
of Sally. An ego-net of a node (in this case, Sally) is defined as the graph that contains the node itself, all of the node’s neighbors and the connection among
those
nodes. Sally has 6 friends in her ego-net:
A
lbert (her husband),
B
rian (her son),
C
harlotte (her mother) as well as
U
ma (her boss),
V
incent and
W
ally (two of her team members). Notice how
A
,
B
and
C
are all connected with each other while they do not know
U
,
V
or
W
. On the other hand
U,
V
and
W
have all added each other as their friend (except
U
and
W
who are good friend but somehow forgot to add each other).
Notice how each of
A
,
B
,
C
have a common friend with each of
U
,
V
and
W
: Sally herself. A friend recommendation system based on common neighbors might suggest to Sally’s son (for instance) to add Sally’s boss as his friend! In reality the situation is even more complicated because users’ online and offline friends span several different social circles or communities (family, work, school, sports, etc).
In our paper we introduce a novel technique for friend suggestions based on independently analyzing the ego-net structure. The main contribution of the paper is to show that it is possible to provide friend suggestions efficiently by constructing all ego-nets of the nodes in the graph and then independently applying community detection algorithms on them in large-scale
distributed systems
.
Specifically, the algorithm proceeds by constructing the ego-nets of all nodes and applying, independently on each of them, a community detection algorithm. More precisely the algorithm operates on so-called “ego-net-minus-ego” graphs, which is defined as the graph including only the neighbors of a given node, as shown in the figure below.
Figure 2: Clustering of the ego-net of Sally.
Notice how in this example the ego-net-minus-ego of Sally has two very clear communities: her family (
A
,
B
,
C
) and her co-workers (
U
,
V
,
W
) which are easily separated. Intuitively, this is because one might expect that while nodes (e.g. Sally) participate in many communities, there is usually a single (or a limited number of) contexts in which two specific neighbors interact. While Sally is both part of her family and work community, Sally and Uma interact
only
at work. Through extensive experimental evaluation on large-scale public social networks and formally through a simple mathematical model, our paper confirms this intuition. It seems that while communities are hard to separate in a global graph, they are easier to identify at the local level of ego-nets.
This allows for a novel graph-based method for friend suggestion which intuitively only allows suggestion of pairs of users that are clustered together in the same community from the point of view of their common friends. With this method,
U
and
W
will be suggested to add each other (as they are in the same community and they are not yet connected) while
B
and
U
will
not
be suggested as friends as they span two different communities.
From an algorithmic point of view, the paper introduces efficient parallel and distributed techniques for computing and clustering all ego-nets of very large graphs at the same time – a fundamental aspect enabling use of the system on the entire world Google+ graph. We have applied this feature in the “You May Know” system of Google+, resulting in a clear positive impact on the prediction task, improving the acceptance rate by more than 1.5% and decreasing the rejection rate by more than 3.3% (a significative impact at Google scales).
We believe that many future directions of work might stem from our preliminary results. For instance ego-net analysis could be potentially to automatically classify a user contacts in circles and to detect spam. Another interesting direction is the study of ego-network evolution in dynamic graphs.
VLDB 2015 and Database Research at Google
Monday, August 31, 2015
Posted by Corinna Cortes, Head of Google Research NY and Cong Yu, Research Scientist
This week, Kohala, Hawaii hosts the
41st International Conference of Very Large Databases
(VLDB 2015), a premier annual international forum for data management and database researchers, vendors, practitioners, application developers and users. As a leader in Database research, Google will have a strong presence at VLDB 2015 with many Googlers publishing work, organizing workshops and presenting demos.
The research Google is presenting at VLDB involves the work of Structured Data teams who are building intelligent and efficient systems to discover, annotate and explore structured data from the Web, surfacing them creatively through Google products (such as
structured snippets
and
table search
), as well as engineering efforts that create scalable, reliable, fast and general-purpose infrastructure for large-scale data processing (such as
F1
,
Mesa
, and Google Cloud's
BigQuery
).
If you are attending VLDB 2015, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people. You can also learn more about our research being presented at VLDB 2015 in the list below (Googlers highlighted in
blue
).
Google is a Gold Sponsor of VLDB 2015.
Papers:
Keys for Graphs
Wenfei Fan, Zhe Fan, Chao Tian,
Xin Luna Dong
In-Memory Performance for Big Data
Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge,
Alistair Veitch
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
Tyler Akidau
,
Robert Bradshaw
,
Craig Chambers
,
Slava Chernyak
,
Rafael Fernández-Moctezuma
,
Reuven Lax
,
Sam McVeety
,
Daniel Mills
,
Frances Perry
,
Eric Schmidt
,
Sam Whittle
Resource Bricolage for Parallel Database Systems
Jiexing Li
, Jeffrey Naughton, Rimma Nehme
AsterixDB: A Scalable, Open Source BDMS
Sattam Alsubaiee, Yasser Altowim, Hotham Altwaijry, Alex Behm, Vinayak Borkar, Yingyi Bu, Michael Carey, Inci Cetindil,
Madhusudan Cheelangi
, Khurram Faraaz, Eugenia Gabrielova, Raman Grover, Zachary Heilbron, Young-Seok Kim, Chen Li, Guangqiang Li, Ji Mahn Ok, Nicola Onose, Pouria Pirzadeh, Vassilis Tsotras, Rares Vernica, Jian Wen, Till Westmann
Knowledge-Based Trust: A Method to Estimate the Trustworthiness of Web Sources
Xin Luna Dong
,
Evgeniy Gabrilovich
,
Kevin Murphy
,
Van Dang
,
Wilko Horn
,
Camillo Lugaresi
,
Shaohua Sun
,
Wei Zhang
Efficient Evaluation of Object-Centric Exploration Queries for Visualization
You Wu,
Boulos Harb
, Jun Yang,
Cong Yu
Interpretable and Informative Explanations of Outcomes
Kareem El Gebaly, Parag Agrawal, Lukasz Golab,
Flip Korn
, Divesh Srivastava
Take me to your leader! Online Optimization of Distributed Storage Configurations
Artyom Sharov,
Alexander Shraer
,
Arif Merchant
,
Murray Stokely
TreeScope: Finding Structural Anomalies In Semi-Structured Data
Shanshan Ying,
Flip Korn
, Barna Saha, Divesh Srivastava
Workshops:
Workshop on Big-Graphs Online Querying - Big-O(Q) 2015
Workshop co-chair:
Cong Yu
3rd International Workshop on In-Memory Data Management and Analytics
Program committee includes:
Sandeep Tata
High-Availability at Massive Scale: Building Google's Data Infrastructure for Ads
Invited talk at BIRTE by:
Ashish Gupta
,
Jeff Shute
Demonstrations:
KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing
Xu Chu, John Morcos, Ihab Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang,
Yin Ye
Error Diagnosis and Data Profiling with Data X-Ray
Xiaolan Wang, Mary Feng, Yue Wang,
Xin Luna Dong
, Alexandra Meliou
Labels
accessibility
ACL
ACM
Acoustic Modeling
Adaptive Data Analysis
ads
adsense
adwords
Africa
AI
Algorithms
Android
Android Wear
API
App Engine
App Inventor
April Fools
Art
Audio
Augmented Reality
Australia
Automatic Speech Recognition
Awards
Cantonese
Chemistry
China
Chrome
Cloud Computing
Collaboration
Computational Imaging
Computational Photography
Computer Science
Computer Vision
conference
conferences
Conservation
correlate
Course Builder
crowd-sourcing
CVPR
Data Center
Data Discovery
data science
datasets
Deep Learning
DeepDream
DeepMind
distributed systems
Diversity
Earth Engine
economics
Education
Electronic Commerce and Algorithms
electronics
EMEA
EMNLP
Encryption
entities
Entity Salience
Environment
Europe
Exacycle
Expander
Faculty Institute
Faculty Summit
Flu Trends
Fusion Tables
gamification
Gboard
Gmail
Google Accelerated Science
Google Books
Google Brain
Google Cloud Platform
Google Docs
Google Drive
Google Genomics
Google Maps
Google Photos
Google Play Apps
Google Science Fair
Google Sheets
Google Translate
Google Trips
Google Voice Search
Google+
Government
grants
Graph
Graph Mining
Hardware
HCI
Health
High Dynamic Range Imaging
ICLR
ICML
ICSE
Image Annotation
Image Classification
Image Processing
Inbox
India
Information Retrieval
internationalization
Internet of Things
Interspeech
IPython
Journalism
jsm
jsm2011
K-12
KDD
Keyboard Input
Klingon
Korean
Labs
Linear Optimization
localization
Low-Light Photography
Machine Hearing
Machine Intelligence
Machine Learning
Machine Perception
Machine Translation
Magenta
MapReduce
market algorithms
Market Research
Mixed Reality
ML
MOOC
Moore's Law
Multimodal Learning
NAACL
Natural Language Processing
Natural Language Understanding
Network Management
Networks
Neural Networks
Nexus
Ngram
NIPS
NLP
On-device Learning
open source
operating systems
Optical Character Recognition
optimization
osdi
osdi10
patents
Peer Review
ph.d. fellowship
PhD Fellowship
PhotoScan
Physics
PiLab
Pixel
Policy
Professional Development
Proposals
Public Data Explorer
publication
Publications
Quantum AI
Quantum Computing
renewable energy
Research
Research Awards
resource optimization
Robotics
schema.org
Search
search ads
Security and Privacy
Semantic Models
Semi-supervised Learning
SIGCOMM
SIGMOD
Site Reliability Engineering
Social Networks
Software
Speech
Speech Recognition
statistics
Structured Data
Style Transfer
Supervised Learning
Systems
TensorBoard
TensorFlow
TPU
Translate
trends
TTS
TV
UI
University Relations
UNIX
User Experience
video
Video Analysis
Virtual Reality
Vision Research
Visiting Faculty
Visualization
VLDB
Voice Search
Wiki
wikipedia
WWW
YouTube
Archive
2018
May
Apr
Mar
Feb
Jan
2017
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2016
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Jul
May
Apr
Mar
Feb
2007
Oct
Sep
Aug
Jul
Jun
Feb
2006
Dec
Nov
Sep
Aug
Jul
Jun
Apr
Mar
Feb
Feed
Google
on
Follow @googleresearch
Give us feedback in our
Product Forums
.