July 11, 2017

Archives for November 2014

How do we decide how much to reveal? (Hint: Our privacy behavior might be socially constructed.)

[Let’s welcome Aylin Caliskan-Islam, a graduate student at Drexel. In this post she discusses new work that applies machine learning and natural-language processing to questions of privacy and social behavior. — Arvind Narayanan.]

How do we decide how much to share online given that information can spread to millions in large social networks? Is it always our own decision or are we influenced by our friends? Let’s isolate this problem to one variable, private information. How much private information are we sharing in our posts and are we the only authority controlling how much private information to divulge in our textual messages? Understanding how privacy behavior is formed could give us key insights for choosing our privacy settings, friends circles, and how much privacy to sacrifice in social networks. Christakis and Fowler’s network analytics study showed that obesity spreads through social ties. In another study, they explain that smoking cessation is a collective behavior. Our intuition before analyzing end users’ privacy behavior was that privacy behavior might also be under the effect of network phenomena.

In a recent paper that appeared at the 2014 Workshop on Privacy in the Electronic Society, we present a novel method for quantifying privacy behavior of users by using machine learning classifiers and natural-language processing techniques including topic categorization, named entity recognition, and semantic classification. Following the intuition that some textual data is more private than others, we had Amazon Mechanical Turk workers label tweets of hundreds of users as private or not based on nine privacy categories that were influenced by Wang et al.’s Facebook regrets categories and Sleeper et al.’s Twitter regrets categories. These labels were used to associate a privacy score with each user to reflect the amount of private information they reveal. We trained a machine learning classifier based on the calculated privacy scores to predict the privacy scores of 2,000 Twitter users whose data were collected through the Twitter API.
[Read more…]

Let’s Encrypt: Bringing HTTPS to Every Web Site

HTTPS, the cryptographic protocol used to secure web traffic as it travels across the Internet, has been in the news a lot recently. We’ve heard about security problems like Goto Fail, Heartbleed, and POODLE — vulnerabilities in the protocol itself or in specific implementations — that resulted in major security headaches. Yet the single biggest problem with HTTPS is that not enough sites use it. More than half of popular sites — and a much larger fraction of sites overall — still use old-fashioned HTTP, which provides no cryptographic protection whatsoever. As a result, these sites and their users are vulnerable to eavesdropping and manipulation by a range of threat vectors, from compromised WiFi access points to state-level mass surveillance. When deployed correctly, HTTPS defends against all these attacks.

Why don’t more sites use HTTPS? The major obstacle is that it’s too difficult for web sites to set up and maintain. Switching to HTTPS involves purchasing a digital certificate (a cryptographic statement that your domain name belongs to you) from a “certificate authority,” an identity-checking organization that users’ browsers are programmed to trust. This process involves a long series of manual steps, as well as fees that range from tens to hundreds of dollars a year. Site operators must also navigate a complicated process to generate crypto keys, validate the site’s identity, retrieve a certificate, and configure their server to use it. These steps, which have to be repeated every year or so when the certificate expires, are also prone to human error, with the result that a substantial fraction of all HTTPS sites have configuration problems that jeopardize their security.

For the past two years, I’ve been working with a talented group of people to do something about these problems. My student James Kasten and I joined forces with Peter Eckersley and Seth Schoen from EFF and Eric Rescorla, Josh Aas, and Richard Barnes from Mozilla. Our goal is to remove the barriers to deploying HTTPS and see an encrypted web completely replace unencrypted HTTP.

Today, we’re announcing Let’s Encrypt, a new certificate authority we’re creating that will begin operation in Summer 2015. What makes Let’s Encrypt different is that it takes the pain out of switching to HTTPS. Web site operators simply install a small piece of software that takes care of the entire process. This software interacts with Let’s Encrypt to validate the server’s identity, obtain a certificate, securely configure the server to use HTTPS, and automatically renew the certificate when necessary. With Let’s Encrypt, one click or one command is all it will take for a site to deploy HTTPS.
[Read more…]

PCLOB testimony on "Defining Privacy"

This morning I’m testifying at a hearing of the Privacy and Civil Liberties Oversight Board, on the topic of “Defining Privacy”. Here is the text of my oral testimony. (This is the text as prepared; there might be minor deviations when I deliver it.) [Update (Nov. 16): video stream of my panel is now available.]
[Read more…]