January 9, 2017

NYC to Collect GPS Data on Car Service Passengers—Good Intentions Gone Awry or Something Else?

During the holiday season, New York City through its Taxi & Limousine Commission (the “TLC”) proposed a new rule expanding data reporting obligations for car service platform companies including Uber and Lyft. If the rule is adopted, car services will now have to report the GPS coordinates of both passenger pick-up and drop-off locations to the city government. Under NY’s Freedom of Information Law, that data in bulk will also be subject to full public release.

This proposal is either a classic case of good intentions gone awry or a clandestine effort to track millions of car service riders while riding roughshod over passenger privacy.

The stated justification for the new rule is to combat “driver fatigue” and improve car service safety. While the goal is laudable and important, the proposed data collection does not match the purpose and makes no sense. Does anyone really think GPS data measures a driver’s hours on the job or is relevant for the calculation of a trip’s duration? If the data collection were really designed to address driver fatigue, then the relevant data would be shift length (driver start/stop times, ride durations, possibly trip origination), not pick up/drop off locations.

Reporting this GPS data to the city government, though, poses a real and serious threat to passenger privacy. The ride patterns can be mined to identify specific individuals and where they travel. In 2014, for instance, The Guardian reported that the TLC released anonymized taxi ride data that was readily reverse engineered to identify drivers. A 2015 paper shows that mobility patterns can also be used to identify gender and ethnicity. Numerous examples, from the Netflix release of subscriber film ratings that were reverse engineered to identify subscribers to the re-identification of patients from supposedly anonymous health records, show that bulk data can often be traced back to specific individuals. Disturbingly, the TLC proposal makes only one innocuous reference to protecting “privacy and confidentiality” and yet includes neither any privacy safeguards against identification of individual passengers from ride patterns nor any exemption from the NY State Freedom of Information Law.

If this weren’t worrisome enough for privacy, here’s the flashing red light. The TLC proposal mentions in passing that the data might be useful for “other enforcement actions.” But the examples given for “other enforcement actions” do not map to the data being collected. For instance, the proposal says the GPS data “will facilitate investigating passenger complaints or complaints from a pedestrian or other motorist about unsafe driving, including for incidents alleged to have occurred during or between trips, by allowing TLC to determine the location of a vehicle at a particular time.” Pick-up and drop-off locations alone will not serve this goal, because they say nothing about where a vehicle was during or between trips. Likewise, the proposal says that “[b]y understanding when for-hire trips to and from the airports occur TLC can better target resources to ensure that passengers are picked up at the airport only by drivers authorized to do so.” This too is a strange justification for collecting individual passenger records for every ride throughout the city! This goal would be satisfied much more effectively by seeking aggregate drop-off data for the particular areas of concern to the TLC.
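To make the aggregate alternative concrete, here is a minimal sketch in Python. It is not drawn from the TLC proposal: the zone labels, timestamps, and the `hourly_counts` helper are all made up for illustration. The point is only that counts per area and hour can be reported without any coordinates or per-passenger records.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (zone label, drop-off time). No GPS coordinates and
# no rider or driver identifiers are needed for this kind of report.
trips = [
    ("JFK", "2017-01-02T08:15"),
    ("JFK", "2017-01-02T08:40"),
    ("LGA", "2017-01-02T08:05"),
    ("JFK", "2017-01-02T17:30"),
]

def hourly_counts(records):
    """Count drop-offs per (zone, hour of day)."""
    counts = Counter()
    for zone, stamp in records:
        hour = datetime.fromisoformat(stamp).hour
        counts[(zone, hour)] += 1
    return counts

counts = hourly_counts(trips)
for (zone, hour), total in sorted(counts.items()):
    print(f"{zone} {hour:02d}:00 -> {total} drop-offs")
```

A report like this would tell the TLC when airport trips cluster, which is the stated goal, while leaving individual rides out of the public record.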

This vague enforcement language and the mismatch between the proposal and the articulated goals strongly suggest that the rule may be a smokescreen for a new mass surveillance program of individuals traveling within New York City. Only two years ago, the NY Police Department was caught deploying a controversial program to track cars throughout the city using EZ Pass readers on traffic lights. This proposed new rule looks like a surreptitious expansion of that program to car service passengers. The TLC rule, if adopted, would provide a surveillance data trove that makes an end run around judicial oversight, subpoenas, and warrants.

It’s time to put the brakes on the city’s collection of trip location data for car service rides.

Disrupting The Business Model of the Fake News Industry

By Katherine Haenschen & Paul Ellenbogen 

In the aftermath of the 2016 election, researchers and media professionals alike seized on the vast proliferation of so-called “Fake News” on Facebook as a cause for concern. An informed citizenry is a necessary condition for democracy, so it is far from ideal to have millions of people consuming intentionally misleading information masquerading as hard news. Now that Facebook has admitted that it has a problem with Fake News, Mark Zuckerberg and Co. need to do even more to prevent its spread on the platform. We propose one solution: Facebook should block advertising links to Fake News websites and Fake News pages on the Facebook platform itself.

When we talk about Fake News, we’re referring to websites that intentionally and knowingly publish factually untrue content intended to masquerade as traditional “hard news.” Individuals may choose to publish Fake News for political reasons, such as seeking to impact voting decisions. However, there is also a profit motive behind Fake News: publishers can make big money from advertising revenue that results from traffic to their sites. The Fake News business model utilizes Facebook’s paid features to gain readers and build audiences. Facebook offers a variety of advertising options for individuals looking to reach its 156 million users in the United States, including newsfeed and sidebar ads and promoted posts from pages. It is Facebook’s advertising features that should be rendered unavailable for the paid promotion of Fake News to users.

There is precedent for calling on Facebook to block Fake News from being advertised directly to its users: Facebook already bans certain kinds of ads on the platform such as those promoting dietary supplements and “controversial content.” Additionally, Facebook announced that it will stop placing ads on third-party Fake News websites. Now, we are calling on Facebook to ban Fake News from being advertised and promoted on the Facebook platform itself. Facebook should apply this type of ban even if it hurts Facebook’s own revenue.

There’s Big Money In Fake News

Media stories about Fake News producers emphasize the tremendous profits to be made in publishing knowingly false information and helping it to “go viral” on Facebook. Display ad revenue generated by Fake News sites can reach $10,000 to $30,000 per month. One Macedonian teen who publishes Fake News sites told Buzzfeed that advertising revenue could reach thousands of dollars per day or week. And though Google and Facebook have blocked the sites from their third-party advertising platforms, the Fake News publishers also note that there is no shortage of advertising networks willing to display ads on their websites. The reach of these articles amounts to millions of Facebook shares and clicks to the website, in turn generating millions of ad impressions, according to Buzzfeed.

The same articles about Fake News sites indicate that publishers are not bothered by the potential impact of sharing incorrect information. A Macedonian teen who operated multiple sites admitted that his content was “bad, false, and misleading,” and that he was motivated by the advertising revenue generated by his Fake News site. Therefore, if Facebook wants to curtail the proliferation of Fake News on its website, it should disrupt the Fake News business model using tools that are already at the platform’s disposal.

The Fake News Business Model

Multiple news articles have referenced Fake News producers’ use of Facebook advertising features to promulgate their posts. Here’s how the business model for Fake News works, with each step in the process illustrated by the diagram below:

  1. An individual publishes false information on a Fake News website, then pays to advertise a link to the post in Facebook users’ newsfeeds.
  2. Facebook profits from advertising on its platform, earning money for every person who clicks the link or every 1,000 users who see the ads.
  3. Facebook users click on the advertised links and go to the Fake News website, generating an impression for each display ad on the website.
  4. The Fake News site earns revenue from the resulting advertising impressions, which amount to millions of page views and tens of thousands of dollars per month.

Publishers must have a Facebook page to run newsfeed ads. As such, an ancillary cycle exists in which Fake News publishers can promote this page to gain fans and organic traffic.

  1. Fake News producers advertise their Page to fans, growing an organic Facebook audience to whom they can share links at no cost.
  2. Fans can share these links to their own Facebook networks, furthering the organic reach of Fake News. This is how something “goes viral.”

As long as the cost of the Facebook ads that promote the posts is lower than the display ad revenue from the resulting clicks, the business model above will generate net income for the Fake News producer.
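The break-even condition above can be written out as simple arithmetic. The sketch below is only an illustration: the function name, its parameters, and every number in the example are made up, not figures reported by any publisher or by Facebook.

```python
def net_income(ad_spend, clicks, pages_per_visit, ads_per_page, revenue_per_thousand):
    """Display ad revenue from the resulting traffic minus Facebook ad spend.

    All inputs are hypothetical; revenue_per_thousand is the payout per
    1,000 display ad impressions on the publisher's own site.
    """
    impressions = clicks * pages_per_visit * ads_per_page
    revenue = impressions / 1000 * revenue_per_thousand
    return revenue - ad_spend

# Made-up example: $500 of Facebook ads yields 100,000 clicks; each visitor
# views 1.5 pages carrying 4 display ads each, paying $2 per 1,000 impressions.
example = net_income(
    ad_spend=500,
    clicks=100_000,
    pages_per_visit=1.5,
    ads_per_page=4,
    revenue_per_thousand=2.0,
)
print(example)
```

With these invented inputs the publisher clears a profit; raise the ad spend or remove the ability to buy the clicks at all, and the same arithmetic turns negative or never starts.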

Cut Off Paid Features for Fake News

Our solution is simple: Facebook needs to deny the use of paid features by pages that promote Fake News. This means Fake News pages should not be able to run newsfeed and sidebar ads, promote page posts, or market their Facebook page to gain fans. Furthermore, Facebook should block any third-party attempts to advertise links to Fake News sites. Currently, any individual with a Facebook account can create a public page and use it to run ads for Fake News stories in Facebook users’ newsfeeds. Banning all advertising links to Fake News sites would prevent publishers from setting up new and deceptive Facebook pages for the purpose of advertising.

Facebook has already taken action to limit its role in directly funding Fake News. The platform cut off advertising on — but not leading to — Fake News websites, as has Google’s AdSense network. However, the Facebook advertising platform can still be used to drive traffic to these sites and fuel the cycle detailed above. Even if it does ban the use of advertising outbound links to Fake News sites, Facebook will still need to grapple with the size of Fake News pages, some of which surpass 700,000 fans and have a tremendous potential for organic reach. Our purpose is not to weigh in on that argument, but simply to point out a simple step Facebook can take that is consistent with its external ad placement on third-party sites.

Facebook Already Bans Certain Advertisers

Furthermore, a ban on the use of Facebook’s advertising features by Fake News sites would be in keeping with existing rules pertaining to the kinds of advertisers that can use the platform to reach users. For example, Facebook restricts the advertising of unsafe supplements at its “sole discretion,” such as various diet aids and performance-enhancing substances. Other prohibited content includes “controversial content,” which is defined as “…content that exploits controversial political or social issues for commercial purposes.”

Given that Fake News producers are open about their profit motivations, their use of Facebook advertisements to drive traffic should be considered a commercial purpose rather than a political purpose. As such, Facebook should use its existing rules to draw a line between political content and commercial content. If it fails to do so, unscrupulous individuals could start dressing up their questionable advertisements as political speech — Donald Trump Diet Pills, anyone? They’ll make your waistline great again!

Distinguishing between Fake News and news from reputable outlets is something that Facebook is already committed to doing now that it has pledged to pull advertisements from Fake News sites in the Facebook Audience Network program. Facebook could use the same criteria that it uses in the Audience Network on its advertising platform. While we are not proposing a heuristic to determine what is Fake News and what is merely an opinion piece devoid of factual content, we suggest that Facebook apply the same rules for banned third-party sites to advertisements on the platform for those very same sites.

Fight Fake News, Or Else Everyone Gets Played

It isn’t clear how Facebook’s long-term interests are served by enabling Fake News to market to its users, essentially creating a back door around its own advertising policies. Facebook makes money from advertisements for Fake News, but in the long term it may come to hurt Facebook, with suspicion and lost goodwill outweighing earnings from this category of advertisements. If Facebook chooses to regulate Fake News as political speech, Zuckerberg et al. are setting themselves up to be useful idiots for websites trying to make a quick buck off sensationalist and false stories.

As for the users, they are being intentionally misled with incorrect articles about political actors, which have the potential to impact issue awareness and candidate choice. At worst, people are basing their vote on misinformation-for-profit. At best, users may be getting quick entertainment out of these links (if they recognize them as false), but for the most part it seems like the Fake News operators are getting the benefit of the arrangement. Removing paid advertisements for these sites from users’ Facebook newsfeeds is not going to negatively impact their lives. Furthermore, these individuals remain free to like the Facebook pages for Fake News sites and share their posts organically with friends.

We are merely proposing that Facebook cut off the use of its paid features to promote links to Fake News to wider audiences, in accordance with its existing advertising policies. Advertisements for Fake News should be regulated like ads for “controversial content” and dietary supplements. This would cut off one stream of revenue for these Fake News websites, forcing them to gain traffic from Facebook entirely through organic reach. Failure to ban this type of advertising would suggest that Facebook values its own revenue over the need to curtail bad actors who are using its platform to intentionally spread misinformation harmful to our democratic society.

Announcing the Open Review Toolkit

I’m happy to announce the release of the Open Review Toolkit, open source software that enables you to convert your book manuscript into a website that can be used for Open Review. During the Open Review process everyone can read and annotate your manuscript, and you can collect valuable data to help launch your book. The goals of the Open Review process are better books, higher sales, and increased access to knowledge. In an earlier post, I described some of the helpful feedback that I’ve received during the Open Review of my book Bit by Bit: Social Research in the Digital Age. Now, in this post I’ll describe more about the Open Review Toolkit, which has been generously supported by a grant from the Alfred P. Sloan Foundation, and how you can use it for your book.

As described on the project’s website, the Open Review Toolkit is a set of open source scripts that you can download and use to convert your manuscript to an Open Review website. One way to think about it is that the Open Review Toolkit is the plumbing that ties together four outstanding projects: Hypothes.is, Pandoc, Google Analytics, and Google Forms. Full technical details and all the code are available from the Open Review Toolkit GitHub repository, but here’s an overview.

The build process that converts a manuscript into an Open Review website is codified in a single Makefile and has three primary steps:

  1. Pandoc converts the book manuscript into a single HTML file.
  2. A set of custom scripts enriches the single HTML file (e.g., with richer information about each citation) and then splits it into separate HTML files, one for each section of the book.
  3. Middleman uses those HTML files and some custom templates to create the Open Review website, which is a static HTML website.

Step 1

Pandoc converts the book manuscript into a single HTML file. Currently, the only supported input format for this first step is Markdown. In other words, at this time, your manuscript must be written in Markdown. However, Pandoc supports a variety of formats as inputs, and in the future we hope to add support for additional input formats, such as LaTeX and Word. If you’d like to help build support for additional input formats, please get in touch.
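For readers who want a feel for this step, here is a minimal Python sketch of a Markdown-to-HTML conversion with Pandoc. It is an illustration only: the toolkit drives this step from its Makefile, and the exact options it passes are not shown in this post. The file names and both helper functions are hypothetical; the Pandoc flags used (`--from`, `--to`, `--standalone`, `--output`) are standard ones.

```python
import shutil
import subprocess

def pandoc_command(source, target):
    """Build a Pandoc command line that turns one Markdown file into one HTML file."""
    return [
        "pandoc",
        "--from", "markdown",
        "--to", "html",
        "--standalone",      # emit a complete HTML document, not a fragment
        "--output", target,
        source,
    ]

def convert(source, target):
    """Run the conversion; requires Pandoc to be installed and on the PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed")
    subprocess.run(pandoc_command(source, target), check=True)

# Hypothetical file names, shown without running the conversion.
command = pandoc_command("manuscript.md", "book.html")
print(" ".join(command))
```

Supporting another input format would, in this sketch, mostly mean changing the value passed to `--from`, although the later steps may also depend on the HTML that Pandoc produces.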

Step 2

The custom scripts enrich and split the HTML output from Pandoc. First, an enrichment script adds information to each citation. In the future, additional enrichments could also be added at this step. Next, the splitting script splits the single HTML file into one file for each section of the book. These sections are then placed in a directory structure that reflects the hierarchy of the sections in the manuscript. This splitting script also creates a JSON file that includes metadata about the manuscript structure. This JSON metadata file allows the Middleman build process to create things such as the table of contents and previous / next page links between sections.
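The idea behind the splitting step can be sketched in a few lines of Python. This is not the toolkit's actual script: it is a simplified illustration that splits only on top-level `<h1>` headings, writes nothing to disk, and uses a flat list rather than a directory hierarchy. The function name, file names, and JSON keys are all assumptions made for the example.

```python
import json
import re

def split_sections(html):
    """Split one HTML string at each <h1>; return (files, metadata_json)."""
    # The zero-width lookahead keeps each <h1> with the content that follows it.
    # Anything before the first <h1> is dropped in this simplified sketch.
    pieces = re.split(r"(?=<h1[\s>])", html)
    chunks = [piece for piece in pieces if piece.startswith("<h1")]

    files, sections = {}, []
    for index, chunk in enumerate(chunks):
        heading = re.search(r"<h1[^>]*>(.*?)</h1>", chunk, re.S)
        # Strip any inline tags from the heading text to get a plain title.
        title = re.sub(r"<[^>]+>", "", heading.group(1)).strip() if heading else "Untitled"
        name = f"section-{index + 1}.html"
        files[name] = chunk
        sections.append({"title": title, "file": name})

    # Previous / next links are what a site builder needs for page navigation.
    for index, section in enumerate(sections):
        section["previous"] = sections[index - 1]["file"] if index > 0 else None
        section["next"] = sections[index + 1]["file"] if index + 1 < len(sections) else None

    return files, json.dumps({"sections": sections}, indent=2)

book = "<h1>Intro</h1><p>a</p><h1 id='m'>Methods</h1><p>b</p>"
files, metadata = split_sections(book)
print(metadata)
```

A regular expression is enough for a toy input like this one. For a real manuscript an HTML parser would be the safer choice, since headings can carry nested markup and attributes.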

Step 3

Middleman builds the Open Review website, which is a static HTML website. The Middleman project lives inside the website/ directory. This project is pre-populated with existing layouts that include Google Analytics, Hypothes.is, and navigational elements for the site. This is also where pages that are part of the Open Review website but are not part of the manuscript reside (e.g., an About page). The HTML files from step 2 are used as the primary content for each book page on the site. These HTML files should not be manually modified as they will be overwritten the next time the site is built.

This entire build process takes place inside of a virtual machine we created that comes pre-installed with all the open-source software that you will need. By using this virtual machine, we hope to ensure that the Open Review Toolkit will work right the first time no matter what operating system you are using.

Once those three steps are complete, you have a set of static HTML files that you can host anywhere you want (for my book, we are using GitHub Pages). On the Open Review Toolkit website, I also describe additional features of the Open Review websites.

We’ve tried to make it as easy as possible to convert your manuscript into a modern and functional Open Review website. All of our code is open source, but if you’d like to hire a developer to help you do the conversion, the Open Review Toolkit has a recommended list of Preferred Partners.

The Open Review Toolkit, which was inspired by earlier innovations in academic publishing, would not have been possible without the help of many people. I would like to thank the folks at the Agathon Group, particularly Luke Baker (coding) and Paul Yuen (design), who built the Open Review website for my book Bit by Bit: Social Research in the Digital Age. The Open Review Toolkit grew out of that initial code and design. I would also like to thank Meagan Levinson and Princeton University Press for their support during the first Open Review process. Further, I would like to thank the Alfred P. Sloan Foundation for their support of the Open Review Toolkit. Finally, the Open Review Toolkit builds on some amazing open source software. I’d like to thank everyone who contributed to the projects we used in the Open Review Toolkit: Pandoc, LaTeX, hypothes.is, Vagrant, Ansible, Middleman, Bootstrap, Nokogiri, GNU Make, and Bundler.

You can read more about the Open Review Toolkit at our webpage and download our code from GitHub.