September 15, 2017

Blockchains and voting

I’ve been asked about a number of ideas lately involving voting systems and blockchains. This blog piece talks about all the security properties that a voting system needs to have, where blockchains help, and where they don’t.

Let’s start off a decade ago, when Daniel Sandler and I first wrote a paper saying blockchains would be useful for voting systems. We observed that voting machines running on modern computers have overwhelming amounts of CPU and storage, so let’s use it in a serious way. Let’s place a copy of every vote on every machine and let’s use timeline entanglement (Maniatis and Baker 2002), so every machine’s history is protected by hashes stored on other machines. We even built a prototype voting system called VoteBox that used all of this, and many of the same ideas now appear in a design called STAR-Vote, which we hope could someday be used by real voters in real elections.

What is a blockchain good for? Fundamentally, it’s about having a tamper-evident history of events. In the context of a voting system, this means that a blockchain is a great place to store ballots to protect their integrity. STAR-Vote and many other “end-to-end” voting systems have a concept of a “public bulletin board” where encrypted votes go, and a blockchain is the obvious way to implement the public bulletin board. Every STAR-Vote voter leaves the polling place with a “receipt” which is really just the hash of their encrypted ballot, which in turn has the hash of the previous ballot. In other words, STAR-Vote voters all leave the polling place with a pointer into the blockchain which can be independently verified.

So great, blockchain for the win, right? Not so fast. Turns out, voting systems need many additional security properties before they can be meaningfully secure. Here’s a simplified list with some typical vocabulary used for these security properties.

  • Cast as intended. A voter is looking at a computer of some sort and indicates “Alice for President!”, and our computer handily indicates this with a checkbox or some highlighting, but evil malware inside the computer can silently record the vote as “Bob for President!” instead. Any voting system needs a mechanism to defeat malware that might try to compromise the integrity of the vote. One common approach is to have printed paper ballots (and/or hand-marked paper ballots) which can be statistically compared to the electronic ballots. Another approach is to have a process whereby the machine can be “challenged” to prove that it correctly encrypted the ballot (Benaloh 2006, Benaloh 2007).
  • Vote privacy. It’s important that there is no way to identify a particular voter with how they voted. To understand the importance of vote privacy, consider a hypothetical alternate where all votes were published, in the newspaper, with the voter’s name next to each vote. At that point, you could trivially bribe or coerce people to vote in a particular way. The modern secret ballot, also called the Australian ballot, ensures that votes are secret, with various measures taken to make it hard or impossible for voters to violate this secrecy. When you wish to maintain a privacy property in the face of voting computers, that means you have to prevent the computer from retaining state (i.e., keeping a private list of the plaintext votes in the order cast) and you have to ensure that the ciphertext votes, published to the blockchain, aren’t quietly leaking information about their plaintext through various subliminal channels.
  • Counted as cast. If we have voters taking home a receipt of some sort that identifies their ciphertext vote in the blockchain, then they also want to have some sort of cryptographic proof that the final vote tally includes their specific vote. This turns out to be a straightforward application of homomorphic cryptographic primitives and/or mixnets.

If you look at these three properties, you’ll notice that the blockchain doesn’t do much to help with the first two, although they are very useful for the third.

Achieving a “cast as intended” property requires a variety of mechanisms ranging from paper ballots and spot challenges of machines. The blockchain protects the integrity of the recorded vote, but has nothing to say about its fidelity to the intent of the voter.

Achieving a “vote privacy” property requires locking down the software on the voting platform, and for that matter locking down the entire computer. And how can that lock-down property be verified? We need strong attestations that can be independently verified. We also need to ensure that the user cannot be spoofed into running a fake voting application. We can almost imagine how we can achieve this in the context of electronic voting machines which are used exclusively for voting purposes. We can centrally deploy a cryptographic key infrastructure and place physical controls over the motion of the machines. But for mobile phones and personal computers? We simply don’t have the infrastructure in place today, and we probably won’t have it for years to come.

To make matters worse, a commonly expressed desire is to vote from home. It’s convenient! It increases turnout! (Maybe.) Well, it also makes it exceptionally easy for your spouse or your boss or your neighbor to watch over your shoulder and “help” you vote the way they want you to vote.

Blockchains do turn out to be incredibly helpful for verifying a “counted as cast” property, because they force everybody to agree on the exact set of ballots being tabulated. If an election official needs to disqualify a ballot for whatever reason, that fact needs to be public and everybody needs to know that a specific ballot, right there in the blockchain, needs to be discounted, otherwise the cryptographic math won’t add up.

Wrapping up, it’s easy to see how blockchains are an exceptionally useful primitive that can help build voting systems, with particular value in verifying that the final tally is consistent with the cast ballot records. However, a good voting system needs to satisfy many additional properties which a blockchain cannot provide. While there’s an intellectual seduction to pretend that casting votes is no different than moving coins around on a blockchain, the reality of the problem is a good bit more complicated.

BlockSci: a platform for blockchain science and exploration

The Bitcoin blockchain — currently 140GB and growing — contains a massive amount of data that can give us insights into the Bitcoin ecosystem, including how users, businesses, and miners operate. Today we’re announcing BlockSci, an open-source software tool that enables fast and expressive analysis of Bitcoin’s and many other blockchains, and an accompanying working paper that explains its design and applications. Our Jupyter notebook demonstrates some of BlockSci’s capabilities.

Current tools for blockchain analysis depend on general-purpose databases that have full support for transactions. But that’s unnecessary for blockchain analysis where the data structures are append-only. We take advantage of this observation in the design of our custom in-memory blockchain database as well as an analysis library.

BlockSci’s core infrastructure is written in C++ and optimized for speed. (For example, traversing every transaction input and output on the Bitcoin blockchain takes only 10.3 seconds on our r4.2xlarge EC2 machine.) To make analysis more convenient, we provide Python bindings and a Jupyter notebook interface. This interface is slower, but is ideal for exploratory analyses and allows users to quickly iterate when developing new queries.

The code below shows the convenience of traversing the blockchain using straightforward Python idioms, built-in currency conversion using historical exchange-rate data, and the use of pandas DataFrames for analysis and visualization..

fees = [sum(block.fees) for block in chain.range('2017')]
times = [block.time for block in chain.range('2017')]
converter = blocksci.CurrencyConverter()
df = pandas.DataFrame({"Fee":fees}, index=times)
df = converter.satoshi_to_currency_df(df, chain)

When plotted, it results in the following graph showing the average transaction fee per block:

BlockSci uses a custom data format; it comes with a parser that generates this data from the serialized blockchain format recorded by cryptocurrency nodes such as bitcoind. The parser supports incremental updates when new blocks are received, and making it easy to stay up to date with the latest version of the blockchain. We’ve used BlockSci to analyze Bitcoin, Bitcoin Cash, Litecoin, Namecoin, Dash, and ZCash; many other cryptocurrencies make no changes to the blockchain format, and so should be supported with no changes to BlockSci.

In our working paper, we present four analyses that show BlockSci’s usefulness for answering research questions. We show how multisignatures unfortunately weaken privacy and confidentiality; we apply the cluster intersection attack to Dash, a privacy-focused altcoin; we analyze inefficiencies in the usage of block space; and we present improved methods for estimating of how often coins change possession as opposed to just being shuffled around.

Here’s an illustrative example. Exploratory graph analysis using BlockSci allowed us to discover a behavioral pattern in the usage of multisignatures that weakens security. Multisignatures are a security-enhancing mechanism that distribute control of an address over a number of different public keys. Surprisingly, we found that users often negate this security by moving their funds from a multisig address to a regular address and then back again after a period of a few hours to days. We think this happens when users are changing the access control policy on their wallet, although it is unclear why they transfer their funds to a regular address in the interim, and not directly to the new multisig address. This pattern of behavior has led over $12 million dollars to be left insecure over the course of  over 22,000 transactions. What users may not appreciate is that the temporary weakening of security is advertised to potential attackers on the blockchain.

There’s far more to explore on public blockchains. BlockSci is publicly available now, and we hope you’ll find it useful. It is easy to get started using the EC2 image we’ve released, which includes the Bitcoin blockchain data in addition to the tool. BlockSci is open-source, and we welcome contributions. This is an alpha release; we’re continuing to improve it and the interface may change a bit in future releases. We look forward to working with the community and to hearing about other creative uses of the data and the tool.

Getting serious about research ethics: Security and Internet Measurement

[This blog post is a continuation of our series about research ethics in computer science that we started last week]

Research projects in the information security and Internet measurement sub-disciplines typically interact with third-party systems or devices to collect a large amounts of data. Scholars engaging in these fields are interested to collect data about technical phenomenon. As a result of the widespread use of the Internet, their experiments can interfere with human use of devices and reveal all sorts of private information, such as their browsing behaviour. As awareness of the unintended impact on Internet users grew, these communities have spent considerable time debating their ethical standards at conferences, dedicated workshops, and in journal publications. Their efforts have culminated in guidelines for topics such as vulnerability disclosure or privacy, whereby the aim is to protect unsuspecting Internet users and human implicated in technical research.

 

Prof. Nick Feamster, Prof. Prateek Mittal, moderator Prof. Elana Zeide, and I discussed some important considerations for research ethics in a panel dedicated to these sub-disciplines at the recent CITP conference on research ethics in computer science communities. We started by explaining that gathering empirical data is crucial to infer the state of values such as privacy and trust in communication systems. However, as methodological choices in computer science will often have ethical impacts, researchers need to be empowered to reflect on their experimental setup meaningfully.

 

Prof. Feamster discussed several cases where he had sought advice from ethical oversight bodies, but was left with unsatisfying guidance. For example, when his team conducted Internet censorship measurements (pdf), they were aware that they were initiating requests and creating data flows from devices owned by unsuspecting Internet users. These new information flows were created in realms where adversaries were also operating, for example in the form of a government censors. This may pose a risk to the owners of devices that were implicated in the experimentation and data collection. The ethics board, however, concluded that such measurements did not meet the strict definition of “human subjects research”, which thereby excluded the need for formal review. Prof. Feamster suggests computer scientists reassess how they think about their technologies or newly initiated data flows that can be misused by adversaries, and take that into account in ethical review procedures.

 

Ethical tensions and dilemmas in technical Internet research could be seen as interesting research problems for scholars, argued Prof. Mittal. For example, to reason about privacy and trust in the anonymous Tor network, researchers need to understand to what extent adversaries can exploit vulnerabilities and thus observe Internet traffic of individual users. The obvious, relatively easy, and ethically dubious measurement would be to attack existing Tor nodes and attempt to collect real-time traffic of identifiable users. However, Prof. Mittal gave an insight into his own critical engagement with alternative design choices, which led his team to create a new node within Princeton’s university network that they subsequently attacked. This more lab-based approach eliminates risks for unsuspecting Internet users, but allowed for the same inferences to be done.

 

I concluded the panel, suggesting that ethics review boards at universities, academic conferences, and scholarly journals engage actively with computer scientists to collect valuable data whilst respecting human values. Currently, a panel on non-experts in either computer science or research ethics are given a single moment to judge the full methodology of a research proposal or the resulting paper. When a thumbs-down is issued, researchers have no or limited opportunity to remedy their ethical shortcomings. I argued that a better approach would be an iterative process with in-person meetings and more in-depth consideration of design alternatives, as demonstrated in a recent paper about Advertising as a Platform for Internet measurements (pdf). This is the approach advocates in the Networked Systems Ethics Guidelines. Cross-disciplinary conversation, rather than one-time decisions, allow for a mutual understanding between the gatekeepers of ethical standards and designers of useful computer science research.

 

See the video of the panel here.