What’s a blockchain, anyway?

by Peter Van Valkenburgh April 25, 2017

“Blockchain” has become a buzzword in the technology and financial industries. It is often cited as a panacea for all manner of business and governance problems. “Blockchain’s” popularity may be an encouraging sign for innovation, but it has also resulted in the word coming to mean too many things to too many people and, ultimately, almost nothing at all.

The word “blockchain” is like the word “vehicle” in that they both describe a broad class of technology. But unlike with the word “blockchain,” no one ever asks you, “Hey, how do you feel about vehicle?” or excitedly exclaims, “I’ve got it! We can solve this problem with vehicle.” And while you and I might talk about “vehicle technology,” even that would be a strangely abstract conversation. We should probably talk about cars, trains, boats, or rocketships, depending on what it is about vehicles that we are interested in. And “blockchain” is the same. There is no “The Blockchain” any more than there is “The Vehicle,” and the category “blockchain technology” is almost hopelessly broad.

There’s one thing that we definitely know is blockchain technology, and that’s Bitcoin. We know this for sure because the word was originally invented to name and describe the distributed ledger of bitcoin transactions that is created by the Bitcoin network. But since the invention of Bitcoin in 2008, there have been several individuals, companies, consortia, and nonprofits who have created new networks or software tools that borrow something from Bitcoin—maybe directly borrowing code from Bitcoin’s reference client or maybe just building on technological or game-theoretical ideas that Bitcoin’s emergence uncovered. You’ve probably heard about some of these technologies and companies or seen their logos.

Aside from being in some way inspired by Bitcoin, what do all of these technologies have in common? Is there anything we can say is always true about a blockchain technology? Yes.

All blockchains have…

All blockchain technologies should have three constituent parts: peer-to-peer networking, consensus mechanisms, and (yes) blockchains, A.K.A. hash-linked data structures. You might be wondering why we call them blockchain technologies if the blockchain is just one of three essential parts. It probably just comes down to good branding. Ever since Napster and BitTorrent, the general public has unfortunately come to associate peer-to-peer networks with piracy and copyright infringement. “Consensus mechanism” sounds very academic and is a little too much of a mouthful to be a good brand. But “blockchain,” well, that sounds interesting and new. It almost rolls off the tongue, at least compared to, say, “cryptography,” which sounds like it happens in the basement of a church.

But understanding each of those three constituent parts makes blockchain technology much easier to grasp, because we can write a simple one-sentence explanation of how the three parts achieve a useful result:

Connected computers reach agreement over shared data.

That’s what a blockchain technology should do; it should allow connected computers to reach agreement over shared data. And each part of that sentence corresponds to our three constituent technologies.

Connected Computers. The computers are connected in a peer-to-peer network. If your computer is a part of a blockchain network, it is talking directly to other computers on that network, not through a central server owned by a corporation or other central party.

Reach Agreement. Agreement between all of the connected computers is facilitated by using a consensus mechanism. That means that there are rules written in software that the connected computers run, and those rules help ensure that all the computers on the network stay in sync and agree with each other.

Shared Data. And the thing they all agree on is this shared data called a blockchain. “Blockchain” just means the data is in a specific format (just as you can imagine data in the form of a Word document or an image file). The blockchain format simply makes it easy for machines to verify the consistency of a long and growing log of data. Later data entries must always reference earlier entries, creating a linked chain of data. Any attempt to alter an early entry will necessitate altering every subsequent entry; otherwise the cryptographic hashes embedded in the data will reveal a mismatch. Specifically how that all works is beyond the scope of this backgrounder, but it mostly has to do with the science of cryptography: hash functions and digital signatures. Some people might tell you that this makes blockchains “immutable,” but that’s not really accurate. The blockchain data structure will make alterations evident, but if the people running the connected computers choose to accept or ignore the alterations, then they will remain.
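The hash linking behind this shared data can be sketched in a few lines of Python. This is a minimal illustration, not Bitcoin’s actual block format: hashing a JSON encoding with SHA-256 and the helper names `entry_hash`, `build_chain`, and `find_first_mismatch` are all assumptions made for the example. Each entry stores the hash of the entry before it, so quietly editing an early entry is detected, and patching that entry’s hash only moves the mismatch to the next link.

```python
import hashlib
import json
from typing import Optional

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the very first entry


def entry_hash(data: str, prev_hash: str) -> str:
    """SHA-256 of an entry's data together with the previous entry's hash."""
    payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries: list) -> list:
    """Build a hash-linked log in which every entry records its predecessor's hash."""
    chain, prev = [], GENESIS_PREV
    for data in entries:
        h = entry_hash(data, prev)
        chain.append({"data": data, "prev": prev, "hash": h})
        prev = h
    return chain


def find_first_mismatch(chain: list) -> Optional[int]:
    """Index of the first entry whose hash or back-link doesn't check out, else None."""
    prev = GENESIS_PREV
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry_hash(entry["data"], entry["prev"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
print(find_first_mismatch(chain))  # None: the log is consistent

chain[0]["data"] = "Alice pays Bob 500"  # quietly alter an early entry...
print(find_first_mismatch(chain))  # 0: its stored hash no longer matches

chain[0]["hash"] = entry_hash(chain[0]["data"], chain[0]["prev"])  # ...and patch its hash
print(find_first_mismatch(chain))  # 1: now the next entry's back-link is broken
```

Detection is all the structure provides: nothing here stops someone from rebuilding every later link, which is why the agreement of the connected computers, not the data format alone, decides which version of the log survives.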

Bitcoin as illustration.

Explaining how this all works in Bitcoin provides a helpful example.

So, what are the connected computers in the Bitcoin blockchain technology? They are any devices on the Internet running Bitcoin-compatible software. That software could be a wallet app or it could be software for “mining” bitcoin. If, for example, you run a Bitcoin software wallet on your phone, then whenever you send or receive Bitcoin transactions your phone will be talking directly to other computers that are running Bitcoin software; it’s peer-to-peer. Some people are uncomfortable running important software on their personal devices, and that’s reasonable, because if you are not careful when you run that software, you could accidentally lose your bitcoins. So some people might use a Bitcoin wallet that is created and maintained by a company. In this case, the wallet app on your smartphone will talk to a server that the company maintains, and it’s that server that connects to the peer-to-peer network on your behalf.

What about the consensus mechanism in Bitcoin? Well, as with any consensus mechanism, it’s a series of rules written in computer code. To be compatible with the Bitcoin network, any software you run on your Internet-connected device must follow these rules. If your software is modified to try to break the rules, then the messages it sends on the Internet will be ignored by all the other computers running honest, rule-obeying Bitcoin software.

There are a bunch of rules in the Bitcoin consensus mechanism, but we can highlight two of them here and transcribe them roughly from computer code into natural language:

Nobody can send bitcoins that they have not first received from someone else or a coinbase transaction.
On average every 10 minutes, one of the connected computers will be selected to choose the order of valid transactions for that period; that computer can write itself a coinbase transaction.
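Rule 1 can be expressed as a toy validation check. This sketch assumes a simplified account-balance model with hypothetical names (`Tx`, `apply_if_valid`); Bitcoin actually tracks unspent transaction outputs rather than balances. Also, in this toy any caller can create a coinbase transaction, whereas the real rules reserve that for the chosen leader and cap its size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tx:
    sender: Optional[str]  # None marks a coinbase transaction (newly created coins)
    recipient: str
    amount: int


def apply_if_valid(balances: dict, tx: Tx) -> bool:
    """Apply tx to balances only if it obeys rule 1; return whether it was applied."""
    if tx.amount <= 0:
        return False
    if tx.sender is not None:
        # Rule 1: you can only send coins you have already received.
        if balances.get(tx.sender, 0) < tx.amount:
            return False
        balances[tx.sender] -= tx.amount
    balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount
    return True


balances = {}
print(apply_if_valid(balances, Tx(None, "alice", 50)))   # True: coinbase creates coins
print(apply_if_valid(balances, Tx("alice", "bob", 20)))  # True: alice holds 50
print(apply_if_valid(balances, Tx("bob", "carol", 30)))  # False: bob holds only 20
print(balances)  # {'alice': 30, 'bob': 20}
```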

That first rule is pretty self-explanatory. It’s a rule against counterfeiting. The only exception is when someone sends themselves brand new bitcoins (known as a coinbase transaction) according to the network’s rules for new money creation. The second one isn’t very hard to understand either, once we have some context.

Recall that the connected computers are talking directly to one another, and keep in mind that those computers could be anywhere in the world because it all works on top of the global Internet.

If some computers are in, for example, China, and others are in the U.S., they are likely to get out of sync, because messages about transactions will originate in different parts of the world and propagate across the Internet at different rates. A connected computer in China might think the most recent transactions came in this order: A, B, C, while a computer in the U.S. may have seen them arrive in the reverse order: C, B, A. How do we make sure all the computers agree on the order? Well, as rule 2 specifies, roughly every 10 minutes one computer will be chosen to state the authoritative order of transactions for that period of time, and then another will be chosen, and so on. In computer science this arrangement is called a repeated leader election, but unlike a normal political election, the periodic leader is simply chosen at random (in Bitcoin, each computer’s odds of being chosen are proportional to the computing power it devotes to mining).
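A toy simulation of one round of this leader election, under stated assumptions: the node names and their shares of mining power are hypothetical, and the leader is drawn with Python’s `random.choices`, weighted by those shares. In Bitcoin the selection is effectively random but proportional to the computing power each miner contributes; this sketch only mimics that outcome, not the mining itself. Whichever node wins, every node ends the round holding the same order.

```python
import random

# Each computer's locally observed order of the same three transactions.
local_views = {
    "node_china": ["A", "B", "C"],
    "node_us": ["C", "B", "A"],
}

# Hypothetical shares of mining power: a node's chance of being
# elected leader for the round is proportional to its share.
hash_power = {"node_china": 0.6, "node_us": 0.4}


def elect_leader(rng: random.Random) -> str:
    """Pick one node at random, weighted by its share of mining power."""
    nodes = list(hash_power)
    return rng.choices(nodes, weights=[hash_power[n] for n in nodes], k=1)[0]


def run_round(rng: random.Random):
    """Elect a leader and have every node adopt the leader's ordering."""
    leader = elect_leader(rng)
    authoritative = local_views[leader]
    agreed = {node: list(authoritative) for node in local_views}
    return leader, agreed


leader, agreed = run_round(random.Random(2017))
print(leader, agreed)
```

Which node wins varies with the seed; what never varies is that all nodes leave the round agreeing.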

Notice also that our rule 2 specifies that the leader can only give the order of valid transactions. If the chosen leader tried to include a transaction where they gave themselves millions of counterfeit bitcoins, then they would have broken rule 1. Their scammy messages are simply ignored by the rest of the computers, as per the rules of the consensus mechanism.
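Here is a sketch of how the other computers might screen a leader’s proposed block under simplified rules. The function name `block_is_valid`, the `(sender, recipient, amount)` tuple shape, and the `max_coinbase` parameter are hypothetical; the point is only that a single invalid transaction makes the whole block invalid, so honest nodes ignore it.

```python
def block_is_valid(balances: dict, block: list, max_coinbase: int) -> bool:
    """Check a proposed block against a toy version of the consensus rules.

    Each transaction is (sender, recipient, amount); sender None marks a
    coinbase. The block passes only if it has at most one coinbase paying
    no more than max_coinbase, and every other transaction spends coins the
    sender already holds (rule 1). The caller's balances are not modified.
    """
    working = dict(balances)  # scratch copy, so a rejected block changes nothing
    coinbases = 0
    for sender, recipient, amount in block:
        if amount <= 0:
            return False
        if sender is None:
            coinbases += 1
            if coinbases > 1 or amount > max_coinbase:
                return False
        else:
            if working.get(sender, 0) < amount:
                return False  # counterfeit: spending coins never received
            working[sender] -= amount
        working[recipient] = working.get(recipient, 0) + amount
    return True


balances = {"alice": 30, "bob": 20}
honest = [(None, "leader", 50), ("alice", "carol", 10)]
counterfeit = [(None, "leader", 50), ("leader", "leader", 1_000_000)]
greedy = [(None, "leader", 1_000_000)]

print(block_is_valid(balances, honest, max_coinbase=50))       # True
print(block_is_valid(balances, counterfeit, max_coinbase=50))  # False
print(block_is_valid(balances, greedy, max_coinbase=50))       # False
```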

The chosen leader can, however, write themselves a coinbase transaction that will reward them for their honest work in maintaining the network. This transaction creates new bitcoins out of thin air as a reward, but it must match a predefined money creation schedule (you can’t just choose the size of your reward). That money creation schedule is just another rule within the Bitcoin consensus mechanism software.
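The schedule itself is simple arithmetic. The sketch below is modeled on Bitcoin’s issuance rule: the reward started at 50 bitcoins per block and halves every 210,000 blocks, with amounts counted in satoshis (100,000,000 per bitcoin). The function name is illustrative, and the sketch ignores transaction fees, which a real coinbase transaction may also collect.

```python
COIN = 100_000_000           # satoshis per bitcoin
HALVING_INTERVAL = 210_000   # blocks between reward halvings
INITIAL_SUBSIDY = 50 * COIN  # reward per block in the first era, in satoshis


def block_subsidy(height: int) -> int:
    """Newly created satoshis allowed in the coinbase of the block at `height`."""
    halvings = height // HALVING_INTERVAL
    return INITIAL_SUBSIDY >> halvings  # integer halving, rounding down


print(block_subsidy(0))        # 5000000000 (50 BTC)
print(block_subsidy(420_000))  # 1250000000 (12.5 BTC, the reward in 2017)

# Every block in an era earns the same reward, so total issuance is a short sum.
total = sum(block_subsidy(era * HALVING_INTERVAL) * HALVING_INTERVAL for era in range(64))
print(total)  # 2099999997690000 satoshis, just under 21 million BTC
```

Summing over every halving era lands just under 21 million bitcoins, which is where Bitcoin’s well-known supply cap comes from.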

Finally, there’s Bitcoin’s shared data, its blockchain. This is just a list of all Bitcoin transactions that have occurred since the network started in 2009. Here’s a stylized illustration:

Of course the real Bitcoin blockchain has many more transactions in it, millions since the network started. Also, the transactions don’t have human-readable names in them like the illustration above suggests. Instead, the sender and recipient are represented by what’s called a public address. It’s a pseudorandom but unique string of letters and numbers that is generated locally on the smartphone or computer of a particular Bitcoin user. It looks like this, 1CPwNACt62wts2yGbz1vUuqeGD58SzzeAL, and the user’s device will also generate a matching secret key (another pseudorandom but unique string of numbers and letters) that must be used to sign transactions spending funds from that address. Think of it like a password. All in all, however, the blockchain is pretty simple in that sense: it’s just a list of transactions between addresses, presented in a way that makes it easy for computers to verify the data.
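To make the secret-key idea concrete, here is a sketch of signing and verifying. Bitcoin itself uses elliptic-curve signatures (ECDSA over the secp256k1 curve), which Python’s standard library does not provide, so this example swaps in a Lamport one-time signature, a much simpler hash-based scheme, purely to show the principle: only the holder of the secret key can produce a signature, yet anyone holding the public information can check it. The function names are illustrative, and a Lamport key must never sign more than one message.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    """SHA-256, used here as the one-way function."""
    return hashlib.sha256(data).digest()


def message_bits(message: bytes) -> list:
    """The 256 bits of the message's SHA-256 digest, most significant first."""
    digest = int.from_bytes(sha256(message), "big")
    return [(digest >> (255 - i)) & 1 for i in range(256)]


def keygen():
    """Secret key: 256 pairs of random 32-byte values. Public key: their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(sha256(a), sha256(b)) for a, b in sk]
    return sk, pk


def sign(sk, message: bytes) -> list:
    """Reveal one secret from each pair, picked by the matching message bit."""
    return [sk[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(pk, message: bytes, signature: list) -> bool:
    """Check that each revealed secret hashes to the matching public-key entry."""
    bits = message_bits(message)
    return len(signature) == 256 and all(
        sha256(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(signature, bits))
    )


sk, pk = keygen()  # generated locally; sk never leaves the device
tx = b"send 0.5 BTC from my address to Bob"
sig = sign(sk, tx)
print(verify(pk, tx, sig))                                # True
print(verify(pk, b"send 50 BTC from my address to Mallory", sig))  # False
```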

How various blockchain technologies may differ.

What about other, non-Bitcoin blockchain technologies? Well, they all follow the same design pattern. They will have peer-to-peer networking, a consensus mechanism, and a blockchain, and they will enable connected computers to reach agreement over shared data.

There are two things that can differ from Bitcoin, however. The shared data may be different, and the consensus mechanism may be engineered with different design choices.

Here’s how the data can differ. Instead of being a list of bitcoin transactions, the shared data could be votes in an election, or identity credentials (think of it like a tokenized driver’s license or proof of a credit score). Or the data could be the current state of a running computation. In other words, the data could belong to a global computer that anyone is allowed to read data from and write data to; that’s one way to describe Ethereum, another open blockchain network inspired by Bitcoin.

The consensus mechanism could also be different from Bitcoin’s. These differences aren’t necessarily good or bad; remember that “blockchain” is like “vehicle.” Sometimes you might need a boat, other times a rocketship. Not all vehicles are good for all use cases.

There are three big design choices that might make the consensus mechanism different from Bitcoin’s. These tradeoffs and choices merit a much longer discussion, but here’s a basic overview:

Open or Closed? Does the consensus mechanism allow anyone to join and participate, or is participation limited to identified parties on the network who were previously provisioned with an access credential by a company, consortium, or other central party that is creating or implementing the blockchain technology? In other words, is it an open network (like the Internet) or a closed or permissioned network (like a company intranet)?
Private or Transparent? Does the consensus mechanism privilege data privacy above data transparency and auditability? Or vice versa? To some extent this is an iron tradeoff. Recall that all the computers must reach agreement on the shared data. If the data were private to a handful of individuals, then only those individuals on the network would be able to verify and agree on the data. There may be a way around this tradeoff in consensus design thanks to some new research into “zero-knowledge proofs” and the launch of a new privacy-protecting public network called Zcash.
Edge or Center? Does the consensus mechanism put security at the edge of the network or at the center? Open blockchain networks like Bitcoin have consensus mechanisms that push the responsibility for security to the edge, to the individual computers owned and controlled by users. So if you receive bitcoins on your smartphone using a software wallet, for example, your device is the only device on the whole network that can now spend those bitcoins. Without the secret key generated on your phone, the bitcoins can never move. This is in sharp contrast to pre-Bitcoin electronic payment systems, where an intermediary like a credit card company could step in and reverse a transaction or move funds out of your account without needing you to take any action with your card or banking app. Having security at the edge may be a disadvantage for someone who loses their phone and failed to make a backup of their credentials, but it’s also an advantage system-wide, because there’s no longer a central party who could be hacked or be dishonest and thereby put everyone’s money or data at risk. Permissioned blockchain technologies retain some power at the center of the network because, at the very least, there will be one party who is relied upon to identify permitted member computers and provision them with an access credential.

Those are the primary possible differences between blockchain technologies. There’s still plenty of room for elaboration, details, and future possibilities, but hopefully you’ve got a better handle on the fundamental architecture of these exciting new tools. Just remember, blockchain technology means that connected computers reach agreement over shared data.