Before reading this article, we recommend reading our previous post “Code your own blockchain in less than 200 lines of Go!”.
Interest in the blockchain has hit feverish levels lately. While much of the buzz has been around applications of the blockchain such as cryptocurrencies and ICOs, the technology itself is just as exciting. The blockchain provides a democratized trust and validation protocol that has already disrupted banking and is on the verge of overhauling healthcare, financial services, social apps and more.
However, from a technological perspective, the blockchain is not without its warts. Current proof of work consensus mechanisms have slowed transaction speeds to near crippling levels. Waiting for Bitcoin transactions to complete makes the platform nearly unusable to some and Cryptokitties almost brought the Ethereum network to a grinding halt.
This makes storing data or large files on the blockchain a non-starter. If the blockchain can barely sustain small strings of text that simply record a balance transfer between two parties, how on earth are we ever going to store large files or images on the blockchain? Are we just going to have to be OK with limiting the utility of the blockchain to things that can only be captured in tiny text strings?
The most promising solution that’s available today is IPFS, or Interplanetary File System, created by the folks at Protocol Labs. It’s a peer-to-peer protocol where each node stores a collection of hashed files. A client who wants to retrieve any of those files enjoys access to a nice abstraction layer where it simply needs to call the hash of the file it wants. IPFS then combs through the nodes and supplies the client with the file.
You can think of it as being similar to BitTorrent. It’s a decentralized way of storing and referring to files but gives you more control and refers to files by hashes, allowing for much richer programmatic interactions.
Here are some simple diagrams so you can see the workflow of IPFS.
- John wants to upload a PDF file to IPFS
- He puts his PDF file in his working directory
- He tells IPFS he wants to add this file, which generates a hash of the file (you can tell it’s an IPFS hash because hashes in the format used throughout this tutorial start with Qm…)
- His file is available on the IPFS network
Now suppose John wants to share this file with his colleague Mary through IPFS. He simply tells Mary the hash from Step 3 above. Then steps 1–4 above just work in reverse for Mary. All Mary needs to do is call the hash from IPFS and she gets a copy of the PDF file. Pretty cool.
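To make the idea of "calling a file by its hash" concrete, here is a minimal Python sketch of a content-addressed store. It is a toy under stated assumptions, not how IPFS is implemented: ToyStore and toy_address are hypothetical names, the "network" is a single in-memory dictionary, and the address is computed over the raw bytes, whereas real IPFS hashes a wrapped (and possibly chunked) representation of the file, so these addresses will not match what ipfs add prints. The sketch does show where the Qm prefix comes from: a SHA-256 digest prefixed with the two multihash bytes 0x12 0x20 and base58-encoded begins with Qm.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58 (Bitcoin alphabet); leading zero bytes become '1'."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = BASE58_ALPHABET[remainder] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def toy_address(content: bytes) -> str:
    """Hypothetical helper: an IPFS-shaped address for raw bytes.

    In the multihash format, 0x12 means "sha2-256" and 0x20 means "32-byte
    digest". This hashes the raw bytes only, so the result will NOT match the
    hash that `ipfs add` reports for the same file.
    """
    digest = hashlib.sha256(content).digest()
    return base58_encode(b"\x12\x20" + digest)


class ToyStore:
    """In-memory stand-in for the network: content is looked up by its hash."""

    def __init__(self):
        self._blobs = {}

    def add(self, content: bytes) -> str:
        address = toy_address(content)
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        content = self._blobs[address]
        # The receiver can re-hash what arrived to check it was not altered.
        if toy_address(content) != address:
            raise ValueError("content does not match its address")
        return content


if __name__ == "__main__":
    store = ToyStore()
    address = store.add(b"John's PDF bytes")  # John adds his file
    print(address)                            # the string John sends to Mary
    print(store.get(address))                 # Mary asks for the file by hash
```

Because the address is derived from the content, identical files always map to the same address, and changing a single byte produces a different one.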
There is an obvious security hole here. As long as anyone has the hash of the PDF file, they can retrieve it from IPFS. So sensitive files are not well suited for IPFS in their native states. Unless we do something to these files, sharing sensitive files like health records or images is a poor fit for IPFS.
Enter Asymmetric Encryption
Luckily, we have tools at our disposal that pair very nicely with IPFS to secure files before uploading them. Asymmetric encryption allows us to encrypt a file with the public key of the intended recipient so that only they can decrypt it when they retrieve it with IPFS. A malicious party who retrieves the file from IPFS can’t do anything with it since they can’t decrypt it. For this tutorial we’ll be using GPG for asymmetric encryption.
Let’s edit our workflow diagram a bit so we include encryption and decryption:
- John wants to upload a PDF file to IPFS but only give Mary access
- He puts his PDF file in his working directory and encrypts it with Mary’s public key
- He tells IPFS he wants to add this encrypted file, which generates a hash of the encrypted file
- His encrypted file is available on the IPFS network
- Mary can retrieve it and decrypt the file since she owns the associated private key of the public key that was used to encrypt the file
- A malicious party cannot decrypt the file because they lack Mary’s private key
So where does the blockchain fit into this? Before we go on, we encourage you to read our popular post: Code your own blockchain in less than 200 lines of Go!
Of particular importance is this diagram:
Pay attention to the BPM part. This kind of simple text recording is all the blockchain can really handle today. This is why cryptocurrencies are a good fit for the blockchain. All you need to record is the sender, recipient and amount of Bitcoin (or Ether, etc.) being transferred. Because all these hashes need to be calculated and verified to preserve integrity of the chain, the blockchain is horrible, absolutely horrible at storing files or large amounts of data in a block.
This is why IPFS is so powerful when coupled with the blockchain. Instead of BPM above, we simply store the hash of the IPFS file! This is really cool stuff. We keep the simplicity of data that’s required on the blockchain but we get to enjoy the file storage and decentralized peer-to-peer properties of IPFS! It’s the best of both worlds. Since we also added security with asymmetric encryption (GPG), we have a very elegant way of “storing”, encrypting, and sharing large data and files on the blockchain.
A real world application would be storing references to our health or lab records in each block. When we get a new lab result, we simply create a new block that refers to an encrypted image or PDF of our lab result that sits in IPFS.
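As a rough illustration of storing an IPFS hash in a block in place of BPM, here is a short Python sketch. The field names and the functions calculate_hash, new_block and is_valid are assumptions made for this example, loosely modeled on the kind of block described above; they are not taken from any particular implementation. The IPFS hash used in the demo is the one that appears later in this tutorial.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    timestamp: str
    ipfs_hash: str   # takes the place of BPM: a pointer to the encrypted file
    prev_hash: str
    hash: str = ""


def now() -> str:
    """Current UTC time as a string."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())


def calculate_hash(block: Block) -> str:
    """SHA-256 over the block's fields (everything except its own hash)."""
    record = f"{block.index}{block.timestamp}{block.ipfs_hash}{block.prev_hash}"
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def new_block(previous: Block, ipfs_hash: str) -> Block:
    """Create the next block, recording only the small IPFS hash string."""
    block = Block(previous.index + 1, now(), ipfs_hash, previous.hash)
    block.hash = calculate_hash(block)
    return block


def is_valid(block: Block, previous: Block) -> bool:
    """Check that the block links to its predecessor and was not altered."""
    return (
        block.index == previous.index + 1
        and block.prev_hash == previous.hash
        and block.hash == calculate_hash(block)
    )


if __name__ == "__main__":
    genesis = Block(0, now(), "", "")
    genesis.hash = calculate_hash(genesis)

    lab_result = new_block(genesis, "QmYqSCWuzG8Cyo4MFQzqKcC14ct4ybAWyrAc9qzdJaFYTL")
    print(lab_result)
    print("valid:", is_valid(lab_result, genesis))
```

The block stays a few hundred bytes no matter how large the encrypted file is, which is the point of keeping only the hash on the chain.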
Enough talk already. Show me how to do this!
In this tutorial we will do the following:
- Set up GPG
- Set up IPFS
- Encrypt a file with someone else’s public key
- Upload the encrypted file to IPFS
- Download the file from another computer (or Virtual Machine) and make sure only the privileged party can decrypt and view it
Things you’ll need
- A second computer or a Virtual Machine instance. The second computer simulates a person with whom you want to securely share your files.
- A test file. We recommend downloading this, which is a sample PDF lab result. This is the exact type of sensitive, personal data we need to protect and since we’re a healthcare company, it’s a nice example. Put this file in your working directory.
That’s it! Let’s get started.
Let’s download GPG on both our main and secondary computers.
Follow the instructions in this article for your OS. On Mac, assuming Homebrew is installed, the easiest way is to open your terminal and run
brew install gnupg
Generate a key on each of your computers after GPG installation. Run
gpg --gen-key
and follow the prompts, picking the default options. You’ll be asked for a name and email address to identify the key, and for a passphrase to protect it. Make sure to securely remember or store that passphrase.
You’ll get to a stage where gpg asks you to do some random things to generate entropy. I just typed a bunch of random characters until the process was finished.
After the key has been generated on the second computer, we need to add that key to the keyring of the first computer, so we can encrypt files that only the second computer can decrypt.
Export your public key on your second computer into an armored blob, using the email address you chose when creating the key (shown here as the placeholder YOUR_EMAIL):
gpg --export --armor YOUR_EMAIL > pubkey.asc
Move the pubkey.asc file you just created to your first computer. Make sure to do this securely. A USB stick is better than sending it over email.
Once the pubkey.asc file is in the working directory on your first computer, import it into your keyring like this:
gpg --import pubkey.asc
You can check that it was imported correctly with
gpg --list-keys
The key from my second computer was created under the name Cory Heath, and it shows up correctly in the list.
Great! We’re done with GPG setup. Let’s move on to IPFS.
Follow the instructions to download and install IPFS for your OS here for both computers. Once you’ve done that, initialize IPFS on both computers with
ipfs init
and then start the daemon on both computers with
ipfs daemon
Nice! We’ve set everything up. Let’s get to encrypting and uploading our PDF file to IPFS.
Remember the sample lab result we downloaded earlier? Make sure to move that to your working directory on your first computer.
Let’s encrypt that file (I renamed it myriad.pdf since the lab result was produced by Myriad Genetics) using the public key of the second computer (in my case, named Cory Heath).
gpg --encrypt --recipient "Cory Heath" myriad.pdf
If you check your directory now with ls, you’ll see a new encrypted file named myriad.pdf.gpg.
Only your second computer can decrypt and see this file. Try it! Email it to another friend and, try as they might, they won’t be able to open it, even if they rename it back to myriad.pdf.
We’ve got our encrypted file now. Let’s upload it to IPFS!
Uploading to IPFS
To upload to IPFS, all we need to do on our first computer is
ipfs add myriad.pdf.gpg
We get an output like this:
The Qm... string is the hash of the file. You can send this to your friend or anyone to whom you wish to give access so they can download it from IPFS.
Let’s just double check that our file is available on IPFS by listing what our node has pinned (ipfs add pins a file by default):
ipfs pin ls
You can see the hash of our file is indeed present and now available on IPFS!
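If you find yourself sharing files this way often, the encrypt and upload steps can be scripted. The sketch below is one possible way to do it in Python, not part of the original walkthrough: the function names are hypothetical, and it simply shells out to the same gpg and ipfs commands used above. Two flags are added for scripting: --output makes gpg write to an explicit file name, and -Q (--quieter) makes ipfs add print only the final hash. It assumes gpg and ipfs are on your PATH and that the recipient’s public key has already been imported.

```python
import subprocess


def encrypt_command(path: str, recipient: str) -> list:
    """Build the gpg call that encrypts `path` for `recipient` into `path`.gpg."""
    return ["gpg", "--output", path + ".gpg",
            "--encrypt", "--recipient", recipient, path]


def add_command(path: str) -> list:
    """Build the ipfs call that adds `path` and prints only its hash."""
    return ["ipfs", "add", "-Q", path]


def encrypt_and_add(path: str, recipient: str) -> str:
    """Encrypt a file for one recipient, add it to IPFS, and return the hash.

    Raises subprocess.CalledProcessError if either command fails. gpg may still
    ask for confirmation on the terminal if the recipient's key is not trusted.
    """
    subprocess.run(encrypt_command(path, recipient), check=True)
    result = subprocess.run(add_command(path + ".gpg"),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

With the setup from this tutorial, calling encrypt_and_add("myriad.pdf", "Cory Heath") would return the Qm... hash to send to the recipient.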
Downloading from IPFS
Let’s now switch to our second computer. Remember, we are simulating a second person. To make this more realistic, swap in the second computer throughout this tutorial with a friend!
In our case, instead of a second computer we’re using an Ubuntu VM with Vagrant. This is not a requirement.
On your second computer, download the encrypted file that your first computer added to IPFS, using the same hash:
ipfs get QmYqSCWuzG8Cyo4MFQzqKcC14ct4ybAWyrAc9qzdJaFYTL
This is what it should look like when successfully downloaded:
Since we’re on our second computer, and this encrypted file was encrypted with the second computer’s public key, in theory we should be able to decrypt and view this file without any issues.
Let’s give it a try.
Decrypt the downloaded file, writing the result to myriad.pdf:
gpg --decrypt QmYqSCWuzG8Cyo4MFQzqKcC14ct4ybAWyrAc9qzdJaFYTL > myriad.pdf
Moment of truth:
Let’s open this file and if all went well we should be able to see it on our second computer.
TADA! We successfully downloaded, decrypted and opened our file which was stored fully encrypted on IPFS, protected from anyone who shouldn’t have access!
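The receiving side can be scripted in the same spirit. This is a hedged Python sketch with hypothetical function names that wraps the two commands used above. It passes -o (--output) to ipfs get so the downloaded file gets a readable name instead of the hash, and --output to gpg instead of shell redirection. It assumes gpg and ipfs are installed and that the matching private key is in the local keyring. gpg will prompt for the key’s passphrase as usual.

```python
import subprocess


def get_command(ipfs_hash: str, encrypted_path: str) -> list:
    """Build the ipfs call that downloads `ipfs_hash` to `encrypted_path`."""
    return ["ipfs", "get", "-o", encrypted_path, ipfs_hash]


def decrypt_command(encrypted_path: str, output_path: str) -> list:
    """Build the gpg call that decrypts `encrypted_path` into `output_path`."""
    return ["gpg", "--output", output_path, "--decrypt", encrypted_path]


def fetch_and_decrypt(ipfs_hash: str, output_path: str) -> str:
    """Download an encrypted file from IPFS, decrypt it, and return its path.

    Raises subprocess.CalledProcessError if the download fails or if the local
    keyring has no private key able to decrypt the file.
    """
    encrypted_path = output_path + ".gpg"
    subprocess.run(get_command(ipfs_hash, encrypted_path), check=True)
    subprocess.run(decrypt_command(encrypted_path, output_path), check=True)
    return output_path
```

For the file in this tutorial, that would be fetch_and_decrypt("QmYqSCWuzG8Cyo4MFQzqKcC14ct4ybAWyrAc9qzdJaFYTL", "myriad.pdf"). On a machine without the right private key, the download succeeds but the decryption step fails, which is exactly the protection we wanted.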
Recap and Next Steps
Give yourself a pat on the back. What we just accomplished is incredibly powerful and addresses some key issues found in blockchain technology today.
Let’s do a quick review of what we did:
- Recognized that the blockchain is pretty bad at storing large volumes of data and files
- Got IPFS up and running, connected to the network
- Secured sensitive files using GPG and stored them on IPFS
- Understood hashing in IPFS and how we can store the hashes on the blockchain to combine the strengths of the blockchain with distributed file storage
Where you take what you learned here is completely up to you. There are many places to branch off from this. Consider deploying these examples to live servers to act as your own IPFS nodes to store important files. The drawback to IPFS is that if your files aren’t very popular, when you stop your node, your file is gone from the IPFS network. You can prevent this by spinning up cloud servers to act as their own IPFS nodes, so you can host them yourself until more nodes become interested in your files and start storing them.
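The availability drawback is easy to picture with a toy model. The Python sketch below is purely illustrative and does not talk to IPFS at all: Node and is_available are hypothetical names, and "QmExample" is a made-up placeholder hash. It treats a file as retrievable only while at least one online node holds a copy.

```python
class Node:
    """A toy peer: it is either online or offline and holds a set of file hashes."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.files = set()


def is_available(nodes: list, file_hash: str) -> bool:
    """A file can be retrieved only if some online node still stores it."""
    return any(node.online and file_hash in node.files for node in nodes)


if __name__ == "__main__":
    laptop = Node("laptop")
    laptop.files.add("QmExample")              # placeholder hash, not a real file
    print(is_available([laptop], "QmExample"))

    laptop.online = False                      # you stop your daemon
    print(is_available([laptop], "QmExample"))

    server = Node("cloud-server")              # an always-on node you host yourself
    server.files.add("QmExample")
    print(is_available([laptop, server], "QmExample"))
```

The second check fails because the only holder went offline. Adding a node that stays up restores availability, which is why hosting your own IPFS nodes helps for files that few other peers store.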
Check out our previous “Code your own blockchain” tutorials, Parts 1, 2, 3 and 4. Once you’ve gone through those, try integrating IPFS and blockchain with your own large, encrypted files. You can also learn about Byzantine fault tolerance, Turing completeness and other advanced blockchain concepts here. If you’re so inclined, here’s how to start your own Hyperledger blockchain and here’s how to build a DApp on Hyperledger.
To learn more about Coral Health and how we’re using the blockchain to advance personalized medicine research, visit our website.