Learn how document storage relates to cryptocurrencies, and why, how, and where to store documents on a blockchain. The original version of this article by Ben Whittle was published on CoinCentral.

Cryptocurrencies like Bitcoin have demonstrated how blockchain technology can support new forms of money. They record transactions as data within blocks. However, there is no reason this data has to be financial. In theory, any form of data can be stored on a blockchain.

Over the past several years, there has been a keen interest in how we can use blockchains for storing documents. There are many reasons you might want to store documents or hashes of documents on a blockchain, and multiple ways to do this. Various projects are currently innovating around this idea, each proposing different methods with different trade-offs.

Why Use a Blockchain Anyway?

Throughout 2017, there was a huge amount of hype around the applications of blockchain technology and cryptocurrencies.

These expectations were often focused on projects with grand promises and little proof of concept. As a result, the reality did not match the hype, and many of them have yet to attract users to their products.

In contrast, document storage is a much drier and less exciting application. However, it is deliverable, with multiple improvements over existing document storage systems.

Tamper Resistance

Immutability is perhaps the most important benefit a blockchain provides. Because each block contains the hash of the block before it, changing an old record would break every later link, so the record is highly resistant to tampering. This tamper resistance helps prevent the counterfeiting of documents and document fraud. If you cannot store the actual document on the blockchain due to file size limitations, storing a hash of the document still gives you this protection.
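To make the idea of cryptographically linked blocks concrete, here is a minimal sketch of a hash-linked chain in Python. It is a toy model under stated assumptions, not how any particular blockchain is implemented: the helper names (`block_hash`, `make_block`, `build_chain`, `is_valid`) and the block fields are hypothetical, and there is no consensus, mining, or networking.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents (everything except its own stored hash)."""
    body = {key: block[key] for key in ("index", "data", "prev_hash")}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(index: int, data: str, prev_hash: str) -> dict:
    """Create a block that records the hash of its predecessor."""
    block = {"index": index, "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block


def build_chain(entries: list) -> list:
    """Link the entries together, one block per entry."""
    chain = []
    prev_hash = "0" * 64  # placeholder predecessor for the first block
    for index, data in enumerate(entries):
        block = make_block(index, data, prev_hash)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def is_valid(chain: list) -> bool:
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = build_chain(["doc A", "doc B", "doc C"])
print(is_valid(chain))        # the untouched chain verifies
chain[1]["data"] = "forged"   # tamper with an old record
print(is_valid(chain))        # the stored hash no longer matches
```

Even if an attacker recomputes the hash of the block they altered, the next block still points at the old hash, so the forgery is detected one link later. Rewriting every later block as well is what real networks make expensive.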

Documents often take up a lot of space compared to the financial transactions that blockchains like Bitcoin are designed for, so it is often not feasible to store a whole document on a blockchain. A hash takes up only a small fraction of this space (a SHA-256 hash is 32 bytes regardless of the document’s size) and is therefore a much more efficient option.

Storing just the hash still offers you tamper resistance. Whenever the contents of a file change, its hash value changes too; with a secure hash algorithm, finding two different files with the same hash is computationally infeasible. This is a vital benefit that secure hash algorithms provide. Regardless of where you store your document, whether in a centralized database such as MySQL or on a distributed cloud platform such as Azure, you can still verify that the document has not been tampered with by rehashing it and comparing the result to the blockchain-stored hash.
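The rehash-and-compare check can be sketched in a few lines of Python. This is only an illustration: `sha256_hex` and `matches_anchor` are hypothetical helper names, and `anchored_hash` stands in for a value you have already read back from the blockchain, which depends on the chain you use and is not shown here.

```python
import hashlib


def sha256_hex(document: bytes) -> str:
    """Return the SHA-256 hash of the document as a hex string."""
    return hashlib.sha256(document).hexdigest()


def matches_anchor(document: bytes, anchored_hash: str) -> bool:
    """Rehash the document and compare it with the hash stored on-chain."""
    return sha256_hex(document) == anchored_hash.strip().lower()


original = b"Contract v1: Alice pays Bob 100."
anchored_hash = sha256_hex(original)  # assumption: this was read back from the chain

print(matches_anchor(original, anchored_hash))         # unchanged document
print(matches_anchor(original + b" ", anchored_hash))  # one extra space added
```

The document itself can live anywhere. Only the short hash needs to sit on the blockchain for the check to work.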

Visibility

Using a public blockchain is a great way to make your document accessible to the public. Of course, you need to be absolutely confident that you want to make it fully visible. Once you store the document or its hash on the blockchain, it will be there permanently. On a well-secured chain, there is no practical way to change or remove data once it is included in a block.

A blockchain is certainly not the only way to do this. However, given its level of security and tamper-resistance, you can be confident of permanent visibility.

Of course, you could also use a federated or private blockchain if you wanted to limit access to your documents. Such blockchains can provide you with the ability to offer permanent visibility to a preselected group. These alternatives will, however, undermine decentralization and possibly tamper-resistance.

Need for Decentralization

The final reason to use a blockchain is if you require decentralization. Perhaps the nature of your document means that you cannot reliably trust a third-party storage provider to not tamper with or delete the document.

One such instance would be politically sensitive files, which malicious parties could target once they are published. By uploading the document or its hash to a public blockchain, you would have peace of mind that the on-chain record is safe from state or corporate censorship. Of course, choosing the correct blockchain is very important here, because blockchains are not all made alike. If the consensus protocol is not properly decentralized, or if it allows a small group of nodes to reverse or censor transactions, then you will have the same problems as with traditional systems.

The Different Ways to Store a Document on a Blockchain

There are two main ways you might choose to store a document on the blockchain. One option is to store the entire document itself on-chain. Alternatively, you can store a hash of it on the blockchain.

Storing the Entire Document

Storing a whole document on-chain is possible with certain blockchains, but it is rarely a good idea. Because of the heavy data demands, you would be better off choosing another method unless the file is very small or extremely important. If you wanted to store the document on Bitcoin, you would first have to compress it and then encode it in hexadecimal form.
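As a rough illustration of the compress-then-hex step, the sketch below uses Python’s standard `zlib` module. It only prepares the payload and does not build or broadcast a Bitcoin transaction. The helper names and `ASSUMED_LIMIT_BYTES` are hypothetical. The 80-byte figure is an illustrative assumption, so check the current rules of whichever chain you target.

```python
import zlib

# Assumption for illustration only: a small per-transaction data budget.
ASSUMED_LIMIT_BYTES = 80


def to_hex_payload(document: bytes) -> str:
    """Compress the document, then encode the result as hexadecimal text."""
    return zlib.compress(document, 9).hex()


def from_hex_payload(payload: str) -> bytes:
    """Reverse the process: decode the hex, then decompress."""
    return zlib.decompress(bytes.fromhex(payload))


def payload_size_bytes(payload: str) -> int:
    """Two hex characters encode one byte of data."""
    return len(payload) // 2


document = b"This agreement is made between Alice and Bob. " * 20
payload = to_hex_payload(document)

print(len(document), "bytes before compression")
print(payload_size_bytes(payload), "bytes after compression")
print("fits assumed limit:", payload_size_bytes(payload) <= ASSUMED_LIMIT_BYTES)
```

Repetitive text like this compresses well, but most real documents (PDFs, scans, images) are already compressed and will not shrink much, which is one reason whole-document storage rarely fits.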

The main problem with storing whole documents on a blockchain is access latency, meaning how long it takes network users to upload and download files such as documents. Fully decentralized public blockchains have thousands of nodes, and every full node has to receive and store the data. Unfortunately, the benefits of having this many nodes come with a corresponding increase in latency. Any file storage system, including one for documents, needs low latency; otherwise it becomes clogged, slow, and expensive to use.

A hybrid strategy can also make sense. This would involve storing a small part of the document, perhaps the signatures, as well as the document hash on-chain. This allows you to maintain decentralization and full transparency of the parts that absolutely require it while maintaining a cap on the data load.
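A hybrid record of this kind might look like the following sketch. The field names (`doc_sha256`, `signatures`) and the `build_onchain_record` helper are hypothetical, and the signature strings are placeholders rather than real cryptographic signatures. Producing and verifying actual signatures is outside the scope of this example.

```python
import hashlib
import json


def build_onchain_record(document: bytes, signatures: list) -> bytes:
    """Bundle the document hash with the signatures into a compact payload."""
    record = {
        "doc_sha256": hashlib.sha256(document).hexdigest(),
        "signatures": signatures,  # placeholder strings in this sketch
    }
    # Compact separators and sorted keys keep the payload small and stable.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


document = b"Lease agreement between Alice and Bob. " * 500
signatures = ["alice-signature-placeholder", "bob-signature-placeholder"]

record = build_onchain_record(document, signatures)
print(len(document), "bytes kept off-chain")
print(len(record), "bytes stored on-chain")
```

The on-chain payload stays roughly the same size however large the document grows, because only the hash and the signatures are included.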

Storing a Hash

The most efficient method is to store a document’s hash on-chain while keeping the whole document elsewhere. The document could be stored in a centralized database or on a distributed file storage system. You would put the document through a secure hash algorithm like SHA-256 and then store the resulting hash in a block. This way, you save a huge amount of space and cost. Additionally, you will be able to tell if someone tampers with the original document: the change in input would result in a completely new hash value, different from the hash of your original document.
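The short Python sketch below illustrates both points: a SHA-256 digest is always 32 bytes, and a one-character change to the document produces an unrelated-looking hash. The sample texts and the `differing_bits` helper are made up for illustration.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"Pay Alice 100 USD on 1 March."
altered = b"Pay Alice 900 USD on 1 March."  # a single character changed

h_original = hashlib.sha256(original).digest()
h_altered = hashlib.sha256(altered).digest()

print(len(h_original), "bytes per digest")
print(h_original.hex())
print(h_altered.hex())
print(differing_bits(h_original, h_altered), "of 256 bits differ")
```

With a secure hash function, roughly half of the output bits are expected to flip for any change to the input, so a stored hash gives no hint of how to forge a matching document.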

Hash values are far smaller than whole documents, so storing them is a vastly more efficient use of blockchain space. The approach also scales well. For storing multiple documents, you can collect the hashes in a single structure, such as a distributed hash table or a Merkle tree, and anchor only a compact reference to it (for example, a root hash) on-chain. The downside is that the storage of the original document is not necessarily decentralized or publicly visible.
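One way to anchor many documents with a single on-chain value is a Merkle tree, sketched below. Note that this is a Merkle tree rather than a distributed hash table, chosen here because its root is one fixed-size hash. The sketch is simplified: it uses single SHA-256 and pads odd levels by duplicating the last hash, and the function names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes: list) -> bytes:
    """Pair up hashes level by level until a single root hash remains."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf hash")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # simplification: duplicate the last hash
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


documents = [b"deed.pdf contents", b"will.pdf contents", b"lease.pdf contents"]
leaves = [sha256(doc) for doc in documents]

root = merkle_root(leaves)
print(root.hex())  # the only value that would need to go on-chain

# Changing any one document changes the root.
tampered = [sha256(b"deed.pdf contents (edited)")] + leaves[1:]
print(merkle_root(tampered) != root)
```

However many documents are included, the on-chain footprint stays at a single 32-byte root, and any change to any document changes that root.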

What do you think the future holds for blockchains and document storage?