Kirsle.net logo Kirsle.net

Blockchain in a nutshell

May 9, 2021 by Noah

Have you ever wondered basically how blockchains work and what it means to "mine a Bitcoin" and why it's a time-consuming (and energy-intensive) process to mine one? In this blog post I'll summarize what basically drives blockchains and how they work, from my understanding of them anyway.

If you're anywhere close to a novice web developer or have ever hashed a password in your life, you already know the basic technology at play here. And even if you haven't, I think I can get you caught up pretty quickly.

The explanation I give below is surely super simplified and is based on my reading of how these work, along with several "Create your own blockchain in under 200 lines of code" type of tutorials which you could find for Python, or JavaScript or anything. It doesn't cover how the decentralized networking aspects work or any of those details, but just the basics on the cryptography side.

Hashing algorithms

A hashing algorithm takes some data as input and it produces a "random" looking output of a fixed length. Historically, hashing algorithms have been used by websites to store user passwords without the website owner being able to find out what everyone's password is.

While the output of a hash function looks random, you actually get exactly the same output hash given the same inputs. For example, with the MD5 hashing algorithm and an input of the word password you always get the MD5 hash of 5f4dcc3b5aa765d61d8327deb882cf99 out. If you add a number to the end (password1), the hash is 7c6a180b36896a0a8c02787eeafb0e4c which looks wildly different. Here are a few more examples:

  • password2 = 6cb75f652a9b52798eb6cf2201057c73
  • password3 = 819b0643d6b89dc9b579fdfc9094f28e
  • Password = dc647eb65e6711e155375218212b3964
  • MyB$GSecretP@ssW0rd12345 = f8963d1c3c7b402c7e6539c21a3f1833

So even if you modify the input value just a tiny bit, the output looks wildly random and different to before; but every single time MyB$GSecretP@ssW0rd12345 is hashed as MD5, it always comes out as f8963d1c3c7b402c7e6539c21a3f1833, and so a website would store just the output hash and that way it can prove you entered the correct password if it hashes to the same value the website stored in its database.

Note: MD5 is now well-known to be a very weak and insecure algorithm and has several major flaws. It used to be an industry standard algorithm for password hashing before ~2005 or so, but is no longer suitable for most modern applications. Nonetheless, the size of hash it produces is short & sweet and it's a fast enough algorithm for the purposes of this blog post.

Also, real blockchains use longer and more modern hashing algorithms, but they all function the same in principal.

Looking for "interesting" hashes

You'll notice above that I showed the example hashes of the words password1, password2 and password3 -- all are basically the word 'password' plus a number that starts at 1 and incremented each time, to see how the hash turns out.

The hash looks "random" each time, but eventually, if you go for long enough, you might happen upon a hash that has an interesting property, like that maybe 6 of its right-most digits would come out as zeroes instead of random noise.

In fact, the magical number that does this is 10,456,925:

  • MD5("password10456925") == "5eb62e644f71a22369b5a2d172000000"

The right-most six digits of this hash came out as zeroes. How did I come to the number of 10,456,925 though? By trial-and-error, starting from 1 and counting up and up until I found the hash that fit this criteria of ending with six zeroes.

Well actually, I wrote a script to compute this for me, and it took about 3.6 seconds on my laptop; this is because MD5 is very fast and looking for six zeroes wasn't so big of an ask; what if I wanted seven zeroes on the end of it?

  • 6 zeroes: 10,456,925; MD5("password10456925") == "5eb62e644f71a22369b5a2d172000000" (3.6 seconds)
  • 7 zeroes: 85,950,218; MD5("password85950218") == "88d8efac3879874c50814cef40000000" (29.59 seconds)
  • 8 zeroes: you tell me, I gave up after waiting 10 minutes.

While it took some time for me to compute the magic number that produced an interesting hash, it takes no time at all for a computer to verify my finding: producing a single hash of a single input takes virtually no time at all, and a computer can see that "password" + "85950218" produces the hash with seven trailing zeroes without needing to reproduce all of the work that went in to finding that number in the first place.

Mining a coin

In a blockchain-based cryptocurrency, the word "password" in the above example is replaced with a transaction log (a record of who is sending money to who), and the miners on the network take that transaction message and add a number to it, called a 'nonce,' beginning at 1 and counting up and up until the magic number is found that produces a hash with an interesting property like this, such as a number of trailing zeroes on the end.

The difficulty is controlled in how many zeroes you want back; we saw that six zeroes took basically no time, seven took a bit longer, and eight took exponentially longer still; if coins are being mined too quickly, the network changes the difficulty factor to slow down the next coin.

What makes it a 'chain' then?

The way you create a validated chain of such hashes is that you take the previous hash as part of the message you're computing.

Say the very first message ever mined had a hash of 88d8efac3879874c50814cef40000000 with seven zeroes at the end. The next message to mine then might look like this:

parent: 88d8efac3879874c50814cef40000000
message: password
nonce: 1

The MD5 hash of the above message is 4419e1e83f3cb7e13cfba489382fecbb; then you make the "nonce:" be 2 and try again, then 3 and try again, and so on. The next winning hash becomes e5f2ac163622785bbfef6710c0000000 with a nonce value of 338,188,800 calculated after 3 minutes 15 seconds on my laptop:

parent: 88d8efac3879874c50814cef40000000
message: password
nonce: 338188800

A script you can play with

Here is a simple Perl script I wrote for this blog post. Every programming language has an MD5 hashing library available, and this Perl script probably will work out-of-box on any Linux and Mac OS system:

#!/usr/bin/env perl

use Digest::MD5 qw(md5_hex);

my $message = "password";
my $nonce = 1;
my $difficulty = 7;  # number of zeroes you want to find

while (1) {
  my $test = $message . $nonce;
  my $md5 = md5_hex($test);

  if ($md5 =~ /0{$difficulty}$/x) {
    print("FOUND IT: nonce=$nonce hash=$md5\n");
    exit(0);
  }

  # Uncomment this if you want to see every single try:
  #print("$test: $md5\n");
  $nonce++;
}

You can play with the $difficulty value to look for more or fewer zeroes at the end, or change the $message to a different string and notice that the magic winning $nonce will be a wildly different number then. Uncomment out the print() towards the end and you can see every guess attempted and what its hash was, though this will drastically slow down the script; a modest consumer PC can crank out tens-of-millions of MD5 hashes per second but printing a line of text to your terminal window takes a drastic amount of time comparatively.

Addendum: Git is basically a blockchain

Git is a popular version control system for developers. If you've used Git, this section is for you, and if not -- no worries!

Git users may be familiar with the concept of a "revision hash" which tracks a specific change made in the history of your source code. You author a bunch of changes, run git commit and write a description about your change, and it goes into the git log along with a random hash; with that hash, a different developer could check out exactly the version of code you wrote.

The git commit history is basically a blockchain: the hash is generated by taking the changes you've prepared + your commit message describing those changes + the "parent commit" hash all together.

This is why if you go back and revise an old commit from months ago in your project, you break everybody else, because the history of "this commit came after that commit" is hashed as part of subsequent commits all the way down the chain; revising a historical commit causes every subsequent commit to be hashed differently and you'll piss off all the other developers who now have broken copies of the code history!

You could play with your git commit history by repeatedly doing "git commit --amend" and modifying your commit message, including a nonce counter from 1 to n, until the commit hash has some property you like.

% git init
Initialized empty Git repository in /home/kirsle/git/test/.git/

% echo 'Hello' > README.md
% git add .
% git commit -m "Initial commit"
% git show
commit 0bb4980598954dbb042e9d9716e3db5f81a14a38 (HEAD -> master)
Author: kirsle
Date:   Sun May 9 15:28:37 2021 -0700

    Initial commit

# Play with nonces and re-roll that commit hash...
% git commit --amend -m "Initial commit + 1"
% git show
commit e0ae47bd3599e576ff39ed5a46bbddf93a25afed (HEAD -> master)
Author: kirsle
Date:   Sun May 9 15:28:37 2021 -0700

    Initial commit + 1

# Again...
% git commit --amend -m "Initial commit + 2"
% git show
commit 343946b2fb12db5422e4efd1a7e3af1674a6718e (HEAD -> master)
Author: kirsle
Date:   Sun May 9 15:28:37 2021 -0700

    Initial commit + 2

Tags:

Comments

There are 0 comments on this page. Add yours.

Add a Comment

Used for your Gravatar and optional thread subscription. Privacy policy.
You may format your message using GitHub Flavored Markdown syntax.