Friday, December 6, 2013

A Graphical Introduction to Hash Functions with SHA-2

Bitcoin Logo
Cryptography is the glue that holds together our modern society of electronic banking, web authentication mechanisms and identity verification. It is also the basis for the radical shifts in the way we think about our money such as Bitcoin.

I have been fascinated by the concepts of cryptography lately and decided to implement a cryptographic hashing function to try and better understand them. I will present the knowledge that I have gained in an easy to understand fashion. I am by no means an expert on this subject but if you have a general interest this should be a good starting point.

I have decided to implement SHA-256 due to the widespread adoption and ubiquity of this particular function. This implementation is designed in LabVIEW which is a graphical programming language. Whether you are a seasoned programmer or just interested in cryptography, this gentle introduction will be beneficial to you.

Hashing Functions

One of the basic elements of cryptography is a hashing function. A hashing function is a mathematical operation that accepts a message as an input and produces a message summary or digest as an output. A hash function will always produce the same digest for a given message.

In order for a hashing function to be useful in cryptography it must be impossible to create an input that produces a desired output. If it is possible to manipulate the output by strategically choosing the input then it is said to have been compromised.

Hash Function
Once a hashing function has been compromised, it is considered insecure. MD5 is a popular hashing function that has been compromised.


SHA-2 is a very popular hash function designed by the US National Security Agency (NSA). It is available in a variety of lengths including 224, 256, 384 and 512. The method used to arrive at the message digest for all of these is very similar. Notably, the difference between SHA-224 and SHA-256 are the use of different initial hash values and the omission of the last 32-bits of the hash.

You will find that SHA-256 is available as sha256sum on most Linux systems.

sha256sum in a shell
SHA-256 is the hashing algorithm used behind Bitcoin, a cryptographic currency.

SHA-2 Implementation

I have implemented SHA-256 in LabVIEW, a graphical programming language. I will take you through the entire implementation from start to finish. I have used the pseudo code available on the SHA-2 Wikipedia page as a reference while writing this program. I hope that transferring it into a graphical form will help facilitate learning.

LabVIEW SHA-256 User Interface
Below is the source code to the SHA-256 program. Data flows from left to right. We take the string input from the Message textbox, convert it into a byte array, apply the SHA-256 hash to it and then output it into the Hash control.

LabVIEW SHA-256 Source Code
The red box labelled "SHA 256 U8[]" accepts a byte array and produces an array of unsigned 32-bit integers containing the hash. Below are the contents of this block. The first step is to prepare the message. Once the message is prepared, we apply the SHA-256 function to it.

SHA 256 U8[] Source Code

Message Preparation

Below is the source code to the message preparation block. The first step is to add a single 1 bit to the message. This is accomplished by appending a 0x80 byte to the message array. Next, we need to pad the message with 0x00 bytes until the message length in bits modulo 512 is equal to 448. Since I am working with bytes, I will take the message length modulo 64 and add enough padding bytes to make the message length equal modulo 64 equal to 56.

The last step is to append a 64-bit unsigned integer indicating how long the message is in bits. Follow the bottom branch of the source code to see this.

Finally, the byte array is converted to an unsigned 32-bit integer array.

Prepare Message Source Code

512-bit Chunk Hashing

The next step is to apply the hash function to the message in 512 bit chunks. This is a simple loop that works on 512 bit (16 x 32-bit) pieces of the message at a time. The up and down arrows on the edge of the loop construct indicate that the output of the previous iteration is used as the input to the next iteration. The actual SHA-256 compression function is in "SHA 256 BLK".

SHA-256 Hash Function

Block Hashing

Each iteration of the loop will execute the following SHA-256 hash block function with the output of one iteration used as the input to the next. This the most complicated part of SHA-256.

The first step is declare an array of 64 32-bit unsigned integers. This is the working register where the message block will be expanded by a variety of xor, right shift and right rotate operations.

Initialize the SHA-256 Block
The result of this will be the 512-bit block expanded into a 2048 bit space. The details of these XOR, right shift and right rotate operations can be found on the Wikipedia pseudo code.

Round Constants

SHA-2 uses a group of round constants. These are numbers chosen to exhibit specific properties to make it difficult to architect an input to yield a desired output. The challenge with round constants is that they must be chosen such that a user can be convinced that they were not chosen with an ulterior motive. An example of a ulterior motive would be selecting constants that open a back door to the designer of the hashing algorithm. These numbers are known as nothing up my sleeve numbers.

The SHA-2 series of hash functions use the square and cube roots of various small prime numbers as round constants.

Compression Loop

The next stage of the block hashing algorithm is to loop through the array produced in the previous stage and apply the round constants along with a variety of XOR and rotate right operations.

Again, I will refer you to the Wikipedia pseudo code for the details of these operations.

Compression Loop
The last step of this operation is to add the results obtained in the compression loop to the previous iteration of hash values. Some implementations use a, b, c, d, e, f, g and h variables. I prefer to work with array of 8 32-bit unsigned integers.

Finalize Hash Values


As you can see, implementing the SHA-2 hash functions is a relatively painless process. I certainly learned a lot along the way.

The next step would be to implement a bitcoin miner in LabVIEW to try and understand what is happening when a miner is executing and some of the protocols used to communicate with the mining pools.

Source Code

If you have a copy of LabVIEW, here is a link to the source code. It is saved in LabVIEW 2010 format.