Creating and Breaking MD5 Hash

Step 1: Breaking the Message

The input message is divided into fixed-size blocks, typically 512 bits (64 bytes).

Step 2: Padding the Message

  • If the message size is not a multiple of the block size, padding is added to make it fit.
  • Padding includes a “1” bit followed by “0” bits to reach the last 64 bits of the block. ( 512 – 64 bits )

What about the Last 64 bits

The last 64 bits in the MD5 hash are used to represent the original message’s length (in bits) before padding. How the Last 64 bits are filled :

  1. Determine the Message Length in Bits:
    • The length of the message “This is a longer message!” is 232 bits.
  2. Convert to Binary Representation:
    • To represent 232 in binary, we write: “11101000”
  3. Fill with Leading Zeros:
    • Since the binary representation is less than 64 bits, we need to add leading zeros to make it 64 bits.
  4. Final 64-Bit Representation (Modulo 2^64):
    • The final 64-bit binary representation of the message length is the result of taking the original length modulo 2^64: “0000000000000000000000000000000000000000000000000000000011101000”

The modulo operation effectively keeps the 64 least significant bits of the original message length, discarding any higher-order bits that won’t fit within the 64-bit representation.

The use of modulo 2^64 ensures that the MD5 algorithm can handle messages of any length, even those longer than 64 bits, and produce consistent and unique hash values for them.

A Short Example

The whole message is converted into bits. Then no of bits are calculated. then the no. it self is written in the form of bits Then zeros are added from the front till the size increases to 64bits

Example :

Hello ⇒ 0100100001100101011011000110110001101111

0100100001100101011011000110110001101111 ⇒ 40 bits

40 ⇒ 101000 bits ( Representing 40 in Binary )

Adding Zeros to 101000 ( In Front ) till total bits are 640000000000000000000000000000000000000000000000000000000011101000

Above is the last 64 bits of the block

Step 3: Initializing Internal State

  • MD5 has four 32-bit internal state variables (A, B, C, and D) initialized with specific constants.

Step 4: Processing the Blocks

  • Each block goes through a series of 64 rounds, applying operations to update the internal state.
  • Rounds involve bitwise operations (AND, OR, XOR), bit shifts, and addition modulo 2^32.

Step 5: Updating Internal State

  • The state variables (A, B, C, and D) are updated at the end of each round using the mixing function.

Step 6: Final Output

  • After processing all blocks, the final output is a 128-bit hash value.
  • The hash value is usually represented in hexadecimal format (32 characters).

Hash Collision

In computer science, a hash collision or hash clash is when two pieces of data share the same hash value.
For example, if “johnwick” and “jhonwick” have the same hash value of 123654asdf23432 then the phenomenon is called hash collision.

Pigeonhole Principle

In mathematics, the pigeonhole principle states that if n items are put into m containers, with n > m, then at least one container must contain more than one item.

💡 The principle says, if you have some holes, lets say 100 and you also have 110 pigeons and all the pigeons are in these 100 holes then at least one hole should contain more then 1 pigeon.

How a Collision is Created

💡 We need to create the same hash for different inputs in MD5 which is very hard but the process goes like this

let’s say I have a string “Hello World I am a Hacker” Now if I change the string too much the resulting hash will be different for sure thus I will try to change some characters in the string to see if the hash changes

Exploiting Collision Vulnerability

  1. Brute Force: I think you know what brute Forcing is ( Right ? )
  2. Birthday Attack: The birthday attack is a more efficient method to find collisions. It is based on the birthday paradox, which states that in a set of only 23 people, there is a 50% chance that two people share the same birthday. Similarly, with MD5’s 128-bit output, it takes approximately 2^64 computations to find a collision with a 50% probability.