
MD5 and shorter hashing techniques

I needed to create a link in an e-mail with a unique identifier appended to it.

I did not want to use a numeric ID, as it would have been too easy to “play” with, possibly leaking information further down the line.

Instead, I wanted a random-looking string that isn't easy to tamper with while still being “roomy” enough for millions of valid hashes with an even vaster empty space between them; collision resistance is the key here. Another objective is to keep it short, maybe even feasible to type in by hand if no link handler is registered in the e-mail application.

What I was looking for is something similar to YouTube's video ID, e.g. https://www.youtube.com/watch?v=WhoPPnDiY5c

MD5 rebase

One approach is to create a unique ID, compute its MD5 value, and then rebase that value from hexadecimal (base16) to base64.

In PHP, that's rather easy to do: all one has to do is use pack and base64_encode. In this example, the unique ID is created by concatenating a numeric ID and a passphrase.

echo md5('123456secret');
992f3c4e7ec9a0e96173284c816612bd

To rebase the MD5 sum from hexadecimal (base16) to base64, making it shorter (24 characters instead of 32) and less recognizable as MD5:

echo base64_encode(pack('H*', md5('123456secret')));
mS88Tn7JoOlhcyhMgWYSvQ==
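
Base64 output can contain + and / and ends in = padding, none of which sit well in a link. Here is a minimal sketch of a URL-safe variant, assuming the common convention of swapping those two characters for - and _ and dropping the padding. The function name url_safe_md5 is made up for this example.

```php
<?php
// Hypothetical helper: unpadded, URL-safe base64 of the raw MD5 digest.
function url_safe_md5(string $input): string
{
    $raw = md5($input, true);        // 16 raw bytes instead of 32 hex characters
    $b64 = base64_encode($raw);      // 24 characters, always ending in "=="
    // swap the two URL-unfriendly characters and drop the padding
    return rtrim(strtr($b64, '+/', '-_'), '=');
}

echo url_safe_md5('123456secret'), "\n";
```

The result is always 22 characters long. For the input used above it should be the same string as before, minus the trailing ==, since that particular hash contains neither + nor /.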

To convert the hash back to its original hexadecimal representation, use

print_r(unpack('H*', base64_decode('mS88Tn7JoOlhcyhMgWYSvQ==')));
Array
(
    [1] => 992f3c4e7ec9a0e96173284c816612bd
)
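
Since the token usually arrives from a link that anyone can edit, a slightly more defensive decode may be worth having. The sketch below uses strict base64 decoding, checks that exactly 16 bytes come back, and uses bin2hex in place of unpack so the result is a plain string. The helper name token_to_md5_hex is hypothetical.

```php
<?php
// Hypothetical helper: turn a base64 token back into the 32-character MD5 hex string.
// Returns null when the token is not valid base64 or does not decode to 16 bytes.
function token_to_md5_hex(string $token): ?string
{
    $raw = base64_decode($token, true);   // strict mode: false on characters outside the alphabet
    if ($raw === false || strlen($raw) !== 16) {
        return null;
    }
    return bin2hex($raw);
}

var_dump(token_to_md5_hex('mS88Tn7JoOlhcyhMgWYSvQ==')); // string(32) "992f3c4e7ec9a0e96173284c816612bd"
var_dump(token_to_md5_hex('not a token'));              // NULL
```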

ID encryption

An alternative might be to encrypt a numeric ID and rebase the result to anything from base16 to base64. An example might follow.
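
One way this could look, as a rough sketch and not a vetted design: pack the ID into 8 bytes, encrypt that single block with AES-128 through PHP's OpenSSL extension, and encode the 16-byte result as unpadded URL-safe base64. The cipher choice (ECB on one block, no authentication), the key derivation from a passphrase via SHA-256, and the function names encrypt_id and decrypt_id are all assumptions made for illustration.

```php
<?php
// Sketch only: single-block AES-128-ECB without authentication.
// Requires the OpenSSL extension and a 64-bit PHP build (pack format 'J').

function derive_key(string $passphrase): string
{
    // 16-byte key taken from a SHA-256 of the passphrase (an assumption, not a proper KDF)
    return substr(hash('sha256', $passphrase, true), 0, 16);
}

function encrypt_id(int $id, string $passphrase): string
{
    $plain = pack('J', $id);   // 8 bytes, big-endian
    $raw = openssl_encrypt($plain, 'aes-128-ecb', derive_key($passphrase), OPENSSL_RAW_DATA);
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');   // 22 characters
}

function decrypt_id(string $token, string $passphrase): ?int
{
    $raw = base64_decode(strtr($token, '-_', '+/'), true);
    if ($raw === false || strlen($raw) !== 16) {
        return null;           // not a well-formed token
    }
    $plain = openssl_decrypt($raw, 'aes-128-ecb', derive_key($passphrase), OPENSSL_RAW_DATA);
    if ($plain === false || strlen($plain) !== 8) {
        return null;           // wrong key or tampered token
    }
    return unpack('J', $plain)[1];
}

$token = encrypt_id(123456, 'secret');
echo $token, "\n";
var_dump(decrypt_id($token, 'secret'));   // int(123456)
```

Unlike the MD5 approach, the ID can be recovered from the token without a database lookup, at the price of having to keep the key secret.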

UPDATE (2018-04-06)

After revisiting this article, I found a very detailed post about YouTube video IDs here: https://webapps.stackexchange.com/questions/54443/format-for-id-of-youtube-video

UPDATE (2019-02-10)

Tom wrote:

[…] After scouring the web and Stack Overflow it was by far the simplest solution I could find. One thing I might suggest is working from the raw md5 output (rather than re-packing it) and removing undesirable characters. For example, here is the function I came up with for my needs:

function tinymd5($str, $length) { // convert md5 to ~base(64 - count($remove)) and truncate to $length
    // remove vowels to prevent undesirable words and similarly + / which may be problematic
    $remove = array('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U', '+', '/');
    // base64-encode the raw 16-byte digest, then strip the unwanted characters
    $encoded = str_replace($remove, '', base64_encode(md5($str, true)));
    // truncate to $length and right-pad with '=' if the stripped string is shorter
    return str_pad(substr($encoded, 0, $length), $length, '=');
}
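
Truncating the token trades collision resistance for brevity, so it may help to estimate how short is too short. The sketch below uses the usual birthday-bound approximation and assumes each character is drawn uniformly from the 52 symbols left after the removals above, which is only roughly true for a filtered base64 string. The function name collision_probability is made up for this example.

```php
<?php
// Birthday-bound approximation: chance that at least two of $count tokens collide.
// Assumes uniformly distributed characters over $alphabetSize symbols (an approximation).
function collision_probability(int $count, int $length, int $alphabetSize = 52): float
{
    $space = pow($alphabetSize, $length);      // number of possible tokens
    $pairs = $count * ($count - 1) / 2;        // number of distinct token pairs
    return -expm1(-$pairs / $space);           // 1 - exp(-pairs / space)
}

// collision chance for one million tokens at various lengths
foreach ([6, 8, 10, 12] as $length) {
    printf("%2d chars: %.6f\n", $length, collision_probability(1000000, $length));
}
```

Under these assumptions, one million 8-character tokens collide somewhere with a probability of a little under 1%, and each additional character cuts that by a factor of about 52.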