I was discussing passwords with someone recently and thought of a neat little hands-on exercise that shows not only how password hashing works, but shows you a real-world example of cracking a weakly hashed password just using Google.
The hands-on exercise should be easily approachable for beginners. I'll also go over a general history of passwords on the Internet -- I've been working as a web developer long enough to watch the whole transition from MD5 to bcrypt play out.
Any Unix-like environment with the
md5sum command. Most Linux distros have it
by default as part of the
coreutils package. The Windows Subsystem for Linux
Mac OS might have these built-in too. Not sure.
In your Linux environment, open the Terminal app and type this command:
echo -n 'hunter2' | md5sum -
If done correctly your terminal might look like the following:
[kirsle@caskir ~]$ echo -n 'hunter2' | md5sum - 2ab96390c7dbe3439de74d0c9b0b1767 -
Now, copy and paste that MD5 hash into Google. You will find results that say the password in that hash is "hunter2"!
You can try this for all sorts of common passwords. Even some of your own passwords may come up! If you have a really good one, though, and nobody has ever cracked it before, you won't find it on Google. But most passwords that people think are unique are already out there and have been cracked.
For an explanation of what that
echo | md5sumstuff was doing, scroll to the bottom of this article.
Note: this page is only talking about one particular hashing algorithm, MD5, that old websites used for password hashing. MD5 is widely regarded to be broken and nobody should be using it anymore -- though the key word being "should." Legacy old apps still use MD5 for password hashing.
Most websites don't store your password in plain text. When the hackers steal the database, they get ahold of usernames and hashed passwords and not the real passwords.
A hash is a one-way function that takes an input (your password) and produces a very random looking output. With the same input password, you get the same output, but if you change the input even a tiny amount, the output becomes wildly different. And you can't work backwards from the hash and derive the password.
When you sign up for the website, the website hashes your password and puts the hash in the database and discards the plain text password. When you come back later and log in with your password again, the server hashes the password you provided and then checks that the hash matches the one in the database. The server doesn't know your password but it can know that you know because the hashes match.
In the old days, a common algorithm to hash passwords was MD5.
md5("password") '5f4dcc3b5aa765d61d8327deb882cf99' md5("password1") '7c6a180b36896a0a8c02787eeafb0e4c' md5("password2") '6cb75f652a9b52798eb6cf2201057c73' md5("123456") 'e10adc3949ba59abbe56e057f20f883e' md5("correct horse battery staple 123456789!") '387951653ee212fd966b9bcdf3d49a83'
MD5 is nowadays considered broken and insecure and more sophisticated algorithms are used instead. One of the biggest problems was that MD5 is too fast, meaning you can compute billions of passwords per second even using modest hardware.
MD5 was intended for hashing data for file integrity purposes: I send you a PDF document and also the MD5 sum of the document. When you download it, you compute your own MD5 hash and see that it matches the sum I sent you. For passwords you want something slower, which I'll get to below.
You may have noticed that every MD5 hash is exactly the same length regardless of what the password was. Whether it's a hash of a short password or the hash of the collective work of Shakespeare, you will get a 32 character random-looking string. Each algorithm has a different length output, but it is consistent within the algorithm and MD5 is always 32 hexadecimal characters when printed as text.
There is the minute chance that two completely different inputs would produce exactly the same hash. Effectively, this would mean two different passwords would both log you in to an account: the user's actual password and another that happens to collide with the same hash. The server doesn't know the real password, it just hashes what you provided.
Part of the strength of a hashing algorithm is in how resistant it is to collisions. More on that later.
MD5 hashes weren't always fast to compute -- it's just that Moore's Law has caught up and computers have gotten fast. So back in the day you would use a Rainbow Table to speed up your hacking.
Suppose you're a hacker and you've just stolen a database, and the website used MD5 hashes for the passwords. This was a common practice for websites to do.
You could brute force guess each password, but that would take a long time. Why not just use a pre-computed database of all the passwords and their hashes already figured out? This is what a rainbow table was. It looked like this, but for billions of passwords:
21232f297a57a5a743894a0e4a801fc3 admin 827ccb0eea8a706c4c34a16891f84e7b 12345 5f4dcc3b5aa765d61d8327deb882cf99 password 7c6a180b36896a0a8c02787eeafb0e4c password1 365d38c60c4e98ca5ca6dbc02d396e53 password12345
The rainbow table would be several gigabytes in size and include every possible alphanumeric password under 10 characters, or whatever, or they would be a list of the top million most common passwords (found from other leaks in the past).
So when you're combing through your hacked database and see the MD5 hash of
4d93a909b13445dbf6db5d4461912ac9 you can look in your rainbow table and see
that the password was
snowflake42 and you're trading disk space for time.
Nowadays MD5 is so fast you don't need rainbow tables.
So storing a plain MD5 hash in the database was found to be a Bad Idea because rainbow tables became a thing and password cracking was getting easier.
"Salting" a password means the website adds some extra text to the user's password, before hashing it, and then the website has to remember both the hash and the salt it added.
password = "hunter2" # the user's password salt = "5e821c" # the salt for the password salted_password = password + salt # "hunter25e821c" hashed_password = md5( salted_password )
The salt would immediately invalidate the rainbow tables because now all of those common passwords in the table aren't quite the same as what gets hashed.
Some websites would have a global salt that gets added to every single password. This was insecure because, if your database and salt do get leaked, a hacker can just spend some time creating his own Rainbow Table specifically for your salt and then start cracking the passwords as normal.
The better solution was to use a per-password salt that's different every time.
So your database would store columns for
salt so that the server could always recompute the same hash again to verify
Even this isn't good for MD5 anymore because MD5 is just too fast.
MD5 was also found to have collisions. There is a GIF image that displays its own MD5 hash -- think about it for a minute. If the data of the GIF changes, its MD5 hash is 'randomized', so if you edit the GIF to display its current hash -- its hash will change. But this GIF has the same MD5 sum (as a file) as what it displays in pixels inside its data.
As weaknesses in MD5 started to grow, developers would start switching to other hashing algorithms like SHA1 or SHA-256. But these algorithms are all in the same family as MD5: the fast kind intended for data integrity and not password hashing. They all suffer the same problems with rainbow tables and sheer speed.
Collisions have been found in SHA-1 but so far SHA-256 and SHA-512 should be resistant for the foreseeable future. Some web developers just chase the SHA hashes down the line, but really you should not use these hashes for passwords. Their sheer speed is such a weakness alone.
Nowadays the recommended practice is to use a cryptographic hashing function like bcrypt or Argon2. I personally go for bcrypt.
A cryptographic hashing function is deliberately designed to be slow. "Slow" here of course is a relative term: if your hash takes 10 milliseconds to compute, this won't be noticeable to a user but would be a death blow to a hacker who hopes to crack the database. With each password taking milliseconds to hash instead of picoseconds (billions per second) it takes drastically longer to break a password by brute force but hardly an inconvenience for the website.
How they work is the website configures a "cost factor" as part of the hash. The higher this number, the slower it takes to compute the hash. So as time goes on and hardware gets better, you can increase this cost factor in your code and any new passwords will compute a stronger and slower hash.
Under the hood what this basically is doing is: running a fast hash function like SHA-512 a whole bunch of times (related to the cost factor). So one SHA hash may be fast, but 100,000,000 iterations of hashing to end up on the final hash? Significantly slower per password. This is massively simplified but it provides a way to ramp up security to keep up with hardware advances.
Salted passwords are used in these, too, but they are automatic. When you have bcrypt hash a new password to save in the database, it will produce a hash that encodes the cost factor, random salt, and the hash itself. On future checks to see if the password is correct, bcrypt is able to verify using the salt and cost factor stored with the hash in your database.
Since the cost factor is stored in the hash, too, increasing it over time will ramp up security for new passwords (and when users change theirs) while still allowing older passwords to log in.
That about catches us up to the present. Until collisions are found in algorithms like bcrypt, they should serve us well for the foreseeable future. Always Google for the latest best practices when writing new code!
Footnote: Explanation of the shell command from the exercise.
The echo command just prints back what you gave it. If you ran
echo hunter2by itself it would print
hunter2back instead of the MD5 hash. I put the password in 'single quotes' so that, if you need to include special characters in it, the command line shell won't be confused.
echoprints a line break at the end, so that when your command prompt returns, it doesn't collide with the output from
echo. But the
-noption suppresses the newline character:
hunter2\n. Example:[user@localhost ~]$ echo hello hello [user@localhost ~]$ echo -n hello hello[user@localhost ~]$ _
Notice that the shell prompt is on the same line. We need to suppress the
\ncharacter so we take a hash of
hunter2\nmd5("hunter2") '2ab96390c7dbe3439de74d0c9b0b1767' md5("hunter2\n") '6a0f0731d84afa4082031e3a72354991'
md5sum is a command that computes the MD5 hash of a file. Normally you run it against a file, like
md5sum Fedora-28.iso. Using
-as the file tells md5sum to read from the program's input instead of a file on disk.
Finally, the output of
echois "piped" (the
|symbol) as the input for
md5sum -, so that md5sum produces the hash of the password you echo'd. A pipe redirects the output of one program away from the terminal and into another program as its input data instead.
There are 2 comments on this page. Add yours.
Thx for this infos. was looking for some basics and got to your site. Thx dude!
Awesome introduction to password hashing! I'm getting Solaris installed so I can start working on this!