I recently read an article about the Ashley Madison hack. For those of you who aren’t into cheating on your spouse, Ashley Madison is a web site where you can sign up and meet up with other lovely people in order to have an affair.
Anyway, they were recently hacked. More than 11 million passwords were obtained, and the reason why made me fall off my chair.
In short, the users’ usernames + passwords were stored in the site’s database as an MD5 hash, alongside a BCrypt hash of the user’s password.
Well, what does all this mean to you?
How password authentication works
This part might get a little hairy, but sit tight.
When you sign up for an account on a site, you choose a username and a password. The username is usually stored as plain old text in the site’s database. The password, on the other hand, is hashed before being stored.
A hash function is a mathematical algorithm that will take data of any size (like a password) and turn it into a chunk of funky-looking data of fixed size.
If the data to be hashed is 1 character long, the resulting hash will be 60 characters long.
If the data is 20 characters, the hash will be 60 characters long.
If the data is 2,000 characters long, the resulting hash will still be 60 characters long.
The hash function iterates over the data, and expands/reduces it into a “secret code” of a fixed length. For all practical purposes, the output of this type of hash function is unique to its input, and the same input always produces the same result. In other words, the hash of “cheese” is always the same. But the hash of “cheesE” (upper-case “E” at the end) will be substantially different from the hash of “cheese”. It’s the consistency of the output and the uniqueness of the results that are the key here.
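Here is a minimal sketch of those properties in Ruby. It uses SHA-256 from the standard library purely as a stand-in for "a hash function" (so the output is 64 hex characters rather than BCrypt's 60), and the helper name fingerprint is made up for this example. SHA-256 is a fast hash, so this is not how you would store passwords.

```ruby
require "digest"

# SHA-256 stands in for "a hash function" here; it is NOT a password hash.
# `fingerprint` is a hypothetical helper name used only in this sketch.
def fingerprint(data)
  Digest::SHA256.hexdigest(data)
end

short  = fingerprint("a")         # 1 character of input
medium = fingerprint("x" * 20)    # 20 characters of input
long   = fingerprint("x" * 2_000) # 2,000 characters of input

# Every digest has the same length, whatever the input size.
puts [short, medium, long].map(&:length).inspect # => [64, 64, 64]

# Same input, same output; one changed letter, a different output.
puts fingerprint("cheese") == fingerprint("cheese") # => true
puts fingerprint("cheese") == fingerprint("cheesE") # => false
```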
So, when you choose a password, this is how your password hash is generated and stored in the site’s database:
- User enters the username “bubba”
- User enters the password “cheese”
- The hash algorithm is run like so: BCrypt(“cheese”)
- The result of the BCrypt hash, which in this case is $2a$10$6aOmc.wbz4yt/czCGvFrIu6UYrngif1B1rwVmzJxV0YzyIxMi6qA., is stored as the user’s password
Thus, the web site’s database has a table of users, and there is now a new row of data that will look something like this:
Username   E-mail                           Password
--------   ------                           --------
bubba      firstname.lastname@example.org   $2a$10$6aOmc.wbz4yt/czCGvFrIu6UYrngif1B1rwVmzJxV0YzyIxMi6qA.
Ta-DA! Now you have an account, and you can log in. Note that your password is not visible in the data row above. That’s by design.
How is my password verified by the server when I log in?
Okay, so now you want to log in. Well, how on Earth does the site know if your password is correct since it’s not stored in the Users table?
When you log in, you type your username and your password. The site then takes your password and applies the same hash function again, like so:
BCrypt(“cheese”) = $2a$10$6aOmc.wbz4yt/czCGvFrIu6UYrngif1B1rwVmzJxV0YzyIxMi6qA.
Oh look! It’s the same result. All the site has to do is compare the newly generated hash of the password you typed in with the Password hash stored in the database. If the two match, it knows you’ve typed in the correct password. VOILA! You’re logged in.
Let’s see what happens if there’s a typo in your password:
BCrypt(“cheesw”) = $2a$10$i0GOBtbnozTmH2efWka2OezCsxlgaoeiEjnTYsfn6zRLTpx1XyLzy
Correct Password Hash = $2a$10$6aOmc.wbz4yt/czCGvFrIu6UYrngif1B1rwVmzJxV0YzyIxMi6qA.
Wrong Password Hash = $2a$10$i0GOBtbnozTmH2efWka2OezCsxlgaoeiEjnTYsfn6zRLTpx1XyLzy
Oops! That’s a very different hash than the one stored in the database. Wrong password!
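The sign-up and log-in steps above can be sketched roughly like this. A few loud assumptions: BCrypt is not in Ruby's standard library, so PBKDF2 from OpenSSL stands in as the slow hash; the iteration count, the in-memory USERS hash, and the method names are all invented for this example. The sketch also stores a random salt per user, which BCrypt does for you by embedding the salt in the hash string.

```ruby
require "openssl"
require "securerandom"

ITERATIONS = 100_000 # assumed work factor, chosen only for this sketch

# Derive a deliberately slow hash. PBKDF2 is a stand-in for BCrypt here.
def slow_hash(password, salt)
  OpenSSL::KDF.pbkdf2_hmac(password, salt: salt, iterations: ITERATIONS,
                           length: 32, hash: "sha256").unpack1("H*")
end

# Hypothetical in-memory "users table": username => { salt:, hash: }
USERS = {}

# Store a salt and the hash of the password, never the password itself.
def sign_up(username, password)
  salt = SecureRandom.hex(16)
  USERS[username] = { salt: salt, hash: slow_hash(password, salt) }
end

# Re-hash the typed password with the stored salt and compare the hashes.
def log_in?(username, password)
  row = USERS[username]
  return false unless row

  OpenSSL.secure_compare(row[:hash], slow_hash(password, row[:salt]))
end

sign_up("bubba", "cheese")
puts log_in?("bubba", "cheese") # => true  (correct password)
puts log_in?("bubba", "cheesw") # => false (typo)
```

Note that nothing in USERS contains the text "cheese": the site can check a password without ever keeping it.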
Why all the mathematical gymnastics?
This is where our story of the Ashley Madison hack comes into play. BCrypt is the preferred password-hashing algorithm to use specifically because it’s SLOW. Why is this important?
Well, think about what happens if someone hacks your site and downloads a copy of your database. If you store passwords in plain text, every user account is compromised immediately. If you store passwords as BCrypt hashes, then the attackers have a problem.
The hackers don’t know what the passwords are. So, they’d have to perform some kind of brute force tests, like so:
BCrypt(“bubba123”) = DOESN’T MATCH!
BCrypt(“bubba321”) = DOESN’T MATCH!
BCrypt(“ilovebubba”) = DOESN’T MATCH!
BCrypt(“bubba4prez”) = DOESN’T MATCH!
BCrypt(“opensesame”) = DOESN’T MATCH!
For each row in the users table, they would have to just loop, over and over, trying different passwords until they found one that produces the same hash that’s stored in the Password field. As you can imagine, this would take a long time. There are other details missing here, but you get the idea.
It would take an especially long time specifically because the BCrypt hashing algorithm is designed to run very slowly. The faster the algorithm, the more “tries” can be run, and the quicker the password can be hacked. Again, this process must be repeated for each user in the table, over and over, trying different random passwords…
The fact that BCrypt is slow is precisely why it is recommended over fast hash algorithms (like SHA-1, SHA-256, SHA-3, etc.) for hashing and storing critical data like passwords.
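To get a feel for the difference, here is a rough timing sketch. It is only an illustration under assumptions: PBKDF2 with 20,000 iterations stands in for a slow password hash (BCrypt is not in Ruby's standard library), MD5 plays the fast hash, and the guess count, salt, and iteration count are arbitrary values picked so the script finishes quickly. The actual numbers will vary from machine to machine.

```ruby
require "digest"
require "openssl"

GUESSES = 50 # small on purpose so the sketch finishes quickly

# Measure how long the given block takes, in seconds.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Fast hash: one MD5 per guess.
fast = seconds do
  GUESSES.times { |i| Digest::MD5.hexdigest("guess#{i}") }
end

# Slow hash: PBKDF2 as a stand-in for BCrypt, with an assumed work factor.
slow = seconds do
  GUESSES.times do |i|
    OpenSSL::KDF.pbkdf2_hmac("guess#{i}", salt: "demo-salt", iterations: 20_000,
                             length: 32, hash: "sha256")
  end
end

puts format("fast hash: %.6f s for %d guesses", fast, GUESSES)
puts format("slow hash: %.6f s for %d guesses", slow, GUESSES)
```

The slower each guess is, the fewer guesses an attacker gets per second, and that cost is multiplied across every user in the table.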
Now, in the case of the Ashley Madison debacle, the bozos who programmed the site decided to keep an MD5 hash of the username + password in the database next to the properly BCrypted password. Since the usernames were known, the hackers simply ran their password guesses against the MD5 hash instead of the BCrypt hash, and POOF! They cracked the MD5 hashed “username + password”. This was possible because the MD5 hash algorithm is very, very old and very, very weak, and MD5 hashes can be computed REALLY, REALLY fast by modern computers.
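As a sketch of why that extra MD5 column was fatal, here is a toy version of the attack. The token format (a plain MD5 of username followed by password) is an assumption for illustration and the real format may have differed. The leaked row and the wordlist are made up too.

```ruby
require "digest"

# ASSUMPTION: the token is MD5(username + password); the real format may differ.
def md5_token(username, password)
  Digest::MD5.hexdigest(username + password)
end

# Hypothetical leaked row: the attacker sees the username and the MD5 token.
leaked_username = "bubba"
leaked_token    = md5_token("bubba", "cheese")

# Hypothetical wordlist of candidate passwords.
wordlist = %w[bubba123 bubba321 ilovebubba bubba4prez opensesame cheese]

# Each guess costs one cheap MD5, so the slow BCrypt hash is never touched.
cracked = wordlist.find { |guess| md5_token(leaked_username, guess) == leaked_token }
puts cracked.inspect # => "cheese"
```

Once a guess matches the MD5 token, the attacker knows the password, and the carefully BCrypted copy sitting next to it no longer protects anything.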
That’s exactly why nobody uses MD5 hashes for things like passwords any more! They haven’t for a long, long time. In fact, the sheer idiocy displayed by Ashley Madison’s programmers here is actually mindboggling.
Frankly, there are many large web sites that do very stupid things like this. In fact, I would say that the #1 threat to your online security isn’t vulnerabilities in some protocol, but simply the fact that most programmers are lazy and/or not very bright. It’s the human element that’ll get you every time.
I will also note that MD5 is considered cryptographically broken, since practical collision attacks against it have been known for years. So, it’s not just computational speed that matters, but also the safety of the algorithm itself. Everyone and their dog has known for ages already that you just don’t use MD5 for… well, for anything, really!
Can you tell me my password?
Hopefully, that clarified things a bit. But, it also raises an interesting point: If some web site can tell you what your password is should you forget it, it’s not secure.
When properly implemented, your password is known only to you, because the password isn’t stored in the site’s database – a hash of the password is stored.
As we have seen, cracking these hashed passwords is not easy or quick. And that’s the whole point!
That’s also exactly why the “Forgot Password” feature of sites will generate a new random password and send it to you – because ONLY YOU know what your old password is. Even site admins can’t know your password without “cracking” the hash.
That’s how it should be done, anyway…
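A “Forgot Password” flow along those lines might look something like the sketch below. Same caveats as always: PBKDF2 stands in for BCrypt, and the iteration count, the row layout, and the method name reset_password! are assumptions made up for this example.

```ruby
require "openssl"
require "securerandom"

# PBKDF2 is a stand-in for BCrypt; the iteration count is an assumed value.
def slow_hash(password, salt)
  OpenSSL::KDF.pbkdf2_hmac(password, salt: salt, iterations: 100_000,
                           length: 32, hash: "sha256").unpack1("H*")
end

# Replace the stored hash with one for a fresh random password.
# Returns the new plain-text password so it can be sent to the user once.
def reset_password!(user_row)
  new_password = SecureRandom.alphanumeric(16)
  user_row[:salt] = SecureRandom.hex(16)
  user_row[:hash] = slow_hash(new_password, user_row[:salt])
  new_password
end

# Hypothetical row from the users table.
row = { username: "bubba", salt: "old-salt", hash: slow_hash("cheese", "old-salt") }

temporary = reset_password!(row)
puts temporary.length # => 16
```

The site never needs to know the old password to do this. It just overwrites the old hash with a new one.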
UPDATE: The BCrypt examples here are actually pseudocode, because BCrypt is wonky, at least in its Ruby implementation. As Leet pointed out below in the Comments, BCrypt(“blah”) outputs a different hash every time, because a fresh random salt is generated on each call and stored inside the hash string. So, what you actually need to do is something like this:
BCrypt::Password.create("text_password")  # => returns a HASH
Store HASH as the user’s hashed password in your database, or whatever. Then when they try to log in again, you do this:
BCrypt::Password.new(HASH) == "retyped_password"
The == comparison is required: BCrypt::Password.new does not check anything by itself, it simply wraps the HASH you gave it. The == then re-hashes the retyped password with the salt embedded in HASH and compares the results. Yeah, that’s really confusing, and NOT how you use other hash functions. I’ve left the article as it is for simplicity’s sake, but be advised that at least in Ruby, BCrypt works a bit differently in terms of actual technical implementation.