For some reason a couple of days ago I got hooked up with password strength stuff.
First, a disclaimer: I'm no security or math expert. There's some handwaving here, and it's been too many years since I studied this, but I hope I didn't make any gross conceptual errors. If I did, please tell me.
Now, if you look around you'll find lots of password strength checkers, but most of them work with a set of rules, each adding a fixed number of 'points' for password length, using numbers, symbols, etc. That's good enough for many cases, but we can do better.
The better way involves calculating the entropy of the password. In a nutshell, information entropy is a measure of unpredictability/uncertainty.
For example, random bytes are unpredictable. Truly random bytes (i.e. not pseudo-random) would make terrific passwords if it weren't because nobody can remember them. "902c5cce62e11db18264" (10 random bytes represented in hex, courtesy of random.org) is a pretty good password. Its entropy is high. But it's also very hard to remember, so it's not very practical.
On the other hand, there's plain text. Text is much more memorable, but it's also more predictable, so plain text passwords are usually easily guessable. Unless they're long enough, but then they aren't memorable.
So there's a tension between password strength and usefulness, which prompts some to claim that passwords are broken, and gurus to refute that. Most people pick trivial memorable passwords, while security experts recommend creating strong passwords and writing them down. One way to do that would be using password managers.
Anyway, back to password strength checkers. One of the best I found is this one in Javascript. It checks a dictionary for common words (in English), and also calculates entropy using a table of letter pair frequencies in English. The character set is semi-dynamic depending on the input password. It's pretty sophisticated.
A cheaper way (for my little brain that is, not necessarily for the computer) to estimate entropy indirectly is by using compression. Entropy and compression rate are directly related: the higher the entropy in a piece of data, the harder it is to compress it, so the resulting compressed data will be longer. (EDIT: turns out this is theoretically incorrect, see discussion below)
This of course relies on having built-in compression in your platform, which Javascript on browsers has not, but in .NET we can easily write (F#):
open System.IO open System.IO.Compression let compressedLength (s: string) = use buffer = new MemoryStream() use comp = new DeflateStream(buffer, CompressionMode.Compress) use w = new StreamWriter(comp) w.Write(s) w.Flush() buffer.Length
Let's see how this function behaves for various potential passwords:
["a"; "god"; "123456"; "password"; "password1"; "password12"; "password1!"; "q!1[Ba5"; "902c5cce62e11db18264"; "iloveyou"; "ILoveYou"; "qwerty123456"; "q1w2e3r4t5y6"; "this is a good password"; "12345678"; "111111"; "letmein"; "qwerty"] |> List.map (fun p -> p,compressedLength p) |> List.sortBy snd
Password | Compressed length |
---|---|
a | 98 |
god | 100 |
111111 | 100 |
123456 | 104 |
password | 104 |
iloveyou | 104 |
letmein | 104 |
qwerty | 104 |
password1 | 106 |
password12 | 106 |
q!1[Ba5 | 106 |
ILoveYou | 106 |
12345678 | 106 |
password1! | 108 |
qwerty123456 | 110 |
q1w2e3r4t5y6 | 110 |
this is a good password | 116 |
902c5cce62e11db18264 | 118 |
Due to header overhead, even a single character compresses to 98 bytes. Still, the results reflect quite nicely most of the common recommendations for passwords: make them as long as possible; use letters, numbers and symbols; mix upper and lower case; don't repeat characters.
I wouldn't accept a password that 'scores' less than 106. Requiring 108+ is safer but might piss off some users.
Note that this is highly platform-dependent. For example, on Python:
import zlib p = ["a", "god", "123456", "password", "password1", "password12", "password1!", "q!1[Ba5", "902c5cce62e11db18264", "iloveyou", "ILoveYou", "qwerty123456", "q1w2e3r4t5y6", "this is a good password", "12345678", "111111", "letmein", "qwerty"] l = [(w,len(zlib.compress(w,1))) for w in p] sorted(l, key=lambda a:a[1])
Password | Compressed length |
---|---|
a | 9 |
god | 11 |
111111 | 11 |
123456 | 14 |
qwerty | 14 |
q!1[Ba5 | 15 |
letmein | 15 |
password | 16 |
iloveyou | 16 |
ILoveYou | 16 |
12345678 | 16 |
password1 | 17 |
password12 | 18 |
password1! | 18 |
qwerty123456 | 20 |
q1w2e3r4t5y6 | 20 |
902c5cce62e11db18264 | 28 |
this is a good password | 29 |
And there could be differences even between .NET versions. It would be safer to compare against some reference values instead of comparing with a 'magic' value.
Of course, entropy is but one dimension of password strength that only considers brute-force attacks, you still have to couple this with a dictionary check to filter out common passwords. And some even go as far as saying that password entropy is rarely relevant any more!