A crash course on hash algorithms

A hash is a one-way function: given only a hash value, there is no practical way to work backwards to an input that produced it. The hash value has a fixed length defined by the algorithm, whereas the input can be any length. You can create a SHA-256 hash value, 256 bits long, for a single letter or for the entire works of Shakespeare.

Some hash examples using SHA-256 include:

The ASCII letter a (lowercase):

ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb

The ASCII letter A (uppercase):

559aead08264d5795d3909718cdd05abd49572e84fe55590eef31a88a08fdffd

Shakespeare's The Tragedy of Titus Andronicus (entire play):

02b8d381c9e39d6189efbc9a42511bbcb2d423803bb86c28ae248e31918c3b9a

Shakespeare's The Tragedy of Titus Andronicus but with a single word misspelled:

4487eba46b2327cfb59622a6b8984a74f1e1734285e4f8093fe242c885b4aadb
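The two single-letter values above can be reproduced with Python's standard hashlib module. This is a minimal sketch; the helper name sha256_hex is ours, chosen for illustration. The two play digests are not reproduced here because they depend on the exact text file that was hashed.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as 64 hexadecimal characters."""
    # ASCII letters encode to the same single byte in UTF-8.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(sha256_hex("a"))  # lowercase a
print(sha256_hex("A"))  # uppercase A
```

Running it should print the same two values listed for the letters a and A.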

With these examples, you can see the fundamental nature of a hash algorithm at work. The output is fixed length; in these examples, the output is 64 hexadecimal characters long. (A single hexadecimal character represents 4 bits; 256 divided by 4 yields 64 characters.) A SHA-256 hash is always 64 hexadecimal characters, no matter the length of the input, even if the length is zero! Yes, there's even a hash value for literally nothing. It's 64 characters even for massive inputs, like Shakespeare's Titus Andronicus, which is 1.19 million characters. When it comes to the security application of hashing, one critical feature is that misspelling a single word in the play radically changed the hash value. This is due to a principle in cryptography called the avalanche effect: a small change to the input changes the output so thoroughly that the two hashes look unrelated. It's a core property of secure hash algorithms.
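Both properties are easy to check for yourself. The sketch below uses Python's hashlib; the helper names and the one-million-byte test input are our own choices for illustration. It first confirms that the hex digest is 64 characters for an empty input, a single letter, and a large input. It then counts how many of the 256 output bits differ between the hashes of a and A, two inputs that differ in exactly one bit.

```python
import hashlib


def sha256_digest(data: bytes) -> bytes:
    """Return the raw 32-byte (256-bit) SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def differing_bits(x: bytes, y: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


# Fixed length: the hex digest is 64 characters whatever the input size.
for data in (b"", b"a", b"a" * 1_000_000):
    print(len(data), "input bytes ->", len(hashlib.sha256(data).hexdigest()), "hex characters")

# Avalanche effect: b"a" (0x61) and b"A" (0x41) differ in a single input bit.
changed = differing_bits(sha256_digest(b"a"), sha256_digest(b"A"))
print(changed, "of 256 output bits differ")
```

For a well-behaved hash, you would expect roughly half of the output bits to flip, which is why the two hashes give no hint that their inputs were nearly identical.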

Let's suppose that a bad guy has captured a hash representing my password. Thanks to the avalanche effect, he has no way of knowing, by merely hashing his guesses, whether he is getting close to the actual value. He could be a single character off and the hash would look radically different. I know what the hacker in you is thinking, though: "mathematically speaking, as long as the fixed-length one-way function accepts inputs of arbitrary length, there will always be some pair of inputs that hash to the same output." Brilliant point, and you're right: there are more possible inputs than possible outputs, so some inputs must share an output. This is called a collision. A primary goal of any secure hashing algorithm design is to make collisions impractical to find. You can't eliminate them; you can only make them so hard to find that an attacker may as well just try to find the target input.
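To see a collision without waiting longer than the age of the universe, you can deliberately weaken the hash. The sketch below is a toy, not an attack on SHA-256: it keeps only the first 2 bytes (16 bits) of each digest, an assumption made purely so that a collision is cheap to find. With only 65,536 possible outputs, hashing 65,537 distinct inputs is guaranteed to produce a repeat, and in practice one turns up much sooner. The function names are ours.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def truncated_sha256(data: bytes, n_bytes: int = 2) -> bytes:
    """First n_bytes of the SHA-256 digest: a deliberately weakened toy hash."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> Tuple[bytes, bytes]:
    """Hash b"0", b"1", b"2", ... until two inputs share a truncated digest."""
    seen: Dict[bytes, bytes] = {}  # truncated digest -> first input that produced it
    for i in count():
        msg = str(i).encode("ascii")
        tag = truncated_sha256(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg


first, second = find_collision()
print(first, "and", second, "both hash to", truncated_sha256(first).hex())
```

Every additional byte you keep multiplies the number of possible outputs by 256, so the same search against a longer truncation takes correspondingly more work. That scaling is what puts the full 256-bit output out of practical reach.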

Now, it's best to not go too deep into the rabbit hole of hashing when discussing Windows security, because in classic Microsoft form, they just had to do things their way. A Windows hash, from any point in the history of the operating system, is no ordinary hash.