Hashing is for more than just potatoes. So what does it mean to hash something in the digital world?
What is a Cryptographic Hash?
When you hear the term hashing in the digital world, it’s usually referring to a cryptographic hash. This is essentially the “fingerprint” of some data. A hash is a string of random-looking characters that, for practical purposes, uniquely identifies the data in question, much like your fingerprint identifies you. You can hash any data, whether it’s a file (like a music MP3 or spreadsheet) or just a string of characters (like a password). You find the hash by running the data through a hash generator. Every time you hash the same data, you will get the exact same hash value as a result.
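To make that last point concrete, here’s a minimal sketch in Python using its built-in hashlib module. Python and MD5 are my choices for illustration, and the input strings are made up. It hashes the same data twice, then hashes slightly different data once.

```python
import hashlib

# Made-up input: any bytes would do (a password, a file's contents, ...).
data = b"hello"

# Hash the same data twice.
first_hash = hashlib.md5(data).hexdigest()
second_hash = hashlib.md5(data).hexdigest()

# Hash different data once.
other_hash = hashlib.md5(b"hello!").hexdigest()

print(first_hash)                 # 5d41402abc4b2a76b9719d911017c592
print(first_hash == second_hash)  # True: same data, same hash
print(first_hash == other_hash)   # False: different data, different hash
```

Run it as often as you like: the output should not change from one run to the next.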
Examples of Hashing
I’m using an MD5 hash generator for these examples. I hashed several types of data: some text, a document, and an MP3 file. Notice that the data are all different sizes but the hashes are always the same length.
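Here’s a minimal sketch of the same kind of experiment in Python (my choice of tool). The three inputs are made-up byte strings standing in for some text, a document, and an MP3 file; they are not real files, so treat the output as an illustration only.

```python
import hashlib

# Made-up stand-ins of very different sizes (not real files).
samples = {
    "some text": b"Hashing is for more than just potatoes.",
    "a document": b"All work and no play makes Jack a dull boy.\n" * 500,
    "an MP3 file": bytes(range(256)) * 20_000,  # about 5 MB of bytes
}

for label, data in samples.items():
    md5_hash = hashlib.md5(data).hexdigest()
    print(f"{label:12} {len(data):>9} bytes -> {md5_hash} "
          f"({len(md5_hash)} characters)")
```

Whatever the size of the input, each line should end with a 32-character hash.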
What is Hashing Used for?
Say I downloaded a file and I want to verify that it hasn’t been corrupted or infected with a virus. First I would generate the hash of the file I downloaded, then I would compare that hash against the one provided by the site where the file came from. If they match, then the file I received has not been altered. (Most sites do not provide hashes for their file downloads.)
If the hashes don’t match, then it’s been altered somehow. It could have been corrupted, infected, or changed some other way. The file may look the same, taste the same, work the same, etc. But if even one teensy tiny bit of data has been changed, the hashes will not match.
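Here’s a rough sketch of that check in Python. The file, its contents, and the “published” hash are all placeholders I invented so the example can run on its own; with a real download you would point hash_file at your file and paste in the hash from the site. I used SHA-256 here, which is my choice rather than anything the article prescribes, so swap in whichever algorithm the site actually publishes.

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files never have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Stand-in for a download: a temporary file with made-up contents.
original = b"pretend this is the installer you downloaded\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as download:
    download.write(original)
    path = download.name

# Stand-in for the hash the site would publish for the untouched file.
published_hash = hashlib.sha256(original).hexdigest()

matches_before = hash_file(path) == published_hash
print(matches_before)  # True: the file has not been altered

# Flip a single bit in the first byte, then check again.
tampered = bytes([original[0] ^ 0b00000001]) + original[1:]
with open(path, "wb") as file:
    file.write(tampered)

matches_after = hash_file(path) == published_hash
print(matches_after)  # False: one changed bit was enough

os.remove(path)
```

Reading the file in chunks is a design choice for big downloads; for small files, hashing the whole thing in one go would give the same result.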
There are two ways passwords are typically stored on a computer or website. The first is in “plain text.” If a crook steals a plain-text database of passwords, they will be able to see the passwords in the clear. This is the wrong way to store passwords.
The second, and the best, way to store passwords is to hash them first. When you create a password on a secure system, it hashes the password before it’s stored. It does not store your actual password. It stores the hash of the password and forgets what you actually typed. Then the next time you type in your password, it hashes what you type in and checks it against the stored hash. If the hashes match, it lets you in. Your actual password is never saved on the computer or website.
This is valuable because if the hashed database is stolen, it cannot be read by the bad guys. Instead of seeing a list of passwords, they see a list of hashes. This is useless to them unless they can figure out how to reverse engineer the hashes to get the passwords. Easier said than done if it’s been hashed correctly and if the passwords are strong enough.
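Here’s a minimal sketch of that store-then-check flow in Python. The article doesn’t say what “hashed correctly” involves, so the random salt, PBKDF2, SHA-256, and the iteration count below are my assumptions about one reasonable reading, not the author’s recipe. The passwords are made up too.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password, salt=None):
    """Return (salt, hash). Only these two are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    password_hash = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, password_hash


def check_password(attempt, salt, stored_hash):
    """Hash the attempt the same way and compare it with the stored hash."""
    _, attempt_hash = hash_password(attempt, salt)
    return hmac.compare_digest(attempt_hash, stored_hash)


# Sign-up: hash the password, store the result, forget what was typed.
salt, stored_hash = hash_password("mashed-potatoes-42")

# Log-in: hash whatever is typed and check it against the stored hash.
print(check_password("mashed-potatoes-42", salt, stored_hash))  # True
print(check_password("french-fries-42", salt, stored_hash))     # False
```

The salt is just random data mixed in before hashing, so that two people with the same password don’t end up with the same stored hash.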
Hashing can speed up the process of searching through a database. Say that we’re storing a long list of names in a table. We need to find out if a certain name is in that list. Well, the computer can do a search for that name, but that might be a long process because it has to match a large string of characters against every name on the list.
We can significantly shorten that time by creating a hash for every name on the list. Comparing hashes is quicker than comparing names as long as the hash is shorter than the average name length, and the hash can also tell the computer where in the table to look, so it doesn’t have to check every entry. The computer can search the hash table instead of the actual names and find out more quickly if that name exists in the database.
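Here’s a toy sketch of the idea in Python. The names, the number of buckets, and the use of MD5 to pick a bucket are all my own choices for illustration; real hash tables (Python’s own dict and set included) use their own built-in hash functions rather than MD5. The only point is to show how a hash lets the computer skip most of the list.

```python
import hashlib

NUM_BUCKETS = 8  # assumed table size, kept tiny so the output is readable


def bucket_for(name):
    """Turn a name into a bucket number using its MD5 hash."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS


def build_table(names):
    """Store each name in the bucket its hash points to."""
    table = [[] for _ in range(NUM_BUCKETS)]
    for name in names:
        table[bucket_for(name)].append(name)
    return table


def contains(table, name):
    """Look only in the one bucket where the name would have been stored."""
    return name in table[bucket_for(name)]


# Made-up list of names.
names = ["Russet", "Yukon Gold", "Maris Piper", "King Edward", "Desiree",
         "Kennebec", "Purple Majesty", "Fingerling", "Red Bliss", "Charlotte"]
table = build_table(names)

print(contains(table, "Maris Piper"))   # True
print(contains(table, "Sweet Potato"))  # False
print(len(table[bucket_for("Maris Piper")]), "of", len(names), "names checked")
```

The last line shows how many names were actually compared for one lookup; the exact number depends on how the hashes happen to spread the names across the buckets.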
How is Hashing Different from Encryption?
Encryption is a two-way function. Data is encrypted with the purpose of being decrypted at a later time. This is the only good way to store or move data in a secure fashion.
Hashing, however, is never meant to be reversed. It’s not meant to be a secure way to store or move data, but is purely used as an easy way to compare two blobs of data.
Also, hashing will always produce a fixed-length value. Take the example of MD5 hashing above. The blobs of data are different sizes to begin with, but the resulting hashes are exactly the same length. Encryption, on the other hand, produces ciphertext whose size depends directly on the size of the original data.
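To see both differences side by side, here’s a small Python sketch. The article doesn’t name an encryption scheme, so I picked a one-time pad (XOR with a random key as long as the message) purely because it fits in a few lines of standard-library code; treat it as a stand-in, not as advice on how to encrypt real data. The messages are made up.

```python
import hashlib
import secrets


def otp_encrypt(plaintext):
    """Encrypt with a one-time pad: XOR every byte with a random key byte."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def otp_decrypt(key, ciphertext):
    """XOR with the same key again to get the original data back."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


for message in (b"potato", b"potato " * 100):
    key, ciphertext = otp_encrypt(message)
    md5_hash = hashlib.md5(message).hexdigest()

    # Two-way: with the key, the ciphertext turns back into the message.
    assert otp_decrypt(key, ciphertext) == message

    print(f"{len(message)} bytes in, {len(ciphertext)} bytes of ciphertext, "
          f"{len(md5_hash)}-character hash")

# 6 bytes in, 6 bytes of ciphertext, 32-character hash
# 700 bytes in, 700 bytes of ciphertext, 32-character hash
```

There is no otp_decrypt equivalent for the hash: nothing in those 32 characters lets you rebuild the 700 bytes that went in.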
Popular Cryptographic Hash Functions
The Message Digest 5 (MD5) algorithm produces hashes that are 128 bits in length, expressed as 32 hexadecimal characters. It was introduced in 1991.
The Secure Hash Algorithm (SHA) comes in several flavors. The most often used for common purposes today are SHA-1 and SHA-256, which produce 160- and 256-bit hashes respectively (expressed as 40 and 64 hexadecimal characters).
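If you want to check those numbers yourself, here’s a quick Python sketch using the built-in hashlib module (my choice of tool). The input is arbitrary, since the lengths don’t depend on it.

```python
import hashlib

data = b"abc"  # arbitrary input; the lengths below do not depend on it

for name in ("md5", "sha1", "sha256"):
    hasher = hashlib.new(name, data)
    bits = hasher.digest_size * 8
    hex_characters = len(hasher.hexdigest())
    print(f"{name}: {bits} bits, {hex_characters} hexadecimal characters")

# md5: 128 bits, 32 hexadecimal characters
# sha1: 160 bits, 40 hexadecimal characters
# sha256: 256 bits, 64 hexadecimal characters
```

Each hexadecimal character carries 4 bits, which is why the character count is always the bit count divided by 4.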
Find a full list of cryptographic hashes here:
The Problem of Collisions
You may have noticed a problem inherent with hashing. Since a hash function produces a fixed-length value, there is a finite number of possible hashes for each type of algorithm. This makes collisions possible. A collision is when two different blobs of data produce the exact same hash. It’s extremely rare for this to happen, but collisions have been reported. As a result, some older hashing functions have been deemed unworthy to be used for secure applications.
Naturally, the longer the hash value, the less likely a collision will happen. For instance, a function that creates a 256-bit hash (like SHA-256) will have fewer collisions than one that produces a 128-bit hash (like MD5) because there are more possible hash values when you have more bits.
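Nobody is going to stumble on a full MD5 or SHA-256 collision with a simple loop, so this Python sketch cheats: it keeps only the first few bits of an MD5 hash (my own trick for illustration, not a real algorithm) and counts how many inputs it takes before two of them share a value. The inputs are just the numbers 0, 1, 2 and so on, written as text.

```python
import hashlib


def truncated_md5(data, bits):
    """Keep only the first `bits` bits of the MD5 hash, as an integer."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)


def find_collision(bits):
    """Hash b"0", b"1", b"2", ... until two inputs share a shortened hash."""
    seen = {}  # shortened hash -> the input that produced it
    counter = 0
    while True:
        data = str(counter).encode("ascii")
        short_hash = truncated_md5(data, bits)
        if short_hash in seen:
            return seen[short_hash], data, counter + 1
        seen[short_hash] = data
        counter += 1


for bits in (8, 16, 24):
    first, second, inputs_tried = find_collision(bits)
    print(f"{bits}-bit hash: {first!r} and {second!r} collide "
          f"after {inputs_tried} inputs")

# Possible hash values for the real thing:
print(f"128 bits: about {2 ** 128:.2e} values")  # about 3.40e+38
print(f"256 bits: about {2 ** 256:.2e} values")  # about 1.16e+77
```

With only a handful of bits there are so few possible values that a collision has to turn up after at most 2**bits + 1 inputs, and usually far sooner. Every extra bit doubles the number of possible values, which is why a full 128-bit or 256-bit collision doesn’t just fall out of a loop like this one.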