Data Hashing

A cryptographic hash function is an algorithm that takes input data of arbitrary length and produces a fixed-length output called a hash value, or simply a hash. The hash can be stored instead of the data itself. Anyone who reads the stored hash cannot use it to recover the data; they can only check whether a candidate value produces the same hash. Common algorithms include SHA-256 (from the Secure Hash Algorithm 2 family) and MD5 (Message Digest 5), although MD5 is no longer considered secure.
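As a minimal sketch using Python's standard `hashlib` module, the following hashes a one-byte input and a 2,000-byte input with SHA-256 and MD5, showing that each algorithm's output length stays fixed regardless of input size. The helper name `hex_digest` is illustrative, not part of any library.

```python
import hashlib

def hex_digest(data: bytes, algorithm: str) -> str:
    """Return the hexadecimal hash of `data` using a hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()

# Inputs of very different sizes still produce fixed-length hashes:
# SHA-256 gives 256 bits (64 hex characters), MD5 gives 128 bits (32 hex characters).
for message in (b"a", b"a much longer input " * 100):
    sha = hex_digest(message, "sha256")
    md5 = hex_digest(message, "md5")
    print(f"{len(message):>5} bytes -> SHA-256: {len(sha)} chars, MD5: {len(md5)} chars")
```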

While this may look similar to encryption, there is one significant difference: hashing is a one-way operation, so a hash cannot be decrypted back into the original data. To check a value, you therefore take the clear-text candidate, hash it with the same algorithm (and the same key, if a keyed hash is used), and compare the resulting hash with the stored one.
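A plain hash such as SHA-256 takes no key; when a key is involved, a common construction is HMAC. The sketch below assumes HMAC-SHA256 from Python's standard `hmac` module and a hypothetical hard-coded key (a real system would load the key from secure storage). It shows only the compare step: the candidate is re-hashed and the two hashes are compared, and the original value is never recovered.

```python
import hashlib
import hmac

# Hypothetical secret key, for illustration only.
SECRET_KEY = b"example-secret-key"

def keyed_hash(value: str, key: bytes = SECRET_KEY) -> str:
    """Hash `value` with HMAC-SHA256 under `key` and return it as hex."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def matches(candidate: str, stored_hash: str, key: bytes = SECRET_KEY) -> bool:
    """Re-hash the clear-text candidate and compare it with the stored hash.

    hmac.compare_digest is designed to avoid leaking, through timing,
    where the two values first differ.
    """
    return hmac.compare_digest(keyed_hash(candidate, key), stored_hash)

stored = keyed_hash("account-1234")
print(matches("account-1234", stored))  # True
print(matches("account-1235", stored))  # False
```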

For example, user passwords entered in clear text (possibly masked in the UI, but still received by the server as clear text) are hashed, and only the hashes are stored in the database. Someone with access to the database cannot directly extract the passwords, because the hash values cannot be reversed even by someone who knows the algorithm. To validate a password at login, the server receives it in clear text, generates its hash, and compares that hash with the one stored in the database; if they match, the entered password is the same as the original.
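A sketch of this password flow, under stated assumptions: it swaps in PBKDF2 (`hashlib.pbkdf2_hmac`) with a random per-user salt instead of a single SHA-256 pass. This is a common practice for password storage but not something the text above prescribes. The iteration count and the function names `hash_password` and `verify_password` are illustrative assumptions.

```python
import hashlib
import hmac
import os

# Assumed iteration count; choose one based on current guidance and your hardware.
ITERATIONS = 600_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store; the clear-text password itself is not stored."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the clear-text password received at login and compare it with the stored hash."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_hash)

# Registration: store only the salt and the hash.
salt, stored_hash = hash_password("correct horse battery staple")

# Login attempts.
print(verify_password("correct horse battery staple", salt, stored_hash))  # True
print(verify_password("wrong guess", salt, stored_hash))                   # False
```

The salt is stored alongside the hash; it is not secret, but it means two users with the same password end up with different stored hashes.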

Cryptographic hash functions have many information security applications, notably digital signatures and other forms of authentication. They are also used to index data in hash tables, to fingerprint data in order to detect duplicates or uniquely identify files, and as checksums to detect accidental data corruption. Hashes suit use cases where values only need to be compared (like passwords), not those where the original values need to be retrieved.
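To illustrate fingerprinting and checksums, this sketch hashes in-memory byte strings that stand in for file contents; the file names and data are hypothetical. Identical contents produce identical SHA-256 fingerprints, so grouping by fingerprint finds duplicates, and a single changed byte makes the checksum comparison fail.

```python
import hashlib
from collections import defaultdict

def fingerprint(data: bytes) -> str:
    """SHA-256 fingerprint of a blob of data, as hex."""
    return hashlib.sha256(data).hexdigest()

def find_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names whose contents have the same fingerprint."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in blobs.items():
        groups[fingerprint(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

def checksum_ok(data: bytes, expected: str) -> bool:
    """Detect accidental corruption by recomputing the checksum."""
    return fingerprint(data) == expected

# Hypothetical file contents held in memory.
files = {
    "report.txt": b"quarterly numbers",
    "report-copy.txt": b"quarterly numbers",
    "notes.txt": b"meeting notes",
}
print(find_duplicates(files))  # [['report-copy.txt', 'report.txt']]

original = files["notes.txt"]
expected = fingerprint(original)
corrupted = b"meeting nbtes"  # one byte changed
print(checksum_ok(original, expected), checksum_ok(corrupted, expected))  # True False
```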

Hashing is secure enough for most practical purposes, but is considered academically less secure than symmetric or asymmetric key encryption. Given enough computational power, or access to "rainbow tables" (large precomputed sets of plaintext strings and their corresponding hashes), an attacker can find the plaintext behind a hash whenever that plaintext is among the values they can try or have precomputed, such as short or common passwords.
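The following simplified sketch shows why precomputation works against unsalted hashes. It builds a plain hash-to-plaintext dictionary from a hypothetical list of common passwords; a real rainbow table stores hash chains to save space, but the lookup idea is the same. A value outside the precomputed set is not found.

```python
import hashlib
from typing import Optional

def sha256_hex(text: str) -> str:
    """Unsalted SHA-256 of a string, as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical list of common passwords an attacker hashes ahead of time.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein", "dragon"]

# Simplified precomputed table mapping hash -> plaintext.
# Real rainbow tables store hash chains to trade lookup time for storage.
precomputed = {sha256_hex(p): p for p in COMMON_PASSWORDS}

def crack(stolen_hash: str) -> Optional[str]:
    """Return the plaintext if the hash is in the precomputed table, else None."""
    return precomputed.get(stolen_hash)

print(crack(sha256_hex("letmein")))       # letmein
print(crack(sha256_hex("Xk9#qL2v!pR7")))  # None: not among the precomputed values
```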

The ideal cryptographic hash function has the following main properties:

  • It is deterministic, meaning the same data always results in the same hash.
  • It is infeasible to regenerate the original data from the hash value.
  • It is infeasible to find two different data values with the same hash value.
  • A small change to the data changes the hash value so extensively that the new hash value appears uncorrelated with the old one.
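Two of these properties are easy to observe with SHA-256 in Python: determinism, and the effect described in the last property (often called the avalanche effect). This sketch counts how many of the 256 output bits change when one character is appended to the input; for a well-behaved hash this is expected to be roughly half, though the exact number depends on the inputs.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

a = b"The quick brown fox jumps over the lazy dog"
b = b"The quick brown fox jumps over the lazy dog."  # one extra character

# Deterministic: hashing the same input twice yields the same value.
assert sha256_bits(a) == sha256_bits(a)

# Small input change: XOR the two digests and count the bits that differ.
flipped = bin(sha256_bits(a) ^ sha256_bits(b)).count("1")
print(f"{flipped} of 256 bits differ")
```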
