# Hashing

Cryptographic hashes are functions that take arbitrary input and return a fixed-length value. The particular value depends on the hash algorithm in use, such as SHA-1 (used by git), SHA-256, or BLAKE2, but a given hash algorithm always returns the same value for a given input. Have a look at Wikipedia's full list of hash functions for more.

As an example, the input:

Hello world

would be represented by SHA-1 as:

0x7B502C3A1F48C8609AE212CDFB639DEE39673F5E

However, the exact same input generates the following output using SHA-256:

0x64EC88CA00B268E5BA1A35678A1B5316D212F4F366B2477232534A8AECA37F3C

Notice that the second hash is longer than the first one. This is because SHA-1 creates a 160-bit hash, while SHA-256 creates a 256-bit hash. The prepended 0x indicates that the following hash is represented as a hexadecimal number.
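If you want to reproduce these values yourself, here is a minimal sketch using Python's standard `hashlib` module. It assumes the input is exactly the 11 ASCII bytes `Hello world`, with no trailing newline; adding a newline (as `echo` does by default) produces entirely different hashes.

```python
import hashlib

message = b"Hello world"

# hexdigest() returns lowercase hexadecimal without a "0x" prefix.
sha1_hex = hashlib.sha1(message).hexdigest()
sha256_hex = hashlib.sha256(message).hexdigest()

# Each hexadecimal digit encodes 4 bits, so 40 digits = 160 bits and 64 digits = 256 bits.
print(len(sha1_hex), len(sha1_hex) * 4)      # 40 160
print(len(sha256_hex), len(sha256_hex) * 4)  # 64 256

# Reproduce the notation used above: "0x" followed by uppercase hex digits.
print("0x" + sha1_hex.upper())
print("0x" + sha256_hex.upper())
```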

Hashes can be represented in different bases (base2, base16, base32, etc.). IPFS takes advantage of this in its content identifiers and supports multiple base representations at the same time, using the Multibase protocol.

For example, the SHA-256 hash of "Hello world" from above can be represented in base32 as:

mtwirsqawjuoloq2gvtyug2tc3jbf5htm2zeo4rsknfiv3fdp46a
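The different bases are just different spellings of the same 32 raw bytes. As a rough sketch using only the Python standard library, the base32 string above corresponds to RFC 4648 base32 with the trailing `=` padding removed and the letters lowercased:

```python
import base64
import hashlib

digest = hashlib.sha256(b"Hello world").digest()  # 32 raw bytes

# base16: two hexadecimal characters per byte.
base16 = digest.hex()

# base32: RFC 4648 pads with "=" to a multiple of 8 characters; strip the
# padding and lowercase the result to match the string shown above.
base32 = base64.b32encode(digest).decode("ascii").rstrip("=").lower()

# base2: eight binary digits per byte.
base2 = "".join(f"{byte:08b}" for byte in digest)

print(base16)
print(base32)
print(len(base2))  # 256
```

In Multibase, this particular encoding (lowercase RFC 4648 base32 without padding) is identified by the prefix character `b`.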

TIP

If you're interested in how cryptographic hashes fit into how IPFS works with files in general, check out this video from IPFS Camp 2019: Core Course: How IPFS Deals With Files.

# Important hash characteristics

Cryptographic hashes come with several important characteristics:

- deterministic: the same input message always returns exactly the same output hash
- uncorrelated: a small change in the message should generate a completely different hash
- unique: it's infeasible to find two different messages that produce the same hash
- one-way: it's infeasible to guess or calculate the input message from its hash
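The first two properties are easy to observe directly; the last two are claims about computational infeasibility and can't be demonstrated in a short snippet. The sketch below, with hypothetical helper names `sha256_bits` and `differing_bits`, hashes a message twice and then counts how many of the 256 output bits change when a single letter changes case. For an uncorrelated hash you would expect roughly half of the bits to flip.

```python
import hashlib


def sha256_bits(message: bytes) -> int:
    """Return the SHA-256 digest of message as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two messages' hashes."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


# Deterministic: hashing the same input twice gives the same digest.
print(sha256_bits(b"Hello world") == sha256_bits(b"Hello world"))  # True

# Uncorrelated: changing one character ("w" -> "W") flips about half of the output bits.
print(differing_bits(b"Hello world", b"Hello World"))
```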

These features also mean we can use a cryptographic hash to identify any piece of data: the hash is unique to the data we calculated it from, and it's short enough that sending it around the network doesn't take up many resources. Because a hash has a fixed length, the SHA-256 hash of a one-gigabyte video file is still only 32 bytes.
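A minimal Python sketch of the fixed-length property: a large input is fed to the hash incrementally with `update()`, so it never has to sit in memory all at once, yet the digest is the same 32 bytes as for an 11-byte message. To keep it quick, the sketch uses 64 MiB of generated zero bytes as a stand-in for a real video file, and `sha256_of_stream` is a hypothetical helper name.

```python
import hashlib


def sha256_of_stream(chunks) -> bytes:
    """Hash an iterable of byte chunks incrementally, without holding them all in memory."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.digest()


small = hashlib.sha256(b"Hello world").digest()

# Stand-in for a large file: 64 chunks of 1 MiB of zero bytes, generated on the fly.
big = sha256_of_stream(bytes(1024 * 1024) for _ in range(64))

print(len(small), len(big))  # 32 32
```

Hashing in chunks gives the same result as hashing the whole input at once, so `sha256_of_stream([b"Hello ", b"world"])` equals `small`.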

That's critical for a distributed system like IPFS, where we want to be able to store and retrieve data from many places. A computer running IPFS can ask all the peers it's connected to whether they have a file with a particular hash and, if one of them does, that peer sends back the whole file. Without a short, unique identifier like a cryptographic hash, content addressing wouldn't be possible.
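The exchange described above can be sketched as a toy in-memory model. This is an illustration under simplifying assumptions, not how IPFS is implemented: `Peer`, `content_id`, and `fetch` are hypothetical names, the identifier is just the hex SHA-256 of the data (real IPFS content identifiers carry more structure), and "the network" is a Python list. It also shows a useful side effect of addressing data by its hash: the requester can check any reply by rehashing it, so a peer sending the wrong bytes is simply skipped.

```python
import hashlib
from typing import Dict, List, Optional


def content_id(data: bytes) -> str:
    """Hypothetical identifier: the hex SHA-256 of the data."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    """A toy peer that stores blocks keyed by their hash."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        cid = content_id(data)
        self.blocks[cid] = data
        return cid

    def get(self, cid: str) -> Optional[bytes]:
        return self.blocks.get(cid)


def fetch(cid: str, peers: List[Peer]) -> Optional[bytes]:
    """Ask each connected peer for the content; keep the first reply whose hash matches."""
    for peer in peers:
        data = peer.get(cid)
        # Because the identifier is the hash, the requester can verify the reply itself.
        if data is not None and content_id(data) == cid:
            return data
    return None


alice, bob, carol = Peer(), Peer(), Peer()
cid = carol.add(b"Hello world")

# Bob claims to have the content but returns the wrong bytes; the hash check rejects them.
bob.blocks[cid] = b"tampered"

print(fetch(cid, [alice, bob, carol]))  # b'Hello world'
```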
