What is a Hash Function?
A hash function uses a specific calculation to convert a piece of data of any size to a numerical value with a fixed number of bits. The value calculated by a hash function is called a Hash Value. The hash function is designed to always return the same hash value for the same data.
Uses of Hashing
If there are many large pieces of data to be compared, it will take a long time to compare them all individually. In this situation, it would be effective to find and compare the hash values of each piece of data. If the hash values are different, the data they refer to is also different, so mismatches can be found in a short period of time and with a high rate of probability. When the hash values are the same, the original data must be compared. Even when the hash values are the same, the data might be different.) However, overall, the time required for comparison can be greatly reduced.
You can also use a hash value to confirm that data has not been corrupted. For example, when saving data to a backup device, a hash value for all the saved data can be appended to the data string. When the data is loaded, the hash value for the loaded data is obtained and compared with the value stored at the end of the data string to confirm that the data is not corrupted. Similarly, this method can be used to confirm the integrity of data sent over a network.
The Message Digest
There are cases in which a cryptographic hash function is desired. A cryptographic hash function makes it impossible to restore the original data from the hash value. The hash value has a minimum size of 80 bits, and might be around 512 bits, depending on how it is to used. A cryptographic hash function is designed so that it is impossible to determine the original data from a given hash value. By impossible, we mean that it would take an extremely long period of time to compute the original data using current technology and thus is computationally infeasible. When a cryptographic hash function is used to detect tampering, the hash function is called a Message Digest Function and the hash value is called a Message Digest Value.
Using Message Digests to Detect Tampering
It is possible to use the message digest value to prevent data tampering by malicious third parties. Since a message digest value is short and has a fixed length, it is easy to distribute safely. Assuming a message digest value can be distributed safely, if the message digest for a large data set that is sent separately matches the original message digest value, we know that the data sent or received is authentic. This is because the one-way nature of the function makes it impossible to create false data that generates the same message digest value.
Using HMAC for Detection of Tampering
What do you do if you need to send the message digest value with the data? The message digest function will always return the same message digest value for the same data no matter who uses it. Accordingly, if you send the message digest value with the data, a malicious third party can substitute falsified data along with a falsified message digest value. In this case, you can use HMAC (Keyed Hashing for Message Authentication Code). In simple terms, HMAC is a mechanism that provides additional encryption of the hash value. If you provide an HMAC function with the data to hash, the data size, and the encrypted key data and key size, the function can produce a hash value encrypted with the key. Since this value cannot be generated as long as the encrypted key data is not known, if the keyed hash values match when the data is received, one can be sure that the data was sent by someone who knows the key. However, with HMAC, both the sender and the receiver use the same key (the shared-key system), so that a recipient program could analyze and determine the key, and then falsify the data. To prevent this, an electronic signature system that uses public-key encryption is needed. This library is not included in the NITRO-SDK.
Hash Functions Included in the SDK
NITRO-SDK includes the following hash functions (non-cryptographic).
8-bit Checksum | MATH_Checksum8* | Computes the one's complement of the one's complement sum in 8-bit units. | |
---|---|---|---|
16-bit Checksum | MATH_Checksum16* | Computes the one's complement of the one's complement sum in 16-bit units. | Used in IP, TCP, and UDP protocols. |
CRC-8 | MATH_CRC8* | Computes the 8-bit CRC. The generator polynomial is x8+x2+x1+1 0 and NOT is not applied to the output. |
|
CRC-16 | MATH_CRC16* | Computes the 16-bit CRC. The generator polynomial is x16+x15+x2+1 0 and NOT is not applied to the output. |
This is used in ARC, LHA, and other tools. |
CRC-16 / CCITT | MATH_CRC16CCITT* | Computes the 16-bit CRC. The generator polynomial is x16+x12+x5+1 0xffff |
Defined in CCITT X.25, and used in communication frames of various standards. |
CRC-32 | MATH_CRC32* | Computes the 32-bit CRC. The generator polynomial is x32+x26+x23+x22+x16+x12+x11+x10+x8+x7+x5+x4+x2+x1+1 0xffffffff |
Used in PKZIP, PNG and Ethernet. See ISO 3309, RFC 2083 etc. |
CRC-32 / POSIX | MATH_CRC32POSIX* | Computes the 32-bit CRC. The generator polynomial is x32+x26+x23+x22+x16+x12+x11+x10+x8+x7+x5+x4+x2+x1+1 0 . NOT is applied to output. |
Used in the POSIX-compliant Unix cksum command. Defined in POSIX 1003.2 (IEEE Std 1003.2-1992). |
checksum
is about twice as fast as CRC, but it generates the same value even when the data order is changed. This makes it a poor hash function. The checksum function included in the SDK will result in a value of 0
if the original data is added to the checksum value to obtain the checksum for all the data. Also, in the one's complement world, 0xffff
and 0
express the same value. Therefore, when a checksum value is 0
, it can be replaced by 0xffff
and the original data plus the checksum value will still be 0
. Based on this fact, you can prevent 0
from being used for normal checksum values and use it for a null value (such as uncalculated). For details, see the general UDP Checksum description.
When the data length is below a certain value, CRC is mathematically designed so that it is guaranteed to find errors up to a certain number of bits. For example, with CRC-16 and CRC-16/CCITT, when the data length, including the CRC, is up to SVC_GetCRC16
, which returns the same results as MATH_CalcCRC16
. But the MATH CRC function creates the calculation tables in advance, which speeds up operation.
Note that these are not cryptographic hash functions so it is easy to falsify data so that it returns the same hash value.
The following cryptographic hash functions are included.
MD5 | MATH_MD5* | The message digest value length is 128 bits. | Used widely on the Internet. See RFC 1321. |
---|---|---|---|
SHA-1 | MATH_SHA1* | The message digest value length is 160 bits. | Used widely as a successor to MD5. See RFC 3174. |
HMAC-MD5 | MATH_CalcHMACMD5 | HMAC that uses MD5. The message digest value length is 128 bits. | See RFC 2104. |
HMAC-SHA-1 | MATH_CalcHMACSHA1 | HMAC that uses SHA-1. The message digest value length is 160 bits. | See RFC 2104. |
MD5 is the most widely-used cryptographic hash function, but research has found that it is vulnerable under certain conditions and that it may not be secure with bit lengths of 128 bits. As a result, there is currently a shift to SHA-1.
These cryptographic hash functions will still provide sufficient utility as hash functions if bits that are computed partway through computation are used. (Saving and using the first 80 bits in HMAC-SHA-1
Speed Comparison of the Hash Functions
In the SDK, the dgt-2
Checksum8: 80201 us/MB Checksum16: 80174 us/MB CRC-8: 182277 us/MB CRC-16: 225972 us/MB CRC-16/CCITT: 227036 us/MB CRC-32: 196804 us/MB CRC-32/POSIX: 196690 us/MB MD5: 299266 us/MB SHA-1: 591120 us/MB HMAC-MD5: 299839 us/MB HMAC-SHA-1: 591214 us/MB CRC-16(SVC): 862306 us/MB
MATH Function List (Hash Value), MATH Function List (Message Digest)
07/08/2005 Updated the output sample for the dgt-2 demo in conjunction with the implementation updates for SHA-1.
06/24/2005 Updated the output sample for the dgt-2 demo in conjunction with the implementation updates for MD5.
04/28/2005 Initial version.