CRC
In embedded systems, CRC stands for cyclic redundancy check. In many respects, a CRC is similar to a checksum, and this is, in fact, what it is typically used for. CRCs come in a wide range of sizes, from 4 bits to 64 bits, but there is no upper limit.
Generating CRCs in hardware
In hardware, CRCs can be generated easily by shifting data through a shift register with feedback at certain bit positions. A 32-bit CRC does not require much more than a 32-bit shift register. This is ideal for applications in which data is transferred bit-by-bit.
Many MCUs have built-in CRC circuits that typically allow computation of a specific CRC, such as 0x1021 (16-bit). There are two downsides to these hardware units: The first is that they are fixed to a specific CRC. The second is that they are not thread safe, as the CRC unit can only be used from one thread(task) at a time, so the software needs to be designed in a way that makes sure this happens. This is typically done by using the unit in a single thread only, by securing it via a Mutex/Semaphore/Critical region, or simply by disabling interrupts.
Generating CRCs in software
CRC generation is something that needs to be carried out often, and there are many ways to do it. The easiest way is to compute a CRC bit by bit. Unfortunately, this is also the slowest way. More advanced ways of generating CRCs are table based. The tables can be any size, but there are typically 16 entries for 4 bits at a time, 256 entries for 8 bits (one byte) at a time or even 64k entries to process two bytes at once.
Portable implementation in C
Explanation
Shown below is a piece of C code that SEGGER actually uses in production code. It first computes a CRC and LSB for any polynomial of up to 32 bits, and for any amount of data. The starting value can be specified, and the routine can be used to compute the CRC of multiple memory areas simply by using the return value of the first block as the initial value for the second block.
This code is small and efficient and is thus used where size is more important than speed. The downside is that it is a bit slow, as each byte is computed in an inner loop, on a bit-by-bit basis (executed 8 times for each byte).
Typically, CPUs will need about 150 cycles per byte, so a 200 MHz microcontroller will deliver about 1.3 MiB/sec. Faster versions with a table (typically with 256 entries, so that they handle 1 byte per loop) can be optimized to use about 15 cycles per loop, delivering performance of about 13MiB. Note that the code does not need any globals/statics, and it is fully reentrant.
C - Source code
//
// Universal CRC computation code, for LSB (least significant bit first) computations.
//
// Parameters
// pData Pointer to the (beginning of) the data area, over which the CRC shall be computed,
// Data is read one byte at at time
// NumBytes Obviously the number of bytes to perform the computation on. Note that the code for simplicity and speed does not
// check if the value is 0, so it has to be a positive value ( > 0)
// crc The initial value (start value). Usually all 0s or all 1s.
// Also very important so the CRC can be computed in multiple steps, over multiple data areas.
// In this case, the crc of the first block is the start value for the second block, etc.
// Polynom Polynomial, in LSB format.
// Since it is in LSB format, there is no need to specify the size of the polynomial
//
U32 CRC_Calc(const U8* pData, unsigned NumBytes, U32 crc, U32 Polynom) {
int i;
U8 Data;
do {
Data = *pData++;
i = 8;
do {
if ((crc ^ Data) & 1) {
crc >>= 1;
crc ^= Polynom;
} else {
crc >>= 1;
}
Data >>= 1;
} while (--i);
} while (--NumBytes);
return crc;
}