Reed-Solomon
The Problem
A satellite sends 223 bytes to Earth. Some bytes get corrupted by noise on the way down. We want the receiver to detect and fix the corruption automatically, without retransmission.
Reed-Solomon (RS) coding adds 32 extra parity bytes to the 223 data bytes, making 255 total. If up to 16 of those 255 bytes are corrupted in transit, the receiver can figure out which bytes are wrong and what they should have been.
Why We Need Arithmetic
A simple check like XOR-ing all bytes together can tell you whether something changed, but not which byte changed or what it changed to. To recover both the position and the value of a corrupted byte, you need more information.
The idea is to build multiple independent equations that relate the parity bytes to the data bytes. Each equation multiplies every data byte by a different coefficient and sums the results. For example, one parity byte might be:
and another:
If a single byte gets corrupted, both and will be wrong by amounts that depend on 's position (because each equation weighted differently). The receiver can solve these two equations to find both where and by how much.
More errors need more equations, which means more parity bytes. Correcting up to 16 errors requires 32 parity bytes (2 per error: one for position, one for magnitude).
This is why arithmetic — multiplication and addition — is unavoidable.
A Special Number System
Since we're doing arithmetic on bytes, we need every operation to produce a result that is also a byte. In normal arithmetic this fails: , which doesn't fit. If intermediate results grow beyond one byte, the parity symbols become a different size than the data symbols, and the fixed-size structure (255 symbols 1 byte each) falls apart.
So we use a number system called GF() — a Galois field with 256 elements (0--255) — where every operation on two bytes always produces exactly one byte:
- Addition is XOR: (everything cancels itself)
- Multiplication uses special rules that always stay within 0--255
The field has a special element called (alpha), defined as . Its powers generate every non-zero value in the field:
So every non-zero byte can be written as for some . This is useful because it turns multiplication into addition of exponents: .
We store these powers in a lookup table called EXP (and the reverse mapping in LOG), so a multiplication becomes two table lookups and one addition — effectively free on any CPU.
Bytes as a Polynomial
Take the 223 data bytes, say . We treat them as coefficients of a polynomial:
This isn't something we "solve for x." It's a way of representing the byte sequence so we can do algebra on it. Each byte is a coefficient; its position in the array determines the power of .
The Generator Polynomial
We want a polynomial that is zero at 32 specific points. Start with the simplest polynomial that is zero at one point — :
This is zero when and non-zero everywhere else. To also be zero at , multiply by another factor:
Continue for all 32 points:
Multiplying these 32 factors out (in GF() arithmetic, where subtraction is the same as addition/XOR) gives a degree-32 polynomial with 33 concrete byte-valued coefficients. It is fixed — the same for every message, computed once.
The key property: , , ..., , by construction.
Encoding
We have the data polynomial (223 bytes) and the generator (degree 32). The goal is to find 32 parity bytes such that the complete 255-byte codeword is divisible by .
First, shift up by 32 positions to make room for parity:
This is like writing the 223 data bytes followed by 32 zeros. Now divide by :
where is the remainder (degree < 32, so exactly 32 bytes). The codeword is:
In GF(), addition is XOR, so adding replaces those 32 trailing zeros with the parity bytes. Now check — is divisible by ?
Yes — because (XOR with itself). So is exactly divisible by , which means evaluating at any root of gives zero:
The 223 data bytes sit at the front, unchanged. Only 32 parity bytes are appended. This is called systematic encoding.
Decoding
Syndrome Check
The receiver evaluates the received 255-byte polynomial at the same 32 special values. If the data arrived intact, all 32 results are zero (because a valid codeword is divisible by ).
If any byte was corrupted, at least some results will be non-zero. These 32 numbers are the syndromes — a fingerprint of the damage.
Finding Error Positions: Berlekamp-Massey
For a single error, the ratio directly reveals the error position (as shown in the example). For multiple errors, we need to find all error positions simultaneously.
The idea is to build an error locator polynomial whose roots correspond to the error positions. If there are errors at positions , define for each error and:
Each is a root of , and from we can recover .
The Berlekamp-Massey algorithm finds from the syndromes. It works iteratively, processing one syndrome at a time:
- Start with (no errors assumed).
- For each syndrome (), compute a discrepancy: where is the current number of estimated errors.
- If , the current already predicts correctly — no update needed.
- If , the current is wrong. Update it using a correction term scaled by . If this increases the estimated error count (), also update .
- After all 32 syndromes, is the number of errors and has degree . If , the codeword has too many errors to correct.
The algorithm is efficient because it reuses a saved "old" version of as the correction term, so each step is just a few multiplications.
Finding Error Positions: Chien Search
Berlekamp-Massey gives us , but we need the actual roots. The Chien search simply evaluates at every possible position:
- For each , compute using the GF() lookup tables.
- If , then is a root, meaning there is an error at position .
- Collect all positions where evaluates to zero.
If the number of roots found does not match from Berlekamp-Massey, the codeword is uncorrectable.
This is a brute-force search, but with only 255 positions and each evaluation being a few table lookups, it is fast.
Finding Error Magnitudes: Forney Algorithm
Now we know where the errors are. The Forney algorithm computes how much each corrupted byte is off by.
First, build the error evaluator polynomial:
where is the syndrome polynomial. This multiplication and truncation combines the syndrome information with the error locations.
Next, compute the formal derivative of . In GF(), the derivative has a simplification: since in GF(), all even-power terms vanish. Only the odd-index coefficients survive:
The error magnitude at position is then:
where is the error locator value for position , and the factor adjusts for the first consecutive root being instead of .
Correction
XOR each corrupted byte with its computed error magnitude. The original data is restored. As a final check, recompute the 32 syndromes — if they are all zero, the correction succeeded.
When Correction Fails
If more than 16 bytes are corrupted, the decoder detects the failure at one of three points:
- Berlekamp-Massey finds : the syndrome equations imply more errors than the code can handle.
- Chien search finds fewer roots than : the error locator polynomial has no valid solution within the 255 positions.
- Verification after correction: the recomputed syndromes are still non-zero, meaning the applied corrections were wrong.
In all three cases, the decoder reports failure. The data remains corrupted and must be recovered by retransmission or a higher-level protocol.
Why These Numbers
Why 32 parity bytes for 16 errors? Each error has two unknowns: where it is and what changed. That's 2 unknowns per error, and each syndrome gives 1 equation. 32 syndromes -> at most 16 solvable errors.
Why 255 total? In GF() there are 255 non-zero values. Each one maps to a position in the codeword. You can't have more positions than field elements, so 255 is the maximum codeword length for byte-sized symbols.
Why 223 data bytes? . It's simply whatever room is left after reserving space for 32 parity bytes.
Interleaving
Burst errors (e.g. a brief signal dropout) corrupt consecutive bytes. With interleaving depth , the transmitter shuffles independent codewords together so that consecutive bytes belong to different codewords.
A burst of corrupted bytes gets spread across codewords, each seeing at most 16 errors — exactly within the correction capability. CCSDS supports through .
CCSDS Parameters
The specific parameters used in this implementation follow CCSDS 131.0-B-5:
| Parameter | Value |
|---|---|
| Field polynomial | (0x187) |
| Primitive element | |
| Codeword length | 255 symbols |
| Data length | 223 symbols |
| Parity symbols | 32 |
| Error correction | Up to 16 symbol errors |
| First consecutive root | |
| Interleave depth | to |
End-to-End Example
We encode the message "HELLO" (bytes 72, 69, 76, 76, 79),
corrupt one byte in transit, and recover the original.
1. Encoding
The 5-byte message is padded with zeros to fill all 223 data positions:
Data (223 bytes):
[ 72, 69, 76, 76, 79, 0, 0, 0, ... , 0 ]
H E L L O
The data bytes become a polynomial (each byte is a coefficient, highest power first):
(The remaining 218 coefficients are zero.)
Shift it up by 32 to make room for parity:
The generator is a fixed degree-32 polynomial.
Divide by . The remainder has 32
coefficients — these become the parity bytes
[243, 147, 197, 58, ...]. XOR them into the trailing zeros:
Codeword (255 bytes):
[ 72, 69, 76, 76, 79, 0, ..., 0, 243, 147, 197, 58, 154, 156, 250, 218, ... ]
|--- 223 data bytes ----------| |-------- 32 parity bytes --------------|
The data is unchanged at the front. Only parity was added.
2. Corruption
During transmission, byte 2 gets corrupted — the L (76)
becomes 255:
Received:
[ 72, 69, 255, 76, 79, 0, ..., 0, 243, 147, 197, ... ]
^^^
was 76, now 255
3. Syndrome Check
The received 255 bytes form a polynomial:
The receiver computes each syndrome by replacing with one of the 32 special values. Since , the first syndrome uses . In GF(), exponents wrap mod 255, so is just a single byte (looked up from the EXP table):
Each term like simplifies to , which is just another byte from the EXP table. Plugging in the received bytes (, , , ...):
All arithmetic is in GF(): every multiplication uses the lookup tables, every addition is XOR, and the result is always a single byte.
In general, each syndrome uses a different power of 2:
for . The same bytes, but different coefficients each time — that's what gives the receiver 32 independent equations to work with.
For the original (uncorrupted) codeword, every syndrome is zero. Recall from the encoding step that the codeword is , where . To compute , we evaluate at :
Expanding :
The first factor is (any value XOR'd with itself is zero). A product with a zero factor is zero, so and:
The same applies to through : each evaluates at through , and each is a root of by construction, so every factor chain has a zero term:
After the corruption at byte 2, the syndromes become non-zero:
These 32 values encode both where the error is and how large it is. The decoder's job is to extract that information.
4. Error Location (Berlekamp-Massey)
The decoder builds an error locator polynomial whose roots reveal the corrupted positions. It processes the syndromes one at a time, updating whenever the current guess fails to predict the next syndrome.
Start with and (no errors assumed).
Iteration : Compute the discrepancy . This is non-zero, so the current is wrong. Update: , .
Iteration : . Non-zero, so update . Since doesn't increase (), only the coefficients change: .
Iteration : . Zero — already predicts correctly. No update.
Iterations through : All discrepancies are zero. The polynomial has converged.
Result: with (one error). Since , the error locator coefficient directly encodes the error position.
4b. Error Location (Chien Search)
The Chien search finds the roots of by evaluating it at every possible value :
, so is a root. The error position is
. That is byte 2 — exactly where the L
was corrupted.
5. Error Magnitude (Forney)
The Forney algorithm computes the error evaluator polynomial , which starts:
So (a constant). The formal derivative of is also a constant: .
The error magnitude uses and :
where adjusts for the first consecutive root, and in GF().
Correction is XOR:
6. Result
The corrected codeword matches the original:
Corrected:
[ 72, 69, 76, 76, 79, 0, ..., 0, 243, 147, 197, ... ]
H E L L O (parity intact)
Errors corrected: 1