H.264/AVC Context Adaptive Variable Length Coding (CAVLC)

1. Introduction

The H.264 / AVC standard specifies two types of entropy coding: Context-based Adaptive Binary Arithmetic Coding (CABAC) and Variable-Length Coding (VLC). The Variable-Length Coding scheme is described in this document.

2. Coded elements

Parameters that require to be encoded and transmitted include the following (Table 21).

Above the slice layer, syntax elements are encoded as fixed- or variable-length binary codes. At the slice layer and below, elements are coded using either variable-length codes (VLCs) or context-adaptive arithmetic coding (CABAC) depending on the entropy encoding mode.

3. Variable length coding (VLC)

When entropy_coding_mode is set to 0, residual block data is coded using a context-adaptive variable length coding (CAVLC) scheme and other variable-length coded units are coded using Exp-Golomb codes.

3.1. Exp-Golomb entropy coding

Exp-Golomb codes (Exponential Golomb codes) are variable length codes with a regular construction. Table 31 lists the first 9 codewords; it is clear from this table that the codewords progress in a logical order. Each codeword is constructed as follows:

[M zeros][1][INFO]

where INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO; codewords 1 and 2 have a single-bit INFO field; codewords 3-6 have a 2-bit INFO field; and so on. The length of each codeword is (2M+1) bits.

Each Exp-Golomb codeword can be constructed by the encoder based on its index code_num: M = ⎣log2(code_num+1)⎦

INFO = code_num + 1 – 2M

A codeword can be decoded as follows:

Read in M leading zeros followed by 1.
Read M-bit INFO field.
code_num = 2M + INFO – 1

(For codeword 0, INFO and M are zero).

A parameter v to be encoded is mapped to code_num in one of 3 ways:

ue(v) : Unsigned direct mapping, code_num = v. Used for macroblock type, reference frame index and others.

se(v) : Signed mapping, used for motion vector diﬀerence, delta QP and others. v is mapped to

code_num as follows (Table 32).

code_num = 2|v| (v < 0)

code_num = 2|v| - 1 (v ≥ 0)

Table 32 Signed mapping se(v)

me(v) : Mapped symbols; parameter v is mapped to code_num according to a table specified in the standard. This mapping is used for the coded_block_pattern parameter. Table 33 lists a small part of the table for Inter predicted macroblocks: coded_block_pattern indicates which 8x8 blocks in a macroblock contain non-zero coeﬃcients.

Table 33 Part of coded_block_pattern table

Each of these mappings (ue, se and me) is designed to produce short codewords for frequently-occurring values and longer codewords for less common parameter values. For example, macroblock type Pred_L0_16x16 (i.e. 16x16 prediction from a previous picture) is assigned code_num 0 because it occurs frequently whereas macroblock type Pred_8x8 (8x8 prediction from a previous picture) is assigned code_num 3 because it occurs less frequently. The commonly-occurring motion vector diﬀerence (MVD) value of 0 maps to code_num 0 whereas the less-common MVD = -3 maps to code_num

3.2. Context-based adaptive variable length coding (CAVLC)

This is the method used to encode residual, zig-zag ordered 4x4 (and 2x2) blocks of transform coeﬃcients. CAVLC is designed to take advantage of several characteristics of quantized 4x4 blocks:

After prediction, transformation and quantization, blocks are typically sparse (containing mostly zeros). CAVLC uses run-level coding to compactly represent strings of zeros.
The highest non-zero coeﬃcients after the zig-zag scan are often sequences of +/-1. CAVLC signals the number of high-frequency +/-1 coeﬃcients (“Trailing 1s” or “T1s”) in a compact way.
The number of non-zero coeﬃcients in neighbouring blocks is correlated. The number of coeﬃcients is encoded using a look-up table; the choice of look-up table depends on the number of non-zero coeﬃcients in neighbouring blocks.
The level (magnitude) of non-zero coeﬃcients tends to be higher at the start of the reordered array (near the DC coeﬃcient) and lower towards the higher frequencies. CAVLC takes advantage of this by adapting the choice of VLC look-up table for the “level” parameter depending on recently-coded level magnitudes.

CAVLC encoding of a block of transform coeﬃcients proceeds as follows.

1. Encode the number of coefficients and trailing ones (COEFF_TOKEN)

The first VLC, coeﬀ_token, encodes both the total number of non-zero coeﬃcients (TotalCoeﬀs) and the number of trailing +/-1 values (T1). TotalCoeﬀs can be anything from 0 (no coeﬃcients in the 4x4 block)1 to 16 (16 non-zero coeﬃcients). T1 can be anything from 0 to 3; if there are more than 3 trailing +/-1s, only the last 3 are treated as “special cases” and any others are coded as normal coeﬃcients.

There are 4 choices of look-up table to use for encoding coeﬀ_token, described as Num-VLC0, Num-VLC1, Num-VLC2 and Num-FLC (3 variable-length code tables and a fixed-length code). The choice of table depends on the number of non-zero coeﬃcients in upper and left-hand previously coded blocks Nu and NL. A parameter N is calculated as follows:

If blocks U and L are available (i.e. in the same coded slice), N = (Nu + NL)/2

If only block U is available, N=NU ; if only block L is available, N=NL ; if neither is available, N=0.

Figure 31 Neighbouring blocks NU and NL

N selects the look-up table (Table 34) and in this way the choice of VLC adapts depending on the number of coded coeﬃcients in neighbouring blocks (context adaptive). Num-VLC0 is “biased” towards small numbers of coeﬃcients; low values of TotalCoeﬀs (0 and 1) are assigned particularly short codes and high values of TotalCoeﬀ particularly long codes. Num-VLC1 is biased towards medium numbers of coeﬃcients (TotalCoeﬀ values around 2-4 are assigned relatively short codes), Num-VLC2 is biased towards higher numbers of coeﬃcients and FLC assigns a fixed 6-bit code to every value of TotalCoeﬀ.

_____________

1 Note: coded_block_pattern (described earlier) indicates which 8x8 blocks in the macroblock contain nonzero
coefficients; however, within a coded 8x8 block, there may be 4x4 sub-blocks that do not contain any
coefficients, hence TotalCoeff may be 0 in any 4x4 sub-block. In fact, this value of TotalCoeff occurs most
often and is assigned the shortest VLC.

2. Encode the sign of each T1.

For each T1 (trailing +/-1) signalled by coeﬀ_token, a single bit encodes the sign (0=+, 1=-). These are encoded in reverse order, starting with the highest-frequency T1.

3. Encode the levels of the remaining non-zero coefficients

The level (sign and magnitude) of each remaining non-zero coeﬃcient in the block is encoded in reverse order, starting with the highest frequency and working back towards the DC coeﬃcient. The choice of VLC table to encode each level adapts depending on the magnitude of each successive coded level (context adaptive). There are 7 VLC tables to choose from, Level_VLC0 to Level_VLC6. Level_VLC0 is biased towards lower magnitudes; Level_VLC1 is biased towards slightly higher magnitudes and so on. The choice of table is adapted in the following way:

(a) Initialise the table to Level_VLC0 (unless there are more than 10 non-zero coeﬃcients and less than 3 trailing ones, in which case start with Level_VLC1).

(b) ) Encode the highest-frequency non zero coeﬃcient.

(c) If the magnitude of this coeﬃcient is larger than a pre-defined threshold, move up to the next VLC table.

In this way, the choice of level is matched to the magnitude of the recently-encoded coeﬃcients. The thresholds are listed in Table 35; the first threshold is zero which means that the table is always incremented after the first coeﬃcient level has been encoded.

4. Encode the total number of zeros before the last coefficient

TotalZeros is the sum of all zeros preceding the highest non-zero coeﬃcient in the reordered array. This is coded with a VLC. The reason for sending a separate VLC t indicate TotalZeros is that many blocks contain a number of non-zero coeﬃcients at the start of the array and (as will be seen later) this approach means that zero-runs at the start of the array need not be encoded.

5. Encode each run of zeros

The number of zeros preceding each non-zero coeﬃcient (run_before) is encoded in reverse order. A run_before parameter is encoded for each non-zero coeﬃcient, starting with the highest frequency, with two exceptions:

(a) If there are no more zeros left to encode (i.e. ∑[run_before] = TotalZeros), it is not necessary to encode any more run_before values.

(b) It is not necessary to encode run_before for the final (lowest frequency) non-zero coeﬃcient.

The VLC for each run of zeros is chosen depending on (a) the number of zeros that have not yet been encoded (ZerosLeft) and (b) run_before. For example, if there are only 2 zeros left to encode, run_before can only take 3 values (0,1 or 2) and so the VLC need not be more than 2 bits long; if there are 6 zeros still to encode then run_before can take 7 values (0 to 6) and the VLC table needs to be correspondingly larger.

CAVLC Examples

In all the following examples, we assume that table Num-VLC0 is used to encode coeﬀ_token.

Decoding:

The output array is “built up” from the decoded values as shown below. Values added to the output array at each stage are underlined.

Note 1: Level (3), with a value of -3, is encoded as a special case. If there are less than 3 T1s, then the first non-T1 level will not have a value of +/-1 (otherwise it would have been encoded as a T1). To save bits, this level is incremented if negative (decremented if positive) so that +/-2 maps to +/-1, +/-3 maps to +/-2, and so on. In this way, shorter VLCs are used.

Note 2: After encoding level (3), the level_VLC table is incremented because the magnitude of this level is greater than the first threshold (which is 0). After encoding level (1), with a magnitude of 4, the table number is incremented again because level (1) is greater than the second threshold (which is 3). Note that the final level (-2) uses a diﬀerent code from the first encoded level (also –2).

About the author

Vcodex is led by Professor Iain Richardson, an internationally known expert on the MPEG and H.264 video compression standards. Based in Delft, The Netherlands, he frequently travels to the US and Europe.

Iain Richardson is an internationally recognised expert on video compression and digital video communications. He is the author of four other books about video coding which include two widely-cited books on the H.264 Advanced Video Coding standard. For over thirty years, he has carried out research in the field of video compression and video communications, as a Professor at the Robert Gordon University in Aberdeen, Scotland and as an independent consultant with his own company, Vcodex. He advises companies on video compression technology and is sought after as an expert witness in litigation cases involving video coding.