Vcodex Limited
Vcodex White Paper
4x4 Transform and Quantization in H.264/AVC
Revised April 2009. Copyright (c) Iain Richardson, 2007-2009. All rights reserved. Do not reproduce any material without permission.
1 Overview
[Figure 1: Transform and quantization in an H.264 codec]

This paper describes a derivation of the forward and inverse transform and quantization processes applied to 4x4 blocks of luma and chroma samples in an H.264 codec. The transform is a scaled approximation to a 4x4 Discrete Cosine Transform that can be computed using simple integer arithmetic. A normalisation step is incorporated into the forward and inverse quantization operations.
2 The H.264 transform and quantization process

[Figure 2: Re-scaling and inverse transform]
3 Developing the forward transform and quantization process

The basic 4x4 transform used in H.264 is a scaled approximation to the Discrete Cosine Transform (DCT). The transform and quantization processes are structured to minimise computational complexity. This is achieved by reorganising the processes into a core part and a scaling part.

Consider a block of pixel data that is processed by a two-dimensional DCT followed by quantization, i.e. division by a quantization step size, Qstep, followed by rounding (Figure 4a). Rearrange the DCT process into a core transform (Cf) and a scaling matrix (Sf) (Figure 4b). Scale the quantization process by a constant ($2^{15}$) and compensate by dividing the final result by the same constant and rounding (Figure 4c). Combine Sf and the quantization process into Mf (Figure 4d), where:
$$M_f = \frac{S_f \times 2^{15}}{Q_{step}} \qquad \text{(Equation 1)}$$
[Figure 4: Development of the forward transform and quantization process]
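The four stages described above can be summarised in equation form. This is a sketch reconstructed from the text rather than a copy of Figure 4: round() denotes rounding to an integer, and between stages (a) and (b) the DCT matrix A is replaced by its approximation A1 = Cf · Rf, developed in Section 5.

$$
\begin{aligned}
\text{(a)}\quad Y &= \mathrm{round}\left( \frac{A \times X \times A^T}{Q_{step}} \right) \\
\text{(b)}\quad Y &= \mathrm{round}\left( \frac{[C_f \times X \times C_f^T] \cdot S_f}{Q_{step}} \right) \\
\text{(c)}\quad Y &= \mathrm{round}\left( [C_f \times X \times C_f^T] \cdot \frac{S_f \times 2^{15}}{Q_{step}} \times \frac{1}{2^{15}} \right) \\
\text{(d)}\quad Y &= \mathrm{round}\left( [C_f \times X \times C_f^T] \cdot M_f \times \frac{1}{2^{15}} \right)
\end{aligned}
$$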
4 Developing the rescaling and inverse transform process

Consider a re-scaling (or “inverse quantization”) operation followed by a two-dimensional inverse DCT (IDCT) (Figure 5a). Rearrange the IDCT process into a core transform (Ci) and a scaling matrix (Si) (Figure 5b). Scale the re-scaling process by a constant ($2^6$) and compensate by dividing the final result by the same constant and rounding (Figure 5c)[1]. Combine the re-scaling process and Si into Vi (Figure 5d), where:

$$V_i = S_i \times Q_{step} \times 2^{6} \qquad \text{(Equation 2)}$$
[Figure 5: Development of the rescaling and inverse transform process]
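The inverse stages can be summarised in the same way, again as a sketch reconstructed from the text rather than a copy of Figure 5. Between stages (a) and (b) the matrix A is replaced by A2 = Ci · Ri from Section 6, and the rounding in (c) and (d) need not be to the nearest integer (footnote [1]).

$$
\begin{aligned}
\text{(a)}\quad Z &= A^T \times [Y \times Q_{step}] \times A \\
\text{(b)}\quad Z &= [C_i^T] \times [Y \cdot S_i \times Q_{step}] \times [C_i] \\
\text{(c)}\quad Z &= \mathrm{round}\left( [C_i^T] \times [Y \cdot S_i \times Q_{step} \times 2^{6}] \times [C_i] \times \frac{1}{2^{6}} \right) \\
\text{(d)}\quad Z &= \mathrm{round}\left( [C_i^T] \times [Y \cdot V_i] \times [C_i] \times \frac{1}{2^{6}} \right)
\end{aligned}
$$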
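5 Developing Cf and Sf (4x4 blocks)

Consider a 4x4 two-dimensional DCT of a block X:

$$Y = A \times X \times A^T \qquad \text{(Equation 3)}$$

where × indicates matrix multiplication and:

$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \frac{1}{2}, \quad b = \sqrt{\frac{1}{2}}\cos\left(\frac{\pi}{8}\right), \quad c = \sqrt{\frac{1}{2}}\cos\left(\frac{3\pi}{8}\right)$$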
The rows of A are orthogonal and have unit norms (i.e. the rows are orthonormal). Calculating Equation 3 on a practical processor requires approximation of the irrational numbers b and c. A fixed-point approximation is equivalent to scaling each row of A and rounding to the nearest integer. Choosing a particular approximation (multiply by 2.5 and round) gives Cf:

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$
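The scaling-and-rounding step can be checked numerically. This short Python sketch (illustrative only; the helper name `dot` is hypothetical) builds A from a, b and c, confirms that its rows are orthonormal, and reproduces Cf by multiplying every entry by 2.5 and rounding:

```python
import math

# Entries of the 4x4 DCT matrix A (Equation 3).
a = 0.5
b = math.sqrt(0.5) * math.cos(math.pi / 8)
c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)

A = [
    [a, a, a, a],
    [b, c, -c, -b],
    [a, -a, -a, a],
    [c, -b, b, -c],
]


def dot(u, v):
    """Inner product of two equal-length rows."""
    return sum(x * y for x, y in zip(u, v))


# The rows of A are orthonormal (up to floating-point error).
for i in range(4):
    for j in range(4):
        assert abs(dot(A[i], A[j]) - (1.0 if i == j else 0.0)) < 1e-12

# Fixed-point approximation: multiply by 2.5 and round to the nearest integer.
Cf = [[round(2.5 * x) for x in row] for row in A]
print(Cf)  # [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

# The rows of Cf remain orthogonal, but their squared norms differ.
assert all(dot(Cf[i], Cf[j]) == 0 for i in range(4) for j in range(4) if i != j)
norms_sq = [dot(row, row) for row in Cf]
print(norms_sq)  # [4, 10, 4, 10]
```

The unequal squared norms (4 and 10) are what the scaling matrix Rf compensates for.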
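This approximation is chosen to minimise the complexity of implementing the transform (multiplication by Cf requires only additions and binary shifts) whilst maintaining good compression performance. The rows of Cf are orthogonal but have different norms. To restore the orthonormal property of the original matrix A, multiply all the values c_rj in row r by the reciprocal of that row's norm, $1/\sqrt{\sum_j c_{rj}^2}$:

$$A_1 = C_f \cdot R_f, \qquad R_f = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} \end{bmatrix}$$

where · denotes element-by-element multiplication (Hadamard-Schur product[2]). The new matrix A1 is orthonormal. Replacing A by A1, the two-dimensional transform (Equation 3) becomes:

$$Y = A_1 \times X \times A_1^T = [C_f \cdot R_f] \times X \times [C_f^T \cdot R_f^T]$$

Rearranging:

$$Y = [C_f \times X \times C_f^T] \cdot [R_f \cdot R_f^T] = [C_f \times X \times C_f^T] \cdot S_f$$

where

$$S_f = \begin{bmatrix} \frac{1}{4} & \frac{1}{2\sqrt{10}} & \frac{1}{4} & \frac{1}{2\sqrt{10}} \\ \frac{1}{2\sqrt{10}} & \frac{1}{10} & \frac{1}{2\sqrt{10}} & \frac{1}{10} \\ \frac{1}{4} & \frac{1}{2\sqrt{10}} & \frac{1}{4} & \frac{1}{2\sqrt{10}} \\ \frac{1}{2\sqrt{10}} & \frac{1}{10} & \frac{1}{2\sqrt{10}} & \frac{1}{10} \end{bmatrix}$$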
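To illustrate why multiplication by Cf needs only additions and shifts, the Python sketch below computes Cf × X × Cf^T with a one-dimensional butterfly applied to the columns and then the rows, checks it against a plain matrix product, and confirms that A1 = Cf · Rf is orthonormal. The function names and the sample block X are hypothetical; this is one possible arrangement of the additions, not necessarily the one used by any particular encoder.

```python
import math

Cf = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(P, Q):
    """Plain matrix product P x Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def butterfly_fwd(x0, x1, x2, x3):
    """Return Cf x (x0, x1, x2, x3) using only additions, subtractions and shifts."""
    s0, s3 = x0 + x3, x0 - x3
    s1, s2 = x1 + x2, x1 - x2
    return [s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)]


def core_forward(X):
    """Cf x X x Cf^T: transform each column of X, then each row of the result."""
    cols = [butterfly_fwd(*col) for col in zip(*X)]  # columns of Cf x X
    return [butterfly_fwd(*row) for row in transpose(cols)]


# Hypothetical 4x4 block of samples, for illustration only.
X = [
    [58, 64, 51, 58],
    [52, 64, 56, 66],
    [62, 63, 61, 64],
    [59, 51, 63, 69],
]
W = core_forward(X)
assert W == matmul(matmul(Cf, X), transpose(Cf))

# Rf scales row r of Cf by 1 / (norm of row r); Sf = Rf . Rf^T element by element.
r = [1 / math.sqrt(sum(v * v for v in row)) for row in Cf]
Sf = [[r[i] * r[j] for j in range(4)] for i in range(4)]

# A1 = Cf . Rf has orthonormal rows, so A1 x A1^T is the identity.
A1 = [[Cf[i][j] * r[i] for j in range(4)] for i in range(4)]
eye = matmul(A1, transpose(A1))
assert all(abs(eye[i][j] - (i == j)) < 1e-12 for i in range(4) for j in range(4))
```

Because the rows of Rf are constant, Cf · Rf is just a row scaling of Cf, which is why the scaling can be pulled out of the core transform as Sf.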
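6 Developing Ci and Si (4x4 blocks)

Consider a 4x4 two-dimensional IDCT of a block Y:

$$Z = A^T \times Y \times A \qquad \text{(Equation 4)}$$

where A is the DCT matrix defined for Equation 3. Choose a particular approximation by scaling each row of A and rounding to the nearest 0.5, giving Ci:

$$C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \frac{1}{2} & -\frac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \frac{1}{2} & -1 & 1 & -\frac{1}{2} \end{bmatrix}$$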
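The rows of Ci are orthogonal but have non-unit norms. To restore orthonormality, multiply all the values c_rj in row r by the reciprocal of that row's norm, $1/\sqrt{\sum_j c_{rj}^2}$:

$$A_2 = C_i \cdot R_i, \qquad R_i = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} \end{bmatrix}$$

The two-dimensional inverse transform (Equation 4) becomes:

$$Z = A_2^T \times Y \times A_2 = [C_i^T \cdot R_i^T] \times Y \times [C_i \cdot R_i]$$

Rearranging:

$$Z = [C_i^T] \times [Y \cdot R_i^T \cdot R_i] \times [C_i] = [C_i^T] \times [Y \cdot S_i] \times [C_i]$$

where

$$S_i = \begin{bmatrix} \frac{1}{4} & \frac{1}{\sqrt{10}} & \frac{1}{4} & \frac{1}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} & \frac{2}{5} & \frac{1}{\sqrt{10}} & \frac{2}{5} \\ \frac{1}{4} & \frac{1}{\sqrt{10}} & \frac{1}{4} & \frac{1}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} & \frac{2}{5} & \frac{1}{\sqrt{10}} & \frac{2}{5} \end{bmatrix}$$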
The core inverse transform Ci and the rescaling matrix Vi are defined in the H.264 standard. Hence we now develop Vi and will then derive Mf.
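The inverse core transform Ci can be sketched in the same way as the forward one. In this Python sketch (function names hypothetical, block W illustrative), each factor of 1/2 in Ci becomes a right shift by one bit, the rows of Ci are checked to be orthogonal with squared norms 4, 5/2, 4, 5/2, and the shift-based result is compared with the exact product Ci^T × W × Ci. Because >> 1 rounds odd values down, the two agree only when every halved intermediate is even, which the sample W (all multiples of 4) guarantees.

```python
from fractions import Fraction

half = Fraction(1, 2)
Ci = [
    [1, 1, 1, 1],
    [1, half, -half, -1],
    [1, -1, -1, 1],
    [half, -1, 1, -half],
]


def matmul(P, Q):
    """Plain matrix product P x Q (works with ints and Fractions)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def butterfly_inv(y0, y1, y2, y3):
    """Return Ci^T x (y0, y1, y2, y3); each factor 1/2 becomes a right shift by 1."""
    e, f = y0 + y2, y0 - y2
    g, h = (y1 >> 1) - y3, y1 + (y3 >> 1)
    return [e + h, f + g, f - g, e - h]


def core_inverse(W):
    """Ci^T x W x Ci: transform each column of W, then each row of the result."""
    cols = [butterfly_inv(*col) for col in zip(*W)]
    return [butterfly_inv(*row) for row in transpose(cols)]


# Rows of Ci are orthogonal with squared norms 4, 5/2, 4, 5/2.
gram = matmul(Ci, transpose(Ci))
assert [gram[i][i] for i in range(4)] == [4, Fraction(5, 2), 4, Fraction(5, 2)]
assert all(gram[i][j] == 0 for i in range(4) for j in range(4) if i != j)

# Hypothetical block of scaled coefficients; multiples of 4 keep every >> 1 exact.
W = [
    [64, -8, 4, 0],
    [12, 4, 0, -4],
    [-8, 0, 8, 4],
    [4, 0, -4, 0],
]
assert core_inverse(W) == matmul(matmul(transpose(Ci), W), Ci)
```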
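7 Developing Vi

From Equation 2, $V_i = S_i \times Q_{step} \times 2^{6}$. H.264 supports a range of quantization step sizes Qstep. The precise step sizes are not defined in the standard; rather, the scaling matrix Vi is specified. Qstep values corresponding to the entries in Vi are shown in the following table.

| QP    | 0     | 1      | 2      | 3     | 4 | 5     | 6    | ... | 12  | ... | 18 | ... | 24 | ... | 36 | ... | 48  | ... | 51  |
|-------|-------|--------|--------|-------|---|-------|------|-----|-----|-----|----|-----|----|-----|----|-----|-----|-----|-----|
| Qstep | 0.625 | 0.6875 | 0.8125 | 0.875 | 1 | 1.125 | 1.25 | ... | 2.5 | ... | 5  | ... | 10 | ... | 40 | ... | 160 | ... | 224 |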
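The ratio between successive Qstep values is chosen so that Qstep doubles for every increase of 6 in QP:

$$Q_{step}(QP) = Q_{step}(QP \bmod 6) \times 2^{\lfloor QP/6 \rfloor}$$

The values in the matrix Vi depend on Qstep (hence QP) and on the scaling factor matrix Si. These are shown for QP 0 to 5 in the following table, as values of Si × Qstep × 2^6 computed from Equation 2.

| QP | Qstep  | Positions (0,0), (0,2), (2,0), (2,2) | Positions (1,1), (1,3), (3,1), (3,3) | Other positions |
|----|--------|----|------|-------|
| 0  | 0.625  | 10 | 16   | 12.65 |
| 1  | 0.6875 | 11 | 17.6 | 13.91 |
| 2  | 0.8125 | 13 | 20.8 | 16.44 |
| 3  | 0.875  | 14 | 22.4 | 17.71 |
| 4  | 1      | 16 | 25.6 | 20.24 |
| 5  | 1.125  | 18 | 28.8 | 22.77 |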
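The Qstep doubling rule and Equation 2 are easy to tabulate. A minimal Python sketch, assuming QP is restricted to 0..51 and using the Si entries from Section 6 (the names `qstep`, `vi_unrounded`, `QSTEP_BASE` and `SI_CLASSES` are hypothetical):

```python
import math

# Nominal Qstep for QP = 0..5; every further 6 QP steps doubles Qstep.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

# Entries of Si for the three position classes (Section 6):
# (even row, even column), (odd row, odd column), and all other positions.
SI_CLASSES = (1 / 4, 2 / 5, 1 / math.sqrt(10))


def qstep(qp):
    """Qstep(QP) = Qstep(QP % 6) * 2^floor(QP / 6), for QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


def vi_unrounded(qp):
    """Si * Qstep * 2^6 (Equation 2) for each position class, before any rounding."""
    return tuple(s * qstep(qp) * 2 ** 6 for s in SI_CLASSES)


for qp in range(6):
    print(qp, qstep(qp), [round(x, 2) for x in vi_unrounded(qp)])
```

The base step sizes are exact binary fractions, so the doubling rule introduces no floating-point error.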
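For higher values of QP, the corresponding values in Vi double for every increase of 6 in QP (i.e. Vi(QP=6) = 2Vi(QP=0), etc.). Note that there are only three unique values in each matrix Vi. These three values are defined in the H.264 standard as a table of values v, for QP=0 to QP=5:

Table 1 Matrix v defined in H.264 standard

| QP (row r) | v(r,0) | v(r,1) | v(r,2) |
|---|----|----|----|
| 0 | 10 | 16 | 13 |
| 1 | 11 | 18 | 14 |
| 2 | 13 | 20 | 16 |
| 3 | 14 | 23 | 18 |
| 4 | 16 | 25 | 20 |
| 5 | 18 | 29 | 23 |

Hence for QP values from 0 to 5, Vi is obtained as:

$$V_i = \begin{bmatrix} v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \\ v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \end{bmatrix}$$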
Denote this as:

$$V_i = v(QP, n)$$

where v(r,n) is row r, column n of v. For larger values of QP (QP > 5), index the row of array v by QP%6 and then multiply by $2^{\lfloor QP/6 \rfloor}$. In general:

$$V_i = v(QP \bmod 6, n) \times 2^{\lfloor QP/6 \rfloor}$$

The complete inverse transform and scaling process (for 4x4 blocks in macroblocks excluding the 16x16-Intra mode) becomes:

$$Z = \mathrm{round}\left( [C_i^T] \times [Y \cdot v(QP \bmod 6, n) \times 2^{\lfloor QP/6 \rfloor}] \times [C_i] \times \frac{1}{2^{6}} \right)$$

(Note: the rounded division by $2^6$ can be carried out by adding an offset and right-shifting by 6 bit positions.)
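Putting these pieces together, a sketch of the complete inverse process for one 4x4 block might look like the Python below. It assumes integer levels Y, builds Vi from Table 1, uses a shift-based butterfly for Ci, and implements the rounded division by 2^6 as (z + 32) >> 6. It ignores the 16x16-Intra and other special cases mentioned above; all function names are hypothetical.

```python
# Table 1: v(r, n) for r = QP % 6 and n = 0, 1, 2.
V_TABLE = (
    (10, 16, 13), (11, 18, 14), (13, 20, 16),
    (14, 23, 18), (16, 25, 20), (18, 29, 23),
)


def position_class(i, j):
    """Column n of v used at position (i, j): 0 if both even, 1 if both odd, else 2."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def rescale_matrix(qp):
    """Vi = v(QP % 6, n) * 2^floor(QP / 6)."""
    row = V_TABLE[qp % 6]
    return [[row[position_class(i, j)] << (qp // 6) for j in range(4)] for i in range(4)]


def butterfly_inv(y0, y1, y2, y3):
    """Ci^T x (y0, y1, y2, y3) with additions, subtractions and shifts."""
    e, f = y0 + y2, y0 - y2
    g, h = (y1 >> 1) - y3, y1 + (y3 >> 1)
    return [e + h, f + g, f - g, e - h]


def core_inverse(W):
    """Ci^T x W x Ci: columns first, then rows."""
    cols = [butterfly_inv(*col) for col in zip(*W)]
    rows = [list(r) for r in zip(*cols)]
    return [butterfly_inv(*row) for row in rows]


def inverse_transform(Y, qp):
    """Z = round(Ci^T x [Y . Vi] x Ci / 2^6) for one 4x4 block of integer levels Y."""
    V = rescale_matrix(qp)
    W = [[Y[i][j] * V[i][j] for j in range(4)] for i in range(4)]
    Z = core_inverse(W)
    # Rounded division by 2^6: add half of 64, then shift right by 6 bits.
    return [[(z + 32) >> 6 for z in row] for row in Z]


# Usage: a block whose only non-zero level is the DC coefficient.
Y = [[4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(inverse_transform(Y, qp=4))  # every sample reconstructs to 1
```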
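8 Deriving Mf

Combining Equation 1 and Equation 2 (with · denoting element-by-element multiplication):

$$M_f \cdot V_i = S_f \cdot S_i \times 2^{21}$$

Si and Sf are known and Vi is defined as described in the previous section. Define Mf as:

$$M_f = \frac{S_f \cdot S_i \times 2^{21}}{V_i}$$

where the division is performed element by element.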
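The entries in matrix Mf may be calculated as follows (Table 2):

Table 2 Tables v and m

| QP (row r) | v(r,0) | v(r,1) | v(r,2) | m(r,0) | m(r,1) | m(r,2) |
|---|----|----|----|-------|------|------|
| 0 | 10 | 16 | 13 | 13107 | 5243 | 8066 |
| 1 | 11 | 18 | 14 | 11916 | 4660 | 7490 |
| 2 | 13 | 20 | 16 | 10082 | 4194 | 6554 |
| 3 | 14 | 23 | 18 | 9362  | 3647 | 5825 |
| 4 | 16 | 25 | 20 | 8192  | 3355 | 5243 |
| 5 | 18 | 29 | 23 | 7282  | 2893 | 4559 |

Each m(r,n) is $2^{21} \times (S_f \cdot S_i) / v(r,n)$ rounded to the nearest integer, where the product Sf · Si equals 1/16, 1/25 and 1/20 for n = 0, 1 and 2 respectively.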
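The m entries can be reproduced from the v entries. This Python sketch (names hypothetical) computes Sf · Si exactly as a rational number from the squared row norms of Cf (4 and 10) and Ci (4 and 5/2), multiplies by 2^21, divides by v and rounds to the nearest integer:

```python
from fractions import Fraction
from math import isqrt

V_TABLE = (
    (10, 16, 13), (11, 18, 14), (13, 20, 16),
    (14, 23, 18), (16, 25, 20), (18, 29, 23),
)
M_TABLE = (
    (13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
    (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559),
)

# Squared row norms: rows 0/2 and rows 1/3 of Cf (Section 5) and of Ci (Section 6).
NORM_SQ_CF = (4, 10)
NORM_SQ_CI = (4, Fraction(5, 2))

# (row parity, column parity) for columns n = 0, 1, 2 of v and m.
CLASS_PARITY = ((0, 0), (1, 1), (0, 1))


def sf_times_si(p, q):
    """Exact Sf[i][j] * Si[i][j] for row parity p and column parity q."""
    # Sf * Si = 1 / sqrt(product of the four squared norms); the product is a perfect square.
    prod = Fraction(NORM_SQ_CF[p] * NORM_SQ_CF[q]) * NORM_SQ_CI[p] * NORM_SQ_CI[q]
    root = isqrt(prod.numerator)
    assert prod.denominator == 1 and root * root == prod.numerator
    return Fraction(1, root)


def derive_m(r):
    """m(r, n) = round(2^21 * Sf * Si / v(r, n)) for n = 0, 1, 2."""
    return tuple(
        round(2 ** 21 * sf_times_si(p, q) / V_TABLE[r][n])
        for n, (p, q) in enumerate(CLASS_PARITY)
    )


for r in range(6):
    print(r, derive_m(r), derive_m(r) == M_TABLE[r])
```

Working with `Fraction` keeps the intermediate values exact, so the only rounding is the final one.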
Hence for QP values from 0 to 5, Mf can be obtained from m, the last three columns of Table 2:

$$M_f = \begin{bmatrix} m(QP,0) & m(QP,2) & m(QP,0) & m(QP,2) \\ m(QP,2) & m(QP,1) & m(QP,2) & m(QP,1) \\ m(QP,0) & m(QP,2) & m(QP,0) & m(QP,2) \\ m(QP,2) & m(QP,1) & m(QP,2) & m(QP,1) \end{bmatrix}$$

Denote this as:

$$M_f = m(QP, n)$$

where m(r,n) is row r, column n of m. For larger values of QP (QP > 5), index the row of array m by QP%6 and then divide by $2^{\lfloor QP/6 \rfloor}$. In general:

$$M_f = \frac{m(QP \bmod 6, n)}{2^{\lfloor QP/6 \rfloor}}$$

The complete forward transform, scaling and quantization process (for 4x4 blocks and for modes excluding 16x16-Intra) becomes:

$$Y = \mathrm{round}\left( [C_f \times X \times C_f^T] \cdot \frac{m(QP \bmod 6, n)}{2^{\lfloor QP/6 \rfloor}} \times \frac{1}{2^{15}} \right)$$

(Note: the rounded division by $2^{15}$ may be carried out by adding an offset and right-shifting by 15 bit positions.)
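Finally, a sketch of the complete forward process for one block. It folds the division by 2^floor(QP/6) into the final shift, so each coefficient is right-shifted by 15 + floor(QP/6) bits, and it rounds the magnitude to the nearest integer before restoring the sign. The rounding offset (half of 2^qbits) and the symmetric sign handling are assumptions, since the note above only says that an offset is added before shifting; the function names are hypothetical.

```python
# Columns of m from Table 2, indexed by QP % 6.
M_TABLE = (
    (13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
    (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559),
)


def position_class(i, j):
    """Column n of m used at position (i, j): 0 if both even, 1 if both odd, else 2."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def butterfly_fwd(x0, x1, x2, x3):
    """Cf x (x0, x1, x2, x3) with additions, subtractions and shifts."""
    s0, s3 = x0 + x3, x0 - x3
    s1, s2 = x1 + x2, x1 - x2
    return [s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)]


def core_forward(X):
    """Cf x X x Cf^T: columns first, then rows."""
    cols = [butterfly_fwd(*col) for col in zip(*X)]
    rows = [list(r) for r in zip(*cols)]
    return [butterfly_fwd(*row) for row in rows]


def forward_quantize(X, qp):
    """Y = round([Cf x X x Cf^T] . m(QP % 6, n) / 2^(15 + floor(QP / 6)))."""
    qbits = 15 + qp // 6          # division by 2^floor(QP/6) folded into the shift
    offset = 1 << (qbits - 1)     # assumed: round to nearest (half of 2^qbits)
    m = M_TABLE[qp % 6]
    W = core_forward(X)
    Y = []
    for i in range(4):
        row = []
        for j in range(4):
            level = (abs(W[i][j]) * m[position_class(i, j)] + offset) >> qbits
            row.append(level if W[i][j] >= 0 else -level)
        Y.append(row)
    return Y


# Usage: a flat (hypothetical) block of ones produces only a DC level.
X = [[1] * 4 for _ in range(4)]
print(forward_quantize(X, qp=4))  # [[4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```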
9 Further reading

ITU-T Recommendation H.264, Advanced Video Coding for Generic Audio-Visual Services, November 2007.

H. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, Low-complexity transform and quantization in H.264/AVC, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 598-603, July 2003.

T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.

I. Richardson, The H.264 Advanced Video Compression Standard, to be published in late 2009.

See http://www.vcodex.com/links.html for links to further resources on H.264 and video compression.

Acknowledgement

I would like to thank Gary Sullivan for suggesting a treatment of the H.264 transform and quantization processes along these lines and for his helpful comments on earlier drafts of this document.

About the author

As a researcher, consultant and author working in the field of video compression (video coding), my books on video codec design and the MPEG-4 and H.264 standards are widely read by engineers, academics and managers. I advise companies on video coding standards, design and intellectual property and lead the Centre for Video Communications Research at The Robert Gordon University in Aberdeen, UK and the Fully Configurable Video Coding research initiative.

Using the material in this document

This document is copyright – you may not reproduce the material without permission. Please contact me to ask for permission. Please cite the document as follows:

Iain Richardson, 4x4 Transform and Quantization in H.264/AVC, VCodex Ltd White Paper, April 2009, http://www.vcodex.com/

[1] This rounding operation need not be to the nearest integer.
[2] P = Q·R means that each element $p_{ij} = q_{ij} \times r_{ij}$.
(c) Vcodex Limited 2001-2009