CN101617539A

CN101617539A - Computational complexity and precision control in transform-based digital media codec

Info

Publication number: CN101617539A
Application number: CN200880005630A
Authority: CN
Inventors: S. Srinivasan; C. Tu; S. Regunathan
Original assignee: Microsoft Corp
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2007-02-21
Filing date: 2008-02-20
Publication date: 2009-12-30
Anticipated expiration: 2028-02-20
Also published as: RU2009131599A; TWI471013B; BRPI0807465A8; IL199994A; HK1140341A1; CN101617539B; TW200843515A; KR20090115726A; KR20150003400A; BRPI0807465B1; RU2518417C2; EP2123045A4; JP5457199B2; US20080198935A1; KR101507183B1; IL199994A0; JP2010519858A; KR101550166B1; BRPI0807465A2; WO2008103766A3

Abstract

A digital media encoder/decoder includes signaling of various modes relating to the computational complexity and precision of decoding. The encoder may send a syntax element indicating the arithmetic precision (e.g., 16-bit or 32-bit operations) of the transform operations performed at decoding. The encoder may also signal whether scaling is applied at the decoder output, which permits a wider dynamic range of intermediate data at decoding but adds computational complexity because of the scaling operation.

Description

Computational Complexity and Precision Control in Transform-Based Digital Media Codecs

Background

Block Transform-Based Coding

Transform coding is a compression technique used in many digital media (e.g., audio, image, and video) compression systems. Uncompressed digital images and video are typically represented or captured as samples of picture elements or colors at locations in an image or video frame arranged in a two-dimensional (2D) grid. This is referred to as a spatial-domain representation of the image or video. For example, a typical format for images consists of a stream of 24-bit color picture element samples arranged as a grid. Each sample is a number representing a color component at a pixel location in the grid within a color space such as RGB or YIQ. Various image and video systems may use different color, spatial, and temporal resolutions of sampling. Similarly, digital audio is typically represented as a time-sampled audio signal stream. For example, a typical audio format consists of a stream of 16-bit amplitude samples of an audio signal taken at regular time intervals.

Uncompressed digital audio, image, and video signals can consume considerable storage and transmission capacity. Transform coding reduces the size of digital audio, images, and video by transforming the spatial-domain representation of the signal into a frequency-domain (or other similar transform-domain) representation, and then reducing the resolution of certain generally less perceptible frequency components of the transform-domain representation. This generally produces much less perceptible degradation of the digital signal than reducing the color or spatial resolution of images or video in the spatial domain, or of audio in the time domain.

More specifically, a typical block transform-based encoder/decoder system 100 (also called a "codec") shown in FIG. 1 divides the pixels of the uncompressed digital image into fixed-size two-dimensional blocks (X₁, ... X_n), each block possibly overlapping with other blocks. A linear transform 120-121 that performs spatial-frequency analysis is applied to each block, converting the spaced samples within the block into a set of frequency (or transform) coefficients that generally represent the strength of the digital signal in corresponding frequency bands over the block interval. For compression, the transform coefficients may be selectively quantized 130 (i.e., reduced in resolution, such as by dropping the least significant bits of the coefficient values or otherwise mapping values in a higher-resolution number set to a lower resolution), and also entropy coded or variable-length coded 130 into a compressed data stream. At decoding, the transform coefficients are inverse transformed 170-171 to nearly reconstruct the original color/spatial sampled image/video signal (the reconstructed blocks).

A block transform 120-121 can be defined as a mathematical operation on a vector x of size N. Most often, the operation is a linear multiplication, producing the transform-domain output y = Mx, where M is the transform matrix. When the input data is arbitrarily long, it is segmented into vectors of size N and a block transform is applied to each segment. For the purpose of data compression, invertible block transforms are chosen; in other words, the matrix M is invertible. In multiple dimensions (e.g., for images and video), block transforms are typically implemented as separable operations: the matrix multiplication is applied separably along each dimension of the data (i.e., both rows and columns).
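As a minimal sketch of a separable, invertible block transform of the form y = Mx, the following Python example applies a 4-point Hadamard matrix along the columns and rows of a 4×4 block and then inverts it. The choice of M, the sample block, and the helper names are assumptions made for illustration; the background text does not prescribe a particular matrix.

```python
# Illustrative 4-point Hadamard matrix (an assumption for this demo).
# It is symmetric and satisfies M * M = 4 * I, so its inverse is M / 4.
M = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def forward_2d(block):
    """Separable 2D transform Y = M X M^T (applied along columns, then rows)."""
    return matmul(matmul(M, block), transpose(M))


def inverse_2d(coeffs):
    """Inverse transform: M Y M^T equals 16 X for this M, so divide by 16."""
    y = matmul(matmul(M, coeffs), transpose(M))
    return [[v // 16 for v in row] for row in y]  # division is exact here


block = [[52, 55, 61, 66],
         [70, 61, 64, 73],
         [63, 59, 55, 90],
         [67, 61, 68, 104]]
coeffs = forward_2d(block)
restored = inverse_2d(coeffs)
```

With no quantization between the two calls, `restored` equals `block`, and `coeffs[0][0]` holds the sum of all samples (the DC term of this particular matrix).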

For compression, the transform coefficients (the components of vector y) may be selectively quantized (i.e., reduced in resolution, such as by dropping the least significant bits of the coefficient values or otherwise mapping values in a higher-resolution number set to a lower resolution), and also entropy coded or variable-length coded into a compressed data stream.

At decoding in the decoder 150, the inverses of these operations (dequantization/entropy decoding 160 and inverse block transform 170-171) are applied on the decoder 150 side, as shown in FIG. 1. When reconstructing the data, the inverse matrix M^-1 (inverse transform 170-171) is applied as a multiplier to the transform-domain data. When applied to the transform-domain data, the inverse transform nearly reconstructs the original time-domain or spatial-domain digital media.

In many block transform-based coding applications, the transform is desirably reversible so as to support both lossy and lossless compression, depending on the quantization factor. With no quantization (generally represented as a quantization factor of 1), for example, a codec using a reversible transform can exactly reproduce the input data at decoding. However, the reversibility requirement in these applications constrains the choice of transforms on which the codec can be designed.

Many image and video compression systems, such as MPEG and Windows Media, use transforms based on the discrete cosine transform (DCT). The DCT is known to have favorable energy compaction properties that result in near-optimal data compression. In these compression systems, the inverse DCT (IDCT) is employed in the reconstruction loops of both the encoder and the decoder to reconstruct individual image blocks.

Quantization

Quantization is the primary mechanism by which most image and video codecs control compressed image quality and compression ratio. According to one possible definition, quantization is a term used for an approximating, non-reversible mapping function commonly used for lossy compression, in which there is a specified set of possible output values, and each member of that set has an associated set of input values that result in the selection of that particular output value. A variety of quantization techniques have been developed, including scalar or vector, uniform or non-uniform, with or without dead zone, and adaptive or non-adaptive quantization.

The quantization operation is essentially a biased division by a quantization parameter QP, performed at the encoder. The inverse quantization operation is a multiplication by QP, performed at the decoder. Together, these processes introduce a loss in the original transform coefficient data, which shows up as compression errors or artifacts in the decoded image.
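The division/multiplication pair can be sketched as follows. This is an illustrative Python example only: the symmetric handling of negative coefficients, the default bias of 0, and the function names are assumptions, since the text does not give the exact biasing rule.

```python
def quantize(coeff: int, qp: int, bias: int = 0) -> int:
    """Encoder side: biased integer division of a coefficient by QP.

    The bias is added to the magnitude before dividing; how the bias is
    chosen is an assumption of this sketch, not something the text fixes.
    """
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + bias) // qp)


def dequantize(level: int, qp: int) -> int:
    """Decoder side: multiplication of the quantized level by QP."""
    return level * qp


# With QP = 8 and a bias of half a step, 37 is reconstructed as 40.
level = quantize(37, 8, bias=4)
reconstructed = dequantize(level, 8)
error = 37 - reconstructed
```

With `qp = 1` and `bias = 0` the round trip is exact for every integer, which matches the statement that a quantization factor of 1 corresponds to no quantization.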

Summary

The following detailed description presents tools and techniques for controlling the computational complexity and precision of decoding with a digital media codec. In one aspect of the technique, the encoder signals whether the decoder is to use a scaled or an unscaled precision mode. In the scaled precision mode, the input image is pre-multiplied (e.g., by 8) at the encoder, and the output at the decoder is correspondingly scaled by rounded division. In the unscaled precision mode, no such scaling operations are applied. In the unscaled precision mode, the encoder and decoder deal with a smaller dynamic range of transform coefficients, and hence have lower computational complexity.

In another aspect of the technique, the codec may also signal to the decoder the precision required for performing the transform operations. In one implementation, an element of the bitstream syntax signals whether to employ lower-precision arithmetic operations for the transform at the decoder.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Additional features and advantages of the invention will become apparent from the following detailed description of embodiments, which proceeds with reference to the accompanying drawings.

Brief Description of the Drawings

FIG. 1 is a block diagram of a conventional block transform-based codec in the prior art.

FIG. 2 is a flow diagram of a representative encoder incorporating the block pattern coding.

FIG. 3 is a flow diagram of a representative decoder incorporating the block pattern coding.

FIG. 4 is a diagram of the inverse lapped transform, including a core transform and a post-filter (overlap) operation, in one implementation of the representative encoder/decoder of FIGS. 2 and 3.

FIG. 5 is a diagram identifying the input data points for the transform operations.

FIG. 6 is a block diagram of a suitable computing environment for implementing the media encoder/decoder of FIGS. 2 and 3.

Detailed Description

The following description relates to techniques for controlling the precision and computational complexity of a transform-based digital media codec. It describes an example implementation of the technique in the context of a digital media compression system or codec. The digital media system codes digital media data in a compressed form for transmission or storage, and decodes the data for playback or other processing. For purposes of illustration, the exemplary compression system incorporating the computational complexity and precision control is an image or video compression system. Alternatively, the technique can also be incorporated into compression systems or codecs for other digital media data. The computational complexity and precision control technique does not require that the digital media compression system encode the compressed digital media data in a particular coding format.

1. Encoder/Decoder

FIGS. 2 and 3 are generalized diagrams of the processes employed in a representative 2-dimensional (2D) data encoder 200 and decoder 300. The diagrams present a generalized or simplified illustration of a compression system incorporating the 2D data encoder and decoder that implement compression using the computational complexity and precision control techniques. In alternative compression systems using the control techniques, more or fewer processes than those illustrated in this representative encoder and decoder can be used for the 2D data compression. For example, some encoders/decoders may also include color conversion, color formats, scalable coding, lossless coding, macroblock modes, etc. The compression system (encoder and decoder) can provide lossless and/or lossy compression of the 2D data, depending on the quantization, which may be based on a quantization parameter varying from lossless to lossy.

The 2D data encoder 200 produces a compressed bitstream 220 that is a more compact representation (for typical input) of the 2D data 210 presented as input to the encoder. For example, the 2D data input can be an image, a frame of a video sequence, or other data having two dimensions. The 2D data encoder divides a frame of the input data into blocks (illustrated generally in FIG. 2 as partitioning 230), which in the illustrated implementation are non-overlapping 4×4 pixel blocks that form a regular pattern across the plane of the frame. These blocks are grouped in clusters, called macroblocks, which are 16×16 pixels in size in this representative encoder. In turn, the macroblocks are grouped into regular structures called tiles. The tiles also form a regular pattern over the image, such that tiles in a horizontal row are of uniform height and aligned, and tiles in a vertical column are of uniform width and aligned. In the representative encoder, the tiles can be any arbitrary size that is a multiple of 16 in the horizontal and/or vertical direction. Alternative encoder implementations can divide the image into blocks, macroblocks, tiles, or other units of other sizes and structures.
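The macroblock and block partitioning described above can be sketched in a few lines of Python. The function names are hypothetical, and the requirement that the frame dimensions be multiples of 16 is a simplifying assumption of this sketch (the text does not say how other sizes are handled).

```python
MB = 16   # macroblock edge in pixels, per the representative encoder
BLK = 4   # block edge in pixels, per the representative encoder


def macroblock_origins(width: int, height: int):
    """Top-left pixel of every 16x16 macroblock, in raster order.

    Assumes the frame dimensions are multiples of 16 (sketch-only restriction).
    """
    if width % MB or height % MB:
        raise ValueError("this sketch assumes dimensions that are multiples of 16")
    return [(x, y) for y in range(0, height, MB) for x in range(0, width, MB)]


def block_origins(mb_x: int, mb_y: int):
    """Top-left pixel of each of the sixteen 4x4 blocks inside one macroblock."""
    return [(mb_x + dx, mb_y + dy)
            for dy in range(0, MB, BLK)
            for dx in range(0, MB, BLK)]


mbs = macroblock_origins(64, 32)       # a 64x32 frame holds 4 x 2 macroblocks
blocks = block_origins(*mbs[1])        # blocks of the second macroblock
```

Tiles would add one more level above this, grouping whole rows and columns of macroblocks; that level is omitted to keep the sketch short.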

A "forward overlap" operator 240 is applied to each edge between blocks, after which each 4×4 block is transformed using a block transform 250. This block transform 250 can be the reversible, scale-free 2D transform described by Srinivasan in U.S. Patent Application No. 11/015,707, entitled "Reversible Transform For Lossy And Lossless 2-D Data Compression," filed December 17, 2004. The overlap operator 240 can be the reversible overlap operator described by Tu et al. in U.S. Patent Application No. 11/015,148, entitled "Reversible Overlap Operator for Efficient Lossless Data Compression," filed December 17, 2004; and by Tu et al. in U.S. Patent Application No. 11/035,991, entitled "Reversible 2-Dimensional Pre-/Post-Filter for Lapped Biorthogonal Transform," filed January 14, 2005. Alternatively, the discrete cosine transform or other block transforms and overlap operators can be used. Subsequent to the transform, the DC coefficient 260 of each 4×4 transform block is subjected to a similar processing chain (tiling, forward overlap, followed by a 4×4 block transform). The resulting DC transform coefficients and the AC transform coefficients are quantized 270, entropy coded 280, and packetized 290.

The decoder performs the reverse process. On the decoder side, the transform coefficient bits are extracted 310 from their respective packets, from which the coefficients themselves are decoded 320 and dequantized 330. The DC coefficients 340 are regenerated by applying an inverse transform, and the plane of DC coefficients is "inverse overlapped" using a suitable smoothing operator applied across the DC block edges. Subsequently, the entire data is regenerated by applying the 4×4 inverse transform 350 to the DC coefficients and to the AC coefficients 342 decoded from the bitstream. Finally, the block edges in the resulting image planes are inverse overlap filtered 360. This produces a reconstructed 2D data output.

In an exemplary implementation, the encoder 200 (FIG. 2) compresses an input image into the compressed bitstream 220 (e.g., a file), and the decoder 300 (FIG. 3) reconstructs the original input or an approximation of it, depending on whether lossless or lossy coding is employed. The encoding process involves applying the forward lapped transform (LT) discussed below, which is implemented with reversible 2-dimensional pre-/post-filtering also described more fully below. The decoding process involves applying the inverse lapped transform (ILT) using the reversible 2-dimensional pre-/post-filtering.

The illustrated LT and ILT are inverses of each other in an exact sense, and can therefore be collectively referred to as a reversible lapped transform. As a reversible transform, the LT/ILT pair can be used for lossless image compression.

The input data 210 compressed by the illustrated encoder 200/decoder 300 can be images of various color formats (e.g., RGB/YUV 4:4:4, YUV 4:2:2, or YUV 4:2:0 color image formats). Typically, the input image always has a luminance (Y) component. If it is an RGB/YUV 4:4:4, YUV 4:2:2, or YUV 4:2:0 image, the image also has chrominance components, such as a U component and a V component. The separate color planes or components of the image can have different spatial resolutions. In the case of an input image in the YUV 4:2:0 color format, for example, the U and V components have half the width and height of the Y component.

As discussed above, the encoder 200 tiles the input image or picture into macroblocks. In an exemplary implementation, the encoder 200 tiles the input image into 16×16 pixel areas (called "macroblocks") in the Y channel (which, depending on the color format, may be 16×16, 16×8, or 8×8 areas in the U and V channels). Each macroblock color plane is tiled into 4×4 pixel regions or blocks. Therefore, for this exemplary encoder implementation, a macroblock is composed for the various color formats in the following manner:

1. For a grayscale image, each macroblock contains 16 4×4 luminance (Y) blocks.

2. For a YUV 4:2:0 format color image, each macroblock contains 16 4×4 Y blocks, and 4 each 4×4 chrominance (U and V) blocks.

3. For a YUV 4:2:2 format color image, each macroblock contains 16 4×4 Y blocks, and 8 each 4×4 chrominance (U and V) blocks.

4. For an RGB or YUV 4:4:4 color image, each macroblock contains 16 blocks for each of the Y, U, and V channels.
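The four cases above amount to a small lookup. The following Python sketch tabulates them; the format labels and function name are hypothetical and chosen only for this illustration.

```python
LUMA_BLOCKS = 16  # every macroblock has sixteen 4x4 Y blocks

# Number of 4x4 blocks in EACH chroma channel (U and V) per macroblock,
# taken from the list above. The keys are labels invented for this sketch.
CHROMA_BLOCKS_PER_CHANNEL = {
    "GRAY": 0,
    "YUV420": 4,
    "YUV422": 8,
    "YUV444": 16,   # also covers RGB in the text's fourth case
}


def blocks_per_macroblock(color_format: str) -> int:
    """Total number of 4x4 blocks in one macroblock across all channels."""
    chroma = CHROMA_BLOCKS_PER_CHANNEL[color_format]
    return LUMA_BLOCKS + 2 * chroma  # two chroma channels: U and V


totals = {fmt: blocks_per_macroblock(fmt) for fmt in CHROMA_BLOCKS_PER_CHANNEL}
```

For example, a YUV 4:2:0 macroblock carries 16 + 4 + 4 blocks, half the count of a YUV 4:4:4 macroblock.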

Accordingly, after the transform, a macroblock in this representative encoder 200/decoder 300 has three frequency subbands: a DC subband (DC macroblock), a lowpass subband (lowpass macroblock), and a highpass subband (highpass macroblock). In the representative system, the lowpass and/or highpass subbands are optional in the bitstream; these subbands may be dropped entirely.

Further, the compressed data can be packed into the bitstream in one of two orderings: spatial order and frequency order. For the spatial order, different subbands of the same macroblock within a tile are ordered together, and the resulting bitstream of each tile is written into one packet. For the frequency order, the same subband from different macroblocks within a tile are grouped together, and the bitstream of a tile is thus written into three packets: a DC tile packet, a lowpass tile packet, and a highpass tile packet. In addition, there may be other data layers.
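The difference between the two orderings can be shown with a toy packer. This Python sketch is illustrative only: it treats each subband of a macroblock as an opaque byte string and ignores headers and any other data layers; the dictionary keys and function name are assumptions.

```python
def pack_tile(macroblocks, order):
    """Pack one tile's macroblocks into packets.

    `macroblocks` is a list of dicts with 'dc', 'lp' and 'hp' byte strings
    (an assumed in-memory layout for this sketch).
    """
    if order == "spatial":
        # All subbands of a macroblock stay together; one packet per tile.
        return [b"".join(mb["dc"] + mb["lp"] + mb["hp"] for mb in macroblocks)]
    if order == "frequency":
        # Same subband from every macroblock grouped; three packets per tile.
        return [b"".join(mb[band] for mb in macroblocks)
                for band in ("dc", "lp", "hp")]
    raise ValueError(f"unknown order: {order!r}")


tile = [
    {"dc": b"D0", "lp": b"L0", "hp": b"H0"},
    {"dc": b"D1", "lp": b"L1", "hp": b"H1"},
]
spatial_packets = pack_tile(tile, "spatial")
frequency_packets = pack_tile(tile, "frequency")
```

Frequency order makes it straightforward to drop the lowpass or highpass packets as a unit, which fits the statement that those subbands are optional in the bitstream.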

Thus, for the representative system, an image is organized in the following "dimensions":

Spatial dimension: Frame → Tile → Macroblock;

Frequency dimension: DC | Lowpass | Highpass; and

Channel dimension: Luminance | Chrominance_0 | Chrominance_1 ... (e.g., as Y | U | V).

The arrows above denote a hierarchy, whereas the vertical bars denote a partitioning.

Although the representative system organizes the compressed digital media data in spatial, frequency, and channel dimensions, the flexible quantization approach described here can be applied in alternative encoder/decoder systems that organize their data along fewer, additional, or other dimensions. For example, the flexible quantization approach can be applied to coding that uses a larger number of frequency bands, other formats of color channels (e.g., YIQ, RGB, etc.), or additional image channels (e.g., for stereoscopic vision or other multiple-camera arrays).

2. Inverse Core and Lapped Transform

Overview

In one implementation of the encoder 200/decoder 300, the inverse transform on the decoder side takes the form of a two-level lapped transform. The steps are as follows:

· An inverse core transform (ICT) is applied to each 4×4 block corresponding to the reconstructed DC and lowpass coefficients arranged in a planar array known as the DC plane.

· A post-filter operation is optionally applied to 4×4 areas evenly straddling blocks in the DC plane. In addition, a post-filter is applied to boundary 2×4 and 4×2 areas, while the four corner areas are left unchanged.

· The resulting array contains the DC coefficients of the 4×4 blocks corresponding to the first-level transform. The DC coefficients are (figuratively) copied into a larger array, and the reconstructed highpass coefficients are populated into the remaining positions.

· An ICT is applied to each 4×4 block.

This process is illustrated in FIG. 4.

The application of the post-filters is governed by the OVERLAP_INFO syntax element in the compressed bitstream 220. OVERLAP_INFO may take on three values:

· If OVERLAP_INFO = 0, no post-filtering is performed.

· If OVERLAP_INFO = 1, only the outer post-filtering is performed.

· If OVERLAP_INFO = 2, both the inner and the outer post-filtering are performed.
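The three OVERLAP_INFO values can be read as a pair of switches. The Python sketch below encodes just that mapping; the function name and the "outer"/"inner" dictionary keys are hypothetical, and the sketch does not model how the syntax element is parsed from the bitstream.

```python
def post_filter_stages(overlap_info: int) -> dict:
    """Map an OVERLAP_INFO value to the post-filtering stages it enables."""
    if overlap_info not in (0, 1, 2):
        raise ValueError("OVERLAP_INFO takes the values 0, 1 or 2")
    return {
        "outer": overlap_info >= 1,  # enabled for values 1 and 2
        "inner": overlap_info == 2,  # enabled only for value 2
    }


stages = [post_filter_stages(v) for v in (0, 1, 2)]
```

A decoder loop could consult this result once per image and skip the corresponding filtering passes entirely when a stage is disabled.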

Inverse Core Transform

The core transform (CT) is inspired by the conventionally known 4×4 discrete cosine transform (DCT), yet it is fundamentally different. The first key difference is that the DCT is linear whereas the CT is nonlinear. The second key difference is that, because it is defined on real numbers, the DCT is not a lossless operation in the integer-to-integer space; the CT is defined on integers and is lossless in this space. The third key difference is that the 2D DCT is a separable operation, whereas the CT is non-separable by design.

The entire inverse transform process can be written as the cascade of three elementary 2×2 transform operations, which are:

· 2×2 Hadamard transform: T_h

· Inverse 1D rotation: InvT_odd

· Inverse 2D rotation: InvT_odd_odd

These transforms are implemented as non-separable operations and are described first, followed by the description of the entire ICT.

2D 2×2 Hadamard Transform T_h

As shown in the following pseudocode table, the encoder/decoder implements the 2D 2×2 Hadamard transform T_h. R is a rounding factor that can take only the value 0 or 1. T_h is involutory (i.e., two applications of T_h to a data vector [a b c d] recover the original values of [a b c d], provided R is unchanged between the applications). The inverse of T_h is T_h itself.
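To make the involution property concrete, the following Python sketch shows one integer lifting construction of a 2×2 Hadamard-like step that maps integers to integers and is its own inverse for a fixed R. It is an illustrative assumption and should not be taken as the referenced pseudocode table; the step ordering and the name `t_h` are hypothetical.

```python
def t_h(a: int, b: int, c: int, d: int, r: int = 0):
    """Integer 2x2 Hadamard-like lifting step (illustrative sketch).

    `r` is the rounding factor (0 or 1). Applying the function twice with
    the same `r` returns the original four values.
    """
    a += d
    b -= c
    t1 = (a - b + r) >> 1   # arithmetic shift: floor division by 2
    t2 = c
    c = t1 - d
    d = t1 - t2
    a -= d
    b += c
    return a, b, c, d


once = t_h(1, 2, 3, 4, 0)
twice = t_h(*once, 0)
```

The round trip works because the second application recomputes the same intermediate value `t1`, so each lifting step undoes its counterpart without any loss from the shift.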

Inverse 1D Rotation InvT_odd

The lossless inverse of T_odd is defined by the pseudocode in the following table.

Inverse 2D Rotation InvT_odd_odd

The inverse 2D rotation InvT_odd_odd is defined by the pseudocode in the following table.

ICT Operations

The correspondence between the 2×2 data and the previously listed pseudocode is shown in FIG. 5. Color coding using four gray levels to indicate the four data points is introduced here to facilitate the transform description in the next section.

The 2D 4×4-point ICT is built using T_h, inverse T_odd, and inverse T_odd_odd. Note that the inverse of T_h is T_h itself. The ICT is composed of two stages, which are shown in the following pseudocode. Each stage consists of four 2×2 transforms, which may be performed in any order or concurrently within the stage.

If the input data block is $\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{bmatrix}$, then 4×4_IPCT_1stStage() and 4×4_IPCT_2ndStage() are defined as follows:

The function 2×2_ICT is the same as T_h.

Post-Filtering Overview

Four operators define the post-filters used in the inverse lapped transform. They are:

· 4×4 post-filter

· 4-point post-filter

· 2×2 post-filter

· 2-point post-filter

The post-filters use T_h, InvT_odd_odd, invScale, and invRotate. invRotate and invScale are defined in the following tables, respectively.

4×4 Post-Filter

First, when OVERLAP_INFO is 1 or 2, the 4×4 post-filter is applied to all block junctions (areas evenly straddling 4 blocks) in all color planes. In addition, when OVERLAP_INFO is 2, the 4×4 filter is applied to all block junctions in the DC plane of every color plane, except that when the color format is YUV 4:2:0 or YUV 4:2:2 it is applied only to the block junctions in the DC plane of the luminance plane.

If the input data is $\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{bmatrix}$, then the 4×4 post-filter 4×4PostFilter(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) is defined in the following table:

4-Point Post-Filter

The linear 4-point filter is applied to edge-straddling 2×4 and 4×2 areas on the boundary of the image. If the input data is [a b c d], then the 4-point post-filter 4PostFilter(a, b, c, d) is defined in the following table.

2×2 Post-Filter

The 2×2 post-filter is applied to areas straddling blocks in the DC plane of the chrominance channels of YUV 4:2:0 and YUV 4:2:2 data. If the input data is $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then the 2×2 post-filter 2×2PostFilter(a, b, c, d) is defined in the following table:

2-Point Post-Filter

The 2-point post-filter is applied to boundary 2×1 and 1×2 samples that straddle blocks. The 2-point post-filter 2PostFilter(a, b) is defined in the following table:

The precision required for performing the transform operations of the lapped transform described above can be signaled in a header of the compressed data structure. In this example implementation, LONG_WORD_FLAG and NO_SCALED_FLAGS are syntax elements transmitted in the compressed bitstream (e.g., in the image header) to signal the precision and computational complexity to be applied by the decoder.

3. Precision and word length

The example encoder/decoder performs integer arithmetic. In addition, the example encoder/decoder supports lossless encoding and decoding. The primary machine precision required by the example encoder/decoder is therefore integer.

However, the integer operations defined in the example encoder/decoder cause rounding errors in lossy encoding. These errors are small by design, but they lower the rate-distortion curve. To improve coding performance by reducing rounding errors, the example encoder/decoder defines a secondary machine precision. In this mode, the input is pre-multiplied by 8 (i.e., left shifted by 3 bits) and the final output is divided by 8 with rounding (i.e., right shifted by 3 bits). These operations are performed at the front end of the encoder and at the back end of the decoder, and are largely invisible to the rest of the process. Further, the quantization levels are scaled accordingly, so that a stream created with the primary machine precision and decoded with the secondary machine precision (or vice versa) produces an acceptable image.

The secondary machine precision cannot be used when lossless compression is required. The machine precision used in creating a compressed file is explicitly marked in the header.

The secondary machine precision is equivalent to using scaled arithmetic in the codec, so this mode is referred to as scaled. The primary machine precision is referred to as unscaled.

The example encoder/decoder is designed to provide good encoding and decoding speed. A design goal of the example encoder/decoder is that, for an 8-bit input, data values at both the encoder and the decoder do not exceed 16 signed bits. (Intermediate operations within a transform stage may, however, exceed this figure.) This holds for both machine precision modes.

Conversely, when the secondary machine precision is selected, the range expansion of intermediate values is 8 bits. Because the primary machine precision avoids the pre-multiplication by 8, its range expansion is 8-3=5 bits.
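As a worked check of these figures, the following bit budget assumes an 8-bit input and simply adds the stated range expansion to the input bit depth (sign handling is ignored in this simplified count):

```latex
\begin{aligned}
\text{unscaled (primary):}\quad & 8 + (8 - 3) = 8 + 5 = 13 \text{ bits} \le 16 \text{ bits},\\
\text{scaled (secondary):}\quad & 8 + 8 = 16 \text{ bits} \le 16 \text{ bits}.
\end{aligned}
```

Under this count, both modes stay within the 16-bit design goal for 8-bit input, with the scaled mode using the full word.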

The first example encoder/decoder uses two different word lengths for intermediate values: 16 bits and 32 bits.

Second Example Bitstream Syntax and Semantics

The second example bitstream syntax and semantics are hierarchical and comprise the following layers: image, tile, macroblock and block.

Image (IMAGE)

IMAGE(){                            Num bits    Descriptor
IMAGE_HEADER                        variable    struct
bAlphaPlane=FALSE
IMAGE_PLANE_HEADER                  variable    struct
if(ALPHACHANNEL_FLAG){
bAlphaPlane=TRUE
IMAGE_PLANE_HEADER                  variable    struct
}
INDEX_TABLE                         variable    struct
TILE                                variable    struct
}

Image header (IMAGE_HEADER)

IMAGE_HEADER(){                     Num bits    Descriptor
GDISIGNATURE                        64          uimsbf
RESERVED1                           4           uimsbf
RESERVED2                           4           uimsbf
TILING_FLAG                         1           bool
FREQUENCYMODE_BITSTREAM_FLAG        1           uimsbf
IMAGE_ORIENTATION                   3           uimsbf
INDEXTABLE_PRESENT_FLAG             1           uimsbf
OVERLAP_INFO                        2           uimsbf
SHORT_HEADER_FLAG                   1           bool
LONG_WORD_FLAG                      1           bool
WINDOWING_FLAG                      1           bool
TRIM_FLEXBITS_FLAG                  1           bool
RESERVED3                           3           uimsbf
ALPHACHANNEL_FLAG                   1           bool
SOURCE_CLR_FMT                      4           uimsbf
SOURCE_BITDEPTH                     4           uimsbf
if(SHORT_HEADER_FLAG){
WIDTH_MINUS1                        16          uimsbf
HEIGHT_MINUS1                       16          uimsbf
}
else{
WIDTH_MINUS1                        32          uimsbf
HEIGHT_MINUS1                       32          uimsbf
}
if(TILING_FLAG){
NUM_VERT_TILES_MINUS1               12          uimsbf
NUM_HORIZ_TILES_MINUS1              12          uimsbf
}
for(n=0; n<NUM_VERT_TILES_MINUS1; n++){
if(SHORT_HEADER_FLAG)
WIDTH_IN_MB_OF_TILE_MINUS1[n]       8           uimsbf
else
WIDTH_IN_MB_OF_TILE_MINUS1[n]       16          uimsbf
}
for(n=0; n<NUM_HORIZ_TILES_MINUS1; n++){
if(SHORT_HEADER_FLAG)
HEIGHT_IN_MB_OF_TILE_MINUS1[n]      8           uimsbf
else
HEIGHT_IN_MB_OF_TILE_MINUS1[n]      16          uimsbf
}
if(WINDOWING_FLAG){
NUM_TOP_EXTRAPIXELS                 6           uimsbf
NUM_LEFT_EXTRAPIXELS                6           uimsbf
NUM_BOTTOM_EXTRAPIXELS              6           uimsbf
NUM_RIGHT_EXTRAPIXELS               6           uimsbf
}
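To illustrate how the fixed-length part of the image header listed above might be read, here is a minimal sketch. It assumes that the descriptor uimsbf means an unsigned integer stored most significant bit first and that bool fields are single bits read the same way. The names `BitReader`, `FIXED_FIELDS` and `parse_image_header_prefix` are hypothetical, and the sketch stops after HEIGHT_MINUS1 rather than covering the tiling and windowing fields.

```python
class BitReader:
    """Reads unsigned integers MSB first (assumed meaning of 'uimsbf')."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# (name, number of bits), in the order listed in the IMAGE_HEADER table
FIXED_FIELDS = [
    ("GDISIGNATURE", 64), ("RESERVED1", 4), ("RESERVED2", 4),
    ("TILING_FLAG", 1), ("FREQUENCYMODE_BITSTREAM_FLAG", 1),
    ("IMAGE_ORIENTATION", 3), ("INDEXTABLE_PRESENT_FLAG", 1),
    ("OVERLAP_INFO", 2), ("SHORT_HEADER_FLAG", 1), ("LONG_WORD_FLAG", 1),
    ("WINDOWING_FLAG", 1), ("TRIM_FLEXBITS_FLAG", 1), ("RESERVED3", 3),
    ("ALPHACHANNEL_FLAG", 1), ("SOURCE_CLR_FMT", 4), ("SOURCE_BITDEPTH", 4),
]


def parse_image_header_prefix(data: bytes) -> dict:
    """Parse the header fields up to and including HEIGHT_MINUS1."""
    reader = BitReader(data)
    header = {name: reader.read(bits) for name, bits in FIXED_FIELDS}
    # Width and height use 16 bits with the short header, 32 bits otherwise.
    size_bits = 16 if header["SHORT_HEADER_FLAG"] else 32
    header["WIDTH_MINUS1"] = reader.read(size_bits)
    header["HEIGHT_MINUS1"] = reader.read(size_bits)
    return header
```

A decoder built along these lines would have LONG_WORD_FLAG available before any transform work begins, which is what allows it to pick a word length up front.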

IMAGE_PLANE_HEADER(){               Num bits    Descriptor
CLR_FMT                             3           uimsbf
NO_SCALED_FLAG                      1           bool
BANDS_PRESENT                       4           uimsbf
if(CLR_FMT==YUV444){
CHROMA_CENTERING                    4           uimsbf
COLOR_INTERPRETATION                4           uimsbf
}
else if(CLR_FMT==NCHANNEL){
NUM_CHANNELS_MINUS1                 4           uimsbf
COLOR_INTERPRETATION                4           uimsbf
}
if(SOURCE_CLR_FMT==BAYER){
BAYER_PATTERN                       2           uimsbf
CHROMA_CENTERING_BAYER              2           uimsbf
COLOR_INTERPRETATION                4           uimsbf
}
if(SOURCE_BITDEPTH ∈ {BD16, BD16S, BD32, BD32S}){
SHIFT_BITS                          8           uimsbf
}
if(SOURCE_BITDEPTH==BD32F){
LEN_MANTISSA                        8           uimsbf
EXP_BIAS                            8           uimsbf
}
DC_FRAME_UNIFORM                    1           bool
if(DC_FRAME_UNIFORM){
DC_QP()                             variable    struct
}
if(BANDS_PRESENT!=SB_DC_ONLY){
USE_DC_QP                           1           bool
if(USE_DC_QP==FALSE){
LP_FRAME_UNIFORM                    1           bool
if(LP_FRAME_UNIFORM){
NUM_LP_QPS=1
LP_QP()                             variable    struct
}
if(BANDS_PRESENT!=SB_NO_HIGHPASS){
USE_LP_QP                           1           bool
if(USE_LP_QP==FALSE){
HP_FRAME_UNIFORM                    1           bool
if(HP_FRAME_UNIFORM){
NUM_HP_QPS=1
HP_QP()                             variable    struct
}
FLUSH_BYTE                          variable
}

Some selected bitstream elements from the second example bitstream syntax and semantics are defined below.

Long word flag (LONG_WORD_FLAG) (1 bit)

LONG_WORD_FLAG is a 1-bit syntax element that specifies whether 16-bit integers may be used for transform computations. In this second example bitstream syntax, if LONG_WORD_FLAG == 0 (FALSE), 16-bit integers and arrays may be used for the outer stages of transform computations (intermediate operations within the transform, such as (3*a+1)>>1, are performed with higher accuracy). If LONG_WORD_FLAG == TRUE, 32-bit integers and arrays shall be used for transform computations.

Note: 32-bit arithmetic may be used to decode an image regardless of the value of LONG_WORD_FLAG. A decoder can use this syntax element to choose the most efficient word length for its implementation.

No scaled arithmetic flag (NO_SCALED_FLAG) (1 bit)

NO_SCALED_FLAG is a 1-bit syntax element that specifies whether the transform uses scaling. If NO_SCALED_FLAG == 1, scaling shall not be performed. If NO_SCALED_FLAG == 0, scaling shall be performed. In this case, scaling shall be performed by appropriately rounding down the output of the final stage (color conversion) by 3 bits.

Note: If lossless coding is desired, NO_SCALED_FLAG shall be set to TRUE, even if lossless coding is used for only a sub-region of the image. Lossy coding may use either mode.

Note: The rate-distortion performance of lossy coding is better when scaling is used (i.e., NO_SCALED_FLAG == FALSE), especially at low QPs.

4. Signaling and use of the long word flag

One example image format for the representative encoder/decoder supports a wide range of pixel formats, including high dynamic range and wide gamut formats. Supported data types include signed integer, unsigned integer, fixed-point float and floating-point float. Supported bit depths include 8, 16, 24 and 32 bits per color channel. The example image format allows lossless compression of images that use up to 24 bits per color channel, and lossy compression of images that use up to 32 bits per color channel.

At the same time, the example image format is designed to provide high image quality and compression efficiency, and to allow low-complexity encoding and decoding implementations.

To support low-complexity implementations, the transform in the example image format is designed to minimize expansion of the dynamic range. The two-stage transform increases the dynamic range by only 5 bits. Therefore, if the image bit depth is 8 bits per color channel, 16-bit arithmetic may be sufficient to perform all transform operations at the decoder. For other bit depths, the transform operations may require higher precision arithmetic.

The computational complexity of decoding a particular bitstream can be reduced if the precision required to perform the transform operations is known at the decoder. This information can be signaled to the decoder using a syntax element (e.g., a 1-bit flag in the image header). The described signaling techniques and syntax elements can reduce the computational complexity of decoding bitstreams.

In one example implementation, the 1-bit syntax element LONG_WORD_FLAG is used. For example, if LONG_WORD_FLAG == FALSE, 16-bit integers and arrays may be used for the outer stages of transform computations, and if LONG_WORD_FLAG == TRUE, 32-bit integers and arrays shall be used for transform computations.

In one implementation of the representative encoder/decoder, in-place transform operations may be performed on 16-bit wide words, but intermediate operations within the transform (such as computing the product 3*a for the "lifting" step given by b += (3*a+1)>>1) are performed with higher accuracy (e.g., 18 bits or more). In this example, however, the intermediate transform values a and b themselves may be stored in 16-bit integers.
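The lifting step quoted above can be sketched as follows. This is an illustration under stated assumptions, not the codec's implementation: the helper names `to_int16`, `lift_forward` and `lift_inverse` are hypothetical, 16-bit storage is emulated by wrapping, and the inverse step is the usual lifting inverse (subtracting the same quantity) rather than something the text spells out.

```python
def to_int16(x):
    """Wrap an integer into the signed 16-bit range (emulates 16-bit storage)."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def lift_forward(a, b):
    """Apply b += (3*a + 1) >> 1; a and b are stored as 16-bit values."""
    wide = 3 * a + 1  # held in a wider temporary: up to 18 bits for 16-bit a
    return a, to_int16(b + (wide >> 1))


def lift_inverse(a, b):
    """Undo lift_forward by subtracting the same quantity."""
    return a, to_int16(b - ((3 * a + 1) >> 1))
```

For the largest 16-bit value of a (32767), 3*a + 1 is 98302, which needs 17 magnitude bits plus a sign bit. This matches the "18 bits or more" figure for the intermediate product, while a and b themselves remain in 16-bit storage.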

32-bit arithmetic may be used to decode an image regardless of the value of the LONG_WORD_FLAG element. The LONG_WORD_FLAG element can be used by an encoder/decoder to choose the most efficient word length for its implementation. For example, an encoder may choose to set the LONG_WORD_FLAG element to FALSE if it can verify that the 16-bit and 32-bit precision transform steps produce the same output values.
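One way an encoder might carry out the verification described above is to run the transform at both word lengths and compare the results. The sketch below is hypothetical throughout: `toy_transform` is a stand-in chain of lifting steps rather than the codec's transform, and `wrap` and `choose_long_word_flag` are illustrative names.

```python
def wrap(x, bits):
    """Wrap an integer into the signed range of the given word length."""
    half = 1 << (bits - 1)
    return ((x + half) & ((1 << bits) - 1)) - half


def toy_transform(samples, bits):
    """Hypothetical stand-in transform: chained lifting steps whose stored
    results are wrapped to `bits` bits."""
    out = list(samples)
    for i in range(1, len(out)):
        out[i] = wrap(out[i] + ((3 * out[i - 1] + 1) >> 1), bits)
    return out


def choose_long_word_flag(samples):
    """Return False (16-bit words suffice) only if both word lengths agree."""
    return toy_transform(samples, 16) != toy_transform(samples, 32)
```

With small inputs the two word lengths give identical outputs and the flag can be FALSE. With inputs large enough to overflow 16-bit storage the outputs differ, so the flag has to be TRUE.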

5. Signaling and use of NO_SCALED_FLAG

One example image format for the representative encoder/decoder supports a wide range of pixel formats, including high dynamic range and wide gamut formats. At the same time, the design of the representative encoder/decoder optimizes image quality and compression efficiency and allows low-complexity encoding and decoding implementations.

As described above, the representative encoder/decoder uses a two-stage hierarchical block-based transform in which all transform steps are integer operations. The small rounding errors present in these integer operations cause a loss of compression efficiency during lossy compression. To counter this problem, one implementation of the representative encoder/decoder defines two different precision modes for decoder operation: the scaled mode and the unscaled mode.

In the scaled precision mode, the input image is pre-multiplied by 8 (i.e., left shifted by 3 bits) at the encoder, and the final output at the decoder is divided by 8 with rounding (i.e., right shifted by 3 bits). Operation in the scaled precision mode minimizes rounding errors and yields improved rate-distortion performance.

In the unscaled precision mode, there is no such scaling. An encoder or decoder operating in the unscaled precision mode has to handle a smaller dynamic range of transform coefficients and therefore has lower computational complexity. However, operating in this mode carries a small penalty in compression efficiency. Lossless coding (with no quantization, i.e., with the quantization parameter, or QP, set to 1) can only use the unscaled precision mode for guaranteed reversibility.

The precision mode used by the encoder in creating a compressed file is explicitly signaled using NO_SCALED_FLAG in the image header of the compressed bitstream 220 (FIG. 2). It is recommended that the decoder 300 use the same precision mode for its operations.

NO_SCALED_FLAG is a 1-bit syntax element in the image header that specifies the precision mode as follows:

If NO_SCALED_FLAG == TRUE, the unscaled mode shall be used for decoder operation.

If NO_SCALED_FLAG == FALSE, scaling shall be used. In this case, the scaled mode shall be used for operation by appropriately rounding the output of the final stage (color conversion) by 3 bits.

The rate-distortion performance of lossy coding is better when the scaled mode is used (i.e., NO_SCALED_FLAG == FALSE), especially at low QPs. However, the computational complexity is lower when the unscaled mode is used, for two reasons:

The smaller dynamic range expansion in the unscaled mode means that shorter words may be used for transform computations, especially in conjunction with LONG_WORD_FLAG. In VLSI implementations, the reduced dynamic range expansion means that the gate logic implementing the more significant bits may be powered down.

The scaled mode requires an add and a right shift by 3 bits (implementing a rounded divide by 8) on the decoder side. On the encoder side, it requires a left shift by 3 bits. Overall, this is slightly more computationally demanding than the unscaled mode.
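The add-and-shift described above can be sketched as follows. The rounding offset of 4 (half of 8) is an assumption about how the rounded divide is implemented, and the names `encoder_prescale` and `decoder_postscale` are hypothetical.

```python
SCALE_SHIFT = 3                         # multiply / divide by 8, as in the text
ROUND_OFFSET = 1 << (SCALE_SHIFT - 1)   # assumed rounding offset: 4


def encoder_prescale(x):
    """Left shift by 3 bits at the encoder front end."""
    return x << SCALE_SHIFT


def decoder_postscale(y):
    """Add, then right shift by 3 bits: a rounded divide by 8."""
    return (y + ROUND_OFFSET) >> SCALE_SHIFT
```

Under this assumption, an error e with -4 <= e <= 3 introduced in the scaled domain disappears after the rounded divide, which illustrates why scaling can absorb small rounding errors from the intermediate integer steps.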

Further, the unscaled mode allows more significant bits to be compressed than the scaled mode. For example, using 32-bit arithmetic, the unscaled mode permits lossless compression (and decompression) of up to 27 significant bits per sample. In contrast, the scaled mode allows compression of only 24 bits under the same conditions, because the scaling process introduces three additional bits of dynamic range.
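These figures are consistent with a simple bit budget, assuming a 32-bit word and the 5-bit dynamic range expansion of the two-stage transform stated in Section 4:

```latex
\begin{aligned}
\text{unscaled:}\quad & 32 - 5 = 27 \text{ bits per sample},\\
\text{scaled:}\quad & 32 - 5 - 3 = 24 \text{ bits per sample}.
\end{aligned}
```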

For both precision modes, data values at the decoder do not exceed 16 signed bits for an 8-bit input. (Intermediate operations within a transform stage may, however, exceed this figure.)

Note: The encoder sets NO_SCALED_FLAG to TRUE if lossless coding (QP=1) is desired, even if only a sub-region of the image requires lossless coding.

The encoder may use either mode for lossy compression. It is recommended that the decoder use the precision mode signaled by NO_SCALED_FLAG for its operations. However, the quantization levels are scaled so that a stream created with the scaled precision mode and decoded with the unscaled precision mode (or vice versa) produces an acceptable image in most cases.

6. Scaled arithmetic for increased accuracy

In one implementation of the representative encoder/decoder, the transforms (including color conversion) are integer transforms implemented through a series of lifting steps. In these lifting steps, truncation errors hurt transform performance. For lossy compression, the input data to the transform needs to be left shifted by several bits to minimize the damage from truncation errors and thereby maximize transform performance. However, another highly desirable feature is that, if the input image is 8 bits, the output of every transform stays within 16 bits, so the number of left shift bits cannot be large. The representative decoder implements scaled arithmetic techniques to achieve both goals. The scaled arithmetic techniques maximize transform performance by minimizing the damage from truncation errors, while still limiting the output of each transform step to within 16 bits when the input image is 8 bits. This makes a simple 16-bit implementation possible.

The transforms used in the representative encoder/decoder are integer transforms implemented by lifting steps. Most lifting steps involve a right shift, which introduces truncation errors. A transform generally involves many lifting steps, and the accumulated truncation errors hurt transform performance visibly.

One way to reduce the damage from truncation errors is to left shift the input data before the transform in the encoder, and to right shift by the same number of bits after the transform (combined with the quantization) at the decoder. As described above, the representative encoder/decoder has a two-stage transform structure: optional first-stage overlap + first-stage CT + optional second-stage overlap + second-stage CT. Experiments show that a left shift by 3 bits is necessary to minimize truncation errors. So, in the lossy case, the input data may be left shifted by 3 bits before color conversion, i.e., multiplied or scaled up by a factor of 8 (e.g., for the scaled mode described above).

However, color conversion and the transforms expand the data. If the input data is left shifted by 3 bits, the output of the second-stage 4×4 DCT has a 17-bit dynamic range when the input data is 8 bits (the output of every other transform is still within 16 bits). This is highly undesirable because it prevents a 16-bit implementation, which is a highly desirable feature. To get around this, the input data is right shifted by 1 bit before the second-stage 4×4 CT, so that its output is also within 16 bits. Because the second-stage 4×4 CT is applied to only 1/16 of the data (the DC transform coefficients of the first-stage DCT), and this data has already been scaled up by the first-stage transform, the damage from truncation errors is small.

So, in the lossy case for 8-bit images, on the encoder side the input is left shifted by 3 bits before color conversion and right shifted by 1 bit before the second-stage 4×4 CT. On the decoder side, the data is left shifted by 1 bit before the first-stage 4×4 IDCT and right shifted by 3 bits after color conversion.
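The shift schedule above can be traced for values on the DC path (the values that pass through both transform stages). In this sketch the transform stages and color conversion are replaced by a hypothetical `identity` stub so that only the shifts are visible. Placing the decoder's 1-bit left shift between the inverse second stage and the inverse first stage is one reading of "before the first-stage 4×4 IDCT", and the rounding offset of 4 in the final shift is also an assumption.

```python
def identity(values):
    """Placeholder for a real transform or color conversion (hypothetical stub)."""
    return list(values)


def encode_dc_path(pixels, color_convert=identity, stage1=identity, stage2=identity):
    x = [p << 3 for p in pixels]   # left shift 3 bits before color conversion
    x = stage1(color_convert(x))   # color conversion, then first-stage transform
    x = [v >> 1 for v in x]        # right shift 1 bit before the second stage
    return stage2(x)


def decode_dc_path(coeffs, inv_stage2=identity, inv_stage1=identity,
                   inv_color_convert=identity):
    x = inv_stage2(coeffs)
    x = [v << 1 for v in x]        # left shift 1 bit before the first-stage inverse
    x = inv_color_convert(inv_stage1(x))
    return [(v + 4) >> 3 for v in x]   # rounded right shift 3 bits at the end
```

With the identity stubs, 8-bit inputs survive the round trip unchanged, since the bit dropped before the second stage is always zero. With real transform stages in between, that 1-bit shift is the source of the small truncation error the text describes.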

7. Computing environment

The above-described processing techniques for computational complexity and precision signaling in a digital media codec can be realized on any of a variety of digital media encoding and/or decoding systems, including, among other examples, computers (of various form factors, including server, desktop, laptop, handheld, etc.); digital media recorders and players; image and video capture devices (such as cameras, scanners, etc.); communications equipment (such as telephones, mobile phones, conferencing equipment, etc.); and display, printing or other presentation devices. The computational complexity and precision signaling techniques in a digital media codec can be implemented in hardware circuitry, in firmware controlling digital media processing hardware, and in communication software executing within a computer or other computing environment, such as that shown in FIG. 6.

FIG. 6 illustrates a generalized example of a suitable computing environment (600) in which the described embodiments may be implemented. The computing environment (600) is not intended to suggest any limitation as to the scope of use or functionality of the invention, as the present invention may be implemented in diverse general-purpose or special-purpose computing environments.

With reference to FIG. 6, the computing environment (600) includes at least one processing unit (610) and memory (620). In FIG. 6, this most basic configuration (630) is included within a dashed line. The processing unit (610) executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory (620) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory (620) stores software (680) implementing the described digital media encoding/decoding with computational complexity and precision signaling techniques.

A computing environment may have additional features. For example, the computing environment (600) includes storage (640), one or more input devices (650), one or more output devices (660), and one or more communication connections (670). An interconnection mechanism (not shown), such as a bus, controller, or network, interconnects the components of the computing environment (600). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (600) and coordinates the activities of the components of the computing environment (600).

The storage (640) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium that can be used to store information and that can be accessed within the computing environment (600). The storage (640) stores instructions for the software (680) implementing the described digital media encoding/decoding with computational complexity and precision signaling techniques.

The input device(s) (650) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment (600). For audio, the input device(s) (650) may be a sound card or similar device that accepts audio input in analog or digital form from a microphone or microphone array, or a CD-ROM reader that provides audio samples to the computing environment. The output device(s) (660) may be a display, printer, CD writer, or another device that provides output from the computing environment (600).

The communication connection(s) (670) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, compressed audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.

The digital media encoding/decoding with flexible quantization techniques described herein can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment (600), computer-readable media include memory (620), storage (640), communication media, and combinations of any of the above.

The digital media encoding/decoding with computational complexity and precision signaling techniques described herein can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.

For the sake of presentation, the detailed description uses terms like "determine," "generate," "adjust," and "apply" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

In view of the many possible embodiments to which the principles of the invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

Claims

1. A digital media decoding method, comprising:

receiving a compressed digital media bitstream at a digital media decoder;

parsing syntax elements from the bitstream that signal arithmetic precision for transform calculations during processing of the digital media data; and

outputting a reconstructed image.

2. The digital media decoding method of claim 1, wherein the syntax element signals to use one of high arithmetic precision or low arithmetic precision.

3. The digital media decoding method according to claim 2, wherein the high arithmetic precision is 32-bit digital processing, and the low arithmetic precision is 16-bit digital processing.

4. The digital media decoding method according to claim 2, further comprising:

decoding a block of transform coefficients from said compressed digital media bitstream;

where the syntax element signals use of the high arithmetic precision, applying an inverse transform to the transform coefficients using high arithmetic precision processing; and

where the syntax element signals use of the low arithmetic precision, applying an inverse transform to the transform coefficients using low arithmetic precision processing.

5. The digital media decoding method according to claim 4, wherein the high arithmetic precision is 32-bit digital processing, and the low arithmetic precision is 16-bit digital processing.

6. The digital media decoding method as claimed in claim 2, further comprising:

applying an inverse transform to the transform coefficients using high arithmetic precision processing, regardless of the arithmetic precision signaled via the syntax element.

7. A digital media encoding method, comprising:

receiving digital media data at a digital media encoder;

making a decision whether to use lower precision arithmetic for transform calculations during processing of said digital media data;

representing said decision whether to use lower precision arithmetic for transform calculations with a syntax element in an encoded bitstream, wherein said syntax element is operable to communicate said decision to a digital media decoder; and

outputting the encoded bitstream.

8. The digital media encoding method according to claim 7, wherein said making a decision comprises:

verifying that said lower precision arithmetic used for transform calculations produces the same decoder output as using higher precision arithmetic for transform calculations; and

deciding, based on the verification, whether to use the lower precision arithmetic.

9. The digital media encoding method of claim 7, wherein the lower precision arithmetic is 16-bit arithmetic precision.

10. The digital media encoding method according to claim 7, further comprising:

making a determination whether to apply scaling of the input digital media data prior to transform encoding; and

representing the decision whether to apply the scaling with a syntax element in the encoded bitstream.

11. The digital media encoding method according to claim 10, wherein said determining whether to apply scaling comprises deciding not to apply scaling to said input digital media data when encoding said digital media data losslessly.

12. A digital media decoding method, comprising:

receiving a compressed digital media bitstream at a digital media decoder;

parsing a syntax element from the bitstream that signals a selection of precision mode for transform calculations during processing of the digital media data;

where a first precision mode that uses scaling is signaled, scaling the output of the decoder;

where a second precision mode without scaling is signaled, omitting the scaling of the output; and

outputting a reconstructed image.

13. The digital media decoding method according to claim 12, wherein said scaling the output of said decoder comprises a rounding division of said output by a number.

14. The digital media decoding method according to claim 12, wherein the rounding division of the output is a rounding division by the number 8.

15. The digital media decoding method according to claim 12, further comprising:

parsing a second syntax element from the bitstream, the second syntax element signaling whether to use lower arithmetic precision for transform calculations during processing of the digital media data;

decoding a block of transform coefficients from said compressed digital media bitstream; and

where the second precision mode without scaling and use of the lower arithmetic precision are both signaled, performing inverse transform processing of the transform coefficients using the lower arithmetic precision.

16. The digital media decoding method according to claim 15, wherein the lower arithmetic precision is 16-bit arithmetic precision.

17. The digital media decoding method according to claim 12, wherein said digital media data is encoded using a two-stage transform structure having a first-stage transform followed by a second-stage transform of the DC coefficients of the first-stage transform, the digital media decoding method further comprising:

decoding digital media data from said compressed digital media bitstream;

applying an inverse second stage transform to the digital media data;

applying an inverse first stage transform to the digital media data;

performing color conversion of said digital media data; and

wherein, where the first precision mode that uses scaling is signaled, said scaling of the output of said decoder comprises:

left-shifting the digital media data by a single bit prior to input to the inverse first-stage transform; and

right-shifting the digital media data by 3 bits after the color conversion.

18. The digital media decoding method according to claim 12, wherein the compressed digital media bitstream is encoded according to a syntax schema defining separate main and alpha image planes of an image, the syntax element signaling the selection of precision mode being signaled per image plane, whereby the precision modes of said main image plane and said alpha image plane are signaled independently, and said decoding method comprises performing said parsing of the syntax element signaling the selection of precision mode for each image plane and, where said first precision mode that uses scaling is signaled for the respective image plane, scaling the output of the decoder for that image plane.
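
For illustration only, and forming no part of the claims, the output scaling recited in claims 12 to 14 might be sketched as follows. The function names, the flag name `scaled_mode`, and the particular rounding rule (add half the divisor, then shift right by 3 bits) are assumptions made for this sketch; the claims state only that the output is divided by 8 with rounding when the first precision mode is signaled, and is left unscaled otherwise.

```python
from typing import List

# Division by 8 corresponds to a right shift by 3 bits (claim 14).
SCALE_SHIFT = 3


def rounding_divide_by_8(value: int) -> int:
    """Divide by 8 with rounding.

    Assumed rounding rule for this sketch: add half the divisor (4),
    then apply an arithmetic right shift, which floors the result.
    """
    return (value + (1 << (SCALE_SHIFT - 1))) >> SCALE_SHIFT


def finish_decoder_output(samples: List[int], scaled_mode: bool) -> List[int]:
    """Apply the output scaling only when the scaled precision mode is signaled.

    `scaled_mode` is a hypothetical stand-in for the parsed syntax element
    that selects between the first (scaled) and second (unscaled) modes.
    """
    if scaled_mode:
        return [rounding_divide_by_8(s) for s in samples]
    # Second precision mode: the scaling step is omitted entirely.
    return list(samples)


if __name__ == "__main__":
    decoded = [0, 8, 11, 12, -8]
    print(finish_decoder_output(decoded, scaled_mode=True))
    print(finish_decoder_output(decoded, scaled_mode=False))
```

In this sketch the unscaled mode saves one add and one shift per output sample, whereas the scaled mode leaves three extra fractional bits available in the intermediate data before the final rounding.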