

Title:
METHOD FOR SCALABLE VIDEO CODING
Document Type and Number:
WIPO Patent Application WO/2006/058921
Kind Code:
A1
Abstract:
The method for inheriting coding data implements the following steps:
- superimposition (4), after zooming and for the common video part, of an image (F2) (101) of low resolution onto an image (F1) (102) of higher resolution,
- determination (61) of the number of zoomed BR blocks of F2 covering a HR block of F1,
- if this number equals 1 (64), assignment of the mode and motion data (65) of the zoomed BR block to this HR image block,
- if this number is greater than 1, assignment of a mode and of motion data (66) as a function of the modes and the motion data of the zoomed BR blocks covering this HR block,
- merging neighbouring HR blocks depending on the similarity of the motion data and/or of the coding modes assigned to the HR blocks and assignment of the coding modes and motion data to the new blocks according to the coding modes and motion data of the merged blocks.

Inventors:
MARQUANT GWENAELLE (FR)
BURDIN NICOLAS (FR)
LOPEZ PATRICK (FR)
FRANCOIS EDOUARD (FR)
BOISSON GUILLAUME (FR)
Application Number:
PCT/EP2005/056451
Publication Date:
June 08, 2006
Filing Date:
December 02, 2005
Assignee:
THOMSON LICENSING (FR)
MARQUANT GWENAELLE (FR)
BURDIN NICOLAS (FR)
LOPEZ PATRICK (FR)
FRANCOIS EDOUARD (FR)
BOISSON GUILLAUME (FR)
International Classes:
H04N7/26; H04N7/36; H04N7/46
Domestic Patent References:
WO2001077871A1, 2001-10-18
WO2004073312A1, 2004-08-26
Foreign References:
US20020154697A1, 2002-10-24
EP0485230A2, 1992-05-13
US20020001411A1, 2002-01-03
EP1322121A2, 2003-06-25
Other References:
REICHEL J ET AL: "Scalable Video Model 3.0", ISO/IEC JTC1/SC29/WG11 N6716, October 2004 (2004-10-01), pages 1-85, XP002341767
WIEGAND T: "JOINT MODEL NUMBER 1, REVISION 1(JM-IRL)", ITU STUDY GROUP 16 - VIDEO CODING EXPERTS GROUP, 3 December 2001 (2001-12-03), pages 1,3 - 75, XP001086627
WAN W K ET AL: "Adaptive format conversion for video scalability at low enhancement bitrates", PROCEEDINGS OF THE 44TH IEEE 2001 MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS 2001), DAYTON, OH, AUG. 14-17, 2001, NEW YORK, NY: IEEE, US, vol. 1 of 2, 14 August 2001 (2001-08-14), pages 588-592, XP010579270, ISBN: 0-7803-7150-X
Attorney, Agent or Firm:
Ruellan-Lemonnier, Brigitte (46 quai A. Le Gallo, Boulogne Cedex, FR)
Claims:
CLAIMS

1. Method for inheriting coding data, for a first image (F1) in the format F1 (102), from coding data of a second image (F2) in the format F2 (101) of lower resolution than the first format F1, the video content of the images (F1) and (F2) having at least one common part, the first image and the second image being partitioned respectively into HR blocks and BR blocks, a coding mode and motion data having been calculated and attributed to BR blocks of the image (F2) for their coding/decoding, characterized in that the following steps are implemented:
- superimposition (4), after zooming (3) and for the common part, of the image (F2) onto the image (F1),
- determination (61) of the number of zoomed BR blocks covering the said HR block,
- if this number equals 1 (64), assignment of the mode and motion data (65) of the zoomed BR block to this HR image block,
- if this number is greater than 1, assignment of a mode and of motion data (66) as a function of the modes and the motion data of the zoomed BR blocks covering this HR block,
- merging neighbouring HR blocks depending on the similarity of the motion data and/or of the coding modes assigned to the HR blocks and assignment of the coding modes and motion data to the new blocks according to the coding modes and motion data of the merged blocks.

2. Method according to claim 1, characterized in that the first image is split into macroblocks made up of a predetermined number of HR blocks and in that the step of merging is implemented on HR blocks of the same macroblock.

3. Method according to claim 1, characterized in that the low-resolution image is partitioned into macroblocks, in that a macroblock is partitioned into sub-macroblocks, which are themselves subdivided into blocks according to the coding modes chosen for the macroblock, a BR block corresponding, for each macroblock, to a macroblock, a sub-macroblock or a block, depending on the partition of the macroblock.

4. Method according to claim 1, characterized in that the size of a HR block corresponds to the size of a block making up a sub-macroblock in the MPEG-4 standard.

5. Method according to claim 1, characterized in that, if the number is greater than one, the coding mode assigned to the HR block is the majority or predominant mode.

6. Method according to claim 5, characterized in that the motion data are at least a motion vector and in that the motion vector assigned to the HR block is chosen from amongst the candidate motion vectors corresponding to the zoomed BR blocks covering the block and having the assigned coding mode.

7. Method according to claim 1, characterized in that the images are sub-band images obtained by temporal decomposition of the wavelet type or sub-band coding of the source images.

8. Method according to claim 1, characterized in that the BR blocks covering a HR block are determined by selecting the BR blocks co-located with the pixels corresponding to each corner of the HR block.

9. Method according to claim 1, characterized in that the motion data are the picture reference indexes and corresponding motion vectors as defined in the H.264 standard.

10. Method according to claim 9, characterized in that, when several reference indexes exist for a list, the one selected for the block is the minimum and in that, when several motion vectors exist for the selected reference index, the assigned motion vector corresponds to their mean value.

11. Method for scalable coding of images with an F1 format (102) corresponding to an enhanced layer and with an F2 format (101) having a lower resolution than F1, corresponding to a base layer, characterized in that it implements a method for inheriting coding data according to claim 1 and in that it further implements a step of coding blocks or merged blocks according to the assigned coding mode and motion data.

12. Method for decoding data of images with an F1 format (102) corresponding to an enhanced layer and an F2 format (101) having a lower resolution than F1, corresponding to a base layer, characterized in that it implements a method for inheriting coding data according to claim 1 and in that it further implements a step of decoding blocks or merged blocks according to the assigned coding mode and motion data.
Description:
METHOD FOR SCALABLE VIDEO CODING

The invention relates to a method for scalable video coding and decoding of high-resolution images from images with lower resolution, and more specifically to a method for inheriting coding data from images with lower resolution in order to code and decode high-resolution images.

It also relates to the generation of a video sequence bitstream. Said coded bitstream is made up of encoded video data in which each data item is described by means of a bitstream syntax allowing all the elements of the content of said bitstream to be recognized and decoded.

The domain is video compression complying, for instance, with the video coding standards of the MPEG family such as MPEG-1, MPEG-2, MPEG-4 and the recommendations of the ITU-T H.26x family (H.261, H.263 and extensions, H.264).

Video coders with spatial scalability are a known field. The data stream generated by the video coder has a scalable hierarchy; the coded data are incorporated into the stream in a hierarchical manner, with spatial scalability. The video formats to which these coders relate are those for which the dimensions of the high resolution correspond to twice those of the low resolution, allowing a dyadic decomposition. Thus, a coding compatible with the Quarter Common Intermediate Format, or QCIF, of dimensions 176 x 144, and with the CIF format, of dimensions 352 x 288, or else a coding compatible with the CIF format and with the 4CIF format, of dimensions 704 x 576, can be obtained by sub-sampling and filtering of the high-resolution image.
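The dyadic relation described above can be sketched as a small check; the helper name and tuple convention are illustrative, not taken from the patent:

```python
# Hypothetical helper: tests whether two video formats are dyadically
# related, i.e. the high-resolution dimensions are exactly twice the
# low-resolution ones, as classical spatially scalable coders assume.

def is_dyadic(low, high):
    """low and high are (width, height) tuples in pixels."""
    lw, lh = low
    hw, hh = high
    return hw == 2 * lw and hh == 2 * lh

# Formats mentioned in the text:
QCIF = (176, 144)
CIF = (352, 288)
FOUR_CIF = (704, 576)

assert is_dyadic(QCIF, CIF)       # QCIF -> CIF is dyadic
assert is_dyadic(CIF, FOUR_CIF)   # CIF -> 4CIF is dyadic
```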

The hierarchical coding allows a base layer relating to the low-resolution format and an upper layer corresponding to the higher-resolution format to be obtained. The complementary data relating to the upper layer are generally calculated according to a method comprising the following steps:

- coding of the low-resolution image and local decoding of this coded image in order to obtain a reconstructed image,

- rescaling, or zoom, of the reconstructed low-resolution image, for example by interpolation and filtering, in order to obtain an image in the high-resolution format,

- pixel-to-pixel difference of the luminance values of the source image and of a prediction image based on the reconstructed image in order to obtain residues forming the data of the upper layer.

The method is also applied to the chrominance images if they exist. In simplified versions, the reconstructed image is directly the original low-resolution image. Thus, the coding of the high-resolution image exploits the rescaled low-resolution image as a prediction image.

The last step exploits a coding mode called the default inter-layer mode, using the macroblock of the zoomed image overlaid on the current macroblock of the high-resolution image to be coded as the prediction macroblock. This mode corresponds to the inter-layer mode with a zero motion vector. It is referred to as 'default' because no motion vector is transmitted, the current macroblock using motion vectors derived from those of its prediction macroblock. This coding mode can only be exploited in the case where the images can be overlaid. If the high-resolution format is not linked to the low-resolution format by a dyadic transformation, the macroblocks cannot be overlaid and the default inter-layer coding mode cannot be used.

The available coding modes do not therefore allow the cost of coding the high-resolution image to be optimized, in particular when the image resolutions or formats are not proportional.

The invention addresses the spatial scalability issue and also proposes additional syntax for MPEG-4 Part 10 (H.264). The Scalable Video Model relating to this standard, known as SVM, only addresses dyadic spatial scalability, that is, configurations where the scaling factor between the picture width and height of two successive (high to low) spatial layers (or resolutions) equals 2.

In SVM3.0, the Scalable Video Model version described in the document "Scalable Video Model 3.0", ISO/IEC JTC1/SC29/WG11 N6716, J. Reichel, M. Wien, H. Schwarz, an approach for inter-layer prediction of motion and texture data is proposed. The approach is limited to the case where the low layer resolution has half the width and height of the high layer resolution.

One of the aims of the invention is to overcome the aforementioned drawbacks.

The subject of the invention is a method for inheriting coding data, for a first image (F1) in the format F1, from coding data of a second image (F2) in the format F2 of lower resolution than the first format F1, the video content of the images (F1) and (F2) having at least one common part, the first image and the second image being partitioned respectively into HR blocks and BR blocks, a coding mode and motion data having been calculated and attributed to BR blocks of the image (F2) for their coding/decoding, characterized in that the following steps are implemented:

- superimposition, after zooming and for the common part, of the image (F2) onto the image (F1),

- determination of the number of zoomed BR blocks covering the said HR block,

- if this number equals 1, assignment of the mode and motion data of the zoomed BR block to this HR image block,
- if this number is greater than 1, assignment of a mode and of motion data as a function of the modes and the motion data of the zoomed BR blocks covering this HR block,

- merging neighbouring HR blocks depending on the similarity of the motion data and/or of the coding modes assigned to the HR blocks and assignment of the coding modes and motion data to the new blocks according to the coding modes and motion data of the merged blocks.
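The inheritance steps above can be sketched as follows; all names and data structures are hypothetical illustrations, not taken from any reference implementation, and the majority choice shown for the multi-coverage case is only one of the options the description allows:

```python
# Minimal sketch of the inheritance of coding data by HR blocks.

def inherit_coding_data(hr_blocks, covering_br_blocks, merge_similar):
    """hr_blocks: list of HR block identifiers.
    covering_br_blocks: dict mapping an HR block id to the list of
    (mode, motion) pairs of the zoomed BR blocks covering it.
    merge_similar: function merging neighbouring HR blocks afterwards."""
    assigned = {}
    for hr in hr_blocks:
        covering = covering_br_blocks[hr]
        if len(covering) == 0:
            continue                    # inter-layer mode not usable here
        if len(covering) == 1:
            assigned[hr] = covering[0]  # direct inheritance (single cover)
        else:
            # mode and motion chosen as a function of all covering blocks;
            # here, the majority mode and a candidate vector of that mode
            modes = [m for m, _ in covering]
            mode = max(set(modes), key=modes.count)
            motions = [mv for m, mv in covering if m == mode]
            assigned[hr] = (mode, motions[0])
    return merge_similar(assigned)
```

For example, an HR block covered by three zoomed BR blocks with modes 1, 3 and 1 would inherit mode 1.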

According to one particular embodiment of the method, the first image is split into macroblocks made up of a predetermined number of HR blocks and the step of merging is implemented on HR blocks of the same macroblock.

According to one particular embodiment of the method, the low-resolution image is partitioned into macroblocks, a macroblock is partitioned into sub-macroblocks, which are themselves subdivided into blocks according to the coding modes chosen for the macroblock, a BR block corresponding, for each macroblock, to a macroblock, a sub-macroblock or a block, depending on the partition of the macroblock.

According to one particular embodiment of the method, the size of a HR block corresponds to the size of a block making up a sub-macroblock in the MPEG-4 standard.

According to one particular embodiment of the method, if the number is greater than one, the coding mode assigned to the HR block is the majority or predominant mode.

According to one particular embodiment, the motion data are at least a motion vector and the motion vector assigned to the HR block is chosen from amongst the candidate motion vectors corresponding to the zoomed BR blocks covering the block and having the assigned coding mode.

According to one particular embodiment of the method, the images are sub-band images obtained by temporal decomposition of the wavelet type or sub-band coding of the source images.

According to one particular embodiment of the method, the BR blocks covering a HR block are determined by selecting the BR blocks co-located with the pixels corresponding to each corner of the HR block.
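The corner-based test above can be sketched as follows; the function name, coordinate convention and block-index convention are illustrative assumptions:

```python
# Sketch: the BR blocks covering an HR block are found by mapping the
# four corner pixels of the HR block into the zoomed BR block grid.

def covering_br_blocks(hr_x, hr_y, hr_size, br_block_w, br_block_h):
    """Returns the set of (col, row) indices of the zoomed BR blocks
    co-located with the four corner pixels of the square HR block whose
    top-left pixel is (hr_x, hr_y)."""
    corners = [
        (hr_x, hr_y),
        (hr_x + hr_size - 1, hr_y),
        (hr_x, hr_y + hr_size - 1),
        (hr_x + hr_size - 1, hr_y + hr_size - 1),
    ]
    return {(int(x // br_block_w), int(y // br_block_h)) for x, y in corners}
```

With a zoom of 1.5, a 4x4 BR block spans 6x6 high-resolution pixels, so `covering_br_blocks(4, 0, 4, 6, 6)` finds two covering BR blocks, while `covering_br_blocks(0, 0, 4, 6, 6)` finds a single one.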

According to one particular embodiment of the method, the motion data are the picture reference indexes and corresponding motion vectors as defined in the H.264 standard.

According to one particular embodiment, when several reference indexes exist for a list, the one selected for the block is the minimum and, when several motion vectors exist for the selected reference index, the assigned motion vector corresponds to their mean value.
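The selection rule above can be sketched directly; the helper name and candidate representation are hypothetical:

```python
# Sketch: among the candidates of one reference list, keep the minimum
# reference index; if several motion vectors remain for that index,
# assign their mean value.

def select_motion(candidates):
    """candidates: list of (ref_index, (mvx, mvy)) pairs for one list."""
    ref = min(r for r, _ in candidates)
    mvs = [mv for r, mv in candidates if r == ref]
    mean_x = sum(x for x, _ in mvs) / len(mvs)
    mean_y = sum(y for _, y in mvs) / len(mvs)
    return ref, (mean_x, mean_y)
```

For instance, with candidates having reference indexes 1, 0 and 0, the index 0 is retained and the two associated vectors are averaged.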

The invention also relates to a method for scalable coding of images with an F1 format corresponding to an enhanced layer and with an F2 format having a lower resolution than F1, corresponding to a base layer, characterized in that it implements a method for inheriting coding data according to the previous method and in that it further implements a step of coding blocks or merged blocks according to the assigned coding mode and motion data.

The invention also relates to a method for decoding data of images with an F1 format corresponding to an enhanced layer and an F2 format having a lower resolution than F1, corresponding to a base layer, characterized in that it implements a method for inheriting coding data according to the previous method and in that it further implements a step of decoding blocks or merged blocks according to the assigned coding mode and motion data.

The default inter-layer predictive coding mode consists in exploiting the sub-macroblocks and/or blocks making up macroblocks of the rescaled low-resolution image, which overlay HR blocks of the high-resolution image, for the coding of these HR blocks or of macroblocks formed from these blocks.

This default coding mode allows the inter-layer redundancy to be exploited efficiently and the total cost of coding thus to be reduced. An improved image quality/coding cost ratio can be obtained by way of this additional coding mode. A further advantage is that it allows correlation calculations that are costly in terms of processing time or calculating power to be avoided.

The invention makes it possible to extend the inter-layer prediction to generic cases where the low layer and high layer picture dimensions are not multiples of 2. New syntax elements are added to deal with generic cases where non-dyadic spatial scalability appears, allowing efficient encoding/decoding of the macroblocks of the enhancement (or high) layer knowing the decoded base (or low) layer.

A new tool is proposed, providing extended spatial scalability, that is, managing configurations where:

- the factor between the picture width and height of two successive (high to low) spatial layers (or resolutions) does not necessarily equal 2;

- pictures of the higher level of resolution can contain parts (picture borders) that are not present in corresponding pictures of the low level of resolution.

Other features and advantages of the invention will become clearly apparent in the following description, presented by way of non-limiting example and with regard to the appended figures, which show:

- Figure 1, a flow diagram of the coding method,
- Figure 2, a representation of the formats to be coded,
- Figure 3, an illustration of the coding modes of a macroblock,
- Figure 4, an overlay of the macroblocks of a high-resolution and low-resolution image,
- Figure 5, a macroblock of the high-resolution image,
- Figure 6, a flow diagram of the inter-layer coding method,
- Figure 7, a flow diagram of the process for selection of the coding mode and of the motion vectors of an HR block,
- Figure 8, an illustration of the steps of the method,
- Figure 9, examples of the assignment of the coding modes and motion vectors to the HR blocks,
- Figure 10, relations between the enhancement layer and the base layer,
- Figure 11, macroblock overlapping between the upsampled base layer picture and the enhancement layer picture,
- Figure 12, authorized blocks for inter-layer motion vector prediction,
- Figure 13, pixels of a 4x4 block b,
- Figure 14, 4x4 blocks b of an 8x8 block B,
- Figure 15, 8x8 blocks B of a prediction macroblock,
- Figure 16, authorized blocks for inter-layer texture prediction,
- Figure 17, co-located base layer macroblocks and 8x8 blocks of a macroblock MB.

The method for coding the data is a hierarchical coding method; in other words, the stream of coded data is structured in a hierarchical manner, the data relating to the lowest-resolution format being integrated into a base layer, or lower layer, and the complementary data relating to the higher-resolution format being integrated into an upper layer. Selecting only the data relating to one standard or one format from within the data stream is thus facilitated by selecting only the layers corresponding to the desired resolution level. Here, this corresponds to spatial scalability, compatible with any temporal scalability required by the standard relating to the resolution format.

The invention relates to the coding of video contents with different formats, which contents could be different but have at least one video part in common. It is particularly adapted to formats that are non-proportional in the width and/or height of the image. One of the formats is of lower resolution than the other. It is either of lower definition, the number of pixels per row or the number of rows defining, for example, the common video part being lower, or, for the same definition, of smaller size.

Figure 1 shows a flow diagram of the coding method according to the invention.

A first step 1 takes the various video formats to be coded into account. The stream of coded data obtained at the output of the coder supplies decoders compatible with one of these formats, the selection of the format, dependent on the display device, on the decoder or on parameters such as the transmission rate, being made by filtering the data from this stream of coded data, upstream of or within the decoder. In the example described, a first high-resolution format F1 and a second low-resolution format F2 are used. Each of these formats is defined by its width LF1, LF2, or number of pixels in a row, and its height HF1, HF2, or number of rows.

It is considered that the video sources supplying the coder are in the F1 and F2 formats. These correspond, possibly only in part, to the same video content. 'Only in part' means that the video content of these two sources is otherwise different, in other words that a simple homothetic transformation from one format to the other is not possible or, put another way, that the formats are not proportional. Geometric parameters, allowing the video part common to the two formats to be defined, are also transmitted to the coder.

The creation of these source images and the calculation of the geometric parameters may, for example, be effected in the following manner:

Starting from the chosen formats, a first and a second video window are dimensioned and positioned within an original image in order to define the video contents of this image to be coded in each of the formats. It is assumed that these two windows overlap at least partially. They define the video contents to be coded in the F1 format and in the F2 format. The dimensions of these first and second windows are chosen to be homothetic with the F1 and F2 formats, respectively.

The high- and low-resolution source images transmitted to the coder may have the same definition as the original image or different definitions from that of the original image or from one another, depending on whether they undergo filtering and sampling operations or not. The high-resolution image, called (F1), is chosen as the reference image in order to define the geometric parameters. These are, for example, the position of the image (F2), with F2 format, within the image (F1) and the definition ratio, which corresponds to the zoom to be applied to the image (F2) in order to make the video content of (F2) correspond to the video content of (F1) for the common part.

Step 2 performs a coding of the video image in the F2 format.

Step 3 performs the decoding of this coded image in order to supply a locally decoded image, or reconstructed image. In one simplified version, the locally decoded image may consist of the original image before coding, in other words of the source image (F2). This image is then rescaled, or zoomed, by a ratio corresponding to the geometric parameter relating to the definition ratio, in order to obtain a zoomed image (F2), called (Fz).
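The rescaling step above can be sketched numerically; the helper names and the nearest-pixel mapping are illustrative assumptions (an actual coder would use interpolation and filtering, as the text notes):

```python
# Sketch: rescaling the reconstructed low-resolution image (F2) by the
# definition ratio yields the zoomed image (Fz); a pixel of (Fz) maps
# back to a source pixel of (F2) by dividing its coordinates by the
# zoom ratio.

def fz_dimensions(f2_w, f2_h, zoom):
    """Dimensions of the zoomed image (Fz)."""
    return round(f2_w * zoom), round(f2_h * zoom)

def fz_to_f2(x, y, zoom):
    """Nearest source pixel of (F2) for pixel (x, y) of (Fz)."""
    return int(x / zoom), int(y / zoom)
```

For example, with a zoom of 1.5, a 16x16 area of (F2) yields a 24x24 area of (Fz), and pixel (23, 0) of (Fz) maps back to pixel (15, 0) of (F2).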

The following step 4 performs the positioning of the image (Fz) on the high-resolution image (F1) as a function of the geometric parameter relating to the position, so as to make the video contents correspond.

The next step 5 performs the coding of the high-resolution image. This coding takes into account various coding modes, of which the inter-layer coding, which is the subject of the invention, is explained below.

Step 6 inserts the coding data relating to the image (F2) into a base layer of the data stream and the coding data specific to the high-resolution image (F1) into an upper layer.

Figure 2 shows a first video content in the high-resolution coding format F1, referenced 11, and a window with dimensions Lw and Hw, referenced 12. This window is defined and positioned within the high-resolution image based on the geometric parameters. The video content of this window is calculated from the video content in the low-resolution F2 coding format of dimensions LF2 and HF2, referenced 13. The image in the F2 format is coded then decoded in order to deliver a local decoded image, which is then oversampled to produce a rescaled or zoomed image (Fz) with the dimensions of the window 12.

Figure 3 shows, in a first row, macroblocks of dimensions 16x16 pixels and the various partitions of macroblocks into sub-macroblocks as proposed by the MPEG-4 standard. These are sub-macroblocks of dimensions 16x8, 8x16 and 8x8 pixels. Again according to the standard, the coding mode and the prediction block during the coding in inter mode can be defined for each sub-macroblock. A coding mode and, where required, a motion vector MV defining the correlated image block are thus assigned to this sub-macroblock.

In a second row in Figure 3, sub-macroblocks of dimensions 8x8 pixels and the various sub-partitions of the sub-macroblocks into blocks are shown. According to the MPEG-4 standard, in the case where the macroblock is divided into 4 sub-macroblocks of dimensions 8x8 pixels, a new decomposition of these sub-macroblocks into blocks of dimensions 8x4, 4x8 and 4x4 pixels is possible for the calculation of the prediction blocks. Thus, for a coding mode defined for the 8x8 pixel sub-macroblock, the correlation calculations may make use of various subdivisions of the sub-macroblock into blocks and therefore various motion vectors associated with these blocks.

As previously indicated, the high and low resolutions do not necessarily correspond to dyadic transformations, and may also be different along the horizontal x-axis and the vertical y-axis. Figure 4 shows an overlay of a high-resolution image, referenced 31 and partitioned into HR macroblocks, onto a zoomed low-resolution image, referenced 32 and partitioned into BR macroblocks. An HR macroblock of the high-resolution image may have a correspondence with either no BR macroblock of the zoomed low-resolution image, for example for the edges of the high-resolution image, or one or more BR macroblocks of this zoomed low-resolution image. Figure 5 shows an HR macroblock of the high-resolution image that has a correspondence with 4 BR macroblocks of the zoomed low-resolution image.

The high-resolution image is subdivided into HR macroblocks for its coding, corresponding to step 5 in Figure 1. As in the MPEG-4 standard, for each macroblock, a coding corresponding to a possible partition into sub-macroblocks or blocks is determined with, for a macroblock, sub-macroblock or block, the assignment of a coding mode and of motion vectors. This coding is, for example, a coding of the intra type for the macroblock, or of the inter type, for example between the current high-resolution image and the preceding high-resolution image. A new coding mode, called inter-layer coding and described hereinbelow with reference to Figures 6 and 7, is proposed. It is added to the various known coding modes, from which the selection for the coding of the macroblock is made. This selection is generally based on entropic criteria, on criteria of coding cost, of distortion, etc. Preferably, the HR macroblocks have a size of 16x16 pixels and the HR blocks have a size corresponding to that of the blocks of a sub-macroblock in the MPEG-4 standard, i.e. 4x4 pixels.

Figures 6 and 7 describe the inter-layer coding mode according to the invention.

The current macroblock of the high-resolution image is taken into account at step 51. This macroblock is subdivided into HR blocks at step 52. Step 53 determines, for each block, a coding mode and a motion vector according to the process corresponding to steps 61 to 66 described in Figure 7. The following step 54 defines the inter-layer coding mode for the macroblock, which will depend on the various coding modes and motion vectors assigned to the blocks forming the macroblock. It calculates the prediction macroblock, made up of all of the prediction blocks relating to the HR blocks forming the macroblock. Depending on the motion vectors assigned to the HR blocks, the macroblock may be partitioned, for this coding mode, into sub-macroblocks, by fusion of the HR blocks with the same motion vector.

Step 53 selects the coding mode and calculates the motion vectors of each of the HR blocks forming an HR macroblock. For this purpose, a step 61 determines, for each of the blocks of the macroblock, the number of zoomed BR blocks covering the HR block. If this number is equal to 0, a test performed at step 62, step 63 is the next step and defines the inter-layer mode as not taken into account. If this number is equal to 1, a test performed at step 64, the following step 65 defines the coding mode and the motion vector of the HR block as being the coding mode and motion vector of the zoomed BR block covering this HR block. In the opposite case, step 66 defines the coding mode and the motion vector for this HR block. For this purpose, a coding mode and a motion vector are assigned to each of the pixels of the HR block depending on the coding mode and motion vector of the zoomed BR block covering the pixel. The coding mode and the motion vector of the HR block are then calculated as a function of those assigned to the pixels of the HR block. This could be a majority choice, in other words the mode or vector corresponding to the majority of the pixels of the HR block. In an improved version, several zoomed BR blocks covering the HR block, but having different coding modes, can contribute to the HR block. For example, a first zoomed BR block has the anticipated prediction mode, better known in the MPEG standard as the 'inter forward' mode, and its neighbour the 'direct spatial' mode, which consists in choosing, for a current zoomed block, the mode of the neighbouring block. In the end, both these blocks can contribute to the HR block, the latter being assigned the 'inter forward' mode.
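The per-pixel majority choice of step 66 can be sketched as follows; the helper name is hypothetical, and ties are resolved arbitrarily in this sketch:

```python
# Sketch: when several zoomed BR blocks cover the HR block, each pixel
# first receives the mode of the BR block covering it, and the HR block
# then takes the mode of the majority of its pixels.

from collections import Counter

def majority_mode(pixel_modes):
    """pixel_modes: list of the coding modes assigned pixel by pixel."""
    counts = Counter(pixel_modes)
    return counts.most_common(1)[0][0]

# A 4x4 HR block: 10 pixels covered by a BR block in mode 1 and
# 6 pixels covered by a BR block in mode 0 -> mode 1 is assigned.
modes = [1] * 10 + [0] * 6
assert majority_mode(modes) == 1
```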

Figure 8 illustrates these various steps in examples of a zoom of 1.5 and of a zoom of 1.6 of the low-resolution image. During the coding of this low-resolution image, a macroblock of dimensions 16x16 pixels, referenced 71, is for example partitioned into various BR sub-macroblocks and blocks of sizes 8x8 pixels, 8x4 pixels, 4x8 pixels and 4x4 pixels. In the first example, the zoomed BR sub-macroblocks and blocks of the zoomed macroblock, referenced 72, correspond to height and width dimensions of 12 pixels and of 6 pixels in the high-resolution image.

The high-resolution image is subdivided into macroblocks of size 16x16 pixels and these macroblocks are partitioned into HR blocks of size 4x4 pixels.

When the zoomed low-resolution image is overlaid onto the high-resolution image, 36 HR blocks of dimensions 4x4 correspond to a zoomed BR macroblock, referenced 73. The shaded HR blocks are the blocks covered by a single zoomed BR sub-macroblock or block, depending on the subdivision applied during the coding.
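The 36-block figure above follows from simple arithmetic, worked through here as a check:

```python
# Worked check of the 1.5 zoom example: a 16x16 BR macroblock zoomed by
# 1.5 spans 24x24 pixels of the high-resolution image, i.e. a 6x6 grid
# of 4x4 HR blocks, hence 36 HR blocks per zoomed BR macroblock.

zoom = 1.5
mb_size = 16   # BR macroblock side, in pixels
hr_block = 4   # HR block side, in pixels

zoomed_span = mb_size * zoom              # 24.0 pixels per side
blocks_per_side = zoomed_span / hr_block  # 6.0 HR blocks per side
assert blocks_per_side ** 2 == 36
```

With the 1.6 zoom of the second example, the span is 25.6 pixels, which is not a whole number of 4x4 HR blocks; this is exactly the case handled by the finer subdivision described below.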

An additional step, consisting of a finer subdivision of the HR blocks, referenced 74, that yields HR sub-blocks, allows the coverage by a single zoomed sub-macroblock or block to be improved by making it correspond to this entity of smaller size. Here, all of these sub-blocks are covered by a single zoomed BR block. The assignment of the coding mode and of the motion vectors is then effected at the HR sub-block level and no longer at the level of the HR block. This additional step may be triggered, for example, in the case where the number of HR blocks not covered by a single zoomed BR block is greater than a given threshold. In the second example, with a zoom ratio of 1.6, the zoomed BR sub-macroblocks and blocks of the zoomed macroblock, referenced 75, correspond to height and width dimensions of 12.8 pixels and of 6.4 pixels in the high-resolution image.

The high-resolution image is subdivided into macroblocks of size 16x16 pixels and these macroblocks are partitioned into HR blocks of size 4x4 pixels.

The zoomed low-resolution image is overlaid onto the high-resolution image, and here a zoomed BR macroblock 76 corresponds to an area of less than 36 HR blocks of dimensions 4x4. The shaded HR blocks are the blocks covered by a single BR sub-macroblock or block, depending on the subdivision effected during the coding of the low-resolution image.

The additional step, consisting of a finer subdivision of the HR blocks, referenced 77, to yield HR sub-blocks, allows more pixels of the high-resolution image to be made to correspond to the zoomed BR sub-macroblocks or blocks, namely the pixels of the sub-blocks covered by a single zoomed BR sub-macroblock or block. This step can be followed by a fusion step in order to restore the size of the HR blocks if, for example, it relates to the minimum size of the sub-macroblock blocks in the MPEG-4 standard. The choice of the coding mode and of the motion vector assigned to the HR block formed from 4 fused sub-blocks may use, for example, the majority rule or, for the motion vectors, the mean or median value. Furthermore, this fusion principle may be extended to the HR sub-macroblocks, or even to the HR macroblocks, by adopting similar fusion rules.
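The fusion rule above can be sketched as follows; the helper name is hypothetical, and the per-component median is just one of the options (mean or median) that the text allows:

```python
# Sketch: the HR block rebuilt from 4 fused sub-blocks takes the
# majority coding mode and, here, the per-component median of the
# sub-block motion vectors.

import statistics

def fuse_sub_blocks(sub_blocks):
    """sub_blocks: list of 4 (mode, (mvx, mvy)) pairs."""
    modes = [m for m, _ in sub_blocks]
    mode = max(set(modes), key=modes.count)   # majority rule
    xs = [mv[0] for _, mv in sub_blocks]
    ys = [mv[1] for _, mv in sub_blocks]
    return mode, (statistics.median(xs), statistics.median(ys))
```

For example, four sub-blocks with modes (1, 1, 0, 1) fuse into a block of mode 1.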

As previously indicated, this additional partitioning step may be triggered in the case where the number of HR blocks not covered by a single zoomed BR block is greater than a given threshold.

The above explanation of course relates to exemplary embodiments. The subdivision of the high-resolution image may be carried out, without departing from the scope of the invention, into blocks of any given size of NxM pixels. The size may be chosen depending on the coding standards, on the horizontal and vertical zoom ratios, and on the compromise between coding cost and calculation cost, a smaller size allowing a better correlation between blocks of size NxM and zoomed BR blocks, owing to a non-singular choice of coding mode and motion vector over a larger surface area of the HR image, as shown by the shaded area of the blocks 76 and 77 in Figure 8.

Figure 9 presents an example of assignment of the coding modes and motion vectors at the HR block level. The macroblock 91 is the zoomed BR macroblock partitioned into sub-macroblocks and blocks. A coding mode (m or mode) and a motion vector (mv) have been assigned to each sub-macroblock and block. The high-resolution image is subdivided into HR macroblocks and blocks referenced 92. Only the HR blocks 92 covered, even partially, by the macroblock 91 are shown in the figure.

A first step consists in assigning the coding modes to the HR blocks of a macroblock. The blocks covered by a single zoomed BR sub-macroblock or block, shown as shaded, are assigned the coding modes of the sub-macroblocks or blocks covering them. For the other cases, the choice of the dominant mode can be made.

The block referenced 93 is covered by a zoomed BR sub-macroblock and a block with coding modes 0 and 1. The mode 1 is assigned to this block because it is the dominant mode, since the block corresponding to this mode 1 covers the largest number of pixels of the HR block 93.

The block referenced 94 is covered by two zoomed BR blocks with coding modes 0 and 3. There is no dominant mode, since these blocks cover the same number of pixels of the HR block, so the first mode encountered is for example chosen, which is the mode 0, and which is assigned to this HR block 94.

The block referenced 95 is covered by a zoomed BR sub-macroblock and three zoomed BR blocks with coding modes 0, 2, 0, 3. The mode 0 is assigned to this HR block since it is the dominant mode, the zoomed BR blocks corresponding to this mode 0 covering the largest number of pixels of the HR block.
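The dominant-mode rule illustrated by the blocks 93 to 95 can be sketched as follows; the function name and the (mode, pixels_covered) input layout are illustrative assumptions, not part of the described method:

```python
def dominant_mode(coverings):
    """Return the dominant coding mode of an HR block.

    `coverings` lists, for each zoomed BR sub-macroblock or block covering
    the HR block, a (coding_mode, pixels_covered) pair.  The dominant mode
    is the one covering the largest number of pixels; on a tie (no
    dominant mode), the first mode encountered is chosen, as for block 94.
    """
    pixels_per_mode = {}
    for mode, pixels in coverings:
        # accumulate coverage per mode; dict insertion order is preserved,
        # so a tie resolves to the first mode encountered
        pixels_per_mode[mode] = pixels_per_mode.get(mode, 0) + pixels
    return max(pixels_per_mode, key=pixels_per_mode.get)
```

For block 94, for instance, two covering blocks with modes 0 and 3 and equal pixel counts yield mode 0, the first encountered.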

A second step consists in assigning the motion vectors to the HR blocks. The basic rule is to consider, as candidate vectors, the motion vectors of the sub-macroblocks or blocks which both cover the HR block and have, as coding mode, the mode assigned to the HR block.

The following complementary rules may be chosen:
- if the mode assigned is a dominant mode, the motion vector that is dominant and that therefore relates to this mode is chosen. In this case, the motion vector assigned to the block 93 is the vector mv1.
- if several blocks or sub-macroblocks cover the HR block, the motion vector is an average, potentially weighted, of the candidate motion vectors. In this case, the components of the motion vector assigned to the block 93 are an average of the components of the vectors mv1 and mv0. This average can be weighted as a function of the number of pixels of the HR block corresponding to each of the vectors.
- if more than two blocks or sub-macroblocks cover the HR block, the motion vector is the median value of the candidate vectors.
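These complementary rules can be sketched as follows; the data layout (a list of (motion_vector, pixels_covered) candidates restricted to the mode assigned to the HR block) and the rounding of the weighted average are illustrative assumptions:

```python
import statistics

def assign_motion_vector(candidates):
    """Assign a motion vector to an HR block from its candidate vectors.

    `candidates` lists (motion_vector, pixels_covered) pairs for the
    covering zoomed BR blocks whose coding mode equals the mode assigned
    to the HR block; vectors are (dx, dy) tuples.  A sketch of the
    complementary rules: dominant vector, pixel-weighted average, median.
    """
    if len(candidates) == 1:
        return candidates[0][0]          # dominant vector of the dominant mode
    if len(candidates) == 2:
        # average weighted by the number of covered pixels, per component
        total = sum(p for _, p in candidates)
        return tuple(round(sum(v[i] * p for v, p in candidates) / total)
                     for i in range(2))
    # more than two candidates: component-wise median
    return tuple(statistics.median(v[i] for v, _ in candidates)
                 for i in range(2))
```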

These examples are non-limiting examples, such that it is equally possible, for the calculation of the motion vectors, to choose the mean or median value of all of the motion vectors of the blocks or sub-macroblocks covering the HR block, without taking the coding modes into account. The motion vectors of the preceding images may also be taken into account in the choice of the inter-layer mode motion vector.

When the coding modes and motion vectors have been assigned to all of the HR blocks of the macroblock of the high-resolution image, the inter-layer coding of this macroblock is carried out. The macroblock may be partitioned into sub-macroblocks and blocks of various sizes, to which coding modes and motion vectors are assigned. These sub-macroblocks and blocks are for example of sizes conforming to a coding standard such as the MPEG4 standard. The partition depends on the coding modes and motion vectors assigned to the HR blocks forming these sub-macroblocks and blocks. The fusion rules for the HR blocks, for forming sub-macroblocks, are known and for example relate to motion vectors having the same value. The size of the sub-macroblocks of the high-resolution image then depends on the homogeneity of the motion vector field. These partitioning criteria are relative to the coding costs, distortions, etc.

When the partitioning has been carried out, the coding of the macroblock thus partitioned, performed according to the inter-layer coding mode, is compared with the other coding modes for the macroblock in order to determine the one actually implemented by the coder.

The subdivision of the image into HR blocks, the assignment of coding modes and of motion vectors, together with the fusion of the blocks, can also be performed at the image level, in other words before the coding by macroblock. Once the modes and vectors have been assigned to all of the blocks concerned, the neighbouring blocks that have similarities as regards these modes and vectors are fused until they correspond to sub-macroblocks or macroblocks of the image. These fusion processes depend not only on the size of the macroblocks and sub-macroblocks, but also on the location of these macroblocks within the high-resolution image.

The predictive coding mode proposed here is called 'inter-layer' mode since it makes use of coding data of the lower-resolution image and 'default' since it does not carry out a calculation for the determination of the mode and of the associated motion vector. This candidate coding mode performs a coding of the macroblock of the high-resolution image according to the associated mode or modes and motion vectors, for example starting from a block of a preceding high-resolution image designated by the associated motion vector if using the inter mode, or else starting from an already coded macroblock of the current high-resolution image designated by the associated motion vector if using the intra mode associated with the macroblock.

This coding mode described is to be added to the other known candidate coding modes for the actual coding of the high-resolution image, the choice of the mode being for example dependent on the coding cost and distortion of the image relating to each of these modes.

Amongst the known modes, a coding mode called intra predictive coding uses one of the previously coded macroblocks of the image. The current macroblock of the current high-resolution image is coded by taking into account one of the already coded macroblocks of this high-resolution image, designated by the motion vector. This selection is made as a function of the strength of correlation with the current macroblock to be coded. Another coding mode, called inter predictive coding, uses a previously coded high-resolution image. The current macroblock of the high-resolution image is coded starting from a predictive macroblock, which is an image block selected within a search window of a preceding high-resolution image. This selection is made as a function of the strength of correlation with the current macroblock to be coded; the image block selected is defined by a motion vector. This mode may exploit more than one previously coded image, for example in the case of the bidirectional inter mode.

As regards an inter-layer predictive coding mode, this performs a coding of the macroblock of the high-resolution image starting from a prediction macroblock corresponding to the co-localized macroblocks, in other words having the same position for at least a part of their pixels, within the zoomed low-resolution image.

The low-resolution image used for the zoom can be the low-resolution image corresponding, from the temporal point of view, to the high-resolution image, but equally one or more preceding low-resolution images. The prediction block can thus be sought within the current or preceding low-resolution image, which may be either the source image or the rescaled or zoomed reconstructed image.

Of course, the method is not limited, for the association of the modes and of the motion vectors with the blocks, to a single motion vector; the coding modes making use of several motion vectors, or even of no motion vector, are within the scope of the invention. Thus, when the associated coding mode is the pure intra mode, there is no assigned motion vector. Similarly, when the inter coding mode is of the bidirectional type, two vectors are associated with the coding mode.

In the same way, the method is not limited to the coding mode and motion vector. Other parameters, such as the number or numbers of the temporal reference images, the weighting factors of the luminance, respectively known as the 'reference picture index' and the 'weighted prediction factor' in the MPEG4 AVC standard, may be used according to a similar process.

In the description, the coding of the high-resolution image is carried out at the level of macroblocks made up of HR blocks. It could just as easily be envisaged, without departing from the scope of the invention, to perform a coding of the image at the HR block level, a macroblock then being composed of a single HR block and the default inter-layer predictive coding mode of the macroblock then corresponding to the coding mode and motion vector assigned according to the method to the HR block forming this macroblock.

The video images coded according to the method can be source images but equally sub-band images obtained by temporal decomposition of the wavelet type or sub-band coding of the source images.

The following part of the description gives other specific implementations of the assignment or inheritance process of the coding modes and motion vectors, including macroblock partitioning, from the base layer, also proposing modifications to the existing syntax used for the data bitstream.

Terms or semantics corresponding to the standard H.264, for example the reference indices or the reference lists, or used in SVM 3.0, such as base_layer_mode, qpel_refinement_mode, are not defined here, since their definition can be found in the documents ITU-T Rec. H.264 (2002 E) or "Scalable Video Model 3.0" previously cited.

Also, the mathematical operators are those used in the C programming language or in the specifications of the standard, defined for example in document ITU-T Rec. H.264 (2002 E), paragraph 5.1. For example, "/" corresponds to the integer division, "%" to the remainder of the division, and "(...) ? 0 : 1" has the value 0 or 1 when the expression in brackets is respectively true or false.

Figure 10 represents a picture within two successive spatial layers, a low layer 101, considered as base layer, and a high layer 102, considered as enhancement layer. The width and height of the enhancement layer pictures are defined respectively as w_enh and h_enh. In the same way, the base layer picture dimensions are defined as w_base and h_base. They are a subsampled version of sub-pictures of the enhancement layer pictures, of dimensions w_extract and h_extract, positioned at coordinates (x_orig, y_orig) in the enhancement layer picture coordinates system. In addition, w_enh, h_enh, w_base, h_base, w_extract and h_extract are constrained to be multiples of 16. The enhancement and base layer pictures are divided into macroblocks.

Upsampling factors between the base layer pictures and the extraction pictures in the enhancement layer are respectively defined, for the horizontal and vertical dimensions, as:

α_horiz = w_extract / w_base
α_vertic = h_extract / h_base

Parameters (x_orig, y_orig, α_horiz, α_vertic) completely define the geometrical relations between the high and low layer pictures. For example, in a standard dyadic scalable version, these parameters are equal to (0, 0, 2, 2).

In many cases a macroblock of the enhancement layer may have either no corresponding base layer block, for example on the borders of the enhancement layer picture, or one or several corresponding base layer macroblocks, as illustrated in figure 11. In this figure, dashed lines 111 represent the macroblocks of the upsampled base layer picture and solid lines 112 the macroblocks of the enhancement layer picture. Consequently, a different management of the inter-layer prediction from that of SVM3.0 is necessary.

In the sequel, according to the semantics defined further on, we will name:
scaled_base_column = x_orig
scaled_base_line = y_orig
scaled_base_width = w_extract
scaled_base_height = h_extract

A given high layer macroblock can exploit inter-layer motion prediction using scaled base layer motion data, using either "BASE_LAYER_MODE" or "QPEL_REFINEMENT_MODE", as in the case of dyadic spatial scalability. When using one of these two modes, the high layer macroblock is reconstructed with default motion data deduced from the base layer. These modes are only authorized for the high layer macroblocks having corresponding base layer blocks, that is, the blocks corresponding, in figure 12, to the grey-colored area 121. In this figure, bold solid lines 122 represent the upsampled base layer window and bold dashed lines the macroblocks 123 of the base layer window. The grey area 121 corresponds to the macroblocks whose coordinates (MB_x, MB_y) respect the following conditions:

MB_x >= scaled_base_column / 16 and
MB_x < (scaled_base_column + scaled_base_width + 15) / 16 and
MB_y >= scaled_base_line / 16 and
MB_y < (scaled_base_line + scaled_base_height + 15) / 16
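The eligibility test above can be sketched directly, "//" standing for the integer division denoted "/" in the text (the function name is an illustrative assumption):

```python
def inter_layer_motion_allowed(mb_x, mb_y, scaled_base_column, scaled_base_line,
                               scaled_base_width, scaled_base_height):
    """True if macroblock (mb_x, mb_y) has corresponding base layer blocks,
    i.e. lies at least partly under the upsampled base layer window
    (the grey area 121 of figure 12)."""
    return (scaled_base_column // 16 <= mb_x
            < (scaled_base_column + scaled_base_width + 15) // 16
            and scaled_base_line // 16 <= mb_y
            < (scaled_base_line + scaled_base_height + 15) // 16)
```

For a window of 320x192 pixels placed at column 8, line 0, for example, the macroblock columns 0 to 20 and rows 0 to 11 are eligible.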

As in SVM3.0, these macroblock modes indicate that the motion/prediction information, including the macroblock partitioning, is directly derived from the base layer. The solution consists in constructing a prediction macroblock, MB_pred, inheriting motion data from the base layer. When using the "BASE_LAYER_MODE" mode, the macroblock partitioning as well as the reference indices and motion vectors are those of the prediction macroblock MB_pred. "QPEL_REFINEMENT_MODE" is similar, but a quarter-sample motion vector refinement is achieved. The process to derive MB_pred works in three steps:
- for each 4x4 block of MB_pred, inheritance of the motion data from the base layer motion data,
- partitioning choice for each 8x8 block of MB_pred,
- mode choice for MB_pred.
These three steps are described in the following paragraphs.

Motion data block inheritance

Let us consider a 4x4 high layer block b as represented in figure 13. The process consists in looking at the four corners of this block, identified as c1 to c4. Let (x, y) be the position of a corner pixel c in the high layer coordinates system, and let (x_base, y_base) be the corresponding position in the base layer coordinates system, defined as follows:

x_base = [(x - x_orig) / α_horiz]
y_base = [(y - y_orig) / α_vertic]

where [a] is the integer part of a.

Using the semantics described further on, the actual formulas are as follows:

x_base = ((x - scaled_base_column) . 16 . (BasePicWidthInMbsMinus1 + 1)) / scaled_base_width
y_base = ((y - scaled_base_line) . 16 . (BasePicHeightInMbsMinus1 + 1)) / scaled_base_height

BasePicWidthInMbsMinus1 and BasePicHeightInMbsMinus1 specify the size of the base layer luma picture array in units of macroblocks.

In the sequel, the co-located macroblock of pixel (x, y) is the base layer macroblock containing pixel (x_base, y_base). In the same way, the co-located 8x8 block of pixel (x, y) is the base layer 8x8 block containing pixel (x_base, y_base), and the co-located 4x4 block of pixel (x, y) is the base layer 4x4 block containing pixel (x_base, y_base).
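The corner mapping can be sketched as follows; base_pic_width_in_mbs and base_pic_height_in_mbs stand for BasePicWidthInMbsMinus1 + 1 and BasePicHeightInMbsMinus1 + 1 (the function and parameter names are illustrative assumptions):

```python
def base_layer_position(x, y, scaled_base_column, scaled_base_line,
                        scaled_base_width, scaled_base_height,
                        base_pic_width_in_mbs, base_pic_height_in_mbs):
    """Map a high layer corner pixel (x, y) to (x_base, y_base) in the
    base layer coordinates system; the base luma picture is
    16 * base_pic_width_in_mbs pixels wide and
    16 * base_pic_height_in_mbs pixels high."""
    x_base = ((x - scaled_base_column) * 16 * base_pic_width_in_mbs
              ) // scaled_base_width
    y_base = ((y - scaled_base_line) * 16 * base_pic_height_in_mbs
              ) // scaled_base_height
    return x_base, y_base
```

In the dyadic case, a 176x144 base picture (11x9 macroblocks) extracted over a 352x288 window at (0, 0) maps pixel (100, 50) to (50, 25).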

The motion data inheritance process for b is the following:
- for each corner c, the reference index r(c, listx) and motion vector mv(c, listx) of each list listx (listx = 0 or 1) are set to those, if they exist, of the co-located base layer 4x4 block,
- for each corner c,
  > if no co-located macroblock exists, or if the co-located macroblock is in intra mode, then b is set as an intra block,
  > else, for each list listx,
    >> if none of the corners uses this list, no reference index and motion vector for this list is set to b,
    >> else,
      >>> the reference index r_b(listx) set for b is the minimum of the existing reference indices of the 4 corners: r_b(listx) = min(r(c, listx)),
      >>> the motion vector mv_b(listx) set for b is the mean of the existing motion vectors of the 4 corners having the reference index r_b(listx).
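These inheritance rules can be sketched as follows; the data layout (each corner given as "intra", None when no co-located macroblock exists, or a dict mapping listx to (reference_index, motion_vector)) and the truncating integer mean are illustrative assumptions:

```python
def inherit_block_motion(corners):
    """Derive the motion data of a 4x4 high layer block b from its four
    corners c1 to c4.  Returns "intra" or a dict
    {listx: (ref_idx, (dx, dy))} per used list."""
    # b is set as an intra block if any corner has no co-located
    # macroblock, or a co-located macroblock in intra mode
    if any(c is None or c == "intra" for c in corners):
        return "intra"
    result = {}
    for listx in (0, 1):
        used = [c[listx] for c in corners if listx in c]
        if not used:
            continue                       # none of the corners uses this list
        r_b = min(ref for ref, _ in used)  # minimum existing reference index
        # mean of the motion vectors of the corners having index r_b
        # (rounding of the mean is a simplification here)
        mvs = [mv for ref, mv in used if ref == r_b]
        mean = tuple(sum(v[i] for v in mvs) // len(mvs) for i in range(2))
        result[listx] = (r_b, mean)
    return result
```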

Partitioning choice

Once the motion data of each 4x4 block has been set, a merging process is necessary in order to determine the actual partitioning of the 8x8 block it belongs to and to avoid forbidden configurations. In the following, the 4x4 blocks b of an 8x8 block B are identified as b1 to b4, as indicated in figure 14. For each 8x8 block, the following process is applied:

- if the four 4x4 blocks have been classified as intra blocks, B is considered as an intra block.

- else, B partitioning choice is achieved:

> The following process for assigning the same reference indices to each 4x4 block is applied: for each list listx,
  >> if no 4x4 block uses this list, no reference index and motion vector of this list are set to B,
  >> else,
    >>> the reference index r_B(listx) for B is computed as the minimum of the existing reference indices of the four 4x4 blocks: r_B(listx) = min(r_b(listx)),
    >>> the mean motion vector mv_mean(listx) of the 4x4 blocks having the same reference index r_B(listx) is computed,
    >>> the 4x4 blocks (1) classified as intra blocks or (2) not using this list or (3) having a reference index r_b(listx) different from r_B(listx) are enforced to have r_B(listx) and mv_mean(listx) as reference index and motion vector.
> Then the choice of the partitioning mode for B is achieved. Two 4x4 blocks are considered as identical if their motion vectors are identical. The merging process is applied as follows:
  >> if b1 is identical to b2 and b3 is identical to b4, then
    >>> if b1 is identical to b3, then BLK_8x8 is chosen,
    >>> else BLK_8x4 is chosen,
  >> else if b1 is identical to b3 and b2 is identical to b4, then BLK_4x8 is chosen,
  >> else BLK_4x4 is chosen.
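The merging into a partitioning mode can be sketched as follows, once the reference indices have been unified; two 4x4 blocks compare as identical when their motion vectors are equal (the function name is an illustrative assumption):

```python
def partition_8x8(b1, b2, b3, b4):
    """Choose the partitioning of an 8x8 block B from the motion vectors
    of its four 4x4 blocks b1..b4, laid out as in figure 14
    (b1 b2 on the top row, b3 b4 on the bottom row)."""
    if b1 == b2 and b3 == b4:
        # top and bottom halves are each uniform
        return "BLK_8x8" if b1 == b3 else "BLK_8x4"
    if b1 == b3 and b2 == b4:
        # left and right halves are each uniform
        return "BLK_4x8"
    return "BLK_4x4"
```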

Prediction macroblock mode choice

A final process is achieved to determine the MB_pred mode. In the following, 8x8 blocks of the macroblock are identified as B1 to B4, as indicated in figure 15.

Two 8x8 blocks are considered as identical blocks if:
- one or both of the two 8x8 blocks are classified as intra blocks, or
- the partitioning mode of both blocks is BLK_8x8 and the reference indices and motion vectors of list0 and list1 of each 8x8 block, if they exist, are identical.

The mode choice is done using the following process:

- if all 8x8 blocks are classified as intra blocks, then MB_pred is classified as an INTRA macroblock

- else, MB_pred is an INTER macroblock. Its mode choice is achieved as follows:

> 8x8 blocks classified as intra are enforced to BLK_8x8 partitioning. Their reference indices and motion vectors are computed as follows. Let B_INTRA be such an 8x8 block. For each list listx,
  >> if no 8x8 block uses this list, no reference index and motion vector of this list is assigned to B_INTRA,
  >> else, the following steps are applied:
    >>> a reference index r_min(listx) is computed as the minimum of the existing reference indices of the 8x8 blocks: r_min(listx) = min(r_B(listx)),
    >>> a mean motion vector mv_mean(listx) of the 8x8 blocks having the same reference index r_min(listx) is computed,
    >>> r_min(listx) is assigned to B_INTRA, and each 4x4 block of B_INTRA is enforced to have r_min(listx) and mv_mean(listx) as reference index and motion vector.

> Then the choice of the mode for MB_pred is achieved. Two 8x8 blocks are considered as identical if their partitioning mode is BLK_8x8 and the reference indices and motion vectors of list0 and list1 of each 8x8 block, if they exist, are identical. The merging process is applied as follows:
  >> if B1 is identical to B2 and B3 is identical to B4, then
    >>> if B1 is identical to B3, then MODE_16x16 is chosen,
    >>> else MODE_16x8 is chosen,
  >> else if B1 is identical to B3 and B2 is identical to B4, then MODE_8x16 is chosen,
  >> else MODE_8x8 is chosen.
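The mode choice for an INTER MB_pred can be sketched as follows; each 8x8 block is represented either as "intra" or as a (partitioning, motion_data) pair, an illustrative layout (the all-intra INTRA case is assumed to have been handled beforehand):

```python
def blocks_identical(a, b):
    """Identity test for two 8x8 blocks: true if one or both are intra,
    or if both are BLK_8x8 with identical list0/list1 reference indices
    and motion vectors (carried here inside `motion`)."""
    if a == "intra" or b == "intra":
        return True
    part_a, motion_a = a
    part_b, motion_b = b
    return part_a == "BLK_8x8" == part_b and motion_a == motion_b

def mb_pred_mode(b1, b2, b3, b4):
    """Merge the four 8x8 blocks B1..B4 of figure 15 into the MB_pred mode."""
    if blocks_identical(b1, b2) and blocks_identical(b3, b4):
        return "MODE_16x16" if blocks_identical(b1, b3) else "MODE_16x8"
    if blocks_identical(b1, b3) and blocks_identical(b2, b4):
        return "MODE_8x16"
    return "MODE_8x8"
```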

Motion vector scaling

A motion vector rescaling is finally applied to every existing motion vector of the prediction macroblock MB_pred. A motion vector mv = (d_x, d_y) is scaled into the vector mv_s = (d_sx, d_sy) using the following equations:

d_sx = [d_x . α_horiz]
d_sy = [d_y . α_vertic]

Using the semantic described further, the actual formulas are as follows :

d_sx = (2 . d_x . scaled_base_width + 16 . (BasePicWidthInMbsMinus1 + 1)) / (2 . 16 . (BasePicWidthInMbsMinus1 + 1))   if d_x >= 0

d_sx = - ((- 2 . d_x . scaled_base_width + 16 . (BasePicWidthInMbsMinus1 + 1)) / (2 . 16 . (BasePicWidthInMbsMinus1 + 1)))   if d_x < 0

d_sy = (2 . d_y . scaled_base_height + 16 . (BasePicHeightInMbsMinus1 + 1)) / (2 . 16 . (BasePicHeightInMbsMinus1 + 1))   if d_y >= 0

d_sy = - ((- 2 . d_y . scaled_base_height + 16 . (BasePicHeightInMbsMinus1 + 1)) / (2 . 16 . (BasePicHeightInMbsMinus1 + 1)))   if d_y < 0
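Assuming the rounded-division form reconstructed above, one component of the rescaling can be sketched as follows; base_pic_size_in_mbs stands for BasePicWidthInMbsMinus1 + 1 (or the height equivalent), and the names are illustrative:

```python
def scale_mv_component(d, scaled_base_size, base_pic_size_in_mbs):
    """Rescale one motion vector component from the base layer to the
    high layer, rounding half away from zero: the added `half` term
    before the integer division implements the rounding."""
    half = 16 * base_pic_size_in_mbs          # half of the denominator
    denom = 2 * half
    if d >= 0:
        return (2 * d * scaled_base_size + half) // denom
    return -((-2 * d * scaled_base_size + half) // denom)
```

For a dyadic case (scaled_base_width = 352, an 11-macroblock-wide base picture), a component of 3 scales to 6; for a 1.5 ratio (scaled_base_width = 264), a component of 1 scales to 2.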

Inter-Layer Texture Prediction

Inter-layer texture prediction is based on the same principles as inter-layer motion prediction. However, it is only possible for macroblocks fully embedded in the scaled base layer window 161 in figure 16, that is, the macroblocks corresponding to the grey-colored area 162, whose coordinates (MB_x, MB_y) respect the following conditions:

MB_x >= (scaled_base_column + 15) / 16 and
MB_x < (scaled_base_column + scaled_base_width) / 16 and
MB_y >= (scaled_base_line + 15) / 16 and
MB_y < (scaled_base_line + scaled_base_height) / 16

The base layer texture is upsampled, for example, by applying the two-lobed or three-lobed Lanczos-windowed sinc functions. These filters, widely used in graphics applications and described in the document titled "Filters for common resampling tasks", author Ken Turkowski, Apple Computer, April 1990, are considered to offer the best compromise in terms of reduction of aliasing, sharpness, and minimal ringing. The two-lobed Lanczos-windowed sinc function is defined as follows:

Lanczos2(x) = (sin(πx) / (πx)) . (sin(πx/2) / (πx/2)) for |x| < 2, and 0 otherwise

Simpler filters, such as the bilinear filter, may also be used. This upsampling step may be processed either on the full frame or block by block. In the latter case, repetitive padding is used at the frame borders.
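The two-lobed kernel can be sketched directly from its standard definition (the formula itself is garbled in the extracted text; the definition below is the usual Lanczos2 window):

```python
import math

def lanczos2(x):
    """Two-lobed Lanczos-windowed sinc: sinc(x) * sinc(x/2) on |x| < 2,
    zero outside, with the removable singularity at x = 0 set to 1."""
    if x == 0.0:
        return 1.0
    if abs(x) >= 2.0:
        return 0.0
    return ((math.sin(math.pi * x) / (math.pi * x))
            * (math.sin(math.pi * x / 2) / (math.pi * x / 2)))
```

The kernel is even, equals 1 at the origin, and vanishes at the integer taps, which is what makes it an interpolating filter.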

Inter-Layer Intra texture Prediction

A given macroblock MB of the current layer, the high layer, can exploit inter-layer intra texture prediction ("I_BL" mode) only if the co-located macroblocks of the base layer exist and are intra macroblocks. The co-located base layer macroblocks are those that, once upsampled, partly or completely cover MB. Figure 17 illustrates three possible cases (a), (b) and (c). The bold solid line 171 corresponds to the current 16x16 macroblock MB of the high layer. The bold dotted lines correspond to the scaled 16x16 base layer macroblocks 172: the current macroblock MB may be included in from one (figure 17a) up to four (figure 17b) scaled base layer macroblocks.

Intra texture prediction may be used both in low pass and high pass pictures, resulting from the temporal filtering performed at the encoder (a group of pictures is temporally filtered to generate one low pass picture and several high pass pictures, corresponding to different temporal frequency bands).

For generating the intra prediction signal for high-pass macroblocks, that is, macroblocks belonging to temporal high frequency pictures, coded in I_BL mode, the co-located 8x8 blocks 173 of the base layer high-pass signal are directly de-blocked and interpolated, as in case of 'standard' dyadic spatial scalability.

Inter-Layer Residual Prediction

Inter texture prediction may be used only for high pass pictures. A given macroblock MB of the current layer can exploit inter-layer residual prediction only if the co-located macroblocks of the base layer exist and are not intra macroblocks.

Changes in Syntax and Semantics

Syntax in tabular form, where changes are highlighted, is given at the end of the specification. The main changes are the addition, in the slice header syntax, of a flag extended_spatial_scalability and, accordingly, of 4 parameters, scaled_base_column, scaled_base_line, scaled_base_width and scaled_base_height, related to the geometrical transformation to be applied in the base layer upsampling process.

extended_spatial_scalability specifies the presence of syntax elements related to geometrical parameters for the base layer upsampling in the slice header. When this syntax element is not present, it shall be inferred to be equal to 0.

scaled_base_column corresponds to the horizontal coordinate of the upper left pixel of the upsampled base layer in the current layer coordinates system.

scaled_base_line corresponds to the vertical coordinate of the upper left pixel of the upsampled base layer in the current layer coordinates system.

scaled_base_width corresponds to the number of pixels per line of the upsampled base layer in the current layer.

scaled_base_height corresponds to the number of pixels per column of the upsampled base layer in the current layer.

BaseLayer specifies whether the current layer is the base layer, that is, the spatial base layer. If BaseLayer is equal to 1, the current layer is the base layer.

Decoding Process

Decoding process for prediction data

Compared to SVM3.0, the following processes have to be added.

For each macroblock, the following applies.

- If base_layer_mode_flag is equal to 1 and if extended_spatial_scalability is equal to 1, the motion vector field, including the macroblock partitioning, is deduced using the process previously described. As in SVM3.0, if the base layer macroblock represents an intra macroblock, the current macroblock mode is set to I_BL.

- Otherwise, if base_layer_mode_flag is equal to 0 and base_layer_refinement_flag is equal to 1, the base layer refinement mode is signalled. The base_layer_refinement_flag is only present when the base layer represents a layer with a spatial resolution different from the current layer spatial resolution. The base layer refinement mode is similar to the base layer prediction mode. The macroblock partitioning as well as the reference indices and motion vectors are derived as for the base layer prediction mode. However, for each motion vector, a quarter-sample motion vector refinement mvd_ref_IX (-1, 0, or +1 for each motion vector component) is additionally transmitted and added to the derived motion vectors.

The rest of the process is identical to that of SVM3.0.

Decoding process for subband pictures

Compared to SVM3.0, the following processes have to be added.

- If mb_type specifies an I macroblock type that is equal to I_BL and if extended_spatial_scalability is equal to 1, the intra prediction signal is generated by the following process:

> The 8x8 blocks of the base layer picture BasePic that cover partly or completely the same area as the current macroblock are referred to as base layer blocks; the corresponding macroblocks they belong to are referred to as base layer macroblocks.

> The base layer blocks / macroblocks are extended by a 4-pixel border in each direction using the same process as described in section 6.3 of SVM3.0.

> The intra prediction signal is generated by interpolating the padded and deblocked base layer blocks. The interpolation is performed using the upsampling process previously described.

The rest of the process is identical to that of SVM3.0.

- Otherwise, if mb_type does not specify an I macroblock mode, if extended_spatial_scalability is equal to 1, and if residual_prediction_flag is equal to 1, the following applies:

> The residual signal of the base layer blocks is upsampled and added to the residual signal of the current macroblock. The interpolation filter is not applied across boundaries between transform blocks. The interpolation is performed using the upsampling process previously described.

Sequence parameter set RBSP syntax

Macroblock layer in scalable extension syntax