

Title:
METHODS, SYSTEMS AND APPARATUS FOR AUTOMATIC VIDEO QUALITY ASSESSMENT
Document Type and Number:
WIPO Patent Application WO/2013/031362
Kind Code:
A1
Abstract:
Aspects of the present invention are related to systems, methods and apparatus for automatic quality assessment of a video sequence. According to a first aspect of the present invention, a quality index may be generated by combining a spatial quality index and a temporal quality index. According to a second aspect of the present invention, a spatial quality index may be calculated using a modified exponential moving average model to pool multi-scale structural similarity indices computed from test frame - reference frame pairs. According to a third aspect of the present invention, a temporal quality index may be generated by averaging multi-scale structural similarity indices computed from difference image pairs, wherein one difference image is formed between reference frames and another difference image is formed between a reference frame and a test frame.

Inventors:
VU CUONG
DESHPANDE SACHIN G
Application Number:
PCT/JP2012/066457
Publication Date:
March 07, 2013
Filing Date:
June 21, 2012
Assignee:
SHARP KK (JP)
VU CUONG
DESHPANDE SACHIN G
International Classes:
H04N17/00
Foreign References:
JP2008535349A (2008-08-28)
Attorney, Agent or Firm:
HARAKENZO WORLD PATENT & TRADEMARK (2-6, Tenjinbashi 2-chome Kita, Kita-ku, Osaka-sh, Osaka 41, JP)
Claims:
CLAIMS

1. A method for determining a quality index for a test video sequence, said method comprising:

receiving, in a processor, a test video sequence; receiving, in said processor, a reference video sequence corresponding to said test video sequence;

in said processor, calculating a spatial quality index using said test video sequence and said reference video sequence;

in said processor, calculating a temporal quality index using said test video sequence and said reference video sequence; and

in said processor, combining said spatial quality index and said temporal quality index to form a final quality index for said test video sequence.

2. A method as described in claim 1, wherein said test video sequence is a degraded version of said reference video sequence, a processed version of said reference video sequence or a previously compressed version of said reference video sequence.

3. A method as described in claim 1, wherein said combining comprises averaging said spatial quality index and said temporal quality index.

4. A method as described in claim 1, wherein:

said test video sequence comprises a first plurality of image frames;

said reference video sequence comprises a second plurality of image frames; and

said calculating a spatial quality index comprises: calculating a multi-scale structural similarity (MS-SSIM) index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices;

pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices;

determining a minimum value from said plurality of MS-SSIM indices; and

setting said spatial quality index to said minimum value.

5. A method as described in claim 4, wherein said pooling said plurality of MS-SSIM indices comprises:

calculating an initial pooled MS-SSIM index by averaging a first plurality of said MS-SSIM indices in said plurality of MS-SSIM indices, wherein said first plurality of said MS-SSIM indices corresponds to a temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices;

calculating a first subsequent pooled MS-SSIM index by forming a linear combination of said initial pooled MS-SSIM index and a first next MS-SSIM index, wherein said first next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices; and

calculating a second subsequent pooled MS-SSIM index by forming a linear combination of said first subsequent pooled MS-SSIM index and a second next MS-SSIM index, wherein said second next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said first next MS-SSIM index in said plurality of MS-SSIM indices.

6. A method as described in claim 5, wherein said initial portion of said MS-SSIM indices is associated with a portion of video at least one-half second in length.

7. A method as described in claim 5, wherein:

said calculating a temporal quality index comprises:

forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames;

forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame;

calculating a difference-frames multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and

averaging said difference-frames MS-SSIM index with a plurality of previously calculated difference-frames MS-SSIM indices.

8. A method as described in claim 7, wherein said combining comprises averaging said spatial quality index and said temporal quality index.

9. A method as described in claim 4, wherein:

said calculating a temporal quality index comprises:

forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames;

forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame;

calculating a difference-frames multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and

averaging said difference-frames MS-SSIM index with a plurality of previously calculated difference-frames MS-SSIM indices.

10. A method as described in claim 1, wherein:

said test video sequence comprises a first plurality of image frames;

said reference video sequence comprises a second plurality of image frames; and

said calculating a temporal quality index comprises:

forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames;

forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame;

calculating a multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and

averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.

11. A method for determining a quality index for a test video sequence, said method comprising:

receiving, in a processor, a test video sequence, wherein said test video sequence comprises a first plurality of image frames;

receiving, in said processor, a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and

in said processor, calculating a spatial quality index using said test video sequence and said reference video sequence, wherein said calculating comprises:

calculating a multi-scale structural similarity (MS-SSIM) index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices;

pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices;

determining a minimum value from said plurality of MS-SSIM indices; and

setting said spatial quality index to said minimum value.

12. A method as described in claim 11, wherein said pooling said plurality of MS-SSIM indices comprises:

calculating an initial pooled MS-SSIM index by averaging a first plurality of said MS-SSIM indices in said plurality of MS-SSIM indices, wherein said first plurality of said MS-SSIM indices corresponds to a temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices;

calculating a first subsequent pooled MS-SSIM index by forming a linear combination of said initial pooled MS-SSIM index and a first next MS-SSIM index, wherein said first next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices; and

calculating a second subsequent pooled MS-SSIM index by forming a linear combination of said first subsequent pooled MS-SSIM index and a second next MS-SSIM index, wherein said second next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said first next MS-SSIM index in said plurality of MS-SSIM indices.

13. A method as described in claim 12, wherein said initial portion of said MS-SSIM indices is associated with a portion of video at least one-half second in length.

14. A method as described in claim 11 further comprising combining said spatial quality index with a temporal quality index.

15. A method as described in claim 14, wherein said combining comprises averaging said spatial quality index and said temporal quality index.

16. A method as described in claim 11, wherein said test video sequence is a degraded version of said reference video sequence, a processed version of said reference video sequence or a previously compressed version of said reference video sequence.

17. A method for determining a quality index for a test video sequence, said method comprising:

receiving, in a processor, a test video sequence, wherein said test video sequence comprises a first plurality of image frames;

receiving, in said processor, a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and

in said processor, calculating a temporal quality index using said test video sequence and said reference video sequence, wherein said calculating comprises:

forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames;

forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame;

calculating a multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and

averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.

18. A method as described in claim 17 further comprising combining said temporal quality index with a spatial quality index.

19. A method as described in claim 18, wherein said combining comprises averaging said spatial quality index and said temporal quality index.

20. A method as described in claim 17, wherein said test video sequence is a degraded version of said reference video sequence, a processed version of said reference video sequence or a previously compressed version of said reference video sequence.

21. An apparatus for determining a quality index for a test video sequence, said apparatus comprising:

a video-sequence receiver for receiving a test video sequence and a reference video sequence corresponding to the test video sequence;

a spatial-quality-index calculator for calculating a spatial quality index using said test video sequence and said reference video sequence;

a temporal-quality-index calculator for calculating a temporal quality index using said test video sequence and said reference video sequence; and

a quality-index combiner for combining said spatial quality index and said temporal quality index to form a final quality index for said test video sequence.

22. An apparatus for determining a quality index for a test video sequence, said apparatus comprising:

a first video-frame receiver for receiving a test video sequence, wherein said test video sequence comprises a first plurality of image frames; said first video-frame receiver for receiving a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and

a first multi-scale structural similarity (MS-SSIM) index calculator for calculating an MS-SSIM index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices;

an MS-SSIM index pooler for pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices;

a minimum calculator for determining a minimum value from said plurality of MS-SSIM indices and for setting said spatial quality index to said minimum value.

23. An apparatus for determining a quality index for a test video sequence, said apparatus comprising:

a second video-frame receiver for receiving a test video sequence, wherein said test video sequence comprises a first plurality of image frames, and a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and

a frame differencer for forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames;

said frame differencer for forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame;

a second MS-SSIM index calculator for calculating an MS-SSIM index using said first reference difference image and said first test difference image; and

an MS-SSIM combiner for averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.

24. A system for determining a quality index for a test video sequence, said system comprising: the apparatus as described in any one of claims 1-23.

Description:
DESCRIPTION

TITLE OF THE INVENTION: METHODS, SYSTEMS AND APPARATUS FOR AUTOMATIC VIDEO QUALITY ASSESSMENT

TECHNICAL FIELD

Embodiments of the present invention relate generally to methods, systems and apparatus for automatically assessing the quality of a video sequence and, in particular, for obtaining a quality index for the video sequence.

BACKGROUND ART

A measurement of the quality of a video sequence may be important in a video processing system or other video system. One reliable method for quantifying the quality of a video sequence involves having human subjects rate the quality of the video sequence. However, this method may be time consuming and expensive and, therefore, impractical in some applications. Methods, systems and apparatus for automatic video quality assessment that determine a quality measure for a video sequence that is highly correlated with human ratings may be desirable.

SUMMARY OF INVENTION

According to one aspect of the invention, there is provided a method for determining a quality index for a test video sequence, said method comprising: receiving, in a processor, a test video sequence; receiving, in said processor, a reference video sequence corresponding to said test video sequence; in said processor, calculating a spatial quality index using said test video sequence and said reference video sequence; in said processor, calculating a temporal quality index using said test video sequence and said reference video sequence; and in said processor, combining said spatial quality index and said temporal quality index to form a final quality index for said test video sequence.

According to one aspect of the invention, there is provided a method for determining a quality index for a test video sequence, said method comprising: receiving, in a processor, a test video sequence, wherein said test video sequence comprises a first plurality of image frames; receiving, in said processor, a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and in said processor, calculating a spatial quality index using said test video sequence and said reference video sequence, wherein said calculating comprises: calculating a multi-scale structural similarity (MS-SSIM) index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices; pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices; determining a minimum value from said plurality of MS-SSIM indices; and setting said spatial quality index to said minimum value.

According to one aspect of the invention, there is provided a method for determining a quality index for a test video sequence, said method comprising: receiving, in a processor, a test video sequence, wherein said test video sequence comprises a first plurality of image frames; receiving, in said processor, a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and in said processor, calculating a temporal quality index using said test video sequence and said reference video sequence, wherein said calculating comprises: forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames; forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame; calculating a multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.

According to one aspect of the invention, there is provided an apparatus for determining a quality index for a test video sequence, said apparatus comprising: a video-sequence receiver for receiving a test video sequence and a reference video sequence corresponding to the test video sequence; a spatial-quality-index calculator for calculating a spatial quality index using said test video sequence and said reference video sequence; a temporal-quality-index calculator for calculating a temporal quality index using said test video sequence and said reference video sequence; and a quality-index combiner for combining said spatial quality index and said temporal quality index to form a final quality index for said test video sequence.

According to one aspect of the invention, there is provided an apparatus for determining a quality index for a test video sequence, said apparatus comprising: a first video-frame receiver for receiving a test video sequence, wherein said test video sequence comprises a first plurality of image frames; said first video-frame receiver for receiving a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and a first multi-scale structural similarity (MS-SSIM) index calculator for calculating an MS-SSIM index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices; an MS-SSIM index pooler for pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices; a minimum calculator for determining a minimum value from said plurality of MS-SSIM indices and for setting said spatial quality index to said minimum value.

According to one aspect of the invention, there is provided an apparatus for determining a quality index for a test video sequence, said apparatus comprising: a second video-frame receiver for receiving a test video sequence, wherein said test video sequence comprises a first plurality of image frames, and a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and a frame differencer for forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames; said frame differencer for forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame; a second MS-SSIM index calculator for calculating an MS-SSIM index using said first reference difference image and said first test difference image; and an MS-SSIM combiner for averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.

According to one aspect of the invention, there is provided a system for determining a quality index for a test video sequence, the system comprising: the apparatus as described above.

The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

Fig. 1 is a chart showing exemplary embodiments of the present invention comprising calculating a spatial quality index, calculating a temporal quality index and combining the spatial quality index and the temporal quality index to form a final quality index;

Fig. 2 is a chart showing exemplary embodiments of the present invention comprising calculating a plurality of multi-scale structural similarity (MS-SSIM) indices, pooling the indices and selecting the minimum-valued pooled index as the spatial quality index;

Fig. 3 is a chart showing exemplary embodiments of the present invention comprising calculating multi-scale structural similarity (MS-SSIM) indices for a plurality of pairs, each formed from a reference difference frame and a reference-test difference frame, and averaging the MS-SSIM index values to determine a temporal quality index;

Fig. 4 is a picture depicting exemplary embodiments of the present invention comprising a spatial-quality-index calculator, a temporal-quality-index calculator and a quality-index combiner for combining a spatial quality index and a temporal quality index;

Fig. 5 is a picture depicting exemplary embodiments of a spatial-quality-index calculator according to the present invention; and

Fig. 6 is a picture depicting exemplary embodiments of a temporal-quality-index calculator according to the present invention.

DESCRIPTION OF EMBODIMENTS

Embodiments of the present invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The figures listed above are expressly incorporated as part of this detailed description.

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the methods and systems of the present invention is not intended to limit the scope of the invention, but the detailed description is merely representative of the presently preferred embodiments of the invention.

Elements of embodiments of the present invention may be embodied in hardware, firmware and/or a computer program product comprising a computer-readable storage medium having instructions stored thereon/in which may be used to program a computing system. While exemplary embodiments revealed herein may only describe one of these forms, it is to be understood that one skilled in the art would be able to effectuate these elements in any of these forms while remaining within the scope of the present invention.

Although the charts and diagrams in the figures may show a specific order of execution, it is understood that the order of execution may differ from that which is depicted. For example, the order of execution of the blocks may be changed relative to the shown order. Also, as a further example, two or more blocks shown in succession in a figure may be executed concurrently, or with partial concurrence. It is understood by those with ordinary skill in the art that a computer program product comprising a computer-readable storage medium having instructions stored thereon/in which may be used to program a computing system, hardware and/or firmware may be created by one of ordinary skill in the art to carry out the various logical functions described herein.

Some embodiments of the present invention may comprise a computer program product comprising a computer-readable storage medium having instructions stored thereon/in which may be used to program a computing system to perform any of the features and methods described herein. Exemplary computer-readable storage media may include, but are not limited to, flash memory devices, disk storage media, for example, floppy disks, optical disks, magneto-optical disks, Digital Versatile Discs (DVDs), Compact Discs (CDs), micro-drives and other disk storage media, Read-Only Memory (ROMs), Programmable Read-Only Memory (PROMs), Erasable Programmable Read-Only Memory (EPROMs), Electrically Erasable Programmable Read-Only Memory (EEPROMs), Random-Access Memory (RAMs), Video Random-Access Memory (VRAMs), Dynamic Random-Access Memory (DRAMs) and any type of media or device suitable for storing instructions and/or data.

A measurement of the quality of a video sequence may be important in a video processing system or other video system. One reliable method for quantifying the quality of a video sequence involves having human subjects rate the quality of the video sequence. However, this method may be time consuming and expensive and, therefore, impractical in some applications. Methods, systems and apparatus for automatic video quality assessment that determine a quality measure for a video sequence that is highly correlated with human ratings may be desirable.

Some embodiments of the present invention may be described in relation to Figure 1. Figure 1 illustrates exemplary method(s) 100 of video quality assessment according to embodiments of the present invention. In these embodiments, a test video sequence may be received 102 in a processor. The test video sequence may be, for example, a processed video sequence, a degraded video sequence, a decoded video sequence or any video sequence for which a quality assessment may be desired. The test video sequence may comprise a first plurality of temporally related image frames, which may be referred to as test frames. A reference video sequence comprising a second plurality of temporally related image frames, which may be referred to as reference frames, corresponding temporally to the first plurality of image frames in the test video sequence, may be received 102 in the processor. A spatial quality index, also considered a spatial quality measure, for the test video sequence may be calculated 104, in the processor, using the test video sequence and the reference video sequence. A temporal quality index, also considered a temporal quality measure, for the test video sequence may be calculated 106, in the processor, using the test video sequence and the reference video sequence. The spatial quality index and the temporal quality index may be combined 108, in the processor, to form a final quality index, also considered a final quality measure, for the test video sequence. Exemplary processors may include a computational processing system in a computing system, a computational processing system in a video processing system, a computational processing system in a video encoder, a computational processing system in a video decoder and other processors and computational processing units.
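A minimal Python sketch of the temporal branch (106) and the combination step (108) of Figure 1 follows. The difference-frame construction follows the wording of claims 10 and 17: reference frame i minus reference frame i-1, and test frame i minus reference frame i-1 (the sign convention is an assumption). The combination is the plain average named in claims 3, 8, 15 and 19. The names `temporal_quality_index`, `final_quality_index` and `toy_index` are hypothetical, and `toy_index` is only a stand-in score used where the text calls for MS-SSIM, so the sketch runs on its own.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical signature: any full-reference index comparing two equal-size images.
ImageIndex = Callable[[np.ndarray, np.ndarray], float]


def temporal_quality_index(test_frames: Sequence[np.ndarray],
                           ref_frames: Sequence[np.ndarray],
                           index_fn: ImageIndex) -> float:
    """Average an image index over (reference difference, test difference) pairs."""
    if len(test_frames) != len(ref_frames) or len(ref_frames) < 2:
        raise ValueError("need two aligned sequences of at least two frames")
    scores = []
    for i in range(1, len(ref_frames)):
        prev_ref = np.asarray(ref_frames[i - 1], dtype=np.float64)
        # Reference difference image: current reference frame minus previous reference frame.
        d_ref = np.asarray(ref_frames[i], dtype=np.float64) - prev_ref
        # Test difference image: current test frame minus the same previous reference frame.
        d_test = np.asarray(test_frames[i], dtype=np.float64) - prev_ref
        scores.append(index_fn(d_ref, d_test))
    return float(np.mean(scores))


def final_quality_index(q_spatial: float, q_temporal: float) -> float:
    """Combine the two indices by simple averaging."""
    return 0.5 * (q_spatial + q_temporal)


def toy_index(a: np.ndarray, b: np.ndarray) -> float:
    """Stand-in similarity (NOT MS-SSIM): 1.0 for identical images, smaller otherwise."""
    return 1.0 / (1.0 + float(np.mean((a - b) ** 2)))
```

Passing the image index in as a function keeps the frame bookkeeping separate from the similarity measure, so an MS-SSIM implementation can replace `toy_index` without changing the loop.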

The calculation 104 of the spatial quality index, in some embodiments of the present invention, may be understood in relation to Figure 2. Figure 2 illustrates exemplary method(s) 104 of spatial quality index calculation according to embodiments of the present invention. In some embodiments of the present invention, a multi-scale structural similarity (MS-SSIM) index may be calculated 200 for each temporally corresponding test frame and reference frame pair. For each test frame and the temporally corresponding reference frame, a contrast comparison component and a structure comparison component may be determined for a plurality of scales, also considered layers. For a particular layer, $m$, the test frame and the reference frame may be low-pass filtered and downsampled $m-1$ times, and the contrast comparison component for the layer, which may be denoted $c_m(x,y)$, may be computed according to:

$$c_m(x,y) = \frac{2\sigma_{x,m}\sigma_{y,m} + C_2}{\sigma_{x,m}^2 + \sigma_{y,m}^2 + C_2},$$

and the structure comparison component for the layer, which may be denoted $s_m(x,y)$, may be computed according to:

$$s_m(x,y) = \frac{\sigma_{xy,m} + C_3}{\sigma_{x,m}\sigma_{y,m} + C_3},$$

where $x$ and $y$ may denote aligned image patches in the layer-$m$ test frame and reference frame, respectively, $\sigma_{x,m}$ and $\sigma_{y,m}$ may denote the standard deviation of the luminance of $x$ and $y$, respectively, and $\sigma_{xy,m}$ may denote the covariance of $x$ and $y$. In some embodiments of the present invention, the aligned patches, $x$ and $y$, may comprise the entire test frame and reference frame. In alternative embodiments, the aligned patches, $x$ and $y$, may comprise a fixed-size block in the test frame and in the reference frame. A luminance comparison component, which may be denoted $l_M(x,y)$, may be determined only for the highest scale, which may be denoted $M$, according to:

$$l_M(x,y) = \frac{2\mu_{x,M}\mu_{y,M} + C_1}{\mu_{x,M}^2 + \mu_{y,M}^2 + C_1},$$

where $\mu_{x,M}$ and $\mu_{y,M}$ may denote the mean of the luminance of $x$ and $y$, respectively. The constants $C_1$, $C_2$ and $C_3$ may be stabilizing terms of the corresponding components. In an exemplary embodiment of the present invention comprising 8-bits-per-pixel luminance images, wherein the dynamic range, which may be denoted $L$, is equal to 255, the constants $C_1$, $C_2$ and $C_3$ may be determined according to:

$$C_1 = (K_1 L)^2, \qquad C_2 = (K_2 L)^2, \qquad C_3 = \frac{C_2}{2},$$

respectively, where $K_1 \ll 1$ and $K_2 \ll 1$. In an exemplary embodiment, $K_1 = 0.01$ and $K_2 = 0.03$. The components may be combined to generate an MS-SSIM index, for the reference frame - test frame pair, according to:

$$\text{MS-SSIM}(x,y) = \left[l_M(x,y)\right]^{\alpha_M} \prod_{m=1}^{M} \left[c_m(x,y)\right]^{\beta_m} \left[s_m(x,y)\right]^{\gamma_m}.$$

In an exemplary embodiment of the present invention, $M = 5$, $\alpha_M = 0.1333$ and $\beta_{m=1,\ldots,5} = \gamma_{m=1,\ldots,5} = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]$.
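The per-frame MS-SSIM computation described above may be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the reference implementation: the aligned patches are taken to be the entire frames (one of the embodiments described above), the low-pass filter is assumed to be a 2×2 box average followed by decimation by 2, $C_3 = C_2/2$ follows the usual SSIM convention, and a negative structure term is clamped to zero so that the fractional powers stay real. The names `downsample` and `ms_ssim` are hypothetical.

```python
import numpy as np

# Exemplary parameters: 8-bit luminance (L = 255), K1 = 0.01, K2 = 0.03,
# M = 5 scales, alpha_M = 0.1333 and beta_m = gamma_m as listed in the text.
L_RANGE = 255.0
C1 = (0.01 * L_RANGE) ** 2
C2 = (0.03 * L_RANGE) ** 2
C3 = C2 / 2  # assumption: usual SSIM convention
BETA = GAMMA = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)
ALPHA_M = 0.1333


def downsample(img: np.ndarray) -> np.ndarray:
    """Assumed low-pass filter: 2x2 box average, then decimation by 2."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])


def ms_ssim(x: np.ndarray, y: np.ndarray) -> float:
    """MS-SSIM where the aligned patches x (test) and y (reference) are whole frames."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    index = 1.0
    for m in range(len(BETA)):
        if m > 0:  # layer m+1 is low-pass filtered and down-sampled m times
            x, y = downsample(x), downsample(y)
        sx, sy = x.std(), y.std()
        sxy = np.mean((x - x.mean()) * (y - y.mean()))
        c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)  # contrast c_m
        # Structure s_m, clamped at 0 for anti-correlated patches (assumption).
        s = max((sxy + C3) / (sx * sy + C3), 0.0)
        index *= c ** BETA[m] * s ** GAMMA[m]
    # Luminance comparison only at the highest (coarsest) scale M.
    mx, my = x.mean(), y.mean()
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    return float(index * l ** ALPHA_M)
```

Frames should be at least 16×16 so that five scales remain after four halvings. Identical frames give an index of 1, and every factor lies in $[0, 1]$, so any distortion lowers the index.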

The MS-SSIM indices for the reference frame - test frame pairs may be pooled 202 to create a plurality of spatial quality values. In some embodiments of the present invention, the MS-SSIM indices may be pooled using a modified exponential moving average. An initial spatial quality value, which may be denoted $S_1$, may be computed according to:

$$S_1 = \frac{1}{p}\sum_{i=1}^{p} \text{MSSSIM}_i,$$

where $\text{MSSSIM}_i$ denotes the MS-SSIM index of the $i$-th temporally located reference frame - test frame pair. For $n = 1, 2, \ldots, N-p$, where $N$ is the number of frames in each of the test video sequence and the reference video sequence, $S_{n+1}$ may be computed according to:

$$S_{n+1} = \alpha\,\text{MSSSIM}_{n+p} + (1-\alpha)\,S_n,$$

where $\alpha$ is a smoothing factor which may be, in an exemplary embodiment of the present invention, selected as a function of $\eta$ and $p$, where $\eta = 0.25$ and $p = 30$. In some embodiments of the present invention, each $S_n$ may contain information from at least half a second of the video; in each $S_n$, a new frame may not make an immediate strong effect; and the contribution of previous frames may not drop too fast. In some embodiments of the present invention, setting $p = 30$ and $\alpha$ to a small value may achieve the above-described three constraints on $S_n$.

In some embodiments of the present invention, the spatial quality of the test video sequence may be based on the worst-quality video segment within the test video sequence. In these exemplary embodiments, the minimum value of the pooled MS-SSIM indices may be determined 204, and the spatial quality index, which may be denoted $Q_S$, for the test sequence may be set 206 to the minimum value:

$$Q_S = \min_n S_n.$$
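The pooling and minimum steps may be sketched in plain Python as follows. This is a minimal sketch: the smoothing factor is passed in directly as `alpha` rather than derived from $\eta$ and $p$, the default `alpha = 0.01` is only an illustrative small value, and the function name `spatial_quality_index` is hypothetical.

```python
from typing import Sequence


def spatial_quality_index(msssim: Sequence[float], p: int = 30,
                          alpha: float = 0.01) -> float:
    """Pool per-frame MS-SSIM indices with the modified EMA, then take the minimum.

    msssim[k] holds MSSSIM_{k+1}: 0-based storage of the 1-based indices
    used in the text. The default alpha is illustrative, not from the text.
    """
    n_frames = len(msssim)
    if n_frames < p:
        raise ValueError("need at least p MS-SSIM indices to form S_1")
    s = sum(msssim[:p]) / p                # S_1: average of the first p indices
    pooled = [s]
    for n in range(1, n_frames - p + 1):   # n = 1, ..., N - p
        # S_{n+1} = alpha * MSSSIM_{n+p} + (1 - alpha) * S_n
        s = alpha * msssim[n + p - 1] + (1 - alpha) * s
        pooled.append(s)
    return min(pooled)                     # Q_S = min_n S_n
```

For example, with `p = 30` and `alpha = 0.5`, thirty indices of 1.0 followed by ten of 0.5 give a minimum pooled value of 0.50048828125: a sustained drop pulls $Q_S$ down gradually rather than on the first bad frame.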

The calculation 106 of the temporal quality index, in some embodiments of the present invention, may be understood in relation to Figure 3. Figure 3 illustrates exemplary method(s) 106 of temporal quality index calculation according to embodiments of the present invention. Reference difference frames, which may be denoted $D_{r,i}$, and reference - test difference frames, which may be denoted $D_{d,i}$, may be formed 300, 302 according to:

$$D_{r,i} = f_{r,i+1} - f_{r,i}$$

and

$$D_{d,i} = f_{d,i+1} - f_{r,i},$$

respectively, where $f_{r,i}$ and $f_{r,i+1}$ may denote temporally adjacent frames within the reference video sequence, $f_{d,i+1}$ may denote the test frame temporally corresponding to reference frame $f_{r,i+1}$, and $i$ may be a temporal index. An MS-SSIM index may be calculated 304 for each pair $(D_{d,i}, D_{r,i})$, where $i = 1, \ldots, N-1$. The MS-SSIM index may be calculated according to the method described above. The MS-SSIM index associated with temporal index $i$ may be denoted $T_i$, and the $N-1$ MS-SSIM indices may be, in some embodiments of the present invention, averaged 306, and the temporal quality index, which may be denoted $Q_T$, may be set 308 to the average index:

$$Q_T = \frac{1}{N-1}\sum_{i=1}^{N-1} T_i.$$

In alternative embodiments, the $N-1$ MS-SSIM indices may be combined using a weighted average, an exponential weighting or another data fusion method known in the art.
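The temporal quality index may be sketched in Python as follows, assuming luminance frames held as NumPy arrays. Each reference difference frame is formed as $f_{r,i+1} - f_{r,i}$ and each test difference frame as $f_{d,i+1} - f_{r,i}$, and the per-pair indices are combined by a plain average. The function name is hypothetical, and the `similarity` argument is a hypothetical hook for whatever MS-SSIM implementation is available (called with the test-side difference image first).

```python
from typing import Callable, Sequence

import numpy as np


def temporal_quality_index(ref: Sequence[np.ndarray], test: Sequence[np.ndarray],
                           similarity: Callable[[np.ndarray, np.ndarray], float]) -> float:
    """Average a similarity index over (D_{d,i}, D_{r,i}) pairs, i = 1..N-1.

    ref[k] and test[k] are temporally corresponding frames (0-based).
    """
    if len(ref) != len(test) or len(ref) < 2:
        raise ValueError("need two equally long sequences of at least 2 frames")
    indices = []
    for i in range(len(ref) - 1):
        prev_ref = np.asarray(ref[i], dtype=np.float64)
        d_r = np.asarray(ref[i + 1], dtype=np.float64) - prev_ref   # D_{r,i}
        d_d = np.asarray(test[i + 1], dtype=np.float64) - prev_ref  # D_{d,i}
        indices.append(similarity(d_d, d_r))                        # T_i
    return sum(indices) / len(indices)  # Q_T: plain average of the N-1 indices
```

Note that both difference images share the previous reference frame $f_{r,i}$, so a test frame that deviates from its reference shows up as a mismatch in the temporal change rather than only in the static content.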

Referring to Figure 1, the spatial quality index, $Q_S$, and the temporal quality index, $Q_T$, may be combined 108 to generate a final quality index, which may be denoted $Q$, for the test video sequence. In some embodiments of the present invention, the spatial quality index, $Q_S$, and the temporal quality index, $Q_T$, may be combined 108 according to:

$$Q = \frac{Q_S + Q_T}{2}.$$

In alternative embodiments, the spatial quality index, $Q_S$, and the temporal quality index, $Q_T$, may be combined using a weighted average, an exponential weighting or another data fusion method known in the art.

The final quality index, $Q$, may be a value in the range of zero to one, wherein a video sequence with a larger final quality index value may correspond to a visibly higher quality video sequence than a video sequence with a smaller final quality index value.

Some embodiments of the present invention, described in relation to Figure 4, may comprise a system 400 for computing a quality index for a test video sequence. The system 400 may comprise a video-sequence receiver 402 for receiving a test video sequence and a reference video sequence corresponding to the test video sequence. The video-sequence receiver 402 may store the test video sequence in a test-sequence memory 404 and the reference video sequence in a reference-sequence memory 406. The test video sequence and the reference video sequence may be made available to a spatial-quality-index calculator 408 and a temporal-quality-index calculator 412 from the test-sequence memory 404 and the reference-sequence memory 406, respectively. The spatial-quality-index calculator 408 may calculate a spatial quality index which may be stored in a spatial-quality-index memory 410, and the temporal-quality-index calculator 412 may calculate a temporal quality index which may be stored in a temporal-quality-index memory 414. The spatial quality index and the temporal quality index may be made available to a quality-index combiner 416 from the spatial-quality-index memory 410 and the temporal-quality-index memory 414, respectively. The quality-index combiner 416 may combine the spatial quality index and the temporal quality index to form a final quality index which may be stored in a final-quality-index memory 418. A final-quality-index transmitter 420 may make the final quality index stored in the final-quality-index memory 418 available to other processes and/or systems.
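The data flow of system 400 may be illustrated with a small Python sketch. The class and field names are hypothetical: the two calculators are injected as callables, the dataclass fields stand in for memories 410, 414 and 418, and the combiner uses the plain average of the spatial and temporal indices, which is one of the combining options described in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Frames = Sequence  # hypothetical alias: any sequence of frames


@dataclass
class QualityIndexSystem:
    """Hypothetical software analogue of system 400."""
    spatial_calculator: Callable[[Frames, Frames], float]   # role of calculator 408
    temporal_calculator: Callable[[Frames, Frames], float]  # role of calculator 412
    spatial_index: Optional[float] = None    # stands in for memory 410
    temporal_index: Optional[float] = None   # stands in for memory 414
    final_index: Optional[float] = None      # stands in for memory 418

    def run(self, test: Frames, reference: Frames) -> float:
        """Compute Q_S and Q_T, then combine them as combiner 416 would."""
        self.spatial_index = self.spatial_calculator(test, reference)
        self.temporal_index = self.temporal_calculator(test, reference)
        # Plain average of Q_S and Q_T; a weighted average is an alternative.
        self.final_index = (self.spatial_index + self.temporal_index) / 2
        return self.final_index
```

Injecting the calculators keeps the combiner independent of how $Q_S$ and $Q_T$ are produced, mirroring the separate memories that decouple the blocks in Figure 4.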

The spatial-quality-index calculator 408 may be understood in relation to Figure 5. Figure 5 illustrates exemplary embodiments, according to the present invention, of the spatial-quality-index calculator 408. The spatial-quality-index calculator 408 may comprise a controller 500 for controlling the processing flow. The spatial-quality-index calculator 408 may comprise a video-frame receiver 502 (first video-frame receiver) which may be controlled by the controller 500 to receive a test frame and temporally corresponding reference frame pair. The test frame may be written to a test-frame memory 504, and the temporally corresponding reference frame may be written to a reference-frame memory 506. The test frame - reference frame pair may be made available from the test-frame memory 504 and the reference-frame memory 506 to a multi-scale structural similarity (MS-SSIM)-index calculator 508 (first MS-SSIM index calculator). The MS-SSIM-index calculator 508 may calculate an MS-SSIM index for the test frame - reference frame pair; the MS-SSIM index may be written to an MS-SSIM-index memory 510 and may be made available from the MS-SSIM-index memory 510 to an MS-SSIM-index pooler 512. The controller 500 may control the data flow so that each test frame and temporally corresponding reference frame may be processed, and an MS-SSIM index calculated for each frame pair. When a sufficient number of MS-SSIM indices are available to the MS-SSIM-index pooler 512, a plurality of MS-SSIM indices may be pooled, and the pooled index value may be written to a pooled-index memory 514. The controller may control the initiation of pooling based on the number of available MS-SSIM indices. In some embodiments of the present invention, the MS-SSIM indices may be pooled using a modified exponential moving average. An initial spatial quality value, which may be denoted $S_1$, may be computed according to:

$$S_1 = \frac{1}{p}\sum_{i=1}^{p} \text{MSSSIM}_i,$$

where $\text{MSSSIM}_i$ denotes the MS-SSIM index of the $i$-th temporally located reference frame - test frame pair. For $n = 1, 2, \ldots, N-p$, where $N$ is the number of frames in each of the test video sequence and the reference video sequence, $S_{n+1}$ may be computed according to:

$$S_{n+1} = \alpha\,\text{MSSSIM}_{n+p} + (1-\alpha)\,S_n,$$

where $\alpha$ is a smoothing factor which may be, in an exemplary embodiment of the present invention, selected as a function of $\eta$ and $p$, where $\eta = 0.25$ and $p = 30$. In some embodiments of the present invention, each $S_n$ may contain information from at least half a second of the video; in each $S_n$, a new frame may not make an immediate strong effect; and the contribution of previous frames may not drop too fast. In some embodiments of the present invention, setting $p = 30$ and $\alpha$ to a small value may achieve the above-described three constraints on $S_n$.

A minimum calculator 516 may determine a minimum spatial quality value from the spatial quality values available in the pooled-index memory 514, and the minimum spatial quality value may be written to a spatial-quality-index memory 518. A spatial-quality-index transmitter 520 may make the spatial quality index stored in the spatial-quality-index memory 518 available to other processes and/or systems.
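Because the controller 500 initiates pooling only once enough MS-SSIM indices are available, the pooler 512 and the minimum calculator 516 lend themselves to an incremental form. The following Python class is a hypothetical sketch of that behavior: it buffers the first $p$ indices, forms $S_1$ from their average, applies the exponential-moving-average update to each later index, and tracks the running minimum. The class name and the default `alpha` are illustrative assumptions, not values from the text.

```python
from typing import List, Optional


class StreamingSpatialPooler:
    """Hypothetical incremental analogue of pooler 512 plus minimum calculator 516."""

    def __init__(self, p: int = 30, alpha: float = 0.01) -> None:
        self.p = p
        self.alpha = alpha                    # illustrative default only
        self._initial: List[float] = []       # first p indices, buffered for S_1
        self.current: Optional[float] = None  # latest pooled value S_n
        self.minimum: Optional[float] = None  # running minimum, i.e. Q_S so far

    def push(self, msssim: float) -> None:
        """Accept the MS-SSIM index of the next test/reference frame pair."""
        if self.current is None:
            self._initial.append(msssim)
            if len(self._initial) < self.p:
                return  # pooling is not initiated until p indices are available
            self.current = sum(self._initial) / self.p  # S_1
        else:
            # S_{n+1} = alpha * MSSSIM_{n+p} + (1 - alpha) * S_n
            self.current = self.alpha * msssim + (1 - self.alpha) * self.current
        if self.minimum is None or self.current < self.minimum:
            self.minimum = self.current
```

Only $p$ indices and two running values are stored, so memory use does not grow with the sequence length.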

The controller 500 may control the data flow and process initiation of the components of the spatial-quality-index calculator 408. In some embodiments, the flow may be purely sequential. In alternative embodiments, the flow may be partially concurrent. In yet alternative embodiments, the flow may be substantially concurrent.

The temporal-quality-index calculator 412 may be understood in relation to Figure 6. Figure 6 illustrates exemplary embodiments, according to the present invention, of the temporal-quality-index calculator 412. The temporal-quality-index calculator 412 may comprise a controller 600 for controlling the processing flow. The temporal-quality-index calculator 412 may comprise a video-frame receiver 602 (second video-frame receiver) which may be controlled by the controller 600 to receive a test frame and temporally corresponding reference frame pair. The test frame may be written to a test-frame memory 604, and the temporally corresponding reference frame may be written to a reference-frame memory 606. The immediately temporally previous reference frame may be received by the video-frame receiver 602 and may be written to the reference-frame memory 606. A frame differencer 608 may form two difference frames according to:

$$D_{r,i} = f_{r,i+1} - f_{r,i}$$

and

$$D_{d,i} = f_{d,i+1} - f_{r,i},$$

where $f_{r,i}$ and $f_{r,i+1}$ may denote the temporally adjacent frames within the reference video sequence, $f_{d,i+1}$ may denote the test frame temporally corresponding to reference frame $f_{r,i+1}$, and $i$ may be a temporal index. The test frame and the reference frames may be made available to the frame differencer 608 from the test-frame memory 604 and the reference-frame memory 606, respectively. An MS-SSIM index may be calculated by an MS-SSIM-index calculator 610 (second MS-SSIM index calculator) for the frame pair $(D_{d,i}, D_{r,i})$. The MS-SSIM index may be written to an MS-SSIM-index memory 612. An MS-SSIM-index combiner 614 may combine the MS-SSIM indices for all frame pairs $(D_{d,i}, D_{r,i})$, where $i = 1, \ldots, N-1$ and $N$ denotes the number of frames in the test video sequence. The MS-SSIM-index combiner 614 may, in some embodiments of the present invention, average the $N-1$ MS-SSIM indices to form the temporal quality index, which may be denoted $Q_T$, according to:

$$Q_T = \frac{1}{N-1}\sum_{i=1}^{N-1} T_i,$$

where $T_i$ may denote the MS-SSIM index associated with the frame pair $(D_{d,i}, D_{r,i})$.

In alternative embodiments, the $N-1$ MS-SSIM indices may be combined using a weighted average, an exponential weighting or another data fusion method known in the art.

The temporal quality index may be written to a temporal-quality-index memory 618 and may be made available to other processes and/or systems by a temporal-quality-index transmitter 620.

The controller 600 may control the data flow and process initiation of the components of the temporal-quality-index calculator 412. In some embodiments, the flow may be purely sequential. In alternative embodiments, the flow may be partially concurrent. In yet alternative embodiments, the flow may be substantially concurrent.
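The temporal-quality-index calculator 412 can likewise be sketched as an incremental Python class that keeps only the previous reference frame, as the video-frame receiver 602 does, and maintains a running average of the difference-pair indices in the role of combiner 614. The class name and the injected `similarity` callable (expected to be an MS-SSIM implementation, called with the test-side difference image first) are hypothetical.

```python
from typing import Callable, Optional

import numpy as np


class StreamingTemporalCalculator:
    """Hypothetical incremental analogue of the temporal-quality-index calculator 412."""

    def __init__(self, similarity: Callable[[np.ndarray, np.ndarray], float]) -> None:
        self.similarity = similarity                 # expected: an MS-SSIM function
        self._prev_ref: Optional[np.ndarray] = None  # previous reference frame f_{r,i}
        self._total = 0.0
        self._count = 0

    def push(self, test_frame: np.ndarray, ref_frame: np.ndarray) -> None:
        """Accept the next temporally corresponding test/reference frame pair."""
        ref = np.asarray(ref_frame, dtype=np.float64)
        if self._prev_ref is not None:
            d_r = ref - self._prev_ref                                       # D_{r,i}
            d_d = np.asarray(test_frame, dtype=np.float64) - self._prev_ref  # D_{d,i}
            self._total += self.similarity(d_d, d_r)                         # T_i
            self._count += 1
        self._prev_ref = ref

    @property
    def index(self) -> float:
        """Q_T: running average of the difference-pair indices received so far."""
        if self._count == 0:
            raise ValueError("need at least two frame pairs")
        return self._total / self._count
```

After all $N$ frame pairs have been pushed, `index` equals the average of the $N-1$ indices $T_i$.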

Referring to Figure 4, in some embodiments of the present invention, the quality-index combiner 416 may combine the spatial quality index, which may be denoted $Q_S$, and the temporal quality index, which may be denoted $Q_T$, to generate the final quality index, which may be denoted $Q$, for the test video sequence, according to:

$$Q = \frac{Q_S + Q_T}{2}.$$

In alternative embodiments, the spatial quality index, $Q_S$, and the temporal quality index, $Q_T$, may be combined in the quality-index combiner 416 using a weighted average, an exponential weighting or another data fusion method known in the art.

The final quality index, $Q$, may be a value in the range of zero to one, wherein a video sequence with a larger final quality index value may correspond to a visibly higher quality video sequence than a video sequence with a smaller final quality index value.

Some embodiments of the present invention may comprise a video processing apparatus in which the above-described methods and/or systems may be embodied. Exemplary video processing apparatus may be video test devices, video encoders, video decoders and other apparatus in which a measurement of video quality may be required.

Aspects of the present invention are related to systems, methods and apparatus for automatic quality assessment of a video sequence.

According to a first aspect of the present invention, a quality index may be generated by calculating a spatial quality index, calculating a temporal quality index and combining the spatial quality index and the temporal quality index to form a final quality index. According to a second aspect of the present invention, a spatial quality index may be calculated using a modified exponential moving average model to pool multi-scale structural similarity indices computed from test frame - reference frame pairs.

According to a third aspect of the present invention, a temporal quality index may be generated by averaging multi-scale structural similarity indices computed from difference image pairs, wherein one difference image is formed between reference frames and another difference image is formed between a reference frame and a test frame.

(Alternative Description of the Present Invention)

Note that the present invention can alternatively be described as follows.

According to one aspect of the invention, there is provided a method for determining a quality index for a test video sequence, said method comprising: receiving, in a processor, a test video sequence; receiving, in said processor, a reference video sequence corresponding to said test video sequence; in said processor, calculating a spatial quality index using said test video sequence and said reference video sequence; in said processor, calculating a temporal quality index using said test video sequence and said reference video sequence; and in said processor, combining said spatial quality index and said temporal quality index to form a final quality index for said test video sequence.

Furthermore, it is preferable that the test video sequence is a degraded version of said reference video sequence, a processed version of said reference video sequence or a previously compressed version of said reference video sequence.

Furthermore, it is preferable that the combining comprises averaging said spatial quality index and said temporal quality index.

Furthermore, it is preferable that the test video sequence comprises a first plurality of image frames; said reference video sequence comprises a second plurality of image frames; and said calculating a spatial quality index comprises: calculating a multi-scale structural similarity (MS-SSIM) index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices; pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices; determining a minimum value from said plurality of MS-SSIM indices; and setting said spatial quality index to said minimum value.

Furthermore, it is preferable that the pooling of the plurality of MS-SSIM indices comprises: calculating an initial pooled MS-SSIM index by averaging a first plurality of said MS-SSIM indices in said plurality of MS-SSIM indices, wherein said first plurality of said MS-SSIM indices corresponds to a temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices; calculating a first subsequent pooled MS-SSIM index by forming a linear combination of said initial pooled MS-SSIM index and a first next MS-SSIM index, wherein said first next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices; and calculating a second subsequent pooled MS-SSIM index by forming a linear combination of said first subsequent pooled MS-SSIM index and a second next MS-SSIM index, wherein said second next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said first next MS-SSIM index in said plurality of MS-SSIM indices.

Furthermore, it is preferable that the initial portion of the MS-SSIM indices is associated with a portion of video at least one-half second in length.

Furthermore, it is preferable that the calculating a temporal quality index comprises: forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames; forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame; calculating a difference-frames multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and averaging said difference-frames MS-SSIM index with a plurality of previously calculated difference-frames MS-SSIM indices.

Furthermore, it is preferable that the combining comprises averaging said spatial quality index and said temporal quality index.

Furthermore, it is preferable that the calculating a temporal quality index comprises: forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames; forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame; calculating a difference-frames multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and averaging said difference-frames MS-SSIM index with a plurality of previously calculated difference-frames MS-SSIM indices.

Furthermore, it is preferable that the test video sequence comprises a first plurality of image frames; said reference video sequence comprises a second plurality of image frames; and said calculating a temporal quality index comprises: forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames; forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame; calculating a multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.
According to one aspect of the invention, there is provided a method for determining a quality index for a test video sequence, said method comprising: receiving, in a processor, a test video sequence, wherein said test video sequence comprises a first plurality of image frames; receiving, in said processor, a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and in said processor, calculating a spatial quality index using said test video sequence and said reference video sequence, wherein said calculating comprises: calculating a multi-scale structural similarity (MS-SSIM) index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices; pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices; determining a minimum value from said plurality of MS-SSIM indices; and setting said spatial quality index to said minimum value.

Furthermore, it is preferable that the pooling of said plurality of MS-SSIM indices comprises: calculating an initial pooled MS-SSIM index by averaging a first plurality of said MS-SSIM indices in said plurality of MS-SSIM indices, wherein said first plurality of said MS-SSIM indices corresponds to a temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices; calculating a first subsequent pooled MS-SSIM index by forming a linear combination of said initial pooled MS-SSIM index and a first next MS-SSIM index, wherein said first next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said temporally initial portion of said MS-SSIM indices in said plurality of MS-SSIM indices; and calculating a second subsequent pooled MS-SSIM index by forming a linear combination of said first subsequent pooled MS-SSIM index and a second next MS-SSIM index, wherein said second next MS-SSIM index is an immediately temporally subsequent MS-SSIM index to said first next MS-SSIM index in said plurality of MS-SSIM indices.

Furthermore, it is preferable that the initial portion of said MS-SSIM indices is associated with a portion of video at least one-half second in length.

Furthermore, it is preferable that the method further comprises combining said spatial quality index with a temporal quality index.

Furthermore, it is preferable that the combining comprises averaging said spatial quality index and said temporal quality index.

Furthermore, it is preferable that the test video sequence is a degraded version of said reference video sequence, a processed version of said reference video sequence or a previously compressed version of said reference video sequence.

According to one aspect of the invention, there is provided a method for determining a quality index for a test video sequence, said method comprising: receiving, in a processor, a test video sequence, wherein said test video sequence comprises a first plurality of image frames; receiving, in said processor, a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and in said processor, calculating a temporal quality index using said test video sequence and said reference video sequence, wherein said calculating comprises: forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames; forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame; calculating a multi-scale structural similarity (MS-SSIM) index using said first reference difference image and said first test difference image; and averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.

Furthermore, it is preferable that the method further comprises combining said temporal quality index with a spatial quality index.

Furthermore, it is preferable that the combining comprises averaging said spatial quality index and said temporal quality index.

Furthermore, it is preferable that the test video sequence is a degraded version of said reference video sequence, a processed version of said reference video sequence or a previously compressed version of said reference video sequence.

According to one aspect of the invention, there is provided an apparatus for determining a quality index for a test video sequence, said apparatus comprising: a video-sequence receiver for receiving a test video sequence and a reference video sequence corresponding to the test video sequence; a spatial-quality-index calculator for calculating a spatial quality index using said test video sequence and said reference video sequence; a temporal-quality-index calculator for calculating a temporal quality index using said test video sequence and said reference video sequence; and a quality-index combiner for combining said spatial quality index and said temporal quality index to form a final quality index for said test video sequence.

According to one aspect of the invention, there is provided an apparatus for determining a quality index for a test video sequence, said apparatus comprising: a first video-frame receiver for receiving a test video sequence, wherein said test video sequence comprises a first plurality of image frames; said first video-frame receiver for receiving a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and a first multi-scale structural similarity (MS-SSIM) index calculator for calculating an MS-SSIM index for each image frame in said first plurality of image frames and a temporally corresponding image frame in said second plurality of image frames, thereby producing a plurality of MS-SSIM indices; an MS-SSIM index pooler for pooling said plurality of MS-SSIM indices, thereby producing a plurality of pooled MS-SSIM indices; and a minimum calculator for determining a minimum value from said plurality of MS-SSIM indices and for setting said spatial quality index to said minimum value.

According to one aspect of the invention, there is provided an apparatus for determining a quality index for a test video sequence, said apparatus comprising: a second video-frame receiver for receiving a test video sequence, wherein said test video sequence comprises a first plurality of image frames, and a reference video sequence corresponding to said test video sequence, wherein said reference video sequence comprises a second plurality of image frames; and a frame differencer for forming a first reference difference image between a first image frame in said second plurality of image frames and a second image frame in said second plurality of image frames, wherein said second image frame is an immediately temporally previous image frame to said first image frame in said second plurality of image frames; said frame differencer for forming a first test difference image between a test image frame in said first plurality of image frames, wherein said test image frame corresponds temporally to said first image frame, and said second image frame; a second MS-SSIM index calculator for calculating an MS-SSIM index using said first reference difference image and said first test difference image; and an MS-SSIM combiner for averaging said MS-SSIM index with a plurality of previously calculated MS-SSIM indices.

According to one aspect of the invention, there is provided a system for determining a quality index for a test video sequence, the system comprising: the apparatus as described above. The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding equivalence of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.