

Title:
METHODS OF ENCRYPTING A MULTIMEDIA FILE, METHODS OF DECRYPTING AN ENCRYPTED MULTIMEDIA FILE; COMPUTER PROGRAM PRODUCTS AND APPARATUS
Document Type and Number:
WIPO Patent Application WO/2023/026065
Kind Code:
A1
Abstract:
There is disclosed a computer-implemented method of encrypting a video file, the video file including a plurality of two or more levels of frames of the video file, wherein content of frames in level zero of the video file is displayable without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable only using content of at least one or more frames not in level x of the frames of the video file, the method including the steps of: (i) accessing the video file including the plurality of two or more levels of frames of the video file, wherein the content of frames in level zero of the video file is displayable without depending on the content of frames of any other level, and wherein the content of frames in each level x not in level zero of the video file is displayable only using the content of at least one or more frames not in level x of the frames of the video file; (ii) encrypting the frames in level zero of the video file; (iii) assembling a file comprising a partially encrypted version of the video file, the partially encrypted version of the video file including level zero frames including the encrypted frames in level zero of the video file from step (ii), and further including the levels of frames of the video file that do not include level zero of the frames of the video file. Related methods of encryption and decryption are also disclosed. Related computer program products and apparatus are also disclosed.

Inventors:
STREATER STEPHEN (GB)
Application Number:
PCT/GB2022/052216
Publication Date:
March 02, 2023
Filing Date:
August 30, 2022
Assignee:
BLACKBIRD PLC (GB)
International Classes:
H04N19/30; H04N21/2347; H04N21/4405
Domestic Patent References:
WO2018127695A2 (2018-07-12)
WO2005048607A1 (2005-05-26)
WO2007077447A2 (2007-07-12)
Foreign References:
US20040196975A1 (2004-10-07)
US20020118828A1 (2002-08-29)
US20050185795A1 (2005-08-25)
US20210084304A1 (2021-03-18)
US20100014666A1 (2010-01-21)
EP1503590A2 (2005-02-02)
EP1503590B1 (2018-06-13)
EP3641317A1 (2020-04-22)
EP3641317B1 (2022-06-15)
US9179143B2 (2015-11-03)
US8711944B2 (2014-04-29)
US8660181B2 (2014-02-25)
Attorney, Agent or Firm:
ORIGIN LTD (GB)
Claims:

CLAIMS

1. A computer-implemented method of encrypting a video file, the video file including a plurality of two or more levels of frames of the video file, wherein content of frames in level zero of the video file is displayable without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable only using content of at least one or more frames not in level x of the frames of the video file, the method including the steps of:

(i) accessing the video file including the plurality of two or more levels of frames of the video file, wherein the content of frames in level zero of the video file is displayable without depending on the content of frames of any other level, and wherein the content of frames in each level x not in level zero of the video file is displayable only using the content of at least one or more frames not in level x of the frames of the video file;

(ii) encrypting the frames in level zero of the video file;

(iii) assembling a file comprising a partially encrypted version of the video file, the partially encrypted version of the video file including level zero frames including the encrypted frames in level zero of the video file from step (ii), and further including the levels of frames of the video file that do not include level zero of the frames of the video file.

2. The method of Claim 1, further including storing the file assembled in step (iii).

3. A computer-implemented method of encrypting a video file, the video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) accessing the video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(ii) encrypting the frames in level zero of the video file;

(iii) assembling a file comprising a partially encrypted version of the video file, the partially encrypted version of the video file including level zero frames including the encrypted frames in level zero of the video file from step (ii), and further including the levels of frames of the video file that do not include level zero of the frames of the video file.

4. The method of Claim 3, further including storing the file assembled in step (iii).

5. The method of any previous Claim, wherein the frames in the lowest level, level zero, of the hierarchy are key frames.

6. The method of Claim 5, wherein level one comprises delta frames, which are the deltas between the key frames.

7. The method of Claim 6, wherein level two comprises delta frames, which are the deltas between the level one frames.

8. The method of Claims 6 or 7, wherein the delta frames have a chain of dependency back to the key frames.

9. The method of any previous Claim, wherein decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

10. The method of any previous Claim, wherein for the non-encrypted video file, compression uses transition tables for encoding and decoding, and to perform decoding successfully for any given level, it is necessary to have decoded all the lower levels of lower temporal resolution.

11. The method of any previous Claim, wherein compressed level zero frames comprise 20% or less of the total compressed data of all levels, or 10% or less of the total compressed data of all levels, or 5% or less of the total compressed data of all levels.

12. The method of any previous Claim, in which non-zero level frames are not encrypted.

13. The method of any previous Claim, wherein compressed data in the non-zero level frames is 80% or more of the total compressed data of all levels, or 90% or more of the total compressed data of all levels, or 95% or more of the total compressed data of all levels.

14. The method of any previous Claim, in which if a file size of an encrypted level zero frame is less than a predetermined size, then the corresponding level one frame is also encrypted.

15. The method of Claim 14, wherein the predetermined size is 10 kB, or less.

16. The method of any previous Claim, in which the compressed format structure is an MPEG structure.

17. The method of any of Claims 1 to 15, in which the compressed format structure is a Blackbird codec structure.

18. The method of any previous Claim, in which the encryption uses symmetric key cryptography.

19. The method of any previous Claim, in which the encryption uses asymmetric key cryptography.

20. A computer-implemented method of decrypting a partially encrypted video file to produce a decrypted video file, the decrypted video file including a plurality of two or more levels of frames of the decrypted video file, wherein content of frames in level zero of the decrypted video file is displayable without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the decrypted video file is displayable only using content of at least one or more frames not in level x of the frames of the decrypted video file, the method including the steps of:

(i) accessing the partially encrypted video file including a plurality of two or more levels of frames of the partially encrypted video file, wherein the content of frames in level zero of the partially encrypted video file is displayable after decryption without depending on the content of frames of any other level, and wherein the content of frames in each level x not in level zero of the partially encrypted video file is displayable only using the content of at least one or more frames not in level x of the frames of the partially encrypted video file after decryption;

(ii) decrypting the frames in level zero of the partially encrypted video file;

(iii) assembling a decrypted video file, the decrypted video file including level zero frames including the decrypted frames in level zero of the partially encrypted video file from step (ii), and further including the levels of frames of the partially encrypted video file that do not include level zero of the frames of the partially encrypted video file.

21. The method of Claim 20, further including storing the file assembled in step (iii).

22. A computer-implemented method of decrypting a partially encrypted video file to produce a decrypted video file, the decrypted video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the decrypted video file, wherein frames in level zero of the decrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the decrypted video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the decrypted video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) accessing the partially encrypted video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the partially encrypted video file, wherein frames in level zero of the partially encrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the partially encrypted video file is displayable after decryption when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the partially encrypted video file is displayable after decryption when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(ii) decrypting the frames in level zero of the partially encrypted video file;

(iii) assembling a decrypted video file, the decrypted video file including level zero frames including the decrypted frames in level zero of the partially encrypted video file from step (ii), and further including the levels of frames of the partially encrypted video file that do not include level zero of the frames of the partially encrypted video file.

23. The method of Claim 22, further including storing the file assembled in step (iii).

24. The method of any of Claims 20 to 23, further including a method of any of Claims 1 to 19.

25. The method of any of Claims 20 to 24, wherein for the decrypted video file, decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

26. The method of any of Claims 20 to 25, wherein for the non-encrypted video file, compression uses transition tables for encoding and decoding, and to perform decoding successfully for any given level, it is necessary to have decoded all the lower levels of lower temporal resolution.

27. The method of any of Claims 20 to 26, wherein level zero frames comprise 20% or less of the total (e.g. compressed) data of all levels, or 10% or less of the total (e.g. compressed) data of all levels, or 5% or less of the total (e.g. compressed) data of all levels.

28. The method of any of Claims 20 to 27, wherein the data in the non-zero level frames is 80% or more of the total (e.g. compressed) data of all levels, or 90% or more of the total (e.g. compressed) data of all levels, or 95% or more of the total (e.g. compressed) data of all levels.

29. The method of any of Claims 20 to 28, in which if a level one frame of the partially encrypted video file is also encrypted, then it is also decrypted.

30. The method of any of Claims 20 to 28, in which the non-zero levels of the partially encrypted video file are not encrypted.

31. The method of any of Claims 20 to 30, in which the compressed format structure is an MPEG structure.

32. The method of any of Claims 20 to 30, in which the compressed format structure is a Blackbird codec structure.

33. The method of any of Claims 20 to 32, wherein the decryption uses symmetric key cryptography.

34. The method of any of Claims 20 to 33, wherein the decryption uses asymmetric key cryptography.

35. A computer program product executable on a processor to perform a computer-implemented method of encrypting a video file of any of Claims 1 to 19.

36. A computer program product executable on a processor to perform a computer-implemented method of decrypting a partially encrypted video file of any of Claims 20 to 34.

37. A file comprising the partially encrypted version of the video file of any of Claims 1 to 19.

38. A data signal including the file comprising the partially encrypted version of the video file of Claim 37.

39. A video file encryption apparatus including a processor configured to perform a computer-implemented method of encrypting a video file of any of Claims 1 to 19.

40. The video file encryption apparatus of Claim 39, wherein the video file encryption apparatus is a chip.

41. A video file decryption apparatus including a processor configured to perform a computer-implemented method of decrypting a partially encrypted video file of any of Claims 20 to 34.

42. The video file decryption apparatus of Claim 41, wherein the video file decryption apparatus is a chip.

43. A video file encryption and decryption apparatus, including a processor configured to perform a computer-implemented method of encrypting a video file of any of Claims 1 to 19, and wherein the processor is configured to perform a computer-implemented method of decrypting a partially encrypted video file of any of Claims 20 to 34.

44. The video file encryption and decryption apparatus of Claim 43, wherein the video file encryption and decryption apparatus is a chip.

Description:
METHODS OF ENCRYPTING A MULTIMEDIA FILE, METHODS OF DECRYPTING AN ENCRYPTED MULTIMEDIA FILE; COMPUTER PROGRAM PRODUCTS AND APPARATUS

BACKGROUND OF THE INVENTION

1. Field of the Invention

The field of the invention relates to methods of encrypting a multimedia file, especially encrypting a video file, to methods of decrypting an encrypted multimedia file, especially decrypting a video file, to related computer program products and apparatus, and to encrypted multimedia files, especially encrypted video files.

2. Technical Background

Security is an increasingly important technical requirement. This includes “encryption at rest”, in which data stored on non-volatile storage, such as hard disks, is encrypted so that even if the disks are stolen and installed in another machine, the data itself is not accessible without the relevant decryption keys.

Encrypting (and decrypting) all the data can take considerable processor time, particularly in a video environment where pebibytes of data are being handled. A pebibyte (abbreviated PiB) is a large unit of data: 1 pebibyte = 2^50 bytes. Many systems support hardware options for this encryption and decryption. An efficient software solution is nevertheless desirable, as it can be installed and used on existing hardware, with no hardware modification. A more efficient hardware solution is also desirable.

3. Discussion of Related Art

EP1503590A2 and EP1503590B1 disclose a data encryption apparatus operable to apply encryption to input data having at least a format identifying portion and a payload portion. The apparatus comprises means for discriminating between the format identifying portion of the input data and the payload portion of the input data, and means for selectively applying encryption to the input data in dependence upon an output of the discriminating means, so as to encrypt at least a part of the payload portion of the input data but not to encrypt the format identifying portion. A corresponding decryption apparatus is also provided.

EP3641317A1 and EP3641317B1 disclose an encryption system configured to generate a multimedia file including digital rights management (DRM), the multimedia file comprising: video chunks that are only partially encrypted; 'DRM' chunks that contain a reference to the portion of a 'video' chunk that is encrypted and a reference to the key that can be used to decrypt the encrypted portion, where the decryption keys are scrambled and encrypted with a master key; and a 'DRM' header that contains information identifying the master key.

SUMMARY OF THE INVENTION

According to a first aspect of the invention, there is provided a computer-implemented method of encrypting a video file, the video file including a plurality of two or more levels of frames of the video file, wherein content of frames in level zero of the video file is displayable without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable only using content of at least one or more frames not in level x of the frames of the video file, the method including the steps of:

(i) accessing the video file including the plurality of two or more levels of frames of the video file, wherein the content of frames in level zero of the video file is displayable without depending on the content of frames of any other level, and wherein the content of frames in each level x not in level zero of the video file is displayable only using the content of at least one or more frames not in level x of the frames of the video file;

(ii) encrypting the frames in level zero of the video file;

(iii) assembling a file comprising a partially encrypted version of the video file, the partially encrypted version of the video file including level zero frames including the encrypted frames in level zero of the video file from step (ii), and further including the levels of frames of the video file that do not include level zero of the frames of the video file.

An advantage is that less energy is consumed than in encrypting the entire video file. An advantage is that the processing is faster than encrypting the entire video file.
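The three steps can be sketched in Python under assumptions the claims leave open: the video file is represented here as a mapping from level number to a list of compressed frame byte-strings, and the per-frame cipher is a toy SHA-256 counter-mode XOR stream that merely stands in for whichever symmetric or asymmetric scheme is actually chosen.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 in counter mode: a toy stand-in for a real stream cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_bytes(key: bytes, nonce: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

def encrypt_video(levels: dict, key: bytes) -> dict:
    """Steps (i)-(iii): encrypt only the level-zero frames and assemble
    a partially encrypted file; all other levels are copied unchanged."""
    partially_encrypted = {}
    for level, frames in levels.items():
        if level == 0:
            partially_encrypted[0] = [
                encrypt_bytes(key, i.to_bytes(8, "big"), frame)
                for i, frame in enumerate(frames)
            ]
        else:
            partially_encrypted[level] = list(frames)  # left in the clear
    return partially_encrypted
```

Because the bulk of the compressed data sits in the non-zero levels, only a small fraction of the bytes pass through the cipher, which is the source of the energy and speed advantage described above.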

The method may be one further including storing the file assembled in step (iii).

According to a second aspect of the invention, there is provided a computer-implemented method of encrypting a video file, the video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) accessing the video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(ii) encrypting the frames in level zero of the video file;

(iii) assembling a file comprising a partially encrypted version of the video file, the partially encrypted version of the video file including level zero frames including the encrypted frames in level zero of the video file from step (ii), and further including the levels of frames of the video file that do not include level zero of the frames of the video file.

An advantage is that less energy is consumed than in encrypting the entire video file. An advantage is that the processing is faster than encrypting the entire video file.

The method may be one further including storing the file assembled in step (iii).

The method may be one wherein the frames in the lowest level, level zero, of the hierarchy are key frames.

The method may be one wherein level one comprises delta frames, which are the deltas between the key frames.

The method may be one wherein level two comprises delta frames, which are the deltas between the level one frames.

The method may be one wherein the delta frames have a chain of dependency back to the key frames.
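One common arrangement of such a hierarchy, assumed here purely for illustration, is dyadic: with depth D, the level-zero key frames occur every 2^D frames, the level-one deltas halve that spacing, and so on down to level D at full temporal resolution. Under that assumption the level of frame n can be computed as:

```python
def frame_level(n: int, depth: int) -> int:
    """Level of frame n in a dyadic hierarchy with levels 0..depth.

    Level 0 (key frames) is the sparsest: every 2**depth frames.
    Each higher level doubles the temporal resolution.
    The dyadic spacing is an illustrative assumption, not claim text.
    """
    for level in range(depth + 1):
        if n % (2 ** (depth - level)) == 0:
            return level
    return depth  # unreachable: the final modulus is 1
```

For depth 2, frames 0, 4, 8, ... are key frames, frames 2, 6, ... are level one, and the odd-numbered frames are level two.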

The method may be one wherein decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

The method may be one wherein for the non-encrypted video file, compression uses transition tables for encoding and decoding, and to perform decoding successfully for any given level, it is necessary to have decoded all the lower levels of lower temporal resolution.
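This dependency can be illustrated with any adaptive code in which the decoder's state evolves with the data. The move-to-front decoder below is a minimal stand-in for such transition-table coding (it is not the codec used by the invention): each decoded symbol reorders the table, so a decoder that cannot recover the earlier, lower-level data decodes every subsequent codeword against the wrong table.

```python
def mtf_decode(indices, alphabet):
    """Toy adaptive decoder (move-to-front): the codeword-to-symbol
    table is updated after every symbol, so correct decoding of any
    position requires all earlier data to have been decoded first."""
    table = list(alphabet)
    out = []
    for i in indices:
        sym = table.pop(i)    # codeword i = current position in the table
        out.append(sym)
        table.insert(0, sym)  # the just-seen symbol moves to the front
    return "".join(out)
```

This is why encrypting only the level-zero data protects the whole stream: without the decoded lower levels, the adaptive decoder state for the higher levels cannot be reconstructed.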

The method may be one wherein compressed level zero frames comprise 20% or less of the total compressed data of all levels, or 10% or less of the total compressed data of all levels, or 5% or less of the total compressed data of all levels. An advantage is that less energy is consumed than in encrypting the entire video file. An advantage is that the processing is faster than encrypting the entire video file.

The method may be one in which non-zero level frames are not encrypted. An advantage is that less energy is consumed than in encrypting the entire video file. An advantage is that the processing is faster than encrypting the entire video file.

The method may be one wherein compressed data in the non-zero level frames is 80% or more of the total compressed data of all levels, or 90% or more of the total compressed data of all levels, or 95% or more of the total compressed data of all levels. An advantage is that less energy is consumed than in encrypting the entire video file. An advantage is that the processing is faster than encrypting the entire video file.

The method may be one in which if a file size of an encrypted level zero frame is less than a predetermined size, then the corresponding level one frame is also encrypted.

The method may be one wherein the predetermined size is 10 kB, or less.

The method may be one in which the compressed format structure is an MPEG structure.
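The conditional rule for small level-zero frames can be sketched as follows; the 10 kB figure is the example predetermined size given above, and the function name is illustrative only.

```python
PREDETERMINED_SIZE = 10 * 1024  # 10 kB, the example threshold

def levels_to_encrypt(encrypted_level0_size: int,
                      threshold: int = PREDETERMINED_SIZE) -> list:
    """If the encrypted level-zero frame is smaller than the predetermined
    size, the corresponding level-one frame is encrypted as well, so the
    encrypted portion of the file never becomes trivially small."""
    return [0, 1] if encrypted_level0_size < threshold else [0]
```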

The method may be one in which the compressed format structure is a Blackbird codec structure.

The method may be one in which the encryption uses symmetric key cryptography.

The method may be one in which the encryption uses asymmetric key cryptography.

According to a third aspect of the invention, there is provided a computer-implemented method of decrypting a partially encrypted video file to produce a decrypted video file, the decrypted video file including a plurality of two or more levels of frames of the decrypted video file, wherein content of frames in level zero of the decrypted video file is displayable without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the decrypted video file is displayable only using content of at least one or more frames not in level x of the frames of the decrypted video file, the method including the steps of:

(i) accessing the partially encrypted video file including a plurality of two or more levels of frames of the partially encrypted video file, wherein the content of frames in level zero of the partially encrypted video file is displayable after decryption without depending on the content of frames of any other level, and wherein the content of frames in each level x not in level zero of the partially encrypted video file is displayable only using the content of at least one or more frames not in level x of the frames of the partially encrypted video file after decryption;

(ii) decrypting the frames in level zero of the partially encrypted video file;

(iii) assembling a decrypted video file, the decrypted video file including level zero frames including the decrypted frames in level zero of the partially encrypted video file from step (ii), and further including the levels of frames of the partially encrypted video file that do not include level zero of the frames of the partially encrypted video file.

An advantage is that less energy is consumed than in decrypting the entire video file.

An advantage is that the processing is faster than decrypting the entire video file.

The method may be one further including storing the file assembled in step (iii).

According to a fourth aspect of the invention, there is provided a computer-implemented method of decrypting a partially encrypted video file to produce a decrypted video file, the decrypted video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the decrypted video file, wherein frames in level zero of the decrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the decrypted video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the decrypted video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) accessing the partially encrypted video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the partially encrypted video file, wherein frames in level zero of the partially encrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the partially encrypted video file is displayable after decryption when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the partially encrypted video file is displayable after decryption when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(ii) decrypting the frames in level zero of the partially encrypted video file;

(iii) assembling a decrypted video file, the decrypted video file including level zero frames including the decrypted frames in level zero of the partially encrypted video file from step (ii), and further including the levels of frames of the partially encrypted video file that do not include level zero of the frames of the partially encrypted video file.

An advantage is that less energy is consumed than in decrypting the entire video file.

An advantage is that the processing is faster than decrypting the entire video file.

The method may be one further including storing the file assembled in step (iii).
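Steps (i) to (iii) of the decryption can be written under the same illustrative assumptions as before: the levels are a mapping from level number to frame byte-strings, and a toy SHA-256 counter-mode XOR stream stands in for the real cipher (being an XOR stream, it is its own inverse).

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 in counter mode: a toy stand-in for a real stream cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def decrypt_bytes(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the same keystream inverts the (toy) encryption.
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

def decrypt_video(partial: dict, key: bytes) -> dict:
    """Steps (i)-(iii): decrypt only the level-zero frames; all other
    levels were stored in the clear and are passed through unchanged."""
    decrypted = {}
    for level, frames in partial.items():
        if level == 0:
            decrypted[0] = [decrypt_bytes(key, i.to_bytes(8, "big"), frame)
                            for i, frame in enumerate(frames)]
        else:
            decrypted[level] = list(frames)
    return decrypted
```

Only the level-zero frames touch the cipher, so decryption enjoys the same energy and speed advantage as encryption.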

The method of any aspect of the third or fourth aspects of the invention may further include a method of any aspect of the first or second aspects of the invention.

The method may be one wherein for the decrypted video file, decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

The method may be one wherein for the non-encrypted video file, compression uses transition tables for encoding and decoding, and to perform decoding successfully for any given level, it is necessary to have decoded all the lower levels of lower temporal resolution.

The method may be one wherein level zero frames comprise 20% or less of the total (e.g. compressed) data of all levels, or 10% or less of the total (e.g. compressed) data of all levels, or 5% or less of the total (e.g. compressed) data of all levels. An advantage is that less energy is consumed than in decrypting the entire video file. An advantage is that the processing is faster than decrypting the entire video file.

The method may be one wherein the data in the non-zero level frames is 80% or more of the total (e.g. compressed) data of all levels, or 90% or more of the total (e.g. compressed) data of all levels, or 95% or more of the total (e.g. compressed) data of all levels. An advantage is that less energy is consumed than in decrypting the entire video file. An advantage is that the processing is faster than decrypting the entire video file.

The method may be one in which if a level one frame of the partially encrypted video file is also encrypted, then it is also decrypted.

The method may be one in which the non-zero levels of the partially encrypted video file are not encrypted.

The method may be one in which the compressed format structure is an MPEG structure.

The method may be one in which the compressed format structure is a Blackbird codec structure.

The method may be one wherein the decryption uses symmetric key cryptography.

The method may be one wherein the decryption uses asymmetric key cryptography.

According to a fifth aspect of the invention, there is provided a computer program product executable on a processor to perform a computer-implemented method of encrypting a video file of any aspect of the first or second aspects of the invention.

According to a sixth aspect of the invention, there is provided a computer program product executable on a processor to perform a computer-implemented method of decrypting a partially encrypted video file of any aspect of the third or fourth aspects of the invention.

According to a seventh aspect of the invention, there is provided a file comprising the partially encrypted version of the video file of any aspect of the first or second aspects of the invention.

According to an eighth aspect of the invention, there is provided a data signal including the file comprising the partially encrypted version of the video file of the seventh aspect of the invention.

According to a ninth aspect of the invention, there is provided a video file encryption apparatus including a processor configured to perform a computer-implemented method of encrypting a video file of any aspect of the first or second aspects of the invention.

The video file encryption apparatus may be one wherein the video file encryption apparatus is a chip. An example is a field-programmable gate array (FPGA), or an application specific integrated circuit (ASIC).

According to a tenth aspect of the invention, there is provided a video file decryption apparatus including a processor configured to perform a computer-implemented method of decrypting a partially encrypted video file of any aspect of the third or fourth aspects of the invention.

The video file decryption apparatus may be one wherein the video file decryption apparatus is a chip. An example is a field-programmable gate array (FPGA), or an application specific integrated circuit (ASIC).

According to an eleventh aspect of the invention, there is provided a video file encryption and decryption apparatus, including a processor configured to perform a computer-implemented method of encrypting a video file of any aspect of the first or second aspects of the invention, and wherein the processor is configured to perform a computer-implemented method of decrypting a partially encrypted video file of any aspect of the third or fourth aspects of the invention.

The video file encryption and decryption apparatus may be one wherein the video file encryption and decryption apparatus is a chip. An example is a field-programmable gate array (FPGA), or an application specific integrated circuit (ASIC).

Aspects of the invention, or of the disclosures, may be combined.

BRIEF DESCRIPTION OF THE FIGURES

Aspects of the invention will now be described, by way of example(s), with reference to the following Figures, in which:

Figure 1 shows a typical image of 376x280 pixels divided into 8x8 pixel super-blocks.

Figure 2 shows a typical super-block of 8x8 pixels divided into 64 pixels.

Figure 3 shows a typical mini-block of 2x2 pixels divided into 4 pixels.

Figure 4 shows an example image containing two Noah regions and a Noah edge.

Figure 5 shows an example of global accessible context for Transition Tables.

Figure 6 shows an example of Transition Tables with local context (e.g. LC1, etc.) and corresponding resulting values which have been predicted so far.

Figure 7 shows an example of typical context information for cuts.

Figure 8 shows an example of typical context information for delta frames.

Figure 9 is a flowchart showing how variable length codewords may be generated from a list of codewords sorted by frequency.

Figure 10 is a schematic diagram of a sequence of video frames.

Figure 11 is a schematic diagram illustrating an example of a construction of a delta frame.

Figure 12 is a schematic diagram of an example of a media player.

Figure 13 shows an example in which an encryption key is used only for the level zero key frames, and the data not in the key frames (i.e. data in levels one to six) is not encrypted.

Figure 14 shows an example in which a decryption key is used only for the encrypted level zero key frames, and the data not in the key frames (i.e. data in levels one to six) is not decrypted, because it was not previously encrypted.

DETAILED DESCRIPTION

In an example, there is provided a system for efficiently encrypting data when it is compressed using an adaptive code in a hierarchical form, such as the Blackbird family of video codecs. Where decoding each level of content relies on all previous levels having been decoded, with an adaptive code where codewords depend on previous data, in most cases, only the first level of data needs to be encrypted, as without knowing this level, none of the subsequent levels can be decrypted. Efficiency of encryption or decryption lies for example in using less processor time, and less processor energy. Some edge (i.e. exceptional) cases may also be handled, in a related but slightly different way.

In an example, we use a codec including a compressed format structure, the compressed format structure including a hierarchy of levels of temporal resolution of frames, each respective level of the hierarchy including frames corresponding to a respective temporal resolution of the respective level of the hierarchy, but not including frames which are included in one or more lower levels of lower temporal resolution of frames of the hierarchy. For example, the lowest level (level zero) of the hierarchy contains key frames. In the next level (level one), there are delta frames, which are the deltas between the key frames. In the next level (level two) there are delta frames, which are the deltas between the level one frames. In the next level (level three) there are delta frames, which are the deltas between the level two frames, etc. The compressed data comprises key frames and deltas, in which the deltas have a chain of dependency back to the key frames. During video playback, the codewords for decoding are optimised e.g. thousands of times per second, and the codewords are not stored explicitly in the bitstream. The only simple way to deduce the codewords at any point is to decode the video from the key frames up to that point.

So for example, in a system with 63 frames between key frames, where 63 = 2^6 - 1, hence with levels from level zero to level six, if you want to play at eight times the normal speed, you don't use frames from levels six, five, or four, and you use only frames from levels three, two, one, and zero. The compression may use transition tables for encoding and decoding. But to perform decoding successfully for any given level, you need to have decoded all the lower levels of lower temporal resolution. So for example, the deltas for level one are no use if you don't have the key frames (level zero). And for example, the deltas for level two are no use if you don't have the deltas for level one, and the key frames (level zero).
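The level-selection rule above can be sketched as follows. This is an illustrative sketch only; the function name and the restriction to power-of-two speed multipliers are assumptions, not part of any codec specification.

```python
def levels_for_speed(speed, max_level=6):
    """Return the hierarchy levels needed to play at an integer
    power-of-two speed multiplier (1, 2, 4, 8, ...).

    At normal speed every level is needed; each doubling of playback
    speed drops the highest remaining level of temporal resolution.
    """
    if speed < 1 or speed & (speed - 1):
        raise ValueError("speed must be a power of two")
    dropped = speed.bit_length() - 1      # log2(speed) levels dropped
    return list(range(0, max_level - dropped + 1))
```

For example, at eight times normal speed only levels zero to three are used, matching the description above.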

The level zero frames might comprise only 5% of the total data. All the higher levels (e.g. level one to level six) typically are not encrypted, because these cannot be decoded without successfully decoding the level zero frames, so there is no need to encrypt the higher levels (e.g. level one to level six). The code words in the higher levels (e.g. level one to level six) are meaningless if one cannot decrypt the level zero key frames. Hence the data not in the key frames (level zero) does not need to be encrypted. In a typical example, the data not in the key frames (i.e. data not in level zero) which does not need to be encrypted is about 95% of the total data. See Figure 13, for example. For related decryption, see Figure 14, for example.
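A minimal sketch of this partial encryption, assuming the per-level data has already been separated out into a mapping from level number to bytes. The `keystream_xor` stand-in cipher is for illustration only and is not cryptographically secure; a real system would use an established cipher such as AES.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Stand-in cipher for illustration only (NOT secure): a SHA-256
    # counter-mode keystream XORed with the data. Applying it twice
    # with the same key recovers the original bytes.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


def partially_encrypt(levels: dict, key: bytes) -> dict:
    """Encrypt only the level-zero (key frame) data; the higher levels
    are left in the clear, because their codewords cannot be deduced
    without first decoding level zero."""
    out = dict(levels)
    out[0] = keystream_xor(key, levels[0])
    return out
```

Decryption is symmetric: applying `keystream_xor` with the same key to the level-zero data of the partially encrypted file recovers the original key frames.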

In edge cases, or exceptional cases, the key frames may be too simple (e.g. totally black) to obfuscate the codewords enough, and so in this case more frames are encrypted, for example some level one frames. Typically, a minimum number of bytes of the files, in order of decoding, would be encrypted, to ensure the codewords used in the (e.g. Blackbird) codec are sufficiently unpredictable. An encrypted file size of 10 kB is expected to be a sufficiently large file size, in most cases.

This approach is harder to do in MPEG, because there is only a key frame (level zero) and a further level (level one), but it can be implemented in MPEG. In MPEG, it is less beneficial than in the Blackbird codec, but it is beneficial nonetheless.

Consider a situation where video, or other data, is compressed in layers, where each successive layer provides further details or resolution (for example temporal or spatial resolution) compared to antecedent layers.

Further consider a case where the compression uses an adaptive code, where the codewords are optimised as data is received and/or processed. In this case, the codewords in later layers are dependent on the contents of earlier layers.

Encrypting the first layer makes the other layers hard to decrypt, because the adaptive codewords are unknown.

One example case is the Blackbird video codecs, including for example Blackbird 9. Elements of Blackbird 9 are disclosed in WO2018127695A2. Here the video is stored in multiple files, including one file for each time period.

In an example, video is split into a first key frame and then multiple chunks of, for example 64 frames each, each with its own key frame.

Each chunk of 64 frames is split into multiple files, including one file for each time period (e.g. one second, half a second, quarter of a second, eighth of a second, and so on), each of which cannot be decompressed without knowledge of the preceding files in the same chunk, and the last key frame of the previous chunk.

For example, File 0 from the previous chunk includes frame 0. File 0 includes frame 64 - a single key frame compressed through intra-frame compression. File 1 includes frame 32 (when decompressed with the knowledge of frames 0 and 64). File 2 includes frames 16 and 48 (when decompressed with the knowledge of Files 0 and 1). File 3 includes frames 8, 24, 40 and 56 (when decompressed with the knowledge of Files 0, 1 and 2). File 4 includes frames 4, 12, 20, 28, 36, 44, 52 and 60 (when decompressed with the knowledge of Files 0, 1, 2 and 3). File 5 includes frames 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58 and 62 (when decompressed with the knowledge of Files 0, 1, 2, 3 and 4). File 6 includes frames 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61 and 63 (when decompressed with the knowledge of Files 0, 1, 2, 3, 4 and 5).
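The file layout above follows a simple pattern: apart from the key frames in File 0, a frame's file number is determined by how many times its index within the chunk is divisible by two. A hypothetical helper illustrating this (the function name is ours, not from the source):

```python
def file_for_frame(frame: int) -> int:
    """Map a frame index within a 64-frame chunk to the file that
    stores it, assuming the File 0..File 6 layout described above."""
    if frame % 64 == 0:
        return 0                                  # key frames live in File 0
    trailing_zeros = (frame & -frame).bit_length() - 1
    return 6 - trailing_zeros                     # more divisible -> lower file
```

So frame 32 lands in File 1, frames 16 and 48 in File 2, and all odd frames in File 6, as listed above.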

A codec, e.g. a Blackbird codec, may use “Transition tables” to efficiently adjust the codewords e.g. thousands of times per second of video. By encrypting File 0 in each chunk, the codewords used to compress the remaining frames are almost impossible to guess, so encrypting these other frames is usually unnecessary.

In certain cases, the key frames are easy to guess. For example, an entirely black video frame is a possible occurrence, and using this “guess” would provide decoding of all the other frames in the chunk, including potentially secret information. To avoid this possibility, if File 0 is very short, File 1 should be encrypted too. If File 1 is very short, File 2 should be encrypted too. If File 2 is very short, File 3 should be encrypted too. And so on.
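This "encrypt files in decoding order until enough of the stream is covered" rule can be sketched as follows; the 10 kB default threshold echoes the earlier example and is an assumption, not a fixed specification:

```python
def files_to_encrypt(file_sizes, min_bytes=10_000):
    """Select which files of a chunk to encrypt, walking them in
    decoding order (File 0, File 1, ...) until at least `min_bytes`
    of compressed data is covered, so the adaptive codewords stay
    sufficiently unpredictable even for very simple key frames."""
    chosen, covered = [], 0
    for index, size in enumerate(file_sizes):
        chosen.append(index)
        covered += size
        if covered >= min_bytes:
            break
    return chosen
```

With a large File 0 only that file is encrypted; with a very short File 0 (e.g. an all-black key frame) the selection spills over into File 1, File 2, and so on.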

Although this disclosure has been given with particular reference to video data, it will be appreciated that it could also be applied to other types of data such as audio data.

Encryption Examples

Encryption examples which may be used in examples of the invention include the following cryptographic schemes for encrypting/decrypting information content: symmetric key cryptography and asymmetric key cryptography. In symmetric key cryptography the key used to decrypt the information is the same as (or easily derivable from) the key used to encrypt the information. However, in asymmetric key cryptography the key used for decryption differs from that used for encryption and it should be computationally infeasible to deduce one key from the other. For asymmetric cryptography a public key / private key pair is generated, the public key (which need not be kept secret) being used to encrypt information and the private key (which must remain secret) being used to decrypt the information. An example of an asymmetric cryptography algorithm that may be used is the RSA (Rivest-Shamir-Adleman) algorithm. The RSA algorithm relies on a one-way function. The public key X is a product of two large prime numbers p and q, which together form the private key. The public key is inserted into the one-way function during the encryption process to obtain a specific one-way function tailored to the recipient's public key. The specific one-way function is used to encrypt a message. The recipient can reverse the specific one-way function only via knowledge of the private key (p and q). Note that X must be large enough so that it is infeasible to deduce p and q from a knowledge of X alone.

An alternative asymmetric encryption scheme to RSA is elliptic curve cryptography (ECC). Rather than generating keys as the product of very large prime numbers as in the case of RSA, ECC generates keys through properties of an elliptic curve equation. ECC is likely to be faster at key generation than RSA and hence is a preferred asymmetric encryption technique for the present arrangement. The information content stored on a storage medium may be symmetrically encrypted using one or more content encryption keys (CEKs) and the CEKs are asymmetrically encrypted using a public key / private key pair. In alternative examples, data may be asymmetrically encrypted. The CEKs are generated according to a binary tree encryption scheme and may be updated on a regular basis, e.g. every 10 seconds, so that different portions of encrypted information content correspond to different subsets of the CEKs. This allows for selective accessibility of the decrypted information content. Symmetric encryption may be used for encryption of information content because it is less numerically intensive and hence quicker than asymmetric encryption.
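The CEK-rotation scheme described above can be sketched as simple bookkeeping; the key size, the 10-second period, and the function names are illustrative assumptions, and the asymmetric encryption of the CEKs themselves is omitted here:

```python
import os


def generate_ceks(duration_s: int, period_s: int = 10) -> dict:
    """One symmetric content encryption key (CEK) per `period_s`
    seconds of content, so that access can later be granted to
    selected time ranges only. The CEK material here is just random
    bytes; a real system would also asymmetrically encrypt each CEK
    for the intended recipients."""
    count = -(-duration_s // period_s)            # ceiling division
    return {i: os.urandom(32) for i in range(count)}


def ceks_for_range(ceks: dict, start_s: int, end_s: int,
                   period_s: int = 10) -> dict:
    """Subset of CEKs a viewer needs to decrypt content covering
    the half-open time range [start_s, end_s)."""
    first = start_s // period_s
    last = -(-end_s // period_s)
    return {i: ceks[i] for i in range(first, last)}
```

Granting a viewer only the subset returned by `ceks_for_range` gives the selective accessibility mentioned above: content outside the range remains undecryptable.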

IMPROVEMENTS TO REPRESENTATIONS OF COMPRESSED VIDEO

This section of this document relates to disclosures made in W02005048607A1, US9179143B2 and US8711944B2.

There is provided a method of compressing digital data comprising the steps of (i) reading digital data as series of binary coded words representing a context and a codeword to be compressed, (ii) calculating distribution output data for the input data and assigning variable length codewords to the result; and (iii) periodically recalculating the codewords in accordance with a predetermined schedule, in order to continuously update the codewords and their lengths.

This disclosure relates to a method of processing of digital information such as video information. This digital video information may be either compressed for storage and then later transmission, or may be compressed and transmitted live with a small latency. Transmission is for example over the internet.

There is a need for highly efficient compression techniques to be developed to enable transmission of video or other data in real time over the internet because of bandwidth restrictions. In addition, the increasing demand for high volumes of content and rising end-user expectations mean that a market is developing for live compression at high frame rates and image sizes.

An object of this disclosure is to provide such compression techniques.

The video to be compressed can be considered as comprising a plurality of frames, each frame made up of individual picture elements, or pixels. Each pixel can be represented by three components, usually either RGB (red, green and blue) or YUV (luminance and two chrominance values). These components can be any number of bits each, but eight bits of each is usually considered sufficient.

The human eye is more sensitive to the location of edges in the Y values of pixels than the location of edges in U and V. For this reason, the preferred implementation here uses the YUV representation for pixels.

The image size can vary, with more pixels giving higher resolution and higher quality, but at the cost of higher data rate. Where the source video is in PAL format, the image fields have 288 lines with 25 frames per second. Square pixels give a source image size of 384 x 288 pixels. The preferred implementation has a resolution of 376 x 280 pixels using the central pixels of a 384 x 288 pixel image, in order to remove edge pixels which are prone to noise and which are not normally displayed on a TV set.

The images available to the computer generally contain noise so that the values of the image components fluctuate. These source images may be filtered as the first stage of the compression process. The filtering reduces the data rate and improves the image quality of the compressed video.

A further stage analyses the contents of the video frame-by-frame and determines which of a number of possible types each pixel should be allocated to. These broadly correspond to pixels in high contrast areas and pixels in low contrast areas.

The pixels are hard to compress individually, but there are high correlations between each pixel and its near neighbours. To aid compression, the image is split into one of a number of different types of components. The simpler parts of the image are split into rectangular components, called "super-blocks" in this application, which can be thought of as single entities with their own structure. These blocks can be any size, but in the preferred implementation described below, the super-blocks are all the same size and are 8 x 8 pixel squares. More structurally complex parts of the image, where the connection between pixels further apart is less obvious, are split up into smaller rectangular components, called "mini-blocks" in this application.

It is apparent that if each super-block is compressed separately, the errors resulting from the compression process can combine across edges between super-blocks, highlighting the boundaries between blocks and revealing the block-like nature of the compression, which is undesirable. To avoid this problem, the mini-blocks are tokenised with an accurate representation and these are compressed in a loss free way.

Each super-block or mini-block is encoded as containing YUV information of its constituent pixels.

This U and V information is stored at lower spatial resolution than the Y information, in one implementation with only one value of each of U and V for every mini-block. The super-blocks are split into regions. The colour of each one of these regions is represented by one UV pair.

Real time filtering

An aim is to remove noise from the input video, as noise is by definition hard to compress. The filtering mechanism takes frames one at a time. It compares the current frame with the previous filtered frame on a pixel-by-pixel basis. The value for the previous pixel is used unless there is a significant difference. This can occur in a variety of ways. In one, the value of the pixel in the latest frame is a long way from the value in the previous filtered frame. In another, the difference is smaller, but consistently in the same direction. In another, the difference is even smaller, but cumulatively, over a period of time, has tended to be in the same direction. In the first two cases, the pixel value is updated to the new value. In the third case, the filtered pixel value is updated by a small amount in the direction of the captured video. The allowable error near a spatial edge is increased depending on the local contrast to cut out the effects of spatial jitter on the input video.
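The per-pixel filtering decision can be sketched roughly as follows; all thresholds and the single drift accumulator are illustrative assumptions that collapse the three cases described above into one accumulated-difference test:

```python
def filter_pixel(prev, new, drift, big=24, drift_full=48, drift_nudge=16):
    """Single-pixel temporal filter sketch (thresholds are illustrative
    assumptions, not values from the source). Keeps the previous
    filtered value unless the captured frame differs significantly,
    either at once or cumulatively over time. Returns the filtered
    value and the updated drift accumulator."""
    diff = new - prev
    drift += diff
    if abs(diff) >= big or abs(drift) >= drift_full:
        return new, 0                      # accept the captured value
    if abs(drift) >= drift_nudge:
        step = 1 if drift > 0 else -1      # creep towards the capture
        return prev + step, drift - step
    return prev, drift                     # treat the change as noise
```

A large jump or a sustained drift replaces the value outright; a smaller cumulative drift nudges it; everything else is treated as noise and discarded.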

Real time motion estimation

The video frames are filtered into "Noah regions". Thus the pixels near to edges are all labelled. In a typical scene, only between 2% and 20% of the pixels in the image turn out to have the edge labelling. There are three types of motion estimation used. In the first, whole frame pan detection using integer number of pixels is implemented.

These motions can be implemented efficiently over the whole image on playback as pixels can be copied to new locations and no blurring is needed. This uses the edge areas from the Noah regions only, as the edges contain the information needed for an accurate motion search. The second is sub-pixel motion removal over the whole image.

This uses the edge areas from the Noah regions only, as the edges contain the information needed for an accurate motion search. The edge pixels in the image, estimated for example from the Noah filtering stage, are matched with copies of themselves with translations of up to e.g. 2 pixels, but accurate to e.g. 1/64 pixel (using a blurring function to smooth the error function) and small rotations. The best match is calculated by a directed search starting at a large scale and increasing the resolution until the required sub-pixel accuracy is attained. This transformation is then applied in reverse to the new image frame and filtering continues as before. These changes are typically ignored on playback. The effect is to remove artefacts caused by camera shake, significantly reducing data rate and giving an increase in image quality. The third type examines local areas of the image. Where a significant proportion of the pixels are updated, for example on an 8x8 pixel block, either motion vectors are tested in this area with patches for the now smaller temporal deltas, or a simplified super-block representation is used giving either 1 or 2 YUVs per block, and patches are made to this.

Real time fade representation

The encoding is principally achieved by representing the differences between consecutive compressed frames. In some cases, the changes in brightness are spatially correlated. In this case, the image is split into blocks or regions, and codewords are used to specify a change over the entire region, with differences with these new values rather than differences to the previous frame itself being used.

Segment Noah regions-find edges

A typical image includes areas with low contrast and areas of high contrast, or edges. The segmentation stage described here analyses the image and decides whether any pixel is near an edge or not. It does this by looking at the variance in a small area containing the pixel. For speed, in the current implementation, this involves looking at a 3x3 square of pixels with the current pixel at the centre, although implementations on faster machines can look at a larger area. The pixels which are not near edges are compressed using an efficient but simple representation which covers multiple pixels - for example 2x2 blocks or 8x8 blocks, which are interpolated on playback. The remaining pixels near edges are represented as either, e.g., 8x8 blocks with a number of YUV areas (typically 2 or 3) if the edge is simply the boundary between two or more large regions which just happen to meet here, or as 2x2 blocks with one Y and one UV per block in the case that the above simple model does not apply, e.g. when there is too much detail in the area because the objects in this area are too small.
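The variance test at the heart of this segmentation might look like the following sketch; the threshold value is an assumption, and a production implementation would vectorise this rather than loop per pixel:

```python
def edge_mask(y, threshold=100.0):
    """Label each pixel as 'near an edge' when the variance of its 3x3
    neighbourhood exceeds a threshold (the threshold here is an
    illustrative assumption). `y` is a 2-D list of luma values; border
    pixels are left unlabelled for simplicity."""
    h, w = len(y), len(y[0])
    mask = [[False] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [y[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            mean = sum(window) / 9.0
            var = sum((v - mean) ** 2 for v in window) / 9.0
            mask[r][c] = var > threshold
    return mask
```

Flat regions produce near-zero variance and stay unlabelled, while pixels straddling a luma boundary produce large variance and receive the edge label.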

Miniblockify

The image is made up of regions, which are created from the Noah regions. The relatively smooth areas are represented by spatially relatively sparse YUV values, with the more detailed regions such as the Noah edges being represented by 2x2 blocks which are either uniform YUV, or include a UV for the block and maximum Y and a minimum Y, with a codeword to specify which of the pixels in the block should be the maximum Y value and which should be the minimum. To further reduce the datarate, the Y pairs in the non-uniform blocks are restricted to a subset of all possible Y pairs which is more sparse when the Y values are far apart.

Transitions with variable lengths codewords

Compressing video includes in part predicting what the next frame will be, as accurately as possible from the available data, or context. Then the (small) unpredictable element is what is sent in the bitstream, and this is combined with the prediction to give the result. The transition methods described here are designed to facilitate this process. On compression, the available context and codeword to compress are passed to the system. This then adds this information to its current distribution (which it is found performs well when it starts with no prejudice as to the likely relationship between the context and the output codeword). The distribution output data for this context is calculated and variable length codewords assigned to the outcomes which have arisen. These variable length codewords are not calculated each time the system is queried as the cost/reward ratio makes it unviable, particularly as the codewords have to be recalculated on the player at the corresponding times they are calculated on the compressor. Instead, the codewords are recalculated from time to time. For example, every new frame, or every time the number of codewords has doubled. Recalculation every time an output word is entered for the first time is too costly in many cases, but this is aided by not using all the codeword space every time the codewords are recalculated. Codeword space at the long end is left available, and when new codewords are needed the next one is taken. As these codewords have never occurred up to this point, they are assumed to be rare, and so giving them long codewords is not a significant hindrance. When the codeword space is all used up, the codewords are recalculated. The minimum datarate for Huffman codewords is a very flat and wide minimum, so using the distribution from the codewords which have occurred so far is a good approximation to the optimal. Recalculating the codewords has to happen quickly in a real time system.
The codewords are kept sorted in order of frequency, with the most frequent codewords first. In an example, the sorting is a mixture of bin sort using linked lists, which is O(n), for the rare codewords which change order quite a lot, and bubble sort for the common codewords which by their nature do not change order by very much each time a new codeword is added. The codewords are calculated by keeping a record of the unused codeword space, and the proportion of the total remaining codewords the next data to encode takes. The shortest codeword length at which the new codeword does not exceed its correct proportion of the available codeword space is used. There are further constraints: in order to keep the codes as prefix codes and to allow spare space for new codewords, codewords never get shorter in length, and each codeword takes up an integer power of two of the total codeword space. This method writes the new codewords into a lookup table for quick encoding in O(n), where n is the number of sorted codewords.
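A simplified sketch of assigning power-of-two shares of codeword space to frequency-sorted codewords, in the spirit of the description above (it omits the "lengths never shrink" rule and the reserved space for new codewords):

```python
from fractions import Fraction
from math import ceil, log2


def assign_code_lengths(freqs):
    """Assign prefix-code lengths to codewords in descending order of
    frequency: each codeword consumes an integral power of two of the
    total codeword space, and the shortest length whose share does not
    exceed the codeword's fair proportion (or overrun the remaining
    space) is chosen. A sketch, not the codec's exact algorithm."""
    total = sum(freqs)
    space = Fraction(1)                       # unused codeword space
    lengths = []
    for f in sorted(freqs, reverse=True):
        share = Fraction(f, total)
        length = max(1, ceil(-log2(share)))   # fair-share length
        while Fraction(1, 2 ** length) > space:
            length += 1                       # don't overrun the space
        lengths.append(length)
        space -= Fraction(1, 2 ** length)
    return lengths
```

Because every share is a power of two and the running space is never exceeded, the resulting lengths always satisfy the Kraft inequality, so a valid prefix code exists for them.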

Memory management

To facilitate Java playback, all the memory used is allocated in one block at the start. As garbage collection algorithms on Java virtual machines are unpredictable, and many stop the system for periods which are long in comparison to the length of a video frame, the computer method or apparatus may use its own memory management system. This involves allocating enough memory for e.g. 2 destination codewords for each source codeword when it is first encountered. New transitions are added as and when they occur, and when the available space for them overflows, the old memory is ignored, and new memory of twice the size is allocated. Although up to half the memory may end up unused, the many rare transitions take almost no memory, and the system scales very well and makes no assumption about the distribution of transitions.

Give compressed codeword for this uncompressed codeword

Every time a codeword occurs in a transition for the second or subsequent time, its frequency is updated and it is re-sorted. When it occurs for the first time in this transition however, it must be defined. As many codewords occur multiple times in different transitions, the destination value is encoded as a variable length codeword each time it is used for the first time, and this variable length codeword is what is sent in the bitstream, preceded by a "new local codeword" header codeword. Similarly, when it occurs for the first time ever, it is encoded raw preceded by a "new global codeword" header codeword. These header codewords themselves are variable length and recalculated regularly, so they start off short as most codewords are new when a new environment is encountered, and they gradually lengthen as the transitions and concepts being encoded have been encountered before.

Video compression (cuts)

Cuts are compressed using spatial context from the same frame.

Cuts: RLE uniform shape; else assume independent and context = CUT_CW.

Cuts -> editable, so need to be efficient. First approximation at lower resolution, e.g. 8x8. Cuts - predict the difference in mini-block codewords from the previous one, and a uniform flag for the current one.

Video compression (deltas)

The deltas can use temporal and spatial context.

Deltas shape - predict the shape from the uniformness of four neighbours and the old shape. Deltas - predict mini-block codeword differences from the uniformness of this mini-block and the old mini-block in time.

Datarate reductions

Various simple but effective datarate reduction methods are employed. Noise in the input signal can lead to isolated small changes over the image, whose loss would not be noticed. Isolated changed mini-blocks are generally left out from the bitstream, though if the mini-block is sufficiently different they can still be updated. In addition, small changes in colour in high colour areas are generally ignored as these are almost always caused by noise.

Multi-level gap masks: 4x4, 16x16, 64x64

The bulk of the images are represented by mini-blocks (mbs) and the gaps between them. The gaps are spatially and temporally correlated. The spatial correlation is catered for by dividing the image into 4x4 blocks of mbs, representing 64 pixels each, with one bit per mini-block representing whether the mb has changed on this frame. These 4x4 blocks are grouped into 4x4 blocks of these, with a set bit if any of the mbs it represents have changed. Similarly, these are grouped into 4x4 blocks, representing 128x128 pixels, with a set bit if any of the pixels has changed in the compressed representation. It turns out that trying to predict 16 bits at a time is too ambitious as the system does not have time to learn the correct distributions in a video of typical length. Predicting the masks 4x2 pixels at a time works well. The context for this is the corresponding gap masks from the two previous frames. The transition infrastructure above then gives efficient codewords for the gaps at various scales.
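The hierarchical grouping of change bits can be sketched as a pyramid of masks; this is an illustrative reconstruction of the grouping rule, not the codec's actual data layout:

```python
def build_gap_pyramid(changed):
    """Build a hierarchy of gap masks: a 2-D boolean grid of changed
    mini-blocks is repeatedly grouped into 4x4 cells, each parent bit
    set if any of its (up to) 16 children is set, until a single
    top-level bit remains."""
    levels = [changed]
    grid = changed
    while len(grid) > 1 or len(grid[0]) > 1:
        h = (len(grid) + 3) // 4
        w = (len(grid[0]) + 3) // 4
        parent = [[any(grid[r][c]
                       for r in range(pr * 4, min(pr * 4 + 4, len(grid)))
                       for c in range(pc * 4, min(pc * 4 + 4, len(grid[0]))))
                   for pc in range(w)]
                  for pr in range(h)]
        levels.append(parent)
        grid = parent
    return levels
```

A decoder can then skip whole 16x16 or 64x64 regions whenever the corresponding parent bit is clear, which is how the coarse masks keep the gap encoding cheap.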

Multiple datarates at once

One of the features of internet or intranet video distribution is that the audience can have a wide range of receiving and decoding equipment. In particular the connection speed may vary widely. In a system such as this designed for transmission across the internet, it helps to support multiple datarates. So the compression filters the image once, then resamples it to the appropriate sizes, involving for example cropping so that the final image of the correct size can be made by averaging pixels in rectangular blocks of fixed size. There is a sophisticated datarate targeting system which skips frames independently for each output bitstream. The compression is sufficiently fast on a typical modern PC of this time to create modem or midband videos with multiple target datarates. The video is split into files for easy access, and these files may typically be 10 seconds long, and may start with a key frame. The player can detect whether its pre-load is ahead or behind target and load the next chunk at either lower or higher datarate to make use of the available bandwidth. This is particularly important if the serving is from a limited system where multiple simultaneous viewers may wish to access the video at the same time, so the limit to transmission speed is caused by the server rather than the receiver. The small files will cache well on a typical internet setup, reducing server load if viewers are watching the video from the same ISP, office, or even the same computer at different times.
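The player's "load the next chunk at lower or higher datarate" behaviour can be sketched minimally; the step-up/step-down policy, parameter names, and the assumption that `rates` is sorted ascending are all illustrative:

```python
def next_chunk_rate(preload_s, target_s, rates, current):
    """Pick the datarate for the next chunk from the player's buffer
    state: ahead of the pre-load target -> step up one rate, behind
    -> step down one rate, otherwise keep the current rate."""
    i = rates.index(current)
    if preload_s > target_s and i + 1 < len(rates):
        return rates[i + 1]
    if preload_s < target_s and i > 0:
        return rates[i - 1]
    return current
```

Because each chunk is an independently decodable file, switching rate at a chunk boundary needs no coordination with the server.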

Key frames

The video may be split into a number of files to allow easy access to parts of the video which are not the beginning. In these cases, the files may start with a key frame. A key frame contains all information required to start decompressing the bitstream from this point, including a cut-style video frame and information about the status of the Transition Tables, such as starting with completely blank tables.

Digital Rights Management (DRM)

DRM is an increasingly important component of a video solution, particularly now that content is so readily accessible on the internet. Data typically included in DRM may be an expiry date for the video, or a restricted set of URLs the video can be played from. Once the compressor itself is sold, the same video may be compressed twice with different DRM data in an attempt to crack the DRM by looking at the difference between the two files. The compression described here is designed to allow small changes to the initial state of the transition or global compression tables to effectively randomise the bitstream. By randomising a few bits each time a video is compressed, the entire bitstream is randomised each time the video is compressed, making it much harder to detect differences in compressed data caused by changes to the information encoded in the DRM.
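As a rough illustration of the randomisation idea (not the actual scheme; the table layout and bit count are invented for this example), a few random bits can perturb the initial table state; with adaptive coding, every subsequent codeword assignment, and hence the whole bitstream, then diverges:

```python
import random

def initial_table_state(table_size, seed_bits=8):
    """Perturb the initial state of a (hypothetical) transition table
    with a few random bits. Returns the seed used and the perturbed
    state, so the decompressor can be given the same starting point."""
    seed = random.getrandbits(seed_bits)
    state = [0] * table_size
    for i in range(seed_bits):
        state[i % table_size] ^= (seed >> i) & 1
    return seed, state
```

Because the tables adapt as compression proceeds, two runs differing only in these few bits produce bitstreams that differ throughout, masking where the DRM data itself differs.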

Miscellaneous

The Y values for each pixel within a single super-block can also be approximated.

In many cases, there is only one or part of one object in a super-block. In these cases, a single Y value is often sufficient to approximate the entire super-block's pixel Y values, particularly when the context of neighbouring super-blocks is used to help reconstruct the image on decompression.

In many further cases, there are only two or parts of two objects in a super-block.

In these cases, a pair of Y values is often sufficient to approximate the entire superblock's Y values, particularly when the context of the neighbouring super-blocks is used to help reconstruct the image on decompression. In the cases where there are two Y values, a mask is used to show which of the two Y values is to be used for each pixel when reconstructing the original super-block. These masks can be compressed in a variety of ways, depending on their content, as it turns out that the distribution of masks is very skewed. In addition, masks often change by small amounts between frames, allowing the differences between masks on different frames to be compressed efficiently.
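A minimal sketch of the two-Y-value approximation, assuming a simple mean-threshold split (the patent does not specify how the two values are chosen, so the threshold rule here is an assumption):

```python
def two_value_approx(block):
    """block: flat list of 64 luminance values for an 8x8 super-block.
    Split about the mean, returning (y_low, y_high, mask) where mask
    bit i selects y_high for pixel i on reconstruction."""
    mean = sum(block) / len(block)
    lo = [v for v in block if v <= mean]
    hi = [v for v in block if v > mean]
    y_lo = round(sum(lo) / len(lo)) if lo else 0
    y_hi = round(sum(hi) / len(hi)) if hi else y_lo
    mask = 0
    for i, v in enumerate(block):
        if v > mean:
            mask |= 1 << i
    return y_lo, y_hi, mask
```

The 64-bit mask is exactly the per-pixel selector described above; as noted, the skewed distribution of such masks, and the small frame-to-frame changes in them, are what make them compress well.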

Improvements to image quality can be obtained by allowing masks with more than two Y values, although this increases the amount of information needed to specify which Y value to use.

Although this disclosure has been given with particular reference to video data, it will be appreciated that it could also be applied to other types of data such as audio data.

Examples

Video frames of typically 384x288, 376x280, 320x240, 192x144, 160x120 or 128x96 pixels (see e.g. Figure 1) are divided into pixel blocks, typically 8x8 pixels in size (see e.g. Figure 2), and also into pixel blocks, typically 2x2 pixels in size, called mini-blocks (see e.g. Figure 3). In addition, the video frames are divided into Noah regions (see e.g. Figure 4), indicating how complex an area of the image is.

In one implementation, each super-block is divided into regions, each region in each super-block approximating the corresponding pixels in the original image and containing the following information:

1 Y value (typically 8 bits)

1 U value (typically 8 bits)

1 V value (typically 8 bits)

64 bits of mask specifying which YUV value to use when reconstructing this superblock. In this implementation, each mini-block contains the following information:

2 Y values (typically 8 bits each)

1 U value (typically 8 bits)

1 V value (typically 8 bits)

4 bits of mask specifying which Y value to use when reconstructing this mini-block.

Temporal gaps

If more latency is acceptable, temporal gaps rather than spatial gaps turn out to be an efficient representation. This involves coding each changed mini-block with a codeword indicating the next time (if any) in which it changes.
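The temporal-gap coding can be illustrated as follows, under an assumed representation (one change flag per frame for a single mini-block; names are illustrative):

```python
def temporal_gaps(change_history):
    """change_history: list of per-frame 0/1 change flags for one
    mini-block. For each frame where the block changes, return
    (frame, gap) where gap is the number of frames until its next
    change, or None if it never changes again."""
    change_frames = [t for t, c in enumerate(change_history) if c]
    gaps = []
    for i, t in enumerate(change_frames):
        nxt = change_frames[i + 1] if i + 1 < len(change_frames) else None
        gaps.append((t, None if nxt is None else nxt - t))
    return gaps
```

The gap values are what get entropy coded; the cost is latency, since the encoder must look ahead to know when each block next changes.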

Interpolation between Uniform Super-Blocks

Where uniform super-blocks neighbour each other, bilinear interpolation between the Y, U and V values used to represent each block is used to find the Y, U and V values to use for each pixel on playback.
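Standard bilinear interpolation, as used between neighbouring uniform super-blocks, can be written as:

```python
def bilerp(c00, c10, c01, c11, fx, fy):
    """Bilinear interpolation between four corner values; fx, fy in
    [0, 1] give the pixel's fractional position within the patch.
    Applied separately to the Y, U and V planes on playback."""
    top = c00 + (c10 - c00) * fx
    bot = c01 + (c11 - c01) * fx
    return top + (bot - top) * fy
```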

In an example, there is provided a method of processing digital video information for transmission or storage after compression, said method comprising: reading digital data representing individual picture elements (pixels) of a video frame as a series of binary coded words; segmenting the image into regions of locally relatively similar pixels and locally relatively distinct pixels; having a mechanism for learning how contextual information relates to codewords requiring compression and encoding such codewords in a way which is efficient both computationally and in terms of compression rate of the encoded codewords and which dynamically varies to adjust as the relationship between the context and the codewords requiring compression changes and which is computationally efficient to decompress; establishing a reduced number of possible luminance values for each block of pixels (typically no more than four); encoding to derive from the words representing individual pixels further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of at least eight by eight individual pixels (super-block); establishing a reduced number of possible luminance values for each block of pixels (typically no more than four); encoding to derive from the words representing individual pixels further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of typically two by two individual pixels (miniblock); establishing a reduced number of possible luminance values for each block of pixels (typically one or two); providing a series of changeable stored masks as a mechanism for indicating which of the possible luminance values are to be used in determining the appropriate luminance value of each pixel for display; comparing and evaluating the words representing corresponding 
portions of one frame with another frame or frames in a predetermined sequential order of the elements making up the groups to detect differences and hence changes; identifying any of the masks which require updating to reflect such differences and choosing a fresh mask as the most appropriate to represent such differences and storing the fresh mask or masks for transmission or storage; using context which will be available at the time of decompression to encode the masks, the changes in Y values, U values, and V values, and the spatial or temporal gaps between changed blocks, combined with the efficient encoding scheme, to give an efficient compressed real time representation of the video; using variable length codewords to represent the result of transitions in a way which is nearly optimal from a compression point of view, and computationally very efficient to calculate.

There is provided a method of compressing digital data comprising the steps of: (i) reading digital data as a series of binary coded words representing a context and a codeword to be compressed; (ii) calculating distribution output data for the input data and assigning variable length codewords to the result; and (iii) periodically recalculating the codewords in accordance with a predetermined schedule, in order to continuously update the codewords and their lengths.

The method may be one in which the codewords are recalculated each time the number of codewords has doubled. The method may be one in which the codewords are recalculated for every new frame of data. The method may be one in which some codeword space is reserved at each recalculation so as to allow successive new codewords to be assigned for data of lower frequency. There is provided a method of processing digital video information so as to compress it for transmission or storage, said method comprising: reading digital data representing individual picture elements (pixels) of a video frame as a series of binary coded words; segmenting the image into regions of locally relatively similar pixels and locally relatively distinct pixels; establishing a reduced number of possible luminance values for each block of pixels (typically no more than four); carrying out an encoding process so as to derive from the words representing individual pixels, further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of at least eight by eight individual pixels (super-block) ; establishing a reduced number of possible luminance values for each smaller block of pixels (typically no more than four); carrying out an encoding process so as to derive from the words representing individual pixels, further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of typically two by two individual pixels (mini-block) ; establishing a reduced number of possible luminance values for each block of pixels (typically one or two); providing a series of changeable stored masks to indicate which of the possible luminance values are to be used in determining the appropriate luminance value of each pixel for display; comparing and evaluating the words representing corresponding portions 
of one frame with another frame or frames in a predetermined sequential order of the elements making up the groups to detect differences and hence changes; identifying any of the masks which require updating to reflect such differences and choosing a fresh mask as the most appropriate to represent such differences and storing the fresh mask or masks for transmission or storage; using context which will be available at the time of decompression to encode the masks, the changes in Y values (luminance), U values (chrominance), and V values (chrominance) and the spatial or temporal gaps between changed blocks, combined with the efficient encoding scheme, to give an efficient compressed real time representation of the video; and using variable length codewords to represent the result of transitions.

The method may be one in which the method further comprises an adaptive learning process for deriving a relationship between contextual information and codewords requiring compression, and a process for dynamically adjusting the relationship so as to optimise the compression rate and the efficiency of decompression.

There is provided a method of compressing digital data for storage or transmission, comprising the steps of:

(i) reading inputted digital data as a series of binary coded words representing a context and an input codeword to be compressed;

(ii) calculating distribution output data for the inputted digital data and generating variable length prefix codewords for each combination of context and input codeword, and generating a respective sorted Transition Table of variable length prefix codewords for each context, in a manner in which codeword space at the long end is left available to represent new input codewords, which have not yet occurred with corresponding contexts, as they occur; and

(iii) repeating the process of step (ii) from time to time;

(iv) whereby the inputted digital data can be subsequently replayed by recalculating the sorted Transition Table of local codewords at corresponding times in the inputted digital data.
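Steps (i) and (ii) above can be sketched as follows, using a plain frequency sort as a stand-in for the variable length prefix code assignment, with an escape entry reserved at the long end for codewords that have not yet occurred with a given context (illustrative only; names are assumptions):

```python
from collections import Counter

def transition_table(samples):
    """samples: list of (context, codeword) pairs seen so far.
    For each context, return its codewords sorted most-frequent
    first (shortest codes would go to the front of this ranking),
    with an 'ESC' entry at the long end reserved for codewords
    not yet seen with that context."""
    by_ctx = {}
    for ctx, cw in samples:
        by_ctx.setdefault(ctx, Counter())[cw] += 1
    tables = {}
    for ctx, counts in by_ctx.items():
        ranked = [cw for cw, _ in counts.most_common()]
        tables[ctx] = ranked + ['ESC']
    return tables
```

Rebuilding these tables from time to time, per step (iii), is what lets the decompressor replay the data by recalculating the same tables at the same points in the stream.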

The method may be one in which the codewords are recalculated for every new frame of data. The method may be one in which some codeword space is reserved at each recalculation so as to allow successive new codewords to be assigned for data of lower frequency.

There is provided a method of compressing digital data for storage or transmission, comprising the steps of:

(i) reading digital data as a series of binary coded words representing a context and a codeword to be compressed;

(ii) calculating distribution output data for the input data and generating variable length prefix codewords for each combination of context and input codeword so as to form a respective sorted Transition Table of local codewords for each context, in a manner which reserves logical codeword space at the long end to represent any new input codewords, which have not yet occurred with that context, as they occur for the first time; and

(iii) repeating the process of step (ii) from time to time;

(iv) whereby the input data can be subsequently replayed by recalculating the codeword tables at corresponding times in the input data, wherein the codewords are recalculated each time the number of codewords has doubled.

There is provided a method of compressing digital data for storage or transmission, comprising the steps of:

(i) reading digital data as a series of binary coded words representing a context and a codeword to be compressed;

(ii) calculating distribution output data for the input data and generating variable length prefix codewords for each combination of context and input codeword so as to form a respective sorted Transition Table of local codewords for each context, in a manner which reserves logical codeword space at the long end to represent any new input codewords, which have not yet occurred with that context, as they occur for the first time; and

(iii) repeating the process of step (ii) from time to time;

(iv) whereby the input data can be subsequently replayed by recalculating the codeword tables at corresponding times in the input data, wherein the method further comprises an adaptive learning process for deriving a relationship between contextual information and codewords requiring compression, and a process for dynamically adjusting the relationship so as to optimize the compression rate and the efficiency of decompression.

A Method Of Compressing Video Data And A Media Player For Implementing the Method

This section of this document relates to disclosures made in WO2007077447A2 and US8660181B2.

There is provided a method of receiving video data comprising the steps of: receiving at least one chunk of video data comprising a number of sequential key video frames where the number is at least two, and constructing at least one delta frame between a nearest preceding key frame and a nearest subsequent key frame from data contained in either or each of the nearest preceding and subsequent frames.

Visual recordings of moving things are generally made up of sequences of successive images. Each such image represents a scene at a different time or range of times. This disclosure relates to such sequences of images such as are found, for example, in video, film and animation.

Video takes a large amount of memory, even when compressed. The result is that video is generally stored remotely from the main memory of the computer. In traditional video editing systems, this would be on hard discs or removable disc storage, which are generally fast enough to access the video at full quality and frame rate. Some people would like to access and edit video content remotely, over the internet, in real time. This disclosure relates to the applications of video editing (important as much video content on the web will have been edited to some extent), video streaming, and video on demand.

At present any media player editor implementing a method of transferring video data across the internet in real time suffers the technical problems that: (a) the internet connection speed available to internet users is, from moment to moment, variable and unpredictable; and (b) that the central processing unit (CPU) speed available to internet users is from moment to moment variable and unpredictable.

For the application of video editing, consistent image quality is very preferable, because many editing decisions are based on aspects of the image, for example, whether the image was taken in focus or out.

It is an object of the present disclosure to alleviate at least some of the aforementioned technical problems. Accordingly this disclosure provides a method of receiving video data comprising the steps of: receiving at least one chunk of video data comprising a number (n) of sequential key video frames where the number (n) is at least two and, constructing at least one delta frame between a nearest preceding key frame and a nearest subsequent key frame from data contained in either, or each, of the nearest preceding and subsequent frames.

Preferably the delta frame is composed of a plurality of component blocks or pixels and each component of the delta frame is constructed according to data indicating it is one of: the same as the corresponding component in the nearest preceding key frame, or the same as the corresponding component in the nearest subsequent key frame, or a new value compressed using some or all of the spatial compression of the delta frame and information from the nearest preceding and subsequent frames. After the step of construction, the delta frame may be treated as a key frame for the construction of one or more further delta frames. Delta frames may continue to be constructed in a chunk until either: a sufficiently good predetermined image playback quality criterion is met or the time constraints of playing the video in real time require the frames to be displayed. The number of key frames in a chunk may be in the range from n=3 to n=10.

Although the method may have other applications, it is particularly advantageous when the video data is downloaded across the internet. In such a case it is convenient to download each key frame in a separate download slot, the number of said download slots equating to the maximum number of download slots supportable by the internet connection at any moment in time. Preferably each slot is implemented in a separate thread. Where it is desired to subsequently edit the video it is preferable that each frame, particularly the key frames, are cached upon first viewing to enable subsequent video editing.

According to another aspect of this disclosure, there is provided a media player arranged to implement the method which preferably comprises a receiver to receive chunks of video data including at least two key frames, and a processor adapted to construct a delta frame sequentially between a nearest preceding key frame and a nearest subsequent key frame. Preferably, a memory is also provided for caching frames as they are first viewed to reduce the subsequent requirements for downloading.

According to a third aspect of this disclosure, there is provided a method of compressing video data so that the video can be streamed across a limited bandwidth connection with no loss of quality on displayed frames which entails storing video frames at various temporal resolutions which can be accessed in a pre-defined order, stopping at any point. Thus multiple simultaneous internet accesses can ensure a fairly stable frame rate over a connection by (within the resolution of the multitasking nature of the machine) simultaneously loading the first or subsequent temporal resolution groups of frames from each of a number of non-intersecting subsets of consecutive video frames until either all the frames in the group are downloaded, or there would probably not be time to download the group, in which case a new group is started.

This disclosure includes a method for enabling accurate editing decisions to be made over a wide range of internet connection speeds, as well as video playback which uses available bandwidth efficiently to give a better experience to users with higher bandwidth. Traditional systems have a constant frame rate, but the present disclosure relates to improving quality by adding extra delta frame data, where bandwidth allows.

A source which contains images making up a video, film, animation or other moving picture is available for the delivery of video over the internet. Images (2, 4, 6...) in the source are digitised and labelled with frame numbers (starting from zero) where later times correspond to bigger frame numbers and consecutive frames have consecutive frame numbers. The video also has audio content, which is split into sections.

The video frames are split into chunks as follows: A value of n is chosen to be a small positive integer. In one implementation, n is chosen to be 5. A chunk is a set of consecutive frames of length 2^n. All frames appear in at least one chunk, and the end of each chunk is always followed immediately by the beginning of another chunk.

"f ' represent the frame number in the chunk, where the earliest frame (2) in each chunk has f=0, and the last (8) has f=(2 A n)-l (see e.g. Figure 10).

All f=0 frames in a chunk are compressed as key frames - that is, they can be recreated without using data from any other frames. All frames equidistant in time between previously compressed frames are compressed as delta frames recursively as follows: Let frame C (see e.g. Figure 11) be the delta frame being compressed. Then there is a nearest key frame earlier than this frame, and a nearest key frame later than this frame, which have already been compressed. Let us call them E and L respectively. Each frame is converted into a spatially compressed representation, in one implementation comprising rectangular blocks of various sizes with four Y or UV values representing the four corner values of each block in the luminance and chrominance respectively.
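The recursive order in which delta frames are compressed within a chunk can be sketched as follows (illustrative; lo and hi are the frame numbers of the already-compressed key frames bounding the chunk):

```python
def compression_order(lo, hi):
    """Frames lo and hi are already compressed. Return the order in
    which the in-between frames are compressed, each as a triple
    (C, E, L): the midpoint C between its nearest earlier (E) and
    later (L) already-compressed frames, recursively."""
    if hi - lo < 2:
        return []
    mid = (lo + hi) // 2
    order = [(mid, lo, hi)]
    order += compression_order(lo, mid)
    order += compression_order(mid, hi)
    return order
```

For a chunk of 2^3 frames this yields the middle frame first, then the quarter points, and so on, which is why truncating the download at any point still leaves a uniformly spaced subset of frames.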

Frame C is compressed as a delta frame using information from frames E and L (which are known to the decompressor), as well as information as it becomes available about frame C.

In one implementation, the delta frame is reconstructed as follows:

Each component (12) of the image (pixel or block) is represented as either: the same as the corresponding component (10) in frame E; or the same as the corresponding component (14) in frame L; or a new value compressed using some or all of spatial compression of frame C, and information from frames E and L.
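The per-component reconstruction rule above can be sketched as follows (hypothetical data layout: frames as flat lists of components, with one tag per component indicating its source):

```python
def reconstruct_delta(choices, E, L, new_values):
    """choices: per-component tags, 'E' (copy from earlier frame),
    'L' (copy from later frame) or 'N' (decoded new value).
    new_values: the decoded new component values, in order.
    Returns the reconstructed frame C."""
    it = iter(new_values)
    C = []
    for i, tag in enumerate(choices):
        if tag == 'E':
            C.append(E[i])
        elif tag == 'L':
            C.append(L[i])
        else:
            C.append(next(it))
    return C
```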

Compressing the video data in this way allows the second part of the disclosure to function. This is described next. When transferring data across the internet, using the HTTP protocol used by web browsers, the described compression has advantages, for example enabling access through many firewalls. The two significant factors relevant to this disclosure are latency and bandwidth. The latency here is the time taken between asking for the data and it starting to arrive. The bandwidth here is the speed at which data arrives once it has started arriving. For a typical domestic broadband connection, the latency can be expected to be between 20 ms and 1 s, and the bandwidth can be expected to be between 256 kb/s and 8 Mb/s.

The disclosure involves one compression step for all supported bandwidths of connection, so the player (e.g. 16, Figure 12) has to determine the data to request which gives the best playback experience. This may be done as follows:

The player has a number of download slots (20, 22, 24...) for performing overlapping downloads, each running effectively simultaneously with the others. At any time, any of these may be blocked by waiting for the latency or by lost packets. Each download slot is used to download a key frame, and then subsequent files (if there is time) at each successive granularity. When all files pertaining to a particular section are downloaded, or when there would not be time to download a section before it is needed for decompression by the processor (18), the download slot is applied to the next unaccounted for key frame.

In one implementation of the disclosure, each slot is implemented in a separate thread.

A fast link results in all frames being downloaded, but slower links download at a variable frame rate, e.g. 1, 1/2, 1/4, 1/8 etc. of the frame rate of the original source video for each chunk. This way the video can play back in real time at full quality, possibly with some sections of the video at lower frame rate.

In a further implementation, as used for video editing, frames downloaded in this way are cached in a memory (20A) when they are first seen, so that on subsequent accesses, only the finer granularity videos need be downloaded.

The number of slots depends on the latency and the bandwidth and the size of each file, but is chosen to be the smallest number which ensures the internet connection is fully busy substantially all of the time.

In one implementation, when choosing what order to download or access the data in, the audio is given highest priority (with earlier audio having priority over later audio), then the key frames, and then the delta frames (within each chunk) in the order required for decompression with the earliest first.
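The priority ordering in this implementation can be expressed as a simple sort key (illustrative; the item kinds and the use of a time field are assumptions about how pending downloads are represented):

```python
def download_order(items):
    """items: (kind, time) pairs with kind in {'audio', 'key', 'delta'}.
    Audio first (earliest first), then key frames, then delta frames,
    each group ordered earliest first, matching decompression order."""
    rank = {'audio': 0, 'key': 1, 'delta': 2}
    return sorted(items, key=lambda it: (rank[it[0]], it[1]))
```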

There is provided a method of receiving video data comprising the steps of: receiving at least one chunk of video data comprising a number (n) of sequential key video frames where the number (n) is at least two, and constructing at least one delta frame (C) between a nearest preceding key frame (E) and a nearest subsequent key frame (L) from data contained in either or each of the nearest preceding and subsequent frames.

The method may be one wherein the delta frame (C) is composed of a plurality of component blocks or pixels and each component of the delta frame is constructed according to data indicating it is one of:

(a) the same as the corresponding component in the nearest preceding key frame (E), or

(b) the same as the corresponding component in the nearest subsequent key frame (L), or (c) a new value compressed using some or all of the spatial compression of frame C, and information from the nearest preceding and subsequent frames.

The method may be one wherein after the step of construction, the delta frame is treated as a key frame for the construction of one or more delta frames.

The method may be one wherein delta frames continue to be constructed in a chunk until either: a sufficiently good predetermined image playback quality criterion is met or the time constraints of playing the video in real time require the frames to be displayed.

The method may be one wherein the number of key frames is in the range from n=3 to n=10.

The method may be one comprising downloading the video data across the internet.

The method may be one comprising downloading each key frame in a separate download slot, the number of said download slots equating to the maximum number of download slots supportable by the internet connection at any moment in time.

The method may be one wherein each slot is implemented in a separate thread.

The method may be one wherein each frame is cached upon first viewing to enable subsequent video editing.

The method may be one wherein the key frames are cached.

There is provided a media player configured to implement the method according to any one of the above statements.

The media player may be one having: a receiver to receive chunks of video data including at least two key frames, a processor adapted to construct a delta frame sequentially between a nearest preceding key frame and a nearest subsequent key frame.

There is provided a method of compressing video data so that the video can be streamed across a limited bandwidth connection with no loss of quality on displayed frames, the method comprising storing video frames at various temporal resolutions which can be accessed in a pre-defined order, stopping at any point.

The method may be one where multiple simultaneous internet accesses can ensure a fairly stable frame rate over a connection by simultaneously loading the first or subsequent temporal resolution groups of frames from each of a number of non-intersecting subsets of consecutive video frames until either all the frames in the group are downloaded, or until a predetermined time has elapsed, and then starting a new group.

There is provided a method of compressing video data with no loss of frame image quality on the displayed frames, by varying the frame rate relative to the original source video, the method comprising the steps of: receiving at least two chunks of uncompressed video data, each chunk comprising at least two sequential video frames and, compressing at least one frame in each chunk as a key frame, for reconstruction without the need for data from any other frames, compressing at least one intermediate frame as a delta frame between a nearest preceding key frame and a nearest subsequent key frame from data contained in either or each of the nearest preceding and subsequent frames, wherein further intermediate frames are compressed as further delta frames within the same chunk, by treating any previously compressed delta frame as a key frame for constructing said further delta frames, and storing the compressed video frames at various mutually exclusive temporal resolutions, which are accessed in a pre-defined order, in use, starting with key frames, and followed by each successive granularity of delta frames, stopping at any point; and whereby the frame rate is progressively increased as more intermediate data is accessed.

The method may be one wherein the delta frame is composed of a plurality of component blocks or pixels and each component of the delta frame is constructed according to data indicating it is one of:

(a) the same as the corresponding component in the nearest preceding key frame, or

(b) the same as the corresponding component in the nearest subsequent key frame, or

(c) a new value compressed using some or all of the spatial compression of frame, and information from the nearest preceding and subsequent frames.

The method may be one wherein after the step of construction, the delta frame is treated as a key frame for the construction of one or more delta frames.

The method may be one wherein delta frames continue to be constructed in a chunk until either: a predetermined image playback quality criterion, including a frame rate required by an end-user, is met or the time constraints of playing the video in real time require the frame to be displayed.

The method may be one wherein the number of frames in a chunk is 2^n, and n is in the range from n=3 to n=10.

The method may be one comprising downloading the video data across the internet.

The method may be one comprising downloading each key frame in a separate download slot, the number of said download slots equating to the minimum number to fully utilize the internet connection.

The method may be one wherein each slot is implemented in a separate thread.

The method may be one wherein each frame is cached upon first viewing to enable subsequent video editing. The method may be one wherein the key frames are cached.

There is provided a method of processing video data comprising the steps of: receiving at least one chunk of video data comprising 2^n frames and one key video frame, and the next key video frame; constructing a delta frame (C) equidistant between a nearest preceding key frame (E) and a nearest subsequent key frame (L) from data that includes data contained in either or each of the nearest preceding and subsequent key frames; constructing additional delta frames equidistant between a nearest preceding key frame and a nearest subsequent key frame from data that includes data contained in either or each of the nearest preceding and subsequent key frames, wherein at least one of the nearest preceding key frame or the nearest subsequent key frame is any previously constructed delta frame; storing the additional delta frames at various mutually exclusive temporal resolutions, which are accessible in a pre-defined order, in use, starting with the key frames, and followed by each successive granularity of delta frames, stopping at any point; and continuing to construct the additional delta frames in a chunk until either a predetermined image playback quality criterion, including a user selected frame rate, is achieved, or a time constraint associated with playing of the chunk of video data in real time requires the frames to be displayed.

The method may be one further comprising downloading the at least one chunk of video data at a frame rate that is less than an original frame rate associated with the received video data.

The method may be one further comprising determining a speed associated with the receipt of the at least one image chunk, and only displaying a plurality of constructed frames in accordance with the time constraint and the determined speed.

Note

It is to be understood that the above-referenced arrangements are only illustrative of the application for the principles of the present invention. Numerous modifications and alternative arrangements can be devised without departing from the spirit and scope of the present invention. While the present invention has been shown in the drawings and fully described above with particularity and detail in connection with what is presently deemed to be the most practical and preferred example(s) of the invention, it will be apparent to those of ordinary skill in the art that numerous modifications can be made without departing from the principles and concepts of the invention as set forth herein.