METHOD AND SYSTEM FOR AUDIO COMPRESSION AND DISTRIBUTION

Document Type and Number:

WIPO Patent Application WO/2001/086638

Kind Code:

A2

Abstract:

Typical audio compression/decompression systems require a software application or browser plugin on the end user's computer because their decompression requires a great deal of processing, particularly of series of trigonometric functions. The invention provides a method and system in which audio data is decompressed using simple linear operations, which allows music quality audio to be decompressed in real time. This also allows the decompression to be performed in a Java environment, so executable audio applets can be posted on Web sites, and be universally accessible.

Inventors:

VESTERGAARD STEVE (CA)
TSUE CHE-WAI WILLIAM (CA)
KOLIC EDWARD (CA)

Application Number:

PCT/CA2001/000631

Publication Date:

November 15, 2001

Filing Date:

May 09, 2001

Assignee:

DESTINY SOFTWARE PRODUCTIONS I (CA)
VESTERGAARD STEVE (CA)
TSUE CHE WAI WILLIAM (CA)
KOLIC EDWARD (CA)

International Classes:

G10L19/04; H04B1/66; G10L19/02; (IPC1-7): G10L19/04

Foreign References:

EP0867862A2

1998-09-30

Other References:

JOSEPH S M ET AL: "SUBJECTIVE EVALUATION OF FOUR LOW-COMPLEXITY AUDIO CODING SCHEMES" JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, AMERICAN INSTITUTE OF PHYSICS. NEW YORK, US, vol. 97, no. 6, 1 June 1995 (1995-06-01), pages 3657-3662, XP000522124 ISSN: 0001-4966
RAMSEY L T ET AL: "Information-theoretic compressibility of speech data" PROCEEDINGS: ICASSP 87. 1987 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (CAT. NO.87CH2396-0), DALLAS, TX, USA, 6-9 APRIL 1987, pages 17-20 vol.1, XP002184418 1987, New York, NY, USA, IEEE, USA
JOSHI ET AL: "Some fast speech processing algorithms using AltiVec technology" ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1999. PROCEEDINGS., 1999 IEEE INTERNATIONAL CONFERENCE ON PHOENIX, AZ, USA 15-19 MARCH 1999, PISCATAWAY, NJ, USA,IEEE, US, 15 March 1999 (1999-03-15), pages 2135-2138, XP010327872 ISBN: 0-7803-5041-3
SRIVASTAVA H O ET AL: "ON-LINE BROADCAST ARCHIVES FOR INTERACTIVE VIDEO" IEEE TRANSACTIONS ON BROADCASTING, IEEE INC. NEW YORK, US, vol. 43, no. 3, September 1997 (1997-09), pages 288-308, XP000834825 ISSN: 0018-9316
DATABASE INSPEC [Online] INSTITUTE OF ELECTRICAL ENGINEERS, STEVENAGE, GB; PROKHOROV YU N ET AL: "Study of ADPCM with recurrent preemphasis for speech and digital message transmission" Database accession no. 3057448 XP002184419 & ELEKTROSVYAZ, JULY 1986, USSR, vol. 40, no. 7, pages 53-56, ISSN: 0013-5771

Attorney, Agent or Firm:

Wada, Ikuko (Ontario K1P 1C3, CA)

Claims:

WHAT IS CLAIMED IS:

1.	A method of compressing music quality audio comprising the steps of: digitally sampling an audio signal; generating a linearly predicted value of one of said digital samples, using previous samples; and calculating a residual value as the difference between a linearly predicted value and a corresponding one of said digital samples; whereby a set of residual values will define said audio signal, and may be decompressed using simple linear operations rather than trigonometric operations.

2.	The method of claim 1, further comprising the step of coding said residual values to effect further compression.

3.	The method of claim 2, where said step of coding comprises the step of Huffman coding said residual values.

4.	The method of claim 1, wherein said step of generating a linearly predicted value comprises the step of generating a linearly predicted value of one of said digital samples, by performing an autoregression analysis on eight previous samples.

5.	The method of claim 4, wherein said autoregression analysis is performed on eight previous samples.

6.	The method of claim 5, wherein the coefficients of said eight previous samples are weighted.

7.	The method of claim 6, wherein said autoregression analysis includes Schur recursion.

8.	The method of claim 1, wherein said step of digitally sampling further comprises the step of dividing said digital samples into blocks for said linear prediction operation.

9.	The method of claim 8, further comprising the steps of: subdividing each said block into subblocks; searching past data to identify a prior subblock which correlates to a current subblock; calculating a scaling factor which best matches said prior subblock to said current subblock; recording the starting point of said prior subblock and the scaling factor; and subtracting said scaled prior subblock from said current subblock.

10.	The method of claim 1, further comprising the steps of: scanning the residuals in a given block and identifying the maximum amplitude; calculating a "quantization step" as said maximum amplitude divided by 32; and dividing each sample by the value of said quantization step to generate a residual value.

11.	The method of claim 10, further comprising the step of: responding to the total bits required to store said residual exceeding a desired bit rate, by adjusting said quantization step and repeating said step of dividing.

12.	The method of claim 10, wherein said residual data are downsampled, if said quantization step is too great.

13.	The method of claim 10, wherein said residual data are downsampled by a factor of 2, if said quantization step is greater than one third of the maximum amplitude in a subblock.

14.	The method of claim 1, further comprising the step of: wrapping said residual data in a Java applet, including a decompression routine.

15.	The method of claim 1, further comprising the step of: applying a preemphasis filter to said digital samples.

16.	A method of decompressing a compressed audio file comprising the steps of: receiving data corresponding to said compressed audio file; sorting said data into blocks; and linearly expanding said data, reproducing said audio file; whereby said decompression does not require the execution of trigonometric and nonlinear functions which are demanding of system resources.

17.	A system for executing the method of any one of claims 1 through 16.

18.	A computer readable memory medium for storing software code executable to perform the method steps of any one of claims 1 through 16.

19.	A carrier signal incorporating software code executable to perform the method steps of any one of claims 1 through 16.

Description:

Method and System for Audio Compression and Distribution

The present invention relates generally to communications, and more specifically, to a method and system for compressing high quality audio content for transfer over computer, telephony and wireless telephony networks, and subsequently decompressing it on a device with minimal processing power.

Background of the Invention

Over the last two decades, tremendous advances have been made in the availability and capability of communication networks and devices. Hard-wired telephone systems have evolved to include wireless telephone and pager networks based on satellite, cellular, wireless local loop, line of sight and other wireless technologies. Data communication networks such as the Internet, Wide Area Networks (WANs) and Local Area Networks (LANs) have also become widespread, and support many different devices including personal and laptop computers, personal digital assistants (PDAs) and television set top boxes. There are also devices and networks which operate in both telephony and data environments, combining technologies of various types.

These telephony and data networks are generally converging to a model in which data are transferred in multiple packets of digital symbols. These data packets may travel independently of one another both in terms of the routings they take, and the time they take to travel from the originator to the destination. This model has proven to be very successful and is the basis, for example, of the protocols used in the Internet.

An area of particular interest is the communication of audio content between devices and over networks. Communicating high quality audio content requires a great deal of bandwidth on the network that is transferring the content between devices. Music on a compact disk (CD), for example, is stored in a format which contains 44,100 samples per second with 16 bits per sample, for each of two channels. A communication channel would therefore have to carry about 1.4 Mbits per second, to transfer a CD formatted music file in real time. Thus, it is desirable to compress audio data which is to be stored or transmitted. Techniques are known for compressing this audio content to minimize the demands on the network resources, but these techniques place computational demands on the devices compressing or decompressing the content, and also reduce the quality of the audio content.
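The bandwidth figure quoted above follows directly from the CD format parameters. A minimal worked calculation, using only the numbers given in the text (the variable names are illustrative):

```python
# Uncompressed CD audio bit rate, from the figures quoted in the text.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2

bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
megabits_per_second = bits_per_second / 1_000_000

print(bits_per_second)                # 1411200
print(round(megabits_per_second, 2))  # 1.41
```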

The development of voice coding technology has been pursued for many years in the area of telephony. One such technique is PCM (Pulse Code Modulation). PCM converts analogue voice signals into digital form by sampling the analogue signal 8000 times per second and converting each sample into a numeric code. PCM is a "waveform" codec (coder/decoder) technique, that is, it is a compression technique which exploits the redundant characteristics of the waveform itself. PCM simply interprets each signal sample as an individual voltage or current pulse at a particular amplitude. This amplitude is binary encoded, and the binary data transmitted or manipulated as required.

PCM is an uncompressed digital representation of an audio signal, so it requires a great deal of memory to store, and bandwidth to transmit. Related techniques have been developed to improve the compression ratio of PCM, including DPCM (differential pulse code modulation) and ADPCM (adaptive differential pulse code modulation). These techniques reduce the bandwidth required to carry an audio signal, but do so at a reduction in audio quality.

There are also "parametric" or "vocoding" techniques such as MP-MLQ (multi-pulse, multilevel quantization) and ACELP (algebraic code-excited linear prediction) coding which make assumptions about the human voice so they only have to transmit parametric data. While this requires less bandwidth, it also produces mechanical sounding voices, and is poor at reproducing non-voice audio signals such as music. Hence, these coding techniques are undesirable for music quality audio applications.

Higher quality compression systems have been used to carry music over the Internet, such as MP3 and WAV. These and similar techniques generally use a software player which exists as an independent software application on the receiver's personal computer, or as a plugin to an Internet browser. Either way, the software must be compatible with the end user's operating system and software platform, and the format of the audio file being downloaded. Thus, the end user must obtain the correct software and configure it appropriately for his computer, making periodic upgrades as required. These downloading and configuration tasks can be slow and frustrating, which presents a barrier to access even if the software is available to the end user at no cost.

As there is no single format which has emerged as an industry standard, an end user must perform this exercise for multiple plugins and software applications.

Clearly, it is impractical to expect end users to maintain a large number of such formats on their computer. Therefore, a Web site that includes such audio content could not be universally accessible.

These higher quality compression techniques require software on the end user's computer because they use processing intensive decompression algorithms.

The two most typical algorithms are based on the fast Fourier transform (FFT) and the discrete cosine transform (DCT). Both of these techniques require a great deal of trigonometric processing to be performed on the end user's computer.

The FFT, for example, is based on the principle that any periodic function of time x(t) can be resolved into an equivalent infinite summation of sine waves and cosine waves with frequencies that start at 0 and increase in integer multiples of a base frequency f_0 = 1/T, where T is the period of x(t). The expansion may be presented as follows:

\[ x(t) = a_0 + \sum_{n=1}^{\infty} \left[ a_n \cos(2\pi n f_0 t) + b_n \sin(2\pi n f_0 t) \right] \]

The DCT is similar to the FFT in that it models signals as an infinite series of trigonometric waves; hence, it also requires intensive trigonometric processing to effect decompression.

The specific details of the FFT and DCT are not important for this discussion; it is sufficient to note that decompressing FFT or DCT compressed data requires a great deal of trigonometric processing. As well, the quality of audio reproduction using FFT rises as n increases, but the number of operations that must be performed rises with n as well (even the most efficient routines still require on the order of n log n operations). This explains why FFT and DCT based systems must employ a dedicated software application on the end user's computer or other device: this is the only way they can process the data quickly enough to provide real-time audio.

Techniques are known for improving the efficiency of the trigonometric processing, but these techniques do not improve the efficiency a great deal, and generally come at the expense of reproductive quality.

There is therefore a need for an audio compression and decompression system which provides high quality audio reproduction, without placing excessive processing demands on the end user's computer.

Summary of the Invention

It is therefore an object of the invention to provide a method and system which obviates or mitigates at least one of the disadvantages described above.

One aspect of the invention is broadly defined as a method of compressing music quality audio comprising the steps of: digitally sampling an audio signal; generating a linearly predicted value of one of said digital samples, using previous samples; and calculating a residual value as the difference between a linearly predicted value and a corresponding one of said digital samples; whereby a set of residual values will define said audio signal, and may be decompressed using simple (linear) operations rather than trigonometric operations.

Another aspect of the invention is defined as a method of decompressing a compressed audio file comprising the steps of: receiving data corresponding to said compressed audio file; sorting said data into blocks; and linearly expanding said data, reproducing said audio file; whereby said decompression does not require the execution of trigonometric and non-linear functions which are demanding of system resources.

Brief Description of the Drawings

These and other features of the invention will become more apparent from the following description in which reference is made to the appended drawings in which:

Figure 1 presents a flow chart of a method of compression in a broad embodiment of the invention;
Figure 2 presents a block diagram of a system for compression, transfer and decompression of audio files in a preferred embodiment of the invention;
Figures 3A, 3B, 3C and 3D present a flow chart of a method of implementing a file generation interface in a preferred embodiment of the invention;
Figures 4A and 4B present a flow chart of a method of audio file compression in a preferred embodiment of the invention; and
Figures 5A and 5B present a flow chart of a method of audio file decompression in a preferred embodiment of the invention.

Description of the Invention

A methodology which addresses the objects outlined above is presented as a flow chart in Figure 1. This figure presents a method of compressing music quality audio in which only simple, linear operations must be performed on the end user's computer or other device. These simple operations can be executed quickly enough that an executable file can be created which can run in a Java environment.

The method of the invention is generally effected as follows:
1. first by digitally sampling an audio signal in some manner, at step 20; then
2. generating linearly predicted values of the digital samples, using previous samples, at step 22; and finally
3. calculating residual values as the difference between the linearly predicted values and their corresponding digital samples, at step 24.

In this manner, only the residual values need to be stored and transmitted to an end user when required. These residual values can be used to reconstruct the audio signal using simple, linear operations rather than trigonometric operations required by FFT and DCT methods known in the art.
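The three steps above, and the matching reconstruction, can be sketched in a few lines. This is only an illustration under stated assumptions: the fixed second-order predictor, the sign convention (sample minus prediction), and the function names `predict`, `compress` and `decompress` are hypothetical and are not taken from the disclosure, which does not give predictor coefficients at this point.

```python
def predict(history):
    """Hypothetical fixed predictor: extrapolate linearly from the last two samples."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if len(history) == 1:
        return history[-1]
    return 0


def compress(samples):
    """Return residuals: each sample minus its prediction from earlier samples."""
    residuals = []
    for i, sample in enumerate(samples):
        residuals.append(sample - predict(samples[:i]))
    return residuals


def decompress(residuals):
    """Rebuild the samples using only additions and the same linear predictor."""
    samples = []
    for residual in residuals:
        samples.append(predict(samples) + residual)
    return samples


samples = [0, 100, 195, 280, 350, 400, 425, 425]
residuals = compress(samples)
print(residuals)  # [0, 100, -5, -10, -15, -20, -25, -25]
assert decompress(residuals) == samples
```

Because the decoder runs the same predictor on the samples it has already rebuilt, the round trip is exact and involves only multiplications by constants, additions and subtractions.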

The invention is intended to be applied to audio signals which require high quality reproduction, such as music, though it could also be applied to less demanding audio signals such as human voice. The frequency range that a codec can handle is limited by the sampling rate it uses, but just because a system uses a high sampling rate does not necessarily mean that it will faithfully reproduce the audio signal. Most known codecs are optimised for human voice, and will not provide high fidelity reproduction of music.

Because the invention employs digital data and finite calculation methods, it provides high quality reproduction of the audio signal. In the preferred embodiments described hereinafter, there is additional compression of the audio data in which information is lost, but even with those additional compression techniques, the audio is still reproduced with music quality.

As noted in the Background, it is desirable to compress audio data in order to minimize the bandwidth necessary to transmit audio files. However, known methods use techniques which place a great demand on the processing resources of the decompressor, because they are attempting to model waveforms in the abstract. The method of the invention recognizes that when human voices and music are sampled quickly enough to provide accurate reproduction (music quality), there will generally not be a great deal of change between adjacent samples. Thus, it requires far less data to describe the audio signal by generating a linearly predicted value for each digital sample, based on previous samples, and storing only the deviation from this prediction (the residual value).
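The observation that adjacent samples change little can be checked numerically. The sketch below is an assumption-laden illustration: it uses a synthetic 440 Hz test tone, an arbitrary amplitude, and the simplest possible predictor (the previous sample), none of which come from the disclosure. The sine function is used here only to synthesize the test signal.

```python
import math

SAMPLE_RATE_HZ = 44_100  # CD sampling rate
TONE_HZ = 440            # arbitrary test tone (assumption)
AMPLITUDE = 10_000       # arbitrary amplitude within the 16-bit range (assumption)

# Synthesize 10 ms of a sampled tone.
samples = [
    round(AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ))
    for n in range(441)
]

# Previous-sample predictor: the residual is the change from one sample to the next.
residuals = [samples[0]] + [
    samples[n] - samples[n - 1] for n in range(1, len(samples))
]

max_sample = max(abs(s) for s in samples)
max_residual = max(abs(r) for r in residuals)
print(max_sample, max_residual)
```

On this test tone the largest residual is less than a tenth of the largest sample, so the residuals need fewer bits to store than the samples themselves.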

The invention of Figure 1 addresses several of the problems in the art. It allows high quality, compressed audio files to be communicated or downloaded with a minimal demand on the resources of the communication network. More important, it allows those audio files to be decompressed on a platform that has minimal processing power, as all the decompression functions are simply linear operations.

This is in direct contrast to known decompression transforms such as DCT and FFT, which require complex trigonometric functions to be calculated.

Because the decompression routine of the invention can be processed so quickly, it can be implemented in a Java environment. Java is a code interpreter which has almost universal support in computers and similar devices. More important, it is platform independent, in that Java code and applets may be executed on any platform.

Because it is an interpreter, Java executes much more slowly than executable machine code (which is the usual form of browser plugins and software applications).

Thus, Java cannot execute the processing intensive FFT or DCT decompression techniques fast enough to provide real time, high quality audio.

Since the decompression routine of the invention can be applied in a Java environment, this allows Web sites, for example, to provide executable applets which can be downloaded and played by an end user with a Java-enabled browser or operating system. These executable applets can also be delivered to end users in other ways such as via Email or Banner Ads.

Using executable applets, the end user does not have to obtain application software or browser plugins to listen to the content. Thus, the end user does not have to address issues of compatibility with his platform and the format of the audio content being downloaded, inconvenience of obtaining upgrades, and possibly requiring many different software packages to address different audio formats. In the preferred embodiment of the invention, decompression software is packaged with the audio content so the end user simply downloads an executable file.

The preferred embodiment of the invention described hereinafter also provides further advantages over the prior art.

Detailed Description of Preferred Embodiments of the Invention

The preferred embodiment of the invention is presented by means of the block diagram in Figure 2, and the flow charts of Figures 3 through 5. Figure 2 identifies the relevant parties in a transaction of the preferred embodiment of the invention, while the specific processing steps are presented in detail in Figures 3 through 5.

In the preferred embodiment, the invention is applied to an Internet and Web site environment. The owner of a Web site (the "purchaser") can purchase the software of the invention and use it to compress sound clips, and post them on his Web site. These compressed sound clips are packaged with an execution module and are presented on the purchaser's Web site in the form of an icon. When an end user clicks on the icon, the audio file is streamed to the end user's computer or similar device, and immediately begins to play.

Figure 2 presents an exemplary layout of an Internet communications system 30 in a preferred embodiment of the invention. Generally, the Internet 32 is described as a system of routers interconnected by an Internet backbone network, which allows two parties to communicate via whatever entities happen to be interconnected at any particular time. However, it would be known to one skilled in the art that the Internet 32 is far more complex, consisting of a vast interconnection of computers, servers, routers, computer networks and public telecommunication networks.

End users 34 may access the Internet 32 in a number of manners including modulating and demodulating data signals over a telephone line using audio frequencies, which requires a modem and connection to the Public Switched Telephone Network, which in turn connects to the Internet 32 via an Internet Service Provider 36. Another manner of connection is the use of set top boxes which modulate and demodulate data onto high frequencies which pass over existing telephone or television cable networks and are connected directly to the Internet 32 via Hi-Speed Internet Service Providers. Generally, these high frequency signals are transmitted outside the frequencies of existing services passing over these telephone or television cable networks.

An end user 34 may also obtain access to the Internet 32, using a digital cellular telephone, pager, or personal digital assistant.

Internet Service Providers (ISPs) 36 or Internet Access Providers (IAPs), are companies that provide access to the Internet. ISPs 36 are considered by some to be distinguished from IAPs in that they also provide content and services to their subscribers, but in the context of this disclosure the distinction is irrelevant. For a monthly fee, ISPs 36 generally provide end users with the necessary software, user name, password and physical access. Equipped with a telephone line modem or set top box, one can then log on to the Internet 32 and browse the World Wide Web, and send and receive e-mail.

Web servers 38, 40 are computers which provide text, graphic or multimedia content, or software applications, to other parties over the Internet 32. In the discussion of the invention which follows hereinafter, the purchaser 42 will obtain software from the compressor/decompressor software server 38, and the purchaser's Web site 40 will provide audio files and other content to the end users 34. The interactions between these parties will become clear from the discussion which follows.

Of course, the invention may be applied to almost any communication network known in the art, and may be applied to a system of several different networks working together. Such networks could include: wireless networks such as cellular telephone networks, the public switched telephone network, cable television networks, the Internet, ATM networks, frame relay networks, local area networks (LANs) and wide area networks (WANs).

File Generation Interface

In the preferred embodiment of the invention, the owner of a Web site simply obtains a copy of the encoding software electronically, and uses it to generate executable audio applets. These audio applets can then be included with any Web page, linked to it using a graphic icon. End users 34 who visit these Web pages execute the audio applets by clicking on these icons.

A sales and implementation process has been created to take advantage of the immediacy and familiarity of the Internet 32. Generally, purchasers 42 will visit the compression software Web site 38, and be prompted audibly through the Web site's testing and purchase processes while at the same time demonstrating the product. This intuitive process will generate the executable applet and HTML code automatically for the purchaser 42.

Referring to the flow charts of Figures 3A, 3B, 3C and 3D, the process of obtaining the software of the invention and generating executable audio files proceeds as follows:

First, the purchaser 42 visits one of the Web sites 38 which makes the software of the invention available, at step 50 of Figure 3A. This Web page may be located using a search engine, clicking on a hypertext link from another Web page, or using some other method known in the art.

On the main Web page of the compression software Web site 38, audio is played using the method of the invention, to greet the viewer and demonstrate the product. The purchaser 42 can choose to view additional samples or demonstrations, read about the product, view help files, download a free trial version of the software, purchase a software license and code key, or learn more about partners and partner opportunities.

After reviewing documentation and purchasing details on the compression software Web site 38, which may require clicking through several Web pages, the purchaser 42 may click on a "buy" button to effect the purchase of the compression software. Currently, the compressor software is available for both PC and Mac platforms, so the purchaser 42 will be able to select which of the two options he requires. In this manner, the download of the software may be effected at step 52.

As the software is received at the purchaser's computer or similar device, it will generally be stored in a non-volatile memory such as a hard drive. Once the downloading of the software has been completed, installation will begin at step 54.

As part of the installation process, the purchaser 42 will be presented with the details of the end user licensing agreement, and will be prompted to confirm agreement with the terms and conditions, by clicking on an "agree" button. If the purchaser 42 does not agree to these terms and conditions at step 56, then the install process is terminated at step 58, and the downloaded software is deleted.

Otherwise, the installation of the compressor software is completed at step 60.

With the software now installed on his computer, the purchaser 42 may now generate executable audio files. These files may be generated simply by entering a command line at step 62, which effects steps 64 and 66, or by accessing the user interface which initiates the more comprehensive and flexible process of steps 68 through 120.

If a simple command line instruction is entered by the purchaser 42, then the software will generate three files at step 64:
1. a data file with an extension of ".22";
2. a data file with an extension of ".44"; and
3. a file of Java classes with the extension of ".zip".

The data files with extensions of ".22" and ".44" represent compressed audio content suitable for downloading at streaming rates of 20 kb and 32 kb respectively. Of course, other rates could also be used as defaults, and certainly would change as the transfer rates of computer networks increase over time. The file with the ".zip" extension is a package of all the Java classes needed to execute the audio decompression. More details on the nature of these files are included hereinafter.

Command line syntax of the Clipstream™ compressor software is as follows:

    clipstream <filename.wav> <filename.mp3> <filename.cda> /o <outputfolder> /l <listfile.txt>

For example:

    clipstream sample.wav
        This compresses sample.wav to the current folder.
    clipstream sample.wav /o c:\temp
        This compresses sample.wav to the output folder c:\temp.
    clipstream sample.wav sample.mp3 /o c:\temp
        This compresses sample.wav and sample.mp3 to the output folder c:\temp.
    clipstream /l list.txt
        This compresses the files as outlined in the list.txt file.

listfile.txt file syntax (the name can be anything as long as it is a txt file):

    <filename> <outputfolder>
    <filename> <outputfolder>
    <filename> <outputfolder>
    <filename> <outputfolder>

For example:

    c:\test1\sample.wav c:\temp1
    c:\test2\sample.mp3 c:\temp2
    c:\test3\sample.cda c:\temp3

The command line functionality enables the compressor to be implemented automatically by other applications and be integrated into larger solutions. The command line processing would then end at step 66.
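An application that drives the compressor automatically would typically need to produce or read such a list file. The following is a minimal sketch of a reader for that format; the function name `parse_list_file` is hypothetical, and the code assumes that each non-empty line holds exactly one input path and one output folder separated by whitespace, with no spaces inside either path.

```python
def parse_list_file(text):
    """Parse list-file text into (input_file, output_folder) pairs.

    Assumption: one '<filename> <outputfolder>' pair per non-empty line,
    separated by whitespace, with no spaces inside either path.
    """
    jobs = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"expected '<filename> <outputfolder>': {line!r}")
        jobs.append((parts[0], parts[1]))
    return jobs


example = "\n".join([
    r"c:\test1\sample.wav c:\temp1",
    r"c:\test2\sample.mp3 c:\temp2",
    r"c:\test3\sample.cda c:\temp3",
])
jobs = parse_list_file(example)
for input_file, output_folder in jobs:
    print(input_file, "->", output_folder)
```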

If the purchaser 42 has elected to employ the user interface, then control passes to step 68 where the compressor software is opened. The user will then be able to add or delete audio files from an editable list by selecting the files through a standard File menu selection, or by dragging and dropping them from another application, at step 70 of Figure 3B. In the preferred embodiment, audio content may be provided to the compressor in a variety of audio formats including CD Audio, .WAV, .MP3, and .AIFF. These files can be created from a number of different software applications or derived from another source. The compressor software will convert these various formats into pulse code modulated (PCM), 16 bit, mono format before they are converted to the format of the invention. Multiple files of multiple formats can be compressed in a batch process or from the command line. The purchaser 42 then selects the directory or folder where they would like the compressor to output the compressed audio files at step 72.
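The conversion to 16 bit mono PCM is not detailed in the text. One common approach, shown here purely as an assumption, is to average the left and right samples of an interleaved stereo stream; the function name `stereo_to_mono` is hypothetical.

```python
def stereo_to_mono(interleaved):
    """Average interleaved left/right 16-bit samples into one mono channel.

    Assumption: input is [L0, R0, L1, R1, ...] as signed 16-bit integers.
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("interleaved stereo data needs an even sample count")
    mono = []
    for i in range(0, len(interleaved), 2):
        left, right = interleaved[i], interleaved[i + 1]
        mono.append((left + right) // 2)  # floor average stays within 16-bit range
    return mono


stereo = [1000, 3000, -2000, -4000, 32767, 32767]
mono = stereo_to_mono(stereo)
print(mono)  # [2000, -3000, 32767]
```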

When the purchaser 42 clicks on a "compress" button at step 74, .22, .44 and .zip files are generated for the selected content at step 76. As noted above, for each audio file that the compressor processes, it will output one data file with the extension of .22 and one with the extension of .44 representing the proprietary compression files at a 20 kb streaming rate and 32 kb streaming rate respectively. In addition, each time the "compress" button is activated, a .zip file is created, which is a package of all the Java classes needed to execute the decompressor in a Web page or similar environment. This is far superior to other approaches which do not package the classes, and inherently complicate the creation and implementation process.

During the compression process, the compressor user interface provides a progress indicator, and will inform the purchaser 42 with a dialogue screen and audible indicator that the compression process is complete. The completion dialogue informs the purchaser 42 that the job is complete and queries whether they wish to visit their Web site to generate the necessary applet code to insert and enable the decompressor in their Web page, at step 78. If they decline, the compressed audio files remain on their system, and the routine is exited at step 80.

Otherwise, processing proceeds to step 82, where the purchaser's browser is opened and their Web page accessed at step 84.

The purchaser 42 is then prompted to either purchase a code key, or generate an executable applet at step 86. The code key is a unique alphanumeric string of characters which is used to bond the software to the purchaser 42. Many methods for generating such secure, random codes are known in the art.

Without the proper code key inserted into the applet code, the audio player will not be capable of being served, except on an end user's hard drive for testing purposes. Code keys are valid for a single URL (uniform resource locator, or domain name) or IP address (Internet Protocol address; the numerical equivalent to a given URL) and cannot be transferred from one to another.

The difficulty is that the purchaser 42 may have multiple servers on the same URL, each with a different IP address, or may use a proxy server. It is therefore desirable to have some flexibility in how the code keys are bound to the purchaser 42. Thus, code keys can be generated when given one of three possible input formats:

1. based on a URL, for example, www.domain.com. Typically, the URL will be the Web site on which the purchaser 42 is posting the executable applet. In this case, the code key program will try to support the URL, www.domain.com, and the IP address associated with it. If the lookup of the IP address fails, only www.domain.com will be supported;

2. based on the IP address of the purchaser's Web site, for example, 198.137.240.91. In this case, the code key program will produce a password that only supports the IP address entered, regardless of whether it is valid or not; or

3. based on both the URL, such as www.domain.com, and an IP address, such as 198.137.240.91. In this case, the code key program will try to find the IP address associated with www.domain.com from the Internet. If it finds it, it will produce a code key that will support www.domain.com and the IP address located (it will ignore the provided IP address). If the IP lookup fails, it will produce a code key that supports both the domain and the IP address provided. In the case of Intranets, this is the preferred format.
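The three input formats above can be sketched as a small decision function. This is only an illustrative sketch: the function names `resolve_ip` and `supported_hosts` and the injectable `lookup` parameter are assumptions made here, and the text does not disclose how the key string itself is derived from the supported hosts, so that step is left out.

```python
import socket
from typing import Callable, Optional, Set


def resolve_ip(hostname: str) -> Optional[str]:
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def supported_hosts(url: Optional[str] = None,
                    ip: Optional[str] = None,
                    lookup: Callable[[str], Optional[str]] = resolve_ip) -> Set[str]:
    """Hypothetical helper: which hosts a code key would be bound to."""
    if url is None and ip is None:
        raise ValueError("a URL, an IP address, or both must be supplied")
    if url is None:
        # Format 2: IP address only; it is accepted without validation.
        return {ip}
    found = lookup(url)
    if found is not None:
        # Formats 1 and 3, lookup succeeded: any supplied IP is ignored.
        return {url, found}
    if ip is not None:
        # Format 3, lookup failed: fall back to the supplied pair.
        return {url, ip}
    # Format 1, lookup failed: only the URL is supported.
    return {url}
```

Passing a stub `lookup` makes the behaviour easy to check without network access, for example `supported_hosts(url="www.domain.com", lookup=lambda h: None)` covers the failed-lookup case of format 1.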

Thus, if the purchaser 42 wishes to purchase a code key, control passes to steps 88 and 90, where some manner of electronic commerce is executed and a code key is returned. Many such systems are known in the art, and do not limit the invention. It is preferred, of course, to return the code key to the purchaser 42 in an encrypted form, possibly by use of a public/private key pair.

Whether a code key is purchased or not, control then proceeds to step 92, where an applet generation page is generated. The applet generation page is a JavaScript form which prompts the purchaser 42 to enter information about the executable file, beginning with the name of the compressed audio file. The purchaser 42 is also presented with various options, such as whether auto play is desired (auto play will download and execute an audio file automatically when an end user 34 visits a Web page), or auto loop (auto loop will continuously play an audio file once an end user 34 clicks on it). These two options are presented in Figure 3C at steps 96 and 100, and can be toggled on and off at steps 98 and 102 respectively.

The purchaser 42 is then prompted to enter in his code key at step 104. If he is simply performing a test, he can enter in the code of 00000000 (eight zeros).

Next, the process will query whether the purchaser 42 wishes to "generate code", at step 106, and if so, he will be asked to confirm that all information is correct at step 108. If information is incorrect, it can be reviewed and corrected at step 110, and control will return to step 94. Once all information is correct, applet code is generated at step 112 of Figure 3D. A sample of the applet code is as follows:

<APPLET ARCHIVE="clipstream.zip" CODE="clipstream.class" ALT="The clipstream player" NAME="clipstream" WIDTH="87" HEIGHT="45">
<param name="AudioStreamURL" value="sample">
<param name="AutoPlay" value="false">
<param name="AutoLoop" value="false">
<param name="Key" value="00000000">
<HR>
If you were using a Java-enabled browser, you would see the <a href="http://www.clipstream.com/help/Java.html">clipstream</a> player instead of this paragraph.
<HR>
</APPLET>

The purchaser 42 can now "select all" at step 114, and copy or cut the generated code at step 116, pasting it into his Web page and saving it at step 118.

All the necessary files can then be uploaded to the purchaser's Web server 40 at step 120. As outlined above, the compressor software creates all the files needed to be uploaded to the purchaser's Web server 40, including the .22, .44 and .zip files. In the simplest implementation, using the applet code above, all the files would be uploaded to the same directory or folder as the Web page which contains the applet code. To access the .22, .44 and .zip files from other directories, the applet code can be modified as such:

<APPLET CODEBASE="http://insert full path to clipstream files here" ARCHIVE="clipstream.zip" CODE="clipstream.class" ALT="The clipstream player" NAME="clipstream" WIDTH="87" HEIGHT="45">

Further modifications to the applet code can be made to address specific accessibility issues as they arise. Once uploaded, an icon representing the audio applet should be displayed on the respective Web page when next viewed.

This purchasing system provides many advantages over the systems known in the art, including the following:

it is completely automated, so the company selling the software does not require a sales person or any live interaction with the purchaser 42;

it is easily integrated into existing Web sites and processes;

it has a simple, clean interface that is easy to understand and operate, both in the compressor product and in the player applet;

the purchaser 42 does not require knowledge of Java coding to create applets for his Web site; and

it can be integrated into back-end automated processes for larger applications, using the command line functionality.

Compression

As explained above, the method of the invention results in an executable file which can decompress a compressed audio file in a Java environment. In the preferred embodiment of the invention this is done using "asymmetric" compression and decompression, that is, only simple computations need to be performed during decompression on the end user's device, while more complex processing may be performed during compression. The power and speed of the processing during compression is not limiting because it need not be performed in real time, unlike the decompression.

The preferred embodiment of the compression routine is presented in the flow chart of Figures 4A and 4B.

First, at step 130 of Figure 4A, the values of parameters used in the compression are determined. These parameters include the following:

desired block size for the bit rate of the data transfer, which is preferably 200 samples for 20 kbps and 140 samples for 32 kbps;

bit allocation for past data position, which defaults to 7 bits; and

amount of pre-emphasis desired, which defaults to 0.86. The amount of pre-emphasis desired will vary with the type of audio (music or voice).

The relevance of these parameters will become clear from the detailed discussion which follows. Of course, if variations are made to the invention, different parameters may also have to be set.

Next, the audio signal can be prepared for compression by performing the following steps:

1. converting the input audio signal to 8,000 samples per second (8 kHz) of 16-bit monaural PCM digital data, at step 132. As noted above, the invention includes drivers to convert other audio formats such as CD Audio, .WAV, .MP3, and .AIFF to this format. The audio data may also be obtained using a microphone and sound card with an appropriate digitizer;

2. dividing the PCM digital data up into blocks at step 134. As noted above, the default block size will be 200 samples; given the sampling rate of 8 kHz, a block size of 200 samples will therefore cover 25 ms; and

3. applying pre-emphasis to the data at step 136. Pre-emphasis attenuates the low frequencies and enhances the higher frequencies in the audio signal, allowing the perceptually more significant higher frequencies to be recorded with greater dynamic range. Pre-emphasis filters are known in the art, and include, for example, the 50/15 µs and CCITT J.17 types. In the preferred embodiment, a simpler first-order FIR (finite impulse response) filter is used:

a'(k) = a(k) - 0.86 * a(k-1)

where: a is the original signal; a' is the pre-emphasised signal; and 0.86 is the default weighting, which can be changed for different audio content.
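The block division and pre-emphasis steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function names and the `previous` parameter (the last sample of the preceding block, taken as 0 at the start of the signal) are assumptions.

```python
from typing import List, Sequence


def split_into_blocks(samples: Sequence[float], block_size: int = 200) -> List[Sequence[float]]:
    """Divide PCM samples into consecutive blocks; the last block may be short."""
    return [samples[i:i + block_size] for i in range(0, len(samples), block_size)]


def pre_emphasize(samples: Sequence[float], weight: float = 0.86,
                  previous: float = 0.0) -> List[float]:
    """First-order FIR pre-emphasis: a'(k) = a(k) - weight * a(k-1)."""
    out = []
    for sample in samples:
        out.append(sample - weight * previous)
        previous = sample
    return out


# At 8,000 samples per second, a 200-sample block spans 200 / 8000 = 0.025 s.
BLOCK_SECONDS = 200 / 8000
```

A constant (low-frequency) input such as `[100.0, 100.0, 100.0]` is reduced to roughly `[100.0, 14.0, 14.0]`, which shows the attenuation of low frequencies that the text describes.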

The executable compressed applet must identify the type of pre-emphasis used and the parameters with which it was applied so that the decompressor software can compensate for its effects.

Next, at step 138, an autoregression analysis is performed to predict a value for each given sample, based on the previous eight samples. The rationale is that adjacent audio samples will not vary from one another a great deal, in view of the sampling rate of 8 kHz. Thus, rather than storing a full data word for each sample, a value is predicted and only the difference from that predicted value (called the residue) will be stored.

This autoregression analysis is done using autocorrelation and Schur Recursion, but may also use other techniques such as Levinson-Durbin Recursion.

Schur Recursion is preferred for several reasons: on a sequential processor, Levinson-Durbin Recursion is about 25% faster than Schur Recursion, while on a SIMD (single-instruction multiple-data) architecture (such as that provided on an Intel MMX processor), Schur Recursion is about 40% faster than the most efficient Levinson-Durbin Recursion; Levinson-Durbin Recursion may result in coefficients with a broader range of values, requiring more bits to represent the same data; and Levinson-Durbin Recursion may be more sensitive to round-off errors, propagating them and building upon them.
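For orientation, the autocorrelation step and the Levinson-Durbin alternative mentioned above can be sketched as below. Note that this shows Levinson-Durbin, not the preferred Schur Recursion, and that the function names and the sign convention (prediction = -sum of a[j] * x[n-j]) are assumptions chosen for this sketch.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int = 8) -> Tuple[List[float], float]:
    """Solve for predictor coefficients a[0..order] (a[0] = 1) from autocorrelation r.

    Returns the coefficients and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or degenerate block: keep remaining coefficients at 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this stage
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= (1.0 - k * k)
    return a, error
```

With the autocorrelation of an ideal first-order process, `levinson_durbin([1.0, 0.5, 0.25], order=2)` yields a first coefficient of -0.5 and a second coefficient of zero, meaning the sample is predicted as half of its predecessor.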

The product of this analysis will be a set of eight coefficients for each block, which will predict what the value of each sample in the block will be. These coefficients are then quantized to 6, 6, 5, 5, 4, 4, 3 and 3 bits respectively, for a total of 36 bits altogether. The coefficients of the more recent data points receive greater weighting (hence, more bits) because there is a greater likelihood of them being representative of the signal being analysed.

To avoid sudden changes between blocks, the actual coefficients being used in the first quarter of the block are a linear interpolation of the previous block and the current block.
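The interpolation over the first quarter of a block might look like the sketch below. The text does not state the exact weighting, so the linear ramp used here (reaching the current block's coefficients at the last sample of the first quarter) is an assumption, as is the function name.

```python
from typing import List, Sequence


def interpolated_coefficients(previous: Sequence[float], current: Sequence[float],
                              n: int, block_size: int = 200) -> List[float]:
    """Coefficients to use for sample index n (0-based) within the current block."""
    quarter = block_size // 4
    if n >= quarter:
        return list(current)
    weight = (n + 1) / quarter  # assumed ramp from the previous block to the current one
    return [(1.0 - weight) * p + weight * c for p, c in zip(previous, current)]
```

For a 200-sample block, sample 24 would use an equal mix of the two coefficient sets, and every sample from index 49 onward would use the current block's coefficients unchanged.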

The quantized and interpolated coefficients are then used to generate a predicted signal. The error between the predicted signal and the actual sample is then calculated at step 142.

An additional compression step is now performed at step 144 which further reduces the data required by identifying repetition in the audio signal. This technique is referred to as "Past Data Lookup" in Figure 4A, and is effected as follows:

first, an error block is divided into four sub-blocks, for example, dividing a 200-sample block into four sub-blocks, each containing 50 samples;

a search of the past data is then performed to identify a set of data with the best correlation to a given sub-block;

when the best correlation is found, an optimal multiplication factor is calculated to scale the past data to match the current sub-block as closely as possible. Hence, there are two parameters for each sub-block: the position, which identifies the starting point of the past data block to copy, and the multiplier or scaling factor. The position is stored as 7 bits and the multiplier is quantized to 3 bits; and

finally, the algorithm subtracts the scaled past data block from the sub-block being analysed, to yield an array of residual values.
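A minimal sketch of the Past Data Lookup search follows. Several details are assumptions made for illustration: the position is interpreted as an offset counted back from the end of the stored past data, the match criterion is least squared error (which selects the least-squares optimal multiplier), and the 3-bit quantization of the multiplier is omitted because the text does not give the quantization table.

```python
from typing import List, Sequence, Tuple


def best_past_match(history: Sequence[float], sub_block: Sequence[float],
                    position_bits: int = 7) -> Tuple[int, float, List[float]]:
    """Return (position, multiplier, residual) for one sub-block."""
    n = len(sub_block)
    best_position, best_gain = 0, 0.0
    best_error = sum(s * s for s in sub_block)  # error if nothing is subtracted
    for position in range(1 << position_bits):  # 2**7 = 128 candidate positions
        start = len(history) - n - position
        if start < 0:
            break  # not enough past data for this offset
        candidate = history[start:start + n]
        energy = sum(c * c for c in candidate)
        if energy == 0.0:
            continue
        # Least-squares optimal scale factor for this candidate segment.
        gain = sum(c * s for c, s in zip(candidate, sub_block)) / energy
        error = sum((s - gain * c) ** 2 for c, s in zip(candidate, sub_block))
        if error < best_error:
            best_position, best_gain, best_error = position, gain, error
    if best_gain == 0.0:
        return best_position, best_gain, list(sub_block)
    start = len(history) - n - best_position
    chosen = history[start:start + n]
    residual = [s - best_gain * c for c, s in zip(chosen, sub_block)]
    return best_position, best_gain, residual
```

If the sub-block is an exact scaled copy of an earlier segment, the returned residual is all zeros and only the position and multiplier need to be stored.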

The residual data is then further encoded, but first the entire block of data is scaled linearly by a "quantization step". This is done by scanning the residuals in a given block and finding the maximum amplitude. The "quantization step" is defined as the maximum amplitude divided by 32. Each sample is divided by the quantization step at step 146 of Figure 4B, and is rounded to the nearest integer.
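The initial scaling described above can be sketched as follows. The function name is an assumption, and the treatment of an all-zero block is a choice made here since the text does not cover it. Python's built-in `round` resolves exact ties to the nearest even integer, which may differ from the tie rule used in the actual compressor.

```python
from typing import List, Sequence, Tuple


def quantize_block(residuals: Sequence[float], levels: int = 32) -> Tuple[float, List[int]]:
    """Return (quantization step, quantized integers) for one block of residuals."""
    peak = max((abs(r) for r in residuals), default=0.0)
    if peak == 0.0:
        return 0.0, [0] * len(residuals)  # silent block: nothing to scale
    step = peak / levels  # maximum amplitude divided by 32
    return step, [round(r / step) for r in residuals]
```

With this starting step, the largest residual in a block always maps to plus or minus 32; the compressor then enlarges the step as needed to meet the bit rate.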

Each quantized residual value is then encoded with a fixed Huffman table at step 148. The Huffman code is x consecutive 1s followed by a 0, where x is the magnitude of the quantized integer. If the value is non-zero, an extra bit is used to indicate the sign. For example, Huffman codings would map onto residual values as follows:

    0         ->  0
    100       -> +1
    101       -> -1
    1100      -> +2
    1101      -> -2
    111111101 -> -7

The total number of bits required to store the residual is checked against the bit rate (for example, 20 kbps or 32 kbps), and the algorithm will keep adjusting the quantization step until the residual fits into the desired bit rate.
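The fixed code described above is simple enough to sketch directly. The function names are assumptions, bits are represented as a text string of "0" and "1" characters for readability rather than packed into bytes, and the decoder assumes a well-formed input.

```python
from typing import Iterable, List


def encode_residual(value: int) -> str:
    """|value| ones, then a zero, then a sign bit (0 = plus, 1 = minus) if non-zero."""
    bits = "1" * abs(value) + "0"
    if value != 0:
        bits += "1" if value < 0 else "0"
    return bits


def encode_residuals(values: Iterable[int]) -> str:
    """Concatenate the codes for a sequence of quantized residuals."""
    return "".join(encode_residual(v) for v in values)


def decode_residuals(bits: str) -> List[int]:
    """Inverse of encode_residuals; needs only counting and comparisons."""
    values = []
    i = 0
    while i < len(bits):
        magnitude = 0
        while bits[i] == "1":
            magnitude += 1
            i += 1
        i += 1  # skip the terminating 0
        if magnitude == 0:
            values.append(0)
        else:
            values.append(-magnitude if bits[i] == "1" else magnitude)
            i += 1  # consume the sign bit
    return values
```

Because small magnitudes get the shortest codes, the length of `encode_residuals(...)` shrinks as the quantization step grows, which is what allows the compressor to fit a block into the target bit rate by adjusting the step.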

However, in some rare cases, if the quantization step is too big the quality of the audio degrades. To handle this problem, if the quantization step is bigger than 1/3 of the maximum amplitude, it will down-sample the residuals by a factor of two, discarding 1 out of 2 sample values. There are two evenly spaced subsequences to choose from starting with samples 1 or 2.

The quantization step is stored as a 6-bit mapped table. The sub-sequence selection is stored in 2 bits:

    00 -> no downsampling
    01 -> downsampled and start with sample 1
    10 -> downsampled and start with sample 2

Finally, the compressor prepares the next block by updating its remembered "past data", the reconstructed residual. To make sure that the compressor and decompressor work with the same residual, the compressor simulates the decompressor's steps until just before the autoregression prediction stage. That is, it uses the decompressor's grainy approximation of the past, rather than its own more accurate version.

The invention uses predetermined Huffman coding tables within the compressor and decompressor. Adaptive Huffman coding or dynamic Huffman tables could be used, but generally, adaptive Huffman would consume too many CPU cycles to decompress, and dynamic Huffman tables would increase the amount of data to download.

Since the uncompressed data uses 16 bits per sample and the compressed data will average 2 to 3 bits per sample, the compression ratio in the preferred embodiment of the invention is roughly 5:1 to 8:1.
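The figures quoted in this section can be cross-checked with some simple arithmetic. The sketch below assumes that 20 kbps means 20,000 bits per second, that the quantization step and sub-sequence selection are stored once per block, and that no other header bits are present; none of these points is stated explicitly in the text, and the names are chosen for illustration.

```python
def bits_per_block(bit_rate_bps: float, block_size: int, sample_rate_hz: int = 8000) -> float:
    """Bit budget available for one block at the given streaming rate."""
    return bit_rate_bps * block_size / sample_rate_hz


# Side information named in the text, per 200-sample block:
COEFFICIENT_BITS = 6 + 6 + 5 + 5 + 4 + 4 + 3 + 3  # eight quantized coefficients
PAST_DATA_BITS = 4 * (7 + 3)                       # position + multiplier, four sub-blocks
STEP_BITS = 6                                      # quantization step
SELECTION_BITS = 2                                 # sub-sequence selection
SIDE_BITS = COEFFICIENT_BITS + PAST_DATA_BITS + STEP_BITS + SELECTION_BITS

budget = bits_per_block(20_000, 200)               # 500 bits per 25 ms block
residual_bits_per_sample = (budget - SIDE_BITS) / 200
```

Under these assumptions a 20 kbps stream allows 500 bits per block, or 2.5 bits per sample overall, of which 84 bits are side information, leaving about 2.1 bits per sample for the coded residual. This is consistent with the 2 to 3 bit average stated above.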

The implementation of the preferred embodiment of the invention will typically require the following resources to generate compressed audio files.

On a personal computer (PC):

Java compatible browser;
text editor software for inserting applet code;
28.8 or faster Internet connection;
operating system such as Windows 95, 98, NT or above;
sound card;
Pentium 166 MHz CPU or better;
32 MB RAM; and
500 KB hard disk space.

On a Macintosh computer:

Java compatible browser;
text editor software for inserting applet code;
28.8 or faster Internet connection;
PowerPC processor;
MacOS 7.5 or later;
4 MB application RAM; and
1 MB hard disk space.

Downloading and Executing an Audio File

As noted above, there are two main methods in which the preferred embodiment of the invention would generally be implemented: as an icon on a Web page, which is executed when an end user 34 clicks upon it, or as an applet which executes when anyone visits the Web page. In either case, the method presented in the flow chart of Figures 5A and 5B would be performed.

Execution of this method generally requires that the end user 34 have the following:

a device capable of connecting to the Internet;
a Java compatible browser, email reader or other Java compatible software application;
sound generating hardware such as a computer sound card; and
sound producing hardware such as speakers or headphones.

It is also necessary to have a Java applet residing on an Internet Web page server, for the end user 34 to download.

The routine begins at step 160 of Figure 5A, after the end user 34 has initiated execution of an applet in either of the manners noted above. In this step, an auto detection function is performed, as known in the art, to determine the available bandwidth capability of the end user 34. The applet uses this information to determine which of the previously stored data files should be downloaded, providing the best quality for the bandwidth available.
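The file selection that follows bandwidth detection could be as simple as the sketch below. The function name and the 32 kbps threshold are assumptions; the text only states that the .22 file streams at 20 kbps and the .44 file at 32 kbps, and does not describe how the measurement itself is made.

```python
def choose_stream_file(base_name: str, measured_kbps: float) -> str:
    """Pick the higher-rate .44 file only when the link can sustain 32 kbps."""
    extension = ".44" if measured_kbps >= 32.0 else ".22"
    return base_name + extension
```

For the sample applet parameter value `sample`, a 28.8 kbps modem connection would therefore fetch `sample.22`.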

The compressed audio file is then streamed to the end user 34 at step 162, using the undocumented but commonly used streaming facility in Java. "Streaming" is the process of beginning to play the content before all of the content has been downloaded. In the method of the invention, content can be decompressed and played as soon as the first sub-block has been received; meanwhile, the balance of the compressed content can be downloaded and decompressed in the background.

The decompression module simply buffers data as it is received and decompressed (that is, the decompressed data is simply stored in a single buffer on the end user's computer). As the sound card on the end user's computer requires data, it simply calls the single buffer for data. When the decompression module reaches the end of the buffer, it simply re-indexes to the beginning of the buffer and continues writing data as it is received from the Internet. The decompressor stops writing to the buffer when the buffer is full, so it will not write over data that has not yet been read.
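The single wrap-around buffer described above behaves like a conventional ring buffer. The class below is an illustrative sketch with assumed names; a real player would also need synchronization between the decompressor thread and the audio callback, which is omitted here.

```python
from typing import Iterable, List


class RingBuffer:
    """Fixed-size buffer: the writer wraps to the start and never overwrites unread data."""

    def __init__(self, capacity: int) -> None:
        self._data = [0] * capacity
        self._read = 0    # index of the next sample to hand to the sound card
        self._write = 0   # index of the next free slot
        self._count = 0   # number of unread samples currently stored

    def write(self, samples: Iterable[int]) -> int:
        """Store samples until the buffer is full; return how many were accepted."""
        written = 0
        for sample in samples:
            if self._count == len(self._data):
                break  # full: stop rather than overwrite unread data
            self._data[self._write] = sample
            self._write = (self._write + 1) % len(self._data)  # re-index to the start
            self._count += 1
            written += 1
        return written

    def read(self, n: int) -> List[int]:
        """Return up to n unread samples in the order they were written."""
        out = []
        while n > 0 and self._count > 0:
            out.append(self._data[self._read])
            self._read = (self._read + 1) % len(self._data)
            self._count -= 1
            n -= 1
        return out
```

The return value of `write` tells the decompressor how many samples were accepted, so it can hold the remainder until the sound card has consumed more data.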

Decoding of the compressed audio file is performed in a manner that is complementary to the compression routine. To begin with, Huffman decoding is performed using a table that was downloaded with the executable applet, at step 164. Next, the quantized residual values are multiplied by the quantization step appropriate to the particular sub-block being expanded at step 166, and they are expanded back into 50 samples, zero-padding the gaps if the sub-block was down-sampled.
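The expansion at step 166 can be sketched as below, using the 2-bit selection codes given in the compression section (00 for no downsampling, 01 for starting with sample 1, 10 for starting with sample 2). The function name and the 0-based indexing are assumptions.

```python
from typing import List, Sequence


def expand_sub_block(quantized: Sequence[int], step: float, selection: int,
                     length: int = 50) -> List[float]:
    """Rescale quantized residuals and zero-pad the gaps left by downsampling."""
    if selection == 0b00:
        return [q * step for q in quantized]  # every sample was transmitted
    if selection not in (0b01, 0b10):
        raise ValueError("unknown sub-sequence selection code")
    offset = 0 if selection == 0b01 else 1    # "sample 1" is index 0, "sample 2" is index 1
    out = [0.0] * length
    for i, q in enumerate(quantized):
        out[offset + 2 * i] = q * step
    return out
```

Only a multiplication per received value is needed, in keeping with the asymmetric design in which the decompressor does the light work.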

Past Data Lookups are then reconstructed at step 168, by cutting out a 50-sample segment from the old estimated Past Data residual signal, scaling it by the multiplier and adding it to the residual. The resulting residual becomes part of the Past Data source.

The autoregression analysis is then reversed at step 170, which reconstructs the original 16-bit PCM signal once de-emphasis is performed at step 172. Both the autoregression and de-emphasis steps are performed by simple multiplication and addition operations.
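Both reversal steps reduce to a running multiply-and-add over past outputs, as the following sketch shows. The function names are assumptions, as is the coefficient sign convention in `synthesize` (each output is the residual plus a weighted sum of previous outputs); `de_emphasize` inverts the pre-emphasis filter a'(k) = a(k) - 0.86 * a(k-1) given in the compression section.

```python
from typing import List, Sequence


def synthesize(residual: Sequence[float], coefficients: Sequence[float]) -> List[float]:
    """Reverse the prediction: x[n] = residual[n] + sum_j coefficients[j] * x[n-1-j]."""
    out: List[float] = []
    for n, r in enumerate(residual):
        predicted = sum(c * out[n - 1 - j]
                        for j, c in enumerate(coefficients) if n - 1 - j >= 0)
        out.append(r + predicted)
    return out


def de_emphasize(samples: Sequence[float], weight: float = 0.86,
                 previous: float = 0.0) -> List[float]:
    """Invert pre-emphasis: a(k) = a'(k) + weight * a(k-1)."""
    out = []
    for sample in samples:
        previous = sample + weight * previous
        out.append(previous)
    return out
```

Feeding the pre-emphasised sequence `[100.0, 14.0, 14.0]` to `de_emphasize` recovers a constant signal of approximately 100, and neither function needs trigonometric operations.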

This de-emphasized signal is then sent to the local sound card at step 174 of Figure 5B which will convert it to an audible signal, completing the routine. The sound card will have a software driver which uses a standard interface; by transferring 8,000 sample per second data to the sound card driver, the sound card will output an audio signal without having to take any further measures to synchronize the sound with real time. The header of the audio applet also includes identification of the 8,000 sample per second format, which may be used to initialize a sound card, if necessary.

To summarize, the method of the invention provides a number of marketable advantages including the following:

compressed, music quality audio can be streamed over the existing Internet using a standard 28.8 modem and telephone line;

asymmetric compression provides sufficiently fast decompression to make a platform independent, Java implementation possible;

no plugins or players are required on the end user's device, so there are no difficulties with system compatibility, or having to purchase or upgrade software;

a regular Web server can be used; special servers are not required;

the method of the invention allows various sizes of compressed files to be created, so that the end user 34 can obtain an executable file that is optimal for the bandwidth of his network connection. Most compression systems are static, and the end user 34 has no choice of which audio file to download;

the efficient compression of the preferred embodiment provides for an extremely small download;

the purchaser 42 does not have to perform any complex programming to compress his files or to generate executable applets which may be loaded onto his Web site. All programming code is generated automatically and the purchaser 42 only performs cutting and pasting; and

abuse or piracy of audio applets is controlled by use of a code key which bonds the applet to a specific Internet IP address and/or URL.

The invention is not limited by the nature of the Web page being transmitted.

The invention could be used to insert simple banners into Web pages, or more sophisticated multimedia advertisements. As well, these advertisements could be sent along with real audio, real video, telephone over Internet, video conferencing over Internet, or other data and software applications.

While particular embodiments of the present invention have been shown and described, it is clear that changes and modifications may be made to such embodiments without departing from the true scope and spirit of the invention.

Portions of the invention could be implemented in part, in different applications. For example, the business model of the invention could be implemented using a slightly less efficient compression technique.

As well, the invention offers a compression technique that is particularly effective in view of the current trade-off between available resources and audio quality. It is expected that the bandwidth and speed of existing communication networks, and the available processing power on various computing platforms will continue to improve, thus the tradeoff curve will slowly shift. This will allow the method of the invention to be implemented with more resource intensive functions.

The method steps of the invention need not be implemented as Java code, but may be embodied in sets of executable machine code stored in a variety of formats such as object code or source code. Clearly, the executable machine code may be integrated with the code of other programs, implemented as subroutines, by external program calls or by other techniques as known in the art.

The embodiments of the invention may be executed by a computer processor or similar device programmed in the manner of method steps, or may be executed by an electronic system which is provided with means for executing these steps.

Similarly, an electronic memory medium such as computer diskettes, CD-ROMs, Random Access Memory (RAM), Read Only Memory (ROM) or similar computer software storage media known in the art, may be programmed to execute such method steps. As well, electronic signals representing these method steps may also be transmitted via a communication network.

The invention could be applied to all manner of appliances having computer or processor control and communication capability, including computers, smart terminals, laptop computers, personal digital assistants, telephones, cellular telephones, Internet-ready telephones, televisions, television set top units, and automobiles. Such implementations would be clear to one skilled in the art, and do not take away from the invention.
