DEVICE AND METHOD FOR ELIMINATING REDUNDANT STACK OPERATIONS

Title:

DEVICE AND METHOD FOR ELIMINATING REDUNDANT STACK OPERATIONS

Document Type and Number:

WIPO Patent Application WO/2001/069377

Kind Code:

A2

Abstract:

An apparatus stores state information corresponding to various stack items, indicative of the value of the stack items. The state information may be retained even if the stack items are popped from the stack. When a stack item is to be pushed to the same stack location, if the state information corresponding to the item previously in that location is still valid, the apparatus determines if the stack item being pushed is the same as the previous stack item. If so, the instruction to perform the push may be eliminated and the stack pointer may be adjusted. In one embodiment, the apparatus includes a code translator configured to translate from a stack-based instruction sequence to a register based instruction sequence. Registers are used to store one or more items at the top of the stack, and a stack transform is used to map the registers to stack locations. The stack transform may also be used to track state information for the stack items which the code translator is able to track. For example, in one embodiment, the code translator is able to track constants and variables from a local variable array. If the stack transform entry to be allocated to a push has valid state information, and that state information matches state information corresponding to the stack item being pushed, the push may be eliminated from the translated code sequence.

Inventors:

MCDONALD ROBERT G

Application Number:

PCT/US2001/007943

Publication Date:

September 20, 2001

Filing Date:

March 12, 2001

Export Citation:

Click for automatic bibliography generation Help

Assignee:

CHICORY SYSTEMS INC (US)

International Classes:

G06F9/318; (IPC1-7): G06F9/318; G06F9/30

Foreign References:

US5913047A	1999-06-15
US5859998A	1999-01-12
US5898885A	1999-04-27
EP0949564A2	1999-10-13

Attorney, Agent or Firm:

Merkel, Lawrence J. (Rose & Tayon P.C. P.O. Box 398 Austin, TX, US)
Harris, Ian Richard Et Al (Brinton Street Southampton, SO14 3EB, GB)

Download PDF:

View/Download PDF PDF Help

Claims:

WHAT IS CLAIMED IS:

1.

An apparatus comprising: a storage including at least a first entry configured to store state information indicative of a first stack item stored in a first storage location; and a control circuit coupled to said storage and to receive an indication of a first instruction, wherein said control circuit is configured to modify said state information in said first entry responsive to said first instruction having said first stack item as an operand.

2.	The apparatus as recited in claim 1 wherein said first storage location is a first register.

3.	The apparatus as recited in claim 2 wherein said first entry is further configured to store a first register index identifying said first register.

4.	The apparatus as recited in claim 1 wherein said state information includes at least one valid indication and an identifier indicative of said first stack item.

5.	The apparatus as recited in claim 4 wherein said control circuit is further configured to maintain a pointer indicative of which one of a plurality of entries, including said first entry, of said storage is storing said state information corresponding to a top of stack item.

6.	The apparatus as recited in claim 5 wherein said first stack item is a constant, and wherein said at least one valid indication includes a constant valid indication, and wherein said identifier is said constant.

7.	The apparatus as recited in claim 6 wherein said control circuit is configured to place said constant valid indication into a valid state responsive to said first instruction having said constant as a source operand and said first storage location as a destination operand.

8.	The apparatus as recited in claim 7 wherein said first instruction has said first storage location as a destination operand if said pointer indicates that said first entry is next to be allocated and said first instruction includes a push operation.

9.

The apparatus as recited in claim 6 wherein said control circuit is configured to inhibit generation of a second instruction having said constant as a source operand responsive to: (i) said pointer indicating that said first entry is next to be allocated, (ii) said constant valid indication being in a valid state, and (iii) said second instruction including a push operation to push said constant.

10.	The apparatus as recited in claim 5 wherein said first stack item is a variable stored in a memory location, and wherein said at least one valid indication includes a variable valid indication, and wherein said identifier is an indication of said memory location.

11.	The apparatus as recited in claim 10 wherein said indication of said memory location is an index into a local variable array.

12.	The apparatus as recited in claim 10 wherein said control circuit is configured to place said variable valid indication into a valid state responsive to said first instruction having said variable as a source operand and said first storage location as a destination operand.

13.	The apparatus as recited in claim 12 wherein said first instruction has said first storage location as a destination operand if said pointer indicates that said first entry is next to be allocated and said first instruction includes a push operation.

14.

The apparatus as recited in claim 10 wherein said control circuit is configured to inhibit generation of a second instruction having said variable as a source operand responsive to: (i) said pointer indicating that said first entry is next to be allocated, (ii) said variable valid indication being in a valid state, and (iii) said second instruction including a push operation to push said variable.

15.	The apparatus as recited in claim 10 wherein said control circuit is configured to place said variable valid indication into a valid state responsive to said first instruction having said first storage location as a source operand and said variable as a destination operand.

16.	The apparatus as recited in claim 4 wherein said control circuit does not change said at least one valid indication to an invalid state if said first stack item is popped.

17.	The apparatus as recited in claim 4 wherein, if said first stack item is not predictable by said control circuit, said control circuit is configured to place said at least one valid indication in an invalid state.

18.	A method comprising: receiving a first instruction having a first stack item stored in a first storage location as an operand; and recording state information indicative of said first stack item in a first entry of a storage responsive to said receiving.

19.	The method as recited in claim 18 wherein said state information includes at least one valid indication and an identifier indicative of said first stack item.

20.	The method as recited in claim 19 further comprising maintaining a pointer indicative of which one of a plurality of entries, including said first entry, of said storage is storing said state information corresponding to a top of stack item.

21.

The method as recited in claim 20 wherein said first stack item is a constant, and wherein said at least one valid indication includes a constant valid indication, and wherein said recording comprises, responsive to said first instruction having said constant as a source operand and said first storage location as a destination operand: placing said constant valid indication into a valid state; and storing said constant as said identifier.

22.

The method as recited in claim 21 further comprising: receiving a second instruction having said first stack item stored in said first storage location as an operand; and inhibiting generation of said second instruction responsive to: (i) said pointer indicating that said first entry is next to be allocated, (ii) said constant valid indication being in a valid state, and (iii) said second instruction including a push operation to push said constant.

23.

The method as recited in claim 20 wherein said first stack item is a variable stored in a memory location, and wherein said at least one valid indication includes a variable valid indication, and wherein said recording comprises, responsive to said first instruction having said variable as a source operand and said first storage location as a destination operand: placing said variable valid indication into a valid state; and storing an indication of said memory location as said identifier.

24.	The method as recited in claim 23 wherein said indication of said memory location is an index to a local variable array.

25.

The method as recited in claim 23 further comprising: receiving a second instruction having said first stack item stored in said first storage location as an operand; and inhibiting generation of said second instruction responsive to: (i) said pointer indicating that said first entry is next to be allocated, (ii) said variable valid indication being in a valid state, and (iii) said second instruction including a push operation to push said variable.

26.

The method as recited in claim 20 wherein said first stack item is a variable stored in a memory location, and wherein said at least one valid indication includes a variable valid indication, and wherein said recording comprises, responsive to said first instruction having said first storage location as a source operand and said variable as a destination operand: placing said variable valid indication into a valid state; and storing an indication of said memory location as said identifier.

Description:

TITLE: DEVICE AND METHOD FOR ELIMINATING REDUNDANT STACK OPERATIONS BACKGROUND OF THE INVENTION 1. Field of the Invention This invention is related to computer systems and, more particularly, to optimization of stack-based code sequences.

2. Description of the Related Art Java programs have become quite popular in recent years, particularly in view of the popularity of the Internet. A Java program is a program written to the Java language specification, and executes on a Java virtual machine (JVM). A JVM is an abstract computing machine which may be supported on any hardware platform (employing any suitable operating system and a native instruction set defined via any of a variety of architectures, e. g. x86, PowerPC, ARM, Alpha, etc.), and thus a Java program may execute on a variety of different hardware platforms. Thus, a Java program may be written and made available for download on the Internet, and the Java program may be executed on any hardware platform which supports the JVM The Java language is one example of a stack-based programming and storage model. In a stack-based programming and storage model, many of the operands of instructions may be read from an operand stack, typically in memory. An instruction set having a stack-based programming and storage model may enjoy some advantages. For example, since the stack is typically in memory, the number of operand storage locations may be large and not fixed. Additionally, explicit operand information may not be required in each instruction, thus shortening the instruction length. Instead, operands are taken from the top of the stack. A stack is a last-in, first- out (LIFO) storage mechanism. Each value in the stack (which is added to or deleted from the stack as a unit) is referred to herein as a"stack item". An item is added to the stack via a"push"operation in which the stack item is placed at the top of the stack (and the stack item previously at the top of the stack becomes next to the top, etc.). A stack item at the top of the stack is removed from the stack via a"pop"operation (and the stack item which was previously next to the top of the stack becomes the top of stack). As used herein, the term"operand"refers to an input value or output value of an instruction. Input values operated upon by the instruction are"source operands" and output values generated by the instruction are"destination operands".

Unfortunately, instruction code written to a stack-based programming and storage model may also have disadvantages. For example, source operands used by an instruction are typically popped from the stack during execution of the instruction. Thus, if a particular source operand is used repeatedly in a code sequence, that particular source operand must be repeatedly pushed onto the stack at the appropriate points. Code efficiency (the number of instructions used to perform a given task) may therefore suffer. A mechanism for improving the efficiency of a code sequence using a stack-based programming and storage model is therefore desired.

SUMMARY OF THE INVENTION The problems outlined above are in large part solved by an apparatus and method as described herein. The apparatus stores state information corresponding to various stack items, indicative of the value of the stack items.

The state information may be retained even if the stack items are popped from the stack. When a stack item is to be pushed to the same stack location, if the state information corresponding to the item previously in that location is still valid, the apparatus determines if the stack item being pushed is the same as the previous stack item. If so, the instruction to perform the push may be eliminated and the stack pointer may be adjusted. In this manner, the instruction sequence may be more efficient than the original code sequence.

In one embodiment, the apparatus includes a code translator configured to translate from a stack-based instruction sequence to a register based instruction sequence. Registers are used to store one or more items at the top of the stack, and a stack transform is used to map the registers to stack locations. The stack transform may also be used to track state information for the stack items which the code translator is able to track. For example, in one embodiment, the code translator is able to track constants and variables from a local variable array. If the stack transform entry to be allocated to a push has valid state information, and that state information matches state information corresponding to the stack item being pushed, the push may be eliminated from the translated code sequence.

Broadly speaking, an apparatus is contemplated. The apparatus includes a storage and a control circuit coupled to the storage. The storage includes at least a first entry configured to store state information indicative of a first stack item stored in a first storage location. The control circuit is coupled to receive an indication of a first instruction, and is configured to modify the state information in the first entry responsive to the first instruction having the first stack item as an operand.

Additionally, a method is contemplated. A first instruction having a first stack item stored in a first storage location as an operand is received. State information indicative of the first stack item is recorded in a first entry of a storage responsive to the receiving.

BRIEF DESCRIPTION OF THE DRAWINGS Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings in which: Fig. 1 is a block diagram of a system.

Fig. 1A is a block diagram of a second embodiment of a system.

Fig. 2 is a block diagram illustrating a stack-based programming and storage model and a register-based programming and storage model.

Fig. 3 is a block diagram of one embodiment of a stack transform shown in Fig. 2.

Fig. 4 is a flowchart illustrating operation of one embodiment of a code translator for push operations.

Fig. 5 is a flowchart illustrating operation of one embodiment of the code translator for pop operations.

Fig. 6 is a flowchart illustrating additional operations of one embodiment of the code translator.

Fig. 7 is a first example of an input code sequence and an output code sequence.

Fig. 8 is a block diagram illustrating a state of a stack transform at various points in the output code sequence of Fig. 7.

Fig. 9 is a second example of an input code sequence and an output code sequence.

Fig. 10 is a block diagram illustrating a state of a stack transform at various points in the output code sequence of Fig. 9.

Fig. 11 is a third example of an input code sequence and an output code sequence.

Fig. 12 is a block diagram illustrating a state of a stack transform at various points in the output code sequence of Fig. 11.

Fig. 13 is a block diagram illustrating certain details of one embodiment of the code translator.

Fig. 14 is a block diagram of a frame including a local variable array and an operand stack.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Overview Turning now to Fig. 1, a block diagram of one embodiment of a system 10 is shown. Other embodiments are possible and contemplated. The illustrated system 10 includes a central processing unit (CPU) 12, a memory controller 14, a memory 16, a Peripheral Component Interconnect (PCI) bridge 18, a PCI bus 20, a code translator 22, and an interrupt controller 24. CPU 12 is coupled to PCI bridge 18, memory controller 14, and interrupt controller 24. Memory controller 14 is further coupled to memory 16. PCI bridge 18 is further coupled to PCI bus 20. Code translator 22 is coupled to interrupt controller 24 and to PCI bus 20. In the illustrated embodiment, code translator 22 includes a source address register 26, a target address register 28, a control register 30, and a status register 32. In one embodiment, CPU 12, memory controller 14, and PCI bridge 18 may be integrated onto a single chip or into a package as illustrated by the dotted line surrounding these components in Fig. 1 (although other embodiments may provide these components separately).

Generally, CPU 12 is capable of executing instructions defined in a first instruction set (the native instruction set of system 10). The native instruction set may be any instruction set, e. g. the ARM instruction set, the PowerPC instruction set, the x86 instruction set, the Alpha instruction set, etc. Code translator 22 is provided for translating code sequences coded using a second instruction set, different from the native instruction set, to a code sequence coded using the native instruction set. Code sequences coded using the second instruction set are referred to as"non-native"code sequences, and code sequences coded using the first instruction set of CPU 12 are referred to as"native"code sequences.

When CPU 12 detects that a non-native code sequence is to be executed, CPU 12: (i) stores the source address of the non-native code sequence in source address register 26; (ii) stores the target address at which code translator 22 is to write the translated code sequence in target address register 28; and (iii) stores a command in control register 30 to activate code translator 22. In response to the command, code translator 22 reads the non- native code sequence from the source address, translates the non-native code sequence to a native code sequence, and stores the native code sequence at the target address. Code translator 22 provides a status for the translation in status register 32, and signals CPU 12 that the translation is complete. In the present embodiment, for example, code translator 22 may assert an interrupt signal to interrupt controller 24, which may subsequently interrupt CPU

12. CPU 12 may access interrupt controller 24 to determine the source of the interrupt (e. g. interrupt controller 24 may be coupled to PCI bus 20). In response to determining that the source of the interrupt is code translator 22, CPU 12 may read status register 32 to ensure that no errors occurred during the translation, and may then execute the native code sequence stored at the target address. It is noted that the source and target addresses are addresses identifying memory locations within the memory 16. As an alternative to interrupts, CPU 12 may poll status register 32 or another register (not shown) to determine the status of the translation. As an alternative to receiving the target address in target address register 28, code translator 22 may maintain a cache of translated code sequences (internally or in a designated area of memory 16) and may provide the target address of the routine upon completion of the translation.

In one embodiment, code translator 22 is configured to translate Java code sequences to the native instruction set. Thus, Java bytecodes will be used as an example of a non-native instruction set below. However, the techniques described below may be used with any non-native instruction set. Additionally, the Java instruction set uses a stack-based programming and storage model, while the native instruction set may use a register-based programming and storage model. The techniques described below for converting between the Java instruction set and the register-based native instruction set are applicable to converting any other stack-based instruction set. As used herein, the terms"register-based programming and storage model"or"register-based instruction set"refer to a model or instruction set in which operands for instructions are stored in a set of registers defined by the architecture. Each register is identified via a register index, and the register indexes are coded into the instructions to specify the operands of the instructions. Operand fetch for instructions in a register-based instruction set are then generally reads of the registers, typically implemented within the CPU. Register-based instruction sets often use explicit load/store instructions to load operands from memory locations to registers for subsequent instructions to use as operands and to store results from registers to memory locations. Furthermore, the term"instruction set"as used herein refers to a group of instructions defined by a particular architecture. Each instruction in the instruction set may be assigned an opcode which differentiates the instruction from other instructions in the instruction set, and the operands and behavior of the instruction are defined by the instruction set. Thus, Java bytecodes are instructions within the instruction set specified by the Java language specification, and the term bytecode and instruction will be used interchangeably herein when discussing Java bytecodes. Similarly, ARM instructions are instructions specified in the ARM instruction set, PowerPC instructions are instructions specified in the PowerPC instruction set, etc.

Since code translator 22 may translate from a stack-based instruction set to a register-based instruction set, code translator 22 may include hardware for translating the stack references in the stack-based instruction set to register indexes in the register-based instruction set. More particularly, a subset (or"pool") of the registers may be reserved to store stack operands. Code translator 22 may assign register indexes as stack items are pushed onto the stack, and may use those register indexes as source operands for instructions which reference the stack. After the stack items are popped from the stack, the corresponding registers may be free for use for another stack item pushed onto the stack. Thus, the register pool may store the topmost items on the stack, and memory may be used for lower items (as will be described in more detail below). The register-based instruction set may be most efficient at accessing operands in registers (since loads and stores may be needed to read the values from memory), and thus keeping items at the top of the stack in registers may enhance performance.

As an alternative to reserving the pool of registers, code translator 22 may be configured to statically or dynamically allocate registers from the register set of CPU 12 into the register pool. Code translator 22 may generate native instructions to store the registers selected for the pool to a scratchpad memory area (preserving the values in the selected registers), and then these registers may be used to store stack items. In a static embodiment, the entire pool of registers may be allocated at the beginning of a translated code sequence. In a dynamic embodiment, registers may be allocated to the pool as additional registers are needed during the translation. At the end of the translated code sequence, code translator 22 may insert instructions to restore the values of these registers by reading the values from the scratchpad memory area (after storing the items to the operand stack).

In one embodiment, code translator 22 may translate instructions beginning at the source address and up to a basic block boundary in response to being activated by CPU 12. Generally, instructions within a basic block are not branch instructions (e. g. conditional or unconditional branches, call or return instructions, etc.). Once a basic block is entered, each instruction in the basic block is executed. The basic block boundary is formed by an branch instruction. Upon translating the branch instruction, code translator 22 may update status register 32 and terminate the translation. Other embodiments may employ branch prediction and speculatively translate instructions past the basic block boundary based on the branch prediction. If the branch prediction is incorrect, the speculative translation may be discarded.

In another embodiment, code translator 22 may translate instructions through an unconditional branch, stopping translation when a conditional branch instruction or the end of the code sequence is encountered. The unconditional branch instruction may be deleted from the translated code sequence ("folded out") and the instructions at the target address of the unconditional branch instruction may be inserted in-line in the translated code (sequential to the instructions translated from the code preceding the unconditional branch instruction). Such an embodiment may further provide speculative translation beyond conditional branches, as mentioned above.

Additionally, code translator 22 may limit the total number of instructions translated before terminating translation.

The total number may be the number of source instructions (e. g. non-native instructions) or the number of target instructions (e. g. native instructions). Alternatively, the number of bytes may be limited (and may be either the number of bytes of source instructions or the number or bytes of target instructions). The limit on the number of bytes/instructions may be programmable in a configuration register of code translator 22 (not shown). In one particular implementation, for example, a maximum size of 64 or 128 bytes of translated code may be programmably selected.

Generally, CPU 12 executes native code sequences and controls other portions of the system in response to the native code sequences. More particularly, CPU 12 may execute the JVM for system 10, including the interpreter mode to handle exception conditions detected by code translator 22. The JVM executed by CPU 12 may include all of the standard features of a JVM and may further include code to activate code translator 22 when a Java code sequence is to be executed, and to jump to the translated code after code translator 22 completes the translation. Code translator 22 may insert a return instruction to the JVM at the end of each translated sequence.

CPU 12 may further execute the operating system code for system 10, as well as any native application code that may be included in system 10.

Memory controller 14 receives memory read and write operations from CPU 12 and PCI bridge 18 and performs these read and write operations to memory 16. It is noted that some of the read and write operations

presented by PCI bridge 18 may be read and write operations generated by code translator 22 (e. g. read operations from the source address and subsequent addresses and write operations to the target address and subsequent addresses). Memory 16 may comprise any suitable type of memory, including SRAM, DRAM, SDRAM, RDRAM, or any other type of memory.

PCI bridge 18 facilitates communication between PCI bus 20 and memory controller 14 or CPU 12. More particularly, source address register 26, target address register 28, control register 30, and status register 32 may be memory-mapped registers. PCI bridge 18 may detect read or write operations to the addresses to which the registers are mapped, and transmit those operations on PCI bus 20 to code translator 22. As mentioned above, PCI bridge 18 may also detect read and write operations from code translator 22 to memory 16 on PCI bus 20 and may transmit those operations to memory controller 14.

Interrupt controller 24 generally receives interrupt signals from code translator 22 and other devices within system 10 (not shown), and prioritizes the interrupts received. If one or more interrupts have been signalled, interrupt controller 24 may assert the interrupt signal to CPU 12. CPU 12 may then access interrupt controller 24 to determine the source of the highest priority pending interrupt, and may service that interrupt.

It is noted that, while the PCI bus is used as an exemplary peripheral bus in the embodiment of Fig. 1, any other bus may be used. For example, the Universal Serial Bus (USB), IEEE 1394 bus, the Industry Standard Architecture (ISA) or Enhanced ISA (EISA) bus, the Personal Computer Memory Card International Association (PCMCIA) bus, etc. may be used. Still further, the Advanced RISC Machines (ARM) Advanced Microcontroller Bus Architecture (AMBA) bus, including the Advanced High-Performance (AHB) and/or Advanced System Bus (ASB) may be used, as may the Handspring Interconnect specified by Handspring, Inc. (Mountain View, CA). Still further, code translator 22 may be connected to memory 16 using a Unified Memory Architecture connection (e. g. see Fig. 1A). The bus used for the Unified Memory Architecture connection may be any of the above buses In other alternatives, code translator 22 may be directly connected to CPU 12 or memory 16, or may be integrated into CPU 12, memory controller 14, or PCI bridge 18.

In other embodiments, interrupt controller 24 may be deleted and code translator 22 may assert an interrupt signal directly to CPU 12. Still further, other embodiments may employ semaphores in memory for communication between CPU 12 and code translator 22. Code translator 22 may provide one or more registers that may be polled for CPU 12 to determine the status of a translation, or CPU 12 may poll status register 32 to determine the status. Any technique for communicating code sequences to be translated and completion of the translation may be used.

Turning now to Fig. 2, a diagram illustrating a stack-based programming and storage model (e. g. the Java programming and storage model) and a register-based programming and storage model (e. g. the programming and storage model of CPU 12) when executing translated code sequences is shown. In the stack-based programming and storage model, a stack 60 is stored in memory 16. A top of stack (TOS) pointer is maintained by the JVM which identifies the memory location storing the stack item which is at the top of the stack. The TOS pointer may be more succinctly referred to as the stack pointer, and is stored in a register 62. Register 62 may be one of the registers in the register set of CPU 12, for a JVM implemented as native code operating on CPU 12. As items are pushed onto the stack, the items are stored into memory locations contiguous to the memory location indicated by the stack pointer, and the stack pointer is updated to indicate the new top of stack. As items are popped from the

stack, the stack pointer is updated to indicate the new top of stack (e. g. if item S (0) is popped in Fig. 2, the stack pointer is updated to indicate S (1)).

The stack 60 is represented in the register-based based programming and storage model (after the corresponding code is translated by code translator 22) by a register pool 64, a stack transform 66, a top of stack transform (TOST) pointer 67, a stack 68, and a memory top of stack (MTOS) pointer 70. As Fig. 2 illustrates, a portion of the top of stack 60 are stored in registers within register pool 64 (which is a subset of the registers included in CPU 12). Accordingly, access to the operands at the top of the stack may be efficient in a register- based programming and storage model, since these operands are stored in registers. Operands further down the stack are stored in stack 68, with the MTOS pointer 70 indicating the top of the stack 68 (e. g. item S (4) in this example, where items S (0) through S (3) are stored in registers). It is noted that MTOS pointer 70 may be stored in another register within the register set, outside of the register pool 64. More particularly, MTOS pointer 70 may be stored in register 62, and may be the stack pointer prior to entry into the translated code, in one embodiment. In such an embodiment, updates to the stack pointer register stored in register 62 may be deferred until the translated code sequence is terminated. As items are pushed onto the stack, registers from register pool 64 are allocated to store the items. As items are popped from the stack, the registers storing those items become free for allocation during a subsequent push.

Code translator 22 manages the register pool 64 and assigns register indexes for the operands of instructions in the translated Java code sequences based on which registers are storing the top of stack operands.

Code translator 22 maintains stack transform 66, which maps the stack locations of stack 60 to the register indexes of the registers in register pool 64 assigned to those stack locations. For example, register 72 in Fig. 2 is storing the top stack item S (0). Stack transform 66 thus provides the register index identifying register 72 for instructions needing the top of stack value as an operand. Stack transform 66 may include multiple entries, each capable of identifying a stack item. TOST pointer 67 may indicate the entry corresponding to the top of stack item, and other entries below the entry corresponding to the top of stack item may identify other stack items in the order they were pushed. Entries above the entry corresponding to the top of stack item are available for use in identifying additional pushes, and may be used to store state information regarding former stack items, as described in more detail below.

Stack transform 66 may additionally store state information indicative of a stack item stored in the corresponding register, for cases in which code translator 22 is able to determine information about the stack item.

Generally speaking, state information may be any information which is indicative of the value of the stack item corresponding to the entry. For example, if a constant is pushed onto the stack, the code translator 22 may be able to determine the value of the constant. The constant may be specified as an immediate field of the instruction.

Additionally, the constant may be implicit in the opcode (e. g. the icons0, iconst 1, and similar instructions of the Java instruction set). In another example from the Java instruction set, a variable from a local variable array may be pushed onto the stack (or a stack item may be popped into the variable). The variable is identified via an index into the local variable array. The index may be an immediate field of the instruction, or may be implicit in the opcode (such as the iload0, iloadl, and other similar instructions). Thus, the index of the variable may be determinable by code translator 22. In other embodiments, the address of the variable may be determinable and may be stored as state information for the variable.

By tracking state information for stack items, code translator 22 may be able to eliminate instructions from the translated code sequence if the corresponding instructions in the source code sequence are provided to push previously popped stack items back onto the stack. Since the stack item generally still exists in the stack location after being popped from the stack, the push overwrites the stack location with the same value. In the case of translation of the code sequence with code translator 22, the stack item being pushed is still stored in the register previously assigned to that stack item, if the register has not been reused for another stack item. Code translator 22 may adjust the stack pointer but otherwise eliminate the instruction from the translated code sequence. Thus, inefficiencies in the source instruction sequence which are encountered due to the stack-based programming and storage model may be eliminated from the translated code sequence, and thus the translated code sequence may be more efficient than the source code sequence. While the elimination of inefficiencies during the translation of a code sequence is described as an example of the technique, the technique may also be used to eliminate inefficiencies during execution of a stack-based code sequence as well.

Code translator 22 may also handle the overflow and underflow of register pool 64. If a push is detected in the code sequence and all the registers in register pool 64 are storing stack items (overflow), one or more registers in the register pool may be freed by pushing the values in those registers onto stack 68. More particularly, the registers storing the stack items farthest from the top of the stack may be freed in this fashion. Code translator 22 may automatically generate the store instructions (to be executed by CPU 12) to push the values onto stack 68 and free the registers (storing these instructions at the target address along with the instructions representing the translated Java code sequence), or may generate an interrupt to CPU 12 and have an interrupt service routine provide the instructions to push the values to memory. Alternatively, code translator 22 may monitor the number of free registers available and, when the number is less than a threshold value, free some of the registers by pushing their contents to stack 68. Similarly, if the registers in register pool 64 are storing no stack items (underflow), code translator 22 may generate instructions to load the values from the top of stack 68 into the registers (or may use an interrupt to CPU 12 similar to the above description).

Eliminating Redundant Operations Turning now to Fig. 3, a block diagram of an exemplary embodiment of stack transform 66 is shown.

Other embodiments are possible and contemplated. In the embodiment of Fig. 3, stack transform 66 is a storage which includes multiple entries 80A-80N. Each entry 80A-80N includes two valid indications (a constant valid indication, labeled CV and a variable valid indication labeled W), an identifier field, and a register index field.

For example, entry 80A includes a constant valid indication 82, a variable valid indication 84, an identifier field 86, and a register index field 88. TOST pointer 67 indicates which entry 80A-80N corresponds to the current top of stack item. In the example of Fig. 3, entry 80D is indicated as corresponding the top of stack item. Thus, entry 80E may correspond to the next to top of stack item, entry 80F may correspond to the third from top of stack item, etc., through entry 80N. Alternatively, a bottom of stack transform (BOST) pointer may be employed to indicate the entry corresponding to the bottom of the stack transform and entries"between"the TOST pointer and the BOST pointer may correspond to stack items. Entries 80A, 80B, and 80C may not be storing stack items currently on the stack. However, the state information recorded in entries 80A-80C is still retained and may be used to eliminate a subsequent push of the same stack item back onto the stack at the same stack location.

For the embodiment of Fig. 3, the constant valid indication, the variable valid indication, and the identifier field may be an example of state information stored for each stack item. If code translator 22 determines that a particular stack item is a constant, code translator 22 may place the constant valid indication in a valid state. If code translator 22 determines that a particular stack item is a variable, code translator 22 may place the variable valid indication in the valid state. More particularly in embodiments in which the source instruction set is the Java instruction set, code translator 22 may determine that a stack item is a variable if the variable is stored in a local variable array within the current frame. Generally, the valid indications may take any suitable form. For example, each of the constant valid indication and the variable valid indication may be a bit. The valid state may be the bit being set, and the invalid state may be the bit being clear. Alternatively, the valid state may be the bit being clear, and the invalid state may be the bit being set. Any other encoding for a valid indication (including multiple bit encodings) may be used. For the remainder of this disclosure, the constant valid indication may be referred to as a constant valid bit and the variable valid indication may be referred to as a variable valid bit, with the valid state of each being set and the invalid state being clear. However, it is understood that any form of valid indication may instead be used.

In addition to setting a valid bit if the stack item is determined to be either constant or variable, code translator 22 updates the identifier field of the entry with information identifying the stack item. If the stack item is a constant, the identifier field is updated with the value of the constant. If the stack item is a variable, the identifier field is updated with the index of the variable within the local variable array. As used herein, the term"index", when used to refer to the identifier of a variable, refers to the value identifying the variable within a predefined block of memory (e. g. the local variable array). The address of the variable may be generated via a base address of the block of memory and the index. The identifier field may be any suitable size. However, in one embodiment, the identifier field may be 8 bits (one byte).

In the embodiment shown, each entry 80A-80N also includes a register index field which stores the register index of the register assigned to store the stack item. While the present embodiment includes the register index field, other embodiments may track register indexes using a separate structure, if desired. It is noted that the storage comprising stack transform 66 may be registers, a RAM array, or any other suitable storage.

Turning next to Fig. 4, a flowchart illustrating operation of one embodiment of code translator 22 in response to a push operation in the code sequence being translated is shown. Other embodiments are possible and contemplated. While the operations shown in Fig. 4 are illustrated in a particular order for ease of understanding, any order which provides similar operation may be used. Furthermore, various operations may be performed in parallel in the combinatorial logic circuitry which may implement the operations. Any circuitry producing the equivalent operation to that shown in Fig. 4 is contemplated. It is further noted that the operations illustrated in Fig.

4 show operation of code translator 22 in response to a push. Code translator 22 may handle multiple stack pushes and pops in parallel. Each push may be handled in the fashion illustrated in Fig. 4. Information indicating modification of the top of stack by prior pushes and pops being handled in parallel may be used to locate the appropriate entries in stack transform 66. Still further, the operations shown in Fig. 4 are related to maintaining state information for each stack item. Code translator 22 may perform additional operations in response to the instructions from embodiment to embodiment, as desired.

The discussion below refers to checking and modifying state information in a stack transform entry. The

stack transform entry referred to is the entry of stack transform 66 allocated for the stack item being pushed. Thus, the stack transform entry is the stack transform entry next to be allocated (prior to processing the push), as indicated by the TOST pointer. In the case of processing multiple pushes in parallel, the next-to-be-allocated stack transform entry is the stack transform entry as indicated by the TOST pointer modified to reflect the effects of prior pushes and pops being handled in parallel.

Code translator 22 determines if the push is predictable (decision block 90). In other words, code translator 22 determines if state information corresponding to the stack item being pushed is determinable by code translator 22. In one embodiment, for example, code translator 22 collects state information for constants and variables from the local variable array. Other types of pushes (e. g. results of instructions using stack operands) may not be predictable and thus state information may not be collected. If the push is not predictable, code translator 22 may clear the valid bits in the stack transform entry (operation 92). Additionally, the native instruction (s) corresponding to the push operation may be generated.

If the push is predictable (decision block 90), code translator 22 determines if the stack item being pushed is a constant (decision block 94). If the stack item is a constant, code translator 22 considers the current state of the constant valid bit in the stack transform entry. If the constant valid bit is set (decision block 96), code translator 22 compares the constant recorded in the identifier field of the stack transform entry to the constant being pushed (decision block 98). If the constants match, then the push is redundant and may be eliminated from the translated instruction sequence. In this case, code translator 22 may inhibit generation of native instruction (s) to perform the push (operation 100). Additionally, the TOST pointer may be updated to indicate that the stack transform entry now corresponds to the top of stack item. Furthermore, the architected stack pointer may be updated. In an embodiment which accumulates stack pointer updates and updates the stack pointer register at the end of the translated code sequence, the push may be noted in the accumulated stack pointer update.

On the other hand, if the constant valid bit in the stack transform entry is not set or the constants do not match, code translator 22 may generate native instruction (s) to perform the push (e. g. copying the constant into the register index assigned to the stack item-operation 102). Additionally, code translator 22 may set the identifier field in the stack transform entry to reflect the constant being pushed and may set the constant valid bit. Code translator 22 adjusts the TOST pointer to indicate the stack transform entry. Additionally, code translator 22 may adjust the architected stack pointer, similar to the above description.

Returning to decision block 94, if the stack item being pushed is predictable but is not a constant, the stack item is a variable from the local variable array in the present embodiment. In this case, code translator 22 considers the current state of the variable valid bit in the stack transform entry. If the variable valid bit is set (decision block 104), code translator 22 compares the index recorded in the identifier field of the stack transform entry to the index of the variable being pushed (decision block 106). If the indexes match, then the push is redundant and may be eliminated from the translated instruction sequence. In this case, code translator 22 may inhibit generation of native instruction (s) to perform the push (operation 108). Additionally, the TOST pointer may be updated to indicate that the stack transform entry now corresponds to the top of stack item. Furthermore, the architected stack pointer may be updated (similar to the above description).

On the other hand, if the variable valid bit in the stack transform entry is not set or the indexes do not match, code translator 22 may generate native instruction (s) to perform the push (e. g. loading the variable into the

register index assigned to the stack item-operation 110). Additionally, code translator 22 may set the identifier field in the stack transform entry to reflect the index of the variable being pushed and may set the variable valid bit.

Code translator 22 adjusts the TOST pointer to indicate the stack transform entry. Additionally, code translator 22 may adjust the architected stack pointer, similar to the above description.

It is noted that, in the present embodiment, the constant valid bit and variable valid bit are mutually exclusive. Accordingly, if code translator 22 sets the variable valid bit in a stack transform entry, code translator 22 may also ensure that the constant valid bit is clear. Similarly, if code translator 22 sets the constant valid bit in a stack transform entry, code translator 22 may also ensure that the variable valid bit is clear. It is further noted that, in one implementation, blocks 96-102 and blocks 104-110 may be performed in parallel in the combinatorial logic circuitry within code translator 22 and the value for the identifier field may be muxed between the constant and the index.

Turning now to Fig. 5, a flowchart illustrating operation of one embodiment of code translator 22 in response to a pop operation in the code sequence being translated is shown. Other embodiments are possible and contemplated. While the operations shown in Fig. 5 are illustrated in a particular order for ease of understanding, any order which provides similar operation may be used. Furthermore, various operations may be performed in parallel in the combinatorial logic circuitry which may implement the operations. Any circuitry producing the equivalent operation to that shown in Fig. 5 is contemplated. It is further noted that the operations illustrated in Fig.

5 show operation of code translator 22 in response to a pop. Code translator 22 may handle multiple stack pushes and pops in parallel. Each pop may be handled in the fashion illustrated in Fig. 5. Information indicating modification of the top of stack by prior pushes and pops being handled in parallel may be used to locate the appropriate entries in stack transform 66. Still further, the operations shown in Fig. 5 are related to maintaining state information for each stack item. Code translator 22 may perform additional operations in response to the instructions from embodiment to embodiment, as desired.

Code translator 22 determines if the pop operation is storing the stack item being popped to a local variable (decision block 112). If the pop operation is not storing the stack item being popped to a local variable, the state information for the stack transform entry corresponding to the popped stack item is unmodified. On the other hand, if the pop operation is storing the stack item to a local variable, code translator 22 may record the index of the local variable in the identifier field of the stack transform entry corresponding to the popped stack item and may set the corresponding variable valid bit (operation 114). Additionally, the TOST pointer may be updated to pop the stack transform entry from the stack transform (operation 116) and the architected stack pointer may be similarly updated. It is noted that popping the stack transform entry from the stack transform is performed without modifying the state information in the stack transform entry (other than as specified in operation 114). In this manner, the state information is retained for detecting subsequent pushes which push the same stack item back into the same location on the stack.

By capturing the popping of a stack item into a local variable in the state information, code translator 22 may be able to eliminate a subsequent push of the same local variable back onto the stack. It is noted that, while Fig. 5 illustrates setting the variable valid bit and recording the index for any pop, it is contemplated that one embodiment may not modify the state information if the constant valid bit in the stack transform entry is already set. Thus, in a case in which a constant is stored to multiple local variables (e. g. the exemplary code sequence

shown in Fig. 7), the constant is recorded and reused while the variable is not. Other embodiments may not inhibit modifying the state information for this case.

Turning now to Fig. 6, a flowchart illustrating operation of one embodiment of code translator 22 for maintaining the state information when other stack items are being pushed, popped, or otherwise modified is shown. Other embodiments are possible and contemplated. While the operations shown in Fig. 6 are illustrated in a particular order for ease of understanding, any order which provides similar operation may be used. Furthermore, various operations may be performed in parallel in the combinatorial logic circuitry which may implement the operations. Any circuitry producing the equivalent operation to that shown in Fig. 6 is contemplated.

Code translator 22 determines if translation is being terminated (decision block 118). As described above with respect to Fig. 1, code translator 22 may terminate translation for a variety of causes, depending upon the embodiment. If the translation is being terminated, code translator 22 clears the constant and variable valid bits in stack transform 66 (operation 120). Code translator 22 may clear the valid bits because the next code sequence to be translated may have no correlation to the current code sequence. Furthermore, the stack may be modified prior to translating another code sequence, and thus the information recorded in stack transform 66 may be incorrect.

Additionally, valid stack items still stored in registers may be stored to the memory stack during the termination, as mentioned above.

Code translator 22 additionally monitors the code sequence being translated for stores to local variables whose indexes are recorded in the state information of any entry. If the local variable is modified, then the previous value stored in the register and represented by state information including the local variable index may now be invalid (i. e. no longer represents the value stored in the local variable). Accordingly, if a store to a local variable index recorded in state information of a stack transform entry is detected (decision block 122), then the corresponding variable valid bit in that stack transform entry is cleared (operation 124).

Additionally, code translator 22 in the present embodiment may detect if the register index is reused for another stack item and corresponding stack transform entry (decision block 126). If the register index is reused, then the contents of the register are changed and thus the instruction to push a value back onto the stack is performed in this case. A register index may get reused, for example, if the register contents are spilled to the stack to free register indexes for additional pushes of stack items. If a register index is reused, code translator 22 clears the valid bits in the corresponding stack transform entry (operation 128).

It is noted that reference numerals 118-120,122-124, and 126-128 each represent independent operations which may be performed in parallel and/or independently and/or in any order.

Figs. 7-12 comprise exemplary source code sequence, the resulting translated code sequences (including the instructions eliminated via the above described mechanisms), and illustrations of several stack transform entries at various points during the code sequence. The resulting translated code sequences are also referred to below as target code sequences. The source instruction sequences in these examples use mnemonics from the Java instruction set, although any stack-based instruction set may be used. The target instruction sequences are shown using generic mnemonics assuming a load/store register-based instruction set. Any target instruction set may be used as well. In the examples, four stack transform entries are shown with the value of the constant valid bit (CV), variable valid bit (W), identifier field (Id), and register index (RI) for each entry. The TOST pointer is illustrated by the arrow to the left of each transform block, pointing at the top of stack transform entry.

Turning now to Figs. 7 and 8, a first exemplary source code sequence 130 including instructions 130A- 130H, a first exemplary target code sequence 132 including instructions 132A-132E (resulting from translation of code sequence 130 by code translator 22), and stack transform contents 134A-134C are shown. Code sequence 130 includes a push of a constant zero (icons0) prior to each of four stores to local variables (astore). The number next to each astore instruction is the variable index of the local variable updated for that instruction. Thus, source code sequence 130 could be used to initialize each of the local variables at indexes 1-4 with a value of zero. Since the store to each local variable pops the value stored from the stack, the constant zero is reloaded onto the stack before each store instruction is performed.

In the target code sequence 132, instruction 132A corresponds to instruction 130A. A load immediate (li) instruction is provided which loads an immediate field (the constant zero) into the register assigned to the stack item pushed by instruction 130A. The register selected for this example is register 1 (rl). Instruction 132B is a store instruction storing the contents of rl to an address formed from rbase (a register storing the base address of the local variable array, which may be specified to code translator 22) and a displacement of 1 (the variable index).

Similarly, instructions 132C-132E each store the contents of rl to the address formed from rbase and the other variable indexes (2,3, and 4 respectively). Thus, each of local variable indexes 1-4 receive the value 0, and target code sequence 132 accomplishes the same function as source code sequence 130. However, since code translator 22 detects that the same constant is being pushed for each of instructions 130A, 130C, 130E, and 130G, code translator 22 eliminates instructions 130C, 130E, and 130G from the target code sequence 132. Instead, as indicated by the bracketed phrases in the target code sequence, code translator 22 updates the TOST pointer (and the architected stack pointer) to reflect the effects of the eliminated pushes but does not generate instructions to store the same value into rl.

The stack transform contents 134A are the stack transform contents after processing the first icons 0 instruction 130A. Thus, the stack transform entry indicated as the top of stack entry has its constant valid (CV) bit set and the identifier field is set to 0 (the constant pushed by instruction 130A). The register index field (RI) is set to 1 to indicate rl. The stack transform contents 134B are the stack transform contents after processing instruction 130B. Since instruction 130B pops the top of stack item, the TOST pointer indicates the next entry down from the entry corresponding to the constant. However, the state information remains valid in the popped entry (CV is set, the identifier field still stores 0). When processing instruction 130C, code translator 22 detects that the stack transform entry to be allocated has its constant valid bit set and the constant is zero (the same as instruction 130C).

Therefore, code translator 22 inhibits generation of an instruction corresponding to instruction 130C in translated code sequence 132, but updates the TOST pointer. The stack transform contents after processing instruction 130C are illustrated as reference numeral 134C. Processing of instruction 130D yields stack transform contents similar to reference numeral 134B, and processing of instruction 130E yields stack transform contents similar to reference numeral 130C, etc.

Turning now to Figs. 9 and 10, a second exemplary source code sequence 136 including instructions 136A-136G, a second exemplary target code sequence 138 including instructions 138A-138F, and stack transform contents 140A-140F are shown. Source code sequence 136 includes pushes of two local variables at indexes 8 and 1 (instructions 136A and 136B, respectively) followed by an add of the two local variables (instruction 136C) and a store of the addition result to the local variable at index 2 (instruction 136D). Source code sequence 136 further

includes pushes of two local variables at indexes 9 and 1 (instructions 136E and 136F, respectively) followed by an add of the two local variables (instruction 136G). As will be clear in the stack transform contents, instruction 136F loads the same local variable index into the same stack location, and thus may be eliminated by code translator 22.

Accordingly, target code sequence 138 includes fewer instructions than source code sequence 136.

The stack transform contents 140A are the stack transform contents after processing instruction 136A.

Accordingly, the variable valid bit is set in the top of stack transform entry and the identifier is set to 8 (the index of the variable). Instruction 138A is generated, and the stack item is assigned to rl. The load instruction 138A forms the address from rbase (as described above with respect to Fig. 7) and the displacement 8 (the variable index).

Similarly, the stack transform contents 140B illustrate the stack transform contents after processing of instruction 136B, including a set variable valid bit for the stack item pushed by instruction 136B and an identifier of 1. The stack item is assigned to r2. Instruction 138B is generated, and forms the address of the local variable from rbase and the index 1.

The stack transform contents 140C are the stack transform contents after processing of iadd instruction 136C. The iadd instruction 136C pops two stack items, adds them, and pushes the result. The corresponding add instruction 138C thus adds the contents of registers rl and r2 and stores the result in r3 (the register assigned to the stack item pushed by the iadd instruction 136C). It is noted that the iadd instruction may alternatively have been assigned register rl. However, for clarity in the examples, the register index is changed each time the corresponding value is changed, even if the value is stored in the same stack storage location as a preceding value.

Since two pops and one push are performed, the TOST pointer moves down one entry between stack transform contents 140B and 140C. The variable valid bit for the top of stack transform entry is cleared since the result of the iadd instruction 136C is unpredictable by code translator 22.

The stack transform contents 140D are the stack transform contents after processing of istore instruction 136D. The store instruction pops the stack, and thus the TOST stack pointer moves down another entry with respect to stack contents 140C. Additionally, since the store is to a local variable index, the variable valid bit of the popped stack transform entry (reference numeral 200) is set and the identifier is set to the index 2. The corresponding store instruction is shown as instruction 138D.

The stack transform contents 140E are the stack transform contents after processing of instruction 136E.

A push is performed, so the TOST pointer has moved up an entry. Additionally, the variable valid bit and index in the top of stack transform entry reflect local variable index 9. The resulting load instruction 138E is included in code sequence 138.

When code translator 22 processes instruction 136F, code translator 22 detects that the stack transform entry to be allocated to the instruction (reference numeral 202) has the variable valid bit set. The index from the identifier field is 1, which matches the index of instruction 136F. Accordingly, code translator 22 inhibits generation of an instruction in target code sequence 138 corresponding to instruction 136F. The TOST pointer is incremented, resulting in stack transform contents 140F. The translation of iadd instruction 136G to add instruction 138F proceeds using stack transform contents 140F.

Turning now to Figs. 11 and 12, a third exemplary source code sequence 142 including instructions 142A- 142G, a third exemplary target code sequence 144 including instructions 144A-144F, and stack transform contents 146A-146F are shown. Code sequence 142 includes pushes of two local variables at indexes 1 and 2 (instructions

142A and 142B, respectively) followed by an add of the two local variables (instruction 142C) and a store of the addition result to the local variable at index 3 (instruction 142D). After the store, code sequence 142 repushes the local variable from index 3 (which was popped by the store) (instruction 142E). The local variable from index 4 is pushed (instruction 142F) and the two local variables are added (instruction 142G). Instruction 142E loads the same local variable index into the same stack location as was popped via the store of the stack location to the local variable, and thus may be eliminated by code translator 22. Accordingly, target code sequence 144 includes fewer instructions than source code sequence 142.

The stack transform contents 146A are the stack transform contents after processing instruction 142A.

Accordingly, the variable valid bit is set in the top of stack transform entry and the identifier is set to 1 (the index of the variable). Instruction 144A is generated, and the stack item is assigned to rl. The load instruction 144A forms the address from rbase (as described above with respect to Fig. 7) and the displacement 1 (the variable index).

Similarly, the stack transform contents 146B illustrate the stack transform contents after processing of instruction 142B, including a set variable valid bit for the stack item pushed by instruction 142B and an index of 2. The stack item is assigned to r2. Instruction 144B is generated, and forms the address of the local variable from rbase and the displacement 2.

The stack transform contents 146C are the stack transform contents after processing of iadd instruction 142C. The iadd instruction 142C pops two stack items, adds them, and pushes the result. The corresponding add instruction 144C thus adds the contents of registers rl and r2 and stores the result in r3 (the register assigned to the stack item pushed by the iadd instruction 142C). It is noted that the iadd instruction may alternatively have been assigned register rl. However, for clarity in the examples, the register index is changed each time the corresponding value is changed, even if the value is stored in the same stack storage location as a preceding value.

Since two pops and one push are performed, the TOST pointer moves down one entry between stack transform contents 146B and 146C. The variable valid bit for the top of stack transform entry is cleared since the result of the iadd instruction 142C is unpredictable by code translator 22.

When code translator 22 processes instruction 142D, code translator 22 detects that the top of stack item is being stored to a local variable. Accordingly, code translator 22 sets the variable valid bit in the popped stack transform entry (reference numeral 204) and records the index of the variable to which the stack item is stored (the index is 3 in this example). Stack transform contents 146D illustrate the resulting stack transform contents.

Instruction 144D is generated corresponding to instruction 142D.

When code translator 22 processes instruction 142E, code translator 22 detects that the stack transform entry to be allocated (reference numeral 204) has its variable valid bit set and the index in the identifier field is 3.

Since instruction 142E pushes the local variable at index 3, code translator 22 inhibits generation of a target instruction in translated code sequence 144 corresponding to instruction 142E and increments the TOST pointer.

Stack transform contents 146E illustrate the resulting stack transform contents.

The stack transform contents 146F are the stack transform contents after processing instruction 142F.

Accordingly, the variable valid bit is set in the top of stack transform entry and the identifier is set to 4 (the index of the variable). Instruction 144E is generated, and the stack item is assigned to r4. The iadd instruction 142G is translated using the stack transform contents 146F, resulting in add instruction 144F.

It is noted that, while the discussion of the examples if Figs. 7-12 treated each instruction one at a time,

instructions may be processed by code translator 22 in parallel. Additionally, while each source instruction translated to one target instruction in the exemplary code sequences shown in Figs. 7-12, other instructions may translate to more than one target instruction.

Exemplary Implementation of Code Translator Turning next to Fig. 13, a block diagram of one embodiment of code translator 22 is shown. Other embodiments are possible and contemplated. In the embodiment of Fig. 13, code translator 22 includes a PCI interface unit 150, a fetch unit 152, a translate unit 154, a write unit 156, source address register 26 (within fetch unit 152), target address register 28 (within write unit 156), control register 30, and status register 32. PCI interface unit 150 is coupled to fetch unit 152, translate unit 154, write unit 156, control register 30, status register 32, source address register 26, target address register 28, PCI bus 20, and an interrupt line 158. Fetch unit 152 is further coupled to translate unit 154 and control register 30. Translate unit 154 is further coupled to status register 32 and write unit 156. Translate unit 154 includes stack transform 66, TOST pointer 67, and control circuit 155.

Generally speaking, in response to a"go"command written into control register 30, fetch unit 152 is configured to begin fetching source instructions. Fetch unit 152 may initiate fetching from the source address, and may receive fetch addresses from translate unit 154. Additionally, fetch unit 152 may be configured to prefetch addresses ahead of translate unit 154 requests, if desired. Translate unit 154 may translate the source instructions into target instructions. Translate unit 154 translates the stack operand references of the source instructions into register indexes for the target instructions, recording which register indexes correspond to which stack items in stack transform 66. The target instructions, with register operand assignments, are then provided to write unit 156.

Write unit 156 provides the target instructions to PCI interface unit 150 along with a target address (initially the address in target address register 156 and subsequently incremented as instructions are stored out). PCI interface unit 150 writes the translated instructions to memory 16 via PCI bus 20. Once the translation of the code sequence is stopped (e. g. an exception condition or basic block boundary is detected, the maximum translation size is reached, etc.), translate unit 154 updates status register 32 with the status of the translation. Additionally, translate unit 154 may generate a return instruction to the JVM. Responsive to the status being updated (and subsequent to completing the write commands from write unit 156), PCI interface unit 150 may assert the interrupt signal on interrupt line 158 to interrupt CPU 12 for execution of the translated code sequence.

Additionally, control circuit 155 may be configured to track state information for various stack items, as described above. For example, control circuit 155 may include circuitry performing the operations illustrated in Figs. 4-6. While control circuit 155, stack transform 66, and TOST pointer 67 are illustrated within translate unit 154 in the embodiment of Fig. 13, these elements may be located in any suitable part of code translator 22.

Furthermore, other embodiments may have a different structure, but may still include control circuit 155, stack transform 66, and TOST pointer 67 at an appropriate location therein.

For the description of portions of one embodiment of code translator 22 provided with respect to Fig. 13, the terms"source instructions"and"target instructions"will be used to refer to instructions fetched by code translator 22 and generated by code translator 22, respectively. For a system embodiment similar to system 10 shown in Fig. 1, source instructions may be non-native instructions (e. g. Java bytecodes), and target instructions may be native instructions for CPU 12.

As mentioned above with respect to Fig. 1, while PCI interface unit 150 is shown in the present embodiment, other embodiments may use any suitable external interface. For example, the Universal Serial Bus (USB), IEEE 1394 bus, the Industry Standard Architecture (ISA) or Enhanced ISA (EISA) bus, the Personal Computer Memory Card International Association (PCMCIA) bus, etc. may be used. Still further, the Advanced RISC Machines (ARM) Advanced Microcontroller Bus Architecture (AMBA) bus, including the Advanced High- Performance (AHB) and/or Advanced System Bus (ASB) may be used, as may the Handspring Interconnect specified by Handspring, Inc. (Mountain View, CA).

Turning next to Fig. 14, a block diagram of one embodiment of a frame 170 is shown according to one definition of the Java architecture. Frame 170 includes an operand stack 172 and a local variable array 174.

Operand stack 172 stores the stack operands used by instructions, and thus corresponds to stack 60 shown in Fig. 2.

Local variable array 174 is the local variable array described above, into which an index is used to locate a particular desired local variable. Frame 170 is associated with a particular method, and is allocated by the JVM when the method is invoked and destroyed thereafter.

Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

Previous Patent: METHOD AND APPARATUS FOR PROCESSOR CODE OPTIMIZATION USING CODE COMPRESSION

Next Patent: METHOD AND APPARATUS FOR ENHANCING THE PERFORMANCE OF A PIPELINED DATA PROCESSOR