reversible computation, entropy, heat dissipation, reversible algorithms, finite automata, computer architecture, retractile cascade, sorting
We describe a reversible Instruction Set Architecture using recently developed reversible logic design techniques. Such an architecture has the dual advantage of being able to run backwards and of being, in theory, implementable so as to dissipate less than log 2 kT joules per bit operation. We analyze several basic control structures and algorithms on the architecture, showing that, for example, a sorting algorithm need only dissipate O(n log n) bits even though it makes O(n2) comparisons.
The purpose of this paper is two-fold. First, we prove, by demonstration, that it is possible to design a processor in the electroid logic that executes a near-conventional instruction set in a reversible manner, requiring physical bit erasure only where implied by the logical operations; and secondly, we prove, again by demonstration, that it is possible to implement reversible programs on such an architecture, logically erasing bits only in the amount implied by the input-output function of the program.
The only problem with such a technique is that the inputs must remain asserted until after the output clocks have been retracted. A circuit that performs a logic function several layers deep must have a sequence of rising clocks, and then the reverse sequence of falling ones. Such a circuit is called a retractile cascade.
A retractile cascade adder is seen in Fig. 1. First the two input numbers A[0-n] and B[0-n] are asserted. Then clock LL1 is asserted; this energizes the first stage of each full adder, consisting of an XOR and an AND. Then clock CC is asserted, energizing the carry chain; finally clock LL2 is asserted, producing the sum.
If the sum is needed more than momentarily, and the adder or its inputs need to be vacated, it will be necessary to save the sum using the clock labelled LATCH. The sum would then presumably be moved using conservative techniques or erased dissipatively. However, it is interesting and important to note that as long as the inputs A and B are available, the sum can be erased non-dissipatively. Fig. 1 shows the clock sequence which accomplishes this. This technique can be used with any retractile cascade to erase as well as to produce the value given the inputs.
Retractile cascades can be used for virtually all the medium-scale components
of a standard computer architecture. [Hall 92]
shows a PLA, a barrel shifter, register set, and so forth, implemented
as retractile cascades. The register set (with a correction from the original,)
appears in figure 2.
![]() |
![]() |
Sources of irreversibility in conventional instruction sets, (as distinguished from actual architectures), are largely of two kinds. First, setting, zeroing, copying over data in registers or memory, implicitly erases the previous contents of the register or memory. Secondly, transfer of control, i.e. branch instructions, cause the implicit erasure of the previous program counter.
In theory, a branch or jump does not cause any information loss if there is no other predecessor instruction to its target. It is only the coalescing of control flow paths that causes logically necessary bit erasure, and then only enough bits to distinguish between the incoming paths. More to the point, if there is sufficient information in the program state to distinguish the paths, no bits have been logically erased at all.
The major burden of reversible instruction set design, then, is allowing the programmer to remove redundant but physically represented information which would otherwise have to be erased dissipatively. This is accomplished by a process we call "uncopying", an instruction- level equivalent to the retraction stage of a retractile logic operation.
For example, consider that register A contains the sum of registers B and C, as if from executing
ADD A, B, CWe can reset A to 0 (the default "empty" value) by
UNADD A, B, CThe UNADD operation is performed by exactly the same retractile adder which was used for the ADD, (Fig 1.) but using the second, or retraction, timing on the latch clock.
For abstract logical reversibility, ERASE could push the information onto a stack (or tape). With this proviso, the definition of the rest of the instruction set is such that the program could be run in reverse, inverting its function, reading from the tape or stack at each point an ERASE was to be executed in reverse.
For a more practical physical interpretation, we conjecture that computers will eventually operate at such efficiencies that bit erasure is the primary heat-generating mechanism (see Drexler[92]). Such a computer might have a thermostatically controlled clock, and frequency of bit erasure might be one of the more critical bounds on operating speed.
Simply for perspicuity and keeping irrelevant mechanism to a minimum, we will adopt the convention that a physical dissipation means is employed that dissipates only when a 1 bit is being erased (e.g., if 1 is represented by postitve voltage and 0 by ground voltage, and dissipation is by connecting the charge to ground through a resistance.) This convention allows us to execute ERASE on word-length data but count as erased only those bits which might have been 1.
For example, in the PDP-10-style effective address calculation, the operand of an ADD instruction might be the contents of a memory location whose address is the sum of the contents of a register and an immediate value from the instruction. In a RISC this would be done as several separate instructions, leaving the various intermediate values in registers to be cleaned up explicitly. An architecture with hardware effective address calculation, however, can perform the whole operation as a retractile cascade. This has the practical drawback that the cycle takes twice as long and cannot be pipelined, but it is clearly superior for pedagogical purposes and as a proof of concept.
ADD reg, regIn the standard form, the destination is the first argument; in the memory (ADDM) form, the last. ADD is a "reversible" instruction, meaning that the two-argument form is legal, and involves adding the second operand destructively to the first. Other reversible instructions are SUB, XOR, and EXCH.
ADD reg, reg, reg
ADD reg, addr(reg)
ADD reg, reg, addr(reg)
ADD reg, const
ADD reg, reg, const
ADD reg, const(reg)
ADD reg, const, addr(reg)
ADDM reg, reg
ADDM reg, reg, reg
ADDM reg, addr(reg)
ADDM reg, reg, addr(reg)
ADDM reg, const, addr(reg)
Instructions that are not reversible (bitwise AND and OR) may be used only in three-operand form. All three-operand instructions, reversible or not, require that the destination contain 0 before they are executed. This includes the 3-operand form of "reversible" instructions like ADD.
All three-operand instructions have inverses UNADD, UNOR, and so forth.
It is not allowed to use the same registers in an index position and as a destination in the same instruction.
There are separate instructions MOVE, EXCH, and COPY. MOVE and EXCH actually do the same thing (as does MOVEM); use of MOVE is a convention that indicates that one of the items is known to be 0 and the other is being "moved". The destination of COPY must be 0 beforehand; both source and destination contain the same value afterward. UN COPY is abbreviated UNC. All these instructions exist in the 2-address forms only.
There is a proliferation of jump instructions. The general form is (mod)J(comp) reg, opnd, addr. (mod) is A (annotate), I (increment), D (decrement), or empty. (comp) is GT, GE, EQ, LE, LT, NE, AB, OB, NAB, NOB, or empty. opnd is a register, an indexed address, or a constant.
The destination address of a jump must be the address of a Come-From instruction. The format of a Come-From is the same as a jump with CF instead of J. The function of the Come-From instruction is to uncopy the program counter left over from the Jump. To that end the "destination" of a CF must be the address of the corresponding J.
The conditionals attached to corresponding jump and come-froms are not necessarily, or even usually, the same. A come-from which can be "fallen into" from preceding code must use its condition to distinguish whether this has happened or it was jumped to. This is so the old PC can be uncopied or not, as appropriate.
From the point of view of logical reversibility, this requirement is the same as saying that if the state of the program after the jump is such as to distinguish whether the jump was made or not, no bit has logically been erased. In a more practically oriented interpretation, the come-from has the appropriate conditions to operate as an un-jump when executed in reverse.
If the control flow confluence does logically destroy bits, it is necessary to handle them explicitly by way of an annotated jump. For example, each jump to a come-from could have an entry in a table, and should indicate which one it was:
j3: aj w,2,confAfter the ACF, of course, the value in W remains, to be ERASEd or stored or whatever else the programmer wishes; but it has fewer bits than the raw PC would have had.
...
conf: acf tab(w)
...
tab: j1
j2
j3
...
And fianlly, the ERASE instruction, taking either a register or effective memory address as its single operand, has in the instruction set universe of discourse the effect of setting the operand location to 0, but with some meta-ISA-level interpretation as discussed above.
; for (i = 0; i <10, i++)
top: cfgt I, 0, bot
...
bot: ijlt I, 10, top
As long as the difference in data state which is reflected in the branch has not been obliterated by the intervening statements, the confluence can be accomplished using the same condition. However, a statement such as
if (a > 30) a = a-10;cannot be implemented reversibly; if a==25 after the statement, for example, there is no way to know whether it was 25 or 35 before.
A reversible if can be implemented
; if (a) b; else c;with the restriction that neither b nor c changes a.
top: jeq a,0,mid
;code for b
t2: j bot
mid: cf top
; code for c
bot: cfne a,0,t2
...
; on entry, the numbers are in A (memory area)
; the number of numbers is in N (register)
; inf is defined as the highest word value
; all other registers are 0copym inf,A(N) ; set sentinel value at end of list
copy I,N ; I points to bottom of sorted part
bol: cflt I,N,eol ; in the outer loop, I decreases
sub I,1
copy J,I ; J moves back up thru sorted part
move X,A(J) ; element being inserted
bil: cfgt J,I,eil ; J increases in inner loop
bx: jle X,A+1(J),ex ; compare insertee to list elements
move Y,A+1(J) ; not yet, move elements down
movem Y,A(J)
eil: ij J,bil ; end of inner loop
ex: cf bx ; only get here from bx jump
movem X,A(J) ; insert element
sub J,I ; an optimization: decrease no. bits in J
ERASE J ; < log N bits, happens N times
eol: djgt I,0,bol ; end of outer loop
uncm inf,A(N) ; clean up sentinel