ACM Communications in Computer Algebra, Vol. 45, No. 4, Issue 178, December 2011
Ways to implement computer algebra compactly
David R. Stoutemyer∗
Abstract
Computer algebra had to be implemented compactly to fit on early personal computers
and hand-held calculators. Compact implementation is still important for portable hand-held
devices. Also, compact implementation increases comprehensibility while decreasing development
and maintenance time and cost, regardless of the platform. This article describes several
ways to achieve compact implementations, including:
• Exploit evaluation followed by interpolation to avoid implementing a parser, such as in PicoMath™.
• Use contiguous storage as an expression stack to avoid garbage collection and pointer-space overhead, such as in Calculus Demon™ and TI-Math-Engine.
• Use various techniques for saving code space for linked-storage representation of expressions and functions, such as in muMath™ and Derive®.
1 Introduction
“Inside every large program is a small program struggling to emerge.”
– Hoare’s law of large programs.
This article is a written version of a PowerPoint presentation at the 2008 Applications of Computer
Algebra conference.
Fortran was my first programming language, around 1965. I was disappointed when I learned
that it could only substitute numbers into formulas, rather than also do algebraic transformations.
I was therefore delighted when I learned around 1973 that computer algebra programs existed, and I
began using Reduce and Macsyma. I wanted my engineering students to learn to use them, but these
systems tended to monopolize our time-shared university mainframe computer, quickly exhausting
my course computing budget. This was the cause of my long-standing effort to make computer
algebra available on personal computers that are free of time budgets and on robust inexpensive
hand-held calculators that can be conveniently carried for spontaneous use in all classrooms, at
home and while traveling.
At that time the available computer algebra systems barely fit on even the largest mainframes,
and RAM was so expensive that personal computers and calculators had very little address space
and often even less memory. Consequently, the most important design constraint was compactness
of the computer algebra implementation.
Traditional student mathematics problems don’t tend to entail enormous input or result expressions.
Therefore compactness of the data representation is less important. Because of the modest
∗dstout at hawaii dot edu
problem size, speed is also less important. For example, it isn’t worth implementing an asymptotically
faster algorithm if it is more complicated, and it is even less worthwhile combining several
algorithms to achieve optimal speed all the way from the smallest through largest possible problem
sizes. However, compact expression size and surprisingly fast execution can sometimes be obtained
without sacrificing program compactness.
Along the way others and I discovered several quite different ways to implement computer
algebra compactly. This article is an effort to collect in one place a description of these techniques
and some references to relevant literature.
Although one purpose of this article is to record some history, this article is organized primarily
by the type of implementation, then chronologically within each type.
Compactness is still important despite the declining cost of computer memory and the (more
gradually) increasing speed of Internet connections, because:
• Maintenance cost and the time required to develop software grow super-linearly with program
size.
• Program cache faults can be less frequent for small programs.
• Carbon footprints grow with silicon footprints, secondary-storage footprints and transmitted-file footprints. Estimates of the electricity consumption associated with the Internet vary widely, from 1% worldwide for servers and their cooling [16] to 9.4% in the USA when prorated client electricity costs are also included [33]. Many programs and data files are stored on many computers, run on many computers, are transmitted many times, and are printed many times. Therefore the benefit of program compactness grows with the number of users.
Section 2 describes the typical constraints of early micro-computers and programmable calculators.
Section 3 describes two different methods that are especially compact but suitable only for
specialized classes of expressions. Section 4 describes three different methods that are better for
general expressions.
2 Early micro-computers & programmable calculators
In 1976 Cioni and Miola [4] described implementing parts of the modular SAC-1 computer algebra
system on a PDP-11 minicomputer having 64 kilobytes of memory with two hard disks, using
overlays. Minicomputers with hard disks were too expensive for purchase by individuals, but it
was a demonstration that useful computer algebra could be done in a relatively small amount of
memory.
The first mass-market personal computers were the Apple II, Commodore Pet, Radio Shack
Trs-80, and Atari 400 and 800, which all had 64 kilobytes of address space. Up to 25% of that was
devoted to ROM that provided interactive terminal I/O, loading of programs stored on secondary
storage, and an interpreted Basic programming language that wasn’t well suited to implementing
computer algebra. However, if the Basic wasn’t exploited to help implement computer algebra,
then the associated ROM was wasted address space, and precious RAM storage would have to be
used for the entire implementation and data.
RAM was so expensive that the entry-level versions of these computers came with only 4 kilobytes
to 16 kilobytes, and most users didn’t buy more. Four kilobytes was typically only enough for about
100 lines of Basic. Secondary storage was typically an audio cassette, which was too slow and
unreliable to serve as virtual memory.
Programmable calculators were even more spartan. There was typically enough memory for
about 100 steps of an interpreted assembly-like language that provided instruction mnemonics and
some preassigned symbolic address labels such as Lbl1, Lbl2, etc. Typically the one-line display
was only about 14 characters wide and incapable of displaying most characters other than what is
necessary for floating-point numbers. Sometimes there was a slot for ROM cartridges containing
programs purchased from the manufacturer or a third party. Usually programs and data could be
stored to and loaded from secondary storage consisting of a magnetic strip or a miniature magnetic
tape cartridge.
3 Special-purpose methods
A computer algebra system typically has a driver loop that repeatedly accepts inputs interactively
entered by a user, producing and displaying a corresponding simplified result for each input. Usually
there is:
• a parser that converts the user’s human-oriented input expression into an internal tree-like
form more suitable for mathematical transformations,
• a simplifier that transforms the parsed expression into a simplified equivalent internal form,
and
• a displayer that transforms the simplified internal form into a string or a two-dimensional
form more suitable for human comprehension.
The simplifier includes arithmetic, algebra, calculus, etc.
General expressions include at least rational expressions, fractional powers, exponentials, logarithms,
trigonometric functions and their inverses, together perhaps with vectors and matrices
having such expressions as entries.
Special-purpose methods are best suited for implementing only a narrow subset of general expressions,
such as polynomials, univariate rational expressions, or Maclaurin series. However, such
a category is all that is needed for some applications, such as a web-based applet that exercises or
tests students with problems in a particular category.
3.1 Dense univariate series and polynomial algebra
It is easy to implement univariate polynomials, Maclaurin series, or Fourier series as a one-dimensional array of floating-point coefficients. Even the 1965 IBM Fortran Scientific Subroutine Package [14] had subroutines for operations such as adding, multiplying and integrating such polynomials
with floating-point coefficients. The user was responsible for writing a main program to
initialize or enter the input polynomial coefficients, then call an appropriate sequence of subroutines,
then display the result polynomial coefficients.
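In modern terms, this coefficient-array representation can be sketched in a few lines of Python; the function names are mine, not those of the SSP routines:

```python
# Dense univariate polynomials as coefficient lists, lowest degree first:
# [c0, c1, c2] represents c0 + c1*x + c2*x^2.

def poly_add(p, q):
    """Add two coefficient lists, padding the shorter with zeros."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    """Multiply by accumulating products of all coefficient pairs."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_integrate(p):
    """Antiderivative with zero constant term."""
    return [0.0] + [c / (k + 1) for k, c in enumerate(p)]

# (1 + x)^2 = 1 + 2x + x^2
print(poly_mul([1.0, 1.0], [1.0, 1.0]))   # [1.0, 2.0, 1.0]
```

Truncating the product to a fixed length turns `poly_mul` into the corresponding Maclaurin-series multiplication.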
In 1977 Henrici [9] described an analogous set of subroutines for univariate Maclaurin series on the HP-25 calculator, which had memory for only 49 steps, 8 direct-address registers, and a
4-register stack. The limited memory forced Henrici to use some ingenious indirect methods so that
each coefficient of a result could be computed independently.
In 1977 Stoutemyer [25] described an analogous set of routines for the 224-step HP-67 and HP-97 calculators, using the more traditional algorithms described by Knuth [17]. All series were of
degree 9. Table 1 lists the number of program steps and the execution times for the implemented
operations, which are independent of the particular ten input coefficients for each polynomial.
Table 1: Size and speed of the HP-67 Maclaurin-series operations
Operation Number of steps Seconds
subtraction 4 20
copying 5 10
negation 10 10
addition 14 10
substitute number 16 10
integration 16 10
multiplication 38 60
division 49 60
reversion 62 180
Knuth also gives algorithms for substituting one power series into another, raising a power series
to a real power, and computing the exponential or logarithm of a power series. However, there isn’t
enough space to fit them simultaneously with the above routines, so they would have to be overlaid
from secondary storage.
With a non-QWERTY keyboard, entering coefficients alone in response to prompts is actually more convenient than also entering intervening sub-strings of the form “...*x^...+”. Consequently there is no strong need for parsing such tightly-scoped mathematical expressions.
The technique could be extended to multivariate polynomials by using multidimensional coefficient arrays or by mapping them into one-dimensional arrays. However, such dense distributed representation is notoriously inefficient for most practical multivariate polynomial problems. Consequently, that extension would be well beyond the point of diminishing returns for this parser-less coefficient-array method.
Though narrowly scoped, the two HP calculator series programs were interesting demonstrations
that useful computer algebra could be done with floating point arithmetic and 49 or 224 steps of
assembly language.
3.2 Computer algebra by evaluation and interpolation
It is all done by smoke and mirrors.
– Anne Brooke
With only four to sixteen kilobytes of RAM installed on most early commercial personal computers,
there was no choice for this market except to use a built-in Basic for implementation even
though those versions of Basic were not an attractive implementation language because:
• The entry-level Basic typically didn’t support string variables: All variables were global
floating point with non-mnemonic names A through Z.
• Subroutines couldn’t have parameters or local variables.
• Function definitions were limited to unconditional one-line expressions.
• One-dimensional and perhaps two-dimensional arrays of floating-point numbers were the only
non-scalar data structure.
Moreover, without string variables and string-processing functions such as substring(...), it is impractical to write even a parser. Nonetheless, we can still do computer algebra by finessing it with the evaluation-interpolation homomorphism:
Basic already has a built-in parser, but its inseparable simplifier expects all of the variables in
an expression to have assigned numeric values so that the final value and all intermediate values
are numeric. In 1980 PicoMath [27] exploited this by evaluating a user’s unsimplified expression
at a set of points, then interpolating a simplified expression through those points. PicoMath then
used character string constants such as “+” and “X” together with the interpolated coefficients to
display the simplified result expression.
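The whole trick can be sketched in Python. Here `line20` stands in for the user-edited Basic statement, and generic Gaussian elimination stands in for PicoMath's precomputed sparse interpolation formulas; all names are illustrative:

```python
# The user's "line 20" expression, treated as a numeric black box:
def line20(x):
    return (x + 1.0) * (x + 1.0) - 1.0   # equivalent to x^2 + 2x

def interpolate_poly(f, degree):
    """Recover polynomial coefficients by sampling f at degree+1 points
    and solving the resulting linear system by Gaussian elimination."""
    n = degree + 1
    xs = [float(i) for i in range(n)]     # simple sample abscissas
    # Augmented Vandermonde system: rows of [x^0, ..., x^degree | f(x)].
    rows = [[x ** k for k in range(n)] + [f(x)] for x in xs]
    for col in range(n):                  # forward elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(col + 1, n):
            m = rows[r][col] / rows[col][col]
            rows[r] = [a - m * b for a, b in zip(rows[r], rows[col])]
    coeffs = [0.0] * n
    for col in reversed(range(n)):        # back substitution
        s = rows[col][n] - sum(rows[col][k] * coeffs[k]
                               for k in range(col + 1, n))
        coeffs[col] = s / rows[col][col]
    return coeffs

def display(coeffs, tol=1e-8):
    """Splice interpolated coefficients between string constants,
    suppressing coefficients that are merely rounding noise."""
    terms = [f"{c:g}*X^{k}" for k, c in enumerate(coeffs) if abs(c) > tol]
    return " + ".join(terms) if terms else "0"

print(display(interpolate_poly(line20, 3)))   # 2*X^1 + 1*X^2
```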
Programma International published PicoMath for the TRS80, Apple II, Commodore Pet and
Atari 400 and 800. Texas Instruments published a ROM version for the TI 99/4. TI also planned
to release a ROM version for their TI-59 calculator, which was discontinued just before that release.
Arthur Norman adapted PicoMath for the Acorn BBC computer and the Exidy Sorcerer, with the improvement of recovering simple exact fraction coefficients by repeated quotients, remainders and reciprocation with a tolerance. PicoMath also ran on the 1×7×17 cm Sharp PC-1211, also sold as the Radio Shack Pocket Computer, which had 2 kilobytes of RAM.
PicoMath is actually four separate programs for simplifying expressions to four different classes
of results:
• The rational program can expand and reduce over a common denominator a univariate rational expression in x, such as
$$1 + \frac{1}{x-1} - \frac{1}{x+1} + \frac{2x}{x^2-1} \;\to\; \frac{x+1}{x-1},$$
$$\frac{x^2-5x-6}{x^2-2x-15}\cdot\frac{x^2-7x+10}{x^2+5x+4} \div \frac{2x-12}{x^2+3x} \;\to\; \frac{x^2-2x}{2x+8}.$$
To limit the effect of rounding errors, the maximum degrees of the numerator and denominator of the result are limited to about half the number of significant digits in the floating-point arithmetic.
• The polynomial program can expand, then optionally integrate or differentiate with respect to x, polynomials in x, y and z such as
$$(x+1)^6 + (x+y+1)^4 + (x+y+z+1)^2 \;\to\; x^6 + 6x^5 + 16x^4 + 4x^3y + 24x^3 + 6x^2y^2 + 12x^2y + 22x^2 + 4xy^3 + 12xy^2 + 14xy + 2xz + 12x + y^4 + 4y^3 + 7y^2 + 2yz + 6y + z^2 + 2z + 3,$$
$$\int \bigl((x^3-1)(x^3+1) + y^4 + z^2\bigr)\, dx \;\to\; 0.142857x^7 + xy^4 + xz^2 - x,$$
$$\frac{d}{dx}\bigl(x^6 + xy^3 + z^2\bigr) \;\to\; 6x^5 + y^3.$$
As illustrated: in the simplified expression, integrand or differand, any term containing z is limited to degree 2, any term containing y is limited to total degree 4, and any term containing x is limited to total degree 6.
• The trigonometric program can simplify, then optionally integrate or differentiate with respect to x, many trigonometric expressions in x and y. For example,
$$\int 2\sin\frac{x+y}{2}\cdot\cos\frac{x-y}{2}\, dx \;\to\; x\sin(y) - \cos(x),$$
$$\frac{\sin(x+3\pi/2)}{\cos(x+\pi)} + \frac{\cos(x-3\pi/2)}{\sin(x+\pi/2)} \;\to\; -\tan(x) + 1,$$
$$\frac{d}{dx}\left(\frac{1+\cos(2x)}{\cos(x)} + \frac{\sin(2x)}{\sin(x)}\right) \;\to\; -4\sin(x).$$
The simplified expression, integrand or differand can be any linear combination of the expressions 1, sin x, cos x, tan x, cot x, csc x, sec x, sin x · cos x, sin(x)², sec(x)², tan x · sec x, cot x · csc x, sin y, cos y, sin x · sin y, sin x · cos y, cos x · sin y, cos x · cos y. This set of basis functions covers a surprising portion of the results in textbook trigonometric examples and exercises.
• The Fourier program transforms polynomials in sines and cosines of x and of integer multiples of x into a linear combination of sines and cosines of x and integer multiples of x, then optionally integrates or differentiates with respect to x. It thus computes the spectrum or Fourier decomposition of univariate sinusoidal expressions. For example,
$$64\sin(x)^4\cos(x)^3 \;\to\; 3\cos(x) - 3\cos(3x) - \cos(5x) + \cos(7x),$$
$$\int \bigl(-4\sin(2x)^2\cos(x) - 4\cos(x)^3 + 5\cos(x)\bigr)\, dx \;\to\; 0.2\sin(5x),$$
$$\frac{d}{dx}\bigl(-4\sin(2x)^2\cos(x) - 4\cos(x)^3 + 5\cos(x)\bigr) \;\to\; -5\sin(5x).$$
Here is how it is done:
1. All four programs begin with the statements

   10 GOTO 40
   20 A = expression to simplify, integrate or differentiate
   30 RETURN
   40 ...

2. The instruction manual tells users to modify the right side of the assignment statement on line number 20 to an expression that they want simplified to the implemented class. Except for the rational program, users are then queried whether they want to integrate, differentiate or simplify the expression on line number 20.

3. The program then executes the subroutine starting on line number 20 for different appropriately-chosen values of the independent variable or variables, depending on the program. From these samples it determines the coefficients of an interpolating expression in the class covered by the program.

4. It then evaluates the interpolant and the expression on line number 20 at a few extra points. If the magnitude of the relative difference or absolute difference exceeds tolerances at any of these points, then the user sees an error message such as, for the polynomial program:

   “Sampling suggests significant discrepancy. Perhaps line 20 is not equivalent to a polynomial in X, Y and Z, or the polynomial is of excessive degree or the expression is too sensitive to limitations of the Basic arithmetic.”

   “For comparison, LIST line 20. If desired, alter it to assign to A any expression equivalent to an expanded polynomial in X, Y and Z. The total degree of any expanded result term containing X, Y, and Z should not exceed 6, 4 and 2 respectively.”

5. If instead all of the discrepancies are within tolerances, then for all but the rational program the user is asked to enter E for expand, I for integrate or D for differentiate. (For versions of Basic that don't provide character or string variables, E, I and D are preassigned different numeric values that can be tested to determine the user's choice.)

6. The program then lists the coefficients of the interpolant expression (appropriately modified for the integration and differentiation choices) interspersed with character strings such as “x^2” or “cos(” and “+” to display the result expression. To avoid spurious terms caused by rounding errors, artificial underflow is used to replace by 0.0 any computed coefficients that have relatively small magnitude.

7. The program then lists the second paragraph of the above message beginning “For comparison ...”, then stops. The user can then modify line 20 to do another problem for the same expression class, or the user can load one of the other three programs to treat a different expression class.
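The extra-point consistency check can be sketched as follows; the tolerances and sample points here are illustrative rather than PicoMath's actual values:

```python
def poly_eval(coeffs, x):
    """Horner evaluation of [c0, c1, ...] at x."""
    r = 0.0
    for c in reversed(coeffs):
        r = r * x + c
    return r

def discrepancy_ok(f, coeffs, extra_points, rel_tol=1e-6, abs_tol=1e-9):
    """Compare the interpolant against the user's expression at extra
    sample points that were not used for the interpolation itself."""
    for x in extra_points:
        want, got = f(x), poly_eval(coeffs, x)
        if abs(want - got) > max(abs_tol, rel_tol * abs(want)):
            return False
    return True

# x^2 + 2x matches its interpolant; x^4 is outside the quadratic's class.
coeffs = [0.0, 2.0, 1.0]                 # interpolant x^2 + 2x
print(discrepancy_ok(lambda x: x * x + 2 * x, coeffs, [0.5, 2.5]))   # True
print(discrepancy_ok(lambda x: x ** 4, coeffs, [0.5, 2.5]))          # False
```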
The polynomial and trigonometric programs use explicit fixed-basis formulas for the interpolant coefficients as linear combinations of the sample values, derived for the specific sample points. These formulas were derived by solving a set of linear equations having a symbolic right side A1, A2, ..., using Macsyma. The interpolation points were selected to make the resulting linear combinations sparse and simple. For example, x, y, and z were set to combinations of 0, 1, -1 and other small-magnitude integers for the polynomial program. For the trigonometric program, x and y were set to various simple multiples of π that avoid any poles of all the basis functions. Moreover, computation of the coefficients is interleaved with display to recycle some of the 26 scalar variables, and common sub-expressions among the coefficients were exploited to reduce the program size.
The Fourier program uses an interpolant of the form
$$y = c_0 + s_1\sin x + c_1\cos x + s_2\sin(2x) + c_2\cos(2x) + \cdots + s_m\sin(mx) + c_m\cos(mx)$$
together with sample points $x_j = 2j\pi/(2m+1)$ for $j = 0, 1, \ldots, 2m$, giving corresponding values $y_j$. The general closed-form solution is the discrete Fourier transform:
$$c_0 = \frac{1}{2m+1}\sum_{j=0}^{2m} y_j, \qquad c_k = \frac{2}{2m+1}\sum_{j=0}^{2m} y_j\cos\frac{2jk\pi}{2m+1}, \qquad s_k = \frac{2}{2m+1}\sum_{j=0}^{2m} y_j\sin\frac{2jk\pi}{2m+1}.$$
Integer m is set to 8, but it could be increased because this set of basis functions is orthogonal,
making the interpolation relatively insensitive to rounding errors.
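With equally-spaced sample points, coefficient recovery is a direct summation; a Python sketch (the function and variable names are mine):

```python
import math

def fourier_coeffs(f, m):
    """Recover c0 and the c[k], s[k] (k = 1..m) from samples of f at
    x_j = 2*j*pi/(2m+1), j = 0..2m, by the closed-form DFT sums."""
    n = 2 * m + 1
    xs = [2.0 * j * math.pi / n for j in range(n)]
    ys = [f(x) for x in xs]
    c0 = sum(ys) / n
    c = [2.0 / n * sum(y * math.cos(k * x) for x, y in zip(xs, ys))
         for k in range(1, m + 1)]
    s = [2.0 / n * sum(y * math.sin(k * x) for x, y in zip(xs, ys))
         for k in range(1, m + 1)]
    return c0, c, s

# 2*sin(x)*cos(x) = sin(2x), so only the k = 2 sine coefficient survives.
c0, c, s = fourier_coeffs(lambda x: 2.0 * math.sin(x) * math.cos(x), m=8)
print(round(s[1], 6))   # s[1] is the k = 2 coefficient: 1.0
```

The orthogonality of the basis on these points is what makes each coefficient a simple weighted sum of the samples.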
The rational program uses the method of inverted differences described by Hildebrand [10]. The maximum allowable numerator and denominator degrees were determined by iteratively computing at run time a near-minimal ε > 0 such that arctan(1.0 + ε) = arctan(1.0). (Microsoft Basic tended to use double precision for arithmetic but single precision for all exponentiation and for irrational functions such as sinusoids.) A tolerance was used to determine convergence or the lack thereof within this maximum degree. To reduce the likelihood of having a sample abscissa near or on a zero of any denominator, the successive abscissas were taken as the transcendental numbers x = ln(2), ln(2) + 1, ln(2) − 1, ln(2) + 2, ln(2) − 2, . . .
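A minimal Python sketch of rational interpolation by inverted differences, with a convergence test used to stop the continued fraction early; simple integer abscissas are used here for clarity instead of the ln(2) ± k abscissas, and the names are mine:

```python
def thiele_coeffs(xs, ys, tol=1e-9):
    """Coefficients of a Thiele continued fraction via inverted
    differences.  Stops early when the remaining column has converged
    (all entries equal within tol), as deeper levels would divide by ~0."""
    d = list(ys)
    a = []
    for k in range(len(xs)):
        a.append(d[k])
        if all(abs(d[i] - d[k]) <= tol for i in range(k + 1, len(d))):
            break
        for i in range(len(d) - 1, k, -1):   # next column of differences
            d[i] = (xs[i] - xs[k]) / (d[i] - d[k])
    return a

def thiele_eval(a, xs, x):
    """Evaluate a0 + (x-x0)/(a1 + (x-x1)/(a2 + ...)) from the inside out."""
    r = a[-1]
    for k in range(len(a) - 2, -1, -1):
        r = a[k] + (x - xs[k]) / r
    return r

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 / (1.0 + x) for x in xs]       # samples of 1/(1+x)
a = thiele_coeffs(xs, ys)
print(len(a))                            # 3 levels suffice here
print(abs(thiele_eval(a, xs, 0.5) - 2.0 / 3.0) < 1e-9)   # True
```

Early termination corresponds to the convergence detection mentioned above: a degree-(0,1) rational function is captured after three levels, and continuing further would divide by a near-zero difference.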
The evaluation-interpolation technique was pioneered by Brown and Collins [1] for computing
polynomial gcds. Sparse interpolation pioneered by Zippel [32] has made it quite effective even for
sparse multivariate polynomial gcds and for many other tasks. The novelty of PicoMath was to use evaluation-interpolation to avoid the need for a parser. However, using evaluation-interpolation alone as a simplifier is suitable only for narrowly-scoped applications such as a Web-based applet that tests or exercises students within narrow expression classes.
The four PicoMath programs could easily be combined, making the program try all four models
then reject those that have unacceptable discrepancies at the extra sample points. However, even
the scope of such a combined program is too narrow to be of widespread general interest, and the
space required for the combined four programs would be comparable to the more flexible Calculus
Demon Basic program described in the next section.
4 General-purpose methods
Most of the time, most people want to treat general multivariate expressions composed without
restrictions of at least the operators +, −, ·, /, and ^, together with exponentials, logarithms,
trigonometric functions and inverse trigonometric functions. They want optional transformations
that include at least reduction over a common denominator, polynomial expansion, factoring and
various trigonometric transformations. This section discusses implementation methods that are
open ended — fully capable of accomplishing this and much more.
The scrollable history of input-result pairs in all the systems described in this section was inspired by the versions of Reduce and Macsyma that I started using in 1973.
4.1 String-based computer algebra
It is possible to do computer algebra directly on strings containing expressions in ordinary infix
notation, as humans do. There were several early compact programs for doing differentiation using
the pioneering string-processing language Snobol, which included string pattern matching. Many of
these programs required the expression to be thoroughly parenthesized to avoid the need for parsing
other than matching left and right parentheses.
There is some appeal to using the same representation internally as for I/O, but it is inefficient
to repeatedly scan strings to delimit operands and locate operators between them. Also, although
pattern matching is well suited to some operations such as differentiation, pattern matching is
cumbersome for some other simplification algorithms such as the known efficient and thorough
algorithms for integration, polynomial gcds, and factoring. Moreover, thorough exploitation of
mathematical properties such as commutativity and associativity is cumbersome in a general purpose
pattern matcher that doesn’t have that capability built in.
4.2 Contiguous stack-based computer algebra
Mathematical expressions can be represented as trees. Computer algebra transforms a tree representing
the input expression into a tree representing the corresponding result expression. Most
computer algebra systems represent these trees using blocks of memory linked by pointers. They
use either reference counters or garbage collection to recycle blocks of memory that are no longer
referenced directly or indirectly by the program.
In contrast, Polish and reverse Polish notation provide a way of packing an expression tree into a contiguous stack containing no internal pointers or parentheses. With reverse Polish, an operator is deeper than its operands, as on many HP calculators. This is particularly appropriate for numeric
than its operands, such as on many HP calculators. This is particularly appropriate for numeric
computation where the result is always a number and where in most cases there is no need to know
the operator until after the operands have been computed.
In contrast, ordinary Polish with operators shallower than their operands is more appropriate
for computer algebra where the result can be a non-numeric expression: When transforming or
combining simplified non-numeric expressions it is usually most convenient to consider the operators
of the expressions before considering their operands. For example, to implement rule-based
integration it is convenient to branch on the top-level operator of the simplified integrand. If the
top-level operator is a sum, then we can first try distributing the integration over the summands.
If the top-level operator is a product, then we can factor out factors that are independent of the
integration variable, etc.
The items in the Polish representation could be the individual characters representing digits of
numbers, characters of function names, etc. However, it is more efficient to tokenize: For example,
variable-length integers can be represented as an integer type tag on top of a length field on top
of a binary or binary-coded decimal number. As another example, built-in function names can be
more efficiently represented by unique short tags rather than with strings containing their names.
Because of the position of the tags and of any length fields at the shallower end of each subexpression,
tokenized Polish can only be traversed starting from the top-level operator, rather than
from the bottom-level operand as in reverse Polish.
In 1965 Smith [23] published an algorithm for symbolic differentiation using a Polish stack
representation.
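In the same spirit, though not Smith's actual algorithm, differentiation over a prefix (Polish) token list can be sketched in Python; the token names and list representation are illustrative:

```python
# A tokenized-Polish expression is a flat list with each operator ahead of
# (shallower than) its operands; x^2 + 3x is ['+', '^', 'x', 2, '*', 3, 'x'].
ARITY = {'+': 2, '*': 2, '^': 2}

def subexpr_end(e, i):
    """Index one past the sub-expression starting at i (prefix skip)."""
    need = 1
    while need:
        need += ARITY.get(e[i], 0) - 1
        i += 1
    return i

def diff(e, i=0):
    """Derivative w.r.t. 'x' of the sub-expression starting at i, as a new
    prefix list.  Handles only +, * and constant integer powers."""
    t = e[i]
    if t == '+':
        j = subexpr_end(e, i + 1)                 # start of second summand
        return ['+'] + diff(e, i + 1) + diff(e, j)
    if t == '*':                                  # product rule u'v + uv'
        j = subexpr_end(e, i + 1)
        u, v = e[i + 1:j], e[j:subexpr_end(e, j)]
        return ['+', '*'] + diff(e, i + 1) + v + ['*'] + u + diff(e, j)
    if t == '^':                                  # power rule n*u^(n-1)*u'
        j = subexpr_end(e, i + 1)
        u, n = e[i + 1:j], e[j]
        return ['*', '*', n, '^'] + u + [n - 1] + diff(e, i + 1)
    return [1] if t == 'x' else [0]               # leaf: variable or constant

print(diff(['+', '^', 'x', 2, '*', 3, 'x']))
# ['+', '*', '*', 2, '^', 'x', 1, 1, '+', '*', 0, 'x', '*', 3, 1]
```

Note that the operator is inspected before either operand is touched, which is exactly the property that makes ordinary Polish convenient for algebraic transformation.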
Computer algebra can be done entirely using one contiguous stack of tokenized Polish expressions
— an expression stack. In 1981 Calculus Demon was my first general purpose computer algebra
program based on this technique:
• It is written in about 800 lines of Basic, several statements per line, including about 100
lines of on-line help. It was published by Atari for their model 400 and 800 computers.
• It can represent, differentiate, modestly integrate and simplify general expressions written in
traditional infix notation, including fractional powers, exponentials, logarithms, trigonometric
and inverse trigonometric functions.
• Optional transformations include polynomial expansion together with transformation of integer
powers and products of sinusoids to linear combinations of sinusoids.
The stack of expressions is represented as an array of floats. Particular number tags represent the
variables A through Z, operators such as +, and functions such as ln. Floating-point constants are
distinguished by having another particular number tag on top of them. The tags for numbers and
variables, unary operators and functions, or binary operators or functions are each grouped together
so that mere boundary checks on tag values can determine the corresponding number of operands
or arguments.
The deepest 26 elements of the expression stack, indexed 0 through 25, are a symbol table of
indices for values of user variables A through Z. Assigned values are stored contiguously above that,
with the most recently assigned on top. The working stack where simplified results are developed
is on top of the assigned values. A symbol table entry contains 0.0 if the corresponding variable
has no assigned value. Otherwise the entry contains the index of the topmost element of the stored
value.
If an assignment is made to a variable that already has an assigned value, then the previous
value is deleted; and any assigned values above it are moved down by the size of the deletion, with
the assignment indices of the corresponding variables decreased by the size of the deletion.
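The deletion-and-compaction bookkeeping can be sketched in Python; for brevity this sketch uses a dict and (start, length) pairs instead of the fixed 26-entry table and topmost-element indices:

```python
def assign(symtab, store, var, tokens):
    """Assign tokens as var's stored value: delete any previous value,
    slide higher-stored values down, patch their indices, then append
    the new value on top of the store."""
    if var in symtab:
        start, length = symtab.pop(var)
        del store[start:start + length]           # delete the old value
        for v, (s, n) in symtab.items():          # fix indices above the hole
            if s > start:
                symtab[v] = (s - length, n)
    symtab[var] = (len(store), len(tokens))
    store.extend(tokens)

symtab, store = {}, []
assign(symtab, store, 'A', [1.0, 2.0, 3.0])
assign(symtab, store, 'B', [9.0])
assign(symtab, store, 'A', [7.0])    # re-assignment deletes A's old value
print(store)     # [9.0, 7.0]
```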
The parser is recursive descent, which is one of the most compact alternatives when there are
only a few operators. To save code space, parsing and simplification are done in the same pass:
• Numbers simplify to themselves.
• Variables simplify to themselves if they are indeterminates. Otherwise they simplify to their
stored values. There is a “re-simplify” postfix operator @ that enables intervening assignments
to indeterminates in stored values to contribute their new values.
• Otherwise, depending on the operator or function, the top one or two simplified expressions
on the expression stack are replaced with the result of applying the operator or function to
them.
Combining parsing with simplification avoids having to branch on the tag value in two separate
places. Moreover, simplification has fewer cases to consider than with separate parser and simplifier
passes because, for example:
• Subtractions and negations don’t occur in simplified internal form. They are represented using
negative numeric coefficients, which are transformed back to subtractions and negations by
the display subroutine.
• Ratios don’t occur in simplified internal form. They are represented using multiplication and
the -1 power, which are transformed back to ratios by the display routine.
• Other trigonometric functions are replaced by sines and cosines in simplified internal form.
Obvious cases such as sin(x)/ cos(x) are transformed back to tan(x) by the display subroutine.
A disadvantage of having a combined parser-simplifier is that intervening expansions, etc. might
cause a lengthy delay before encountering a syntax error near the right end of the input expression.
However, the space savings are worth this disadvantage for this platform.
With a pure stack discipline, routines using the stack, as opposed to implementing the stack,
have direct access to only the topmost element. However, for efficiency, the program and subroutines
often store into scalar Basic variables the indices of expressions and sub-expressions within the
stack and use these for faster subsequent access. Thus the data structure is more accurately called
a remember stack.
For computer algebra, most procedures are most compactly written recursively. At the expense
of increased program size, some or all of the recursion can easily be replaced by loops over terms and
over factors within procedures that merely inspect expressions to determine a property. Often the
procedure can be semi-recursive — recurring on the left sub-tree but looping on the right sub-tree
or vice versa. For example, this is convenient for a procedure that determines whether or not an
expression is free of a particular variable.
However, pushing corresponding new terms or factors while looping over successively deeper
terms or factors of a previous expression typically produces the terms or factors in reversed order.
For example, suppose we use a loop to distribute negation over two terms of a sum: The loop will
push the negative of the shallowest term, then the negative of the deeper term. Thus the negative of
the given shallower term ends up deeper than the negative of the given deeper term. Consequently,
we then need a second loop to push a reversed copy of the reversed negated terms, increasing the
program size even more over that of a purely recursive procedure.
Most versions of Basic permit recursive subroutines. Moreover, most versions also provide a
large enough return-address stack so that the expression stack is more likely to avoid overflow first.
However, the lack of subroutine parameters and local variables makes it awkward for a subroutine to remember indices into the expression stack past intervening calls on subroutines — particularly recursive calls. This handicap can be overcome by pushing such indices onto the expression stack, which makes that stack a mixture of Polish expressions and indices of deeper Polish expressions.
Operators “+” and “∗” are treated as binary rather than n-ary, with associativity and commutativity
being used to sort the operands into a canonical lexical order. Although n-ary “+” represents
lengthy sums more compactly, the program would be more complicated: Each recursive routine that
processed sums would have to invoke an auxiliary recursive routine that recurred over the operands
to build a list of result terms; then the invoking routine would have to attach a “+” operator to the
result if there was more than one term, or extract the single term from the list otherwise.
There is even less reason to make products n-ary: Expanding polynomials doesn’t generate new
variables, so the number of factors in a fully-expanded term can’t be more than 1 plus the number
of indeterminates.
One of the most important Calculus Demon subroutines is one that, given the index of an
expression or sub-expression, determines the index of the expression or sub-expression immediately
below it. The algorithm is to initialize a global “sub-expression deficit” counter to 1, then decrement
it by 1 for a number or variable, or increment the counter by 1 for a binary operator or function, or
leave the counter unchanged for a unary operator or function. For binary operators it recurs over
the first operand and loops over the second operand. The loop is exited when the counter reaches
0.
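The deficit walk can be sketched in C over an assumed array of token kinds (an illustration of the idea, not Calculus Demon's Basic code):

```c
/* Token kinds in a Polish (prefix) expression stack.  Assumed layout:
   each expression is a contiguous run of tokens, operator first. */
typedef enum { ATOM, UNARY_OP, BINARY_OP } Kind;

/* Given the index where an expression starts, return the index just
   past it -- i.e. where the expression immediately below it starts. */
int expr_below(const Kind *stack, int start) {
    int deficit = 1;                 /* one complete expression owed */
    int i = start;
    while (deficit > 0) {
        switch (stack[i]) {
        case ATOM:      deficit -= 1; break;   /* number or variable */
        case BINARY_OP: deficit += 1; break;   /* now owes two operands */
        case UNARY_OP:                break;   /* still owes one */
        }
        i++;
    }
    return i;
}
```

For example, for the prefix sequence {BINARY_OP, ATOM, UNARY_OP, ATOM} the walk from index 0 consumes all four tokens before the deficit reaches 0.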
As with assembly language, you can save code space, return-address stack space and execution
time in Basic by replacing a code sequence such as “Gosub 90: Return” with “Goto 90”: The
return from subroutine 90 then also returns from the subroutine that calls it.
Calculus Demontm is accompanied by a similar program named PolyCalctm that is specialized to
polynomials and a program named AlgiCalctm that is specialized to univariate rational expressions in
X, including an option for approximate fully-factored form. All three programs can be downloaded
free from [2].
A major goal of PicoMath, Calculus Demon, PolyCalc and AlgiCalc was to demonstrate to
calculator manufacturers the feasibility of building computer algebra into a widely marketed calculator.
As a demonstration prototype, Calculus Demon was adapted to the Basic on the HP 75C
calculator [12], but never distributed. However, Hewlett-Packard did later build computer algebra
into their HP 28C calculator [11] and some subsequent calculators.
Calculus Demon demonstrated the feasibility of using a Polish expression stack to implement
general-purpose computer algebra.
However, using floats to represent variables and type tags and using Basic as an implementation
language was very suboptimal. It is a waste of memory and computing time to devote an entire
floating-point number to representing one-letter variables, tags, integer indices etc. For this purpose,
even assembly language is easier, with its symbolic labels and constants, integer variables, and builtin
instructions for pushing data onto the return-address stack or popping it off.
The remember-stack mechanism was later implemented using the C programming language for
the TI-92 introduced in 1995 and for some subsequent TI calculators, together with Windows and
Macintosh computers [6]. Instead of floats, the units of storage on the expression stack are 8 bits on
the Motorola 68000-based calculators, but they are 32 bits on the Macintosh version of TI-NSpire
or 16 bits on the Windows version and on the ARM-based handheld version.
With its parametrized functions, local variables and integer arithmetic for implementing unlimited
precision rational arithmetic, C is a luxurious programming language compared to Basic. The
TI-89/TI-92 Plus Developer Guide [30] gives some implementation details. The computer algebra
capabilities are significantly greater than Calculus Demon’s — roughly comparable to Derive, but with
significant differences too.
Some advantages of Polish representation of expression trees are:
• No space is wasted on pointers within an expression.
• The data is contiguous, which improves speed for caches and virtual memory.
• The data is relocatable. (For this reason, and the fact that it doesn’t require parsing, it is
also more efficient than infix character strings for serialization in computer algebra systems
that use linked storage for doing transformations.)
• It isn’t necessary to implement garbage collection or reference counts.
• There are no annoying pauses for garbage collection during plotting, etc., making the technique
particularly suitable for real-time applications.
• Unlike garbage collection, the percentage of time devoted to memory reclamation doesn’t
increase as the amount of reclaimed memory decreases.
To save some program space and execution time, the program uses pointers rather than indices that
are added to the base address of an array. A pointer to the top of the expression stack is stored
in a global variable named top_estack. When data that is no longer needed is on the top of the
expression stack, that memory can be reclaimed in negligible constant time by simply assigning to
top_estack an address that was saved earlier into a local variable.
When a block of memory strictly within the expression stack is no longer needed, it can be
reclaimed by shifting the still-needed portion above it down in time Θ(n), where n is the number
of memory units being shifted. This tends to require significantly less time than was required to
create the contents of those memory locations.
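A minimal C sketch of these two reclamation operations, assuming a byte-addressed stack (the name top_estack follows the description above; the sizes and everything else are illustrative):

```c
#include <stddef.h>
#include <string.h>

/* Sketch of expression-stack reclamation with a global top pointer. */
static unsigned char estack[4096];
static unsigned char *top_estack = estack;

/* Constant-time reclamation: everything above a previously saved mark
   is discarded by restoring the saved top. */
void pop_to(unsigned char *saved_top) { top_estack = saved_top; }

/* Theta(n) reclamation of an interior block [lo, hi): shift the n
   still-needed bytes above it down.  Callers must then decrement any
   saved pointers above the deleted block by (hi - lo). */
void delete_block(unsigned char *lo, unsigned char *hi) {
    size_t n = (size_t)(top_estack - hi);
    memmove(lo, hi, n);
    top_estack -= (hi - lo);
}
```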
Pointers to any expressions of remaining interest above the deleted portion must be decremented
by the amount of shift, which requires time proportional to the number of such pointers. Such
adjustments tend to be infrequent for recursive algorithms but more frequent for looping algorithms
that push expressions onto the expression stack in an irregular sequence. Thus although there is an
initial code space savings for not having to implement garbage collection or reference counts, the
code size might grow faster per implemented feature. Moreover, storage reclamation becomes a
responsibility distributed throughout the implementation, rather than a larger initial investment
of programming time and program space followed by less subsequent concern.
Moenck [19] proposes a similar data structure, except that immediately below each binary
operator is a displacement field that, when subtracted from the address of the displacement field,
points to the deeper operand. Using displacements rather than pointer fields preserves the valuable
relocatability. At the expense of requiring more space for expressions and more code to consistently
manage the displacement fields, this technique might make the system faster by providing quicker
access to the deeper operand. However, many functions that process the first operand can be written
to return the address of the expression below that operand. Even when that is inconvenient, the
time required to traverse the shallower operand to reach the deeper operand is often significantly
less than the time required to process the shallower operand.
With a remember stack, any number of pointer variables from a running program can point
to the same expression or sub-expression in the expression stack. However, the expression stack
itself ordinarily doesn’t contain pointers to expressions or sub-expressions. In this sense there is no
sharing within the expression stack of common sub-expressions therein.
In contrast, a system implemented with linked storage can more thoroughly share common
sub-expressions. For example, Lisp programs can fortuitously share some common sub-expressions
— particularly if the data structure and algorithms are implemented with that as one of the goals.
For example, when recurring over the terms of two polynomials to add them, when one polynomial
operand becomes empty, the partial result can be the pointer to the remaining terms in the other
polynomial, thus sharing between that operand and the result those terms together with the Cons
cells that glue them together. As another example, when multiplying two terms such as xy^2
times xz, a pointer to the existing y^2 can be used in the result x^2y^2z rather than forming a
separate duplicate y^2 by computing y^(2+0). Moreover, the implementation can more forcefully
share some sub-expressions. For example in the Reduce computer algebra system, there is a
protocol so that functional forms such as ln(x) and sin(ln(x)+y) are stored uniquely, which saves
space for duplicate instances and makes fast address comparison sufficient for recognition of Equal
functional forms.
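The tail-sharing polynomial addition described above might be sketched in C as follows (the Term type is an assumed linked representation of a univariate polynomial with terms sorted by decreasing exponent):

```c
#include <stdlib.h>

/* One term of a linked, sorted polynomial. */
typedef struct Term { int coef, expon; struct Term *next; } Term;

static Term *cons_term(int c, int e, Term *next) {
    Term *t = malloc(sizeof *t);
    t->coef = c; t->expon = e; t->next = next;
    return t;
}

/* Add two polynomials.  Once one operand is exhausted, the result
   points at the other operand's remaining cells instead of copying
   them -- the tail is shared between operand and result. */
Term *poly_add(Term *a, Term *b) {
    if (!a) return b;                 /* share b's remaining terms */
    if (!b) return a;                 /* share a's remaining terms */
    if (a->expon > b->expon)
        return cons_term(a->coef, a->expon, poly_add(a->next, b));
    if (b->expon > a->expon)
        return cons_term(b->coef, b->expon, poly_add(a, b->next));
    int c = a->coef + b->coef;
    return c ? cons_term(c, a->expon, poly_add(a->next, b->next))
             : poly_add(a->next, b->next);
}
```

Adding x^2 to 2x + 3 allocates only one new cell; the result's tail is literally the second operand.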
A system such as Maple [3] or HLisp [8] that also hashes sub-expressions can, at the expense
of some initial time investment, share even more common sub-expressions and make recognition of
identical expressions faster. Thus a remember stack and hashing with pointers are at the opposite
ends of a spectrum with respect to the sharing of common sub-expressions. The amount of possible
sharing depends strongly on the data structures and specific examples. Table 2 in [28] implies that
for Lisp the number of Cons cells ranges from 2% to 100% of the number required if there were
no sharing, with an average of 67% over all of the examples and alternative data structures. The
break-even point for the relative space efficiency of an expression stack versus pointers with sharing
depends on the relative sizes of tokens and atom representations in the expression stack versus
pointers and atoms for Lisp.
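A minimal hash-consing sketch in C (the table, hash function and Node layout are illustrative assumptions, not Maple's or HLisp's internals):

```c
#include <stdint.h>
#include <stdlib.h>

/* Every (op, left, right) node is stored exactly once, so pointer
   equality decides structural equality of whole sub-expressions. */
typedef struct Node {
    int op;
    struct Node *left, *right;
    struct Node *next;              /* hash-bucket chain */
} Node;

#define NBUCKETS 256
static Node *bucket[NBUCKETS];

Node *hcons(int op, Node *l, Node *r) {
    size_t h = ((size_t)op ^ ((uintptr_t)l >> 3) ^ ((uintptr_t)r >> 5))
               % NBUCKETS;
    for (Node *p = bucket[h]; p; p = p->next)
        if (p->op == op && p->left == l && p->right == r)
            return p;               /* already interned: share it */
    Node *p = malloc(sizeof *p);
    p->op = op; p->left = l; p->right = r;
    p->next = bucket[h]; bucket[h] = p;
    return p;
}
```

With this discipline, testing two expressions for structural equality is a single pointer comparison, at the cost of a table lookup on every construction.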
4.3 Implementing linked-data computer algebra compactly
The early computer algebra systems MathLab, Reduce, ScratchPad and Macsyma were all implemented
in Lisp. This helped inspire the use of Lisp as an implementation language for muMath
and Derive.
The Lisp programming language and close relatives such as Scheme come with some important
support for computer algebra already built in:
• They include a built-in garbage collector — one that tends to be fast even for very small
blocks of memory such as a small-magnitude integer or a Cons cell consisting of two pointers.
• They include arbitrary-precision rational arithmetic that automatically adapts to whatever
memory block sizes are necessary for each individual number.
• They include a built-in association list that makes it easy and fast to store and query information
about an indeterminate, operator or function, such as its type, parser precedence, arity,
associativity, commutativity, linearity, and symmetries.
• They include an interpreter that permits run-time definition of new functions, and the interpreter
can be bootstrapped into interpreting mathematical expressions and accepting definitions
of new mathematical functions.
• They include efficient powerful mapping functions that factor out common processing overhead
such as looping or recurring over lists, thus reducing computation time and program size while
increasing program comprehensibility.
• They are typically quite efficient at executing recursive function calls, which are prevalent for
implementing most computer algebra.
Lisp is more suitable than any of the previously discussed methods for implementing computer
algebra. Logic programming languages such as Prolog might be even better matched to the task —
particularly for the highly desirable better integration of computer algebra with logic.
For languages such as C or C++ that lack most of these features, the features must be programmed
by the computer algebra implementers.
4.3.1 muLisptm for the Intel 8080 and Zilog Z-80
Full implementations of the Common Lisp standard entail a very large run-time footprint, making
them less suitable for compact computer algebra. However, Common Lisp includes many features
that are not crucial for computer algebra. Before Common Lisp was defined, Albert Rich wrote a
very compact Lisp interpreter named muLisp — initially for the Intel 8080 micro-processor [24].
The initial 2 kilobyte version was written in machine language and was entered into an S-100-bus
computer built from a kit [15] (the IMSAI 8080, a clone of the pioneering MITS Altair), using
front-panel toggles for each bit of the address and each bit of the instructions. At that time the
computer had 4 kilobytes of RAM. A monochrome character-oriented monitor, also built from a
kit, was used for the interactive Lisp I/O.
Subsequent versions added audio-tape cassette backup of the Lisp interpreter and Lisp programs.
The cassette drive was later replaced with a dual-floppy drive storing about 128 kilobytes per floppy
disk. At that time we also began using the CP/Mtm operating system, which included an assembler,
loader, machine-language debugger, and line-oriented editor. The Imsai 8080 RAM was gradually
increased to the 64 kilobyte address space using hand-soldered circuit boards. The interpreter was
rewritten in assembly language and enhanced to contain arbitrary-precision rational numbers.
muMath was the first computer algebra system implemented in muLisp. Several unique features
of muLisp contributed significantly to the compactness of muMath and its successor, Derive:
1. In most Lisps, evaluating the Car or Cdr of an atom provokes a run-time error. Consequently
these most-frequently invoked functions incur the speed penalty of run-time checks. Instead,
in muLisp it is allowable to evaluate the Car or Cdr of an atom because:
(a) The Car of a number points to itself, and the Cdr points to a symbol designating its sign
and type (small integer, big integer or reduced fraction). Numbers have two additional
fields containing a small integer or containing the number of bytes and a pointer to
the contiguous bytes of a large unsigned binary integer, or containing pointers to two
atoms that are the numerator and denominator of a rational number. However, these
two additional fields aren’t accessible via Car or Cdr. Moreover, small integers are
hashed to avoid duplicate instances of small magnitude integers that commonly occur in
expressions and to make the Eq and Equal functions faster for them.
(b) The Car of a symbol points to its value (perhaps itself), and the Cdr points to the
symbol’s property list (perhaps Nil). There are two additional fields pointing to its
function definition (or to Nil) and to its print-name string, but these two fields aren’t
accessible via Car or Cdr.
(c) This closed pointer universe makes it impossible to crash the interpreter or operating
system by inadvertently taking the Car or Cdr of an atom, despite the lack of costly
run-time checks.
2. Lisp has an interactive driver loop that prompts for an input expression, reads it, evaluates it,
then displays the resulting value. In most Lisps if you attempt to evaluate a symbol such as
x that hasn’t been assigned a value, it provokes an error interrupt. With such Lisps you must
use a quote macro, such as in ’x to avoid that error interrupt. In computer algebra, if a symbol
doesn’t have an assigned value, then we want the atom to evaluate to its own name. muLisp
uses this auto-quoting method. Therefore the machine-language muLisp Apply and Eval
functions can be used directly for computer algebra. There is no need to write additional
slower versions merely to obtain the auto-quoting needed for computer algebra. This also
saves space and execution time throughout the computer algebra program because the quote
can usually be omitted for symbols.
3. In muLisp, symbols and strings are the same thing. For example, there are functions for
concatenating and extracting sub-strings from the names of symbols. This makes the muLisp
interpreter more compact, and it can make muLisp programs more compact because only one
copy is stored for duplicate instances of symbols, hence also for strings.
4. Excess trailing formal parameters for which corresponding arguments aren’t provided are
treated as local variables, initialized to Nil. This avoids expanding and interpreting a separate
Let macro, and it is an efficient method of providing a variable but limited number of
arguments. (There are also no-spread functions that bundle an unlimited number of arguments
into one parameter as a list of arguments.)
5. In computer algebra implementations almost every function body is a sequence of statements
containing assignments, conditional statements and loops that themselves contain such sequences.
In Lisp, Progn is comparable to braces in C, and Cond is comparable to the
C construct “if (...) {...} else if (...) {...} ... else {...}”. In muLisp, function bodies are implicit
Progns that optionally contain implicit Conds, making these functions rarely needed: Every
function body is a sequence of one or more forms: form1, form2, ..., formn. The returned
value is the last value that is computed before exiting the function. Beginning with form1
and proceeding sequentially:

(a) If formi is an atom, then it is simply evaluated.

(b) If instead the Car of formi is an atom, then the function having that name is applied
to the values of the remaining elements in formi.

(c) If instead the Caar of formi is an atom, then formi is an implicit Cond: The function
having that name is applied to the values of the remaining elements in the Car of formi.
If the result is Nil, then the remaining elements of formi are ignored and execution
proceeds to formi+1. Otherwise formi+1 through formn are abandoned, and execution
proceeds to the remaining elements of formi as the alternative to those abandoned forms.

(d) If instead the Caar of formi isn't an atom, then the forms in formi are those of a nested
implicit Progn, treated as described here for forms form1 through formn, after which
formi+1 is evaluated instead of leaving the outer implicit Progn that is the function
body.

Another way to regard a function body is that whenever formi begins with two left parentheses,
then formi is a conditional branch that will abandon formi+1 through formn if formi is
non-Nil. Whenever formi begins with three left parentheses, then formi is a similar sequence of
forms that will be executed or partly executed, after which execution continues with formi+1.
For example, here is a traditional Lisp recursive factorial function followed by a faster more
compact one that exploits the muLisp implicit Cond:
(Defun Factorial (N)
  (Cond
    ((Zerop N) 1)
    (T (* N (Factorial (Sub1 N))))))

(Defun Factorial (N)
  ((Zerop N) 1)
  (* N (Factorial (Sub1 N))))
muLisp also has Progn and Cond for compatibility with other Lisps, but these functions
are rarely used in the implementation of muMath and Derive.
muLisp has an efficient general looping function (Loop form1, form2, ..., formn), where any number
of the forms beginning with two left parentheses are implicit Conds that conditionally exit the loop.
For example, here is a traditional Lisp iterative factorial function and a faster more compact muLisp
alternative that uses Loop together with implicit Progn and implicit Cond:
(Defun Factorial (N)
  (Prog (M)
    (Setq M 1)
    A (Cond
        ((Zerop N) (Return M)))
    (Setq M (* M N))
    (Decq N)
    (Go A)))

(Defun Factorial (N M)
  (Setq M 1)
  (Loop
    ((Zerop N) M)
    (Setq M (* M N))
    (Decq N)))
There are also rarely-used macros that define the Common Lisp Do, Do*, Dolist and Dotimes
looping functions in terms of Loop.
1. muLisp uses dynamic shallow binding rather than lexical deep binding. Dynamic shallow
binding is faster and more appropriate for interpreted programs. Although there was a compiler
for the Intel 8080 version of muLisp, it was used only for a few appropriate speed critical
functions because it typically increases code size by a factor of 2 to 3 while increasing speed
of those functions by a factor of at most 2 to 3. The reason is that the muLisp interpreter is
unusually space and speed efficient. The optionally loadable Flavors package provides lexical
binding if desired.
2. Cons cells, symbols, their print names, numbers and their binary data are each stored in a
separate contiguous area so that type checking can be done by mere comparisons with the
boundary addresses between these regions. This is fast and it avoids wasting space in each
number, symbol and Cons cell for a type tag.
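The boundary-comparison type check might be sketched in C like this (the region sizes and layout are assumptions for illustration):

```c
/* Type dispatch by address range: each type lives in its own
   contiguous region, so comparing a pointer against the region
   boundaries replaces a per-object type tag. */
typedef enum { T_NUMBER, T_SYMBOL, T_CONS } Type;

static char heap[3 * 1024];
static char *numbers_end = heap + 1024;  /* numbers: [heap, numbers_end)        */
static char *symbols_end = heap + 2048;  /* symbols: [numbers_end, symbols_end) */

Type type_of(const void *p) {
    const char *c = (const char *)p;
    if (c < numbers_end) return T_NUMBER;
    if (c < symbols_end) return T_SYMBOL;
    return T_CONS;                       /* cons cells: everything above */
}
```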
3. The garbage collector compacts each of these data types toward one or the other end of
each region, and pairs of data types grow toward each other, as do the return-address and
argument stacks. Therefore garbage collection isn’t triggered unless both members of a pair
collide. Moreover, if collection results in relatively much more free space in one of the areas
than another, the amount of memory allocated to each area is reapportioned to balance the free
space proportionate to the employed space of each type. This way, computation can proceed
until almost all of the memory is being used. The data that 16-bit pointers address all align
on an even byte address, so the least significant bit of the address is used as the temporary
marker bit for the mark-and-sweep garbage collection, avoiding the need for separate mark-bit
space while allowing pointers to all of the even addresses in the full 64-kilobyte address space.
4. Function definitions ordinarily remain unchanged after they are debugged. Therefore it is
worth spending some effort to make the stored form of function definitions more compact and
faster to execute. Consequently:
(a) Function definitions are Cdr-coded: With the exception of an in situ quoted dotted
pair for which the Cdr points to a non-Nil atom, function definitions consist of
atoms and nested lists. Therefore these lists in function bodies are represented as arrays
of pointers rather than as linked Cons cells. Rather than having an extra pointer to
Nil at the end of the array or having a field containing the number of elements in an
array, the least significant bit of the last pointer is 1 if and only if it points to the last
element. (Of course that bit is masked out before following the pointer.) This Cdr
coding approximately halves the storage space for the function, and it executes faster
too because access to the Cdrs is faster. (Bytecode is an oft-used competing way to
encode algorithms compactly; it has been used for many programming languages from
BCPL on, including Lisp and Java.)
(b) Starting from the innermost expressions, if two or more pointers in a function definition
point to syntactically identical code fragments, then only one copy of that data is stored,
to be pointed to by all of the instances. For example even if the code fragment (Eq (Car
Arg1) Cos) occurs more than once within a function definition, only one copy is stored,
and recursively, if (Car Arg1) occurs in other contexts in the function, then those
instances are also shared. This condensing saved significant additional space. Moreover,
the sharing isn’t limited to entire lists: Lists having different head elements can share
identical tails.
(c) While the control variable *Condense* is non-Nil, this condensing process is done
between functions as well as within them, saving additional space if the same code
fragment occurs in more than one function definition. Also, functions defined under
these circumstances are excluded from garbage collection, making the collection faster.
For this reason and the additional time required to load functions in this mode, it is
used only after a file of functions is debugged, hence ordinarily subject to no further
redefinition.
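The Cdr-coded layout just described (low bit of the last pointer as the end marker) might be walked like this in C (the "pointers" here are plain even uintptr_t values standing in for aligned pointers):

```c
#include <stdint.h>

/* A cdr-coded list is an array of pointers; the low bit of the LAST
   pointer is set instead of storing a terminating Nil or a length
   field.  Pointers are assumed 2-byte aligned, so the bit is free. */
#define LAST_BIT ((uintptr_t)1)

/* Copy the masked element values into out[]; return the count. */
int cdr_coded_elements(const uintptr_t *list, uintptr_t *out) {
    int n = 0;
    for (;;) {
        uintptr_t p = list[n];
        out[n++] = p & ~LAST_BIT;   /* mask the end marker before use */
        if (p & LAST_BIT) return n;
    }
}
```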
5. (Zap-string symbol) frees the memory required to store a symbol’s print name by converting
its print name to the null string. Thus a significant amount of space can be saved by zapping
the print-names of internal functions and control variables as the last step in a program
definition.
6. Early versions of the interpreter used some of the Intel 8080 one-byte reset instructions for
some of the most commonly-occurring Lisp functions, such as Car and Cdr. Unfortunately,
later versions of operating systems and other co-resident programs rudely clobbered these
memory locations in a disorganized free-for-all without saving and restoring them after use.
Consequently this idea was abandoned in later versions of muLisp.
7. Macros are used to conditionally exploit the more compact Z-80 relative jumps when assembling
for the TRS-80 computer.
4.3.2 muMath
muMath [22, 26] also employs some techniques that make the code faster and/or more compact:
• The first file loaded during a build bootstraps a Pratt parser by defining the Parse function,
then storing the right and/or left binding powers of operators, etc. on their property lists.
These definitions are co-mingled with using those newly-defined operators, delimiters, and
control constructs in subsequent functions and expressions in the file as soon as they are
available. Pratt Parsers [5, 21] are quite compact and particularly suitable for adding syntactic
extensions during run time. The beginning of the file is muLisp, which steadily morphs into
the more Algol-like muSimp surface programming language by the end of the file. For
example, the implicit Cond is invoked by the explicit muSimp syntax

    When condition; expression1; expression2; ...; Exit;

Once parsed, function definitions and mathematical expressions are just as space and speed
efficient as muLisp. There is still only one level of interpretation, or not even that for the
few functions that are compiled.
• The implementation uses a sort-of poor man’s object-based style: Many algorithms that operate
on mathematical expressions are implemented incrementally by putting on the property
list of the function or operator name several small functions, each for handling a different
kind of operand or combination of operands. For example to implement differentiation there
is a function for differentiating numbers, another function for differentiating variables, another
function for differentiating sums, etc. This makes it easy for implementers and users to
enhance functions and operators without doing delicate surgery on a monolithic function that
handles all cases. For example, a user can easily supplement the differentiation algorithms
with differentiation of Bessel functions. The dispatching is handled by muLisp, and it is quite
fast because it uses the assembly-language muLisp Assoc function rather than interpreted
implicit Conds to determine which case is applicable. There are separate functions for putting
new information at either the near or far end of an association list so that the most frequently
used information is near the beginning of the list.
• Each of the muLisp arithmetic operators invokes an associated trap function to optionally
catch error throws for non-numeric arguments. Using the above object-based style, the corresponding
muSimp trap functions dispatch to various functions such as adding a variable to
a number, a sum to a product, a logarithm to a logarithm, etc. Because of this organization,
the most common case of combining numbers with numbers is tried first in machine language
rather than after an interpreted test.
muMath has no approximate arithmetic, hence no function plotting capability. However, Microsoft
licensed muMath for the Radio Shack TRS-80, for which they added function plotting by tying into
their Microsoft Basic and its floating-point arithmetic in ROM.
Microcomputer algebra became luggable in 1981 when muMath was bundled with the Osborne
1 “suitcase” computer [20].
4.3.3 muLisp for the Intel 8086
Implementing muLisp for the Intel 8086 was more challenging because the 1-megabyte address space
was segmented into 64-kilobyte segments. The MS-DOS operating system reserved 384 kilobytes
of that address space for its purposes, leaving 640 kilobytes for implementing and using muLisp.
Awkward segmented architecture is becoming less common as memory costs decline. However,
techniques for overcoming the limitations of short pointers are still relevant for implementing linked
programs and data compactly.
For generating addresses there are four 16-bit segment registers whose contents are shifted left
4 bits then added to a 16-bit offset. One of these segment registers is used for instructions, one for
two stacks, one for data, and one for an “extra” segment that is effectively a second data segment.
muLisp uses one of the stacks in the stack segment for return-address offsets and the other for
argument-address offsets, with the two stacks growing toward each other. This is a cleaner design
than co-mingling return addresses and argument offsets in one stack.
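The address generation just described, as a one-line C sketch:

```c
#include <stdint.h>

/* 8086 physical-address generation: a 16-bit segment value shifted
   left 4 bits plus a 16-bit offset yields a 20-bit address. */
uint32_t phys_addr(uint16_t seg, uint16_t off) {
    return ((uint32_t)seg << 4) + off;
}
```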
Following a suggestion of Zippel [31], the 16-bit Cars are stored in an area pointed to by the
Data segment register and the 16-bit Cdrs are stored in an area pointed to by the Extra segment
register, thus doubling the number of cells that would be available if the Car and Cdr were stored
adjacent to each other in the same segment. The Cars and Cdrs of numbers are below those of
symbols, which are below those of Cons cells, to permit fast compact address typing, with the
numbers and symbols growing toward each other. The data for the other two fields in number table
and symbol table entries are in different segments.
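Zippel's split layout might be sketched in C as two parallel arrays indexed by a single 16-bit cell number (the sizes and names are assumptions; on the 8086 the two arrays would live in the DS and ES segments):

```c
#include <stdint.h>

/* Cars and Cdrs in two parallel arrays: one 16-bit cell index
   addresses both fields, doubling the cell count available with
   16-bit offsets compared to storing the fields adjacently. */
#define NCELLS 32768
static uint16_t cars[NCELLS];   /* addressed via DS on the 8086 */
static uint16_t cdrs[NCELLS];   /* addressed via ES on the 8086 */

static uint16_t car(uint16_t cell) { return cars[cell]; }
static uint16_t cdr(uint16_t cell) { return cdrs[cell]; }
static void set_cell(uint16_t cell, uint16_t a, uint16_t d) {
    cars[cell] = a;
    cdrs[cell] = d;
}
```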
There can be up to 6 code segments containing a maximum of about 55 kilobytes each. The
first offset in each function definition is an offset to the atom Lambda, and the interpreter can
detect that the current code segment is no longer appropriate by testing for this. When such a
code-segment fault is detected, the interpreter tries each of the other code segments until Lambda
occurs. The interpreter inserts gaps between function definitions if necessary so that only one code
segment has Lambda at that offset.
An effort was made to sequence the function definitions according to locality of reference to
reduce the frequency of code-segment faults and to put the most frequently used functions in the
segments that are tried first after a code-segment fault. This effort also includes a function that
can force a new code segment at a desired point in the source code, rather than letting it happen
where it became unavoidable.
4.3.4 Derive for MS-DOS
muMath has a teletype style interface similar to that of the MS-DOS operating system under
which the Intel 8086 version ran. The window- and mouse-based interface of the Macintosh made
computing attractive and accessible to a far larger audience. Accordingly, to reach a larger audience,
Soft Warehouse released Derive as an MS-DOS successor to muMath in 1988. Derive included its
own windowing system implemented in muLisp.
Microcomputer algebra achieved compact cordless portability in 1991 when a special ROM
version of Derive was released for the HP 95LX palmtop computer [13].
Derive also included function plotting, which required approximate arithmetic. Since we already
had arbitrary-precision rational arithmetic, the most compact way to implement approximate
arithmetic was to round rational numbers that had excessively lengthy numerators and
denominators to simpler rational numbers, using the mediant rounding described by Matula
and Kornerup [18]. There was also the option of having magnitudes less than a user-set threshold
underflow to 0. For very little additional code, this technique gave us adjustable-precision
approximate arithmetic as a mode. Moreover, mediant rounding has a property that is convenient
for computer algebra: the capture interval around simple fractions is larger than around more
complicated fractions, increasing the chances of recovering a simple exact rational result despite
intermediate approximations. This approximate rational arithmetic also kept the implementation
more compact because there was no separate approximate-number type to test for and manage.
It would have been more informative to record that a rational number might not be exact and
optionally indicate that to the user, but we didn't.
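As a rough illustration of the idea (a minimal Python sketch, not Derive's code, and using a denominator bound rather than Derive's precision setting as the control), mediant rounding can be performed by descending the Stern-Brocot tree toward the operand until the denominator bound is exceeded, then choosing between the two bracketing fractions by comparing against their mediant. The wider capture intervals of simple fractions fall out of this construction:

```python
from fractions import Fraction

def mediant_round(x: Fraction, max_den: int) -> Fraction:
    """Round x to the fraction with denominator <= max_den whose mediant
    (capture) interval contains x.  Simple fractions have wider capture
    intervals, so simple exact results survive intermediate rounding."""
    sign = -1 if x < 0 else 1
    x = abs(x)
    whole = x.numerator // x.denominator      # split off the integer part
    frac = x - whole
    if frac == 0:
        return sign * Fraction(whole)
    # Stern-Brocot descent toward frac in (0, 1); lo and hi remain
    # Farey neighbors whose denominators are within the bound.
    lo, hi = Fraction(0, 1), Fraction(1, 1)
    while True:
        med = Fraction(lo.numerator + hi.numerator,
                       lo.denominator + hi.denominator)
        if med.denominator > max_den:
            break
        if frac == med:                        # exact hit, already simple
            return sign * (whole + med)
        if frac < med:
            hi = med
        else:
            lo = med
    # frac lies between Farey neighbors lo and hi; the boundary between
    # their capture intervals is their mediant.
    result = lo if frac < med else hi
    return sign * (whole + result)
```

For example, rounding 355/113 with denominators limited to 50 recovers the simple fraction 22/7, and rounding 333/1000 with denominators limited to 10 recovers 1/3.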
For this mediant-rounding arithmetic:
• Rounded multiplication and division average about three times as slow as exact rational
arithmetic using the same rational operands.
• Rounded multiplication and division average several times slower than for variable-precision
binary floating-point arithmetic at comparable precision.
• Unlike variable-precision floating-point, addition is about the same speed as multiplication
using the same operands, and grows quadratically rather than linearly with the number of
words of precision.
The approximate arithmetic also entailed implementing algorithms for approximate fractional
powers and elementary functions. For this purpose it was helpful to develop algorithms that, as
much as practical, separately developed an approximate integer numerator and denominator of a
result, perhaps shifting them both right by the same amount several times during the process as
a fast approximate rounding, with a single mediant rounding only at the end. This was typically
much faster than simply using the usual floating-point algorithms with many mediant roundings
along the way. In fact, unlike addition, multiplication and division, fractional powers and the
elementary functions were typically slightly faster than for variable-precision floating-point arithmetic.
This technique of working mostly with large integers and shift-rounding pairs of them during
intermediate steps might benefit implementation of irrational operations in arbitrary-precision
floating-point arithmetic too.
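The shift-rounding technique can be sketched with a hypothetical Newton-iteration square root (the function name, iteration count and bit budget are assumptions for illustration, not Derive's implementation). The result is carried as an integer numerator/denominator pair, and whenever the pair outgrows the working precision both halves are shifted right by the same amount; a final mediant rounding (not shown) would then simplify the pair:

```python
from math import isqrt

def approx_sqrt_numden(n: int, d: int, bits: int = 64):
    """Approximate sqrt(n/d) as an integer pair (num, den) via Newton
    iteration, shift-rounding both integers together to cap their size."""
    # Integer-square-root seed: sqrt(n/d) = sqrt(n*d)/d approximately.
    num, den = isqrt(n * d) or 1, d
    for _ in range(6):                     # a few steps reach ~64-bit accuracy
        # Newton step x' = (x + a/x)/2 with x = num/den and a = n/d,
        # done entirely on integers:
        num, den = num * num * d + n * den * den, 2 * num * den * d
        # Fast approximate rounding: shift both right by the same amount,
        # which preserves the ratio while bounding the bit lengths.
        excess = min(num.bit_length(), den.bit_length()) - bits
        if excess > 0:
            num >>= excess
            den >>= excess
    return num, den
```

Because the two shifts cancel in the quotient, each shift-round costs only a pair of cheap integer operations, whereas a mediant rounding at every step would require a gcd-like tree descent.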
Installation of the Intel 8087 floating-point co-processor was rare at that time, and about 6
significant digits was sufficient for most plotting purposes. Therefore the plot speed compared
acceptably to the then-prevalent 8 to 16 digit binary or binary-coded decimal fixed-precision software
floating point.
Another difference between muMath and Derive is that although the one-level evaluation of
Lisp and muLisp is sufficient for computer algebra, it isn’t ideal. One-level evaluation works well
for computing the values of formal parameters and local variables, but infinite evaluation is more
natural for global variables. For example, in muSimp, if you enter the assignment z := x + 5; then
the assignment x := 7; then enter z, it will still have the value x + 5, rather than 12. There was a
way to force extra evaluation levels, but users expected that to be done automatically.
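The distinction can be sketched with a toy evaluator in Python (the expression representation and names here are hypothetical, not muSimp's). One-level evaluation substitutes bindings once; infinite evaluation re-evaluates until a fixed point:

```python
# Toy environment after z := x + 5 and then x := 7.
env = {"z": ("+", "x", 5), "x": 7}

def eval1(e):
    """One-level (muLisp-style) evaluation: substitute bindings once."""
    if isinstance(e, str):
        return env.get(e, e)           # a symbol yields its binding, unevaluated
    if isinstance(e, tuple):
        op, a, b = e
        a, b = eval1(a), eval1(b)
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            return a + b
        return (op, a, b)
    return e                           # numbers evaluate to themselves

def eval_inf(e):
    """Infinite evaluation: re-evaluate until a fixed point is reached."""
    while True:
        e2 = eval1(e)
        if e2 == e:
            return e
        e = e2
```

Here `eval1("z")` returns the unevaluated form `("+", "x", 5)`, matching muSimp's behavior, while `eval_inf("z")` substitutes through to 12, which is what users of global variables expect.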
For functions such as differentiation, users prefer a sequence such as

x := 7;
d/dx (x^2);

to produce 14 rather than an error caused by differentiating 49 with respect to 7. In other words,
they want delayed substitution of assigned values in some circumstances.
Therefore, Derive uses special computer-algebra-oriented Eval and Apply functions that are
written in muLisp rather than assembly language.
muLisp and muSimp both had separate optional windowing, editor and debugging environments,
and analogous functions had different names. For example, Car in muLisp was First in muSimp.
It was a burden to maintain these separate but parallel development environments, so Derive was
written in muLisp rather than muSimp.
As indicated in the introduction, compact and fast data representation is secondary to a compact
computer-algebra implementation for the limited address space targets of muMath and the
Intel 8086 version of Derive. However, Derive also uses particularly compact data representations
that facilitate fast processing. Reference [29] describes the recursive partially-factored semi-fraction
form predominantly used by Derive, but without details about the representation of the recursive
multinomial factors therein. Reference [28] compares the speed and space requirements for several
alternative multinomial data structures, with observations about the relative compactness of corresponding
polynomial co-distribution algorithms. Shortly after writing that article, I implemented
two variants of the sparse recursive representation that were usually faster and more space efficient
than the alternatives reported there. Through version 5, Derive used only the following variant:
• Expressions whose top-level function or operator is anything other than “+”, “·” and “^” are
represented as functional forms that are lists starting with the function or operator name
followed by arguments that are general expressions.
• A power is a non-zero rational numeric exponent dotted with a general expression that is the
base. It requires only one Cons cell beyond any in the base, and it is easily distinguished
from functional forms, for which the Car is a symbol rather than a number. The exponents
can be negative (representing denominators) and/or fractional.
• A product is a power dotted with a non-zero general expression. It is easily recognized as a
product because it is not an atom and its Car is not an atom, but its Caar is numeric. The
more main of the two factors is the Car of the dotted pair. The product requires only one
Cons cell beyond those of the leading power and its cofactor.
• A sum is a leading term that is a product, dotted with a non-zero reductum that is a general
expression. It is easily recognized as the only remaining possibility for a general expression or
as a non-atomic expression whose Car and Caar aren’t atomic. It requires only one Cons
cell beyond those of the reductum and the product that is the leading term.
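A minimal sketch of these recognition rules, using Python 2-tuples as Cons cells and strings as symbols (an illustrative assumption, not Derive's muLisp code):

```python
from numbers import Number

# A Cons cell is a 2-tuple (car, cdr); atoms are numbers and symbol strings.
def is_atom(e):
    return not isinstance(e, tuple)

def is_power(e):
    # power: (exponent . base) with a numeric exponent as the Car
    return not is_atom(e) and isinstance(e[0], Number)

def is_product(e):
    # product: (power . cofactor) -- Car is non-atomic, Caar is numeric
    return (not is_atom(e) and not is_atom(e[0])
            and isinstance(e[0][0], Number))

def is_sum(e):
    # sum: (product . reductum) -- the only remaining non-atomic case:
    # Car and Caar are both non-atomic
    return (not is_atom(e) and not is_atom(e[0])
            and not is_atom(e[0][0]))

def is_functional_form(e):
    # functional form: Car is a symbol (here a string), not a number
    return not is_atom(e) and isinstance(e[0], str)

x_squared = (2, "x")            # x^2: one cell beyond the base
three_x_sq = ((2, "x"), 3)      # 3*x^2: the power dotted with its cofactor
sum_expr = (((2, "x"), 3), 5)   # 3*x^2 + 5: leading term dotted with reductum
```

Note how each predicate inspects at most the Car and Caar, so classification costs only a couple of pointer dereferences per node.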
This representation can be categorized as sparse recursive with implicit binary “+”, “·”, “^”, and
variables. Reference [29] describes the ordering of terms and factors, together with how partial
factoring and semi-fractions are accommodated in this form. This is completely general and quite
flexible, which is needed for the partially-factored semi-fraction form because it tends to have
relatively few terms in any one sum, with powers, products and sums occurring almost anywhere.³
To make the implicit “+” and “·” recognizable, we must pad a base with an exponent of 1 if the
base is a factor in a product and isn’t the last factor therein; and we must pad a non-product term
with a coefficient of 1 if it isn’t the last term in a sum. However, this is significantly less padding
than is usually required to use implicit operators. In contrast, for example, if a top-level expression
isn’t a number or a variable, the closest representation in [28] must construct a list with a Pol tag
followed by the non-atomic polynomial data structure in which at every recursive level we:
• always force variables to have an exponent even if it is 1;
• always force non-numeric terms to have a coefficient even if it is 1; and
• always force non-numeric polynomials to have a reductum even if it is 0.
At the deepest recursive levels where most of the Cons cells occur, polynomials often don’t have a
numeric term or are only one term that is perhaps merely a variable or a functional form rather than
a non-unit power thereof. Therefore the space savings and associated time savings are significant
for the Derive representation.
A dense univariate representation is also used for the special purpose of computing approximate
univariate polynomial zeros.
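For that special purpose, a dense representation is just a coefficient array, which can be evaluated by Horner's rule and searched for sign changes (a generic sketch under those assumptions, not Derive's actual zero-finder, which I have not reproduced here):

```python
def horner(coeffs, x):
    """Evaluate a dense univariate polynomial, coefficients given in
    order of decreasing degree, using Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def bisect_zero(coeffs, lo, hi, tol=1e-12):
    """Locate a zero of the polynomial by bisection, given endpoints
    lo and hi at which the polynomial has opposite signs."""
    flo = horner(coeffs, lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = horner(coeffs, mid)
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid        # zero is in the upper half
        else:
            hi = mid                   # zero is in the lower half
    return (lo + hi) / 2
```

The dense array keeps the inner Horner loop free of any pointer chasing, which is exactly why the sparse recursive form is set aside for this one task.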
In version 6, an even faster expanded sparse recursive representation was used for a few special
purposes. Besides implicit operators, it has stratified factoring out of the variables, with one
representation of the variable at each recursive level. It is similar to but not identical to the
Maxima Canonical Rational Expression form [7].
4.3.5 muLisp for the Intel 386
The Intel 386 was the first processor in the x86 family that supported a full unsegmented 32-bit
address space. Therefore the architecture of the 386 version of muLisp was closer to the simpler
Intel 8080 version, but with 32-bit addresses rather than 16-bit addresses.
4.3.6 32-bit Derive for Windows
The 32-bit version of the Microsoft Windows operating system was finally becoming a widespread
competitor to the much earlier Macintosh windowing interface.
Accordingly, Albert Rich and Theresa Shelby designed the user interface of Derive for Windows,
which Theresa implemented in C++. For mathematical services such as simplifying, solving, or
integrating mathematical expressions, the interface called the Derive math engine, which was written
in the 32-bit version of muLisp. The interface required more code space than the computer algebra,
so compact computer algebra became less relevant for that platform.

³ A semi-fraction is a sum of an optional extended polynomial and any number of proper ratios of extended
polynomials having denominators whose mutual gcds are numeric. Unlike partial fractions, which semi-fractions
include as a special case, the denominators do not need to be square-free or irreducible.
By that time IEEE floating-point hardware was becoming widespread, and Mathematica® made
apparent the marketing appeal and mathematical value of fast high-resolution 3D surface plots.
Consequently, we wrote a muLisp function that produces a contiguous reverse Polish representation
of an expression, together with a C evaluator function that can evaluate that representation for
floating-point values of the variables therein. The Plot functions all invoke this translator from
Lisp to reverse Polish, then repeatedly invoke the C evaluator function to quickly approximate the
reverse Polish expression at plot points.
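The translator/evaluator pair can be sketched as follows (the expression form, operator tables and names are hypothetical; Derive's translator was muLisp and its evaluator C, whereas both halves are Python here for brevity):

```python
import math

# Hypothetical tree form: ("+", a, b), ("*", a, b), ("sin", a), numbers,
# or variable-name strings.
OPS2 = {"+": lambda a, b: a + b, "*": lambda a, b: a * b,
        "^": lambda a, b: a ** b}
OPS1 = {"sin": math.sin, "cos": math.cos}

def to_rpn(e, out=None):
    """Flatten a tree into a contiguous postfix (reverse Polish) list.
    Done once per expression to be plotted."""
    if out is None:
        out = []
    if isinstance(e, tuple):
        for arg in e[1:]:
            to_rpn(arg, out)           # arguments first ...
        out.append(e[0])               # ... then the operator
    else:
        out.append(e)                  # number or variable name
    return out

def eval_rpn(rpn, env):
    """Stack evaluation of the postfix list; called once per plot point."""
    stack = []
    for tok in rpn:
        if tok in OPS2:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS2[tok](a, b))
        elif tok in OPS1:
            stack.append(OPS1[tok](stack.pop()))
        elif isinstance(tok, str):
            stack.append(env[tok])     # variable lookup
        else:
            stack.append(tok)          # numeric literal
    return stack[0]
```

The point of the split is that the recursive tree walk happens once, while the per-point work is a tight loop over a contiguous array, which is what made floating-point plotting fast despite the Lisp representation.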
Adjustable-precision approximate rational arithmetic is still used for all other approximate arithmetic
purposes in Derive, such as quadrature and approximate equation solution.
5 Summary
Despite decreasing memory costs, there are still good reasons for implementing programs compactly.
Compact computer algebra has a long and varied history, with ideas that are relevant to
implementing other kinds of programs compactly too.
Special-purpose methods include
• implementing univariate polynomial and series algebra using dense coefficient arrays, and
• finessing computer algebra via evaluation and interpolation.
General-purpose methods include
• string-based computer algebra,
• contiguous stack-based computer algebra, and
• various methods for making linked-data computer algebra more compact.
Acknowledgments
Albert Rich is the primary author of muLisp. He and David Stoutemyer are the primary co-authors
of muMath. They and Theresa Shelby are the primary co-authors of Derive.
References
[1] W.S. Brown, On Euclid's algorithm and the computation of polynomial greatest common
divisors, Journal of the ACM 18(4), pp. 478-504, 1971.
[2] Calculus Demon, PolyCalc and AlgiCalc, 1981,
http://www.atariarchives.org/APX/showindex.php
[3] B.W. Char, K.O. Geddes, W.M. Gentleman and G.H. Gonnet, The design of Maple: A compact,
portable and powerful computer algebra system, in Computer Algebra, Lecture Notes in
Computer Science 162, Springer-Verlag, pp. 101-115, 1983.
[4] G. Cioni and A. Miola, Using minicomputers for algebraic computations, ACM SIGSAM Bulletin
10(4), pp. 50-52, November 1976.
[5] D. Crockford, Top down operator precedence, 2007,
http://javascript.crockford.com/tdop/tdop.html
[6] TI-92, TI-89, Voyage 200, TI Nspire, DataMathtm Calculator Museum, 2008,
http://www.datamath.org/
[7] Free Maxima download, http://maxima.sourceforge.net/download.html
[8] E. Goto and Y. Kanada, Hashing lemmas on time complexities with applications to formula
manipulation, Proceedings of the 1976 ACM Symposium on Symbolic and Algebraic Computation,
pp. 154-158, 1976.
[9] P. Henrici, Computational Analysis with the HP 25 Pocket Calculator, Wiley. 1977.
[10] F.B. Hildebrand, Introduction to Numerical Analysis, 2nd edition, McGraw-Hill, pp. 494-502,
1974.
[11] HP 28C calculator photograph and history, 1987, http://www.hpmuseum.org/hp28c.htm
[12] HP 75C calculator photograph and history, 1982, http://oldcomputers.net/hp75.html
[13] HP 95 LX photograph and history, 1991, http://en.wikipedia.org/wiki/HP_95LX
[14] IBM Scientific Subroutine Package, 1965,
http://www-03.ibm.com/ibm/history/exhibits/1130/1130_facts4.html
[15] IMSAI 8080 photograph and history, 1975, http://en.wikipedia.org/wiki/IMSAI_8080
[16] J.G. Koomey, Worldwide electricity used in data centers, Environmental Research Letters, 3,
2008, and http://uowblogs.com/digcjew994/files/2010/04/Koomey.pdf
[17] D.E. Knuth, The Art of Computer Programming, Volume 2, Addison-Wesley, Sections 4.6 and
4.7, 1969.
[18] D.W. Matula and P. Kornerup, Finite precision rational arithmetic: slash number systems,
IEEE Transactions on Computers, C-34 (1), pp. 3-18, 1985.
[19] R. Moenck, Is a linked list the best storage structure for an algebra system?, ACM SIGSAM
Bulletin 12 (3), pp. 20-24, 1978.
[20] Osborne 1 computer photograph and history, 1981,
http://en.wikipedia.org/wiki/Osborne_1
[21] V.R. Pratt, Top down operator precedence, Proceedings of the 1st Annual ACM SIGACT-SIGPLAN
Symposium on Principles of Programming Languages, pp. 41-51, 1973.
[22] A.D. Rich and D.R. Stoutemyer, Capabilities of the muMath-78 computer algebra system,
Symbolic and Algebraic Computation, Lecture Notes in Computer Science 72, Springer-Verlag,
pp. 241-248, 1979.
[23] P.J. Smith, Symbolic derivatives without list processing, Communications of the ACM 8(8),
pp. 494-496, 1965.
[24] Soft Warehouse Inc., muLisp 90 Reference Manual, 1990. (Out of print.)
[25] D.R. Stoutemyer, Symbolic mathematics on a programmable hand-held calculator, an unpublished
talk presented at the annual SIAM Meeting, 1977.
[26] D.R. Stoutemyer, The muMath symbolic math system, Creative Computing Magazine, July
and August, 1979.
[27] D.R. Stoutemyer, PicoMath-80, an even smaller computer algebra package, ACM SIGSAM
Bulletin 14 (3), Issue 55, pp. 5-7, August 1980.
[28] D.R. Stoutemyer, Which polynomial representation is best? Surprises abound!, Proceedings of
the 1984 Macsyma User’s Conference, General Electric, Schenectady N.Y., pp. 221-243, 1984.
[29] D.R. Stoutemyer, Ten commandments for good default expression simplification, Journal of
Symbolic Computation 46, pp. 859-887, 2011.
[30] Texas Instruments, TI-89/TI-92 Plus Developer Guide, 2001,
http://education.ti.com/educationportal
[31] R.E. Zippel, A Macsyma Chip?, Proceedings of the 1979 Macsyma Users Conference, pp.
215-221, 1979.
[32] R.E. Zippel, Probabilistic algorithms for sparse polynomials, in Symbolic and Algebraic Computation,
editor E.W. Ng, Lecture Notes in Computer Science 72, Springer-Verlag, pp. 227-239,
1979.
[33] Zonk, Internet uses 9.4% of electricity in the US, http://hardware.slashdot.org/story/07/09/27/2157230/Internet-Uses-94-of-Electricity-In-the-US