John Gustafson, PhD

Appendix A | Linear Solvers

The problem that motivated John Atanasoff to build an electronic computer was one that had challenged mathematicians for many centuries. A Babylonian clay tablet from about 300 BC gives this example of how a system of two equations can arise:

There are two fields whose total area is 1,800 square yards. One produces grain at the rate of 2/3 of a bushel per square yard while the other produces grain at the rate of 1/2 a bushel per square yard. If the total yield is 1,100 bushels, what is the size of each field?*

Translated into equations, with x and y for the areas of each field, this word problem says that

x + y = 1,800 square yards

(2/3)x + (1/2)y = 1,100 bushels

The Chinese also studied such problems, and in the Jiuzhang Suanshu, or Nine Chapters on the Mathematical Art, they provided examples of systems involving up to six equations in six unknown quantities as early as 200 BC.

Even though such problems could be posed very easily, the effort to solve them seemed extraordinary and out of proportion to the simplicity of the statement of the problem. At the risk of reminding the reader of some of the more tedious moments spent in middle school algebra class, the way to solve the above system of two equations is to scale one equation so that the number multiplying x or y in one equation matches that of the other equation, and then subtract the equations to eliminate that variable. If we multiply both sides of the first equation by 2/3, for example, the two equations line up nicely:

(2/3)x + (2/3)y = 1,200

(2/3)x + (1/2)y = 1,100

and we can subtract the second equation from the first to get a system that involves only y:

(1/6)y = 100

This is called “forward elimination,” where you eliminate one variable at a time from the system. After that, you “backsolve”; in the example above, y must be 600, and we can use the first equation x + y = 1,800 to conclude that x = 1,200.
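The scale-subtract-backsolve procedure just described can be sketched in a few lines of Python. Using exact fractions is an implementation choice made here to keep the arithmetic free of rounding, and the code subtracts the scaled first equation from the second rather than the other way around, which only flips the signs:

```python
from fractions import Fraction

# The Babylonian system:
#   x + y = 1800
#   (2/3)x + (1/2)y = 1100
a1, b1, c1 = Fraction(1), Fraction(1), Fraction(1800)
a2, b2, c2 = Fraction(2, 3), Fraction(1, 2), Fraction(1100)

# Forward elimination: scale the first equation so its x coefficient
# matches the second's, then subtract it from the second equation.
scale = a2 / a1
b2e = b2 - scale * b1   # leaves (1/2 - 2/3)y, i.e. -(1/6)y
c2e = c2 - scale * c1   # 1100 - 1200 = -100

# Backsolve: first y from the reduced equation, then x from the first.
y = c2e / b2e           # (-100) / (-1/6) = 600
x = (c1 - b1 * y) / a1  # (1800 - 600) / 1 = 1200
print(x, y)             # 1200 600
```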

What stymied human calculators was that the work to eliminate every variable grew as the cube of the number of equations. In the two-by-two example above, all one had to do was scale the first equation and subtract it from the second. But in a six-by-six problem (the largest one attempted in the Chinese tome), the first equation would have to be scaled for each of the other equations to eliminate that first unknown variable, and that task requires performing arithmetic on the entire six-by-six array of coefficients (thirty-six numbers). That leaves a problem with five equations in five unknowns, so one has to repeat the elimination task, until all that is left is a simple equation in one unknown quantity. The “forward elimination” to get to a simple problem of one equation in one unknown is like a pyramid of arithmetic work. For a system of n equations, the base of the pyramid of work is n by n, working up to a tip that is 1 by 1, and the volume of that pyramid (the total amount of work) is proportional to the cube of n. (The backsolving task is still tedious, but only grows as the square of n.)
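The pyramid of work can be checked with a short Python sketch. The accounting here is a simplifying assumption (one multiplication and one subtraction per entry touched), but it shows the cubic growth plainly:

```python
def entries_touched(n):
    """Entries touched by forward elimination: the first stage works on
    an n-by-n block, the next on (n-1)-by-(n-1), down to 1-by-1."""
    return sum(k * k for k in range(1, n + 1))

for n in (2, 6, 10, 25):
    ops = 2 * entries_touched(n)   # a multiply and a subtract per entry
    print(n, ops, round(ops / n**3, 3))
# As n grows, the ratio ops / n^3 settles toward 2/3:
# the total work is proportional to the cube of n.
```

For n = 25 this counting gives about eleven thousand operations, consistent with the "more than ten thousand calculations" figure quoted later for a twenty-five-equation system.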

In the 1700s, to solve even ten equations in ten unknowns was considered a nearly insurmountable task. It requires more than three thousand multiplications and subtractions, and each arithmetic operation usually must be done with at least ten decimals of precision to avoid rounding errors that would make the result unacceptably inaccurate. The German mathematician Carl Friedrich Gauss needed to solve a system of six equations in the early 1800s when he was trying to plot the orbit of the asteroid Pallas, and he spent years grinding away at the numbers using a method almost identical to that explained by the Chinese two millennia earlier; that method now bears the name Gaussian elimination.

By 1836, Charles Babbage had conceived his mechanical (steam-powered) Analytical Engine, and in pitching his plan for it to the funding agencies of his era, he led with the idea that it could be used to solve systems of equations:

In the absence of a special engine for the purpose, the solution of large sets of simultaneous equations is a most laborious task, and a very expensive process indeed, when it has to be paid for, in the cases in which the result is imperatively needed.

When a physical problem demanded a logarithm, a cosine, or the value of a physical quantity like energy, it might have required a few dozen calculations per input quantity, and human calculators knew it was tedious work but not intractable. Solving systems of equations was regarded as intractable, since the work grew as the cube of the number of unknown quantities. Whether one used an abacus, a slide rule, or a desktop calculator like those made by Monroe or Marchant in the twentieth century, it was simply a matter of patience and a bit of skill to bull through the problems that arise with a single unknown variable. But to solve systems of n equations in n unknowns was, and is, the standard by which computational speed is measured.

The speed of computers at solving systems of linear equations has been tracked and publicized since the 1970s. The Top 500 list of computers in the world, analogous to the lists business magazines maintain of the Top 500 companies in the world, is based on this time-honored problem: how fast can the system solve n equations in n unknowns? As of 2010, the computers at the top of the list solve problems that have more than a million unknown quantities, and they solve them at speeds exceeding a quadrillion operations per second. Compared to the Atanasoff computer of 1940, they are astronomically faster, yet they use the same procedure and the same precision for what remains the fundamental test of any computer more than seven decades later.

Appendix B | Binary Arithmetic

People accustomed to working with numbers using the usual decimal (base ten) arithmetic notation tend to forget that the basis for that notation is biological and not mathematical: we have ten fingers (digits) to count with. From our earliest years, we are taught the Arabic system of writing the symbols 1, 2, 3, 4, 5, 6, 7, 8, 9 for the quantities one to nine, and that numbers larger than that require more than one symbol. By recording how many tens there are in a number, then how many hundreds, and so on, every whole number is expressed in a unique way. And because Arabic is written from right to left, the tens and hundreds and thousands positions are added to the left as numbers get progressively larger, not to the right. This is “base ten” or “decimal” arithmetic because it uses powers of ten. When one reads the number 7,239, say, it immediately conveys the quantity (7 × 1000) + (2 × 100) + (3 × 10) + 9.

We also commonly use clock arithmetic, which uses base sixty. The number of seconds goes up to 59 and then requires a new symbol, 1:00, to indicate 1 minute, 0 seconds. In other words, we work up to 59:59 and then one more second causes us to write it as 1:00:00—1 hour, no minutes, no seconds. There are many ways to represent numbers other than decimal notation.
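Base-sixty positional notation can be sketched with nothing more than division with remainder; a few lines of Python illustrate the rollover at 59:

```python
def to_clock(total_seconds):
    """Rewrite a count of seconds in base sixty: hours, minutes, seconds."""
    minutes, seconds = divmod(total_seconds, 60)  # how many sixties?
    hours, minutes = divmod(minutes, 60)          # how many sixty-sixties?
    return f"{hours}:{minutes:02d}:{seconds:02d}"

print(to_clock(3599))   # 0:59:59
print(to_clock(3600))   # 1:00:00 -- one more second forces a new place
```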

The decimal system is convenient for counting on fingers, but it creates the inconvenience of having to memorize large tables for addition and multiplication. With rote memorization and many hours of practice, children learn to recite combinations like 7 × 8 = 56 without working through the derivation that 7 groups of 8 is the same number as 5 groups of 10 plus 6.

For automatic computation, it is certainly possible to build machines that work with decimal numbers. The older-style mechanical odometer on a car worked by rotating a wheel numbered 0 to 9, and at the point where the wheel rolls over to 0, a peg advances a similar wheel to the left by one digit to indicate another group of ten has accumulated. That’s the “carry,” and all computing mechanisms require a way to make sure that when an operation produces a number too large to store in one place, the overflow is carried to the next higher place in the notational system. More complex mechanisms allow counting down as well as counting up, which can be repeated for addition and subtraction, and addition and subtraction can be repeated for multiplication and division. Each string on a Chinese abacus holds beads in positions to represent the numbers 0 to 9, where the operator manually manages propagating the carry from ones place to tens place to hundreds place, and so on.

Binary arithmetic uses the number 2 instead of the number 10 as its base. It thus requires only two symbols, 0 and 1, to record numbers. It takes more digits to write a given number, as can be seen simply by looking at the rather bulky-looking binary equivalents of the numbers 0 to 10:

Decimal    Binary
   0            0
   1            1
   2           10
   3           11
   4          100
   5          101
   6          110
   7          111
   8         1000
   9         1001
  10         1010
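The binary equivalents can be generated by repeated division by two: each remainder is the next binary digit, filled in from right to left. A short Python sketch:

```python
def to_binary(n):
    """Build the binary digits of n by repeated division by 2;
    each remainder becomes the next digit, right to left."""
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        n, r = divmod(n, 2)      # r is 0 or 1, the next binary digit
        digits = str(r) + digits
    return digits

for n in range(11):
    print(f"{n:2d} -> {to_binary(n)}")
```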
However, the binary system has at least one major advantage over the decimal system: the arithmetic tables are extremely small and simple. For addition, there are only four entries:

0 + 0 = 0

0 + 1 = 1

1 + 0 = 1

1 + 1 = 10

(where “10” is binary for the number 2, not decimal for ten).

For multiplication, the table is even simpler, since there is no need for a carry; it looks the same as it does in decimal:

0 × 0 = 0

0 × 1 = 0

1 × 0 = 0

1 × 1 = 1

The word “bit” is short for “binary digit,” a coinage that the information theorist Claude Shannon popularized (he credited his colleague John Tukey with suggesting it). It was a play on words because it also was the smallest bit of information that could be represented: yes or no, on or off, one or zero, true or false.

Making automatic devices that have two states is much simpler than making devices that have ten states. The two states could be a wire in an electric circuit being at one of two voltages, or a mechanical lever being in one of two positions, for example. The design problem for building an automatic multiplier changes from having to somehow mimic the entire ten-by-ten table one learned in grade school, to this:

If both inputs are 1, then the answer is 1. Otherwise, the answer is 0.

If the automatic computing device is mechanical, like the first Zuse computers, then this rule means something like letting a pin slip through two holes only if both are lined up on the right. If the device is electronic, like the Atanasoff-Berry Computer, then this rule means building a circuit that allows current to flow only if both switches in series are closed.

The early decimal machines, like the ENIAC and the Harvard Mark I, still used on-off states in their circuits but bundled them into subcircuits that represented the decimal digits 0 to 9. They were essentially electrical versions of the earlier mechanical calculators that used wheels to count in decimal. Both ENIAC and the Mark I were the size of a room, largely because of the inherent inefficiency of decimal representation, whereas the Zuse and Atanasoff designs were the size of a desk. It was not because the larger machines held more data; the ENIAC needed 18,000 vacuum tubes to compute with a maximum of twenty ten-decimal-digit numbers. The Atanasoff-Berry Computer needed only 600 vacuum tubes, yet it computed with a maximum of thirty fifteen-decimal-digit numbers—50 percent more numbers, each with 50 percent greater precision.

Because very few people can look at a representation like 0001110001000111 and grasp its numerical value, all computers that use binary arithmetic also provide a way of accepting human-readable decimal representations as input and converting their answers to decimal output. That conversion is a small price to pay for an overwhelming simplification of the design of the computing device, so computers that use decimal arithmetic internally have disappeared from the computing landscape in favor of the binary arithmetic approach used by Atanasoff and Zuse.
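That conversion takes only a couple of lines of Python, and the sixteen-bit pattern quoted above turns out to encode 7,239, the same number used in the decimal example earlier in this appendix:

```python
# Convert the pattern quoted above from binary to decimal and back.
pattern = "0001110001000111"
value = int(pattern, 2)           # read the string as a base-two number
print(value)                      # 7239
print(format(value, "016b"))      # 0001110001000111
```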

Appendix C | Electronic Switches

In a biological organism, a neuron might cause a muscle to contract or convey a sensation, but a neuron can also trigger other neurons. A brain is a collection of neurons, interconnected so the firing of one can cause or suppress the firing of many others. That capability is what permits organisms to exhibit such complex and intelligent behavior. For any artificial device to rival the computing abilities found in nature, it must similarly have control elements that trigger other control elements.

The control elements of early electronic computing progressed from relays to vacuum tubes (or valves, the British term) to transistors. Such devices, sometimes called “switching elements,” mimic the behavior of neurons in that they are switches that can control many other switches.

When something closes a switch to complete a circuit, the current that flows can operate another switch, either to open it or close it. The small amount of current needed to operate a switching element can result in a lot more current either flowing or not flowing, so a switching element can operate several other switching elements, not just one. That is why switching elements are often referred to as “amplifiers”: the power they control is larger than the power it takes to operate them. Switching elements let electronic devices perform binary arithmetic (see appendix B) because their on-off state can represent the binary digits 0 and 1.

Imagine a waterfall with a gate at the top that can dam up the flow of water. When the gate is open, water spills down with much more energy than that needed to operate the gate. That flow of water could operate mechanisms that open or close other gates of other waterfalls, creating quite a complicated series of events. If, say, water not flowing represents the binary digit 0 and water flowing represents a 1, then a set of waterfalls could represent numbers in binary. Changing the gates performs operations on those numbers. What electrical devices do is similar, but with a flow of electrons instead of a flow of water.

An everyday example of switches in combination is what electricians call a double throw switch. Most homes have at least one room with two entry points and separate wall switches by each entry that can operate the overhead light. Perhaps you have had the experience of entering a dark room at one door when someone else enters another door, and you both hit the wall switches at the same time, which keeps the room dark. The pair of wall switches performs the logical operation that computer scientists call an “exclusive OR,” because the light turns on if one or the other switch flips, but not when both flip.

An example of the logical “AND” operation is an appliance plugged into a power strip that has a switch. You have to flip on the power strip switch AND the appliance switch for the appliance to work. Switches, properly coupled, can represent logical operations, and logical operations in turn can mimic arithmetic operations on binary digits.

Suppose we want to build a circuit that takes two binary inputs a and b, which can be 0 or 1, and produces their sum c in binary. If we allow two digits in the result, the addition table looks like this:

a + b = c:

0 + 0 = 00

0 + 1 = 01

1 + 0 = 01

1 + 1 = 10

The right-hand digit of the sum c is the “exclusive OR” of the values of a and b. It has value 1 if a or b is 1, but not both. The left-hand digit is the “AND” of a and b. It is 1 only if a and b are both 1. So that says we could build a circuit where the inputs a and b flip switches to indicate their 0 or 1 value, and the two digits of c would immediately “light up” at the speed of electricity. Instead of operating lights, the flow of electricity in the result then operates yet other switches, so that the arithmetic can cascade to perform arithmetic on numbers with many (binary) digits.
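This single-digit adder is what computer designers call a half adder. A minimal Python sketch, using the bitwise AND and exclusive-OR operators in place of physical switches, reproduces the table above:

```python
def half_adder(a, b):
    """Add two binary digits: the carry (left-hand digit) is the AND
    of a and b; the sum digit (right-hand) is their exclusive OR."""
    carry = a & b   # 1 only if both inputs are 1
    digit = a ^ b   # 1 if exactly one input is 1
    return carry, digit

for a in (0, 1):
    for b in (0, 1):
        carry, digit = half_adder(a, b)
        print(f"{a} + {b} = {carry}{digit}")
```

Chaining such circuits, with each carry feeding the next digit's addition, is exactly the cascade described above.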

One type of switch that can be operated by electricity is called a “relay.” A relay is a simple device, one made from ordinary iron and wire. If you wind the wire around a piece of iron and run electricity through the wire, the iron becomes an electromagnet and remains that way until the electricity is off. That electromagnet can pull on something else made of iron such that it mechanically closes or opens a switch. Compared to mechanical switching elements, relays are quite fast, but they are electromechanical and not electronic. In the early days of telephone technology, telephone companies used relays to route calls, so relays were a mass-manufactured part even by the 1930s. Howard Aiken used them in his Mark I computer. Konrad Zuse used inexpensive mechanical linkages in his first designs to represent binary numbers but later used electromechanical relays.

Relays are inexpensive but not very uniform in their response; a group of relays attached to a single source of electricity doesn’t switch at the same time, which means a computer designer can only operate the system as fast as its slowest relay. What is worse is that relays are not very reliable, because on rare occasions the switch sticks in position after the electricity turns off. In a computer system that has many thousands of relays, the odds are good that at least one relay will fail.

An electronic switch moves only electrons, not masses of metal. The first device discovered that could accomplish this was the vacuum tube. The glowing tube in a neon sign has a low-pressure gas that conducts electricity between two electrodes, one at either end. If you create a nearly perfect vacuum, it is still possible to get electricity to flow, but the electrodes have to be closer together, and it helps to heat one of the electrodes to “boil” electrons out of it to jump across the vacuum. It is very much like the waterfall analogy, with electrons responding to the pull of electrical forces instead of water responding to the pull of gravity.

Like the waterfall analogy, it is possible to insert a “gate” that controls the flow. If the vacuum tube has a third electrode in the form of a screen placed between the other two, then its voltage can control how much current flows. Like the waterfall gate, closing a gate stops all flow; electrons and water will not flow “uphill” even when a huge downhill waits on the other side. Thus, a vacuum tube serves as a switching element. Because only electrons and not mechanical parts are moving, vacuum tubes can switch on and off in microseconds, and a vacuum tube is more reliable than a relay. They can still burn out, however, much like an incandescent lightbulb burns out.

In the early decades of electronic computing, vacuum tubes were by far the most costly components in a computer system. The invention of the transistor made it possible to replace vacuum tubes with small, solid-state devices. Today, the switching elements of computers are transistors that are “printed” (lithographed) by the billions onto a piece of silicon, so each transistor in an integrated circuit costs less than a millionth of a penny.

Appendix D | Differential Equations

Whereas everyone learns arithmetic, geometry, and some algebra in a general education, the next conceptual climb is a steep one that only those pursuing technical degrees usually undertake: calculus. Since much of the motivation for the early computers was to solve problems arising in calculus, this appendix gives an overview of the applications that give rise to calculus problems and explains why they are so difficult to solve using pencil-and-paper methods alone.

In elementary school, children learn the counting numbers 1, 2, 3, …, then fractions and decimals, then negative numbers and sets of numbers (like three-dimensional coordinates or statistical results). The concept that marks the transition to higher math is that calculus shows us how to manipulate not just numbers, but functions. A function is an operation performed on a set of numbers, like taking the cube root of x for all values of x between 3 and 7, or taking the cosine of x, adding 17, and then taking the square root of that whole expression. The actual value of x is not the focus, and neither is the numerical value that results from applying the function to any particular x. This can be disconcerting after a decade of math teachers demanding answers in which operations change numbers into other numbers. In calculus, the operations change functions into functions.

In the mid-1600s, two brilliant men independently invented the mathematics we now call calculus. Just as the question of who contributed most to the invention of the modern computer is the subject of argument, so is the question of who deserves the most credit for developing calculus. Isaac Newton in England and Gottfried Leibniz in Germany both made groundbreaking contributions but were not aware of each other’s work.

Newton developed calculus as a way to describe physics in mathematical terms. For example, if a mathematical expression describes the position of an object as a function of time, what is the mathematical expression that describes the speed of the object? Determining speed from position means taking the “derivative” of the position function, also called “differentiating” the function. Differentiation is a calculus operation that has its own collection of memorized rules and methods just as elementary arithmetic has rules for multiplying many-digit numbers. The inverse question, that of determining the position if you know the speed, means taking the “integral” of the speed function, or “integrating” the function. In general terms, differential calculus is used to find the rate at which things change, and integral calculus is used to find how things accumulate, like the area or volume of objects described by functions.

Ordinary Differential Equations (ODEs)

As in the example mentioned above, differentiating the position function gives the speed function. Differentiating the speed function gives the acceleration function. A situation that often arises in physics is that an equation relates the position, the speed, and the acceleration. Since that equation involves derivatives of a function (with respect to just one variable) as well as the function itself, it is an “ordinary” differential equation, or ODE.

Here are some examples of physical problems that are expressible as ODEs:

A rocket projectile accelerates as it burns fuel, and it also becomes lighter as the fuel burns, so the same thrust produces greater and greater acceleration. It slows with air resistance, some of which is proportional to the speed and some of which is proportional to the square of the speed. The physical laws lead to ODEs that mathematicians can express with just a few symbols but that are very difficult to solve. Such calculations were of great interest to the computer developers of the World War II era, when hitting a target with a missile was a challenge involving a lot of trial and error. The intimidating and blackboard-filling math for this problem may be the source of the expression “it’s not rocket science,” since rocket science of this sort really is difficult.

As an asteroid moves through the solar system, it accelerates under the gravitational forces of the sun, planets, and other masses. Thus, the second derivative of the position (its acceleration) relates to the position by an expression that sums all those forces, giving rise to an ODE. Solving that equation is of great interest if the question is whether the asteroid might strike the earth in the near future.

A pendulum, or a mass hanging on a spring, moves according to an ODE. The more the mass moves away from equilibrium, the greater the acceleration in the opposite direction. This situation arises so often in physics that the ODE for it has a name: the harmonic oscillator equation.
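As a preview of how a computer attacks such an equation, a short Python sketch steps the harmonic oscillator forward through time in small increments. The stiffness, time step, and starting displacement are assumed values for illustration, not taken from the text:

```python
import math

# Discrete time stepping for the harmonic oscillator: the acceleration
# is proportional to the displacement, in the opposite direction.
k_over_m = 1.0                  # spring stiffness over mass (assumed)
dt = 0.001                      # a small time step
steps = round(2 * math.pi / dt) # enough steps for one full period
x, v = 1.0, 0.0                 # start displaced by 1 unit, at rest

for _ in range(steps):
    a = -k_over_m * x           # the differential equation, sampled now
    v += a * dt                 # speed changes by acceleration times dt
    x += v * dt                 # position changes by speed times dt

print(round(x, 2))              # close to 1.0: one period brings it back
```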

Partial Differential Equations (PDEs)

Problems in physics are rarely so simple that they involve only a single equation involving one function that depends on one variable (like time). Consider the complexity of the airflow around an airplane wing, for example. Pressure, air speed, air density, and temperature all are functions of the position (in three dimensions) and the time, and those are just a few of the things that enter the equations that determine the lift and drag forces on the wing. Fundamental laws of physics say that the total energy, momentum, and matter cannot change with time. Each of those quantities is expressed with derivatives, and the fact that they are conserved creates a system of three PDEs that must be solved simultaneously. PDEs involve differentiation with respect to more than one variable, like both the time and the x direction.

One type of PDE problem is to find the steady state of a system. Suppose a room has a heater in one corner and an open window on the other side; what is the temperature at every point in the room? Mathematically, the problem is a PDE that involves differentiating the temperature in each of the three spatial dimensions. The temperature in the room at any point is the average of the temperatures in the immediate neighborhood of the point, except where the heater or window forces it to be a particular temperature. Another steady-state PDE problem is that of finding the shape of a trampoline when standing still somewhere on the mat. The depression of the trampoline at any point is the average of the depressions immediately around the point, except for the frame and under the feet of the person standing on it. For this kind of PDE, there is no need to consider time as a variable.
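The averaging rule described above can be sketched directly in Python on a small grid. The five-by-five size and the boundary values (a 100-degree heated edge, 0 degrees elsewhere) are assumed for illustration; repeatedly replacing each interior point by the average of its neighbors settles toward the steady state:

```python
# Steady-state temperature on a 5-by-5 grid: hold the boundary fixed
# and repeatedly average each interior point with its four neighbors.
n = 5
grid = [[0.0] * n for _ in range(n)]
for j in range(n):
    grid[0][j] = 100.0          # the heated edge (assumed boundary value)

for _ in range(500):            # iterate until the values settle
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = (grid[i - 1][j] + grid[i + 1][j] +
                         grid[i][j - 1] + grid[i][j + 1]) / 4.0
    grid = new

for row in grid:
    print(" ".join(f"{t:6.1f}" for t in row))
```

Points near the heated edge end up warmest, and the center of the grid settles at exactly one quarter of the edge temperature for this boundary.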

The other type of PDE involves time. Time-dependent PDEs can formulate how a physical situation evolves. In striking a note on a piano, for instance, a hammer hits the string, which causes complex sideways motion of the string as a function of both the time and the position along the length of the string. That problem is a close cousin to the harmonic oscillator problem described above for ODEs, except that both time and position are variables.

PDEs arise in many technical areas, not just physics. They can describe how populations of species grow and decline in an ecosystem; what the climate will be a century from now; how atoms bond to form molecules; how to design a suspension bridge to be strong with minimum materials; and how galaxies evolve over millions of years. They even find use in determining the best price for financial instruments, like put and call options. Economists use PDEs in macroeconomic theory. A famous remark by the physicist Max Planck was that in his youth he considered going into economics but decided its mathematics was too difficult, so he went into physics instead.

Computers for Differential Equations

Differential equations are easy to express but usually fiendishly difficult to solve. At the beginning of the twentieth century, a handful of simplified examples were all that mathematicians could point to as amenable to pencil-and-paper analysis. The analytical solutions might work if the problem geometry was a sphere or a square plate or other idealized shape, but there was little hope of finding a solution, say, to the PDE that expresses the mechanical stresses on something in the shape of a wrench.

The approach that works with broad generality is to pick so many sample points in the function that the problem becomes one of working with lists of ordinary numbers, not functions. Using sample points gives the “discrete form” of differential equations, which are sometimes called difference equations because simple subtraction suffices to estimate the rate of change with respect to increments of time and space. For the piano string example, imagine that instead of a string, there are point masses evenly distributed along the length of the string that follow the simple rules of how masses behave when tugged on. This eliminates the calculus and gets us back to elementary arithmetic, but with a catch: to use enough points that the sampling is accurate requires a very large number of additions, multiplications, subtractions, and divisions. The sampling approach, or “numerical analysis” method, seems to offer the possibility of solving just about any problem that can be expressed as an ODE or PDE, but it begs for a way to do all that arithmetic at speeds far beyond what a human can do.

In the trampoline example, suppose the trampoline is square and sampled with a five-by-five grid. The amount the trampoline depresses at each grid point is approximately the average of the depression of the points around it. That leads to a set of twenty-five linear equations in twenty-five unknowns. Solving that type of system is what Atanasoff had in mind in designing his computer, since the total work to solve twenty-five equations is more than ten thousand calculations, intractable even with Marchant or Monroe desktop calculators. Atanasoff’s design could solve such a system in about fifteen hours.

The ENIAC design suggests that ODEs were its main target, and missile trajectory calculation is the most commonly cited application for that computer. The ENIAC could store only twenty variables, but it could apply changes to them very quickly to simulate how a physical system might evolve through time, with the time sampled in discrete steps. Thus, the ODEs that describe a missile become a repeated procedure: at time 0, the missile is at a given position on the ground and experiences a given thrust. Arithmetic says how it accelerates from one instant to the next, taking into account that its mass decreases as it burns fuel and that gravity pulls it into a curved path back toward the ground. At time 0.01 second later, the velocity and thrust and position and mass are sampled again and used to compute what it will do at time 0.02 second, and so on. Since the ENIAC could do thousands of calculations per second on its small set of variables, the calculation of the complete flight path of the missile could finish in reasonable time.
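That repeated procedure can be sketched in Python. The gravity, drag, and launch numbers here are assumed values for illustration (and the simplifications are loud ones: constant mass, no thrust after launch, drag merely proportional to speed); real ballistics tables were far more elaborate:

```python
# Time-stepping a projectile: sample the state, apply the physical
# laws as plain arithmetic, advance by a small time step, repeat.
dt = 0.01                  # time step in seconds
g = 9.8                    # gravity, m/s^2
drag = 0.01                # drag coefficient per second (assumed)
x, y = 0.0, 0.0            # position, starting on the ground
vx, vy = 100.0, 100.0      # launch velocity, m/s (assumed)

t = 0.0
while y >= 0.0:
    ax = -drag * vx        # air resistance slows horizontal motion
    ay = -g - drag * vy    # gravity plus air resistance
    vx += ax * dt          # update the speeds from the accelerations
    vy += ay * dt
    x += vx * dt           # update the position from the new speeds
    y += vy * dt
    t += dt

print(round(t, 2), round(x, 1))   # time of flight and range
```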

The progress in computer technology in the last seventy years has increased speed and storage by a factor of more than a trillion. This allows us to obtain close, high-resolution approximations of the solutions to a vast range of differential equations, not just the handful that can be solved with pencil-and-paper analytical methods.

* Units of area have been translated into English units, for readability.
