
On building systems that will fail

source link: https://fermatslibrary.com/s/on-building-systems-that-will-fail

To Fernando J. Corbató for his work in organizing the concepts and leading the development of the general-purpose large-scale time-sharing and resource-sharing computer systems CTSS and MULTICS


FERNANDO J. CORBATÓ


It is an honor and a pleasure to accept the Alan Turing Award. My own work has been on computer systems, and that will be my theme.

The essence of systems is that they are integrating efforts, requiring broad knowledge of the problem area to be addressed, and the detailed knowledge required is rarely held by one person. Thus the work of systems is usually done by teams. Hence I am accepting this award on behalf of the many with whom I have worked as much as for myself. It is not practical to name all the individuals who contributed. Nevertheless, I would like to give special mention to Marjorie Daggett and Bob Daley for their parts in the birth of CTSS and to Bob Fano and the late Ted Glaser for their critical contributions to the development of the Multics System.

Let me turn now to the title of this talk: "On Building Systems That Will Fail." Of course the title I chose was a teaser. I considered and discarded some alternate titles: "On Building Messy Systems," but it seemed too frivolous and suggests there is no systematic approach. "On Mastering System Complexity" sounded like I have all the answers. The title that came closest, "On Building Systems that are likely to have Failures" did not have the nuance of inevitability that I wanted to suggest.

What I am really trying to address is the class of systems that for want of a better phrase, I will call "ambitious systems." It almost goes without saying that ambitious systems never quite work as expected. Things usually go wrong--sometimes in dramatic ways. And this leads me to my main thesis, namely, that the question to ask when designing such systems is not: "if something will go wrong, but when it will go wrong?"

Some Examples

Now, ambitious systems that fail are really much more common than we may realize. In fact in some circumstances we strive for them, revelling in the excitement of the unexpected. For example, let me remind you of our national sport of football. The whole object of the game is for each team to play at the limit of its abilities. Besides the sheer physical skill required, one has the strategic intricacies, the ability to audibilize, and the quickness to react to the unexpected--all a deep part of the game. Of course, occasionally one team approaches perfection, all the plays work, and the game becomes dull.

Another example of a system that is too ambitious for perfection is military warfare. The same elements are there with opposing sides having to constantly improvise and deal with the unexpected. In fact we get from the military that wonderful acronym, SNAFU, which is politely translated as "situation normal, all fouled up." And if any of you are still doubtful, consider how rapidly the phrases "precision bombing" and "surgical strikes" are replaced by "the fog of war" and "casualties from friendly fire" as soon as hostilities begin.

On a somewhat more whimsical note, let me offer driving in Boston as an example of systems that will fail. Automobile traffic is an excellent case of distributed control with a common set of protocols called traffic regulations. The Boston area is notorious for the free interpretations drivers make of these pesky regulations, and perhaps the epitome of it occurs in the arena of the traffic rotary. A case can be made for rotaries. They are efficient. There is no need to wait for sluggish traffic signals. They are direct. And they offer great opportunities for creative improvisation, thereby adding zest to the sport of driving.

One of the most effective strategies is for a driver approaching a rotary to rigidly fix his or her head, staring forward, of course, secretly using peripheral vision to the limit. It is even more effective if the driver, on entering the rotary, speeds up, and some drivers embellish this last step by adopting a look of maniacal glee. The effect is, of course, one of intimidation, and a pecking order quickly develops.

The only reason there are not more accidents is that most drivers have a second component to the strategy, namely, they assume everyone else may be crazy--they are often correct--and every driver is really prepared to stop with inches to spare. Again we see an example of a system where ambitious tactics and prudent caution lead to an effective solution.

So far, the examples I have given may suggest that failures of ambitious systems come from the human element and that at least the technical parts of the system can be built correctly. In particular, turning to computer systems, it is only a matter of getting the code debugged. Some assume rigorous testing will do the job. Some put their hopes in proving program correctness. But unfortunately, there are many cases for which none of these techniques will always work [1]. Let me offer a modest example illustrated in Figure 1.

Consider the case of an elaborate numerical calculation with a variable, f, representing some physical value, being calculated for a set of points over a range of a parameter, t. Now the property of physical variables is that they normally do not exhibit abrupt changes or discontinuities.

So what has happened here? If we look at the expression for f, we see it is the result of a constant, k, added to the product of two other functions, g and h. Looking further, we see that the function g has a behavior that is exponentially increasing with t. The function h, on the other hand, is exponentially decreasing with t. The resultant product of g and h is almost constant with increasing t until an abrupt jump occurs and the curve for f goes flat.

What has gone wrong? The answer is that there has been floating-point underflow at the critical point in the curve, i.e., the representation of the negative exponent has exceeded the field size in the floating-point representation for this particular computer, and the hardware has automatically set the value for the function h to zero. Often this is reasonable since small numbers are correctly approximated by zero--but not in this case, where our results are grossly wrong. Worse yet, since the computation of f might be internal, it is easy to imagine that the failure shown here would not be noticed.

Figure 1. A Subtle Bug: plot of f against t, where $f(t) = k + g(t) \cdot h(t)$, $g(t) \sim \exp(at)$ $(a > 0)$, and $h(t) \sim \exp(-bt)$ $(b > 0)$.

Figure: Why Mishaps? Performance in MIPS (logarithmic axis, 0.1 to 100) against Year (1950 to 1990).

COMMUNICATIONS OF THE ACM/September 1991/Vol.34, No.9

Because correctly handling the pathology that this example represents is an extra engineering bother, it should not be surprising that the problem of underflow is frequently ignored. But the larger lesson to be learned from this example is that subtle mistakes are very difficult to avoid and to some extent are inevitable.
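
The underflow described for Figure 1 can be reproduced on a present-day machine. The following Python sketch is only an illustration under stated assumptions: the rates A and B, the constant K, and the prefactor LOG_G0 are invented values, chosen so that IEEE 754 double precision (not the machine of the original example) shows the effect, and f_stable is one possible remedy rather than anything prescribed in this talk.

```python
import math

# All constants below are assumptions made for illustration only.
A = 1.0                    # growth rate of g
B = 1.0                    # decay rate of h
K = 1e-20                  # the constant k
LOG_G0 = math.log(1e-20)   # log of a prefactor for g, so that g*h is about 1e-20


def g(t):
    """Exponentially increasing factor, g(t) ~ exp(A*t)."""
    return math.exp(A * t + LOG_G0)


def h(t):
    """Exponentially decreasing factor, h(t) ~ exp(-B*t).

    For doubles, math.exp silently returns 0.0 once its argument
    drops below roughly -745.
    """
    return math.exp(-B * t)


def f(t):
    """Naive evaluation: the product collapses to 0 when h underflows."""
    return K + g(t) * h(t)


def f_stable(t):
    """Same formula with the exponents combined before exponentiating."""
    return K + math.exp((A - B) * t + LOG_G0)


if __name__ == "__main__":
    for t in (600.0, 700.0, 750.0, 760.0):
        print(t, h(t), f(t), f_stable(t))
```

With these values the naive f stays near 2e-20 up to t = 700 and then drops to exactly K once h(t) becomes zero, with no error or warning raised, while f_stable keeps the expected value. Close to the underflow point h is also represented with reduced precision, so the naive curve can degrade slightly before it jumps.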

I encountered my next example when I was a graduate student programming on the pioneering Whirlwind computer. One night while awaiting my turn to use it, the graduate student before me began complaining of how "tough" some of his calculations were. He said he was computing the vibrational frequencies of a particular wing structure for a series of cases. In fact, his equations were cubics, and he was using the iterative Newton-Raphson method. For reasons he did not understand, his method was finding one of the roots, but not "converging" for the others. He was trying to fix this situation by changing his program so that when he encountered one of these tough roots, the program would abandon the iteration

