74

The printer that wouldn't print: Fixing an IBM 1401 mainframe from the 1...

 5 years ago
source link: https://www.tuicool.com/articles/hit/YNFf2mZ
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

The Computer History Museum has two operational IBM 1401 computers used for demos , but a few weeks ago one computer suddenly couldn't print anything. I helped track down the problem, but it was more tricky than we expected; along the way we had to investigate the printer error checking circuits, the print buffer, and even low level core memory signals. This blog post discusses our investigation and how we traced the problem to a failed germanium transistor.

7RruUbR.jpg!web

The IBM 1401 mainframe computer (left) at the Computer History Museum printing theMandelbrot fractal on the 1403 printer (right).

The IBM 1401 computer was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. The 1401 leased for $2500 a month (about $20,000 in current dollars), a low price that let even medium-sized businesses use the 1401 for payroll, accounting, invoicing, and many other tasks. The IBM 1401 computer was constructed from small circuit boards (called SMS cards) plugged into units called "gates"—these are gates in the sense of something that swings open, not logic gates. The photo below shows the 1401 with one of the gates open, revealing dozens of brown SMS cards plugged into the gate.

VJVFRnV.jpg!web

The IBM 1401 computer, with one of the gates opened, showing the dozens of circuit boards (SMS cards) in each gate. The fan on the front of the gate keeps the cards cool.

One key selling point of the IBM 1401 was its high-speed line printer (the IBM 1403), which could hammer out 10 lines per second, four times as fast as competing printers. The 1403 printer had excellent print quality, said to be the best printing until laser printers were introduced in the 1970s.IBM claims that "Even today, it remains the standard of quality for high-speed impact printing."

NJzimmb.jpg!web

Closeup of the type chain (upside down) for an IBM 1403 line printer.

The 1403 printer used a chain of type slugs (above) that rotated at high speed above the paper, with an inked ribbon between the paper and the chain. Each of the 132 print columns had a hammer and an electromagnet. At the right moment, when the desired character passed the hammer, the electromagnet drove the hammer against the back of the paper, causing the paper and ribbon to hit the type slug, printing the character.

uqYbqqB.png!web

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual , p11.

Unfortunately, the printer at the Computer History Museum recently had a problem: whenever a line was printed, the computer would halt due to a "print check" error. Fortunately the museum has a team of volunteers to help keep the system running; people helping with this printer problem included Ron Williams, Frank King, Marc Verdiell, Carl Claunch, Michael Marineau, Robert Garner and Alexey Toptygin. By the time I arrived to help, Ron had written a simple test program that repeatedly attempted to print a line; he toggled the program into the computer by hand, and he disabled the error check. The printer printed the characters properly, so we suspected the problem was in the error reporting circuitry inside the computer. Our strategy was to find the error signal and then trace it back through the computer to determine why it was being generated.

We started by examining the latch circuit that holds the print check error condition and sends it to the rest of the computer. To find the circuit, we consulted the documentation: binders of cryptic computer-generated wiring diagrams, called Automated Logic Diagrams (ALD). A small piece of an ALD is shown below showing the print check latch (PR CHK LAT). Each box on the ALD corresponds to a circuit on an SMS board and the lines show how the boards are wired together. Deciphering the text inside the box on the right indicates a board of type2JMX implementing a "2+AO" function, which in modern terms is AND-OR-Invert . The text in each box also indicates the location of the card: its gate (physical swing-out gate, not logic gate), gate 01A6 in this case, and the card's position in the gate (F10). Thus, to check the output (labeled H) of the latch with the oscilloscope, we swung out gate 01A6, found card F10, and hooked the oscilloscope to pin H. We found pin H went low (error) when pins F and G went high, which was the proper behavior for the latch. Pin G (PR CK SAMPLE) was essentially a clock to sample the error state, while pin F was the error signal itself. Our next task was to determine what was triggering the error signal on pin F.

nEBzIb7.png!web

Excerpt of an Automated Logic Diagram (ALD) for the IBM 1401, showing the print check latch (PRT CHK LAT). This page is denoted 36.37.21.2.

The documentation also includes logic diagrams that show the circuitry at a logical level, which is slightly easier to understand than the physical connections on the ALD diagrams. The logic diagram below shows the printer error circuitry. At the right, the print check error signal (PRT CHK ERROR) comes out of the latch (PR CHK LAT) that holds the error signal. (This is the same latch as in the ALD diagram above, and you can match up the signal names.) To the left of the latch, several different error conditions are detected and combined to form the error signal fed into the latch. (Note that IBM's logic symbols didn't match standard symbols. The semicircle is an OR gate, not an AND gate. The triangle is an AND gate. An "I" in a box is an inverter.)

QJ7riiB.png!web

Logic diagram of the error checking logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

Several different conditions can trigger a print check errorand we thought the "hammer fire" check was a likely candidate. Recall that the printer uses 132 hammers, one per column, to print a line of characters. To make sure the hammers are operating correctly, the computer has two special planes in core memory. (The 1401 contains 4,000 characters of core memory; each bit of memory is a tiny ferrite ring that is magnetized one way to store a "1" and the other way for a "0". A grid of 4000 cores forms a plane, storing a 1-bit slice of memory. Multiple planes are stacked up to form the storage unit.) Each time the computer decides to fire a hammer, it records this in core memory in the "equal check" plane. When a hammer actually fires, the current pulse from the electromagnet stores a bit in the "hammer-fire" plane.Each print scan cycle, the computer compares the two core planes to see if a hammer was fired when it wasn't supposed to, or if a hammer failed to fire when it should have; a mismatch triggers the "hammer fire" check error.

F7F3E3b.jpg!web

Closeup of the hammer electromagnets in the IBM 1403 printer. An electromagnet (when energized through its pair of wires) pulls a metal armature, which drives the hammer, paper and ribbon against the type slug. There are 132 hammers, one for each column, arranged in two rows of 66.

After some difficulty, we determined that the problem wasn't the hammer fire check, but a different check: "print line complete" (PLC). This check ensures that for each line, either exactly one character was printed in each column or the column was blank. This check uses a third special core plane, the "print line complete" plane. Each time a character is printed in a column, the corresponding bit is set. (For a blank or unprintable character, a separate circuit sets the column's bit.) At the end of the line (during scan 49), the print line complete cores are checked; if any core is zero, the printer failed to print that column and an error is reported. (You can see the PLC CHECK signal and the logic that generates it on the earlier logic diagram.)

Oscilloscope probing (below) showed that the PLC CHECK (yellow) was triggered because the system thought a second character was being printed in the same column. The cyan signal is the (inverted) PLC bit from core (PR LINE COMP LATCH); each low pulse indicates a character has been printed in that column. The pink pulse (PRINT COMPARE) indicates a new character is being printed. The problem is that the cyan and pink signals go low at the same time, indicating both an existing character and a new character in the column. This generates the extra blue pulse (PLC CHECK), which triggers the yellow pulse (PRINT CHK ERROR from the latch). (This circuit can be seen in the earlier logic diagram, labeled "Trying to print position twice".)

FzuAVbZ.png!web

Oscilloscope trace from debugging the IBM 1401's printer.

Several things could cause the system to think two characters were being printed in the column. Looking at the printer's output we saw that it printed just the expected character on the paper, so the circuit to print a character seemed to be working correctly (PRINT COMPARE, the single pink pulse above), We tested the blank / unprintable circuit and it was detecting blank and non-blank columns correctly. So the most likely problem was reading a 1 from core memory (the cyan line above, PR LINE COMP LATCH) when it should be a 0. But was the problem the wrong value going in to core, or the wrong value coming out?

The logic diagram below shows the circuit that writes to the Print Line Compare core memory. At the right, PR LINE COMP INH is the (inverted) signal written to core.On scan 49 (the error-checking print cycle after printing all 48 characters), this line is set high, clearing the memory. If a character is being printed, the PRINT COMPARE EQUAL signal will set the core. At the left, logic gates detect a blank or unprintable character. And if a 1 bit was already in core (PR LINE COMP LATCH), the 1 bit is rewritten to core.

BZbeyye.png!web

Logic diagram of the print line complete logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

We detected that this circuit was writing erroneous 1 bits to core because it was reading erroneous 1 bits from core. But that put us in a circle, not knowing if the initial problem was the read or the write. To resolve this, we triggered the oscilloscope on print scan 49, which is when the PLC bits get cleared, and then looked at the next print scan, which reads the cleared bits back. We saw 0's being written (i.e. PR LINE COMP INH high), but unexpectedly saw 1's coming back (PR LINE COMP LATCH). So we knew something was going wrong at a low level in the core memory.

I should mention that in the base 1401 system, the printer check bits were stored in the main core memory module, but our system used a separate "print storage" core memory for improved performance. The performance issue is due to how the printer uses core memory: each time a hammer lines up with a type slug, the computer reads the corresponding character from core memory and fires the hammer if the character in storage matches the character under the hammer. Since core memory is constantly in use while printing a line, the computer can't do any computation while printing. The solution was the print storage feature: an additional 132-address core memory that functioned as a print buffer.With print storage, a line to be printed was first rapidly copied from the main core memory to the print storage core memory. Then the computer could continue doing computation using the main core memory while the print circuitry read from the print storage core memory. Each option on the IBM 1401 had a monthly charge; IBM charged an extra $386 a month for the print storage feature.

NRrQ7rj.jpg!web

This print storage gate has the circuitry to drive the printer buffer core memory. The core memory unit in the upper right has bundles of yellow wires attached.

The photo above shows the gate that implements the print storage feature. The core memory module is the block on the upper right with yellow wires attached. (Individual cores can be seen in the photo below.) Core memory requires a lot of supporting circuitry. To select an address, driver cards generate X and Y signals. To write a core, the inhibit signal is combined with the clock by a gate, and then a driver card generates a current pulse on the inhibit line that passes through all the cores in the plane.When a core is read, it induces a pulse on a sense wire. This pulse is amplified by a sense amplifier card, and then the bit is stored in a latch. The numerous SMS cards in the print storage gate provided these support functions.

AJBfqem.jpg!web

The cores inside the print buffer. The wiring is not the usual core memory grid because each printer hammer is wired directly to a hammer check core. The image quality is bad because of the plastic cover over the cores.

We probed the sense amplifier and latch cards on the reading side of the core memory and they seemed to be operating correctly, so we moved to the writing side. TheHN inhibit driver card seemed a candidate for failure since it operates at high current, but we swapped the card with a replacement and the printer still failed. Next, I tried looking at the input to that card, but found there was no signal on that line, which seemed very suspicious.

FVz2iqr.png!web

Oscilloscope of the bad "CHWW" NAND gate card: pink (3) and blue (4) are inputs, cyan (2) is the output, stuck high.

The missing signal was generated by a card of typeCHWW, a NAND gate that combines the inhibit signal with the clock before sending it to the driver card. I hooked up the oscilloscope to the inputs and output of the NAND gate, yielding the trace above. This trace was the smoking gun: the output (cyan 2) remained high even when the two inputs (pink 3 and blue 4) went high. This showed that the NAND gate had failed and its output was stuck high. This explained everything: with this output stuck high, only 1's would be written to the PLC core plane. Then, when a character was printed, the print circuitry would read the 1 from core, think a character had already been printed in this column, the PLC check would fail, and the print check error would be triggered.

memQRnN.jpg!web

The printer successfully operating, printing out powers of 2.

We swapped this card with a spare, and the printer started printing without any errors (above). This proved that we had finally traced the problem; it was a simple NAND gate in the depths of the printer buffer core memory circuit. The failed card is shown below. It implements three NAND gates (details) using diode-transistor logic (which IBM calls CDTL—Complemented Transistor Diode Logic). Each two-input gate uses one germanium transistor (circular metal can) and two diodes (striped glass components on the right). Pull up resistors (striped) and inductors (beige) on the left complete the circuits.

FZV3aeq.jpg!web

The failed CHWW card from the IBM 1401. This card implements three NAND gates. The lower left transistor failed, and has been replaced.

I tested the card with a signal generator and found that while two of the three NAND gates worked, the other was stuck at a high output, confirming what we saw inside the 1401. Next I tested the transistors using the diode test mode on a multimeter. The good transistors had voltage drops of 0.23V. (This may seem low, but remember that these are germanium transistors not silicon transistors.) In comparison, the bad transistor had a Vbe drop of 0.95V, much higher. Finally, we removed the transistors and checked them on a vintage Tektronix 577 curve tracer . We thought the bad transistor might just be too weak to operate the gate, but it was entirely dead—totally flatlined on the curve tracer.

We opened up the transistor on a lathe and looked inside. The transistor is an IBM 083 NPN germanium alloy transistor (germanium was used before silicon transistors). The transistor consists of a tiny germanium die (the shiny metallic square below), forming the base. Two wires are attached for the emitter and collector, connected to dots of tin alloy, a larger dot on the front for the collector and a smaller dot on the back for the emitter. Under the microscope, it looked like there was some corrosion on the alloy dots and the emitter wire didn't look solidly connected, so we suspect that is the root cause of the failure.

6JJbAjI.jpg!web

Inside a failed IBM 083 germanium transistor. The silver-colored square in the middle is the germanium die, wired to the base pin. The dot in the middle is tin alloy, forming the collector, with a wire to the collector pin on the left. A smaller dot on the other side of the germanium die forms the emitter, wired to the pin on the right.

Conclusions

This was a harder problem to diagnose than most of the IBM 1401 issues. But we managed to track down the problem, replace the bad card, and get the printer back in operation. One nice thing about the IBM 1401 compared to modern systems is that it's not a black box—you can look inside all the circuitry, down to the individual transistors. In this case, we were able to find the bad transistor that was causing the system failure, and even determine that it was probably corrosion that killed the transistor.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have anRSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out ( schedule ).

Notes and references


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK