From Stanford CSD History
The S-1 project built a family of multiprocessor supercomputers. The project was envisioned by Dr. Lowell Wood at the Lawrence Livermore National Lab in 1975 and staffed for the first three years by two Stanford University Computer Science graduate students, Tom McWilliams and Curt Widdoes.
That two graduate students could design and almost completely build a supercomputer by themselves is an amazing feat, comparable to the design and building of the CDC 6600 by Seymour Cray and a small staff a dozen years earlier. However, McWilliams and Widdoes are even better known for the major advances in CAD tools for logic design that they developed as part of the early days of the project and for the startup company they founded, Valid Logic Systems. In this respect the S-1 project was similar to the Super Foonly project.
“What we did was the first practical use of structured design for designing a real computer—not a toy research project in a university, but a large computer that really worked.” — Curt Widdoes
The project ramped up in 1978 with the addition of more students, including Mike Farmwald and Jeff Rubin, and again in 1979. Dr. Carl Haussman provided the day-to-day oversight as the project team grew in size.
Five generations of S-1 processors were planned, and two MSI/ECL generations were built. The project independently invented two-bit branch prediction, directory-based cache coherency, and multiprocessor synchronization using load-linked and store-conditional instructions. The project also influenced the development of programming languages and compilers, including Common LISP and gcc.
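For readers unfamiliar with two-bit branch prediction, the following is a minimal sketch of the textbook scheme: a table of two-bit saturating counters indexed by branch address. It is not a reconstruction of the S-1 hardware; the class name `TwoBitPredictor`, the table size, the initial counter state, and the modulo indexing are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of two-bit saturating counters, indexed by branch address.

    Illustrative only: table size, initial state, and indexing are
    assumptions, not details of the S-1 design.
    """

    def __init__(self, table_size=1024):
        self.table_size = table_size
        # Counter states 0 and 1 predict not-taken; 2 and 3 predict taken.
        self.counters = [1] * table_size

    def predict(self, pc):
        """Return True if the branch at address pc is predicted taken."""
        return self.counters[pc % self.table_size] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = pc % self.table_size
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Replay a sequence of branch outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch that is taken nine times and then falls through, run twice.
loop_outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(TwoBitPredictor(), pc=0x40, outcomes=loop_outcomes)
```

The point of the second bit is visible on the second pass through the loop: a single not-taken outcome at the loop exit moves the counter from 3 to 2, which still predicts taken, so the next entry into the loop is predicted correctly.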
Support for the S-1 project came initially from the Fannie and John Hertz foundation, and later from the U.S. Navy, through the efforts of Dr. Lowell Wood and Dr. Edward Teller.
Dr. Lowell Wood, a physicist at Lawrence Livermore National Laboratory (LLNL) and protégé of Dr. Edward Teller, led the special studies group at LLNL, which was called the O-Group. The O-Group members had many interests, but their work mainly revolved around ideas for a national missile defense. Wood was also an interviewer for the Hertz Foundation, which awarded prestigious scholarships to graduate students interested in the applied sciences. From this position, Wood could occasionally recruit top students to work at the lab.
In the summer of 1975, two Hertz Foundation scholarship recipients, Tom McWilliams and L. Curtis Widdoes, enrolled in the Ph.D. program in computer science at Stanford and came to work with Dr. Wood at LLNL. Wood challenged them to design and build a supercomputer. In fact, Wood envisioned a family of multiprocessor supercomputers, with each having nodes with compute power similar to contemporary commercial supercomputers. The plan was to build five generations of processors with the same general architecture and to develop computer-aided logic design tools that would ease the task of re-implementing the processors in each new logic technology family. The fifth generation was planned to use wafer-scale integration (WSI).
In the fall of 1975 McWilliams and Widdoes developed the instruction set and multiprocessor structure. The computer was to have a 36-bit word length. The 30-bit address counted 9-bit quarterwords, and instructions were 36 bits in length. For more details, see the S-1 Architecture Manual.
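The addressing arithmetic above can be made concrete with a short sketch. It assumes, as the figures imply, that four 9-bit quarterwords make up one 36-bit word and that quarterwords are numbered consecutively within a word; the function name `split_quarterword_address` is hypothetical and does not come from the S-1 Architecture Manual.

```python
from typing import Tuple

WORD_BITS = 36          # word length given in the text
QUARTERWORD_BITS = 9    # addressable unit given in the text
ADDRESS_BITS = 30       # address width given in the text

# Assumption: a word holds exactly four consecutive quarterwords.
QUARTERWORDS_PER_WORD = WORD_BITS // QUARTERWORD_BITS

ADDRESSABLE_QUARTERWORDS = 1 << ADDRESS_BITS
ADDRESSABLE_WORDS = ADDRESSABLE_QUARTERWORDS // QUARTERWORDS_PER_WORD


def split_quarterword_address(addr: int) -> Tuple[int, int]:
    """Split a quarterword address into (word index, quarterword within word)."""
    if not 0 <= addr < ADDRESSABLE_QUARTERWORDS:
        raise ValueError("address does not fit in 30 bits")
    return divmod(addr, QUARTERWORDS_PER_WORD)
```

Under these assumptions a 30-bit quarterword address spans 2^30 quarterwords, or 2^28 (about 268 million) 36-bit words, and a 36-bit instruction occupies four consecutive quarterword addresses.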
In the spring of 1976 McWilliams and Widdoes used the Stanford University Drawing System (SUDS) on the Stanford Artificial Intelligence Laboratory (SAIL) PDP-10 system to start drawing logic diagrams and developed the Structured Computer-Aided Logic Design (SCALD) language to describe the design hierarchically. SCALD consisted of the following components:
- SUDS was used to input and edit drawings.
- macro expander: 8,000 lines of Pascal initially, written by Tom McWilliams.
- router and wire lister: 12,000 lines of Pascal initially, written by Curt Widdoes.
Although SUDS was written in PDP-10 assembly language, the other SCALD software was written in Pascal so that it could be run on the IBM System/370 model 168 at the Stanford Linear Accelerator Center (SLAC). SCALD ended up with about 30,000 lines of Pascal code.
In the Summer the wire lister was determined to be too demanding for a summer research associate to complete.
In the Fall McWilliams and Widdoes began implementation of SCALD I on the IBM System/370 model 168 at the Stanford Linear Accelerator Center. The purpose of the SCALD Design System was to automate the conversion from a graphical, hierarchical block diagram to the low-level instructions for the wire-wrap machine. The development and debug cycle started with borrowed SUDS time on the PDP-10 at SAIL in the morning, courtesy of John McCarthy. When funded users arrived to use the PDP-10, the data was moved by magnetic tape to the IBM System/370 model 168 at SLAC for back-end processing. Those results were marked up by hand for editing the next morning on SUDS. Time on the SLAC computer was also borrowed, courtesy of Forest Baskett.
In September, Thomas McWilliams, Lawrence C. Widdoes Jr. and Lowell Wood prepare The Preliminary Design of an Advanced Programmable Digital Filter Network for Large Passive Acoustic ASW Systems for The Naval Systems Division, Office of Naval Research. The report describes the LLL Programmable Digital Filter, which was later known as the S-1.
- In the Spring the bulk of SCALD I was complete.
- Summer saw the completion of the physical design subsystem, including simulation of signal waveforms.
- By Fall the final S-1 Mark I wire list was ready.
- In the Spring additional personnel join the project, including Mike Farmwald and Jeff Rubin, for debugging Mark I and for the multiprocessor operating system, known as Amber. Farmwald was also on a Hertz Foundation fellowship.
- In Summer the Mark I ran its first significant program and a single-user operating system, probably a port of Unix 7th Edition, which did not have virtual memory.
- Each node was to have the processing power of a CDC 7600, though benchmarks indicated the completed processor was 1/3 the power of a 7600 or about the power of an IBM System/370 model 168.
- One node was built.
- 10 million instructions per second, 5300 chips, ECL-10K implementation (10 MHz).
- Originally planned (in around 1976) to have a 4,096-word instruction cache and a 4,096-word data cache, both four-way set-associative with a 4-word line size.
- No segmentation.
- The design required two man-years of effort using the specially-developed CAD tools; there were 211 high-level diagrams and 144 low-level diagrams.
- There were 12 boards, each about 18 by 24 inches, with 5300 integrated circuits involving about 80,000 gates organized into three "pages" of logic; the pages unfold to allow access to wiring.
- SCALD papers were presented at the 15th Annual Design Automation Conference in June:
- In the Fall work began on the Mark II and on SCALD II. Instead of depending on borrowed time at SLAC, the group could now use the S-1 Mark I to run SCALD. SCALD II added the following components:
- packager: written by Curt Widdoes, about 30,000 lines of Pascal initially.
- timing verifier: written by Tom McWilliams, 6,000 lines of Pascal initially.
- simulator: written by Jeff Rubin.
It is not clear from the available documentation whether SUDS was ported to the Mark I, or whether the S-1 project continued to use the PDP-10 at SAIL (or some other PDP-10) to run SUDS. Porting SUDS to the Mark I would have been a considerable effort, since it was written in PDP-10 assembly language and depended on a high-performance display system.
In February Ted Anderson and Daniel Weinreb join the project. Probably at this time Jeff Broughton, Hon Wah Chin, Charles Frankston, and Lee Parks join. Hon Wah Chin is the operating system team leader.
- In January Thomas M. McWilliams prepares a preliminary version of Verification of Timing Constraints in Large Digital Systems for delivery to the 17th Design Automation Conference in June. The Timing Verifier was run on the S-1 Mark I, and was used to verify the timing constraints of the S-1 Mark IIA.
- Also in January Thomas M. McWilliams submits The SCALD Timing Verifier: A New Approach to Timing Constraints in Large Digital Systems to the IEEE Transactions on Circuits and Systems.
- In February an S-1 paper by Curtis Widdoes is presented at the Spring COMPCON: The S-1 Project: Developing High-Performance Digital Computers.
- In April McWilliams updates his submission to the Design Automation Conference: Verification of Timing Constraints on Large Digital Systems.
- McWilliams' dissertation is completed in May: Verification of Timing Constraints on Large Digital Systems.
- In July, John Manferdelli, Michael Farmwald and William Bryson submit an S-1 paper to the Society of Photo-Optical Instrumentation Engineers (SPIE) Annual Technical Symposium: Signal Processing Aspects of the S-1 Multiprocessor Project.
- Weinreb returns to MIT in August.
- Parks also leaves.
- Earl Killian joins the project.
- Widdoes' dissertation is completed in December: Automatic Physical Design of Large Wire-wrap Digital Systems.
- Broughton becomes the operating system team leader.
- McWilliams, Rubin, and Widdoes start Valid Logic Systems.
- Farmwald completes his dissertation in August: On the Design of High-Performance Digital Arithmetic Units, in which he predicts that the first Mark IIA will be operational in early 1982.
Broughton becomes the S-1 project director, diluting the manpower assigned to the operating system. Charles Frankston took an extended leave to continue his education.
In April, J. M. Broughton, P. M. Farmwald and T. M. McWilliams prepared a paper for the SPIE Technical Symposium: The S-1 Multiprocessor System, in which they describe the S-1 Mark IIA as “currently undergoing initial checkout”. They also refer to the S-1 Mark V, targeted for development in 1985, which is intended to run at 2 to 3 times the speed of the Mark IIA.
- John D. Bruner joins the S-1 project in February.
- Also in February, T. S. Axelrod, P. F. Dubois and P. Eltgroth publish A Simulator for MIMD Performance Prediction—Application to the S-1 Mark IIA Multiprocessor in which they refer to the S-1 Mark IIA's “imminent availability”, and complain that hardware documentation was generally unavailable.
- Jay Pattin joins the S-1 project and contributes substantially to the operating system.
- The brochure The S-1 Multiprocessor Test and Evaluation Facility is published, promising a Mark IIA uniprocessor by the summer of 1983, running Unix version 7, and a multiprocessor by early 1984 running Amber, the operating system being developed by the S-1 group.
- In June Michael Farmwald submitted a paper to the NATO Advanced Research Workshop on High-Speed Computation: The S-1 Mark IIA Supercomputer, which described the future Mark IIA multiprocessor as having 16 processors, and the Mark IIA uniprocessor as “undergoing initial checkout”.
In May Charles Frankston completes his thesis: The Amber Operating System, in which he states that the Mark IIA had been delayed and was not yet ready. In a follow-up note dated 1998 he stated that a single uniprocessor had been completed.
Killian leaves the S-1 project for MIPS Computer Systems.
In February, S-1 Project Amber Kernel Specification is published. It refers to a future user-mode library called Amber Base System which is to provide a richer user interface to the kernel functions.
It isn't clear from the available documentation when the Mark IIA became operational, but 1985 seems like a good guess. It is also not clear which of the features designed in 1982 were implemented. The following list is based on the 1982 design.
- Each node was to have the power of a Cray-1.
- Several nodes were to be built.
- Target performance was 15 MIPS, using ECL-100K circuit technology.
- A 4,096-word instruction cache and a 16,384-word data cache, both four-way set-associative with a 16-word line size.
- Vector operations, selection, matrix and signal processing operations were added.
- Segmentation was added, with a variable boundary between the segment number and the offset.
- Relative pointers.
- Pipeline stages controlled by writeable microcode.
- Decoded instruction cache expanded the 36-bit instruction word to a 56-bit format to reduce the instruction decoding time.
- In approximately 1977 it was decided that branch prediction bits would be held in the instruction cache.
- "advance computation" in the early pipeline stages of simple instructions—execution was done twice: once in the Ibox and again in the Abox for ease of branch misprediction recovery; they used the term "value prediction".
- high-performance emulation of the DEC PDP-10 and the Univac AN/UYK-7 (32 bit word).
- 64 (later possibly 72) boards of logic, 25000 integrated circuits.
- The Instruction box (Ibox) was designed by Jeff Rubin and Tom McWilliams.
- The Arithmetic box (Abox) was designed by Mike Farmwald and Bill Bryson.
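The cache figures in the 1982 design list imply a specific geometry, which the following sketch works out. It assumes conventional set-associative indexing on power-of-two sizes, counted in 36-bit words; the function name `cache_geometry` and the bit-field breakdown are illustrative assumptions, not details taken from the S-1 documentation.

```python
def _is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def cache_geometry(total_words, ways, line_words):
    """Derive line count, set count, and address field widths for a cache.

    Assumes power-of-two sizes and plain modulo set indexing, which is
    a textbook arrangement rather than a documented S-1 detail.
    """
    if not all(_is_power_of_two(n) for n in (total_words, ways, line_words)):
        raise ValueError("sizes must be powers of two")
    lines = total_words // line_words
    sets = lines // ways
    if sets == 0:
        raise ValueError("cache too small for this associativity")
    return {
        "lines": lines,
        "sets": sets,
        # For a power of two, bit_length() - 1 equals log2.
        "offset_bits": line_words.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


# Figures from the 1982 Mark IIA design: four-way, 16-word lines.
instruction_cache = cache_geometry(total_words=4096, ways=4, line_words=16)
data_cache = cache_geometry(total_words=16384, ways=4, line_words=16)
```

Under these assumptions the instruction cache holds 256 lines in 64 sets and the data cache holds 1,024 lines in 256 sets, with 4 bits of word offset within a line in both cases.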
The early physical design was similar to the Mark I: Figure 4 of the Widdoes and Correll Compcon 80 paper shows the Mark IIA, 5 "pages" of logic, pages unfold to allow access. A later design used a fixed frame in which each processor was built with four pages extending out from a central hub: see the 1983 "S-1 Multiprocessor Test And Evaluation Facility" brochure.
Three papers related to the S-1 project are prepared in November for delivery in 1988:
- John D. Bruner, Gary W. Hagensen, Eric H. Jensen, Jay C. Pattin and Jeffrey M. Broughton prepare Cache Coherency on the S-1 AAP, in which they describe the S-1 AAP as “under construction”.
- Eric H. Jensen, Gary W. Hagensen and Jeffrey M. Broughton prepare A New Approach to Exclusive Data Access in Shared Memory Multiprocessors. Both papers note that Hagensen has moved to MIPS Computer Systems.
- Viki Y. Moldenahuer, P. Michael Farmwald, Owen T. Anderson, Jay C. Pattin and Jeffrey M. Broughton prepare Ring Interconnection Network for the S-1 AAP Multiprocessor, in which the S-1 Advanced Architecture Processor is described as “being constructed” and “RISC-like”.
Anderson continued to work on the S-1 project until June, and John D. Bruner until August. Dr. Carl Haussman retires.
Current Location of Hardware
The Computer History Museum contains transcripts of interviews by Holly Stump on February 12, 2008, with Tom McWilliams and Curt Widdoes: