"Those who do not understand Unix are condemned to reinvent it, poorly." -- Henry Spencer
Every branch of engineering and design has technical cultures. In most kinds of engineering, the unwritten traditions of the field are a part of a working practitioner's education as important as (and, as experience grows, often more important than) the official handbooks and textbooks. Senior engineers develop huge bodies of implicit knowledge, which they pass to their juniors by (as Zen Buddhists put it) ``a special transmission, outside the scriptures''.
Software engineering is generally an exception to this rule; in most of its sub-specialties, technology has changed so rapidly, and software environments have come and gone so quickly, that technical cultures have been weak and evanescent. There are, however, exceptions to this exception. A very few software technologies have proved durable enough to evolve strong technical cultures, distinctive arts, and an associated design philosophy transmitted between generations of engineers.
The Unix culture is one of these. The Internet culture is another -- or, as the millennium turns, perhaps the same one. The two have grown increasingly difficult to separate since the early 1980s, and in this book we won't try particularly hard.
Unix has found use on a wider variety of machines than any other operating system can claim. From supercomputers to personal micros, on everything from Crays down to handheld computers and embedded networking hardware, Unix has probably seen more architectures and more odd hardware than any three other operating systems combined.
Unix has supported a mind-bogglingly wide spectrum of uses. No other operating system has shone simultaneously as a research vehicle, a friendly host for technical custom applications, a platform for stock business software, and a vital component technology of the Internet.
Confident predictions that Unix would wither on the vine, or be crowded out by other operating systems, have been made yearly since its infancy. And yet Unix, in its present-day avatars as Linux and Solaris and half a dozen other variants, seems stronger than ever today.
At least one of Unix's central technologies -- the C language -- has been widely naturalized elsewhere. Indeed it is now hard to imagine doing software engineering without C as a ubiquitous lingua franca of systems programming.
Unix's durability and adaptability have been nothing short of astonishing. Other technologies have come and gone like mayflies. Machines have increased a thousandfold in power, languages have mutated, industry practice has gone through multiple revolutions -- and Unix hangs in there, still producing, still paying the bills, and still commanding loyalty from many of the best and brightest software technologists on the planet.
Much of Unix's success has to be attributed to Unix's inherent strengths, to design decisions Ken Thompson and Dennis Ritchie and Brian Kernighan and Doug McIlroy and other early Unix developers made back at the beginning; decisions that have been proven sound over and over since. But just as much is due to the design philosophy, art of programming, and technical culture which grew up around Unix in the early days, and has continuously and successfully propagated itself in symbiosis with Unix ever since.
Outsiders have frequently dismissed Unix as an academic toy or a hacker's sandbox. One recent source [UHH] follows an antagonistic line nearly as old as Unix itself in writing its devotees off as a cult religion of freaks and losers. Certainly the colossal and repeated blunders of AT&T, Sun, Novell, and other commercial vendors and standards consortia in mis-positioning and mis-marketing Unix have become legendary.
Even from within the Unix world, Unix has seemed to be teetering on the brink of mainstream success for so long as to raise the suspicion that it will never actually get there. A skeptical outside observer's conclusion might be that Unix is too useful to die but too awkward to win big, a perpetual niche operating system.
Not even Microsoft's awesome marketing clout has been able to dent Unix's lock on the Internet. While the TCP/IP standards on which the Internet is based evolved under TOPS-10 and are theoretically separable from Unix, attempts to make them work on other operating systems have been bedeviled by incompatibilities, instabilities, and bugs. The theory and RFCs are available to anyone, but the engineering tradition to make them into a solid and working reality exists only in the Unix world.
The Internet technical culture and the Unix culture began to merge in the early 1980s, and are now inseparably symbiotic. To function effectively as an Internet expert, an understanding of Unix and its culture is indispensable.
The Unix API is the closest thing to a hardware-independent standard for writing truly portable software that exists. It is no accident that IEEE's Portable Operating System Standard [POSIX] has a Unix API.
Binary-only applications for other operating systems die with their birth environments, but Unix sources are forever. Forever, at least, given a Unix technical culture that polishes and maintains them across decades.
The open-source culture is a tremendous resource for any developer. Why code from scratch when you can adapt, reuse, recycle, and save yourself 90% of the work?
This tradition of code-sharing depends heavily on hard-won expertise about how to make programs cooperative and reusable. And not by abstract theory, but through a lot of engineering practice -- unobvious design rules that allow programs to function not just as isolated one-shot solutions but as synergistic parts of a toolkit.
Today (in 1999), a burgeoning open-source movement is bringing new vitality, new technical approaches, and an entire generation of bright young programmers into the Unix tradition. Open-source projects including the Linux operating system and symbiotes such as Apache and Mozilla have brought the Unix tradition an unprecedented level of mainstream visibility and success. The open-source movement seems on the verge of winning its bid to define the computing infrastructure of tomorrow -- and the core of that infrastructure will be Unix machines running on the Internet.
Unix boosters seem almost ashamed to acknowledge this sometimes, as though admitting they're having fun might damage their legitimacy somehow. But it's true; Unix is fun to play with and develop for, always has been.
There are not many operating systems that anyone has ever described as fun. Indeed, the friction and labor of development under most other operating systems has been aptly compared to kicking a dead whale down the beach. The kindest adjectives one normally hears are on the order of ``tolerable'' or ``not too painful''. In the Unix world, by contrast, the OS is normally seen not as an adversary to be clubbed into doing one's bidding by main effort but rather as an actual positive help.
This has real economic significance. The fun factor started a virtuous circle early in Unix's history. People liked Unix, so they built more programs for it that made it nicer to use. Today people build entire, production-quality open-source Unix systems as a hobby. To understand how remarkable this is, ask yourself when you last heard of anybody cloning OS/360 or VAX VMS or Microsoft Windows for fun.
The ``fun'' factor is not trivial from a design point of view, either. The kind of people who become programmers and developers have ``fun'' when the effort they have to put out to do a task challenges them, but is just within their capabilities. ``Fun'' is therefore a sign of peak efficiency. Painful development environments waste labor and creativity; they extract huge hidden costs in time, money, and opportunity.
If Unix were a failure in every other way, the Unix engineering culture would be worth understanding for the ways it keeps the fun in development -- because that fun is a sign that it makes developers efficient, effective, and productive.
Other operating systems generally make good practice rather harder, but even so some of the Unix culture's lessons can transfer. And much Unix code (including all its filters, its major scripting languages, and many of its code generators) will port directly to any operating system supporting ANSI C (for the excellent reason that C itself was a Unix invention and the ANSI C library embodies a substantial chunk of Unix's services!).
The Unix philosophy is not a formal design method. It wasn't handed down from the high fastnesses of theoretical computer science as a way to produce theoretically perfect software. Nor is it that perennial executive's mirage, some way to magically extract innovative but reliable software on too short a deadline from unmotivated, badly managed and underpaid programmers.
The Unix philosophy (like successful folk traditions in other engineering disciplines) is bottom-up, not top-down. It is pragmatic and grounded in experience. It is not to be found in official methods and standards, but rather in the implicit, half-reflexive knowledge, the expertise that the Unix culture transmits. It encourages a sense of proportion and skepticism -- and shows both by having a sense of (often subversive) humor.
Doug McIlroy, the inventor of pipes and one of the founders of the Unix tradition, famously summarized it this way (quoted in [PHS]):
This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Rob Pike, one of the great early masters of C programming, offers a slightly different angle in [NoPiC]:

Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second-guess and put in a speed hack until you've proven that's where the bottleneck is.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. (See Brooks p. 102.)

Rule 6. There is no Rule 6.

Ken Thompson, the man who designed and implemented the first Unix, reinforced Pike's Rule 4 with a gnomic maxim worthy of a Zen patriarch: When in doubt, use brute force.

More of the Unix philosophy was implied not by what these elders said but by what they did and the example Unix itself set. Looking at the whole, we can abstract the following ideas:
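Pike's Rules 2 and 3 are easy to demonstrate concretely. The sketch below (in Python, chosen here purely for illustration; the crossover point between the two approaches depends entirely on machine, runtime, and data) pits a simple linear scan against a ``fancy'' binary search on a small n -- and lets the clock, not intuition, decide:

```python
import bisect
import timeit

# Pike's Rules 2 and 3 in miniature: measure rather than guess, and
# remember that fancy algorithms carry big constants when n is small.
data = sorted(range(16))          # n is small, as it usually is
target = 11

def binary_contains(seq, x):
    """'Fancy' O(log n) membership test via the stdlib bisect module."""
    i = bisect.bisect_left(seq, x)
    return i < len(seq) and seq[i] == x

# Simple scan vs. binary search: time both instead of assuming.
linear = timeit.timeit(lambda: target in data, number=50_000)
fancy = timeit.timeit(lambda: binary_contains(data, target), number=50_000)
print(f"linear: {linear:.4f}s  binary: {fancy:.4f}s")
```

Whichever way the numbers come out on a given machine, the point stands: until you have measured, you do not know, and for small n the simple algorithm is often the right choice anyway.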
Assemblers, compilers, structured programming, ``artificial intelligence'', fourth-generation languages, object orientation, and software-development methodologies without number have been touted and sold as a cure for this problem. All have failed. As Fred Brooks famously observed [NSB], there is no silver bullet.
The only way to write complex software that won't fall on its face is to hold its global complexity down -- to build it out of simple pieces connected by well-defined interfaces, so that most problems are local and you can have some hope of fixing or optimizing a part without breaking the whole.
Unix tradition puts a lot of emphasis on writing programs to read and write simple, textual, stream-oriented, device-independent formats. Mythology to the contrary, this is not because Unix programmers hate graphical user interfaces. It's because if you don't write programs this way, it's much more difficult to hook them together.
GUIs can be a very good thing. Complex binary data formats are sometimes unavoidable by any reasonable means. But before writing a GUI, it's wise to ask if the tricky interactive parts of your program can be segregated into one piece and the workhorse algorithms into another, with a simple command stream or application protocol connecting the two. Before devising a tricky binary format to pass data around, it's worth experimenting to see if you can make a simple textual format work and accept a little parsing overhead in return for being able to hack the data stream with general-purpose tools.
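The classic shape of such a workhorse piece is a filter: text lines in, text lines out, so that other tools can be hooked on at either end. Here is a minimal illustrative sketch (in Python; the `wordfreq` name and the pipeline shown in the comment are hypothetical):

```python
import sys
from collections import Counter

def word_frequencies(lines):
    """Filter: text lines in, 'count word' lines out.

    Because both ends are plain text streams, the output can be fed
    straight into sort, head, grep, or any other general-purpose tool.
    """
    counts = Counter(word for line in lines for word in line.split())
    for word, n in counts.most_common():
        yield f"{n} {word}"

if __name__ == "__main__":
    # Reads stdin, writes stdout -- pipe-friendly in the classic style,
    # e.g.:  cat report.txt | python wordfreq.py | head
    for out in word_frequencies(sys.stdin):
        print(out)
```

Notice that the interesting logic lives in a function over an iterable of lines; the stdin/stdout plumbing is a thin shell around it, and a GUI front end could drive the same function through the same simple textual interface.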
(We discuss these issues in detail in chapter 5.)
It follows that the way to make programs that aren't buggy is to make their internals easy for human beings to reason about. There are two main ways to do that: transparency and simplicity.
A software system is transparent when you can look at it and immediately see what is going on. It is simple when what is going on is uncomplicated enough for a human brain to reason about all the potential cases without strain.
Modularity (small pieces, clean interfaces) is a way to organize programs to make them simpler. There are other ways to fight for simplicity. Here's another one:
Even more often (at least in the commercial software world) excessive complexity comes from project requirements that are based on the marketing fad of the month rather than the reality of what customers want or software can actually deliver. Many a good design has been smothered under marketing's pile of ``check-list features'' -- features which, often, no customer will ever use. And a vicious circle operates; the competition thinks it has to compete with chrome by adding more chrome. Pretty soon, massive bloat is the industry standard and everyone is using huge, buggy programs not even their developers can love.
Either way, everybody loses in the end.
The only way to avoid these traps is to encourage a software culture that actively resists bloat and complexity -- an engineering tradition that puts a high value on simple solutions, looks for ways to break program systems up into small cooperating pieces, and reflexively fights attempts to gussy up programs with a lot of chrome (or, even worse, to design programs around the chrome).
That would be a culture a lot like Unix's.
This objective will have implications throughout a project. At minimum, it implies that debugging options should not be minimal afterthoughts. Rather, they should be designed in from the beginning, from the point of view that the program should be able to demonstrate its own correctness and communicate the original developer's mental model of the problem it solves to future developers.
The objective of designing for transparency should also encourage simple interfaces that can easily be manipulated by other programs -- in particular, test and monitoring programs and debugging scripts.
Therefore, avoid gratuitous novelty and excessive cleverness in interface design -- if you're writing a calculator program, `+' should always mean addition! When designing an interface, model it on the interfaces of functionally similar or analogous programs with which your users are likely to be familiar.
Pay attention to tradition. The Unix world has rather elaborate traditions about things like the format of configuration and run-control files, command-line switches, and the like. These traditions exist for a good reason, to tame the learning curve. Learn and use them.
(We'll cover many of these traditions in Chapters 6 and 7.)
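As one small sketch of the Rule of Least Surprise applied to command-line switches (Python's argparse here, with hypothetical option names): `-o` for output and `-v` for verbose mean nearly the same thing across hundreds of Unix programs, so a user can guess them without opening the manual.

```python
import argparse

# Follow the established conventions rather than inventing new ones:
# -o/--output and -v/--verbose mean here what they mean almost everywhere.
parser = argparse.ArgumentParser(prog="wordfreq")
parser.add_argument("-o", "--output", default="-",
                    help="output file ('-' for stdout, per tradition)")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="print progress messages")

# Parsing a sample command line instead of sys.argv, for illustration:
args = parser.parse_args(["-v", "-o", "freqs.txt"])
```

The design choice is deliberate self-restraint: novelty in an interface is a cost the user pays on every invocation, so spend it only where your program genuinely differs from its neighbors.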
Somehow, though, practice doesn't seem to have quite caught up with reality. If we took this maxim really seriously throughout software development, the percentage of applications written in higher-level languages like Perl, Tcl, Python, Java, and Lisp that ease the programmer's burden by doing their own memory management would be rising fast.
And indeed this is happening within the Unix world, though outside it most applications shops still seem stuck with the archaic Unix strategy of coding in C (or C++). Later in this book we'll discuss this strategy and its tradeoffs in detail.
One other obvious way to conserve programmer time is to teach machines how to do more of the low-level work of programming. This leads to...
We all know this is true (it's why we have compilers and interpreters, after all) but we often don't think about the implications. High-level-language code that's repetitive and mind-numbing is just as productive a target as machine code. It pays to leverage code generators to the hilt, if they're available.
In the Unix tradition, they are. Parser/lexer generators are the classic examples; makefile generators and GUI interface builders are newer ones. We'll explore these ways and others in this book.
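The idea scales all the way down. Even without yacc or lex, a few lines can turn a declarative table into the repetitive source code you would otherwise type by hand. The following is a tiny hypothetical sketch (the error table and generated names are invented for illustration):

```python
# A miniature code generator: a declarative table of error codes
# is expanded into repetitive source text, then compiled with exec().
ERRORS = [("ENOENT", 2, "No such file or directory"),
          ("EACCES", 13, "Permission denied"),
          ("EPIPE", 32, "Broken pipe")]

def generate(errors):
    """Emit Python source defining one constant per table row."""
    lines = ["# -- generated code: edit the table, not this output --"]
    for name, code, message in errors:
        lines.append(f"{name} = {code}")
        lines.append(f"MESSAGES[{code}] = {message!r}")
    return "\n".join(lines)

source = generate(ERRORS)
namespace = {"MESSAGES": {}}
exec(source, namespace)   # in real use you'd write source to a file instead
print(namespace["ENOENT"], namespace["MESSAGES"][2])
```

Maintaining the table is easier and less error-prone than maintaining the expanded code; that asymmetry is the whole argument for code generators.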
Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data.
The Unix tradition did not originate this insight, but a lot of Unix code displays its influence. The C language's facility at manipulating pointers, in particular, has encouraged the use of dynamically-modified reference structures at all levels of coding from the kernel upward. Simple pointer chases in such structures frequently do duties that implementations in other languages would instead have to embody in more elaborate procedures.
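The same move is available in any language: a lookup table replaces a chain of conditional tests, and adding a new case becomes a data edit rather than a logic change. A minimal sketch (hypothetical size-suffix parser, Python for illustration):

```python
# Shifting complexity from code to data: the table carries the cases,
# so the code stays a single uniform rule.
SUFFIXES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(text):
    """Parse '4k', '2M', '1G', or a bare number of bytes."""
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)

print(parse_size("4k"))   # 4096
```

Supporting a new suffix means adding one table entry; the equivalent if/elif ladder would need a new branch, a new test, and a new opportunity for a typo.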
Rushing to optimize before the bottlenecks are known may be the only error to have ruined more designs than feature creep. From tortured code to incomprehensible data layouts, the results of obsessing about speed or memory or disk usage at the expense of transparency and simplicity are everywhere. They spawn innumerable bugs and cost millions of man-hours -- often, just to get marginal gains in the use of some resource much less expensive than debugging time.
In the Unix world there is a long-established and very explicit tradition (exemplified by Rob Pike's comments above and Ken Thompson's maxim about brute force) that says: Prototype, then polish. Get it working before you optimize it.
That is, get your design right with an un-optimized, slow, memory-intensive implementation before you try to tune. Then you tune systematically, looking for the places where you can buy big performance wins with the smallest possible increases in local complexity.
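In miniature, that discipline looks like this (an illustrative Python sketch; the function names are invented): write the obviously-correct version first, and only after measurement has named the hot spot, replace it with a tuned version that provably behaves the same.

```python
def unique_prototype(items):
    # First cut: obviously correct, O(n^2) list-membership tests.
    # Slow, memory-cheap, and easy to reason about.
    out = []
    for x in items:
        if x not in out:
            out.append(x)
    return out

def unique_tuned(items):
    # The polish step, applied only after profiling showed the list
    # scan was the bottleneck: O(n) with a set, order preserved.
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

# The tuned version must be checked against the prototype, which now
# serves as the executable specification of correct behavior.
data = list(range(500)) * 4
assert unique_prototype(data) == unique_tuned(data)
```

The prototype is not wasted work: it pins down the design, and afterwards it acts as a reference implementation to test the optimized code against.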
KEEP IT SIMPLE, STUPID!
Unix gives you good leverage for applying the KISS principle. The remainder of this book will help you learn how to use it.
To do the Unix philosophy right, you have to value your own time enough never to waste it. If someone has already solved a problem once, don't let politics or your ego suck you into solving it a second time rather than re-using. And never work harder than you have to; work smarter instead, and save the extra effort for when you need it. Lean on your tools and automate everything you can.
Software design and implementation should be a joyous art, a kind of high-level play. If this attitude seems preposterous or vaguely embarrassing to you, stop and think; ask yourself what you've forgotten. Why do you design software instead of doing something else to make money or pass the time? You must have thought software was worthy of your passion once....
To do the Unix philosophy right, you need to have (or recover) that attitude. You need to care. You need to play. You need to be willing to explore.
We hope you'll bring this attitude to the rest of this book. Or, at least, that this book will help you rediscover it.