Reflections on Three Years of Reading Knuth
A little over three years ago, I picked up a boxed set of the first three volumes of Donald Knuth's "The Art of Computer Programming" (TAOCP). I knew them by reputation before I ever considered reading them — a reputation for dense mathematics along with a lot of brain-bendingly difficult problems. For whatever reason, that's the sort of odd challenge that appeals to me, so I went into the series with the intent to work every single problem. That turned out to be unrealistically ambitious - for one thing, a few problems are marked as "research problems" that Knuth himself admits should take a motivated graduate student a semester to complete satisfactorily, and a few more are listed as "unsolved research problems" that may not have a solution at all. I ended up settling for making at least a realistic attempt at every problem; even so, I was able to work far fewer of them than I expected. This is some seriously advanced material. I finally finished volume 3 last month, after three years of working as many problems as I was smart enough to work (which turned out to be fewer than I'd care to admit - but don't judge me until you try it yourself!).
I know that a lot of programmer types talk about these books — more than once I've seen them referred to as the books that everybody recommends but nobody actually reads. A lot of people are put off by the academic packaging or the cost; I can't disagree with either of those concerns. The books are heavily academic — to the point that they can and have been used as college textbooks. They're also expensive but, I think, worth the cost.
Another reason people sometimes suggest staying away from TAOCP is that the material is irrelevant in the 21st century. This is a harder concern to counter, honestly, especially now that I've finished all three books. On the one hand, I know that the Sun developers who put together the Java programming language leaned heavily on Knuth's discussion of random number generation and large-number mathematical operations in volume 2. On the other hand, though, most of us won't be undertaking any projects like the Java programming language in our careers. It's neat to read and entertaining to know, but you can probably get along your whole life just using the PRNG and mathematical libraries built into your programming environment without ever really understanding what makes them tick.
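(To make that concrete, here's roughly what I mean, in Python rather than Java. This little snippet is mine, not Knuth's: the standard library hands you a PRNG and arbitrary-precision integers for free, and you can use both productively without ever opening volume 2.)

import random

# Python's built-in PRNG (a Mersenne Twister) -- most of us only ever call it,
# without studying the theory behind it the way volume 2 does.
random.seed(42)
print([random.randint(1, 6) for _ in range(5)])

# Likewise, Python integers have arbitrary precision out of the box, so
# "large-number" arithmetic just works without a special library.
print(2 ** 521 - 1)   # 157 digits, no overflow, no extra code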
Volume 3 actually contains both the most relevant and the least relevant material in the series. The most relevant includes Knuth's discussion of sorting and searching routines. Of course, if you've studied computer science at any level, you already have a decent handle on sorting algorithms; you probably know that Quicksort is O(n log n) and Bubblesort is O(n²), and even if you can't quite remember what that terminology means, you know that Quicksort is better than Bubblesort. I remember studying sorting algorithms as an undergraduate from Sedgewick; Sedgewick presented the worst-case behavior of each of the sorting algorithms and punted on the "average" case by presenting the results but explaining that their derivation was "beyond the scope of the book". As far as I know, outside of papers published in inaccessible academic journals, the derivation of the average case of most sorting routines is beyond the scope of every book except TAOCP, where Knuth patiently builds the concepts up from scratch. He does the same with many standard CS concepts like AVL trees and hash codes as well.
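Just to make the O(n log n) versus O(n²) gap concrete, here's a quick and decidedly unrigorous experiment of my own (nothing from the book): count the comparisons a naive bubble sort and a naive quicksort make over a batch of random inputs. The sizes and trial counts are arbitrary.

import random

def bubble_comparisons(a):
    # Naive bubble sort on a copy of a, returning how many comparisons it made.
    a = list(a)
    count = 0
    for i in range(len(a) - 1, 0, -1):
        for j in range(i):
            count += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return count

def quicksort_comparisons(a):
    # Naive quicksort (first element as pivot), returning its comparison count.
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

random.seed(1)
trials = [[random.random() for _ in range(200)] for _ in range(50)]
print(sum(bubble_comparisons(t) for t in trials) / len(trials))     # ~20,000 (about n^2/2)
print(sum(quicksort_comparisons(t) for t in trials) / len(trials))  # ~2,100 (about 2n ln n)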
However, the least relevant material to a modern programmer appears in the middle of volume 3, spanning about 100 pages: a detailed description of how to most efficiently use external tape drives to sort data that's too large to fit into main memory. As with most things Knuth, the presentation is fascinating and the theory is amazing. In short, the idea is to sort as much data as will fit in memory at a time, distribute the sorted runs across as many tapes as you have, and then merge the runs from the tapes onto a final sorted tape. He considers the time needed to rewind the tapes, takes advantage of tape drives that can read and write both forward and backward, and even accounts for things like rotational speed and the time it takes a tape to get up to full speed. As amazing as it is that anybody was ever able to work this out to such efficiency, I can't imagine how any of these algorithms could be applied outside of their original use case, and if you actually need to sort more data than you can fit into RAM, there are almost certainly better solutions than a bank of external tape drives. As carefully as I read all the other sections, I did find myself skimming through the more difficult exercises in this section.
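For what it's worth, the kernel of the idea does survive in modern external merge sorts, with disk files standing in for tapes. Below is my own toy sketch of that basic run-then-merge scheme; to be clear, it's not one of Knuth's polyphase or read-backward algorithms, and it ignores everything that made his analysis interesting (rewind times, drive counts, and so on).

import heapq
import random
import tempfile

def external_sort(values, run_size=4):
    # Sort one memory-sized run at a time, write each run to its own "tape"
    # (a temp file), rewind it, and finish with a single k-way merge.
    tapes = []
    for i in range(0, len(values), run_size):
        run = sorted(values[i:i + run_size])
        tape = tempfile.TemporaryFile(mode="w+")
        tape.writelines(f"{x}\n" for x in run)
        tape.seek(0)                      # "rewind" before the merge pass
        tapes.append(tape)
    runs = [(float(line) for line in tape) for tape in tapes]
    return list(heapq.merge(*runs))

random.seed(0)
data = [random.random() for _ in range(20)]
assert external_sort(data) == sorted(data)
print("sorted", len(data), "values using", (len(data) + 3) // 4, "tapes")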
Still, even setting aside the irrelevance of the middle section, it's questionable how useful the rest of the content is to a modern software developer. It's almost as unlikely that any of us will be implementing our own sorting or searching routines as it is that we'll be developing the sorts of random number generators or unlimited-precision math libraries that were the focus of volume 2. As interesting as the material is, these books are a huge time commitment, and if you don't have a lot of time to spare, you can probably spend it more "efficiently" on the study of machine learning algorithms or something else "enterprisey". If you do have the time, though, whether you actually apply the lessons from these books or not, they're fascinating to work through.
The last concern I hear (or read, anyway) from people is that the books not only use assembly language for code examples, but an imaginary assembly language that Knuth made up just for these books. Well, if you're on the fence about taking the plunge into TAOCP and you're concerned about the cost, content, or relevance of these books, I can't say that I can reasonably set your mind at ease on any of those counts. However, if you're put off by the use of assembly, I can assure you that it's actually a relatively small part of the books. Further, all of the algorithms are presented first in what Knuth calls "pseudocode" but which is nearly a compilable language. In fact, it's close enough to Python that translating it almost verbatim is practically a mechanical process.
Consider listing 1, below. This is Knuth's description of one of the more complex algorithms from volume 3, the AVL tree insertion algorithm, which keeps a binary tree balanced dynamically as new nodes are added to it. As you can see, Knuth has variables, structures, and looping constructs (although he doesn't seem to have had a chance to confer with Dijkstra with regard to the use of GOTOs).
The nodes of the tree are assumed to contain KEY, LLINK, and RLINK fields as in
Algorithm 6.2.2T. We also have a new field
B(P) = balance factor of NODE(P)
the height of the right subtree minus the height of the left subtree; this field
always contains either +1, 0, or -1. A special header node also appears at the top
of the tree, in location HEAD; the value of RLINK(HEAD) is a pointer to the root
of the tree, and LLINK(HEAD) is used to keep track of the overall height of the tree.
(Knowledge of the height is not really necessary for this algorithm, but it is useful
in the concatenation procedure discussed below.) We assume that the tree is nonempty,
namely that RLINK(HEAD) != ^.
For convenience in description, the algorithm uses the notation LINK(a,P) as a synonym
for LLINK(P) if a = -1, and for RLINK(P) if a = +1.
A1. [Initialize.] Set T <- HEAD, S <- P <- RLINK(HEAD). (The pointer variable P will
move down the tree; S will point to the place where rebalancing may be necessary, and
T always points to the parent of S.)
A2. [Compare.] If K < KEY(P), go to A3; if K > KEY(P), go to A4; and if K = KEY(P), the
search terminates successfully.
A3. [Move left.] Set Q <- LLINK(P). If Q = ^, set Q <= AVAIL and LLINK(P) <- Q and go
to step A5. Otherwise if B(Q) != 0, set T <- P and S <- Q. Finally set P <- Q and return
to step A2.
A4. [Move right.] Set Q <- RLINK(P). If Q = ^, set Q <= AVAIL and RLINK(P) <- Q and go
to step A5. Otherwise if B(Q) != 0, set T <- P and S <- Q. Finally set P <- Q and return
to step A2. (The last part of this step may be combined with the last part of step A3.)
A5. [Insert.] (We have just linked a new node, NODE(Q), into the tree, and its fields
need to be initialized.) Set KEY(Q) <- K, LLINK(Q) <- RLINK(Q) <- ^, and B(Q) <- 0.
A6. [Adjust balance factors.] (Now the balance factors on nodes between S and Q need to
be changed from zero to +/-1.) If K < KEY(S) set a <- -1, otherwise set a <- +1. Then set
R <- P <- LINK(a,S), and repeatedly do the following operations zero or more times until
P = Q: If K < KEY(P) set B(P) <- -1 and P <- LLINK(P); if K > KEY(P), set B(P) <- +1 and
P <- RLINK(P). (If K = KEY(P), then P = Q and we proceed to the next step.)
A7. [Balancing act.] Several cases now arise:
i) If B(S) = 0 (the tree has grown higher), set B(S) <- a, LLINK(HEAD) <- LLINK(HEAD) + 1,
and terminate the algorithm.
ii) If B(S) = −a (the tree has gotten more balanced), set B(S) <- 0 and terminate the algorithm.
iii) If B(S) = a (the tree has gotten out of balance), go to step A8 if B(R) = a, to A9 if
B(R) = −a. (Case (iii) corresponds to the situations depicted in (1) when a = +1; S and
R point, respectively, to nodes A and B, and LINK(−a,S) points to A, etc.)
A8. [Single rotation.] Set P <- R, LINK(a,S) <- LINK(-a,R), LINK(-a,R) <- S, B(S) <- B(R) <- 0.
Go to A10.
A9. [Double rotation.] Set P <- LINK(-a,R), LINK(-a,R) <- LINK(a,P), LINK(a,P) <- R,
LINK(a,S) <- LINK(-a,P), LINK(-a,P) <- S. Now set
    (B(S),B(R)) <- (-a,0) if B(P) = a;  (0,0) if B(P) = 0;  (0,a) if B(P) = -a;
and then set B(P) <- 0.
A10. [Finishing touch.] (We have completed the rebalancing transformation, taking (1) to (2),
with P pointing to the new subtree root and T pointing to the parent of the old subtree root S.)
If S = RLINK(T) then set RLINK(T) <- P, otherwise set LLINK(T) <- P.
Converting this description into working Python code doesn't even require a deep understanding of the underlying algorithm. Knuth uses ^ to represent "NIL" (Python's None), <- to indicate assignment, and the unusual syntax B(P) to represent the B field of P, which most languages indicate with P.B. I've removed some of the exposition but kept the essentials of the description in comments in the Python translation in listing 2, below. All of the comments are Knuth's words, not mine.
# The nodes of the tree are assumed to contain KEY, LLINK, and RLINK fields.
# We also have a new field
#
# B(P) = balance factor of NODE(P)
#
# the height of the right subtree minus the height of the left subtree; this field
# always contains either +1, 0, or -1. A special header node also appears at the top
# of the tree, in location HEAD; the value of RLINK(HEAD) is a pointer to the root
# of the tree, and LLINK(HEAD) is used to keep track of the overall height of the tree.
# We assume that the tree is nonempty, namely that RLINK(HEAD) != ^.
class node:
    def __init__(self):
        self.KEY = None
        self.LLINK = None
        self.RLINK = None
        self.B = 0

HEAD = node()
HEAD.RLINK = node()
HEAD.RLINK.KEY = 'A'
# For convenience in description, the algorithm uses the notation LINK(a,P) as a synonym
# for LLINK(P) if a = -1, and for RLINK(P) if a = +1.
def LINK(a, P):
    if a == -1:
        return P.LLINK
    else:
        return P.RLINK

def setLINK(a, P, val):
    if a == -1:
        P.LLINK = val
    else:
        P.RLINK = val
def avl(K):
    # A1. [Initialize.] Set T <- HEAD, S <- P <- RLINK(HEAD).
    T = HEAD
    S = P = HEAD.RLINK
    # A2. [Compare.] If K < KEY(P), go to A3; if K > KEY(P), go to A4; and if K = KEY(P), the
    # search terminates successfully.
    while True:
        if K == P.KEY:
            return P
        elif K < P.KEY:
            # A3. [Move left.] Set Q <- LLINK(P). If Q = ^, set Q <= AVAIL and LLINK(P) <- Q and go
            # to step A5. Otherwise if B(Q) != 0, set T <- P and S <- Q. Finally set P <- Q and return
            # to step A2.
            Q = P.LLINK
            if Q is None:
                Q = node()
                P.LLINK = Q
                break
            else:
                if Q.B != 0:
                    T = P
                    S = Q
                P = Q
        # A4. [Move right.] Set Q <- RLINK(P). If Q = ^, set Q <= AVAIL and RLINK(P) <- Q and go
        # to step A5. Otherwise if B(Q) != 0, set T <- P and S <- Q. Finally set P <- Q and return
        # to step A2.
        elif K > P.KEY:
            Q = P.RLINK
            if Q is None:
                Q = node()
                P.RLINK = Q
                break
            else:
                if Q.B != 0:
                    T = P
                    S = Q
                P = Q
    # A5. [Insert.] Set KEY(Q) <- K, LLINK(Q) <- RLINK(Q) <- ^, and B(Q) <- 0.
    Q.KEY = K
    Q.LLINK = Q.RLINK = None
    Q.B = 0
    # A6. [Adjust balance factors.] If K < KEY(S) set a <- -1, otherwise set a <- +1. Then set
    # R <- P <- LINK(a,S), and repeatedly do the following operations zero or more times until
    # P = Q: If K < KEY(P) set B(P) <- -1 and P <- LLINK(P); if K > KEY(P), set B(P) <- +1 and
    # P <- RLINK(P).
    if K < S.KEY:
        a = -1
    else:
        a = 1
    R = P = LINK(a, S)
    while P != Q:
        if K < P.KEY:
            P.B = -1
            P = P.LLINK
        elif K > P.KEY:
            P.B = 1
            P = P.RLINK
    # A7. [Balancing act.] Several cases now arise:
    #
    # i) If B(S) = 0, set B(S) <- a, LLINK(HEAD) <- LLINK(HEAD) + 1,
    # and terminate the algorithm.
    #
    if S.B == 0:
        S.B = a
        # HEAD.LLINK = HEAD.LLINK + 1
        return Q
    # ii) If B(S) = -a, set B(S) <- 0 and terminate the algorithm.
    elif S.B == -a:
        S.B = 0
        return Q
    #
    # iii) If B(S) = a, go to step A8 if B(R) = a, to A9 if B(R) = -a.
    elif S.B == a:
        if R.B == a:
            # A8. [Single rotation.] Set P <- R, LINK(a,S) <- LINK(-a,R), LINK(-a,R) <- S, B(S) <- B(R) <- 0.
            # Go to A10.
            P = R
            setLINK(a, S, LINK(-a, R))
            setLINK(-a, R, S)
            S.B = R.B = 0
        elif R.B == -a:
            # A9. [Double rotation.] Set P <- LINK(-a,R), LINK(-a,R) <- LINK(a,P), LINK(a,P) <- R,
            # LINK(a,S) <- LINK(-a,P), LINK(-a,P) <- S. Now set
            #     (B(S),B(R)) <- (-a,0) if B(P) = a; (0,0) if B(P) = 0; (0,a) if B(P) = -a;
            # and then set B(P) <- 0.
            P = LINK(-a, R)
            setLINK(-a, R, LINK(a, P))
            setLINK(a, P, R)
            setLINK(a, S, LINK(-a, P))
            setLINK(-a, P, S)
            if P.B == a:
                S.B = -a
                R.B = 0
            elif P.B == 0:
                S.B = 0
                R.B = 0
            elif P.B == -a:
                S.B = 0
                R.B = a
            P.B = 0
        # A10. [Finishing touch.] If S = RLINK(T) then set RLINK(T) <- P, otherwise set LLINK(T) <- P.
        if S == T.RLINK:
            T.RLINK = P
        else:
            T.LLINK = P
    return Q
As you can see, I changed very little; I kept the same variable names (even with their rather dated single-capital-letter style). The only really tricky part was uncovering the implicit loops defined by his GOTO statements, since Python (thankfully) doesn't support GOTO, but even that was relatively straightforward (and remember that this is one of the most complex algorithms described in the series). Otherwise, I made only a handful of modifications:
- I replaced Q <= AVAIL in steps A3 & A4 with an actual dynamic allocation.
- Since the implied "convenience function" LINK(a, P) appears on both the right and left sides of assignments, I implemented it as two separate functions: one to act as a getter and another to act as a setter.
- Knuth sidestepped the creation of the first node in an empty tree, so I did too, although this would have been trivial to address.
- In step A7, I skipped the step that keeps track of the overall height of the tree, since it wouldn't work as written in Python (the header's LLINK field starts out as None rather than a number) and isn't necessary to the algorithm; it would again be trivial to track this somewhere else.
Otherwise I just typed in exactly what was described by the algorithm and got a functioning, if not very Pythonic, AVL implementation.
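To convince myself it really does function, I ran a quick smoke test against listing 2. The in-order traversal helper and the sample keys below are my own additions, not part of Knuth's description (remember that the listing seeds the tree with a single root node whose key is 'A'):

# A quick smoke test of listing 2. The traversal helper and the sample keys
# are my additions, not Knuth's.
def inorder(p):
    if p is None:
        return []
    return inorder(p.LLINK) + [p.KEY] + inorder(p.RLINK)

for key in ['C', 'B', 'D', 'E', 'F']:   # this order forces both kinds of rotation
    avl(key)

print(inorder(HEAD.RLINK))   # ['A', 'B', 'C', 'D', 'E', 'F']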
So there you have it - if you're still skeptical about tackling TAOCP and the only thing keeping you away is the use of Knuth's MIX Assembler, rest assured that you can skip all of the MIX and still understand everything that's presented. Still, you might start to change your mind about investing the time to learn MIX once you get started; it's actually a lot of fun to work through.
Comments:
How did you go about reading the volumes? More specifically, what sort of routine (daily, weekly, ...) did you have for reading through Knuth's works?
I have tried (at least twice) during my undergrad to go through Vol 1 but only ever got through some of the math exercises. I'm taking another swing at it, but this time more for self-satisfaction than academic use.
Please allow me to belabor a well-made point. Consider Facebook querying their database for all users that have dogs as pets. In a well-structured database of all user data, this query would conceivably draw on terabytes of data, more than could be stored on any typical personal computer, spread across many separate physical data stores. In your experience with this volume, would you consider this circumstance irrelevant to the tape reading and writing algorithms?
You've convinced me to give TAOCP a serious go after I've finished Concrete Mathematics, cheers.
Now, about the relevance of this material today: one benefit is probably that you very thoroughly learn the methods for constructing and analysing algorithms through those immense three years of work you put in. What's your take on this?
It is quite inspiring to read what you have accomplished. It has always been my goal to do so sometime in my life. How many years did it take you to read Vol 1-3?
Also, now that you have worked through them, do you think your mathematical or problem solving abilities have increased?
Thank you
Yeah, Knuth may not be sexy and hip, but in my book, he's the first one to turn to when you really want to know what it's all about.
Awesome blog, man :)