Wednesday, July 18, 2018

The Sherlock Holmes Guide to Programming, Debugging and Performance Tuning

Back in 2009, I published on this very blog a set of programming "laws" which I modestly called "Hillyard's Laws of Programming." Here they are:
There has even been a fourth law, although that one is a little more nebulous even than the first three.

Recently, I have been working my way through the Sherlock Holmes canon, but listening to audio recordings rather than reading. I know the stories almost by heart but there is nothing like listening to someone else's interpretation to trigger little observations that may have escaped one on previous readings. I have thus realized that Sherlock Holmes has made many pronouncements to Watson, his amanuensis, that show that the art of detection and that of software development have so much in common that Holmes would have been a first-rate programmer if there had been computers in his day.

Of course, we must not forget that Holmes, despite his familiarity, was in fact fictional--the creation of Sir Arthur Conan Doyle. Doyle was a strange man. Despite being scientifically trained, he was nevertheless a believer in all sorts of hocus-pocus. But the statements which he has Holmes make are prescient when considered in the realm of programming.

There are of course many sub-disciplines involved in software engineering, development, coding, whatever you want to call it, including (but not limited to):
  • programming (relating use cases to a particular design);
  • debugging;
  • performance tuning.
Of these, the greatest degree of mystery pertains to debugging and, perhaps to a lesser extent, performance tuning. It is to these activities that most of these statements relate most appropriately.

From The Sign of Four, one of the early novellas, he writes:
"Eliminate all other factors, and the one which remains must be the truth."
Or, similarly, look first at the following statement from the The Adventure of the Beryl Coronet:
"It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth."
I alluded to this in my "First Law." If you have positively eliminated a fragment of code from your pool of suspicion, then the problem must be in some other part of the program, even if that seems highly unlikely. I have personally spent hours looking at the same bit of code, trying to find a flaw in it, only to realize later that I was looking in the wrong place! 

It is all too easy to assume that some part of the code (which was "working before" or which has been tested by someone else, etc.) is perfect. Sherlock Holmes puts it well in The Adventure of the Reigate Squires: 
“Now, I make a point of never having any prejudices, and of following docilely wherever fact may lead me, …”
This observation applies manifestly to performance tuning also. It is so easy to make assumptions, such as "if I cache this, then the performance must improve." Never do any such thing without testing the result.

When faced with a plethora of possibly conflicting results, it is important to know which you should trust the most. For example, you cannot trust the order in which buffered I/O occurs. If you need to be sure of the order, then you should use logs or unbuffered I/O. 

Sherlock Holmes summed it up thus, again from the same story:
“It is of the highest importance in the art of detection to be able to recognize, out of a number of facts, which are incidental and which vital. Otherwise your energy and attention must be dissipated instead of being concentrated.”
So, to employ our example above, the order of buffered output is incidental whereas the order in logs is (usually) vital.

My second "law" has to do with the situation you sometimes find yourself in where there are two seemingly independent problems with your code. Let's say you are concentrating on problem A, which is proving challenging but so far intractable, while you are aware of an apparently minor problem B for which you think you have a simple solution. It's tempting to concentrate your efforts on the more interesting problem (A). But you would be well advised to take a slight detour and fix problem B. You never know: that fix might also be the solution to problem A (it's happened to me many times).

Holmes understood this also, as evidenced by this comment from The Adventure of the Musgrave Ritual: 
“‘At least,' said [Holmes], 'it gives us another mystery, and one which is even more interesting than the first. It may be that the solution of the one may prove to be the solution of the other.”
The third "law" relates to the practice of peer programming. I can't count the number of times I've asked someone for help and then, midway through explaining the background of the problem, I've realized my own error. Holmes was aware of this phenomenon too, for he states in The Adventure of the Blue Carbuncle: 
“Not at all. I am glad to have a friend with whom I can discuss my results.”
And, even more explicitly, he discusses it in The Adventure of Silver Blaze:
“At least I have got a grip of the essential facts of the case. I shall enumerate them to you, for nothing clears up a case so much as stating it to another person, and I can hardly expect your co-operation if I do not show you the position from which we start.”
A certain amount of imagination is also extremely helpful when trying to solve a problem. If you imagine a particular scenario, it may follow that the currently mystifying behavior of your code comes to be a natural outcome of your imagined situation. Again from Silver Blaze (incidentally, one of the very best stories):
”See the value of imagination," said Holmes. "It is the one quality which Gregory lacks. We imagined what might have happened, acted upon the supposition, and find ourselves justified. Let us proceed."
Sometimes a clue comes to you not from observed behavior but from expected behavior that you do not observe. Many's the time I have instrumented some method with a log message or unbuffered print statement only to find that I get no output whatsoever. This usually is enough to tell me that, despite my expectations, the method was never actually called. One of the most famous exchanges of Sherlock Holmes covers this point (again from Silver Blaze):
[Inspector Gregory] “Is there any other point to which you would wish to draw my attention?” 
“To the curious incident of the dog in the night time.” 
“The dog did nothing in the night-time.” 
"That was the curious incident,” remarked Sherlock Holmes.
Let us now return to the second passage quoted above, having to do with casting any prior prejudices aside. I would venture to suggest that this is perhaps the most important guideline that Holmes give us: to carefully gather as much of our evidence as possible before forming a theory. He sums this attitude up in the very first of the Sherlock Holmes stories published in the Strand Magazine--A Scandal in Bohemia:
“This is indeed a mystery,” I [Watson] remarked [to Holmes]. “What do you imagine that it means?” 
“I have no data yet. It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
His basic theme is similar in the following statement from The Sign of Four:


“No, no: I never guess. It is a shocking habit,—destructive to the logical faculty.” 
I hope that these utterances of Sherlock Holmes will help you take the proper course of action when presented with a problem in programming, debugging or performance tuning. Clearly, my remarks are intended to apply to any program language or system, not just Java.

OK, back to work!

Updated with quotations from The Sign of Four, Mar 12 2019