Thursday, September 16, 2010

Old farts

I picked up a link (from DZone and javablogs) to a wonderful presentation by "Uncle" Bob Martin on "bad" code and, by implication, good software practice.  Turns out that Bob Martin began life as a software professional in 1970.  That makes him even older than me (or perhaps he just started work earlier).

I wrote my first line of code in 1967, on a "Hollerith" card that had perforations that allowed us to make holes with a pencil.  We'd send the cards up to London and a week later (yes, I am not making this up), we got the results back.  My first program was to solve the equation x = sech(x).  It probably took about 12 lines of code (that's to say 12 cards), but I don't remember exactly.  The only error I made was in the comment (see below), where I declared that the program was based on Newton's method of "apprnximation".
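
For the curious, here is a minimal modern sketch of the same calculation in Java. It is not a reconstruction of the original 12-card program, just Newton's method applied to f(x) = x - sech(x), whose derivative is f'(x) = 1 + sech(x)·tanh(x). The class name, starting guess, tolerance and iteration cap are arbitrary choices for illustration.

```java
// Solves x = sech(x) with Newton's method, i.e. finds the root of f(x) = x - sech(x).
public class SechNewton {

    static double sech(double x) {
        return 1.0 / Math.cosh(x);
    }

    // f(x) = x - sech(x); its derivative is f'(x) = 1 + sech(x) * tanh(x).
    static double solve(double guess, double tolerance, int maxIterations) {
        double x = guess;
        for (int i = 0; i < maxIterations; i++) {
            double f = x - sech(x);
            double fPrime = 1.0 + sech(x) * Math.tanh(x);
            double next = x - f / fPrime;   // Newton step
            if (Math.abs(next - x) < tolerance) {
                return next;
            }
            x = next;
        }
        throw new ArithmeticException("Newton's method did not converge");
    }

    public static void main(String[] args) {
        double root = solve(1.0, 1e-12, 50);
        System.out.println("x = " + root + ", sech(x) = " + sech(root));
    }
}
```

Since sech(x)·tanh(x) never falls below -1/2, f'(x) stays at or above 0.5, so the Newton step is always defined; the iteration settles on a root near 0.765.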

But after programming more or less full-time for about nine months in 1969 (and getting paid for it), I took time out to get my undergraduate degree where I did almost no programming whatsoever.

So, I recognize Bob Martin as a fellow "old fart" who has been through the trenches, like me.  He gives a damn good presentation and my hat is off to him!

I found myself agreeing with him so wholeheartedly that it made me look back with quite some frustration at all of the times that I worked so hard to advocate good coding practices, only to be fought tooth and nail.

I remember, for instance, that back in 1984 a certain programmer, whose name I will omit although it is burned into my memory, wrote a function (we didn't call them methods in those pre-O-O days) that was more than 3,000 lines long!  What was even sadder was that he didn't think there was anything amiss, and neither did his manager (my peer).

Then there were all those arguments about comments in code.  I adhered to the view that if the code needed to be commented, there was something wrong with it.  And, worse, the comments were likely to become out of date as programmers changed the design but neglected to make the corresponding changes in the comments.  Others disagreed vehemently.

And then, do you remember all that stuff about Yourdon, DeMarco et al. and "Structured Programming"?  Those guys were living in cloud cuckoo land.  But you couldn't say so in front of one of the managers who thought that such techniques were the proper way to write software.

Gee, I'm just getting started.  I remember all those battles I had about "Q/A".  The worst of these were not that long ago: around 1992 and the years that followed.  I wanted to concentrate on automated testing (we would call this unit testing nowadays), but that was met with huge skepticism, even though I had developed a unit testing methodology of my own to support it.  Not reliable enough, I was told -- you needed real people to sit there and push buttons.  Aaargh!

What my opponents in this debate failed to realize is that, because of the inevitable time lag between a software release and Q/A's testing of it, the software is, almost by definition, constantly in a broken state.  By the time you go back to fix a bug, the underlying code has already changed, and you are extremely likely to create new bugs.

The modern "agile" approach, with continuous integration, scrums, etc., minimizes this latency by detecting problems early.  It's the only sane way to go.

I could continue, and maybe I will in another blog entry.  Meanwhile...

OK, back to work!

Monday, April 19, 2010

Peer Programming

As they say in the software world, four eyes are better than two.  This cute little Chihuahua-cum-code-reviewer called Madison needs a home (we're fostering).

You could say it's a dog's life in the software business these days.  The great thing about open source software is that you can always look into the code to try to figure out what it does.  So, in theory, you don't need documentation (which rapidly goes out of date).  But in practice, you do need some documentation.  There's tons of unofficial documentation on the web, generally in someone's blog.  But they never seem to remember to specify which version of the software they are using.  And they rarely tell you everything you want to know.  Usually, you get some Hello World application with no explanation of how to extend it.  And of course, there's no peer review process to give you confidence that the author actually knows what he's talking about.

And then there are strange omissions.  For instance, I built a bean container with dependency injection on top of the Jakarta Commons Configuration package.  This package claims to support include files.  Since XML files are more powerful than properties files (and, at the very least, easier to read), it makes sense to use them for configuring applications built from complex sets of beans.  But I could never get the include mechanism to work.  That's where the saving grace of open source software comes in: I found a comment in the code saying that the include mechanism was not implemented for XML files.  So I implemented it.  It was fairly easy, and now I'm able to split up the configuration files.
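
The fix itself isn't shown here, so what follows is only a hypothetical sketch of how an XML include mechanism can work, using the JDK's built-in DOM parser rather than the Commons Configuration API: each <include file="..."/> element is replaced, recursively, by the children of the included file's root element, with a depth limit to catch cycles. The class name, the include element and its file attribute, and the in-memory map standing in for the file system are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: expands <include file="..."/> elements in an XML configuration.
// Not the Commons Configuration API; names and conventions here are illustrative only.
public class XmlIncludeExpander {
    private static final int MAX_DEPTH = 16;        // guards against include cycles

    private final Map<String, String> sources;      // file name -> XML text (stands in for files)
    private final DocumentBuilder builder;

    public XmlIncludeExpander(Map<String, String> sources) throws Exception {
        this.sources = sources;
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    public Document load(String name) throws Exception {
        return load(name, 0);
    }

    private Document load(String name, int depth) throws Exception {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Includes nested too deeply (cycle?) at " + name);
        }
        String xml = sources.get(name);
        if (xml == null) {
            // A clear message naming the missing file, rather than silence.
            throw new IllegalArgumentException("No such configuration file: " + name);
        }
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        expand(doc.getDocumentElement(), depth);
        return doc;
    }

    // Replaces each <include> child with the children of the included file's root element.
    private void expand(Element element, int depth) throws Exception {
        NodeList live = element.getChildNodes();
        Node[] children = new Node[live.getLength()];   // snapshot, since we mutate the tree below
        for (int i = 0; i < children.length; i++) {
            children[i] = live.item(i);
        }
        for (Node child : children) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element e = (Element) child;
            if ("include".equals(e.getTagName())) {
                Element includedRoot = load(e.getAttribute("file"), depth + 1).getDocumentElement();
                NodeList incoming = includedRoot.getChildNodes();
                for (int i = 0; i < incoming.getLength(); i++) {
                    // importNode deep-copies the node into this document.
                    Node copy = element.getOwnerDocument().importNode(incoming.item(i), true);
                    element.insertBefore(copy, e);
                }
                element.removeChild(e);
            } else {
                expand(e, depth);
            }
        }
    }
}
```

For example, if main.xml contains <config><include file="db.xml"/><app name="demo"/></config>, loading it yields a tree in which the <db> settings from db.xml sit where the include element used to be.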

OK, back to work!

Wednesday, January 20, 2010

Things that Java got wrong: part 1: clone()

 This is the start of a series on some of the what-might-have-beens in Java: facets of the language that, in my opinion, they "got wrong".

But first, a disclaimer: I love Java, and as far as I'm concerned it's by far the best popular, general-purpose language that has ever walked the earth, so to speak.  So while I'm about to be very picky indeed about my favorite toy, I still love ya baby!

clone()

The ability of a class to have its objects cloned is determined by the existence of a marker interface, Cloneable.  If you override the clone() method in your Cloneable class (don't forget to make it public), you buy the privilege of not having it throw a CloneNotSupportedException when you call clone() on that object.  The super.clone() method will actually do most of the heavy lifting for you.  But it only does a field-by-field copy: it doesn't clone any of the fields themselves, it just copies their values (for an object field, that means the reference).  Now, wouldn't it be nice if we could provide an annotation, for example @Clone(deepcopy=true), for those fields on which we want to perform a deep copy?  Similarly for shallow copies, such as those implemented by HashMap.clone() and ArrayList.clone().
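
To make the field-by-field behavior concrete, here is a small sketch with a made-up Route class: super.clone() copies the reference held in the stops field, so the original and the clone end up sharing one list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class. super.clone() copies each field's value, so the
// reference-typed "stops" field ends up pointing at the SAME list in both objects.
public class Route implements Cloneable {
    private List<String> stops = new ArrayList<>();

    public void addStop(String stop) {
        stops.add(stop);
    }

    public List<String> getStops() {
        return stops;
    }

    // Object.clone() is protected; overriding it lets us widen it to public and
    // narrow the return type to Route (covariant return).
    @Override
    public Route clone() {
        try {
            return (Route) super.clone();   // field-by-field (shallow) copy
        } catch (CloneNotSupportedException e) {
            // Only thrown if the class does not implement Cloneable; Route does.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        Route original = new Route();
        original.addStop("London");
        Route copy = original.clone();
        copy.addStop("Paris");                                      // mutates the shared list
        System.out.println(original.getStops());                    // [London, Paris]
        System.out.println(original.getStops() == copy.getStops()); // true
    }
}
```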

The way we have to do it now is to invoke clone() on those fields (and naturally, we have to implement the clone() method in the (Cloneable) type of the field).  And here's the annoying part.  When a field is marked final, because it's initialized only by the constructor(s), you have to mark it as non-final just so that you can assign the result of cloning it to the field.  Even without the annotations suggested above, the Java designers could have given the clone() method, if it exists, special privileges on a par with constructors.  It sounds a bit hokey, I agree.  But clearly the Object.clone() method effectively has these privileges (when it does a field-by-field copy).  It's only at our level that such privileges are missing.
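
Here is a sketch of that workaround, using another made-up class, Tour: clone() calls super.clone() and then replaces the mutable field with a copy of its own.  Because that replacement is an ordinary assignment made after the object already exists, the field cannot be declared final.  (The field is typed ArrayList rather than List because the List interface does not expose a public clone().)

```java
import java.util.ArrayList;

// Hypothetical example class showing the usual deep-copy idiom.
public class Tour implements Cloneable {
    // This would naturally be final (it is only set in the constructor), but declaring
    // it final makes the assignment in clone() below a compile error.
    private ArrayList<String> stops;

    public Tour() {
        this.stops = new ArrayList<>();
    }

    public void addStop(String stop) {
        stops.add(stop);
    }

    public ArrayList<String> getStops() {
        return stops;
    }

    @Override
    public Tour clone() {
        try {
            Tour copy = (Tour) super.clone();   // shallow, field-by-field copy first
            // ArrayList.clone() returns Object and is itself shallow (the elements are shared).
            @SuppressWarnings("unchecked")
            ArrayList<String> stopsCopy = (ArrayList<String>) stops.clone();
            copy.stops = stopsCopy;             // the assignment that rules out 'final'
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);        // unreachable: Tour implements Cloneable
        }
    }
}
```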

OK, back to work!

Done tinkering with Darwin

Today I released version 2.2.01 of the Darwin framework for evolutionary computation, and I'm done.  No more updates unless somebody has a really interesting problem to solve, preferably for ca$h!  The traveling salesman solution is much better now, and there are various other improvements too.  Along with it, I've updated tostring (1.0.1) and beanpot (1.0.9).  See my Sourceforge page (linked from my profile) for more details.

The new method for solving the traveling salesman problem is somewhat inspired by the Lenski experiment.  It's not quite the same, but both are based on asexual reproduction and both involve successive generations that are externally managed.  Of course, in the Lenski experiment there are real live critters in there, E. coli to be precise, whereas my solutions are purely bits and bytes.  But it's really helpful to keep an eye on real biology when looking for ways to improve the framework.
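
The general idea of mutation-only evolution with externally managed generations can be sketched in a few dozen lines.  This is a hypothetical illustration, not Darwin's API or its actual algorithm: the class and method names, the segment-reversal mutation, the population size and the offspring count are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of asexual (mutation-only) evolution for the traveling salesman problem.
public class AsexualTsp {
    private final double[][] cities;   // (x, y) coordinates, one row per city
    private final Random random;

    public AsexualTsp(double[][] cities, long seed) {
        this.cities = cities;
        this.random = new Random(seed);   // seeded so runs are repeatable
    }

    // Length of the closed tour that visits the cities in the given order.
    double length(int[] tour) {
        double total = 0;
        for (int i = 0; i < tour.length; i++) {
            double[] a = cities[tour[i]];
            double[] b = cities[tour[(i + 1) % tour.length]];
            total += Math.hypot(a[0] - b[0], a[1] - b[1]);
        }
        return total;
    }

    // Asexual reproduction: copy one parent and reverse a random segment (a 2-opt style mutation).
    int[] mutate(int[] parent) {
        int[] child = parent.clone();
        int i = random.nextInt(child.length);
        int j = random.nextInt(child.length);
        for (int lo = Math.min(i, j), hi = Math.max(i, j); lo < hi; lo++, hi--) {
            int tmp = child[lo];
            child[lo] = child[hi];
            child[hi] = tmp;
        }
        return child;
    }

    int[] randomTour() {
        List<Integer> order = new ArrayList<>();
        for (int c = 0; c < cities.length; c++) {
            order.add(c);
        }
        Collections.shuffle(order, random);
        return order.stream().mapToInt(Integer::intValue).toArray();
    }

    // Each generation: every survivor produces mutated offspring, then an external
    // selection step keeps only the shortest tours (parents compete with offspring).
    public int[] evolve(int populationSize, int offspringPerParent, int generations) {
        List<int[]> population = new ArrayList<>();
        for (int p = 0; p < populationSize; p++) {
            population.add(randomTour());
        }
        Comparator<int[]> shorterFirst = Comparator.comparingDouble(this::length);
        for (int g = 0; g < generations; g++) {
            List<int[]> next = new ArrayList<>(population);
            for (int[] parent : population) {
                for (int k = 0; k < offspringPerParent; k++) {
                    next.add(mutate(parent));
                }
            }
            next.sort(shorterFirst);
            population = new ArrayList<>(next.subList(0, populationSize));
        }
        population.sort(shorterFirst);
        return population.get(0);
    }

    public static void main(String[] args) {
        // Eight points around the edge of a unit square; the best possible tour has length 4.
        double[][] square = {{0, 0}, {0.5, 0}, {1, 0}, {1, 0.5}, {1, 1}, {0.5, 1}, {0, 1}, {0, 0.5}};
        AsexualTsp solver = new AsexualTsp(square, 42L);
        int[] best = solver.evolve(20, 5, 200);
        System.out.println("Best tour length found: " + solver.length(best));
    }
}
```

Because parents survive alongside their offspring, the best tour length never gets worse from one generation to the next; there is no crossover at all, which is what makes the reproduction "asexual".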

In the latest version, I had to implement clone() on a lot of the classes.  Grrr!  See my other post regarding Things Java Got Wrong.

OK, back to work!

Thursday, January 7, 2010

The top ten most difficult software packages

There are some pieces of software that you install, plug in, or whatever, and they just start working.  Eclipse is pretty much that way, even if it does take a bit of getting used to at first.  JUnit, of course, is dead easy to use.  Even fairly complex packages, such as Apache Axis for working with SOAP, are straightforward.  I was also particularly impressed with the EclEmma coverage tool.

But there are some pieces of software, whether frameworks, plugins, stand-alone applications, or just Java classes that can be real beasts to get right.

Among Java's classes, there is a lot of bad stuff, some of which is getting improved as we go along.  For instance, exceptions in the early days could not carry a cause with them, except for certain exception classes (hence the getTargetException() method of InvocationTargetException).  That was improved in Java 1.4, when every Throwable gained getCause().  Another weird thing, to my mind, is that so much is done in java.lang and other basic packages by simply creating classes.  Why, for example, is Number an abstract class instead of (perhaps several) interfaces?  This means that a class which implements doubleValue(), for instance, cannot be a Number and also extend some other class.  Similarly, although much of the Collections framework is based on interfaces, common methods such as size() (found on both Collection and Map) are not defined by any super-interface at the heart of the framework.  These are just minor gripes really.  Now for the real stuff!
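
A small illustration of the exception point, using the standard reflection API (CauseDemo, fail() and invokeAndUnwrap() are made-up names): InvocationTargetException still offers its own getTargetException(), but since Java 1.4 the same object is also available through the general Throwable.getCause(), and any exception can be given a cause through its constructor.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class CauseDemo {

    // A method that always fails; we call it reflectively below.
    public static void fail() {
        throw new IllegalStateException("underlying problem");
    }

    // Returns {getTargetException(), getCause()} from the reflection wrapper.
    static Throwable[] invokeAndUnwrap() throws Exception {
        Method m = CauseDemo.class.getMethod("fail");
        try {
            m.invoke(null);   // static method, so no target object
            throw new AssertionError("fail() should have thrown");
        } catch (InvocationTargetException e) {
            // Old, class-specific accessor vs. the general one added in Java 1.4.
            return new Throwable[] { e.getTargetException(), e.getCause() };
        }
    }

    public static void main(String[] args) throws Exception {
        Throwable[] causes = invokeAndUnwrap();
        System.out.println(causes[0] == causes[1]);           // true: same wrapped exception
        System.out.println(causes[0].getMessage());           // underlying problem

        // Any exception can now carry a cause via the (message, cause) constructor.
        RuntimeException wrapped = new RuntimeException("config failed", new IOException("disk"));
        System.out.println(wrapped.getCause().getMessage());  // disk
    }
}
```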

Here's my hall of shame:
  1. In the number one spot, because of the endless frustration it has caused me relative to its inherent complexity, is Java logging.  Whether using the built-in logger (java.util.logging), log4J, or Apache's JCL, the configuration steps are both arcane and woefully inadequately documented.  In the case of log4J, it's somewhat understandable because the author wants to make money from the documentation, so it isn't free.  But when it isn't working correctly, does it tell you why in clear, simple terms?  No, of course not.  It tells you nothing at all.  It's as silent as the grave!
  2. Number two has to be Hibernate, although there are definitely extenuating circumstances here.  It's an extremely complex subject.  I know: I've written my own ORM package in the past.  And things have improved quite a lot.  When I first started using it, there were no annotations in Java, so everything was configured by hand, so to speak.  And the caching and proxying components weren't exactly bullet-proof.  Also, I was using Hibernate 2 in those days.  But again, when something goes wrong, is there a clear message explaining what's happened?  No, you get an arcane error message that you can only interpret by looking for other lost souls on the web.
  3. I think I'm going to choose Java's Swing (part of the Java Foundation Classes) as my next little horror.  Why?  Partly because of the inconsistent use of the MVC pattern -- some classes, like JTree, have the model very clearly separated and everything is reasonably clear, but other classes try to sneak the model into the widget itself.  Not very disciplined.  But the worst aspect is the whole painting mechanism.  Maybe it's just me that finds this weird.  To be honest, I can't really remember what's so bad about it.  But I know it's caused me quite a bit of frustration.
  4. Maven.  Can't live with it, can't live without it.  Why do they make it so difficult to submit one's own software to their central repository?  While I find that it simply works 99.9% of the time, there have been issues, typically with repositories being moved (and using HTTP redirects).  There are many aspects of Maven which are just sitting there waiting to trip the unwary.  But to be honest, I've forgotten what most of my early frustrations were.  Again, no documentation unless you are willing to buy the book.
  5. The Calendar classes.  Ugh!  Bizarre is a huge understatement.  And they're broken.  At least, they are on Windows machines, although I suspect that it's actually the underlying time utilities of Windows that are the true problem.  The problem centers on UTC, formerly known (to within a second or so) as GMT.  Now, in an astounding blunder of huge proportions, GMT (in Windoze land) is conflated with British time, which is subject to daylight saving.  Aaargh!  Don't these people know that GMT is immutable?
  6. Regular expressions.  Another very complex subject, but it really is time for a redesign starting from scratch.  Forget all the history.  Let's come up with something that is really easy to use and, what's more, can easily tell you what's happening.  It's not just the Java regex package; it's every regular expression dialect out there.  Probably we should call them irregular expressions when they are reborn in the next life.
  7. Most declarative languages.  While I absolutely love the concept of declarative languages, it can be awfully hard to track down why things aren't happening as you expect.  Again, that's probably because it really is complex.  But we're definitely in need of something to make this kind of thing simple.  Although I've used quite a few such languages over the years, the one that's pretty much the worst is XSLT, although I haven't used it in several years and maybe there's now an easier way to figure out why it isn't doing what you expect.
  8. Honorable mention: the MBean rules that Java used to make life "simple" for us.  Instead of providing a marker interface to let us define an MBean, they use the most byzantine rules for naming the MBean's interface.  Yes, I know there are other ways to define an MBean, but they are equally convoluted, if not worse.
  9. Going back a while now, TCP/IP and its related sockets definitely deserve a place in the hall of shame.  It wasn't so much TCP/IP itself, although there are plenty of appallingly bad design decisions (such as the huge hole that allows for "spoofing"), but more the Windows implementation of TCP/IP (Winsock).  I would need several days to catalog all the bad things lurking in that little baby!
  10. All good lists have 10 items, don't they? 
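
On the logging item, here is one common source of that silence in java.util.logging, sketched below: a record has to pass both the logger's level and the level of each handler it reaches, and ConsoleHandler defaults to INFO under the standard logging.properties.  Raising only the logger to FINE therefore drops FINE messages without any warning.  SilentLogging and setLevelEverywhere are illustrative names, not part of the JDK.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SilentLogging {

    // Sets the level on the logger AND on every handler attached directly to it.
    static void setLevelEverywhere(Logger logger, Level level) {
        logger.setLevel(level);
        for (Handler handler : logger.getHandlers()) {
            handler.setLevel(level);
        }
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo");   // arbitrary logger name
        logger.setUseParentHandlers(false);          // don't also route records to the root logger's handler
        logger.addHandler(new ConsoleHandler());     // its level defaults to INFO

        logger.setLevel(Level.FINE);
        logger.fine("passes the logger, then is silently dropped by the INFO-level handler");

        setLevelEverywhere(logger, Level.FINE);
        logger.fine("now printed on System.err");
    }
}
```
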
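
On the Calendar item, the distinction at stake can be shown with java.util.TimeZone: the "GMT" zone never shifts, while "Europe/London" moves to British Summer Time (UTC+1) in summer.  On Windows, the zone whose ID is "GMT Standard Time" covers London and observes daylight saving, which is presumably the conflation complained about above.  GmtIsNotLondon and offsetHours are illustrative names.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class GmtIsNotLondon {

    // Offset from UTC, in hours, of the given zone at noon UTC on the given date.
    // Note that Calendar months are zero-based (Calendar.JANUARY == 0).
    static double offsetHours(String zoneId, int year, int month, int day) {
        Calendar utc = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        utc.clear();
        utc.set(year, month, day, 12, 0, 0);
        long instant = utc.getTimeInMillis();
        return TimeZone.getTimeZone(zoneId).getOffset(instant) / 3_600_000.0;
    }

    public static void main(String[] args) {
        System.out.println(offsetHours("GMT", 2010, Calendar.JULY, 1));              // 0.0
        System.out.println(offsetHours("Europe/London", 2010, Calendar.JULY, 1));    // 1.0
        System.out.println(offsetHours("Europe/London", 2010, Calendar.JANUARY, 1)); // 0.0
    }
}
```
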
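
On the MBean item, this is what the Standard MBean naming rule looks like in practice.  The management interface must be named after the implementing class plus the suffix MBean; its getters become attributes and its other methods become operations.  Nothing is declared explicitly: get the name slightly wrong and registration fails with a NotCompliantMBeanException.  MBeanNamingDemo, Counter and the ObjectName "demo:type=Counter" are arbitrary examples, and the types are nested only so that the sketch fits in one file.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanNamingDemo {

    // Standard MBean convention: the management interface of class Counter
    // must be named exactly CounterMBean (class name + "MBean").
    public interface CounterMBean {
        int getCount();   // exposed as a read-only attribute named "Count"
        void reset();     // exposed as an operation
    }

    public static class Counter implements CounterMBean {
        private int count;

        public void increment() { count++; }          // not in the interface, so not exposed
        @Override public int getCount() { return count; }
        @Override public void reset() { count = 0; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=Counter");
        Counter counter = new Counter();
        counter.increment();
        server.registerMBean(counter, name);
        System.out.println(server.getAttribute(name, "Count"));   // 1
    }
}
```
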
Is there a common thread between these different items?  I think there is, more with some than others perhaps.  Pretty much all of them have evolved from something earlier that wasn't necessarily expected to be quite so versatile or robust.  The most obvious is the TCP/IP stack, which has ended up being quite literally ubiquitous and in fact has held up remarkably well.  But clear error messages and configuration result from clear design.  When there is a mapping between the concepts the internal software uses and the concepts the user or programmer thinks in, it is easy to make the connection between internal exceptions and what the remedy might be.

OK, back to work!