Wednesday, January 20, 2010

Things that Java got wrong: part 1: clone()

This is the start of a series on some of Java's what-might-have-beens: facets of the language that, in my opinion, its designers "got wrong".

But first, a disclaimer: I love Java, and as far as I'm concerned it's by far the best popular, general-purpose language that has ever walked the earth, so to speak.  So while I'm about to be very picky indeed about my favorite toy, I still love ya baby!

clone()

Whether a class's objects can be cloned is signaled by the marker interface Cloneable.  Implementing it buys you the privilege of Object.clone() not throwing a CloneNotSupportedException; overriding clone() in your class (don't forget to make it public, since Object.clone() is protected) lets other classes actually call it.  The super.clone() method will do most of the heavy lifting for you.  But it only does a field-by-field copy.  It doesn't clone any of the fields themselves, it just copies their values.  Now, wouldn't it be nice if we could provide an annotation, for example @Clone(deepcopy=true), for those fields on which we want to perform a deep copy?  And similarly for fields that only need a shallow copy, of the kind that HashMap.clone() and ArrayList.clone() perform.
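To make the field-by-field behavior concrete, here is a minimal sketch. The Genome class and its fields are hypothetical, chosen only for illustration. After super.clone(), the copy is a distinct object, but its list field still points at the very same List as the original's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: super.clone() copies each field's value, so the
// clone and the original end up sharing the same List instance.
public class ShallowCloneDemo {

    static class Genome implements Cloneable {
        String name;
        List<Integer> genes = new ArrayList<>();

        Genome(String name) {
            this.name = name;
        }

        // Widen access to public and narrow the return type (covariant return).
        @Override
        public Genome clone() {
            try {
                return (Genome) super.clone(); // field-by-field copy only
            } catch (CloneNotSupportedException e) {
                // Cannot happen: this class implements Cloneable.
                throw new AssertionError(e);
            }
        }
    }

    public static void main(String[] args) {
        Genome original = new Genome("alpha");
        original.genes.add(1);

        Genome copy = original.clone();
        copy.genes.add(2); // mutates the list shared with the original

        System.out.println(copy != original);             // true: distinct objects
        System.out.println(copy.genes == original.genes); // true: same list
        System.out.println(original.genes);               // [1, 2]
    }
}
```

The mutation made through the copy shows up in the original, which is exactly why fields like this need to be cloned by hand.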

The way we have to do it now is to invoke clone() on those fields ourselves (and naturally, the field's type must itself be Cloneable and implement clone()).  And here's the annoying part.  When a field is marked final because it's initialized only by the constructor(s), you have to drop the final modifier just so that you can assign the cloned value to it inside clone().  Even without the annotations suggested above, the Java designers could have given the clone() method, if it exists, privileges on a par with constructors.  It sounds a bit hokey, I agree.  But Object.clone() clearly has these privileges already: its field-by-field copy fills in final fields too.  It's only at our level that such privileges are missing.
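A sketch of the workaround described above, using hypothetical Tour and FinalTour classes. Tour has to leave its list field non-final, because clone() reassigns it with a fresh copy; declaring it final would make that assignment a compile error. For contrast, FinalTour uses a copy constructor, a common alternative that is not proposed in the post, which can keep the field final precisely because it is a constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeepCloneDemo {

    static class Tour implements Cloneable {
        // Would naturally be final, but clone() must reassign it below,
        // and assigning a final field there does not compile.
        private List<String> cities;

        Tour(List<String> cities) {
            this.cities = new ArrayList<>(cities);
        }

        List<String> cities() {
            return cities;
        }

        @Override
        public Tour clone() {
            try {
                Tour copy = (Tour) super.clone();      // shallow field-by-field copy
                copy.cities = new ArrayList<>(cities); // then deep-copy the mutable field
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Alternative (not from the post): a copy constructor may assign a final
    // field, because it has exactly the privileges that clone() lacks.
    static final class FinalTour {
        private final List<String> cities;

        FinalTour(List<String> cities) {
            this.cities = new ArrayList<>(cities);
        }

        FinalTour(FinalTour other) {
            this(other.cities);
        }

        List<String> cities() {
            return cities;
        }
    }

    public static void main(String[] args) {
        Tour a = new Tour(Arrays.asList("Boston", "Chicago"));
        Tour b = a.clone();
        b.cities().add("Denver");
        System.out.println(a.cities()); // [Boston, Chicago]
        System.out.println(b.cities()); // [Boston, Chicago, Denver]

        FinalTour c = new FinalTour(Arrays.asList("Austin"));
        FinalTour d = new FinalTour(c);
        System.out.println(c.cities() != d.cities()); // true
    }
}
```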

OK, back to work!

Done tinkering with Darwin

Today I released version 2.2.01 of Darwin, my framework for evolutionary computation, and I'm done.  No more updates unless somebody has a really interesting problem to solve, preferably for ca$h!  The traveling salesman solution is much better now, and there are various other improvements too.  Along with it, I've updated tostring (1.0.1) and beanpot (1.0.9).  See my SourceForge page (linked from my profile) for more details.

The new method for solving the traveling salesman problem is somewhat inspired by the Lenski experiment.  It's not quite the same, but both are based on asexual reproduction and both involve successive generations that are externally managed.  Of course, in the Lenski experiment there are real live critters in there, E. coli to be precise, whereas my solutions are purely bits and bytes.  But it's really helpful to keep an eye on real biology for ideas to improve the framework.
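For readers unfamiliar with the idea, here is a rough sketch of mutation-only ("asexual") evolution for the traveling salesman problem with externally managed generations. It is not Darwin's API or its actual method; the class AsexualTsp, the segment-reversal mutation, and the truncation selection are all assumptions chosen for illustration. Each generation, every surviving tour produces one mutated offspring, and the "experimenter" keeps only the shortest tours from parents and offspring combined.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of asexual (mutation-only) evolution for the TSP.
// Not the Darwin framework's API; names and parameters are illustrative.
public class AsexualTsp {

    // Total length of the closed tour visiting cities in the given order.
    static double length(int[] tour, double[][] xy) {
        double total = 0;
        for (int i = 0; i < tour.length; i++) {
            double[] p = xy[tour[i]];
            double[] q = xy[tour[(i + 1) % tour.length]];
            total += Math.hypot(p[0] - q[0], p[1] - q[1]);
        }
        return total;
    }

    // Asexual reproduction: copy the parent, then reverse a random segment.
    static int[] mutate(int[] parent, Random rnd) {
        int[] child = parent.clone();
        int i = rnd.nextInt(child.length);
        int j = rnd.nextInt(child.length);
        for (int lo = Math.min(i, j), hi = Math.max(i, j); lo < hi; lo++, hi--) {
            int tmp = child[lo];
            child[lo] = child[hi];
            child[hi] = tmp;
        }
        return child;
    }

    // Generations are managed from outside: every survivor produces one
    // offspring, and only the shortest tours are carried forward.
    static int[] evolve(double[][] xy, int populationSize, int generations, long seed) {
        Random rnd = new Random(seed);
        Comparator<int[]> byLength = Comparator.comparingDouble(t -> length(t, xy));

        int[] identity = new int[xy.length];
        for (int k = 0; k < identity.length; k++) identity[k] = k;
        List<int[]> population = new ArrayList<>();
        for (int k = 0; k < populationSize; k++) population.add(identity.clone());

        for (int g = 0; g < generations; g++) {
            List<int[]> next = new ArrayList<>(population); // parents compete too
            for (int[] parent : population) next.add(mutate(parent, rnd));
            next.sort(byLength);
            population = new ArrayList<>(next.subList(0, populationSize));
        }
        return population.get(0);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] xy = new double[30][2];
        for (double[] p : xy) { p[0] = rnd.nextDouble(); p[1] = rnd.nextDouble(); }

        int[] start = new int[xy.length];
        for (int k = 0; k < start.length; k++) start[k] = k;

        int[] best = evolve(xy, 20, 2000, 7);
        System.out.printf("initial %.3f, evolved %.3f%n", length(start, xy), length(best, xy));
    }
}
```

Because parents survive alongside their offspring, the best tour length can never get worse from one generation to the next.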

In the latest version, I had to implement clone() on a lot of the classes.  Grrr!  See my other post, Things that Java got wrong.

OK, back to work!

Thursday, January 7, 2010

The top ten most difficult software packages

There are some pieces of software that you install or plug in, and they just start working.  Eclipse is pretty much that way, even if it does take a bit of getting used to at first.  JUnit of course is dead easy to use.  Even software of some complexity, such as Apache Axis for working with SOAP, is straightforward.  I was also particularly impressed with the EclEmma coverage tool.

But there are some pieces of software, whether frameworks, plugins, stand-alone applications, or just Java classes, that can be real beasts to get right.

Among Java's classes, there is a lot of bad stuff, some of which is getting improved as we go along.  For instance, in the early days exceptions could not carry a cause with them, except for certain exception classes (hence InvocationTargetException's getTargetException() method).  That was fixed in Java 1.4, when Throwable gained a general getCause().  Another weird thing to my mind is that so much is done in java.lang and the other basic packages by simply creating classes.  Why, for example, is Number an abstract class instead of one (or perhaps several) interfaces?  This means that you can't write a class that is a Number (providing doubleValue(), for instance) and also extends some other class.  Similarly, although much of the Collections framework is based on interfaces, common methods such as size() are declared separately in Collection and Map rather than in some shared super-interface at the heart of the framework.  These are just minor gripes really.  Now for the real stuff!
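The cause-chaining improvement can be seen side by side with the older mechanism in a small sketch (CauseDemo and its methods are hypothetical names). Since Java 1.4, InvocationTargetException.getCause() returns the same object as the older getTargetException(), and any exception can wrap another through the (message, cause) constructor.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class CauseDemo {

    // A method that always fails, to be invoked reflectively.
    public static void fail() {
        throw new IllegalStateException("boom");
    }

    // Invokes fail() reflectively and returns {getTargetException(), getCause()}.
    static Throwable[] causesOfReflectiveFailure() throws ReflectiveOperationException {
        Method m = CauseDemo.class.getMethod("fail");
        try {
            m.invoke(null);
            throw new AssertionError("fail() should have thrown");
        } catch (InvocationTargetException e) {
            // Old style: this one exception class carried its own "cause".
            // New style (1.4+): every Throwable can report a cause.
            return new Throwable[] { e.getTargetException(), e.getCause() };
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Throwable[] t = causesOfReflectiveFailure();
        System.out.println(t[0] == t[1]);        // true
        System.out.println(t[1].getMessage());   // boom

        // Any exception can now wrap another via the (message, cause) constructor.
        Exception wrapped = new RuntimeException("config failed",
                new IllegalArgumentException("bad key"));
        System.out.println(wrapped.getCause().getMessage()); // bad key
    }
}
```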

Here's my hall of shame:
  1. In the number one spot, because of the endless frustration it has caused me relative to its inherent complexity, is Java logging.  Whether using the built-in logger (java.util.logging), log4j, or Apache's JCL, the configuration steps are both arcane and woefully inadequately documented.  In the case of log4j, it's somewhat understandable because the author wants to make money out of the documentation, so it isn't free.  But when it isn't working correctly, does it tell you why in clear, simple terms?  No, of course not.  It tells you nothing at all.  It's as silent as the grave!
  2. Number two has to be Hibernate, although there are definitely extenuating circumstances here.  It's an extremely complex subject.  I know.  I've written my own ORM package in the past.  And things have improved quite a lot: when I first started using it, there were no annotations in Java, so everything was configured manually, so to speak.  And the caching and proxying components weren't exactly bullet-proof.  Also, I was using Hibernate 2 in those days.  But again, when something goes wrong, is there a clear message explaining what's happened?  No, you get an arcane error message which you can only interpret by looking for other lost souls on the web.
  3. I think I'm going to choose Java's Swing (part of the Java Foundation Classes) as my next little horror.  Why?  Partly because of the inconsistent use of the MVC pattern: some classes, like JTree, have the model very clearly separated and everything is reasonably clear, but other classes try to sneak the model into the widget itself.  Not very disciplined.  But the worst aspect is the whole mechanism of painting.  Maybe it's just me who finds this weird.  To be honest, I can't really remember what's so bad about it.  But I know it's caused me quite a bit of frustration.
  4. Maven.  Can't live with it, can't live without it.  Why do they make it so difficult to submit one's own software to their central repository?  While I find that it simply works 99.9% of the time, there have been issues, typically with repositories being moved (and using HTTP redirects).  There are many aspects of Maven which are just sitting there waiting to trip the unwary.  But to be honest, I've forgotten what most of my early frustrations were.  Again, no documentation unless you are willing to buy the book.
  5. The Calendar classes.  Ugh!  "Bizarre" is a huge understatement.  And they're broken.  At least, they are on Windows machines, although I suspect that the underlying time utilities of Windows are the true problem.  The problem centers on using UTC, formerly known (to within a second) as GMT.  Now, in an astounding blunder of huge proportions, GMT (in Windoze land) is conflated with British time, which is subject to daylight saving.  Aaargh!  Don't these people know that GMT is immutable?
  6. Regular expressions.  Another very complex subject, but it really is time for a redesign starting from scratch.  Forget all the history.  Let's come up with something that is really easy to use and, what's more, can easily tell you what's happening.  It's not the Java regex package in particular, it's every regular expression implementation out there.  Probably we should call them irregular expressions when they get reborn in the next life.
  7. Most declarative languages.  While I absolutely love the concept of declarative languages, it can be awfully hard to track down why things aren't happening as you expect.  Again, that's probably because it really is complex.  But we're definitely in need of something to make this kind of thing simple.  Although I've used quite a few such languages over the years, the one that's pretty much the worst is XSLT, although I haven't used it in several years and maybe there's now an easier way to figure out why it isn't doing what you expect.
  8. Honorable mention: the MBean rules that Java uses to make life "simple" for us.  Instead of providing a marker interface that lets us define an MBean, they rely on the most byzantine naming rules: a standard MBean class Foo must implement a management interface named exactly FooMBean.  Yes, I know there are other ways to define an MBean (dynamic MBeans, for instance), but they are equally convoluted, if not worse.
  9. Going back a while now, TCP/IP and its sockets definitely deserve a place in the hall of shame.  It wasn't so much TCP/IP itself, although there are plenty of appallingly bad design decisions (such as the huge hole that allows for "spoofing"), but more the Windows implementation of TCP/IP (Winsock).  I would need several days to catalog all the bad things lurking in that little baby!
  10. All good lists have 10 items, don't they? 
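The distinction item 5 insists on, that GMT never changes while British civil time does, can be checked against Java's own time-zone data with a small sketch (GmtVsLondon and offsetHours are hypothetical names). It relies on the time-zone database bundled with the JRE and does not reproduce the Windows behavior complained about above; it only shows what the correct answer looks like.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class GmtVsLondon {

    // Offset from UTC, in hours, for the given zone at noon UTC on the given date.
    static double offsetHours(String zoneId, int year, int month, int day) {
        Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        utc.clear();
        utc.set(year, month, day, 12, 0, 0);
        long instant = utc.getTimeInMillis();
        return TimeZone.getTimeZone(zoneId).getOffset(instant) / 3_600_000.0;
    }

    public static void main(String[] args) {
        // GMT never observes daylight saving: offset 0 in winter and summer.
        System.out.println(offsetHours("GMT", 2010, Calendar.JANUARY, 20)); // 0.0
        System.out.println(offsetHours("GMT", 2010, Calendar.JULY, 20));    // 0.0
        // British civil time does: GMT in winter, BST (GMT+1) in summer.
        System.out.println(offsetHours("Europe/London", 2010, Calendar.JANUARY, 20)); // 0.0
        System.out.println(offsetHours("Europe/London", 2010, Calendar.JULY, 20));    // 1.0
    }
}
```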
Is there a common thread between these different items?  I think there is, more so for some than others, perhaps.  Pretty much all of them have evolved from something earlier that wasn't necessarily expected to be quite so versatile or robust.  The most obvious is the TCP/IP stack.  It has ended up being quite literally ubiquitous and in fact has held up remarkably well.  But clear error messages and clear configuration result from clear design.  When there is a mapping between the concepts the internal software uses and the concepts the user or programmer thinks in, it is easy to make the connection between an internal exception and what the remedy might be.

OK, back to work!