Monday, March 20, 2006

Too many classes?

This is a post which was originally posted last year, under my other blog thoughts for life. I'm reposting it here because this is where it actually belongs.

I can remember, when I was an undergrad, my brother telling me that when you're stuck on a programming problem, you should just add another class. (The word "class" in this article refers to interfaces and abstract classes as well as concrete classes.)

At the time I don't think he or I understood how profound that statement was. How often does one come to a piece of code, find just two classes that each do too many things, and then scratch one's head trying to figure out what's going on?

That statement is born of the notion that having lots of classes is a good thing.

Let's take an example from real life...

Analogy from the Real World
Imagine your motor car's engine were made up of just 5 parts. When the starter motor breaks you have to remove the water pump as well, because they're both built into the same part. That introduces potential for new problems: maybe the mechanic does not re-install the water pump properly. A job that should only have involved the starter motor now takes longer and carries more risk, because you cannot remove the starter motor on its own.

Transfer the analogy into the software design arena and you soon realise that the problems are far worse. When a class which does many things requires one change, the risk is even larger than in the engine analogy, because of the potential complexity of a software component.

To give you an example, a certain component of an application was particularly inefficient and, for want of a better word, just plain bad. I was tasked with upgrading (rewriting) it. When I was finished, the number of classes was about 4 to 5 times higher than in the previous version.

Here are some advantages of having many classes...
  1. Isolation of Bugs: In a system with many small classes, you have many relatively independent sub-systems all working together. If there is a problem, or a bug, chances are the bug will be isolated to one class, and only that class will have to change.
  2. Flexible to Change: I can remember times when I've had to make changes to an already built system and I spend a good part of the time figuring out how the current system works because the current system is comprised of very few classes. If the system were comprised of many small classes, then I would only have to understand many simple, small pieces of code. I may even find that the piece of functionality where the change applies is wrapped in one class already, and that class may even be named appropriately.
  3. Ease of Comprehension: Having many classes means you need less documentation. If you have one method which is rather large and you break it down into two or three methods, and you name those methods appropriately, then the method names describe what they do and they play the role that code comments would play in the larger method. The same is true when you break a system down into many classes.
  4. Potential for Good Design: I have found that when a system consists of many classes, chances are the design is good. The number of classes would, however, have to be relevant to the complexity involved, and this relationship is often an exponential one: as the complexity increases, the number of classes increases even more. It is only by experience that one gets a feel for how many classes are appropriate to perform a particular job, but by and large, if the complexity is high and the number of classes is low, then the design is poor. If the number of classes is high this does not necessarily mean the design is good, but it is one of the indications.
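Point 3 can be illustrated with a minimal sketch. The class, the method names and the discount rule below are all hypothetical, made up purely for illustration. The first method relies on comments to explain its steps; the second delegates to small methods whose names say the same thing.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical example class; the names and the discount rule are invented.
class OrderReport {

    // Before: one method, with comments marking each step.
    static String summaryInline(List<Double> prices) {
        // add up the prices
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        // apply a 10% discount to totals over 100
        if (total > 100) {
            total = total * 0.9;
        }
        // format the result to two decimal places
        return "Total: " + String.format(Locale.ROOT, "%.2f", total);
    }

    // After: the method names take over the job of the comments.
    static String summary(List<Double> prices) {
        return formatTotal(applyBulkDiscount(sumOf(prices)));
    }

    static double sumOf(List<Double> prices) {
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total;
    }

    static double applyBulkDiscount(double total) {
        return total > 100 ? total * 0.9 : total;
    }

    static String formatTotal(double total) {
        return "Total: " + String.format(Locale.ROOT, "%.2f", total);
    }
}
```

Both versions return the same string for the same input; only the second can be read without the comments.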
I've heard rumours that in the past people were reluctant to have many classes with many methods because of the overheads of method calls and class instantiation. The rationale for this consideration has largely been removed by efficiencies in modern compilers and virtual machines, and by the speed of hardware. Yes, you do incur an overhead when calling a method and instantiating a class but, in general practice, I have found this to be negligible. Time and time again, "design well, and then optimise" rings true.

Real World Example
Why does this work? Well, it won't work if you just divide your large class into smaller classes in a naive way. If you divide a class into smaller pieces but end up with the same degree of coupling between the pieces as before, then you haven't achieved much. It is far more valuable if, when dividing the class into smaller ones, the coupling between the resulting classes is kept fairly low. In other words, each class should have at least a semblance of meaning outside of its original context.

Take for example the problem that says
  1. Read in a file
  2. Count the number of words in the file
A naive approach would put all the functionality into one class which will read the file in and then count the words by counting the number of spaces. This would require one class.

A far more funky approach would be to separate the reading of the file from the counting, and if this is done with a low degree of coupling (via an interface) then changing the source location from a file to a string would be a simple process. If the counting is also done in its own class, then you could also change that algorithm to count say the letter 'a' without affecting the way it sources the file.

The classes that would be required would be
  • Interface to communicate with the data source
  • Interface/abstract class to communicate with the counter
  • Concrete implementation of the data source to retrieve the text from a file
  • Concrete implementation of the counter to count spaces/white space
  • Controller to hook the two up
We've gone from 1 class to 5 classes in a very short space of time.
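The five classes listed above could be sketched roughly as follows. This is only one possible layout: every name here (`TextSource`, `Counter`, `FileTextSource`, `WhitespaceWordCounter`, `CountController`) is a hypothetical choice of mine. Two extra implementations, `StringTextSource` and `CharCounter`, are included to show the swaps mentioned earlier (a string instead of a file, the letter 'a' instead of words).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Interface to the data source (hypothetical name). */
interface TextSource {
    String read();
}

/** Interface to the counter (hypothetical name). */
interface Counter {
    int count(String text);
}

/** Concrete source: reads the whole file as UTF-8 text. */
class FileTextSource implements TextSource {
    private final Path path;

    FileTextSource(Path path) {
        this.path = path;
    }

    @Override
    public String read() {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Alternative source: the text is already in memory. */
class StringTextSource implements TextSource {
    private final String text;

    StringTextSource(String text) {
        this.text = text;
    }

    @Override
    public String read() {
        return text;
    }
}

/** Concrete counter: counts words separated by runs of white space. */
class WhitespaceWordCounter implements Counter {
    @Override
    public int count(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

/** Alternative counter: counts occurrences of a single character. */
class CharCounter implements Counter {
    private final char target;

    CharCounter(char target) {
        this.target = target;
    }

    @Override
    public int count(String text) {
        int hits = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                hits++;
            }
        }
        return hits;
    }
}

/** Controller: hooks a source up to a counter and knows nothing else. */
class CountController {
    private final TextSource source;
    private final Counter counter;

    CountController(TextSource source, Counter counter) {
        this.source = source;
        this.counter = counter;
    }

    int run() {
        return counter.count(source.read());
    }
}

class WordCountDemo {
    public static void main(String[] args) {
        TextSource source = new StringTextSource("a cat sat on a mat");
        System.out.println(new CountController(source, new WhitespaceWordCounter()).run());
        System.out.println(new CountController(source, new CharCounter('a')).run());
    }
}
```

Swapping `StringTextSource` for `FileTextSource`, or `WhitespaceWordCounter` for `CharCounter`, touches only the line that builds the controller. The controller itself and the other implementation stay as they are.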

So when you code, do not be afraid to make many classes.

And when you're stuck, just add another class.

Friday, March 10, 2006

Get Perpendicular!

This is most hilarious...

Slashdot reports that "Hitachi has recently announced perpendicular recording with their harddrives".

That is good news because it means hard drives can get even smaller.

That is not the funny bit, however. They've also developed a "music video" to go with it....

Absolutely hilarious!

For more information on perpendicular recording see the Wikipedia article on it.

Tuesday, March 07, 2006

EJB JDO, what's going on

I've read with interest the ongoing discussion between the JDO camp and the EJB camp.

If you're not familiar with it, I'll give you a brief background.

EJB, or more specifically "entity beans", is a persistence framework. JDO (Java Data Objects) is also a persistence framework. The entity beans framework is part of the broader EJB spec, currently in version 3.0. JDO is a "stand alone" specification, but it is in a similar vein to EJB in that Sun publishes the specification and a vendor then implements that specification and sells it on the market. JDO can be used in a desktop app, a web app or in an application server. Entity beans require an application server to work and come with significant overhead because of this.

I have used JDO extensively and have recently started looking at EJB in its version 2 guise. To be honest, the learning curve for entity beans is huge! It would take a developer twice if not three times as long to become useful with entity beans as with JDO. I know because I was new to JDO at one stage and can remember how long it took me to learn. The developer would need to learn a lot of things that are not specifically related to persistence, like session beans and local/remote interfaces.

There have been a number of discussions on the pros and cons of JDO against EJB. What makes the discussion even more interesting is that the latest EJB 3.0 spec, from what I've read, is unashamedly Hibernate! There are a number of conspiracy theories floating around as to why the EJB expert group went that way. The plot thickens when you consider that the JDO 2.0 specification is still in the process of being ratified - it has been rejected once already.

One of the loudest shouts in the community has been "We don't want two persistence frameworks!". My response to that is "why not?". It gives the user choice, it increases quality via increasing competition, it promotes flexibility. I say bring on alternatives.

Choice does not seem to be the rationale behind the push for EJB 3.0, though. It seems that somehow this specification has won the day and that JDO has been abandoned. The primary reason for this, I think, is that entity beans require an application server to work, so the large vendors, Oracle, BEA, IBM and JBoss (who own Hibernate, btw), have supported the EJB specification because that way they protect their market.

To utilise JDO you do not need a heavyweight, complex and expensive application server.

The other apparent reason for the support of Hibernate via EJB 3.0 entity beans is that Hibernate uses standard SQL as its query language. This not only locks the user into the underlying database but apparently affords the developer a seamless cross-over. I do not see how this can be seamless when the developer has to learn about all the overhead that comes with running on an application server.

I had to learn JDOQL (the JDO Query Language), and the syntax is a lot cleaner than SQL. JDOQL is far closer to object space than SQL is; it is simply a more obvious and better fit. The key here is that it is far easier for an OO programmer to learn JDOQL than it is for an OO programmer to learn SQL. Which is why I do not buy the "they'll have to learn a new query language" rhetoric.

Furthermore, there is no reason why you can't use JDO with an application server. Again, surprise surprise, you have choice.

Reading various articles on the subject might give the impression that JDO is dead. Well, I was pleasantly surprised to note that the final round of voting on the JDO 2 spec is imminent. See JSR 243 on the JCP site.
I'm holding thumbs for its adoption.

It probably won't be ratified however; based on the voting up till now a betting man would not put money on it.

Ultimately, I think that the Apache opinion is correct and the most democratic and "broad based". They have always maintained "Let The Market Decide!". The specification is good enough, and the implementations are of sufficient quality, that the computing industry will use it. Proof of this is that a number of companies have already implemented the JDO 2.0 specification without its ratification! JDO fulfills important requirements quite efficiently. In my work place we have used it for two web applications so far, and have seen no reason to go the application server route.

Let's hear it one more time, Let the Market Decide!

btw, if you have an opinion, then tell me what you think...