Analysis 102: October 2008

Wednesday, October 22, 2008

Developers do what they see...

This is probably the first really big project that I have been on and I have learnt a lot. One of the key lessons I have learnt is that the average developer is a sheep. He (or she) will do what they see and very seldom will they go against that.

In fact, even if it's wrong, they will do what they see.

It has been quite remarkable.

So changing habits is not easy, especially if you're on a complex enterprise project which has been running for 18 months with a very fast development cycle. There are _many_ different ways of doing almost anything in the code, so the poor developer has no idea which approach to follow.

It has therefore been quite encouraging to see some habits change. I introduced Commons Collections into the code base, used it in my code and showed a few devs what it offers, and I'm now finding it used more and more.

The other good thing about developers doing what they see is that if what they see is good and right, those habits persist. I guess the opposite also applies: if what they see is low quality, they will spread that low quality.

Thursday, October 16, 2008

Hibernate best practices...

I have been on a very large enterprise project using Hibernate for persistence. Our data model has upwards of 200 objects. It is a clustered web application that requires high performance, so we've implemented caching and have all our objects and relationships set to lazy load. The typical database interaction is lots of reads and few writes. The data of one user does not affect the data of another.

Here are some of my best practices which I would have loved to have had in place from the beginning. They would have saved a lot of time and reduced bugs...

Use id based Equality
This is contrary to a lot of the typical recommendations, but I don't see a problem with it. It means that you have a working equality method without any further effort. Furthermore, you have a guaranteed equality: the same equality that the database uses. There are some issues with this when it comes to one-to-one relationships, but those can be resolved by adjusting the equals method. The other problem with the usual recommendation (equality based on a business key) is that there are cases where there is no business-based unique key at all.
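A minimal sketch of id-based equality, assuming a String primary key; the Person class and its fields are hypothetical. It compares through getId() and uses instanceof rather than getClass(), because a lazy association may hand you a Hibernate proxy, which is a generated subclass whose own fields are empty; only the getter returns the id.

```java
import java.util.Objects;

// Hypothetical entity for illustration: equality is defined by the id alone.
class Person {
    private String id;   // primary key: the same value the database compares on
    private String name; // ordinary state, deliberately ignored by equals

    protected Person() {
        // no-arg constructor, required by Hibernate to instantiate the entity
    }

    Person(String id, String name) {
        this.id = Objects.requireNonNull(id, "id");
        this.name = name;
    }

    public String getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        // instanceof rather than getClass(): Hibernate's lazy proxies are
        // runtime-generated subclasses of the entity
        if (!(other instanceof Person)) {
            return false;
        }
        // Compare via the getter; a proxy's own fields are never populated
        return getId().equals(((Person) other).getId());
    }

    @Override
    public int hashCode() {
        return getId().hashCode();
    }
}
```

Two Person objects built from the same row in different sessions are therefore equal and collapse to one entry in a HashSet, whatever their other fields say.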

Don't use auto generated Keys
Instead, set the id yourself in your model classes. This means you need another field as a persistence marker. The fundamental reason for this, coupled with the fact that you're using the id for equals, is that you have immediate equality, both before the object is persisted and in your tests. With auto-generated keys you have to wait until the object is persisted before its equals method works. You therefore cannot put it in a set before it is persisted, and in your tests you have to set the id manually every time you create an object.
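Here is a sketch of an assigned key plus a separate persistence marker. The Account class is hypothetical, the UUID is just one way of generating the id in Java, and using a nullable version field as the marker is an assumption (Hibernate can use a nullable version property to tell new instances from saved ones).

```java
import java.util.UUID;

// Hypothetical entity whose id is assigned in Java, not generated by the database.
class Account {
    private String id;
    // Persistence marker: stays null until the row has been saved.
    // Assumption: mapped as a nullable Hibernate version property.
    private Integer version;

    Account() {
        this.id = UUID.randomUUID().toString(); // the id exists from construction on
    }

    public String getId() {
        return id;
    }

    public boolean isPersisted() {
        return version != null;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Account)) {
            return false;
        }
        return getId().equals(((Account) other).getId());
    }

    @Override
    public int hashCode() {
        return getId().hashCode();
    }
}
```

New accounts can go straight into a HashSet and tests need no manual id setup. When Hibernate loads a row it also calls the no-arg constructor and then overwrites the id, so the random UUID is wasted there; that is harmless.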

Use instrumentation
Hibernate has an issue (a big issue, IMHO) when you execute the following code (Person is set to be lazy loaded):

      Person p = (Person) session.load(Person.class, id);
      ContactDetails cd = p.getContactDetails();

If, in the database, that particular contact details object is actually a subclass of ContactDetails called AddressDetails, the object in cd will _not_ be an AddressDetails as you would expect. It is a lazy-loading proxy that only extends ContactDetails, so cd instanceof AddressDetails is false and a cast to AddressDetails fails. To get the real type you have to first ask Hibernate for the object's real class and then call load again with that class. So for the variable cd to be of the correct type, you need the following code:

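      Person p = (Person) session.load(Person.class, id);
      ContactDetails cd = p.getContactDetails();
      // Hibernate.getClass() initialises the proxy and returns the real class
      // (AddressDetails here); load again with it (assumes ContactDetails has getId())
      cd = (ContactDetails) session.load(Hibernate.getClass(cd), cd.getId());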

An extra line is required.

Instrumentation, however, injects code into the compiled entity classes and intercepts the call to getContactDetails so that it loads the real type. It's a small Ant task that runs from within our IDE, and it sorts it all out.
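Why cd is not an AddressDetails in the first place can be shown with plain classes. The classes below are a hypothetical, hand-written stand-in for the proxy Hibernate generates at runtime, not Hibernate's code: method calls are forwarded to the real AddressDetails, but instanceof and casts only ever see the proxy. Hibernate's own way of unwrapping is getHibernateLazyInitializer().getImplementation() on a HibernateProxy; getImplementation() here merely imitates it.

```java
// Hand-written stand-ins for illustration only: Hibernate generates its proxy
// class at runtime, but the shape is the same.
class ContactDetails {
    String describe() {
        return "contact details";
    }
}

class AddressDetails extends ContactDetails {
    @Override
    String describe() {
        return "address details";
    }
}

// Like the proxy behind a lazy association: it extends the *declared* type
// and forwards calls to the real object loaded from the database.
class ContactDetailsProxy extends ContactDetails {
    private final ContactDetails target; // the real row, here an AddressDetails

    ContactDetailsProxy(ContactDetails target) {
        this.target = target;
    }

    @Override
    String describe() {
        return target.describe(); // method calls reach the subclass...
    }

    ContactDetails getImplementation() {
        return target; // ...but only the unwrapped object has the real type
    }
}
```

Polymorphic method calls therefore work through the proxy; it is type checks and downcasts that break, which is exactly what the extra load (or instrumentation) fixes.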

Use Field access over method access
Unfortunately, I discovered too late that Hibernate can act directly on the fields rather than going via the getters and setters. Because Hibernate would then no longer call the setters, it would have let me put some nice validation and/or side effects into the setters of Hibernate-managed objects. I'm sorry I didn't do it earlier. I have therefore not been able to test this practice; it does sound like a good idea, but there may be issues I do not know about.
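A sketch of what field access buys you, under stated assumptions: Customer and its email rule are hypothetical, and FieldAccessDemo.hydrate is a hypothetical helper that imitates, via plain reflection, how a field-access persistence layer populates an object. In Hibernate 3 mapping files field access is selected with access="field" (or default-access).

```java
import java.lang.reflect.Field;

// Hypothetical entity: the setter holds validation meant for application code.
class Customer {
    private String email;

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        if (email == null || email.indexOf('@') < 0) {
            throw new IllegalArgumentException("invalid email: " + email);
        }
        this.email = email;
    }
}

final class FieldAccessDemo {
    private FieldAccessDemo() {
    }

    // Roughly what field access does when populating an object from a row:
    // the field is written directly, so the setter and its checks never run.
    static void hydrate(Object entity, String fieldName, Object value)
            throws ReflectiveOperationException {
        Field field = entity.getClass().getDeclaredField(fieldName);
        field.setAccessible(true);
        field.set(entity, value);
    }
}
```

Loading old data never trips the validation, while every write from application code goes through setEmail and is checked. With property access the setter would also run on every load.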

Think hard about your Cascades
Hibernate is not good at saving a whole object tree in one go. The problem is that as soon as any SQL is executed, it is checked against the current state of the database, so any constraint violation is raised immediately. If you think hard about your cascades, the ability to save whole trees improves significantly, though it never works completely. One obvious practice: if a relationship has a not-null constraint, the cascade must be set. Without it, you will get a constraint violation unless you save the foreign object before the local one. The objects belong together (hence the not-null), so the cascade should be on.
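The ordering problem can be shown with a toy model, not Hibernate's flush logic: a hypothetical ToySession whose inserts are checked immediately against a NOT NULL foreign key from Employee to Department (both hypothetical). Without the cascade, saving the employee fails unless the department was saved first; with it, the referenced row goes in first.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model, not Hibernate: inserts are checked immediately, like SQL executed
// against a table with a NOT NULL foreign key. Names are hypothetical.
class Department {
    final String id;

    Department(String id) {
        this.id = id;
    }
}

class Employee {
    final String id;
    final Department department; // NOT NULL foreign key in this model

    Employee(String id, Department department) {
        this.id = id;
        this.department = department;
    }
}

class ToySession {
    private final Set<String> departmentRows = new HashSet<>();
    private final Set<String> employeeRows = new HashSet<>();
    private final boolean cascadeToDepartment;

    ToySession(boolean cascadeToDepartment) {
        this.cascadeToDepartment = cascadeToDepartment;
    }

    void saveDepartment(Department department) {
        departmentRows.add(department.id);
    }

    void saveEmployee(Employee employee) {
        if (cascadeToDepartment && employee.department != null) {
            saveDepartment(employee.department); // cascade: the referenced row goes in first
        }
        // The constraint is checked as soon as the insert runs
        if (employee.department == null || !departmentRows.contains(employee.department.id)) {
            throw new IllegalStateException(
                    "constraint violation: department of " + employee.id + " not saved");
        }
        employeeRows.add(employee.id);
    }

    boolean hasEmployee(String id) {
        return employeeRows.contains(id);
    }
}
```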

The above recommendations are fairly generic to any project that uses Hibernate in the real world. There are a few I've learnt on this project that are specific to a high-performance, high-concurrency web application...

Prefer trawling the object model over running queries
In our kind of application a user logs in and spends, on average, 10 to 20 minutes on the web site, so we typically have many of the objects they need in the cache already. When you need to find some data for that user, it is a lot better to simply trawl the objects than to run a Hibernate query. The reason for this is twofold:

1. Querying in Hibernate always causes a flush. Data is therefore written to the database at a time when it shouldn't necessarily be. Even though you know the query does not touch any of the pending writes, the flush will still happen.
2. A database connection is made. On our application we gained significant performance and concurrency improvements by removing queries; the objects were already in the cache in any case. Where we had deadlocks, we simply removed the query from the equation (changed to trawling the model) and the deadlocks were significantly reduced.
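A minimal sketch of trawling, with hypothetical Member and Order classes: instead of an HQL query such as "from Order o where o.member = :member and o.open = true" (also hypothetical), the code walks the collection the user already holds in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model. In the real application the collection would be a lazy
// Hibernate association, usually already initialised or sitting in the cache.
class Order {
    final String id;
    final boolean open;

    Order(String id, boolean open) {
        this.id = id;
        this.open = open;
    }
}

class Member {
    private final List<Order> orders = new ArrayList<>();

    List<Order> getOrders() {
        return orders;
    }

    // Trawl the object model instead of querying: once the collection is
    // initialised this is a pure in-memory walk, with no query and no flush.
    List<Order> findOpenOrders() {
        List<Order> open = new ArrayList<>();
        for (Order order : orders) {
            if (order.open) {
                open.add(order);
            }
        }
        return open;
    }
}
```

The trade-off: if the collection has not been initialised yet, touching it still triggers a load (from the cache when possible), so this pays off when, as here, the user's data is very likely already loaded.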

Make everything lazy
In our application there isn't a single relationship which is not lazy. We can't think of a case where turning lazy off would be useful to us. Yes, I understand that it means the first read will be slow, but after that the object will be in the cache, so it will be fast. If lazy is off, the related object is always loaded, even if it is already in the cache (it is then loaded from the cache).

In other words, if object A has a reference to object B and the reference is non-lazy, then when you load A it will _always_ load B, even if you never go near B. Even if B is in the cache, it will still load B from the cache. On our application we have a lot of static data. This static data was initially set up with lazy loading disabled (prefetching), and we would then preload all of it. We found that the preload was very slow, even though it was coming from the cache, because it had to load the root object plus all the non-lazy relationships stemming from that object.
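A toy model of the A-to-B case, assuming nothing about Hibernate's internals: the hypothetical Association class stands for one reference, and its loader stands for fetching B, whether from the database or the cache. A non-lazy reference pays for B as soon as A is loaded; a lazy one pays only on first use.

```java
import java.util.function.Supplier;

// Toy model of a single association from A to B. The loader stands for
// fetching B, whether from the database or the second-level cache.
class Association<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    private Association(Supplier<T> loader) {
        this.loader = loader;
    }

    // Lazy: nothing is fetched until someone calls get()
    static <T> Association<T> lazy(Supplier<T> loader) {
        return new Association<>(loader);
    }

    // Non-lazy (prefetching): B is fetched the moment A is loaded, used or not
    static <T> Association<T> eager(Supplier<T> loader) {
        Association<T> association = new Association<>(loader);
        association.get();
        return association;
    }

    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}
```

Multiply the eager case by every non-lazy relationship hanging off a root object and you get the slow static-data preload described above.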

Thursday, October 09, 2008

What Every Development Shop cannot do without!

For those of you who have not heard of Hudson: where have you been? It is the new kid on the block in the Continuous Integration space, and (IMO) it stands head and shoulders above the competition.

It has added an incalculable amount of value to our development environment, enough that we depend on it as much as we depend on our IDE. Furthermore, it has single-handedly raised the quality of the code, so much so that I am disappointed we did not have it in place from the beginning of our project.

Why do I say all of that?

To my mind, Hudson's killer feature is that it supports plugins. This is the feature that sets it apart.

And the killer plugin, the one that has contributed most to code quality, is a fairly simple plugin that summarises the results of various code quality metrics and tracks them in Hudson.

We now have continuous metrics for the following...

Checkstyle - a tool which examines source code against a number of rules. It checks, for instance, that the formatting is correct, cyclomatic and NPath complexity, and simple class and method lengths, and flags a warning whenever a rule's limit is exceeded.
FindBugs - a class-level code checking tool. It finds things like equals being called to compare objects of different types.
Cpd (Copy-Paste) - a cool plugin that finds identical code, which was probably copied and pasted.
Pmd - another checker, which works on the source code.
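For a feel of what FindBugs catches, here is a hypothetical example of the "equals comparing different types" problem, which FindBugs reports as EC_UNRELATED_TYPES. The class and method names are made up for illustration. The code compiles and looks reasonable, but the buggy method can never return true.

```java
// Hypothetical code of the kind FindBugs reports as "call to equals()
// comparing different types" (EC_UNRELATED_TYPES).
final class DefaultCustomerCheck {
    private DefaultCustomerCheck() {
    }

    static boolean isDefaultBuggy(String customerId, Integer defaultId) {
        return customerId.equals(defaultId); // String vs Integer: always false
    }

    // What was presumably meant: compare values of the same type
    static boolean isDefault(String customerId, Integer defaultId) {
        return customerId.equals(String.valueOf(defaultId));
    }
}
```

A bug like this slips through code review easily because nothing fails loudly; the branch guarded by it simply never runs.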

Now, I could have run these code checking tools without Hudson. Hudson, however, allows me to track changes over time and to know when new violations have been introduced. And running these tools has already caught many potential bugs.

So if you haven't checked out Hudson and its Plugins, it's never too late. The longer you leave it the worse your code is going to get.

If you want to know more, then let me know.

Tuesday, October 07, 2008

JDK 1.4 is being retired...

RIP ... JDK 1.4

If you haven't heard, JDK 1.4 is officially entering End of Service Life. Over at DZone, Alex Miller comments about it, and what I find most interesting is that its retirement is not going to make much difference to the software community. People are going to happily continue using it. If they're still using 1.4 now, I doubt that its reaching End of Service Life is going to push them over the edge.

For one thing, software shops which use WebSphere are typically still on 1.4. This is because the big bear IBM controls which VM version those shops run, and it's going to be a long time before WPS on Java 5 sees the light of day: though WebSphere 6.1 supports Java 5, WebSphere Process Server 6.0 does not. So spare a thought for people like us who are still stuck on Java 1.4; the news that it is going to be retired will be a non-event in our lives.

Thursday, October 02, 2008

Testng is Cool but flawed

A few months ago we took a closer look at our testing strategy. We assessed testng and noticed it had a number of features we thought we'd find useful, e.g. data-driven testing, the ability to run only the tests that failed, and test configuration in XML based on annotations and/or XDoclet tags.

Based on the features and a little prototyping, we decided to use testng as our standard test framework. We set up the tests to run on our CI server (Hudson) and we were good to go...

And things moved along smoothly. We used a lot of testng's extra features and also found that having an XML file indicating what was a test was quite useful.

But then issues started to surface...

1. Testng cannot run each test in a different VM. The makers of testng do not give you the option to run your tests in a totally new VM, so you can potentially run into issues when doing integration-like tests requiring the Hibernate session factory and database. In junit I have the option of running each test case in a totally new VM.

2. Testng doesn't really work with 1.4 XDoclet annotations. Unfortunately we are still on 1.4 and thus had to use the XDoclet annotation mechanism for tagging a test. This was problematic almost all the time. I wouldn't be surprised if we would have had far fewer issues with testng had we been able to use Java 5 annotations.

3. Testng skips tests when their setup fails, i.e. it doesn't fail fast. This was the major issue. When a test's setup fails and the test is skipped, testng struggles to work out how many tests were supposed to run. So on our CI server we found that the number of tests kept changing, permanently. This made it very difficult to get predictable results and a good idea of what was going on in the tests.

4. The Eclipse plugin for testng is not as good as junit's. The developers in my shop were unhappy partly because of this. When I announced that we were moving back to junit they were happy. Some of them had continued to subclass TestCase because they felt the assertions provided were a lot richer when this is done (and they are). With Testng, because it is not required that you subclass when creating a test, you generally use the java keyword "assert".
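The difference between the bare assert keyword and TestCase-style assertions can be sketched in plain Java. A Java assert statement is skipped entirely unless the JVM runs with -ea, and when it does fire it says nothing about expected and actual values. The hypothetical Check.assertEquals helper below copies the message format of JUnit 3's assertEquals (it is not JUnit's code) and always runs.

```java
import java.util.Objects;

// Hypothetical helper modelled on JUnit 3's assertEquals message format.
final class Check {
    private Check() {
    }

    static void assertEquals(String message, Object expected, Object actual) {
        if (Objects.equals(expected, actual)) {
            return;
        }
        String prefix = (message == null || message.isEmpty()) ? "" : message + " ";
        // Unlike `assert`, this runs whether or not -ea is set
        throw new AssertionError(prefix + "expected:<" + expected + "> but was:<" + actual + ">");
    }
}
```

A failing check then reads "total expected:<5> but was:<4>", whereas a failing assert total == 5 gives a bare AssertionError, and only when assertions are enabled.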

So now we are moving off testng and back onto the old faithful, junit. There are, however, tests built with the data provider and/or parameter features which testng offers, and these will stay testng tests. We have no plans to move these off testng. And if our developers really need the extra functionality testng provides, they can use it, within reason; our testng CI run has not totally gone away.

The approach now is: if a test only requires the basic features which junit 3 has, use junit. If you need testng functionality, get permission before you use it.
