Object Identity and why JDK 1.0.2/1.1 was better than 1.2+



Object identity is at the core of all persistence containers and most distributed object systems. Finding the “same” object in another system, or loading the “same” object that was referenced before, is at the heart of all entity-based containers. Java had a built-in solution to this, but it was removed.

Before JDK 1.2, or “Java 2” as it was billed, the hashCode() method on objects specified a contract that was not supposed to be broken. You can find this contract in the API documentation from Sun. I’ll reproduce the relevant portion from the JDK 1.1 documentation:

The general contract of hashCode is:

• Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer. This integer need not remain consistent from one execution of an application to another execution of the same application.

• If two objects are equal according to the equals method, then calling the hashCode method on each of the two objects must produce the same integer result.

I can simplify this spec-speak by paraphrasing a bit. The first point basically says that your object’s hash code cannot change after the constructor completes: anything that contributes to the identity of your object must be known from the instant it is created. The second point merely states that things that are equal have the same hash code, and it still exists today.

In Java 2, however, the first point was rewritten, mostly, I imagine, because of the confusion it caused for programmers who were trying to use JavaBeans. Every JavaBean has a default constructor that can be used to create it. This would mean that a JavaBean’s hashCode method would always have to return a constant no matter what (since it can’t change after construction time), and that the equals method would basically have to always return true in order to satisfy both points. This, I can only assume, was unacceptable to the general programming community. Here is what the documentation for JDK 1.4.2, the current version, says:

The general contract of hashCode is:

• Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

• If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

As you can see, this modification allows you to change the identity of an object by changing its fields after it is constructed, which has a positive effect on JavaBeans. If only they had defined object identity for JavaBeans from the start, for example by specifying a set of bean properties that must be set in the constructor and stored in final fields. Had they done that, it would have been very easy to create a simple JavaBean container that managed persistence, distributed invocation and object distribution. Instead we had to come up with yet another specification, EJB, to handle object identity by defining a primary key for entities, separate from the normal Java contracts.
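The difference between the two contracts is easier to see in code. What follows is only a minimal sketch, assuming a modern JDK with generics; the class names MutableBean, KeyedBean and IdentityDemo are hypothetical and do not come from any specification. The first bean derives its hash code from a settable property, as the Java 2 contract permits, and gets lost inside a HashSet once that property changes. The second fixes its identity in a final field set in the constructor, roughly the approach wished for above.

```java
import java.util.HashSet;
import java.util.Set;

// All class names here are hypothetical, chosen only for illustration.
class IdentityDemo {
    public static void main(String[] args) {
        // Java 2 style: the hash code follows a mutable property.
        MutableBean mutable = new MutableBean();
        mutable.setName("a");
        Set<MutableBean> mutableSet = new HashSet<>();
        mutableSet.add(mutable);
        mutable.setName("b"); // legal under the 1.2+ contract, but the hash code changes
        System.out.println(mutableSet.contains(mutable)); // false: stored under the old hash

        // JDK 1.1 style: identity is fixed by a final field set in the constructor.
        KeyedBean keyed = new KeyedBean("a");
        Set<KeyedBean> keyedSet = new HashSet<>();
        keyedSet.add(keyed);
        keyed.setDescription("changed"); // non-identity state may still change
        System.out.println(keyedSet.contains(keyed)); // true: the hash code never changes
    }
}

class MutableBean {
    private String name; // takes part in equals/hashCode and can be changed at any time

    public MutableBean() { }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutableBean)) return false;
        String other = ((MutableBean) o).name;
        return name == null ? other == null : name.equals(other);
    }

    @Override
    public int hashCode() { return name == null ? 0 : name.hashCode(); }
}

class KeyedBean {
    private final String key;   // identity: set once in the constructor
    private String description; // ordinary mutable property, not part of identity

    public KeyedBean(String key) {
        if (key == null) throw new IllegalArgumentException("key must not be null");
        this.key = key;
    }

    public String getKey() { return key; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    @Override
    public boolean equals(Object o) {
        return o instanceof KeyedBean && key.equals(((KeyedBean) o).key);
    }

    @Override
    public int hashCode() { return key.hashCode(); }
}
```

Note that KeyedBean gives up the default constructor, which is exactly the trade-off the JavaBeans convention was unwilling to make.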

If you look at all the persistence and distributed computing containers that deal with entities, you will find that they all try to handle this object identity problem in different ways. If only identity had been required from the beginning and built into the language, there wouldn’t be all these conflicting implementations. You’ll note that Microsoft, because C# is basically built from a Java 1.1 view of the world, has stuck with the old contract:

A hash function must have the following properties:

• If two objects of the same type represent the same value, the hash function must return the same constant value for either object.

It appears that the C# community has latched on to using it for good instead of evil by suggesting that your hash code be based on the database primary key of the object’s associated persistent state. That said, many C# programmers don’t seem to understand the contract and override it with Java 2-like semantics.
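The primary-key suggestion carries over to Java directly. Here is a rough sketch, written in Java to stay with the rest of this post; CustomerEntity, its fields and PrimaryKeyDemo are made-up names, and a real container would need to decide how to treat objects that have not been assigned a key yet. Equality and the hash code depend only on the key, so two separately loaded copies of the same row count as the “same” object even when their other state differs.

```java
// Hypothetical entity; the class and field names are assumptions for illustration.
class PrimaryKeyDemo {
    public static void main(String[] args) {
        // Two copies of the same row, e.g. loaded at different times.
        CustomerEntity first = new CustomerEntity(42L, "Ann");
        CustomerEntity second = new CustomerEntity(42L, "Ann B.");
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true
    }
}

class CustomerEntity {
    private final long primaryKey; // identity, fixed at construction
    private String displayName;    // persistent state that is free to change

    public CustomerEntity(long primaryKey, String displayName) {
        this.primaryKey = primaryKey;
        this.displayName = displayName;
    }

    public long getPrimaryKey() { return primaryKey; }
    public String getDisplayName() { return displayName; }
    public void setDisplayName(String displayName) { this.displayName = displayName; }

    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerEntity
                && primaryKey == ((CustomerEntity) o).primaryKey;
    }

    @Override
    public int hashCode() {
        // Fold the 64-bit key into 32 bits, the same way java.lang.Long does.
        return (int) (primaryKey ^ (primaryKey >>> 32));
    }
}
```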