Chris Kelly: legacy

A flaw I've notice frequently in a legacy code base is constructors which are doing too much work. A common anti-pattern is that a configuration is passed into a constructor. This configuration is then perused to find information so that further objects can be constructed in the constructor. I will discuss why I think this is bad practice.

As an example I have a message dispatcher which delegates to a sender object
to send a message to a destination. A dispatcher usually makes a decision on the appropriate destination based on the message. For this example there is only one endpoint. This destination is to be read from the configuration.

So I have an interface for the sender:

and a dispatcher which uses the configuration to build a sender inside its constructor:

The expected usage for the constructor then becomes:

This is bad on several counts:

Abstract data type is passed in. There's no indication how this will be used or what it's used for. It's just a bucket of parameters. If possible a specific type should be used in preference to an ADT. It's all about the domain and context. Check out Domain Driven Design or GOOS touches on this point.
Harder to unit test. I have to make sure configuration is correct so that internal objects can be created properly. In my example I've only one object to consider but they may be several.
I can't mock the behaviour of any objects created by the constructor. In essence that behaviour is hard-wired which may be deleterious to testing if that behaviour is using a real resource i.e. socket or database. It turns the unit test into more of an integration test

Use a factory approach

So instead of passing in a configuration, pass in the already constructed objects. A DI framework such as Spring or Guice will assist you in this regard as it's fundamentally what DI is all about. You inject pre-configured objects into other objects rather than building those objects internally.

If you're not using a DI framework (I'm in this situation as it's legacy code) then it's better to pass a factory which builds the object on the fly as a constructor parameter.

To illustrate this:

I can have a factory that produces Sender objects. I've taken the liberty of introducing a generic factory for this purpose. It builds objects of the specified type for a given configuration.

Now I provide an implementation for the Sender:

And to illustrate the point I rewrite the Sender class:

Finally I rewrite the message dispatcher. The constructor now takes objects rather than configuration

And the usage becomes:

This is a lot better:

Now the dispatcher only knows about objects. It does not even know a configuration is used.
The transformation of a configuration into object is localized in the factory and not spread out across the code base
The class becomes trivial to unit test. Instead of worrying about configurations I can simply create a mock Sender and pass that to the dispatcher. In this way I can focus solely on the behaviour of the Dispatcher itself.

Conclusion

Resist the urge to pass around configuration objects and allowing constructors to build new objects. A constructor should ideally be used for assembly only. It's better to just pass pre-configured objects into a constructor. This makes the intention of that constructor clear and makes it easier to unit test the class in question as the colloborators can be mocked easily.

I've recently had to develop and improve a legacy code base. This has reminded me of a several anti-patterns one should consider when coding. This is by no-means an exhaustive list but conveys some of the most frequent ones in my experience.

Final classes without interfaces

There were a lot of instances where classes were declared as final. This may be a good thing from a security perspective but from a testing perspective it meant that mock classes could not be created resulting in unit tests that were harder to write or could not be written.

There are tools to can create mock objects even without an interface. A particular favourite of mine is Mockito. In fact the mocking of final classes can be overcome through the use of the PowerMock API with Mockito. However a class with an interface is preferable from an OO standpoint as the use of concrete classes results in tighter coupling between participants.

Lack of Defensive Programming

There are numerous examples of methods without any defensive checks. The caller of this method had a try catch block to catch a NullPointerException, NPE. This is a bad code smell because the onus is on the caller of the method to assert the validity of parameters that are passed to the method in question. This is the wrong place. From an encapsulation point of view, it should be the method that validates whether its input parameters are valid or not. This is a form of Design by Contract where the method checks to see if input parameters satisfies some condition and reacts accordingly. I like to think this as the 'Bouncer' Pattern. If you don't look right, you're not getting in :).

A contrived example is shown below:

This short of approach should also be applied to constructors to ensure the object has a valid state and is fully built before use. Furthermore the throwing of exceptions give more context as to what has caused the failure rather than catching a NPE and having to retrospectively determine what has called that NPE to be thrown.

A favourite API of mine is the Validator class in Commons Lang which lists numerous methods for validation in different contexts.

Exposure of super state to child classes

I've seen a lot of instances where child classes would use the state variables of a super class. Variables had protected visibility so there was naked access to these variables which are inherently dangerous. This is because a child class could change the reference to a super class variable with unforeseen consequences. I think the intention was to provide access to a child class with a parent's state so the child class can perform an operation. A quick fix would be to only allow these parent state variables to be access through accessor methods.

http://c2.com/cgi/wiki?InappropriateIntimacy

But this begs the question on why child classes needed that sort of access in the first place. IMHO, private state should never escape the confines of a class, only behaviour.

The other smell was that the super class was becoming top heavy with functionality. Probably the reason was that common functionality was pushed up the class hierarchy for re-use by child classes. But this resulted in the super class becoming bloated and unfocused. A better solution would be to use delegation techniques rather than inheritance.

http://www.refactoring.com/catalog/replaceInheritanceWithDelegation.html

When I look at a class functionality, I like to keep in mind a Unix philosophy i.e. Do one thing but do it well. If you find your class not adhering to that maxim, that's a sign refactoring is in order.

Printing out error messages to console instead of logging

I've came across a few situations where exception stack traces were dumped to console. Don't do that. Use a logging framework such as log4j. If log4j is used then a console appender can be used to achieve the same goal. Furthermore errors and warnings could be logged to a specific destination i.e. a file so one can see only pertinent errors and not worry about debug messages.

Another observation was the lack of categories used in logging. Most of the statements I saw were a generic dump of error messages for the whole platform. Using categories allows log messages to be sorted. For example I could have a category called com.acme.X for X related logs and com.acme.Y for Y related logs. If I wanted to see all logs I create another appender that logged at the com.acme level. At the very least, use the fully qualified name of the class the logger resides in as the logging category. The use of categories results in greater capabilities on what should be logged and where it should be logged to. In this example X and Y are logged to different appenders but the possibilities are endless depending on how the categories are devised.

Classes with unclear focus

There were examples of classes trying to do too much. An example would be a Handler class. The main functions are listed below

Setup relevant properties needed by the handler
Handle requests.
Convert a request to a protocol specific message.
Handle synchronous and asynchronous responses

This resulted in a class of over 1000 lines with several private methods. This was one of the major smells in that many private methods were only working on a subset of the private fields of the class. This implies that the class is trying to do too much.

Most of the examples I've seen are that the majority of functionality is realized inside one class instead of being delegated to other classes. The lack of delegation means the intention of the class is lost. Furthermore testing of the handler becomes more problematic. By delegation, each of these functions can be tested in isolation.

http://c2.com/cgi/wiki?LongMethodSmell
http://c2.com/cgi/wiki?GodClass

Unwieldy or unneeded comments

There were a lot of instances where code comments were of no use or didn't add extra information. For example, one method contained a lot of retrievals from a database along with ambiguous looping constructs. Each part of the method contained a comment explain what the next section of code would do. The reason I don't like this firstly comments are deodorant on 'smelly' code. That comment is probably there because the code is not clear enough to be understood. Secondly comments are brittle. If I changed that section of code, then I have to remember to change the comments, another piece of maintenance.

I am a proponent of 'Programming by Intention'. This is a programming style where you give meaningful names to methods, variables, classes etc so that the intent of the object in question is clear. Dave Astel gives an excellent overview here: http://www.informit.com/articles/article.aspx?p=357688

In the case of the Handler class, the method was essentially doing three things:

Obtaining a customer ID
Obtaining an billing ID
Obtaining other parameters from a database and checking to see if those parameters had values.

This resulted in a method of 200+ lines. The clarity or intention of the method has been lost. To regain the intention of this method, the method should be functionally decomposed into smaller methods.

i.e.

Now the intention of this method is clearer. The code becomes self-describing and there is not need for extraneous comments.

'Programming by Intention' is not used to declare all commenting is bad, just that commenting must not duplicate a purpose. If the code is clear then commenting what the code does is unneeded. However comments may still be needed. You could draw attention to a particular algorithm being used i.e MergeSort or that the code fixes a particular defect. When the comment has value it should be included, if not it should be discarded.

Use of exceptions to control program flow

Simple. Don't to it. The following link provides arguments:

http://c2.com/cgi/wiki?DontUseExceptionsForFlowControl

Throwing of ambiguous exceptions

There were numerous occasions where java.lang.Exception was thrown instead of a more specific exception. This is bad practice because throwing an ambiguous exception means the catcher cannot react to the exception in different ways. An ambiguous exception loses information on whether the situation is recoverable or irrecoverable. The meaning of the error is also lost. A specific exception should be thrown for a particular situation.

http://c2.com/cgi/wiki?ExceptionPatterns

Use parameter objects instead of long method signatures.

There were a few cases where methods had long method signatures. I'm talking about 10 or more parameters. This makes the method call unwieldy and prone to mistakes. A better approach is to use a parameter object which encapsulates the method signature, simplifying the method call.

http://c2.com/cgi/wiki?ParameterObject

Furthermore different parameter objects can be used to group together related parameters for different contexts. This is preferable than nullifying unneeded parameters in the long method signature.

Never Duplicate Code

http://c2.com/cgi/wiki?OnceAndOnlyOnce

A few examples were observed where code was duplicated and in some places just a copy and paste job. This means that if any defects are found, then the fixes have to be applied in more than one place. Ideally code should be written using DRY principles i.e. Don't Repeat Yourself. Situations where the same code exists in different places should be remedied by that code being pulled out into a separate method and re-used.

Return nulls from methods.

There were a few cases where a call to a method resulted in a null being returned. For instance a client asked for a map and got returned a null because the input parameters were incorrect. The onus is then on the callee to check the result is not null before using the result. A better approach would be to return a NULL object. The Null Object pattern provides an alternative. It connotes the absence of an object. Instead of using null, the Null Object pattern uses a reference to an object that doesn’t do anything.

http://en.wikipedia.org/wiki/Null_Object_pattern

In this example, instead of returning null, an empty map should be returned. The client then doesn't have to check for nulls. This leads to safer code.

Conclusion

A lot of my recommendations are based on Martin Fowlers' Refactoring which gives guidance on how to remove particular code smells. However as legacy code is usually not particularly amenable to unit testing, refactoring can give a low confidence level as there are not the number of unit tests to back it up. Part of the Test Driven Design (TDD) approach is that unit tests are written to prove the behaviour of the system at a granular level. Once you have the tests, you have the confidence to refactor as you can regression test to see the system behaves as before. In my opinion a lot of the defects seen in production code would be diminished by the use of unit testing and paying heed to the aforementioned anti-patterns.

Recommended Reading

Refactoring: Improving the Design of Existing Code by Martin Fowler. The classic book on taking poorly designed code and making it better without breaking it.
A good introduction to TDD. Test-Driven Development: A Practical Guide by Dave Astels
A TDD book for the more seasoned developer: Growing Object-Orientated Software, Guided by Tests by Steve Freeman and Nat Pryce. Probably if TDD was adopted in the first place many of these problems I described would not have arisen in the first place.
But if you have a legacy code base showing these symptoms then a great book to read is Working Effectively with Legacy Code by Martin Feather. It has a set of techniques for changing code so that it is more suitable to be covered by unit tests

Chris Kelly

Wednesday, 25 July 2012

Constructors that do too much

Use a factory approach

Sunday, 15 May 2011

Programming Retrospective