Friday, 4 November 2011
As a member of the London Java Community, I was gratified to hear of a Java conference, JAX London, on my own doorstep and eagerly signed up. It had been a while since I'd attended a Java conference. I used to be a regular attendee of JavaOne, but over the last few years it has lost its lustre, especially now that it seems to be a bolt-on to Oracle OpenWorld. Maybe that'll change in future.
Day 1:
I eschewed the workshops on Android and JEE6/7 development in favour of talks focused on Spring technology. There were some excellent talks on the new additions to Spring, such as Spring Data, as well as the upcoming functionality in the next version of Spring. There were also some good introductions to Spring Batch and Spring Security. A talk was also given on the interoperability of Spring and Scala, which was informative on using Java DI frameworks with Scala rather than inherent Scala approaches such as the Cake pattern.
The most useful session in my eyes was the round table discussion at the end of the day, in which the audience were able to put questions to the Spring creators themselves. This was especially fruitful, as you gain those nuggets of wisdom from these experts in their fields which you wouldn't normally get outside a face-to-face conversation.
Day 2:
My second day at the conference was mostly Agile focused. I attended a very good talk on the Scrum Product Backlog by Roman Pichler, which offered many salient points of advice.
The next two talks addressed software quality. The first, on Software Craftsmanship, was given by Sandro Mancuso. It reinforced the idea that software development is a craft rather than purely engineering. He extolled the principles of the Software Craftsmanship movement, promoting self-improvement, knowledge sharing, professionalism, passion and care. I resisted the urge to jump up from the audience and shout 'Amen, brother', but that was what I was thinking. The next talk, titled Slow and Dirty, by Jason Gorman refuted the notion that we should release dirty code due to unrealistic deadlines and worry about the clean-up later. One of his memorable phrases was 'Anaerobic Software Development'. I'm sure a lot of developers will concur with this sentiment. It describes situations where development teams start projects at an unsustainably fast rate, causing entropy to build up in their code. This causes intense pain after a short while, which results in the team having to stop for weeks, maybe months, until enough detritus is removed for the project to start moving forward again. That's if you're lucky; sometimes there's so much build-up (holding back on the profanity :)) that the project must be dropped entirely and a new one started. His guiding principle was that if you care about quality all of the time, then you don't get into these situations. Next time a manager tells you we need something quick and dirty, to cut corners on quality to get something out of the door, you should be prepared to push back as much as possible. You're only making a rod for your own back and building technical debt which must be repaid later.
Given my interest in concurrency, the next session I attended was on message passing by Dr Russell Winder, who opined (with many interesting, witty and funny anecdotes) on why shared-memory multithreading, the prevailing wisdom of currently popular languages, should not be used. Instead it should be replaced by higher-level constructs such as Actors, CSP and Dataflow, so that the issues seen in contemporary approaches are eradicated. He then gave a demonstration of Actors via the GPars library, of which he is the author. The 50-minute slot did not allow him to give a fuller treatise on the subject, but it has definitely piqued my interest in that library for further investigation.
The last session of the day was given by Martijn Verburg and Ben Evans on some of the new features of Java 7. They then delivered an open coding session on some of the features of Project Coin. Unfortunately, this turned out to be a thought experiment for me as I didn't bring my laptop to the session, although I did get to ask Martijn some questions about the new concurrency features in Java 7.
Day 3:
In my previous blog, I opined about whether Java had reached its peak. I may have to revise my opinion somewhat after two excellent keynotes from James Governor of RedMonk and Simon Ritter of Oracle. The latter reinforced my view that the really interesting stuff will happen in Java 8 with respect to lambdas and modularity. It was also interesting to see Oracle's roadmap for Java: they intend to release a new version of Java every two years. The roadmap given went up to 2019. I wonder what Java will look like then?
Fredrik Öhrström of Oracle then gave a very informative talk on some of the expected features of Java 8, specifically lambdas, map/filter/reduce and interface extensions. Following the takeover of Sun by Oracle, there are now two competing JVMs, JRockit and Sun HotSpot. The plan is for JRockit functionality to be subsumed into the HotSpot VM, with the non-enterprise JRockit features added incrementally later.
The next lively talk, on performance tuning, boiled down to a well-known adage: 'Measure, don't guess'. The presentation centred on a poorly performing web application. By measuring the throughput and load using different tools such as JMeter, VisualVM and vmstat, they showed how to investigate and eventually find the culprit. One should never shoot in the dark when performance tuning. A scientific approach should be followed, such as baselining your application before making any changes, so as to ensure that any changes actually lead to an improvement rather than a degradation.
I then attended another talk, on Java 8 concurrency, given by Martijn Verburg and Ben Evans, advocating the use of the Java concurrency library rather than relying on outdated constructs such as synchronized. They also gave an overview of why parallelism will become more important in the upcoming years.
Changing tack for a few sessions, I then attended Ted Neward's NoSQL session. This was a talk on what NoSQL actually means, as there's a lot of ambiguity in the community on this fundamental point. He then compared some of the common NoSQL variants, such as db4o, Cassandra and MongoDB, and the typical situations where they could be used, a point sometimes missed: use the right tool for the right job. Ted Neward is a great communicator and the session was enlightening in all respects.
The final session of the day was given by LMAX on their Disruptor pattern. This was very interesting in many ways, as it seemed to go against the grain of previous sessions on concurrency. LMAX is a financial exchange, so latency throughout their system is extremely critical. Through empirical evidence they demonstrated that accepted approaches to concurrency were non-performant due to the overhead of dealing with the JVM and JMM. This was especially interesting because, as Java programmers, we're shielded from the low-level details of the architecture our programs run on, and assume we need not worry about them too much; it's the JVM's problem, not ours. This is no longer the case. We now have to be increasingly wary of latencies between the processor, the L1 and L2 caches and main memory, as well as of the JIT-compiled assembly code; a mechanical sympathy, if you will. Their innovative solution rests on a lock-free ring buffer (the fundamental data structure at the heart of the Disruptor) rather than the traditional work queue/thread approaches. I've not really given the Disruptor pattern the attention it deserves, as it really is a sea change in how applications could be architected, both from a business logic and a data point of view. I will definitely be doing some investigation on this topic. There is a great introduction to it by Martin Fowler, and it's also advantageous that the Disruptor is an open source project.
Last thoughts:
Sadly, though, there were at least 10 other sessions I would like to have attended. I would have loved to have followed Ian Robinson's session on Neo4j, as well as attending some of the cloud sessions (some former colleagues of mine from Cloudsoft were presenting), but alas the timetable didn't allow me the opportunity. All in all, a feature-packed 3 days which ended far too quickly. It was great to meet a lot of kindred spirits who are as passionate about technology as I am. I'm already looking forward to JAX 2012 :).
Wednesday, 12 October 2011
Peak Java?
Has Java peaked? To me the Java 7 release feels like it did when Java 1.3 came out: some gravy, no meat. It's not a ground-breaker like the evolution from Java 1.1 to 1.2, or the release of Java 5. Sure, there have been some nice language improvements, but for me Java as a language has pretty much stabilised. These new syntactic changes just gloss over pain points for which there are ample (although sometimes arduous) workarounds.
The big changes I see in the future, in Java 8, are the Java Module System (JSR 277) and Closures (JSR 335). The latter is needed to make Fork/Join more usable. Closures might have made a bigger impact a few years ago, but now, with a plethora of other JVM languages, I feel this will no longer be the case.
What's becoming increasingly apparent is the importance of the Java platform. The important thing is not the Java language itself but the JVM, which acts as a substrate for different languages. Although from above the languages may seem different, it ensures that under the covers the behaviour is consistent.
Some examples of JVM languages which have gained traction over the last few years are:
- Scala
- Groovy and its statically typed cousin Groovy++ (a great article outlining its usefulness: http://groovy.dzone.com/articles/sneak-peak-groovy-what-it-why)
- Clojure
- JRuby
- Ceylon, Red Hat's Java competitor
- CAL, a Haskell-inspired functional programming language.
- Gosu, an extensible type-system language compiled to Java bytecode.
Tuesday, 2 August 2011
Using generics with a fluent API
I've recently worked on an integration project requiring a message API. It was to be responsible for building messages of various types to be sent on a variety of transports. Each message had numerous parameters but some were only used in particular contexts. This meant that frequently you'd get constructors where some of those parameters had to be set to null to indicate they were not to be used. This is a code smell. One solution would be to use the refactoring 'Introduce parameter object'. This refactoring is used to group parameters together into immutable classes so those parameters have a common context. This may alleviate the problem somewhat but in practice I found this resulted in several overloaded constructors each with different combinations of parameter objects. I needed another solution.
I've had some success in the past using a fluent API with builders. A fluent interface is implemented by using method chaining to relay the instruction context of a subsequent call. My expectation was to create something like the following.
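As a sketch, where Request and its endpoint parameter are simplified stand-ins for the real message types and parameters:

```java
Request request = new RequestBuilder()
        .withId("msg-42")                          // common to all message types
        .withTimestamp(System.currentTimeMillis()) // common to all message types
        .withEndpoint("orders.queue")              // specific to the Request type
        .build();
```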
Some parameters were common to all message types so it made sense to have these in a base class.
The message base class:
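A minimal sketch, with id and timestamp standing in for the common parameters:

```java
public abstract class Message {
    private final String id;      // common to all message types
    private final long timestamp; // common to all message types

    protected Message(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }

    public String getId() { return id; }
    public long getTimestamp() { return timestamp; }
}
```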
and its associated builder:
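A first cut might look like this; note that each fluent method returns MessageBuilder, which is the root of the problem described below:

```java
public abstract class MessageBuilder {
    protected String id;
    protected long timestamp;

    public MessageBuilder withId(String id) {
        this.id = id;
        return this;
    }

    public MessageBuilder withTimestamp(long timestamp) {
        this.timestamp = timestamp;
        return this;
    }

    public abstract Message build();
}
```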
I then subclassed the builder class to create the specialized message types. For brevity most parameters have been omitted.
An example of a specialized message class which inherits all the properties of the base class along with its own parameters:
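For instance, a Request type (again a simplified stand-in):

```java
public class Request extends Message {
    private final String endpoint; // specific to Request

    public Request(String id, long timestamp, String endpoint) {
        super(id, timestamp);
        this.endpoint = endpoint;
    }

    public String getEndpoint() { return endpoint; }
}
```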
The builder for this class:
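A naive version, where every inherited fluent method must be overridden just to narrow the return type:

```java
public class RequestBuilder extends MessageBuilder {
    private String endpoint;

    public RequestBuilder withEndpoint(String endpoint) {
        this.endpoint = endpoint;
        return this;
    }

    // Overridden solely so the chain keeps returning RequestBuilder.
    @Override
    public RequestBuilder withId(String id) {
        super.withId(id);
        return this;
    }

    @Override
    public RequestBuilder withTimestamp(long timestamp) {
        super.withTimestamp(timestamp);
        return this;
    }

    @Override
    public Request build() {
        return new Request(id, timestamp, endpoint);
    }
}
```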
However, to get the expected usage shown above, I've had to override all the withXXX methods from the Message base class. In this example it's not too painful, as there are only a couple of common parameters, but more realistically I could have numerous parameters, which means that for each specialized subclass I'd need to override all those methods. This approach quickly becomes unwieldy and a maintenance headache. To make sure the method chain returns the correct type, I've had to resort to calling the super method and then returning the correct type, i.e. this. Not very good. What I needed was for each subclass to inherit the super class methods implicitly, but with the proviso that they return the actual type, not the super type.
A solution is to use self-bound generic types. Angelika Langer gives a good explanation, which she calls the getThis() trick.
The base MessageBuilder class has been changed to use self-bound generic types. So, for example, instead of hardwiring withId() to return the type of the builder that defines it, a type parameter B is introduced and withId() returns B via the abstract self() method. This self() method is implemented in the subclass to return the concrete type rather than the base type of the builder. The self-referential definition MessageBuilder<B extends MessageBuilder<B>> allows the return type of the inherited withId() in RequestBuilder to be RequestBuilder rather than MessageBuilder. Now all the subclass has to do, in addition to its own fluent methods, is to provide an implementation of the self() method. Job done.
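Putting that together, a sketch of the reworked builders, using the same illustrative types as above:

```java
public abstract class MessageBuilder<B extends MessageBuilder<B>> {
    protected String id;
    protected long timestamp;

    public B withId(String id) {
        this.id = id;
        return self();
    }

    public B withTimestamp(long timestamp) {
        this.timestamp = timestamp;
        return self();
    }

    // Each concrete builder returns itself, correctly typed.
    protected abstract B self();

    public abstract Message build();
}

// In its own source file:
public class RequestBuilder extends MessageBuilder<RequestBuilder> {
    private String endpoint;

    public RequestBuilder withEndpoint(String endpoint) {
        this.endpoint = endpoint;
        return this;
    }

    @Override
    protected RequestBuilder self() {
        return this;
    }

    @Override
    public Request build() {
        return new Request(id, timestamp, endpoint);
    }
}
```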
In this way, subclasses of builders need only be concerned with providing fluent methods for the parameters pertinent to that class. They automatically inherit the fluent methods from the super class, so no overriding is required and the intent of the class is clearer. I can build up a message with the fluent methods to set the parameters I actually need for that particular context, without resorting to long, unwieldy parameter lists. Again, the intent is clearer. Result!
Thanks to a bit of tinkering, this solution seems elegant and understandable now but as usual with generics it'll be mostly incomprehensible tomorrow :).
Monday, 1 August 2011
A comparison of FDD and Scrum
FDD and Scrum are two examples of agile development methodologies. Agile development tries to avoid the main weakness of "waterfall" by doing iterative development. Each iteration is meant to be short (1-2 weeks) and to include all of the following steps:
- Gathering user requirements
- Design and documentation
- Development
- Testing
- Deployment
This helps ensure that design errors are discovered at an early stage of development. I've had experience working on projects using both methodologies, and it was interesting to compare the salient features of the two approaches. Before the comparison, a short overview of each practice is given.
Feature Driven Development
Feature Driven Development, FDD is an agile software development methodology by Jeff De Luca and Peter Coad. It has more formal requirements and steps than Scrum from a development perspective.
FDD consists of five high level activities:
- Develop an overall domain model
- FDD advocates light modelling up front to understand the shape and scope of the application.
- Build a list of features
- Team builds a feature list. Each feature represents 1-10 days worth of effort.
- A feature is a small piece of client-valued function expressed in the form <action> <result> <object>, e.g. 'Calculate the total of a sale'
- Plan by feature
- Features are then assigned to iterative release cycles.
- A chief programmer also assigns developers to own particular classes identified in the domain model.
- Design By Feature (DBF)
- Feature teams are assigned a set of features and proceed to design them in detail, e.g. with sequence diagrams
- Build By Feature (BBF)
- Those teams then carry out the code development and testing of those features.
- When the chief programmer is satisfied, completed features are promoted to the main build.
Progress can be tracked and reported with accuracy by assigning a percentage weighting to each step in a DBF/BBF iteration. The chief programmers indicate when each step has been completed for each feature they're developing. This enables a view of how much of a particular feature has been completed. The cycle repeats itself, either by refinement of the original domain model and subsequent activities, or until all the features in the list have been built.
Scrum
Scrum is an iterative, incremental framework for project management often seen in agile software development. Essentially Scrum boils down to the following points (from the Scrum Alliance):
Saturday, 16 July 2011
A tomato a day keeps the defects away
Recently I've been getting some odd stares from people passing by my desk, because I have a large stopwatch prominently displayed on my desktop. This was not a brazen attempt to count down the hours until I could go home; I was following a recommended practice of breaking up my work patterns in order to increase my productivity.
A good technique is to take brief but regular breaks. I can sometimes fall into the habit of focusing on a problem too much to the detriment of other things, sometimes even forgetting to take lunch until much later in the day.
A solution is to adhere to the Pomodoro Technique ('pomodoro' means tomato in Italian). In this approach, team members work in 30-minute increments. At the start of an increment, a timer is set for 25 minutes. The team works diligently during that time without distractions; those emails and phone calls can wait. When the timer goes off, the team takes a five-minute break, during which they can walk around, stretch etc. However, that time should not be used to talk shop. It really is a break from work. When the break is up, you repeat the process. Every fourth Pomodoro you take a longer break, e.g. 15 minutes.
- Choose a task to be accomplished
- Set the Pomodoro to 25 minutes (the Pomodoro is the timer)
- Work on the task until the Pomodoro rings
- Take a short break (5 minutes is OK)
- Every 4 Pomodoros take a longer break
Although I'm still a Pomodoro newbie, I've found it useful for focusing my work into manageable slices. As a Scrum practitioner, I can focus on a particular story and finish a task within a single increment. I've become accustomed to breaking a story I'm working on into several small tasks. I'm a big fan of Pivotal Tracker. In Pivotal, I can decompose stories into tasks, each of which is easily understandable and trackable, so I can gauge my progress.
Normally, if one were following Pomodoro, you would have a physical kitchen timer on the desk, but as I'm a techie, I've found there are numerous online alternatives. A good one is http://tomatoi.st/bcdg.
Wednesday, 1 June 2011
JUnit Theories to the Rescue
I recently developed a restart manager to respond to JMS connection outages. After the connection was restored, I needed to reinitialize my message clients in order for them to rebuild their JMS sessions. But I needed to test that a restart only occurred for particular status transitions. At first I thought this would be an onerous task, as I would have to work out all the status permutations. That is, until I came across JUnit Theories.
JUnit Theories
Theories allows one to write tests that apply to a (potentially infinite) set of data points rather than having to recreate the same test multiple times with different data or creating one test and iterating through your own collection of data values.
Here is an example:
Messaging client
First off is the interface for the messaging client. It has two self-explanatory lifecycle methods, start and stop, as well as methods for reading and sending a message, which for brevity are just strings.
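A sketch of that interface (the method names are illustrative):

```java
public interface MessageClient {
    void start();                     // lifecycle
    void stop();                      // lifecycle
    String readMessage();             // messages are just strings for brevity
    void sendMessage(String message);
}
```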
Status Enum
The following enum represents the status of the JMS connection. As can be seen, there are numerous states, so it would be cumbersome to work out all the possible combinations for each status transition. This is where Theories become so useful. More on that later.
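Something like the following; beyond FAILED and STARTED, which matter to the restart manager, the exact states are illustrative:

```java
public enum Status {
    STARTING, STARTED, STOPPING, STOPPED, RECONNECTING, FAILED
}
```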
Restart Manager
This class is used to restart a message client when it recognizes a status transition from FAILED to STARTED.
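A sketch of its shape, assuming transitions are reported as (from, to) pairs:

```java
public class RestartManager {

    private final MessageClient client;

    public RestartManager(MessageClient client) {
        this.client = client;
    }

    // Restart the client only on the FAILED -> STARTED transition,
    // i.e. the JMS connection has come back after an outage.
    public void onTransition(Status from, Status to) {
        if (from == Status.FAILED && to == Status.STARTED) {
            client.stop();
            client.start();
        }
    }
}
```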
Test class
Finally, here's the test class. Notice I'm using mock objects via Mockito: as I'm programming to the interface of the messaging client, not its implementation, I can make use of a mock and verify its behaviour. As of JUnit 4.8, Theories still lives within an experimental package.
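A sketch of the test, assuming the RestartManager and MessageClient shapes above (Mockito 1.x):

```java
import static org.junit.Assume.assumeTrue;
import static org.mockito.Mockito.*;

import org.junit.experimental.theories.DataPoints;
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import org.junit.runner.RunWith;

@RunWith(Theories.class)
public class RestartManagerTest {

    // Every enum value becomes a data point; JUnit feeds all
    // (from, to) combinations into each theory method.
    @DataPoints
    public static Status[] statuses = Status.values();

    @Theory
    public void restartsClientOnFailedToStarted(Status from, Status to) {
        assumeTrue(from == Status.FAILED && to == Status.STARTED);
        MessageClient client = mock(MessageClient.class);
        new RestartManager(client).onTransition(from, to);
        verify(client).stop();
        verify(client).start();
    }

    @Theory
    public void ignoresAnyOtherTransition(Status from, Status to) {
        assumeTrue(!(from == Status.FAILED && to == Status.STARTED));
        MessageClient client = mock(MessageClient.class);
        new RestartManager(client).onTransition(from, to);
        verifyZeroInteractions(client);
    }
}
```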
The most important bit of configuration is the @DataPoints annotation. This sets up the data that'll be passed into the test methods. In this test I have two theory methods. One theory checks that the message client is restarted when there is a suitable status transition, i.e. FAILED to STARTED. The other checks that the message client is not restarted if the status transition is not suitable, i.e. anything other than FAILED to STARTED.
When you run the tests, you'll find that all the combinations of the Status enum are passed to the test methods in question.
In this way, I ensure that all possible transitions are covered by my unit tests and the restart manager's behaviour is as required.
Conclusion
The use of theories allows tests to be devised that cover all possible combinations of data. This is in contrast to parameterized tests where the dataset to be passed to a test is strictly defined and the onus is on the developer to work out what data is needed for a particular range of tests. Each approach can be used in different situations as dictated by your requirements.
Labels: mock objects, parameterized testing, TDD, theories, unit testing
Tuesday, 24 May 2011
Further musings on BDD
As a member of the London Java User Group, I was able to gauge the experience of my peers on their use of BDD. I've been able to obtain some great advice and avenues for further investigation. Thanks in particular go out to Richard Paul, Bruce Loewe and John Stevenson.
So here's a summary of their thoughts:
- Most of the challenges are not particularly tool-specific; often it is the writing of useful "acceptance tests", or "examples" in BDD parlance, where people get caught up.
- Tools are useful but shouldn't be an obstacle during the discussion and documenting of scenarios. Recommended tools included Cucumber and JBehave which have plain text features and scenarios. The separation of implementation details from the scenarios helps one focus on the scenarios and language at the right level i.e. at the business level. The discussion of examples with stakeholders, developers, QAs, BAs then proceeds more naturally.
- A recommended read is the Specification by Example book, as it is brimming with examples of how people have applied BDD techniques. It's not a book that will hand-hold you through writing scenarios, but it will help you to understand what scenarios and features should be about.
- In a previous blog, I discounted Cucumber as being too Ruby-centric. I now stand corrected: Cucumber isn't just for Rubyists. While the tool itself is written in Ruby, it has bindings to many popular languages on the JVM through Cuke4Duke, including Java, Scala, Groovy and Clojure. This allows the automation of step definitions in the language of your choice, and it integrates reasonably well with build tools such as Ant and Maven. Aslak Hellesoy, the creator of Cucumber, is working on a pure Java version of Cucumber with tighter integration with JUnit and other testing frameworks.
Division of labour
- Regarding the roles of team members in writing and automating specifications, it all depends on the technical skill levels of the members of your team. Generally it is developers and QAs writing scenarios, with input from product-owner types.
- Ideally a developer or tester would be pairing with a stakeholder as they break down the key examples. Other examples can later be filled in as they are thought of and run by the interested stakeholder.
- Automation will initially require someone with strong coding skills whether that be a developer or a QA with good programming knowledge. Once the foundations are there it becomes easier for less programming oriented members to follow existing examples in the code base.
- It's important to treat the automation layer as a first class citizen in regard to refactoring and cleanliness.
What about unit tests?
Overlap with unit-level tests is something to be wary of. It often means there are too many high-level scenarios; possibly not every permutation needs to be covered at the top level. A balance is usually required, since functional/scenario-level tests are generally slower to run and more expensive to maintain.
Links for further investigation
Below are some links that might help in getting started with BDD on the JVM:
- http://cuke4ninja.com/
- http://www.rapaul.com/tag/cucumber/
- An informative talk on Cucumber on the JVM with Groovy.
- https://github.com/aslakhellesoy/cuke4duke/tree/master/examples (JVM language examples)
- http://specificationbyexample.com/ (the long term benefits of specification by example/BDD)
- Joseph Wilk from Songkick gave a very interesting talk on how they adopted BDD and Cucumber and talked about lessons learnt.
The aforementioned comments and guidance have been very helpful and informative. I'm sure they'll stand me in good stead when I get round to a BDD spike in the next few weeks.
Sunday, 15 May 2011
Programming Retrospective
I've recently had to develop and improve a legacy code base. This has reminded me of several anti-patterns one should consider when coding. This is by no means an exhaustive list, but it conveys some of the most frequent ones in my experience.
Final classes without interfaces
There were a lot of instances where classes were declared final. This may be a good thing from a security perspective, but from a testing perspective it meant that mock classes could not be created, resulting in unit tests that were harder to write or could not be written at all.
There are tools that can create mock objects even without an interface. A particular favourite of mine is Mockito. In fact, the mocking of final classes can be achieved through the use of the PowerMock API with Mockito. However, a class with an interface is preferable from an OO standpoint, as the use of concrete classes results in tighter coupling between participants.
Lack of Defensive Programming
There were numerous examples of methods without any defensive checks. The caller of such a method had a try/catch block to catch a NullPointerException (NPE). This is a bad code smell, because the onus is put on the caller of the method to assert the validity of the parameters that are passed to it. This is the wrong place. From an encapsulation point of view, it should be the method that validates whether its input parameters are valid or not. This is a form of Design by Contract, where the method checks whether its input parameters satisfy some condition and reacts accordingly. I like to think of this as the 'Bouncer' pattern. If you don't look right, you're not getting in :).
A contrived example is shown below:
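A sketch using the Validate class from Commons Lang mentioned below (the service and parameter names are hypothetical):

```java
import java.util.Map;

import org.apache.commons.lang.Validate;

public class RefundService {

    // The 'bouncer': reject bad input at the door instead of letting
    // callers catch a NullPointerException further down the line.
    public void processRefund(String customerId, Map<String, String> params) {
        Validate.notNull(customerId, "customerId must not be null");
        Validate.notEmpty(params, "params must not be empty");
        // ... proceed knowing the inputs are valid
    }
}
```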
This sort of approach should also be applied to constructors, to ensure the object has a valid state and is fully built before use. Furthermore, the throwing of exceptions gives more context as to what has caused the failure, rather than catching an NPE and having to retrospectively determine what caused that NPE to be thrown.
A favourite API of mine is the Validate class in Commons Lang, which provides numerous methods for validation in different contexts.
Exposure of super state to child classes
I've seen a lot of instances where child classes would use the state variables of a super class. Variables had protected visibility, so there was naked access to these variables, which is inherently dangerous. This is because a child class could change the reference to a super class variable with unforeseen consequences. I think the intention was to provide a child class with access to a parent's state so the child class could perform an operation. A quick fix would be to only allow these parent state variables to be accessed through accessor methods.
http://c2.com/cgi/wiki?InappropriateIntimacy
But this begs the question on why child classes needed that sort of access in the first place. IMHO, private state should never escape the confines of a class, only behaviour.
The other smell was that the super class was becoming top heavy with functionality. Probably the reason was that common functionality was pushed up the class hierarchy for re-use by child classes. But this resulted in the super class becoming bloated and unfocused. A better solution would be to use delegation techniques rather than inheritance.
http://www.refactoring.com/catalog/replaceInheritanceWithDelegation.html
When I look at a class's functionality, I like to keep in mind the Unix philosophy: do one thing and do it well. If you find your class not adhering to that maxim, that's a sign refactoring is in order.
Printing out error messages to console instead of logging
I've come across a few situations where exception stack traces were dumped to the console. Don't do that. Use a logging framework such as log4j. If log4j is used, a console appender can be used to achieve the same goal. Furthermore, errors and warnings can be logged to a specific destination, i.e. a file, so one can see only pertinent errors and not worry about debug messages.
Another observation was the lack of categories used in logging. Most of the statements I saw were a generic dump of error messages for the whole platform. Using categories allows log messages to be sorted. For example, I could have a category called com.acme.X for X-related logs and com.acme.Y for Y-related logs. If I wanted to see all logs, I could create another appender that logged at the com.acme level. At the very least, use the fully qualified name of the class the logger resides in as the logging category. The use of categories gives greater control over what gets logged and where it gets logged to. In this example X and Y are logged to different appenders, but the possibilities are endless depending on how the categories are devised.
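For example, a sketch with log4j 1.2 (the package and class names are hypothetical):

```java
package com.acme.x;

import org.apache.log4j.Logger;

public class PaymentProcessor {

    // The category is the fully qualified class name, so output from
    // com.acme.x classes can be routed and filtered independently of com.acme.y.
    private static final Logger LOG = Logger.getLogger(PaymentProcessor.class);

    public void process() {
        LOG.debug("processing payment");
        LOG.error("payment failed");
    }
}
```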
Classes with unclear focus
There were examples of classes trying to do too much. An example would be a Handler class whose main functions are listed below:
- Setup relevant properties needed by the handler
- Handle requests.
- Convert a request to a protocol specific message.
- Handle synchronous and asynchronous responses
In most of the examples I've seen, the majority of the functionality is realized inside one class instead of being delegated to other classes. The lack of delegation means the intention of the class is lost. Furthermore, testing of the handler becomes more problematic. With delegation, each of these functions can be tested in isolation.
http://c2.com/cgi/wiki?LongMethodSmell
http://c2.com/cgi/wiki?GodClass
Unwieldy or unneeded comments
There were a lot of instances where code comments were of no use or didn't add extra information. For example, one method contained a lot of retrievals from a database along with ambiguous looping constructs. Each part of the method contained a comment explaining what the next section of code would do. The reason I don't like this is that, firstly, comments are deodorant on 'smelly' code. That comment is probably there because the code is not clear enough to be understood. Secondly, comments are brittle. If I change that section of code, then I have to remember to change the comments; another piece of maintenance.
I am a proponent of 'Programming by Intention'. This is a programming style where you give meaningful names to methods, variables, classes etc. so that the intent of the object in question is clear. Dave Astels gives an excellent overview here: http://www.informit.com/articles/article.aspx?p=357688
In the case of the Handler class, the method was essentially doing three things:
- Obtaining a customer ID
- Obtaining a billing ID
- Obtaining other parameters from a database and checking to see if those parameters had values.
i.e. something like the sketch below.
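Here the helper names are hypothetical and their bodies are elided; the point is that the names carry the intent:

```java
public void handleRequest(Request request) {
    // Intention-revealing names replace the old section comments:
    String customerId = obtainCustomerId(request);
    String billingId = obtainBillingId(customerId);
    Map<String, String> parameters = obtainParameters(customerId, billingId);
    validateParameters(parameters);
    // ... the rest of the handling logic
}
```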
Now the intention of this method is clearer. The code becomes self-describing and there is no need for extraneous comments.
'Programming by Intention' is not a declaration that all commenting is bad, just that a comment must not duplicate what the code already expresses. If the code is clear, then commenting on what the code does is unneeded. However, comments may still be needed: you could draw attention to a particular algorithm being used, e.g. MergeSort, or note that the code fixes a particular defect. When a comment has value it should be included; if not, it should be discarded.
Use of exceptions to control program flow
Simple: don't do it. The following link provides the arguments:
http://c2.com/cgi/wiki?DontUseExceptionsForFlowControl
Throwing of ambiguous exceptions
There were numerous occasions where java.lang.Exception was thrown instead of a more specific exception. This is bad practice because throwing an ambiguous exception means the catcher cannot react to the exception in different ways. An ambiguous exception loses information on whether the situation is recoverable or irrecoverable. The meaning of the error is also lost. A specific exception should be thrown for a particular situation.
http://c2.com/cgi/wiki?ExceptionPatterns
Use parameter objects instead of long method signatures.
There were a few cases where methods had long method signatures. I'm talking about 10 or more parameters. This makes the method call unwieldy and prone to mistakes. A better approach is to use a parameter object which encapsulates the method signature, simplifying the method call.
http://c2.com/cgi/wiki?ParameterObject
Furthermore, different parameter objects can be used to group together related parameters for different contexts. This is preferable to nullifying unneeded parameters in a long method signature. A sketch of such a parameter object follows.
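For instance, a small immutable parameter object (the names are hypothetical); a method like processRefund(RefundRequest request) then replaces a signature with a dozen loose parameters:

```java
public final class RefundRequest {

    private final String customerId;
    private final double amount;
    private final String currency;

    public RefundRequest(String customerId, double amount, String currency) {
        this.customerId = customerId;
        this.amount = amount;
        this.currency = currency;
    }

    public String getCustomerId() { return customerId; }
    public double getAmount()     { return amount; }
    public String getCurrency()   { return currency; }
}
```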
Never Duplicate Code
http://c2.com/cgi/wiki?OnceAndOnlyOnce
A few examples were observed where code was duplicated, in some places just a copy-and-paste job. This means that if any defects are found, the fixes have to be applied in more than one place. Ideally code should be written using the DRY principle, i.e. Don't Repeat Yourself. Situations where the same code exists in different places should be remedied by pulling that code out into a separate method and re-using it.
Returning nulls from methods
There were a few cases where a call to a method resulted in null being returned. For instance, a client asked for a map and got returned null because the input parameters were incorrect. The onus is then on the caller to check that the result is not null before using it. A better approach would be to use the Null Object pattern, which connotes the absence of an object: instead of using null, a reference to an object that doesn't do anything is returned.
http://en.wikipedia.org/wiki/Null_Object_pattern
In this example, instead of returning null, an empty map should be returned. The client then doesn't have to check for nulls, which leads to safer code.
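A sketch (the DAO and method names are hypothetical):

```java
import java.util.Collections;
import java.util.Map;

public class PreferencesDao {

    public Map<String, String> findPreferences(String customerId) {
        if (customerId == null || customerId.isEmpty()) {
            // Null Object: an empty map connotes 'no preferences'
            // and spares every caller a null check.
            return Collections.emptyMap();
        }
        return loadFromDatabase(customerId);
    }

    private Map<String, String> loadFromDatabase(String customerId) {
        // database access elided for brevity
        return Collections.emptyMap();
    }
}
```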
Conclusion
A lot of my recommendations are based on Martin Fowler's Refactoring, which gives guidance on how to remove particular code smells. However, as legacy code is usually not particularly amenable to unit testing, refactoring it can inspire low confidence, as there aren't enough unit tests to back it up. Part of the Test Driven Development (TDD) approach is that unit tests are written to prove the behaviour of the system at a granular level. Once you have the tests, you have the confidence to refactor, as you can regression test to check the system behaves as before. In my opinion, a lot of the defects seen in production code would be diminished by the use of unit testing and by paying heed to the aforementioned anti-patterns.
Recommended Reading
- Refactoring: Improving the Design of Existing Code by Martin Fowler. The classic book on taking poorly designed code and making it better without breaking it.
- A good introduction to TDD: Test-Driven Development: A Practical Guide by Dave Astels
- A TDD book for the more seasoned developer: Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce. Had TDD been adopted from the outset, many of the problems I described would probably not have arisen.
- But if you have a legacy code base showing these symptoms, then a great book to read is Working Effectively with Legacy Code by Michael Feathers. It has a set of techniques for changing code so that it is more amenable to being covered by unit tests.
Labels: anti-patterns, legacy, refactoring, TDD, unit testing
Thursday, 14 April 2011
Musings on Behaviour Driven Development
I've been following Behaviour Driven Development (BDD) from a distance for a while and have been reading a couple of good articles about it. A good introduction is this one by Dan North:
http://dannorth.net/introducing-bdd/
As an advocate of Test Driven Development (TDD), I find it can sometimes make you focus on the finer detail at the expense of the bigger picture. Hopefully BDD can be used to fill this gap.
I'm all for closing the loop between QA/business analysts and developers. Using traditional approaches, it's inevitable that some things will be lost in translation.
I've had a brief look at Cucumber and conceptually I like what I see. However, as my main skill set is in Java, I wasn't too enamoured of having to learn Ruby to get the benefits of BDD. Fortunately there are options. I like JBehave's approach to BDD. It's more amenable in the sense that the stories (specifications) are written in simple English and the steps are written in Java in preference to Ruby.
A business analyst can write a number of stories in the normal BDD format in plain text files, i.e. Given X, When Y, Then Z.
For example:
- Given a refund request with a threshold of 10.0
- When refund request received for 20.0
- Then the alert status should be ON
Every step (Given, When, Then) is then executed by a JUnit test which can extract parameters from the step. So basically the acceptance criteria are always driven by the business analyst. There is little opportunity for ambiguity, as a tight coupling will always be enforced between the specification and its JUnit counterpart.
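A sketch of a JBehave steps class for the scenario above; RefundAlerter and its methods are a hypothetical class under test, while the @Given/@When/@Then annotations with $-prefixed parameters are JBehave's standard step-matching style:

```java
import static org.junit.Assert.assertEquals;

import org.jbehave.core.annotations.Given;
import org.jbehave.core.annotations.Then;
import org.jbehave.core.annotations.When;

public class RefundAlertSteps {

    private RefundAlerter alerter; // hypothetical class under test

    @Given("a refund request with a threshold of $threshold")
    public void givenARefundThreshold(double threshold) {
        alerter = new RefundAlerter(threshold);
    }

    @When("refund request received for $amount")
    public void whenRefundRequestReceived(double amount) {
        alerter.onRefund(amount);
    }

    @Then("the alert status should be $status")
    public void thenAlertStatusShouldBe(String status) {
        assertEquals(status, alerter.alertStatus());
    }
}
```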
See http://jbehave.org/reference/stable/developing-stories.html for more information and examples.
I expect that the path to getting comfortable with BDD will be similar to TDD: writing lots of tests, some of them fairly bad, until over time a certain feeling for what's right or wrong develops.
My only reservation is that most of the examples I've seen are relatively simple. I'd be interested in a real-life example, especially regarding the potential complexity of stories.
I am sceptical but at the same time curious. I can see that by utilising a BDD approach the test fixtures and tests become self-describing; they exhibit metadata for understanding the intentions and actions of the code. Anyone who's worked with me knows that documentation is the bane of my life. Anything that could make my life easier in that respect gets a thumbs up from me :).