Lately I have noticed more and more that the solutions people arrive at for problems are often very indirect. I have started to suspect that this is a characteristic of human behaviour, though perhaps it is just a characteristic of my current management at work. There may also be room to consider this a form of ‘procrastination’. Maybe people have got it into their heads that problems always need creative solutions, when often the opposite is true: there is a simple and obvious next step that, once taken, will improve the situation.
The rest of this post will outline a number of cases in my own recent experience where actions and projects have been undertaken that seem to have been quite ‘indirect’.
Error Logging
A little over a year ago, it was becoming obvious that the bespoke software product I work on was logging more and more errors each day. One possible reason was that we had recently taken on a number of new staff who were perhaps unfamiliar with the intent of our logging infrastructure. Another was that a few simple problems were causing a lot of errors to be recorded. Either way, essentially every release was followed by new errors and exceptions being recorded.
We discussed possible causes for these problems and, as far as I recall, these are the sorts of actions I considered important:
- Allocate developers to fix some of the code that was logging errors, and reduce the error count directly;
- Discuss with the group, and individuals if necessary;
- Gather feedback from the developers working on the first point, which might reveal any trends: would we be best off having someone continuously dedicated to this one role, or would training do the job?
Somehow, however, the decision-making led up a totally different garden path:
- Understanding where errors come from must be difficult (not really, but it might not be automated);
- It would help if every error was unique so we did not have to identify where they came from with any real effort;
- Our logging service did not allow recording of a unique error / incident number…
- …therefore we should rewrite the Logging Service, and have a supporting website to automatically analyse the logs.
Yes – I’m not kidding – the solution to recording too many errors and the situation getting worse was to change the way we log errors, so that analysis of the problems would be easier in future.
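To illustrate how little automation the direct route needs, here is a minimal sketch in Python that tallies errors by the component that logged them. The log line format, the sample lines and the names `LINE_PATTERN` and `count_errors_by_source` are all assumptions invented for this example; they are not taken from the real Logging Service.

```python
import re
from collections import Counter

# Hypothetical log line format, assumed purely for illustration:
#   <date> <time> <LEVEL> <Component.Method>: <message>
LINE_PATTERN = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) (?P<source>[\w.]+): (?P<message>.*)$"
)


def count_errors_by_source(lines):
    """Tally ERROR lines by the component that logged them."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[match.group("source")] += 1
    return counts


# Invented sample data, not real log output.
sample_log = [
    "2000-01-01 09:00:01 ERROR OrderService.Submit: Timeout talking to database",
    "2000-01-01 09:00:02 INFO OrderService.Submit: Order accepted",
    "2000-01-01 09:00:05 ERROR OrderService.Submit: Timeout talking to database",
    "2000-01-01 09:01:10 ERROR ReportBuilder.Render: Null customer record",
]

# Print the noisiest sources first.
for source, count in count_errors_by_source(sample_log).most_common():
    print(f"{count:5d}  {source}")
```

A ranked list like this is usually enough to decide which few sources to fix first, without needing a unique number for every error.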
Result: One year later, we have a half-finished, buggy alternative to the original Logging Service, but we still have to use the original because the new version does not work right. It always logs times as if the message was received at the equivalent time in the morning (a message at 23:12:39 logs as 11:12:39), and every log message records the exact same stack trace, that of the logging service function itself. As far as the original business problem is concerned, errors are now logged between 3 and 10 times more frequently than before.
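The cause of the time problem is not established here, but the symptom is consistent with formatting the hour on a 12-hour clock without an AM/PM marker. The following Python sketch reproduces that symptom; it is an assumption for demonstration only, not the actual service code, and the date used is arbitrary.

```python
from datetime import datetime

# Arbitrary date; only the time of day matters for this illustration.
received = datetime(2000, 1, 1, 23, 12, 39)

# %I is the hour on a 12-hour clock; without %p the AM/PM information is lost.
twelve_hour = received.strftime("%I:%M:%S")

# %H is the hour on a 24-hour clock, which keeps the time unambiguous.
twenty_four_hour = received.strftime("%H:%M:%S")

print(twelve_hour)       # 11:12:39
print(twenty_four_hour)  # 23:12:39
```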
Version Control
A year or more ago, there was general dissatisfaction with the version control system we used, which was Subversion (SVN). I have to admit that I was slow to see the problems, and I was quite frustrated that many of them seemed to be caused by developers not following our own guidelines for using the tool. I did eventually recognise, though, that some bugs were extremely frustrating and could be time-consuming to correct. For example, a failed or cancelled merge could result in files being left on the hard disk in an uncontrolled state, and this in turn seemed to lead to a problem where a future merge could ignore that file, with serious consequences.
Some of the problems were considerably exacerbated by several large-scale refactoring projects that impacted hundreds of code files. As each of these projects committed, every other coder on a parallel branch had considerable problems merging those changes.
There were ways of making progress that I considered to be fairly direct:
- Ensure everyone worked in ‘approved’ ways;
- Meet to understand each and every problem when working to those procedures, and adjust procedures to cope;
- …and potentially even allocate some resource to the open-source project to help fix some of the problems with the toolset.
Result: What actually happened was that it was decided we should use Git as our version control tool. 2 years later, we have finally switched to Git. For the first three weeks after switching we pretty much failed to complete a release cycle properly, and we are still very much on a learning curve as far as this software is concerned, introducing a number of new bugs to our code as a result of incomplete merges, missing files, and other issues. Team leads are now responsible for merges to ‘integration branches’, which accounts for a fair amount of their time.
Summary
The problems discussed here could equally have been introduced under the heading of ‘Lack of Change Management’ or something similar. But both examples show how the indirect solution introduced harmful effects. In the case of the logging, no overall progress was made on the original business problem. In the case of version control, the ‘big decision’ to move to Git made simple process improvements with the existing toolset seem wasteful, so they were not done. I believe that more direct (and quicker-to-deliver) solutions would have helped in the short and medium term, even if a decision had been made later on to change version control systems.
Both examples also demonstrate revolutionary rather than evolutionary changes. But I think that’s a post for another day.