I’ve just worked on a little support problem that was quite interesting – although not in a good way – as unfortunately it demonstrates failures at so many stages of the specification and development process that I am quite disappointed to be associated with it. Associated, but not the cause of it, to be clear 🙂
A couple of years ago, I was asked to add support for sending SMS text messages to mobile (and home) phones using a web-service. Texts are typically limited to 160 characters, and the message formats I had been given were likely to exceed that once field-replacement (which I will just call ‘mail-merge’ from now on) had been completed. So I wrote a template-editing screen that showed the likely length of each message, with fields replaced by typical values, alongside a second version with fields replaced by their maximum possible values. If either mail-merged version exceeded 160 characters, a warning was displayed, worded according to which one did. There was also a small test-suite covering the back-end code.
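As a rough illustration of what such a preview does, here is a minimal Python sketch. The `[FIELDNAME]` field-code syntax, the field names and the sample values are all hypothetical, because the original syntax isn't described here. Only the 160-character limit comes from the text above. Unknown codes are deliberately left in place, and that is what made unsupported fields visible on the screen.

```python
import re

SMS_LIMIT = 160  # single-message limit mentioned above

# Hypothetical field-code syntax: an upper-case name in square brackets, e.g. [FIRSTNAME].
FIELD_CODE = re.compile(r"\[([A-Z_]+)\]")

# Hypothetical sample values: one "typical" and one "worst-case" value per supported field.
TYPICAL_VALUES = {"FIRSTNAME": "Sam", "APPTDATE": "12/03/2024", "APPTTIME": "10:30"}
MAXIMUM_VALUES = {"FIRSTNAME": "X" * 30, "APPTDATE": "30/12/2024", "APPTTIME": "23:59"}


def merge(template: str, values: dict[str, str]) -> str:
    """Replace every supported field code; leave unknown codes untouched."""
    return FIELD_CODE.sub(lambda m: values.get(m.group(1), m.group(0)), template)


def preview(template: str) -> list[str]:
    """Return the warnings the editing screen would display for a template."""
    warnings = []
    typical = merge(template, TYPICAL_VALUES)
    maximum = merge(template, MAXIMUM_VALUES)
    if len(typical) > SMS_LIMIT:
        warnings.append(f"Typical message is {len(typical)} characters")
    elif len(maximum) > SMS_LIMIT:
        warnings.append(f"Message may reach {len(maximum)} characters")
    # Codes the merge step doesn't know about survive merging unchanged.
    unknown = sorted(set(FIELD_CODE.findall(template)) - set(TYPICAL_VALUES))
    if unknown:
        warnings.append("Unsupported field codes: " + ", ".join(unknown))
    return warnings
```

For example, `preview("Hi [FIRSTNAME] [MYFIELDCODE]")` flags `MYFIELDCODE`, and a 150-character message ending in `[FIRSTNAME]` passes with the typical value but warns with the worst-case one.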
Skip to two days ago. I happened to be looking at our client’s live machine, and noticed that quite a number of the text formats exceeded 160 characters and were, of course, warning of this. I also noticed that several of the templates contained fields that appeared not to be supported, since they had not been converted in the sample texts on the web-page. Needless to say, I notified the client…
A bit of back-and-forth later, the client asked me which field-codes were supported by the mail-merge process, so I researched this and reported what my centralised conversion routine allowed. She went off and changed several of the templates so that the screen showed the example messages correctly, only to find within a few minutes that some texts were now going out with field-codes in them; so I must have been wrong about which fields were supported! It turned out that, in this particular case, someone had since added a new function that stored text templates in an entirely new location and implemented its own mail-merge routine, but still used my template-editing screen, which knew nothing of the alternate routine. The screen therefore showed those sample messages ‘unconverted’, when in fact the mechanism that actually sent them would have converted them correctly. This, however, was only part of the problem: it still left several templates containing fields that I could not see being converted anywhere in the mail-merge process.
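One way to stop the editing screen and the sender from disagreeing is to make both of them look up the same merge routine for each kind of template. The sketch below uses hypothetical template kinds, field syntaxes and function names. It is not the original code, only an illustration of routing both preview and send through one registry.

```python
from typing import Callable

# Hypothetical template "kinds"; each maps to the merge routine its sender actually uses.
MergeFn = Callable[[str, dict[str, str]], str]
MERGE_ROUTINES: dict[str, MergeFn] = {}


def merge_routine(kind: str):
    """Decorator registering the one merge routine for a kind of template."""
    def register(fn: MergeFn) -> MergeFn:
        MERGE_ROUTINES[kind] = fn
        return fn
    return register


@merge_routine("appointment")
def merge_appointment(template: str, values: dict[str, str]) -> str:
    # The original, centralised routine (hypothetical field syntax: [NAME]).
    for name, value in values.items():
        template = template.replace(f"[{name}]", value)
    return template


@merge_routine("invoice")
def merge_invoice(template: str, values: dict[str, str]) -> str:
    # A later, separately written routine with its own syntax (hypothetical: {{name}}).
    for name, value in values.items():
        template = template.replace("{{" + name.lower() + "}}", value)
    return template


def render_preview(kind: str, template: str, sample: dict[str, str]) -> str:
    # The editing screen uses exactly the routine the sender uses.
    return MERGE_ROUTINES[kind](template, sample)


def send_sms(kind: str, template: str, values: dict[str, str]) -> str:
    message = MERGE_ROUTINES[kind](template, values)
    # ... hand `message` to the SMS web-service here ...
    return message
```

With this arrangement, a template written with the ‘wrong’ codes appears unconverted on the screen, exactly as it would in the sent text.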
Fortunately, we keep a log of every SMS sent for the last little while, so I was able to search it and confirm that we had not sent out any badly-formatted messages recently; but after reviewing the code, I also could not account for the messages that, according to the code, ought to have been sent.
So when were these phantom text messages sent?
It looked like the code for SMS support had been changed on two occasions. First, a whole bunch of SMS-templates were added, and to be honest I still don’t know if and when they were sent (probably nearly two years ago). Then, that code was changed to replace those templates with a later set. If I ignored historic problems, that left me fewer templates to worry about… but it still left the question of when they were sent out.
It turned out that a whole bunch of text-templates had been added at some point in the past to support new functionality, but the query that searched for the DB records triggering these messages had conditional tests that effectively prevented these new types from ever being sent. Well, that was quite lucky actually (considering the issues with message formatting), but really!
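This trap is easy to reproduce. In the minimal sketch below, the table, the column names and the type numbers are all hypothetical. A query whose name promises every pending text still hard-codes the original types. A small test that inserts one row per known type then shows straight away which types can never be selected.

```python
import sqlite3

# Hypothetical notification types: 1-3 existed originally, 4-5 were added later.
TEMPLATE_TYPES = {1: "reminder", 2: "confirmation", 3: "cancellation",
                  4: "survey", 5: "follow_up"}

# A query named as though it fetches *all* pending texts, but whose filter
# still hard-codes the original set of types.
SELECT_ALL_PENDING = "SELECT id, type FROM pending_sms WHERE sent = 0 AND type IN (1, 2, 3)"


def unreachable_types(conn: sqlite3.Connection) -> set[int]:
    """Insert one pending row per known type and report which ones the query never returns."""
    conn.execute("CREATE TABLE pending_sms (id INTEGER PRIMARY KEY, type INTEGER, sent INTEGER)")
    conn.executemany("INSERT INTO pending_sms (type, sent) VALUES (?, 0)",
                     [(t,) for t in TEMPLATE_TYPES])
    returned = {row[1] for row in conn.execute(SELECT_ALL_PENDING)}
    return set(TEMPLATE_TYPES) - returned


print(unreachable_types(sqlite3.connect(":memory:")))  # {4, 5}
```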
So – at this point we are seeing two kinds of ‘phantom code’:
- Code that looks like it should run, but has historically been lopped-off at some higher point in the call hierarchy… in this case I am presupposing that records of certain ‘older’ types will no longer be created in the particular DB table. This is a systemic issue: code in one area relies on, or is only run as a consequence of, something done somewhere entirely different, and when that trigger ceases to fire (let’s assume intentionally)… well, the code down-stream of it becomes lifeless.
- Code that looks like it should run, but probably never has, due to a bug up-stream of it. In this case, the range of possible values had been changed, but a particular query had not been updated to reflect that… and that may well be because the query’s name suggested it was a ‘catch-all’, when in fact it filtered more specifically than I believe it needed to. This is a failure in testing, because as I’ve already noted, it probably never worked. Ever.
Additionally, we have ‘phantom data’: text message templates that are historic, but not readily identifiable as such.
Failures
Here’s my list of failure-points as I see it:
- My generic conversion routine was not flexible enough to easily add new functionality, but in my defence, YAGNI!
- The first person who came along later to add functionality to SMS really did have a totally new set of requirements, but chose not to adapt the pre-existing code and templates to fit them…
- …instead he ‘built his own’ but did not adjust the template-editing screen to work with his mail-merge code;
- The third person to add code for new templates also re-used some of the centralised functionality, but possibly did not warn the client that certain fields would not be merged. In our defence here, the client has a bit of a history of ‘making up field codes’ without the realisation that you can’t just decide that ‘MYFIELDCODE’ is a great idea and expect to have it converted… but bear in mind that the editing screen is now less useful since several messages are ‘legitimately’ shown with field-codes in place and unconverted;
- In two years, no-one complained once about the big red messages on the template-editing screen warning of messages that would be truncated. No-one suggested ways in which the warnings could be made clearer (and I can now see several);
- No-one added to the unit tests when they added functionality;
- No-one commented-out or removed code when it became irrelevant, or even marked it as ‘historic’ or deprecated;
- An absolute failure to test prior to release;
- An absolute failure to test post-release… someone could have checked ‘retrospectively’ that some of the later changes were working, by examining the logs of SMS messages sent to confirm they were going out as expected and had been converted correctly. That obviously didn’t happen;
- No-one thought that it might be a bad idea to send a text to a customer at 3:40am. Yes, this is another classic disconnect, in the sense that it is easy to work on a batch script without knowing how and when it will actually be run. On the other hand, it does not take a great deal of time to find out.
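The post-release checks in the last two points need very little code. This sketch assumes the log can be read as (timestamp, message) pairs and reuses a hypothetical `[FIELDNAME]` field-code syntax. It also picks an arbitrary 21:00 to 08:00 quiet window. None of these details come from the real system.

```python
import re
from datetime import datetime, time

SMS_LIMIT = 160
FIELD_CODE = re.compile(r"\[[A-Z_]+\]")           # hypothetical field-code syntax
QUIET_START, QUIET_END = time(21, 0), time(8, 0)  # hypothetical "don't text" window


def audit(log: list[tuple[datetime, str]]) -> list[str]:
    """Flag logged texts that look unmerged, over-long, or sent at unsociable hours."""
    problems = []
    for sent_at, message in log:
        stamp = sent_at.isoformat(timespec="minutes")
        if FIELD_CODE.search(message):
            problems.append(f"{stamp}: unmerged field code")
        if len(message) > SMS_LIMIT:
            problems.append(f"{stamp}: {len(message)} characters")
        # The quiet window wraps past midnight, so it is two separate comparisons.
        if sent_at.time() >= QUIET_START or sent_at.time() < QUIET_END:
            problems.append(f"{stamp}: sent during quiet hours")
    return problems
```

A text logged at 03:40 that still contains `[FIRSTNAME]` would be reported twice, once for the unmerged code and once for the hour.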
So, all in all, these large concerns turned out to be the ‘mere’ pain from a phantom-limb: the code didn’t actually exist as callable code, but it sure felt like it did. And when we consider that those texts could have been sent at 3:40am, that they would have had unsupported field-codes in them, and that a substantial portion of each message would have been truncated, one can only be grateful that things conspired so that they were never sent!
What about Historic data?
Sometimes, it is necessary to maintain support for something even if it should never happen, or indeed cannot happen any more. In this case, I don’t believe it would have been necessary to reconstruct texts that ‘should’ have been sent on some historic date, as we had a reasonable history of the actual texts sent for quite some time. We might have ‘had’ to maintain historic enumeration values, but these could have been commented as present only to support historic records and so on. In truth, I believe most of the code associated with this problematic area could have been deleted, as could the older, unused templates.
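Historic enumeration values don't have to be marked with comments alone. One possible approach is a naming convention plus a helper, sketched here with hypothetical member names. Sending code can then iterate over the live types only, while old records still deserialise.

```python
from enum import Enum


class SmsTemplateType(Enum):
    """SMS template types (hypothetical names).

    HISTORIC_* members are kept only so that old log and DB records still
    deserialise; nothing should create new records of these types.
    """
    REMINDER = 1
    CONFIRMATION = 2
    HISTORIC_PROMOTION = 3   # historic: superseded, no longer sent
    HISTORIC_FOLLOW_UP = 4   # historic: superseded, no longer sent

    @property
    def is_historic(self) -> bool:
        return self.name.startswith("HISTORIC_")


def sendable_types() -> list[SmsTemplateType]:
    """Types the sending code is expected to handle."""
    return [t for t in SmsTemplateType if not t.is_historic]
```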
And Phantom Limbs?
I have to say, having read a little bit more about it, I do feel a little bit bad about the analogy to Phantom Limbs. From Wikipedia:
“A phantom limb is the sensation that an amputated or missing limb (even an organ, like the appendix) is still attached to the body and is moving appropriately with other body parts. Approximately 5 to 10% of individuals with an amputation experience phantom sensations in their amputated limb, and the majority of the sensations are painful.”
In any event, the analogy only holds up so far; in this particular case the old and unusable code is still there, but only physically. If the code can never actually be run, maybe it isn’t really there at all?