Software Reliability Pt. 2: Code For The Best By Preparing For The Worst

Building great software is like playing chess. What’s the similarity? Well, just like in the sport, you have to keep the main objective at the forefront of your mind with every move you make or, in our case, every line of code we write. However, any decent chess player will tell you that knowing what you have to do won’t suffice if you want to win; you have to put equal emphasis on how you go about winning. Likewise, to engineer world-class software, programmers have to be tactical in their approach. Put simply, think before you code. But that’s just one important step in the process. Building reliability takes more than just the programmers putting on their thinking hats (shout out to Edward de Bono). So what are some practical steps a team can take to reach this goal? Here’s my two cents…

Prepare For The Worst With A Reliability Specification

Preparing for the worst is something we do in most aspects of life, and software development should be no exception. Consider Uber and Tesla’s self-driving cars. I’m pretty sure that, at their most basic, they run on programs made up of instructions such as decision constructs. Can you imagine what the results would be if their algorithms only accounted for perfect scenarios, such as no traffic on the road, no route redirections, never running low on fuel, and a parking spot always being available? Engineering teams have to ask the question, “What’s the worst that could happen?”, and work from there. If you’re looking to motivate your team in this process, it might be best to leave Murphy’s Law out of your speech.

“Anything that can go wrong will go wrong” — Murphy’s Law

Software specifications have to consist of more than what the system should do; they have to account for how the system should behave in exceptional situations. I think good engineers should follow best practices to produce software that takes meaningful and useful actions in such situations. As mentioned in Part 1 of this short series, it is software failures that affect reliability, not software faults (a fault is a defect sitting in the code; a failure is what happens when that defect is actually triggered at runtime). The consequences of a software failure depend on the nature of that failure, so it’s important to have a document that outlines the things that could go wrong and the costs that could be incurred if they do.

Most software systems are made up of other sub-systems or sub-components, and each should have reliability requirements. I understand that this process can get quite expensive because of the time it takes. One way of reducing the cost is to measure the impact of failures in each sub-system and use that to set its reliability requirements. For example, imposing the same maximum level of reliability on a basic module as you would on a payment transaction sub-system is probably not a wise move.

A reliability spec should have the identified sub-systems, the different software failures that may occur in each of them, and an assessment of the consequences of those failures.
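
To make that a little more concrete, here’s a minimal sketch of what such a spec could look like if you kept it as data next to the code. Everything in it is my own assumption for illustration: the `Severity` scale, the class names, and the two example sub-systems are hypothetical, and a real spec would likely carry more detail (probabilities, costs, owners).

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    """Hypothetical three-step scale for the consequence of a failure."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class FailureMode:
    description: str    # what goes wrong
    consequence: str    # what it costs the user or the business
    severity: Severity  # how bad that consequence is


@dataclass
class SubsystemSpec:
    name: str
    failure_modes: List[FailureMode] = field(default_factory=list)

    def required_rigor(self) -> Severity:
        """The worst identified failure decides how much effort the sub-system gets."""
        return max((f.severity for f in self.failure_modes), default=Severity.LOW)


# Illustrative entries only; the sub-systems and failures are made up.
spec = [
    SubsystemSpec("payments", [
        FailureMode("charge submitted twice", "customer is billed twice", Severity.HIGH),
        FailureMode("gateway timeout", "order left in an unknown state", Severity.HIGH),
    ]),
    SubsystemSpec("avatar-upload", [
        FailureMode("image resize fails", "default avatar is shown", Severity.LOW),
    ]),
]

# List the sub-systems that deserve the most reliability work first.
for subsystem in sorted(spec, key=lambda s: s.required_rigor(), reverse=True):
    print(f"{subsystem.name}: {subsystem.required_rigor().name}")
```

The point of `required_rigor` is the cost argument from earlier: the payment sub-system earns the expensive treatment, while the avatar upload gets by with much less.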

Prepare For The Worst With Exception Handling

I’m guessing you already know what an exception is, but I’ll go ahead and define it anyway. An exception is an unexpected event, usually but not always an error, that occurs during the execution of a program. Ever written (or coded) a function to make an API request and didn’t consider any kind of failure occurring? Exception handling can be quite useful for stuff like that. Thankfully, most programming languages today have built-in features to help with this. Depending on the use case, you could either go the route of using decision constructs (if statements) or, more popularly, “Try” and “Catch” blocks. Try/catch blocks are designed specifically for handling exceptions and would probably be more of the go-to.
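
Here’s a rough sketch of the API request case in Python, where try/catch is spelled `try`/`except`. It only uses the standard library. The function name `fetch_json`, the `opener` parameter, the stand-in `unreachable` function, and the example URL are all assumptions of mine, there so the failure path can be shown without a real network call.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Optional


def fetch_json(url: str, timeout: float = 5.0,
               opener: Callable[..., Any] = urllib.request.urlopen) -> Optional[Any]:
    """Return the decoded JSON body, or None if anything goes wrong."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print(f"Server returned {err.code} for {url}")
    except urllib.error.URLError as err:
        # The server could not be reached at all (DNS failure, refused connection).
        print(f"Could not reach {url}: {err.reason}")
    except (TimeoutError, ValueError) as err:
        # The read timed out, or the body was not valid UTF-8 encoded JSON.
        print(f"Unusable response from {url}: {err}")
    return None


def unreachable(url: str, timeout: float) -> Any:
    """Hypothetical stand-in opener that simulates a network outage."""
    raise urllib.error.URLError("connection refused")


# The caller gets None instead of a crash and can decide what to do next.
result = fetch_json("https://api.example.invalid/users", opener=unreachable)
print(result)
```

Returning `None` is just one design choice. Depending on what your reliability spec says about the sub-system, retrying, falling back to cached data, or re-raising a more specific exception may be the more useful action.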

One could simply code try/catch blocks all over the show, but that won’t necessarily be helpful. You could actually create an inefficiency problem in your software (we don’t want to be wasteful of system resources like memory), and blanket handlers can hide the very failures you need to know about. This is why a reliability spec is so important: it informs how we go about programming for exceptions, which parts of the system require it, and to what degree. Some parts of your software system may call for nested try/catch blocks, whereas others don’t.

Prepare For The Worst With Defensive Programming

I’m sure you’ve heard this before, and perhaps the saying has become somewhat of a cliche, but I think there’s a lesson in there for developers:

Defense wins championships — Bear Bryant (Legendary Alabama football coach)

It’s not just a cute quote, though. We do have to ask the question, “What does that look like in programming?”

Defensive programming is an approach to program development whereby programmers assume that there may be undetected faults or inconsistencies in their programs — Ian Sommerville

For example, never trust user input. You don’t need me to tell you that, do you? If you’ve got a few years (or even months) of development under your belt, I’m sure your experiences have been a good teacher in this area. We have to program in such a way that our system handles any incoming data in a strict and defensive way.
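
As a small sketch of what “strict and defensive” could mean in practice, the function below refuses anything that doesn’t match exactly what it expects before the data gets anywhere near the rest of the system. The payload shape, the field names, and the `MAX_AMOUNT_CENTS` limit are hypothetical; I made them up for the example.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Raised when incoming data does not meet the rules below."""


@dataclass(frozen=True)
class TransferRequest:
    account_id: str
    amount_cents: int


MAX_AMOUNT_CENTS = 1_000_000  # hypothetical business limit


def parse_transfer(raw: object) -> TransferRequest:
    """Turn untrusted input into a TransferRequest, or raise ValidationError."""
    if not isinstance(raw, dict):
        raise ValidationError("payload must be an object")

    account_id = raw.get("account_id")
    # An empty string fails isalnum(), so it is rejected here too.
    if not isinstance(account_id, str) or not account_id.isalnum() or len(account_id) > 32:
        raise ValidationError("account_id must be 1 to 32 alphanumeric characters")

    amount = raw.get("amount_cents")
    # bool is a subclass of int in Python, so it has to be rejected explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise ValidationError("amount_cents must be an integer")
    if not 0 < amount <= MAX_AMOUNT_CENTS:
        raise ValidationError("amount_cents is out of range")

    return TransferRequest(account_id, amount)


print(parse_transfer({"account_id": "abc123", "amount_cents": 500}))

try:
    parse_transfer({"account_id": "abc; DROP TABLE", "amount_cents": 500})
except ValidationError as err:
    print(f"Rejected: {err}")
```

Notice that the function lists what is allowed rather than trying to list everything that is forbidden. That tends to hold up better against the inputs nobody thought of.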

External input isn’t the only thing we need to be on guard against. If you’re working in a team, it’s probably best to approach other developers’ code, as well as your own, with positive scrutiny. The equation I mentioned in the previous post should become your new mantra:

Poor software engineering === poor software quality

Programmers have different ways of thinking and applying logic, and that’s not a bad thing. However, that logic or way of thinking can be translated to code in a bad way. Therefore, it may be a good idea to enforce certain styles, patterns, and ways of organising code that could help with preventing software failures, as well as recovering from them.

The goal with defensive programming is to safeguard against errors so that the software we develop maintains a high level of quality and performance, even in the case of unexpected inputs and user actions, which will happen.

Hmmm…I haven’t even touched on testing. I think I smell a Part 3.

That being said…

Building reliable software is hard work. Not only that, it’s an expensive process because of the amount of time it takes to plan for it, as well as program for it. It’s not a one-man job, because the software development lifecycle involves different parties that have to be pulling in the same direction when it comes to quality. You may have to deal with being branded an idealist by your fellow developers or the rest of the team. But we need to be able to see past that, because building reliable software is a task for those who have a high regard for craftsmanship and understand the impact it has on the end product and its users.

References:

Software Engineering by Ian Sommerville