Bye Bye DORA: Flaws of the State of DevOps Reports

Written by icyapril | Published 2024/01/03
Tech Story Tags: dora | dora-metrics | space | devex | devops | dora-four-flaws | state-of-devops-reports | hackernoon-top-story

TL;DR: The DORA Four Key Metrics contain a variety of flaws, from the raw data not being provided to the methodology foreshadowing the conclusion. Whilst DORA is backed by those who have a vested interest in developers shipping ever faster, recent research has highlighted that the outcomes measured by DORA are the least important to users and software engineers. It is therefore important to assess performance within the risk appetite of each environment.

Two years ago, I studied the impact of developer burnout on software engineers, finding 83% suffered from burnout. Over recent months, I’ve been working on further research on perceptions of software development for a variety of organisations, including findings that 75% of software engineers faced retaliation the last time they reported wrongdoing and that 89% of business leaders were concerned about the on-time delivery of software.

Google’s DORA team have for several years conducted their own polling of software engineers, and the original authors of the measurement framework have since produced other frameworks, including SPACE and DevEx. Whilst I originally trusted the research produced by these teams, as I’ve conducted further research of my own, the flaws have become evident.

Over the holiday period I’ve been reading Dr Andrew Jenkinson’s book "Why We Eat (Too Much): The New Science of Appetite", in which Dr Jenkinson criticises a study known as the Seven Countries Study by Dr Ancel Keys. Dr Jenkinson describes Dr Keys’ success as follows: “He had won the argument over his biggest rival, trouncing him with indisputable facts, exposing his flawed logic. The crowd's adulation filled him with joy and ecstasy. His life's work had reached fruition. The funding for his research would come rolling in, his reputation as the leading scientist in his field would be secure for years. Fame was good, but now he had secured the top two real prizes - power and influence.”

However, Dr Jenkinson notes: “He had not been dishonest about his research - that would have been unethical and discredited him. Technically what he had presented was the truth. But he knew very well that it was not the whole truth.”

As I’ve studied the research outputs of DORA and later work in further detail, the parallels between this description and the research rigour of DORA’s State of DevOps Report and the subsequent SPACE and DevEx frameworks have become evident.

Where’s the Data?

Firstly, DORA research is conducted by sampling many thousands of developers through the use of subjective surveys. This research is conducted in-house by the DORA team. Ordinarily, those who conduct such research for a living join organisations like the Market Research Society (MRS) and the British Polling Council (BPC) to ensure the public can have confidence in their members’ work. For example, BPC rules place strict disclosure requirements on members, requiring that complete data tables, together with the questions asked, be published within two working days of the research being released.

Here lies our first problem: the DORA team does not publish their raw data, publishing only the State of DevOps Report itself.

Flawed Methodology

Google’s DORA research, and the SPACE and DevEx frameworks used within team settings, use subjective surveys to create measurements. When using subjective surveys, it’s important to take steps to ensure bias doesn’t come into play.

However, DORA uses Four Key Metrics to measure outcomes - Change Lead Time, Deployment Frequency, Change Failure Rate and Time to Recovery (formerly Mean Time to Recovery). These are essentially measures of how quickly new features are deployed and how quickly issues are resolved.
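
To make these definitions concrete, here is a minimal Python sketch of how a team might compute the four metrics from its own deployment records. The `Deployment` record shape, its field names, and the `four_key_metrics` helper are assumptions made purely for illustration; DORA defines the metrics, not any particular schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional

@dataclass
class Deployment:
    committed_at: datetime            # when the change was first committed
    deployed_at: datetime             # when the change reached production
    failed: bool                      # did this deployment cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed

def four_key_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Summarise a window of deployments as the Four Key Metrics (illustrative sketch)."""
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "change_lead_time": median(lead_times),              # commit-to-production time
        "deployment_frequency": len(deploys) / period_days,  # deployments per day
        "change_failure_rate": len(failures) / len(deploys),
        "time_to_recovery": median(recoveries) if recoveries else None,
    }
```

Note that every field in this sketch concerns timing and failure recovery; nothing captures whether a change should have shipped at all, which is precisely the criticism that follows.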

Imagine you asked some people “Do your colleagues eat lots of greens?” and “Do your colleagues work out a lot?”. Those who feel better about their workplace would probably be more likely to answer “yes” to both questions - this does not mean that eating more greens will always lead to greater levels of gym attendance. Whilst there may be a correlation, we haven’t established a cause-and-effect relationship.
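
A toy simulation makes the point. In the sketch below every probability is invented purely for illustration: a single latent factor (feeling good about the workplace) drives both answers, with no causal link between greens and gym attendance at all, yet the two answers still correlate.

```python
import random

# A toy confounder simulation with invented probabilities: a latent
# "feels good about the workplace" factor drives both survey answers,
# with no causal link between greens and gym attendance themselves.

random.seed(42)  # deterministic for illustration

def respondent() -> tuple[bool, bool]:
    happy = random.random() < 0.5      # latent workplace sentiment
    p_yes = 0.8 if happy else 0.3      # happier respondents answer "yes" more often
    eats_greens = random.random() < p_yes
    works_out = random.random() < p_yes
    return eats_greens, works_out

sample = [respondent() for _ in range(10_000)]
p_greens = sum(g for g, _ in sample) / len(sample)
p_gym = sum(w for _, w in sample) / len(sample)
p_both = sum(g and w for g, w in sample) / len(sample)

# If the two answers were independent, P(both) would equal P(greens) * P(gym);
# the shared latent factor pushes P(both) well above that.
print(f"P(greens)={p_greens:.2f}  P(gym)={p_gym:.2f}  "
      f"P(both)={p_both:.2f}  if independent={p_greens * p_gym:.2f}")
```

Survey-based research has to control for exactly this kind of confounding before it can claim that one measured factor drives another.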

DORA research argues that speed and reliability go hand in hand; however, it does so based on outcome measures which are entirely framed in terms of speed. Moreover, the use of subjective surveys can bias respondents who feel better about their work towards answering “yes” to both questions. And whilst more competent companies may well score better on both factors, correlation of this kind does not establish a causal relationship.

For example, consider how highly regarded the reliability of aviation software is, despite how infrequently software is deployed to aircraft. Or consider Toyota, the pioneer of agile methodologies: in “Bookout v. Toyota”, a software reliability case concerning an unintended acceleration bug that led to fatalities, Toyota conceded in internal communication that "In truth, technology such as failsafe is not part of the Toyota Engineering division's DNA". Or consider the Horizon IT scandal - blamed for multiple suicides and described as “the most widespread miscarriage of justice in UK history”, with those wrongly imprisoned including a pregnant woman - where the software developer, Fujitsu, had pioneered the use of an agile methodology, namely Rapid Application Development.

Flawed Measurement Outcomes

As discussed, DORA research evaluates performance against Four Key Metrics that assess the speed of deploying new work and of fixing bugs. However, these metrics only matter to the extent that they are useful outcomes to measure.

I have conducted research on both software engineers and a representative sample of the general public (with the research firm Survation) and found that both groups agree speed is the least important factor. Instead, the public cares most about data security, data accuracy and preventing serious bugs. It is hard to find a hypothesis connecting the Four Key Metrics to the outcomes which software developers and the public say matter most - especially given that, under these metrics, preventing serious bugs is outright a lower priority than fixing bugs quickly or shipping work fast. Even for factors like data security, it’s hard to see how they connect to any of the Four Key Metrics.

Even amongst business decision-makers, it seems that on-time delivery matters more than fast delivery. According to research I conducted with J.L. Partners, 98% of such business decision-makers in the UK and 96% in the USA agree with the statement “The goal of a software engineering team is to deliver high-quality software on time”, with 65% in the UK and 62% in the US strongly agreeing.

Finally, the research I conducted with Survation found that trust in software engineers and the reliability expectations of the public vary considerably from industry to industry, meaning a one-size-fits-all approach should be discouraged in favour of what the Engineering Council UK suggests in its Guidance on Risk: “adopt a decision-making approach that is proportionate to the risk and consistent with their organisation’s defined risk appetite”.

Follow the Money

Dr Keys received funding from the sugar industry for his research; in many investigations, it’s important to follow the money to understand where incentives lie. The DORA team originally produced the State of DevOps reports for Puppet, a company focussed on automating IT infrastructure, and now they do this work for Google Cloud. Both have a vested interest in developers being able to deploy work as quickly as possible. This does not mean, however, that deploying faster is the solution to all our problems.

DORA has made a contribution to the world of software engineering by adding a degree of empirical evaluation to the process. However, we must avoid mistaking marketing material for the whole truth, and recognise the flaws in such research.


Written by icyapril | Software engineering manager, author and computer scientist.
Published by HackerNoon on 2024/01/03