TL;DR: Projects are like piles of sand
Given the hypothesis that projects fail because they start from the wrong place, here's my thought process for what the correct place to start is.
Today, the NAO brought out a report on the latest UK planning disaster: HS2. Their summary was pertinent, and most likely correct. They are paraphrased as saying: "[the HS2 project is] over budget and running late because the Government underestimated the project's complexity and risk."
Well of course it is. And this is before they've even really started, though billions have already been spent on planning. This underlines that standard planning approaches simply don't work. It isn't a matter of desire, or indeed of funding and effort: standard planning and management are just incapable of working at any sort of scale.
So, what to do? Well, one thing we can do is consider risk and the drivers of risk in the planning. Believe it or not, risk is almost always ignored in planning, or at best some small contingency is applied. These contingencies are almost always removed from the timelines by accountants and program managers (among others). Now, given that risk is the primary cause of failure, is this not deliberate sabotage?
So, somewhere very early, as a cultural and structural ploy, we invite failure by demoting risk to an afterthought. Then when it fails we tut and shake our heads. "It was ever so," we say, and move on.
For HS2, the audit team has talked about "underestimated risk", but I'm not sure anyone knows what to do about that. I expect they will address this by simply raising the cost, increasing oversight and extending the due date, but no one is suggesting the plan should change. This will not end well, for good and obvious reasons as well as subtle mathematical ones. It is worth looking at why, in order to seek solutions.
Sand-pile dynamics
A project depends on the controlled progress of a lot of interconnected tasks, with non-recoverable resources of time and cost. Time and money, once spent, are gone; you can't get them back. This makes projects act very much like a pile of sand, or indeed snow, if you prefer. The non-recoverability is analogous to the fact that things fall down, not up. As tasks get delayed, they put pressure on downstream tasks. Most tasks offer some contingency at the end, but this is easily and rapidly consumed.
As we know, snow (and sand) can avalanche, where the whole lot cascades, leading to rapid and complete failure, i.e. the cost and time to make any progress become extremely large. Anyone who has had building work done will have observed this in miniature when the team of builders leaves to do other work and your project suddenly slows to a crawl. Imagine this on a gigantic scale, with much greater complexity.
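To make the avalanche picture a little more concrete, here is a toy sandpile simulation. It is only a sketch: the rule used (a cell topples when it holds four grains and passes one to each neighbour, in the style of the Bak-Tang-Wiesenfeld model) is my choice of illustration, and the grid size, grain count, seed and function names are all assumptions rather than anything derived from a real project.

```python
import random

THRESHOLD = 4  # assumed rule: a cell topples once it holds this many grains


def drop_grain(grid, row, col):
    """Add one grain at (row, col), relax the pile, return the number of topplings."""
    size = len(grid)
    grid[row][col] += 1
    unstable = [(row, col)]
    topplings = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < THRESHOLD:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= THRESHOLD
        topplings += 1
        if grid[r][c] >= THRESHOLD:
            unstable.append((r, c))
        # Pass one grain to each neighbour; grains pushed past the edge are lost.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                grid[nr][nc] += 1
                if grid[nr][nc] >= THRESHOLD:
                    unstable.append((nr, nc))
    return topplings


def simulate(size=15, grains=5000, seed=1):
    """Drop grains at random cells and record the avalanche size each one causes."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    avalanche_sizes = [
        drop_grain(grid, rng.randrange(size), rng.randrange(size))
        for _ in range(grains)
    ]
    return grid, avalanche_sizes


grid, avalanche_sizes = simulate()
quiet_drops = sum(1 for s in avalanche_sizes if s == 0)
print(f"drops that moved nothing: {quiet_drops} of {len(avalanche_sizes)}")
print(f"largest avalanche: {max(avalanche_sizes)} topplings")
```

Every grain is the same size, yet the disturbance each one causes varies enormously, which is the behaviour the analogy leans on.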
The more complex the interconnection, the more any delays or shortages affect other tasks, and so the more likely it is that this cascade failure will happen. The reason is that, under conventional planning, delays and overruns are cumulative. If you have ten equal-length tasks in a row and each overruns by 5%, then by the time you reach the end the slippage has added up to 50% of a single task's duration, and that is the best case. If the resources for the final task (say the plasterers) are expecting to be somewhere else by then, then one or other job is going to suffer.
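The arithmetic behind that ten-task chain can be sketched in a few lines. This assumes equal-length tasks run strictly one after another, each overrunning by the same fraction, with no slack absorbed along the way; the function name and parameters are made up for illustration.

```python
def cumulative_slip(n_tasks: int, overrun: float, duration: float = 1.0) -> list[float]:
    """Return how late each task finishes in a serial chain.

    Assumes every task has the same planned duration and overruns by the
    same fraction, and that no task can start before its predecessor ends.
    """
    slips = []
    late = 0.0
    for _ in range(n_tasks):
        late += overrun * duration  # each overrun is added to all that came before
        slips.append(late)
    return slips


slips = cumulative_slip(n_tasks=10, overrun=0.05)
print(f"final task starts {slips[-2]:.0%} of a task-length late")
print(f"final task finishes {slips[-1]:.0%} of a task-length late")
```

Nothing in the loop ever subtracts from `late`, which is the "things fall down, not up" property in miniature.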
So, what to do?
There are a number of points here, and the root cause (the cultural acceptance of failure) may not be addressable. Sorry. Politics is not my thing.
Coming back to the mechanics of failure, it is instructive to look at why sand-piles don't act like water and just immediately fail. If we accept the sandpile analogy, then we'll see lots of small slippages before any major event. Individually they don't participate in the major cascade, but they do influence it. Furthermore, a project isn't actually a pile of sand: we do have levers we can apply to potentially reduce the sandpile character of the plan. More on that next time.