Psychological Safety and No Blame
The biggest barriers to effective root cause analysis are poor-quality engagement by stakeholders and the involvement of management.
There is a natural tendency to look for people to blame when things go wrong, but in most cases, process and technology failures are
more culpable than individuals.
Even when humans are at fault, they are much less likely to engage honestly with a process they feel is out to punish them.
Whilst negligence, dishonesty and incompetence are reasonable grounds for "blame", assigning blame does not help when diagnosing and recovering from
a large-scale catastrophic event and then rapidly optimising that system for reliability. In other words, even if you must blame and punish, do it afterwards.
Be serious about not blaming or punishing the people involved in the failure - particularly those involved in the recovery, as they may not have been part of the
root cause or original failure. Humour is important in business, but during a failure event, few people appreciate the joke. Don't make jokes about
the failure or the people involved in it.
Maintaining a no-blame approach is difficult, and management staff in particular will default to blame. It takes concerted, serious effort to maintain an
effective root-cause analysis process. A strong collateral-based process, with outcomes that everyone can understand, will go a long way to abating any
undue pressure to "find a culprit". By sharing responsibility and making a strong commitment to growth and learning from failure, a team can turn any
negative event into a positive one.
Timing and Incident Response
The normal cadence of a CoE process is to produce the first version within 24-48 hours of the original event occurring. Data gathering should begin as part
of the incident process - NASA's "lock the doors" is a particularly useful idea. In most cases, the root cause or causes will be known by that point, but Next Actions
might not be completed. It is normal then to summarise and republish the CoE at the end of the process, perhaps months later, once all Next Actions are
fully addressed.