'How Complex Systems Fail' and mental healthcare system design


Fan_of_Meehl
Full Member · 7+ Year Member · Joined Oct 22, 2014 · Messages: 2,477 · Reaction score: 4,888
I recently ran across this short summary of considerations for 'complex systems' in healthcare and was struck by how relevant some of the points are to attempts to design complex systems to, say, 'prevent suicide,' especially considering the role of the health care provider. I think it's worthy of discussion and am interested in others' opinions on it, especially if you've been in the position of trying to apply increasingly complex protocols, procedures, and rules relating to suicide prevention.



How Complex Systems Fail
(Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New Understanding of Patient Safety)


Richard I. Cook, MD
Cognitive Technologies Laboratory
University of Chicago

  1. Complex systems are intrinsically hazardous systems.


    All of the interesting systems (e.g. transportation, healthcare, power generation) are inherently and unavoidably hazardous by their own nature. The frequency of hazard exposure can sometimes be changed but the processes involved in the system are themselves intrinsically and irreducibly hazardous. It is the presence of these hazards that drives the creation of defenses against hazard that characterize these systems.
  2. Complex systems are heavily and successfully defended against failure


    The high consequences of failure lead over time to the construction of multiple layers of defense against failure. These defenses include obvious technical components (e.g. backup systems, ‘safety’ features of equipment) and human components (e.g. training, knowledge) but also a variety of organizational, institutional, and regulatory defenses (e.g. policies and procedures, certification, work rules, team training). The effect of these measures is to provide a series of shields that normally divert operations away from accidents.
  3. Catastrophe requires multiple failures – single point failures are not enough.


    The array of defenses works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners.
  4. Complex systems contain changing mixtures of failures latent within them.


    The complexity of these systems makes it impossible for them to run without multiple flaws being present. Because these are individually insufficient to cause failure they are regarded as minor factors during operations. Eradication of all latent failures is limited primarily by economic cost but also because it is difficult before the fact to see how such failures might contribute to an accident. The failures change constantly because of changing technology, work organization, and efforts to eradicate failures.
  5. Complex systems run in degraded mode.


    A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws. After-accident reviews nearly always note that the system has a history of prior ‘proto-accidents’ that nearly generated catastrophe. Arguments that these degraded conditions should have been recognized before the overt accident are usually predicated on naïve notions of system performance. System operations are dynamic, with components (organizational, human, technical) failing and being replaced continuously.
  6. Catastrophe is always just around the corner.


    Complex systems possess potential for catastrophic failure. Human practitioners are nearly always in close physical and temporal proximity to these potential failures – disaster can occur at any time and in nearly any place. The potential for catastrophic outcome is a hallmark of complex systems. It is impossible to eliminate the potential for such catastrophic failure; the potential for such failure is always present by the system’s own nature.
  7. Post-accident attribution to a ‘root cause’ is fundamentally wrong.


    Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident. There are multiple contributors to accidents. Each of these is necessarily insufficient in itself to create an accident. Only jointly are these causes sufficient to create an accident. Indeed, it is the linking of these causes together that creates the circumstances required for the accident. Thus, no isolation of the ‘root cause’ of an accident is possible. The evaluations based on such reasoning as ‘root cause’ do not reflect a technical understanding of the nature of failure but rather the social, cultural need to blame specific, localized forces or events for outcomes. 1

    1 Anthropological field research provides the clearest demonstration of the social construction of the notion of ‘cause’ (cf. Goldman L (1993), The Culture of Coincidence: accident and absolute liability in Huli, New York: Clarendon Press; and also Tasca L (1990), The Social Construction of Human Error, Unpublished doctoral dissertation, Department of Sociology, State University of New York at Stonybrook)
  8. Hindsight biases post-accident assessments of human performance.


    Knowledge of the outcome makes it seem that events leading to the outcome should have appeared more salient to practitioners at the time than was actually the case. This means that ex post facto accident analysis of human performance is inaccurate. The outcome knowledge poisons the ability of after-accident observers to recreate the view of practitioners before the accident of those same factors. It seems that practitioners “should have known” that the factors would “inevitably” lead to an accident. 2 Hindsight bias remains the primary obstacle to accident investigation, especially when expert human performance is involved.

    2 This is not a feature of medical judgements or technical ones, but rather of all human cognition about past events and their causes.
  9. Human operators have dual roles: as producers & as defenders against failure.


    The system practitioners operate the system in order to produce its desired product and also work to forestall accidents. This dynamic quality of system operation, the balancing of demands for production against the possibility of incipient failure is unavoidable. Outsiders rarely acknowledge the duality of this role. In non-accident filled times, the production role is emphasized. After accidents, the defense against failure role is emphasized. At either time, the outsider’s view misapprehends the operator’s constant, simultaneous engagement with both roles.
  10. All practitioner actions are gambles.


    After accidents, the overt failure often appears to have been inevitable and the practitioner’s actions as blunders or deliberate willful disregard of certain impending failure. But all practitioner actions are actually gambles, that is, acts that take place in the face of uncertain outcomes. The degree of uncertainty may change from moment to moment. That practitioner actions are gambles appears clear after accidents; in general, post hoc analysis regards these gambles as poor ones. But the converse, that successful outcomes are also the result of gambles, is not widely appreciated.
  11. Actions at the sharp end resolve all ambiguity.


    Organizations are ambiguous, often intentionally, about the relationship between production targets, efficient use of resources, economy and costs of operations, and acceptable risks of low and high consequence accidents. All ambiguity is resolved by actions of practitioners at the sharp end of the system. After an accident, practitioner actions may be regarded as ‘errors’ or ‘violations’ but these evaluations are heavily biased by hindsight and ignore the other driving forces, especially production pressure.
  12. Human practitioners are the adaptable element of complex systems.


    Practitioners and first line management actively adapt the system to maximize production and minimize accidents. These adaptations often occur on a moment by moment basis. Some of these adaptations include: (1) Restructuring the system in order to reduce exposure of vulnerable parts to failure. (2) Concentrating critical resources in areas of expected high demand. (3) Providing pathways for retreat or recovery from expected and unexpected faults. (4) Establishing means for early detection of changed system performance in order to allow graceful cutbacks in production or other means of increasing resiliency.
  13. Human expertise in complex systems is constantly changing


    Complex systems require substantial human expertise in their operation and management. This expertise changes in character as technology changes but it also changes because of the need to replace experts who leave. In every case, training and refinement of skill and expertise is one part of the function of the system itself. At any moment, therefore, a given complex system will contain practitioners and trainees with varying degrees of expertise. Critical issues related to expertise arise from (1) the need to use scarce expertise as a resource for the most difficult or demanding production needs and (2) the need to develop expertise for future use.
  14. Change introduces new forms of failure.


    The low rate of overt accidents in reliable systems may encourage changes, especially the use of new technology, to decrease the number of low consequence but high frequency failures. These changes may actually create opportunities for new, low frequency but high consequence failures. When new technologies are used to eliminate well understood system failures or to gain high precision performance they often introduce new pathways to large scale, catastrophic failures. Not uncommonly, these new, rare catastrophes have even greater impact than those eliminated by the new technology. These new forms of failure are difficult to see before the fact; attention is paid mostly to the putative beneficial characteristics of the changes. Because these new, high consequence accidents occur at a low rate, multiple system changes may occur before an accident, making it hard to see the contribution of technology to the failure.
  15. Views of ‘cause’ limit the effectiveness of defenses against future events.


    Post-accident remedies for “human error” are usually predicated on obstructing activities that can “cause” accidents. These end-of-the-chain measures do little to reduce the likelihood of further accidents. In fact, the likelihood of an identical accident is already extraordinarily low because the pattern of latent failures changes constantly. Instead of increasing safety, post-accident remedies usually increase the coupling and complexity of the system. This increases the potential number of latent failures and also makes the detection and blocking of accident trajectories more difficult.
  16. Safety is a characteristic of systems and not of their components


    Safety is an emergent property of systems; it does not reside in a person, device or department of an organization or system. Safety cannot be purchased or manufactured; it is not a feature that is separate from the other components of the system. This means that safety cannot be manipulated like a feedstock or raw material. The state of safety in any system is always dynamic; continuous systemic change insures that hazard and its management are constantly changing.
  17. People continuously create safety.


    Failure free operations are the result of activities of people who work to keep the system within the boundaries of tolerable performance. These activities are, for the most part, part of normal operations and superficially straightforward. But because system operations are never trouble free, human practitioner adaptations to changing conditions actually create safety from moment to moment. These adaptations often amount to just the selection of a well-rehearsed routine from a store of available responses; sometimes, however, the adaptations are novel combinations or de novo creations of new approaches.
  18. Failure free operations require experience with failure.


    Recognizing hazard and successfully manipulating system operations to remain inside the tolerable performance boundaries requires intimate contact with failure. More robust system performance is likely to arise in systems where operators can discern the “edge of the envelope”. This is where system performance begins to deteriorate, becomes difficult to predict, or cannot be readily recovered. In intrinsically hazardous systems, operators are expected to encounter and appreciate hazards in ways that lead to overall performance that is desirable. Improved safety depends on providing operators with calibrated views of the hazards. It also depends on providing calibration about how their actions move system performance towards or away from the edge of the envelope.
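
To make point 3 (and points 4 and 5) a little more concrete, here is a toy Monte Carlo sketch. It is not part of Cook's treatise, and every number in it is invented purely for illustration: latent flaws appear often, each defense layer catches most failure trajectories, and an overt catastrophe requires every layer to fail on the same run.

```python
import random

# Toy model of points 3-5 (illustrative only; all probabilities invented).
# A small failure trajectory starts fairly often, but a catastrophe requires
# it to slip past EVERY defense layer on the same run.
N_RUNS = 100_000
P_LATENT_FAILURE = 0.30                        # a small failure begins on ~30% of runs
LAYER_BLOCK_PROBS = [0.90, 0.85, 0.80, 0.70]   # technical, procedural, team, practitioner

def one_run(rng: random.Random) -> str:
    if rng.random() > P_LATENT_FAILURE:
        return "clean"
    for p_block in LAYER_BLOCK_PROBS:          # each layer independently blocks the trajectory
        if rng.random() < p_block:
            return "blocked"                   # system ran "degraded but safe" (point 5)
    return "catastrophe"                       # every defense failed together (point 3)

rng = random.Random(0)
counts = {"clean": 0, "blocked": 0, "catastrophe": 0}
for _ in range(N_RUNS):
    counts[one_run(rng)] += 1
print(counts)
# With these made-up numbers, small failures appear on roughly 30,000 of 100,000
# runs, but only a few dozen become catastrophes, and each catastrophe required
# the joint failure of all four layers; there is no single 'root cause' (point 7).
```

Adding a fifth layer to this toy cuts catastrophes further, but it also adds one more thing that can itself be misconfigured, which is roughly the warning point 15 makes about post-accident remedies increasing coupling and complexity.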

The problem with something like suicide prevention is that, in order to produce anything like a reliable system, you need to:

1. Account for people lying about things like suicidal ideation to avoid the consequences that inevitably follow disclosure.

2. Account for people who do not want the system involved or do not want to engage in care.
 
Very thought provoking, and lots of good points. One point I take from this is that human flexibility is what makes these complex systems work; the tendency, though, is to blame that same flexibility for the faults, because of how hindsight works.

One system I was closely involved in designing was a medication administration system at a treatment center. My instinct was always to go simpler and make it idiot-proof as we tried to minimize errors in the system. Others' instinct was often either to punish the people who made errors or to add more complexity to try to stop people from making those errors. Since the main people designing it were me and a really smart, capable RN, we came up with a pretty good system, but it was never 100% error free and it was always too complex for my liking. I'm wondering how that anecdote fits, or doesn't fit, with this, since I'm currently in the process of designing another complex system: a group practice and treatment program.
 
Hmm. Thanks for posting. One question I have been mulling since reading this is whether bureaucracy, in the political-theory sense (itself a complex system), is inevitable in centralized and/or large (however that's defined) and/or hierarchically organized organizations. And if so, in what ways can it be optimized?

To Q1, I have not yet been able to see how it could be anything but inevitable.

To Q2, according to this, it seems it is not possible, because of the ways we get trapped in our human-ness. But I'm wondering how, and whether, it is possible to build in autonomy, and perhaps decentralization or some kind of ad hoc grouping, that does not undermine the larger system, given its presence. Or is this just 'more begets more,' and we're back where we began?
 
I think we have had systems that have done this. Some successful business models, and even our own system of government, have some of these mechanisms of decentralization or autonomy built in. It seems like every time it is done, it is hailed as successful and revolutionary, but then human nature takes over and the system wins. I know the military directly addressed this at one point to make its system more efficient after some of the decision-making debacles in Vietnam; by the 90s it seemed to hit a peak of efficiency, and I'm not sure where it's at now.
 
In my role as de facto suicide prevention "operator" (outpatient therapist), these three principles really resonate:

- All practitioner actions are gambles

- Human operators have dual roles: as producers and defenders against failure

- Hindsight biases post-accident assessments of human performance

Whether people realize it or not, 'sending the police over for a welfare check' after a no-show and 72 hours of unreturned phone calls is actually 'a gamble,' and it may actually increase risk, especially considering the high base rate of no-shows and disengagement in outpatient MH practice. It is hardly 'standard of care' to send in the police, and it may even constitute an unwarranted breach of duty (confidentiality).
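
To put some rough numbers on that base-rate point, here is a minimal sketch. Every figure in it is a hypothetical assumption, not data; the point is only the arithmetic of positive predictive value when the triggering event (no-show plus unreturned calls) is common and acute risk is rare.

```python
# Hypothetical numbers only: how often does the reflexive
# "no-show + unreturned calls -> send police" rule actually reach
# someone in acute danger?
base_rate = 0.005     # assumed fraction of outpatients in acute danger in a given week
sensitivity = 0.80    # assumed chance an at-risk patient trips the rule
false_alarm = 0.60    # assumed chance a not-at-risk patient also no-shows and doesn't call back

# Positive predictive value of the rule (basic Bayes arithmetic):
ppv = (sensitivity * base_rate) / (
    sensitivity * base_rate + false_alarm * (1 - base_rate)
)
print(f"Share of dispatched welfare checks that reach someone in acute danger: {ppv:.1%}")
# About 0.7% under these assumptions; the other ~99% of checks land on people
# who are not in acute danger, and each of those visits carries its own costs and risks.
```

Different assumed numbers move the exact figure, but as long as disengagement is common and acute risk is rare, the rule mostly generates false positives, which is exactly what makes it a gamble rather than an obviously protective default.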

I often commiserate with colleagues about how we have to constantly perform miracles, defusing risky situations, repairing ruptures, preventing catastrophes, all of which go unobserved by the higher-ups. But let me fail to update an MH Suite 'treatment plan' on a stable patient, and it's a 'failure.' What they fail to grasp is that every additional form, policy, and procedure they add to the process multiplies the opportunities for 'failure.'
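
And a back-of-the-envelope sketch of that "more requirements, more points of failure" arithmetic; the 98% per-step figure is an assumption chosen only to show the shape of the curve.

```python
# Illustrative only: if each required form/step in a chart is completed
# "perfectly" with independent probability p, then a chart with n required
# steps is fully 'compliant' with probability p ** n.
p_per_step = 0.98                     # assumed 98% per-step compliance
for n_steps in (5, 10, 20, 40):
    p_clean = p_per_step ** n_steps
    print(f"{n_steps:>2} required steps -> {p_clean:.0%} chance of a fully 'compliant' chart")
# 5 -> ~90%, 10 -> ~82%, 20 -> ~67%, 40 -> ~45%
```

Every requirement added to the audit checklist moves n up and the 'clean chart' probability down, regardless of whether anything about actual patient care has changed.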
 
I think we have had systems that have done this. Some successful business models, and even our own system of government, have some of these mechanisms of decentralization or autonomy built in. It seems like every time it is done, it is hailed as successful and revolutionary, but then human nature takes over and the system wins. I know the military directly addressed this at one point to make its system more efficient after some of the decision-making debacles in Vietnam; by the 90s it seemed to hit a peak of efficiency, and I'm not sure where it's at now.
Yes, precisely. I can think of ways this has been implemented in the short term. But these examples still raise the question of whether there is a way to eliminate the barrier(s) to meaningfully integrating autonomy and decentralization into complex systems in a long-term way. Or, given the maintenance of the overall system, does everything ultimately lead back to the main road with time and/or a lapse in ongoing effort, which ultimately occurs at the individual level? Or perhaps the question is more like: in what ways can individuals or ad hoc groups change the operation of the system? But again, even if they can and do, at what point will it simply revert to a new form of inefficiency?

Ultimately I think the real answer is that we (humans) generally want easy answers, standard practices, and clear decision trees in order to function most effectively. There is too much noise and stimulation to forgo them completely and require fully considered decision-making for everything we encounter. We also have to account for those who primarily want to be operators and not 'thinkers,' for lack of a better term, in most if not all situations. On the other hand, 'standard' practices and heuristics generally cannot and do not apply to all situations, even when all the relevant factors are, or seem to be, present. Thus, requiring the use of set practices and procedures without exception, in our field at least, is untenable. But then we're back to the problem: finding a sustainable middle ground where the policies and procedures that exist mostly serve their stated purpose, and there's room for individual or small-group practice variation that doesn't end in 'whoever cares most does most' or in almost complete divestment and alienation.
 

I think the larger issue is that we are asking the wrong question (or at least asking questions from the wrong direction). Unless there was a preventable medical error, the only thing we have done is prolong life; last I checked, our efficacy at stopping death is exactly 0%. So I think the question needs to be 'did we meet the patient's goals?' rather than 'did we stop what may have been the intended result?' I have sat across from too many terminally ill patients to think that suicide is never the desired outcome. Say that to a cancer patient whose throat has been scorched by chemo and who cannot speak or eat, or to an MS/ALS patient who can no longer scratch their own nose, change a TV channel, or do much of anything. The biggest system change would be making treatment planning, end-of-life planning, and patient education properly paid activities rather than largely un- or under-compensated work no one has time for in the current environment. Second, the system should be largely prevention focused; it costs less to prevent disease than to treat it. So why do we wait to address mental health until someone is mentally unhealthy?
 
Absolutely, true.

We can talk about how things should or could be as long as we want, but my personal interest in this conversation is praxis. Given what is, what processes or practices can we utilize to move us toward where we want to be? And what are the barriers and supports related to those?

In talking about system change and operation, what I take from Cook's 'How Complex Systems Fail' treatise is that process is much more integral than outcome, even while we recognize that we must specify (and agree to) outcomes as goals to work toward.
 