The Information Source for Systems Testability and Diagnostics

Testability on the Shuttle Program
by Darrel Fritts
June 29, 2004

Why Dependency Modeling and Testability Analysis?

During my years of working on the Space Shuttle program in the 1970s, a new buzzword came into being. “Integrated Diagnostics” became a term that my colleagues understood in widely different ways. To some it meant the art of diagnosing a problem and isolating it. To others it meant tracking failures within a system so that the trends could be used to predict future failures. Still others understood it to be a method of measuring a system to determine how diagnosable it is.

While all of these belong in a definition of integrated diagnostics, in my experience the hardest of them is measuring a system to determine its real-world diagnostic capability. By real-world, I mean for both operational and maintenance actions. After 25 years of electronic design, my world changed from creating systems and subsystems to diagnosing them. Why? Because I found diagnosis to be a much greater challenge than design, and my experience in the design world had left me with a good intuitive understanding of faults and fault isolation.

Two notable Shuttle experiences reinforce my opinion that diagnosis is the real challenge. When the Shuttle Challenger blew up after launch, the problem that caused it was a gigantic failure in diagnosis. Diagnosis of a failure has two major components, “Detection” and “Isolation”. The bottom line is that you cannot correct or remediate a problem you do not know exists. A simple sensor placed at the right location would have detected the O-ring problem in plenty of time to allow a Return to Launch Site maneuver, saving the crew’s lives. Yet the sensor did not exist because no one had considered the possibility of an O-ring failure in the manner in which it occurred. It was a diagnostic analysis failure. It was the same with Columbia. No method had been employed to monitor the insulation that came off the external tank because it had never been considered a threat to the Shuttle. As a result, in both of these catastrophes critical faults went undetected.

In the years since the Challenger accident I have seen a tremendous increase in the use of the Failure Modes, Effects and Criticality Analysis (FMECA). It is now a primary tool in system design. While the FMECA existed well before the Shuttle, in those days it was used largely as an after-the-fact logistics system repair tool rather than as a design tool. But analyzing an effect, even a critical one, solves nothing if that effect goes undetected. This is why I believe testability analysis driven by dependency modeling is so important. There is no way that I know of to consider all possibilities of critical effects.

If you look at the two failures I have described above, the FMECAs created for this system did not even include these failure effects. That was primarily because there was no associated hardware. Current FMECA practice works by examining each existing hardware component for its failure modes and assessing the effects of those failures. Here is one of the flaws in that approach: if the hardware does not exist in the design, then as far as the analysis is concerned neither does the failure mechanism, even one that can be critical to the system.
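The limitation described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the component names, failure modes, severity scale, and hazard list are invented for illustration and are not taken from any Shuttle FMECA. It only shows that a worksheet generated by walking the parts list cannot contain a hazard that has no part to hang it on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmecaRow:
    component: str
    failure_mode: str
    effect: str
    severity: int  # assumed scale: 1 = catastrophic ... 4 = minor


# Hypothetical parts list: component -> [(failure mode, effect, severity)]
PARTS = {
    "valve_A": [("stuck closed", "loss of coolant flow", 2)],
    "sensor_B": [
        ("reads low", "false alarm", 4),
        ("no output", "loss of monitoring", 3),
    ],
}

# Hypothetical hazards that originate outside any listed component.
EXTERNAL_HAZARDS = [
    ("debris impact", "structural damage", 1),
    ("low ambient temperature", "seal loses resilience", 1),
]


def hardware_driven_fmeca(parts):
    """Build worksheet rows by walking the parts list, one row per failure mode."""
    rows = []
    for component, modes in parts.items():
        for mode, effect, severity in modes:
            rows.append(FmecaRow(component, mode, effect, severity))
    return rows


def uncovered_hazards(rows, hazards):
    """Return the hazards whose failure mode appears in no worksheet row."""
    covered = {row.failure_mode for row in rows}
    return [hazard for hazard in hazards if hazard[0] not in covered]


rows = hardware_driven_fmeca(PARTS)
missed = uncovered_hazards(rows, EXTERNAL_HAZARDS)

print(f"{len(rows)} worksheet rows, {len(missed)} hazards never analyzed")
for mode, effect, severity in missed:
    print(f"  missing: {mode} -> {effect} (severity {severity})")
```

In this toy case both severity-1 hazards are absent from the worksheet, simply because no entry in the parts list could have produced them.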

Future FMECA analysis needs to go beyond the existing hardware and consider failures that are not driven by system hardware. We have scratched the surface in this area in the past, through analysis of cooling, weather, lightning strikes and radiation, but our efforts still fall far short. One of the key tools to take us beyond system hardware failures is testability analysis. My experience has taught me that I can always find a way to isolate a problem if I know it exists. But I cannot always find the basic failures that cause problems. The standard faults, such as those of electronic components, are well understood and seldom contribute to a catastrophic failure. We have learned how to use redundancy where needed, and our detection and remediation in these areas are generally good. What we have not learned is how to manage our analysis to include all the influences outside the defined system. The FMECA still appears to be the best analysis tool for measuring critical effects on a system, but if it does not include a means to identify exactly how each effect will be detected, we will still fall short. In the past it has been up to the design engineer to determine how he will test for and detect a failure. The problem is that he works with a failure set that is too limited and does not include all possible failures, and he does not always fully understand failure propagation.

The answer to this lies in dependency modeling. Testability analysis driven by dependency modeling gives us a very good analysis of how failures propagate within a system and leads us to consider “what if” questions in a better and more logical manner. It is essential to combine the testability analysis with the FMECA, so that how a failure mechanism is detected is not a guess but is driven by a dependency model and its real propagation characteristics. This also helps the analyst concentrate on the critical areas of the design and avoid spending analysis effort on the non-critical areas. As an engineer I have always believed in using all the tools I can put my hands on.
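As a rough illustration of the idea, a dependency model can be treated as a directed graph in which a failure propagates to everything downstream of it, and each test observes one point in the graph. The sketch below is a toy under stated assumptions: the node names, the test points, and the helper functions are all hypothetical, and it is not a description of how any commercial tool performs its analysis. It reports which faults no test can see (a detection gap) and which faults produce identical test results (an isolation ambiguity).

```python
# Hypothetical dependency model: an entry A -> [B, ...] means each B depends
# on A, so a failure of A propagates to B.
DEPENDENTS = {
    "power": ["pump", "controller"],
    "controller": ["pump"],
    "pump": ["flow"],
    "heater": ["temp"],
    "vent": [],
    "flow": [],
    "temp": [],
}

# Hypothetical test strategy: test name -> the node that the test observes.
TESTS = {"t_flow": "flow", "t_ctrl": "controller"}

# Items that can fail (assumed); "flow" and "temp" are observable outputs only.
FAULTS = ["power", "controller", "pump", "heater", "vent"]


def affected_by(fault, graph):
    """Return every node reached by propagating `fault` downstream."""
    seen, stack = {fault}, [fault]
    while stack:
        node = stack.pop()
        for dependent in graph[node]:
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen


def signature(fault, graph, tests):
    """Return the set of tests that would fail if `fault` occurred."""
    reach = affected_by(fault, graph)
    return frozenset(name for name, point in tests.items() if point in reach)


def analyze(graph, tests, faults):
    """Split faults into undetected ones and groups with identical signatures."""
    groups = {}
    for fault in faults:
        groups.setdefault(signature(fault, graph, tests), []).append(fault)
    undetected = groups.pop(frozenset(), [])
    ambiguity_groups = sorted(sorted(group) for group in groups.values())
    return undetected, ambiguity_groups


undetected, ambiguity_groups = analyze(DEPENDENTS, TESTS, FAULTS)
coverage = (len(FAULTS) - len(undetected)) / len(FAULTS)

print("undetected faults:", undetected)
print("ambiguity groups:", ambiguity_groups)
print(f"detection coverage: {coverage:.0%}")
```

With this toy model the heater and vent faults reach no test point, so they are the gaps a FMECA row should not claim as detected, while power and controller faults fail the same tests and cannot be told apart without adding another test point.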

A dependency-modeling tool is one I do not want to do without. The tool I use today is DSI eXpress. It provides a good graphical representation of a design, along with good diagnostic algorithm metrics that characterize the design in terms of its failures, their propagation within the design, and the detection method for these failures under the designer’s test strategy. It also quickly generates a customer-customized FMECA in Excel format that ties the failures of the design to the detection of these failures. We will still never be able to anticipate all the failures that could bring down our future systems, but maybe we can eliminate some of them through better analysis techniques. Testability analysis driven by dependency modeling is definitely one of these techniques, and it is the one I choose.