System Safety Engineering

NASA has several publications on-line:

Systems Engineering Handbook

Probabilistic Risk Assessment Procedures Guide



Some NASA System Safety background (from PRA guidebook):

HISTORIC BACKGROUND
Probabilistic Risk Assessment (PRA) is a comprehensive, structured, and logical analysis
method aimed at identifying and assessing risks in complex technological systems for the
purpose of cost-effectively improving their safety and performance. NASA’s objective is to
rapidly become a leader in PRA and to use this methodology effectively to ensure mission and
programmatic success, and to achieve and maintain high safety standards at NASA. NASA
intends to use PRA in all of its programs and projects to support optimal management decisions
for the improvement of safety and program performance.

Over the years, NASA has been a leader in most of the technologies it has employed in its
programs. One would think that PRA should be no exception. In fact, it would be natural for
NASA to be a leader in PRA because, as a technology pioneer, NASA uses risk assessment
and management implicitly or explicitly on a daily basis. Many important NASA programs,
like the Space Shuttle Program, have, for some time, been assigned explicit risk-based mission
success goals.


Methods to perform risk and reliability assessment originated in U.S. aerospace and missile
programs in the early 1960s. Fault tree analysis (FTA) is one such example. It would have
been a reasonable extrapolation to expect that NASA would also become the world leader in
the application of PRA. That was, however, not to happen.

Legend has it that early in the Apollo project the question was asked about the probability of
successfully sending astronauts to the moon and returning them safely to Earth. A risk, or
reliability, calculation of some sort was performed and the result was a very low success
probability value. So disappointing was this result that NASA became discouraged from further
performing quantitative analyses of risk or reliability until after the Challenger mishap in 1986.
Instead, NASA decided to rely on the Failure Modes and Effects Analysis (FMEA) method for
system safety assessments. To date, FMEA continues to be required by NASA in all its
safety-related projects.


In the meantime, the nuclear industry picked up PRA to assess safety almost as a last resort in
defense of its very existence. This analytical method was gradually improved and expanded by
experts in the field and has gained momentum and credibility over the past two decades, not only
in the nuclear industry, but also in other industries like petrochemical, offshore platforms, and
defense. By the time the Challenger accident occurred, PRA had become a useful and respected
tool for safety assessment. Because of its logical, systematic, and comprehensive approach, PRA
has repeatedly proven capable of uncovering design and operation weaknesses that had escaped
even some of the best deterministic safety and engineering experts. This methodology showed
that it was very important to examine not only low-probability and high-consequence individual
mishap events, but also high-consequence scenarios which can emerge as a result of occurrence
of multiple high-probability and nearly benign events. Contrary to common perception, the latter
is oftentimes more detrimental to safety than the former.
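
The point about combinations of frequent, nearly benign events can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the NASA guide: the probabilities are hypothetical, the events are assumed to be statistically independent, and the helper name `joint_probability` is invented for this example. It compares one rare high-consequence event with a scenario that requires three common, individually benign events to occur together.

```python
from math import prod


def joint_probability(event_probs):
    """Probability that all events occur together, assuming independence."""
    return prod(event_probs)


# Hypothetical per-mission probabilities, chosen only for illustration.
single_rare_event = 1e-6            # one low-probability, high-consequence event
benign_events = [0.02, 0.02, 0.02]  # three frequent, individually benign events

# The high-consequence scenario arises only if all three benign events coincide.
combined_scenario = joint_probability(benign_events)

print(f"single rare event: {single_rare_event:.1e}")
print(f"combined scenario: {combined_scenario:.1e}")
print("combined scenario is more likely:", combined_scenario > single_rare_event)
```

Under these assumed numbers the combined scenario is several times more likely than the single rare event, even though none of its contributing events would raise concern on its own. Real PRA models also have to account for dependence between events (common-cause failures), which the independence assumption here ignores.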

Then, the October 29, 1986, “Investigation of the Challenger Accident,” by the Committee
on Science and Technology, House of Representatives, stated that, without some means of
estimating the probability of failure (POF) of the Shuttle elements, it was not clear how
NASA could focus its attention and resources as effectively as possible on the most critical
Shuttle systems.

In January 1988, the Slay Committee recommended, in its report called the “Post-Challenger
Evaluation of Space Shuttle Risk Assessment and Management,” that PRA approaches be
applied to the Shuttle risk management program at the earliest possible date. It also stated that
databases derived from Space Transportation System failures, anomalies, flight and test results,
and the associated analysis techniques should be systematically expanded to support PRA, trend
analysis, and other quantitative analyses relating to reliability and safety.

As a result of the Slay Committee criticism, NASA began to try out PRA, at least in a
“proof-of-concept” mode, with the help of expert contractors. A number of PRA studies were conducted in
this fashion over the next 10 years.

On July 29, 1996, the NASA Administrator directed the Associate Administrator, Office of
Safety and Mission Assurance (OSMA), to develop a PRA tool to support decisions on the
funding of Space Shuttle upgrades. He expressed unhappiness that, after he came to NASA in
1992, NASA spent billions of dollars on Shuttle upgrades without knowing how much safety
would be improved. He asked for an analytical tool to help base upgrade decisions on risk. This
tool was called Quantitative Risk Assessment System, and its latest version, 1.6, was issued in
April 2001 [1].