2-4 Safety Management Manual (SMM)
1) optimizing human performance;
2) preventing inadvertent errors;
3) reducing the unwanted consequences of variable human performance; the effectiveness of these
measures is continually monitored during normal operations;
h) ongoing monitoring of normal operations includes assessment of whether processes and procedures
are followed and, when they are not followed, investigations are carried out to determine the cause;
i) safety investigations include the assessment of contributing human factors, examining not only
behaviours but reasons for such behaviours (context), with the understanding that in most cases
people are doing their best to get the job done;
j) the management of change process includes consideration of the evolving tasks and roles of the human
in the system;
k) personnel are trained to ensure they are competent to perform their duties; the effectiveness of
training is reviewed and training programmes are adapted to meet changing needs.
2.2.3 The effectiveness of safety management depends largely on the degree of senior management support and
commitment to create a working environment that optimizes human performance and encourages
personnel to actively engage in and contribute to the organization’s safety management processes.
2.2.4 To address the way that the organization influences human performance, there must be senior-level
support to implement effective safety management. This includes management commitment to create the right working
environment and the right safety culture to address human factors. This will also influence the attitudes and behaviours
of everyone in the organization. More information on safety culture can be found in Chapter 3.
2.2.5 A number of models have been created to support the assessment of the effect of human factors on safety
performance. The SHELL Model is well known and useful to illustrate the impact and interaction of the different system
components on the human, and emphasizes the need to consider human factors as an integrated part of SRM.
2.2.6 Figure 2-2 illustrates the relationship between the human (at the centre of the model) and workplace
components. The SHELL Model contains four satellite components:
a) Software (S): procedures, training, support, etc.;
b) Hardware (H): machines and equipment;
c) Environment (E): the working environment in which the rest of the L-H-S system must function; and
d) Liveware (L): other humans in the workplace.
Chapter 2. Safety management fundamentals 2-5
Figure 2-2. SHELL Model
2.2.7 Liveware. The critical focus of the model is the humans at the front line of operations, depicted in the
centre of the model. However, of all the dimensions in the model, this is the one which is least predictable and most
susceptible to the effects of internal (hunger, fatigue, motivation, etc.) and external (temperature, light, noise, etc.)
influences. Although humans are remarkably adaptable, they are subject to considerable variations in performance.
Humans are not standardized to the same degree as hardware, so the edges of this block are not simple and straight.
The effects of irregularities at the interfaces between the various SHELL blocks and the central Liveware block should
be understood to avoid tensions that may compromise human performance. The jagged edges of the modules represent
the imperfect coupling of each module. This is useful in visualizing the following interfaces between the various
components of the aviation system:
a) Liveware-Hardware (L-H). The L-H interface refers to the relationship between the human and the
physical attributes of equipment, machines and facilities. This considers the ergonomics of operating
the equipment by personnel, how safety information is displayed and how switches and operating
levers are labelled and operated so they are logical and intuitive to operate.
b) Liveware-Software (L-S). The L-S interface is the relationship between the human and the supporting
systems found in the workplace, e.g. regulations, manuals, checklists, publications, processes and
procedures, and computer software. It includes such issues as the recency of experience, accuracy,
format and presentation, vocabulary, clarity and the use of symbols. L-S considers the processes and
procedures, and how easy they are to follow and understand.
c) Liveware-Liveware (L-L). The L-L interface is the relationship and interaction between people in their
work environment. Some of these interactions are within the organization (colleagues, supervisors,
managers), many are between individuals from different organizations with different roles (air traffic
controllers with pilots, pilots with engineers etc.). It considers the importance of communication and
interpersonal skills, as well as group dynamics, in determining human performance. The advent of
crew resource management and its extension to air traffic services (ATS) and maintenance operations
has enabled organizations to consider team performance in the management of errors. Also within the
scope of this interface are staff/management relationships and organizational culture.
d) Liveware-Environment (L-E). This interface involves the relationship between the human and the
physical environment. This includes things such as temperature, ambient light, noise, vibration and air
quality. It also considers external environmental factors, such as weather, infrastructure and
terrain.
2.3 ACCIDENT CAUSATION
2.3.1 The “Swiss-Cheese” (or Reason) Model, developed by Professor James Reason and well known to the
aviation industry, illustrates that accidents involve successive breaches of multiple defences. These breaches can be
triggered by a number of enabling factors such as equipment failures or operational errors. The Swiss-Cheese Model
contends that complex systems such as aviation are extremely well defended by layers of defences (otherwise known as
“barriers”). A single-point failure is rarely consequential. Breaches in safety defences can be a delayed consequence of
decisions made at the higher levels of the organization, which may remain dormant until their effects or damaging
potential are activated by certain operating conditions (known as latent conditions). Under such specific circumstances,
human failures (or “active failures”) at the operational level act to breach the final layers of safety defence. The Reason
Model proposes that all accidents include a combination of both active failures and latent conditions.
2.3.2 Active failures are actions or inactions, including errors and rule-breaking, that have an immediate adverse
effect. They are viewed, with the benefit of hindsight, as unsafe acts. Active failures are associated with front-line
personnel (pilots, air traffic controllers, aircraft maintenance engineers, etc.) and may result in a harmful outcome.
2.3.3 Latent conditions can exist in the system well before a damaging outcome. The consequences of latent
conditions may remain dormant for a long time. Initially, these latent conditions are not perceived as harmful, but under
certain conditions may become clear when the operational level defences are breached. People far removed in time and
space from the event can create these conditions. Latent conditions in the system may include those created by the
safety culture; equipment choices or procedural design; conflicting organizational goals; defective organizational
systems; or management decisions.
2.3.4 The “organizational accident” paradigm assists by identifying these latent conditions on a system-wide
basis, rather than through localized efforts, to minimize active failures by individuals. Importantly, latent conditions are
usually created with good intentions. Organizational decision makers are often balancing finite resources and potentially
conflicting priorities and costs. The decisions taken by decision makers, made on a daily basis in large organizations,
might, in particular circumstances, unintentionally lead to a damaging outcome.
2.3.5 Figure 2-3 illustrates how the Swiss-Cheese Model assists in understanding the interplay of organizational
and managerial factors in accident causation. Multiple defensive layers are built into the aviation system to protect
against variations in human performance or decisions at all levels of the organization. But each layer typically has
weaknesses, depicted by the holes in the slices of “Swiss cheese”. Sometimes all of the weaknesses align (represented
by the aligned holes) leading to a breach that penetrates all defensive barriers and may result in a catastrophic outcome.
The Swiss-Cheese Model represents how latent conditions are ever present within the system and can manifest through
local trigger factors.
2.3.6 It is important to recognize that some of the defences, or breaches, can be influenced by an interfacing
organization. It is therefore vitally important that service providers assess and manage these interfaces.
Figure 2-3. Concept of accident causation
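The layered-defence idea behind the Swiss-Cheese Model can be illustrated numerically. The sketch below is a simplified illustration (not part of the Reason Model itself, and the breach probabilities are hypothetical values chosen for illustration): if each defence is treated as an independent barrier with some probability of being breached, an accident trajectory requires every barrier to fail at once, which is why a single-point failure is rarely consequential.

```python
# Simplified, illustrative sketch of layered defences in the Swiss-Cheese
# Model. The breach probabilities below are hypothetical; real defences are
# rarely independent, so this is a teaching aid, not a risk model.

def accident_probability(breach_probabilities):
    """Probability that an event penetrates every defensive layer,
    assuming (unrealistically) that barriers fail independently."""
    p = 1.0
    for breach in breach_probabilities:
        p *= breach
    return p

# Four defensive layers, each individually quite likely to hold:
barriers = [0.1, 0.05, 0.02, 0.01]  # hypothetical breach probabilities
print(accident_probability(barriers))  # approximately 1e-06
```

Even with individually imperfect layers, the combined probability that all the "holes" align is orders of magnitude smaller than any single layer's breach probability, which is the intuition the aligned holes in Figure 2-3 convey.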
2.3.7 “Swiss-Cheese” applications for safety management
2.3.7.1 The “Swiss-Cheese” Model can be used as an analysis guide by both States and service providers by
looking past the individuals involved in an incident or identified hazard, into the organizational circumstances which may
have allowed the situation to manifest. It can be applied during SRM, safety surveillance, internal auditing, change
management and safety investigation. In each case, the model can be used to consider which of the organization’s
defences are effective, which can or have been breached, and where the system could benefit from additional defences.
Once identified, any weaknesses in the defences can be reinforced against future accidents and incidents.
2.3.7.2 In practice, the event will breach the defences in the direction of the arrow (hazards to losses) as displayed
in the rendering of Figure 2-3. Assessments of the situation are conducted in the opposite direction, in this case from
losses to hazards. Actual aviation accidents will usually include a degree of additional complexity. There are more
sophisticated models which can help States and service providers to understand how and why accidents happen.
2.3.8 Practical drift
2.3.8.1 Scott A. Snook's theory of practical drift is used to understand how the performance of any system “drifts away”
from its original design. Tasks, procedures, and equipment are often initially designed and planned in a theoretical
environment, under ideal conditions, with an implicit assumption that nearly everything can be predicted and controlled,
and where everything functions as expected. This is usually based on three fundamental assumptions that the:
a) technology needed to achieve the system production goals is available;
b) personnel are trained, competent and motivated to properly operate the technology as intended; and
c) policy and procedures will dictate system and human behaviour.
These assumptions underlie the baseline (or ideal) system performance, which can be graphically presented as a
straight line from the start of operational deployment as shown in Figure 2-4.
Figure 2-4. Concept of practical drift
2.3.8.2 Once operationally deployed, the system should ideally perform as designed, following baseline
performance (orange line) most of the time. In reality, the operational performance often differs from the assumed
baseline performance as a consequence of real-life operations in a complex, ever-changing and usually demanding
environment (red line). Since the drift is a consequence of daily practice, it is referred to as a “practical drift”. The term
“drift” is used in this context as the gradual departure from an intended course due to external influences.
2.3.8.3 Snook contends that practical drift is inevitable in any system, no matter how careful and well thought out its
design. Some of the reasons for the practical drift include:
a) technology that does not operate as predicted;
b) procedures that cannot be executed as planned under certain operational conditions;
c) changes to the system, including additional components;
d) interactions with other systems;
e) safety culture;
f) adequacy (or inadequacy) of resources (e.g. support equipment);
g) learning from successes and failures to improve operations, and so forth.
2.3.8.4 In reality people will generally make the system work on a daily basis despite the system’s shortcomings,
applying local adaptations (or workarounds) and personal strategies. These workarounds may bypass the protection of
existing safety risk controls and defences.
2.3.8.5 Safety assurance activities such as audits, observations and monitoring of SPIs can help to expose
activities that are “practically drifting”. Analysing the safety information to find out why the drift is happening helps to
mitigate the safety risks. The closer to the beginning of the operational deployment that practical drift is identified, the
easier it is for the organization to intervene. More information on safety assurance for States and service providers may
be found in Chapters 8 and 9, respectively.
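As a simple illustration of how monitoring of SPIs can surface practical drift, the sketch below compares a rolling average of an indicator against a baseline expectation and flags a sustained departure. This is a hypothetical sketch: the indicator, baseline, tolerance and window values are illustrative assumptions, since real SPI schemes are defined by each organization.

```python
# Hypothetical sketch: flag "practical drift" when the rolling mean of a
# safety performance indicator (SPI) departs from its baseline by more than
# a chosen tolerance. All values and thresholds are illustrative only.

def detect_drift(spi_values, baseline, tolerance, window=3):
    """Return the index of the last sample in the first window whose
    rolling mean exceeds baseline + tolerance, or None if no drift."""
    for i in range(window, len(spi_values) + 1):
        rolling_mean = sum(spi_values[i - window:i]) / window
        if rolling_mean > baseline + tolerance:
            return i - 1
    return None

# Monthly counts of a notional deviation report (illustrative data):
monthly_reports = [2, 3, 2, 3, 5, 6, 8]
print(detect_drift(monthly_reports, baseline=2.5, tolerance=1.5))  # 5
```

The earlier in the sequence the departure is flagged, the cheaper the intervention, which mirrors the point in 2.3.8.5 that drift identified close to operational deployment is easier to correct.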
2.4 MANAGEMENT DILEMMA
2.4.1 In any organization engaged in the delivery of services, production/profitability and safety risks are linked.
An organization must maintain profitability to stay in business by balancing output with acceptable safety risks (and the
costs involved in implementing safety risk controls). Typical safety risk controls include technology, training, processes
and procedures. For the State, the safety risk controls are similar, i.e. training of personnel, the appropriate use of
technology, effective oversight and the internal processes and procedures supporting oversight. Implementing safety risk
controls comes at a price – money, time, resources – and the aim of safety risk controls is usually to improve safety
performance, not production performance. However, some investments in “protection” can also improve “production” by
reducing accidents and incidents and thereby their associated costs.
2.4.2 The safety space is a metaphor for the zone where an organization balances desired
production/profitability while maintaining required safety protection through safety risk controls. For example, a service
provider may wish to invest in new equipment. The new equipment may simultaneously provide the necessary efficiency
improvements as well as improved reliability and safety performance. Such decision-making involves an assessment of
both the benefits to the organization as well as the safety risks involved. The allocation of excessive resources to safety
risk controls may result in the activity becoming unprofitable, thus jeopardizing the viability of the organization.
2.4.3 On the other hand, excess allocation of resources for production at the expense of protection can have an
impact on the product or service and can ultimately lead to an accident. It is therefore essential that a safety boundary
be defined that provides early warning that an unbalanced allocation of resources exists or is developing. Just as
organizations use financial management systems to recognize when they are getting too close to bankruptcy, they can
apply the same logic through safety management tools to monitor their safety performance. This enables the organization to operate
profitably and safely within the safety space. Figure 2-5 illustrates the boundaries of an organization’s safety space.
Organizations need to continuously monitor and manage their safety space as safety risks and external influences
change over time.
2.4.4 The need to balance profitability and safety (or production and protection) has become a readily
understood and accepted requirement from a service provider perspective. This balance is equally applicable to the
State’s management of safety, given the requirement to balance resources required for State protective functions that
include certification and surveillance.
Figure 2-5. Concept of a safety space
2.5 SAFETY RISK MANAGEMENT
Safety Risk Management (SRM) is a key component of safety management and includes hazard identification, safety
risk assessment, safety risk mitigation and risk acceptance. SRM is a continuous activity because the aviation system is
constantly changing, new hazards can be introduced and some hazards and associated safety risks may change over
time. In addition, the effectiveness of implemented safety risk mitigation strategies must be monitored to determine if
further action is required.
2.5.1 Introduction to hazards
2.5.1.1 In aviation, a hazard can be considered as a dormant potential for harm which is present in one form or
another within the system or its environment. This potential for harm may appear in different forms, for example: as a
natural condition (e.g. terrain) or technical status (e.g. runway markings).
2.5.1.2 Hazards are an inevitable part of aviation activities; however, their manifestation and possible adverse
consequences can be addressed through mitigation strategies which aim to contain the potential for the hazard to result
in an unsafe condition. Aviation can coexist with hazards so long as they are controlled. Hazard identification is the first
step in the SRM process. It precedes a safety risk assessment and requires a clear understanding of hazards and their
related consequences.
2.5.2 Understanding hazards and their consequences
2.5.2.1 Hazard identification focuses on conditions or objects that could cause or contribute to the unsafe
operation of aircraft or aviation safety-related equipment, products and services (guidance on distinguishing hazards that
are directly pertinent to aviation safety from other general/industrial hazards is addressed in subsequent paragraphs).
2.5.2.2 Consider, for example, a fifteen-knot wind. A fifteen-knot wind is not necessarily a hazardous condition.
In fact, a fifteen-knot wind blowing directly down the runway improves aircraft take-off and landing performance. But if
the fifteen-knot wind is blowing across the runway, a crosswind condition is created which may be hazardous to
operations. This is due to its potential to contribute to aircraft instability. The reduction in control could lead to an
occurrence, such as a lateral runway excursion.
2.5.2.3 It is not uncommon for people to confuse hazards with their consequences. A consequence is an outcome
that can be triggered by a hazard. For example, a runway excursion (overrun) is a potential consequence related to the
hazard of a contaminated runway. By clearly defining the hazard first, one can more readily identify possible
consequences.
2.5.2.4 In the crosswind example above, an immediate outcome of the hazard could be loss of lateral control
followed by a consequent runway excursion. The ultimate consequence could be an accident. The damaging potential of
a hazard can materialize through one or many consequences. It is important that safety risk assessments identify all of
the possible consequences. The most extreme consequence, the loss of human life, should be differentiated from those
that involve lesser consequences, such as: aircraft incidents; increased flight crew workload; or passenger discomfort.
The description of the consequences will inform the risk assessment and subsequent development and implementation
of mitigations through prioritization and allocation of resources. Detailed and thorough hazard identification will lead to
more accurate assessment of safety risks.
Hazard identification and prioritization
2.5.2.5 Hazards exist at all levels in the organization and are detectable through many sources including reporting
systems, inspections, audits, brainstorming sessions and expert judgement. The goal is to proactively identify hazards
before they lead to accidents, incidents or other safety-related occurrences. An important mechanism for proactive
hazard identification is a voluntary safety reporting system. Additional guidance on voluntary safety reporting systems
can be found in Chapter 5. Information collected through such reporting systems may be supplemented by observations
or findings recorded during routine site inspections or organizational audits.
2.5.2.6 Hazards can also be identified in the review or study of internal and external investigation reports. A
consideration of hazards when reviewing accident or incident investigation reports is a good way to enhance the
organization’s hazard identification system. This is particularly important when the organization’s safety culture is not yet
mature enough to support effective voluntary safety reporting, or in small organizations with limited events or reports. An
important source of specific hazards linked to operations and activities is from external sources such as ICAO, trade
associations or other international bodies.
2.5.2.7 Hazard identification may also consider hazards that are generated outside of the organization and
hazards that are outside the direct control of the organization, such as extreme weather or volcanic ash. Considering
hazards related to emerging safety risks is also an important way for organizations to prepare for situations that may eventually occur.
2.5.2.8 The following should be considered when identifying hazards:
a) system description;
b) design factors, including equipment and task design;
c) human performance limitations (e.g. physiological, psychological, physical and cognitive);
d) procedures and operating practices, including documentation and checklists, and their validation
under actual operating conditions;
e) communication factors, including media, terminology and language;
f) organizational factors, such as those related to the recruitment, training and retention of personnel,
compatibility of production and safety goals, allocation of resources, operating pressures and
corporate safety culture;
g) factors related to the operational environment (e.g. weather, ambient noise and vibration, temperature
and lighting);
h) regulatory oversight factors, including the applicability and enforceability of regulations, and the
certification of equipment, personnel and procedures;
i) performance monitoring systems that can detect practical drift, operational deviations or a
deterioration of product reliability;
j) human-machine interface factors; and
k) factors related to the SSP/SMS interfaces with other organizations.
Occupational safety, health and environment (OSHE) hazards
2.5.2.9 Safety risks associated with compound hazards that simultaneously impact aviation safety as well as
OSHE may be managed through separate (parallel) risk mitigation processes to address the separate aviation and
OSHE consequences, respectively. Alternatively, an integrated aviation and OSHE risk mitigation system may be used
to address compound hazards. An example of a compound hazard is a lightning strike on an aircraft at an airport transit
gate. This hazard may be deemed by an OSHE inspector to be a “workplace hazard” (ground personnel/workplace
safety). To an aviation safety inspector, it is also an aviation hazard with risk of damage to the aircraft and a risk to
passenger safety. It is important to consider both the OSHE and aviation safety consequences of such compound
hazards, since they are not always the same. The purpose and focus of preventive controls for OSHE and aviation
safety consequences may differ.
Hazard identification methodologies
2.5.2.10 The two main methodologies for identifying hazards are:
a) Reactive. This methodology involves analysis of past outcomes or events. Hazards are identified
through investigation of safety occurrences. Incidents and accidents are an indication of system
deficiencies and therefore can be used to determine which hazard(s) contributed to the event.
b) Proactive. This methodology involves collecting safety data of lower consequence events or process
performance and analysing the safety information or frequency of occurrence to determine if a hazard
could lead to an accident or incident. The safety information for proactive hazard identification
primarily comes from flight data analysis (FDA) programmes, safety reporting systems and the safety
assurance function.
2.5.2.11 Hazards can also be identified through safety data analysis which identifies adverse trends and makes
predictions about emerging hazards, etc.
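One simple form of such analysis can be sketched as follows. The sketch compares occurrence counts in a hazard category between an earlier and a recent period and flags categories whose frequency is rising; the categories, counts and threshold factor are hypothetical, chosen only to illustrate the idea of proactive trend detection from safety report data.

```python
# Illustrative sketch of proactive hazard identification from safety
# reports: compare occurrence counts between an earlier and a recent period
# and flag categories whose frequency is rising. All data are hypothetical.

from collections import Counter

def rising_trends(earlier_reports, recent_reports, factor=1.5):
    """Return hazard categories whose recent count exceeds the earlier
    count by more than the given factor."""
    earlier = Counter(earlier_reports)
    recent = Counter(recent_reports)
    return sorted(cat for cat, n in recent.items()
                  if n > factor * earlier.get(cat, 0))

earlier = ["crosswind", "runway_contamination", "crosswind", "birdstrike"]
recent = ["crosswind", "runway_contamination", "runway_contamination",
          "runway_contamination", "birdstrike"]
print(rising_trends(earlier, recent))  # ['runway_contamination']
```

In practice such analysis would draw on far richer sources (FDA programmes, voluntary reports, audit findings) and more robust statistics, but the principle is the same: adverse trends point to hazards before they lead to an accident or incident.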
Hazards related to SMS interfaces with external organizations
2.5.2.12 Organizations should also identify hazards related to their safety management interfaces. This should,
where possible, be carried out as a joint exercise with the interfacing organizations. The hazard identification should
consider the operational environment and the various organizational capabilities (people, processes, technologies)
which could contribute to the safe delivery of the service or product’s availability, functionality or performance.
2.5.2.13 As an example, an aircraft turnaround involves many organizations and operational personnel all working
in and around the aircraft. There are likely to be hazards related to the interfaces between operational personnel, their
equipment and the coordination of the turnaround activity.
2.5.3 Safety risk probability
2.5.3.1 Safety risk probability is the likelihood that a safety consequence or outcome will occur. It is important to
envisage a variety of scenarios so that all potential consequences can be considered. The following questions can assist
in the determination of probability:
a) Is there a history of occurrences similar to the one under consideration, or is this an isolated
occurrence?
b) What other equipment or components of the same type might have similar issues?
c) What is the number of personnel following, or subject to, the procedures in question?
d) What is the exposure to the hazard under consideration? For example, during what percentage of the
operation is the equipment or activity in use?
2.5.3.2 Taking into consideration any factors that might underlie these questions will help when assessing the
probability of the hazard consequences in any foreseeable scenario.
2.5.3.3 An occurrence is considered foreseeable if any reasonable person could have expected the kind of
occurrence to have happened under the same circumstances. Identification of every conceivable or theoretically
possible hazard is not possible. Therefore, good judgment is required to determine an appropriate level of detail in
hazard identification. Service providers should exercise due diligence when identifying significant and reasonably
foreseeable hazards related to their product or service.
Note.— Regarding product design, the term “foreseeable” is intended to be consistent with its use in
airworthiness regulations, policy, and guidance.
2.5.3.4 Table 1 presents a typical safety risk probability classification table. It includes five categories to denote the
probability related to an unsafe event or condition, the description of each category, and an assignment of a value to
each category. This example uses qualitative terms; quantitative terms could be defined to provide a more accurate
assessment. This will depend on the availability of appropriate safety data and the sophistication of the organization and
operation.
Table 1. Safety risk probability table
Likelihood Meaning Value
Frequent Likely to occur many times (has occurred frequently) 5
Occasional Likely to occur sometimes (has occurred infrequently) 4
Remote Unlikely to occur, but possible (has occurred rarely) 3
Improbable Very unlikely to occur (not known to have occurred) 2
Extremely improbable Almost inconceivable that the event will occur 1
Note.— This is an example only. The level of detail and complexity of tables and matrices should be
adapted to the particular needs and complexities of each organization. It should also be noted that organizations might
include both qualitative and quantitative criteria.
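A classification like Table 1 lends itself to a simple lookup, and an organization moving from qualitative to quantitative criteria might attach rate thresholds to each category. The sketch below shows one such arrangement; the category-to-value mapping follows Table 1, but the annual-rate thresholds in `classify_rate` are illustrative assumptions, since Table 1 itself is purely qualitative.

```python
# Table 1 expressed as a lookup. The helper classify_rate is a hypothetical
# quantitative extension: its occurrence-rate thresholds are illustrative
# assumptions, not part of the manual's example table.

PROBABILITY_VALUES = {
    "Frequent": 5,              # likely to occur many times
    "Occasional": 4,            # likely to occur sometimes
    "Remote": 3,                # unlikely, but possible
    "Improbable": 2,            # very unlikely to occur
    "Extremely improbable": 1,  # almost inconceivable
}

def classify_rate(occurrences_per_year):
    """Map an observed annual occurrence rate to a likelihood category
    (thresholds are illustrative only)."""
    if occurrences_per_year >= 10:
        return "Frequent"
    if occurrences_per_year >= 1:
        return "Occasional"
    if occurrences_per_year >= 0.1:
        return "Remote"
    if occurrences_per_year > 0:
        return "Improbable"
    return "Extremely improbable"

category = classify_rate(2)
print(category, PROBABILITY_VALUES[category])  # Occasional 4
```

As the note to Table 1 states, the level of detail should be adapted to the organization; a sophisticated operator with rich safety data might calibrate such thresholds empirically, while a smaller one may reasonably stay with the qualitative descriptions alone.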
2.5.4 Safety risk severity
2.5.4.1 Once the probability assessment has been completed, the next step is to assess the severity, taking into
account the potential consequences related to the hazard. Safety risk severity is defined as the extent of harm that might
reasonably be expected to occur as a consequence or outcome of the identified hazard. The severity classification
should consider:
a) fatalities or serious injury which would occur as a result of:
1) being in the aircraft;
2) having direct contact with any part of the aircraft, including parts which have become detached
from the aircraft; or
3) having direct exposure to jet blast; and
b) damage:
1) damage or structural failure sustained by the aircraft which:
i) adversely affects the structural strength, performance or flight characteristics of the aircraft;
ii) would normally require major repair or replacement of the affected component;
2) damage sustained by ATS or aerodrome equipment which:
i) adversely affects the management of aircraft separation; or
ii) adversely affects landing capability.