
Major Incidents

The document outlines best practices for creating and managing a Major Incident Procedure within organizations, emphasizing the importance of defining what constitutes a major incident and establishing a dedicated Major Incident Team. It details the roles and responsibilities involved, the necessity for effective communication during incidents, and the need for regular reviews of the procedure to prevent recurrence. Additionally, it highlights the importance of planning and preparation for potential major incidents, including defining processes, roles, and communication plans.

Uploaded by

lennss857
Copyright
© © All Rights Reserved

Major Incidents - Best Practice Advice

Creating a Major Incident Procedure is often overlooked in many organisations, or left to IT
Service Continuity Management (ITSCM) to create. It’s worthwhile considering if you have an
appropriate procedure in place. If not then here is the basic information you will need to get
started. If you already have a procedure then you could use this as a checklist. Either way we
hope that you find this article of use…

Firstly – What Are Major Incidents?

Major Incidents are events which require a different approach from ‘normal’ day to day
incidents.

People sometimes use loose terminology and confuse a major incident with a problem. In
reality, an incident remains an incident for ever – it may grow in impact or priority to
become a major incident, but an incident never ‘becomes’ a problem. A problem is the
underlying cause of one or more incidents and remains a separate entity always!

“An incident never ‘becomes’ a problem”

A definition of what constitutes a major incident must be agreed and ideally mapped on to
the overall incident prioritisation system.

Where necessary, the major incident procedure should include the dynamic establishment
of a separate Major Incident Team, under the direct leadership of the Incident Manager.

The Major Incident Team is tasked to concentrate on this incident alone, and to ensure that
adequate resources and focus are provided for finding a fast and effective resolution.

Generally, major incidents are those for which the degree of impact on the
business/organisation is extreme.

Incidents for which the timescale of disruption – to even a relatively small percentage of
users – becomes excessive should also be regarded as major incidents.

“Are all priority 1 incidents, Major Incidents?”

It is possible to define some of these major incidents, but most will be prioritised as they
happen based on impact and urgency. Usually Priority 1 is set aside for these types of
incident. A separate procedure, with shorter timescales and greater urgency, must be used
for major incidents.
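
As a minimal sketch of how a major incident definition might be mapped onto a prioritisation scheme, the Python below derives a priority from impact and urgency and routes Priority 1 to the separate procedure. The three-level scales, the matrix values and the names `PRIORITY_MATRIX`, `priority_for` and `procedure_for` are assumptions made for illustration only; each organisation needs to agree its own.

```python
# Hypothetical 3x3 impact/urgency matrix. The levels and the priority
# values are illustrative assumptions, not taken from any standard.
LEVELS = ("high", "medium", "low")

PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("high", "low"): 3,
    ("medium", "high"): 2,
    ("medium", "medium"): 3,
    ("medium", "low"): 4,
    ("low", "high"): 3,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def priority_for(impact: str, urgency: str) -> int:
    """Look up the priority for a given impact and urgency."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError(f"impact and urgency must each be one of {LEVELS}")
    return PRIORITY_MATRIX[(impact, urgency)]


def procedure_for(priority: int) -> str:
    """Priority 1 is set aside for major incidents in this sketch."""
    if priority == 1:
        return "major incident procedure"
    return "standard incident procedure"
```

A matrix like this only covers incidents prioritised as they happen. Any major incidents you have defined in advance would still need to be listed explicitly in the procedure.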

Typically the same major incident doesn’t recur. If it does then someone has seriously failed
in their duties to prevent a recurrence. But more of that later…


What If We Don’t Have A Dedicated Incident Manager?


If the Service Desk Manager is also fulfilling the role of Incident Manager, which can be the
situation in some smaller businesses and organisations, then a separate person may need to
be designated to lead the major incident investigation team. This will avoid conflict of time
or priorities. Whoever is appointed should ultimately report back to the Incident
Manager/Service Desk Manager.

The Incident Manager is the process owner for the Incident Management process and as
such needs to work closely with other process owners and practitioners.

If the cause of the incident needs to be investigated at the same time, then the Problem
Manager will be involved as well, but the Incident Manager must ensure that service
restoration and underlying cause investigation are kept separate.

“Beware of the ‘conflict of interest’ between Incident and Problem Management”

Throughout the major incident the service desk will ensure that all activities are recorded
and users are kept fully informed of progress. Communication is a hugely important activity
in handling major incidents and should not be underestimated.

The Major Incident Manager (or Problem Manager if covering the role) should arrange a
formal meeting with interested parties (or regular meetings if necessary). These should be
attended by all key in house support staff, vendor support staff and IT services management,
with the purpose of reviewing progress and determining the best course of action. The
service desk representative should attend these meetings and ensure that a record of
actions/decisions is maintained, ideally as part of the overall incident record as major
incidents are still logged in the same way as all other incidents (it is only the priority and
management of the incident which is different).

As a side note: If no Problem Manager or Problem Process Owner is currently in place, an
Incident Management Executive and Major Incident Team could take on the root cause
analysis activities.

“If root cause is not determined and addressed then there is a high risk that the major
incident will reoccur”


What Is The Major Incident Procedure?

A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’
incidents. A definition of what constitutes a major incident must be agreed and ideally
mapped onto the overall incident prioritisation scheme, so that major incidents are dealt
with through this separate procedure.

The Major Incident Procedure


A procedure should be in place to manage all aspects of a major incident, including
resources and communication.

It should describe how the business/organisation handles major incidents from receiving
notification of a potential major incident, through the investigation process itself and to the
delivery of a final report.

A related procedure describing the process of reviewing the major incident policy and
procedure also needs to be in place.

The Major Incident procedure should be reviewed on a regular basis (at least annually),
before any major change, and following the occurrence of a major incident.
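
The three review triggers can be expressed as a simple check. This is only a sketch: the 365-day interval is our reading of "at least annually", and the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "at least annually" is read as no more than 365 days between reviews.
REVIEW_INTERVAL = timedelta(days=365)


def review_reasons(
    last_review: date,
    today: date,
    major_change_planned: bool,
    major_incident_since_review: bool,
) -> list[str]:
    """Return the reasons, if any, why the procedure should be reviewed now."""
    reasons = []
    if today - last_review >= REVIEW_INTERVAL:
        reasons.append("annual review due")
    if major_change_planned:
        reasons.append("major change planned")
    if major_incident_since_review:
        reasons.append("major incident occurred")
    return reasons
```

An empty list means none of the three triggers currently applies.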

Some of the areas to be covered in the major incident policy and procedure are: Purpose,
Scope, Activity Definition, Policy and Roles and responsibilities.


Purpose Of The Major Incident Procedure

Describe the purpose of the major incident policy and procedure. For example: “This
procedure and related policies have been put in place to document the
business/organisation’s requirements and arrangements for responding to and investigating
major incidents.”

Scope Of The Major Incident Procedure

Document the exact scope of the major incident procedure and policy. For example: “This
procedure and related policies apply to all Incidents that, due to their status of impact or
urgency to the business/organisation, have been prioritised as a major incident.”

Definition

A major incident is defined as an event which has significant impact or urgency for the
business/organisation and which demands a response beyond the routine incident
management process.

A major incident will be an incident that is either specifically defined in the major incident
procedure, or one that causes, or has the potential to cause, impact on business critical
services or systems (which should be named in the major incident procedure).

A major incident can also be an incident that has significant impact on business reputation,
legal compliance, regulation or security of the business/organisation.
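
The criteria in this definition could be captured as a simple check, sketched below in Python. The service names, the `Incident` fields and the function name are hypothetical. Whether an impact counts as "significant" remains a human judgement, which is why it is recorded here as a flag rather than calculated.

```python
from dataclasses import dataclass

# Hypothetical examples only: the procedure itself should name the real
# business critical services and systems.
CRITICAL_SERVICES = {"payments", "email"}


@dataclass
class Incident:
    service: str
    predefined_major: bool = False      # explicitly listed in the procedure
    affects_reputation: bool = False    # significant impact, judged by a person
    affects_legal_or_regulatory: bool = False
    affects_security: bool = False


def is_major_incident(incident: Incident) -> bool:
    """Apply the definition: any single criterion is enough."""
    return (
        incident.predefined_major
        or incident.service in CRITICAL_SERVICES
        or incident.affects_reputation
        or incident.affects_legal_or_regulatory
        or incident.affects_security
    )
```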

Major Incident Procedure Policy

A policy defines the scope of the process or procedure. It effectively gives you a boundary to
prevent ‘scope creep’.

The business/organisation’s policy is to have an effective and efficient system for responding
to major incidents, which is appropriate to the individual circumstances.

The key requirements of the policy are:

 To provide an effective communication system across the business/organisation
during a major incident

 To ensure that an appropriate Incident Manager/Major Incident Team/Management
Group are in place to manage a major incident

 That there are in place appropriate arrangements to ensure that major incidents are
notified promptly to appropriate management and technical groups, so that the
appropriate resources are made available

 To conduct major incident investigations and to contribute to the
business/organisation’s knowledge of the causes of incidents.

 To provide timely information about the causes of incidents and any relevant findings
from investigations

 To conduct a review of each major incident once service has been restored and, in
line with problem management, to look at root cause and options for a permanent
solution to prevent the same major incident happening again

 To conduct reviews of major incident investigation policy and procedure,
independent of the major incident investigation, and to report on them (any lessons
to be learned from the policy and procedure review will be considered, and
appropriate action taken to ensure any improvements to existing arrangements are
implemented within a specified timescale)

Roles And Responsibilities In The Major Incident Procedure

The following roles and responsibilities need to be defined for managing major incidents:

 The Incident Manager

 The Problem Manager or if no Problem Manager exists, then the role of a Root Cause
Analyst (effectively a technical expert trained in RCA techniques – possibly 3rd line
support)

 Major Incident Investigation Board

 Investigation Team/investigation resources (technical staff)

 The service desk

 Service level managers/IT account managers

 Business relationship managers who may take part in the management of major
incidents to conduct communication with key customers.

 Any other relevant groups who will act as part of the Major Incident Team

Major Incident Procedure Reviews

It is important to conduct Major Incident Reviews where the review determines:

 How well did we manage the Major Incident?

 Could it have been prevented?

 Could we do things better next time?

 How do we stop a recurrence?

A review is a very important aspect of the Major Incident process and should be carefully
planned and managed. Typically it will be chaired by the Major Incident Manager or by a
senior member of the management team.

 All relevant parties involved in the Major Incident should attend the review.

 Supporting documentation such as the Incident Record is shared in advance if
possible.

 A walk-through of the incident together with the actions taken.


 Attendees are asked what went well and what did not, what actions will be taken to
prevent re-occurrences and/or assist with the resolution should it happen again in
the future.

 In addition the Major Incident process as a whole is reviewed and improvements are
again identified.

 Meeting minutes should be produced detailing attendees, actions identified, who has
been assigned the action and expected completion date.

 It is the responsibility of the Major Incident Manager to ensure the actions are
completed accordingly within agreed timescales.
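
One way to keep track of the actions recorded in the minutes is sketched below. The `ReviewAction` fields mirror the items the minutes should capture (action, owner, expected completion date). The class and function names are our own, not part of any tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReviewAction:
    description: str
    owner: str             # who has been assigned the action
    due: date              # expected completion date from the minutes
    completed: bool = False


def overdue_actions(actions: list[ReviewAction], today: date) -> list[ReviewAction]:
    """Return open actions whose expected completion date has passed."""
    return [a for a in actions if not a.completed and a.due < today]
```

The Major Incident Manager could run a check like this before each follow-up meeting to see which owners need chasing.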

Take care not to confuse a Major Incident review with a Major Problem Review, which examines:

 Those things that were done correctly

 Those things that were done wrong

 What could be done better in the future

 How to prevent recurrence

 Whether there has been any third-party responsibility and whether follow-up actions
are needed.

The knowledge gained from both reviews should be incorporated into a service review
meeting with the business customer to ensure the customer is aware of the actions taken
and the plans to prevent future major incidents from occurring. This helps to improve
customer satisfaction and assure the business that service operation is handling major
incidents and problems responsibly and actively working to prevent their future recurrence.

Communication In Major Incident Procedure And Related To Emergencies

Although ITIL® specifies how to deal with urgent, high-impact situations such as disasters (IT
Service Continuity Management) and major incidents (Incident Management), managers in
the service operation stage will find themselves dealing with various types and scales of
emergency not covered in these processes. It is important to note that this is not a separate
process; rather it is a view of several processes and situations from a communication
perspective.

Communication during emergencies is similar in purpose and content to communication
during exceptions. The main differences are in the level of urgency and impact of the
exception.

Emergency communications are usually initiated by the incident manager or by a senior IT
manager who has been designated as the escalation point for all such emergencies.
In the case where an IT service continuity plan is invoked, this will include a detailed
communication plan to be executed by the appropriate authority.

The incident manager or designated manager will often form a ‘response team’, and the
communication is initiated and coordinated by this team.

If the Major Incident is evident in the public domain then careful communication needs to
be delivered by a senior member of the management team, maybe in the form of a carefully
drafted press release.

Other Considerations

The following also need to be considered for managing major incidents:

 Knowledge transfer – The Major Incident Manager should not become a single point
of failure. Consider training a number of senior personnel in this role if possible and
establish a way of keeping all stakeholders informed of changes to plans, procedures
etc.

 Changes to appropriate documentation – Consider how changes to documents are
managed. A history log at the beginning of each document is the simplest way, but
you may also consider formal version and copy control, or use of document sharing
technology.

 Changes to appropriate processes – When processes change, ensure all stakeholders
are informed and trained as appropriate.

Planning For Major Incidents

Despite our best efforts a major incident will at some stage no doubt occur and therefore as
well as developing a Major Incident policy and major incident procedure we should
undertake a number of other activities in preparation…

 Define Major Incident Process - Defining and documenting the Major Incident
process, including a high-level flow diagram, is invaluable. The process
documentation will then assist with defining the associated procedures to be used by
all parties.

 Define Roles and Responsibilities - Clearly define in generic terms the roles and
responsibilities of each party, both internal and external to the organisation, engaged
in the Major Incident process. Creating a RACI matrix is often the easiest way to do
this, determining who is Responsible, Accountable, Consulted and Informed for
each activity. Ensure that all involved understand their role and responsibility and are
appropriately trained.

 Review Service Level Agreements and Service Catalogues - Working with the
Business Relationship Manager and Business representatives determine the mission
critical services and components. Business relationship managers may assist in
gathering detailed requirements during the service design stage of the lifecycle for
new services.

 Liaise with Information Security Management - In the current ‘Cyber-attack’ climate
consider the implications of security breaches, Phishing attacks, Malware,
Ransomware and other software virus attacks as part of your planning.

 Identify IT Service Continuity Management (ITSCM) Interfaces and Involvement -
Determine who in the IT Service Continuity Management team needs to be contacted,
and when, if a Major Incident occurs. Also agree and capture what triggers the ITSCM
plan or what circumstances would invoke the ITSCM plan.

 Define Incident Priorities – As part of the ‘normal’ incident management process it’s
important to establish a simple, clearly defined incident priority hierarchy covering
low priority through to high or critical priority incidents (Major Incidents). This
would normally be based on Business Impact and Business Urgency, but could
incorporate other factors such as ‘Technical Severity’. The incident priority should be
reflected in the generic "IT Service Support Model" if one exists. It’s imperative to
ensure there is no confusion regarding priority, especially regarding what constitutes
a "Major Incident", and that the priorities can be applied across the IT department and
its third party suppliers as well as the organisation's Business community.

 Define Incident Escalations - A Major Incident has the potential to have a significant
impact upon an organisation, for example from a reputational, legal, trading and, in
some cases, life and death perspective. Speed is of the essence and any delays can be
very costly. By establishing an Escalation Hierarchy within the organisation and
associated third party suppliers, appropriate authorisation, focus and resource can
be committed in a timely manner to the Major Incident, to resolve it and re-establish
the service(s) in question. Note: Both hierarchic and functional escalations need to be
considered.

 Review Underpinning Contract(s) and Operating Level Agreement(s) - Examine
Contracts with existing third party suppliers and Operating Level Agreements (OLAs)
with internal support teams to determine whether they align with the Major Incident
process. Where UCs and OLAs do not align with SLAs they will need to be
renegotiated.

 Establish need for any ‘out of hours’ (OOH) arrangements - It is possible that some
internal support teams will be required to have staff available "out of hours" to assist
with Major Incidents that may occur. Some form of compensation may need to be
agreed with staff. The organisation's Human Resources department would usually
handle this.

 Create a key contact list - Capture the names, job titles, telephone numbers (both
landline and mobile), preferred methods of communication of the various individual
team members and third party suppliers involved in the Major Incident process.

 Establish Communication Plan(s) - It is important to communicate out to the
Business community and other relevant staff (e.g. Business Relationship Manager(s))
in a timely manner when a Major Incident occurs, followed by progress
update(s) and finally notification of the restoration of service. From experience it is
worthwhile, from an effectiveness and efficiency perspective, to target the
communications to those who are affected by the Major Incident. In advance, identify
who is to be contacted, and the method and frequency of the communication. The
Business Relationship Manager(s) should be of significant value in identifying and
agreeing the contacts. Email is often used, and setting up and maintaining distribution
lists often facilitates such communications. Consider setting up Communication
templates for:

o Major Incident notification

o Progress Updates

o Service Restoration

 Create an Escalation Plan - Establish the hierarchy of names, job titles and telephone
numbers (both landline and mobile), and the time period following the occurrence of
the Major Incident after which each individual will be contacted should the incident
not be resolved. The further up the hierarchy, the more influential the individual will be
expected to be within the organisation, with the capability to expedite resources and
the availability of key individuals. The scope of the Escalation Plan should include
internal and third party suppliers.

 Checklist(s) - Checklists save time, reduce stress and ensure all aspects of a Major
Incident are considered. Establish checklists for:

o Meeting Agendas

o Communications

o Escalations

o Staff Rotation (shifts)

o Staff Facilities.

 Command and Control Centre - Where possible identify a dedicated location,
including meeting room equipped with conference call facilities, whiteboards,
flipcharts and pens. Ensure out of hours facilities such as Security, parking, heating,
toilets, food and water are available and maintained. The meeting room may well be
used outside of Major Incidents but on the understanding that if a Major Incident
occurs then the room will be commandeered and existing occupants expected to
leave immediately.
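
To make the Escalation Plan from the list above more concrete, here is a minimal Python sketch of a time-based hierarchy. The names, job titles, phone numbers and time thresholds are placeholders we have invented for illustration; a real plan would also cover third party suppliers.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class EscalationStep:
    after: timedelta   # time since the Major Incident occurred
    name: str
    job_title: str
    phone: str


# Placeholder data only: every name, title, number and threshold is an assumption.
ESCALATION_PLAN = [
    EscalationStep(timedelta(minutes=0), "A. Example", "Major Incident Manager", "000-0000"),
    EscalationStep(timedelta(minutes=30), "B. Example", "Head of IT Operations", "000-0001"),
    EscalationStep(timedelta(hours=2), "C. Example", "IT Director", "000-0002"),
]


def contacts_due(elapsed: timedelta, plan=ESCALATION_PLAN) -> list[EscalationStep]:
    """Return everyone who should have been contacted by now
    if the incident is still unresolved."""
    return [step for step in plan if step.after <= elapsed]
```

Holding the plan as data rather than in someone's head supports the aim that the Major Incident Manager should not become a single point of failure.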

How Do We Establish A Culture Of Continual Improvement?

It’s necessary to maintain and continually improve the major incident procedure and plans,
and to encourage all stakeholders to keep up to date with their knowledge of Major Incident
Management.

Therefore consider:

 Policy and Process Improvement - As part of the "Post Major Incident Review" the
Major Incident process is reviewed. In addition the process should be periodically
reviewed with the stakeholders. Any improvements would be raised as a Request for
Change (RFC) and follow the Change process. All policies and processes should be
reviewed at least annually.

 Change Management – Consideration needs to be given to all changes to evaluate if
a change may impact on the existing Major Incident procedure or plans. This is
equally important for ITSCM.

 Current and Accurate Contact Data - As defined in the roles and responsibilities
supporting the Major Incident process everyone is required to provide any updates
such as contact names, job titles, telephone numbers, email addresses and methods
of communication to the Major Incident Process Owner. The Process Owner will
update the relevant checklists and communication documents and communicate out
accordingly.

 Lessons Learnt - As previously mentioned time is of the essence when dealing with a
Major Incident and for those new to the process receiving education and training in
advance can only be beneficial. Obviously understanding the process and procedures
is important, but also consider using recent Major Incidents as training scenarios,
including the lessons learnt from the post Major Incident Review.

 On-going education and awareness - All organisations to some extent experience a
turnover in staff. Therefore it would be beneficial to establish an education and
awareness plan incorporating scheduled sessions for both key internal staff and
those of third party suppliers involved in major incident management.
