
SAP Process Refresh Session v3.0

The document provides an overview of SAP HEC (HANA Enterprise Cloud) support processes, including roles such as the 24/7 Operations Manager and Cloud De-Escalation Architect, who manage incident resolution and communication during outages. It outlines the escalation criteria for incidents, the importance of maintaining communication with customers, and the procedures for reporting production down situations. Additionally, it details the customer lifecycle, service level agreements, and the technical aspects of incident management within the SAP HEC framework.

Uploaded by

koragi

SAP HEC Overview

Process Refresher Session

www.cloud4c.com
SAP: Systems, Applications, Products
Major Incident Management
SAP HEC Support - 24x7 Phone Contact
Process Flow : “Production Down”
Escalation Criteria Based on Ticket Priority
24/7 Operations Manager (Cloud MoD)
Role Description

The 24/7 Operations Manager (Cloud MoD) coordinates the incident resolution process during escalations. The ultimate goal is to have the customer’s cloud services up and running as soon as possible.
Main Tasks:
• Owns Major Incidents (escalation level 1) and drives technical incident resolution in case of escalation levels 2/3
• Evaluates and decides whether to trigger an Escalation Level 2
• Manages the Cloud Task Force: set up, coordinate and close
• Involves necessary SMEs from different CDS areas and escalates to Cloud management in case of bottlenecks
• Collaborates closely with the Cloud DEA to drive the overall Action Plan (provides the technical solution plan)
• Provides a list of affected customers and impacted services in case of Level 2 Escalations
• Triggers Problem Management to ensure that root causes are found and solutions are implemented
• Ensures smooth handover of the issue to the next Cloud MoD shift
• Ends hierarchical escalation

Availability: 7x24
Cloud DEA (Cloud De-Escalation Architect)
Role Description

The main task of the Cloud De-Escalation Architect (Cloud DEA) is to take ownership of the customer’s critical and complex incidents, set up an Action Plan and ensure appropriate communication to customers and internal stakeholders. The ultimate goal is to have the customer’s core business processes up and running again as soon as possible.
Main Tasks:
• Owns the critical incident/situation (escalation level II)
• Understands the business impact for the customer and ensures that this is reflected in the Action Plan
• Creates the overall Action Plan with input from the MCD MIM/MoD, or a similar role on the partner side
• Communicates the Action Plan, solution approach, status and resolution to the customer
• Communicates to and aligns with internal stakeholders (e.g. Management, Sales, Consulting, Board)
• Engages Legal/Corporate Communication if necessary
• Ensures smooth handover of the issue to the next 24x7 MCC Location (shift change)

Availability: 7x24
Trigger: Every priority 1 (“very high”) incident in BCP that passes the initial Production Down judging
How can SAP reach the Supplier Cloud Control Centre?
Technical View: Ticket flow
Single and Multiple Customer Outages: Overview (simplified)
Incident Escalation in Customer Outage

This guideline applies to individual customer outages as well as Multi Customer Outages (MCO).

Trigger Escalation Level II to SAP HEC
• Based on defined thresholds, the Premium Partner triggers an Escalation Level II in HEC to reach the HEC Cloud DEA. The Premium Partner ensures that "very high" incident tickets are always created in the Service Marketplace.
• The SAP Cloud DEA announces herself or himself in the ticket if the CESM is not available.

Escalation Communication
• SAP and its Premium Partner have Service Level Agreements on customer communication in incidents, which govern the SAP HEC model.
• The Premium Partner provides a technical resolution plan in SMP (SAP Service Marketplace).
• The Premium Partner keeps the SAP incident updated at least every 2 hours* (SAP Best Practice, see next slide).
• The SAP Cloud DEA is responsible for customer communication if the CESM is not available.

The SAP ticket is the “Single Source of Truth” and is used for legal and compliance support. If an email is used or a conference call is held, the email or call summary should be documented in the SAP ticketing system.
SAP Best Practice for Customer Outage Communication
Communication Sequence | Multiple Customer Outage (MCO)
ITIL Definition: Incident vs. Problem
MCC Production Support: Mission & Priorities
Definition of “Production Down”

Production Down = System Down AND/OR Business Service Unavailability

• System Down: Technical unavailability of
  – one mission-critical part of the business service stack, from disk up to application, or
  – the majority of redundant parts of the business service stack (e.g. application servers, VMs)

• Business Service Unavailability: For a <majority> or <important group> of users, the system is perceived as
  – (too) slow, with severe delay in business operation,
  – unstable, or
  – producing incorrect results
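
The definition above can be read as a simple decision rule. The following is a minimal sketch, assuming a hypothetical `OutageObservation` record; the field names are illustrative and do not come from any SAP tool, and "majority" is taken to mean strictly more than half.

```python
from dataclasses import dataclass


@dataclass
class OutageObservation:
    """Hypothetical summary of what is observed during an outage."""
    critical_part_unavailable: bool = False   # one mission-critical part of the stack is down
    redundant_parts_total: int = 0            # e.g. number of application servers or VMs
    redundant_parts_down: int = 0
    significant_users_affected: bool = False  # a majority or an important group of users
    perceived_slow: bool = False
    perceived_unstable: bool = False
    incorrect_results: bool = False


def is_system_down(obs: OutageObservation) -> bool:
    # Assumption: "majority" means strictly more than half of the redundant parts.
    majority_down = (
        obs.redundant_parts_total > 0
        and obs.redundant_parts_down * 2 > obs.redundant_parts_total
    )
    return obs.critical_part_unavailable or majority_down


def is_business_service_unavailable(obs: OutageObservation) -> bool:
    symptom = obs.perceived_slow or obs.perceived_unstable or obs.incorrect_results
    return obs.significant_users_affected and symptom


def is_production_down(obs: OutageObservation) -> bool:
    # Production Down = System Down AND/OR Business Service Unavailability
    return is_system_down(obs) or is_business_service_unavailable(obs)
```

For example, three of four application servers being down counts as System Down under this reading, while slowness reported by a single non-critical user does not count as Business Service Unavailability.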
Critical & Urgent

SAP Note 67739:
"...
• A productive system is completely down.
• The imminent go-live of a production system can't be completed.
• The customer's core business processes are seriously affected.
• A workaround is not available.
..."

SAP Note 19500:
"...
• We have changed the priority you suggested as from our perspective, the preconditions for this priority are not met in this case.
..."
Production Down Time Line
Process Flow: MCC Production Support
How to Trigger MCC Production Support
Every “Production Down” situation has to be reported ASAP in ONE Customer Incident with priority “Very High” via the Support Portal, to trigger MCC Production Support for “escalation level II” procedures.
Use the dedicated customer account and installation number. In case of a multiple customer outage, the Cloud MoD ensures that a BCP ticket is raised on a Supplier customer number (internal customer number) with priority "very high". This serves as the master ticket for the outage.
Content of Incident – add these details:
• Priority: Very High
• Your 24/7 contact details (24/7 OPSMNGR)
• Comprehensive list of affected CUSTOMERS, SERVICES and CUSTOMER CONTACTS (e.g. CESM, TLO, TQM, EA, ...)
• Short description of affected business functionality and timeline, especially the starting time of the business service unavailability
• Affected systems (e.g. Data Center, SID, DBPool, Tenants)
• Changes performed & Symptoms (e.g. any changes performed recently, service not available, any recurrence patterns, recurring performance issues)
• All external ticket numbers already raised in the context of this incident, countermeasures and customer communication already performed (e.g. iterative calls every 2h etc.)
• Technical Resolution Plan
• System access
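
The content list above works as a checklist, so a draft incident can be checked against it before submission. This is a minimal sketch; the dictionary keys are hypothetical names chosen for illustration and are not fields of the Support Portal.

```python
from typing import Dict, List

# Checklist items taken from the slide; the keys are illustrative names only.
REQUIRED_DETAILS: Dict[str, str] = {
    "priority": "Priority: Very High",
    "contact_24x7": "Your 24/7 contact details",
    "affected_customers": "Affected customers, services and customer contacts",
    "business_impact": "Affected business functionality and timeline",
    "affected_systems": "Affected systems",
    "changes_and_symptoms": "Changes performed and symptoms",
    "external_tickets": "External ticket numbers, countermeasures, communication so far",
    "resolution_plan": "Technical Resolution Plan",
    "system_access": "System access",
}


def missing_details(incident: Dict[str, str]) -> List[str]:
    """Return the checklist items that are absent or blank in a draft incident."""
    missing = [
        label
        for key, label in REQUIRED_DETAILS.items()
        if not str(incident.get(key, "")).strip()
    ]
    # A priority that is filled in but not "Very High" is flagged separately.
    priority = str(incident.get("priority", "")).strip().lower()
    if priority not in ("", "very high"):
        missing.insert(0, "Priority must be Very High")
    return missing
```

An empty result means every item on the checklist has some content; it does not judge whether that content is adequate.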
Decision Guidelines for MCC Production Support
RED LINE: Rule Set
Cloud Escalation Levels - Overview
Escalation Level Descriptions
SAP HEC Additional Information
HEC |Cloud Start Vs HEC Production
HEC | IaaS
HEC | Cloud4C Services Framework
HEC | Hostname Convention
HEC | <SID> Nomenclature

SID -> System Identifier

• Three characters long
• First character is a letter
• Next two characters can be alphanumeric
• For example:
• Customer : Bajaj Auto
• Customer Identification : <CID> : BAL
• S/4 HANA DB : Prod : <SID> = S4P
• S/4 HANA DB : Dev : <SID> = S4D
• S/4 HANA DB : QA : <SID> = S4Q
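
The three naming rules above map directly to a regular expression. This is a minimal sketch, assuming SIDs are written in uppercase as in the examples; the function name is illustrative. SAP also reserves certain SIDs, which this check does not cover.

```python
import re

# One letter followed by two letters or digits (uppercase is an assumption).
SID_PATTERN = re.compile(r"[A-Z][A-Z0-9]{2}")


def is_valid_sid(sid: str) -> bool:
    """Check a SID against the three rules on the slide."""
    return SID_PATTERN.fullmatch(sid) is not None
```

With this rule, `S4P`, `S4D` and `S4Q` are accepted, while `4SP` (starts with a digit) and `S4PX` (four characters) are rejected.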
SLAs
HEC | Application Availability

Production (PRD): 99.9%, 99.7%, 99.5%
Non-Production (DEV / QAS): 99.0%, 95.0%
IaaS Server Provisioning: 95.0% (VM Uptime)

Monthly Downtime Tolerance per SID
• 99.9% SLA = 43mins
• 99.5% SLA = 3hr 23mins
• 95% SLA = 1day 12hr
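
The downtime tolerance follows from the availability percentage and the length of the month. This is a minimal sketch, assuming a 30-day month; the function names are illustrative. Under that assumption the arithmetic reproduces the 43 minutes (99.9%) and 1 day 12 hours (95%) shown above, while 99.5% works out to about 3 hours 36 minutes, so the 3hr 23mins figure on the slide may rest on a different basis.

```python
def monthly_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Allowed downtime per month, in minutes, for a given availability SLA."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


def format_minutes(minutes: float) -> str:
    """Render a number of minutes as days, hours and minutes."""
    total = int(round(minutes))
    days, remainder = divmod(total, 24 * 60)
    hours, mins = divmod(remainder, 60)
    return f"{days}d {hours}h {mins}m"
```

The month length is a parameter because a contract may define the measurement period differently.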
HEC | Initial Response Time (IRT)

Very High: 20 mins, 24x7; Action Plan after 4 hours
High: 20 mins, 24x7 (PROD); 4 hours, local business hours* (non-prod)
Medium: 4 hours, local business hours* (non-prod)
Low: 1 business day*

* 8:00 to 18:00 on business days at the contractually defined customer main location

Backup Frequency And Retention

DATABASE BACKUP
• PROD: Daily Full, logs per SAP product support. 30 days retention time.
• Non-PROD: Weekly Full, logs per SAP product support. 30 days retention time.

FILESYSTEM BACKUP
• Non-PROD: Monthly Full, daily incremental. 2 months retention time.

SECONDARY BACKUP COPY
• All backups will be replicated to a secondary datacenter or location.
EMS - AMS - HEC Services
EWA | Analysis
Cloud4C | KeePass

Manage your passwords in a secure way
HEC | Key Points
Cloud4C HEC Process
HEC & NON HEC Engagement

SAP HEC (HANA Enterprise Cloud) is a managed, private cloud hosting service for SAP HANA and its related applications.

HEC Engagement (HEC and Non-HEC customers): Build and Run

Build Activities
• Project Management
• Solutions
• Project Deployment
• QC
• Handover
Stages Involved : Presales Phase, S2D Phase, Infra Build Phase, Application Build Phase, Quality Check Phase, D2R Phase

Run Activities
• Alert and Incident Handling
• Service Request and Change Handling
• Problem RCA and SIP Handling
• EWA, Security Management
• Monthly, Quarterly Reporting
HEC | Customer Life Cycle – With CSM
[Diagram: customer life cycle from Sales through Project Delivery to Operations, with the S2D and D2R gates. Recoverable elements:]
• Phases: Sales, S2D, Project Delivery, D2R, Operations
• Sales: HEC Sales & CAA
• Project Delivery: Customer Delivery Manager (CDM), Program Manager (PM), Build Manager (BM); Infra Build, App Build, QA / QC (Quality); systems DEV, QA, PRD; customer implementation project
• D2R: Systems handed over to customer. Monitoring and SLAs tracked. Business Go-Live and Change Control Board process initiated.
• Operations: Customer Success Manager (CSM); 24 X 7 Support Team, where SAM will be working as shift leads 24x7; Technical Landscape Owner (TLO), named resources working during customer working hours

Customer Lifecycle /Ops (SAP delivery org. setup/main roles towards the customer and details of On-boarding to Operations – Non CSM)
HEC | Customer Life Cycle – With Non CSM
[Diagram: customer life cycle from Sales through Project Delivery to Operations, with the S2D and D2R gates. Recoverable elements:]
• Phases: Sales, S2D, Project Delivery, D2R, Operations
• Sales: HEC Sales & CAA
• Project Delivery: Customer Delivery Manager (CDM), Program Manager (PM), Build Manager (BM); Infra Build, App Build, QA / QC (Quality); systems DEV, QA, PRD; customer implementation project
• D2R: Systems handed over to customer. Monitoring and SLAs tracked. Business Go-Live and Change Control Board process initiated.
• Operations: 24 X 7 Support Team, where SAM will be working as shift leads 24x7; Technical Landscape Owner (TLO), named resources working during customer working hours

Customer Lifecycle /Ops (SAP delivery org. setup/main roles towards the customer and details of On-boarding to Operations – Non CSM)
HEC Engagement - Build

[Diagram: build flow from PREP 4 BUILD through SYSTEM BUILD PHASE to OPERATIONS, owned by PM & CSM. Recoverable elements:]

Prerequisites (NEW CUSTOMER / NEW PHASES):
1) Signed infrastructure Purchase Order
2) OPF, SOW filled & available
3) PM for region informed

PM would:
1. Perform sanity check of PO, SOW & OPF
2. Liaise with build lead to create build sheet
3. Liaise with customer, validate SOW vs Build Sheet
4. Ensure that the infrastructure procurement, including backup infra, is in place
5. Plan for S2D / Customer Kick off call

Gates: S2D Gate (Customer Kick off Call), Committed Delivery Date, Q Gate (B2R, Build to Run)

System build phase: Infra Readiness (HW, NW), Infra QC, SAP System Build, Application Build, OS Configuration, Monitoring Configuration, Backup Configuration, MyShift Customer Configuration, Quality Check

Quality: QC Team to perform APP & DB QC. Ensure QC of Infra is in place. Upload duly filled QC in Centralized Portal (wiki.cloud4c.com). Run TLO to perform mini-checks.

Roles shown: Build Engineer, NW Manager, PM, CSM, Run TLO
Abbreviations

Abbreviation Definition
CDM Client Delivery Manager (aka Engagement Lead)

CAA Cloud Architecture & Advisor

HEC HANA Enterprise Cloud

AE Account Executives (aka Sales)

PSA Principal Solution Architect

PM Program Manager

TLO Technical Landscape Owner

CSM Customer Success Manager

PCoE Partner Center of Excellence (TPMs)

SAM Service Assurance Manager

QA / QC Quality Assurance / Quality Control

S2D Sales to Delivery

D2R Delivery to Run

TCV Total Contract Value


Communication Flow
[Diagram: outage communication and ticket flow. Recoverable elements:]
• Customers: CSM Led Customers (CSM + TLO/RL) and TLO / RL Led Customers
• SAP side: SAP CDM, SAP DEA, SAP COE; Outage Communication(s) and Resolve Ticket(s)
• COE Team: OS Team, Database Team, Network Team, Virtualization Team, Backup Team, Storage Team
• L1/L2 Team: TLO, SAM Team, Shared Basis Pool, 24/7 SCC/Support Team (Resolve Tickets Handling)
• Command Center: Zabbix and FRUN raise alerts; the Routing | Monitoring Team qualifies the incident and raises a Resolve ticket
Roles: Build, SCC, TLO, CSM
Joint SAP / Supplier delivery
Role definitions

Role Definition

Program Manager
o PM is responsible for all build aspects of the system for customers of the assigned region
o Liaise with build team to create build sheet.
o Liaise with CSM to perform proactive planning for the customer
o Ensure that the infrastructure is in place ‘on-time’ for the builds
o Ensure QA / QC is done.
o Ensure the duly filled QC Report is uploaded to the JAM site
o Ensure all configurations, viz. backup, EWA & Technical Monitoring via FRUN, are done before D2R.
o Plan so that the D2R is conducted no later than the CDD-2 date
o Assume the role of ‘Build TLO’ from the contract till S2D
o Help CSM to liaise with EL and Customer during build phase

Build Manager
o Understand PRC, SRO and create a technical understanding document
o Develop the deployment schedule per phase per SID, based on the CDD, as per the templates
o Engage in S2D calls and update the deployment plan with the SID technical design date
o Track individual milestones with respective tests and QC
o Automation and Test Report (Quality Checklist)
o Upload all the test data to the designated location, as defined
o Participate in D2R and undertake tasks as defined by CSM
o Generate the first EWA and address any action tracker deficiencies as defined by CSM

QA / QC
Build phase :
o Compliance check during build phase in line with PRC and OPF
o Report on the findings to the Build Lead; the final audit report, post corrective actions, is to be uploaded to the JAM portal.

Run Phase :
o Daily ticket audits; coach the team on improvement areas
o Support the Run Team in building and reviewing the RCA for quality and timely submission within the SLA.
o Ensure RCAs are created for every outage.
o Own the Change Management life cycle and circulate internal reports for the same.
o Publish KPI reports on a monthly basis.
o Maintain process and SOP document repository
o Shift handover audits.

SAP Command Centre (SCC) (24*7 Team)
o Monitoring Team: Responsible for acknowledgment of tickets, creation of tickets during outages and assignment of tickets to the respective stakeholders
o Alert Handling Team: Responsible for validating the alerts
o Outage Handling Team: Responsible for fixing issues that occur during outages
o Incident Handling Team: Responsible for fixing P2, P3 issues


Support Team 24x7
o Acknowledge the tickets received from customers on BCP (Resolve tool) and the MyShift tool and work towards resolution.
o Perform the planned activities raised via Resolve and MyShift CR or EMS tickets and execute them to closure.
o Provide 24x7 support for all SAP HEC customers and 16x7 support for Non-HEC customers.
o Provide a proper shift handover to the next shift.
o Escalate to TLO/CSM/SAM if the fix is delayed due to any challenges.
o Work on Solution Manager alerts.
o Monitor alerts from SolMan
o Qualify and work on alerts
o Raise incidents for expert intervention
o Support the Outage Management team
o Outage Management (creating tickets / triggering bridge calls / escalations)
o Liaise with the individual CoE to ensure alerts are valid
o Pull in the DEA in case of an outage

SAM
o Take complete ownership of the entire shift.
o Queue Management
o Incident, Service Request and Alert lifecycle management: tracing and tracking the tickets at every stage.
o Ensure ticket quality hygiene is maintained.
o Ensure the KPIs of the shift are adhered to, based on the SLAs defined.
o In case of Major Incidents, adhere to the process and execute timely actions.
o Escalate in a timely manner, following the escalation matrix
o Ensure all incoming calls, mails and requests are addressed as per defined TATs
o Ensure minimal backlog tickets.
o Scheduled maintenance activities & internal tasks
o Support TLO in day-to-day activities, as defined
o Regular health checks for RED & YELLOW SIDs as defined
o Monthly report creation and EWA implementation of corrective plans
o Ensure shift handover is properly done and shift handover documents are populated.
o Coaching, training and mentoring the entire support team.

TLO
o The TLO is the technically responsible owner of the entire customer landscape from the D2R Q Gate until end of contract, including handling escalations / outages by providing technical solutions.
o Validate & take over the landscape as part of D2R, e.g. SolMan Integration, PRC, QA report
o Conduct PPM (Proactive Preventive Maintenance) of the landscapes.
o Work with the CSM to provide in-depth technical root causes and preventive actions as part of the RCA and submit to CSM.
o Owns the IT calendar for the customer landscape and formally approves any landscape changes
o Trigger ‘Planned Maintenance Activities’ with SAM and provide details for L1/L2 to perform the tasks.
o Analyze EWA Reports and create plans.
o Create ‘Go-to-Green’ plans for RED Customers and ask CSM for support/approval from EL
o Perform QC of the planned activities done by L1/L2 support before informing CSM about completion. CSM would in turn perform a basic check before informing EL/customer SPoC
o Support CSM in customer and EL engagement and interactions (offsite / onsite) & handling escalations
o Create RED Customer get-well plans and ad hoc measures
o Routine & ad hoc reports (daily, weekly, monthly) of all assigned customers as per CSM directions
o EWA Reports, support in RCA, Resolve ticket updating as per medians
o Regular health check of systems (App & DB)
o TLO consumes the report from Command Center (GREEN once a week, YELLOW twice a week & RED customers every day) and sends it to the EL & CSM as part of the SoD (start of day) process
o VERY CLOSE interaction with CSM on a daily basis on the customer landscape. Leave plans to be informed to CSM for better planning
o Attend calls with CSM & EL as and when required to provide technical update/support
o Attend to tickets (RESOLVE/MYSHIFT) adhering to SLAs. Support L1/L2 for technical resolutions
o Spillover of day activities to be informed to the SAM and CSM (in Cc)
Comment: TLOs do not interact with the EL / Customers directly, and all communication should go through the CSM. In the absence of the CSM, the TLO & SAM will assume the role of CSM.
CSM
o CSM owns the entire Account lifecycle from start till end of contract
o CSM is identified upfront at the signing of the contract and stays through the project till the end of the contract.
o CSM is responsible to liaise with EL and Customer and plan for the further system build phases of customer projects
o During build, owns all Q gates (S2D, CDD, D2R), while individual roles share the responsibilities
o Liaise with Customer and EL (engagement and interactions, offsite / onsite) & handle customer escalations by communicating at regular intervals and ensuring customer confidence is not lost. Daily contact with PM & TLO. Absence to be updated to TLO/PM.
o Ensure ‘Go-to-Green’ plans are created for RED Customers
o Ensure that RED customers turn GREEN within a short time (~ 4-5 days)
o Generate and publish routine (daily, weekly, monthly) Availability and Performance Reports & ad hoc reports of all assigned Customers.
o Quarterly update of Myshift and CMDB with Customer and SID details.
o Ensure that the customer systems are monitored. Conduct surprise audits of alerting/monitoring
o Work closely with TLOs and Engineering Team to build and publish RCAs within SLA to the Customer.
o In the absence of the TLO (for any reason), the CSM will liaise with other SMEs to take ownership of the technical and non-technical issue(s)
Comment: In the absence of the CSM, the SAM & TLOs play the role of CSM.

CoE Build phase :


o Infra Build
o Backup Configuration as per contract
o Network Build
o POD Build

Run Phase :
o Incident Management
o Problem management
o Alert Management
o Change Management
o PPM activities
Cloud4C
ITIL Processes

Service Operation Processes
[Diagram: Service Operation processes. Recoverable elements:]
• Service Desk Function
• Service Request Fulfillment: Logging & Validation, Categorization & Prioritization, Authorization, Fulfillment, Closure, Review; Request Model
• Incident Management: Identification & Logging, Categorization, Prioritization, Diagnosis & Escalation (Hierarchical / Functional), Resolution & Recovery, Closure, Review
• Problem Management: Detection & Logging, Categorization, Prioritization, Analysis & Diagnosis / RCA, Resolution of Known Error, Permanent Fix, Closure, Review; KEDB
• Change Management: Change Logging, Risk & Impact Assessment, POA & Customer Approval, Change Execution, Change Closure
• Event Management: Alerts & Events Detection, Correlation, Response Selection, Review Action, Close Event
• Access Management: Request Logging, Request Verification & Validation, Provide Access Rights, Request / Remove Access Rights, Log & Track Access
• Records: Known Errors, Problem Records, Incident Records, Service Requests, Roles & Groups
• Also shown: Service Improvement Proposals, Service Policy
• Customer Satisfaction Survey; Service Reviews & Improvement Planning; Management Information Review & Trend Reporting
Steps involved in handling SR with downtime

Service Requests With Downtime
01 Zero Minutes: Receives a SR with downtime from customer through Resolve.
02 05 Minutes: SCC acknowledges the ticket, provides an acknowledgement to the customer and assigns it to the appropriate department.
03 15 Minutes: Respective stakeholder creates the internal CR ticket, followed by the PoA.
04 CAB Approval: CR is validated in the CAB and approved.
05 SR Fulfillment: Activity is executed.
06 Closure: Once the activity is successfully executed, close the internal CR and update the Resolve SR ticket.
Steps Involved in Handling Major Incident or Outage

Handling Major Incident or Outage
01 Zero Minutes: SAP NOC team communicates to the COE/TLO team to check the alerts (via email) and validate whether they are genuine.
02 05 Minutes: Confirmation received from the COE/TLO team on qualifying of alert and business impact.
03 20 Minutes: SAM releases the outage notification and a bridge is initiated with the respective COE/TLO teams. Also, a Very High Priority Resolve ticket is created by the SAP NOC team.
04 25 Minutes: SAM keeps coordinating with the respective COE teams and TLOs and monitors alerts, in order to keep track of issues and get them fixed on a timely basis.
05 45 Minutes: Further periodic updates about customer impact and issue resolution are released by SAM.
06 Closure: Closure notification to be released based on outage resolution time, once all customers' issues are fixed.
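
The timeline above can be turned into checkpoints relative to the alert time. This is a minimal sketch, assuming the minute offsets are measured from the moment the alert is raised; the step labels are shortened from the slide and the function names are illustrative.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple

# (minutes after the alert, shortened step description from the slide)
MAJOR_INCIDENT_STEPS: List[Tuple[int, str]] = [
    (0, "SAP NOC asks COE/TLO to validate the alert"),
    (5, "COE/TLO confirm alert qualification and business impact"),
    (20, "SAM releases outage notification, bridge opened, Very High ticket created"),
    (25, "SAM coordinates COE teams and TLOs"),
    (45, "SAM releases further periodic updates"),
]


def checkpoint_times(alert_time: datetime) -> List[Tuple[datetime, str]]:
    """Absolute due time for each step of the major incident timeline."""
    return [
        (alert_time + timedelta(minutes=offset), label)
        for offset, label in MAJOR_INCIDENT_STEPS
    ]


def overdue_steps(alert_time: datetime, now: datetime, completed: Set[int]) -> List[str]:
    """Steps whose due time has passed and whose index is not yet marked completed."""
    return [
        label
        for index, (due, label) in enumerate(checkpoint_times(alert_time))
        if due <= now and index not in completed
    ]
```

A shift lead could call `overdue_steps` with the indexes of the steps already done to see what is outstanding at any point during the outage.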
Steps Involved in Handling Major Incident – Reported by Customer

Handling Major Incident – Reported by Customer
01 Zero Minutes: Receives a Very High Incident from customer through Resolve.
02 05 Minutes: SCC acknowledges the ticket, provides an acknowledgement to the customer and assigns it to the appropriate department.
03 15 Minutes: SAM releases the outage notification and a bridge is initiated with the respective COE/TLO teams.
04 25 Minutes: SAM keeps coordinating with the respective COE teams and TLOs and sends a primary update to the customer.
05 45 Minutes: Further periodic updates about customer impact and issue resolution are released by COE over the ticket.
06 Closure: Closure notification is released to the customer and the ticket is kept in resolved status. Within 24 hours, the incident report is shared with the customer through the same ticket.
Steps Involved in Handling Regular Tickets

Handling Regular Tickets
01 Zero Minutes: Receives a P2 & P3 Incident & Service Request from customer through Resolve, Email & Call.
02 15 Minutes: SCC acknowledges the ticket, provides an acknowledgement to the customer and assigns it to the TLO whenever TLO intervention is required; creates an internal Myshift ticket and assigns it to COE whenever COE intervention is required.
03 SLA: TLO works on the tickets and updates the Resolve tickets. COE teams work on the Myshift tickets and send the ticket to the TLO/SCC.
04 SLA: Update 2 is shared with the customer on current status and resolution progress.
05 Further Updates: Further periodic updates are shared with the customer based on the defined SLA until the ticket is closed.
06 Closure: Closure notification is released to the customer with resolution steps, and the ticket is closed with customer concurrence.
Steps Involved in Problem Management (RCA) - Stages

Problem Management (RCA) - Stages
01 Day 0: Problem ticket is generated for all Very High Incident tickets & Customer/EL requested tickets.
02 Day 1: Respective COE manager/TLO reverts with the initial RCA to the Quality team. Quality team validates it and sends it for Management Approval.
03 Day 2: Management approves the RCA.
04 Day 3: Initial RCA is shared with the Customer and uploaded to the JAM portal. Further periodic follow-ups on the final RCA.
05 Day 7 & 8: Final RCA to be provided to Quality for approval.
06 Day 10: Final RCA approval from Customer; upload the same to wiki.ctrls.in
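
The RCA stages above can be projected onto calendar dates for a given incident. This is a minimal sketch, assuming the day offsets are calendar days counted from the day the problem ticket is generated (the slide does not say whether business days are meant); the labels are shortened and the function name is illustrative.

```python
from datetime import date, timedelta
from typing import Dict

# (day offset, shortened stage description from the slide)
RCA_STAGES = [
    (0, "Problem ticket generated"),
    (1, "Initial RCA to Quality team"),
    (2, "Management approval"),
    (3, "Initial RCA shared with customer"),
    (7, "Final RCA to Quality (Day 7 & 8)"),
    (10, "Final RCA approved by customer"),
]


def rca_schedule(ticket_day: date) -> Dict[str, date]:
    """Target date for each RCA stage, counted in calendar days (an assumption)."""
    return {label: ticket_day + timedelta(days=offset) for offset, label in RCA_STAGES}
```

For a problem ticket generated on 1 March, this places the initial RCA to the customer on 4 March and the final customer approval on 11 March.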
Steps Involved in Change Management – Normal Change - Stages

Change Management – Normal Change
01 Zero Minutes: Identify the change to be executed through P1 Incidents.
02 Day 1: Change POA to be submitted to TLO & CSM by the respective COE.
03 Day 2: TLO & CSM raise a change request after getting approval from the customer.
04 Day 3: CAB meeting is conducted and approval is given:
1. Technical Approval
2. Business Approval
3. Process Approval
05 Day 7 & 8: Change Execution
06 Closure Time: Change Closure in concurrence with the customer
Steps Involved in Change Management – Emergency Change - Stages

Change Management – Emergency Change
01 Zero Minutes: Identify the change to be executed through PPM, Org change, Incident, Problem permanent fix.
02 30 Minutes: Change POA to be submitted to TLO & CSM by the respective COE.
03 45 Minutes: TLO & CSM raise a change request after getting approval from the customer.
04 60 Minutes: Confirmation received from the COE/TLO team on qualifying of alert and business impact. CAB meeting is conducted and approval is taken:
1. Technical Approval
2. Business Approval
3. Process Approval
05 90 Minutes: Change Execution
06 Closure Time: Change Closure with customer concurrence.
Steps Involved in Handling SIP Action Items

Handling SIP Action Items
01 SIP action items are derived from the RCAs and from proactive inputs from the Management team.
02 SIP tickets are created and assigned to the respective BU heads.
03 BU head decides the ETA.
04 Further periodic follow-ups: Quality team follows up with the BU head for SIP action item updates.
05 BU head, with TLOs and relevant stakeholders, executes the SIP action items.
06 SIP action items are closed with all the artefacts. Quality team to track all the incidents before and after execution of SIP.
MyShift

https://myshift.ctrls.in/

Cloud4C in-house ITSM tool which supports Event Management, Incident Management,
Service Request Management, Problem Management & Change Management
MyShift | Status

Open: The engineer must keep the ticket in Open status until the customer's requirement is served.
Activity: If the engineer has planned a scheduled activity, the ticket is kept in Activity status with follow-up.
Pending on Customer: If the engineer requires any information from the customer, the ticket is kept in POC status.
Resolved: Once the engineer has resolved the issue and the customer is satisfied with the resolution provided, the status is changed to Resolved.
Closed: Once the engineer receives the email / phone confirmation, the ticket is closed.
SLAs
Urgency is determined by how much the user is restricted from performing their work.
Impact is determined by how many personnel or functions are affected.

Priority by Business/Service Severity (Urgency, rows) and Issue Type (Business/Service Impact, columns):
Severity | Primary Issue | Secondary Issue | Ternary Issue
Critical | P1 | P2 | P2
High | P2 | P2 | P3
Medium | P2 | P3 | P3
Low | P4 | P4 | P4

P1: Critical Component down. 50% or above of the customer's environment is not accessible at application level.
P2: Critical Component degraded. 25% or more, but less than 50%, of the customer's environment is not accessible at application level.
P3: Non-critical component. Single user or degradation in performance.
P4: Other request, question, Service Request.
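
The priority matrix above is a two-key lookup. This is a minimal sketch, assuming the rows are the severity (urgency) levels and the columns are the issue types, which is one reading of the flattened slide; the function names are illustrative. The second function treats any share below 25% as P3, which is also an assumption.

```python
# Rows: Business/Service Severity (urgency). Columns: issue type (impact).
PRIORITY_MATRIX = {
    "Critical": {"Primary": "P1", "Secondary": "P2", "Ternary": "P2"},
    "High": {"Primary": "P2", "Secondary": "P2", "Ternary": "P3"},
    "Medium": {"Primary": "P2", "Secondary": "P3", "Ternary": "P3"},
    "Low": {"Primary": "P4", "Secondary": "P4", "Ternary": "P4"},
}


def ticket_priority(severity: str, issue_type: str) -> str:
    """Look up the ticket priority for a severity level and an issue type."""
    return PRIORITY_MATRIX[severity][issue_type]


def priority_from_inaccessible_share(share: float) -> str:
    """Map the inaccessible share of the environment (0.0 to 1.0) to P1, P2 or P3."""
    if share >= 0.5:
        return "P1"
    if share >= 0.25:
        return "P2"
    return "P3"  # assumption: anything below 25%, e.g. a single user
```

Unknown severity or issue type names raise a `KeyError`, which surfaces typos instead of silently defaulting to a priority.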
Myshift- Points to remember while working on Myshift Tickets

Ensure ZERO backlog Myshift tickets always. Use the link below to monitor the backlog tickets:
https://reports.ctrls.in/SAP_cloud4cdashboard/SAP_Cloud4CDashboard.php

• Add comments on the tickets before transferring
• Transfer the ticket to the correct team directly to avoid multiple hops
• Adhere to SLAs
• Escalate when required if you receive a large number of breached tickets, to ensure the customer's query is addressed in a timely manner

All Multi Level Approval tickets are to be closed in a timely manner. Use the link below to monitor the Multi Level Approval tickets:
https://reports.ctrls.in/Ageing_Report_Sap/ageing_approval_dashboard.php
HEC | Ticketing Flow

[Diagram: customers raise tickets in SAP BCP / Resolve; tickets are synced to MyShift (AUTO or MANUAL).]

HEC RELATED TICKETS
• Incident Tickets
• Service Request Tickets
• Change Tickets
• EMS Tickets


Resolve | Ticket Status
New: Incident has arrived in SAP Resolve for the first time and has not been modified yet.

In Process: The responsibility for processing the incident is with the partner. The customer expects an answer to the problem. An incident is in this status once a new incident has been modified by the partner in SAP Resolve.

Sent To Customer: Incident is at customer side. The customer is responsible to take the next action.

Sent To SAP: Incident has been sent to SAP. No updates are allowed.

Confirmed: Incident has been closed by the customer. It cannot be edited anymore. No updates are allowed.

Confirmed Automatically: Incident has been closed automatically because the maximum time at customer was reached. It cannot be edited anymore. No updates are allowed.
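
The status descriptions above can be captured in a small enumeration. This is a minimal sketch, not an interface of SAP Resolve; the class and function names are illustrative. The slide only states where updates are not allowed, so treating the remaining statuses as editable is an assumption.

```python
from enum import Enum


class ResolveStatus(Enum):
    NEW = "New"
    IN_PROCESS = "In Process"
    SENT_TO_CUSTOMER = "Sent To Customer"
    SENT_TO_SAP = "Sent To SAP"
    CONFIRMED = "Confirmed"
    CONFIRMED_AUTOMATICALLY = "Confirmed Automatically"


# Statuses for which the slide states "No updates are allowed".
_LOCKED = {
    ResolveStatus.SENT_TO_SAP,
    ResolveStatus.CONFIRMED,
    ResolveStatus.CONFIRMED_AUTOMATICALLY,
}

# Statuses in which the incident has been closed.
_CLOSED = {ResolveStatus.CONFIRMED, ResolveStatus.CONFIRMED_AUTOMATICALLY}


def updates_allowed(status: ResolveStatus) -> bool:
    # Assumption: every status not explicitly locked on the slide can be updated.
    return status not in _LOCKED


def is_closed(status: ResolveStatus) -> bool:
    return status in _CLOSED
```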
Resolve | Priority

Very High: Problem is business critical and has serious consequences for business operations
High: Business operations are seriously threatened and urgent tasks cannot be executed
Medium: Business operations are seriously threatened and urgent tasks cannot be executed
Low: The problem has little influence on business operations and does not hinder daily operations
Resolve - Components
Resolve is the SAP in-house ITSM tool which supports Incident Management, Service Request Management & Change Management.

Resolve Component
• Incident: XX-HST-CLS-INC
• Service Request: XX-HST-CLS-SRV
• Change: XX-HST-CLS-CHG
• EMS: XX-HST-CLS-EMS

Resolve URL: https://accounts.sap.com/saml2/idp/sso/accounts.sap.com


Alert, RCA, CR SLA’s

Alert
• For every qualified alert, a Resolve ticket is to be created within 20 mins of the alert generated time.

INTERNAL RCA
• Initial RCA SLA : 1 Day
• Final RCA SLA : 5 Days
• https://wikimedia.cloud4c.com/wiki/processdocuments/SAP-Run-Problem-Management-Process-SOP15902161120cT1H.pdf

EXTERNAL RCA
• Initial RCA SLA : 03 Days
• Final RCA SLA : 10 Days

Change Request
• All tasks and the respective CR are to be closed within 12 hours of the activity execution time
• For every Service Request with downtime, a CR is to be raised internally
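The alert and RCA timelines can be turned into deadline calculations. The sketch below is illustrative only. It assumes calendar days rather than business days, and it assumes the RCA clock starts at a timestamp the caller supplies, since the slide states neither. It also reads the slide as 1 and 5 days for the internal RCA and 3 and 10 days for the external RCA. All names are hypothetical.

```python
from datetime import datetime, timedelta

ALERT_TICKET_WINDOW = timedelta(minutes=20)  # Resolve ticket after a qualified alert
CR_CLOSURE_WINDOW = timedelta(hours=12)      # tasks and CR after activity execution

# (initial RCA, final RCA) in days; calendar days are an assumption.
RCA_SLA_DAYS = {"internal": (1, 5), "external": (3, 10)}


def alert_ticket_deadline(alert_time: datetime) -> datetime:
    """Latest time by which the Resolve ticket should exist."""
    return alert_time + ALERT_TICKET_WINDOW


def rca_deadlines(clock_start: datetime, kind: str) -> tuple[datetime, datetime]:
    """Return (initial RCA due, final RCA due) for 'internal' or 'external'."""
    initial_days, final_days = RCA_SLA_DAYS[kind]
    return (clock_start + timedelta(days=initial_days),
            clock_start + timedelta(days=final_days))


def cr_closure_deadline(execution_time: datetime) -> datetime:
    """Latest time by which the tasks and the CR should be closed."""
    return execution_time + CR_CLOSURE_WINDOW
```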
WIKI – Landscape Documentation

01 All the SID, IP address should be clearly mentioned

02 Ensure Quality, production, development systems are clearly mentioned with the details

03 ERP systems information to be clearly mentioned

04 All the customer landscape details to be presented on the documents

05 Landscape overview to be clearly explained
HEC Defined SLA’s
IRT and MPT are given in hh:mm. MPT is the maximum for the entire cycle.

SLA | Priority | IRT | MPT | Customer update frequency*
Incident | Very High | 00:15 | 04:00 | Once per every hour
Incident | High | 02:00 | 24:00 | Once per every 2 hours
Incident | Medium | 04:00 | 72:00 | Once per every 24 hours
Incident | Low | 08:00 | 144:00 | Once per every 48 hours
Service requests | Very High | NA | NA | NA
Service requests | High | 02:00 | 24:00 | Once per every 4 hours
Service requests | Medium | 04:00 | 72:00 | Once per every 24 hours
Service requests | Low | 08:00 | 144:00 | Once per every 48 hours
Change requests and Service requests which involve downtime | All | 02:00 | As per the given downtime/execution plan | See the list below
EMS | All | See the note below

Updates for Change requests and Service requests which involve downtime:
• Once to confirm/reject the activity execution (within 2 hrs. of ticket creation)
• Once to notify start (when the downtime/activity starts)
• Once to notify end (when the downtime/activity ends)
• Activity status update once per every 2 hours

EMS: For every EMS, the initial update should be done within 6 hours of EMS received time, and total billable hours are to be updated in the ticket within 24 hours of EMS ticket received time.

* Meaningful customer updates, which include current status, actions taken and next steps.
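For reference, the Incident and Service Request rows of the SLA table can be held in a lookup structure. This is a minimal sketch with hypothetical names. Only values that appear in the table are encoded, and the Service Request / Very High row is returned as `None` because the table marks it NA.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Sla(NamedTuple):
    irt: timedelta           # initial response time
    mpt: timedelta           # maximum processing time for the entire cycle
    update_every: timedelta  # customer update frequency


def _h(hours: float) -> timedelta:
    return timedelta(hours=hours)


SLA_TABLE = {
    ("incident", "Very High"): Sla(timedelta(minutes=15), _h(4), _h(1)),
    ("incident", "High"): Sla(_h(2), _h(24), _h(2)),
    ("incident", "Medium"): Sla(_h(4), _h(72), _h(24)),
    ("incident", "Low"): Sla(_h(8), _h(144), _h(48)),
    ("service_request", "High"): Sla(_h(2), _h(24), _h(4)),
    ("service_request", "Medium"): Sla(_h(4), _h(72), _h(24)),
    ("service_request", "Low"): Sla(_h(8), _h(144), _h(48)),
}


def lookup_sla(ticket_type: str, priority: str) -> Optional[Sla]:
    """Return the SLA row, or None where the table says NA or has no row."""
    return SLA_TABLE.get((ticket_type, priority))


if __name__ == "__main__":
    print(lookup_sla("incident", "Very High"))
```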
EWA Process

RED EWA

01 Do the EWA analysis and identify the alert which caused the EWA to be Red.
02 Identify the accountable person to clear the alert.
03 Upload the EWA Analysis in the Jam Portal before the 7th of every month.
04 Schedule a meeting with the customer to define the next action items.
05 Get the approval from the customer.
06 Plan of Action & Resolve ticket to be created.
07 CR to be raised.
08 Plan of Action to be executed.
09 Validation to be done by TLO after executing the change.
EWA Process

GREY EWA
01 TLO to approach the FRUN team for the root cause of the Grey EWA.

02 Once the root cause is identified, the FRUN team informs the TLO to update the SAP collector services.

03 TLO should approach the customer to seek approval for updating the services and, in the case of Java, restarting the system.

04 Once the customer approves the change, the TLO performs the necessary updates.

05 Regenerate EWA.
Resolve

Downtime Activity:

For every downtime activity there will be a date and time provided by the customer. Before we start an activity we need to update the Resolve ticket, i.e. ‘We are starting the activity’. Once the activity is completed we need to update the customer about the completion and provide the documents and proofs (if required), i.e. ‘Activity is completed’. If the activity is taking longer than expected, an update about the status of the activity needs to be provided every 2 hours.

Example: If the downtime activity is scheduled on 31st March 2020 from 2:00 PM to 6:00 PM, an update needs to be provided to the customer before the activity starts. If the activity is taking longer and will not be completed by 6:00 PM, an update should be provided on the Resolve ticket before 6:00 PM stating that we are working on it and how much longer it is expected to take.
Resolve

Check for Information:

We have to check the information provided by the customer to determine whether it is an Incident, Service Request or an EMS ticket. If the ticket lands in a different queue, we need to change the component to the right one and provide the reason for the change.

Maximum Processing Time (MPT):

MPT is calculated from the time the ticket is forwarded to the partner. The MPT changes based on the priority of the ticket.
Example: For an Incident with Very High priority, if the ticket is forwarded to the partner on 31st March 2020 at 7:00 AM, the MPT will be met if the ticket is updated with the resolution by 11:00 AM on 31st March 2020, with an update every hour.
In this case, an update needs to be provided in each of the windows 7:00 AM to 8:00 AM, 8:00 AM to 9:00 AM, 9:00 AM to 10:00 AM and 10:00 AM to 11:00 AM. Only then are both the MPT and the Update SLA met.
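The worked example can be reproduced with a short calculation. This is a sketch under the assumption that update windows are consecutive, start at the time the ticket is forwarded to the partner, and end at the MPT deadline. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def update_windows(forwarded_at: datetime, mpt: timedelta,
                   update_every: timedelta) -> list[tuple[datetime, datetime]]:
    """Split the MPT period into consecutive update windows.

    Each window needs at least one customer update. The end of the last
    window is the MPT deadline.
    """
    if update_every <= timedelta(0):
        raise ValueError("update_every must be positive")
    deadline = forwarded_at + mpt
    windows = []
    start = forwarded_at
    while start < deadline:
        end = min(start + update_every, deadline)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    # Very High incident forwarded on 31st March 2020 at 7:00 AM.
    forwarded = datetime(2020, 3, 31, 7, 0)
    for start, end in update_windows(forwarded, timedelta(hours=4), timedelta(hours=1)):
        print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
```

For the Very High example this yields four one-hour windows ending at 11:00 AM, matching the windows listed above.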
Resolve
Transferring the Ticket:

When transferring a ticket, the current owner needs to provide an update and the reason why it is being transferred. A ticket should be transferred to a person available in shift to make sure the SLA is not breached. In cases where no update is available, the owner of the landscape needs to be informed, and the transfer needs to take place with their approval. If the owner has provided an update, it needs to be added to the ticket.

Changing the Component to Others (OTH):

We have an option to change the component to OTH for Service Request (SRV) and Extended Managed Services (EMS). This can be used when we have a scheduled activity or a recurring activity to be performed. In these cases, we need to clearly mention the downtime provided/approved by the customer and the reason for changing the component, i.e. the ticket needs to be kept open multiple times to perform the activity. However, for a scheduled activity the update SLA still needs to be followed.
Resolve
Returning the Incident to Customer:

An incident is returned to customer as below:

• When the issue/query is resolved and we have provided a confirmation/resolution for the
issue/query.
• When we are requesting additional information from the customer for resolving the issue/query.
• When we are not clear with the information provided.

If the ticket is sent back to the customer in any case other than the above, it will be considered a breach in process.
HEC : Roles and Responsibilities

HEC Standard Services: All tasks/services that are included as part of the standard HEC Services, covered by the HEC Service Fee and performed by the HEC delivery organization, as applicable to customer.

HEC Optional Services: these tasks/services are not covered in the standard HEC Services, and are not and cannot be covered by the HEC Enhanced Managed Services ("EMS"). These tasks/services
• may be elected by customer,
• are subject to additional service fees,
• must be specifically contracted for and itemized in the customer’s contract (original HEC contract or via a change request), and
• can only be performed by the HEC delivery organization

HEC Enhanced Managed Services ("EMS") that can be performed by customer: HEC EMS services include tasks/services that a customer can perform, but the customer may elect to have SAP deliver.

HEC Enhanced Managed Services ("EMS") that can only be performed by SAP: HEC EMS services include tasks/services that are not required for the HEC Computing Environment, but that the customer may elect to have performed. These tasks/services can only be performed by SAP.

HEC Excluded Tasks: HEC Excluded Tasks are those tasks/services that can only be performed by the customer and are excluded from HEC Standard Services, HEC Optional Services and HEC EMS Services.

SLA: Every EMS ticket should be updated with clear information related to the progress and total billable hours of the ticket within the first 24 hours from the ticket acknowledged time.
Problem Management – Process
Triggers: EL Request for RCA in High Incident Ticket; Very High Incident

1. SAM Team will raise the problem ticket and assign it to respective CoE for RCA
2. Incident / Recurring Incident details will be captured from wiki.cloud4c.com
3. RCA will be prepared by respective CoE/TLO
4. RCA will be shared to Quality for approval
5. RCA will be verified with evidences by QC
5.1 Steering committee will vet the RCA
6. RCA will be shared to PM/TLO/CSM
7. RCA will be shared to Customer by TLO/CSM
8. RCA reaches the Customer
9. Permanent fixes will be applied in concurrence with customer
10. Problem ticket will be closed after permanent fix
11. From the RCA, capture Lessons Learnt and derive the SIP action items
12. SIP Actions will be raised on respective Owners
13. SIP Actions tracked till they are closed
14. Measure the recurrence of incidents
15. Knowledge base and KEDB will be updated after SIP Closure
Change Management- Process

Change sources: Incident Resolution, Apply Permanent Fix, PPM Output, Org Transform Changes

• PM will capture all changes needed; POA prepared
• Risk & Impact Analysis will be done through COE
• Get approval from customer
• Approval route by priority and risk:
  Low Priority & Low Risk: through Std. Pre approved Change
  High Priority & Low Risk: through CAB
  Low Priority & High Risk: through CAB
  High Priority & High Risk: through Emergency CAB
• Success: close the Change
• Fail: PIR to be done; identify Root Cause and work on improvement areas through SIP
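The priority and risk routing on this slide amounts to a four-entry decision table. The sketch below is one way to encode it. The function name and the lower-case "low"/"high" inputs are assumptions for illustration.

```python
APPROVAL_ROUTES = {
    ("low", "low"): "Standard pre-approved change",
    ("high", "low"): "CAB",
    ("low", "high"): "CAB",
    ("high", "high"): "Emergency CAB",
}


def approval_route(priority: str, risk: str) -> str:
    """Return the approval path for a change, given its priority and risk."""
    key = (priority.strip().lower(), risk.strip().lower())
    if key not in APPROVAL_ROUTES:
        raise ValueError(f"priority and risk must each be 'low' or 'high', got {key}")
    return APPROVAL_ROUTES[key]


if __name__ == "__main__":
    print(approval_route("High", "High"))
```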
SOP Associated with Ticket

[Flow diagram: My Shift Ticket]
• Alert triggers from Alerting System (Zabbix Alert, Solman Alert)
• System links Alert with SOP (Wiki)
• Ticket raised in Myshift with SOP associated with Alert

140 Solman Alerts
114 Zabbix Alerts

Note : Whenever new templates are added to the monitoring system, SOPs are created and associated with the alerts at the backend.
4 Eye Approval Approach
Updates will be shared to customer over Resolve Ticket.

Incident Request (IR), Service Request (SR), EMS
• Basis Support resolves the issue and sends the resolution for peer review
• Approver 1: TLO provides approval
• Approver 2: Regional TLO / CSM provides approval

Change Request (CR), Task Ticket (TT)
• Basis Support / CoE Engineer
• TLO resolves the issue and sends the resolution for peer review
• Regional TLO / TLO approves the resolution

Alert Ticket (AR)
• CoE resolves the issue and sends the resolution for peer review
• Approver 1: TLO approves the resolution
• Approver 2: CoE Manager approves the resolution
DC - Maintenance Activity – Process

Triggers: Notification from Remote Location; Notification from Infrastructure Team

01. SAM Team will receive the ticket in MyShift when an email is sent to [email protected]
02. For any Critical Activity, SAM Team will collate the information from the Program Manager associated to that DC
03. Plan Of Action will be shared to TLO
04. TLO will add risk and impact and raise Change Ticket
05. Change will be discussed in CAB
06. Approved Changes will be discussed with EL
07. Post Confirmation from EL, Formal notification will be shared
08. As per POA, SAM will initiate a bridge call
09. After Maintenance activity closure, notification will be shared
10. In case of failure, RCA process will be followed
WIKI – Knowledge Management System

BUILD PHASE
• Project Handover: Build Team will upload Handover and Landscape in Wiki and provide Technical information to Run

RUN PHASE
• TLO, SAP Support, CSM will be using these documents in day to day Operations
• Quality team will add RCAs and Process documents to Wiki
• Engineering team will add SOP’s

Wiki contents
• Technical Landscape
• SOP’s
• RCA’s
• Landscape Documents
• Project Handover Documents
• Service Request Templates
• Video Tutorials
WIKI – Landscape Documentation

 All the SID, IP address should be clearly mentioned


 Ensure Quality, production, development systems are clearly
mentioned with the details

 ERP systems information to be clearly mentioned


 All the customer landscape details to be presented on the
documents
 Landscape overview to be clearly explained
CMC|Customer Maintenance Calendar

PoA Identification 01

PoA Preparation 02

PoA Consolidation & CMC Conversion 03

PoA Approval 04

PoA Implementation 05
PoA Identification 01

Cloud4C Delivery Management Team identifies the RCA for outages and categorizes them into one of the categories as mentioned in the diagram. Eg: SAP Infra, SAP HANA & Basis and Sybase etc.

Plan of Action (PoA) is then created for those topics wherever necessary and communicated internally at Cloud4C.

Categories (from the diagram):
• Permanent: Permanent Fixes
• SIP: Service Improvement Plan
• Database: HANA, Sybase and MaxDB tasks
• PPM
• Infrastructure Activities: DC Maintenance, VMWare, Network, Storage tasks
• Security: Security Tasks
• EWA: EWA Corrective action plans
• DC Maintenance Tasks
• Product Related Planned: SAP Product Related / Planned Tasks
• AdHoc: SAP Ad-hoc Tasks
CMC – Topics

01 EWA Corrective Action Plan Tasks
The EWA (Early Watch Alert) report summary shows the executive summary where the overall rating of the SAP System is reported. The rating can either be GREEN, YELLOW or RED. If there is insufficient data the rating will be GREY, as the report is unable to properly make an assessment. The KPI indicators with RED rating fall under the EWA CAP (Corrective Action Plan) category of the downtime tracker.

02 Infrastructure Tasks (VMWare, Network)
Infrastructure tasks include storage activities, Network activities, Switch Firmware Upgrades, LB Upgrade, etc.

03 DB (Sybase /HANA/MaxDB) Tasks
DB tasks include Sybase ASE patches, HANA DB upgrade, HANA revision Upgrade, etc.

04 Permanent Fixes
Permanent Fix/Solution to be implemented based on the action items derived during the root cause analysis for individual Customers.

05 Service SIP
Service Improvement Plan to be implemented based on the action items derived during the root cause analysis across all the landscapes.
CMC – Topics Contd…

06 PPM (Preventive & Proactive Maintenance)
Patch management is a process that must be done routinely and should be as all-encompassing as possible to be most effective. Patch management plays an important role in upholding a good enterprise security posture. Having multiple security controls, of which patch management is a part, is the most effective means of protecting against potential threats.

07 Security (Go-Gemba) Tasks
Security (Go-Gemba) Audit tasks include Network Security tasks, OS Security tasks, DB Security tasks, Application Security tasks, Encryption tasks, etc.

08 SAP Product Related Planned Tasks
SAP Planned Tasks include system restarts, SAP Kernel Upgrades, SAP Parameter changes, SAP SPS Upgrades, etc.

09 SAP Adhoc Tasks
SAP Adhoc Tasks include SAP SID rename activities, R3trans Kernel Upgrade, DR Tests, etc.

10 DC Maintenance Tasks
As part of C4C's sustained effort to provide clients with a state-of-the-art Datacenter infrastructure, we will be performing activities in Cloud4C Data Centers.
PoA Preparation 02

• If the category falls under SAP Infra, the respective Infra COE team prepares a customer-specific POA for each category
• If the category falls under SAP HANA or BASIS, the SAP BASIS COE prepares a customer-specific POA for each category
• If the category falls under Sybase, the Sybase COE prepares a customer-specific POA for each category
• After preparation & review of the PoA, it would be uploaded to the JAM page as per region of the customer (https://jam4.sapjam.com/groups/YWo5PrH7gYvNx5XuZXqReD/content?folder_id=zjK4CHhQjhdu1DBzem3m6V)
• On upload, the CSM / TLO would inform the EL about the availability of the PoA on JAM

[Diagram: 01 SAP Infra, 02 BASIS & HANA, 03 Sybase]
Downtime Request | Process Flow

[Flow diagram: SAP COE, CSM, TLO; SAP EL / CDM; CUSTOMERS]
• SAP COE, CSM, TLO: POA will be shared along with the proposed downtime
• SAP EL / CDM: Gain approval for proposed downtime from the customer
• OR: If customer doesn’t approve proposed downtime, EL/CDM should get the next approved downtime from customer

PoA Consolidation & CMC Conversion 03
• The TLO consolidates all the customer-specific POAs for their accounts and prepares a customer TLO tracker, which is also uploaded to the central repository (C4C Internal). ( https://sharing.cloud4c.com/index.php/s/GX22kQGKYnWEsRx )

• If the accounts are CSM Led Accounts, the TLOs share the customer TLO tracker and the consolidated POAs with the respective CSMs.

• The TLO tracker is converted to a CMC (Customer Maintenance Calendar).
  This is done by CSM for CSM led accounts
  This is done by TLO for non CSM led accounts

Note :

• CMC to be prepared for the downtimes a minimum of 4-6 weeks in advance. This would provide enough lead time for the EL / CDM to check and confirm with the customer.

• CMC for month (n) should be created and shared with EL / CDM latest by the 5th of month (n-1), with PROD systems taking priority over the non-PROD. Eg: For Sep 2020, the PoA should be shared with EL by 5th Aug 2020

• CMC to be prepared in a detailed, self-explanatory manner with all related activities grouped together.
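The "5th of month (n-1)" rule in the note above can be checked with a small date helper. This is a sketch only, and the function name is hypothetical. It handles the year boundary, which the slide does not mention explicitly.

```python
from datetime import date


def cmc_share_deadline(year: int, month: int) -> date:
    """Latest date to share the CMC for the given month: the 5th of the previous month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if month == 1:
        # The CMC for January is due in December of the previous year.
        return date(year - 1, 12, 5)
    return date(year, month - 1, 5)


if __name__ == "__main__":
    # The slide's example: the CMC for Sep 2020 is due by 5th Aug 2020.
    print(cmc_share_deadline(2020, 9))
```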
PoA – Approval 04
• On or before the 5th of every month, TLO/CSM should upload the CMC to the CtrlS-HEC Delivery Share Jam Portal (Region & Customer wise): https://jam4.sapjam.com/groups/YWo5PrH7gYvNx5XuZXqReD/content?folder_id=zjK4CHhQjhdu1DBzem3m6V

• At any given point in time, the EL can refer to the calendar on the JAM page and inform the customer.

• The TLO/CSM must inform the EL of any update to the calendar in JAM.

• After uploading the CMC to the Jam Portal, the respective TLO/CSM must notify the EL.

• Once the EL is notified, the EL will reach out to the Customer within a week.

• By the 20th of every month, the EL should get the approval for the proposed downtime OR the customer agreed downtime.
PoA Approval received 04

01 If Customer is providing the approval, EL should notify the TLO/CSM at least 4 weeks ahead of the proposed downtime.

02 Once downtime approval confirmation is received from the EL, TLO/CSM will create a Resolve ticket for Customer/EL with the Customer approval & POA attached.

[Diagram labels: CUSTOMERS, Downtime approved, CDM, Notify 4 weeks ahead of the downtime, CSM, TLO, Resolve Ticket creation]
PoA Approval not received 04

01 If Customer is not providing the approval for the proposed downtime, EL should take a Risk letter from the Customer for not providing the downtime for activities falling in the below categories, and follow up for the next possible downtime.
• Availability
• Security

02 For all other categories, EL should follow up continuously with Customer and get the downtime on the earliest possible date.

[Diagram labels: CUSTOMERS, Downtime not approved, CDM, EL to follow up for next downtime]
PoA Implementation 05

01 After receiving the downtime confirmation from the EL, TLO/CSM will inform the respective stakeholders
(SAP Infra, Sybase team & SAP SAM) about the activity.

02 Respective stakeholders will create the internal CR tickets and undergo CAB approval.

03 After receiving the CAB approval, respective stakeholders will keep the SAM team notified. The SAM team will then send an internal notification to the TLO, and the TLO will inform the EL/Customer two days before the activity execution.

04 CSM / TLO & SAM to ensure the activity is done with experienced / knowledgeable resources to avoid human
errors.

05 Activity will be executed as per the downtime and the status will be updated to EL & Customer through the
Resolve ticket by the TLO/CSM.
CMC – SLA Timeline

• On or before 5th of every month TLO/CSM should upload the CMC to the CtrlS-HEC Delivery Share Jam Portal
• By 20th of every month, EL should get the approval for the proposed downtime OR customer agreed downtime
• Activity Execution as per the agreed downtime
Make the Right Move
with us Today!
