SAS Data Analytics Cycle
Decisions at Scale............................................................... 2
What Are the Challenges?.......................................................... 2
Case Studies.......................................................................10
UK Financial Institution: Modernizing Its Analytical Life Cycle......................10
Introduction
Consider this scenario: An organization has hundreds of analytical models embedded in production systems to support decision making in marketing, pricing, credit risk, operational risk, fraud and finance functions.

Analysts across different business units develop their models without any formalized or standardized processes for storing, deploying and managing them. Some models don't have documentation describing the model's owner, business purpose, usage guidelines or other information necessary for managing the model or explaining it to regulators.

"The growing complexity and magnitude of managing potentially hundreds or thousands of models in flux puts organizations at the cusp of an information revolution. The old and inefficient handcrafted approach must evolve to a more effective automated process."
– James Taylor and Neil Raden, Smart (Enough) Systems

For starters, you need powerful, easy-to-use software that can help you wrangle your data into shape and quickly create many accurate predictive models. Then it takes powerful, integrated processes to manage your analytical models for optimal performance throughout their entire life cycle. Both IT and analytics teams need efficient, repeatable processes and a reliable architecture for managing data, communicating the rationale, and tracing the predictive analytics models through the deployment cycle.

And most importantly, the key to analytical success is quickly turning data into insights into actions, which means you must efficiently integrate accurate predictive models into the production systems that drive your automated decisions.

Decisions at Scale

How many operational decisions are made in your organization each day? Probably more than you can even imagine. For example, take a financial institution. How many credit card transactions are processed each hour? (At Visa, the transaction rate can reach 14,000 per second. See page 3.) Each one represents an operational decision – to allow or disallow the transaction to go through based on a calculated fraud risk score. And while each operational decision or transaction may have a low individual risk, the large number of these decisions made hourly or daily greatly compounds the associated risk.

That's why the ability to produce good operational decisions very quickly, while incorporating ever-increasing volumes of data, can mean the difference between fraud and no fraud – and between business success and failure.

So what does it take to make a lot of good, fast operational decisions that consistently reflect overall organizational strategy and at the same time keep your organization humming happily along, faster and better than anyone else?

1. Operational applications that use data to produce answers for people (or systems) so that the right actions are taken.
2. Insightful and up-to-date analytical models that the business can rely on for optimal decisions at the right time.
3. The integration of business rules and predictive analytics into operational decision flows that provide the instructional insight needed for vetted, trusted decisions.
4. A way to manage and monitor the analytical models to ensure they are performing well and continue to deliver the right answers.
5. An architecture and processes that can grow to address new needs, like streaming data and building more detailed predictive models faster than ever.

What Are the Challenges?

• Delays. Due to processes that are often manual and ad hoc, it can take months to get a model implemented into production systems. Because it takes so long to move models through development and testing phases, they can be stale by the time they reach production. Or they never get deployed at all. Internal and external compliance issues can make the process even more challenging.

• Difficulty defining triggers. The step of translating answers from analytical models into business actions for operational decisions requires clear, agreed-upon business rules. These business rules need to become part of the governed environment because they define how the results of the models are used. For example, a fraud detection model might return a fraud risk score as a number between 100 and 1,000 (similar to a FICO credit score). It is up to the business to decide what level of risk requires action. If the trigger for a fraud alert is set too high, fraud might go unnoticed. If it is set too low, the alerts create too many false positives. Both outcomes decrease the value these models create and reduce trust in the results.

• Poor results. Too often, poorly performing models remain in production even though they are producing inaccurate results that lead to bad business decisions. Model results will change as the data shifts to new conditions and behaviors. The main reasons for this situation are the lack of a central repository for models and of consistent metrics to determine when a model needs to be refreshed or replaced.
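To make the trigger trade-off concrete, here is a minimal sketch in plain Python (illustrative only, not SAS code). It assumes a model that returns a fraud risk score in the 100-to-1,000 range described above; the threshold values are hypothetical and would be set by the business.

```python
# Hypothetical business-rule trigger: turns a model's fraud risk score
# (assumed range 100-1,000, higher = riskier) into an operational decision.

ALERT_THRESHOLD = 700  # chosen by the business, not by the model


def decide(fraud_score: int, threshold: int = ALERT_THRESHOLD) -> str:
    """Apply the business rule: block risky transactions, allow the rest."""
    if not 100 <= fraud_score <= 1000:
        raise ValueError("score outside the model's documented range")
    return "block" if fraud_score >= threshold else "allow"


# The threshold controls the trade-off described above: set it too high and
# fraud slips through; set it too low and false positives soar.
scores = [120, 680, 710, 990]
print([decide(s) for s in scores])                  # ['allow', 'allow', 'block', 'block']
print([decide(s, threshold=600) for s in scores])   # ['allow', 'block', 'block', 'block']
```

Keeping the rule and its threshold in a governed environment, rather than buried in application code, is what lets the business tune this trade-off without redeploying the model.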
[Figure: The analytical life cycle – Ask, Prepare, Explore, Model, Implement, Act and Evaluate, spanning the Discovery and Deployment phases.]
… organizations still report that they spend an inordinate amount of time, sometimes up to 80 percent, dealing with data preparation tasks. The majority of time in the discovery phase should be spent on exploring data and creating good models instead of preparing data.

• Explore the data. Interactive, self-service visualization tools need to serve a wide range of users (from the business analyst with no statistical knowledge to the analytically savvy data scientist) so they can easily search for relationships, trends and patterns to gain deeper understanding of the data. In this step, the question and the approach formed in the initial "ask" phase of the project will be refined. Ideas on how to address the business problem from an analytical perspective are developed and tested. While examining the data, you may find the need to add, delete or combine variables to create more precisely focused models. Fast, interactive tools help make this an iterative process, which is crucial for identifying the best questions and answers.

• Model the data. In this stage, numerous analytical and machine-learning modeling algorithms are applied to the data to find the best representation of the relationships in the data that will help answer the business question. Analytical tools search for a combination of data and modeling techniques that reliably predict a desired outcome. There is no single algorithm that always performs best. The "best" algorithm for solving the business problem depends on the data. Experimentation is key to finding the most reliable answer, and automated model building can help minimize the time to results and boost the productivity of analytical teams.

In the past, data miners and data scientists were only able to create several models in a week or month using manual model-building tools. Improved software and faster computers have sped up the model-building process so hundreds or even thousands of models can be created today in the same time frame. But that brings another issue to the surface – how to quickly and reliably identify the one model (out of many) that performs best? Model "tournaments" provide a way to compare many different competing algorithms, with the opportunity to choose the one that provides the best results for a specific set of data. With automated tournaments of machine-learning algorithms and clearly defined metrics to identify the champion model, this has become an easier process. Analysts and data scientists can spend their time focusing on more strategic questions and investigations.

The Deployment Phase of the Analytical Life Cycle

• Implement your models. This is where you take the insights learned from the discovery phase and put them into action using repeatable, automated processes. In many organizations this is the point where the analytical modeling process slows dramatically because there is no defined transition between discovery and deployment, or collaboration between the model developers and IT deployment architects, much less optimized automation. In most organizations the deployment environment is very different from the discovery environment, especially when the predictive models are supporting operational decision making. Often, IT has to apply rigorous governance policies to this environment to ensure service-level agreements with the business. By integrating the discovery and deployment phases, you can create an automated, flexible and repeatable transition that improves operational decisions. Additionally, a transparent, governed process is important for everyone – especially auditors. Once built, the model is registered, tested or validated, approved and declared ready to be used with production data (embedded into operational systems).

• Act on new information. There are two types of decisions that can be made based on analytical results. Strategic decisions are made by humans who examine results and take action, usually looking to the future. Operational decisions are automated – like credit scores or recommended best offers. They don't require human intervention because the rules that humans would apply can be coded into the production systems. More and more organizations are looking to automate operational decisions and provide real-time results to reduce decision latencies. Basing operational decisions on answers from analytical models also makes these decisions objective, consistent, repeatable and measurable. Integration of models with enterprise decision management tools enables organizations to build comprehensive and complete operational decision flows. These combine analytical models with business-rule-based triggers to produce the best automated decisions. And because this is formally defined within the decision management tool, updates and refinements to changing conditions are easy – improving business agility and governance. Once approved for production, the decision management tool applies the model to new operational data, generating the predictive insights necessary for better actions.
• Evaluate your results. The next and perhaps most important step is to evaluate the outcome of the actions produced by the analytical model. Did your models produce the correct predictions? Were tangible results realized, such as increased revenue or decreased costs? With continuous monitoring and measurement of the models' performance based on standardized metrics, you can evaluate the success of these assets for your business. That evaluation can then feed the next iteration of the model, creating a continuous machine-learning loop. If you identify degradation of analytical models, you can define the optimal strategy to refresh them so they continue to produce the desired results. With increasing numbers of analytical models, automation is necessary to quickly identify models that need the most attention, and even deliver automated retraining.

• Ask again. Predictive models are not forever. The factors that drive the predictions in a model change over time, your customers change over time, competitors enter or leave the market, and new data becomes available. You have to refresh even the most accurate models, and organizations will need to go through the discovery and deployment steps again. It's a constant and evolving process. If a model degrades, it is recalibrated by changing the model coefficients or rebuilt with existing and new characteristics. When the model no longer serves a business need, it is retired.

It is easy to imagine the many ways this process can go wrong. Organizations often take months, sometimes years, to move through this end-to-end process. There are many common complicating factors:

• The needed data sources might be scattered across your organization.

• Data may need to be integrated and cleansed multiple times to support a variety of analytical requirements.

• It can take a long time for models to be manually translated to different programming languages for integration with critical operational systems, which can include both batch and real-time systems.

• Organizations might be slow to recognize when a model needs to be changed, so they forge ahead with bad decisions based on outdated model results.

• Many of the steps in the analytical life cycle are iterative in nature and might require going back to a previous step in the cycle to add and/or refresh data.

• Different personas add complexity to the process, which makes collaboration and documentation very important. In many organizations, data preparation in the discovery phase is handled by the IT unit, while data exploration and model development are usually the responsibility of business analysts and data scientists. Deployment, especially when it includes integration into operational business processes, is managed by IT again (though this could be a different IT group than the data management group providing data).

The net effect is that the models, which are supposed to yield solid business insights, lead instead to suboptimal decisions, missed opportunities and misguided actions. But it doesn't have to be that way!

From Concept to Action: How to Create an Efficient Analytical Environment

In an effective analytical environment, data is rapidly created and accessed in the correct structure for exploration and model development. Models are rapidly built and tested, and deployed into a production environment with minimal delay. Production models quickly generate trusted output. Model performance is constantly monitored, and underperforming models are quickly replaced by more up-to-date models.

In short, a successful analytics strategy means more than creating a powerfully predictive model; it is about managing each of these lifecycle stages holistically, for a particular model and across the entire portfolio of models. This is no easy feat.

Consider that analysts and data scientists don't just develop one model to solve a business problem. They develop a set of competing models and use different techniques to address complex problems. They will have models at various stages of development and models tailored for different product lines and business problems. An organization can quickly find itself managing thousands of models.

Furthermore, the model environment is anything but static. Models will be continually updated as they are tested and as new results and data become available. The goal is to build the best predictive models possible, using the best data available.

Predictive models are high-value organizational assets, and success requires more than relying solely on the technology element. Organizations must also closely look at people and processes. For example, it's important to constantly upgrade business and technical analytical skills so that the most important business issues are identified and analytical insights can be applied to operational processes.
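Automatically spotting degrading models, as described under "Evaluate your results," can be sketched simply. The snippet below is an illustrative plain-Python sketch, not a SAS tool; the model names, metric history and tolerance are made up for the example.

```python
# Hypothetical model-monitoring sketch: compare each model's latest value of a
# standardized metric against its baseline and flag degradation for refresh.

# Made-up monitoring history: one metric value (say, accuracy) per scoring period.
history = {
    "churn_model":   [0.91, 0.90, 0.89, 0.83],
    "fraud_model":   [0.88, 0.88, 0.87, 0.87],
    "pricing_model": [0.95, 0.90, 0.84, 0.79],
}

TOLERANCE = 0.05  # allowed drop from baseline before intervening


def needs_refresh(metrics, tolerance=TOLERANCE):
    """Flag a model whose latest metric fell more than `tolerance` below baseline."""
    baseline, latest = metrics[0], metrics[-1]
    return baseline - latest > tolerance


# Models that should be queued for recalibration, rebuilding or retirement.
flagged = sorted(name for name, m in history.items() if needs_refresh(m))
print(flagged)  # ['churn_model', 'pricing_model']
```

Running a check like this on a schedule, against a central model repository, is what turns the evaluation step from an occasional manual audit into the continuous loop the text describes.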
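Picking a champion from a set of competing models is the model "tournament" described earlier. The sketch below is hypothetical plain Python, not a SAS tool: the candidate "models" are toy threshold rules, the holdout data is made up, and accuracy stands in for the clearly defined champion metric.

```python
# Illustrative model tournament: score competing candidate models on the same
# holdout data with one agreed metric, then declare a champion.

holdout = [  # (feature, true_label) pairs standing in for real validation data
    (0.2, 0), (0.4, 0), (0.55, 1), (0.7, 1), (0.9, 1), (0.1, 0),
]

candidates = {  # each "model" maps a feature value to a predicted label
    "cutoff_0.3": lambda x: int(x >= 0.3),
    "cutoff_0.5": lambda x: int(x >= 0.5),
    "cutoff_0.8": lambda x: int(x >= 0.8),
}


def accuracy(model, data):
    """Champion metric: fraction of holdout cases predicted correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


scores = {name: accuracy(m, holdout) for name, m in candidates.items()}
champion = max(scores, key=scores.get)
print(champion, scores[champion])  # cutoff_0.5 1.0
```

The essential points the text makes survive even in this toy form: every candidate is judged on the same data with the same metric, and the winner is chosen by the metric rather than by whoever built the model.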
The analytical life cycle is iterative and collaborative in nature. Staff members with different backgrounds and skills are involved at various stages of the process. A business manager has to clearly identify an issue or problem that requires analytics-driven insights, then make the appropriate business decision and monitor the returns from the decision. A business analyst conducts data exploration and visualization and works to identify key variables influencing outcomes. The IT and data management teams help facilitate data preparation and model deployment and monitoring. A data scientist or data miner performs more complex exploratory analysis, descriptive segmentation and predictive modeling.

To get the best analytics results, organizations need to put people with the right skills in place, and enable them to work collaboratively to perform their roles.

How SAS® Can Help Across the Analytical Life Cycle

SAS uses integrated components to reduce the time to value for the modeling life cycle – eliminating redundant steps and supporting cohesion across the information chain from data to decision management. Consistent processes and technologies for model development and deployment reduce the risks involved in the modeling process while supporting collaboration and governance among key business and IT stakeholders.

Data Preparation and Exploration: A Systematic Approach

• Data preparation. SAS® Data Management enables you to profile and cleanse data and create extract, load and transform (ELT) routines that produce analytical data marts, using only the data that is required from the database. The data is staged in the database for fast loading, transformed into a structure fit for model building and summarized to create derived fields. These processes can be automated and scheduled in batch or run ad hoc and in real time, depending on the stage of the analytical life cycle. Self-service data preparation and data wrangling tools like SAS Data Loader for Hadoop help business analysts and data scientists streamline access, blending and cleansing of data without burdening IT. Event stream processing from SAS delivers high-volume throughput of hundreds of millions of events per second – with low-latency response times. It helps you know what needs immediate attention, what can be ignored and what should be stored. Finally, in-database processing is used to reduce data movement and improve performance.

• Data exploration. SAS Visual Analytics lets business analysts easily discover important relationships in data and quickly zero in on areas of opportunity or concern, uncover unexpected patterns, examine data distributions, find the prevalence of extreme values, and identify important variables to incorporate in the model development process.
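As a generic illustration of the cleanse-transform-summarize pattern described in the data preparation bullet, here is a plain-Python sketch with made-up records and field names; it is not SAS Data Management code.

```python
# Generic data-preparation sketch: cleanse raw records, then summarize them
# into one analytical row per customer with a derived field for modeling.
# The records and field names are invented for illustration.

raw = [
    {"customer": "A", "amount": "120.50"},
    {"customer": "A", "amount": "80.00"},
    {"customer": "B", "amount": ""},        # missing value to cleanse away
    {"customer": "B", "amount": "200.00"},
]

# Cleanse: drop records with missing amounts and cast to numeric types.
clean = [
    {"customer": r["customer"], "amount": float(r["amount"])}
    for r in raw
    if r["amount"]
]

# Summarize: aggregate to one row per customer, creating derived fields.
marts = {}
for r in clean:
    m = marts.setdefault(r["customer"], {"n_txns": 0, "total": 0.0})
    m["n_txns"] += 1
    m["total"] += r["amount"]
for m in marts.values():
    m["avg_amount"] = m["total"] / m["n_txns"]  # derived field

print(marts)
```

In a real environment these steps would be pushed into the database and scheduled, as the bullet describes; the point of the sketch is only the shape of the pipeline: profile and cleanse first, then transform and summarize into a structure fit for model building.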
Case Studies

With a predictive analytics lifecycle approach, the "after" scenario looks quite different from the usual modus operandi – and creates a serious competitive advantage.

The Magic don't have a crystal ball, but they do have SAS Enterprise Miner, which allowed them to better understand their data and develop analytic models that predict season ticket holder renewals. The data mining tools allowed the team to accomplish more accurate scoring that led to a difference – and marked improvement – in the way it approached customer retention and marketing.

UK Financial Institution: Modernizing Its Analytical Life Cycle
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies. Copyright © 2016, SAS Institute Inc. All rights reserved.
106179_S150641.0216