Real-World Data Mining: Introduction to Analytics

By Dursun Delen

Date: Dec 15, 2014

Sample Chapter is provided courtesy of Pearson FT Press.

Dursun Delen discusses the details of analytics, including where data mining fits in, the sudden popularity of analytics, the application areas and main challenges of analytics, and the cutting edge of analytics.

Business analytics is a relatively new term that is gaining popularity in the business world like nothing else in recent history. In general terms, analytics is the art and science of discovering insight—by using sophisticated mathematical models along with a variety of data and expert knowledge—to support solid, timely decision making. In a sense, analytics is all about decision making and problem solving. These days, analytics can be defined as simply as “the discovery of meaningful patterns in data.” In this era of abundant data, analytics tends to be used on large quantities and varieties of data. Although analytics tends to be data focused, many applications of analytics involve very little or no data; instead, those analytics projects use mathematical models that rely on process description and expert knowledge (e.g., optimization and simulation models).

Business analytics is the application of the tools, techniques, and principles of analytics to complex business problems. Firms commonly apply analytics to business data to describe, predict, and improve business performance. Firms have used analytics in many ways, including the following:

To improve their relationships with their customers (encompassing all phases of customer relationship management—acquisition, retention, and enrichment), employees, and other stakeholders
To identify fraudulent transactions and odd behaviors—and, in doing so, save money
To enhance product and service features and their pricing, which would lead to better customer satisfaction and profitability
To optimize marketing and advertising campaigns so they can reach more customers with the right kind of message and promotions with the least amount of expense
To minimize operational costs by optimally managing inventories and allocating resources wherever and whenever they are needed by using optimization and simulation modeling
To empower employees with the information and insight they need to make faster and better decisions while they are working with customers or customer-related issues

The term analytics, perhaps because of its rapidly increasing popularity as a buzzword, is being used to replace several previously popular terms, such as intelligence, mining, and discovery. For example, the term business intelligence has now become business analytics; customer intelligence has become customer analytics; Web mining has become Web analytics; knowledge discovery has become data analytics; etc. Modern-day analytics can require extensive computation because of the volume, variety, and velocity of data (which we call Big Data). Therefore, the tools, techniques, and algorithms used for analytics projects leverage the most current, state-of-the-art methods developed in a variety of fields, including management science, computer science, statistics, data science, and mathematics. Figure 1.1 shows a word cloud that includes concepts related to analytics and Big Data.

Figure 1.1 Analytics and Big Data Word Cloud

Is There a Difference Between Analytics and Analysis?


Even though the two terms analytics and analysis are often used interchangeably, they are not the same.

Basically, analysis refers to the process of separating a whole problem into its parts so that the parts can be critically examined at the granular level. It is often used when the investigation of a complete system is not feasible or practical, and the system needs to be simplified by being decomposed into more basic components. Once the improvements at the granular level are realized and the examination of the parts is complete, the whole system (either a conceptual or physical system) can then be put together using a process called synthesis.

Analytics, on the other hand, is a variety of methods, technologies, and associated tools for creating new knowledge/insight to solve complex problems and make better and faster decisions. In essence, analytics is a multifaceted and multidisciplinary approach to addressing complex situations. Analytics takes advantage of data and mathematical models to make sense of the complicated world we are living in. Even though analytics includes the act of analysis at different stages of the discovery process, it is not just analysis; it also includes synthesis and other complementary tasks and processes.

Where Does Data Mining Fit In?


Data mining is the process of discovering new knowledge in the form of patterns and relationships in large data sets. The goal of analytics is to convert data/facts into actionable insight, and data mining is the key enabler of that goal. Data mining has been around much longer than analytics, at least in the context of analytics today. As analytics became an overarching term for all decision support and problem-solving techniques and technologies, data mining found itself a rather large space within that arc, ranging from descriptive exploration that identifies relationships and affinities among variables (e.g., market-basket analysis) to developing models that estimate future values of variables of interest. As we will see later in this chapter, within the taxonomy of analytics, data mining plays a key role at every level, from the most simple to the most sophisticated.
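
To make the descriptive end of that spectrum concrete, the following is a minimal sketch of the market-basket idea: counting how often items co-occur in transactions and reporting simple support and confidence measures. The tiny transaction list is invented for illustration; real systems mine millions of baskets with dedicated algorithms such as Apriori.

    from itertools import combinations
    from collections import Counter

    # Toy transactions; in practice these would come from point-of-sale data.
    transactions = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
        {"bread", "milk", "beer"},
    ]

    n = len(transactions)
    pair_counts = Counter()
    item_counts = Counter()

    for basket in transactions:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))

    # Report support and confidence for each co-occurring pair.
    for (a, b), count in pair_counts.most_common():
        support = count / n                  # P(a and b)
        confidence = count / item_counts[a]  # P(b | a)
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")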

Why the Sudden Popularity of Analytics?


Analytics is a buzzword of business circles today. No matter what business journal or magazine you look at, it is very likely that you will see articles about analytics and how analytics is changing the way managerial decisions are being made. It has become a new label for evidence-based management (i.e., evidence/data-driven decision making). But why has analytics become so popular? And why now? The reasons (or forces) behind this popularity can be grouped into three categories: need, availability and affordability, and culture change.

Need
As we all know, business is anything but “as usual” today. Competition has been characterized progressively as local, then regional, then national, but it is now global. Large to medium to small, every business is under the pressure of global competition. The tariff and transportation cost barriers that sheltered companies in their respective geographic locations are no longer as protective as they once were. In addition to (and perhaps because of) the global competition, customers have become more demanding. They want the highest quality of products and/or services with the lowest prices in the shortest possible time. Success or mere survival depends on businesses being agile and their managers making the best possible decisions in a timely manner to respond to market-driven forces (i.e., rapidly identifying and addressing problems and taking advantage of opportunities). Therefore, the need for fact-based, better, and faster decisions is more critical now than ever before. In the midst of these unforgiving market conditions, analytics promises to provide managers the insights they need to make better and faster decisions, which help improve their competitive posture in the marketplace. Analytics today is widely perceived as saving business managers from the complexities of global business practices.

Availability and Affordability


Thanks to recent technological advances and the affordability of software and hardware, organizations are collecting tremendous amounts of data. Automated data collection systems—based on a variety of sensors and RFID—have significantly increased the quantity and quality of organizational data. Coupled with the content-rich data collected from Internet-based technologies such as social media, businesses now tend to have more data than they can handle. As the saying goes, “They are drowning in data but starving for knowledge.”

Along with data collection technologies, data processing technologies have also improved significantly. Today’s machines have numerous processors and very large memory capacities, so they are able to process very large and complex data in a reasonable time frame—often in real time. The advances in both hardware and software technology are also reflected in the pricing, continuously reducing the cost of ownership for such systems. In addition to the ownership model, along came the software- (or hardware-) as-a-service business model, which allows businesses (especially small to medium-size businesses with limited financial power) to rent analytics capabilities and pay only for what they use.

Culture Change
At the organizational level, there has been a shift from old-fashioned intuition-driven decision making to new-age fact-/evidence-based decision making. Most successful organizations have made a conscious effort to shift to data-/evidence-driven business practices. Because of the availability of data and supporting IT infrastructure, such a paradigm shift is taking place faster than many thought it would. As the new generation of quantitatively savvy managers replaces the baby boomers, this evidence-based managerial paradigm shift will intensify.

The Application Areas of Analytics


Even though the business analytics wave is somewhat new, there are numerous applications of analytics, covering almost every aspect of business practice. For instance, in customer relationship management, a wealth of success stories tell of sophisticated models developed to identify new customers, look for up-sell/cross-sell opportunities, and find customers with a high propensity toward attrition. Using social media analytics and sentiment analysis, businesses are trying to stay on top of what people are saying about their products/services and brands. Fraud detection, risk mitigation, product pricing, marketing campaign optimization, financial planning, employee retention, talent recruiting, and actuarial estimation are among the many business applications of analytics. It would be very hard to find a business issue to which a number of analytics applications could not be applied. From business reporting to data warehousing, from data mining to optimization, analytics techniques are used widely in almost every facet of business.

The Main Challenges of Analytics


Even though the advantages of and reasons for analytics are evident, many businesses are still hesitant to jump on the analytics bandwagon. These are the main roadblocks to adoption of analytics:

Analytics talent. Data scientists, the quantitative geniuses who can convert data into actionable insight, are scarce in the market; the really good ones are very hard to find. Because analytics is relatively new, the talent for analytics is still being developed. Many colleges have started undergraduate and graduate programs to address the analytics talent gap. As the popularity of analytics increases, so will the need for people who have the knowledge and skills to convert Big Data into the information and knowledge that managers and other decision makers need to tackle real-world complexities.
Culture. As the saying goes, “Old habits die hard.” Changing from a traditional management style (often characterized by intuition as the basis for making decisions) to a contemporary management style (based on data and scientific models for managerial decisions and collective organizational knowledge) is not an easy process for any organization to undertake. People do not like to change. Change means losing what you have learned or mastered in the past and now needing to learn how to do what you do all over again. It suggests that the knowledge (which is also characterized as power) you’ve accumulated over the years will disappear or be partially lost. The culture shift may be the most difficult part of adopting analytics as the new management paradigm.
Return on investment. Another barrier to adoption of analytics is the difficulty in clearly justifying its return on investment (ROI). Analytics projects are complex and costly endeavors, and their return is not immediately clear, so many executives have a hard time investing in analytics, especially on large scales. Will the value gained from analytics outweigh the investment? If so, when? It is very hard to convert the value of analytics into justifiable numbers. Most of the value gained from analytics is somewhat intangible and holistic. If done properly, analytics can transform an organization, putting it on a new and improved level. A combination of tangible and intangible factors needs to be brought to bear to numerically rationalize investment and movement toward analytics and analytically savvy management practice.
Data. The media is talking about Big Data in a very positive way, characterizing it as an invaluable asset for better business practices. This is mostly true, especially if the business understands and knows what to do with it. For those who have no clue, Big Data is a big challenge. Big Data is not just big; it is unstructured, and it is arriving at a speed that prohibits traditional collection and processing means. And it is usually messy and dirty. For an organization to succeed in analytics, it needs to have a well-thought-out strategy for handling Big Data so that it can be converted to actionable insight.
Technology. Even though technology is capable, available, and, to some extent, affordable, technology adoption poses another challenge for traditionally less technical businesses. Although establishing an analytics infrastructure is affordable, it still costs a significant amount of money. Without financial means and/or a clear return on investment, the management of some businesses may not be willing to invest in needed technology. For some businesses, an analytics-as-a-service model (which includes both software and the infrastructure/hardware needed to implement analytics) may be less costly and easier to implement.
Security and privacy. One of the most common criticisms of data and analytics is security. We often hear about data breaches of sensitive information, and indeed, the only completely secure data infrastructure is one that is isolated and disconnected from all other networks (which goes against the very reason for having data and analytics). The importance of data security has made information assurance one of the most popular concentration areas in information systems departments around the world. At the same time that increasingly sophisticated techniques are being used to protect the information infrastructure, increasingly sophisticated attacks are becoming common. There are also concerns about personal privacy. Use of personal data about customers (existing or prospective), even if it is within legal boundaries, should be avoided or carefully scrutinized to protect an organization against bad publicity and public outcry.
Despite the hurdles in the way, analytics adoption is growing, and analytics is inevitable for today’s enterprises, regardless of size or industry segment. As the complexity of conducting business increases, enterprises are trying to find order in the midst of chaotic behaviors. The ones that succeed will be the ones fully leveraging the capabilities of analytics.

A Longitudinal View of Analytics


Although the buzz about it is relatively recent, analytics isn’t new. It’s possible to find references to corporate analytics as far back as the 1940s, during the World War II era, when more effective methods were needed to maximize output with limited resources. Many optimization and simulation techniques were developed then. Analytical techniques have been used in business for a very long time. One example is the time and motion studies initiated by Frederick Winslow Taylor in the late 19th century. Then Henry Ford measured the pacing of assembly lines, which led to mass-production initiatives. Analytics began to command more attention in the late 1960s, when computers were used in decision support systems. Since then, analytics has evolved with the development of enterprise resource planning (ERP) systems, data warehouses, and a wide variety of other hardware and software tools and applications.

The timeline in Figure 1.2 shows the terminology used to describe analytics since the 1970s. During the early days of analytics, prior to the 1970s, data was often obtained from domain experts using manual processes (i.e., interviews and surveys) to build mathematical or knowledge-based models to solve constrained optimization problems. The idea was to do the best with limited resources. Such decision support models were typically called operations research (OR). The problems that were too complex to solve optimally (using linear or nonlinear mathematical programming techniques) were tackled using heuristic methods such as simulation models.

Figure 1.2 A Longitudinal View of the Evolution of Analytics

In the 1970s, in addition to the mature OR models that were being used in many industries and government systems, a new and exciting line of models emerged: rule-based expert systems (ESs). These systems promised to capture experts’ knowledge in a format that computers could process (via a collection of if–then rules) so that they could be used for consultation much the same way that one would use domain experts to identify a structured problem and to prescribe the most probable solution. ESs allowed scarce expertise to be made available where and when needed, using an “intelligent” decision support system. During the 1970s, businesses also began to create routine reports to inform decision makers (managers) about what had happened in the previous period (e.g., day, week, month, quarter). Although it was useful to know what had happened in the past, managers needed more than this: They needed a variety of reports at different levels of granularity to better understand and address the changing needs and challenges of the business.

The 1980s saw a significant change in the way organizations captured business-related data. The old practice had been to have multiple disjointed information systems tailored to capture transactional data of different organizational units or functions (e.g., accounting, marketing and sales, finance, manufacturing). In the 1980s, these systems were integrated as enterprise-level information systems that we now commonly call ERP systems. The old, mostly sequential and nonstandardized data representation schemas were replaced by relational database management (RDBM) systems. These systems made it possible to improve the capture and storage of data, as well as the relationships between organizational data fields, while significantly reducing the replication of information. The need for RDBM and ERP systems emerged when data integrity and consistency became an issue, significantly hindering the effectiveness of business practices. With ERP, all the data from every corner of the enterprise is collected and integrated into a consistent schema so that every part of the organization has access to a single version of the truth when and where needed. In addition to the emergence of ERP systems—or perhaps because of these systems—business reporting became an on-demand, as-needed business practice. Decision makers could decide when they needed to or wanted to create specialized reports to investigate organizational problems and opportunities.

In the 1990s, the need for more versatile reporting led to the development of executive information systems (decision support systems designed and developed specifically for executives and their decision-making needs). These systems were designed as graphical dashboards and scorecards so that they could serve as visually appealing displays while focusing on the most important factors for decision makers to keep track of—the key performance indicators. In order to make this highly versatile reporting possible while keeping the transactional integrity of the business information systems intact, it was necessary to create a middle data tier—known as a data warehouse (DW)—as a repository to specifically support business reporting and decision making. In a very short time, most large to medium-size businesses adopted data warehousing as their platform for enterprise-wide decision making. The dashboards and scorecards got their data from a data warehouse, and by doing so, they did not hinder the efficiency of the business transaction systems—the ERP systems.

In the 2000s, the DW-driven decision support systems began to be called business intelligence systems. As the amount of longitudinal data accumulated in DWs increased, so did the capabilities of hardware and software to keep up with the rapidly changing and evolving needs of decision makers. Because of the globalized competitive marketplace, decision makers needed current information in a very digestible format to address business problems and to take advantage of market opportunities in a timely manner. Because the data in a DW is updated periodically, it does not reflect the latest information. In order to alleviate this information latency problem, DW vendors developed systems to update the data more frequently, which led to the terms real-time data warehousing and, more realistically, right-time data warehousing, which differs from the former by adopting a data-refreshing policy based on the needed freshness of the data items (i.e., not all data items need to be refreshed in real time). Because data warehouses are very large and feature rich, it became necessary to “mine” the corporate data to “discover” new and useful knowledge nuggets to improve business processes and practices—hence the terms data mining and text mining. With the increasing volumes and varieties of data, needs for more storage and more processing power emerged. While large corporations had the means to tackle this problem, small to medium-size companies needed financially more manageable business models. This need led to service-oriented architecture and software- and infrastructure-as-a-service analytics business models. Smaller companies therefore gained access to analytics capabilities on an as-needed basis and paid only for what they used, as opposed to investing in financially prohibitive hardware and software resources.

In the 2010s we are seeing yet another paradigm shift in the way that data is captured and used. Largely because of the widespread use of the Internet, new data-generation mediums have emerged. Of all the new data sources (e.g., RFID tags, digital energy meters, clickstream Web logs, smart home devices, wearable health monitoring equipment), perhaps the most interesting and challenging is social networking/social media. This unstructured data is rich in information content, but analysis of such data sources poses significant challenges to computational systems, from both software and hardware perspectives. Recently, the term Big Data has been coined to highlight the challenges that these new data streams have brought upon us. Many advancements in both hardware (e.g., massively parallel processing with very large computational memory and highly parallel multiprocessor computing systems) and software/algorithms (e.g., Hadoop with MapReduce and NoSQL) have been developed to address the challenges of Big Data.
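
As a rough illustration of the MapReduce idea mentioned above (and emphatically not Hadoop itself), the word-count sketch below runs the map, shuffle, and reduce steps on a single machine; in a real cluster, the mappers and reducers would run in parallel across many nodes.

    from collections import defaultdict

    documents = [
        "big data brings big challenges",
        "analytics turns big data into insight",
    ]

    # Map step: emit (word, 1) pairs from each document independently.
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle step: group all emitted values by key.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)

    # Reduce step: aggregate the values for each key.
    word_counts = {word: sum(counts) for word, counts in grouped.items()}
    print(word_counts)  # e.g., {'big': 3, 'data': 2, ...}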

It’s hard to predict what the next decade will bring and what the new analytics-related terms will be. The time between new paradigm shifts in information systems, and particularly in analytics, has been shrinking, and this trend will continue for the foreseeable future. Even though analytics is not new, the explosion in its popularity is very new. Thanks to the recent explosion in Big Data, ways to collect and store this data, and intuitive software tools, data and data-driven insight are more accessible to business professionals than ever before. Therefore, in the midst of global competition, there is a huge opportunity to use data and analytics to make better managerial decisions that increase revenue while decreasing costs—by building better products, improving the customer experience, catching fraud before it happens, and improving customer engagement through targeting and customization. More and more companies are now preparing their employees with the know-how of business analytics to drive effectiveness and efficiency in their day-to-day decision-making processes.

A Simple Taxonomy for Analytics


Because of the multitude of factors related to both the need to make better and faster decisions and the availability and affordability of hardware and software technologies, analytics is gaining popularity faster than any other trend we have seen in recent history. Will this upward exponential trend continue? Many industry experts think it will, at least for the foreseeable future. Some of the most respected consulting companies are projecting that analytics will grow at three times the rate of other business segments in upcoming years; they have also named analytics as one of the top business trends of this decade (Robinson et al., 2010). As interest in and adoption of analytics have grown rapidly, a need to characterize analytics into a simple taxonomy has emerged. The top consulting companies (e.g., Accenture, Gartner, and IDT) and several technologically oriented academic institutions have embarked on a mission to create a simple taxonomy for analytics. Such a taxonomy, if developed properly and adopted universally, could create a contextual description of analytics, thereby facilitating a common understanding of what analytics is, including what is included in analytics and how analytics-related terms (e.g., business intelligence, predictive modeling, data mining) relate to each other. One of the organizations involved in this challenge is INFORMS (the Institute for Operations Research and the Management Sciences). In order to reach a wide audience, INFORMS hired Capgemini, a strategic management consulting firm, to carry out a study and characterize analytics.

The Capgemini study produced a concise definition of analytics: “Analytics facilitates realization of business objectives through reporting of data to analyze trends, creating predictive models for forecasting and optimizing business processes for enhanced performance.” As this definition implies, one of the key findings from the study is that executives see analytics as a core function of businesses that use it. It spans many departments and functions within organizations, and in mature organizations, it spans the entire business. The study identified three hierarchical but sometimes overlapping groupings for analytics categories: descriptive, predictive, and prescriptive analytics. These three groups are hierarchical in terms of the level of analytics maturity of the organization. Most organizations start with descriptive analytics, then move into predictive analytics, and finally reach prescriptive analytics, the top level in the analytics hierarchy. Even though these three groupings of analytics are hierarchical in complexity and sophistication, moving from a lower level to a higher level is not clearly separable. That is, a business can be at the descriptive analytics level while at the same time using predictive and even prescriptive analytics capabilities, in a somewhat piecemeal fashion. Therefore, moving from one level to the next essentially means that the maturity at one level is complete and the next level is being widely exploited. Figure 1.3 shows a graphical depiction of the simple taxonomy developed by INFORMS and widely adopted by most industry leaders as well as academic institutions.

Figure 1.3 A Simple Taxonomy for Analytics

Descriptive analytics is the entry level in the analytics taxonomy. It is often called business reporting because most of the analytics activities at this level deal with creating reports that summarize business activities in order to answer questions such as “What happened?” and “What is happening?” The spectrum of these reports includes static snapshots of business transactions delivered to knowledge workers (i.e., decision makers) on a fixed schedule (e.g., daily, weekly, quarterly); dynamic views of business performance indicators delivered to managers and executives in an easily digestible form—often in a dashboard-like graphical interface—in a continuous manner; and ad hoc reporting, where the decision maker is given the capability of creating his or her own specific report (using an intuitive drag-and-drop graphical user interface) to address a specific or unique decision situation.
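
A minimal sketch of such a “What happened?” summary report is shown below, using pandas to aggregate a handful of invented sales transactions; a production report would draw from a data warehouse rather than an inline table.

    import pandas as pd

    # Hypothetical daily sales transactions.
    sales = pd.DataFrame({
        "region":  ["East", "East", "West", "West", "West"],
        "product": ["A", "B", "A", "A", "B"],
        "revenue": [1200.0, 800.0, 950.0, 1100.0, 700.0],
    })

    # A static "What happened?" snapshot: revenue by region and product.
    report = (
        sales.groupby(["region", "product"])["revenue"]
             .agg(["sum", "mean", "count"])
             .rename(columns={"sum": "total", "mean": "avg", "count": "orders"})
    )
    print(report)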

Descriptive analytics is also called business intelligence (BI), and predictive and prescriptive analytics are collectively called advanced analytics. The logic here is that moving from descriptive to predictive and/or prescriptive analytics is a significant shift in the level of sophistication and therefore warrants the label advanced. BI has been one of the most popular technology trends for information systems designed to support managerial decision making since the start of the century. It was popular (and, to some extent, still is in some business circles) until the arrival of the analytics wave. BI is the entrance to the world of analytics, setting the stage and paving the way toward more sophisticated decision analysis. Descriptive analytics systems usually work off a data warehouse, which is a large database specifically designed and developed to support BI functions and tools.

Predictive analytics comes right after descriptive analytics in the three-level analytics hierarchy. Organizations that are mature in descriptive analytics move to this level, where they look beyond what happened and try to answer the question “What will happen?” In the following chapters, we will cover the predictive capabilities of these analytics techniques in depth as part of data mining; here we provide only a very short description of the main predictive analytics classes. Prediction is essentially the process of making intelligent/scientific estimates about the future values of some variables, such as customer demand, interest rates, and stock market movements. If what is being predicted is a categorical variable, the act of prediction is called classification; otherwise, it is called regression. If the predicted variable is time dependent, the prediction process is often called time-series forecasting.
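
The three prediction classes can be illustrated with a short sketch. The data below is synthetic, and the choice of scikit-learn linear models is ours, purely for illustration; any classifier, regressor, or forecasting method would serve.

    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                 # two predictor variables

    # Classification: the target is categorical (e.g., will the customer churn?).
    y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
    clf = LogisticRegression().fit(X, y_class)
    print("predicted class:", clf.predict([[0.5, -0.2]]))

    # Regression: the target is numeric (e.g., how large will demand be?).
    y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
    reg = LinearRegression().fit(X, y_reg)
    print("predicted value:", reg.predict([[0.5, -0.2]]))

    # Time-series forecasting: predict the next value from lagged values.
    series = np.sin(np.arange(60) / 5.0)
    lags = np.column_stack([series[:-2], series[1:-1]])  # y_{t-2}, y_{t-1}
    ts = LinearRegression().fit(lags, series[2:])
    print("one-step forecast:", ts.predict([[series[-2], series[-1]]]))
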
Prescriptive analytics is the highest echelon in the analytics hierarchy. It is where the best alternative among many courses of action—usually created/identified by predictive and/or descriptive analytics—is determined using sophisticated mathematical models. Therefore, in a sense, this type of analytics tries to answer the question “What should I do?” Prescriptive analytics uses optimization-, simulation-, and heuristics-based decision-modeling techniques. Even though prescriptive analytics is at the top of the analytics hierarchy, the methods behind it are not new. Most of the optimization and simulation models that constitute prescriptive analytics were developed during and right after World War II, when there was a dire need to do a lot with limited resources. Since then, some businesses have used these models for some very specific problem types, including yield/revenue management, transportation modeling, and scheduling. The new taxonomy of analytics has made them popular again, opening their use to a wide array of business problems and situations.
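
In that spirit, here is a toy prescriptive model: a two-product resource-allocation problem solved as a linear program with SciPy. The profit figures and resource limits are invented; real prescriptive applications involve far larger models.

    from scipy.optimize import linprog

    # Maximize profit 40*x1 + 30*x2; linprog minimizes, so negate the objective.
    c = [-40.0, -30.0]

    # Machine hours: 2*x1 + 1*x2 <= 100; labor hours: 1*x1 + 2*x2 <= 80.
    A_ub = [[2.0, 1.0],
            [1.0, 2.0]]
    b_ub = [100.0, 80.0]

    result = linprog(c, A_ub=A_ub, b_ub=b_ub,
                     bounds=[(0, None), (0, None)], method="highs")
    print("optimal production plan:", result.x)  # units of each product
    print("maximum profit:", -result.fun)        # 2200 at x1=40, x2=20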

Figure 1.4 shows a tabular representation of the three hierarchical levels of analytics, along with the questions answered and techniques used at each level. As can be seen, data mining is the key enabler of predictive analytics.

Figure 1.4 Three Levels of Analytics and Their Enabling Techniques

Business analytics is gaining popularity because it promises to provide decision makers with the information and knowledge they need to succeed. The effectiveness of business analytics systems, no matter the level in the analytics hierarchy, depends largely on the quality and quantity of the data (volume and representational richness); the accuracy, integrity, and timeliness of the data management system; and the capabilities and sophistication of the analytical tools and procedures used in the process. Understanding the analytics taxonomy helps organizations to be smart about selecting and implementing analytics capabilities to efficiently navigate through the maturity continuum.

The Cutting Edge of Analytics: IBM Watson


IBM Watson is perhaps the smartest computer system built to date. Since the emergence of computers and subsequently artificial intelligence in the late 1940s, scientists have compared the performance of these “smart” machines with human minds. Accordingly, in the mid- to late 1990s, IBM researchers built a smart machine and used the game of chess (generally credited as the game of smart humans) to test it against the best of human players. On May 11, 1997, an IBM computer called Deep Blue beat the reigning world chess champion in a six-game match: two wins for Deep Blue, one for the champion, and three draws. The match lasted several days and received massive media coverage around the world. It was the classic plot line of human versus machine. Beyond the chess contest, the intention of developing this kind of computer intelligence was to make computers able to handle the kinds of complex calculations needed to help discover new medical drugs, do the broad financial modeling needed to identify trends and do risk analysis, handle large database searches, and perform massive calculations needed in advanced fields of science.

A little over a decade later, IBM researchers came up with another idea that was perhaps more challenging: a machine that could not only play Jeopardy! but beat the best of the best. Compared to chess, Jeopardy! is much more challenging. While chess is well structured and has very simple rules, and is therefore a very good match for computer processing, Jeopardy! is neither simple nor structured. Jeopardy! is a game designed for human intelligence and creativity, and therefore a computer designed to play it needed to be a cognitive computing system that could work and think like a human. Making sense of the imprecision inherent in human language was the key to success.

In 2010 an IBM research team developed Watson, an extraordinary computer system—a novel combination of advanced hardware and software—designed to answer questions posed in natural human language. The team built Watson as part of the DeepQA project and named it after IBM’s first president, Thomas J. Watson. The team that built Watson was looking for a major research challenge: one that could rival the scientific and popular interest of Deep Blue and would also have clear relevance to IBM’s business interests. The goal was to advance computational science by exploring new ways for computer technology to affect science, business, and society at large. Accordingly, IBM Research undertook a challenge to build Watson as a computer system that could compete at the human champion level in real time on the American TV quiz show Jeopardy! The team wanted to create a real-time automatic contestant on the show, capable of listening, understanding, and responding—not merely a laboratory exercise.

Competing Against the Best at Jeopardy!


In 2011, as a test of its abilities, Watson competed on the quiz show Jeopardy! in the first-ever human-versus-machine matchup for the show. In a two-game, combined-point match (broadcast in three Jeopardy! episodes during February 14–16), Watson beat Brad Rutter, the biggest all-time money winner on Jeopardy!, and Ken Jennings, the record holder for the longest championship streak (75 days). In these episodes, Watson consistently outperformed its human opponents on the game’s signaling device, but it had trouble responding to a few categories, notably those having short clues containing only a few words. Watson had access to 200 million pages of structured and unstructured content, consuming 4 terabytes of disk storage. During the game, Watson was not connected to the Internet.

Meeting the Jeopardy! challenge required advancing and incorporating a variety of text mining and natural language processing technologies, including parsing, question classification, question decomposition, automatic source acquisition and evaluation, entity and relationship detection, logical form generation, and knowledge representation and reasoning. Winning at Jeopardy! requires accurately computing confidence in answers. The questions and content are ambiguous and noisy, and none of the individual algorithms are perfect. Therefore, each component must produce a confidence in its output, and individual component confidences must be combined to compute the overall confidence of the final answer. The final confidence is used to determine whether the computer system should risk choosing to answer at all. In Jeopardy! parlance, this confidence is used to determine whether the computer will “ring in” or “buzz in” for a question. The confidence must be computed during the time the question is read and before the opportunity to buzz in. This is roughly between one and six seconds, with an average of around three seconds.
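
The actual DeepQA confidence model is far more elaborate, but the essential idea, combining per-component evidence scores into one calibrated confidence and comparing it against a buzz-in threshold, can be sketched as follows. All scores, weights, and the threshold here are invented for illustration.

    import math

    def combined_confidence(scores, weights, bias):
        """Combine per-component evidence scores into one overall
        confidence via a logistic model (weights learned offline)."""
        z = bias + sum(w * s for w, s in zip(weights, scores))
        return 1.0 / (1.0 + math.exp(-z))

    # Hypothetical evidence scores for one candidate answer, each in [0, 1]:
    # passage match, type match, source reliability, temporal consistency.
    scores = [0.92, 0.75, 0.88, 0.60]
    weights = [2.1, 1.4, 1.7, 0.8]   # invented; would be learned from data
    bias = -3.5

    confidence = combined_confidence(scores, weights, bias)
    BUZZ_THRESHOLD = 0.50            # risk tolerance for ringing in

    print(f"confidence = {confidence:.2f}")
    print("buzz in" if confidence > BUZZ_THRESHOLD else "stay silent")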

How Does Watson Do It?


The system behind Watson, which is called DeepQA, is a massively parallel, text mining–focused, probabilistic evidence-based computational architecture. For the Jeopardy! challenge, Watson used more than 100 different techniques for analyzing natural language, identifying sources, finding and generating hypotheses, finding and scoring evidence, and merging and ranking hypotheses. What is far more important than any particular technique the IBM team used was how it combined them in DeepQA such that overlapping approaches could bring their strengths to bear and contribute to improvements in accuracy, confidence, and speed.

DeepQA is an architecture with an accompanying methodology that is not specific to the Jeopardy! challenge. These are the overarching principles in DeepQA:

Massive parallelism. Watson needed to exploit massive parallelism in the consideration of multiple interpretations and hypotheses.
Many experts. Watson needed to be able to integrate, apply, and contextually evaluate a wide range of loosely coupled probabilistic question and content analytics.
Pervasive confidence estimation. No component of Watson commits to an answer; all components produce features and associated confidences, scoring different question and content interpretations. An underlying confidence-processing substrate learns how to stack and combine the scores.
Integration of shallow and deep knowledge. Watson needed to balance the use of strict semantics and shallow semantics, leveraging many loosely formed ontologies.

Figure 1.5 illustrates the DeepQA architecture at a very high level. More technical details about the various architectural components and their specific roles and capabilities can be found in Ferrucci et al. (2010).

Figure 1.5 A High-Level Depiction of the DeepQA Architecture

What Is the Future for Watson?


The Jeopardy! challenge helped IBM address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, as well as a significant R&D budget, Watson managed to perform at human expert levels in terms of precision, confidence, and speed on the Jeopardy! quiz show.

After the show, the big question was “So what now?” Was developing Watson all for a quiz show? Absolutely not! Showing the rest of the world what Watson (and the cognitive system behind it) could do became an inspiration for the next generation of intelligent information systems. For IBM, it was a demonstration of what is possible with cutting-edge analytics and computational sciences. The message is clear: If a smart machine can beat the best of the best humans at what they are best at, think about what it can do for your organizational problems. The first industry to utilize Watson was health care, followed by security, finance, retail, education, public services, and research. The following sections provide short descriptions of what Watson can do (and, in many cases, is doing) for these industries.

Health Care
The challenges that health care is facing today are rather big and multifaceted. With the aging U.S. population, which may be partially attributed to better living conditions and advanced medical discoveries fueled by a variety of technological innovations, demand for health care services is increasing faster than the supply of resources. As we all know, when there is an imbalance between demand and supply, prices go up and quality suffers. Therefore, we need cognitive systems like Watson to help decision makers optimize the use of their resources, in both clinical and managerial settings.

According to health care experts, only 20% of the knowledge physicians use to diagnose and treat patients is evidence based. Considering that the amount of medical information available is doubling every five years and that much of this data is unstructured, physicians simply don’t have time to read every journal that can help them keep up-to-date with the latest advances. Given the growing demand for services and the complexity of medical decision making, how can health care providers address these problems? The answer could be to use Watson, or some other cognitive system like it, that has the ability to help physicians diagnose and treat patients by analyzing large amounts of data—both structured data coming from electronic medical record databases and unstructured text coming from physician notes and published literature—to provide evidence for faster and better decision making. First, the physician and the patient can describe symptoms and other related factors to the system in natural language. Watson can then identify the key pieces of information and mine the patient’s data to find relevant facts about family history, current medications, and other existing conditions. It can then combine that information with current findings from tests, and then it can form and test hypotheses for potential diagnoses by examining a variety of data sources—treatment guidelines, electronic medical record data, doctors’ and nurses’ notes, and peer-reviewed research and clinical studies. Next, Watson can suggest potential diagnostic and treatment options, with a confidence rating for each suggestion.

Watson also has the potential to transform health care by intelligently synthesizing fragmented research findings published in a variety of outlets. It can dramatically change the way medical students learn. It can help health care managers be proactive about upcoming demand patterns, optimally allocate resources, and improve the processing of payments. Early examples of leading health care providers that use Watson-like cognitive systems include MD Anderson, Cleveland Clinic, and Memorial Sloan Kettering.

Security

As the Internet expands into every facet of our lives—ecommerce, ebusiness, smart grids for energy, smart homes for remote control of residential gadgets and appliances—to make things easier to manage, it also opens up the potential for ill-intended people to intrude into our lives. We need smart systems like Watson that are capable of constantly monitoring for abnormal behavior and, when it is identified, preventing people from accessing our lives and harming us. This could be at the corporate or even national security system level; it could also be at the personal level. Such a smart system could learn who we are and become a digital guardian that could make inferences about activities related to our lives and alert us whenever abnormal things happen.

Finance

The financial services industry faces complex challenges. Regulatory measures, as well as social and governmental pressures for financial institutions to be more inclusive, have increased. And the customers the industry serves are more empowered, demanding, and sophisticated than ever before. With so much financial information generated each day, it is difficult to properly harness the right information to act on. Perhaps the solution is to create smarter client engagement by better understanding risk profiles and the operating environment. Major financial institutions are already working with Watson to infuse intelligence into their business processes. Watson is tackling data-intensive challenges across the financial services sector, including banking, financial planning, and investing.

Retail
The retail industry is rapidly changing with the changing needs and wants of customers. Customers, empowered by mobile devices and social networks that give them easier access to more information faster than ever before, have high expectations for products and services. While retailers are using analytics to keep up with those expectations, their bigger challenge is efficiently and effectively analyzing the growing mountain of real-time insights that could give them a competitive advantage. Watson’s cognitive computing capabilities for analyzing massive amounts of unstructured data can help retailers reinvent their decision-making processes around pricing, purchasing, distribution, and staffing. Because of Watson’s ability to understand and answer questions in natural language, it is an effective and scalable solution for analyzing and responding to social sentiment based on data obtained from social interactions, blogs, and customer reviews.

Education

With the rapidly changing characteristics of students—more visually oriented/stimulated, constantly connected to social media and social networks, with increasingly shorter attention spans—what should the future of education and the classroom look like? The next generation of educational systems should be tailored to fit the needs of the new generation, with customized learning plans, personalized textbooks (digital ones with integrated multimedia—audio, video, animated graphs/charts, etc.), dynamically adjusted curricula, and perhaps smart digital tutors and 24/7 personal advisors. Watson seems to have what it takes to make all this happen. With its natural language processing capability, students can converse with it just as they do with their teachers, advisors, and friends. This smart assistant can answer students’ questions, satisfy their curiosity, and help them keep up with the demands of their educational journey.

Government

For local, regional, and national governments, the exponential rise of Big Data presents an enormous dilemma. Today’s citizens are more informed and empowered than ever before, and that means they have high expectations for the value of the public sector serving them. And government organizations can now gather enormous volumes of unstructured, unverified data that could serve their citizens—but only if that data can be analyzed efficiently and effectively. IBM Watson’s cognitive computing may help make sense of this data deluge, speeding governments’ decision-making processes and helping public employees focus on innovation and discovery.

Research

Every year, hundreds of billions of dollars are spent on research and development, most of it documented in patents and publications, creating an enormous amount of unstructured data. To contribute to the extant knowledge base, one needs to sift through these data sources to find the outer boundaries of research in a particular field. This is very difficult, if not impossible, work if it is done with traditional means, but Watson can act as a research assistant to help collect and synthesize information to keep people updated on recent findings and insights. For instance, the New York Genome Center is using the IBM Watson cognitive computing system to analyze the genomic data of patients diagnosed with a highly aggressive and malignant brain cancer and to more rapidly deliver personalized, life-saving treatment to patients with this disease (Royyuru, 2014).

References

Bi, R. (2014). "When Watson Meets Machine Learning." [Link]/2014/07/watson-meets-[Link] (accessed June 2014).

DeepQA. (2011). "DeepQA Project: FAQ." IBM Corporation. [Link]/deepqa/[Link] (accessed April 2014).

Feldman, S., J. Hanover, C. Burghard, & D. Schubmehl. (2012). "Unlocking the Power of Unstructured Data." [Link]/software/ebusiness/jstart/downloads/[Link] (accessed May 2014).

Ferrucci, D., et al. (2010). "Building Watson: An Overview of the DeepQA Project." AI Magazine, 31(3).

IBM. (2014). "Implement Watson." [Link]/smarterplanet/us/en/ibmwatson/[Link] (accessed July 2014).

Liberatore, M., & W. Luo. (2011). "INFORMS and the Analytics Movement: The View of the Membership." Interfaces, 41(6): 578–589.

Robinson, A., J. Levis, & G. Bennett. (2010, October). "INFORMS to Officially Join Analytics Movement." OR/MS Today.

Royyuru, A. (2014). "IBM's Watson Takes on Brain Cancer: Analyzing Genomes to Accelerate and Help Clinicians Personalize Treatments." Thomas J. Watson Research Center. [Link]/articles/[Link] (accessed September 2014).
