Global Consulting Practice (GCP)
Big Data Point of View
GCP Information Management
INTERNAL & CONFIDENTIAL
October 30, 2012
Copyright 2012 Tata Consultancy Services Limited
Why Big Data?
Explosion of Big Data: Social Media, Sensor Data, Video Feeds, Audio Clips, Images, News Feeds, Log Files
Emergence of Big Data Platforms: Google, Amazon, Yahoo, eBay, Apple, Hadoop, Map/Reduce
Maturation of Analytic Tools (Advanced A.I.): Listening, Text Mining, Machine Learning, Automated Reasoning, Artificial Intelligence
Diagram labels around Big Data: Digital Expansion, Social Explosion, Mobility/Location, Cloud Computing
Explosion of Information plus Multiple Innovations are creating a Perfect Storm
Document Name TCS Confidential
Leveraging Big Data: The New Challenge
Big Data: Web Scale
50 billion web pages
800 million Facebook users
1000 million Facebook pages
200 million Twitter accounts
100 million tweets per day
5 billion Google queries per day
Millions of servers, petabytes of data
Digital Expansion
Social Explosion
Big Data
Varieties of Data
Video / Audio Images / Pictures Diverse internal and external data
Mobility/ Location
Sources of Data
Cloud Computing
News / Feeds / Blogs / forums Groups / Polls / Chats / Wiki
Information is exploding all around, but the challenge is to understand it
The Net Generation is Here
The Net Generation is interconnected on a variety of Web-based and digital channels: Facebook, Twitter, Google, YouTube, LinkedIn, Wikipedia, blogs, forums, groups.
This is changing the rules of Customer engagement
The Voice of the Customer must be heard
Listening to the voice of the customer (VoC) has acquired new meaning in the wake of Social Media
Product Innovation: Identify new value-added service ideas; accelerated new product introductions; improved new product adoption rates
Sales and Marketing: Increased sales; improved lead conversion rates; reduced sales and marketing expense
Customer Acquisition: Acquire new customers; grow share-of-wallet from existing customers
Customer Retention: Retained customers; improved customer responsiveness and service levels; improved customer satisfaction
Customer Service: Higher customer satisfaction; faster implementation of service improvements; reduced customer service expense
Brand Reputation: Proactively manage brand risk; identify areas where damage control is required
TCS Point of View # 1
POV: Big Data is here to stay and is going to be an increasingly relevant arena of competitive differentiation.
Rationale: Given the information explosion going on all around, and the stream of innovations happening at the same time, Big Data is going to be very important. Organizations that learn how to harness Big Data and harvest useful information and insight from it will create competitive advantage for themselves. Their customers will see them as keeping up with the march of technology. Organizations that do not keep current will appear to be behind the times, and therefore not competitive.
Implication: Most organizations will invest resources and time to uncover use case scenarios for Big Data in various business processes, and deploy Big Data platforms to harness and harvest useful insight from Big Data. While the sources of data relevant to a given business scenario may vary from use case to use case within an organization, and from one industry vertical to another, the techniques for harnessing Big Data and harvesting useful insight will be adopted nearly universally.
Big Data: The New Frontier
VELOCITY: Worldwide digital content will double in 18 months, and every 18 months thereafter. (IDC)
VOLUME: In 2005, humankind created 150 exabytes of information. In 2011, 1,200 exabytes were created. (The Economist)
VARIETY: 80% of enterprise data will be unstructured, spanning traditional and non-traditional sources. (Gartner)
Diagram labels: Processing, Opportunities, Mobile, Emails, GPS, CRM Data, Planning, Tweets, Inventory, Demand, Instant Messages, Speed, Sales Orders, Velocity, Customer, Things, Service Calls, Transactions
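The growth figures quoted on this slide can be checked with simple arithmetic. The sketch below is an illustration only: the function names are hypothetical, and the 2014 value is a mechanical extrapolation of the quoted 18-month doubling, not a forecast from any of the cited sources.

```python
import math

def implied_doubling_months(start_amount, end_amount, elapsed_months):
    """Doubling time implied by growth from start_amount to end_amount."""
    doublings = math.log2(end_amount / start_amount)
    return elapsed_months / doublings

def projected_amount(start_amount, elapsed_months, doubling_months):
    """Amount after elapsed_months if it doubles every doubling_months."""
    return start_amount * 2 ** (elapsed_months / doubling_months)

# Volume figures quoted on the slide: 150 EB in 2005 and 1,200 EB in 2011,
# i.e. 72 months apart.
volume_doubling = implied_doubling_months(150, 1200, 72)

# Illustration only: apply the quoted 18-month doubling to the 2011 figure
# for a further 36 months.
projection_2014 = projected_amount(1200, 36, 18)

print(f"Implied doubling time 2005-2011: {volume_doubling:.0f} months")
print(f"Extrapolated 2014 volume: {projection_2014:.0f} EB")
```

The 2005 to 2011 figures imply eight-fold growth, i.e. three doublings in six years, which is somewhat slower than the 18-month doubling quoted for the period ahead.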
Big Data Management and Interpretation
[Diagram: Data Management Services and Analytics Services, applied across Structured and Unstructured data from Internal and External sources]
TCS Point of View # 2
POV: There are two fundamental aspects to Big Data: the harnessing aspect, i.e. the technology required to manage Big Data, and the harvesting aspect, i.e. the technology required to analyze and derive insight from Big Data.
Rationale: Given its volume, variety and velocity, Big Data is not amenable to being managed by traditional technologies. Harnessing it requires a new class of Big Data platforms, e.g. the Hadoop ecosystem, the Map/Reduce algorithm and technologies built on top of them. At the same time, analyzing Big Data with a view to harvesting useful nuggets of insight from a variety of sources requires completely different technologies as well. These two domains of technology are complementary, i.e. two sides of the Big Data coin.
Implication: Both technology domains need to be deployed for Big Data to be useful. Correspondingly, both the skills required to harness and manage Big Data and the skills required to analyze and interpret it are necessary. However, they are generally different skills. Harnessing Big Data requires a purely technological orientation, while harvesting insights from Big Data requires a more comprehensive business context, i.e. the business problem we are trying to solve, the metrics we are trying to impact, etc.
Big Data Technology is Here Now
Hadoop : Massively Parallel Processing Capability, running on commodity hardware
Big Data Technology handles data at extreme scale and is characterized by:
Massively parallel computing to divide and conquer workloads
Extreme flexibility to allow unlimited data manipulation and transformation
Massive scalability in terms of both technology and cost
HBase and Hadoop/HDFS are designed to store and manage massive amounts of data
Hive, Mahout and R enable query, analysis, and running in-memory, compute-intensive applications
The ecosystem of Big Data Technology is affordable, and within the reach of companies
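The divide-and-conquer idea behind Map/Reduce can be sketched in a few lines. This is a single-process illustration in plain Python, not Hadoop code: the function names and the sample input splits are hypothetical, and on a real cluster the map and reduce phases would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    """Run map, shuffle and reduce over a list of input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))

# Hypothetical input splits; each would be processed by a separate mapper.
splits = ["big data is here", "big data platforms", "data is exploding"]
counts = word_count(splits)
print(counts)
```

Because each split is mapped independently and each key is reduced independently, the same logic can be spread over as many machines as there are splits and keys.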
What Does a Big Data Platform Do?
TCS Point of View # 3
POV: Big Data technology platforms built around the Hadoop ecosystem, using the Map/Reduce algorithms, can be used to solve many traditional problems, i.e. problems not involving Big Data per se.
Rationale: The Hadoop and Map/Reduce based frameworks represent a paradigm shift in data processing capabilities. While they originated in the context of handling Big Data at companies such as Google, Yahoo, Amazon etc., they can be used to handle many traditional data processing contexts as well. One example is the use of the Hadoop platform as an ETL toolset working exclusively with traditional structured, transactional and master data. Thus the Big Data technology platform has uses in contexts such as ETL, DWH, MDM, Analytics etc.
Implication: Organizations which are experiencing extremely high workloads in traditional Data Warehousing and Analytics contexts are likely to experiment with Big Data technologies for solving traditional data processing problems. In fact, many benefits, ranging from significant performance improvements, improved total cost of ownership, increased throughput of processing activity, and improved availability of data to end users, can be generated from deploying Big Data platforms without the incorporation of Big Data sources.
Hadoop as Transformation Platform in ETL
Within Hadoop Ecosystem Transactional Systems
MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS).
Data Warehouse
MapReduce / Hive /Pig HDFS
Hadoop Cluster
Fewer, higher-end nodes
Tools like SQOOP could be leveraged to load data from and to HDFS
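The transformation step of such an ETL flow can be sketched as a record-by-record cleaning function, which is the kind of logic a map-only job would apply to each input record. This is a plain Python illustration under stated assumptions: the column names, sample rows and helper names are hypothetical, and the sketch does not use Hadoop, Hive, Pig or Sqoop.

```python
import csv
import io

# Hypothetical extract from a transactional system, as it might land in HDFS.
RAW_EXTRACT = """order_id,customer,amount,currency
1001, alice ,120.50,usd
1002,BOB,not-a-number,usd
1003,carol,75,USD
"""

def transform_row(row):
    """Clean one record; return None when it cannot be repaired."""
    try:
        amount = round(float(row["amount"]), 2)
    except ValueError:
        return None
    return {
        "order_id": int(row["order_id"]),
        "customer": row["customer"].strip().title(),
        "amount": amount,
        "currency": row["currency"].strip().upper(),
    }

def run_etl(raw_text):
    """Parse the extract, transform each row, and count rejected rows."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned, rejected = [], 0
    for row in reader:
        result = transform_row(row)
        if result is None:
            rejected += 1
        else:
            cleaned.append(result)
    return cleaned, rejected

cleaned_rows, rejected_count = run_etl(RAW_EXTRACT)
print(cleaned_rows, rejected_count)
```

Since each row is transformed independently of every other row, this step parallelizes naturally across the nodes of a cluster.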
Hadoop complements Data Warehouse
Data-Mart on Hadoop (to store more granular data) Transactional Systems
MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS) and create aggregates, which could then be moved to aggregate-level data marts
MapReduce / Hive /Pig HDFS
Data Warehouse Data Marts at Aggregate Levels
Hadoop Cluster
Higher number of nodes for larger storage
Tools like SQOOP could be leveraged to load data from and to HDFS
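Rolling granular records up into an aggregate-level data mart can be sketched as a group-by over a composite key. The example below is a plain Python illustration, assuming hypothetical field names and sample transactions; in practice the same roll-up would typically be expressed as a Hive or Pig query over data in HDFS.

```python
from collections import defaultdict

# Hypothetical granular transactions kept on the Hadoop cluster.
transactions = [
    {"day": "2012-10-01", "store": "S1", "amount": 40.0},
    {"day": "2012-10-01", "store": "S1", "amount": 60.0},
    {"day": "2012-10-01", "store": "S2", "amount": 25.0},
    {"day": "2012-10-02", "store": "S1", "amount": 10.0},
]

def aggregate_daily_sales(rows):
    """Roll granular rows up to one aggregate row per (day, store)."""
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for row in rows:
        key = (row["day"], row["store"])
        totals[key]["order_count"] += 1
        totals[key]["total_amount"] += row["amount"]
    return dict(totals)

# The aggregate rows are what would be loaded into the data mart.
daily_mart = aggregate_daily_sales(transactions)
for key, summary in sorted(daily_mart.items()):
    print(key, summary)
```

The granular rows stay on the cluster for detailed analysis, while only the much smaller aggregate table needs to move to the warehouse.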
Hadoop as an ad-hoc analysis platform
Transactional Systems
MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS). This could provide the business analytics team with a platform for innovation.
Data Warehouse
MapReduce / Hive /Pig HDFS
Hadoop Cluster
Higher number of nodes for larger storage
Tools like SQOOP could be leveraged to load data from and to HDFS
TCS Point of View # 4
POV: The Big Data technology and product landscape is quite vast and varied right now. There are hundreds of products and offerings. Consolidation of products and offerings will be natural over the next 2-5 years.
Rationale: The basic Hadoop and Map/Reduce technologies which are at the heart of all Big Data technology platforms are available in three forms, i.e. open source, proprietary and hybrid. Open source technologies can be deployed as they are, and many companies are choosing to do this. However, they will raise issues of security, privacy, robustness of management etc. Niche players are relatively new and will get consolidated in the course of time. The major technology vendors such as IBM, HP, Oracle, Teradata, Informatica etc. will complement, fill gaps in, and improve their offerings.
Implication: It is difficult to predict at this time which technologies will survive, which will get acquired and consolidated, and which will simply die. Companies which are committed to the open source idea and wish to exploit this technology may invest in it directly and build skills in this area. On the other hand, companies which are committed to vendors such as IBM or Teradata may weigh the costs versus benefits of going with pure open source, or buy into a hybrid strategy where some of the capability gaps are filled by the vendors. This needs careful evaluation.
Big Data Product and Offering Landscape
Analytics / Visualization Search
CEP
No SQL
Data Integration
Tools
Data Integration
Hadoop Distributions
Appliance/ Vendor
Cloud Distributions
Pure-Play Vendors
Big Data Product Landscape
Commercial
Open Source
Hybrid
TCS Point of View # 5
POV: Unstructured data cannot be consumed as it is, in its raw form. It must be processed into useful nuggets of information, i.e. converted into a consumable structured form, before it can be interpreted and acted upon.
Rationale: Unstructured information cannot be interpreted and used by end users as it is. It must be converted into a useful form. This requires filtering a lot of noise out of the data, since Big Data tends to have a lot of noise relative to useful data. Further, the information content of Big Data streams must be interpreted in the context of other, more traditional types of information before it can be deemed useful. This requires the fusion of Big Data based information with more traditional structured information to derive useful insight.
Implication: Big Data is not a new opportunity or capability that stands on its own. It is better considered as augmenting already existing Data Management and Analytics capabilities in an organization. Big Data platforms are not replacements for existing traditional Data Management and Analytics platforms. They merely add to, mature and improve upon existing environments and capabilities. Information fusion, i.e. the ability to bring together structured and unstructured information in the context of specific business problems and opportunities, is what is needed to exploit Big Data.
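The step from raw, noisy text to a consumable structured form can be sketched as a filter followed by field extraction. This is a minimal illustration, assuming a hypothetical brand name, topic list and sample posts; real text mining would use far richer language processing than keyword matching.

```python
import re

# Hypothetical brand and topic vocabulary for the illustration.
BRAND_TERMS = {"acmephone"}
TOPIC_PATTERNS = {
    "battery": re.compile(r"\bbattery\b", re.IGNORECASE),
    "screen": re.compile(r"\bscreen\b", re.IGNORECASE),
}

def to_structured(post):
    """Return a structured record, or None when the post is noise."""
    text = post["text"]
    lowered = text.lower()
    # Filtering: drop posts that never mention the brand.
    if not any(term in lowered for term in BRAND_TERMS):
        return None
    # Structuring: reduce free text to a fixed set of fields.
    topics = sorted(
        name for name, pattern in TOPIC_PATTERNS.items() if pattern.search(text)
    )
    return {"author": post["author"], "topics": topics}

posts = [
    {"author": "u1", "text": "My AcmePhone battery dies by noon"},
    {"author": "u2", "text": "Great weather today!"},
    {"author": "u3", "text": "AcmePhone screen cracked, battery fine"},
]

records = [r for r in map(to_structured, posts) if r is not None]
print(records)
```

The output rows have a fixed schema, so they can be stored and joined like any other structured data.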
An Example - Social Intelligence
Social Intelligence, i.e. the process of generating useful knowledge from the web of social media activity, is maturing. However, the social Web is too big, moving too fast, and too full of irrelevant data trash.
Radian 6 Visible Technologies
Listening
Synthesio Attensity
Converseon SDL
Dashboards
Networked Insights
Filtering
Lithium
Analysis
Friends
Fusion
Fans
Followers
Value Network Influencers
Listen & Learn Machine Learning
News Chatter Events
Respond Alert
Listen
Learn, Focus, Filter, Reason Fuse, Connect
Analyze
This requires Information Fusion
Real Time Streams
Real-Time Business Insights and Alerts Early-Problem Detection Market Intelligence Demand Signal Refinement
EIF Framework
Marketing
Analytics
Real Time Structured Database
Big SQL
No SQL Processing
Unstructured Data (HDFS)
Integrated Customer Insights environment
Enterprise Information Fusion (EIF)
Structured Information
Unstructured Information
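Fusing the two kinds of information can be sketched as a join between structured records and signals extracted from unstructured sources. The example below is an illustration only: the CRM fields, the sentiment values, the customer ids and the at-risk rule are all hypothetical, and it assumes the mentions have already been resolved to customer ids.

```python
# Structured side: hypothetical CRM rows keyed by customer id.
crm = {
    "C1": {"name": "Alice", "segment": "premium", "open_tickets": 2},
    "C2": {"name": "Bob", "segment": "standard", "open_tickets": 0},
}

# Unstructured side after extraction: one row per social media mention.
mentions = [
    {"customer_id": "C1", "sentiment": -1},
    {"customer_id": "C1", "sentiment": -1},
    {"customer_id": "C2", "sentiment": 1},
    {"customer_id": "C9", "sentiment": -1},  # no CRM match, so not fused
]

def fuse(crm_rows, mention_rows):
    """Attach mention counts and net sentiment to each CRM profile."""
    fused = {}
    for cid, profile in crm_rows.items():
        scores = [m["sentiment"] for m in mention_rows if m["customer_id"] == cid]
        fused[cid] = {
            **profile,
            "mention_count": len(scores),
            "net_sentiment": sum(scores),
        }
    return fused

def at_risk(fused_rows):
    """Hypothetical rule: negative sentiment plus unresolved tickets."""
    return sorted(
        cid
        for cid, row in fused_rows.items()
        if row["net_sentiment"] < 0 and row["open_tickets"] > 0
    )

fused = fuse(crm, mentions)
alerts = at_risk(fused)
print(alerts)
```

Neither source alone supports the alert: the CRM data does not show sentiment, and the mentions do not show open tickets. The insight comes from combining them in a business context.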
Big Data requires connecting the dots
Marketing Public Relations Customer Service Sales Product Development Human Resources Finance
Web
Mobile
Tablets
Smartphones
Mobile Applications Mobile App Stores Mobile Web Mobile Messaging Location-based services
Website Intranet Partner Portals SEO SEM Online Advertising Web presence Micro-sites ecommerce
Partner Portals
Big Data
Traditional Channels
Social Network Applications Social Search Engine Optimization Community management Social Media Expansion Social Business Initiatives Crowd sourcing
Call Center RFID, Monitors and Sensors
Social
In order to generate useful Insights
Marketing Public Relations Customer Service Sales Product Development Human Resources Finance
Big Data Big Insights
Mobile
Tablets
Web
Mobile Applications Mobile App Stores Mobile Web Mobile Messaging Location-based services Social Network Applications Social Search Engine Optimization Community management Social Media Expansion Social Business Initiatives Crowd sourcing
Smartphones
Website Intranet Partner Portals SEO SEM Online Advertising Web presence Micro-sites ecommerce
Partner Portals
Traditional Channels
Social
RFID, Monitors and Sensors
Call Center
The new Technology Challenge Harnessing the power of Big Insights
TCS Point of View # 6
POV: The fusion of unstructured and structured information for a given business context requires business domain expertise in addition to data analysis expertise. This is a new science, i.e. Data Science.
Rationale: While Information Fusion is a general expertise, its application is usually within the confines of a specific business context. Examples of specific business contexts are Marketing, Sales, Brand Management, Customer Service, Fraud and Risk analytics etc. Within each business context, the information sources that are relevant, and the process of extracting useful insights from Big Data, are unique and distinct. This requires knowledge and understanding of the data sources and of the processes for deriving useful information from Big Data in business contexts.
Implication: Data Science, and the role of the Data Scientist, is going to be a new area of growth and development. The traditional analyst who was equipped to manage and analyze structured data is going to have to extend themselves to understand and work with non-traditional Big Data sources and the tools appropriate to working with them. There is likely to be a tremendous demand for Data Scientists in the future. It is possible that many universities and colleges may offer courses on Data Science and the tools required to work with Big Data.
Data Science and Advanced Analytics
Analytics is evolving to meet the needs of the market. Leaders can expect:
Future Direction: Description
Business Analytics: Business intelligence combines with advanced analytics to form a new category called business analytics
Social Data: Social data will play a greater role in decision processes
Analytic Applications: The emergence of applications that bundle data, knowledge, and analytics to solve business problems
The Awareness-to-Action Imperative: Analytics will increasingly identify market signals and initiate action, through context-sensitive alerts
Analytic Centers of Excellence: The growing enterprise realization that Analytic COEs are required
Analytic Outsourcing: McKinsey Global Institute predicts a future shortage of analysts and managers with the necessary analytical skills
Text Analytics Maturation: Text Analytics is absorbed into business applications
Process Enablement: The shift from analytics as a reporter of process, to analytics as an enabler of process
The Information Lifecycle: The growing role of analytics throughout the information life cycle
Big Data
Social Channels: Blogs, Wikis, Forums; Social networking; Groups; User profiles; Ratings, reviews, etc.; Polls, chat, podcasting; Audio, video, photos; Events & calendar; Private messaging
Instrumented Channels: Smart grid; Home appliances; Cars; Sensors; Monitors; Supply chain devices; Other mobile devices
Mobile Channels: Mobile Applications
Other Channels: Video; Audio; Other
Analytics Classifications
Text Analytics / Social Analytics: Sentiment Analysis; Brand Identity; Product & Brand Affinity; Reputation Driven Online-Economy
Predictive Analytics: Forecasting; Targeting; Fraud Detection, Anti-Fraud Analytics; Regression, Predictive, Multivariate; Propensity; Price Elasticity
Mobile Analytics: Digital Delivery Channels & Services; Property Effectiveness; Application Analytics; Ad Analytics; Geo-Spatial Analytics; User profile and Relevance; Identify New Opportunities
Segmentation Analytics: Customer Segmentation in real-time; Churn Analysis, Attrition; Funnel Analysis; Behavioral Segmentations
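As one concrete instance of these classifications, sentiment analysis can be sketched with a simple word-list approach. This is a deliberately naive illustration: the word lists and sample sentences are hypothetical, and this lexicon method is a stand-in for the statistical and machine learning techniques used by real text analytics tools.

```python
import re

# Hypothetical sentiment lexicons; real ones contain thousands of entries.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "broken"}

def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for sample in ["I love the great service", "Slow and broken app", "It arrived on Tuesday"]:
    print(sample, "->", label(sample))
```

A word-list scorer like this misses negation and sarcasm, which is one reason the business context and domain expertise discussed earlier matter when interpreting the results.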
Big Data Analytics
Prescriptive (What should happen?): Optimizing outcomes. Optimization, Simulation
Predictive (What will happen?): Identifying possible outcomes. Predictive Modeling, Statistical Analysis, Visual Analytics, Forecasting, Text Analytics, Data Mining, Domain Expertise, Knowledge
Descriptive (What has happened?): Describing and analyzing outcomes. Query, Analysis, Drill-Down, Ad-Hoc Reporting, Dashboards and Scorecards, Visual Analytics
* Source GCP Business Analytics
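The difference between the three levels can be illustrated on a single toy data series. The sketch below is hypothetical throughout: the demand figures, stock options, margin and holding cost are invented for the example, the prediction is a simple least-squares trend line, and the prescription assumes demand will equal the forecast.

```python
from statistics import mean

monthly_demand = [100, 110, 120, 130]  # hypothetical units sold per month

def describe(series):
    """Descriptive: summarize what has happened."""
    return {"mean": mean(series), "latest": series[-1]}

def predict_next(series):
    """Predictive: extrapolate a least-squares trend line one step ahead."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n

def prescribe(forecast, options, unit_margin, unit_holding_cost):
    """Prescriptive: pick the stock level with the best expected profit."""
    def profit(stock):
        sold = min(stock, forecast)
        unsold = max(stock - forecast, 0)
        return sold * unit_margin - unsold * unit_holding_cost
    return max(options, key=profit)

summary = describe(monthly_demand)
forecast = predict_next(monthly_demand)
decision = prescribe(forecast, [120, 140, 160], unit_margin=5, unit_holding_cost=2)
print(summary, forecast, decision)
```

Each level builds on the one below it: the forecast needs the history, and the recommended action needs the forecast plus business parameters such as margin and cost.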
Examples of Uses of Big Data
Log Analytics & Storage
Smart Grid / Smarter Utilities
RFID Tracking & Analytics
Fraud / Risk Management & Modeling
360 View of the Customer
Warehouse Extension
Email / Call Center Transcript Analysis
Call Detail Record Analysis
+++
Some Examples of Use Cases
Data Source High-Frequency Operations Low-Frequency Operations
Applications for Big Data Analytics
Smarter Healthcare Multi-channel sales Finance Log Analysis
Homeland Security
Traffic Control
Telecom
Search Quality
Manufacturing
Trading Analytics
Fraud and Risk
Retail: Churn, NBO
TCS Point of View # 7
POV: We are still in the very early days of Big Data adoption. The companies that have deployed and exploited Big Data technologies are Google, Yahoo, Amazon etc. The rest are just beginning their Big Data journey.
Rationale: Big Data technologies have so far been used almost exclusively in companies that are dealing with Web-scale data. This technology is now slowly becoming viable for large commercial enterprises. Use cases which represent possible scenarios where Big Data can be fruitfully exploited are still being discovered and documented. Very few case studies are available which represent full-scale adoption of Big Data technologies. We are still in an era of experimentation, trial and error, do and learn, proof of concept and value cycles.
Implication: Big Data adoption will increase steadily over the next few years. Gartner is predicting that we are still in the early Technology Trigger phase of Big Data. IDC and Wikibon are predicting a ten-fold growth in the Big Data market over the next five years. Most companies will do well to set aside budgets for experimentation and laboratory-scale projects to explore the uses of Big Data in various business contexts, and in the process develop some skills in these new technologies and Data Science areas.
The Gartner Hype Cycle
What is the Market?
Business Drivers for Big Data
Thank You
Big data analytics will push businesses to become smarter, social, more relevant