Module 1
INTRODUCTION TO DATA SCIENCE FOR BUSINESS DECISION MAKING
DATA
Data is a source of information. Data on its own does not have any meaning; it is just raw
data, that is, a collection of meaningless text, numbers and symbols.
Eg :- 2,4,6,8…….
Amul, Nestle, ITC…………
36,37,38,35,36………..
INFORMATION
Information = Data + Context (Meaning)
Data needs to be processed to generate information; for this we take the help of computers and
software packages.
Eg :- 2,4,6,8 (Multiples of 2)
Amul, Nestle, ITC (3 FMCG companies listed on the NSE)
36,37,38,35,36 (Highest temperature in Kolkata over the last 5 days)
KNOWLEDGE
Knowledge = Information + Application on it
When information is used for solving a problem, we say it is the use of knowledge.
Eg :- 2,4,6,8 (Multiples of 2) – Knowing the pattern, the next number can be predicted as 10.
Amul, Nestle, ITC (3 FMCG companies listed on the NSE) – To understand the Indian FMCG
sector, we analyse the financial performance of these 3 companies.
36,37,38,35,36 (Highest temperature in Kolkata over the last 5 days) – The sale of ACs may be
estimated using this information.
NATURE OF DATA
Data may be classified into different groups as below,
Numerical data :- Any data expressed as numbers is numerical data.
Eg :- Stock price data
Descriptive data :- Information described in qualitative form.
Eg :- Annual report of HUL
Notes By: Haris Hameed – Btech(EEE), Mtech(PS) – Academic Head – PROFINZ
Graphical data :- Data presented in the form of a picture or graph. A picture may tell a
thousand words.
Eg :- Stock price of HUL presented in the form of a chart
TYPES OF DATA IN FINANCE AND COSTING
The kinds of data used in finance and costing are;
a) Quantitative data :- As the name suggests, quantitative data means data expressed in numbers.
Eg :- Stock price data, Financial Statements
b) Qualitative data :- Some data may appear in qualitative formats like text, video and
audio.
Eg :- Management discussions and analysis presented as a part of annual report
TYPES OF DATA
There is another classification of data;
a) Nominal
b) Ordinal
c) Interval
d) Ratio
This classification is based on 3 basic characteristics;
Whether the sequence of answers matters or not
Whether the gaps between values are significant
The presence of a genuine zero
A. Nominal Scale
Nominal scale = Named variable
The nominal scale is used for categorizing data. Under this scale, observations are classified
based on certain characteristics. The category labels may contain numbers, but those numbers
have no numerical value.
Eg :- Apple, Samsung, Oppo, Vivo, Redmi, etc
Classifying funds as equity fund, debt fund, balance fund
B. Ordinal scale
Ordinal scale = Named + Ordered variable
The ordinal scale is used for classifying and putting data in order. The numbers just indicate an
order; they do not specify how much better or worse one value is than another.
Eg :- Top 10 stocks by PE ratio (increasing or decreasing order)
How satisfied are you with customer service?
a. Very unsatisfied
b. Unsatisfied
c. Neutral
d. Satisfied
e. Very satisfied
C. Interval scale
Interval = Named + Ordered + Proportionate intervals between the variables
It is used to measure variables with equal intervals between values. The only
drawback of this scale is that there is no predefined starting point or absolute zero.
Eg :- Temperature scales
Time data
D. Ratio Scale
Ratio scale = Named + Ordered + Proportionate intervals between the
variables + Accommodates an absolute zero
The ratio scale possesses all the characteristics of the nominal, ordinal and interval scales
and also has a true zero. That is, zero has a significant value.
Eg :- Mass
Money
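The four scales can be illustrated with a short Python sketch (the values and variable names below are illustrative, not from the source):

```python
# Illustrative sketch of the four measurement scales.
nominal = ["Apple", "Samsung", "Oppo"]         # names only: equality checks
ordinal = ["Very unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very satisfied"]
interval = [36, 37, 38]                        # degrees C: differences meaningful, ratios are not
ratio = [100, 200]                             # money: true zero, so 200 really is twice 100

# Nominal: only membership/equality is meaningful.
is_known_brand = "Apple" in nominal

# Ordinal: order is meaningful, but not the size of the gaps.
more_satisfied = ordinal.index("Satisfied") > ordinal.index("Neutral")

# Interval: differences are meaningful.
temp_rise = interval[2] - interval[0]          # a rise of 2 degrees

# Ratio: ratios are meaningful because zero is absolute.
times_larger = ratio[1] / ratio[0]             # 2.0
```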
Digitalization of Data and Information
Digitalization is the process of converting data and information from analog to digital format.
Objectives of Digitalization
To provide widespread access to data and information to a very large group of users
simultaneously.
Digitalization helps in the preservation of data for a long period.
Eg :- UID (Aadhaar) in India is one of the largest digitalization projects.
Why we Digitalize
Improves classification and indexing
Makes retrieval of records easy
Records may be accessed by more than one person simultaneously
It becomes easier to reuse data
Helps in work processing
Easier to keep backup files and retrieve them during any unexpected disaster
Can be accessed from multiple locations
Increases organizational productivity
Requires less storage capacity
How we digitalize (6 Phases of digitalization)
Phase 1 :- Justification of the proposed digitalization process
The actual benefits of the digitalization project need to be identified, along with the cost aspects
and availability of resources.
Phase 2 :- Assessment
All records are never digitalized; the data that require digitalization are to be decided on the basis
of content and context. Some data may be digitalized in consolidated formats and some in detailed
formats. The software, hardware, human resources etc. are also planned.
Phase 3 :- Planning
Successful execution of digitalization projects needs meticulous planning: the digitalization
approach, project documentation, resource management, technical specifications,
risk management etc. The digitalization may be completed in house or alternatively by an
outsourced agency.
Phase 4 :- Digitalization activity
The Wisconsin Historical Society developed a six-phase process for digitalization: Planning,
Capture, Primary quality control, Editing, Secondary quality control, and Storage and Management.
Phase 5 :- Processes in the care of records
Once the digitalization of records is complete, a few additional requirements arise: permission
for accessing the data, intellectual control, classification and maintenance.
Phase 6 :- Evaluation
The primary purpose is to enable reflection and help identify changes that would improve future
digitalization processes.
To turn data into user-friendly information we should go through 6 core steps:
Collection of data :- Should be done with a standardized system, appropriate software and
hardware, and trained staff.
Organizing the data :- The data needs to be organized in an appropriate manner to
generate relevant information.
Data processing :- Data needs to be cleaned to remove unnecessary elements.
Integration of data :- Data integration is the process of combining data from various
sources into a single unified form stored on a master server. This enables analysts to
produce effective, actionable intelligence.
Data reporting :- Translating the data into a consumable format to make it accessible to
users.
Eg :- Revenue report, net profit report
Data utilization :- Data is utilized to back corporate activity and enhance efficiency and
productivity for the growth of the business.
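These six steps can be sketched as a tiny Python pipeline (the sales records and field names are hypothetical):

```python
# Hypothetical mini-pipeline: collected records -> cleaned -> integrated -> reported.
raw_north = [{"month": "Jan", "sales": "100"}, {"month": "Feb", "sales": ""}]
raw_south = [{"month": "Jan", "sales": "150"}, {"month": "Feb", "sales": "120"}]

def clean(records):
    # Data processing: drop rows with missing sales, convert text to numbers.
    return [{"month": r["month"], "sales": int(r["sales"])} for r in records if r["sales"]]

# Integration: combine data from both sources into one unified list.
unified = clean(raw_north) + clean(raw_south)

# Reporting: translate into a consumable summary for users.
revenue_report = {"total_sales": sum(r["sales"] for r in unified)}
```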
Five basic principles of Data Ethics
Regarding ownership :- The first principle is that ownership of personal data belongs to
the person. It is unethical to collect someone's personal data without their consent. Consent
may be obtained through a digital privacy policy or by asking users to agree to terms and
conditions.
Regarding transparency :- Maintaining transparency is important while gathering data.
The objective is that when companies collect user data, it should be known to the user.
Regarding privacy :- Even though a user may allow a company to collect, store and analyse
personal information, it should not be made publicly available.
Regarding intention :- The intention of data analysis should never be making profit out
of others' weaknesses or hurting others. Collecting data unnecessarily should be
avoided, as it is unethical.
Regarding outcome :- Even if the intention of the data analysis is good, the result of the
analysis may hurt the client or data provider; this is called disparate impact, which is
unethical.
Module 2
DATA PROCESSING, ORGANIZATION, CLEANING AND VALIDATION
DATA PROCESSING
Data processing is the process of organizing, categorizing and manipulating data in order to
extract information.
The history of Data processing can be divided into 3 phases;
A. Manual DP :- It involves processing data without much assistance from machines. Only
small-scale data processing was possible using manual effort, but it is still used today where
digitalisation is difficult or the data cannot be read by machines.
Eg :- Outdated texts or documents.
B. Mechanical DP :- It processes data using mechanical tools and technologies, not modern
computers.
Eg :- Punch card machines installed by the US for the population census.
C. Electronic DP :- Data processing is done electronically using computers and other
cutting-edge electronics. Electronic DP reduces the mistakes of the other two methods.
Eleven (11) significant areas where data science plays an important role in finance
Risk analytics
Real time analytics
Customer data management
Consumer analytics
Customer segmentation
Personalized services
Advanced customer services
Predictive analytics
Fraud detection
Anomaly detection
Algorithmic Trading
DATA CLEANING
It is the process of detecting and correcting inaccurate, corrupted, improperly formatted, duplicate
or incomplete data in a database.
Steps in data cleaning
1) Removal of duplicate and irrelevant information
Eliminate unnecessary observations from your dataset, such as duplicate or irrelevant entries.
2) Fix structural errors
While measuring or transferring data you may notice unusual naming standards or wrong
capitalization.
Eg :- "N/A" vs "Not applicable"
3) Filter unwanted outliers
4) Handle missing data
5) Validation and QA
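Steps 1 to 4 can be sketched in Python on a small record set (the company names and prices are hypothetical):

```python
# Hypothetical sketch of cleaning steps 1-4.
records = [
    {"name": "HUL", "price": "2500"},
    {"name": "HUL", "price": "2500"},      # duplicate observation
    {"name": "ITC", "price": "N/A"},       # structural error: inconsistent missing marker
    {"name": "nestle", "price": "24000"},  # structural error: wrong capitalization
]

def clean(rows):
    seen, out = set(), []
    for r in rows:
        name = r["name"].upper()                        # fix capitalization
        # Normalize the different "missing" markers to one value.
        price = None if r["price"] in ("N/A", "Not applicable", "") else int(r["price"])
        key = (name, price)
        if key in seen:                                 # remove duplicates
            continue
        seen.add(key)
        if price is None:                               # handle missing data (drop here)
            continue
        out.append({"name": name, "price": price})
    return out

cleaned = clean(records)
```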
Benefits of Data Validation
Error correction – when multiple data sources are combined.
Fewer mistakes result in happier customers and less irritated workers.
Monitoring mistakes improves reporting.
Effective corporate procedures and faster decision making.
DATA VALIDATION
Data validation is a process that ensures the accuracy and reliability of data in a system. It
typically involves checking data for accuracy, completeness and consistency.
Types of Data validation
1. Data type check
2. Code check
3. Range check
4. Format check
5. Consistency check
6. Uniqueness check
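Several of these checks can be sketched for a hypothetical stock-price record (the field names, ticker format and price range are assumptions for illustration):

```python
import re

# Hypothetical validators for a stock record: type, range, format/code, uniqueness.
def validate(record, existing_codes):
    errors = []
    if not isinstance(record["price"], (int, float)):       # data type check
        errors.append("price must be numeric")
    elif not (0 < record["price"] < 1_000_000):             # range check
        errors.append("price out of range")
    if not re.fullmatch(r"[A-Z]{2,10}", record["code"]):    # format/code check
        errors.append("bad ticker format")
    if record["code"] in existing_codes:                    # uniqueness check
        errors.append("duplicate ticker")
    return errors
```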
DATA ORGANISATION AND DISTRIBUTION
DATA ORGANISATION
Data organisation is the classification of unstructured data into distinct groups. It is the process of
arranging unstructured data in a meaningful manner.
Eg :- Frequency distribution table
Classification, image representation
Graphical representation
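A frequency distribution table, for example, can be built directly with the standard library (reusing the temperature figures from Module 1):

```python
from collections import Counter

# Highest daily temperatures organized into a frequency distribution table.
temps = [36, 37, 38, 35, 36]
freq_table = Counter(temps)   # maps each value to its frequency
```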
DATA DISTRIBUTION
It is a function that identifies and quantifies all potential values of a variable as well as their
relative frequencies. The primary benefit of knowing a data distribution is the estimation of the
probability of any outcome within a sample space.
Discrete Distribution
1) Binomial
2) Poisson
3) Hyper geometric
4) Geometric
Continuous Distribution
1) Normal
2) Log normal
3) F distribution
4) Chi square
5) Exponential
6) T-student
Types of distribution
Based on the type of data, distributions are classified into two:-
1. Discrete distribution :- A discrete distribution results from countable data and has a
finite number of potential values.
Eg :- Rolling dice, tossing a coin and counting heads.
2. Continuous distribution :- A distribution with an unlimited number of data
points that may be represented on a continuous measuring scale.
Eg :- Human height, time between arrivals of buses at a bus stop, waiting time in a queue.
Types of Discrete distribution
1. Binomial distribution :- The binomial distribution quantifies the chances of obtaining a
specific number of successes or failures in a fixed number of trials.
Eg :- Probability of getting heads or tails when tossing a coin a number of times.
2. Poisson distribution :- It is a probability distribution that quantifies the chances of a certain
number of events occurring in a given time period, where the events occur at a known
average rate.
Eg :- Number of calls received in a call centre
Number of accidents
Number of absentees
3. Hyper geometric distribution :- It is the distribution that assesses the chances of a certain
number of successes in n trials without replacement from a sufficiently large population N.
4. Geometric distribution :- It is a discrete distribution that assesses the probability of the
first success.
Eg:- No. of tries to solve a puzzle.
No. of attempts to win CMA exam.
No. of call until a call centre respond.
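The binomial probabilities mentioned above can be computed directly; a sketch using only the standard library:

```python
from math import comb

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials,
    # each succeeding with probability p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of getting exactly 2 heads in 4 fair coin tosses: C(4,2) * 0.5^4 = 0.375
p_two_heads = binomial_pmf(2, 4, 0.5)
```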
Types of Continuous Distribution
1. Normal distribution :- It is a bell curve with greater frequency around a central point;
another name is the Gaussian distribution. As we move away from the centre on either side,
the frequency drops dramatically.
Eg:- Human height, blood pressure measurement, daily temperature.
2. Log normal distribution :- A continuous random variable X follows a lognormal
distribution if its natural logarithm, ln(X), is normally distributed.
3. F distribution :- The F distribution is used to examine the equality of variances between two
normal distributions.
Eg :- Analysis of variance in biology, that is, applying different fertilizers to plants and studying
the growth of the plants.
4. Chi square distribution :- When independent variables with standard normal distributions
are squared and added, the Chi square distribution occurs.
Eg :- Goodness of fit test
Survey analysis
5. Exponential distribution :- It is used to describe products with a constant failure rate.
Eg :- Failure of electronic components, service times, lifetime of mechanical
components.
6. T-student distribution :- It is a bell-shaped probability distribution that is
symmetrical about its mean.
Eg:- Environment studies,
Educational results.
Functions of Data processing
1. Validation :- Validation may be defined as an activity aimed at verifying whether the values
of data come from a given set of acceptable values. The objective of data validation is to
assure a degree of data quality.
2. Sorting :- It is a procedure that organizes data into a meaningful order to make it simpler to
analyze and visualize.
3. Aggregation :- Data aggregation refers to any process in which data is collected and
summarized. A common application of data aggregation is to provide analysts with a
summary of data for business analysis.
4. Analysis :- It is the process of cleaning, converting and modelling data to obtain actionable
business intelligence. Some popular data analysis tools are:
R programming, Python, SQL, MATLAB, Java, SAS.
5. Reporting :- Data reporting is the act of gathering and structuring raw data into a
consumable format in order to evaluate the organization's ongoing performance.
Eg :- Financial data such as revenue, accounts receivable, net profit etc.
6. Classification :- It is the process of classifying data according to important categories so that
it may be utilized and safeguarded more effectively. Classification makes data searchable and
trackable.
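Sorting and aggregation, for instance, can be sketched together in Python (the sales records and region names are hypothetical):

```python
from itertools import groupby

# Hypothetical sales records to sort and aggregate by region.
sales = [
    {"region": "South", "amount": 120},
    {"region": "North", "amount": 100},
    {"region": "South", "amount": 80},
]

# Sorting: arrange the data into a meaningful order (groupby needs sorted input).
sales.sort(key=lambda r: r["region"])

# Aggregation: summarize the data per group for business analysis.
totals = {region: sum(r["amount"] for r in rows)
          for region, rows in groupby(sales, key=lambda r: r["region"])}
```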
Steps for effective Data classification
1) Step 1:- Understanding the current setup
2) Step 2:- Creation of data classification policy
3) Step 3:- Prioritize and organize data
Module 3
DATA PRESENTATION: VISUALISATION AND GRAPHICAL PRESENTATION
Data Visualisation – points to keep in mind (the right way of doing data visualisation)
Know the objective
Eg :- Does an increase in income correlate with social media spending?
Always keep the audience in mind – who views the data visualisation will determine the
degree of detail required.
Invest in the best technology.
Improve the team's ability to visualize the data.
Objectives of Data visualisation
Making a better analysis
Faster decision making :-
Visuals are easier for humans to process than tables or reports.
Analysing complicated data.
Dash Board
A data visualisation dashboard is an interactive dashboard that enables users to manage important
metrics across numerous financial channels, visualise the data points and generate reports for
customers that summarize the results.
Bar Chart
It is used to compare data across categories, highlight discrepancies, demonstrate trends and
illustrate historical highs and lows.
Line Chart
A line chart joins various data points and displays them as a continuous progression. Line charts
show the trend in data with respect to time.
Pie Chart (Circular Chart)
It is also called a circle chart. It is a circular graphical representation used to demonstrate
numerical proportions.
MAP
Maps are used for displaying all types of location data, including postal codes, state abbreviations,
country codes and custom geocoding.
Density map
A density map indicates patterns or relative concentration by overlapping marks on a map.
Eg :- Cyclone-hazard-prone districts shown on a density map.
Scatter plots
Scatter plots are useful for examining the connection between variables, that is, revealing
whether one variable is a good predictor of another or whether they vary independently. A scatter
plot displays several unique data points on a single graph.
Gantt Chart
A Gantt chart represents a project timeline or activity changes across time. It shows tasks
that must be completed before others may begin. Gantt charts are not restricted to projects; any
data connected to a time series can be represented by a Gantt chart.
Bubble chart
Bubble charts are not exactly their own sort of visualisation. A bubble chart is a method of
enhancing scatter plots and maps to illustrate the link between 3 or more variables.
By varying the size and colour of circles, the chart can display large amounts of data.
Eg :- Comparing groups such as computer scientists, teachers, engineers and writers.
Histogram
A histogram illustrates the distribution of data among various groups and shows the proportionate
number of entries in each category using different "bins".
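The binning behind a histogram can be sketched with the standard library (the returns data and the bin width of 2 are hypothetical):

```python
from collections import Counter

# Hypothetical: place stock returns (%) into fixed-width bins of size 2.
returns = [1.2, 3.4, 2.1, 5.7, 4.9, 0.3, 2.8]
bin_width = 2

def bin_of(x):
    # Map a value to the (lower, upper) bounds of its bin.
    lower = int(x // bin_width) * bin_width
    return (lower, lower + bin_width)

# The histogram counts how many entries fall in each bin.
histogram = Counter(bin_of(x) for x in returns)
```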
How to use data visualisation in report design
1) Find a story in the data.
2) Create a narrative;
a) Engage the viewer with title and subheading.
b) Incorporate context into the data.
c) Create a consistent and logical flow.
d) Highlight significant discoveries and insight.
3) Choose the most suitable data visualisation.
4) Follow the visual language.
5) Publicize the report.
Tools and Techniques of Visualisation and Graphical Presentation
Tableau
Tableau is an application for creating graphs, charts and maps.
Tableau Desktop
Tableau Public :- a free version of Tableau Desktop with some restrictions.
It takes time and effort to understand Tableau.
Microsoft Power BI
It is a visualisation tool for business intelligence data. It includes reporting, self-service analytics
and predictive analytics.
Microsoft Excel
It is an easy-to-use visualisation tool. It includes several options for viewing data, such as scatter
plots, bar charts, line charts, pie charts, histograms and tree maps.
Qlik View
QlikView has a drag-and-drop visualisation interface. It enables users to make quicker and more
informed choices by speeding up analytics. It allows colour-coded tables, bar graphs, pie charts,
line charts and sliders.
Module 4
DATA ANALYSIS AND MODELLING
TYPES OF DATA ANALYTICS
There are 4 types of data analytics, depending on the type of data available and the type of
knowledge required.
A. Descriptive analytics
It is the most basic type of analytics. It looks at the data to examine, understand and
describe something that has already happened. It provides quantitative information
about "what happened". It uses statistical techniques like mean, median and mode.
Advantages and Disadvantages of descriptive analytics
Easily applicable to day-to-day operations, since it depends on historical data and
basic computation.
Doesn't need an in-depth understanding of analytics.
This analysis is limited to the available data.
Difficult to handle big data.
Eg :- Social media usage and engagement.
Past sales and operational data.
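The mean, median and mode of past data can be computed with Python's statistics module (the sales figures below are hypothetical):

```python
import statistics

# Hypothetical past daily sales, summarized descriptively ("what happened").
daily_sales = [100, 120, 120, 90, 110]

summary = {
    "mean": statistics.mean(daily_sales),      # average daily sales
    "median": statistics.median(daily_sales),  # middle value when sorted
    "mode": statistics.mode(daily_sales),      # most frequent value
}
```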
B. Diagnostic analytics
It goes deeper than descriptive analytics by seeking to understand the "why" behind what
happened. It uses statistical techniques like correlation between two different datasets.
Advantages and Disadvantages of Diagnostic analytics
Data turns into visuals and insights.
Develops solutions to data-related problems.
Derives value from the data.
It is complex and requires a deep understanding of statistical techniques.
Time consuming.
Eg :- An HR department examines the performance of its employees based on quarterly
performance levels, absences and weekly overtime.
A cyber security team determines the relation between security ratings, number of
incidents and time to resolution.
C. Predictive analytics
This is based on historical data, past trends and assumptions to answer questions about
"what will happen in the future". It correlates the results of descriptive and diagnostic
analytics with external datasets.
Advantages and Disadvantages of Predictive analytics
It serves as a crucial tool for forecasting probable future occurrences.
It can never be absolutely precise.
Eg :- Clinical decision support software for aged care.
E-commerce software anticipating client preferences based on previous
purchase and search history.
D. Prescriptive analytics
It identifies specific actions an individual or organization should take to reach targets and
goals. In other words, it comes up with recommendations of "what action is to be taken".
This type of analytical capability is observed in Artificial Intelligence and Machine Learning.
Advantages and Disadvantages of Prescriptive analytics
It gives important insights for data-driven decisions to optimize corporate performance.
It requires large volumes of data.
Machine learning minimizes the likelihood of human mistakes.
Eg :- It is used for monitoring price fluctuations in oil manufacturing.
An insurance company evaluating customer risks in terms of price and premium.
Artificial Intelligence (AI)
AI is the science and engineering of making intelligent machines, especially intelligent computer
programmes. It is related to the similar task of using computers to understand human intelligence.
Application of AI in finance
A. Investment services
i) Algorithmic trading
ii) Robo advisory
iii) Insurance claim processing
iv) Pricing of insurance product
B. Customer services
i) Detecting new sales opportunities
ii) Know your customer (KYC)
iii) Prediction of customer churn
C. Audit and compliance:
i) Fraud detection
ii) Regulatory compliances
iii) Travel and expense management
D. Lending:
i) Retail and commercial lending operation.
ii) Retail and commercial lending score.
iii) Detecting possibility of default.
Types of Artificial Intelligence (AI)
A. Weak AI
It is also known as narrow AI. This AI has been trained to do a particular task.
Eg :- Apple Siri, Amazon Alexa, IBM Watson (business support)
B. Strong AI
It has two types: artificial general intelligence and artificial super intelligence.
i) AGI (Artificial General Intelligence)
Here the machine possesses human-level intellect (reasoning and understanding) and the
ability to solve problems, learn and plan for the future.
ii) ASI (Artificial Super Intelligence)
Intelligence and capability surpassing the human brain. It is a theoretical concept and has
no practical application yet.
DEEP LEARNING
Deep learning and machine learning are subfields of AI, and deep learning is a subfield of
machine learning. The word 'deep' in deep learning refers to a neural network with more than
three layers, which include an input layer, multiple hidden layers and an output layer.
MACHINE LEARNING
Machine learning is a subset of artificial intelligence that involves the development of algorithms
and statistical models that enable computer systems to improve their performance on a specific
task through experience. The machine learns from data and makes predictions or decisions based
on that learning.
Approaches towards Machine learning
1. Supervised learning :- It involves training a model on labelled data, where the algorithm
learns the relationship between the input features and the corresponding output labels.
Eg:- Email spam classification, Face verification, Speaker verification.
2. Unsupervised learning :- It involves working with unlabelled data, where the algorithm
identifies patterns, structures or relationships within the data without supervision.
Eg:- Customer segmentation in marketing, clustering
3. Semi-supervised learning :- It uses a combination of labelled and unlabelled data for
training.
Eg:- Email spam filtering, speech recognition
4. Reinforcement learning :- It involves an agent learning to make decisions by interacting
with an environment. The agent receives feedback in the form of rewards or penalties,
allowing it to learn the best actions to maximize the cumulative reward.
Eg:- Training a computer program to play video games, Autonomous robotic control,
Autonomous vehicle navigation etc..
5. Dimensionality reduction :- It is a technique to reduce the number of input features in a
dataset while preserving essential information.
Eg:- Converting 3D data into 2D.
ROBOTIC PROCESS AUTOMATION (RPA)
RPA is a technology that uses software robots or bots to automate repetitive and rule-based tasks
within a business process. RPA bots are designed to interact with digital systems, mimic human
actions and perform tasks just like human operators, but without human intervention.
Benefits of RPA
Higher productivity
Higher accuracy
Saving of cost
Integration across platform
Better customer experience
Harnessing AI
Scalability
CLOUD COMPUTING
1. Private cloud – Physical components housed on premises or in a vendor's data centre.
2. Public cloud – Data and applications are delivered through the internet; the infrastructure is fully virtualized.
3. Hybrid cloud – It blends private cloud and public cloud.
Process of Data Analysis
1. Criteria for grouping data.
2. Collecting the data.
3. Organizing the data.
4. Cleaning the data.
5. Adopting the right type of data analysis,
i.e. descriptive, diagnostic, predictive or prescriptive analytics.
Benefits of Data Analytics
Improved decision-making process.
Increased efficiency of operations.
Improved service to stakeholders.
DATA MINING
Data mining is the process of discovering patterns, relationships, anomalies and valuable
insights from large datasets. It combines techniques from statistics, machine learning, AI
and database management to analyse and interpret complex data.
Steps in Data mining
1. Setting the business objective.
2. Preparation of data.
3. Model building and pattern mining.
4. Result evaluation and implementation of knowledge.
Techniques in Data mining
A. Association rule :- It is a rule-based technique for discovering associations between
variables inside a given dataset. This method is commonly employed for market basket
analysis, i.e. understanding purchase behaviour and product associations.
B. Neural network :- Used in deep learning algorithms, it replicates the interconnections of
the human brain through layers of nodes. Every node has input weights, a bias and an
output. If the output value exceeds the predetermined threshold, the node fires and passes
data to the subsequent network layer.
C. Decision tree :- Using classification and regression algorithms, it predicts likely outcomes
based on a collection of decisions. It has a tree-like representation.
D. K-nearest neighbour :- Also known as the KNN algorithm, it classifies data points
depending on their closeness and correlation with other available data.
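A minimal KNN sketch in pure Python, using a single feature (the credit scores and risk labels are hypothetical, chosen only to illustrate the idea):

```python
from collections import Counter

# Hypothetical labelled cases: (credit score, risk label).
train = [(300, "high"), (450, "high"), (650, "low"), (700, "low"), (720, "low")]

def knn_predict(score, k=3):
    # Find the k cases closest to the new score, then take a majority vote.
    nearest = sorted(train, key=lambda t: abs(t[0] - score))[:k]
    labels = Counter(label for _, label in nearest)
    return labels.most_common(1)[0][0]
```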
Implementation of Data mining in Finance and Management
Detecting money laundering and other financial crimes.
Prediction of loan repayment and customer credit policy.
Target marketing.
Design and construction of data warehouse.
Standards of Data tagging and reporting
XML – XML is a markup language used for encoding documents in a format that is both
human readable and machine readable.
XBRL – It is a specific application of XML designed for financial and business
reporting. It provides a standardized way to describe financial information and makes it
easier to exchange and analyse financial data across various organizations and systems.
XBRL tags are tailored for specific financial concepts such as revenue, expenses, assets
and liabilities.
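Reading XML-tagged financial data can be sketched with Python's standard library (the tag names below are illustrative only, not actual XBRL taxonomy elements):

```python
import xml.etree.ElementTree as ET

# Illustrative XML-tagged financial snippet (not a real XBRL document).
doc = """
<report>
  <revenue unit="INR">500000</revenue>
  <expense unit="INR">350000</expense>
</report>
"""

root = ET.fromstring(doc)
revenue = int(root.find("revenue").text)
expense = int(root.find("expense").text)
net_profit = revenue - expense
```

Because each figure is wrapped in a named tag, different systems can extract and compare the same concepts without knowing each other's report layouts.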