B.Tech AI & DS Course Outline
UNIT – I
CREDIT POINT
SEMESTER IV

Sl. No. | Course Code | Course Title                                | L  | T | P  | C
THEORY
4.      | AD3491      | Fundamentals of Data Science and Analytics  | 3  | 0 | 0  | 3
5.      | CS3591      | Computer Networks                           | 3  | 0 | 2  | 4
        |             | TOTAL                                       | 17 | 1 | 12 | 24
SENGUNTHAR COLLEGE OF ENGINEERING
TIRUCHENGODE – 637 205.
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
LECTURE PLAN
Subject Code : AD 3491
Subject Name : FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
Name of the Faculty : S.SANTHIPRIYA
Designation : Assistant Professor / AI&DS
Course : IV Semester B.Tech – Artificial Intelligence and Data Science
Academic Year : 2022-2023
TOPIC                                                          | REFERENCE | TEACHING AIDS | No. of HOURS

UNIT I – INTRODUCTION
Need for Data Science                                          | T1-CH1    | Black Board   | 1
Benefits and Uses                                              | T1-CH1    | Black Board   | 2
Facets of Data, Data Science Process                           | T1-CH2    | Black Board   | 2
Setting the Research Goal                                      | T1-CH2    | Black Board   | 1
Retrieving Data                                                | T1-CH2    | Black Board   | 1
Cleansing, Integrating and Transforming Data                   | T1-CH2    | Black Board   | 1
Interpretation of R², Multiple Regression Equations, Regression Toward the Mean | T1-CH3 | Black Board | 1

UNIT III – INFERENTIAL STATISTICS
Populations, Samples, Random                                   | T1-CH4    | Black Board   | 1
Hypothesis Testing, Z-Test, Sampling, Sampling Distribution    | T1-CH4    | Black Board   | 2

UNIT IV – ANALYSIS OF VARIANCE
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS    L T P C
3 0 0 3
t-test for one sample – sampling distribution of t – t-test procedure – t-test for two
independent samples – p-value – statistical significance – t-test for two related samples.
F-test – ANOVA – Two-factor experiments – three F-tests – two-factor ANOVA – Introduction to
chi-square tests.
Linear least squares – implementation – goodness of fit – testing a linear model – weighted
resampling. Regression using StatsModels – multiple regression – nonlinear relationships –
logistic regression – estimating parameters – Time series analysis – moving averages –
missing values – serial correlation – autocorrelation. Introduction to survival analysis.
TOTAL : 45 PERIODS
TEXT BOOKS
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016.
REFERENCES
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014.
2. Sanjeev J. Wagh, Manisha S. Bhende, Anuradha D. Thakare, “Fundamentals of Data Science”, CRC Press, 2022.
UNIT I
Facets Of Data
Retrieving Data
LIST OF IMPORTANT QUESTIONS
UNIT 1 – INTRODUCTION TO DATA SCIENCE
PART A
1.What is Data Science?
Data Science is a field of computer science that explicitly deals with turning data into
information and extracting meaningful insights out of it. The reason why Data Science is so
popular is that the kind of insights it allows us to draw from the available data has led to
some major innovations in several products and companies. Using these insights, we are
able to determine the taste of a particular customer, the likelihood of a product succeeding
in a particular market, etc.
2. Differentiate between Data Analytics and Data Science.
Data Analytics: It is a subset of Data Science. Its goal is to illustrate the precise details of retrieved insights. It focuses on just finding the solutions. A data analyst's job is to analyse data in order to make decisions.
Data Science: It is a broad field that includes various subsets such as Data Analytics, Data Mining, Data Visualization, etc. Its goal is to discover meaningful insights from massive datasets and derive the best possible solutions to resolve business issues. It not only focuses on finding the solutions but also predicts the future from past patterns or insights. A data scientist's job is to provide insightful data visualizations from raw data that are easily understandable.
3. What are the challenges in Data Science?
Multiple Data Sources
Data Security
Lack of Clarity on Business Problem
Undefined KPIs and Metrics
Difficulty in Finding Skilled Data Scientists
Getting Value Out of Data Science
6. Explain unstructured data and give examples.
Unstructured data is far more abundant than structured data. Examples of unstructured data include rich media (media and entertainment data, surveillance data, geo-spatial data, audio, and weather data) and document collections.
Temperature and humidity are the independent variables, and rain would be our dependent variable. So, the logistic regression algorithm actually produces an S-shaped curve.
Now, let us look at another scenario: suppose that the x-axis represents the runs scored by Virat Kohli and the y-axis represents the probability of team India winning the match.
From this graph, we can say that if Virat Kohli scores more than 50 runs, then there is a greater probability for team India to win the match. Similarly, if he scores fewer than 50 runs, then the probability of team India winning the match is less than 50 percent. So, basically, in logistic regression, the Y value lies within the range of 0 and 1. This is how logistic regression works.
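As a rough illustration of the runs-versus-win-probability example above, the sketch below fits a logistic regression on made-up numbers; the runs and outcomes are assumptions chosen only for illustration, not real match data.

```python
# Minimal sketch: logistic regression produces probabilities between 0 and 1
# (the S-shaped curve described above). Data is made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

runs = np.array([[10], [25], [30], [45], [55], [70], [85], [100]])  # runs scored
won = np.array([0, 0, 0, 0, 1, 1, 1, 1])                           # 1 = team won

model = LogisticRegression()
model.fit(runs, won)

# Predicted probability of winning for new scores
print(model.predict_proba([[40]])[:, 1])   # below 50 runs -> lower win probability
print(model.predict_proba([[80]])[:, 1])   # above 50 runs -> higher win probability
```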
The confusion matrix is a table that is used to estimate the performance of a model. It tabulates the actual values and the predicted values in a 2×2 matrix.
True Positive (d): This denotes all of those records where the actual values are true and the predicted values are also true. So, these denote all of the true positives.
False Negative (c): This denotes all of those records where the actual values are true, but the predicted values are false.
False Positive (b): In this, the actual values are false, but the predicted values are true.
True Negative (a): Here, the actual values are false and the predicted values are also false.
So, if you want to get the correct values, then the correct values would basically represent all of the true positives and the true negatives. This is how the confusion matrix works.
11. What do you understand about the true-positive rate and false-positive rate?
True positive rate: In Machine Learning, the true-positive rate, which is also referred to as sensitivity or recall, is used to measure the percentage of actual positives that are correctly identified.
Formula: True Positive Rate = True Positives / Positives
False positive rate: The false-positive rate is basically the probability of falsely rejecting the null hypothesis for a particular test. It is calculated as the ratio between the number of negative events wrongly categorized as positive (false positives) and the total number of actual negative events.
Formula: False-Positive Rate = False Positives / Negatives
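A minimal sketch, assuming made-up actual and predicted labels, of how the confusion matrix entries and the two rates above can be computed with scikit-learn:

```python
# Compute the 2x2 confusion matrix, then TPR and FPR, from toy labels.
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()

tpr = tp / (tp + fn)   # True Positive Rate = True Positives / all actual positives
fpr = fp / (fp + tn)   # False Positive Rate = False Positives / all actual negatives

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```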
In traditional programming paradigms, we used to analyze the input, figure out the expected output, and write code containing the rules and statements needed to transform the provided input into the expected output. As we can imagine, these rules were not easy to write, especially for data that even computers had a hard time understanding, e.g., images, videos, etc.
Data Science shifts this process a little bit. In it, we need access to large volumes of data that contain the necessary inputs and their mappings to the expected outputs. Then, we use Data Science algorithms, which use mathematical analysis to generate rules that map the given inputs to outputs.
This process of rule generation is called training. After training, we use some data that was set aside before the training phase to test and check the system's accuracy. The generated rules are a kind of black box, and we cannot understand how the inputs are being transformed into outputs.
Supervised and unsupervised learning are two types of Machine Learning techniques. They both allow us to build models, but they are used for solving different kinds of problems. The comparison below is followed by a short code sketch.
Supervised learning: works on data that contains both the inputs and the expected output, i.e., labeled data. It is used to create models that can be employed to predict or classify things. Commonly used algorithms: linear regression, decision tree, etc.
Unsupervised learning: works on data that contains no mappings from input to output, i.e., unlabeled data. It is used to extract meaningful information out of large volumes of data. Commonly used algorithms: K-means clustering, Apriori algorithm, etc.
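A small sketch contrasting the two settings with scikit-learn; the toy arrays are assumptions chosen only to show labeled versus unlabeled data.

```python
# Supervised vs. unsupervised learning on tiny toy datasets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised: inputs X with known outputs y (labeled data)
X = np.array([[1], [2], [3], [4]])
y = np.array([2.1, 4.0, 6.2, 7.9])
reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))          # predict an output for a new input

# Unsupervised: only inputs, no labels; the algorithm finds structure itself
data = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(km.labels_)                  # cluster assignment for each point
```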
14. What is the difference between long format data and wide format data?
Long format data: has a column for possible variable types and a column for the values of those variables. Each row represents one time point per subject, so each subject will have many rows of data. This format is most typically used in R analysis and for writing to log files at the end of each experiment. It contains values that repeat in the first column. Use df.melt() to convert wide form to long form.
Wide format data: has a separate column for each variable. The repeated responses of a subject appear in a single row, with each response in its own column. This format is most widely used in data manipulation and in stats programmes for repeated-measures ANOVA, and is seldom used in R analysis. It contains values that do not repeat in the first column. Use df.pivot().reset_index() to convert long form into wide form.
A short pandas sketch of the two conversions is given below.
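A minimal pandas sketch of the wide-to-long and long-to-wide conversions mentioned above; the subject/test columns are hypothetical.

```python
# Wide <-> long conversion with pandas on a made-up DataFrame.
import pandas as pd

wide = pd.DataFrame({
    "subject": ["s1", "s2"],
    "test1":   [80, 75],
    "test2":   [85, 70],
})

# Wide -> long: one row per (subject, variable, value)
long = wide.melt(id_vars="subject", var_name="test", value_name="score")
print(long)

# Long -> wide: one column per variable again
wide_again = long.pivot(index="subject", columns="test", values="score").reset_index()
print(wide_again)
```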
15. Mention some techniques used for sampling. What is the main advantage of sampling?
The main advantage of sampling is that conclusions about an entire population can be drawn from a small, representative subset of it, which saves time, cost, and computation. Common techniques are listed below, followed by a short sketch of two of them.
Probability sampling: it involves random selection, which gives every element a chance of being selected. Probability sampling has various subtypes, as mentioned below:
Simple Random Sampling
Stratified Sampling
Systematic Sampling
Cluster Sampling
Multi-stage Sampling
Non-probability sampling techniques include:
Convenience Sampling
Purposive Sampling
Quota Sampling
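A rough pandas sketch of simple random and stratified sampling; the 'group' column and sample sizes are assumptions made only for illustration.

```python
# Simple random sampling vs. stratified sampling with pandas.
import pandas as pd

df = pd.DataFrame({
    "group": ["A"] * 60 + ["B"] * 40,
    "value": range(100),
})

# Simple random sampling: every row has an equal chance of selection
simple = df.sample(n=10, random_state=0)

# Stratified sampling: sample 10% from within each group separately
stratified = df.groupby("group").sample(frac=0.1, random_state=0)

print(simple.shape)
print(stratified["group"].value_counts().to_dict())
```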
Bias is a type of error that occurs in a Data Science model because of using an algorithm that is not strong enough to capture the underlying patterns or trends that exist in the data. In other words, this error occurs when the data is too complicated for the algorithm to understand, so it ends up building a model that makes simple assumptions. This leads to lower accuracy because of underfitting. Algorithms that can lead to high bias are linear regression, logistic regression, etc.
18. Why is Python used for Data Cleaning in DS?
Data Scientists have to clean and transform huge data sets into a form that they can work with. It is important to deal with redundant data for better results by removing nonsensical outliers, malformed records, missing values, inconsistent formatting, etc.
Python libraries such as Matplotlib, Pandas, NumPy, Keras, and SciPy are extensively used for data cleaning and analysis. These libraries are used to load and clean the data and do effective analysis. For example, a CSV file named “Student” has information about the students of an institute, like their names, standard, address, phone number, grades, marks, etc.
Below are the popular libraries used for data extraction, cleaning, visualization, and deploying DS models:
Pandas: Used to implement ETL (Extracting, Transforming, and Loading the datasets) capabilities in business applications.
Matplotlib: Being free and open source, it can be used as a replacement for MATLAB, which results in better performance and low memory consumption.
PyTorch: Best for projects which involve Machine Learning algorithms and Deep Neural Networks.
A small cleaning sketch using Pandas follows.
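A rough sketch of this kind of cleaning with Pandas. The column names and values stand in for the hypothetical “Student” CSV described above and are assumptions, not taken from an actual file.

```python
# Typical cleaning steps: duplicates, inconsistent formatting, malformed
# values, and missing values. Data is a made-up stand-in for Student.csv
# (in practice: df = pd.read_csv("Student.csv")).
import pandas as pd

df = pd.DataFrame({
    "name":  [" arun ", "Priya", "Priya", "kumar"],
    "marks": [78, 91, 91, "n/a"],
    "phone": ["9876543210", None, None, "9123456780"],
})

df = df.drop_duplicates()                                   # remove duplicate records
df["name"] = df["name"].str.strip().str.title()             # fix inconsistent formatting
df["marks"] = pd.to_numeric(df["marks"], errors="coerce")   # malformed values -> NaN
df["phone"] = df["phone"].fillna("unknown")                 # handle missing values
df = df[df["marks"].between(0, 100) | df["marks"].isna()]   # drop nonsensical outliers

print(df)
```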
Variance is a type of error that occurs in a Data Science model when the model ends up being too complex and learns features from the data along with the noise that exists in it. This kind of error can occur if the algorithm used to train the model has high complexity, even though the data and the underlying patterns and trends are quite easy to discover. This makes the model very sensitive: it performs well on the training dataset but poorly on the testing dataset, and on any kind of data that the model has not yet seen. Variance generally leads to poor accuracy in testing and results in overfitting.
Pruning a decision tree is the process of removing the sections of the tree that are
not necessary or are redundant. Pruning leads to a smaller decision tree, which performs
better and gives higher accuracy and speed.
When building a decision tree, at each step, we have to create a node that decides
which feature we should use to split data, i.e., which feature would best separate our data
so that we can make predictions. This decision is made using information gain, which is a
measure of how much entropy is reduced when a particular feature is used to split the data.
The feature that gives the highest information gain is the one that is chosen to split the data.
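A small sketch of how entropy and information gain can be computed for a boolean split; the labels and feature values are made up for illustration.

```python
# Entropy and information gain for a simple boolean feature split.
import numpy as np

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature):
    """Reduction in entropy when splitting the labels on a feature."""
    labels, feature = np.asarray(labels), np.asarray(feature)
    gain = entropy(labels)
    for value in np.unique(feature):
        subset = labels[feature == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

y = [1, 1, 1, 0, 0, 0, 1, 0]                                  # class labels
x = [True, True, True, False, False, False, True, True]       # candidate split feature
print(information_gain(y, x))
```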
In k-fold cross-validation, we divide the dataset into k equal parts. After this, we loop
over the entire dataset k times. In each iteration of the loop, one of the k parts is used for
testing, and the other k − 1 parts are used for training. Using k-fold cross-validation, each
one of the k parts of the dataset ends up being used for training and testing purposes.
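A minimal sketch of k-fold cross-validation with scikit-learn, assuming the built-in Iris dataset and k = 5.

```python
# Each of the 5 folds is used once for testing and 4 times for training.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # one accuracy score per fold
print(scores, scores.mean())
```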
25. Explain how a recommender system works.
For example, imagine that we have a movie streaming platform, similar to Netflix or Amazon Prime. If a user has previously watched and liked movies from the action and horror genres, then it means that the user likes watching movies of these genres. In that case, it would be better to recommend such movies to this particular user. These recommendations can also be generated based on what users with a similar taste like watching.
Data distribution is a visualization tool to analyze how data is spread out or distributed. Data can be distributed in various ways. For instance, it could be skewed to the left or the right, or it could be all jumbled up.
Data may also be distributed around a central value, i.e., mean, median, etc. This kind of distribution has no bias either to the left or to the right and is in the form of a bell-shaped curve. This distribution also has its mean equal to the median. This kind of distribution is called a normal distribution.
Deep Learning is a kind of Machine Learning, in which neural networks are used to
imitate the structure of the human brain, and just like how a brain learns from information,
machines are also made to learn from the information that is provided to them.
Deep Learning is an advanced version of neural networks to make the machines learn from
data. In Deep Learning, the neural networks comprise many hidden layers (which is why it is
called ‘deep’ learning) that are connected to each other, and the output of the previous layer
is the input of the current layer.
PART B
1.Explain various steps in the Data Science process (OR) Data Science Lifecycle
The main phases of data science life cycle are given below:
1. Discovery: The first phase is discovery, which involves asking the right questions. When
you start any data science project, you need to determine what are the basic requirements,
priorities, and project budget. In this phase, we need to determine all the requirements of
the project such as the number of people, technology, time, data, an end goal, and then we
can frame the business problem on first hypothesis level.
2. Data preparation: Data preparation is also known as Data Munging. In this phase, we
need to perform the following tasks:
Data cleaning
Data reduction
Data integration
Data transformation
After performing all the above tasks, we can easily use this data for our further processes.
3. Model Planning: In this phase, we need to determine the various methods and techniques to establish the relation between input variables. We will apply Exploratory Data Analysis (EDA), using various statistical formulas and visualization tools, to understand the relations between variables and to see what the data can tell us. Common tools used for model planning are:
R
SAS
Python
4. Model Building: In this phase, the process of model building starts. We will create datasets for training and testing purposes. We will apply different techniques such as association, classification, and clustering to build the model (a minimal sketch is given after the tool list). Common tools used for model building are:
WEKA
SPSS Modeler
MATLAB
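A minimal sketch of this phase, assuming the built-in Iris dataset and a random forest classifier; the dataset and algorithm are illustrative choices, not the only possibilities.

```python
# Model building: split data into training and testing sets, fit a
# classifier, and check its accuracy on the held-out test set.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```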
5. Operationalize: In this phase, we will deliver the final reports of the project, along with briefings, code, and technical documents. This phase provides you a clear overview of complete project performance and other components on a small scale before the full deployment.
6. Communicate results: In this phase, we will check whether we have reached the goal that we set in the initial phase. We will communicate the findings and the final result to the business team.
Image and speech recognition:
Data science is currently used for image and speech recognition. When you upload an image on Facebook and start getting suggestions to tag your friends, this automatic tagging suggestion uses an image recognition algorithm, which is part of data science.
When you say something using "Ok Google", Siri, Cortana, etc., and these devices respond as per the voice command, this is possible with speech recognition algorithms.
Gaming world:
In the gaming world, the use of Machine Learning algorithms is increasing day by day. EA Sports, Sony, and Nintendo are widely using data science for enhancing the user experience.
Internet search:
When we want to search for something on the internet, we use different types of search engines such as Google, Yahoo, Bing, Ask, etc. All these search engines use data science technology to make the search experience better, and you can get a search result within a fraction of a second.
Transport:
Transport industries are also using data science technology to create self-driving cars. With self-driving cars, it will be easier to reduce the number of road accidents.
Healthcare:
In the healthcare sector, data science is providing lots of benefits. Data science is being used for tumor detection, drug discovery, medical image analysis, virtual medical bots, etc.
Recommendation systems:
Most companies, such as Amazon, Netflix, Google Play, etc., are using data science technology to create a better user experience with personalized recommendations. For example, when you search for something on Amazon and start getting suggestions for similar products, this is because of data science technology.
Risk detection:
Finance industries have always had an issue of fraud and risk of losses, but with the help of data science, this can be reduced. Most finance companies are looking for data scientists to avoid risk and any type of losses, along with an increase in customer satisfaction.
In Data Science and Big Data you’ll come across many different types of data, and each of
them tends to require different tools and techniques. The main categories of data are
these:
Structured
Unstructured
Natural Language
Machine-generated
Graph-based
Streaming
Structured Data
Structured data is data that depends on a data model and resides in a fixed field within a record. It's often easy to store structured data in tables within databases or Excel files. SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases. You may also come across structured data that might give you a hard time storing it in a traditional relational database; hierarchical data such as a family tree is one such example. The world isn't made up of only structured data, though; most real-world data is unstructured.
Unstructured Data
Unstructured data is data that isn't easy to fit into a data model because the content is context-specific or varying. One example of unstructured data is your regular email. Although email contains structured elements such as the sender, title, and body text, it's a challenge to find the number of people who have written an email complaint about a specific employee because so many ways exist to refer to a person, for example. The thousands of different languages and dialects out there further complicate this.
Natural Language
Natural language is a special type of unstructured data; it's challenging to process because it requires knowledge of specific data science techniques and linguistics. The natural language processing community has had success in entity recognition, topic recognition, summarization, text completion, and sentiment analysis, but models trained in one domain don't generalize well to other domains. Even state-of-the-art techniques aren't able to decipher the meaning of every piece of text. This shouldn't be a surprise though: humans struggle with natural language as well. It's ambiguous by nature. The concept of meaning itself is questionable here. Have two people listen to the same conversation. Will they get the same meaning? The meaning of the same words can vary when coming from someone who is upset or joyous.
Machine-generated Data
Machine-generated data is information that's automatically created by a computer, process, application, or other machine without human intervention. The analysis of machine data relies on highly scalable tools, due to its high volume and speed. Examples are web server logs, call detail records, network event logs, and telemetry.
Graph-based or Network Data
Traditional tabular storage is not the best approach for highly interconnected or "networked" data, where the relationships between entities play a central role. "Graph data" can be a confusing term because any data can be shown in a graph. "Graph" in this case points to mathematical graph theory, in which a graph is a mathematical structure to model pair-wise relationships between objects. Graph or network data is, in short, data that focuses on the relationship or adjacency of objects. The graph structures use nodes, edges, and properties to represent and store graphical data.
Friends in a social network are an example of graph-based data.
Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people. Graph databases are used to store graph-based data and are queried with specialized query languages.
Graph data poses its challenges, but for a computer interpreting audio and image data, it can be even more difficult.
Audio, Image, and Video
Audio, image, and video are data types that pose specific challenges to a data scientist. Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers. Multimedia data in the form of audio, video, images, and sensor signals has become an integral part of everyday life. Moreover, it has revolutionized product testing and evidence collection by providing multiple sources of data for quantitative and systematic assessment.
We have various libraries, development languages, and IDEs commonly used in the field, such as:
MATLAB
OpenCV
ImageJ
Python
R
Java
C
C++
C#
Streaming Data
While streaming data can take almost any of the previous forms, it has an extra property: the data flows into the system when an event happens, instead of being loaded into a data store in a batch. Although it isn't really a different type of data, we treat it here as such because you need to adapt your process to deal with this type of information.
Data cleaning is the process that removes data that does not belong in your dataset. Data transformation is the process of converting data from one format or structure into another. Transformation processes can also be referred to as data wrangling or data munging: transforming and mapping data from one "raw" form into another format for warehousing and analysis. This section focuses on the process of cleaning that data.
While the techniques used for data cleaning may vary according to the types of data your company stores, you can follow these basic steps to map out a framework for your organization.
Step 1: Remove duplicate or irrelevant observations
Step 2: Fix structural errors
Structural errors are when you measure or transfer data and notice strange naming conventions, typos, or incorrect capitalization. These inconsistencies can cause mislabeled categories or classes. For example, you may find "N/A" and "Not Applicable" both appear, but they should be analyzed as the same category.
Step 3: Filter unwanted outliers
Often, there will be one-off observations that, at a glance, do not appear to fit within the data you are analyzing. If you have a legitimate reason to remove an outlier, like improper data entry, doing so will help the performance of the data you are working with. However, sometimes it is the appearance of an outlier that will prove a theory you are working on. Remember: just because an outlier exists doesn't mean it is incorrect. This step is needed to determine the validity of that number. If an outlier proves to be irrelevant for analysis or is a mistake, consider removing it.
You can’t ignore missing data because many algorithms will not accept missing values.
There are a couple of ways to deal with missing data. Neither is optimal, but both can be
considered.
1. As a first option, you can drop observations that have missing values, but doing this
will drop or lose information, so be mindful of this before you remove it.
29
2. As a second option, you can input missing values based on other observations;
again, there is an opportunity to lose integrity of the data because you may be oper-
ating from assumptions and not actual observations.
3. As a third option, you might alter the way the data is used to effectively navigate null
values.
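A short pandas sketch of the three options above on a made-up DataFrame with missing values; the column names and numbers are assumptions for illustration.

```python
# Three ways of handling missing values: drop, impute, or flag and keep.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "salary": [30000, 45000, np.nan, 52000]})

# Option 1: drop observations that have missing values (loses information)
dropped = df.dropna()

# Option 2: fill missing values based on other observations (here, column means)
imputed = df.fillna(df.mean(numeric_only=True))

# Option 3: keep the nulls and let downstream code handle them explicitly
flagged = df.assign(age_missing=df["age"].isna())

print(dropped.shape, imputed.isna().sum().sum(), flagged.columns.tolist())
```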
At the end of the data cleaning process, you should be able to answer these questions
as a part of basic validation:
Does the data make sense?
Does the data follow the appropriate rules for its field?
Does it prove or disprove your working theory, or bring any insight to light?
Can you find trends in the data to help you form your next theory?
False conclusions because of incorrect or "dirty" data can inform poor business strategy and decision-making. False conclusions can lead to an embarrassing moment in a reporting meeting when you realize your data doesn't stand up to scrutiny. Before you get there, it is important to create a culture of quality data in your organization. To do this, you should document the tools you might use to create this culture and what data quality means to you.
Validity. The degree to which your data conforms to defined business rules or constraints.
3. Consistency. Ensure your data is consistent within the same dataset and/or across
multiple data sets.
4. Uniformity. The degree to which the data is specified using the same unit of measure.
Having clean data will ultimately increase overall productivity and allow for the highest
quality information in your decision-making. Benefits include:
Ability to map the different functions and what your data is intended to do.
Monitoring errors and better reporting to see where errors are coming from, making it
easier to fix incorrect or corrupt data for future applications.
Using tools for data cleaning will make for more efficient business practices and
quicker decision-making.
Incorrect data may lead to bad decisions
While operating your business, you rely on certain sources of data, based on which you make most of your business decisions. If the data has a lot of errors, the decisions you take may be incorrect and prove to be hazardous for your business. The way you collect data and how your data warehouse functions can easily have an impact on your productivity.
5.What are the Steps Involved in Data Science Modelling.
The key steps involved in Data Science Modelling are described below.
Data Cleaning is useful as you need to sanitize data while gathering it. The following are some of the most typical causes of data inconsistencies and errors:
Variables with missing values across multiple databases.
Exploratory Data Analysis (EDA) is a robust technique for familiarising yourself with data and extracting useful insights. Data Scientists sift through unstructured data to find patterns and infer relationships between data elements. Data Scientists use statistics and visualisation tools to summarise central measurements and variability to perform EDA.
If data skewness persists, appropriate transformations are used to scale the distribution around its mean. When datasets have a lot of features, exploring them can be difficult. As a result, to reduce the complexity of model inputs, Feature Selection is used to rank them in order of significance in model building for enhanced efficiency. Using Business Intelligence tools like Tableau, MicroStrategy, etc. can be quite beneficial in this step. This step is crucial in Data Science Modelling as the metrics are studied carefully for validation of data outcomes.
Feature Selection is the process of identifying and selecting the features that contribute the most to the prediction variable or output that you are interested in, either automatically or manually (a short sketch is given below).
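A short sketch of automatic feature selection, assuming scikit-learn's SelectKBest on the built-in Iris dataset; this is one possible approach among many, not the only way to rank features.

```python
# Rank features by their relationship with the target and keep the strongest.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

print(selector.scores_)        # importance score for each feature
print(selector.get_support())  # which features are kept
```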
The presence of irrelevant characteristics in your data can reduce the model accuracy and cause your model to train based on irrelevant features. In other words, if the features are strong enough, the Machine Learning Algorithm will give fantastic outcomes. Two types of characteristics must be addressed:
This is one of the most crucial processes in Data Science Modelling, as the Machine Learning Algorithm aids in creating a usable Data Model. There are a lot of algorithms to pick from, and the Model is selected based on the problem. There are three types of Machine Learning methods that are incorporated:
1) Supervised Learning
It is based on the results of a previous operation that is related to the existing business operation. Based on previous patterns, Supervised Learning aids in the prediction of an outcome. Some of the Supervised Learning Algorithms are:
Linear Regression
Random Forest
2) Unsupervised Learning
K-means Clustering
Hierarchical Clustering
Anomaly Detection
3) Reinforcement Learning
It is a fascinating Machine Learning technique that uses a dynamic Dataset that interacts
with the real world. In simple terms, it is a mechanism by which a system learns from its
mistakes and improves over time. Some of the Reinforcement Learning Algorithms are:
Q-Learning
State-Action-Reward-State-Action (SARSA)
Deep Q Network
This is the next phase, and it's crucial to check that our Data Science Modelling efforts meet the expectations. The Data Model is applied to the Test Data to check if it's accurate and houses all desirable features. You can further test your Data Model to identify any adjustments that might be required to enhance the performance and achieve the desired results. If the required precision is not achieved, you can go back to Step 5 (Machine Learning Algorithms), choose an alternate Data Model, and then test the model again.
The Model which provides the best result based on the test findings is completed and deployed in the production environment whenever the desired result is achieved through proper testing as per the business needs. This concludes the process of Data Science Modelling.
Banking and Finance: The banking industry can benefit from Data Science in many aspects. Fraud detection is a well-known application in this field that assists banks in reducing non-performing assets.
Healthcare: Health concerns are being monitored and prevented using wearable data. The data acquired from the body can be used in the medical field to prevent future calamities.
Marketing: Marketing offers a lot of potential, such as a more effective price strategy. Pricing based on Data Science can help companies like Uber and e-commerce businesses enhance their profits.
Government Policies: Based on data gathered through surveys and other official sources, the government can use Data Science to better build policies that cater to the interests and wishes of the people.