B.Tech AI & DS Course Outline
UNIT – I
CREDIT POINT
SEMESTER IV

Sl. No. | Course Code | Course Title                                | L  | T | P  | C
THEORY
4.      | AD3491      | Fundamentals of Data Science and Analytics  | 3  | 0 | 0  | 3
5.      | CS3591      | Computer Networks                           | 3  | 0 | 2  | 4
        |             | TOTAL                                       | 17 | 1 | 12 | 24
SENGUNTHAR COLLEGE OF ENGINEERING
TIRUCHENGODE – 637 205.
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
LECTURE PLAN
Subject Code : AD 3491
Subject Name : FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
Name of the Faculty : S.SANTHIPRIYA
Designation : Assistant Professor / AI&DS
Course : IV Semester B.Tech – Artificial Intelligence and Data Science
Academic Year : 2022-2023
TOPIC                                                          | REFERENCE | TEACHING AIDS | No. of HOURS

UNIT I – INTRODUCTION
Need for Data Science                                          | T1-CH1    | Black Board   | 1
Benefits and Uses                                              | T1-CH1    | Black Board   | 2
Facets of Data, Data Science Process                           | T1-CH2    | Black Board   | 2
Setting the Research Goal                                      | T1-CH2    | Black Board   | 1
Retrieving Data                                                | T1-CH2    | Black Board   | 1
Cleansing, Integrating and Transforming Data                   | T1-CH2    | Black Board   | 1
Interpretation of R², Multiple Regression Equations, Regression Toward the Mean | T1-CH3 | Black Board | 1

UNIT III – INFERENTIAL STATISTICS
Populations, Samples, Random                                   | T1-CH4    | Black Board   | 1
Hypothesis Testing, Z-Test, Sampling, Sampling Distribution    | T1-CH4    | Black Board   | 2

UNIT IV – ANALYSIS OF VARIANCE
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS    L T P C
3 0 0 3
t-test for one sample – sampling distribution of t – t-test procedure – t-test for two
independent samples – p-value – statistical significance – t-test for two related samples.
F-test – ANOVA – Two-factor experiments – three F-tests – two-factor ANOVA – Introduction to
chi-square tests.
Linear least squares – implementation – goodness of fit – testing a linear model – weighted
resampling. Regression using StatsModels – multiple regression – nonlinear relationships –
logistic regression – estimating parameters – Time series analysis – moving averages –
missing values – serial correlation – autocorrelation. Introduction to survival analysis.
TOTAL : 45 PERIODS
TEXT BOOKS
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016.
REFERENCES
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014.
2. Sanjeev J. Wagh, Manisha S. Bhende, Anuradha D. Thakare, “Fundamentals of Data Science”, CRC Press, 2022.
UNIT I
Facets Of Data
Retrieving Data
LIST OF IMPORTANT QUESTIONS
UNIT 1 – INTRODUCTION TO DATA SCIENCE
PART A
1.What is Data Science?
Data Science is a field of computer science that explicitly deals with turning data into
information and extracting meaningful insights out of it. The reason why Data Science is so
popular is that the kind of insights it allows us to draw from the available data has led to
some major innovations in several products and companies. Using these insights, we are
able to determine the taste of a particular customer, the likelihood of a product succeeding
in a particular market, etc.
2. Differentiate between Data Analytics and Data Science.
Data Analytics: It is a subset of Data Science. Its goal is to illustrate the precise details of retrieved insights. It focuses on just finding the solutions. A data analyst's job is to analyse data in order to make decisions.
Data Science: It is a broad field that includes various subsets such as Data Analytics, Data Mining, Data Visualization, etc. Its goal is to discover meaningful insights from massive datasets and derive the best possible solutions to resolve business issues. It not only focuses on finding the solutions but also predicts the future from past patterns or insights. A data scientist's job is to provide insightful data visualizations from raw data that are easily understandable.
3. What are the challenges in Data Science?
Multiple Data Sources
Data Security
Lack of Clarity on Business Problem
Undefined KPIs and Metrics
Difficulty in Finding Skilled Data Scientists
Getting Value Out of Data Science
6. Explain unstructured data and give examples.
Unstructured data is far more abundant than structured data. Examples of unstructured data include rich media (media and entertainment data, surveillance data, geo-spatial data, audio, and weather data) and document collections.
Temperature and humidity are the independent variables, and rain would be our dependent variable. So, the logistic regression algorithm actually produces an S-shaped curve.
Now, let us look at another scenario: suppose that the x-axis represents the runs scored by Virat Kohli and the y-axis represents the probability of team India winning the match.
From this graph, we can say that if Virat Kohli scores more than 50 runs, then there is a greater probability for team India to win the match. Similarly, if he scores fewer than 50 runs, then the probability of team India winning the match is less than 50 percent. So, basically, in logistic regression, the Y value lies within the range of 0 and 1. This is how logistic regression works.
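As a rough illustration of the runs-versus-win-probability example above, the sketch below fits a logistic regression on made-up numbers; the runs and outcomes are assumptions chosen only for illustration, not real match data.

```python
# Minimal sketch: logistic regression produces probabilities between 0 and 1
# (the S-shaped curve described above). Data is made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

runs = np.array([[10], [25], [30], [45], [55], [70], [85], [100]])  # runs scored
won = np.array([0, 0, 0, 0, 1, 1, 1, 1])                           # 1 = team won

model = LogisticRegression()
model.fit(runs, won)

# Predicted probability of winning for new scores
print(model.predict_proba([[40]])[:, 1])   # below 50 runs -> lower win probability
print(model.predict_proba([[80]])[:, 1])   # above 50 runs -> higher win probability
```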
The confusion matrix is a table that is used to estimate the performance of a model. It tabulates the actual values and the predicted values in a 2×2 matrix.
True Positive (d): This denotes all of those records where the actual values are true and the predicted values are also true. So, these denote all of the true positives.
False Negative (c): This denotes all of those records where the actual values are true, but the predicted values are false.
False Positive (b): In this, the actual values are false, but the predicted values are true.
True Negative (a): Here, the actual values are false and the predicted values are also false.
So, if you want to get the correct values, then the correct values would basically represent all of the true positives and the true negatives. This is how the confusion matrix works.
11. What do you understand about the true-positive rate and false-positive rate?
True positive rate: In Machine Learning, the true-positive rate, which is also referred to as sensitivity or recall, is used to measure the percentage of actual positives that are correctly identified.
Formula: True Positive Rate = True Positives / Positives
False positive rate: The false-positive rate is basically the probability of falsely rejecting the null hypothesis for a particular test. It is calculated as the ratio between the number of negative events wrongly categorized as positive (false positives) and the total number of actual negative events.
Formula: False-Positive Rate = False Positives / Negatives
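A minimal sketch, assuming made-up actual and predicted labels, of how the confusion matrix entries and the two rates above can be computed with scikit-learn:

```python
# Compute the 2x2 confusion matrix, then TPR and FPR, from toy labels.
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()

tpr = tp / (tp + fn)   # True Positive Rate = True Positives / all actual positives
fpr = fp / (fp + tn)   # False Positive Rate = False Positives / all actual negatives

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```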
In traditional programming paradigms, we used to analyze the input, figure out the expected output, and write code containing the rules and statements needed to transform the provided input into the expected output. As we can imagine, these rules were not easy to write, especially for data that even computers had a hard time understanding, e.g., images, videos, etc.
Data Science shifts this process a little bit. In it, we need access to large volumes of data that contain the necessary inputs and their mappings to the expected outputs. Then, we use Data Science algorithms, which use mathematical analysis to generate rules that map the given inputs to outputs.
This process of rule generation is called training. After training, we use some data that was set aside before the training phase to test and check the system's accuracy. The generated rules are a kind of black box, and we cannot understand how the inputs are being transformed into outputs.
Supervised and unsupervised learning are two types of Machine Learning techniques. They both allow us to build models, but they are used for solving different kinds of problems. The comparison below is followed by a short code sketch.
Supervised learning: works on data that contains both the inputs and the expected output, i.e., labeled data. It is used to create models that can be employed to predict or classify things. Commonly used algorithms: linear regression, decision tree, etc.
Unsupervised learning: works on data that contains no mappings from input to output, i.e., unlabeled data. It is used to extract meaningful information out of large volumes of data. Commonly used algorithms: K-means clustering, Apriori algorithm, etc.
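A small sketch contrasting the two settings with scikit-learn; the toy arrays are assumptions chosen only to show labeled versus unlabeled data.

```python
# Supervised vs. unsupervised learning on tiny toy datasets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised: inputs X with known outputs y (labeled data)
X = np.array([[1], [2], [3], [4]])
y = np.array([2.1, 4.0, 6.2, 7.9])
reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))          # predict an output for a new input

# Unsupervised: only inputs, no labels; the algorithm finds structure itself
data = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(km.labels_)                  # cluster assignment for each point
```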
14. What is the difference between long format data and wide format data?
Long format data: has a column for possible variable types and a column for the values of those variables. Each row represents one time point per subject, so each subject will have many rows of data. This format is most typically used in R analysis and for writing to log files at the end of each experiment. It contains values that repeat in the first column. Use df.melt() to convert wide form to long form.
Wide format data: has a separate column for each variable. The repeated responses of a subject appear in a single row, with each response in its own column. This format is most widely used in data manipulation and in stats programmes for repeated-measures ANOVA, and is seldom used in R analysis. It contains values that do not repeat in the first column. Use df.pivot().reset_index() to convert long form into wide form.
A short pandas sketch of the two conversions is given below.
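A minimal pandas sketch of the wide-to-long and long-to-wide conversions mentioned above; the subject/test columns are hypothetical.

```python
# Wide <-> long conversion with pandas on a made-up DataFrame.
import pandas as pd

wide = pd.DataFrame({
    "subject": ["s1", "s2"],
    "test1":   [80, 75],
    "test2":   [85, 70],
})

# Wide -> long: one row per (subject, variable, value)
long = wide.melt(id_vars="subject", var_name="test", value_name="score")
print(long)

# Long -> wide: one column per variable again
wide_again = long.pivot(index="subject", columns="test", values="score").reset_index()
print(wide_again)
```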
15. Mention some techniques used for sampling. What is the main advantage of sampling?
The main advantage of sampling is that conclusions about an entire population can be drawn from a small, representative subset of it, which saves time, cost, and computation. Common techniques are listed below, followed by a short sketch of two of them.
Probability sampling: it involves random selection, which gives every element a chance of being selected. Probability sampling has various subtypes, as mentioned below:
Simple Random Sampling
Stratified Sampling
Systematic Sampling
Cluster Sampling
Multi-stage Sampling
Non-probability sampling techniques include:
Convenience Sampling
Purposive Sampling
Quota Sampling
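A rough pandas sketch of simple random and stratified sampling; the 'group' column and sample sizes are assumptions made only for illustration.

```python
# Simple random sampling vs. stratified sampling with pandas.
import pandas as pd

df = pd.DataFrame({
    "group": ["A"] * 60 + ["B"] * 40,
    "value": range(100),
})

# Simple random sampling: every row has an equal chance of selection
simple = df.sample(n=10, random_state=0)

# Stratified sampling: sample 10% from within each group separately
stratified = df.groupby("group").sample(frac=0.1, random_state=0)

print(simple.shape)
print(stratified["group"].value_counts().to_dict())
```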
Bias is a type of error that occurs in a Data Science model because of using an algorithm that is not strong enough to capture the underlying patterns or trends that exist in the data. In other words, this error occurs when the data is too complicated for the algorithm to understand, so it ends up building a model that makes simple assumptions. This leads to lower accuracy because of underfitting. Algorithms that can lead to high bias are linear regression, logistic regression, etc.
18. Why is Python used for Data Cleaning in DS?
Data Scientists have to clean and transform huge data sets into a form that they can work with. It is important to deal with redundant data for better results by removing nonsensical outliers, malformed records, missing values, inconsistent formatting, etc.
Python libraries such as Matplotlib, Pandas, NumPy, Keras, and SciPy are extensively used for data cleaning and analysis. These libraries are used to load and clean the data and do effective analysis. For example, a CSV file named “Student” has information about the students of an institute, like their names, standard, address, phone number, grades, marks, etc.
Below are the popular libraries used for data extraction, cleaning, visualization, and deploying DS models:
Pandas: Used to implement ETL (Extracting, Transforming, and Loading the datasets) capabilities in business applications.
Matplotlib: Being free and open source, it can be used as a replacement for MATLAB, which results in better performance and low memory consumption.
PyTorch: Best for projects which involve Machine Learning algorithms and Deep Neural Networks.
A small cleaning sketch using Pandas follows.
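A rough sketch of this kind of cleaning with Pandas. The column names and values stand in for the hypothetical “Student” CSV described above and are assumptions, not taken from an actual file.

```python
# Typical cleaning steps: duplicates, inconsistent formatting, malformed
# values, and missing values. Data is a made-up stand-in for Student.csv
# (in practice: df = pd.read_csv("Student.csv")).
import pandas as pd

df = pd.DataFrame({
    "name":  [" arun ", "Priya", "Priya", "kumar"],
    "marks": [78, 91, 91, "n/a"],
    "phone": ["9876543210", None, None, "9123456780"],
})

df = df.drop_duplicates()                                   # remove duplicate records
df["name"] = df["name"].str.strip().str.title()             # fix inconsistent formatting
df["marks"] = pd.to_numeric(df["marks"], errors="coerce")   # malformed values -> NaN
df["phone"] = df["phone"].fillna("unknown")                 # handle missing values
df = df[df["marks"].between(0, 100) | df["marks"].isna()]   # drop nonsensical outliers

print(df)
```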
Variance is a type of error that occurs in a Data Science model when the model ends up being too complex and learns features from the data along with the noise that exists in it. This kind of error can occur if the algorithm used to train the model has high complexity, even though the data and the underlying patterns and trends are quite easy to discover. This makes the model very sensitive: it performs well on the training dataset but poorly on the testing dataset, and on any kind of data that the model has not yet seen. Variance generally leads to poor accuracy in testing and results in overfitting.
Pruning a decision tree is the process of removing the sections of the tree that are
not necessary or are redundant. Pruning leads to a smaller decision tree, which performs
better and gives higher accuracy and speed.
When building a decision tree, at each step, we have to create a node that decides
which feature we should use to split data, i.e., which feature would best separate our data
so that we can make predictions. This decision is made using information gain, which is a
measure of how much entropy is reduced when a particular feature is used to split the data.
The feature that gives the highest information gain is the one that is chosen to split the data.
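A small sketch of how entropy and information gain can be computed for a boolean split; the labels and feature values are made up for illustration.

```python
# Entropy and information gain for a simple boolean feature split.
import numpy as np

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature):
    """Reduction in entropy when splitting the labels on a feature."""
    labels, feature = np.asarray(labels), np.asarray(feature)
    gain = entropy(labels)
    for value in np.unique(feature):
        subset = labels[feature == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

y = [1, 1, 1, 0, 0, 0, 1, 0]                                  # class labels
x = [True, True, True, False, False, False, True, True]       # candidate split feature
print(information_gain(y, x))
```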
In k-fold cross-validation, we divide the dataset into k equal parts. After this, we loop
over the entire dataset k times. In each iteration of the loop, one of the k parts is used for
testing, and the other k − 1 parts are used for training. Using k-fold cross-validation, each
one of the k parts of the dataset ends up being used for training and testing purposes.
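A minimal sketch of k-fold cross-validation with scikit-learn, assuming the built-in Iris dataset and k = 5.

```python
# Each of the 5 folds is used once for testing and 4 times for training.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # one accuracy score per fold
print(scores, scores.mean())
```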
25. Explain how a recommender system works.
For example, imagine that we have a movie streaming platform, similar to Netflix or Amazon Prime. If a user has previously watched and liked movies from the action and horror genres, then it means that the user likes watching movies of these genres. In that case, it would be better to recommend such movies to this particular user. These recommendations can also be generated based on what users with a similar taste like watching.
Data distribution is a visualization tool to analyze how data is spread out or distributed. Data can be distributed in various ways. For instance, it could be skewed to the left or the right, or it could be all jumbled up.
Data may also be distributed around a central value, i.e., mean, median, etc. This kind of distribution has no bias either to the left or to the right and is in the form of a bell-shaped curve. This distribution also has its mean equal to the median. This kind of distribution is called a normal distribution.
Deep Learning is a kind of Machine Learning, in which neural networks are used to
imitate the structure of the human brain, and just like how a brain learns from information,
machines are also made to learn from the information that is provided to them.
Deep Learning is an advanced version of neural networks to make the machines learn from
data. In Deep Learning, the neural networks comprise many hidden layers (which is why it is
called ‘deep’ learning) that are connected to each other, and the output of the previous layer
is the input of the current layer.
PART B
1.Explain various steps in the Data Science process (OR) Data Science Lifecycle
The main phases of data science life cycle are given below:
1. Discovery: The first phase is discovery, which involves asking the right questions. When
you start any data science project, you need to determine what are the basic requirements,
priorities, and project budget. In this phase, we need to determine all the requirements of
the project such as the number of people, technology, time, data, an end goal, and then we
can frame the business problem on first hypothesis level.
2. Data preparation: Data preparation is also known as Data Munging. In this phase, we
need to perform the following tasks:
Data cleaning
Data reduction
Data integration
Data transformation
After performing all the above tasks, we can easily use this data for our further processes.
3. Model Planning: In this phase, we need to determine the various methods and techniques to establish the relation between input variables. We will apply Exploratory Data Analysis (EDA), using various statistical formulas and visualization tools, to understand the relations between variables and to see what the data can tell us. Common tools used for model planning are:
R
SAS
Python
4. Model Building: In this phase, the process of model building starts. We will create datasets for training and testing purposes. We will apply different techniques such as association, classification, and clustering to build the model (a minimal sketch is given after the tool list). Common tools used for model building are:
WEKA
SPSS Modeler
MATLAB
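A minimal sketch of this phase, assuming the built-in Iris dataset and a random forest classifier; the dataset and algorithm are illustrative choices, not the only possibilities.

```python
# Model building: split data into training and testing sets, fit a
# classifier, and check its accuracy on the held-out test set.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```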
5. Operationalize: In this phase, we will deliver the final reports of the project, along with briefings, code, and technical documents. This phase provides you a clear overview of complete project performance and other components on a small scale before the full deployment.
6. Communicate results: In this phase, we will check whether we have reached the goal that we set in the initial phase. We will communicate the findings and the final result to the business team.
Image and speech recognition:
Data science is currently used for image and speech recognition. When you upload an image on Facebook and start getting suggestions to tag your friends, this automatic tagging suggestion uses an image recognition algorithm, which is part of data science.
When you say something using "Ok Google", Siri, Cortana, etc., and these devices respond as per the voice command, this is possible with speech recognition algorithms.
Gaming world:
In the gaming world, the use of Machine Learning algorithms is increasing day by day. EA Sports, Sony, and Nintendo are widely using data science for enhancing the user experience.
Internet search:
When we want to search for something on the internet, we use different types of search engines such as Google, Yahoo, Bing, Ask, etc. All these search engines use data science technology to make the search experience better, and you can get a search result within a fraction of a second.
Transport:
Transport industries are also using data science technology to create self-driving cars. With self-driving cars, it will be easier to reduce the number of road accidents.
Healthcare:
In the healthcare sector, data science is providing lots of benefits. Data science is being used for tumor detection, drug discovery, medical image analysis, virtual medical bots, etc.
Recommendation systems:
Most companies, such as Amazon, Netflix, Google Play, etc., are using data science technology to create a better user experience with personalized recommendations. For example, when you search for something on Amazon and start getting suggestions for similar products, this is because of data science technology.
Risk detection:
Finance industries have always had an issue of fraud and risk of losses, but with the help of data science, this can be reduced. Most finance companies are looking for data scientists to avoid risk and any type of losses, along with an increase in customer satisfaction.
In Data Science and Big Data you’ll come across many different types of data, and each of
them tends to require different tools and techniques. The main categories of data are
these:
Structured
Unstructured
Natural Language
Machine-generated
Graph-based
Streaming
Structured Data
Structured data is data that depends on a data model and resides in a fixed field within a record. It's often easy to store structured data in tables within databases or Excel files. SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases. You may also come across structured data that might give you a hard time storing it in a traditional relational database; hierarchical data such as a family tree is one such example. The world isn't made up of only structured data, though; most real-world data is unstructured.
Unstructured Data
Unstructured data is data that isn't easy to fit into a data model because the content is context-specific or varying. One example of unstructured data is your regular email. Although email contains structured elements such as the sender, title, and body text, it's a challenge to find the number of people who have written an email complaint about a specific employee because so many ways exist to refer to a person, for example. The thousands of different languages and dialects out there further complicate this.
Natural Language
Natural language is a special type of unstructured data; it's challenging to process because it requires knowledge of specific data science techniques and linguistics. The natural language processing community has had success in entity recognition, topic recognition, summarization, text completion, and sentiment analysis, but models trained in one domain don't generalize well to other domains. Even state-of-the-art techniques aren't able to decipher the meaning of every piece of text. This shouldn't be a surprise though: humans struggle with natural language as well. It's ambiguous by nature. The concept of meaning itself is questionable here. Have two people listen to the same conversation. Will they get the same meaning? The meaning of the same words can vary when coming from someone who is upset or joyous.
Machine-generated Data
Machine-generated data is information that's automatically created by a computer, process, application, or other machine without human intervention. The analysis of machine data relies on highly scalable tools, due to its high volume and speed. Examples are web server logs, call detail records, network event logs, and telemetry.
Graph-based or Network Data
Traditional tabular storage is not the best approach for highly interconnected or "networked" data, where the relationships between entities play a central role. "Graph data" can be a confusing term because any data can be shown in a graph. "Graph" in this case points to mathematical graph theory, in which a graph is a mathematical structure to model pair-wise relationships between objects. Graph or network data is, in short, data that focuses on the relationship or adjacency of objects. The graph structures use nodes, edges, and properties to represent and store graphical data.
Friends in a social network are an example of graph-based data.
Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people. Graph databases are used to store graph-based data and are queried with specialized query languages.
Graph data poses its challenges, but for a computer interpreting audio and image data, it can be even more difficult.
Audio, Image, and Video
Audio, image, and video are data types that pose specific challenges to a data scientist. Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers. Multimedia data in the form of audio, video, images, and sensor signals has become an integral part of everyday life. Moreover, it has revolutionized product testing and evidence collection by providing multiple sources of data for quantitative and systematic assessment.
We have various libraries, development languages, and IDEs commonly used in the field, such as:
MATLAB
OpenCV
ImageJ
Python
R
Java
C
C++
C#
Streaming Data
While streaming data can take almost any of the previous forms, it has an extra property: the data flows into the system when an event happens, instead of being loaded into a data store in a batch. Although it isn't really a different type of data, we treat it here as such because you need to adapt your process to deal with this type of information.
Data cleaning is the process that removes data that does not belong in your dataset. Data transformation is the process of converting data from one format or structure into another. Transformation processes can also be referred to as data wrangling or data munging: transforming and mapping data from one "raw" form into another format for warehousing and analysis. This section focuses on the process of cleaning that data.
While the techniques used for data cleaning may vary according to the types of data your company stores, you can follow these basic steps to map out a framework for your organization.
Step 1: Remove duplicate or irrelevant observations
Step 2: Fix structural errors
Structural errors are when you measure or transfer data and notice strange naming conventions, typos, or incorrect capitalization. These inconsistencies can cause mislabeled categories or classes. For example, you may find "N/A" and "Not Applicable" both appear, but they should be analyzed as the same category.
Step 3: Filter unwanted outliers
Often, there will be one-off observations that, at a glance, do not appear to fit within the data you are analyzing. If you have a legitimate reason to remove an outlier, like improper data entry, doing so will help the performance of the data you are working with. However, sometimes it is the appearance of an outlier that will prove a theory you are working on. Remember: just because an outlier exists doesn't mean it is incorrect. This step is needed to determine the validity of that number. If an outlier proves to be irrelevant for analysis or is a mistake, consider removing it.
You can’t ignore missing data because many algorithms will not accept missing values.
There are a couple of ways to deal with missing data. Neither is optimal, but both can be
considered.
1. As a first option, you can drop observations that have missing values, but doing this
will drop or lose information, so be mindful of this before you remove it.
29
2. As a second option, you can input missing values based on other observations;
again, there is an opportunity to lose integrity of the data because you may be oper-
ating from assumptions and not actual observations.
3. As a third option, you might alter the way the data is used to effectively navigate null
values.
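A short pandas sketch of the three options above on a made-up DataFrame with missing values; the column names and numbers are assumptions for illustration.

```python
# Three ways of handling missing values: drop, impute, or flag and keep.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "salary": [30000, 45000, np.nan, 52000]})

# Option 1: drop observations that have missing values (loses information)
dropped = df.dropna()

# Option 2: fill missing values based on other observations (here, column means)
imputed = df.fillna(df.mean(numeric_only=True))

# Option 3: keep the nulls and let downstream code handle them explicitly
flagged = df.assign(age_missing=df["age"].isna())

print(dropped.shape, imputed.isna().sum().sum(), flagged.columns.tolist())
```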
At the end of the data cleaning process, you should be able to answer these questions
as a part of basic validation:
Does the data make sense?
Does the data follow the appropriate rules for its field?
Does it prove or disprove your working theory, or bring any insight to light?
Can you find trends in the data to help you form your next theory?
False conclusions because of incorrect or "dirty" data can inform poor business strategy and decision-making. False conclusions can lead to an embarrassing moment in a reporting meeting when you realize your data doesn't stand up to scrutiny. Before you get there, it is important to create a culture of quality data in your organization. To do this, you should document the tools you might use to create this culture and what data quality means to you.
Validity. The degree to which your data conforms to defined business rules or constraints.
3. Consistency. Ensure your data is consistent within the same dataset and/or across
multiple data sets.
4. Uniformity. The degree to which the data is specified using the same unit of measure.
Having clean data will ultimately increase overall productivity and allow for the highest
quality information in your decision-making. Benefits include:
Ability to map the different functions and what your data is intended to do.
Monitoring errors and better reporting to see where errors are coming from, making it
easier to fix incorrect or corrupt data for future applications.
Using tools for data cleaning will make for more efficient business practices and
quicker decision-making.
Incorrect data may lead to bad decisions
While operating your business, you rely on certain sources of data, based on which you make most of your business decisions. If the data has a lot of errors, the decisions you take may be incorrect and prove to be hazardous for your business. The way you collect data and how your data warehouse functions can easily have an impact on your productivity.
5.What are the Steps Involved in Data Science Modelling.
The key steps involved in Data Science Modelling are described below.
Data Cleaning is useful as you need to sanitize data while gathering it. The following are some of the most typical causes of data inconsistencies and errors:
Variables with missing values across multiple databases.
Exploratory Data Analysis (EDA) is a robust technique for familiarising yourself with data and extracting useful insights. Data Scientists sift through unstructured data to find patterns and infer relationships between data elements. Data Scientists use statistics and visualisation tools to summarise central measurements and variability to perform EDA.
If data skewness persists, appropriate transformations are used to scale the distribution around its mean. When datasets have a lot of features, exploring them can be difficult. As a result, to reduce the complexity of model inputs, Feature Selection is used to rank them in order of significance in model building for enhanced efficiency. Using Business Intelligence tools like Tableau, MicroStrategy, etc. can be quite beneficial in this step. This step is crucial in Data Science Modelling as the metrics are studied carefully for validation of data outcomes.
Feature Selection is the process of identifying and selecting the features that contribute the most to the prediction variable or output that you are interested in, either automatically or manually (a short sketch is given below).
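A short sketch of automatic feature selection, assuming scikit-learn's SelectKBest on the built-in Iris dataset; this is one possible approach among many, not the only way to rank features.

```python
# Rank features by their relationship with the target and keep the strongest.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

print(selector.scores_)        # importance score for each feature
print(selector.get_support())  # which features are kept
```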
The presence of irrelevant characteristics in your data can reduce the model accuracy and cause your model to train based on irrelevant features. In other words, if the features are strong enough, the Machine Learning Algorithm will give fantastic outcomes. Two types of characteristics must be addressed:
This is one of the most crucial processes in Data Science Modelling, as the Machine Learning Algorithm aids in creating a usable Data Model. There are a lot of algorithms to pick from, and the Model is selected based on the problem. There are three types of Machine Learning methods that are incorporated:
1) Supervised Learning
It is based on the results of a previous operation that is related to the existing business operation. Based on previous patterns, Supervised Learning aids in the prediction of an outcome. Some of the Supervised Learning Algorithms are:
Linear Regression
Random Forest
2) Unsupervised Learning
K-means Clustering
Hierarchical Clustering
Anomaly Detection
3) Reinforcement Learning
It is a fascinating Machine Learning technique that uses a dynamic Dataset that interacts
with the real world. In simple terms, it is a mechanism by which a system learns from its
mistakes and improves over time. Some of the Reinforcement Learning Algorithms are:
Q-Learning
State-Action-Reward-State-Action (SARSA)
Deep Q Network
This is the next phase, and it's crucial to check that our Data Science Modelling efforts meet the expectations. The Data Model is applied to the Test Data to check if it's accurate and houses all desirable features. You can further test your Data Model to identify any adjustments that might be required to enhance the performance and achieve the desired results. If the required precision is not achieved, you can go back to Step 5 (Machine Learning Algorithms), choose an alternate Data Model, and then test the model again.
The Model which provides the best result based on the test findings is completed and deployed in the production environment whenever the desired result is achieved through proper testing as per the business needs. This concludes the process of Data Science Modelling.
Banking and Finance: The banking industry can benefit from Data Science in many aspects. Fraud detection is a well-known application in this field that assists banks in reducing non-performing assets.
Healthcare: Health concerns are being monitored and prevented using wearable data. The data acquired from the body can be used in the medical field to prevent future calamities.
Marketing: Marketing offers a lot of potential, such as a more effective price strategy. Pricing based on Data Science can help companies like Uber and e-commerce businesses enhance their profits.
Government Policies: Based on data gathered through surveys and other official sources, the government can use Data Science to better build policies that cater to the interests and wishes of the people.