Foundations For Data Analytics: Dr. D. Kothandaraman Associate Professor Scope-Vit-Ap Module-1

The document outlines the foundations of data analytics, explaining the distinction between data and information, and the importance of data quality. It covers various types of data analytics, including descriptive, diagnostic, predictive, and prescriptive analytics, as well as techniques for data munging, scraping, sampling, and cleaning. Additionally, it emphasizes the significance of data analytics in decision-making and operational efficiency, along with success strategies for analysts.

Foundations for Data Analytics

Dr. D. Kothandaraman
Associate Professor
SCOPE-VIT-AP
Module-1
Module 1
• Data - Information - Characteristics of data - Data munging - Scraping -
Sampling - Cleaning - Importance of data analytics - Success stories
Data
What is Data?
• Data is a collection of raw, unorganised facts and details like text,
observations, figures, symbols and descriptions of things etc.
• Data does not carry any specific purpose and has no significance by
itself.
• Data is measured in terms of bits and bytes – which are basic units of
information in the context of computer storage and processing.
Information
What is Information?
• Information is processed, organised and structured data. It provides
context for data and enables decision making.
• For example, a single customer’s sale at a restaurant is data – this
becomes information when the business is able to identify the most
popular or least popular dish.
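The restaurant example can be sketched in a few lines of Python. This is only an illustration: the dish names and the `summarise` helper are made up here, and a real business would read its sales from a database or point-of-sale system.

```python
from collections import Counter

# Hypothetical raw data: one entry per sale (dish names are made up).
sales = ["dosa", "idli", "dosa", "biryani", "dosa", "idli"]

def summarise(sales):
    """Turn raw sale records into information: dish popularity."""
    counts = Counter(sales)
    ranked = counts.most_common()      # sorted from most to least sold
    most_popular = ranked[0][0]
    least_popular = ranked[-1][0]
    return counts, most_popular, least_popular

counts, most_popular, least_popular = summarise(sales)
print("Most popular:", most_popular)
print("Least popular:", least_popular)
```

Each entry in `sales` is data on its own; the ranking produced by `summarise` is the information a manager could act on.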
Data Vs Information
Data | Information
Data is unorganised and unrefined facts. | Information comprises processed, organised data presented in a meaningful context.
Data is an individual unit that contains raw materials which do not carry any specific meaning. | Information is a group of data that collectively carries a logical meaning.
Data doesn’t depend on information. | Information depends on data.
Raw data alone is insufficient for decision making. | Information is sufficient for decision making.
An example of data is a student’s test score. | The average score of a class is the information derived from the given data.
Data Vs information
• Example:
Basics of Data Analytics
• Data analytics is the science of analysing raw data to draw
conclusions from it.
• Data analytics helps a business optimize its performance, operate
more efficiently, maximize profit, or make more strategically guided
decisions.
• Various approaches to data analytics include looking at what
happened (descriptive analytics), why something happened
(diagnostic analytics), what is going to happen (predictive analytics),
or what should be done next (prescriptive analytics).
• Data analytics relies on a variety of software tools, including
spreadsheets, data visualization and reporting tools, data mining
programs, and open-source programming languages for more extensive
data manipulation.
Types of Data Analytics
Data analytics is broken down into four basic types:
1. Descriptive analytics: This describes what has happened over a given
period of time. Has the number of views gone up? Are sales stronger
this month than last?
2. Diagnostic analytics: This focuses more on why something
happened. It involves more diverse data inputs and a bit of
hypothesizing. Did the weather affect beer sales? Did that latest
marketing campaign impact sales?
3. Predictive analytics: This moves to what is likely going to happen in
the near term. What happened to sales the last time we had a hot
summer? How many weather models predict a hot summer this year?
4. Prescriptive analytics: This suggests a course of action.
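The four types can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the temperature and sales figures are invented, and the four functions are simple stand-ins (a month-over-month difference, a Pearson correlation, a least-squares line, and a restocking rule), not the methods a real analytics team would necessarily use.

```python
# Hypothetical monthly data: average temperature (deg C) and units sold.
temps = [20, 24, 28, 32]
sales = [100, 120, 140, 160]

def mean(xs):
    return sum(xs) / len(xs)

def descriptive(sales):
    """What happened: change in sales from the previous month to the latest."""
    return sales[-1] - sales[-2]

def diagnostic(xs, ys):
    """Why it happened: Pearson correlation between two series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def predictive(xs, ys, new_x):
    """What is likely to happen: least-squares line fitted to past data."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept + slope * new_x

def prescriptive(forecast, stock):
    """What to do next: a toy rule that orders enough to cover the forecast."""
    return max(0, round(forecast - stock))

change = descriptive(sales)
correlation = diagnostic(temps, sales)
forecast = predictive(temps, sales, new_x=36)
order = prescriptive(forecast, stock=150)
```

Note that a strong correlation in `diagnostic` only suggests a cause to investigate; it does not prove that the weather drove the sales.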
Characteristics of Data
• Data quality is crucial: it determines whether information can serve its
purpose in a particular context (such as data analysis).
So how do you determine the quality of a given set of information?
There are data quality characteristics you should be aware of.
• There are five traits that you’ll find within data quality:
• Accuracy
• Completeness
• Reliability
• Relevance
• Timeliness
Characteristics of Data
Characteristic | How it’s measured
Accuracy | Is the information correct in every detail?
Completeness | How comprehensive is the information?
Reliability | Does the information contradict other trusted resources?
Relevance | Do you really need this information?
Timeliness | How up-to-date is the information? Can it be used for real-time reporting?
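Two of these characteristics, completeness and timeliness, are easy to measure in code. The sketch below is a minimal illustration: the record fields (`email`, `updated`), the reference date, and the 30-day threshold are all assumptions chosen for the example.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "updated": date(2024, 5, 3)},
    {"id": 3, "email": "c@example.com", "updated": date(2023, 1, 10)},
]

def completeness(records, field):
    """Share of records in which `field` is filled in."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def timeliness(records, today, max_age_days):
    """Share of records updated within the last `max_age_days` days."""
    fresh = sum(1 for r in records
                if (today - r["updated"]).days <= max_age_days)
    return fresh / len(records)

email_completeness = completeness(records, "email")
recent_share = timeliness(records, today=date(2024, 5, 10), max_age_days=30)
```

Accuracy, reliability, and relevance usually need a trusted reference source or human judgment, so they are harder to reduce to a single formula.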
Data Munging-Scraping-Sampling-Cleaning
• Data Munging
• In data analysis, Data munging or Data wrangling refers to the process of
cleaning and transforming raw data into its desired format, usually to
facilitate further analysis or visualization.
• Data munging can be done in Python or R
• The process of data munging can be broken down into three steps:
• Data Pre-processing
• Data Enriching
• Data validation.
• Data pre-processing includes data discovery and data transformation.
• In the data enrichment process the cleaned and transformed data is turned
into meaningful and accurate information.
• Data validation is the last stage in the data munging process. It’s important to
look for inconsistencies and errors that occurred during the transformation
process.
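The three munging steps can be sketched in plain Python. This is an assumed, simplified pipeline: the rows, the `pass_mark` parameter, and the function names `preprocess`, `enrich`, and `validate` are hypothetical, and in practice a library such as pandas is often used for this kind of work.

```python
# Hypothetical raw rows, as they might arrive from a CSV export.
raw_rows = [
    {"name": " Asha ", "score": "82"},
    {"name": "ravi", "score": "n/a"},
    {"name": "Meena", "score": "91"},
]

def preprocess(rows):
    """Discovery and transformation: tidy names, convert scores to integers."""
    cleaned = []
    for row in rows:
        score = row["score"].strip()
        if not score.isdigit():
            continue  # drop rows whose score cannot be parsed
        cleaned.append({"name": row["name"].strip().title(),
                        "score": int(score)})
    return cleaned

def enrich(rows, pass_mark=50):
    """Enrichment: add a derived field that gives the score some meaning."""
    return [{**row, "passed": row["score"] >= pass_mark} for row in rows]

def validate(rows):
    """Validation: look for errors introduced during transformation."""
    for row in rows:
        assert row["name"], "name must not be empty"
        assert 0 <= row["score"] <= 100, "score out of range"
    return rows

munged = validate(enrich(preprocess(raw_rows)))
print(munged)
```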
• Data Scraping
• Data scraping, also known as web scraping, is the process of importing
information from a website into a spreadsheet or local file saved on your
computer.
• It’s one of the most efficient ways to get data from the web, and in some
cases to channel that data to another website.
• You might use data scraping for:
• Website upgrades
• Competitor analysis
• In-depth reporting
• Some people use the technique to harm others. For example, some people
set up scraping tools to gather email addresses or social media profiles. Then
they bundle up that data and sell it to email spammers.
• 4 Ways to Protect Your Data
• Limit requests.
• Apply CAPTCHA.
• Use images.
• Shake up your text.
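The scraping process described above, turning a web page into spreadsheet-ready rows, can be sketched with Python's standard library. To keep the example self-contained, the page is an inline string rather than a real download; the markup and the `TableScraper` class are hypothetical. Before scraping a real site, check its terms of use.

```python
import csv
import io
from html.parser import HTMLParser

# Inline stand-in for a downloaded page; the markup is hypothetical.
PAGE = """
<table>
  <tr><td>Dosa</td><td>60</td></tr>
  <tr><td>Idli</td><td>40</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)

# Write the scraped rows in CSV form, ready for a spreadsheet.
buffer = io.StringIO()
csv.writer(buffer).writerows(scraper.rows)
csv_text = buffer.getvalue()
print(csv_text)
```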
Data Sampling
• Data sampling is the practice of analyzing a subset of all data in
order to uncover the meaningful information in the larger data set.

• For example, suppose you wanted to estimate the number of trees in a
100-acre area where the distribution of trees was fairly uniform.

• You could count the number of trees in 1 acre and multiply by 100, or
count the trees in half an acre and multiply by 200, to get a reasonable
estimate for the entire 100 acres.
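The arithmetic behind the tree example is a simple scaling. In the sketch below the tree counts (42 and 20) are made-up numbers and `estimate_total` is a hypothetical helper; the estimate is only as good as the assumption that trees are spread fairly uniformly.

```python
def estimate_total(sample_count, sample_area, total_area):
    """Scale a count from a sampled area up to the whole area."""
    return sample_count * (total_area / sample_area)

# Hypothetical counts: 42 trees in 1 acre, 20 trees in half an acre.
estimate_from_one_acre = estimate_total(42, sample_area=1, total_area=100)
estimate_from_half_acre = estimate_total(20, sample_area=0.5, total_area=100)

print(estimate_from_one_acre)   # 42 * 100
print(estimate_from_half_acre)  # 20 * 200
```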
Data Sampling-Cont…
• It is the practice of selecting a group of individuals from a population
in order to study the whole population.
• Every sampling type comes under two broad categories:
• Probability sampling - Random selection techniques are used to select the sample.
• Non-probability sampling - Non-random selection techniques based on certain
criteria are used to select the sample.
• Probability Sampling Techniques:
Simple Random Sampling
• In simple random sampling, the researcher selects the participants randomly. Tools such as
random number generators and random number tables, which are based entirely on chance,
are used to make the selection.
Systematic Sampling
• In systematic sampling, every member of the population is given a number, as in simple
random sampling. However, instead of randomly generating numbers, the samples are chosen
at regular intervals.
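Both techniques can be sketched with Python's `random` module. The population of 100 numbered members, the sample size, and the function names are assumptions for illustration; the systematic version assumes the sample size divides the population size evenly.

```python
import random

population = list(range(1, 101))  # members numbered 1 to 100

def simple_random_sample(population, size, seed=None):
    """Pick `size` members purely by chance, without repeats."""
    rng = random.Random(seed)
    return rng.sample(population, size)

def systematic_sample(population, size, seed=None):
    """Pick a random start, then take every k-th member."""
    step = len(population) // size
    start = random.Random(seed).randrange(step)
    return population[start::step][:size]

random_sample = simple_random_sample(population, 10, seed=0)
systematic = systematic_sample(population, 10, seed=0)
print(random_sample)
print(systematic)
```

Passing a `seed` makes the selection repeatable, which helps when results need to be checked later.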
Cont….
Stratified Sampling
• In stratified sampling, the population is subdivided into subgroups, called strata, based on
some characteristics (age, gender, income, etc.). After forming the subgroups, you can then use
random or systematic sampling to select a sample from each subgroup.
Cluster Sampling
• In cluster sampling, the population is also divided into subgroups, but each subgroup has
characteristics similar to those of the whole population. Instead of selecting a sample from
each subgroup, you randomly select entire subgroups. This method is helpful when dealing
with large and diverse populations.
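The difference between the two methods shows up clearly in code: stratified sampling draws a few members from every subgroup, while cluster sampling keeps every member of a few randomly chosen subgroups. The population, the age groups, and the "room" clusters below are invented for illustration.

```python
import random

# Hypothetical population: (person_id, age_group) pairs.
population = [(i, "under_30" if i % 2 else "30_plus") for i in range(1, 41)]

def stratified_sample(population, per_stratum, seed=None):
    """Draw `per_stratum` members at random from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(person[1], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical clusters, e.g. four rooms of ten people each.
clusters = {f"room_{k}": population[k * 10:(k + 1) * 10] for k in range(4)}

stratified = stratified_sample(population, per_stratum=5, seed=1)
clustered = cluster_sample(clusters, n_clusters=2, seed=1)
```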
Convenience Sampling (a non-probability technique)
• In this sampling method, the researcher simply selects the individuals who are most easily
accessible to them.
• This is an easy way to gather data, but there is no way to tell if the sample is representative of
the entire population.
Data Cleaning
Data cleaning is the process of preparing data for analysis by removing or modifying
data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted.
Such data is usually not necessary or helpful for analysis, because it may hinder the
process or produce inaccurate results.
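A minimal cleaning pass might look like the sketch below. The rows and the rules are assumptions for illustration: here a row is dropped when its age is missing or not a non-negative whole number, names are normalised, and exact duplicates are removed. Real projects choose their rules to fit the data.

```python
# Hypothetical survey rows showing the problems listed above.
rows = [
    {"name": "Asha", "age": "29"},
    {"name": "Asha", "age": "29"},      # duplicated
    {"name": "Ravi", "age": ""},        # incomplete
    {"name": "meena ", "age": "34"},    # improperly formatted
    {"name": "Kiran", "age": "-5"},     # incorrect
]

def clean(rows):
    """Drop unusable rows, normalise formatting, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        age_text = row["age"].strip()
        if not age_text.isdigit():
            continue  # missing or invalid age
        record = (name, int(age_text))
        if record in seen:
            continue  # exact duplicate of an earlier row
        seen.add(record)
        cleaned.append({"name": name, "age": record[1]})
    return cleaned

cleaned_rows = clean(rows)
print(cleaned_rows)
```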
Importance of Data Analytics
• 1) Data-driven decision-making
• 2) Competitive advantage
• 3) Personalization and customer experience
• 4) Operational efficiency
• 5) Predictive analytics and forecasting
• 6) Fraud detection and security
• 7) Internet of Things (IoT) and Big Data
• 8) Healthcare and research
• 9) Supply chain optimisation
• 10) Data privacy and governance
Success stories

• How do you succeed in data analytics?


• Here are the 8 pointers every analyst should strive to develop:
• Be able to tell a story, but keep it Simple. ...
• Pay attention to Detail. ...
• Be Commercially Savvy. ...
• Be Creative with Data. ...
• Be a People Person. ...
• Keep Learning new Tools and Skills. ...
• Don't be Afraid to make Mistakes, Learn from Them. ...
• Know when to Stop.
Thank You