CSE2500 - Data Analytics
Module 1 - Introduction to Data Analysis
Introducing Data, overview of data analysis: Data in the Real World,
Data vs. Information, Many “Vs” of Data, Structured Data and
Unstructured Data, Types of Data, Data Analysis Defined, Types of
Variables, Central Tendency of Data, Scales of Data, Sources of Data,
Data preparation: Cleaning the data, Removing variables, Data
Transformations.
R Studio: Base R-R Studio IDE-Introduction to R Projects and R
Markdown. Basic R: R as a calculator-Scripts and Comments-R
Variables. Data I/O: Working Directories-Importing Data-Exporting
Data-More ways to save-Data I/O in Base R.
Introducing Data
• Facts and statistics collected together for reference or analysis
• Data has to be transformed into a form that is efficient for
movement or processing.
Data Analysis Definition
• Data analysis is defined as a process of cleaning,
transforming, and modeling data to discover useful
information for business decision-making.
• Purpose: Extract useful information for business decision-
making.
• Example: Day-to-day decisions are made based on past
experiences or future expectations; we gather memories (past
data) or dreams (future data) to inform those decisions.
• Business Application: Analysts apply data analysis
techniques for business purposes, and data analysis informs
business decision-making.
Overview of Data Analysis
Data Collection and Importance
• Various disciplines collect and store data digitally
• Retail, insurance, and meteorological organizations use data for
informed decisions
• Timely decisions maximize sales, improve R&D, and reduce costs
Data Analysis Challenges
• Fast-growing data production due to internet and operational
systems
• Increasing volume, complexity, and reliability concerns
Data Analysis Process
• Define project and problem
• Prepare data for analysis
• Select and optimize data analysis approaches
• Deploy and measure results for expected benefits.
Key Objectives:
• Focus on converting raw data to meaningful information
• Outline the major steps in a data analysis project, from
defining the problem to deploying the results.
Data in the Real World
• Surveys or polls, interviews, and experiments are valuable
approaches for gathering data to answer specific questions.
Data is collected from various sources, including:
• Surveys and polls to understand opinions, preferences, and
behavior
Example: An opinion poll conducted before an election.
• Interviews to elicit information on people's opinions,
preferences, and behavior.
Example: Conducted over the phone.
• Experiments to measure and collect data in a highly controlled
manner
Example: Double-blind drug study (one group gets the drug,
the other a placebo).
• Operational databases containing ongoing business transactions
-Sensors monitoring operational processes.
-Stored in databases such as CRM, ERP, supply chain systems.
• Data warehouses used for making decisions
• Databases of historical polls, surveys, and experiments
• External sources such as the web or literature
• Data is used to answer specific questions, understand opinions
and needs, and make informed decisions
Data vs. Information
• Data refers to the raw, unprocessed facts and figures collected from
various sources.
• Information refers to the processed and analyzed data that provides
meaning and insight.
• Data becomes information when it is:
- Collected and stored
- Processed and analyzed
- Interpreted and understood
- Used to make informed decisions
• In other words, data is the raw material
• Information is the result of processing and analyzing that data to
extract meaning and value.
Many “Vs” of Data
A. Volume
• Volume refers to the magnitude or scale of data.
Data Sources and Repositories:
• Websites
Data Generation and Sharing:
• User clickstreams are recorded and stored
• Users of social media applications (Facebook, Twitter, etc.) become
prosumers (producers and consumers) of data
• Increased data sharing and larger data elements
• High-definition videos increase the volume of shared data
Autonomous Data Streams:
• Video, audio, text
• Data from social media sites, websites, RFID applications
B. Velocity
• Velocity refers to the speed at which the gigantic amount of data is being
generated, collected, and analyzed.
Enhanced Data Movement:
• Faster data movement through the Internet
• Quick transfer of e-mails, social media posts, video files
Cloud-Based Storage:
• Instantaneous sharing
• Easy accessibility from anywhere
Social Media and Data Sharing:
• Instant data sharing among people
• Mobile access for faster data generation and access
C. Variety
Types of Data:
• Structured data (numeric, text fields)
• Unstructured data (images, video, audio, etc.)
Sources of Data:
• Structured data: ERPs, operational systems
• Unstructured data: social media, web, RFID, machine data, etc.
Characteristics of Unstructured Data:
• Varying sizes and resolutions
• Subject to different types of analysis
Examples: Video: tagging, playback (not computable)
Audio: playback (not computable)
Graph data: network distance analysis
Text (Facebook posts, tweets): sentiment analysis (not directly comparable)
D. Value
• Value refers to converting the collected data into something of worth.
• Value in Big Data:
- An important characteristic of Big Data
- Involves collecting and analyzing data to boost organizational
performance and enhance customer understanding
• Having access to useful data is not enough; it must be analyzed to
extract real value and obtain its benefits.
E. Variability
• Variability refers to unpredictable changes in the data.
• It may arise from the multiple data types involved and the speed
at which data is generated and loaded into the database.
F. Veracity
• Veracity refers to the trustworthiness and accuracy of data.
• Only if the data is accurate can the analysis be meaningful.
• For example, consider a dataset of thirty students, for which we must
analyze why they earned a distinction.
• As an analyst, you can ask questions like:
• What methodology did you adopt to get good marks in all subjects?
• How much time do you devote to each subject?
• Do you learn some subjects through daily-life activities such as sports?
• Have you ever held a scholarship?
• From answers like these, it becomes easier to determine the accuracy of
the information, which can easily be maintained in statistical form.
G. Validity
• The big data terms veracity and validity seem alike but
are quite different.
• Validity refers to the correctness of the analysis, performed
in order to obtain optimized results.
H. Vulnerability
• Vulnerability is one of the major challenges in big data: data
generated from multiple sources at such an erratic speed has a high
chance of being compromised by an intruder.
• For example, in a recent case a Belgian court threatened Facebook
with a heavy fine for breaching privacy.
I. Volatility
• Volatility refers to how long the collected data remains
useful to us and how it should be kept.
• To analyze this, it is necessary to develop new rules and
techniques that make rapid access to the information possible.
J. Visualization
• Data visualization is one of the most complex challenges in big data.
• In this information age, data is not only growing beyond limits but is
also composed of many different data types.
• So there is a need to communicate the information by visualizing it in
specialized ways, with functionalities such as web-based approaches and
statistical analysis.
• Traditional data visualization tools face severe challenges
such as slow response times, complex scalability methods, and
imprecise reporting times.
• So it is a challenge to decide which way of communicating
the data is most suitable to make visualization effective.
Typical human-generated unstructured
data includes
• Text files: Word processing, spreadsheets, presentations, email, logs.
• Email: Email has some internal structure thanks to its metadata, and we sometimes refer to
it as semi-structured. However, its message field is unstructured and traditional analytics
tools cannot parse it.
• Social Media: Data from Facebook, Twitter, LinkedIn.
• Websites: YouTube, Instagram, photo-sharing sites.
• Mobile data: Text messages, locations.
• Communications: Chat, IM, phone recordings, collaboration software.
• Media: MP3, digital photos, audio and video files.
• Business applications: MS Office documents, productivity applications
Typical machine-generated unstructured
data includes:
• Satellite imagery: Weather data, land forms, military movements.
• Scientific data: Oil and gas exploration, space exploration,
seismic imagery, atmospheric data.
• Digital surveillance: Surveillance photos and video.
• Sensor data: Traffic, weather, oceanographic sensors.
Types of Digital Data (diagram on the original slides)
Data Analysis - Types
Statistical Analysis
• Statistical Analysis shows "What happened?" by
using past data in the form of dashboards.
• Statistical Analysis includes the collection, analysis,
interpretation, presentation, and modeling of
data.
• Analyses a sample of data.
• There are two categories of this type of Analysis:
Descriptive Analysis and
Inferential Analysis.
Descriptive Analysis
• Analyses complete data or a sample of summarized
numerical data.
• It shows the mean and deviation for continuous data,
and the percentage and frequency for categorical
data.
Inferential Analysis
• Analyses a sample of the complete data. In this type of
Analysis, different conclusions may be drawn from the
same data by selecting different samples.
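To make the distinction concrete, here is a minimal base R sketch (the weights and industries are made-up values, not from the slides):

weight <- c(153.2, 98.2, 120.5, 175.0, 142.8)        # continuous (lb)
industry <- factor(c("retail", "telecom", "retail",
                     "financial", "retail"))          # categorical

# Descriptive analysis: mean and deviation for continuous data,
# percentage and frequency for categorical data
mean(weight); sd(weight)
table(industry)
prop.table(table(industry)) * 100

# Inferential analysis: conclusions drawn from a sample of the data;
# a different sample could lead to a different conclusion
set.seed(1)
sample_weights <- sample(weight, size = 3)
t.test(sample_weights)$conf.int    # interval estimate for the population mean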
Diagnostic Analysis
• Diagnostic Analysis shows "Why did it happen?" by
finding the cause from the insights produced by Statistical
Analysis.
• It is useful for identifying behavior patterns in the data.
• If a new problem arises in your business process,
you can look into this Analysis to find similar
patterns of that problem.
• Similar prescriptions may then be applied to the
new problem.
Predictive Analysis
• Predictive Analysis shows "what is likely to happen" by using
previous data.
• A simple example: if last year I bought two dresses based on
my savings, and this year my salary doubles, then I might
predict I can buy four dresses. Of course it is not that simple,
because other circumstances must be considered: clothing prices
may rise this year, or instead of dresses you may want to buy a
new bike, or need to buy a house!
• Predictive Analysis makes predictions about future outcomes
based on current or past data.
• Forecasting is just an estimate; its accuracy depends on how much
detailed information you have and how deeply you dig into it.
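As a sketch of the dresses example above, a simple linear model in R can be used to forecast (the salary and dress figures are hypothetical):

# Past data: dresses bought each year as a function of salary
salary  <- c(20000, 25000, 30000, 35000, 40000)
dresses <- c(2, 2, 3, 3, 4)

model <- lm(dresses ~ salary)    # fit a simple predictive model

# Predict purchases if the salary doubles to 80,000; this is only an
# estimate and ignores other circumstances (price rises, a bike, a house)
predict(model, newdata = data.frame(salary = 80000))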
Prescriptive Analysis
• Prescriptive Analysis combines the insights from all the
previous Analyses to determine which action to take
on a current problem or decision.
• Data-driven companies utilize Prescriptive
Analysis because predictive and descriptive
Analysis alone are not enough to improve
performance.
• Based on current situations and problems, they
analyze the data and make decisions.
Types of Variables
Variables are categorized based on the type of values they take.
Discrete Variables:
• Contain a fixed number of distinct values
• Finite number of possible values
• Example: An industrial sector variable takes values such as the
telecommunications industry and the retail industry, a finite
number of possible values.
Continuous Variables:
• Can take any numeric value within a range
• Infinite number of possible values
• Example: A patient's weight (e.g., 153.2 lb, 98.2 lb)
Ratio Scale:
• Intervals and ratios of values can be compared
• Natural zero point
• Example: Bank account balance ($5, $10, $15)
Special Types of Variables:
Dichotomous Variable:
• Only two possible values
• Example: Gender (male, female)
Binary Variable:
• A dichotomous variable with values 0 or 1
• Example: Purchase (0 = no, 1 = yes), Fuel Efficiency (0 = low, 1 = high)
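These variable types map directly onto R data types; a small sketch (all values hypothetical):

sector <- factor(c("telecom", "retail", "telecom"))   # discrete: finite set of values
weight <- c(153.2, 98.2, 120.5)                       # continuous: any numeric value in a range
gender <- factor(c("male", "female", "male"))         # dichotomous: exactly two values
purchase <- c(0, 1, 1)                                # binary: dichotomous coded as 0/1
levels(sector)                                        # the distinct values of a discrete variable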
Scales of Data
Variables are classified according to the scale on which they are measured.
Nominal Scale:
• Variable with a limited number of values
• Values cannot be ordered
• Example: Industry (financial, engineering, retail)
Ordinal Scale:
• Variable whose values can be ordered or ranked
• Values are assigned to fixed categories
• Example: Low, Medium, High
Interval Scale:
• Intervals between values can be compared
• Values share the same unit of measurement
• Example: Fahrenheit scale (5°F, 10°F, 15°F)
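In R, nominal and ordinal scales correspond to unordered and ordered factors; a short sketch (values hypothetical):

industry <- factor(c("financial", "retail", "engineering"))   # nominal: unordered categories
risk <- factor(c("Low", "High", "Medium"),
               levels = c("Low", "Medium", "High"),
               ordered = TRUE)                                 # ordinal: ranked categories
risk[1] < risk[2]         # TRUE: ordered comparisons are meaningful
temp_f  <- c(5, 10, 15)   # interval: differences comparable, no natural zero
balance <- c(5, 10, 15)   # ratio: natural zero, so ratios such as 10/5 are meaningful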
Central Tendency of Data
• Definition: A value that characterizes the center of a set of
values
• Purpose: Quantify the middle or central location of a
variable, such as an average
• Many of the observed values lie around the central value
• Approaches to calculating central tendency: mode,
median, and mean
Mode:
• The mode is the most commonly occurring value for a
particular variable.
• It is illustrated using the following variable whose
values are: 3, 4, 5, 6, 7, 7, 7, 8, 8, 9
• The mode is 7, since there are three occurrences of 7.
• In the following values, both 7 and 8 are reported three
times: 3, 4, 5, 6, 7, 7, 7, 8, 8, 8, 9
• The mode may be reported as {7, 8} or 7.5.
Median
• The median is the middle value of a variable once its
values have been sorted from low to high.
• For variables with an even number of values, the mean of
the two values closest to the middle is taken.
• The following set of values will be used to illustrate: 3, 4,
7, 2, 3, 7, 4, 2, 4, 7, 4.
• Before identifying the median, the values must be sorted: 2,
2, 3, 3, 4, 4, 4, 4, 7, 7, 7
• Since there are 11 values, the median is the middle (sixth)
value: 4.
Mean:
• Referred to as the average.
• The commonly used measure of central tendency for variables
measured on the interval or ratio scales.
• Sum of all the values divided by the number of values.
• For example, for the following set of values: 3, 4, 5,
7, 7, 8, 9, 9, 9
• mean = (3 + 4 + 5 + 7 + 7 + 8 + 9 + 9 + 9) / 9 = 61 / 9 ≈ 6.78
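The three measures can be computed in R with the values from the slides; R has no built-in statistical mode, so a small helper is sketched:

x <- c(3, 4, 5, 7, 7, 8, 9, 9, 9)
mean(x)      # 61 / 9 = 6.78
median(x)    # middle value after sorting: 7

stat_mode <- function(v) {                 # most frequently occurring value(s)
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(c(3, 4, 5, 6, 7, 7, 7, 8, 8, 9))      # 7
stat_mode(c(3, 4, 5, 6, 7, 7, 7, 8, 8, 8, 9))   # 7 8 (bimodal)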
Sources of Data
• External data: may be incomplete, of varying quality and accuracy
• Internal data: generally higher quality, from within the organization
Main Sources of Data:
• Social Media: Web and social media activity generates data about
people: e-mail, Google searches, Facebook posts, tweets, YouTube
videos, blogs.
• Organizations: Major sources are business and government data:
ERP systems, e-commerce systems, user-generated content, web
access logs.
• Machines: The Internet of Things (IoT) is evolving; autonomous data
comes from connected machines such as RFID tags, telematics, phones,
refrigerators.
• Metadata: Enormous amounts of data about data itself
- Web crawlers and web-bots scan the web for new
webpages, HTML structure, and metadata
- Used by applications such as web search engines
Data Quality:
• Varies depending on purpose and collection methods
• Internal data is generally of higher quality
• Publicly available data includes trustworthy sources,
e.g., government data.
Data Preparation
• Preparing data is a time-consuming step in data analysis.
• Data preparation involves merging, characterizing, cleaning, and
transforming data
Required Steps
• Merge data into a table from multiple sources
• Characterize data
• Clean data by:
- Resolving ambiguities and errors
- Removing redundant and problematic data
- Eliminating irrelevant columns
• Calculate new columns of data (if necessary)
• Divide data into subsets (if appropriate)
Important Considerations
• Record details of data preparation steps and rationale
• Provide documentation for future reference and validation of
results
• Ensure consistency in data preparation methodology
Data Preparation Tasks
• Identify and clean up errors
• Remove certain variables or observations
• Generate consistent scales across observations
• Generate new frequency distributions
• Convert text to numbers and vice versa
• Combine variables
• Generate groups
• Prepare unstructured data
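A few of the tasks above, sketched in base R (the file name and column names are hypothetical):

raw <- read.csv("customers.csv", stringsAsFactors = FALSE)   # hypothetical input file

raw$age <- as.numeric(raw$age)                        # convert text to numbers
raw$grade <- ifelse(raw$score >= 60, "pass", "fail")  # convert numbers to text
raw$bmi <- raw$weight_kg / (raw$height_m ^ 2)         # combine variables
raw$age_group <- cut(raw$age,                         # generate groups
                     breaks = c(0, 18, 40, 65, Inf),
                     labels = c("minor", "young adult", "middle age", "senior"))
table(raw$age_group)                                  # generate a frequency distribution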
Cleaning the Data
• Since the data available for analysis may not have been
originally collected with this project’s goal in mind, it is
important to spend time cleaning the data.
• It is also beneficial to understand the accuracy with
which the data was collected, as well as to correct any
errors.
• For variables measured on a nominal or ordinal scale
(where there are a fixed number of possible values), it is
useful to inspect all possible values to uncover mistakes
and/or inconsistencies.
• Any assumptions made concerning possible values that
the variable can take should be tested.
• For example, a variable Company may include a
number of different spellings for the same company
such as:
• General Electric Company
• General Elec. Co
• GE
• Gen. Electric Company
• General electric company
• G.E. Company
• These different terms, where they refer to the
same company, should be consolidated into one
for analysis.
• In addition, subject matter expertise may be
needed in cleaning these variables.
• For example, a company name may include one
of the divisions of the General Electric
Company and for the purpose of this specific
project it should be included as the ‘‘General
Electric Company.’’
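One way to consolidate such spellings in R is a lookup table that maps every observed variant to a single canonical name (a sketch using the variants listed above):

company <- c("General Elec. Co", "GE", "G.E. Company",
             "General electric company", "General Electric Company")

canonical <- c("General Electric Company" = "General Electric Company",
               "General Elec. Co"         = "General Electric Company",
               "GE"                       = "General Electric Company",
               "Gen. Electric Company"    = "General Electric Company",
               "General electric company" = "General Electric Company",
               "G.E. Company"             = "General Electric Company")

company_clean <- unname(canonical[company])
table(company_clean)      # all variants now counted as one company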
Removing Variables
• On the basis of an initial categorization of the variables,
it may be possible to remove variables from
consideration at this point.
• For example, constants and variables with too many
missing data points should be considered for removal.
• Further analysis of the correlations between multiple
variables may identify variables that provide no
additional information to the analysis and hence could
be removed.
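Both removal criteria can be sketched in base R (the data frame and the missing-data threshold are illustrative assumptions):

df <- data.frame(id     = 1:6,
                 region = rep("north", 6),                 # constant column
                 income = c(NA, NA, NA, NA, 52000, 61000), # mostly missing
                 age    = c(23, 35, 41, 29, 52, 61))

is_constant <- sapply(df, function(col) length(unique(col)) <= 1)
too_missing <- sapply(df, function(col) mean(is.na(col)) > 0.5)

df_clean <- df[, !(is_constant | too_missing)]
names(df_clean)    # "id" "age": constant and sparse variables removed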
Data Transformation
Normalization
• Normalization is a process where numeric columns are transformed
using a mathematical function to a new range. It is important for two
reasons.
• First, analysis of the data should treat all variables equally so that one
column does not have more influence over another because the ranges
are different.
• For example, when analyzing customer credit card data, the credit
limit value should not be given more weight in the analysis than the
customer's age.
• Second, certain data analysis and data mining methods require the data
to be normalized prior to analysis, such as neural networks or k-nearest
neighbors
Min-Max Normalization
• A linear transformation is performed on the original data.
• The minimum and maximum values are fetched from the data, and each
value is replaced according to the following formula:

v' = ((v - min(A)) / (max(A) - min(A))) * (new_max(A) - new_min(A)) + new_min(A)

where A is the attribute data,
min(A), max(A) are the minimum and maximum values of A, respectively,
v' is the new value of each entry in the data,
v is the old value of each entry in the data,
new_max(A), new_min(A) are the maximum and minimum values of the required
range (i.e., its boundary values), respectively.
Problem and Solution: a worked min-max normalization example (shown on the original slides).
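Since the original worked problem is not reproduced here, a minimal R sketch of min-max normalization following the formula above (the input values are hypothetical):

min_max <- function(v, new_min = 0, new_max = 1) {
  (v - min(v)) / (max(v) - min(v)) * (new_max - new_min) + new_min
}

marks <- c(8, 10, 15, 20)
min_max(marks)            # 0.000 0.167 0.583 1.000 in the range [0, 1]
min_max(marks, 0, 100)    # the same values rescaled to [0, 100]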