DATA SCIENCE
Lesson 2
Instructor: Ellen M. Guiñares
TOPICS COVERED
01 Overview of Data Science
Definition of Data and Information
Data types and representation
02 Data Value Chain
Data Acquisition
Data Analysis
Data Curation
Data Storage
Data Usage
03 Basic concepts of Big Data
What is Data Science?
Data Science
It is a multi-disciplinary field that uses scientific
methods, processes, algorithms, and systems to
extract knowledge and insights from data to drive
decision-making and solve complex problems.
KEY STEPS OF DATA SCIENCE
● Data Collection – gather relevant data.
● Data Preparation – put the data into a format suitable for analysis.
● Data Analysis – identify patterns, relationships, and insights.
● Data Visualization – communicate the findings.
● Implement the findings.
WHAT IS EXPECTED OF A DATA SCIENTIST?
• Data scientists must master the full spectrum of
the data science life cycle and possess a level of
flexibility and understanding to maximize returns
at each phase of the process.
• Data scientists need to be curious and result-
oriented.
• Data scientists need a strong quantitative
background in statistics and linear algebra, as
well as programming knowledge.
DATA SCIENCE LIFE CYCLE
Data and Information
● Data – a representation of facts, concepts, or instructions.
● Information – organized or classified data, which has some meaningful value for the receiver.
Data
● Data can be defined as a representation of facts, concepts,
or instructions in a formalized manner, which should be
suitable for communication, interpretation, or processing
by humans or electronic machines.
● Data is represented with the help of characters such as
alphabets (A-Z, a-z), digits (0-9), or special characters
(+, -, /, *, <, >, =, etc.).
Information
● Information is organized or classified data, which has some
meaningful values for the receiver. Information is the
processed data on which decisions and actions are based.
● Information is data that has been processed into a form
that is meaningful to the recipient and is of real or perceived
value in the current or prospective actions or decisions of
the recipient.
Information
● For the resulting decisions to be meaningful, the processed
data must have the following characteristics:
✓ Timely − Information should be available when required.
✓ Accuracy − Information should be accurate.
✓ Completeness − Information should be complete.
Data Processing Cycle
● Data processing is the re-structuring or re-ordering of data
by people or machine to increase their usefulness and add
values for a particular purpose.
● Data processing consists of the following basic steps - input,
processing, and output. These three steps constitute the
data processing cycle.
Data Processing Cycle
● Input Step – the input data is prepared in some convenient form for processing. The form depends on the processing machine.
● Processing Step – the input data is changed to produce data in a more useful form.
● Output Step – the result of the preceding processing step is collected.
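A minimal Python sketch of the three steps, using made-up exam scores purely for illustration:

# Input step: data gathered in a convenient form (here, strings).
raw_scores = ["75", "82", "68", "90"]

# Processing step: change the data into a more useful form.
numeric_scores = [int(s) for s in raw_scores]
average = sum(numeric_scores) / len(numeric_scores)

# Output step: collect the result of the preceding processing step.
print(f"Average score: {average:.1f}")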
Data Types and its representation
● Data type or simply type is an attribute of data which tells
the compiler or interpreter how the programmer intends to
use the data.
● Almost all programming languages explicitly include the
notion of data type. Common data types include:
✓ Integers
✓ Booleans
✓ Characters
✓ Floating-point numbers
✓ Alphanumeric strings
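The common data types above can be illustrated with Python literals; the variable names are made up for this sketch:

count = 42                  # integer
is_valid = True             # Boolean
grade = "A"                 # character (a one-character string in Python)
temperature = 36.6          # floating-point number
student_id = "2024-00123"   # alphanumeric string

for value in (count, is_valid, grade, temperature, student_id):
    print(value, type(value).__name__)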
01
Data types / structure
Based on analysis of data
Data Types / structure
● Based on analysis of data:
✓ Structured
✓ Unstructured
✓ Semi-structured
✓ Metadata
Data Types / structure
What is structured data?
● Structured data is data that adheres to a pre-defined data model and is therefore straightforward to analyze.
● It conforms to a tabular format with relationships between the different rows and columns. Common examples are Excel files or SQL databases.
● Structured data is considered the most ‘traditional’ form of data storage, since the earliest versions of database management systems (DBMS) were able to store, process, and access structured data.
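A minimal sketch of structured (tabular) data, assuming pandas is installed; the column names and rows are invented examples:

import pandas as pd

# Every row follows the same pre-defined model, so analysis is straightforward.
df = pd.DataFrame(
    {
        "student_id": [1, 2, 3],
        "name": ["Ana", "Ben", "Cara"],
        "score": [88, 92, 79],
    }
)

print(df[df["score"] > 80])  # filter rows, much like a SQL query would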
Data Types / structure
What is unstructured data?
● Unstructured data is information that either does not have a predefined data model or is not organized in a pre-defined manner.
● It is without proper formatting and alignment.
● Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well (see the extraction sketch after this list).
● The ability to extract value from unstructured data is one of the main drivers behind the quick growth of Big Data.
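A minimal sketch of pulling dates and numbers out of text-heavy, unstructured data with regular expressions; the review text is a made-up example:

import re

review = "Ordered on 2024-03-15, paid 1499.00, and the package arrived 3 days late."

dates = re.findall(r"\d{4}-\d{2}-\d{2}", review)   # ISO-style dates
numbers = re.findall(r"\d+(?:\.\d+)?", review)     # any numeric tokens

print(dates)    # ['2024-03-15']
print(numbers)  # includes the date parts as well: ['2024', '03', '15', '1499.00', '3']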
Data Types / structure
What is semi-structured data?
● Semi-structured data is a form of structured data that does not conform with the formal structure of data models associated with relational databases or other forms of data tables.
● For example, JSON and XML are forms of semi-structured data.
● The reason this third category exists (between structured and unstructured data) is that semi-structured data is considerably easier to analyze than unstructured data.
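A minimal sketch of semi-structured data: the JSON document below is invented for illustration, with nested fields and an optional key, so it does not fit one rigid table, yet Python's standard json module can still parse it:

import json

payload = """
{
  "order_id": 1001,
  "customer": {"name": "Ana", "email": "ana@example.com"},
  "items": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-7", "qty": 1, "gift_wrap": true}
  ]
}
"""

order = json.loads(payload)                          # parse the semi-structured document
print(order["customer"]["name"])                     # navigate the nested structure
print(sum(item["qty"] for item in order["items"]))   # aggregate across the list of items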
Data Types / structure
What is metadata?
● A last category of data type is metadata. From a technical point of view, this is not a separate data structure, but it is one of the most important elements for Big Data analysis and Big Data solutions.
● Metadata is data about data.
● It provides additional information about a specific set of data.
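One way to picture metadata is a small record that describes a dataset without containing it; every field below is a made-up example:

dataset_metadata = {
    "title": "Student Grades 2024",
    "source": "registrar_export.csv",   # hypothetical file name
    "created": "2024-06-01",
    "row_count": 1250,
    "columns": ["student_id", "name", "score"],
}

# The metadata tells us how to interpret the dataset without opening it.
for key, value in dataset_metadata.items():
    print(f"{key}: {value}")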
02
Data Value Chain
Information flow within a big data system
Data Value Chain
● It refers to the entire process of transforming raw data into
valuable insights, information, and knowledge that can be
used for decision-making, innovation, and business growth.
● It involves a series of interconnected activities that add
value to data at each stage of the process, from data
collection and processing to analysis and dissemination.
Data Value Chain
● The Data Value Chain is introduced to describe the
information flow within a big data system as a series of
steps needed to generate value and useful insights from
data.
● The Big Data Value Chain identifies the following key high-
level activities:
Data Value Chain Stages
● The Data Value Chain typically includes the following stages:
✓ Data Acquisition
✓ Data Analysis
✓ Data Curation
✓ Data Storage
✓ Data Usage
Data Value Chain Stages
1. Data Acquisition/Collection
✓ It is the process of gathering, filtering, and cleaning data
before it is put in a data warehouse or any other storage
solution on which data analysis can be carried out.
✓ Data acquisition is one of the major big data challenges in
terms of infrastructure requirements.
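A minimal acquisition sketch, assuming pandas is installed and a hypothetical raw file named raw_sales.csv: gather the data, then filter and clean it before handing it to storage or analysis:

import pandas as pd

raw = pd.read_csv("raw_sales.csv")      # gather
raw = raw.dropna(subset=["amount"])     # clean: drop rows with no amount
raw = raw[raw["amount"] > 0]            # filter: keep only valid transactions

raw.to_csv("clean_sales.csv", index=False)  # hand off to the storage layer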
Data Value Chain Stages
2. Data Analysis
✓ It is concerned with making the raw data acquired amenable
to use in decision-making as well as domain-specific usage.
✓ Data analysis involves exploring, transforming, and modelling
data with the goal of highlighting relevant data, synthesizing
and extracting useful hidden information with high potential
from a business point of view.
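A minimal analysis sketch, continuing the hypothetical clean_sales.csv file and column names from the acquisition example:

import pandas as pd

sales = pd.read_csv("clean_sales.csv")

# Explore and transform: totals per region surface patterns that are
# relevant from a business point of view.
summary = sales.groupby("region")["amount"].agg(["count", "sum", "mean"])
print(summary.sort_values("sum", ascending=False))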
Data Value Chain Stages
3. Data Curation
✓ It is the active management of data over its life cycle to
ensure it meets the necessary data quality requirements for
its effective usage.
✓ Data curation processes can be categorized into different
activities such as content creation, selection, classification,
transformation, validation, and preservation.
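A minimal curation sketch: a few validation rules applied to the same hypothetical sales data so it keeps meeting quality requirements over its life cycle; the rules and column names are illustrative assumptions:

import pandas as pd

sales = pd.read_csv("clean_sales.csv")

problems = []
if sales["order_id"].duplicated().any():
    problems.append("duplicate order ids")
if (sales["amount"] <= 0).any():
    problems.append("non-positive amounts")
if sales["region"].isna().any():
    problems.append("missing regions")

print("validation passed" if not problems else f"issues found: {problems}")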
Data Value Chain Stages
4. Data Storage
✓ It is the persistence and management of data in a scalable
way that satisfies the needs of applications that require fast
access to the data.
✓ Relational Database Management Systems (RDBMS) have
been the main, and almost unique, solution to the storage
paradigm for nearly 40 years.
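A minimal storage sketch using SQLite, a small relational database (RDBMS) that ships with Python; the table and rows are invented examples:

import sqlite3

conn = sqlite3.connect("sales.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT OR REPLACE INTO sales VALUES (?, ?, ?)",
    [(1, "North", 120.0), (2, "South", 75.5)],
)
conn.commit()

# Fast, structured access back out of storage.
for row in conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(row)
conn.close()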
Data Value Chain Stages
5. Data Usage
✓ It covers the data-driven business activities that need access
to data, its analysis, and the tools needed to integrate the
data analysis within the business activity.
✓ Data usage in business decision-making can enhance
competitiveness through reduction of costs, increased added
value, or any other parameter that can be measured against
existing performance criteria.
03
Basic Concepts of Big Data
Information flow within a big data system
Basic Concepts of Big Data
• Big data is a blanket term for the non-traditional strategies
and technologies needed to gather, organize, process, and
gain insights from large datasets.
Basic Concepts of Big Data
• An exact definition of “big data” is difficult to nail down
because projects, vendors, practitioners, and business
professionals use it quite differently. With that in mind,
generally speaking, big data is:
✓ large datasets
✓ the category of computing strategies and technologies
that are used to handle large datasets
Basic Concepts of Big Data
• It refers to the vast and diverse sets of data that are generated
at a high velocity, volume, and variety from various sources.
The data may be structured, semi-structured, or unstructured
and cannot be easily processed or analyzed using traditional
data processing techniques.
Key Components of Big Data
1. Volume - Big Data refers to data that is too large to be
processed using traditional data processing tools and
techniques. The volume of data can range from terabytes to
petabytes and beyond.
2. Velocity: Big Data is generated at an unprecedented speed
and needs to be processed in real time or near real time to
derive meaningful insights. This velocity can be measured in
microseconds to seconds or minutes.
3. Variety: Big Data comes from many different sources and in
many formats; it may be structured, semi-structured, or
unstructured.
Big Data Characteristics
Other Characteristics of Big Data – 6V’s
1. Veracity: The variety of sources and the complexity of the
processing can lead to challenges in evaluating the quality of
the data (and consequently, the quality of the resulting
analysis).
2. Variability: Variation in the data leads to wide variation in
quality. Additional resources may be needed to identify, process,
or filter low quality data to make it more useful.
Other Characteristics of Big Data – 6V’s
3. Value: The ultimate challenge of big data is delivering value.
Sometimes, the systems and processes in place are complex
enough that using the data and extracting actual value can
become difficult.
Where does big data come from?
Sources of Big Data
1. Social Media: Social media platforms such as Facebook,
Twitter, LinkedIn, and Instagram generate vast amounts of
data in the form of user interactions, posts, comments, likes,
and shares.
2. Internet of Things (IoT) Devices: IoT devices such as sensors,
smart appliances, and wearable technology generate huge
volumes of data in real-time.
Sources of Big Data
3. E-commerce Transactions: E-commerce platforms generate a
significant amount of data related to customer behavior,
purchase history, preferences, and trends.
4. Machine-generated Data: Machines and applications
generate a massive amount of data, including log files,
clickstream data, system-generated data, and more.
Sources of Big Data
5. Mobile Devices: Mobile devices generate a significant amount
of data, including location data, usage data, and user behavior
data.
6. Customer Feedback: Customer feedback in the form of
surveys, reviews, and support tickets generates large volumes of
data that can be analyzed to improve customer experience.
Sources of Big Data
7. Business Applications: Business applications such as CRM,
ERP, and HRM generate a vast amount of data that can be
analyzed to improve business operations.
8. Public Data Sources: Public data sources such as government
data, weather data, and census data can be combined with other
data sources to create more significant insights.