Emerging Chapter 2

Chapter 2 of the course 'Introduction to Emerging Technology' focuses on data science, covering its definition, the distinction between data and information, data types, and the data value chain. It outlines the objectives for students, including understanding the data processing life cycle and the basics of big data, as well as the Hadoop ecosystem. The chapter also discusses the characteristics of big data and the importance of clustered computing in managing large datasets.

Uploaded by

ambachewm27

Jigjiga University

Course Name: Introduction to Emerging Technology

Chapter 2: Data Science
Introduction

• In the previous chapter, the role of data in emerging technologies was discussed.
• In this chapter, you are going to learn more about:
  Data science,
  Data vs. information,
  Data types and representation,
  Data value chain, and
  Basic concepts of big data.
Objectives
After completing this chapter, the students will be able to:
• Describe what data science is and the role of data scientists.
• Differentiate data and information.
• Describe the data processing life cycle.
• Understand different data types from diverse perspectives.
• Describe the data value chain in the emerging era of big data.
• Understand the basics of big data.
• Describe the purpose of the Hadoop ecosystem components.
Activity 2.1

• What is data science? Can you describe the role of data in emerging technology?
• What are data and information?
• What is big data?
An Overview of Data Science


Data science is a multi-disciplinary field that uses scientific
methods, processes, algorithms, and systems to extract
knowledge and insights from structured, semi-structured
and unstructured data.

Data science is much more than simply analyzing data.
It offers a range of roles and requires a range of skills.

What are data and information?

Data can be described as unprocessed facts and figures.
Whereas information is:
• The processed data on which decisions and actions are based.
• Data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in the current or prospective action or decision of the recipient.
• Furthermore, information is interpreted data, created from organized, structured, and processed data in a particular context.
The difference between Data and Information
What is Data?
• Data is defined as the symbols that represent people, events, things, and ideas.
• Data becomes information when it is presented in a format that people can understand and use.
• Data is the raw material for information; data alone tells no story.
What is Information?
• Information is a collection of facts and figures organized in a meaningful manner to be used as a basis for guidance and decision making.
• Information is data that has been processed and has meaning to the user.
• Information is the processed data we get as output.
Data vs Information

Data                                 Information
• Meaningless on its own             • Meaningful
• Not used for decision making       • Used for decision making

Data Processing Cycle

Data processing is the restructuring or reordering of data by people or machines to increase its usefulness and add value for a particular purpose.
Data processing consists of three basic steps: input, processing, and output. These three steps constitute the data processing cycle.
• Input − in this step, the input data is prepared in some convenient form for processing.
• Processing − in this step, the input data is changed to produce data in a more useful form.
• Output − at this stage, the result of the processing step is collected.
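The three steps can be illustrated with a minimal Python sketch. The exam marks and the function names (prepare_input, process, output) are made up for this illustration and are not part of the course material.

```python
# Hypothetical raw data: exam marks typed in as text, with stray spaces.
raw_marks = ["72", "85", " 90", "64 "]

def prepare_input(raw):
    """Input step: put the raw data into a convenient form (clean integers)."""
    return [int(item.strip()) for item in raw]

def process(marks):
    """Processing step: change the input into a more useful form (a summary)."""
    return {
        "count": len(marks),
        "highest": max(marks),
        "average": sum(marks) / len(marks),
    }

def output(summary):
    """Output step: collect the result of processing as a readable report."""
    return (f"{summary['count']} students, highest mark {summary['highest']}, "
            f"average {summary['average']:.2f}")

marks = prepare_input(raw_marks)
summary = process(marks)
report = output(summary)
print(report)
```

The raw strings on their own tell no story; the report produced at the output step is information a teacher could act on.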
Data types and their representation

A data type is simply an attribute of data that tells the compiler or interpreter how the programmer intends to use the data.
Data types from a computer programming perspective
Common data types include:
• Integers (int): used to store whole numbers
• Booleans (bool): used to represent true or false
• Characters (char): used to store a single character
• Floating-point numbers (float): used to store real numbers
• Alphanumeric strings (string): used to store a combination of characters and numbers
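As a rough illustration, the same types look like this in Python. The variable names and values are invented for the example. Note that Python has no separate char type, so a one-character string stands in for it here.

```python
age = 21                 # int: a whole number
is_enrolled = True       # bool: true or false
grade = "A"              # no char type in Python; a one-character str plays that role
gpa = 3.75               # float: a real number
student_id = "JJU2024"   # str: a combination of characters and numbers (made-up value)

values = [
    ("age", age),
    ("is_enrolled", is_enrolled),
    ("grade", grade),
    ("gpa", gpa),
    ("student_id", student_id),
]

# Ask the interpreter which type it has attached to each value.
for name, value in values:
    print(f"{name:12} {type(value).__name__:6} {value!r}")
```

In a statically typed language such as C or Java the programmer writes the type explicitly; in Python the interpreter infers it from the value.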
Data types from Data Analytics perspective

From a data analytics point of view, it is important to understand that there are three common types of data structure:
1. Structured,
2. Semi-structured, and
3. Unstructured data.
Structured Data
Structured data is highly organized and is stored in a predefined format.
Structured data is stored in rows and columns, that is, in tabular format.
Common examples of structured data are Excel files and SQL databases, both of which have structured rows and columns that can be sorted.
Semi-structured Data
This type of data is difficult to categorize, because sometimes it looks like structured data and sometimes like unstructured data: it has no fixed table layout, but it carries tags or keys that label its elements.
Examples: JSON and XML.
Unstructured Data
Unstructured data is information that either does not have a predefined data model or is not organized in a predefined manner.
Unstructured data cannot be stored in rows and columns.
Examples: audio files, video files, and images. NoSQL databases are often used to store such data.
Structured data vs Unstructured data

Structured data                          Unstructured data
Can be displayed in rows, columns,       Cannot be displayed in rows, columns,
and relational databases                 and relational databases
An estimated 20% of the world's data     An estimated 80% of the world's data
Requires less storage                    Requires more storage
Easy to manage                           Difficult to manage
Metadata – Data about Data

Metadata is data about data: it provides additional information about one or more aspects of a specific set of data.
In a set of photographs, for example, metadata could describe when and where the photos were taken.
The metadata then provides fields for dates and locations which, by themselves, can be considered structured data.
For this reason, metadata is frequently used by Big Data solutions for initial analysis.
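A small sketch of the photograph example, assuming a made-up collection of three photos: the image contents are never opened, yet the date and location fields alone already support a first round of analysis.

```python
from datetime import datetime

# Hypothetical photo collection. The image bytes would be unstructured,
# but the metadata fields (taken_at, location) are structured.
photos = [
    {"file": "img_001.jpg", "taken_at": datetime(2023, 5, 1, 9, 30), "location": "Jigjiga"},
    {"file": "img_002.jpg", "taken_at": datetime(2023, 5, 3, 14, 0), "location": "Harar"},
    {"file": "img_003.jpg", "taken_at": datetime(2023, 6, 10, 8, 15), "location": "Jigjiga"},
]

# Initial analysis using metadata only: filter by place, find the earliest photo.
jigjiga_photos = [photo["file"] for photo in photos if photo["location"] == "Jigjiga"]
earliest = min(photos, key=lambda photo: photo["taken_at"])["file"]

print(jigjiga_photos, earliest)
```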
Data value Chain

The data value chain describes the process of data creation and use, from first identifying a need for data to its final use and possible reuse.
The Big Data Value Chain identifies five high-level activities: data acquisition, data analysis, data curation, data storage, and data usage.
Data Acquisition
• It is the process of gathering, filtering, and cleaning data before it is put in a data warehouse or any other storage solution on which data analysis can be carried out.
• Data acquisition is one of the major big data challenges in terms of infrastructure requirements.
Data Analysis
• It is concerned with making the raw data acquired amenable to use in decision-making as well as domain-specific usage.
• Data analysis involves exploring, transforming, and modeling data with the goal of highlighting relevant data.
Data Curation
• It is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage.
• The professional who ensures data quality is called a data curator.
• Data curation processes can be categorized into different activities such as content creation, selection, classification, transformation, validation, and preservation.
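Data Storage
• It is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
• Relational (SQL) databases guarantee the ACID (Atomicity, Consistency, Isolation, and Durability) properties for transactions, but those guarantees limit flexibility with schema changes and make it harder to maintain performance and fault tolerance as data volumes and complexity grow.
• For this reason, NoSQL technologies, which are designed with scalability in mind, are now widely used for big data storage.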
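To make the "A" in ACID concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module with an in-memory relational database. The accounts table, the names, and the transfer function are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move money in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit
        # made by the first UPDATE has been rolled back as well.
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The failed transfer leaves no trace: the credit to bob is rolled back together with the rejected debit. Keeping this kind of guarantee across many machines is costly, which is one reason NoSQL systems often relax it in exchange for scalability.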

Data Usage
It covers the data-driven business activities
that need access to data, its analysis, and the
tools needed to integrate the data analysis
within the business activity.
What Is Big Data?

• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• Big data refers to large sets of complex data, both structured and unstructured, which traditional processing techniques and/or algorithms are unable to operate on.
• Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
Big Data

Big data is characterized by the 4 Vs and more:
• Volume: large amounts of data (zettabytes, massive datasets).
• Velocity: data is live-streaming or in motion; it is the speed at which new data is generated.
• Variety: data comes in many different forms from diverse sources.
• Veracity: can we trust the data? How accurate is it?
Clustered Computing and Hadoop Ecosystem

Clustered Computing
Big data clustering software combines the resources of many smaller machines, seeking to provide a number of benefits:
• Resource pooling: combining the available storage space to hold data is a clear benefit.
• High availability: clusters can provide varying levels of fault tolerance and availability guarantees to prevent hardware or software failures from affecting access to data and processing.
• Easy scalability: clusters make it easy to scale horizontally by adding machines to the group, without expanding the physical resources of any single machine.
Hadoop and its Ecosystem

Hadoop is an open-source framework intended to make interaction with big data easier.
It is a framework that allows for the distributed processing of large datasets across clusters of computers using simple programming models.
The four key characteristics of Hadoop are:

• Economical: its systems are highly economical, since ordinary computers can be used for data processing.
• Reliable: it stores copies of the data on different machines and is resistant to hardware failure.
• Scalable: it is easily scalable, both horizontally and vertically. A few extra nodes help in scaling up the framework.
• Flexible: you can store as much structured and unstructured data as you need and decide how to use it later.
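The idea behind the "Reliable" property can be sketched with a toy simulation. It is not Hadoop's actual block placement policy: the round-robin rule, the node and block names, and the replication factor of 3 are all assumptions made for illustration.

```python
REPLICATION_FACTOR = 3  # assumption for this sketch

def place_blocks(blocks, nodes, replicas=REPLICATION_FACTOR):
    """Assign each block to `replicas` distinct nodes, round-robin.

    Assumes replicas <= len(nodes), otherwise copies would share a node.
    """
    placement = {}
    for index, block in enumerate(blocks):
        placement[block] = [
            nodes[(index + offset) % len(nodes)] for offset in range(replicas)
        ]
    return placement

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a healthy node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]

nodes = ["node1", "node2", "node3", "node4"]
blocks = ["block-a", "block-b", "block-c", "block-d"]
placement = place_blocks(blocks, nodes)

# Two of the four machines fail, yet every block can still be read.
survivors = readable_blocks(placement, failed_nodes={"node1", "node2"})
print(placement)
print(survivors)
```

With three copies of every block, losing any two machines still leaves at least one copy of each block available. Data is lost only when all machines holding a given block fail together.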
Big Data Life Cycle with Hadoop

Ingesting data into the system
• The first stage of Big Data processing is Ingest.
• The data is ingested or transferred to Hadoop from various sources such as relational databases, systems, or local files.
• Sqoop transfers data from RDBMS to HDFS, whereas Flume transfers event data.
Processing the data in storage
• The second stage is Processing.
• In this stage, the data is stored and processed.
• The data is stored in the distributed file system, HDFS, and in the NoSQL distributed database, HBase.
• Spark and MapReduce perform the data processing.
Computing and analyzing data
• The third stage is Analyze.
• Here, the data is analyzed by processing frameworks such as Pig, Hive, and Impala.
• Pig converts the data using map and reduce operations and then analyzes it.
• Hive is also based on map and reduce programming and is most suitable for structured data.
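The map and reduce programming model can be sketched in plain Python with the classic word-count example. This runs on a single machine and does not use Hadoop, Pig, or Hive; the function names and the two input lines are made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

lines = ["big data needs big clusters", "data becomes information"]

# On a cluster, each line (or file block) could be mapped on a different machine.
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each map call looks at only one line and each reduce call at only one key, the work can be spread over many machines, which is what makes the model suitable for clustered computing.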
Visualizing the results
• The fourth stage is Access, which is performed by tools such as Hue and Cloudera Search.
• In this stage, the analyzed data can be accessed by users.
Review Questions
1. What is data science?
2. What is the difference between structured and unstructured data?
3. What is the difference between data and information?
4. What is the data processing life cycle? List its activities.
Quiz 5%

Part I: True or false

1. The first industrial revolution started in America. ……
2. Data curation is the active management of data over its life cycle to ensure it meets the necessary data quality requirements. ……
3. The industrial revolution was a time when the manufacturing of goods moved from small shops and homes to large factories. ……
Part II: Choose the best answer
4. Which data type is used to store integers?
A. float B. int C. bool D. char
5. Human Computer Interaction consists of
A. User B. Computer itself C. The way they work together D. All
Part III: Short Answer
6. What is metadata?
End of Chapter Two
Any Questions?
