Chapter Two
Data Science
2.1. Overview of Data Science
Data science is a multi-disciplinary field that uses scientific
methods, processes, algorithms, and systems to extract knowledge
and insights from structured, semi-structured, and unstructured
data.
In other words, data science is the area of study that involves
extracting insights from vast amounts of data through various
scientific methods, algorithms, and processes, helping you to
discover hidden patterns in raw data.
Data Science is an interdisciplinary field that allows you to extract
knowledge from structured or unstructured data.
Data science enables you to translate a business problem into a
research project and then translate it back into a practical solution.
Significant advantages of using Data Science
Data is the oil of today's world. With the right tools, technologies,
and algorithms, we can use data and convert it into a distinctive
business advantage.
Data science can help you detect fraud using advanced machine
learning algorithms.
It helps you prevent significant monetary losses.
Allows you to build intelligent abilities in machines
You can perform sentiment analysis to gauge customer
brand loyalty
It enables you to make better and faster decisions
Helps you recommend the right product to the right
customer to enhance your business
Challenges of Data Science
A high variety of information and data is required for accurate analysis
An inadequate data science talent pool is available
Management may not provide financial support for a data science
team
Unavailability of, or difficult access to, data
Data science results are not effectively used by business decision
makers
Explaining data science to others is difficult
Privacy issues
Lack of significant domain experts
If an organization is very small, it cannot have a data science team
What are data and information?
Data can be defined as a representation of facts, concepts, or
instructions in a formalized manner, which should be suitable for
communication, interpretation, or processing, by human or
electronic machines.
It can be described as unprocessed facts and figures.
It is represented with the help of characters such as letters (A-Z,
a-z), digits (0-9), or special characters (+, -, /, *, <, >, =, etc.).
Information is the processed data on which decisions and actions
are based.
Information is data that has been processed into a form that is
meaningful to the recipient and is of real or perceived value in the
recipient's current or prospective actions or decisions.
Furthermore, information is interpreted data, created from
organized, structured, and processed data in a particular context.
Data Processing Cycle
Data processing is the re-structuring or re-ordering of data by
people or machines to increase its usefulness and add value for
a particular purpose.
Data processing consists of the following basic steps: Input,
Processing and Output. These three steps constitute the data
processing cycle.
Fig. 1. Data Processing Cycle
Input: in this step, the input data is prepared in some convenient form for
processing.
The form depends on the processing machine.
For example, when electronic computers are used, the input data can be recorded on
any one of several types of storage media, such as a hard disk, CD, flash disk, and
so on.
Processing: in this step, the input data is changed to produce data in a more
useful form.
For example, interest can be calculated on a deposit to a bank, or a summary of
sales for the month can be calculated from the sales orders.
Output: at this stage, the result of the preceding processing step is
collected.
The particular form of the output data depends on the use of the
data.
For example, output data may be payroll for employees.
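The Input, Processing, and Output steps above can be sketched in a few lines of Python, using the bank-interest example from the text. The interest rate and the deposit records are made-up illustration values.

```python
# Processing step: compute interest for each input deposit record.
def process_deposits(deposits, annual_rate=0.05):
    return [(name, amount, round(amount * annual_rate, 2))
            for name, amount in deposits]

# Input step: raw deposit records, here as (name, balance) pairs.
raw_data = [("Alice", 1000.0), ("Bob", 250.0)]

# Output step: the processed results are collected for use,
# e.g. printed as a simple statement.
results = process_deposits(raw_data)
for name, amount, interest in results:
    print(f"{name}: balance {amount}, interest {interest}")
```

In a real system the input might come from a file or database and the output might feed a report, but the three-step cycle is the same.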
Data types and their representation
Data types can be described from diverse perspectives.
In computer science and computer programming, for instance, a
data type is simply an attribute of data that tells the compiler or
interpreter how the programmer intends to use the data.
Data types from Computer programming perspective
Almost all programming languages explicitly include the notion of
data type, though different languages may use different
terminology. Common data types include:
Integers (int): used to represent whole numbers, mathematically
known as integers
Booleans (bool): used to represent a value restricted to one of two
values: true or false
Characters (char): used to represent a single character
Floating-point numbers (float): used to represent real numbers
Alphanumeric strings (string): used to represent a combination of
characters and numbers
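The data types listed above can be illustrated in Python. Note that Python has no separate character type; a string of length one stands in for char, and the example values are arbitrary.

```python
whole_number = 42        # integer (int)
flag = True              # boolean (bool): true or false
letter = "A"             # character: a one-character string
price = 3.14             # floating-point number (float)
label = "Item42"         # alphanumeric string (str)

# The interpreter uses each value's type to decide how to handle it:
for value in (whole_number, flag, letter, price, label):
    print(type(value).__name__, value)
```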
Data types from Data Analytics perspective
From a data analytics point of view, it is important to
understand that there are three common data types or
structures:
Structured
Semi-structured and
Unstructured data types.
Structured Data
Structured data is data that adheres to a pre-defined data
model and is therefore straightforward to analyze.
Structured data conforms to a tabular format with a
relationship between the different rows and columns.
Common examples of structured data are Excel files or SQL
databases.
Each of these has structured rows and columns that can be
sorted.
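A small sketch of structured data using Python's built-in sqlite3 module: rows and columns in an SQL table, queried and sorted. The table name and values are invented for illustration.

```python
import sqlite3

# An in-memory SQL database with a pre-defined data model:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("pens", 120), ("books", 45), ("bags", 80)])

# Because every row conforms to the same schema, analysis such as
# sorting by a column is straightforward:
rows = list(conn.execute(
    "SELECT product, units FROM sales ORDER BY units DESC"))
for product, units in rows:
    print(product, units)
conn.close()
```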
Semi-structured Data
Semi-structured data is a form of structured data that does not
conform with the formal structure of data models associated with
relational databases or other forms of data tables, but nonetheless,
contains tags or other markers to separate semantic elements and
enforce hierarchies of records and fields within the data.
Therefore, it is also known as a self-describing structure.
JSON and XML are common examples of semi-structured data.
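A JSON document illustrates the self-describing structure mentioned above: the keys and nesting act as the tags that separate semantic elements, with no fixed table schema. The fields shown are invented for illustration.

```python
import json

doc = """
{
  "name": "Alice",
  "contacts": {"email": "alice@example.com"},
  "skills": ["Python", "SQL"]
}
"""

record = json.loads(doc)

# The markers (keys, nesting) make the hierarchy explicit:
print(record["name"])
print(record["contacts"]["email"])
print(record["skills"][0])
```

A different record in the same file could add or omit fields; unlike a relational table, nothing forces every record to share the same columns.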
Unstructured Data
Unstructured data is information that either does not have a
predefined data model or is not organized in a pre-defined manner.
Unstructured information is typically text-heavy but may contain
data such as dates, numbers, and facts as well.
This results in irregularities and ambiguities that make such data
difficult to understand using traditional programs, compared with
data stored in structured databases.
Common examples of unstructured data include audio and video
files, or free-form content held in NoSQL stores.
Metadata – Data about Data
The last category of data type is metadata.
From a technical point of view, this is not a separate data structure,
but it is one of the most important elements for Big Data analysis
and big data solutions.
Metadata is data about data.
It provides additional information about a specific set of data.
In a set of photographs, for example, metadata could describe
when and where the photos were taken.
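The photo example above can be sketched as data plus data about it: the raw bytes are the data, and a dictionary describes when and where the photo was taken. All values here (filename, timestamp, location) are invented for illustration.

```python
# The data itself: raw image bytes (truncated placeholder).
photo = b"\x89PNG..."

# The metadata: data about that data.
metadata = {
    "filename": "beach.png",
    "taken_at": "2023-07-01T14:32:00",
    "location": "Lake Tana",
    "size_bytes": len(photo),
}

for key, value in metadata.items():
    print(f"{key}: {value}")
```

In big data solutions this is what makes a dataset searchable: you can find "photos taken in July" without opening a single image.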
Data value Chain
The Data Value Chain is introduced to describe the information
flow within a big data system as a series of steps needed to
generate value and useful insights from data. The Big Data Value
Chain identifies the following key high-level activities:
Fig. 2. Data Value Chain
1. Data Acquisition
It is the process of gathering, filtering, and cleaning data before it is put in a data
warehouse or any other storage solution on which data analysis can be carried
out.
Data acquisition is one of the major big data challenges in terms of infrastructure
requirements.
The infrastructure required to support the acquisition of big data must
deliver low, predictable latency both in capturing data and in executing
queries; be able to handle very high transaction volumes, often in a
distributed environment; and support flexible and dynamic data structures.
2. Data Analysis
It is concerned with making the raw data acquired amenable to use
in decision-making as well as domain-specific usage.
Data analysis involves exploring, transforming, and modeling data
with the goal of highlighting relevant data, synthesizing and
extracting useful hidden information with high potential from a
business point of view.
Related areas include data mining, business intelligence, and
machine learning.
3. Data Curation
It is the active management of data over its life cycle to ensure it meets
the necessary data quality requirements for its effective usage.
Data curation processes can be categorized into different activities
such as content creation, selection, classification, transformation,
validation, and preservation.
Data curation is performed by expert curators who are responsible for
improving the accessibility and quality of data.
Data curators (also known as scientific curators or data annotators)
hold the responsibility of ensuring that data are trustworthy,
discoverable, accessible, reusable, and fit for their purpose.
A key trend for the curation of big data utilizes community and
crowdsourcing approaches.
4. Data Storage
It is the persistence and management of data in a scalable way that
satisfies the needs of applications that require fast access to the data.
Relational Database Management Systems (RDBMS) have been the
main, and almost only, solution to the storage paradigm for nearly
40 years.
However, the ACID (Atomicity, Consistency, Isolation, and Durability)
properties that guarantee database transactions lack flexibility with
regard to schema changes, and their performance and fault tolerance
suffer when data volumes and complexity grow, making them unsuitable
for big data scenarios.
NoSQL technologies have been designed with the scalability goal in
mind and present a wide range of solutions based on alternative data
models.
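A toy contrast between the two models: in a relational table, adding a field means a schema change, while in a document (NoSQL-style) collection, records in the same collection may carry different fields. A Python list of dictionaries stands in for the document store; the sensor data is invented.

```python
# A list of dicts as a stand-in for a NoSQL document collection.
document_store = []

document_store.append({"id": 1, "name": "sensor-a", "temp": 21.5})

# A later record adds a "humidity" field; no ALTER TABLE needed,
# because documents are not bound to one fixed schema:
document_store.append({"id": 2, "name": "sensor-b", "temp": 19.0,
                       "humidity": 0.44})

# Queries must tolerate missing fields:
for doc in document_store:
    print(doc["name"], doc.get("humidity", "n/a"))
```

This flexibility is what real NoSQL systems trade some of the ACID guarantees for, in exchange for scalability.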
5. Data Usage
It covers the data-driven business activities that need
access to data, its analysis, and the tools needed to
integrate the data analysis within the business activity.
Data usage in business decision making can enhance
competitiveness through the reduction of costs, increased
added value, or any other parameter that can be
measured against existing performance criteria.
Basic concepts of big data
What Is Big Data?
Big data is the term for a collection of data sets so large and
complex that it becomes difficult to process using on-hand
database management tools or traditional data processing
applications.
In this context, a “large dataset” means a dataset too large
to reasonably process or store with traditional tooling or on a
single computer.
This means that the common scale of big datasets is
constantly shifting and may vary significantly from
organization to organization.
Big data is characterized by the 4 Vs and more:
Volume: large amounts of data (zettabytes / massive datasets)
Velocity: Data is live streaming or in motion
Variety: data comes in many different forms from diverse sources
Veracity: can we trust the data? How accurate is it? etc.
Fig. 3. Characteristics of Big Data
Sources of Big Data
Mobile devices (tracking all objects all the time)
Areas of Applications of Big Data
Health and well-being
Policy making and public opinion
Smart cities and a more efficient society
New online educational models: MOOCs and
student-teacher modeling
Robotics and human-robot interaction
Other application areas include smarter healthcare, multi-channel
sales, telecom, homeland security, trading analytics, traffic control,
search quality, and manufacturing.
Big Data vs. Data Science

Factors         Big Data                           Data Science
Concept         Handling large data                Analyzing data
Responsibility  Processing huge volumes of data    Understanding patterns within
                and generating insights            data and making decisions
Industry        E-commerce, security services,     Sales, image recognition,
                telecommunications                 advertisement, risk analytics
Tools           Hadoop                             Python, R
THANK YOU