Mds101 Unit 1

Introduction to data science

Uploaded by

Srinivasa Rao T

MDS101 – INTRODUCTION TO DATA SCIENCE

Total Teaching Hours: 52 No. of Hours / Week: 04

Course Objective:
• To understand the applications of Data Science
• To provide in-depth knowledge of the principles, techniques and applications of Data Science
• To gain a well-rounded introduction to the core concepts and technologies of Data Science
• To gain insight into data-driven programming
Learning Outcome:
Upon completion of the course, students will be able to:
• Explore data science and data engineering
• Apply Data-Driven Insights to Business and Industry
• Create Data Visualizations That Clearly Communicate Meaning
• Build Models That Operate Internet-of-Things Devices
• Apply Domain Expertise to Solve Real-World Problems Using Data Science

UNIT-I [12 Hours]

Getting Started with Data Science – Facets of Data: structured data, unstructured data, natural
language, machine-generated data, graph-based or network data, audio, image and video, streaming
data. Data Science Process: setting the research goal, retrieving data, data preparation, data
exploration, data modelling or model building, presentation and automation. Who Can Make Use
of Data Science, Analyzing the Pieces of the Data Science Puzzle, Exploring the Data Science
Solution Alternatives, Letting Data Science Make You More Marketable.
UNIT-II [10 Hours]
Exploring Data Engineering - Pipelines and Infrastructure - Grasping the Difference between Data
Science and Data Engineering - Identifying Big Data Sources - Making Sense of Data in Hadoop -
Identifying Alternative Big Data Solutions. Applying Data-Driven Insights to Business - Defining
Business-Centric Data Science - Differentiating between Business and Data-Driven Business -
Benefiting from Business-Centric Data Science - Converting Raw Data into Actionable Insights with
Data Analytics - Taking Action on Business Insights - Distinguishing between Business Intelligence
and Data Science.
UNIT-III [10 Hours]
Using Data Science to Extract Meaning from Your Data: Machine Learning - Learning from
Data with Your Machine, Defining Machine Learning and Its Processes, Considering Learning
Styles. Building Models That Operate Internet-of-Things Devices - Overviewing the Vocabulary
and Technologies, Digging into the Data Science Approaches, Advancing Artificial Intelligence
Innovation.
UNIT-IV [10 Hours]
Creating Data Visualizations: Following the Principles of Data Visualization Design, Data
Visualizations - The Big Three, Designing to Meet the Needs of Your Target Audience, Picking the
Most Appropriate Design Style, Choosing How to Add Context, Selecting the Appropriate Data
Graphic Type, Choosing a Data Graphic. Using D3.js for Data Visualization - Introducing the
D3.js Library, Knowing When to Use D3.js, Getting Started in D3.js, Implementing More Advanced
Concepts.
UNIT-V [10 Hours]
Doing Data Science with Excel and KNIME - Making Life Easier with Excel, Using KNIME for
Advanced Data Analytics. Applying Domain Expertise to Solve Real-World Problems Using
Data Science - Data Science for Driving Growth in E-Commerce.
Textbooks and References:
1. Introducing Data Science by Davy Cielen, Arno D. B. Meysman and Mohamed Ali, Dreamtech Press.
2. Data Science For Dummies, 2nd Edition, by Lillian Pierson.
3. An Introduction to Data Science by Jeffrey S. Saltz and Jeffrey M. Stanton.
4. A Hands-On Introduction to Data Science by Chirag Shah.

Getting Started with Data Science – Facets of Data.

Data science focuses on making sense of complex datasets and on building
predictive models from those data. As such, it encompasses a wide array of
activities, from the upstream processes of acquiring, cleaning and
integrating data to the downstream processes of analysis, modeling and
prediction. There are many facets of data science, including:
• Identifying the structure of data
• Cleaning, filtering, reorganizing, augmenting, and aggregating data
• Visualizing data
• Data analysis, statistics, and modeling
• Machine learning
• Assembling data processing pipelines to link these steps
• Leveraging high-end computational resources for large-scale problems
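To make the idea of linking steps into a pipeline concrete, here is a minimal sketch in plain Python. The records, the function names (`clean`, `aggregate`, `run_pipeline`) and the values are all hypothetical and chosen only for illustration; a real project would more likely use dedicated libraries for each step.

```python
from statistics import mean

# Hypothetical raw records: temperature readings with a gap and a bad value.
raw = [
    {"city": "Pune", "temp_c": "31.5"},
    {"city": "Pune", "temp_c": ""},        # missing value
    {"city": "Delhi", "temp_c": "38.0"},
    {"city": "Delhi", "temp_c": "n/a"},    # not a number
    {"city": "Pune", "temp_c": "29.5"},
]


def clean(records):
    """Cleaning step: keep only records whose temperature parses as a float."""
    cleaned = []
    for rec in records:
        try:
            cleaned.append({"city": rec["city"], "temp_c": float(rec["temp_c"])})
        except ValueError:
            continue  # drop records that cannot be parsed
    return cleaned


def aggregate(records):
    """Aggregation step: average temperature per city."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["city"], []).append(rec["temp_c"])
    return {city: mean(values) for city, values in groups.items()}


def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


summary = run_pipeline(raw, [clean, aggregate])
print(summary)  # {'Pune': 30.5, 'Delhi': 38.0}
```

Because every step takes data in and returns data out, steps can be swapped or reordered without touching the others, which is the interoperability idea in miniature.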

Often, different tools address different parts of this process.
Therefore, interoperability among tools, based on common data structures
and interfaces, is an important element in enabling the construction of
complex, multifaceted data analysis pipelines. It is in this sense that we can
talk about an ecosystem for data science. For any particular application, you
might only be interested in a

Structured data
Structured data is data that conforms to a data model, has a well-defined structure,
follows a consistent order and can be easily accessed and used by a person or a computer
program.
Structured data is usually stored in well-defined schemas such as databases. It is generally
tabular, with columns and rows that clearly define its attributes.
SQL (Structured Query Language) is often used to manage structured data stored in
databases.
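As a small illustration of managing structured data with SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `students` table, its columns and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# The schema fixes the structure: every row has the same typed columns.
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [("Asha", 82), ("Ravi", 67), ("Meena", 91)],
)

# Because the structure is known in advance, querying is straightforward.
rows = conn.execute(
    "SELECT name, marks FROM students WHERE marks >= ? ORDER BY marks DESC",
    (80,),
).fetchall()
print(rows)  # [('Meena', 91), ('Asha', 82)]

conn.close()
```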
Characteristics of Structured Data:
• Data conforms to a data model and has an easily identifiable structure
• Data is stored in the form of rows and columns (example: a database table)
• Data is well organised, so the definition, format and meaning of the data are explicitly known
• Data resides in fixed fields within a record or file
• Similar entities are grouped together to form relations or classes
• Entities in the same group have the same attributes
• Data is easy to access and query, so it can be readily used by other programs
• Data elements are addressable, so they are efficient to analyse and process
Sources of Structured Data:
• SQL databases
• Spreadsheets such as Excel
• OLTP systems
• Online forms
• Sensors such as GPS or RFID tags
• Network and web server logs
• Medical devices
Advantages of Structured Data:
• Structured data has a well-defined structure that makes data easy to store and access
• Data can be indexed on text strings as well as attributes, which makes searching straightforward
• Data mining is easy, i.e. knowledge can be readily extracted from the data
• Operations such as updating and deleting are easy because of the well-structured form of the data
• Business Intelligence operations such as data warehousing can be easily undertaken
• Storage scales easily as the volume of data grows
• Securing the data is easy
Note: Structured data is commonly estimated to account for only about 20% of all data, but
because of its high degree of organisation and performance it forms the foundation of Big Data.
Structured Data
When we talk about structured data, we are usually talking about tabular (rectangular) data, i.e.
rows and columns from a database. These tables mainly contain two types of structured data:

1. Numerical Data
Data that is expressed on a numerical scale. It takes two forms:
• Continuous: data that can take any value in an interval. For example, the speed of a
car or a heart rate.
• Discrete: data that can take only integer values, such as counts. For example, the
number of heads in 20 flips of a coin.
2. Categorical Data
Data that can take only a specific set of values representing possible categories. Such data is also
called enumerated (enums), factors, or nominal.
• Binary: a special case of categorical data where the feature is dichotomous, i.e. it can take
only one of two values such as 0/1 or True/False.
• Ordinal: categorical data that has an explicit ordering. For example, a five-star rating of a
restaurant (1, 2, 3, 4, 5).
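The four kinds of data described above can be represented in one small pandas dataframe. This is a sketch under assumptions: the column names and values are made up, and pandas is used here only because the notes refer to it later.

```python
import pandas as pd

# Hypothetical observations, one column per kind of data.
df = pd.DataFrame({
    "speed_kmh": [62.4, 80.1, 45.0],     # numerical, continuous
    "heads_in_20_flips": [9, 12, 10],    # numerical, discrete (counts)
    "is_member": [True, False, True],    # categorical, binary
    "rating": [3, 5, 4],                 # categorical, ordinal (converted below)
})

# Declare the rating as an ordered categorical on a 1-5 star scale.
df["rating"] = pd.Categorical(
    df["rating"], categories=[1, 2, 3, 4, 5], ordered=True
)

print(df.dtypes)
# Ordered categories support comparisons such as "better than 3 stars".
print((df["rating"] > 3).tolist())  # [False, True, True]
```

Declaring the order explicitly matters: without `ordered=True`, pandas would treat the ratings as unordered labels and refuse comparisons like `> 3`.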
But the question arises: why do you need to learn about data types? Without knowing the
type of data, you cannot tell which statistical methods are appropriate for it.
For example, if one of the columns in a dataframe holds ordinal data, it has to be preprocessed
before modeling; in Python, the scikit-learn package offers an OrdinalEncoder for this purpose.
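To show what ordinal encoding does, here is a hand-written sketch in plain Python; it is not scikit-learn's implementation. The `sizes` column and its category order are hypothetical. scikit-learn's `OrdinalEncoder`, when given the same order through its `categories` parameter, produces the same codes (as floats).

```python
# Hypothetical ordinal column: T-shirt sizes with a natural order.
sizes = ["medium", "small", "large", "small"]

# The analyst states the order explicitly; it cannot be inferred from the text.
order = ["small", "medium", "large"]

# Map each category to its position in the order.
code_of = {category: code for code, category in enumerate(order)}

encoded = [code_of[size] for size in sizes]
print(encoded)  # [1, 0, 2, 0]
```

The integer codes preserve the ranking (small < medium < large), which is the property a model needs from an ordinal feature.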
The next step is to dive deeper into structured data and how we can use third party packages and
libraries to manipulate such structures. We have mainly two types of structures or data storage
models:
1. Rectangular
2. Non-Rectangular

Rectangular Data
Most analyses in data science are done with a rectangular, two-dimensional data object such as a
dataframe, spreadsheet, CSV file, or database table.
Such an object consists of rows that represent records (observations) and columns that represent
features (variables).
A dataframe, in particular, is a tabular data structure that supports efficient operations for
manipulating the data.
Dataframes are the most commonly used data structures, so it is important to cover a few definitions
here:
Data frame
Rectangular data structure (like a spreadsheet) for efficient manipulation and application of statistical
and machine learning models.
Feature
A column within a dataframe is commonly referred to as a feature.
Synonyms — attribute, input, predictor, variable
Outcome
Many data science projects involve predicting an outcome — often a yes/no outcome.
Synonyms — dependent variable, response, target, output
Records
A row within a dataframe is commonly referred to as a record.
Synonyms — case, example, instance, observation, pattern, sample
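The terms defined above map directly onto a pandas dataframe. The sketch below uses a made-up customer table; the column names and the choice of `churned` as the outcome are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical customer data: each row is a record, each column a variable.
df = pd.DataFrame({
    "age": [34, 51, 27],
    "monthly_spend": [120.0, 80.5, 210.0],
    "churned": [False, True, False],   # yes/no outcome to be predicted
})

features = df[["age", "monthly_spend"]]   # features (predictors)
outcome = df["churned"]                    # outcome (target)
first_record = df.iloc[0]                  # one record (observation)

print(features.shape)          # (3, 2): three records, two features
print(outcome.tolist())        # [False, True, False]
print(first_record["age"])     # 34
```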
Relational database tables have one or more columns designated as an index, essentially a row
number. This can vastly improve the efficiency of certain database queries. In a pandas dataframe,
an automatic integer index is created based on the order of the rows. In pandas, it is also possible to
set multilevel/hierarchical indexes to improve the efficiency of certain operations.
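A short sketch of both kinds of pandas index follows. The sales table is hypothetical; `set_index`, `sort_index` and `.loc` are standard pandas calls.

```python
import pandas as pd

# Hypothetical sales figures per city and year.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 12, 20, 25],
})

# By default pandas numbers the rows 0, 1, 2, ... in order.
print(list(df.index))  # [0, 1, 2, 3]

# A two-level (hierarchical) index on city and year; sorting it
# lets pandas look rows up efficiently.
indexed = df.set_index(["city", "year"]).sort_index()

# Look up one row by its full key, or a whole group by the first level.
print(indexed.loc[("Delhi", 2024), "sales"])   # 25
print(indexed.loc["Pune"]["sales"].tolist())   # [10, 12]
```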

Non-rectangular Data
Besides rectangular data, we have several other data structures which come under the umbrella of
non-rectangular data.
Spatial data structures, which are used in geolocation analytics, are more complex and different from
rectangular data structures. In the object representation, the focus of the data is an object (e.g., a
park) and its spatial coordinates. The field view, by contrast, focuses on small units of space and the
value of a relevant metric (pixel intensities, for example).
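The difference between the object representation and the field view can be sketched in plain Python. Everything here is illustrative: the `SpatialObject` class, the park name, its coordinates and the pixel grid are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class SpatialObject:
    """Object representation: a named thing plus its spatial coordinates."""
    name: str
    latitude: float
    longitude: float


# Hypothetical park with made-up coordinates.
park = SpatialObject(name="Example Park", latitude=12.9, longitude=77.6)

# Field view: space is cut into small cells, each holding a metric value
# (here, made-up pixel intensities on a 3 x 3 grid).
field = [
    [0, 10, 20],
    [5, 50, 25],
    [0, 15, 30],
]

# Find the cell with the highest intensity and its (row, column) position.
brightest = max(
    (value, (row_idx, col_idx))
    for row_idx, row in enumerate(field)
    for col_idx, value in enumerate(row)
)
print(park.name, park.latitude, park.longitude)
print(brightest)  # (50, (1, 1))
```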
Graph data structures are used to represent relationships — physical, social, and abstract. For
example, Facebook or Twitter represents connections between people on the network as a graph of
social relationships. Graph structures are useful for certain types of problems, such as network
optimization and recommender systems.
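A social graph of this kind can be sketched as an adjacency list in plain Python. The names and friendships below are hypothetical, and the two functions are simple illustrations (a breadth-first search and a naive friend-of-friend suggestion), not how any real network implements these features.

```python
from collections import deque

# Hypothetical undirected friendship graph stored as an adjacency list.
friends = {
    "Asha": {"Ravi", "Meena"},
    "Ravi": {"Asha", "Kiran"},
    "Meena": {"Asha"},
    "Kiran": {"Ravi", "Divya"},
    "Divya": {"Kiran"},
}


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: fewest friendship hops from start to goal."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        person, distance = queue.popleft()
        if person == goal:
            return distance
        for neighbour in graph[person]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # goal is not reachable from start


def suggest_friends(graph, person):
    """Naive recommender: friends of friends who are not yet friends."""
    direct = graph[person]
    candidates = set()
    for friend in direct:
        candidates |= graph[friend]
    return sorted(candidates - direct - {person})


print(degrees_of_separation(friends, "Meena", "Divya"))  # 4
print(suggest_friends(friends, "Asha"))                  # ['Kiran']
```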
Each of these data types has a specific set of methods in data science. The focus of this series is on
rectangular data which forms the foundational building blocks of predictive modeling.
