Module 4

The document explains linked analytical datasets in data science, emphasizing the importance of combining multiple datasets through common identifiers to enhance analysis and insights. It outlines various linking strategies such as inner join, left join, right join, full outer join, and cross join, detailing their applications and implications for data analysis. Additionally, it discusses the economics of IoT analytics, predictive maintenance, and the associated costs and revenue opportunities in data analytics.

Uploaded by

Shreya Saxena


1. What is meant by linked analytical datasets in data science?

In data science, linked analytical datasets are multiple datasets that are connected or combined through a common key or identifier (such as a person ID, product ID, or timestamp) to create a richer, more complete dataset for analysis.
In simple terms:
You take two or more separate data sources and "link" them together so you can analyze more complex relationships.
For example:
• One dataset might have customer information (customer ID, name, age, location).
• Another dataset might have purchase history (customer ID, items bought, date).
• By linking them using the customer ID, you can study patterns like "what type of customers buy which products."
Key points:
• Linking usually happens using a shared variable (a key) that appears in every dataset being linked.
• It enriches the data: you get more features and context.
• It is critical for better modeling, prediction, and insight generation.
Example

Imagine a hospital wants to predict which patients are at risk of being readmitted within 30 days.
They have two datasets:
• Dataset 1: Patient Information (Patient ID, Age, Gender, Medical Conditions)
• Dataset 2: Hospital Visits (Patient ID, Visit Date, Reason for Visit, Treatment Given)

If they only look at one dataset, they miss important information.

But if they link the two datasets using Patient ID, they can:
✅ See each patient’s medical history and their treatment history
✅ Spot patterns like "patients with certain conditions and treatments are more likely to return"
✅ Build much better predictive models
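
The hospital example can be sketched in a few lines of plain Python. This is only an illustration: the records, field names, and values below are made up, and a real project would more likely use a database or a dataframe library.

```python
# Hypothetical records; field names mirror the two datasets described above.
patients = [
    {"patient_id": 101, "age": 67, "gender": "F", "conditions": "diabetes"},
    {"patient_id": 102, "age": 45, "gender": "M", "conditions": "asthma"},
]
visits = [
    {"patient_id": 101, "visit_date": "2024-01-05", "reason": "high blood sugar", "treatment": "insulin adjustment"},
    {"patient_id": 101, "visit_date": "2024-01-20", "reason": "dizziness", "treatment": "observation"},
    {"patient_id": 102, "visit_date": "2024-02-11", "reason": "wheezing", "treatment": "inhaler"},
]

# Index the patient table by its key so each visit can look up its patient.
patients_by_id = {p["patient_id"]: p for p in patients}

# One linked row per visit: patient attributes plus visit attributes.
linked = [
    {**patients_by_id[v["patient_id"]], **v}
    for v in visits
    if v["patient_id"] in patients_by_id
]

for row in linked:
    print(row["patient_id"], row["age"], row["conditions"], row["visit_date"], row["treatment"])
```

Each linked row now carries both who the patient is and what happened during the visit, which is the combined view a readmission model would need.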

2. What is meant by a linking together strategy in data science?

In data science, a "linking together strategy" means planning how to combine different datasets to create a single, unified dataset that’s ready for analysis.
It’s not just about randomly merging files. It’s a careful strategy to decide:
• Which datasets to link
• How to link them (what keys or fields to use)
• When to link them (before, during, or after cleaning the data)
• What to do if the data doesn't match perfectly (missing IDs, mismatched formats)

In simple words:
It’s a smart plan for how you connect your different pieces of data to build a bigger, more useful dataset for solving your problem.

Example:
Suppose a company has:
• A customer profile dataset
• A website activity dataset
• A product purchase dataset
Their linking together strategy might be:
• Use Customer ID to join customer profiles to website activity.
• Then link the combined data to purchases using Customer ID again.
• Handle missing data carefully (for customers who browsed but didn’t buy).

Why is it important?
Because bad linking = bad analysis.
Good strategy ensures accuracy, completeness, and better results in models and reports.
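
The three-step strategy from the example can be sketched as follows. The helper `left_join`, the column names, and the sample rows are all hypothetical and only serve the illustration; missing purchases are kept as `None` and then turned into an explicit flag.

```python
def left_join(left, right, key):
    """Keep every left row; attach matching right rows, or None-filled columns if unmatched."""
    right_cols = {col for row in right for col in row if col != key}
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    out = []
    for l_row in left:
        matches = by_key.get(l_row[key])
        if matches:
            out.extend({**l_row, **m} for m in matches)
        else:
            out.append({**l_row, **{col: None for col in right_cols}})
    return out

# Made-up sample data for the three datasets.
profiles = [{"customer_id": 1, "name": "Alice"}, {"customer_id": 2, "name": "Bob"}]
activity = [{"customer_id": 1, "pages_viewed": 12}, {"customer_id": 2, "pages_viewed": 3}]
purchases = [{"customer_id": 2, "product": "Laptop"}]

# Step 1: profiles + website activity. Step 2: add purchases, again on customer_id.
step1 = left_join(profiles, activity, "customer_id")
unified = left_join(step1, purchases, "customer_id")

# Step 3: handle missing data explicitly (browsed but did not buy).
for row in unified:
    row["purchased"] = row["product"] is not None

print(unified)
```

Using left joins here is a design choice: customers who browsed but never bought stay in the unified dataset instead of silently disappearing.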

3. Most common types of linking strategies

Suppose we have two small datasets:

Customers Table

CustomerID  Name
1           Alice
2           Bob
3           Charlie

Purchases Table

CustomerID  Product
2           Laptop
3           Smartphone
4           Tablet

a. Inner Join

• Meaning: Only keep records that match in both datasets.
• Example: If a customer exists in both the purchase and profile datasets, keep them. Ignore the rest.
• Use when: You want only complete matches.
• Result for our tables: only CustomerIDs 2 and 3 exist in both tables, so only they are kept.

CustomerID  Name     Product
2           Bob      Laptop
3           Charlie  Smartphone
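
A minimal plain-Python sketch of the inner join on the two small tables above (the variable names are just for illustration):

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]

# Look-up table: CustomerID -> Name
names = dict(customers)

# Keep a purchase only if its CustomerID also exists in the Customers table.
inner = [(cid, names[cid], product) for cid, product in purchases if cid in names]

print(inner)  # [(2, 'Bob', 'Laptop'), (3, 'Charlie', 'Smartphone')]
```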

b. Left Join (or Left Outer Join)

• Meaning: Keep all records from the first (left) dataset, even if there’s no match in the second.
• Example: Keep all customers, even if they didn’t make a purchase.
• Use when: You want everything from your main dataset, plus extra info if available.
• Result for our tables: all three customers are kept; Alice has no purchase, so her Product is NULL.

CustomerID  Name     Product
1           Alice    NULL
2           Bob      Laptop
3           Charlie  Smartphone
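
A minimal plain-Python sketch of the left join on the same two tables, with `None` playing the role of NULL (variable names are illustrative):

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]

# Group purchases by CustomerID (a customer could have several).
products_by_customer = {}
for cid, product in purchases:
    products_by_customer.setdefault(cid, []).append(product)

# Every customer appears at least once; unmatched customers get None.
left = []
for cid, name in customers:
    for product in products_by_customer.get(cid, [None]):
        left.append((cid, name, product))

print(left)  # [(1, 'Alice', None), (2, 'Bob', 'Laptop'), (3, 'Charlie', 'Smartphone')]
```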

c. Right Join (or Right Outer Join)

• Meaning: Opposite of a left join: keep all records from the second (right) dataset, even if there’s no match in the first.
• Use when: The second dataset is your focus. (Less common than a left join.)
• Result for our tables: all three purchases are kept; CustomerID 4 has no customer record, so Name is NULL.

CustomerID  Name     Product
2           Bob      Laptop
3           Charlie  Smartphone
4           NULL     Tablet
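
A minimal plain-Python sketch of the right join on the same two tables, with `None` standing in for NULL (variable names are illustrative):

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]

# Look-up table: CustomerID -> Name
names = dict(customers)

# Every purchase is kept; names.get() returns None when the customer is unknown.
right = [(cid, names.get(cid), product) for cid, product in purchases]

print(right)  # [(2, 'Bob', 'Laptop'), (3, 'Charlie', 'Smartphone'), (4, None, 'Tablet')]
```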


d. Full Outer Join

• Meaning: Keep everything from both datasets, match when you can, and leave blanks where you can’t.
• Example: List all customers and all purchases, even if they don't match perfectly.
• Use when: You don’t want to miss anything.
• Result for our tables: CustomerIDs 1 to 4 all appear; missing names or products are NULL.

CustomerID  Name     Product
1           Alice    NULL
2           Bob      Laptop
3           Charlie  Smartphone
4           NULL     Tablet
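
A minimal plain-Python sketch of the full outer join on the same two tables. It assumes each CustomerID appears at most once per table, which holds for this small example but not in general.

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]

names = dict(customers)      # CustomerID -> Name
products = dict(purchases)   # CustomerID -> Product (one purchase per customer here)

# Union of keys from both sides; .get() fills the gaps with None.
all_ids = sorted(set(names) | set(products))
full = [(cid, names.get(cid), products.get(cid)) for cid in all_ids]

for row in full:
    print(row)
```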


e. Cross Join

• Meaning: Match every row of one dataset to every row of the other. No key is used. (Rarely used.)
• Example: If you have 10 customers and 5 products, you get 10 × 5 = 50 combinations.
• Use when: You need every possible pair (like for testing scenarios).
• Result: every row from Customers combined with every row from Purchases. Shown here for only 2 customers and 2 products (the full tables would give 3 × 3 = 9 rows):

Name   Product
Alice  Laptop
Alice  Smartphone
Bob    Laptop
Bob    Smartphone
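
A minimal sketch of the cross join using `itertools.product` from the Python standard library, on the same 2 customers and 2 products:

```python
from itertools import product

names = ["Alice", "Bob"]
products = ["Laptop", "Smartphone"]

# Every name paired with every product: 2 x 2 = 4 rows.
cross = list(product(names, products))
print(cross)
# [('Alice', 'Laptop'), ('Alice', 'Smartphone'), ('Bob', 'Laptop'), ('Bob', 'Smartphone')]

# The row count is always the product of the two table sizes.
n_pairs = len(list(product(range(10), range(5))))
print(n_pairs)  # 50
```

Because the output grows as rows × rows, cross joins on large tables get expensive quickly.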

4. What is the economics of IoT Analytics?

The economics of IoT Analytics describes how money, value, and costs flow when companies collect and analyze data from Internet of Things (IoT) devices.
It’s about answering questions like:
• How much does it cost to collect, store, and analyze IoT data?
• How much value (money, insights, efficiency) can you create from that data?
• Is it profitable to invest in IoT Analytics?

Key parts of IoT Analytics economics:

1. Costs
• Buying and maintaining sensors/devices
• Networking (getting data from devices to the cloud)
• Data storage (because IoT data is huge and continuous)
• Processing and analytics tools (software, cloud services)
• Security and privacy costs (protecting sensitive data)
2. Value Creation
• Operational efficiency (example: factories predicting machine failure before it happens)
• New business models (example: smart homes offering subscription services)
• Better decision-making (example: farms adjusting watering based on real-time soil sensors)
• Customer personalization (example: fitness trackers suggesting workouts)
3. Return on Investment (ROI)
• Companies must calculate:
Net value = Value generated − Total costs
ROI = Net value ÷ Total costs
→ If the net value is positive, IoT Analytics is economically good for them.
4. Scale Effects
• Bigger companies or projects get a lower cost per device (economies of scale).
• Fixed costs such as the analytics platform are shared by all devices, so analyzing 1,000 devices is expensive per device, while analyzing 1 million devices can be much cheaper per device.
5. Risks
• High upfront costs (especially for hardware and setup)
• Uncertain ROI if the analytics don’t produce useful insights
• Data quality issues (bad data = wrong decisions = wasted money)
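
The scale effect from point 4 can be illustrated with a small calculation. The cost model (one fixed platform cost shared by all devices plus a constant cost per device) and all the numbers are assumptions chosen only for illustration.

```python
def cost_per_device(n_devices, fixed_platform_cost, variable_cost_per_device):
    """Average yearly cost per device: the shared fixed cost is spread over all devices."""
    return fixed_platform_cost / n_devices + variable_cost_per_device

# Hypothetical figures: $200,000 platform cost, $5 per device for connectivity and storage.
small = cost_per_device(1_000, 200_000.0, 5.0)
large = cost_per_device(1_000_000, 200_000.0, 5.0)

print(f"1,000 devices:     ${small:.2f} per device")
print(f"1,000,000 devices: ${large:.2f} per device")
```

The total bill is of course larger for 1 million devices; it is the cost per device that falls as the fixed cost is shared more widely.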

Simple Example:
A delivery company installs GPS trackers on all its trucks.
• Cost: Buy trackers + cloud fees + analytics software.
• Value: Save fuel (better routes), predict maintenance (fewer breakdowns), happier customers.
• If savings > costs → good economics for IoT Analytics.

In short we can say:

Economics of IoT Analytics = making sure that the cost of collecting and analyzing data is less
than the benefits you get from using that data smartly.
5. Explain with a flowchart how the economics of IoT Analytics works

[ Investment ]
↓
Buy Devices + Sensors
↓
Set Up Networking + Cloud Storage
↓
Collect Huge Amounts of Data
↓
Use Analytics Tools + AI to Process Data
↓
Generate Insights
↓
Take Smarter Actions (example: save energy, predict failures)
↓
Create Value (example: cost savings, new revenue)
↓
Calculate Return on Investment (ROI)
↓
Is Value > Cost?
→ Yes → PROFIT
→ No → REVISE STRATEGY

6. Explain with a real-world example how the economics of IoT Analytics works

Real-World Example: DHL — Smart Logistics with IoT Analytics

Company: DHL (one of the world’s largest logistics companies)

🛠 What they did:

• Installed IoT sensors on delivery trucks, shipping containers, and warehouses.
• Sensors collected data like:
  ➔ Truck location
  ➔ Temperature inside containers
  ➔ Vehicle health (engine, tires)
  ➔ Package movement

🔍 How they used IoT Analytics:

• Real-time Tracking: They could see where every truck and package was at all times.
• Predictive Maintenance: Analyzed truck sensor data to predict breakdowns before they happened.
• Temperature Monitoring: Made sure sensitive packages (like medicines) stayed within safe temperatures during shipping.
• Route Optimization: Analyzed traffic and delivery patterns to plan faster and cheaper routes.

💰 Economic Impact:
• Reduced fuel costs by optimizing delivery routes.
• Fewer truck breakdowns, saving repair and emergency costs.
• Fewer lost or damaged goods, saving money on insurance claims.
• Faster deliveries improved customer satisfaction (bringing more business).
Result:
DHL reported millions of dollars saved each year; the value they gained was much higher than the cost of installing sensors and running analytics.
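
To make the temperature monitoring step concrete, here is a minimal sketch of a range check on container readings. It is not DHL's actual system: the 2 to 8 °C range, the function name, and the readings are assumptions for illustration only.

```python
# Assumed safe range for a temperature-sensitive shipment (illustrative only).
SAFE_MIN_C = 2.0
SAFE_MAX_C = 8.0

def find_excursions(readings, low=SAFE_MIN_C, high=SAFE_MAX_C):
    """Return the (time, temperature) readings that fall outside the safe range."""
    return [(ts, temp) for ts, temp in readings if not (low <= temp <= high)]

# Made-up sensor readings from one container.
readings = [("08:00", 4.5), ("09:00", 7.9), ("10:00", 9.2), ("11:00", 5.0)]
alerts = find_excursions(readings)

print(alerts)  # [('10:00', 9.2)]
```

Catching the 10:00 reading while the truck is still on the road is what turns sensor data into avoided spoilage and fewer insurance claims.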

7. What are the Cost Considerations and Revenue Opportunities in Data Analytics?

I. Cost Considerations
These are the main expenses companies need to plan for when they invest in data analytics:

a. Data Collection
• Buying or building systems to capture data (sensors, apps, websites, CRM systems).
• Cost of IoT devices if the data comes from the physical world.

b. Data Storage
• Cloud storage (AWS, Azure, Google Cloud) or on-premise data centers.
• Storing huge volumes of structured and unstructured data can be expensive.

c. Data Processing and Cleaning
• Tools, servers, and staff to clean, format, and process messy raw data into useful information.
• Example: Data engineers and database administrators.

d. Analytics Tools and Software
• Licenses for software like Tableau, Power BI, or SAS, or the cost of building custom AI/ML models.
• Open-source tools are free to license, but at scale even free tools have infrastructure costs.

e. Talent and Skills
• Hiring data scientists, data analysts, machine learning engineers, and architects.
• Salaries for these roles can be very high!

f. Security and Compliance
• Protecting sensitive data (especially customer data) with cybersecurity measures.
• GDPR, HIPAA, and other legal compliance requirements can add extra costs.

g. Maintenance and Upgrades
• Regular updates to hardware, software, and models to keep everything running efficiently.

II. Revenue Opportunities

These are the ways data analytics helps companies MAKE more money:

a. Better Decision-Making
• Use insights from data to make smarter business decisions.
• Example: Knowing which products are about to trend and increasing stock early.

b. Personalized Marketing and Sales
• Target the right customer with the right offer at the right time.
• Higher conversion rates = more sales.

c. New Products and Services
• Data can reveal customer needs companies didn’t know about.
• Example: Spotify using listening data to create personalized playlists (which keeps users loyal).

d. Cost Reductions (Efficiency Gains)
• Predictive maintenance in factories saves huge amounts by avoiding unexpected breakdowns.
• Automation of repetitive tasks with analytics saves on staffing costs.

e. Risk Management
• Predict and prevent risks (like fraud detection in banks) before they cause financial damage.

f. Selling Data or Insights
• Some companies anonymize and sell their collected data to other businesses (carefully, within legal limits).

⚡ Quick Example
A retail company using customer buying data:
• Cost: $500,000 per year on cloud storage + analytics platform.
• Value created: $2 million extra in sales through better-targeted promotions.
Result: $1.5 million more value than cost, a return of 3 times the money spent. Huge return on investment!
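
The arithmetic behind this quick example, written out as a short sketch. It treats the extra sales as the value created, as the example does; a stricter calculation would use the profit on those sales instead.

```python
annual_cost = 500_000      # cloud storage + analytics platform
extra_sales = 2_000_000    # value created by better-targeted promotions

net_gain = extra_sales - annual_cost
roi = net_gain / annual_cost   # return per dollar spent

print(f"Net gain: ${net_gain:,}  ROI: {roi:.0%}")  # Net gain: $1,500,000  ROI: 300%
```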

8. What is the economics of the predictive maintenance cycle in data science?

1. What is Predictive Maintenance?

Predictive Maintenance (PdM) means using data (sensor readings, machine logs, temperature, vibrations, etc.) to predict when equipment might fail before it actually fails, so you can fix it just in time.
Instead of:
• Reactive maintenance (fixing after it breaks: expensive 💸)
• Scheduled maintenance (fixing on a fixed calendar even if it doesn’t need it: wasteful 🛠)
You do predictive maintenance → fix only when needed, based on real signs.
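
As a rough sketch of "fix only when needed, based on real signs", the rule below raises a warning when the rolling average of recent vibration readings crosses a threshold. This is a deliberately simple stand-in for a real prediction model; the function name, window size, threshold, and readings are all assumptions.

```python
from collections import deque

def first_warning(vibration_mm_s, window=3, threshold=7.0):
    """Index of the first reading where the rolling mean exceeds the threshold, else None."""
    recent = deque(maxlen=window)  # holds only the last `window` readings
    for i, value in enumerate(vibration_mm_s):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None

# Made-up vibration readings that slowly drift upward before a failure.
readings = [4.0, 4.2, 4.1, 5.0, 6.5, 7.5, 8.2, 9.0]
print(first_warning(readings))         # 6
print(first_warning([4.0, 4.1, 4.2]))  # None
```

Averaging over a window instead of reacting to a single reading reduces false alarms from one-off spikes, at the cost of a slightly later warning.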

2. Economics of the Predictive Maintenance Cycle

It’s about how much money you spend vs. how much money you save or make by using predictive maintenance.

🛠 Costs involved:
• Sensors and IoT devices on machines (temperature, vibration, usage hours).
• Data storage (because sensors collect data 24/7).
• Analytics software (machine learning models that predict failures).
• Skilled personnel (data scientists, engineers).
• Integration and maintenance of the system itself.

💰 Value / Revenue created:
• Less unexpected downtime (no stopping of production or services).
• Lower repair costs (small fixes instead of big breakdowns).
• Longer machine life (machines are healthier over time).
• Better productivity (no "waiting" for machines to get fixed).
• Safer working conditions (fewer accidents from machine failures).

3. Predictive Maintenance Cycle (Economics Flow)

Install Sensors → Collect Data → Analyze and Predict → Schedule Maintenance at the Right Time → Avoid Breakdowns → Save Money → Earn More Profit

• Investment happens early (installing and setting up).
• Savings and value keep happening over time (through fewer failures and lower costs).
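
Because the investment comes first and the savings arrive later, a natural question is how long it takes to break even. The sketch below assumes constant monthly savings and running costs; the function name and all figures are hypothetical.

```python
def payback_month(upfront_investment, monthly_savings, monthly_running_cost):
    """First month in which cumulative net savings cover the upfront investment, else None."""
    net_monthly = monthly_savings - monthly_running_cost
    if net_monthly <= 0:
        return None  # the system never pays for itself
    cumulative = -upfront_investment
    month = 0
    while cumulative < 0:
        month += 1
        cumulative += net_monthly
    return month

# Hypothetical: $120,000 setup, $15,000 saved and $5,000 spent per month.
print(payback_month(120_000, 15_000, 5_000))  # 12
print(payback_month(120_000, 4_000, 5_000))   # None
```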

4. Real-World Example
Airlines use predictive maintenance on aircraft engines:
• Sensors monitor engine vibrations, temperature, air pressure, etc.
• AI models predict when a part will need replacement.
• Result:
  • Save millions by avoiding canceled flights.
  • Lower maintenance costs.
  • Increase passenger satisfaction and trust.

5. Simple Formula
Net Economic Benefit =
(Value of avoided failures + Value of improved productivity + Value of extended asset life)
− (Cost of sensors + Cost of analytics + Cost of skilled workers + Cost of maintaining the system)
If the Net Benefit is positive and growing, predictive maintenance is economically successful!
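
The formula translates directly into a small function. The parameter names follow the terms of the formula; the yearly figures in the usage example are invented for illustration.

```python
def net_economic_benefit(avoided_failures, improved_productivity, extended_asset_life,
                         sensors, analytics, skilled_workers, system_maintenance):
    """Total value created minus total cost, using the terms of the formula above."""
    value = avoided_failures + improved_productivity + extended_asset_life
    cost = sensors + analytics + skilled_workers + system_maintenance
    return value - cost

# Hypothetical yearly figures in dollars.
benefit = net_economic_benefit(
    avoided_failures=300_000, improved_productivity=120_000, extended_asset_life=80_000,
    sensors=100_000, analytics=60_000, skilled_workers=150_000, system_maintenance=40_000,
)
print(benefit)  # 150000
```

Recomputing this each year with actual figures shows whether the net benefit is not only positive but also growing.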

🎯 In simple words:
You spend some money up front to set up predictive maintenance, but you save much more by catching problems before they escalate, and your machines and business run smoother and longer.
