Module 4
1. What is meant by linked analytical datasets in data science?
In data science, linked analytical datasets refer to multiple datasets that are
connected or combined through a common key or identifier (like a person ID,
product ID, time, etc.) to create a richer, more complete dataset for analysis.
In simple terms:
It’s when you take two or more separate data sources and "link" them together so you can analyze
more complex relationships.
For example:
One dataset might have customer information (name, age, location).
Another dataset might have purchase history (customer ID, items bought, date).
By linking them using the customer ID, you can study patterns like "what type of
customers buy which products."
Key points:
Linking usually happens using a shared variable (a key).
It helps enrich the data — you get more features and context.
It's critical for better modeling, prediction, and insight generation.
Example
Imagine a hospital wants to predict which patients are at risk of being readmitted
within 30 days.
They have two datasets:
Dataset 1: Patient Information
(Patient ID, Age, Gender, Medical Conditions)
Dataset 2: Hospital Visits
(Patient ID, Visit Date, Reason for Visit, Treatment Given)
If they only look at one dataset, they miss important information.
But if they link the two datasets using Patient ID, they can:
✅ See each patient’s medical history and their treatment history
✅ Spot patterns like "patients with certain conditions and treatments are more likely to return"
✅ Build much better predictive models
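The hospital example above can be sketched in a few lines of plain Python. This is only a minimal illustration: the sample records, the field names, and the helper `link_by_key` are all made up for this sketch, and a real project would more likely use a database or a data-frame library.

```python
# Hypothetical sample data; the fields mirror the two datasets described above.
patients = [
    {"patient_id": 101, "age": 67, "gender": "F", "conditions": ["diabetes"]},
    {"patient_id": 102, "age": 45, "gender": "M", "conditions": ["asthma"]},
]
visits = [
    {"patient_id": 101, "visit_date": "2024-01-05", "reason": "chest pain", "treatment": "observation"},
    {"patient_id": 101, "visit_date": "2024-01-20", "reason": "chest pain", "treatment": "medication"},
    {"patient_id": 102, "visit_date": "2024-02-11", "reason": "wheezing", "treatment": "inhaler"},
]


def link_by_key(left_rows, right_rows, key):
    """Combine each right row with the left row that shares the same key value."""
    index = {row[key]: row for row in left_rows}  # key value -> left row
    linked = []
    for right in right_rows:
        left = index.get(right[key])
        if left is not None:  # skip visits with no matching patient
            linked.append({**left, **right})
    return linked


linked = link_by_key(patients, visits, "patient_id")
for row in linked:
    print(row["patient_id"], row["age"], row["conditions"], row["visit_date"], row["treatment"])
```

Each linked row now carries both the patient's details and one visit, which is the shape a readmission model would need as input.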
2. What is meant by a linking together strategy in data science?
In data science, a "linking together strategy" means planning how to combine
different datasets to create a single, unified dataset that’s ready for analysis.
It’s not just about randomly merging files — it’s a careful strategy to decide:
Which datasets to link
How to link them (what keys or fields to use)
When to link them (before, during, or after cleaning the data)
What to do if the data doesn't match perfectly (missing IDs, mismatched formats)
In simple words:
It’s a smart plan for how you connect your different pieces of data to build a bigger, more useful
dataset for solving your problem.
Example:
Suppose a company has:
A customer profile dataset
A website activity dataset
A product purchase dataset
Their linking together strategy might be:
Use Customer ID to join customer profiles to website activity.
Then link the combined data to purchases using Customer ID again.
Handle missing data carefully (for customers who browsed but didn’t buy).
Why is it important?
Because bad linking = bad analysis.
Good strategy ensures accuracy, completeness, and better results in models and reports.
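The three-step strategy from the example above could look roughly like this in Python. The sample data, the function name `build_unified`, and the choice to fill missing activity with 0 and missing purchases with None are assumptions made for this sketch, not a fixed rule.

```python
# Hypothetical sample data for the three datasets named above, keyed by Customer ID.
profiles = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Charlie"}}
activity = {1: {"page_views": 12}, 2: {"page_views": 3}}  # customer 3 never browsed
purchases = {2: {"product": "Laptop"}}                     # only customer 2 bought


def build_unified(profiles, activity, purchases):
    """Link profiles -> activity -> purchases on Customer ID, keeping every profile."""
    unified = []
    for customer_id, profile in profiles.items():
        row = {"customer_id": customer_id, **profile}
        # Step 1: add website activity (assumed default: 0 page views if none recorded).
        row["page_views"] = activity.get(customer_id, {}).get("page_views", 0)
        # Step 2: add purchases; None marks a customer who did not buy.
        row["product"] = purchases.get(customer_id, {}).get("product")
        unified.append(row)
    return unified


unified = build_unified(profiles, activity, purchases)
for row in unified:
    print(row)
```

Deciding the defaults up front (0 versus None) is part of the strategy: it makes "browsed but didn't buy" visible in the data instead of silently dropping those customers.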
3. Most common types of linking strategies
Suppose we have two small datasets:
Customers Table
CustomerID   Name
1            Alice
2            Bob
3            Charlie
Purchases Table
CustomerID   Product
2            Laptop
3            Smartphone
4            Tablet
a. Inner Join
Meaning: Only keep records that match in both datasets.
Example: If a customer exists in both the purchase and profile datasets, keep them. Ignore
the rest.
Use when: You want only complete matches.
Keeps only the CustomerIDs that exist in both tables.
CustomerID   Name      Product
2            Bob       Laptop
3            Charlie   Smartphone
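As a rough sketch, the inner join above can be reproduced in plain Python with the same two small tables. The function name `inner_join` is chosen for this illustration; in practice a tool such as SQL or the pandas `merge` method with `how="inner"` would usually do this.

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]


def inner_join(customers, purchases):
    """Keep only rows whose CustomerID appears in both tables."""
    names = dict(customers)  # CustomerID -> Name
    return [(cid, names[cid], product) for cid, product in purchases if cid in names]


result = inner_join(customers, purchases)
print(result)  # [(2, 'Bob', 'Laptop'), (3, 'Charlie', 'Smartphone')]
```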
b. Left Join (or Left Outer Join)
Meaning: Keep all records from the first (left) dataset, even if there’s no match in the
second.
Example: Keep all customers, even if they didn’t make a purchase.
Use when: You want everything from your main dataset, plus extra info if available.
Keeps all customers, even if they didn’t buy anything.
CustomerID   Name      Product
1            Alice     NULL
2            Bob       Laptop
3            Charlie   Smartphone
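A minimal Python sketch of the left join above, using the same sample tables. The helper name `left_join` is an assumption for this example, and Python's None stands in for NULL.

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]


def left_join(customers, purchases):
    """Keep every customer; fill Product with None when there is no purchase."""
    products = {}  # CustomerID -> list of products bought
    for cid, product in purchases:
        products.setdefault(cid, []).append(product)
    rows = []
    for cid, name in customers:
        for product in products.get(cid, [None]):
            rows.append((cid, name, product))
    return rows


result = left_join(customers, purchases)
print(result)  # [(1, 'Alice', None), (2, 'Bob', 'Laptop'), (3, 'Charlie', 'Smartphone')]
```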
c. Right Join (or Right Outer Join)
Meaning: Opposite of left join — keep all records from the second (right) dataset.
Use when: The second dataset is your main focus (less common in practice).
Keeps all purchases, even if we don’t know who the customer is.
CustomerID   Name      Product
2            Bob       Laptop
3            Charlie   Smartphone
4            NULL      Tablet
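The right join above can be sketched the same way. The helper name `right_join` is made up for this example, and None plays the role of NULL.

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]


def right_join(customers, purchases):
    """Keep every purchase; fill Name with None when the customer is unknown."""
    names = dict(customers)  # CustomerID -> Name
    return [(cid, names.get(cid), product) for cid, product in purchases]


result = right_join(customers, purchases)
print(result)  # [(2, 'Bob', 'Laptop'), (3, 'Charlie', 'Smartphone'), (4, None, 'Tablet')]
```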
d. Full Outer Join
Meaning: Keep everything from both datasets, match when you can, and leave blanks
where you can’t.
Example: List all customers and all purchases, even if they don't match perfectly.
Use when: You don’t want to miss anything.
Keeps everything, even if it doesn’t match perfectly.
CustomerID   Name      Product
1            Alice     NULL
2            Bob       Laptop
3            Charlie   Smartphone
4            NULL      Tablet
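Here is a small Python sketch of the full outer join above. It assumes each customer has at most one purchase, which is true for the sample tables but not in general; the helper name `full_outer_join` is only for illustration.

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
purchases = [(2, "Laptop"), (3, "Smartphone"), (4, "Tablet")]


def full_outer_join(customers, purchases):
    """Keep every CustomerID from either table; None marks the missing side."""
    names = dict(customers)     # CustomerID -> Name
    products = dict(purchases)  # CustomerID -> Product (one purchase per customer assumed)
    all_ids = sorted(set(names) | set(products))
    return [(cid, names.get(cid), products.get(cid)) for cid in all_ids]


result = full_outer_join(customers, purchases)
for row in result:
    print(row)
```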
e. Cross Join
Meaning: Match every row of one dataset to every row of the other. (Rarely used.)
Example: If you have 10 customers and 5 products, you get 50 combinations.
Use when: You need every possible pair (like for testing scenarios).
Every row from Customers combined with every row from Purchases. Example for 2
customers and 2 products:
Name    Product
Alice   Laptop
Alice   Smartphone
Bob     Laptop
Bob     Smartphone
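In Python, one simple way to sketch a cross join is `itertools.product` from the standard library, which pairs every item of one list with every item of the other. The two short lists below match the 2 x 2 example above.

```python
import itertools

names = ["Alice", "Bob"]
products = ["Laptop", "Smartphone"]

# Every name paired with every product: 2 x 2 = 4 combinations.
pairs = list(itertools.product(names, products))
for name, item in pairs:
    print(name, item)

# The row count is always (rows in first table) x (rows in second table).
print(len(list(itertools.product(range(10), range(5)))))  # 50
```

The row count grows very quickly, which is one reason cross joins are rarely used on large tables.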
4. What is the economics of IoT Analytics?
The economics of IoT Analytics means how money, value, and costs flow when companies
collect and analyze data from Internet of Things (IoT) devices.
It’s about answering questions like:
How much does it cost to collect, store, and analyze IoT data?
How much value (money, insights, efficiency) can you create from that data?
Is it profitable to invest in IoT Analytics?
Key parts of IoT Analytics economics:
1. Costs
Buying and maintaining sensors/devices
Networking (getting data from devices to the cloud)
Data storage (because IoT data is huge and continuous)
Processing and analytics tools (software, cloud services)
Security and privacy costs (protecting sensitive data)
2. Value Creation
Operational efficiency (example: factories predicting machine failure before it
happens)
New business models (example: smart homes offering subscription services)
Better decision-making (example: farms adjusting watering based on real-time soil
sensors)
Customer personalization (example: fitness trackers suggesting workouts)
3. Return on Investment (ROI)
Companies must calculate:
Net value = Value generated minus Total costs
ROI = Net value divided by Total costs
→ If the net value is positive, IoT Analytics is economically good for them.
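The calculation above can be sketched in Python as follows. The function names and the sample figures are invented for illustration. The sketch also divides the net value by the cost, which is the usual way to express ROI as a ratio.

```python
def net_value(value_generated, total_costs):
    """Value generated minus total costs."""
    return value_generated - total_costs


def roi(value_generated, total_costs):
    """Net value as a fraction of total costs (0.5 means a 50% return)."""
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return net_value(value_generated, total_costs) / total_costs


# Hypothetical figures for one year of an IoT analytics project.
value_generated = 180_000
total_costs = 120_000

print(net_value(value_generated, total_costs))  # 60000
print(roi(value_generated, total_costs))        # 0.5
```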
4. Scale Effects
Larger deployments usually cost less per device (economies of scale), because fixed costs such as the analytics platform are spread over more devices.
Analyzing 1,000 devices is expensive per device, but analyzing 1 million devices can be much
cheaper per device.
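A simple way to see the scale effect is to split costs into a fixed part (shared by all devices) and a variable part (paid per device). The cost model and all numbers below are assumptions for illustration only.

```python
def cost_per_device(num_devices, fixed_cost=500_000, variable_cost_per_device=2.0):
    """Per-device cost when a fixed platform cost is shared across all devices."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return fixed_cost / num_devices + variable_cost_per_device


print(cost_per_device(1_000))      # 502.0
print(cost_per_device(1_000_000))  # 2.5
```

With these made-up figures the per-device cost drops from 502 to 2.5 as the fleet grows, because the fixed cost is shared by more devices.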
5. Risks
High upfront costs (especially for hardware and setup)
Uncertain ROI if the analytics don’t produce useful insights
Data quality issues (bad data = wrong decisions = wasted money)
Simple Example:
A delivery company installs GPS trackers on all its trucks.
Cost: Buy trackers + cloud fees + analytics software.
Value: Save fuel (better routes), predict maintenance (fewer breakdowns), happier
customers.
If savings > costs → good economics for IoT Analytics.
In short we can say:
Economics of IoT Analytics = making sure that the cost of collecting and analyzing data is less
than the benefits you get from using that data smartly.
5. Explain with a flowchart how the economics of IoT Analytics works:
[ Investment ]
↓
Buy Devices + Sensors
↓
Set Up Networking + Cloud Storage
↓
Collect Huge Amounts of Data
↓
Use Analytics Tools + AI to Process Data
↓
Generate Insights
↓
Take Smarter Actions (example: save energy, predict failures)
↓
Create Value (example: cost savings, new revenue)
↓
Calculate Return on Investment (ROI)
↓
Is Value > Cost?
→ Yes → PROFIT
→ No → REVISE STRATEGY
6. Explain with a real-world example how the economics of IoT Analytics works.
Real-World Example: DHL — Smart Logistics with IoT Analytics
Company: DHL (one of the world’s largest logistics companies)
🛠 What they did:
Installed IoT sensors on delivery trucks, shipping containers, and warehouses.
Sensors collected data like:
➔ Truck location
➔ Temperature inside containers
➔ Vehicle health (engine, tires)
➔ Package movement
🔍 How they used IoT Analytics:
Real-time Tracking: They could see where every truck and package was at all times.
Predictive Maintenance: Analyzed truck sensor data to predict breakdowns before they
happened.
Temperature Monitoring: Made sure sensitive packages (like medicines) stayed within
safe temperatures during shipping.
Route Optimization: Analyzed traffic and delivery patterns to plan faster and cheaper
routes.
💰 Economic Impact:
Reduced fuel costs by optimizing delivery routes.
Fewer truck breakdowns, saving repair and emergency costs.
Fewer lost or damaged goods, saving money on insurance claims.
Faster deliveries improved customer satisfaction (bringing more business).
Result:
DHL reported millions of dollars saved each year — the value they gained was much higher than
the cost of installing sensors and running analytics.
7. What are the Cost Considerations and Revenue Opportunities in Data
Analytics?
I. Cost Considerations
These are the main expenses companies need to plan for when they invest in data analytics:
a. Data Collection
Buying or building systems to capture data (sensors, apps, websites, CRM systems).
Cost of IoT devices when the data comes from the physical world.
b. Data Storage
Cloud storage (AWS, Azure, Google Cloud) or on-premise data centers.
Storing huge volumes of structured and unstructured data can be expensive.
c. Data Processing and Cleaning
Tools, servers, and staff to clean, format, and process messy raw data into useful
information.
Example: Data engineers and database administrators.
d. Analytics Tools and Software
Licenses for software like Tableau, Power BI, SAS, or custom AI/ML models.
Open-source tools have no license fee, but at scale they still have infrastructure costs.
e. Talent and Skills
Hiring data scientists, data analysts, machine learning engineers, and architects.
Salaries for these roles can be very high!
f. Security and Compliance
Protecting sensitive data (especially customer data) with cybersecurity measures.
GDPR, HIPAA, and other legal compliance requirements can add extra costs.
g. Maintenance and Upgrades
Regular updates to hardware, software, and models to keep everything running efficiently.
II. Revenue Opportunities
These are the ways data analytics helps companies MAKE more money:
a. Better Decision-Making
Use insights from data to make smarter business decisions.
Example: Knowing which products are about to trend and increasing stock early.
b. Personalized Marketing and Sales
Target the right customer with the right offer at the right time.
Higher conversion rates = more sales.
c. New Products and Services
Data can reveal customer needs companies didn’t know about.
Example: Spotify using listening data to create personalized playlists (which keeps users
loyal).
d. Cost Reductions (Efficiency Gains)
Predictive maintenance in factories saves huge amounts by avoiding unexpected
breakdowns.
Automation of repetitive tasks with analytics saves on staffing costs.
e. Risk Management
Predict and prevent risks (like fraud detection in banks) before they cause financial damage.
f. Selling Data or Insights
Some companies anonymize and sell their collected data to other businesses (carefully,
within legal limits).
⚡ Quick Example
A retail company using customer buying data:
Cost: $500,000 per year on cloud storage + analytics platform.
Value created: $2 million extra in sales through better-targeted promotions.
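Result: The value created is 4 times the cost, a net gain of $1.5 million (a 300% return, if the extra sales are counted in full as value).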
8. What is the economics of the predictive maintenance cycle in data science?
1. What is Predictive Maintenance?
Predictive Maintenance (PdM) means using data (like sensor readings, machine logs,
temperature, vibrations, etc.) to predict when equipment might fail, before it actually fails — so
you can fix it just in time.
Instead of:
Reactive maintenance (fixing after it breaks — expensive 💸)
Scheduled maintenance (fixing even if it doesn’t need it — wasteful 🛠)
You do Predictive maintenance → fix only when needed, based on real signs.
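To make "fix only when needed, based on real signs" concrete, here is a very simplified Python sketch. It raises an alert when the average of the last few vibration readings goes above a limit. The readings, the window size, the threshold, and the function name `first_alert` are all assumptions; real systems typically use trained models rather than a fixed threshold.

```python
from collections import deque


def first_alert(readings, window=3, threshold=7.0):
    """Return the index where the rolling mean of the last `window`
    readings first exceeds `threshold`, or None if it never does."""
    recent = deque(maxlen=window)  # holds only the most recent readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None


# Hypothetical vibration readings from one machine, slowly getting worse.
vibration = [4.0, 4.2, 4.1, 5.0, 6.5, 7.5, 8.5, 9.0]
print(first_alert(vibration))  # 6
```

Using a rolling average instead of a single reading avoids scheduling maintenance because of one noisy spike.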
2. Economics of the Predictive Maintenance Cycle
It’s about how much money you spend vs. how much money you save or make by using
predictive maintenance.
🛠 Costs involved:
Sensors and IoT devices on machines (temperature, vibration, usage hours).
Data storage (because sensors collect data 24/7).
Analytics software (machine learning models that predict failures).
Skilled personnel (data scientists, engineers).
Integration and maintenance of the system itself.
💰 Value / Revenue created:
Less unexpected downtime (no stopping of production or services).
Lower repair costs (small fixes instead of big breakdowns).
Longer machine life (machines are healthier over time).
Better productivity (no "waiting" for machines to get fixed).
Safer working conditions (fewer accidents from machine failures).
3. Predictive Maintenance Cycle (Economics Flow)
Install Sensors → Collect Data → Analyze and Predict → Schedule Maintenance at
the Right Time → Avoid Breakdowns → Save Money → Earn More Profit
Investment happens early (installing and setting up).
Savings and value keep happening over time (through fewer failures and lower costs).
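Because the investment comes first and the savings arrive over time, a useful question is how many months it takes for the savings to pay back the setup cost. The sketch below assumes constant monthly savings and running costs, and every figure in it is hypothetical.

```python
def payback_month(upfront_cost, monthly_savings, monthly_running_cost):
    """Return the first month in which cumulative net savings cover the
    upfront investment, or None if the system never pays for itself."""
    net_per_month = monthly_savings - monthly_running_cost
    if net_per_month <= 0:
        return None  # running costs eat all the savings
    cumulative = -upfront_cost  # start in the red by the setup cost
    month = 0
    while cumulative < 0:
        month += 1
        cumulative += net_per_month
    return month


# Hypothetical figures: 120,000 to install, 15,000 saved and 5,000 spent each month.
print(payback_month(120_000, 15_000, 5_000))  # 12
```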
4. Real-World Example
Airlines use predictive maintenance on aircraft engines:
Sensors monitor engine vibrations, temperature, air pressure, etc.
AI models predict when a part will need replacement.
Result:
Save millions by avoiding canceled flights.
Lower maintenance costs.
Increase passenger satisfaction and trust.
5. Simple Formula
Net Economic Benefit =
(Value of avoided failures + Value of improved productivity + Value of extended asset life)
MINUS
(Cost of sensors + Cost of analytics + Cost of skilled workers + Cost of maintaining the system)
If the Net Benefit is positive and growing, predictive maintenance is economically successful!
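The formula above can be written as a short Python function that adds up the benefit items and subtracts the cost items. The dictionary keys follow the terms of the formula, while the amounts are invented purely for illustration.

```python
def net_economic_benefit(benefits, costs):
    """Sum of all benefit items minus the sum of all cost items."""
    return sum(benefits.values()) - sum(costs.values())


# Hypothetical yearly figures for one factory.
benefits = {
    "avoided_failures": 300_000,
    "improved_productivity": 120_000,
    "extended_asset_life": 80_000,
}
costs = {
    "sensors": 100_000,
    "analytics": 60_000,
    "skilled_workers": 150_000,
    "system_maintenance": 40_000,
}

print(net_economic_benefit(benefits, costs))  # 150000
```

Running the same calculation each year shows whether the net benefit is positive and growing.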
🎯 In simple words:
You spend some money up front to set up predictive maintenance, but you save much more
money by catching problems before they explode — and you make your machines and business
run smoother and longer.