
Module 4

The document explains linked analytical datasets in data science, emphasizing the importance of combining multiple datasets through common identifiers to enhance analysis and insights. It outlines various linking strategies such as inner join, left join, right join, full outer join, and cross join, detailing their applications and implications for data analysis. Additionally, it discusses the economics of IoT analytics, predictive maintenance, and the associated costs and revenue opportunities in data analytics.

Uploaded by

Shreya Saxena

Module 4

1. What is meant by linked analytical datasets in data science?


In data science, linked analytical datasets refer to multiple datasets that are
connected or combined through a common key or identifier (like a person ID,
product ID, time, etc.) to create a richer, more complete dataset for analysis.
In simple terms:
It’s when you take two or more separate data sources and "link" them together so you can analyze
more complex relationships.
For example:
• One dataset might have customer information (name, age, location).
• Another dataset might have purchase history (customer ID, items bought, date).
• By linking them using the customer ID, you can study patterns like "what type of customers buy which products."
Key points:
• Linking usually happens using a shared variable (a key).
• It helps enrich the data: you get more features and context.
• It's critical for better modeling, prediction, and insight generation.
Example

Imagine a hospital wants to predict which patients are at risk of being readmitted
within 30 days.
They have two datasets:
• Dataset 1: Patient Information (Patient ID, Age, Gender, Medical Conditions)
• Dataset 2: Hospital Visits (Patient ID, Visit Date, Reason for Visit, Treatment Given)

If they only look at one dataset, they miss important information.


But if they link the two datasets using Patient ID, they can:
✅ See each patient’s medical history and their treatment history
✅ Spot patterns like "patients with certain conditions and treatments are more likely to return"
✅ Build much better predictive models
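
The hospital example can be sketched in a few lines of plain Python. This is only an illustration: the records, values, and field names (patient_id, visit_date, and so on) are made up to mirror the two datasets described above, not taken from any real system.

```python
# Hypothetical sample records; field names mirror the two datasets described above.
patients = [
    {"patient_id": 101, "age": 67, "gender": "F", "conditions": "diabetes"},
    {"patient_id": 102, "age": 45, "gender": "M", "conditions": "asthma"},
]
visits = [
    {"patient_id": 101, "visit_date": "2024-01-05", "reason": "chest pain", "treatment": "observation"},
    {"patient_id": 101, "visit_date": "2024-01-20", "reason": "follow-up", "treatment": "medication"},
    {"patient_id": 102, "visit_date": "2024-02-11", "reason": "wheezing", "treatment": "inhaler"},
]

# Index patients by the shared key so each visit can look up its patient.
patients_by_id = {p["patient_id"]: p for p in patients}

# One linked row per visit: patient attributes plus visit attributes.
linked = [
    {**patients_by_id[v["patient_id"]], **v}
    for v in visits
    if v["patient_id"] in patients_by_id
]

for row in linked:
    print(row["patient_id"], row["age"], row["conditions"], row["visit_date"], row["treatment"])
```

Each linked row now carries both who the patient is and what happened during the visit, which is the combined view a readmission model would need.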

2. What is meant by a linking together strategy in data science?


In data science, a "linking together strategy" means planning how to combine
different datasets to create a single, unified dataset that’s ready for analysis.
It’s not just about randomly merging files. It’s a careful strategy to decide:
• Which datasets to link
• How to link them (what keys or fields to use)
• When to link them (before, during, or after cleaning the data)
• What to do if the data doesn't match perfectly (missing IDs, mismatched formats)

In simple words:
It’s a smart plan for how you connect your different pieces of data to build a bigger, more useful
dataset for solving your problem.

Example:
Suppose a company has:
• A customer profile dataset
• A website activity dataset
• A product purchase dataset
Their linking together strategy might be:
• Use Customer ID to join customer profiles to website activity.
• Then link the combined data to purchases using Customer ID again.
• Handle missing data carefully (for customers who browsed but didn’t buy).

Why is it important?
Because bad linking = bad analysis.
Good strategy ensures accuracy, completeness, and better results in models and reports.
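
Part of a linking strategy is checking the keys before joining anything. The sketch below is one possible way to do that, assuming the IDs arrive in mismatched formats (strings with stray spaces in one source, integers in another). All data and names here are hypothetical.

```python
# Hypothetical key columns from the three datasets in the example above.
profiles = {1: "Alice", 2: "Bob", 3: "Charlie"}   # customer_id -> name
activity_ids = ["2", "3", " 3", "5"]              # website logs store IDs as text
purchase_ids = [3]                                # purchases store IDs as integers


def normalize(raw_id):
    """Bring an ID into one common format (integer, no surrounding spaces)."""
    return int(str(raw_id).strip())


profile_keys = set(profiles)
activity_keys = {normalize(i) for i in activity_ids}
purchase_keys = {normalize(i) for i in purchase_ids}

# IDs seen on the website that have no customer profile (cannot be linked).
unknown_visitors = activity_keys - profile_keys

# Known customers who browsed but did not buy (need careful handling later).
browsed_not_bought = (activity_keys & profile_keys) - purchase_keys

print("Unknown visitors:", unknown_visitors)
print("Browsed but did not buy:", browsed_not_bought)
```

Running checks like these first shows how many records will be lost or left incomplete by each join, before the join hides the problem.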

3. Most common types of linking strategies


Suppose we have two small datasets:
Customers Table

CustomerID  Name
1           Alice
2           Bob
3           Charlie

Purchases Table

CustomerID  Product
2           Laptop
3           Smartphone
4           Tablet

a. Inner Join
• Meaning: Only keep records that match in both datasets.
• Example: If a customer exists in both the purchase and profile datasets, keep them. Ignore the rest.
• Use when: You want only complete matches.
• Result: Keeps only the CustomerIDs that exist in both tables (2 and 3).

CustomerID  Name     Product
2           Bob      Laptop
3           Charlie  Smartphone
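
As a rough sketch, the same inner join can be run with Python's built-in sqlite3 module. The table and column names (customers, purchases, customer_id) are chosen here to mirror the two small tables above.

```python
import sqlite3

# In-memory database holding the two small tables from the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE purchases (customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO purchases VALUES (2, 'Laptop'), (3, 'Smartphone'), (4, 'Tablet');
""")

# INNER JOIN keeps only rows whose customer_id appears in both tables.
inner_rows = conn.execute("""
    SELECT c.customer_id, c.name, p.product
    FROM customers AS c
    INNER JOIN purchases AS p ON p.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

for row in inner_rows:
    print(row)
```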

b. Left Join (or Left Outer Join)

• Meaning: Keep all records from the first (left) dataset, even if there’s no match in the second.
• Example: Keep all customers, even if they didn’t make a purchase.
• Use when: You want everything from your main dataset, plus extra info if available.
• Result: Keeps all customers. Alice made no purchase, so her Product is NULL.

CustomerID  Name     Product
1           Alice    NULL
2           Bob      Laptop
3           Charlie  Smartphone
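
A minimal sketch of the left join using Python's built-in sqlite3 module, with table and column names chosen to mirror the example tables. SQL NULL comes back in Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE purchases (customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO purchases VALUES (2, 'Laptop'), (3, 'Smartphone'), (4, 'Tablet');
""")

# LEFT JOIN keeps every customer; product is NULL (None) when there is no purchase.
left_rows = conn.execute("""
    SELECT c.customer_id, c.name, p.product
    FROM customers AS c
    LEFT JOIN purchases AS p ON p.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

for row in left_rows:
    print(row)
```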

c. Right Join (or Right Outer Join)

• Meaning: Opposite of left join: keep all records from the second (right) dataset, even if there’s no match in the first.
• Use when: The second dataset is your focus. This is less common than a left join.
• Result: Keeps all purchases, even if we don’t know who the customer is. CustomerID 4 has no profile, so its Name is NULL.

CustomerID  Name     Product
2           Bob      Laptop
3           Charlie  Smartphone
4           NULL     Tablet
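
A right join gives the same rows as a left join with the two tables swapped. The sketch below uses that equivalence with Python's built-in sqlite3 module, because older SQLite versions do not support the RIGHT JOIN keyword. Table and column names are chosen to mirror the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE purchases (customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO purchases VALUES (2, 'Laptop'), (3, 'Smartphone'), (4, 'Tablet');
""")

# "customers RIGHT JOIN purchases" written as "purchases LEFT JOIN customers":
# every purchase is kept; name is NULL (None) when the customer is unknown.
right_rows = conn.execute("""
    SELECT p.customer_id, c.name, p.product
    FROM purchases AS p
    LEFT JOIN customers AS c ON c.customer_id = p.customer_id
    ORDER BY p.customer_id
""").fetchall()

for row in right_rows:
    print(row)
```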

d. Full Outer Join

• Meaning: Keep everything from both datasets, match when you can, and leave blanks where you can’t.
• Example: List all customers and all purchases, even if they don't match perfectly.
• Use when: You don’t want to miss anything.
• Result: Keeps everything. Alice has no purchase and CustomerID 4 has no profile, so both rows contain a NULL.

CustomerID  Name     Product
1           Alice    NULL
2           Bob      Laptop
3           Charlie  Smartphone
4           NULL     Tablet
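
One common way to build a full outer join, sketched here with Python's built-in sqlite3 module, is to take the left join and then add the rows from the right table that found no match. This workaround is used because older SQLite versions lack FULL OUTER JOIN. Table and column names are chosen to mirror the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE purchases (customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO purchases VALUES (2, 'Laptop'), (3, 'Smartphone'), (4, 'Tablet');
""")

# Part 1: all customers, with their purchases where available.
# Part 2: purchases that matched no customer (the rows part 1 cannot contain).
full_rows = conn.execute("""
    SELECT c.customer_id, c.name, p.product
    FROM customers AS c
    LEFT JOIN purchases AS p ON p.customer_id = c.customer_id
    UNION ALL
    SELECT p.customer_id, c.name, p.product
    FROM purchases AS p
    LEFT JOIN customers AS c ON c.customer_id = p.customer_id
    WHERE c.customer_id IS NULL
    ORDER BY 1
""").fetchall()

for row in full_rows:
    print(row)
```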

e. Cross Join
• Meaning: Match every row of one dataset to every row of the other. (Rarely used.)
• Example: If you have 10 customers and 5 products, you get 50 combinations.
• Use when: You need every possible pair (like for testing scenarios).
• Result: Every row from Customers combined with every row from Purchases. Example for 2 customers and 2 products:

Name   Product
Alice  Laptop
Alice  Smartphone
Bob    Laptop
Bob    Smartphone
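
A cross join is a Cartesian product, so in Python it can be sketched with itertools.product from the standard library. The two short lists below mirror the 2 customers and 2 products example.

```python
from itertools import product

names = ["Alice", "Bob"]
products = ["Laptop", "Smartphone"]

# Every name paired with every product: 2 x 2 = 4 rows.
cross = list(product(names, products))
for name, item in cross:
    print(name, item)

# The row count is always rows_left * rows_right, e.g. 10 customers x 5 products.
combinations_10_by_5 = len(list(product(range(10), range(5))))
print(combinations_10_by_5)
```

Because the output size is the product of the two input sizes, cross joins on large tables grow very quickly, which is one reason they are rarely used.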

4. What is the economics of IoT Analytics?


The economics of IoT Analytics describes how money, value, and costs flow when companies collect and analyze data from Internet of Things (IoT) devices.
It’s about answering questions like:
• How much does it cost to collect, store, and analyze IoT data?
• How much value (money, insights, efficiency) can you create from that data?
• Is it profitable to invest in IoT Analytics?

Key parts of IoT Analytics economics:

1. Costs
• Buying and maintaining sensors/devices
• Networking (getting data from devices to the cloud)
• Data storage (because IoT data is huge and continuous)
• Processing and analytics tools (software, cloud services)
• Security and privacy costs (protecting sensitive data)
2. Value Creation
• Operational efficiency (example: factories predicting machine failure before it happens)
• New business models (example: smart homes offering subscription services)
• Better decision-making (example: farms adjusting watering based on real-time soil sensors)
• Customer personalization (example: fitness trackers suggesting workouts)
3. Return on Investment (ROI)
• Companies must calculate: Value generated minus Total costs
→ If the result is positive, IoT Analytics is economically worthwhile for them.
4. Scale Effects
• Larger companies or projects pay less per device (economies of scale).
• Analyzing 1,000 devices is expensive per device, but analyzing 1 million devices can be much cheaper per device, because fixed costs such as the platform and staff are spread over more devices.
5. Risks
• High upfront costs (especially for hardware and setup)
• Uncertain ROI if the analytics don’t produce useful insights
• Data quality issues (bad data = wrong decisions = wasted money)
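
The scale effect in point 4 can be illustrated with a small calculation. This is a simplified sketch that assumes costs split into a fixed part (platform, staff) and a variable part per device. The dollar figures are invented for illustration only.

```python
def cost_per_device(fixed_cost, variable_cost_per_device, devices):
    """Average cost per device when a fixed cost is shared by all devices."""
    return fixed_cost / devices + variable_cost_per_device


# Hypothetical figures, not taken from any real deployment.
FIXED_COST = 500_000      # analytics platform, staff, setup
VARIABLE_COST = 2.0       # connectivity and storage per device

for devices in (1_000, 1_000_000):
    average = cost_per_device(FIXED_COST, VARIABLE_COST, devices)
    print(f"{devices:>9,} devices -> ${average:,.2f} per device")
```

Under these assumed numbers the fixed cost dominates at 1,000 devices and almost disappears at 1 million, which is the economies-of-scale effect described above.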

Simple Example:
A delivery company installs GPS trackers on all its trucks.
• Cost: Buy trackers + cloud fees + analytics software.
• Value: Save fuel (better routes), predict maintenance (fewer breakdowns), happier customers.
• If savings > costs → good economics for IoT Analytics.

In short we can say:


Economics of IoT Analytics = making sure that the cost of collecting and analyzing data is less
than the benefits you get from using that data smartly.

5. Explain with a flowchart how the economics of IoT Analytics works

[ Investment ]
↓
Buy Devices + Sensors
↓
Set Up Networking + Cloud Storage
↓
Collect Huge Amounts of Data
↓
Use Analytics Tools + AI to Process Data
↓
Generate Insights
↓
Take Smarter Actions (example: save energy, predict failures)
↓
Create Value (example: cost savings, new revenue)
↓
Calculate Return on Investment (ROI)
↓
Is Value > Cost?
→ Yes → PROFIT
→ No → REVISE STRATEGY

6. Explain with a real-world example how the economics of IoT Analytics works

Real-World Example: DHL — Smart Logistics with IoT Analytics


Company: DHL (one of the world’s largest logistics companies)

🛠 What they did:

• Installed IoT sensors on delivery trucks, shipping containers, and warehouses.
• Sensors collected data like:
➔ Truck location
➔ Temperature inside containers
➔ Vehicle health (engine, tires)
➔ Package movement

🔍 How they used IoT Analytics:

• Real-time Tracking: They could see where every truck and package was at all times.
• Predictive Maintenance: Analyzed truck sensor data to predict breakdowns before they happened.
• Temperature Monitoring: Made sure sensitive packages (like medicines) stayed within safe temperatures during shipping.
• Route Optimization: Analyzed traffic and delivery patterns to plan faster and cheaper routes.

💰 Economic Impact:
• Reduced fuel costs by optimizing delivery routes.
• Fewer truck breakdowns, saving repair and emergency costs.
• Fewer lost or damaged goods, saving money on insurance claims.
• Faster deliveries improved customer satisfaction (bringing more business).
Result:
DHL reported millions of dollars saved each year. The value they gained was much higher than the cost of installing sensors and running analytics.

7. What are the Cost Considerations and Revenue Opportunities in Data Analytics?

I. Cost Considerations
These are the main expenses companies need to plan for when they invest in data analytics:

a. Data Collection
• Buying or building systems to capture data (sensors, apps, websites, CRM systems).
• Cost of IoT devices if the data comes from the physical world.

b. Data Storage
• Cloud storage (AWS, Azure, Google Cloud) or on-premise data centers.
• Storing huge volumes of structured and unstructured data can be expensive.

c. Data Processing and Cleaning
• Tools, servers, and staff to clean, format, and process messy raw data into useful information.
• Example: Data engineers and database administrators.

d. Analytics Tools and Software
• Licenses for software like Tableau, Power BI, SAS, or custom AI/ML models.
• Open-source tools are free to license, but at scale even free tools carry infrastructure costs.

e. Talent and Skills
• Hiring data scientists, data analysts, machine learning engineers, and architects.
• Salaries for these roles can be very high!

f. Security and Compliance
• Protecting sensitive data (especially customer data) with cybersecurity measures.
• GDPR, HIPAA, and other legal compliance requirements can add extra costs.

g. Maintenance and Upgrades
• Regular updates to hardware, software, and models to keep everything running efficiently.
II. Revenue Opportunities
These are the ways data analytics helps companies make more money:

a. Better Decision-Making
• Use insights from data to make smarter business decisions.
• Example: Knowing which products are about to trend and increasing stock early.

b. Personalized Marketing and Sales
• Target the right customer with the right offer at the right time.
• Higher conversion rates = more sales.

c. New Products and Services
• Data can reveal customer needs companies didn’t know about.
• Example: Spotify using listening data to create personalized playlists (which keeps users loyal).

d. Cost Reductions (Efficiency Gains)
• Predictive maintenance in factories saves huge amounts by avoiding unexpected breakdowns.
• Automation of repetitive tasks with analytics saves on staffing costs.

e. Risk Management
• Predict and prevent risks (like fraud detection in banks) before they cause financial damage.

f. Selling Data or Insights
• Some companies anonymize and sell their collected data to other businesses (carefully, within legal limits).

⚡ Quick Example
A retail company using customer buying data:
• Cost: $500,000 per year on cloud storage + analytics platform.
• Value created: $2 million extra in sales through better-targeted promotions.
Result: The value created is four times the cost, a large return on investment.
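
The return in this quick example can be worked out explicitly. The sketch below uses the common definition ROI = (value created − cost) / cost, and treats the full $2 million of extra sales as the value created. That is a simplification, since extra sales are not the same as extra profit.

```python
def roi(value_created, cost):
    """Return on investment as a fraction: net gain divided by cost."""
    return (value_created - cost) / cost


annual_cost = 500_000        # cloud storage + analytics platform (from the example)
extra_sales = 2_000_000      # extra sales from better-targeted promotions

net_gain = extra_sales - annual_cost
print(f"Net gain: ${net_gain:,}")
print(f"ROI: {roi(extra_sales, annual_cost):.0%}")
```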

8. What is the economics of the predictive maintenance cycle in data science?

1. What is Predictive Maintenance?


Predictive Maintenance (PdM) means using data (like sensor readings, machine logs,
temperature, vibrations, etc.) to predict when equipment might fail, before it actually fails — so
you can fix it just in time.
Instead of:
• Reactive maintenance (fixing after it breaks, which is expensive 💸)
• Scheduled maintenance (fixing even if it doesn’t need it, which is wasteful 🛠)
you do predictive maintenance: fix only when needed, based on real signs in the data.
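
To make "based on real signs" concrete, here is a deliberately simple sketch: it flags a machine when the average of its most recent vibration readings rises above a limit. Real systems usually use trained machine learning models instead. The readings, window size, and threshold below are all hypothetical.

```python
from statistics import mean


def needs_maintenance(readings, window=3, threshold=7.0):
    """Flag the machine when the average of the last `window` readings exceeds `threshold`."""
    if len(readings) < window:
        return False  # not enough data to judge yet
    return mean(readings[-window:]) > threshold


# Hypothetical vibration readings (mm/s) that slowly drift upward.
vibration_mm_s = [4.1, 4.3, 4.2, 5.0, 6.8, 7.4, 7.9]

print(needs_maintenance(vibration_mm_s[:4]))  # early readings only
print(needs_maintenance(vibration_mm_s))      # including the recent upward drift
```

Averaging over a window, rather than reacting to a single reading, is a design choice that avoids scheduling repairs because of one noisy measurement.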

2. Economics of the Predictive Maintenance Cycle


It’s about how much money you spend vs. how much money you save or make by using predictive maintenance.

🛠 Costs involved:
• Sensors and IoT devices on machines (temperature, vibration, usage hours).
• Data storage (because sensors collect data 24/7).
• Analytics software (machine learning models that predict failures).
• Skilled personnel (data scientists, engineers).
• Integration and maintenance of the system itself.

💰 Value / Revenue created:
• Less unexpected downtime (no stopping of production or services).
• Lower repair costs (small fixes instead of big breakdowns).
• Longer machine life (machines are healthier over time).
• Better productivity (no "waiting" for machines to get fixed).
• Safer working conditions (fewer accidents from machine failures).

3. Predictive Maintenance Cycle (Economics Flow)


Install Sensors → Collect Data → Analyze and Predict → Schedule Maintenance at the Right Time → Avoid Breakdowns → Save Money → Earn More Profit

• Investment happens early (installing and setting up).
• Savings and value keep accumulating over time (through fewer failures and lower costs).
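
Because the investment comes first and the savings arrive later, a useful question is how long it takes for savings to cover the upfront cost (the payback period). The sketch below assumes a constant monthly saving, which is a simplification, and uses invented figures.

```python
def payback_month(upfront_cost, monthly_saving, horizon_months=60):
    """Return the first month in which cumulative savings cover the upfront cost, or None."""
    cumulative = -upfront_cost
    for month in range(1, horizon_months + 1):
        cumulative += monthly_saving
        if cumulative >= 0:
            return month
    return None  # investment not recovered within the horizon


# Hypothetical figures: $120,000 setup cost.
print(payback_month(120_000, 10_000))  # savings of $10,000 per month
print(payback_month(120_000, 1_000))   # savings too small to pay back in 5 years
```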

4. Real-World Example
Airlines use predictive maintenance on aircraft engines:
• Sensors monitor engine vibrations, temperature, air pressure, etc.
• AI models predict when a part will need replacement.
• Result:
➔ Save millions by avoiding canceled flights.
➔ Lower maintenance costs.
➔ Increase passenger satisfaction and trust.

5. Simple Formula
Net Economic Benefit =
(Value of avoided failures + Value of improved productivity + Value of extended asset life)
− (Cost of sensors + Cost of analytics + Cost of skilled workers + Cost of maintaining the system)
If the Net Benefit is positive and growing, predictive maintenance is economically successful!
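
The formula above translates directly into code. In this sketch the category names follow the formula, while every dollar amount is a made-up placeholder for one year of operation.

```python
def net_economic_benefit(values, costs):
    """Total value created minus total cost of running predictive maintenance."""
    return sum(values.values()) - sum(costs.values())


# Hypothetical yearly figures for one factory.
values = {
    "avoided_failures": 400_000,
    "improved_productivity": 150_000,
    "extended_asset_life": 50_000,
}
costs = {
    "sensors": 80_000,
    "analytics": 60_000,
    "skilled_workers": 200_000,
    "system_maintenance": 40_000,
}

benefit = net_economic_benefit(values, costs)
print(f"Net economic benefit: ${benefit:,}")
print("Economically successful" if benefit > 0 else "Revise the strategy")
```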

🎯 In simple words:
You spend some money up front to set up predictive maintenance, but you save much more
money by catching problems before they explode — and you make your machines and business
run smoother and longer.
