
Big Data & Cloud Computing

The document provides an overview of data mining, defining it as the process of discovering meaningful patterns and information from large datasets. It discusses various types of data mining, advantages, disadvantages, applications, challenges, techniques, and the implementation process. Additionally, it contrasts data mining with data analytics and machine learning, highlighting their unique focuses and methodologies.

Uploaded by

MECH HOD

BSDVP ACADEMY

C23 – 5th SEMESTER – CSE – 502


Chapter-1- Overview of Data Mining


1.1 Define Data Mining
• Definition:
Data Mining is the process of discovering meaningful patterns, trends, relationships,
and useful information from large sets of data using statistical, mathematical, and
computational techniques.
• Often used interchangeably with Knowledge Discovery in Databases (KDD); strictly, data mining is the core step of the KDD process.
• It is widely used in business, science, healthcare, and engineering for decision-making.

1.2 Types of Data Mining


1. Predictive Data Mining – predicts unknown or future values (e.g., sales forecast).
2. Descriptive Data Mining – describes existing patterns in data (e.g., customer
segmentation).
3. Diagnostic Data Mining – explains why something happened.
4. Prescriptive Data Mining – suggests actions for future outcomes.
5. Text/Data/Web Mining – mining data from documents, text, and the internet.

1.3 Advantages of Data Mining


• Helps in decision making using hidden patterns.
• Improves business strategies by analyzing customer behavior.
• Detects fraudulent activities (e.g., banking/insurance).
• Supports market basket analysis in retail.
• Saves time and cost by automating data analysis.

1.4 Disadvantages of Data Mining


• Privacy issues (e.g., misuse of personal data).
• Security concerns with sensitive databases.
• Requires large datasets and high computation power.


• Results may be misinterpreted if not handled properly.
• High cost of implementation.

1.5 Applications of Data Mining


• Banking & Finance → fraud detection, credit scoring.
• Retail → customer purchase patterns, recommendation systems.
• Healthcare → disease prediction, patient record analysis.
• Telecommunications → customer churn prediction.
• Education → student performance analysis.
• Government → crime detection, e-governance.

1.6 Challenges of Implementation in Data Mining


• Handling large volumes of data (Big Data).
• Ensuring data quality (accuracy, completeness).
• Privacy and security of sensitive data.
• Integrating data from multiple sources.
• Requirement of skilled professionals.
• High cost and time of implementation.

1.7 Evolution of Data Mining


• 1960s–1970s: Database creation (flat files, hierarchical, network DBMS).
• 1980s: Relational databases, SQL, basic reporting.
• 1990s: OLAP, machine learning, introduction of KDD.
• 2000s: Web mining, big data analytics.
• 2010s onwards: Integration with AI, cloud, IoT, and deep learning.


1.8 Data Mining Techniques


1. Classification – assigning data into predefined categories (e.g., spam email detection).
2. Clustering – grouping similar data objects (e.g., market segmentation).
3. Association Rule Learning – finding relationships among data (e.g., “if X is bought, Y
is also bought”).
4. Regression – predicting continuous values (e.g., predicting house prices).
5. Anomaly Detection – identifying unusual data (e.g., fraud detection).
6. Sequential Patterns – analyzing event sequences (e.g., customer buying behavior).
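
The association rule idea in item 3 is usually quantified with two standard measures, support and confidence. The following is a minimal sketch in Python; the basket contents and the function names `support` and `confidence` are illustrative assumptions, not part of the original notes.

```python
# Hypothetical shopping baskets (illustrative data only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule under test: {bread} -> {butter}
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here 3 of the 5 baskets contain both items and 4 contain bread, so the rule has support 0.60 and confidence 0.75. A real tool would also search for candidate rules; this sketch only scores one rule chosen by hand.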

1.9 Data Mining Implementation Process


1. Understanding business objectives.
2. Data collection & integration.
3. Data cleaning & preprocessing.
4. Data transformation (normalization, reduction).
5. Choosing mining techniques.
6. Pattern discovery & evaluation.
7. Knowledge presentation (reports, visualization).
8. Decision-making & deployment.
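
Steps 3 and 4 (cleaning and transformation) are the easiest to show in code. Below is a small sketch, assuming a hypothetical list of `(customer_id, age)` records; the helper names `clean` and `min_max_normalize` are made up for illustration, and min-max scaling is only one of several possible normalization methods.

```python
def clean(records):
    """Data cleaning: drop records with a missing value and exact duplicate records."""
    seen = set()
    cleaned = []
    for record in records:
        if None in record or record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal; avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw data with one missing age and one duplicated record.
raw = [("c1", 20), ("c2", None), ("c3", 30), ("c3", 30), ("c4", 40), ("c5", 60)]

records = clean(raw)
ages = [age for _, age in records]      # [20, 30, 40, 60]
scaled = min_max_normalize(ages)        # [0.0, 0.25, 0.5, 1.0]
print(ages, scaled)
```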

1.10 Data Mining Architecture


• Components:
1. Database/Data Warehouse – stores data.
2. Database/OLAP server – provides data access.
3. Data Mining Engine – applies mining algorithms.
4. Pattern Evaluation Module – identifies valid patterns.
5. Graphical User Interface (GUI) – user interaction.
6. Knowledge Base – stores rules and domain knowledge.

1.11 KDD (Knowledge Discovery in Databases)


• Definition: A process of identifying valid, novel, useful, and understandable patterns
from large data sets.
• Steps in KDD:
1. Data selection.
2. Data preprocessing.
3. Data transformation.
4. Data mining (core step).
5. Interpretation and evaluation.
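
The five KDD steps can be read as a chain of functions, each feeding the next. The sketch below is only a toy illustration under stated assumptions: the records, the price threshold of 100, and the minimum count of 2 are all hypothetical, and the "mining" step is simple frequency counting rather than a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records: (customer, product, amount, store_id)
raw = [
    ("c1", "bread", 2.0, 7),
    ("c2", "bread", None, 7),
    ("c3", "laptop", 900.0, 3),
    ("c4", "bread", 3.0, 3),
    ("c5", "laptop", 1100.0, 7),
    ("c6", "milk", 1.5, 3),
]

def select(records):
    """1. Selection: keep only the fields relevant to the task."""
    return [(product, amount) for _, product, amount, _ in records]

def preprocess(records):
    """2. Preprocessing: drop records with missing values."""
    return [r for r in records if None not in r]

def transform(records, threshold=100.0):
    """3. Transformation: discretize the amount into 'low' or 'high'."""
    return [(p, "high" if a >= threshold else "low") for p, a in records]

def mine(records):
    """4. Data mining: count how often each (product, price band) pattern occurs."""
    return Counter(records)

def evaluate(patterns, min_count=2):
    """5. Interpretation/evaluation: keep patterns seen at least min_count times."""
    return {pattern: n for pattern, n in patterns.items() if n >= min_count}

knowledge = evaluate(mine(transform(preprocess(select(raw)))))
print(knowledge)
```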

1.12 Data Mining Tools


• WEKA – open-source machine learning & mining tool.
• RapidMiner – predictive analytics tool.
• KNIME – open-source analytics platform.
• Orange – visualization and data analysis tool.
• SAS Enterprise Miner – commercial data mining tool.
• R & Python Libraries – powerful data analysis programming tools.

1.13 Major Differences: Data Mining vs Machine Learning

Data Mining                                    | Machine Learning
Focus: Extracting patterns from existing data. | Focus: Building models that learn from data.
Relies on databases & statistics.              | Relies on algorithms & learning methods.
Human-guided process.                          | System self-learns & adapts.
Goal: Discover knowledge.                      | Goal: Predict and improve accuracy.


1.14 Importance of Data Analytics


• Provides insights for decision making.
• Improves business efficiency and productivity.
• Identifies risks and frauds.
• Supports innovation and strategy development.
• Helps in customer satisfaction and personalization.

1.15 Phases of Data Analytics


1. Data Collection – gathering relevant data.
2. Data Cleaning – removing errors and duplicates.
3. Data Exploration – analyzing basic patterns.
4. Data Modeling – applying statistical/machine learning methods.
5. Data Interpretation & Visualization – presenting results.
6. Decision-making & Action – using insights.

1.16 Data Mining vs Data Analytics

Data Mining                                      | Data Analytics
Focus on discovering hidden patterns.            | Focus on analyzing data to solve problems.
Uses algorithms like classification, clustering. | Uses statistical & visualization techniques.
More automated and machine-driven.               | More human-driven (business-focused).
Example: Finding buying patterns.                | Example: Understanding sales performance.

1.17 Types of Data Mining Techniques


• Classification
• Clustering
• Regression
• Association Rules
• Anomaly Detection
• Sequential Pattern Mining
• Text Mining
• Web Mining

1.18 Text Data Mining


• Process of discovering patterns and useful information from textual data (emails,
documents, social media posts).
• Techniques used: Natural Language Processing (NLP), text classification, sentiment
analysis.
• Applications: spam filtering, sentiment analysis, document classification.
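
As a rough illustration of sentiment analysis, the sketch below tokenizes text and counts words from a tiny hand-made lexicon. The word lists, the example posts, and the function names are assumptions made for this example; real text mining tools rely on much larger lexicons or trained NLP models.

```python
import re
from collections import Counter

# Tiny hand-made sentiment lexicon (an assumption for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    counts = Counter(tokenize(text))
    score = sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "I love this phone, the camera is great!",
    "Terrible battery and poor support.",
    "The package arrived on Monday.",
]
labels = [sentiment(p) for p in posts]
print(labels)
```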

1.19 Classification vs Clustering

Classification                                 | Clustering
Supervised learning technique.                 | Unsupervised learning technique.
Data is assigned to predefined classes/labels. | Groups are formed without predefined labels.
Example: Email → spam or not spam.             | Example: Customer segmentation.
Accuracy depends on training data.             | Accuracy depends on similarity measures.
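
The contrast can be seen in a few lines of code. This is a minimal sketch, assuming one-dimensional toy data: a 1-nearest-neighbour classifier stands in for classification and a plain k-means loop stands in for clustering. The data, the starting centers, and the function names are all hypothetical.

```python
def classify_1nn(x, labeled):
    """Classification (supervised): copy the label of the nearest labeled example."""
    _, nearest_label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def kmeans_1d(values, centers, iterations=10):
    """Clustering (unsupervised): plain k-means on numbers, no labels involved."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

# Supervised: count of suspicious words in an email, with known labels.
training = [(0, "not spam"), (1, "not spam"), (8, "spam"), (9, "spam")]
prediction = classify_1nn(7, training)      # nearest example is (8, "spam")

# Unsupervised: monthly spend of customers, no labels given.
spend = [10, 12, 11, 95, 100, 105]
groups = kmeans_1d(spend, centers=[10, 100])
print(prediction, groups)
```

The classifier needs the labels in `training` to work at all, whereas `kmeans_1d` only ever sees the numbers and a distance measure.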


3 Marks Questions
1.1 Define Data Mining
1. Define Data Mining.
2. Why is Data Mining also called KDD?
1.2 Types of Data Mining
3. List any four types of Data Mining.
4. Differentiate between predictive and descriptive data mining.
1.3 Advantages of Data Mining
5. List any three advantages of Data Mining.
6. How does Data Mining help in fraud detection?
1.4 Disadvantages of Data Mining
7. Write any three disadvantages of Data Mining.
8. Why is privacy a major issue in Data Mining?
1.5 Applications of Data Mining
9. List three applications of Data Mining in business.
10. Write three applications of Data Mining in healthcare.
1.6 Challenges of Implementation
11. List any three challenges in Data Mining implementation.
12. Why is data quality important in Data Mining?
1.7 Evolution of Data Mining
13. Write about the evolution of Data Mining in the 1980s.
14. Mention two recent trends in Data Mining evolution.
1.8 Data Mining Techniques
15. List three Data Mining techniques.
16. Differentiate between classification and regression.
1.9 Implementation Process
17. List three steps in Data Mining implementation process.
18. Why is data preprocessing important in Data Mining?
1.10 Data Mining Architecture

19. Write any three components of Data Mining architecture.


20. What is the role of the knowledge base in Data Mining architecture?
1.11 KDD Process
21. List the main steps in KDD.
22. What is the core step in KDD?
1.12 Data Mining Tools
23. List any three Data Mining tools.
24. Mention any two advantages of using WEKA.
1.13 Data Mining vs Machine Learning
25. Write one difference between Data Mining and Machine Learning.
26. State the goal of Machine Learning.
1.14 Importance of Data Analytics
27. List three points on the importance of Data Analytics.
28. How does Data Analytics help businesses?
1.15 Phases of Data Analytics
29. List three phases of Data Analytics.
30. What is the role of data visualization in Data Analytics?
1.16 Data Mining vs Data Analytics
31. Write any two differences between Data Mining and Data Analytics.
32. Give one example of Data Mining and one example of Data Analytics.
1.17 Types of Data Mining Techniques
33. List any four Data Mining techniques.
34. What is association rule mining?
1.18 Text Data Mining
35. Define Text Data Mining.
36. Write any two applications of Text Data Mining.
1.19 Classification vs Clustering
37. List two differences between classification and clustering.
38. Give one example each for classification and clustering.


Detailed Answers
1.1 Define Data Mining
Q1: Define Data Mining
Ans)
1. Data Mining is the process of discovering hidden patterns, trends, and useful
information from large datasets.
2. It uses statistical, mathematical, and computational techniques.
3. Helps in decision-making and prediction.
Q2: Why is Data Mining also called KDD?
Ans)
1. KDD stands for Knowledge Discovery in Databases.
2. Data Mining is the core step of KDD, so the two terms are often used interchangeably.
3. It focuses on extracting knowledge from data.

1.2 Types of Data Mining


Q3: List any four types of Data Mining
Ans)
1. Predictive Data Mining
2. Descriptive Data Mining
3. Diagnostic Data Mining
4. Prescriptive Data Mining
Q4: Differentiate between predictive and descriptive data mining
Ans)
1. Predictive: Predicts future or unknown outcomes (e.g., sales forecasting).
2. Descriptive: Summarizes and describes patterns in existing data (e.g., customer
segmentation).

1.3 Advantages of Data Mining

Q5: List any three advantages of Data Mining


Ans)
1. Supports decision making by discovering patterns.
2. Detects fraud and anomalies.
3. Helps in market analysis and customer behavior understanding.
Q6: How does Data Mining help in fraud detection?
Ans)
1. Analyzes transaction patterns.
2. Detects unusual or suspicious activity.
3. Alerts organizations for preventive action.
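
One simple way to "detect unusual activity" is a z-score check: flag any amount that lies far from the mean in units of standard deviation. The sketch below assumes a hypothetical list of transaction amounts and a threshold of 2.0; real fraud systems use far richer features and models.

```python
from statistics import mean, pstdev

def flag_unusual(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:  # all amounts identical, nothing stands out
        return []
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical card transactions: eight ordinary amounts and one very large one.
transactions = [42, 38, 45, 40, 41, 39, 44, 43, 950]
suspicious = flag_unusual(transactions)
print(suspicious)
```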

1.4 Disadvantages of Data Mining


Q7: Write any three disadvantages of Data Mining
Ans)
1. Privacy and security issues.
2. High cost and computational requirements.
3. Risk of misinterpretation of results.
Q8: Why is privacy a major issue in Data Mining?
Ans)
1. Mining may reveal sensitive personal or organizational data.
2. Can lead to misuse or unauthorized access.
3. Raises ethical and legal concerns.

1.5 Applications of Data Mining


Q9: List three applications of Data Mining in business
Ans)
1. Customer segmentation.
2. Market basket analysis.
3. Sales and demand forecasting.


Q10: Write three applications of Data Mining in healthcare
Ans)
1. Disease prediction and diagnosis.
2. Analysis of patient treatment records.
3. Drug discovery and clinical trial analysis.

1.6 Challenges of Implementation


Q11: List any three challenges in Data Mining implementation
Ans)
1. Handling large volumes of data (Big Data).
2. Ensuring data quality and consistency.
3. Requirement of skilled professionals.
Q12: Why is data quality important in Data Mining?
Ans)
1. Accurate patterns can be discovered only from clean and correct data.
2. Poor data leads to wrong conclusions.
3. Ensures reliable decision-making.

1.7 Evolution of Data Mining


Q13: Write about the evolution of Data Mining in the 1980s
Ans)
1. Relational databases became popular.
2. SQL used for querying data.
3. Basic reporting and statistical analysis started.
Q14: Mention two recent trends in Data Mining evolution
Ans)
1. Integration with AI, machine learning, and deep learning.
2. Big Data and cloud-based analytics.

1.8 Data Mining Techniques


Q15: List three Data Mining techniques
Ans)
1. Classification
2. Clustering
3. Association Rule Learning
Q16: Differentiate between classification and regression
Ans)
1. Classification: Predicts categorical values (e.g., spam/not spam).
2. Regression: Predicts continuous numerical values (e.g., house price).
3. Both are supervised learning techniques.
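
To make the difference in output type concrete, here is a small sketch that fits a straight line by ordinary least squares and predicts a continuous price. The house data and the 200 cut-off used for the category are hypothetical, and the threshold rule is only there to show what a categorical output looks like; it is not a trained classifier.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house area (square metres) and price (thousands).
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]   # exactly 3 * area, to keep the arithmetic obvious

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 70 + intercept                          # regression: a number
label = "expensive" if predicted_price > 200 else "affordable"    # a category
print(predicted_price, label)
```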

1.9 Implementation Process


Q17: List three steps in Data Mining implementation process
Ans)
1. Data collection and integration
2. Data preprocessing and cleaning
3. Applying mining techniques
Q18: Why is data preprocessing important in Data Mining?
Ans)
1. Removes inaccuracies and inconsistencies.
2. Converts data into required formats.
3. Improves accuracy of mining results.

1.10 Data Mining Architecture

Q19: Write any three components of Data Mining architecture


Ans)
1. Database/Data Warehouse
2. Data Mining Engine
3. Graphical User Interface (GUI)
Q20: What is the role of the knowledge base in Data Mining architecture?
Ans)
1. Stores domain knowledge and rules.
2. Helps in interpreting discovered patterns.
3. Supports decision-making and evaluation.

1.11 KDD Process


Q21: List the main steps in KDD
Ans)
1. Data selection
2. Data preprocessing
3. Data transformation
4. Data mining
5. Interpretation and evaluation
Q22: What is the core step in KDD?
Ans)
1. Data Mining is the core step.
2. It discovers hidden patterns and knowledge.

1.12 Data Mining Tools


Q23: List any three Data Mining tools
Ans)
1. WEKA
2. RapidMiner
3. KNIME
Q24: Mention any two advantages of using WEKA
Ans)
1. Open-source and free to use.
2. Provides various algorithms for classification, regression, clustering.

1.13 Data Mining vs Machine Learning


Q25: Write one difference between Data Mining and Machine Learning
Ans)
1. Data Mining discovers patterns from data.
2. Machine Learning learns models to predict or classify.
Q26: State the goal of Machine Learning
Ans)
1. To learn patterns from data automatically.
2. Predict outcomes and improve model performance.

1.14 Importance of Data Analytics


Q27: List three points on the importance of Data Analytics
Ans)
1. Supports decision-making.
2. Helps in risk management and fraud detection.
3. Improves business productivity and efficiency.
Q28: How does Data Analytics help businesses?
Ans)
1. Analyzes customer behavior.
2. Guides marketing and operational strategies.
3. Identifies opportunities and threats.

1.15 Phases of Data Analytics


Q29: List three phases of Data Analytics
Ans)
1. Data Collection
2. Data Cleaning
3. Data Modeling
Q30: What is the role of data visualization in Data Analytics?
Ans)
1. Represents data in charts, graphs, and dashboards.
2. Makes complex data easy to interpret.
3. Supports quick decision-making.

1.16 Data Mining vs Data Analytics


Q31: Write any two differences between Data Mining and Data Analytics
Ans)
1. Data Mining discovers hidden patterns, while Data Analytics analyzes data to solve
business problems.
2. Data Mining uses algorithms, while Data Analytics uses statistical methods and
visualization.
Q32: Give one example of Data Mining and one example of Data Analytics
Ans)
1. Data Mining: Market basket analysis
2. Data Analytics: Sales performance analysis

1.17 Types of Data Mining Techniques


Q33: List any four Data Mining techniques
Ans)

1. Classification
2. Clustering
3. Regression
4. Association Rule Mining
Q34: What is association rule mining?
Ans)
1. Technique to find relationships between items.
2. Example: “If a customer buys bread, they also buy butter.”

1.18 Text Data Mining


Q35: Define Text Data Mining
Ans)
1. Process of extracting useful patterns and knowledge from text data.
2. Uses NLP and statistical methods.
Q36: Write any two applications of Text Data Mining
Ans)
1. Spam email detection
2. Sentiment analysis of social media posts

1.19 Classification vs Clustering


Q37: List two differences between classification and clustering
Ans)
1. Classification: Supervised learning; Clustering: Unsupervised learning.
2. Classification uses predefined labels, clustering does not use labels.
Q38: Give one example each for classification and clustering
Ans)
1. Classification: Email → Spam / Not Spam
2. Clustering: Customer segmentation in retail

10 Marks Questions
Very Very Important Questions (VV Imp)
1. Explain the Data Mining process in detail with a diagram.
2. Describe Data Mining architecture and its components with a neat diagram.
3. Explain Classification, Clustering, and Association Rule Mining techniques with examples.
4. Discuss the evolution of Data Mining from 1960s to present.
5. Explain KDD (Knowledge Discovery in Databases) process with steps and diagram.

Very Important Questions (V Imp)


6. Explain the types of Data Mining techniques in detail.
7. Differentiate between Data Mining, Data Analytics, and Machine Learning with examples.
8. Explain Text Data Mining, its techniques, and applications.
9. Discuss the challenges and limitations of Data Mining.
10. Explain the importance of Data Analytics and its phases with examples.

Important Questions (Imp)


11. List and explain the advantages and disadvantages of Data Mining.
12. Explain the applications of Data Mining in business, healthcare, and education.
13. Explain classification vs clustering in detail with examples.
14. Discuss predictive, descriptive, diagnostic, and prescriptive Data Mining.
15. Explain the major Data Mining tools and their features.

Chapter-2- Overview of Data Warehousing


2.1 Define Data Warehousing
• Definition:
Data Warehousing is a process of collecting, storing, and managing large volumes of
data from multiple sources in a central repository (called a data warehouse), designed
for querying, analysis, and decision-making.
• A data warehouse supports business intelligence (BI), reporting, and analytics rather
than day-to-day operations.

2.2 Importance of Data Warehousing


• Provides a centralized view of organizational data.
• Supports historical data analysis (not just real-time data).
• Enables business decision-making through reports, dashboards, and trends.
• Improves data quality and consistency (integration from different sources).
• Reduces query response time by using optimized structures.
• Supports Data Mining, OLAP (Online Analytical Processing), and predictive analytics.

2.3 Differences between Database and Data Warehouse

Aspect         | Database                          | Data Warehouse
Purpose        | Stores current operational data   | Stores historical and integrated data
Usage          | Transaction processing (OLTP)     | Analysis and decision support (OLAP)
Data Type      | Current, real-time                | Historical, summarized
Data Structure | Normalized                        | Denormalized (optimized for queries)
Updates        | Frequent (Insert, Update, Delete) | Periodic (Batch loads)
Example        | Banking system records            | Sales analysis reports

2.4 Data Warehouse Architecture


The architecture defines how data flows from sources to the warehouse and then to users.
Components:
1. Data Sources – operational databases, flat files, external sources.
2. ETL (Extract, Transform, Load) tools – extract data, clean/transform it, and load it into
the warehouse.
3. Data Warehouse Database – central repository for integrated data.
4. Metadata Repository – stores information about data (data about data).
5. Data Marts – subject-oriented subsets (e.g., Sales, Finance).
6. OLAP Tools – support analytical queries.
7. End-user Tools – dashboards, reports, queries.

2.5 Three-Tier Data Warehouse Architecture


1. Bottom Tier (Data Sources + ETL Layer):
o Data from multiple sources is extracted, transformed, and loaded.
2. Middle Tier (Data Warehouse Server):
o Central warehouse database where data is stored, often in OLAP cubes.
3. Top Tier (Front-End Tools):
o User interfaces for reporting, querying, and analysis.

This three-tier model improves performance, scalability, and user interaction.

2.6 Importance of Operational Data Stores (ODS)


• An Operational Data Store (ODS) is an intermediate storage between operational
systems and the warehouse.
• Importance:
o Stores current/near real-time data for short-term decision-making.
o Acts as a staging area before data moves into the warehouse.
o Useful for frequent updates and quick operational reporting.

2.7 Define ETL and ELT


• ETL (Extract, Transform, Load):
o Data is extracted from sources → transformed (cleaned, formatted) → loaded
into the data warehouse.
• ELT (Extract, Load, Transform):
o Data is extracted → loaded into the warehouse first → transformation
happens inside the warehouse using its computing power.
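
The only difference between the two is where the transformation runs. The sketch below shows both orders using Python's built-in `sqlite3` module, with an in-memory SQLite database standing in for the warehouse. The source rows, the table names, and the functions `run_etl` and `run_elt` are assumptions made for illustration.

```python
import sqlite3

# Hypothetical source rows: (customer name, amount as text) with messy formatting.
source_rows = [(" alice ", "10.50"), ("BOB", "20"), ("carol", "5.25")]

def run_etl(rows):
    """ETL: transform in application code first, then load the clean rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    transformed = [(name.strip().title(), float(amount)) for name, amount in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
    return conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()

def run_elt(rows):
    """ELT: load raw rows as they are, then transform inside the database with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(SUBSTR(TRIM(customer), 1, 1)) || LOWER(SUBSTR(TRIM(customer), 2)) "
        "AS customer, CAST(amount AS REAL) AS amount FROM raw_sales"
    )
    return conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()

etl_result = run_etl(source_rows)
elt_result = run_elt(source_rows)
print(etl_result)
print(elt_result)
```

Both functions end with the same clean `sales` table. In the ELT version the raw data also remains available in `raw_sales`, which is one reason ELT is often paired with warehouses that have plenty of computing power.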

2.8 Types of Data Warehouses


1. Enterprise Data Warehouse (EDW):
o Central warehouse for the entire organization.
2. Operational Data Store (ODS):
o Stores real-time or near-real-time operational data.
3. Data Marts:
o Subset of a warehouse, focused on a specific business area (sales, finance, HR).

2.9 Data Warehousing Model


1. Enterprise Warehouse Model – Centralized storage for the entire organization.
2. Data Mart Model – Department-specific warehouses.
3. Virtual Warehouse Model – Uses views on existing databases without physically
storing all data.
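
A virtual warehouse can be approximated with an ordinary SQL view. The following sketch uses Python's built-in `sqlite3` module; the two order tables, the view name `all_orders`, and the helper `total_by_channel` are hypothetical, and both tables live in one database here for simplicity, whereas a real virtual warehouse would span separate source systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical operational tables standing in for separate source systems.
conn.executescript("""
    CREATE TABLE store_orders  (order_id INTEGER, amount REAL);
    CREATE TABLE online_orders (order_id INTEGER, amount REAL);
    INSERT INTO store_orders  VALUES (1, 100.0), (2, 50.0);
    INSERT INTO online_orders VALUES (3, 25.0);

    -- The "virtual warehouse": a view that integrates both sources.
    -- No rows are copied; the query runs against the source tables each time.
    CREATE VIEW all_orders AS
        SELECT 'store'  AS channel, order_id, amount FROM store_orders
        UNION ALL
        SELECT 'online' AS channel, order_id, amount FROM online_orders;
""")

def total_by_channel(connection):
    """Query the integrated view as if it were a single warehouse table."""
    rows = connection.execute(
        "SELECT channel, SUM(amount) FROM all_orders GROUP BY channel ORDER BY channel"
    ).fetchall()
    return dict(rows)

before = total_by_channel(conn)
# A new operational row is visible through the view immediately.
conn.execute("INSERT INTO online_orders VALUES (4, 75.0)")
after = total_by_channel(conn)
print(before, after)
```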

2.10 Data Warehouse Design Approaches


1. Top-Down Approach (Inmon):
o Build an Enterprise Data Warehouse (EDW) first, then create data marts.
o Advantage: Consistent, integrated.

o Disadvantage: Time-consuming, costly.


2. Bottom-Up Approach (Kimball):
o Start with Data Marts, then integrate into a warehouse.
o Advantage: Quick implementation.
o Disadvantage: May face integration issues later.
3. Hybrid Approach:
o Combines both methods for flexibility and speed.

2.11 Define Terms


• Metadata: Data about data (e.g., source, type, meaning, relationships).
• Data Mart: A subset of a warehouse focused on a particular department or subject
area.

2.12 Define OLAP


• OLAP (Online Analytical Processing):
A technology that allows fast analysis of multidimensional data stored in the
warehouse.

2.13 Characteristics of OLAP


• Multidimensional data representation (cube format).
• Supports complex queries (slice, dice, drill-down, roll-up).
• Provides aggregated and summarized data.
• Fast response to analytical queries.
• Helps in trend analysis and forecasting.
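
The four OLAP operations named above can be imitated on a small in-memory fact table. This is a sketch under stated assumptions: the sales rows and the helper names `aggregate`, `slice_cube`, and `dice_cube` are made up for illustration, and a real OLAP server would precompute and index such aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (year, quarter, region, product).
facts = [
    {"year": 2023, "quarter": "Q1", "region": "North", "product": "TV",    "sales": 100},
    {"year": 2023, "quarter": "Q2", "region": "North", "product": "TV",    "sales": 120},
    {"year": 2023, "quarter": "Q1", "region": "South", "product": "Phone", "sales": 80},
    {"year": 2024, "quarter": "Q1", "region": "North", "product": "Phone", "sales": 90},
    {"year": 2024, "quarter": "Q1", "region": "South", "product": "TV",    "sales": 110},
]

def aggregate(rows, dims):
    """Sum sales grouped by dims (fewer dims = roll-up, more dims = drill-down)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)

def slice_cube(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]

def dice_cube(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in values for d, values in allowed.items())]

roll_up = aggregate(facts, ["year"])                  # {(2023,): 300, (2024,): 200}
drill_down = aggregate(facts, ["year", "quarter"])    # same totals at a finer grain
north_only = slice_cube(facts, "region", "North")
tv_2023 = dice_cube(facts, product={"TV"}, year={2023})
print(roll_up, drill_down, len(north_only), len(tv_2023))
```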

2.14 Differences between OLTP and OLAP

Aspect      | OLTP                          | OLAP
Full Form   | Online Transaction Processing | Online Analytical Processing
Purpose     | Day-to-day operations         | Analysis and decision support
Data Nature | Current, real-time            | Historical, summarized
Queries     | Simple, read/write            | Complex, read-intensive
Example     | ATM transactions              | Sales forecasting
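
The two query styles in the table can be placed side by side in code. The sketch below uses Python's built-in `sqlite3` module with a hypothetical bank schema; the table names and the functions `atm_withdraw` and `withdrawals_per_year` are assumptions for illustration. In practice the analytical query would run on a separate warehouse, not on the live transaction database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE withdrawals (account_id INTEGER, year INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 500.0), (2, 300.0);
    INSERT INTO withdrawals VALUES (1, 2023, 40.0), (2, 2023, 60.0), (1, 2024, 20.0);
""")

def atm_withdraw(connection, account_id, amount, year):
    """OLTP style: a short read/write transaction touching a single account."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
            (amount, account_id),
        )
        connection.execute(
            "INSERT INTO withdrawals VALUES (?, ?, ?)", (account_id, year, amount)
        )

def withdrawals_per_year(connection):
    """OLAP style: a read-only query that summarizes many historical rows."""
    return connection.execute(
        "SELECT year, SUM(amount) FROM withdrawals GROUP BY year ORDER BY year"
    ).fetchall()

atm_withdraw(conn, 1, 100.0, 2024)
balance = conn.execute("SELECT balance FROM accounts WHERE account_id = 1").fetchone()[0]
yearly = withdrawals_per_year(conn)
print(balance, yearly)
```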

2.15 Types of OLAP


1. MOLAP (Multidimensional OLAP):
o Uses multidimensional cubes for analysis.
o Fast, but limited scalability.
2. ROLAP (Relational OLAP):
o Uses relational databases with OLAP functionalities.
o Scalable, but slower than MOLAP.
3. HOLAP (Hybrid OLAP):
o Combines MOLAP and ROLAP advantages.

2.16 Difference between Data Mining and Data Warehousing

Aspect   | Data Warehousing                           | Data Mining
Purpose  | Storage, integration, organization of data | Extracting hidden patterns, trends, knowledge
Function | Data consolidation and management          | Knowledge discovery
Tools    | ETL, OLAP, Reporting                       | Machine Learning, AI, Statistics
Example  | Sales data storage                         | Predicting customer buying behavior

3 Marks Questions
Very Most Important Questions
1. Define Data Warehousing.
2. State the importance of Data Warehousing.
3. List any four differences between Database and Data Warehouse.
4. Draw and explain Three-Tier Data Warehouse Architecture.
5. Define ETL and ELT.
6. Differentiate between OLTP and OLAP.
7. List the types of OLAP.

Most Important Questions


8. State the importance of Operational Data Stores.
9. List the types of Data Warehouses.
10. Explain Data Warehousing Models.
11. Write short notes on Data Warehouse Design Approaches.
12. Define Metadata and Data Mart.
13. List any four characteristics of OLAP.
14. Differentiate between Data Mining and Data Warehousing.

Important Questions
15. Explain the role of Metadata in a Data Warehouse.
16. What is a Data Mart? Give an example.
17. Write any three components of Data Warehouse Architecture.
18. Write any three advantages of OLAP.
19. List any three uses of Data Warehousing.
20. Write short notes on Virtual Data Warehouse.

Detailed Answers
Very Most Important Questions
1. Define Data Warehousing
Ans)
• Data Warehousing is the process of collecting, storing, and managing data from multiple sources
into a central repository.
• It is designed for querying, analysis, and business decision-making rather than day-to-day
operations.
• Supports historical data analysis, reporting, and business intelligence (BI).

2. State the importance of Data Warehousing


Ans)
• Provides a centralized view of all organizational data.
• Supports historical and trend analysis for decision-making.
• Ensures data consistency and quality across the organization.
• Reduces query response time for analytical tasks.
• Enables Data Mining and OLAP for advanced analytics.

3. List any four differences between Database and Data Warehouse


Ans)

Database                               | Data Warehouse
Stores current operational data        | Stores historical and integrated data
Used for transaction processing (OLTP) | Used for analysis and decision support (OLAP)
Data is updated frequently             | Data is updated periodically (batch)
Normalized for efficient transactions  | Denormalized for fast querying and reporting

4. Draw and explain Three-Tier Data Warehouse Architecture


Ans)
• Bottom Tier (Data Sources + ETL Layer): Data is extracted, cleaned, and transformed before
loading.
• Middle Tier (Data Warehouse Server): Stores centralized, integrated data; often uses OLAP cubes.
• Top Tier (Front-End Tools): Provides reporting, dashboards, and query interfaces for users.
• Purpose: Improves performance, scalability, and analytical capabilities.

5. Define ETL and ELT


Ans)
• ETL (Extract, Transform, Load):
1. Extract data from multiple sources.
2. Transform data (cleaning, formatting, integration).
3. Load transformed data into the data warehouse.
• ELT (Extract, Load, Transform):
1. Extract data from sources.
2. Load raw data directly into warehouse.
3. Transform data inside the warehouse using its computing power.

6. Differentiate between OLTP and OLAP


Ans)

Feature   | OLTP                          | OLAP
Full Form | Online Transaction Processing | Online Analytical Processing
Purpose   | Day-to-day operations         | Analysis and decision-making
Data Type | Current, real-time            | Historical, summarized
Queries   | Simple, frequent              | Complex, read-intensive
Example   | ATM transactions              | Sales trend analysis

7. List the types of OLAP


Ans)
• MOLAP (Multidimensional OLAP): Uses multidimensional cubes for fast analysis.
• ROLAP (Relational OLAP): Uses relational databases; scalable but slower.
• HOLAP (Hybrid OLAP): Combines advantages of MOLAP and ROLAP.

Most Important Questions


8. State the importance of Operational Data Stores (ODS)
Ans)
• Stores current or near-real-time operational data.
• Acts as a staging area before loading data into the warehouse.
• Supports frequent updates and quick operational reporting.
• Helps in short-term decision-making.

9. List the types of Data Warehouses


Ans)
1. Enterprise Data Warehouse (EDW): Centralized warehouse for entire organization.
2. Operational Data Store (ODS): Stores current operational data.
3. Data Marts: Focused on specific departments or subjects like sales or finance.

10. Explain Data Warehousing Models


Ans)
• Enterprise Warehouse Model: Central warehouse for organization; integrated and consistent.
• Data Mart Model: Department-specific subset of the warehouse.
• Virtual Warehouse Model: Uses views on existing databases without physical storage.

11. Write short notes on Data Warehouse Design Approaches


Ans)
• Top-Down Approach (Inmon):
o Build Enterprise Warehouse first, then data marts.
o Advantages: Integrated and consistent.
o Disadvantages: Expensive and time-consuming.
• Bottom-Up Approach (Kimball):
o Start with Data Marts, integrate later.
o Advantages: Quick implementation.
o Disadvantages: Integration challenges.
• Hybrid Approach: Combines both methods for flexibility and speed.

12. Define Metadata and Data Mart


Ans)
• Metadata: Data about data; describes source, type, structure, and usage.
• Data Mart: Subset of a data warehouse focused on a specific business area; e.g., sales, finance.

13. List any four characteristics of OLAP


Ans)
• Multidimensional data analysis (cubes).
• Supports slice, dice, drill-down, roll-up operations.
• Provides aggregated and summarized data.
• Enables fast query response for analytics.

14. Differentiate between Data Mining and Data Warehousing


Ans)

Feature  | Data Warehousing        | Data Mining
Purpose  | Store and organize data | Extract hidden patterns and knowledge
Function | Data consolidation      | Knowledge discovery and analysis
Tools    | ETL, OLAP, Reporting    | Machine Learning, AI, Statistics
Example  | Sales data storage      | Customer buying behavior prediction

Important Questions
15. Explain the role of Metadata in a Data Warehouse
Ans)
• Describes source, structure, and meaning of data.
• Helps in query optimization and data lineage tracking.
• Assists users in understanding warehouse contents.

16. What is a Data Mart? Give an example


Ans)

• Definition: A subset of data warehouse for a specific business area.


• Example: Sales Data Mart for sales department analysis.

17. Write any three components of Data Warehouse Architecture


Ans)
1. Data Sources: Operational databases, flat files, external sources.
2. ETL Layer: Extract, transform, load data into warehouse.
3. OLAP Tools: For multidimensional analysis and reporting.
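The ETL layer named above can be sketched in a few lines. This is only an illustration under stated assumptions: the CSV text `RAW_CSV`, the table name `fact_orders`, and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (the data source).
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,250.50,2024-01-05
2,BOB,,2024-01-06
3,carol,99.90,2024-01-07
"""


def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, normalise names and types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["order_date"],
        ))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the real warehouse
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

In ELT the same three functions would run in the order extract, load, transform, with the transformation done inside the warehouse.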

18. Write any three advantages of OLAP


Ans)
• Fast multidimensional data analysis.
• Supports complex queries and aggregations.
• Helps in trend analysis and business forecasting.

19. List any three uses of Data Warehousing


Ans)
• Decision support for management.
• Trend analysis and forecasting.
• Supports Data Mining and business intelligence applications.

20. Write short notes on Virtual Data Warehouse


Ans)
• Does not store data physically.
• Uses views on operational databases to access data.
• Provides a logical view of integrated data.
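A view over operational tables shows the idea in miniature. The sketch below assumes two hypothetical source tables, `sales_orders` and `crm_customers`, held in one in-memory SQLite database; in practice the sources would usually be separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical operational tables that stay where they are.
conn.executescript("""
CREATE TABLE sales_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE crm_customers (customer_id INTEGER, name TEXT, city TEXT);
INSERT INTO sales_orders VALUES (1, 10, 200.0), (2, 11, 50.0), (3, 10, 75.0);
INSERT INTO crm_customers VALUES (10, 'Asha', 'Hyderabad'), (11, 'Ravi', 'Vijayawada');

-- The "virtual warehouse": a view that integrates both sources logically.
CREATE VIEW v_customer_sales AS
SELECT c.name, c.city, SUM(o.amount) AS total_amount
FROM sales_orders AS o
JOIN crm_customers AS c ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.city;
""")

view_rows = conn.execute(
    "SELECT name, total_amount FROM v_customer_sales ORDER BY name"
).fetchall()

# The view is computed at query time, so new operational rows appear at once.
conn.execute("INSERT INTO sales_orders VALUES (4, 11, 25.0)")
updated_rows = conn.execute(
    "SELECT name, total_amount FROM v_customer_sales ORDER BY name"
).fetchall()
```

Because nothing is copied, every query runs against the operational tables, which is the usual trade-off of this model: no storage cost, but extra load on the source systems.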

10 Marks Questions
Very Very Important Questions
1. Explain the architecture of a Data Warehouse with a neat diagram.
2. Explain the Three-Tier Data Warehouse Architecture in detail.
3. Explain ETL and ELT processes with examples and differences.

Very Important Questions


4. Explain the differences between Database and Data Warehouse with a suitable example.
5. Describe Data Warehouse Design Approaches (Top-Down, Bottom-Up, Hybrid) with advantages and
disadvantages.
6. Explain the types of Data Warehouses and their uses in an organization.
7. Explain OLAP in detail, including types, characteristics, and operations (slice, dice, roll-up,
drill-down).

Important Questions
8. Explain the importance of Operational Data Store (ODS) with examples.
9. Explain the differences between OLTP and OLAP with examples.
10. Differentiate Data Mining and Data Warehousing and explain their roles in decision-making.

Chapter-3- Introduction to Big Data

3.1 Define Big Data


• Definition:
Big Data refers to extremely large and complex datasets that cannot be
easily captured, stored, managed, or processed using traditional database
management systems or software tools.
• It involves structured, semi-structured, and unstructured data generated
at high speed from multiple sources.

3.2 Evolution of Big Data


• Traditional Data (Before 2000):
Data stored in databases (RDBMS). Mainly structured data like sales,
payroll, and inventory.
• Growth of Internet (2000–2010):
Explosion of web data, emails, social media, e-commerce, and sensor data.
Traditional systems became insufficient.
• Big Data Era (2010–Present):
Widespread adoption of Hadoop (first released in 2006), NoSQL, cloud storage, and
advanced analytics tools to handle data at massive scale, including in real time.

3.3 Challenges of Traditional System


1. Limited capacity for large datasets.
2. Inability to process unstructured data (videos, images, logs).
3. Poor scalability.
4. High cost of storage.
5. Slow processing speed.
6. Lack of real-time data analysis.

3.4 Three V’s of Big Data


1. Volume: Massive amount of data (terabytes, petabytes, exabytes).
2. Velocity: High speed of data generation and processing (real-time
streaming).
3. Variety: Different formats (structured, semi-structured, unstructured).
(Some sources also include extra V’s like Veracity, Value, Visualization, etc.)

3.5 Storing Big Data


• Methods of Storage:
o Distributed file systems (HDFS – Hadoop Distributed File System).
o Cloud storage (AWS S3, Azure Blob Storage, Google Cloud Storage).
o NoSQL Databases (MongoDB, Cassandra, HBase).
• Characteristics:
o Scalable, fault-tolerant, and supports parallel storage.

3.6 How do you Select Big Data


When selecting big data for analysis, consider:
1. Relevance – Does the data solve the problem?
2. Quality – Is it accurate, consistent, and clean?
3. Volume & Variety – Large and varied enough to provide insights.
4. Timeliness – Data should be recent or real-time.
5. Cost-effectiveness – Data collection and storage should not exceed budget.

3.7 Processing of Big Data


• Batch Processing: Analyzing large volumes of data at once (Hadoop
MapReduce).
• Real-time Processing: Handling data streams as they arrive (Apache Storm,
Apache Flink, Spark Streaming, typically fed by a message broker such as Kafka).
• Steps:
1. Data Collection →
2. Data Cleaning →
3. Data Storage →
4. Data Processing →
5. Data Analysis →
6. Visualization.
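The difference between the two processing styles can be seen on a word count. The sketch below imitates the map, shuffle, and reduce phases of MapReduce in plain Python and contrasts them with a running count that is updated per record. It is an assumption-laden illustration, not Hadoop or Spark code; `LOG_LINES` and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import chain

LOG_LINES = ["error disk full", "info job started", "error network down"]


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def batch_word_count(lines):
    """Batch style: the whole dataset is available before processing starts."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(pairs))


def stream_word_count(line_iter):
    """Stream style: update a running result after every arriving record."""
    running = Counter()
    for line in line_iter:
        running.update(line.split())
        yield dict(running)


batch_result = batch_word_count(LOG_LINES)
stream_snapshots = list(stream_word_count(iter(LOG_LINES)))
```

The batch job produces one answer at the end, while the streaming version has a usable partial answer after each record; once all records have arrived, the two agree.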

3.8 Structures of Big Data


1. Structured Data: Organized in rows & columns (SQL databases,
spreadsheets).
2. Semi-Structured Data: Partially organized (XML, JSON, log files).
3. Unstructured Data: No fixed format (images, audio, video, emails, social
media).
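The three structures differ mainly in how much work is needed before the data can be queried. A minimal sketch, using made-up sample records and only the Python standard library:

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
structured = list(csv.DictReader(io.StringIO("id,name,marks\n1,Anu,82\n2,Kiran,74\n")))

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured = [
    json.loads('{"id": 1, "name": "Anu", "skills": ["SQL", "Python"]}'),
    json.loads('{"id": 2, "name": "Kiran"}'),
]
# A missing field has to be handled explicitly.
kiran_skills = semi_structured[1].get("skills", [])

# Unstructured: free text; structure must be extracted, here with a simple regex.
unstructured = "Great service! Contact me at anu@example.com or kiran@example.org."
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", unstructured)
```

The regular expression is a deliberately simple pattern for this sample text, not a complete e-mail validator.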

3.9 State the Need of Big Data


• To handle massive data volumes.
• To analyze trends in real time and support decision-making.
• To improve business efficiency and innovation.
• To provide better customer insights.
• To detect fraud and security risks.

3.10 Sources of Big Data


1. Social media (Facebook, Twitter, Instagram).
2. Internet of Things (IoT devices, sensors).
3. Mobile applications.
4. Business transactions (e-commerce, banking).
5. Healthcare records.
6. Government & research data.
7. Machine logs and clickstream data.

3.11 Define Big Data Analytics


• Definition:
Big Data Analytics is the process of examining large datasets to uncover
hidden patterns, correlations, trends, and useful information for better
decision-making.

3.12 Types of Tools Used in Big Data


1. Storage Tools: HDFS, Amazon S3.
2. Processing Tools: Hadoop, Spark, Flink, Storm.
3. Database Tools: NoSQL (MongoDB, Cassandra, HBase).
4. Data Integration Tools: Talend, Apache NiFi.
5. Visualization Tools: Tableau, Power BI, QlikView.

3.13 Applications of Big Data


1. Healthcare: Disease prediction, patient monitoring.


2. Finance: Fraud detection, risk analysis.
3. E-commerce: Recommendation engines (Amazon, Flipkart).
4. Telecommunications: Customer behavior analysis.
5. Government: Smart cities, crime prediction.
6. Social Media: Trend analysis, targeted advertising.
7. Education: Student performance tracking.

3.14 Risks of Big Data


1. Data Privacy Issues – Sensitive data exposure.
2. Security Threats – Cyberattacks on large datasets.
3. Data Quality Problems – Inaccurate or incomplete data.
4. Cost of Infrastructure – Expensive storage/processing.
5. Legal & Compliance Issues – Violations of data protection laws (e.g., GDPR).

3.15 Intelligent Data Analysis


• Intelligent Data Analysis (IDA) uses AI, Machine Learning, and Data Mining
to automatically analyze big data and generate insights.
• Features:
o Predictive analytics.
o Pattern recognition.
o Real-time decision-making.
o Self-learning from data.

3.16 Difference Between Traditional and Big Data Approach


Feature     | Traditional Approach | Big Data Approach
------------|----------------------|------------------------------------------
Data Type   | Structured only      | Structured, Semi-structured, Unstructured
Data Volume | MBs to GBs           | TBs to PBs & beyond
Processing  | Single system        | Distributed systems (Hadoop, Spark)
Speed       | Batch only           | Batch + Real-time
Scalability | Limited              | Highly scalable (cloud, clusters)
Tools       | RDBMS, SQL           | Hadoop, NoSQL, Spark, AI tools
Use Cases   | Small businesses     | Global enterprises, IoT, AI, ML

3 Marks Questions
3.1 Define Big Data
1. Define Big Data.
2. Give two examples of Big Data.
3.2 Evolution of Big Data
3. Write a short note on the evolution of Big Data.
4. Mention two stages in the evolution of Big Data.
3.3 Challenges of Traditional System
5. List any three challenges of traditional data systems.
6. Why are traditional systems not suitable for Big Data?
3.4 Three V’s of Big Data
7. State the three V’s of Big Data.
8. Explain any one V of Big Data with example.
3.5 Storing Big Data
9. Mention two methods used for storing Big Data.
10. What is HDFS in Big Data storage?
3.6 Selecting Big Data


11. What factors are considered while selecting Big Data?
12. Write any two qualities of good Big Data.
3.7 Processing of Big Data
13. Differentiate between batch processing and real-time processing.
14. Write the steps involved in Big Data processing.
3.8 Structures of Big Data
15. List three types of Big Data structures.
16. Give an example for unstructured data.
3.9 Need of Big Data
17. State any three needs of Big Data.
18. Why is Big Data important in decision-making?
3.10 Sources of Big Data
19. List any three sources of Big Data.
20. Give two examples of Big Data sources from social media.
3.11 Big Data Analytics
21. Define Big Data Analytics.
22. Mention two uses of Big Data Analytics.
3.12 Tools for Big Data
23. List any three tools used in Big Data.
24. Name two visualization tools used in Big Data.
3.13 Applications of Big Data
25. Mention any three applications of Big Data.
26. How is Big Data used in healthcare?
3.14 Risks of Big Data
27. List any three risks of Big Data.
28. Write two examples of data privacy issues in Big Data.
3.15 Intelligent Data Analysis
29. Define Intelligent Data Analysis.

30. Mention two features of Intelligent Data Analysis.


3.16 Traditional vs Big Data Approach
31. Differentiate between Traditional and Big Data in terms of data type.
32. Write any two differences between Traditional and Big Data approach.

Detailed Answers
3.1 Define Big Data
Q1. Define Big Data.
Ans)
1. Big Data refers to extremely large and complex datasets that cannot be processed or managed using
traditional database systems.
2. It includes structured, semi-structured, and unstructured data generated at high speed.
3. Used for analysis, predictions, and decision-making in businesses and research.
Q2. Give two examples of Big Data.
Ans)
1. Social media data (Facebook, Twitter posts, likes, comments).
2. Sensor data from IoT devices (temperature, traffic, smart homes).

3.2 Evolution of Big Data


Q3. Write a short note on the evolution of Big Data.
Ans)
1. Traditional Data (before 2000) – Structured data stored in databases.
2. Growth of Internet (2000–2010) – Web, emails, social media data increased.
3. Big Data Era (2010–present) – Use of Hadoop, NoSQL, cloud storage for massive datasets.
Q4. Mention two stages in the evolution of Big Data.
Ans)
1. Pre-Big Data era – Traditional structured databases (RDBMS).
2. Big Data era – Distributed storage, real-time analytics, unstructured data processing.

3.3 Challenges of Traditional System

Q5. List any three challenges of traditional data systems.


Ans)
1. Limited capacity to handle large datasets.
2. Cannot process unstructured data (images, videos, logs).
3. Poor scalability and high cost of storage.
Q6. Why are traditional systems not suitable for Big Data?
Ans)
1. Traditional systems cannot manage massive volume, variety, and velocity of data.
2. They process data slowly and cannot provide real-time insights.
3. Not designed for distributed or cloud-based storage systems.

3.4 Three V’s of Big Data


Q7. State the three V’s of Big Data.
Ans)
1. Volume – Large amount of data (TBs, PBs).
2. Velocity – High speed of data generation and processing.
3. Variety – Different types of data: structured, semi-structured, unstructured.
Q8. Explain any one V of Big Data with example.
Ans)
1. Volume – Refers to the sheer amount of data that is generated and must be stored.
2. Example: Social media platforms generate millions of posts, likes, and comments daily.

3.5 Storing Big Data


Q9. Mention two methods used for storing Big Data.
Ans)
1. Hadoop Distributed File System (HDFS) – Stores data across multiple nodes in a cluster.
2. Cloud Storage – Services like Amazon S3, Google Cloud Storage, and Azure Blob store large data.
Q10. What is HDFS in Big Data storage?
Ans)
1. HDFS stands for Hadoop Distributed File System.
2. It divides large data files into blocks and stores them across multiple nodes.

3. Provides fault tolerance and high scalability.
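How block splitting and replication give fault tolerance can be sketched as follows. The 128 MB block size and replication factor of 3 are the usual HDFS defaults; the round-robin placement and the node names are simplifying assumptions (real HDFS placement is rack-aware), and the function names are hypothetical.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of `file_size` bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes). Real HDFS placement is rack-aware;
    this simplified policy only illustrates the principle.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(layout, failed_node):
    """A block stays readable if at least one replica is on a healthy node."""
    return all(any(n != failed_node for n in replicas) for replicas in layout.values())


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)  # 128 MB: common HDFS default
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], 3)
still_readable = readable_after_failure(layout, "node3")
```

A 300 MB file becomes two full blocks plus one smaller block, and because every block lives on three nodes, the loss of any single node leaves the whole file readable.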

3.6 Selecting Big Data


Q11. What factors are considered while selecting Big Data?
Ans)
1. Relevance – Data must be related to the problem.
2. Quality – Accuracy and completeness of data.
3. Timeliness – Data should be current or real-time.
Q12. Write any two qualities of good Big Data.
Ans)
1. Accurate and consistent.
2. Large enough to provide meaningful insights.

3.7 Processing of Big Data


Q13. Differentiate between batch processing and real-time processing.
Ans)
1. Batch Processing – Analyzes large datasets in chunks; slower but efficient for large volumes (e.g.,
Hadoop MapReduce).
2. Real-time Processing – Processes data as it arrives; useful for immediate insights (e.g.,
Apache Storm, Spark Streaming, with Kafka as the message broker).
Q14. Write the steps involved in Big Data processing.
Ans)
1. Data Collection
2. Data Cleaning
3. Data Storage
4. Data Processing
5. Data Analysis
6. Data Visualization

3.8 Structures of Big Data


Q15. List three types of Big Data structures.
Ans)
1. Structured Data – Organized in tables (SQL databases).


2. Semi-structured Data – Partially organized (JSON, XML).
3. Unstructured Data – No fixed format (videos, emails, social media posts).
Q16. Give an example for unstructured data.
Ans)
1. Social media posts
2. Images and video files

3.9 Need of Big Data


Q17. State any three needs of Big Data.
Ans)
1. Handle massive data volumes that traditional systems cannot.
2. Enable real-time insights and decision-making.
3. Improve business efficiency and customer understanding.
Q18. Why is Big Data important in decision-making?
Ans)
1. Identifies patterns and trends in large datasets.
2. Helps in predicting outcomes and reducing risks.
3. Supports evidence-based business strategies.

3.10 Sources of Big Data


Q19. List any three sources of Big Data.
Ans)
1. Social media (Facebook, Twitter, Instagram)
2. IoT devices and sensors
3. Business transactions (e-commerce, banking)
Q20. Give two examples of Big Data sources from social media.
Ans)
1. Tweets on Twitter
2. Facebook posts, likes, and comments

3.11 Big Data Analytics


Q21. Define Big Data Analytics.
Ans)
1. Process of examining large datasets to uncover hidden patterns, correlations, and trends.
2. Helps in predictive analytics and informed decision-making.
Q22. Mention two uses of Big Data Analytics.
Ans)
1. Fraud detection in finance.
2. Customer behavior analysis in e-commerce.

3.12 Tools for Big Data


Q23. List any three tools used in Big Data.
Ans)
1. Hadoop
2. Apache Spark
3. MongoDB
Q24. Name two visualization tools used in Big Data.
Ans)
1. Tableau
2. Power BI

3.13 Applications of Big Data


Q25. Mention any three applications of Big Data.
Ans)
1. Healthcare – disease prediction and patient monitoring
2. Finance – fraud detection and risk management
3. E-commerce – product recommendation engines
Q26. How is Big Data used in healthcare?
Ans)
1. Predicting diseases using patient data analysis.
2. Monitoring patient vitals in real-time.

3. Improving treatment plans and hospital management.

3.14 Risks of Big Data


Q27. List any three risks of Big Data.
Ans)
1. Data privacy issues
2. Security threats like hacking
3. Poor data quality
Q28. Write two examples of data privacy issues in Big Data.
Ans)
1. Unauthorized access to personal data
2. Sharing sensitive information without consent

3.15 Intelligent Data Analysis


Q29. Define Intelligent Data Analysis.
Ans)
1. Use of AI, Machine Learning, and Data Mining to automatically analyze Big Data.
2. Generates patterns, predictions, and useful insights without manual intervention.
Q30. Mention two features of Intelligent Data Analysis.
Ans)
1. Predictive analytics – forecasting future trends.
2. Real-time decision-making based on incoming data.

3.16 Traditional vs Big Data Approach


Q31. Differentiate between Traditional and Big Data in terms of data type.
Ans)
1. Traditional – Handles only structured data.
2. Big Data – Handles structured, semi-structured, and unstructured data.
Q32. Write any two differences between Traditional and Big Data approach.
Ans)
1. Processing – Traditional uses single systems; Big Data uses distributed systems.

2. Speed – Traditional processes slowly; Big Data supports batch and real-time processing.

10 Marks Questions
Very Very Important Questions (Must Prepare)
1. Explain the evolution of Big Data and describe the need for Big Data in modern organizations.
2. Discuss the three V’s of Big Data in detail with examples.
3. Explain the differences between Traditional and Big Data approaches with a suitable comparison
table.
4. Describe Big Data Analytics, its types, tools, and applications in real-life scenarios.

Very Important Questions


5. Explain the challenges of traditional data management systems and how Big Data overcomes
them.
6. Describe the different structures of Big Data (structured, semi-structured, unstructured) with
examples.
7. Explain how Big Data is stored and processed, including batch processing and real-time processing
techniques.

Important Questions
8. List and explain the sources of Big Data with suitable examples.
9. Explain the risks associated with Big Data and measures to overcome them.
10. Describe Intelligent Data Analysis, its features, and importance in decision-making.

Chapter-4- Big Data Analytics


4.1. Importance of Big Data Analytics
• Helps organizations extract meaningful insights from large and complex datasets.
• Improves decision-making by identifying trends, patterns, and correlations.
• Increases business efficiency by predicting customer behavior and market trends.
• Provides competitive advantage through real-time analytics.
• Useful in diverse domains such as healthcare, finance, education, e-commerce, government, and
manufacturing.
• Enables fraud detection, risk management, and operational optimization.

4.2. Big Data Life Cycle


The life cycle involves the following stages:
1. Data Collection – Gathering data from multiple sources (sensors, social media, logs, transactions).
2. Data Storage – Storing in databases or distributed systems (HDFS, cloud storage).
3. Data Processing – Cleaning, filtering, and transforming data for analysis.
4. Data Analysis – Applying statistical, mathematical, or machine learning techniques.
5. Data Visualization – Presenting results in charts, graphs, dashboards.
6. Decision Making – Using insights for business and strategic actions.

4.3. Methodology in Big Data Analytics


• Define Objective: Understand problem statement.
• Collect Data: Structured and unstructured.
• Data Cleaning: Remove duplicates, errors, inconsistencies.
• Data Integration: Combine data from multiple sources.
• Apply Analytics Tools/Models: Use statistical models or machine learning.
• Visualization: Graphical representation for better understanding.
• Interpret Results: Provide actionable insights.

4.4. Core Deliverables

• Processed Data (clean and usable form).


• Analytical Reports (insight documents).
• Predictive Models (for forecasting).
• Dashboards (real-time monitoring).
• Recommendations (based on data-driven decisions).

4.5. Key Stakeholders


• Business Managers – Define objectives, take decisions.
• Data Engineers – Handle data collection, storage, pipelines.
• Data Analysts – Perform statistical analysis and visualization.
• Data Scientists – Apply machine learning models.
• End Users/Customers – Benefit from improved services.
• IT Professionals – Manage infrastructure and security.

4.6. Responsibilities of Data Analyst


• Collecting and interpreting data.
• Cleaning and validating datasets.
• Using statistical techniques for analysis.
• Preparing dashboards, reports, and visualizations.
• Identifying trends and correlations.
• Assisting in decision-making with insights.

4.7. Basic Skills Necessary for Data Analyst


• Technical Skills: SQL, Excel, R, Python.
• Statistical Knowledge: Probability, hypothesis testing.
• Data Visualization: Tableau, Power BI, Matplotlib.
• Critical Thinking: Problem-solving and reasoning.
• Communication Skills: Presenting insights clearly.
• Domain Knowledge: Understanding business/industry context.

4.8. Importance of Data Scientist


• Develops predictive models using AI and ML.
• Extracts hidden insights from large datasets.
• Improves decision accuracy through data-driven approaches.
• Bridges the gap between raw data and actionable strategies.
• Plays a crucial role in innovation and automation.

4.9. Dealing with Big Data Analytic Project


4.9.1. Managing a Big Data Analytics Project
• Define goals clearly.
• Form multidisciplinary teams.
• Use Agile/iterative approach.
• Ensure proper infrastructure (cloud, Hadoop).
• Monitor data quality and security.
4.9.2. Problem Definition
• Identify business challenge.
• Translate problem into data questions.
4.9.3. Data Collection
• Gather data from databases, sensors, APIs, logs, IoT devices, social media.
4.9.4. Cleansing Data
• Remove duplicates, errors, inconsistencies.
• Handle missing values.
4.9.5. Summarizing
• Use statistical summaries (mean, median, variance).
• Provide descriptive insights.
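Cleansing and summarizing (4.9.4 and 4.9.5) can be sketched with the standard library. The sample records and the rule of dropping rows with missing or invalid values are assumptions made for illustration; imputing a value is an equally valid way to handle missing data.

```python
import statistics

# Hypothetical raw readings: a duplicate, a missing value, and an invalid entry.
raw = [
    {"id": 1, "temp": "21.5"},
    {"id": 2, "temp": "22.0"},
    {"id": 2, "temp": "22.0"},   # duplicate record
    {"id": 3, "temp": ""},       # missing value
    {"id": 4, "temp": "n/a"},    # invalid value
    {"id": 5, "temp": "23.5"},
]


def cleanse(records):
    """Drop duplicate ids and rows whose value is missing or not numeric."""
    seen, clean = set(), []
    for rec in records:
        if rec["id"] in seen:
            continue
        try:
            value = float(rec["temp"])
        except ValueError:
            continue
        seen.add(rec["id"])
        clean.append({"id": rec["id"], "temp": value})
    return clean


def summarize(values):
    """Descriptive statistics: count, mean, median, and sample variance."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
    }


clean = cleanse(raw)
summary = summarize([rec["temp"] for rec in clean])
```

Summaries computed before cleansing would either fail on the non-numeric entries or count the duplicate twice, which is why cleansing comes first.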
4.9.6. Data Exploration
• Identify hidden patterns using EDA (Exploratory Data Analysis).
• Apply visualization and clustering techniques.

4.9.7. Data Visualization


• Represent results using charts, graphs, dashboards, infographics.
• Helps stakeholders easily understand insights.

4.10. Big Data Analytic Methods


4.10.1. Importance of SQL
• Essential for querying structured data.
• Used for filtering, joining, aggregating.
• Supports integration with other analytics tools.
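Filtering, joining, and aggregating can all appear in one query. A minimal sketch, assuming two hypothetical tables `customers` and `orders` in an in-memory SQLite database:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'Guntur'), (2, 'Ravi', 'Guntur'), (3, 'Meena', 'Nellore');
INSERT INTO orders VALUES (1, 1, 500.0), (2, 1, 150.0), (3, 2, 40.0), (4, 3, 900.0);
""")

query = """
SELECT c.city, COUNT(*) AS order_count, SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id   -- joining
WHERE o.amount >= 100                                   -- filtering
GROUP BY c.city                                         -- aggregating
ORDER BY revenue DESC
"""
city_revenue = db.execute(query).fetchall()
```

The same statement, with little or no change, runs on SQL-on-Hadoop engines such as Hive, which is one reason SQL remains central in Big Data work.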
4.10.2. Importance of Charts & Graphs
• Convert complex data into visual form.
• Make insights easy to interpret.
• Aid in faster decision-making.
4.10.3. Importance of Data Analysis Tools
• R Programming: Statistical computing, visualization.
• Python: Machine learning, AI, data manipulation.
• Julia: High-performance computing, numerical analysis.
• SPSS: Business statistics and social sciences analysis.
• MATLAB: Numerical computation, simulations.
• Octave: Open-source alternative to MATLAB.

4.11. Advanced Methods


4.11.1. Role of Machine Learning
• Automates prediction and classification.
• Improves accuracy with algorithms like regression, clustering, neural networks.
• Enables recommendation systems, fraud detection.
4.11.2. Association Rules
• Discover relationships between items (e.g., “If a customer buys bread, they are likely to also buy butter”).
• Used in market basket analysis.
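The strength of a rule such as bread → butter is usually measured by support, confidence, and lift. The sketch below computes these three standard measures on a made-up set of five baskets; `TRANSACTIONS` and the function names are hypothetical.

```python
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions with the antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


rule_support = support({"bread", "butter"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"butter"}, TRANSACTIONS)
rule_lift = lift({"bread"}, {"butter"}, TRANSACTIONS)
```

Here 3 of 5 baskets contain both items (support 0.6), and 3 of the 4 bread baskets contain butter (confidence 0.75). A lift above 1 means butter is bought more often with bread than on its own. Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds.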

4.11.3. Importance of Decision Trees


• Easy-to-understand model for classification and prediction.
• Handles both numerical and categorical data.
• Widely used in business decision-making.
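A one-level tree (a decision stump) is enough to show how a split is chosen and how numerical and categorical columns are both handled. This is a simplified sketch using Gini impurity on made-up loan data; a full decision tree would repeat the same split search recursively on each branch.

```python
from collections import Counter

# Hypothetical loan data: (income in thousands, owns_house, label)
ROWS = [
    (25, "no", "reject"),
    (30, "no", "reject"),
    (45, "yes", "approve"),
    (50, "no", "approve"),
    (60, "yes", "approve"),
    (28, "yes", "reject"),
]


def gini(labels):
    """Gini impurity: 0 means the group contains a single class."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def matches(row, column, value):
    """Numeric columns test <= value, categorical columns test == value."""
    if isinstance(value, (int, float)):
        return row[column] <= value
    return row[column] == value


def best_split(rows, n_features):
    """Try every column/value pair and keep the lowest weighted impurity."""
    best = None
    for column in range(n_features):
        for value in sorted({r[column] for r in rows}):
            left = [r for r in rows if matches(r, column, value)]
            right = [r for r in rows if not matches(r, column, value)]
            if not left or not right:
                continue
            score = (len(left) * gini([r[-1] for r in left])
                     + len(right) * gini([r[-1] for r in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, column, value)
    return best


def majority(rows):
    """Most common label in a group of rows."""
    return Counter(r[-1] for r in rows).most_common(1)[0][0]


def train_stump(rows, n_features):
    """Build a one-level tree: one test plus a label for each branch."""
    _, column, value = best_split(rows, n_features)
    left = [r for r in rows if matches(r, column, value)]
    right = [r for r in rows if not matches(r, column, value)]
    return {"column": column, "value": value,
            "left": majority(left), "right": majority(right)}


def predict(stump, row):
    branch = "left" if matches(row, stump["column"], stump["value"]) else "right"
    return stump[branch]


stump = train_stump(ROWS, n_features=2)
```

On this data the chosen rule is "income <= 30 → reject, otherwise approve", which can be read directly from `stump`; that readability is the main reason decision trees are popular in business settings.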
4.11.4. Importance of Text Analytics
• Extracts meaning from textual data (emails, reviews, social media).
• Enables sentiment analysis, opinion mining.
• Useful in customer feedback analysis.
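A very small lexicon-based scorer illustrates sentiment analysis on customer reviews. The word lists, the one-word negation rule, and the sample reviews are assumptions chosen for illustration; production systems rely on much larger lexicons or trained models.

```python
import re

# Tiny hypothetical sentiment lexicon; real systems use far larger resources.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "slow", "terrible", "hate"}
NEGATIONS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive minus negative words, flipping the word after a negation."""
    score, negate = 0, False
    for token in tokenize(text):
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        score += -polarity if negate else polarity
        negate = False
    return score


def label(text):
    """Map the numeric score to a sentiment label."""
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


REVIEWS = [
    "Great phone, fast delivery!",
    "The battery is not good and the camera is poor.",
    "It arrived on Tuesday.",
]
labels = [label(r) for r in REVIEWS]
```

The second review shows why simple word counting is not enough: "good" appears in it, but the preceding "not" reverses its meaning.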

4.12. Big Data Technologies


4.12.1. Importance of NoSQL
• Handles unstructured and semi-structured data.
• Provides scalability and flexibility.
4.12.2. Advantages of NoSQL
• Schema-less design.
• High availability and fault tolerance.
• Handles large-scale distributed systems.
4.12.3. Importance of NewSQL
• Combines relational model (SQL) with scalability of NoSQL.
• Supports real-time analytics.
4.12.4. Advantages of NewSQL
• High performance for OLTP systems.
• Scalability without losing ACID properties.
• Easier migration from traditional SQL systems.

4.12.5. HADOOP
4.12.5.1. Advantages
• Open-source and cost-effective.
• Scalable and fault-tolerant.

• Supports storage and processing of huge data.


4.12.5.2. Features
• Distributed storage (HDFS).
• Batch processing (MapReduce).
• High fault tolerance.
• Scalability across commodity hardware.
4.12.5.3. Versions
• Hadoop 1.x: Basic HDFS + MapReduce.
• Hadoop 2.x: YARN for resource management.
• Hadoop 3.x: Improved fault tolerance, containerization, erasure coding.
4.12.5.4. Components
1. HDFS (Hadoop Distributed File System) – Storage.
2. MapReduce – Processing framework.
3. YARN (Yet Another Resource Negotiator) – Resource management.
4. Common Utilities – Libraries and support services.
4.12.5.5. Hadoop Architecture
• HDFS Layer: Stores large datasets across clusters.
• YARN Layer: Manages resources and scheduling.
• MapReduce Layer: Processes data in parallel.
• Application Layer: Tools like Hive, Pig, Spark for analysis.

3 Marks Questions
4.1 Importance of Big Data Analytics
1. State any three benefits of Big Data Analytics.
2. Why is Big Data Analytics important in business?
3. Mention two real-world applications of Big Data Analytics.

4.2 Big Data Life Cycle


4. List the stages of Big Data Life Cycle.

5. What is the purpose of Data Visualization in Big Data Life Cycle?


6. Define Data Analysis in Big Data Life Cycle.

4.3 Methodology in Big Data Analytics


7. What is the first step in Big Data Analytics methodology?
8. Mention three common steps followed in Big Data Analytics methodology.
9. State the importance of data cleaning in analytics methodology.

4.4 Core Deliverables


10. List any three deliverables of a Big Data Analytics project.
11. What is a predictive model in analytics?
12. Define dashboards in Big Data Analytics.

4.5 Key Stakeholders


13. List any three stakeholders in Big Data Analytics.
14. Who are Data Engineers?
15. Mention the role of IT professionals in Big Data projects.

4.6 Responsibilities of Data Analyst


16. State any three responsibilities of a Data Analyst.
17. What is the role of a Data Analyst in data cleaning?
18. Why does a Data Analyst prepare dashboards?

4.7 Skills for Data Analyst


19. List three basic technical skills necessary for a Data Analyst.
20. Mention two non-technical skills of a Data Analyst.
21. Why is statistical knowledge important for a Data Analyst?

4.8 Importance of Data Scientist

22. Define the role of a Data Scientist.


23. Mention two reasons why Data Scientists are important.
24. State one difference between a Data Analyst and a Data Scientist.

4.9 Dealing with Big Data Projects


25. What is the first step in managing a Big Data project?
26. Define problem definition in Big Data Analytics.
27. Why is data cleansing necessary in Big Data projects?
28. What is meant by data summarizing?
29. State the purpose of Exploratory Data Analysis (EDA).
30. Why is data visualization important in Big Data projects?

4.10 Big Data Analytic Methods


31. State the importance of SQL in Big Data Analytics.
32. Why are charts and graphs used in data analysis?
33. List any three tools used for data analysis.

4.11 Advanced Methods


34. State one role of Machine Learning in Data Analytics.
35. Define association rule with an example.
36. Mention two advantages of using Decision Trees.
37. What is text analytics used for?

4.12 Big Data Technologies


38. Define NoSQL and mention one advantage.
39. List any two advantages of NoSQL.
40. State the importance of NewSQL.
41. List any two advantages of NewSQL.
42. Mention three advantages of Hadoop.

43. List any three features of Hadoop.


44. State the three main versions of Hadoop.
45. List the four core components of Hadoop.
46. What is the function of YARN in Hadoop?
47. State the role of HDFS in Hadoop architecture.

Detailed Answers
4.1 Importance of Big Data Analytics
Q1. State any three benefits of Big Data Analytics.
Ans)
• Improves decision-making by providing insights from large datasets.
• Identifies trends and patterns for better business strategies.
• Enhances operational efficiency and reduces costs.
Q2. Why is Big Data Analytics important in business?
Ans)
• Helps in understanding customer behavior.
• Supports data-driven decision-making.
• Provides competitive advantage by predicting market trends.
Q3. Mention two real-world applications of Big Data Analytics.
Ans)
• Fraud detection in banking and finance.
• Recommendation systems on e-commerce and streaming platforms (e.g., Amazon, Netflix).

4.2 Big Data Life Cycle


Q4. List the stages of Big Data Life Cycle.
Ans)
• Data Collection
• Data Storage
• Data Processing
• Data Analysis

• Data Visualization
• Decision Making
Q5. What is the purpose of Data Visualization in Big Data Life Cycle?
Ans)
• Converts complex data into easy-to-understand visual formats.
• Helps stakeholders quickly interpret data insights.
• Supports faster decision-making.
Q6. Define Data Analysis in Big Data Life Cycle.
Ans)
• The process of examining, transforming, and modeling data.
• Extracts useful insights, trends, and patterns.
• Supports informed business decisions.

4.3 Methodology in Big Data Analytics


Q7. What is the first step in Big Data Analytics methodology?
Ans)
• Define the objective or problem statement clearly.
Q8. Mention three common steps followed in Big Data Analytics methodology.
Ans)
• Data Collection from various sources.
• Data Cleaning and preparation.
• Data Analysis using statistical or machine learning techniques.
Q9. State the importance of data cleaning in analytics methodology.
Ans)
• Removes errors, duplicates, and inconsistencies.
• Ensures accuracy and reliability of analysis.
• Improves quality of insights and predictions.

4.4 Core Deliverables


Q10. List any three deliverables of a Big Data Analytics project.
Ans)

• Processed and clean data.


• Analytical reports or insights.
• Predictive models or dashboards.
Q11. What is a predictive model in analytics?
Ans)
• A statistical or machine learning model used to forecast future outcomes.
• Helps in decision-making based on historical data.
Q12. Define dashboards in Big Data Analytics.
Ans)
• Visual interface displaying key metrics and insights.
• Helps in real-time monitoring and reporting.

4.5 Key Stakeholders


Q13. List any three stakeholders in Big Data Analytics.
Ans)
• Business Managers
• Data Engineers
• Data Analysts
Q14. Who are Data Engineers?
Ans)
• Professionals who design, build, and maintain data pipelines.
• Responsible for data storage, retrieval, and processing.
Q15. Mention the role of IT professionals in Big Data projects.
Ans)
• Manage infrastructure and hardware.
• Ensure security, availability, and scalability of data systems.

4.6 Responsibilities of Data Analyst


Q16. State any three responsibilities of a Data Analyst.
Ans)
• Collect and interpret data from multiple sources.

• Clean and validate datasets.


• Prepare dashboards and visualizations for insights.
Q17. What is the role of a Data Analyst in data cleaning?
Ans)
• Remove duplicates, errors, and inconsistencies.
• Handle missing or incomplete data.
• Ensure accuracy for analysis.
Q18. Why does a Data Analyst prepare dashboards?
Ans)
• To visually present key insights.
• To make data understandable for stakeholders.
• To support decision-making with real-time metrics.

4.7 Skills for Data Analyst


Q19. List three basic technical skills necessary for a Data Analyst.
Ans)
• SQL for data querying.
• Excel for data manipulation.
• Programming in R or Python for analysis.
Q20. Mention two non-technical skills of a Data Analyst.
Ans)
• Critical thinking for problem-solving.
• Communication skills for presenting results.
Q21. Why is statistical knowledge important for a Data Analyst?
Ans)
• Helps in understanding data distributions and trends.
• Supports hypothesis testing and predictive modeling.
• Aids in making informed decisions.

4.8 Importance of Data Scientist

Q22. Define the role of a Data Scientist.


Ans)
• Applies statistical, programming, and machine learning techniques.
• Extracts insights from large datasets.
• Develops predictive and analytical models.
Q23. Mention two reasons why Data Scientists are important.
Ans)
• Automates data-driven decision-making.
• Extracts hidden patterns from large and complex datasets.
Q24. State one difference between a Data Analyst and a Data Scientist.
Ans)
• Data Analyst: Focuses on analyzing existing data.
• Data Scientist: Builds predictive models and handles advanced analytics.

4.9 Dealing with Big Data Projects


Q25. What is the first step in managing a Big Data project?
Ans)
• Clearly define objectives and project goals.
Q26. Define problem definition in Big Data Analytics.
Ans)
• Translate business problems into data questions.
• Specify what insights are required from the data.
Q27. Why is data cleansing necessary in Big Data projects?
Ans)
• Removes errors, duplicates, and inconsistencies.
• Ensures accuracy and reliability of analysis.
Q28. What is meant by data summarizing?
Ans)
• Providing statistical summaries like mean, median, and variance.
• Helps in understanding overall trends in data.
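As a small illustration of data summarizing, the sketch below uses Python's standard `statistics` module. The `sales` figures are invented for the example, and population variance is used here as an assumption (sample variance would use `statistics.variance` instead).

```python
import statistics

# Hypothetical daily sales figures
sales = [120, 150, 150, 170, 210]

summary = {
    "mean": statistics.mean(sales),           # average value
    "median": statistics.median(sales),       # middle value when sorted
    "variance": statistics.pvariance(sales),  # population variance (spread)
}
print(summary)
```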
Q29. State the purpose of Exploratory Data Analysis (EDA).
Ans)
• Identify patterns, correlations, and outliers in data.
• Helps in understanding data distribution.
• Guides further modeling and analysis.
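Two common EDA checks, correlation and outlier detection, can be sketched as follows. This is a minimal example and not a complete EDA workflow: the data values are invented, and the 1.5 × IQR rule for outliers is one common convention chosen here as an assumption.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data: revenue rises linearly with ad spend
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 45, 65, 85, 105]
r = pearson(ad_spend, revenue)

# Hypothetical order values with one unusually large order
order_values = [12, 14, 15, 15, 16, 17, 18, 95]
outliers = iqr_outliers(order_values)

print(round(r, 3), outliers)
```

A correlation close to 1 suggests a strong positive linear relationship; the flagged outlier would normally be inspected before deciding whether to keep or remove it.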
Q30. Why is data visualization important in Big Data projects?
Ans)
• Makes insights easy to understand.
• Supports faster and better decision-making.
• Communicates results to stakeholders effectively.

4.10 Big Data Analytic Methods


Q31. State the importance of SQL in Big Data Analytics.
Ans)
• Query and retrieve structured data efficiently.
• Perform filtering, joining, and aggregation of data.
• Supports integration with analytics tools.
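The filtering, joining, and aggregation mentioned above can be tried out with Python's built-in `sqlite3` module. The table names, columns, and rows below are hypothetical and exist only for this sketch; a real Big Data system would run similar SQL on a much larger engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary in-memory database
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 400.0), (4, 2, 20.0);
""")

rows = conn.execute("""
SELECT c.city, SUM(o.amount) AS total          -- aggregation
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id    -- joining
WHERE o.amount >= 50                           -- filtering
GROUP BY c.city
ORDER BY total DESC
""").fetchall()
conn.close()

print(rows)
```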
Q32. Why are charts and graphs used in data analysis?
Ans)
• Convert complex data into visual form.
• Help identify trends and patterns quickly.
• Make data insights understandable for stakeholders.
Q33. List any three tools used for data analysis.
Ans)
• R Programming
• Python
• MATLAB

4.11 Advanced Methods


Q34. State one role of Machine Learning in Data Analytics.
Ans)
• Automates prediction and classification tasks.
• Improves accuracy using algorithms like regression and clustering.
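As an illustration of a prediction task, the sketch below fits a straight line by ordinary least squares (simple linear regression) in plain Python. The study-hours and marks data are invented, and real projects would normally use a library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical training data: study hours vs. marks
hours = [1, 2, 3, 4, 5]
marks = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, marks)
predicted = slope * 6 + intercept   # predict marks for 6 hours of study
print(slope, intercept, predicted)
```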
Q35. Define association rule with an example.
Ans)
• Identifies relationships between items that frequently occur together in datasets.
• Example: “If a customer buys bread, they are likely to also buy butter.”
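The strength of a rule such as bread → butter is usually measured with support and confidence. The sketch below computes both for a small set of invented transactions; the item names and baskets are assumptions made for illustration.

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

sup = support({"bread", "butter"})          # 3 of 5 baskets
conf = confidence({"bread"}, {"butter"})    # 3 of the 4 bread baskets
print(round(sup, 2), round(conf, 2))
```

Here the rule holds in 3 of the 4 baskets that contain bread, so its confidence is about 0.75.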
Q36. Mention two advantages of using Decision Trees.
Ans)
• Easy to interpret and visualize.
• Can handle both numerical and categorical data.
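A decision tree can be read as a chain of if/else tests, which is why it is easy to interpret. The tree below is written by hand for illustration only (it is not learned from data), and the feature names and thresholds are hypothetical. It shows one numerical test and one categorical test.

```python
def loan_decision(applicant):
    """Hand-written two-level decision tree (illustrative thresholds)."""
    if applicant["income"] >= 50000:             # numerical feature
        return "approve"
    if applicant["employment"] == "salaried":    # categorical feature
        return "review"
    return "reject"

decisions = [
    loan_decision({"income": 60000, "employment": "self-employed"}),
    loan_decision({"income": 30000, "employment": "salaried"}),
    loan_decision({"income": 30000, "employment": "self-employed"}),
]
print(decisions)
```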
Q37. What is text analytics used for?
Ans)
• Extract insights from textual data like emails, reviews, or social media.
• Used for sentiment analysis and opinion mining.
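A very simple form of sentiment analysis counts positive and negative words in a text. The word lists and reviews below are made up for this sketch; practical systems use much larger lexicons or trained models.

```python
import re

# Tiny hypothetical sentiment lexicons
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "Great phone, I love the camera",
    "Terrible battery and poor support",
    "Delivered on Monday",
]
labels = [sentiment(r) for r in reviews]
print(labels)
```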

4.12 Big Data Technologies


Q38. Define NoSQL and mention one advantage.
Ans)
• Non-relational database for unstructured/semi-structured data.
• Advantage: Flexible schema design.
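Flexible schema means that records in the same collection need not share the same fields. The sketch below imitates a document store with plain Python dictionaries; the `find` helper and all field names are hypothetical and do not correspond to any particular NoSQL product's API.

```python
# A "collection" of documents; each document may have different fields
collection = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "address": {"city": "Hyderabad"}},
]

def find(docs, **criteria):
    """Return documents whose fields match all given key/value pairs."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# A new document with a field no other document has; no schema change is needed
collection.append({"_id": 3, "name": "Meena", "loyalty_points": 120})

matches = find(collection, name="Ravi")
print(matches)
```

In a relational table, adding `loyalty_points` would normally require altering the table definition first.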
Q39. List any two advantages of NoSQL.
Ans)
• High scalability and performance.
• Fault-tolerant and distributed system support.
Q40. State the importance of NewSQL.
Ans)
• Combines relational SQL features with scalability of NoSQL.
• Supports real-time analytics and transactions.
Q41. List any two advantages of NewSQL.
Ans)
• Maintains ACID properties with high performance.
• Scalable for large data workloads.
Q42. Mention three advantages of Hadoop.
Ans)
• Open-source and cost-effective.
• Highly scalable and fault-tolerant.
• Processes large datasets efficiently.
Q43. List any three features of Hadoop.
Ans)
• Distributed storage using HDFS.
• Batch processing with MapReduce.
• High fault tolerance and scalability.
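The MapReduce idea can be shown with the classic word-count example. The sketch below runs on a single machine in plain Python and only imitates the map, shuffle, and reduce phases; it does not use the Hadoop API, and the function names and input lines are assumptions.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key."""
    return key, sum(values)

lines = ["big data needs big storage", "hadoop stores big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In Hadoop, the map and reduce tasks would run in parallel on different nodes, with the framework handling the shuffle between them.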
Q44. State the three main versions of Hadoop.
Ans)
• Hadoop 1.x: Basic HDFS and MapReduce.
• Hadoop 2.x: Introduced YARN for resource management.
• Hadoop 3.x: Added erasure coding in HDFS, improved fault tolerance (multiple standby NameNodes), and better container support in YARN.
Q45. List the four core components of Hadoop.
Ans)
• HDFS (Hadoop Distributed File System)
• MapReduce
• YARN (Yet Another Resource Negotiator)
• Hadoop Common (shared libraries and utilities used by the other modules)
Q46. What is the function of YARN in Hadoop?
Ans)
• Manages cluster resources.
• Schedules and monitors tasks.
• Ensures efficient utilization of resources.
Q47. State the role of HDFS in Hadoop architecture.
Ans)
• Provides distributed storage for large datasets.
• Splits data into blocks and stores across multiple nodes.
• Ensures fault tolerance and high availability.
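The block-and-replica idea can be sketched as below. The 128 MB block size and replication factor of 3 are HDFS's usual defaults, but the round-robin placement and node names are simplifying assumptions; real HDFS uses a rack-aware placement policy.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file is split into."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks

def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

blocks = split_into_blocks(300)   # a hypothetical 300 MB file
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
print(blocks)
print(placement)
```

Because every block lives on three different nodes, the file stays readable even if one node fails.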


10 Marks Questions
Very Very Important Questions (VVI)
1. Explain the Big Data Life Cycle in detail with a diagram.
2. Describe the methodology of Big Data Analytics and explain each step.
3. Explain Hadoop architecture, its components, and features in detail.
4. Discuss the role of Machine Learning in Big Data Analytics with examples.

Very Important Questions (VI)


5. Explain the responsibilities of a Data Analyst and the skills required.
6. Discuss the importance of Data Scientist and their role in Big Data projects.
7. Explain the steps involved in managing a Big Data Analytics project, including data collection,
cleansing, exploration, and visualization.

Important Questions (I)


8. Explain NoSQL and NewSQL databases, their importance, and advantages.
9. Describe Big Data Analytic methods, including the importance of SQL, charts, graphs, and data
analysis tools like R, Python, MATLAB, and SPSS.
10. Explain the concept of association rules, decision trees, and text analytics in Big Data, and their
applications.


Chapter-5- Cloud Computing


5.1 What is Cloud Computing
• Cloud Computing is a technology that allows users to access computing resources
such as servers, storage, applications, and services over the internet on a pay-as-you-
go basis.
• Instead of owning physical infrastructure, organizations can rent or subscribe to
resources from cloud service providers.
• It provides on-demand access, scalability, and flexibility.

5.2 Advantages of Cloud Computing


1. Cost Efficiency – Reduces hardware and maintenance cost.
2. Scalability – Resources can be increased or decreased easily.
3. Accessibility – Access from anywhere, anytime via the internet.
4. Disaster Recovery – Data backup and recovery are simplified.
5. Collaboration – Multiple users can work on shared resources.
6. Automatic Updates – Software and security updates managed by providers.

5.3 Disadvantages of Cloud Computing


1. Internet Dependency – Requires stable internet.
2. Security Issues – Data stored on third-party servers may be vulnerable.
3. Downtime – Service outages may affect availability.
4. Limited Control – Users rely on providers for updates and management.
5. Hidden Costs – Long-term costs may be higher than owning infrastructure.

5.4 Evolution of Cloud Computing


• 1960s – Concept of time-sharing in mainframes introduced.
• 1990s – Internet growth led to Application Service Providers (ASPs).
• 2000s – Amazon launched AWS (2006), popularizing cloud services.
• 2010s onwards – Widespread adoption of IaaS, PaaS, SaaS models with providers like
AWS, Azure, Google Cloud.

5.5 NIST Visual Model of Cloud Computing


According to NIST (National Institute of Standards and Technology):
• Five Essential Characteristics: On-demand self-service, Broad network access,
Resource pooling, Rapid elasticity, Measured service.
• Three Service Models: IaaS, PaaS, SaaS.
• Four Deployment Models: Private Cloud, Public Cloud, Hybrid Cloud, Community
Cloud.
(Draw diagram with three layers of service models and deployment models around them.)

5.6 Features of Cloud Computing


1. On-demand service
2. Elasticity and scalability
3. Pay-per-use model
4. Multi-tenancy
5. Security and reliability
6. Broad network access
7. Virtualization

5.7 Components of Cloud Computing


1. Front-end – Client side interface (browser, app).
2. Back-end – Servers, databases, storage, and virtualization.
3. Middleware – Bridges communication between front and back ends.
4. Cloud Network – Internet backbone enabling connectivity.
5. Cloud Storage – Databases and file storage systems.

5.8 Cloud Computing Technologies


1. Virtualization – Running multiple OS/applications on the same hardware.
2. Service-Oriented Architecture (SOA) – Software components as services.
3. Grid Computing – Distributed computing resources acting together.
4. Utility Computing – Pay-as-you-go resource model.
5. Autonomic Computing – Self-managing systems.

5.9 Service Models in Cloud Computing


1. IaaS (Infrastructure as a Service) – Virtualized hardware resources (e.g., AWS EC2).
2. PaaS (Platform as a Service) – Platforms for developing and deploying applications
(e.g., Google App Engine).
3. SaaS (Software as a Service) – Software applications delivered via the internet (e.g.,
Gmail, Office 365).

5.10 Comparison of Service Models

Feature      | IaaS           | PaaS              | SaaS
User Control | High           | Medium            | Low
Example      | AWS EC2        | Google App Engine | Gmail
Focus        | Infrastructure | Development       | End-user apps
Cost         | Pay for usage  | Subscription      | Subscription

5.11 Deployment Models (Types of Clouds)


1. Private Cloud – Dedicated to one organization.
2. Public Cloud – Shared infrastructure provided by third parties.
3. Hybrid Cloud – Combination of public and private.
4. Community Cloud – Shared by a group with common requirements.

5.12 Private Cloud vs Public Cloud

Aspect      | Private Cloud       | Public Cloud
Ownership   | Single organization | Cloud provider
Security    | High                | Medium
Cost        | Expensive           | Cost-effective
Scalability | Limited             | High

5.13 Traditional Data Center vs Cloud Storage

Feature       | Traditional Data Center | Cloud Storage
Ownership     | Organization-owned      | Third-party
Cost          | High upfront            | Pay-per-use
Scalability   | Limited                 | Flexible
Maintenance   | Manual                  | Provider-managed
Accessibility | Local                   | Anywhere via internet

5.14 Data Management in Cloud (DBaaS)


• DBaaS (Database as a Service): A managed database service where users can access
and use databases without managing hardware or software.
• Examples: Amazon RDS, Google Cloud SQL, Microsoft Azure SQL Database.
• Features: Auto backup, high availability, scalability, and security.

5.15 Security Concepts in Cloud


1. Data Encryption – Securing data in transit and at rest.
2. Identity & Access Management (IAM) – Controlling user access.
3. Firewalls – Protecting cloud environment from external threats.
4. Intrusion Detection Systems – Monitoring unauthorized access.
5. Compliance – Meeting standards like GDPR, HIPAA.

5.16 Types of Cloud Simulators


1. CloudSim – Open-source framework for modeling and simulation.
2. GreenCloud – Focuses on energy-aware cloud simulation.
3. iCanCloud – Simulates cloud storage and virtualization.
4. GroudSim – Event-based simulation tool.

5.17 Importance of Cloud Simulators


• Allow testing of cloud applications before deployment.
• Reduce cost by avoiding real hardware setup.
• Help in research and development.
• Useful for teaching and training in cloud computing.
• Provide performance, cost, and energy efficiency evaluation.

3 Marks Questions
5.1 What is cloud computing
1. Define cloud computing.
2. Give two examples of cloud computing services.
5.2 Advantages of cloud computing
3. List any three advantages of cloud computing.
4. Why is cloud computing cost-effective?
5.3 Disadvantages of cloud computing
5. List any three disadvantages of cloud computing.
6. Why is cloud computing dependent on internet connectivity?
5.4 Evolution of cloud computing
7. Write a short note on the evolution of cloud computing.
8. Mention two milestones in the development of cloud computing.
5.5 NIST Visual Model of Cloud Computing
9. State the five essential characteristics of NIST model.
10. Name three service models in NIST cloud computing model.
5.6 Features of Cloud computing
11. List any three features of cloud computing.
12. What is meant by “Pay-per-use” feature in cloud computing?
5.7 Components of Cloud computing
13. List the main components of cloud computing.
14. What is the role of front-end in cloud computing?
5.8 Cloud computing technologies
15. List any three technologies used in cloud computing.
16. What is virtualization in cloud computing?
5.9 Service models in cloud computing
17. List three service models of cloud computing.
18. Give one example each of IaaS, PaaS, and SaaS.
5.10 Comparison of service models
19. Differentiate between IaaS and SaaS (any three points).
20. Which service model provides maximum control to the user?
5.11 Deployment models
21. List four types of cloud deployment models.
22. What is a hybrid cloud?
5.12 Private cloud vs Public cloud
23. Give three differences between private cloud and public cloud.
24. Which cloud is more secure: private or public? Why?
5.13 Traditional data center vs Cloud storage
25. List three differences between traditional data center and cloud storage.
26. Why is cloud storage more flexible than data centers?
5.14 Data Management in Cloud (DBaaS)
27. What is DBaaS? Give two examples.
28. List two advantages of DBaaS.
5.15 Security concepts in Cloud
29. List any three security concepts in cloud computing.
30. What is IAM in cloud security?
5.16 Types of cloud simulators
31. List any three cloud simulators.
32. What is CloudSim used for?
5.17 Importance of cloud simulators
33. State the importance of cloud simulators.
34. Why are cloud simulators useful in research?

Detailed Answers
5.1 What is cloud computing
Q1. Define cloud computing
Ans)
1. Cloud computing is a technology that delivers computing services over the internet.
2. It provides resources such as servers, storage, databases, software, and applications on demand.
3. Users can access these resources without owning physical infrastructure, paying only for what they
use.
Q2. Give two examples of cloud computing services
Ans)
1. Google Drive – for cloud storage and file sharing.
2. Amazon Web Services (AWS) – provides IaaS, PaaS, and SaaS solutions.

5.2 Advantages of cloud computing


Q3. List any three advantages of cloud computing
Ans)
1. Cost-effective – reduces hardware and maintenance costs.
2. Scalability – resources can be increased or decreased easily.
3. Accessibility – users can access data and applications from anywhere via the internet.
Q4. Why is cloud computing cost-effective?
Ans)
1. Eliminates the need to buy and maintain physical hardware.
2. Users pay only for the resources they consume (pay-per-use).
3. Reduces costs of software updates and IT staff maintenance.

5.3 Disadvantages of cloud computing


Q5. List any three disadvantages of cloud computing
Ans)
1. Requires stable internet connection.
2. Security and privacy risks as data is stored on third-party servers.
3. Limited control over infrastructure and software updates.
Q6. Why is cloud computing dependent on internet connectivity?
Ans)
1. Cloud services are accessed over the internet.
2. Without internet, users cannot access applications or stored data.
3. Service performance depends on internet speed and reliability.

5.4 Evolution of cloud computing


Q7. Write a short note on the evolution of cloud computing
Ans)
1. 1960s – Time-sharing in mainframes introduced.
2. 1990s – Application Service Providers (ASP) provided software via internet.
3. 2000s – Cloud services popularized by AWS and other providers.
4. 2010s onwards – Wide adoption of IaaS, PaaS, SaaS models.
Q8. Mention two milestones in the development of cloud computing
Ans)
1. Launch of Amazon Web Services (AWS) in 2006.
2. Introduction of Google App Engine as a PaaS platform in 2008.

5.5 NIST Visual Model of Cloud Computing


Q9. State the five essential characteristics of NIST model
Ans)
1. On-demand self-service – Users can automatically provision resources.
2. Broad network access – Accessible via the internet on multiple devices.
3. Resource pooling – Resources shared among multiple users.
4. Rapid elasticity – Resources can scale up or down quickly.
5. Measured service – Resource usage is monitored and billed accordingly.
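Rapid elasticity can be pictured as a rule that adjusts the number of running instances to the current load. The sketch below is a simplified illustration: the capacity per instance, the instance limits, and the load values are all assumed numbers, not figures from any provider.

```python
import math

def instances_needed(requests_per_sec, capacity_per_instance=100,
                     min_instances=1, max_instances=10):
    """Scale the instance count with load, within fixed lower and upper limits."""
    wanted = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, wanted))

# Hypothetical load (requests per second) at five points in a day
load = [40, 250, 980, 1500, 120]
plan = [instances_needed(r) for r in load]
print(plan)
```

The count rises during the peak and falls again afterwards, so resources follow demand instead of being fixed in advance.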
Q10. Name three service models in NIST cloud computing model
Ans)
1. IaaS – Infrastructure as a Service
2. PaaS – Platform as a Service
3. SaaS – Software as a Service

5.6 Features of Cloud computing


Q11. List any three features of cloud computing
Ans)
1. On-demand self-service – Users can provision resources without human intervention.
2. Scalability – Resources can scale up or down as needed.
3. Multi-tenancy – Multiple users can share resources securely.
Q12. What is meant by “Pay-per-use” feature in cloud computing?
Ans)
1. Users are billed only for the resources they actually use.
2. Helps in cost savings as there is no need to invest in idle infrastructure.
3. Enables flexible budgeting for organizations.
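A rough cost comparison shows how pay-per-use billing works. All prices, usage hours, and the one-year period below are invented for illustration; actual cloud pricing varies by provider and service.

```python
def pay_per_use_cost(hours_used, rate_per_hour):
    """Cloud bill: pay only for the hours actually consumed."""
    return hours_used * rate_per_hour

def owned_server_cost(purchase_price, monthly_maintenance, months):
    """Owned hardware: pay upfront plus maintenance, whether used or idle."""
    return purchase_price + monthly_maintenance * months

months = 12
# Hypothetical figures in rupees
cloud_total = pay_per_use_cost(hours_used=200, rate_per_hour=8) * months
owned_total = owned_server_cost(purchase_price=150000,
                                monthly_maintenance=2000, months=months)
print(cloud_total, owned_total)
```

With light usage the cloud bill is far lower. With very heavy, constant usage the comparison can reverse, which is the "hidden costs" disadvantage listed in section 5.3.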


5.7 Components of Cloud computing


Q13. List the main components of cloud computing
Ans)
1. Front-end – Client devices and interface to access cloud.
2. Back-end – Servers, storage systems, databases, virtualization software.
3. Middleware – Software to connect front-end and back-end.
4. Cloud Network – Internet connectivity enabling resource access.
5. Cloud Storage – Data storage and management systems.
Q14. What is the role of front-end in cloud computing?
Ans)
1. Front-end is the client-side interface to access cloud services.
2. Can include web browsers, mobile apps, or desktop applications.
3. Facilitates interaction between the user and cloud resources.

5.8 Cloud computing technologies


Q15. List any three technologies used in cloud computing
Ans)
1. Virtualization – Allows multiple virtual systems on a single physical machine.
2. Service-Oriented Architecture (SOA) – Software delivered as reusable services.
3. Grid Computing – Distributed computing resources working together.
Q16. What is virtualization in cloud computing?
Ans)
1. Virtualization creates virtual versions of physical resources like servers or storage.
2. Allows multiple operating systems and applications to run on a single physical machine.
3. Improves resource utilization and reduces hardware costs.

5.9 Service models in cloud computing


Q17. List three service models of cloud computing
Ans)
1. IaaS – Infrastructure as a Service
2. PaaS – Platform as a Service
3. SaaS – Software as a Service
Q18. Give one example each of IaaS, PaaS, and SaaS
Ans)
1. IaaS – Amazon EC2
2. PaaS – Google App Engine
3. SaaS – Gmail

5.10 Comparison of service models


Q19. Differentiate between IaaS and SaaS (any three points)
Ans)
1. IaaS provides virtualized hardware; SaaS provides software applications.
2. IaaS gives high control to users; SaaS gives low control.
3. IaaS is used by IT administrators/developers; SaaS is used by end-users.
Q20. Which service model provides maximum control to the user?
Ans)
1. IaaS provides maximum control.
2. Users can manage operating systems, storage, and deployed applications.
3. Suitable for organizations needing full control over infrastructure.

5.11 Deployment models


Q21. List four types of cloud deployment models
Ans)
1. Private Cloud
2. Public Cloud
3. Hybrid Cloud
4. Community Cloud
Q22. What is a hybrid cloud?
Ans)
1. A combination of private and public cloud.
2. Allows critical data to stay in private cloud and less-sensitive data on public cloud.
3. Provides flexibility, cost efficiency, and better resource utilization.

5.12 Private cloud vs Public cloud


Q23. Give three differences between private cloud and public cloud
Ans)
1. Private cloud is owned and used by a single organization; public cloud is owned by a provider and shared among many customers.
2. Security is higher in private cloud; moderate in public cloud.
3. Cost is higher in private cloud; lower in public cloud.
Q24. Which cloud is more secure: private or public? Why?
Ans)
1. Private cloud is more secure.
2. Dedicated resources reduce risk of data leakage.
3. Organization controls security policies and access.

5.13 Traditional data center vs Cloud storage


Q25. List three differences between traditional data center and cloud storage
Ans)
1. Traditional data center is organization-owned; cloud storage is owned by a third-party provider.
2. Scalability is limited in data centers; cloud storage is highly scalable.
3. Maintenance is manual in data centers; managed by provider in cloud.
Q26. Why is cloud storage more flexible than data centers?
Ans)
1. Resources can be scaled up or down on demand.
2. Users can access data from anywhere via the internet.
3. Reduces need for physical hardware and manual upgrades.

5.14 Data Management in Cloud (DBaaS)


Q27. What is DBaaS? Give two examples
Ans)
1. DBaaS (Database as a Service) provides managed database solutions on cloud.
2. Users do not manage hardware or software; provider handles maintenance.
3. Examples: Amazon RDS, Google Cloud SQL
Q28. List two advantages of DBaaS
Ans)
1. Automatic backup and recovery of data.
2. Easy scalability without hardware investment.

5.15 Security concepts in Cloud


Q29. List any three security concepts in cloud computing
Ans)
1. Data Encryption – Protects data in transit and at rest.
2. Identity & Access Management (IAM) – Controls user access.
3. Firewalls – Prevent unauthorized access to cloud resources.
Q30. What is IAM in cloud security?
Ans)
1. IAM stands for Identity and Access Management.
2. It manages user authentication and authorization.
3. Ensures only authorized users can access resources.
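The two halves of IAM, authentication (who are you?) and authorization (what may you do?), can be sketched with Python's standard library. The user, roles, permissions, and fixed salt below are hypothetical and chosen only for this example; a real cloud IAM service is far more elaborate.

```python
import hashlib
import hmac

def hash_password(password, salt):
    """Derive a password hash with PBKDF2 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Hypothetical role and user tables
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "analyst": {"read"},
}
USERS = {"asha": {"salt": b"demo-salt", "role": "analyst"}}
USERS["asha"]["pw_hash"] = hash_password("s3cret", USERS["asha"]["salt"])

def authenticate(user, password):
    """Authentication: verify the user's identity."""
    record = USERS.get(user)
    if record is None:
        return False
    candidate = hash_password(password, record["salt"])
    return hmac.compare_digest(record["pw_hash"], candidate)

def authorize(user, action):
    """Authorization: check whether the user's role allows the action."""
    role = USERS[user]["role"]
    return action in ROLE_PERMISSIONS.get(role, set())

print(authenticate("asha", "s3cret"), authorize("asha", "read"),
      authorize("asha", "delete"))
```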

5.16 Types of cloud simulators


Q31. List any three cloud simulators
Ans)
1. CloudSim
2. GreenCloud
3. iCanCloud
Q32. What is CloudSim used for?
Ans)
1. CloudSim is used to simulate cloud computing environments.
2. Helps in modeling and testing cloud applications before deployment.
3. Reduces cost and effort of real hardware setup.
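To give a feel for what a cloud simulator computes, the sketch below assigns tasks to virtual machines in round-robin order and works out when each task finishes. It is a toy model written for these notes and does not use CloudSim's actual API (CloudSim itself is a Java framework); the task lengths and VM speeds are assumed values.

```python
def simulate(task_lengths_mi, vm_mips):
    """Round-robin tasks onto VMs; return each task's finish time and the makespan.

    task_lengths_mi: task sizes in million instructions (MI)
    vm_mips: VM speeds in million instructions per second (MIPS)
    """
    vm_free_at = [0.0] * len(vm_mips)       # time at which each VM becomes idle
    finish_times = []
    for i, length in enumerate(task_lengths_mi):
        vm = i % len(vm_mips)               # round-robin assignment
        vm_free_at[vm] += length / vm_mips[vm]
        finish_times.append(vm_free_at[vm])
    return finish_times, max(vm_free_at)

# Four hypothetical tasks on one fast VM and one slow VM
finish, makespan = simulate([4000, 2000, 1000, 3000], [1000, 500])
print(finish, makespan)
```

Changing the VM speeds or the assignment rule and re-running shows how a scheduling policy affects total completion time, without any real hardware.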


5.17 Importance of cloud simulators


Q33. State the importance of cloud simulators
Ans)
1. Enable testing of cloud applications without real infrastructure.
2. Useful for research and development of cloud systems.
3. Helps evaluate performance, cost, and energy efficiency.
Q34. Why are cloud simulators useful in research?
Ans)
1. Allow experimentation with different cloud scenarios safely.
2. Reduce need for expensive hardware and maintenance.
3. Provide insights into cloud system behavior and optimization.

10 Marks Questions
Very Very Important Questions (VVIQ)
1. Explain the evolution of cloud computing with a diagram.
2. Draw and explain the NIST visual model of cloud computing.
3. Explain different service models (IaaS, PaaS, SaaS) of cloud computing with examples and
comparison.

Very Important Questions (VIQ)


4. Explain different deployment models of cloud computing with advantages and disadvantages.
5. Compare traditional data center and cloud storage with a neat table.
6. Explain the components of cloud computing with a neat diagram.
7. Explain data management in cloud (DBaaS) and its benefits.

Important Questions (IQ)


8. Explain features and advantages of cloud computing.
9. Explain cloud security concepts and their importance in detail.
10. List and explain cloud computing technologies and their roles.
