Adbms Notes

Uploaded by

Aditya Chavan

(a) What is NoSQL database? Explain two types of NoSQL databases.

[6]
NoSQL Database:
NoSQL (Not Only SQL) databases are non-relational databases designed to handle large
volumes of unstructured, semi-structured, or structured data. They provide flexible schemas,
horizontal scalability, and high performance, making them suitable for big data and real-time
web applications.
Two Types of NoSQL Databases:
1. Document-Oriented Databases:
o Store data in the form of documents (usually JSON, BSON, or XML).
o Each document is a self-contained unit of data with a flexible schema.
o Example: MongoDB
o Use Case: Content management systems, real-time analytics.
2. Key-Value Stores:
o Store data as a collection of key-value pairs.
o Keys are unique identifiers; values can be any type of data.
o Example: Redis, Riak
o Use Case: Caching, session management, user preference storage.
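The difference between the two models can be sketched with plain Python structures. This is only an illustration, not how MongoDB or Redis work internally; the keys, field names, and values are hypothetical.

```python
# Key-value store: the store only understands the key; the value is opaque.
kv_store = {}
kv_store["session:42"] = "user=alice;theme=dark"
session = kv_store.get("session:42")

# Document store: each document is self-contained, and documents in the
# same collection may have different fields (flexible schema).
documents = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["db", "nosql"]},
    {"_id": 2, "title": "Release notes", "author": {"name": "Alice"}},
]

# Unlike a key-value lookup, a document store can query inside the documents.
tagged = [d["title"] for d in documents if "nosql" in d.get("tags", [])]

print(session)
print(tagged)
```

The key-value lookup needs the exact key, while the document query filters on a field inside each document.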

(b) Compare Relational and NoSQL Databases. [6]

| Feature | Relational Databases (RDBMS) | NoSQL Databases |
| Data Model | Table-based (rows and columns) | Document, key-value, graph, column |
| Schema | Fixed schema | Dynamic or flexible schema |
| Scalability | Vertically scalable | Horizontally scalable |
| Transactions | Support ACID properties | May not fully support ACID; use BASE |
| Query Language | SQL | Varies (MongoDB uses JSON-like query) |
| Best for | Structured data with complex queries | Unstructured or semi-structured data |
| Examples | MySQL, PostgreSQL, Oracle | MongoDB, Redis, Cassandra, Neo4j |

(c) What is JSON? Explain features and data types of JSON. [8]
JSON (JavaScript Object Notation):
JSON is a lightweight data interchange format that is easy for humans to read and write and
easy for machines to parse and generate. It is commonly used for transmitting data in web
applications.
Features of JSON:
1. Lightweight: Minimal format, ideal for data exchange.
2. Readable: Human-readable and easy to understand.
3. Language Independent: Though derived from JavaScript, supported by many
languages.
4. Structured Data: Stores data in key-value pairs.
5. Supports Nesting: Allows arrays and objects to be nested within each other.
Data Types in JSON:
1. String: Text enclosed in double quotes.
Example: "name": "Alice"
2. Number: Integer or floating-point number.
Example: "age": 25
3. Boolean: The literal values true or false (lowercase, unquoted).
Example: "isStudent": true
4. Array: Ordered list of values.
Example: "hobbies": ["reading", "gaming"]
5. Object: Collection of key-value pairs.
Example: "address": {"city": "Pune", "pin": 411001}
6. Null: Represents empty or unknown value.
Example: "middleName": null
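The six data types above can be checked with Python's standard json module. This is a small sketch; the record simply combines the example values from the list into one hypothetical object.

```python
import json

# One JSON object that uses all six data types listed above.
text = """
{
  "name": "Alice",
  "age": 25,
  "isStudent": true,
  "hobbies": ["reading", "gaming"],
  "address": {"city": "Pune", "pin": 411001},
  "middleName": null
}
"""

record = json.loads(text)  # parse JSON text into Python objects

# Each JSON type maps to a Python type:
# string -> str, number -> int/float, boolean -> bool,
# array -> list, object -> dict, null -> None.
python_types = {key: type(value).__name__ for key, value in record.items()}
print(python_types)

# Serializing back to JSON restores the lowercase literals true and null.
print(json.dumps(record["isStudent"]), json.dumps(record["middleName"]))
```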
(a) Explain features of MongoDB. [6]
MongoDB is a popular NoSQL document-oriented database that stores data in JSON-like
documents with dynamic schemas. Here are its key features:
1. Document-Oriented Storage:
o Data is stored in flexible, JSON-like documents (BSON format), making it easy
to map to objects in applications.
2. Schema-less:
o No fixed schema; different documents in the same collection can have different
structures.
3. Scalability:
o Supports horizontal scaling through sharding, allowing it to handle large
amounts of data and high throughput.
4. Indexing:
o Supports indexing on any field, including compound, geospatial, and text
indexes for fast queries.
5. Aggregation Framework:
o Provides powerful aggregation operations (similar to SQL GROUP BY) for data
processing and analysis.
6. High Availability:
o Supports replication through replica sets, ensuring redundancy and automatic
failover.
(b) Differentiate between Apache Cassandra and MongoDB. [6]

| Feature | MongoDB | Apache Cassandra |
| Data Model | Document-based (BSON) | Wide-column store (table-like with rows/columns) |
| Schema | Flexible and dynamic schema | Semi-structured, predefined columns required |
| Query Language | JSON-like query language | CQL (Cassandra Query Language) |
| Scalability | Good horizontal scaling (via sharding) | Excellent horizontal scaling (peer-to-peer) |
| Replication | Replica sets (primary-secondary model) | Built-in replication with no master node |
| Use Cases | Content management, analytics | IoT, time-series data, messaging systems |
(c) What is XML? Why is XML important? What are the benefits of using XML? [8]
XML (eXtensible Markup Language):
XML is a markup language used to store and transport data. It defines a set of rules for
encoding documents in a format that is both human-readable and machine-readable.
Importance of XML:
1. Data Sharing: XML enables data sharing between systems and platforms, especially
in web services (e.g., SOAP).
2. Platform Independent: XML can be used across different systems and technologies.
3. Custom Tags: Users can define their own tags to describe data, making XML highly
flexible.
Benefits of Using XML:
1. Self-Descriptive Structure: Tags describe the data, making it easy to understand.
2. Hierarchical Data Representation: Supports nested elements, ideal for complex
data.
3. Separation of Data and Presentation: Helps in keeping content separate from design
(used in XML + XSLT).
4. Data Validation: Can be validated using DTD (Document Type Definition) or XML
Schema.
5. Widely Supported: Supported by many programming languages, tools, and
platforms.
6. Human and Machine Readable: XML files are readable by both humans and
software.
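Custom tags and hierarchical nesting can be illustrated with a short document parsed by Python's standard xml.etree.ElementTree module. The tag names and values below are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small XML document with user-defined tags, an attribute, and nesting.
xml_text = """
<student id="101">
  <name>Alice</name>
  <address>
    <city>Pune</city>
    <pin>411001</pin>
  </address>
</student>
"""

root = ET.fromstring(xml_text.strip())

student_id = root.get("id")            # attribute on the root element
name = root.findtext("name")           # text of a direct child element
city = root.findtext("address/city")   # text of a nested element

print(root.tag, student_id, name, city)
```

The tags describe the data they contain, which is what makes the document self-descriptive.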
Unit 4
Q3) a) Draw and explain Data Warehouse architecture. [8]

Explanation:
1. Data Sources:
o Includes operational databases, external sources, and flat files.
o These are the origin points of raw data.
2. Data Staging Area (ETL):
o Extract: Data is collected from various sources.
o Transform: Data is cleaned and transformed into a suitable format.
o Load: Transformed data is loaded into the data warehouse.
3. Data Storage:
o A central data repository stores integrated, historical data.
o May include Data Marts for department-specific access.
4. OLAP Servers:
o Process data for complex queries, analysis, and multidimensional views.
5. End Users/BI Tools:
o Business analysts and decision-makers use dashboards, reports, and queries for
insights.
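The Extract, Transform, Load steps of the staging area can be sketched in a few lines. This is a toy illustration under stated assumptions: the raw rows, the sales table, and its columns are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Extract: rows as they might arrive from an operational source.
raw_rows = [
    {"order_id": "1", "amount": " 250.50 ", "city": "pune"},
    {"order_id": "2", "amount": "n/a", "city": "Mumbai"},  # unusable amount
    {"order_id": "3", "amount": "99", "city": " mumbai "},
]


def transform(row):
    """Clean one raw row; return None if the amount cannot be parsed."""
    try:
        amount = float(row["amount"].strip())
    except ValueError:
        return None
    return (int(row["order_id"]), amount, row["city"].strip().title())


# Transform: clean every row and drop the ones that cannot be repaired.
clean_rows = [r for r in map(transform, raw_rows) if r is not None]

# Load: write the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (order_id INTEGER, amount REAL, city TEXT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

loaded = warehouse.execute(
    "SELECT order_id, amount, city FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```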
(b) What is OLAP? Explain different types of OLAP in detail. [8]
OLAP (Online Analytical Processing):
OLAP is a category of data processing that lets users interactively analyze
multidimensional data from multiple perspectives. It supports complex analytical and ad hoc
queries with fast response times.
Types of OLAP:
1. MOLAP (Multidimensional OLAP):
o Stores data in multidimensional cubes.
o Data is pre-aggregated and stored in optimized formats.
o Advantages:
▪ Fast query performance due to pre-computed aggregates.
▪ Good for low-latency analytics.
o Disadvantages:
▪ Scales poorly to very large datasets, because the cubes must be pre-computed and stored.
o Example Tools: IBM Cognos, Microsoft Analysis Services.
2. ROLAP (Relational OLAP):
o Uses relational databases to store and process OLAP data.
o Aggregates are computed at query time.
o Advantages:
▪ Can handle large amounts of data.
▪ Uses standard SQL.
o Disadvantages:
▪ Slower performance compared to MOLAP.
o Example Tools: Oracle OLAP, MicroStrategy.
3. HOLAP (Hybrid OLAP):
o Combines features of MOLAP and ROLAP.
o Detailed data is stored in relational databases (ROLAP), while aggregated data
is stored in cubes (MOLAP).
o Advantages:
▪ Balance of performance and scalability.
o Disadvantages:
▪ More complex architecture.
o Example Tools: Microsoft SQL Server Analysis Services (SSAS).
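The trade-off between ROLAP (aggregate at query time) and MOLAP (look up pre-computed aggregates) can be sketched as follows. The fact rows and function names are hypothetical, and a dictionary stands in for a real cube.

```python
from collections import defaultdict

# Detail rows: (region, quarter, sales amount).
facts = [
    ("Pune", "Q1", 100),
    ("Pune", "Q2", 150),
    ("Mumbai", "Q1", 200),
    ("Mumbai", "Q2", 50),
]


def rolap_total(region):
    """ROLAP style: scan the detail rows and aggregate at query time."""
    return sum(amount for r, _, amount in facts if r == region)


# MOLAP style: pre-compute every aggregate once and store it in a cube.
# "ALL" marks a dimension that has been rolled up.
cube = defaultdict(int)
for region, quarter, amount in facts:
    cube[(region, quarter)] += amount
    cube[(region, "ALL")] += amount
    cube[("ALL", quarter)] += amount
    cube[("ALL", "ALL")] += amount


def molap_total(region):
    """MOLAP style: answer the same query with a single lookup."""
    return cube[(region, "ALL")]


print(rolap_total("Pune"), molap_total("Pune"))
```

Both functions return the same total. The cube answers faster but has to be built and stored in advance, while HOLAP would keep the detail rows relational and only the aggregates in the cube.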
(a) Explain components of Data Warehouse. Explain Star Schema in detail with
example. [8]
Components of a Data Warehouse:
1. Data Sources:
o Include operational databases, flat files, external data, etc.
o Provide raw data for analysis.
2. ETL (Extract, Transform, Load) Tools:
o Extract data from multiple sources.
o Transform data into the desired format (cleaning, filtering).
o Load the data into the warehouse.
3. Data Warehouse Storage:
o Central repository for integrated, historical data.
o May contain Data Marts for department-specific analysis.
4. Metadata:
o Data about data.
o Describes source, usage, and structure of warehouse data.
5. OLAP Engine:
o Enables fast querying and multidimensional analysis.
o Supports operations like drill-down, roll-up, slicing, and dicing.
6. Front-End Tools:
o Used by analysts and end-users for reporting, dashboards, and decision-making.

Star Schema:
A Star Schema is a type of data warehouse schema that organizes data into fact and
dimension tables. It resembles a star, with the fact table at the center and dimension tables
surrounding it.
Structure:
• Fact Table: Contains measurable data (e.g., sales, profit).
• Dimension Tables: Contain descriptive attributes (e.g., customer, time, product).
Example:
Fact Table: Sales_Fact

| Sale_ID | Product_ID | Customer_ID | Date_ID | Sales_Amount |

Dimension Tables:
Product_Dim
| Product_ID | Product_Name | Category |
Customer_Dim
| Customer_ID | Customer_Name | Region |
Date_Dim
| Date_ID | Date | Month | Year |
Advantages:
• Simple design.
• Easy to understand and navigate.
• Efficient for querying large data.
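The example schema can be tried out with Python's built-in sqlite3 module. The table and column names come from the example above; the sample rows and the query are assumptions added for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Product_Dim  (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT, Category TEXT);
CREATE TABLE Customer_Dim (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT, Region TEXT);
CREATE TABLE Date_Dim     (Date_ID INTEGER PRIMARY KEY, Date TEXT, Month INTEGER, Year INTEGER);

-- The fact table sits at the center and references every dimension.
CREATE TABLE Sales_Fact (
    Sale_ID      INTEGER PRIMARY KEY,
    Product_ID   INTEGER REFERENCES Product_Dim(Product_ID),
    Customer_ID  INTEGER REFERENCES Customer_Dim(Customer_ID),
    Date_ID      INTEGER REFERENCES Date_Dim(Date_ID),
    Sales_Amount REAL
);

INSERT INTO Product_Dim  VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO Customer_Dim VALUES (1, 'Alice', 'West'), (2, 'Bob', 'North');
INSERT INTO Date_Dim     VALUES (1, '2024-01-15', 1, 2024), (2, '2024-02-10', 2, 2024);
INSERT INTO Sales_Fact   VALUES
    (1, 1, 1, 1, 1000.0),
    (2, 1, 2, 2, 1200.0),
    (3, 2, 1, 2, 300.0);
""")

# A typical star-schema query: one join per dimension, then aggregate the measure.
rows = db.execute("""
    SELECT p.Category, c.Region, SUM(f.Sales_Amount)
    FROM Sales_Fact f
    JOIN Product_Dim  p ON p.Product_ID  = f.Product_ID
    JOIN Customer_Dim c ON c.Customer_ID = f.Customer_ID
    GROUP BY p.Category, c.Region
    ORDER BY p.Category, c.Region
""").fetchall()
print(rows)
```

Each dimension is exactly one join away from the fact table, which is why star-schema queries stay simple.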

(b) Write a short note on: [8]


i) Decision Support System (DSS):
A Decision Support System is a computer-based information system that supports business
or organizational decision-making activities. It helps in analyzing large volumes of data to
make strategic decisions.
Key Features:
• Interactive and user-friendly.
• Supports what-if analysis and simulations.
• Integrates data from multiple sources.
• Used by middle and upper-level management.
Examples: Sales forecasting, budget planning, resource allocation tools.

ii) Snowflake Schema:


A Snowflake Schema is a more complex version of the star schema where dimension tables
are normalized into multiple related tables.
Characteristics:
• Dimension tables are split into sub-dimensions.
• Reduces data redundancy but increases complexity.
• Requires more joins in queries, which can slow performance.
Example:
Product_Dim split into:
• Product → Category → Department
Advantages:
• More normalized and space-efficient.
• Better data integrity.
Disadvantages:
• Slower query performance.
• More complex design and maintenance.
Unit 5
(a) What is KDD? Explain KDD seven-step process in detail. [8]
KDD (Knowledge Discovery in Databases):
KDD is the overall process of discovering useful knowledge from large volumes of data. It
involves preparing data, selecting the right algorithms, and interpreting patterns to gain
insights.
Seven-Step Process of KDD:
1. Data Cleaning:
o Removes noise and corrects inconsistencies or missing values in the data.
o Ensures quality and accuracy of the dataset.
2. Data Integration:
o Combines data from multiple sources (databases, files, etc.).
o Creates a unified view of the data.
3. Data Selection:
o Selects relevant data needed for the analysis.
o Filters unnecessary or irrelevant data.
4. Data Transformation:
o Converts data into suitable formats (e.g., normalization, aggregation).
o Makes data more meaningful for mining.
5. Data Mining:
o Core step where intelligent methods (e.g., classification, clustering) are used to
extract patterns.
o Applies algorithms to discover hidden insights.
6. Pattern Evaluation:
o Identifies truly interesting patterns based on interestingness measures (e.g.,
support, confidence, accuracy).
o Eliminates irrelevant or redundant patterns.
7. Knowledge Representation:
o Presents the discovered knowledge using visualization, reports, or graphs.
o Helps users understand and use the results effectively.
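The seven steps can be traced on a toy dataset. This is only a sketch: the purchase records, the 0.5 support threshold, and the use of simple frequency counting as the "mining" step are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical purchase records from two separate sources.
store_a = [
    {"customer": "c1", "item": "bread", "qty": 2},
    {"customer": "c2", "item": None, "qty": 1},  # missing item
]
store_b = [
    {"customer": "c3", "item": "Bread ", "qty": 1},  # inconsistent spelling
    {"customer": "c4", "item": "butter", "qty": 3},
    {"customer": "c5", "item": "bread", "qty": 1},
]

# Steps 1-2. Cleaning and integration: merge both sources, drop incomplete rows.
records = [r for r in store_a + store_b if r["item"]]

# Step 3. Selection: keep only the attribute relevant to the analysis.
items = [r["item"] for r in records]

# Step 4. Transformation: normalize the values into one consistent format.
items = [i.strip().lower() for i in items]

# Step 5. Data mining: here, simply count how often each item occurs.
counts = Counter(items)

# Step 6. Pattern evaluation: keep items whose share meets the threshold.
min_support = 0.5
patterns = {
    item: n / len(items)
    for item, n in counts.items()
    if n / len(items) >= min_support
}

# Step 7. Knowledge representation: present the result in readable form.
for item, share in patterns.items():
    print(f"{item}: appears in {share:.0%} of purchases")
```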
(b) Explain benefits of Data Mining. Explain any two applications of Data Mining in
detail. [8]
Benefits of Data Mining:
1. Improved Decision-Making:
o Helps organizations make data-driven decisions by discovering hidden patterns.
2. Cost Reduction:
o Identifies inefficiencies in operations or customer behavior to optimize
resources.
3. Customer Insights:
o Reveals customer buying patterns, preferences, and behavior for targeted
marketing.
4. Fraud Detection:
o Detects unusual patterns to prevent financial fraud and identity theft.
5. Market Basket Analysis:
o Helps in product placement and cross-selling by analyzing purchase data.
Two Applications of Data Mining:
1. Healthcare:
• Use: Predicting disease outbreaks, patient diagnosis, and treatment plans.
• Example: Using patient records to identify high-risk individuals for heart disease.
• Benefit: Improves patient care and reduces healthcare costs.
2. E-commerce/Retail:
• Use: Customer segmentation, personalized recommendations, and inventory planning.
• Example: Amazon suggests products based on previous purchases and browsing
behavior.
• Benefit: Increases customer satisfaction and boosts sales.
Q6 (a) Draw and explain architecture of Data Mining. [8]

Explanation of Components:
1. Data Sources:
o Includes databases, flat files, data warehouses, etc.
o Raw data is collected from different sources.
2. Data Warehouse:
o Centralized storage system that holds integrated data.
3. Data Cleaning & Integration:
o Removes noise, handles missing values, and integrates data into a consistent
format.
4. Data Mining Engine:
o Core component that applies algorithms to extract patterns.
o Performs classification, clustering, association, etc.
5. Pattern Evaluation Module:
o Filters interesting and useful patterns based on measures like accuracy, support,
or confidence.
6. Knowledge Base:
o Contains background knowledge, rules, constraints, and user preferences.
o Assists the mining process.
7. User Interface/Visualization:
o Helps users interact with the system and visualize the discovered patterns in
readable formats like charts or graphs.
Q6 (b) Explain predictive and descriptive algorithms in Data Mining. [8]
1. Predictive Algorithms:
These algorithms predict unknown or future values based on known data. They are often
used for classification and regression tasks.
Examples:
• Classification: Predicts categorical labels.
o Algorithms: Decision Trees, Naïve Bayes, Random Forest.
o Example: Predicting whether a customer will buy a product (Yes/No).
• Regression: Predicts continuous numeric values.
o Algorithms: Linear Regression, Support Vector Regression.
o Example: Predicting house prices based on size and location.
Use Cases:
• Credit scoring
• Sales forecasting
• Medical diagnosis
2. Descriptive Algorithms:
These algorithms describe patterns or relationships in existing data without predicting
future outcomes. Used for summarization, association, and clustering.
Examples:
• Clustering: Groups similar data points.
o Algorithms: K-Means, DBSCAN
o Example: Grouping customers based on purchasing behavior.
• Association Rule Mining: Finds relationships between items.
o Algorithms: Apriori, FP-Growth
o Example: If a customer buys bread, they are likely to buy butter.
• Summarization: Provides compact descriptions of datasets.
o Example: Average customer age per region.
Use Cases:
• Market basket analysis
• Customer segmentation
• Pattern discovery in large datasets
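The two families can be contrasted on small made-up datasets: a least-squares line as a predictive model, and support/confidence for the rule bread → butter as a descriptive measure. The data and function names are hypothetical, and real tools would use library implementations rather than this hand-written sketch.

```python
# Predictive: fit y = slope * x + intercept by least squares, then predict.
sizes = [1.0, 2.0, 3.0, 4.0]    # e.g. house size (hypothetical units)
prices = [3.0, 5.0, 7.0, 9.0]   # e.g. house price (hypothetical units)

mean_x = sum(sizes) / len(sizes)
mean_y = sum(prices) / len(prices)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / sum(
    (x - mean_x) ** 2 for x in sizes
)
intercept = mean_y - slope * mean_x


def predict(size):
    """Predict a price for an unseen size."""
    return slope * size + intercept


# Descriptive: measure how strongly "bread -> butter" holds in past baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions with the antecedent, the fraction with the consequent too."""
    return support(antecedent | consequent) / support(antecedent)


print(predict(5.0))
print(support({"bread", "butter"}), confidence({"bread"}, {"butter"}))
```

The regression produces a value for an input it has never seen, while the association measures only summarize the transactions already recorded.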
Unit 6
Q7 a) Compare Spatial and Temporal Databases. [6]

| Feature | Spatial Database | Temporal Database |
| Definition | Stores and queries data related to space or location. | Stores and manages data involving time aspects. |
| Data Type | Geometric types (points, lines, polygons, etc.) | Time-related types (date, time, timestamp, intervals) |
| Main Purpose | Manages geographical or spatial features and relationships. | Tracks changes in data over time. |
| Query Example | Find all restaurants within 2 km radius. | Show salary history of an employee over the last 5 years. |
| Key Operations | Spatial joins, range queries, nearest neighbor. | Temporal queries like valid time, transaction time queries. |
| Applications | GIS, GPS navigation, urban planning. | HR systems, historical records, medical records. |
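The two query examples from the table can be sketched in plain Python. The coordinates, names, salary figures, and reference date are hypothetical; a real spatial database would use a spatial index instead of scanning every row, and a temporal database would manage the valid-time columns itself.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Spatial query: find all restaurants within a 2 km radius of the user.
user = (18.5204, 73.8567)
restaurants = [
    ("Cafe A", 18.5250, 73.8600),
    ("Cafe B", 18.5600, 73.9000),
    ("Cafe C", 18.5300, 73.8567),
]
nearby = [
    name for name, lat, lon in restaurants
    if haversine_km(user[0], user[1], lat, lon) <= 2.0
]

# Temporal query: salary history over the last 5 years.
# Each row is (valid_from, valid_to, salary); valid_to None means "still current".
salary_history = [
    (date(2015, 1, 1), date(2018, 12, 31), 40000),
    (date(2019, 1, 1), date(2021, 12, 31), 50000),
    (date(2022, 1, 1), None, 65000),
]
as_of = date(2024, 1, 1)
window_start = date(as_of.year - 5, as_of.month, as_of.day)
recent = [
    salary for valid_from, valid_to, salary in salary_history
    if valid_from <= as_of and (valid_to is None or valid_to >= window_start)
]

print(nearby)
print(recent)
```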

Q7 b) Explain Geographical Information Systems in detail with examples. [6]


Geographical Information System (GIS):
A GIS is a computer-based system used for capturing, storing, analyzing, and displaying
spatial or geographic data.
Key Components:
1. Hardware: Computers, GPS devices, scanners.
2. Software: Tools like ArcGIS, QGIS for processing spatial data.
3. Data: Spatial data (maps, satellite images) and attribute data (population, roads).
4. People: GIS analysts, engineers, decision-makers.
5. Methods: Techniques for collecting, analyzing, and interpreting geospatial data.
Functions of GIS:
• Map creation and visualization
• Spatial analysis (e.g., proximity, overlay, buffering)
• Geocoding (assigning coordinates to addresses)
• Querying and reporting spatial data
Examples of GIS Applications:
• Urban Planning: Planning city infrastructure using population and land use data.
• Disaster Management: Mapping flood zones, tracking wildfires.
• Environmental Monitoring: Monitoring deforestation or pollution areas.
• Agriculture: Soil analysis, crop planning using satellite imagery.

Q7 c) Explain Deductive Database in detail with example. [6]


Deductive Database:
A Deductive Database is a type of database system that combines traditional databases with
logic-based inference rules to derive new information from stored facts.
Key Features:
• Uses facts (data) and rules (logic) to infer new data.
• Based on logic programming, often using languages like Datalog or Prolog.
• Supports recursive queries and reasoning capabilities.
Structure:
1. Facts: Base data (like a relational database).
2. Rules: Logical conditions used to infer new facts.
3. Queries: Can return both explicit and inferred data.
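As an example, the classic ancestor rule can be sketched in Python. In Datalog the two rules would read ancestor(X, Y) :- parent(X, Y) and ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z). The names and the naive fixpoint loop below are illustrative assumptions, not how a production deductive engine evaluates rules.

```python
# Facts: parent(X, Y) means X is a parent of Y.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}


def derive_ancestors(parent_facts):
    """Apply the two ancestor rules repeatedly until no new fact appears."""
    # Rule 1: ancestor(X, Y) :- parent(X, Y).
    ancestor = set(parent_facts)
    while True:
        # Rule 2 (recursive): ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
        inferred = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        new_facts = inferred - ancestor
        if not new_facts:
            return ancestor
        ancestor |= new_facts


ancestor = derive_ancestors(parent)

# Query: whose ancestor is alice? The answer mixes stored and inferred facts.
descendants_of_alice = sorted(z for (x, z) in ancestor if x == "alice")
print(descendants_of_alice)
```

Only (alice, bob) is stored explicitly; the pairs (alice, carol) and (alice, dave) are inferred by the recursive rule.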
Benefits:
• Automates complex logic and relationships.
• Useful in knowledge-based systems and AI applications.
Applications:
• Expert systems
• Natural language understanding
• Semantic web and ontology reasoning
Q8 a) Explain Multimedia Databases in detail with examples. [6]
Multimedia Database (MMDB):
A Multimedia Database stores and manages multimedia data types such as text, images,
audio, video, and animation in an organized and searchable format.
Key Features:
• Supports various data types: images (JPEG, PNG), audio (MP3, WAV), video (MP4,
AVI), etc.
• Enables efficient storage, retrieval, indexing, and querying of multimedia content.
• Often uses content-based retrieval, e.g., searching by color, shape, or sound pattern.
Components:
1. Media Data – Actual multimedia content (e.g., photo or audio file).
2. Metadata – Describes the media (e.g., file name, date, tags).
3. Feature Data – Color, shape, texture for images; pitch, tone for audio.
Examples of Multimedia DB Applications:
• YouTube: Stores and retrieves videos based on keywords, tags, or views.
• Spotify: Manages large-scale music/audio libraries searchable by genre or artist.
• Digital Libraries: Store scanned documents, images, and video lectures.
• Medical Imaging: Stores X-rays, MRIs, and ultrasound videos for diagnostics.

Q8 b) What are Active Databases? Elaborate with example. [6]


Active Database:
An Active Database is a database that responds automatically to certain events or conditions
using rules, typically called ECA rules (Event-Condition-Action).
ECA Rule Structure:
• Event: A change or operation (e.g., insert, delete, update).
• Condition: A test or filter that must be true.
• Action: A task performed if the condition is true.
Example (negative-balance alert):
• Event: Update on Accounts table.
• Condition: New balance < 0.
• Action: Insert a warning into Alerts table.
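This rule can be written as a trigger. The sketch below uses SQLite through Python's sqlite3 module; the Accounts and Alerts tables come from the rule above, while the column names, trigger name, and sample rows are assumptions. Trigger syntax differs between database systems.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Accounts (account_id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE Alerts   (account_id INTEGER, message TEXT);

CREATE TRIGGER negative_balance_alert
AFTER UPDATE OF balance ON Accounts          -- Event
FOR EACH ROW
WHEN NEW.balance < 0                         -- Condition
BEGIN
    INSERT INTO Alerts (account_id, message) -- Action
    VALUES (NEW.account_id, 'Balance below zero');
END;

INSERT INTO Accounts VALUES (1, 500.0), (2, 100.0);
""")

# Account 1 stays positive (500 - 200 = 300): the condition is false, no action.
db.execute("UPDATE Accounts SET balance = balance - 200 WHERE account_id = 1")
# Account 2 goes negative (100 - 200 = -100): the trigger inserts an alert.
db.execute("UPDATE Accounts SET balance = balance - 200 WHERE account_id = 2")

alerts = db.execute("SELECT account_id, message FROM Alerts").fetchall()
print(alerts)
```

The application only issues UPDATE statements; the database itself reacts to the event and writes the alert.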
Applications:
• Fraud detection
• Inventory restocking alerts
• Real-time monitoring in security systems
• Automated workflow management
Q8 c) Explain Mobile Databases in detail with examples. [6]
Mobile Database:
A Mobile Database is a database that is accessible from mobile devices like smartphones,
tablets, or laptops, regardless of the user's physical location.
Key Features:
• Supports offline access and synchronization when online.
• Handles limited storage and power of mobile devices.
• Ensures data consistency and security during mobility.
Types:
1. Client-side DB: Stored locally on the mobile device (e.g., SQLite in Android apps).
2. Server-side DB: Resides on a cloud server and is accessed over the mobile network.
3. Distributed DB: Combines local and remote databases with synchronization.
Examples:
• WhatsApp: Stores recent chats locally and syncs with cloud backups.
• Google Maps: Allows offline downloading of maps (local DB) and syncs routes
online.
• Mobile Banking Apps: Access and update user account data from mobile devices.
Challenges:
• Connectivity issues
• Data synchronization
• Security of sensitive data
