DBMS : Assignment - submission date (27/6/25)
Explain the concept of Big Data in the context of DBMS. Discuss its characteristics, challenges,
and how traditional DBMS differs from Big Data systems. Also, briefly describe the technologies
used to manage Big Data.
Introduction to Big Data in DBMS
Big Data refers to datasets that are too large, too fast-growing, or too varied (structured,
semi-structured, and unstructured) for a traditional DBMS to store and process efficiently. Such
data is analyzed to extract meaningful insights and patterns that support decision-making: for
example, identifying new business opportunities, improving products and services, and ultimately
driving business growth. Data science provides the processes used to make sense of these large
volumes of business data.
Characteristics of Big Data (The 5 V's)
1. Volume:
Refers to the massive size of data generated daily from multiple sources such as social
media, IoT devices, and transactions.
2. Velocity:
Describes the speed at which new data is generated and must be processed, often in real
time or near real time (e.g., social feeds, sensor data).
3. Variety:
Data comes in various formats – structured (tables), semi-structured (XML, JSON), and
unstructured (images, videos, logs).
4. Veracity:
Refers to how trustworthy and accurate the data is, given the inconsistencies and
incompleteness common in raw data.
5. Value:
The ability to derive meaningful insights and business value from the data collected.
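As a small illustration of Velocity, the sketch below processes readings one at a time as they arrive instead of storing everything first and analyzing it later. The function name `rolling_average`, the window size, and the sample sensor values are assumptions made for this example only.

```python
from collections import deque

def rolling_average(readings, window_size=3):
    """Yield the average of the most recent readings as each one arrives."""
    window = deque(maxlen=window_size)  # old readings drop out automatically
    for value in readings:
        window.append(value)
        yield sum(window) / len(window)

# Hypothetical temperature readings arriving from a sensor.
sensor_stream = iter([20.0, 22.0, 24.0, 30.0])
averages = list(rolling_average(sensor_stream))
print([round(a, 2) for a in averages])  # [20.0, 21.0, 22.0, 25.33]
```

Only the last few readings are kept in memory, which is the basic idea behind stream processing of fast-moving data.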
Types of Big Data
| Type | Description | Examples |
|---|---|---|
| Structured | Data organized in rows and columns, easy to query using SQL. | Databases, spreadsheets, online transaction logs |
| Semi-Structured | Data with some organizational properties but no fixed schema. | XML, JSON, metadata, NoSQL |
| Unstructured | Data without a pre-defined model or structure. | Images, videos, text files, social media posts |
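The three types can be contrasted with a minimal sketch that uses only the Python standard library. All sample data below (orders, users, review text) is invented for illustration.

```python
import csv
import io
import json

# Structured: every record has the same columns, so it maps directly to a table.
structured_csv = "order_id,amount\n1,250.0\n2,99.5\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_amount = sum(float(r["amount"]) for r in rows)

# Semi-structured: records are self-describing, but fields vary per record.
semi_structured = [
    json.loads('{"user": "asha", "tags": ["sale", "new"]}'),
    json.loads('{"user": "ravi", "location": {"city": "Pune"}}'),
]
all_fields = sorted({key for record in semi_structured for key in record})

# Unstructured: free text has no fields; structure must be derived from it,
# here with a naive word count.
unstructured_text = "Great phone, great battery, slow delivery"
words = unstructured_text.lower().replace(",", "").split()
great_count = words.count("great")

print(total_amount)  # 349.5
print(all_fields)    # ['location', 'tags', 'user']
print(great_count)   # 2
```

The structured data can be aggregated directly, the semi-structured records must be inspected to learn which fields exist, and the unstructured text needs processing before any analysis is possible.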
Sources of Big Data
Big Data is generated from multiple sources.
1. Social Media Platforms: Data from posts, likes, comments, shares on Facebook, Twitter,
Instagram, etc.
2. Sensor-Generated Data: Environmental readings (temperature, humidity) and surveillance
footage from traffic or security cameras.
3. Customer Feedback: Reviews and ratings on platforms like Amazon, Flipkart, and Myntra, and
in service-based sectors.
4. IoT Devices: Smart TVs, ACs, refrigerators, and other devices sending real-time usage and
control data.
5. E-Commerce & Online Transactions: Banking records, shopping history, and digital
payment logs.
6. GPS Devices: Location tracking data from smartphones and vehicles for route
optimization and movement monitoring.
7. Transactional Data: Data from purchases, invoices, receipts, including date, time, items,
and payment methods.
8. Machine-Generated Data: Logs and system data from servers, industrial machines,
wearable devices, and satellites.
Challenges in Big Data Management
1. Storage & Scalability: Traditional single-server systems are difficult and costly to scale to
petabyte-level data storage.
2. Data Integration: Combining data from various formats and sources.
3. Real-time Processing: Need for instant processing in areas like fraud detection or live
analytics.
4. Data Quality: Managing errors, inconsistencies, and duplications in large datasets.
5. Security & Privacy: Protecting sensitive and personal information.
6. Analysis Complexity: Requires advanced tools and skills for meaningful insights.
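The Data Quality challenge (item 4 above) can be illustrated with a minimal sketch that normalizes a key field, drops incomplete records, and removes duplicates. The function `clean_records`, the choice of email as the key, and the sample records are assumptions for this example; real pipelines apply similar rules across distributed datasets.

```python
def clean_records(records):
    """Normalize emails, drop records without one, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        email = (record.get("email") or "").strip().lower()
        if not email:
            continue  # incomplete record: no usable key
        if email in seen:
            continue  # duplicate of a record already kept
        seen.add(email)
        cleaned.append({**record, "email": email})
    return cleaned

# Hypothetical raw records with inconsistent formatting, a duplicate, and a gap.
raw = [
    {"name": "Asha", "email": "Asha@Example.com "},
    {"name": "Asha K", "email": "asha@example.com"},
    {"name": "Ravi", "email": None},
    {"name": "Meera", "email": "meera@example.com"},
]
cleaned = clean_records(raw)
print([r["email"] for r in cleaned])  # ['asha@example.com', 'meera@example.com']
```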
Difference: Traditional DBMS vs Big Data Systems
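| Feature | Traditional DBMS | Big Data Systems |
|---|---|---|
| Data Type | Primarily structured data | Structured, semi-structured, and unstructured |
| Scalability | Limited, mainly vertical scaling | Massive, horizontal scaling |
| Storage | Typically centralized | Distributed across clusters |
| Processing | Mainly transactional processing on a single server | Distributed batch and real-time processing |
| Technology Used | RDBMS (e.g., MySQL, Oracle) | Hadoop, Spark, NoSQL, etc. |
| Query Language | SQL | NoSQL query APIs, MapReduce jobs, HiveQL, etc. |
| Flexibility | Rigid, predefined schema (schema-on-write) | Flexible schema (schema-on-read) |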
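The Flexibility row can be made concrete with a small sketch. It uses SQLite (from the Python standard library) as a stand-in for a traditional RDBMS, and plain JSON lines as a stand-in for raw data in a Big Data store; the table, field names, and records are assumptions for illustration.

```python
import json
import sqlite3

# Rigid schema: the table structure is fixed before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Asha"))

try:
    # A field the schema does not define is rejected at write time.
    conn.execute(
        "INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (2, "Ravi", "Pune")
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Schema-on-read: raw records are stored as-is; structure is applied when reading.
raw_lines = [
    '{"id": 1, "name": "Asha"}',
    '{"id": 2, "name": "Ravi", "city": "Pune"}',
]
records = [json.loads(line) for line in raw_lines]
cities = [record.get("city", "unknown") for record in records]

print(rejected)  # True
print(cities)    # ['unknown', 'Pune']
conn.close()
```

With a rigid schema, the new `city` field would require altering the table first. With schema-on-read, the reader decides how to interpret missing or extra fields.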
Technologies Used to Manage Big Data
1. Hadoop
   - Open-source framework for distributed storage (HDFS) and processing of large datasets
     using the MapReduce programming model.
2. Apache Spark
   - Fast, in-memory data processing engine for batch and near-real-time (streaming) analytics.
3. NoSQL Databases
   - Examples: MongoDB, Cassandra, CouchDB
   - Handle semi-structured and unstructured data with flexible schemas and horizontal scalability.
4. Hive & Pig
   - Tools built on top of Hadoop for querying and processing data: Hive uses an SQL-like
     language (HiveQL), while Pig uses a scripting language (Pig Latin).
5. Apache Kafka
   - Distributed streaming platform for building real-time data pipelines and
     streaming apps.
6. Elasticsearch
   - Distributed search engine used for indexing and querying large volumes of data quickly.
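The MapReduce programming model mentioned under Hadoop can be sketched in plain Python with the classic word-count example. This is a single-machine illustration of the map, shuffle, and reduce steps, not the Hadoop API; the function names and sample documents are assumptions. In Hadoop, each phase would run in parallel across the nodes of a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

# Each string stands in for one input split stored on a different node.
documents = ["big data needs big storage", "data drives decisions"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts["big"], word_counts["data"])  # 2 2
```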
Conclusion
Big Data has revolutionized how modern organizations collect, store, and analyze data. Unlike a
traditional DBMS, Big Data systems are designed to handle massive volumes and diverse types
of data, often in real time. Through technologies like Hadoop, Spark, and NoSQL, businesses can
unlock new opportunities, enhance services, and gain competitive advantages. Understanding
the characteristics and management challenges of Big Data is crucial for developing efficient and
future-ready data systems.