0% found this document useful (0 votes)
14 views2 pages

006 - Data Model of Big Data Systems

The data model of Big Data systems accommodates diverse data types and rapid changes by utilizing flexible schemas and various storage formats. It includes schema-on-read and schema-on-write approaches, supports structured, semi-structured, and unstructured data, and employs batch, stream, and hybrid processing models. This adaptability allows organizations to efficiently manage and analyze large datasets regardless of their structure or origin.

Uploaded by

Samrat
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views2 pages

006 - Data Model of Big Data Systems

The data model of Big Data systems accommodates diverse data types and rapid changes by utilizing flexible schemas and various storage formats. It includes schema-on-read and schema-on-write approaches, supports structured, semi-structured, and unstructured data, and employs batch, stream, and hybrid processing models. This adaptability allows organizations to efficiently manage and analyze large datasets regardless of their structure or origin.

Uploaded by

Samrat
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 2

### Data Model of Big Data Systems

The data model of Big Data systems defines how data is structured, stored, and
accessed. Unlike traditional data models that rely on strict schemas (like
relational databases), Big Data systems often embrace a more flexible and varied
approach due to the volume, velocity, and variety of data. Below are the key
aspects of the data model in Big Data systems:

---

### 1. **Schema-on-Read vs. Schema-on-Write**

- **Schema-on-Read**:
- **Definition**: The data schema is applied when data is read, rather than when
it is written.
- **Characteristics**:
- Allows for greater flexibility, as data can be ingested in various formats
without a predefined schema.
- Suitable for unstructured and semi-structured data, like JSON, XML, and text
files.
- **Examples**: NoSQL databases (like MongoDB, Cassandra) and data lakes.

- **Schema-on-Write**:
- **Definition**: The schema is defined and enforced when data is written to the
database.
- **Characteristics**:
- Provides consistency and integrity for structured data.
- Suitable for use cases where data types and relationships are well
understood.
- **Examples**: Traditional relational databases (like MySQL, PostgreSQL).

---

### 2. **Data Formats**

Big Data systems support a variety of data formats, including:

- **Structured Data**: Typically stored in tables (e.g., CSV files, SQL databases).
- **Semi-Structured Data**: Often in formats like JSON, XML, or Avro, which have
tags or markers but do not require a rigid schema.
- **Unstructured Data**: Includes text files, images, audio, and video, where data
does not follow a specific format or structure.

---

### 3. **Data Storage Models**

Different storage models are used in Big Data systems:

- **Distributed File Systems**:


- Designed to store large datasets across multiple machines.
- **Example**: Hadoop Distributed File System (HDFS).

- **NoSQL Databases**:
- Designed for horizontal scalability and to handle unstructured or semi-
structured data.
- **Examples**:
- Document stores (MongoDB, Couchbase).
- Key-value stores (Redis, DynamoDB).
- Column-family stores (Cassandra, HBase).

- **Data Lakes**:
- Centralized repositories that store raw data in its native format until needed
for analysis.
- They support various data types and structures.

---

### 4. **Data Processing Models**

Data processing in Big Data systems can occur in several ways:

- **Batch Processing**:
- Large volumes of data are processed at once.
- Suitable for periodic analysis and large data aggregations.
- **Example**: Apache Hadoop MapReduce.

- **Stream Processing**:
- Data is processed in real-time as it flows in, allowing for immediate insights.
- Suitable for applications that require low latency.
- **Example**: Apache Kafka, Apache Flink, Apache Spark Streaming.

- **Hybrid Processing**:
- Combines batch and stream processing to handle both historical and real-time
data.
- **Example**: Apache Beam.

---

### 5. **Data Access and Query Models**

Big Data systems support various access and query methods:

- **SQL-like Queries**:
- Some Big Data tools provide SQL-like query capabilities for ease of use.
- **Examples**: Apache Hive, Apache Drill.

- **APIs and SDKs**:


- Allow developers to interact with data programmatically.
- Useful for integrating Big Data solutions with applications.

- **Graph Query Languages**:


- For graph databases, specific query languages (like Gremlin or Cypher) are used
to traverse and analyze data relationships.

---

### Summary

The data model of Big Data systems is designed to accommodate the vast, varied, and
rapidly changing landscape of data. By leveraging flexible schemas, diverse storage
formats, and a mix of processing models, these systems provide the scalability and
adaptability needed to extract insights from massive datasets. This model enables
organizations to effectively manage and analyze their data, regardless of its
structure or origin.

You might also like