Lecture 6 - Document Databases, Data Formats

The document provides an overview of NoSQL databases, focusing on document databases such as MongoDB, which utilize JSON and BSON formats for data storage. It covers key concepts including data types, schema design, indexing, and the internal workings of MongoDB, including replication, sharding, and transactions. The lecture emphasizes the advantages of document databases in terms of flexibility and performance compared to traditional relational databases.

Uploaded by

Yasmine Elqorashy


NoSQL Databases

Document Databases
Lecture 6 of NoSQL Databases (PA195)

David Novak, FI, Masaryk University, Brno

http://disa.fi.muni.cz/david-novak/teaching/nosql-databases-2018/
Agenda
● Text (Document) Data Types
○ JSON: JavaScript Object Notation

● Document Databases: MongoDB

○ Database schema: Design
○ Using MongoDB: Updates, Queries, Indexes
○ Behind the scene
■ BSON format, Distribution, Replication, Transactions, ...
NoSQL Databases and Data Types
1. Key-value stores:
○ Can store any (text or binary) data
■ often, if using JSON data, additional functionality is available

2. Document databases
○ Structured text data - Hierarchical tree data structures
■ typically JSON, XML

3. Column-family stores
○ Rows that have many columns associated with a row key
■ can be written as JSON
Part 1: Document Data Types
Data Formats
● Binary Data (previous lecture)
○ often, we want to store objects (class instances)
○ objects can be binary serialized (marshalled)
■ and kept in a key-value store
○ there are several popular serialization formats
■ Protocol Buffers, Apache Thrift

● Semi-Structured Text Data

○ JSON, BSON (Binary JSON)
■ JSON is currently number one data format used on the Web
○ XML: eXtensible Markup Language
○ RDF: Resource Description Framework
JSON: Basic Information
● Text-based open standard for data interchange
○ Serializing and transmitting structured data
● JSON = JavaScript Object Notation
○ Originally specified by Douglas Crockford in 2001
○ Derived from JavaScript scripting language
○ Uses conventions of the C-family of languages
● Filename: *.json
● Internet media (MIME) type: application/json
● Language independent
http://www.json.org
JSON: Example

source: I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015.
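The slide's original example is a figure that did not survive extraction. As a stand-in, here is a minimal sketch in Python using only the standard `json` module. The `person` record and its field names are invented for illustration and are not the example from the cited book. It shows the kinds of JSON values: objects, arrays, strings, numbers, booleans and null.

```python
import json

# Hypothetical record; not the example from the cited book.
person = {
    "name": "Alice",
    "age": 30,
    "married": False,
    "address": {"city": "Brno", "zip": "60200"},
    "phones": ["+420 111 222 333"],
    "nickname": None,
}

text = json.dumps(person, indent=2)   # Python dict -> JSON text
restored = json.loads(text)           # JSON text -> Python dict

print(text)
```

Note how Python's `False` and `None` are written as the JSON literals `false` and `null`.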
JSON Properties
● There is no way to write comments in JSON
○ Originally there was, but comments were removed because they were being misused to carry parsing directives, which hurt interoperability

● No way to specify precision/size of numbers

○ It depends on the parser and the programming language

● There exists a standard “JSON Schema”

○ A way to specify the schema of the data
○ Field names, field types, required/optional fields, etc.
○ JSON Schema is itself written in JSON
■ see the JSON Schema example that follows
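On the point about number precision: JSON text itself places no limit on the size or precision of a number, so the value you get back depends on the parser. A small sketch using Python's standard `json` module (other languages and parsers behave differently; the sample values are invented):

```python
import json
from decimal import Decimal

doc = '{"big": 12345678901234567890, "price": 1.0000000000000000001}'

# Default parsing: Python ints are arbitrary precision, floats are IEEE 754 doubles.
default = json.loads(doc)

# Same text, but fractional numbers are parsed into exact decimals instead.
exact = json.loads(doc, parse_float=Decimal)

print(default["big"], default["price"])
print(exact["price"])
```

With the default settings the integer survives intact but the fractional value is rounded to a double. A JavaScript parser, by contrast, reads every number as a double, so the 20-digit integer would lose precision there as well.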
JSON Schema: Example

source: I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015.
Document with JSON Schema

source: I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015.
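The two figures (a schema and a matching document) are not reproduced here. The sketch below, in Python, shows the idea with an invented schema. `check` is a hypothetical hand-written helper that understands only the `type`, `required` and `properties` keywords, which is far less than the real JSON Schema standard covers.

```python
# A tiny schema in JSON Schema style (invented for illustration).
schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
}

# Mapping from JSON Schema type names to Python types (subset only).
TYPES = {"object": dict, "string": str, "integer": int, "array": list}

def check(value, schema):
    """Return a list of problems; an empty list means the value conforms."""
    expected = TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        return [f"expected {schema['type']}"]
    problems = []
    if expected is dict:
        for field in schema.get("required", []):
            if field not in value:
                problems.append(f"missing required field {field!r}")
        for field, sub in schema.get("properties", {}).items():
            if field in value:
                problems += [f"{field}: {p}" for p in check(value[field], sub)]
    return problems

print(check({"name": "Alice", "age": 30}, schema))
print(check({"name": "Bob", "age": "thirty"}, schema))
```

The optional `email` field may be absent, while a missing `name` or a non-integer `age` is reported.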
Part 2: Document Databases
Document Databases: Fundamentals
● Basic concept of data: Document
● Documents are self-describing pieces of data
○ Hierarchical tree data structures
○ Nested associative arrays (maps), collections, scalars
○ XML, JSON (JavaScript Object Notation), BSON, …
● Documents in a collection should be “similar”
○ Their schema can differ
● Often: documents are stored as the values of a key-value store
○ Unlike in a plain key-value store, the values can be examined by the database
○ Search indexes can be built on various keys/fields
Why Document Databases
● XML and JSON are popular for data exchange
○ Recently mainly JSON
● Data stored in document DB can be used directly

● Databases often store objects from memory

○ Using RDBMS, we must do Object Relational Mapping (ORM)
■ ORM is relatively demanding
○ JSON is much closer to structure of memory objects
■ It was originally for JavaScript objects
■ Object Document Mapping (ODM) is faster
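To illustrate why the object-to-document mapping is short, here is a sketch in Python using only the standard library. The `Order` and `Item` classes are invented for illustration; a real ODM library would add more (identity handling, validation, queries).

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Item:
    product: str
    quantity: int

@dataclass
class Order:
    customer: str
    items: list = field(default_factory=list)

order = Order("Alice", [Item("book", 2), Item("pen", 5)])

# The nested object graph maps onto one nested document: no joins, no extra tables.
document = asdict(order)
text = json.dumps(document)

# Mapping back: rebuild the objects from the parsed document.
parsed = json.loads(text)
restored = Order(parsed["customer"], [Item(**i) for i in parsed["items"]])
```

In a relational schema the same order would typically be split across an orders table and an order-items table and reassembled with a join.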
Document Databases: Representatives

MS Azure DocumentDB

Ranked list: http://db-engines.com/en/ranking/document+store

Part 2.1: MongoDB - Basics & Querying
MongoDB
● Initial release: 2009
○ Written in C++
○ Open-source
○ Cross-platform
● JSON documents
● Basic features:
○ High performance – many indexes
○ High availability – replication + eventual consistency + automatic failover
○ Automatic scaling – automatic sharding across the cluster
○ MapReduce support
http://www.mongodb.org/
MongoDB: Terminology
RDBMS | MongoDB
database instance | MongoDB instance
schema | database
table | collection
row | document
rowid | _id

● each JSON document:
○ belongs to a collection
○ has a field _id
■ unique within the collection
● each collection:
○ belongs to a “database”

http://www.mongodb.org/
Documents
● Use JSON for API communication
● Internally: BSON
○ Binary representation of JSON
○ For storage and inter-server communication

● A document has a maximum size of 16 MB (in BSON)
○ The limit keeps a single document from using too much RAM
○ GridFS can split larger files into fragments
Document Fields
● Every document must have field _id
○ Used as a primary key
○ Unique within the collection
○ Immutable
○ Any type other than an array
○ Can be generated automatically

● Restrictions on field names:

○ The field names cannot start with the $ character
■ Reserved for operators
○ The field names cannot contain the . character
■ Reserved for accessing sub-fields
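The rules on this slide can be expressed as a small check. This is a sketch in Python: `validate_document` is a hypothetical helper written for illustration, not part of any MongoDB driver, and it encodes only the rules as stated here. The `uuid4` default is an assumption standing in for the identifier that MongoDB would generate.

```python
import uuid

def validate_document(doc):
    """Return a copy of doc with an _id, or raise ValueError if a rule is broken."""
    doc = dict(doc)
    # _id can be generated automatically when the application omits it
    doc.setdefault("_id", uuid.uuid4().hex)
    if isinstance(doc["_id"], list):
        raise ValueError("_id may be any type other than an array")
    _check_names(doc)
    return doc

def _check_names(value):
    # Walk nested documents and arrays, checking every field name.
    if isinstance(value, dict):
        for name, sub in value.items():
            if name.startswith("$"):
                raise ValueError(f"field name {name!r} starts with '$'")
            if "." in name:
                raise ValueError(f"field name {name!r} contains '.'")
            _check_names(sub)
    elif isinstance(value, list):
        for item in value:
            _check_names(item)

ok = validate_document({"name": "Alice", "address": {"city": "Brno"}})
```

The dot restriction exists because queries use dot notation such as `address.city` to reach into sub-documents.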
Database Schema
● Documents have flexible schema
○ Collections do not enforce specific data structure
○ In practice, documents in a collection are similar

● Key decision of data modeling:

○ References vs. embedded documents

○ In other words: Where to draw lines between aggregates

■ Structure of data
■ Relationships between data
Schema: Embedded Docs
● Related data in a single document structure
○ Documents can have subdocuments (in a field or array)

http://www.mongodb.org/
Schema: Embedded Docs (2)
● Denormalized schema
● Main advantage:
Manipulate related data in a single operation
● Use this schema when:
○ One-to-one relationships: one doc “contains” the other
○ One-to-many: if children docs have one parent document
● Disadvantages:
○ Documents may grow significantly over time
○ This growth impacts both read and write performance
■ A document must be relocated on disk if its size exceeds the allocated space
■ This may lead to data fragmentation on the disk
Schema: References
● Links/references from one document to another
● Normalization of the schema

http://www.mongodb.org/
Schema: References (2)
● More flexibility than embedding
● Use references:
○ When embedding would result in duplication of data
■ and only insignificant boost of read performance
○ To represent more complex many-to-many relationships
○ To model large hierarchical data sets

● Disadvantages:
○ Can require more roundtrips to the server
■ Documents are accessed one by one
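The two modelling choices can be compared side by side. The sketch below uses plain Python dictionaries as stand-ins for collections. The `users` and `addresses` data and the `find_by_id` helper are invented for illustration, and each call to `find_by_id` represents one round trip to the server.

```python
stats = {"round_trips": 0}

def find_by_id(collection, _id):
    """Stand-in for a query by _id; counts one round trip per call."""
    stats["round_trips"] += 1
    return collection[_id]

# Embedded: the address lives inside the user document.
users_embedded = {
    1: {"_id": 1, "name": "Alice", "address": {"city": "Brno", "street": "Botanicka 68a"}},
}

# Referenced: the user stores only the _id of a document in another collection.
users_ref = {1: {"_id": 1, "name": "Alice", "address_id": 10}}
addresses = {10: {"_id": 10, "city": "Brno", "street": "Botanicka 68a"}}

user = find_by_id(users_embedded, 1)
city_embedded = user["address"]["city"]
trips_embedded = stats["round_trips"]      # one read fetched everything

stats["round_trips"] = 0
user = find_by_id(users_ref, 1)
address = find_by_id(addresses, user["address_id"])
city_referenced = address["city"]
trips_referenced = stats["round_trips"]    # two reads were needed
```

The referenced model pays an extra read, but an address shared by many users is stored once instead of being duplicated in every user document.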
Part 2.2: MongoDB - Indexes
Indexes
● Indexes are the key to MongoDB performance
○ Without indexes, MongoDB must scan every document in a collection to select matching documents
● Indexes store some fields in an easily accessible form
○ An index stores the values of one or more specific fields, ordered by value

● Defined per collection

● Purpose:
○ To speed up common queries
○ To optimize performance of other specific operations
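A minimal sketch of the idea in Python: a sorted list of (value, _id) pairs stands in for the ordered index (a simplification of a real tree-based index), and `bisect` finds the matching range without touching the other documents. The collection and its fields are invented for illustration.

```python
from bisect import bisect_left, bisect_right

collection = {
    1: {"_id": 1, "name": "Alice", "age": 30},
    2: {"_id": 2, "name": "Bob", "age": 25},
    3: {"_id": 3, "name": "Carol", "age": 35},
    4: {"_id": 4, "name": "Dave", "age": 30},
}

def full_scan(low, high):
    # Without an index: examine every document.
    return sorted(d["_id"] for d in collection.values() if low <= d["age"] <= high)

# Index on "age": entries ordered by value, each pointing back to a document.
age_index = sorted((d["age"], d["_id"]) for d in collection.values())

def index_range(low, high):
    # With the index: binary search for both ends of the range.
    start = bisect_left(age_index, (low, float("-inf")))
    end = bisect_right(age_index, (high, float("inf")))
    return sorted(_id for _, _id in age_index[start:end])
```

Both functions return the same ids, but `index_range` inspects only the index entries inside the requested range. The price is that the index must be kept up to date on every write.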
Index Types
● Default: _id
○ Exists by default
■ If applications do not specify _id, it is created.
○ Unique
● Single Field
○ User-defined indexes on a single field of a document
● Compound
○ User-defined indexes on multiple fields
● Multikey index
○ To index the content stored in arrays
○ Creates separate index entry for each array element
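A multikey index can be pictured as follows. This Python sketch uses a dictionary from value to document ids as a simplified stand-in for the real index structure; the `articles` data is invented.

```python
from collections import defaultdict

articles = [
    {"_id": 1, "title": "Intro to JSON", "tags": ["json", "formats"]},
    {"_id": 2, "title": "MongoDB basics", "tags": ["mongodb", "json"]},
]

# One index entry per array element, each pointing to the containing document.
tags_index = defaultdict(list)
for doc in articles:
    for tag in doc["tags"]:
        tags_index[tag].append(doc["_id"])
```

Two documents with two tags each produce four index entries, so a lookup of a single tag finds every document whose array contains it.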
Index Types (3)
● Ordered Index
○ B-Tree (see above)
● Hash Indexes
○ Fast O(1) lookups: the index stores the hash of a field's value
■ Only equality matches
● Geospatial Index
○ 2d indexes = use planar geometry when returning results
■ For data representing points on a two-dimensional plane
○ 2dsphere indexes = spherical (Earth-like) geometry
■ For data representing longitude, latitude
● Text Indexes
○ Searching for string content in a collection
Part 2.3: MongoDB - Behind the Scene
MongoDB: Behind the Scene
● BSON format
● Distribution models
○ Replication
○ Sharding
○ Balancing
● MapReduce
● Transactions
● Journaling
BSON (Binary JSON) Format
● Binary-encoded serialization of JSON documents
○ Representation of documents, arrays, JSON simple data types + other types (e.g., date)

http://www.bsonspec.org/
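To make the format concrete, the sketch below hand-encodes a tiny document in Python following the layout in the BSON specification: a little-endian int32 total length, then the elements (type byte, NUL-terminated field name, value), then a closing zero byte. `encode` is an illustrative helper that supports only strings and 32-bit integers; real drivers ship a complete codec.

```python
import struct

def encode(doc):
    """Encode a flat dict of str/int32 values as BSON (illustrative subset)."""
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type 0x02: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + key + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # type 0x10: 32-bit signed integer
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # total length counts its own 4 bytes plus the trailing 0x00
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

encoded = encode({"hello": "world"})
print(encoded)
```

The 22-byte result for `{"hello": "world"}` matches the example given on bsonspec.org. The explicit length prefixes let a reader skip over a value without parsing it.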
Data Replication
● Master/slave replication
● Replica set = group of instances that host the same data set
○ primary (master) – handles all write operations
○ secondaries (slaves) – apply operations from the primary so that they have the same data set
Replication: Read & Write
● Write operation:
1. Write operation is applied on the primary
2. Operation is recorded to primary’s oplog (operation log)
3. Secondaries replicate the oplog and apply the operations to their data sets
● Read: All replica set members can accept reads
○ By default, an application directs its reads to the primary
■ Guarantees the latest version of a document
■ Limits read throughput, since the secondaries serve no reads
○ A read preference mode can be set
■ See the table of read preference modes
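The three write steps above can be sketched as a small in-memory simulation in Python. The `Member` class, the `write` and `replicate` functions and the tuple format of the oplog entries are invented for illustration and do not reflect MongoDB's actual oplog format.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.data = {}      # _id -> document
        self.applied = 0    # number of oplog entries already applied

    def apply(self, entry):
        op, _id, doc = entry
        if op == "upsert":
            self.data[_id] = doc
        elif op == "delete":
            self.data.pop(_id, None)
        self.applied += 1

primary = Member("primary")
secondaries = [Member("secondary-1"), Member("secondary-2")]
oplog = []   # the primary's operation log

def write(entry):
    primary.apply(entry)    # 1. apply the write on the primary
    oplog.append(entry)     # 2. record it in the primary's oplog

def replicate(member):
    # 3. a secondary pulls the entries it has not seen yet and applies them in order
    for entry in oplog[member.applied:]:
        member.apply(entry)

write(("upsert", 1, {"_id": 1, "name": "Alice"}))
write(("upsert", 2, {"_id": 2, "name": "Bob"}))
stale = dict(secondaries[0].data)   # before replication the secondary is behind
write(("delete", 1, None))
for s in secondaries:
    replicate(s)
```

Between a write and the next `replicate` call a secondary returns stale data, which is why reads served by secondaries are only eventually consistent.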
Replication: Read Modes

Read Preference Mode | Description
primary | operations read from the primary of the replica set
primaryPreferred | operations read from the primary, but if it is unavailable, operations read from secondary members
secondary | operations read from the secondary members
secondaryPreferred | operations read from secondary members, but if none is available, operations read from the primary
nearest | operations read from the nearest member (= shortest ping time) of the replica set
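The five modes can be read as a selection rule over the currently reachable members. The Python sketch below is a simplified illustration: `choose_member` and the member dictionaries are hypothetical, and the member selection done by real drivers is more involved.

```python
def choose_member(mode, members):
    """members: list of dicts with 'name', 'role', 'up' and 'ping_ms' keys."""
    up = [m for m in members if m["up"]]
    primaries = [m for m in up if m["role"] == "primary"]
    secondaries = [m for m in up if m["role"] == "secondary"]
    if mode == "primary":
        candidates = primaries
    elif mode == "primaryPreferred":
        candidates = primaries or secondaries
    elif mode == "secondary":
        candidates = secondaries
    elif mode == "secondaryPreferred":
        candidates = secondaries or primaries
    elif mode == "nearest":
        candidates = up
    else:
        raise ValueError(f"unknown read preference mode: {mode}")
    if not candidates:
        return None   # no eligible member: the read cannot be served
    # Simplification: always take the lowest ping among the candidates.
    return min(candidates, key=lambda m: m["ping_ms"])["name"]

members = [
    {"name": "A", "role": "primary", "up": True, "ping_ms": 40},
    {"name": "B", "role": "secondary", "up": True, "ping_ms": 5},
    {"name": "C", "role": "secondary", "up": True, "ping_ms": 20},
]
```

For example, `choose_member("primary", members)` picks `A`, while `"nearest"` picks `B` because it has the shortest ping time regardless of its role.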
Replica Set Elections
● If the primary becomes unavailable, an election determines a new primary
○ Elections need some time
○ No primary => no writes
Replica Set: CAP
● Let us have three nodes in the replica set
○ Let’s say that the master is disconnected from the other two
■ The distributed system is partitioned
○ The master finds out that it is alone
■ Specifically, it can no longer reach a majority (more than half) of the nodes
■ It steps down from being master (and handles just reads)
○ The other two slaves “think” that the master has failed
■ They form a partition with more than half of the nodes
■ So they elect a new master
● With just two nodes in the replica set (RS)
○ Both partitions will become read-only
■ A similar case can occur with any even number of nodes in the RS
○ Therefore, we can always add an arbiter node to an RS with an even number of nodes
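The scenarios above reduce to one rule: a partition may have a primary only if it contains a strict majority of the voting nodes. A short sketch in Python, where `has_majority` is a hypothetical helper named for this illustration:

```python
def has_majority(reachable_voters, total_voters):
    """A partition may elect (or keep) a primary only with a strict majority."""
    return reachable_voters > total_voters // 2

# Three nodes, split 1 / 2: only the larger side can have a primary.
three_nodes = [has_majority(1, 3), has_majority(2, 3)]

# Two nodes, split 1 / 1: neither side has a majority, so both are read-only.
two_nodes = [has_majority(1, 2), has_majority(1, 2)]

# Two data nodes plus an arbiter (3 voters): the side that still reaches
# the arbiter has 2 of 3 votes and can have a primary.
with_arbiter = [has_majority(2, 3), has_majority(1, 3)]
```

Because at most one partition can hold more than half of the votes, two primaries can never be elected at the same time.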
Sharding
● MongoDB enables collection partitioning (sharding)
Collection Partitioning
● MongoDB partitions a collection’s data by the shard key
○ Indexed field(s) that exist in each document in the collection
■ Immutable
○ The data is divided into chunks, which are distributed across shards
■ Range-based partitioning
■ Hash-based partitioning
○ When a chunk grows beyond the size limit, it is split
■ Metadata change, no data migration

● Data balancing:
○ Background chunk migration
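The difference between the two partitioning strategies can be sketched in Python. The boundaries, the number of chunks and the use of MD5 are assumptions chosen for illustration; they are not MongoDB's actual chunk layout or hash function.

```python
import hashlib
from bisect import bisect_right

NUM_CHUNKS = 4

# Range-based: chunk boundaries on the shard key itself (values invented).
# Chunk 0 holds keys < 100, chunk 1 holds 100..199, and so on.
boundaries = [100, 200, 300]

def range_chunk(key):
    return bisect_right(boundaries, key)

def hash_chunk(key):
    # Hash-based: partition on a hash of the key. MD5 is used here only as a
    # convenient stable hash; it is not MongoDB's actual hashing scheme.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHUNKS

keys = [101, 102, 103, 104, 105, 106, 107, 108]
range_placement = [range_chunk(k) for k in keys]
hash_placement = [hash_chunk(k) for k in keys]
```

Consecutive keys all land in the same range chunk, which suits range queries but concentrates inserts of increasing keys on one shard. Hashing usually scatters neighbouring keys across chunks, at the cost of turning range queries into requests to many shards.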
Sharding: Components
● MongoDB runs in cluster of different node types:
● Shards – store the data
○ Each shard is a replica set
■ Can be a single node

● Query routers – interface with client applications

○ Direct operations to the relevant shard(s)
■ + return the result to the client
○ More than one => to divide the client request load
● Config servers – store the cluster’s metadata
○ Mapping of the cluster’s data set to the shards
○ Recommended number: 3
Sharding: Diagram
Journaling
● Write operations are applied in memory and recorded in a journal before they are written to the data files (on disk)
○ To restore a consistent state after a hard shutdown
○ Can be switched on/off
● Journal directory – holds journal files
● Journal files = write-ahead redo logs
○ Append-only files
○ Deleted when all the writes they contain are durable
○ When a journal file exceeds 1 GB of data, MongoDB creates a new file
■ The size limit can be modified
● Clean shutdown removes all journal files
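The write-ahead idea can be sketched in a few lines of Python. Everything here is an in-memory stand-in invented for illustration: `journal` and `data_files` would live on disk in a real system, and `checkpoint` and `recover` are hypothetical names.

```python
journal = []      # append-only redo log (would be on disk)
data_files = {}   # durable data files (would be on disk)
memory = {}       # in-memory view of the data

def write(_id, doc):
    journal.append((_id, doc))   # journal first ...
    memory[_id] = doc            # ... then the in-memory copy

def checkpoint():
    # Flush memory to the data files; journaled writes are now durable there,
    # so the journal entries are no longer needed.
    data_files.update(memory)
    journal.clear()

def recover():
    # After a hard shutdown: memory is lost, so start from the data files
    # and redo every journaled write.
    restored = dict(data_files)
    for _id, doc in journal:
        restored[_id] = doc
    return restored

write(1, {"name": "Alice"})
checkpoint()
write(2, {"name": "Bob"})      # journaled but not yet in the data files
memory.clear()                 # simulate a crash
recovered = recover()
```

The second write never reached the data files, yet it survives the simulated crash because it was appended to the journal first.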
Transactions
● Write operations are atomic at the level of a single document
○ Including nested documents
○ Sufficient for many cases, but not all
○ When a write operation modifies multiple documents, other operations may interleave
● Transactions:
○ Isolate a write operation that updates multiple documents
○ Two-phase commit
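One way to picture a two-phase commit built from single-document atomic updates is the application-level pattern sketched below in Python. A transaction document records the current state, and each account remembers which transactions it has already applied, so an interrupted transfer can be resumed without applying anything twice. All names (`accounts`, `transactions`, `apply_once`, `run`) are invented for illustration, and this is not MongoDB's built-in mechanism.

```python
accounts = {
    "A": {"_id": "A", "balance": 100, "pending": []},
    "B": {"_id": "B", "balance": 50, "pending": []},
}
transactions = {}

def apply_once(account_id, txn_id, delta):
    # One single-document update: change the balance and mark the transaction
    # as pending, but only if this transaction was not applied here before.
    acc = accounts[account_id]
    if txn_id not in acc["pending"]:
        acc["balance"] += delta
        acc["pending"].append(txn_id)

def run(txn_id, crash_after=None):
    """Run or resume a transfer; crash_after='half' simulates a failure
    between the two account updates."""
    txn = transactions[txn_id]
    if txn["state"] == "initial":
        txn["state"] = "pending"
    if txn["state"] == "pending":
        apply_once(txn["src"], txn_id, -txn["amount"])
        if crash_after == "half":
            return
        apply_once(txn["dst"], txn_id, txn["amount"])
        txn["state"] = "applied"
    if txn["state"] == "applied":
        for acc_id in (txn["src"], txn["dst"]):
            if txn_id in accounts[acc_id]["pending"]:
                accounts[acc_id]["pending"].remove(txn_id)
        txn["state"] = "done"

transactions["t1"] = {"_id": "t1", "src": "A", "dst": "B", "amount": 30, "state": "initial"}
run("t1", crash_after="half")      # failure after only the source was debited
mid_state = (accounts["A"]["balance"], accounts["B"]["balance"], transactions["t1"]["state"])
run("t1")                          # recovery: resume from the recorded state
```

While the transaction is in the pending state other operations can still observe the intermediate balances, so this pattern gives recoverability rather than full isolation.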
References
● I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p.
● Sadalage, P. J., & Fowler, M. (2012). NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley Professional, 192 p.
● RNDr. Irena Holubová, Ph.D. MFF UK course NDBI040: Big Data Management and NoSQL Databases
● MongoDB Manual: http://docs.mongodb.org/manual/

Common questions

Why might a database administrator choose to use embedding of documents over referencing in MongoDB, and what are the trade-offs?

Embedding stores all related data in a single document, which gives faster reads and suits one-to-one relationships and one-to-many relationships where the child documents have a single parent. Its main advantage is that related data can be manipulated in a single database operation, which reduces query complexity. The trade-offs are potentially larger documents, which increase RAM usage, and possible data fragmentation on disk when an update makes a document exceed its allocated space, which can hurt read and write performance.

What challenges might arise when using MongoDB's default replication model, especially in a partitioned network environment?

In a partitioned network the primary may become temporarily unavailable, so an election is needed to select a new primary from the secondaries, and writes are delayed until it completes. Majority voting guards against split-brain: only a partition that can reach more than half of the nodes may elect or keep a primary. The cost is that with an even number of nodes (for example, two) a partition can leave no side with a majority, which renders the entire replica set read-only. Adding an arbiter node avoids this.

What are the key differences in data modeling between Object-Relational Mapping (ORM) used in RDBMS and Object Document Mapping (ODM) used in document databases like MongoDB?

ORM is a relatively demanding translation between an application's objects and relational tables, and it often suffers from an impedance mismatch between the two models. ODM is simpler and faster because JSON documents closely resemble the structure of in-memory objects (JSON was originally a notation for JavaScript objects). This reduces the work of serialization and deserialization and lets application objects be stored and retrieved almost directly as documents.

How does MongoDB manage high availability and consistency in its distributed system architecture?

MongoDB uses replica sets: a primary node handles all write operations and secondary nodes replicate its data. If the primary becomes unavailable, an election promotes a secondary to primary, which gives automatic failover. Reads from secondaries are eventually consistent, and read preference modes let an application choose how its reads are spread across the primary and the secondaries.

Explain the role and benefits of using BSON in MongoDB compared to JSON.

BSON (Binary JSON) is the format MongoDB uses for storage and for communication between servers. As a binary-encoded serialization it can represent additional data types, such as dates and raw binary data, that JSON does not support natively. It is designed to be fast to traverse and to encode and decode, which matters for database operations, although it is not always smaller than the equivalent JSON text.

Discuss the implications of using different index types in MongoDB and how they affect performance.

Single-field indexes allow quick lookups on an individual field. Compound indexes optimize queries that use multiple fields. Multikey indexes cover array fields by creating a separate index entry for each array element. Hashed indexes are efficient for equality matches only. Geospatial indexes support spatial queries, and text indexes support searching for string content in a collection. Well-chosen indexes greatly reduce the amount of data scanned, but every index consumes additional resources and slows down writes, because the index must be updated when the data changes.

In what scenarios might sharding be used in MongoDB, and what are the components involved in this process?

Sharding distributes data across multiple machines and is needed when a single machine cannot handle the data volume or the traffic. The components are shards, which store the data and are usually replica sets; query routers, which direct client requests to the relevant shards; and config servers, which hold the metadata describing how the data is distributed.

How do read preference modes in MongoDB influence where the read operations are performed, and why is this important?

Read preference modes determine which members of a replica set serve reads, which affects load distribution and data staleness. The primary mode sends reads to the primary and so returns the latest data. The secondary mode spreads the load by reading from secondaries. The primaryPreferred and secondaryPreferred modes fall back to the other kind of member when the preferred one is unavailable, and nearest picks the member with the shortest ping time. These choices matter for read throughput, consistency and availability, particularly in geographically distributed deployments.

How does MongoDB’s journaling feature contribute to data durability, and what are its limitations?

With journaling, write operations are applied in memory and recorded in a journal before they are written to the data files on disk. After a hard shutdown, MongoDB can replay the journal to restore a consistent state, which minimizes data loss. The limitations are that journaling can be switched off and that maintaining the write-ahead log adds overhead to writes.

Under what circumstances would MongoDB's use of transaction mechanisms be inadequate, and what are the possible solutions?

Atomicity at the level of a single document is inadequate for applications that need multi-document transactions with strict ACID guarantees, such as financial operations or complex business processes. Before version 4.0 MongoDB had no native multi-document ACID transactions. Possible solutions are application-level mechanisms such as a two-phase commit pattern, or upgrading to MongoDB 4.0 or later, which supports multi-document ACID transactions.
