Data Modeling With MongoDB

This document discusses data modeling with MongoDB. It covers key considerations like linking vs embedding data and provides examples. The methodology involves iteratively defining entities and relationships, evaluating the application workload, and finalizing the data model with relevant design patterns. Linking is better if related data is often queried or changed separately, while embedding works for tightly-coupled data.

Uploaded by

Muhammad Riza Alifi

Data Modeling with MongoDB

Yulia Genkina
Curriculum Engineer @ MongoDB
Agenda

Key Considerations

Linking vs. Embedding

Design Patterns

Use Case Example

Conclusion
Let’s Compare
RDBMS approach to data modeling vs. MongoDB
Modeling for RDBMS Concerns

Step 1: Define the Schema

Step 2: Develop the application and queries

[Slide annotations across the build: "Correct", "Denormalized?", "Data dictates"]
Data Modeling with MongoDB

Develop the Application -> Define the Data Model -> Improve the Application -> Improve the Data Model

• Many design options
• Designed for the usage pattern
• Data model evolution is easy
• Can evolve without any downtime
Key Considerations
For Data Modeling with MongoDB

There Is No Magic Formula, but There Is A Method

• Data model is defined at the application level
• Design is part of each phase of the application lifetime
• What affects the data model:
  o The data that your application needs
  o Application’s read and write usage of the data
Data Modeling
Methodology to Achieve a Near Magic Almost Formula
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

Outputs:
• Data size
• Database queries and indexes
• Current operations and assumptions
Link vs. Embed
Which is the Right Decision and What Does it Mean?

What Can Be Linked?

Relationships:
• One-to-one
• One-to-many
• Many-to-many

Example: Entities and relationships in a Blog
• articles: title, date, text
• users: name, email
• tags: name, url
• categories: name, url
• comments: name, url

[Diagram: the entities above are connected by 1-to-N and N-to-N relationships]
One-to-One Linked

Book = { // either side can track

"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}

Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"book": 1, // more fields follow…
}
One-to-One Embedded

Book = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": {
"firstName": "Eliezer",
"lastName": "Yudkowsky"
},
// more fields follow…
}
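
As a rough illustration of how the two one-to-one layouts above differ at read time, the sketch below uses plain in-memory arrays in place of collections. That substitution is an assumption made so the code runs without a database, and the helper names are hypothetical. The linked form needs a second lookup to resolve the author, while the embedded form returns everything in one read.

```javascript
// In-memory stand-ins for collections (assumption: no real database here).
const authors = [{ _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", book: 1 }];
const linkedBooks = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 },
];

// Embedded variant: the author lives inside the book document.
const embeddedBooks = [
  {
    _id: 1,
    title: "Harry Potter and the Methods of Rationality",
    author: { firstName: "Eliezer", lastName: "Yudkowsky" },
  },
];

// Hypothetical helper, linked layout: two lookups (book, then author).
function authorNameLinked(bookId) {
  const book = linkedBooks.find((b) => b._id === bookId);
  const author = authors.find((a) => a._id === book.author);
  return `${author.firstName} ${author.lastName}`;
}

// Hypothetical helper, embedded layout: one lookup.
function authorNameEmbedded(bookId) {
  const book = embeddedBooks.find((b) => b._id === bookId);
  return `${book.author.firstName} ${book.author.lastName}`;
}

console.log(authorNameLinked(1));
console.log(authorNameEmbedded(1));
```

Both calls return the same name; the difference is the number of reads needed to get there.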
One-to-Many: Array in Parent

Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [1, 5, 17],
// more fields follow…
}
One-to-Many: Scalar in Child

Book1 = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}

Book2 = {
"_id": 5,
"title": "How to Actually Change Your Mind",
"slug": "1939311179490-how-to-change",
"author": 1, // more fields follow…
}
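
The two one-to-many layouts answer "which books belong to this author?" in different ways. The sketch below is a minimal in-memory comparison, not database code: the arrays stand in for collections, the helper names are hypothetical, and the books with ids 17 and 20 are made up for the example.

```javascript
// Layout A: array of book ids in the parent (author) document.
const authorWithArray = { _id: 1, lastName: "Yudkowsky", books: [1, 5, 17] };

// Layout B: scalar author id in each child (book) document.
const books = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 },
  { _id: 5, title: "How to Actually Change Your Mind", author: 1 },
  { _id: 17, title: "Placeholder title (made up)", author: 1 },
  { _id: 20, title: "Unrelated book (made up)", author: 2 },
];

// Layout A: the parent already lists its children.
function bookIdsFromParent(author) {
  return author.books;
}

// Layout B: filter the children by the parent's id.
// In a database this filter would typically be backed by an index on "author".
function bookIdsFromChildren(authorId) {
  return books.filter((b) => b.author === authorId).map((b) => b._id);
}

console.log(bookIdsFromParent(authorWithArray));
console.log(bookIdsFromChildren(1));
```

With the array in the parent, adding a book means updating the author document; with the scalar in the child, the author document never changes.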
Many-to-Many: Arrays on either side

Book = { // either side can track

"_id": 5,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"authors": [1, 3], // more fields follow…
}

Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [5, 7], // more fields follow…

}
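
If both sides keep an array, every new link has to touch two documents. The sketch below shows that bookkeeping with in-memory maps standing in for the two collections. The helper `linkAuthorToBook` is hypothetical, and the starting data is trimmed to the ids needed for the example.

```javascript
// In-memory stand-ins for the books and authors collections (assumption).
const booksById = new Map([[5, { _id: 5, authors: [1] }]]);
const authorsById = new Map([
  [1, { _id: 1, books: [5] }],
  [3, { _id: 3, books: [] }],
]);

// Hypothetical helper: record the link on both sides, without duplicates.
function linkAuthorToBook(authorId, bookId) {
  const book = booksById.get(bookId);
  const author = authorsById.get(authorId);
  if (!book.authors.includes(authorId)) book.authors.push(authorId);
  if (!author.books.includes(bookId)) author.books.push(bookId);
}

linkAuthorToBook(3, 5);
linkAuthorToBook(3, 5); // calling it again changes nothing

console.log(booksById.get(5).authors);
console.log(authorsById.get(3).books);
```

Tracking the relationship on one side only, as the slide comment allows, removes the second write at the cost of a query when reading from the other side.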
Embed All (queries by articles)

articles
• title
• date
• text
• tags []: name, url
• categories []: name, url
• comments []: name, url
• users: name, email

Embed & Link (queries by articles or users)

articles
• title
• date
• text
• tags []: name, url
• categories []: name, url
• comments []: name, url

users (linked, 1-to-N)
• name
• email

To Link or Embed?

• How often does the embedded information get accessed?
• Is the data queried using the embedded information?
• Does the embedded information change often?
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

Step 3: Finalize the data model for each collection
• Identify and apply relevant design patterns

Outputs:
• Collections with documents, fields, and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Design Patterns
Brief introduction
The Schema Versioning Pattern
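
The extracted slides give only the pattern's name. As commonly described, the Schema Versioning pattern stores a version field in each document so the application can handle old and new shapes side by side. The sketch below is a minimal illustration under assumptions: the `schema_version` field name, the sample documents, and the change from a single `phone` field to a `contacts` array are all made up for this example.

```javascript
// Two shapes of a user document coexisting in one collection (made-up data).
const v1User = { _id: 1, name: "Ada", phone: "555-0100" }; // no version field: treated as v1
const v2User = {
  _id: 2,
  name: "Bob",
  schema_version: 2,
  contacts: [{ type: "phone", value: "555-0101" }],
};

// Hypothetical helper: bring any known version up to the latest shape.
function toLatest(doc) {
  const version = doc.schema_version ?? 1;
  if (version === 2) return doc; // already current
  const { phone, ...rest } = doc;
  return {
    ...rest,
    schema_version: 2,
    contacts: phone ? [{ type: "phone", value: phone }] : [],
  };
}

console.log(toLatest(v1User));
console.log(toLatest(v2User));
```

Because old documents are upgraded when they are read or written, the change does not require migrating the whole collection at once.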
The Bucket Pattern

Tabular Approach: new document for each sensor reading
Document Approach: new document per time unit per sensor

The Bucket Pattern

• Really benefits from the document model
• Used to store small, related data items
  • Bank Transactions: related by account and date
  • IoT Readings: related by sensor and date
• Reduces index sizes by a large magnitude
• Increases speed of retrieval of related data
• Enables the Computed Pattern
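The Bucket Pattern Implementation

// New reading to store
const reading = { sensor: 5, value: 22, time: new Date("2020-05-11") }

// Add the reading to a bucket for this sensor that holds fewer than 200
// readings; if none matches, the upsert creates a new bucket document.
db.iot.updateOne(
  { "sensor": reading.sensor, "valcount": { "$lt": 200 } },
  { "$push": { "readings": { "v": reading.value, "t": reading.time } },
    "$inc": { "valcount": 1 } },
  { upsert: true })

// Resulting bucket document
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3,
  "readings": [ { "v": 11, "t": ISODate("2020-05-09") },
                { "v": 81, "t": ISODate("2020-05-10") },
                { "v": 22, "t": ISODate("2020-05-11") } ] }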
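
To see how the filter and upsert above roll over to a new bucket, the following sketch imitates the same logic against an in-memory array. It is a simulation under assumptions, not driver code: `iot` stands in for the collection and `addReading` is a hypothetical helper.

```javascript
const BUCKET_SIZE = 200; // same limit as the "valcount" filter above
const iot = []; // in-memory stand-in for the db.iot collection

// Hypothetical helper mirroring the updateOne call with upsert.
function addReading(sensor, value, time) {
  // The filter: a bucket for this sensor that still has room.
  let bucket = iot.find((b) => b.sensor === sensor && b.valcount < BUCKET_SIZE);
  if (!bucket) {
    // No match: behave like the upsert and start a new bucket.
    bucket = { sensor, valcount: 0, readings: [] };
    iot.push(bucket);
  }
  bucket.readings.push({ v: value, t: time }); // $push
  bucket.valcount += 1; // $inc
}

// Store 450 readings for sensor 5, one per second.
for (let i = 0; i < 450; i++) {
  addReading(5, i, new Date(2020, 4, 11, 0, 0, i));
}

console.log(iot.map((b) => b.valcount));
```

The 450 readings end up in three documents instead of 450, which is where the smaller index comes from.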
The Computed Pattern

[Figure: CPU work]

The Computed Pattern

"Never recompute what you can precompute"

• Reads are often more common than writes
• Compute on write is less work than compute on read
• When updating the database, update some summary records too
• Can be thought of as a caching pattern
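Computed Pattern with the Bucket Pattern

// New reading to store
const reading = { sensor: 5, value: 22, time: new Date("2020-05-11") }

// Same bucket update as before, but also keep a running total ("tot")
// so the sum of the readings never has to be recomputed on read.
db.iot.updateOne(
  { "sensor": reading.sensor, "valcount": { "$lt": 200 } },
  { "$push": { "readings": { "v": reading.value, "t": reading.time } },
    "$inc": { "valcount": 1, "tot": reading.value } },
  { upsert: true })

// Resulting bucket document
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3, "tot": 114,
  "readings": [ { "v": 11, "t": ISODate("2020-05-09") },
                { "v": 81, "t": ISODate("2020-05-10") },
                { "v": 22, "t": ISODate("2020-05-11") } ] }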
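
With "tot" and "valcount" maintained on write, a reader can get the bucket's average without walking the readings array. The sketch below checks that against the document shown above, using a plain object in place of a fetched document (timestamps omitted for brevity).

```javascript
// The bucket document from the slide, minus the timestamps.
const bucket = {
  sensor: 5,
  valcount: 3,
  tot: 114,
  readings: [{ v: 11 }, { v: 81 }, { v: 22 }],
};

// Compute on read: walk every reading.
const avgRecomputed =
  bucket.readings.reduce((sum, r) => sum + r.v, 0) / bucket.readings.length;

// Computed Pattern: use the totals maintained on write.
const avgPrecomputed = bucket.tot / bucket.valcount;

console.log(avgRecomputed, avgPrecomputed); // both are 38
```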
Other Patterns and Where To Find Them

Learning

MongoDB Blog, MongoDB Developer Portal, and MongoDB University are all great resources to continue learning about data modeling and patterns.

Design Patterns: Elements of Reusable Object-Oriented Software (a book!)

Other talks at this conference:

• Advanced Schema Design Patterns
• A Complete Methodology to Data Modeling
• Using JSON Schema to Save Lives
• Attribute Pattern and the Wildcard Index: Is the Attribute Pattern Obsolete?
Design an Online Shopping App:
MongoMart
A Use Case Example
Step 1

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

Outputs:
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Evaluate the Application Workload

1000 stores
• 50 employees per store
• 1 store lookup per customer per year

10 Million items
• 100 reviews per item
• 500 thousand updates per day

100 Million user accounts
• 500 thousand new accounts per week
• Logging in 20 times a year
• Looking up 100 items per year
• Creating 5 carts per year
• Placing 4 items in the cart
• Buying an average of 2 items per cart
• Reviewing 2 items per year

Analytics
• 10 data scientists each running 10 queries a day
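
The figures above can be turned into yearly totals and rough average rates with simple arithmetic. The sketch below does this under two assumptions that the slide does not state outright: that "4 items in the cart" and "2 items per cart" apply to each of the 5 carts a user creates per year, and that load is spread evenly over the year (real traffic has peaks). The variable names are illustrative.

```javascript
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000

const stores = 1000;
const items = 10e6;
const users = 100e6;

const totals = {
  staff: stores * 50, // 50 employees per store
  reviewsStored: items * 100, // 100 reviews per item
  loginsPerYear: users * 20,
  itemLookupsPerYear: users * 100,
  cartsPerYear: users * 5,
  cartItemWritesPerYear: users * 5 * 4, // assumed: 4 items placed in each cart
  itemsBoughtPerYear: users * 5 * 2, // assumed: 2 items bought from each cart
  reviewsWrittenPerYear: users * 2,
  analyticsQueriesPerDay: 10 * 10,
};

// Average rate, ignoring daily and seasonal peaks.
const perSecond = (perYear) => perYear / SECONDS_PER_YEAR;

console.log(totals);
console.log("item lookups/s:", Math.round(perSecond(totals.itemLookupsPerYear)));
console.log("cart item writes/s:", Math.round(perSecond(totals.cartItemWritesPerYear)));
```

Even as averages, these numbers show that item reads dominate the workload, which is why the item lookup is singled out in the summary that follows.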
Workload Evaluation Summary

Most important queries
• r2: user views a specific item; has to be under 1 ms
• w3: user adds item to cart; write concern: majority

Required indexes
• {"category": 1, "item_name": 1}
• {"category": 1, "item_name": 1, "price": 1}
• {"username": 1} and more...

Assumptions and Projections
• Data will be stored for a maximum of 5 years
• Number of items sold and number of users will double each year

List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views
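
The doubling assumption compounds quickly over the 5-year retention window. The sketch below projects the user count year by year, assuming the 100 million accounts from the workload slide as the year-0 baseline; the slide itself does not say which year the baseline refers to, and `projectDoubling` is a hypothetical helper.

```javascript
// Hypothetical helper: value at the end of each year if it doubles yearly.
function projectDoubling(base, years) {
  const out = [];
  for (let year = 0; year <= years; year++) {
    out.push(base * 2 ** year);
  }
  return out;
}

const userProjection = projectDoubling(100e6, 5);
userProjection.forEach((count, year) => {
  console.log(`year ${year}: ${count.toLocaleString("en-US")} users`);
});
```

After five doublings the collection is 32 times its starting size, so sizing decisions based only on current data will not hold for long.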
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

Outputs:
• Collections with documents, fields, and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Entity Relationship Diagram

[Diagram: the entities carts, users, items, staff, views, reviews, and stores, connected by 1-to-N and N-to-N relationships]
Collections Relationship Diagram (Simple)
Embed Everything!

[Diagram: users, items, stores, carts, reviews, staff, views, and categories, with relationships labeled N-to-N, N-to-N, and 1-to-N]
Collections Relationship Diagram (Better)
Accommodate for assumptions.
Embed & Link!

[Diagram: items, carts, reviews, stores, users, staff, views, and categories, with relationships labeled 1-to-N and N-to-N; two annotations read "clear every 5 years"]
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

Step 3: Finalize the data model for each collection
• Identify and apply relevant schema patterns

Outputs:
• Collections with documents, fields, and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Apply all the Patterns!

Patterns Used:

• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
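
The extracted slides list these patterns without showing how they were applied to MongoMart. As one illustration only, the Subset pattern is commonly described as embedding the most-used part of a large related set and leaving the full set in its own collection. The sketch below applies that idea to the 100 reviews per item from the workload slide. The field names, the `SUBSET_SIZE` of 10, the sample data, and the helper are all assumptions made for this example.

```javascript
const SUBSET_SIZE = 10; // hypothetical number of reviews kept inside the item

// Made-up reviews collection: 100 reviews for item 42, one per day.
const reviews = Array.from({ length: 100 }, (_, i) => ({
  _id: i + 1,
  item: 42,
  stars: (i % 5) + 1,
  date: new Date(Date.UTC(2020, 0, 1 + i)),
}));

// Hypothetical helper: embed only the newest reviews in the item document.
function buildItemWithRecentReviews(item, allReviews) {
  const recentReviews = allReviews
    .filter((r) => r.item === item._id)
    .sort((a, b) => b.date - a.date) // newest first
    .slice(0, SUBSET_SIZE)
    .map(({ _id, stars, date }) => ({ _id, stars, date }));
  return { ...item, recentReviews };
}

const itemDoc = buildItemWithRecentReviews({ _id: 42, name: "Example item" }, reviews);
console.log(itemDoc.recentReviews.length, itemDoc.recentReviews[0]._id);
```

The item page can then render from a single small document, while the full review history stays available through the separate reviews collection.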
Conclusion
And additional considerations
Your Data Model Will Evolve
Just like your application

Small team, Medium team, Large team, Very big team

Tailor the Data Model
To your unique setup

[Diagram: a range from "Simple data model" to "Performant data model" across team sizes (Small team, Medium team, Large team, Very big team), with setups labeled Small team, Shared hosted DB, Replica Set, and Large Sharded Cluster]
Flexible Data Modeling Approach

For a Simpler data model focus on:
• Evaluate the application workload: the most frequent operation
• Map out the entities and their relationships: embedding data
• Finalize schema for each collection: use few patterns

For a bit of both:
• Evaluate the application workload: data size, the most frequent operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary

For the most Performant data model focus on:
• Evaluate the application workload: data size, the most frequent operations, the most important operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary
#MDBlive

Visit our product "booths" for new features, like the new Schema Advisor in Atlas!
mongodb.com/live/product

Special Thanks to:
John Page, Daniel Coupal, and Eoin Brazil for excellent content support
