Data Modeling with MongoDB
Yulia Genkina
Curriculum Engineer @ MongoDB
Agenda
Key Considerations
Linking vs. Embedding
Design Patterns
Use Case Example
Conclusion
Let’s Compare
RDBMS approach to data modeling vs. MongoDB
Modeling for RDBMS Concerns
Step 1: Define the Schema
Step 2: Develop the application and queries
Slide annotations across the builds: "CORRECT", "DENORMALIZED?", "Data dictates"
Data Modeling with MongoDB
Develop the Application → Define the Data Model → Improve the Application → Improve the Data Model
Many design options
Designed for the usage pattern
Data model evolution is easy
Can evolve without any downtime
Key Considerations
For Data Modeling with MongoDB
There Is No Magic Formula, but There Is A Method
Data model is defined at the application level
Design is part of each phase of the application lifetime
What affects the data model:
• The data that your application needs
• Application’s read and write usage of the data
Data Modeling
Methodology to Achieve a Near Magic Almost Formula
Step-by-step Iteration
Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats
Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance
Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
Outputs:
• Data size
• Database queries and indexes
• Current operations and assumptions
Link vs. Embed
Which is the Right Decision and What Does it Mean?
What Can Be Linked?
Relationships:
• One-to-one
• One-to-many
• Many-to-many
Example: Entities and relationships in a Blog
• articles: title, date, text
• tags: name, url
• categories: name, url
• users: name, email
• comments: name, url
The diagram connects these entities with 1-to-N and N-to-N relationships.
One-to-One Linked
Book = { // either side can track
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}
Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"book": 1, // more fields follow…
}
One-to-One Embedded
Book = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": {
"firstName": "Eliezer",
"lastName": "Yudkowsky"
},
// more fields follow…
}
One-to-Many: Array in Parent
Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [1, 5, 17],
// more fields follow…
}
One-to-Many: Scalar in Child
Book1 = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}
Book2 = {
"_id": 5,
"title": "How to Actually Change Your Mind",
"slug": "1939311179490-how-to-change",
"author": 1, // more fields follow…
}
Many-to-Many: Arrays on either side
Book = { // either side can track
"_id": 5,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"authors": [1, 3], // more fields follow…
}
Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [5, 7], // more fields follow…
}
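What a link costs at read time can be sketched with plain in-memory arrays standing in for collections. The `books` and `authors` arrays, the second author, and the `findAuthorsOfBook` helper are hypothetical names used only for illustration; against a real database each lookup would be a separate query.

```javascript
// In-memory stand-ins for two collections (illustrative data only).
const authors = [
  { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", books: [5, 7] },
  { _id: 3, firstName: "Second", lastName: "Author", books: [5] }, // placeholder
];
const books = [
  { _id: 5, title: "Harry Potter and the Methods of Rationality", authors: [1, 3] },
];

// Follow the links stored on the book: one lookup for the book,
// then a second lookup to resolve the referenced authors.
function findAuthorsOfBook(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return [];
  return authors.filter((a) => book.authors.includes(a._id));
}

const names = findAuthorsOfBook(5).map((a) => a.lastName);
console.log(names);
```

With an embedded author, the second lookup disappears because the author fields arrive with the book document.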
Embed All (queries by articles)
articles
• title
• date
• text
• tags [] (name, url)
• categories [] (name, url)
• comments [] (name, url)
• users (name, email)
Embed & Link (queries by articles or users)
articles
• title
• date
• text
• tags [] (name, url)
• categories [] (name, url)
• comments [] (name, url)
users (linked to articles, 1-to-N)
• name
• email
To Link or Embed?
• How often does the embedded information get accessed?
• Is the data queried using the embedded information?
• Does the embedded information change often?
Step-by-step Iteration
Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats
Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance
Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
Step 3: Finalize the data model for each collection
• Identify and apply relevant design patterns
Outputs:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Design Patterns
Brief introduction
The Schema Versioning Pattern
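A minimal sketch of the idea, assuming documents carry a `schema_version` field so the application can tell old and new shapes apart. The user documents, their field names, and the `toLatest` helper are hypothetical and not taken from the slides.

```javascript
// Two generations of the same hypothetical user document.
const v1 = { _id: 1, name: "Ada", phone: "555-0100" };
const v2 = {
  _id: 2,
  schema_version: 2,
  name: "Grace",
  contacts: [{ method: "phone", value: "555-0199" }],
};

// Normalize any stored shape to the latest one at read time,
// so old documents can stay in place while the application evolves.
function toLatest(doc) {
  const version = doc.schema_version ?? 1; // a missing field means the original shape
  if (version >= 2) return doc;
  const { phone, ...rest } = doc;
  return {
    ...rest,
    schema_version: 2,
    contacts: phone ? [{ method: "phone", value: phone }] : [],
  };
}

const upgraded = toLatest(v1);
console.log(upgraded.schema_version, upgraded.contacts[0].value);
```

Documents can be rewritten to the new shape lazily on their next update, which is one way a data model can change without downtime.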
The Bucket Pattern
Tabular Approach: new document for each sensor reading
Document Approach: new document per time unit per sensor
Really benefits from the document model
Used to store small, related data items
• Bank Transactions – related by account and date
• IoT Readings – related by sensor and date
Greatly reduces index sizes
Increases speed of retrieval of related data
Enables the Computed Pattern
The Bucket Pattern Implementation
// Incoming reading
const sensor = 5, value = 22, time = new Date("2020-05-11")
// Append to this sensor's open bucket (fewer than 200 readings);
// the upsert starts a new bucket when no open one matches.
db.iot.updateOne({ "sensor": sensor,
                   "valcount": { "$lt": 200 } },
                 { "$push": { "readings": { "v": value, "t": time } },
                   "$inc": { "valcount": 1 } },
                 { upsert: true })
// Resulting bucket document
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3,
  "readings": [ {"v": 11, "t": Date("2020-05-09")},
                {"v": 81, "t": Date("2020-05-10")},
                {"v": 22, "t": Date("2020-05-11")} ] }
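The bucketing behavior of that update can be traced without a database. The sketch below is an in-memory imitation, not the driver API: the `iot` array, `BUCKET_LIMIT`, and `addReading` are hypothetical names, and the 450 readings are made-up test data.

```javascript
// In-memory stand-in for the `iot` collection (no database required).
const iot = [];
const BUCKET_LIMIT = 200; // same threshold as the "$lt": 200 filter

// Imitates updateOne(filter, { $push, $inc }, { upsert: true }) for one reading.
function addReading(sensor, value, time) {
  // Filter: a bucket for this sensor that still has room.
  let bucket = iot.find((d) => d.sensor === sensor && d.valcount < BUCKET_LIMIT);
  if (!bucket) {
    // Upsert: no open bucket matched, so start a new one.
    bucket = { sensor, valcount: 0, readings: [] };
    iot.push(bucket);
  }
  bucket.readings.push({ v: value, t: time }); // $push
  bucket.valcount += 1; // $inc
}

// 450 readings for sensor 5 end up in three buckets: 200 + 200 + 50.
for (let i = 0; i < 450; i++) {
  addReading(5, i, new Date(2020, 4, 11, 0, 0, i));
}
console.log(iot.map((b) => b.valcount)); // [ 200, 200, 50 ]
```

Three documents now hold what the tabular approach would store as 450, which is where the smaller index comes from.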
The Computed Pattern
"Never recompute what you can precompute"
Reads are often more common than writes
Compute on write is less work than compute on read
When updating the database, update some summary records too
Can be thought of as a caching pattern
Computed Pattern with the Bucket Pattern
// Incoming reading
const sensor = 5, value = 22, time = new Date("2020-05-11")
// Same bucket update, also keeping a running total ("tot") per bucket
db.iot.updateOne({ "sensor": sensor,
                   "valcount": { "$lt": 200 } },
                 { "$push": { "readings": { "v": value, "t": time } },
                   "$inc": { "valcount": 1, "tot": value } },
                 { upsert: true })
// Resulting bucket document
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3, "tot": 114,
  "readings": [ { "v": 11, "t": Date("2020-05-09") },
                { "v": 81, "t": Date("2020-05-10") },
                { "v": 22, "t": Date("2020-05-11") } ] }
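A short sketch of why the stored total helps on reads, using the bucket document from the slide as a plain object. The variable names are hypothetical; the point is only that the precomputed fields give the same average without scanning the readings.

```javascript
// The bucket document from the slide, as a plain object.
const bucket = {
  sensor: 5,
  valcount: 3,
  tot: 114,
  readings: [
    { v: 11, t: new Date("2020-05-09") },
    { v: 81, t: new Date("2020-05-10") },
    { v: 22, t: new Date("2020-05-11") },
  ],
};

// Compute on read: walks every reading each time the average is requested.
const avgOnRead =
  bucket.readings.reduce((sum, r) => sum + r.v, 0) / bucket.readings.length;

// Precomputed on write: two stored fields, no scan of the array.
const avgPrecomputed = bucket.tot / bucket.valcount;

console.log(avgOnRead, avgPrecomputed); // 38 38
```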
Other Patterns and Where To Find Them
MongoDB Blog, MongoDB Developer Portal and
MongoDB University are all great resources to continue
learning about data modeling and patterns.
Learning
Design Patterns: Elements of Reusable Object-Oriented
Software – a book!
Other talks at this conference:
• Advanced Schema Design Patterns
• A Complete Methodology to Data Modeling
• Using JSON Schema to Save Lives
• Attribute Pattern and the Wildcard Index: Is the
Attribute Pattern Obsolete?
Design an Online Shopping App:
MongoMart
A Use Case Example
Step 1: Evaluate the application workload
Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats
Step results:
• Data size
• A list of operations ranked by importance
Outputs:
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Evaluate the Application Workload
Stores
• 1000 stores
• 50 employees per store
• 1 store lookup per customer per year
Items
• 10 Million items
• 100 reviews per item
• 500 thousand updates per day
Users
• 100 Million user accounts
• 500 thousand new accounts per week
• Logging in 20 times a year
• Looking up 100 items per year
• Creating 5 carts per year
• Reviewing 2 items per year
Carts
• Placing 4 items in the cart
• Buying an average of 2 items per cart
Analytics
• 10 data scientists each running 10 queries a day
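The figures above can be turned into rough average rates to help rank operations. This is a back-of-the-envelope sketch: reading the yearly counts as per-user averages is an assumption, the operation labels are hypothetical, and averages hide peak load.

```javascript
// Figures taken from the workload list; treating the yearly counts as
// per-user averages is an assumption made for this sketch.
const users = 100e6;
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Hypothetical operation labels mapped to average operations per second.
const opsPerSecond = {
  "user logs in": (users * 20) / SECONDS_PER_YEAR,
  "user looks up an item": (users * 100) / SECONDS_PER_YEAR,
  "user creates a cart": (users * 5) / SECONDS_PER_YEAR,
  "user reviews an item": (users * 2) / SECONDS_PER_YEAR,
  "item is updated": 500e3 / SECONDS_PER_DAY,
};

// Rank operations by average rate, highest first.
const ranked = Object.entries(opsPerSecond)
  .sort(([, a], [, b]) => b - a)
  .map(([name, rate]) => ({ name, perSecond: Math.round(rate) }));

console.table(ranked);
```

Under these assumptions item lookups dominate at roughly 317 per second on average, followed by logins at roughly 63 per second.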
Workload Evaluation Summary
Most important queries
• r2: user views a specific item – has to be under 1 ms
• w3: user adds item to cart – write concern: majority
Required indexes
• {"category": 1, "item_name": 1}
• {"category": 1, "item_name": 1, "price": 1}
• {"username": 1} and more..
List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views
Assumptions and Projections
• Data will be stored for a maximum of 5 years
• Number of items sold and number of users will double each year
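The doubling assumption compounds quickly, which a few lines of arithmetic make concrete. The sketch below projects the user count only; the `project` helper is a hypothetical name, and using the 5-year retention period as the projection horizon is an assumption.

```javascript
// Yearly doubling is the stated assumption; the starting count of
// 100 million users comes from the workload list.
function project(start, years) {
  // Index 0 is today, index n is the count after n years of doubling.
  return Array.from({ length: years + 1 }, (_, year) => start * 2 ** year);
}

const userProjection = project(100e6, 5);
console.log(userProjection);
```

Five doublings multiply the starting figure by 32, so sizing decisions made for today's data need to be revisited against the projected figures.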
Step-by-step Iteration
Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats
Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance
Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
Outputs:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Entity Relationship Diagram
Entities shown: carts, users, items, staff, views, reviews, stores
Relationships between entities are labeled 1-to-N or N-to-N.
Collections Relationship Diagram (Simple)
Embed Everything!
Collections shown: users, items, carts, reviews, stores, staff, views, categories
Relationships are labeled 1-to-N or N-to-N.
Collections Relationship Diagram (Better)
Accommodate the assumptions.
Embed & Link!
Collections shown: items, carts, reviews, stores, users, staff, views, categories
Relationships are labeled 1-to-N or N-to-N.
Two parts of the diagram carry the annotation "clear every 5 years".
Step-by-step Iteration
Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats
Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance
Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
Step 3: Finalize the data model for each collection
• Identify and apply relevant schema patterns
Outputs:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Apply all the Patterns!
Patterns Used:
• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
Conclusion
And additional considerations
Your Data Model Will Evolve
Just like your application
Small team, Medium team, Large team, Very big team
Tailor the Data Model
To your unique setup
• Simpler data model: shared hosted DB, small team
• Replica Set
• Performant data model: Large Sharded Cluster
Small team, Medium team, Large team, Very big team
Flexible Data Modeling Approach
For a Simpler data model focus on:
• Evaluate the application workload: the most frequent operation
• Map out the entities and their relationships: embedding data
• Finalize schema for each collection: use few patterns
For a bit of both:
• Evaluate the application workload: data size, the most frequent operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary
For the most Performant data model focus on:
• Evaluate the application workload: data size, the most frequent operations, the most important operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary
#MDBlive
Visit our product
"booths" for new
features, like the new
Schema Advisor in
Atlas!
mongodb.com/live/product
Special Thanks to:
John Page, Daniel Coupal,
Eoin Brazil for excellent
content support