
MongoDB Documentation

Release 2.4.1
MongoDB Documentation Project
March 25, 2013
Contents
I Installing MongoDB 1
1 Installation Guides 3
1.1 Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux . . . . . . . . . . . . . . . . . . 3
1.2 Install MongoDB on Ubuntu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Install MongoDB on Debian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Install MongoDB on Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Install MongoDB on OS X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6 Install MongoDB on Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.7 Install MongoDB Enterprise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.8 Getting Started with MongoDB Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2 Release Notes 29
II Administration 31
3 Run-time Database Configuration 35
3.1 Starting, Stopping, and Running the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Security Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Replication and Sharding Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Running Multiple Database Instances on the Same System . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Diagnostic Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Operational Segregation in MongoDB Operations and Deployments 41
4.1 Operational Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5 Journaling 43
5.1 Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Journaling Internals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6 Use MongoDB with SSL Connections 49
6.1 Configure mongod and mongos for SSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.2 SSL Configuration for Clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7 Use MongoDB with SNMP Monitoring 55
7.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.2 Configure SNMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.3 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
8 Monitoring Database Systems 59
8.1 Monitoring Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
8.2 Process Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
8.3 Diagnosing Performance Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
8.4 Replication and Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.5 Sharding and Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9 Importing and Exporting MongoDB Data 67
9.1 Data Type Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.2 Data Import and Export and Backups Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.3 Human Intelligible Import/Export Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
10 Backup Strategies for MongoDB Systems 71
10.1 Backup Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.2 Approaches to Backing Up MongoDB Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
10.3 Backup Strategies for MongoDB Deployments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
11 Linux ulimit Settings 75
11.1 Resource Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
11.2 Review and Set Resource Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
11.3 Recommended Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
12 Production Notes 79
12.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
12.2 Backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
12.3 Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
12.4 MongoDB on Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
12.5 Readahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
12.6 MongoDB on Virtual Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
12.7 Disk and Storage Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
12.8 Hardware Requirements and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
12.9 Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
12.10 Production Checklist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
III Security 87
13 Strategies and Practices 91
13.1 Security Practices and Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
13.2 Vulnerability Notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
14 Tutorials 99
14.1 Configure Linux iptables Firewall for MongoDB . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.2 Configure Windows netsh Firewall for MongoDB . . . . . . . . . . . . . . . . . . . . . . . . . 103
14.3 Control Access to MongoDB Instances with Authentication . . . . . . . . . . . . . . . . . . . . . . 106
14.4 Deploy MongoDB with Kerberos Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
15 Reference 115
15.1 User Privilege Roles in MongoDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
15.2 system.user Privilege Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
IV Core MongoDB Operations (CRUD) 123
16 Read and Write Operations in MongoDB 127
16.1 Read Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
16.2 Write Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
17 Document Orientation Concepts 147
17.1 Data Modeling Considerations for MongoDB Applications . . . . . . . . . . . . . . . . . . . . . . . 147
17.2 BSON Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
17.3 ObjectId . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
17.4 Database References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
17.5 GridFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
18 CRUD Operations for MongoDB 167
18.1 Create . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
18.2 Read . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
18.3 Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
18.4 Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
19 Data Modeling Patterns 195
19.1 Model Embedded One-to-One Relationships Between Documents . . . . . . . . . . . . . . . . . . . 195
19.2 Model Embedded One-to-Many Relationships Between Documents . . . . . . . . . . . . . . . . . . 196
19.3 Model Referenced One-to-Many Relationships Between Documents . . . . . . . . . . . . . . . . . . 197
19.4 Model Data for Atomic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
19.5 Model Tree Structures with Parent References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
19.6 Model Tree Structures with Child References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
19.7 Model Tree Structures with an Array of Ancestors . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
19.8 Model Tree Structures with Materialized Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
19.9 Model Tree Structures with Nested Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
19.10 Model Data to Support Keyword Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
V Aggregation 207
20 Aggregation Framework 211
20.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
20.2 Framework Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
20.3 Use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
20.4 Optimizing Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
20.5 Sharded Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
20.6 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
21 Aggregation Framework Examples 217
21.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
21.2 Aggregations using the Zip Code Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
21.3 Aggregation with User Preference Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
22 Aggregation Framework Reference 227
22.1 Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
22.2 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
23 Map-Reduce 243
23.1 Map-Reduce Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
23.2 Incremental Map-Reduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
23.3 Temporary Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
23.4 Concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
23.5 Sharded Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
23.6 Troubleshooting Map-Reduce Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
24 Simple Aggregation Methods and Commands 255
24.1 Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
24.2 Distinct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
24.3 Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
VI Text Search 257
25 Overview 261
26 Create a text Index 263
27 text Command 265
28 Text Search Output 267
VII Indexes 269
29 Core MongoDB Indexing Background 273
29.1 Indexing Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
29.2 Indexing Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
29.3 Indexing Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
30 Geospatial Indexing 295
30.1 Geospatial Indexes and Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
30.2 2d Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
30.3 2dsphere Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
30.4 Haystack Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
30.5 Geospatial Query Compatibility Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
30.6 Calculate Distances in a 2d Index Using Spherical Geometry . . . . . . . . . . . . . . . . . . . . . 318
30.7 Geospatial Index Internals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
31 Text Indexing 323
31.1 Text Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
31.2 text Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
VIII Replication 329
32 Replica Set Use and Operation 333
32.1 Replica Set Fundamental Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
32.2 Replica Set Operation and Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
32.3 Replica Set Architectures and Deployment Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
32.4 Replica Set Considerations and Behaviors for Applications and Development . . . . . . . . . . . . . 357
32.5 Replica Set Internals and Behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
32.6 Master Slave Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
33 Replica Set Tutorials and Procedures 377
33.1 Getting Started with Replica Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
33.2 Replica Set Maintenance and Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
34 Replica Set Reference Material 407
34.1 Replica Set Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
34.2 Replica Set Features and Version Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
IX Sharding 417
35 Sharded Cluster Use and Operation 421
35.1 Sharded Cluster Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
35.2 Sharded Cluster Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
35.3 Sharded Cluster Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
35.4 Sharded Cluster Internals and Behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
36 Sharded Cluster Tutorials and Procedures 441
36.1 Getting Started With Sharded Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
36.2 Sharded Cluster Maintenance and Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
36.3 Backup and Restore Sharded Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
36.4 Application Development Patterns for Sharded Clusters . . . . . . . . . . . . . . . . . . . . . . . . 466
37 Sharded Cluster Reference 479
37.1 Sharding Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
X Application Development 489
38 Development Considerations 493
38.1 MongoDB Drivers and Client Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
38.2 Optimization Strategies for MongoDB Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 493
38.3 Server-side JavaScript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
38.4 Capped Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
39 Application Design Patterns for MongoDB 503
39.1 Perform Two Phase Commits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
39.2 Create Tailable Cursor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
39.3 Isolate Sequence of Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
39.4 Create an Auto-Incrementing Sequence Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
39.5 Limit Number of Elements in an Array after an Update . . . . . . . . . . . . . . . . . . . . . . . . . 516
39.6 Expire Data from Collections by Setting TTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
40 Text Search Patterns 519
40.1 Enable Text Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
40.2 Search String Content for Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
40.3 Create a text Index on a Multi-language Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 523
40.4 Return Text Queries Using Only a text Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
40.5 Limit the Number of Index Entries Scanned for Text Search . . . . . . . . . . . . . . . . . . . . . . 524
XI Using the mongo Shell 527
41 Getting Started with the mongo Shell 531
41.1 Start the mongo Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
41.2 Executing Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
41.3 Print . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
41.4 Use a Custom Prompt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
41.5 Use an External Editor in the mongo Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
41.6 Exit the Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
42 Data Types in the mongo Shell 535
42.1 Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
42.2 ObjectId . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
42.3 NumberLong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
43 Access the mongo Shell Help Information 539
43.1 Command Line Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
43.2 Shell Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
43.3 Database Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
43.4 Collection Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
43.5 Cursor Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
43.6 Type Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
44 Write Scripts for the mongo Shell 543
44.1 Opening New Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
44.2 Scripting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
45 mongo Shell Quick Reference 545
45.1 mongo Shell Command History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
45.2 Command Line Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
45.3 Command Helpers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
45.4 Basic Shell JavaScript Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
45.5 Keyboard Shortcuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
45.6 Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
45.7 Error Checking Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
45.8 Administrative Command Helpers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
45.9 Opening Additional Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
45.10 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
45.11 Additional Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
XII Use Cases 553
46 Operational Intelligence 557
46.1 Storing Log Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
46.2 Pre-Aggregated Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
46.3 Hierarchical Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
47 Product Data Management 585
47.1 Product Catalog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
47.2 Inventory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
47.3 Category Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
48 Content Management Systems 607
48.1 Metadata and Asset Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
48.2 Storing Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
49 Python Application Development 625
49.1 Write a Tumblelog Application with Django MongoDB Engine . . . . . . . . . . . . . . . . . . . . 625
49.2 Write a Tumblelog Application with Flask and MongoEngine . . . . . . . . . . . . . . . . . . . . . 637
XIII MongoDB Tutorials 655
50 Getting Started 659
51 Administration 661
51.1 Use Database Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
51.2 Recover MongoDB Data following Unexpected Shutdown . . . . . . . . . . . . . . . . . . . . . . . 662
51.3 Manage mongod Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
51.4 Convert a Replica Set to a Replicated Sharded Cluster . . . . . . . . . . . . . . . . . . . . . . . . . 667
51.5 Copy Databases Between Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673
51.6 Use mongodump and mongorestore to Backup and Restore MongoDB Databases . . . . . . . . 675
51.7 Use Filesystem Snapshots to Backup and Restore MongoDB Databases . . . . . . . . . . . . . . . . 677
51.8 Analyze Performance of Database Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
51.9 Rotate Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685
51.10 Build Old Style Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
51.11 Replica Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
51.12 Sharding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
51.13 Basic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
51.14 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
52 Development Patterns 689
53 Application Development 691
54 Text Search Patterns 693
55 Data Modeling Patterns 695
56 MongoDB Use Case Studies 697
XIV Frequently Asked Questions 699
57 FAQ: MongoDB Fundamentals 701
57.1 What kind of database is MongoDB? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
57.2 Do MongoDB databases have tables? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
57.3 Do MongoDB databases have schemas? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
57.4 What languages can I use to work with MongoDB? . . . . . . . . . . . . . . . . . . . . . . . . . . 702
57.5 Does MongoDB support SQL? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
57.6 What are typical uses for MongoDB? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
57.7 Does MongoDB support transactions? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
57.8 Does MongoDB require a lot of RAM? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
57.9 How do I configure the cache size? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
57.10 Does MongoDB require a separate caching layer for application-level caching? . . . . . . . . . . . . 703
57.11 Does MongoDB handle caching? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
57.12 Are writes written to disk immediately, or lazily? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
57.13 What language is MongoDB written in? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
57.14 What are the limitations of 32-bit versions of MongoDB? . . . . . . . . . . . . . . . . . . . . . . . 704
58 FAQ: MongoDB for Application Developers 705
58.1 What is a namespace in MongoDB? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
58.2 How do you copy all objects from one collection to another? . . . . . . . . . . . . . . . . . . . . . . 706
58.3 If you remove a document, does MongoDB remove it from disk? . . . . . . . . . . . . . . . . . . . 706
58.4 When does MongoDB write updates to disk? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
58.5 How do I do transactions and locking in MongoDB? . . . . . . . . . . . . . . . . . . . . . . . . . . 707
58.6 How do you aggregate data with MongoDB? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
58.7 Why does MongoDB log so many Connection Accepted events? . . . . . . . . . . . . . . . . . . 707
58.8 Does MongoDB run on Amazon EBS? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
58.9 Why are MongoDB's data files so large? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
58.10 How do I optimize storage use for small documents? . . . . . . . . . . . . . . . . . . . . . . . . . . 708
58.11 When should I use GridFS? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
58.12 How does MongoDB address SQL or Query injection? . . . . . . . . . . . . . . . . . . . . . . . . . 709
58.13 How does MongoDB provide concurrency? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
58.14 What is the compare order for BSON types? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
58.15 How do I query for fields that have null values? . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
58.16 Are there any restrictions on the names of Collections? . . . . . . . . . . . . . . . . . . . . . . . . . 712
58.17 How do I isolate cursors from intervening write operations? . . . . . . . . . . . . . . . . . . . . . . 713
58.18 When should I embed documents within other documents? . . . . . . . . . . . . . . . . . . . . . . . 713
58.19 Can I manually pad documents to prevent moves during updates? . . . . . . . . . . . . . . . . . . . 714
59 FAQ: The mongo Shell 715
59.1 How can I enter multi-line operations in the mongo shell? . . . . . . . . . . . . . . . . . . . . . . . 715
59.2 How can I access different databases temporarily? . . . . . . . . . . . . . . . . . . . . . . . . . . 715
59.3 Does the mongo shell support tab completion and other keyboard shortcuts? . . . . . . . . . . . . . 716
59.4 How can I customize the mongo shell prompt? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
59.5 Can I edit long shell operations with an external text editor? . . . . . . . . . . . . . . . . . . . . . . 717
60 FAQ: Concurrency 719
60.1 What type of locking does MongoDB use? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
60.2 How granular are locks in MongoDB? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
60.3 How do I see the status of locks on my mongod instances? . . . . . . . . . . . . . . . . . . . . . . 720
60.4 Does a read or write operation ever yield the lock? . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
60.5 Which operations lock the database? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
60.6 Which administrative commands lock the database? . . . . . . . . . . . . . . . . . . . . . . . . . . 721
60.7 Does a MongoDB operation ever lock more than one database? . . . . . . . . . . . . . . . . . . . . 722
60.8 How does sharding affect concurrency? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
60.9 How does concurrency affect a replica set primary? . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
60.10 How does concurrency affect secondaries? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
60.11 What kind of concurrency does MongoDB provide for JavaScript operations? . . . . . . . . . . . . . 722
61 FAQ: Sharding with MongoDB 723
61.1 Is sharding appropriate for a new deployment? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
61.2 How does sharding work with replication? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
61.3 Can I change the shard key after sharding a collection? . . . . . . . . . . . . . . . . . . . . . . . . . 724
61.4 What happens to unsharded collections in sharded databases? . . . . . . . . . . . . . . . . . . . . . 724
61.5 How does MongoDB distribute data across shards? . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
61.6 What happens if a client updates a document in a chunk during a migration? . . . . . . . . . . . . . 725
61.7 What happens to queries if a shard is inaccessible or slow? . . . . . . . . . . . . . . . . . . . . . . . 725
61.8 How does MongoDB distribute queries among shards? . . . . . . . . . . . . . . . . . . . . . . . . . 725
61.9 How does MongoDB sort queries in sharded environments? . . . . . . . . . . . . . . . . . . . . . . 725
61.10 How does MongoDB ensure unique _id field values when using a shard key other than _id? . . . . 725
61.11 I've enabled sharding and added a second shard, but all the data is still on one server. Why? . . . . . 726
61.12 Is it safe to remove old files in the moveChunk directory? . . . . . . . . . . . . . . . . . . . . . . 726
61.13 How does mongos use connections? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
61.14 Why does mongos hold connections open? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
61.15 Where does MongoDB report on connections used by mongos? . . . . . . . . . . . . . . . . . . . . 726
61.16 What does writebacklisten in the log mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
61.17 How should administrators deal with failed migrations? . . . . . . . . . . . . . . . . . . . . . . . . 727
61.18 What is the process for moving, renaming, or changing the number of config servers? . . . . . . . . 727
61.19 When do the mongos servers detect config server changes? . . . . . . . . . . . . . . . . . . . . . 727
61.20 Is it possible to quickly update mongos servers after updating a replica set configuration? . . . . . . 727
61.21 What does the maxConns setting on mongos do? . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
61.22 How do indexes impact queries in sharded systems? . . . . . . . . . . . . . . . . . . . . . . . . . . 728
61.23 Can shard keys be randomly generated? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
61.24 Can shard keys have a non-uniform distribution of values? . . . . . . . . . . . . . . . . . . . . . . . 728
61.25 Can you shard on the _id field? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
61.26 Can a shard key be in ascending order, like dates or timestamps? . . . . . . . . . . . . . . . . . . . 728
61.27 What do moveChunk commit failed errors mean? . . . . . . . . . . . . . . . . . . . . . . . 729
61.28 How does draining a shard affect the balancing of uneven chunk distribution? . . . . . . . . . . . . . 729
62 FAQ: Replica Sets and Replication in MongoDB 731
62.1 What kinds of replication does MongoDB support? . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
62.2 What do the terms primary and master mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
62.3 What do the terms secondary and slave mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
62.4 How long does replica set failover take? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
62.5 Does replication work over the Internet and WAN connections? . . . . . . . . . . . . . . . . . . . . 732
62.6 Can MongoDB replicate over a "noisy" connection? . . . . . . . . . . . . . . . . . . . . . . . . . 732
62.7 What is the preferred replication method: master/slave or replica sets? . . . . . . . . . . . . . . . . . 733
62.8 What is the preferred replication method: replica sets or replica pairs? . . . . . . . . . . . . . . . . . 733
62.9 Why use journaling if replication already provides data redundancy? . . . . . . . . . . . . . . . . . 733
62.10 Are write operations durable if write concern does not acknowledge writes? . . . . . . . . . . . . . . 733
62.11 How many arbiters do replica sets need? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734
62.12 What information do arbiters exchange with the rest of the replica set? . . . . . . . . . . . . . . . . 734
62.13 Which members of a replica set vote in elections? . . . . . . . . . . . . . . . . . . . . . . . . . . . 734
62.14 Do hidden members vote in replica set elections? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
62.15 Is it normal for replica set members to use different amounts of disk space? . . . . . . . . . . . . . . 735
63 FAQ: MongoDB Storage 737
63.1 What are memory mapped files? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
63.2 How do memory mapped files work? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
63.3 How does MongoDB work with memory mapped files? . . . . . . . . . . . . . . . . . . . . . . . 738
63.4 What are page faults? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738
63.5 What is the difference between soft and hard page faults? . . . . . . . . . . . . . . . . . . . . . . . 738
63.6 What tools can I use to investigate storage use in MongoDB? . . . . . . . . . . . . . . . . . . . . . 738
63.7 What is the working set? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738
63.8 Why are the files in my data directory larger than the data in my database? . . . . . . . . . . . . . 739
63.9 How can I check the size of a collection? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
63.10 How can I check the size of indexes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
63.11 How do I know when the server runs out of disk space? . . . . . . . . . . . . . . . . . . . . . . . . 740
64 FAQ: Indexes 743
64.1 Should you run ensureIndex() after every insert? . . . . . . . . . . . . . . . . . . . . . . . . . 743
64.2 How do you know what indexes exist in a collection? . . . . . . . . . . . . . . . . . . . . . . . . . . 744
64.3 How do you determine the size of an index? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
64.4 What happens if an index does not fit into RAM? . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
64.5 How do you know what index a query used? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
64.6 How do you determine what fields to index? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
64.7 How do write operations affect indexes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
64.8 Will building a large index affect database performance? . . . . . . . . . . . . . . . . . . . . . . . . 744
64.9 Can I use index keys to constrain query matches? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
64.10 Using $ne and $nin in a query is slow. Why? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
64.11 Can I use a multi-key index to support a query for a whole array? . . . . . . . . . . . . . . . . . . . 745
64.12 How can I effectively use indexes strategy for attribute lookups? . . . . . . . . . . . . . . . . . . . . 745
65 FAQ: MongoDB Diagnostics 747
65.1 Where can I find information about a mongod process that stopped running unexpectedly? . . . . . 747
65.2 Does TCP keepalive time affect sharded clusters and replica sets? . . . . . . . . . . . . . . . . . 748
65.3 Memory Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748
65.4 Sharded Cluster Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749
XV Reference 753
66 MongoDB Interface 755
66.1 Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755
66.2 MongoDB and SQL Interface Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966
66.3 Quick Reference Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974
67 Architecture and Components 989
67.1 MongoDB Package Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 989
68 Configuration 1045
68.1 Configuration File Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1045
68.2 Replica Set Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1058
68.3 mongod Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1066
68.4 Connection String URI Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1069
69 Status and Reporting 1075
69.1 Server Status Output Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075
69.2 Server Status Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1080
69.3 Database Statistics Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1098
69.4 Collection Statistics Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100
69.5 Collection Validation Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1102
69.6 Connection Pool Statistics Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1104
69.7 Replica Set Status Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106
69.8 Replication Info Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1109
69.9 Current Operation Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1110
69.10 Database Profiler Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1114
69.11 Explain Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1117
69.12 Exit Codes and Statuses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1121
70 Internal Metadata 1123
70.1 Config Database Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1123
70.2 The local Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1129
70.3 System Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1131
71 General Reference 1133
71.1 MongoDB Limits and Thresholds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1133
71.2 MongoDB Extended JSON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1136
71.3 Text Search Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1138
71.4 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1140
72 Release Notes 1151
72.1 Current Stable Release . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1151
72.2 Previous Stable Releases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1170
72.3 Other MongoDB Release Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1194
XVI About MongoDB Documentation 1197
73 License 1201
74 Editions 1203
75 Version and Revisions 1205
76 Report an Issue or Make a Change Request 1207
77 Contribute to the Documentation 1209
77.1 MongoDB Manual Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209
77.2 About the Documentation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1210
Part I
Installing MongoDB
CHAPTER 1
Installation Guides
MongoDB runs on most platforms and supports 32-bit and 64-bit architectures. 10gen, the makers of MongoDB, provides both binaries and packages. Choose your platform below:
1.1 Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux
1.1.1 Synopsis
This tutorial outlines the basic installation process for deploying MongoDB on Red Hat Enterprise Linux, CentOS
Linux, Fedora Linux and related systems. This procedure uses .rpm packages as the basis of the installation. 10gen
publishes packages of the MongoDB releases as .rpm packages for easy installation and management for users of
CentOS, Fedora and Red Hat Enterprise Linux systems. While some of these distributions include their own MongoDB
packages, the 10gen packages are generally more up to date.
This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.
See Also:
The documentation of the following related processes and concepts.
Other installation tutorials:
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian-or-ubuntu-linux
Install MongoDB on Debian (page 9)
Install MongoDB on Ubuntu (page 6)
Install MongoDB on Linux (page 12)
Install MongoDB on OS X (page 13)
Install MongoDB on Windows (page 17)
1.1.2 Package Options
The 10gen repository contains two packages:
mongo-10gen-server
This package contains the mongod (page 989) and mongos (page 999) daemons from the latest stable release
and associated configuration and init scripts. Additionally, you can use this package to install tools from a
previous release (page 4) of MongoDB.
mongo-10gen
By default, this package contains all MongoDB tools from the latest stable release, and you can use this package
to install previous releases (page 4) of MongoDB. Install this package on all production MongoDB hosts and
optionally on other systems from which you may need to administer MongoDB systems.
1.1.3 Installing MongoDB
Configure Package Management System (YUM)
Create a /etc/yum.repos.d/10gen.repo file to hold information about your repository. If you are running a 64-bit system (recommended), place the following configuration in the /etc/yum.repos.d/10gen.repo file:
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1
If you are running a 32-bit system, which isn't recommended for production deployments, place the following configuration in the /etc/yum.repos.d/10gen.repo file:
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
enabled=1
Installing Packages
Issue the following command (as root or with sudo) to install the latest stable version of MongoDB and the associated tools:
yum install mongo-10gen mongo-10gen-server
When this command completes, you have successfully installed MongoDB!
Manage Installed Versions
You can use the mongo-10gen and mongo-10gen-server packages to install previous releases of MongoDB.
To install a specific release, append the version number, as in the following example:
yum install mongo-10gen-2.2.3 mongo-10gen-server-2.2.3
This installs the mongo-10gen and mongo-10gen-server packages with the 2.2.3 release. You can specify any available version of MongoDB; however, yum will upgrade the mongo-10gen and mongo-10gen-server packages when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades.
4 Chapter 1. Installation Guides
MongoDB Documentation, Release 2.4.1
To pin a package, add the following line to your /etc/yum.conf file:
exclude=mongo-10gen,mongo-10gen-server
1.1.4 Configure MongoDB
These packages configure MongoDB using the /etc/mongod.conf file in conjunction with the control script. You can find the init script at /etc/rc.d/init.d/mongod.
This MongoDB instance will store its data files in /var/lib/mongo and its log files in /var/log/mongo, and run using the mongod user account.
Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongo and /var/log/mongo directories.
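As a minimal sketch, assuming a hypothetical replacement account named mongouser, the directory ownership would change as follows:
# "mongouser" is a placeholder; substitute the account that actually runs mongod
chown -R mongouser:mongouser /var/lib/mongo /var/log/mongo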
1.1.5 Control MongoDB
Warning: With the introduction of systemd in Fedora 15, the control scripts included in the packages available in the 10gen repository are not compatible with Fedora systems. A correction is forthcoming; see SERVER-7285 for more information. In the meantime, use your own control scripts or install using the procedure outlined in Install MongoDB on Linux (page 12).
Start MongoDB
Start the mongod (page 989) process by issuing the following command (as root, or with sudo):
service mongod start
You can verify that the mongod (page 989) process has started successfully by checking the contents of the log file at /var/log/mongo/mongod.log.
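As a quick check, you can tail the log and look for a startup message; the exact wording may vary between releases:
tail -n 20 /var/log/mongo/mongod.log
# a healthy startup typically logs a line such as:
# [initandlisten] waiting for connections on port 27017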
Optionally, you can ensure that MongoDB will start following a system reboot by issuing the following command (with root privileges):
chkconfig mongod on
Stop MongoDB
Stop the mongod (page 989) process by issuing the following command (as root, or with sudo):
service mongod stop
Restart MongoDB
You can restart the mongod (page 989) process by issuing the following command (as root, or with sudo):
service mongod restart
Follow the state of this process by watching the output in the /var/log/mongo/mongod.log file for errors or important messages from the server.
Control mongos
As of the current release, there are no control scripts for mongos (page 999). mongos (page 999) is used only in sharding deployments and typically does not run on the same systems where mongod (page 989) runs. You can use the mongod init script referenced above to derive your own mongos (page 999) control script.
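As a rough sketch only (this is not an official control script), a mongos (page 999) instance can also be started directly from the command line by pointing it at your cluster's config servers; the hostnames below are placeholders:
# cfg1-cfg3.example.net are hypothetical config servers; substitute your own
mongos --configdb cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019 \
       --logpath /var/log/mongo/mongos.log --fork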
SELinux Considerations
You must configure SELinux to allow MongoDB to start on Fedora systems. Administrators have two options (see the sketch after this list):
- enable access to the relevant ports (e.g. 27017) for SELinux. See Interfaces and Port Numbers (page 92) for more information on MongoDB's default ports.
- disable SELinux entirely. This requires a system reboot and may have larger implications for your deployment.
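For the second option, a minimal sketch, assuming SELinux is currently set to enforcing:
setenforce 0   # switch to permissive mode for the running session
# persist the change; takes full effect after the required reboot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config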
1.1.6 Using MongoDB
Among the tools included in the mongo-10gen package is the mongo (page 1002) shell. You can connect to your
MongoDB instance by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo (page 1002) prompt,
issue the following two commands to insert a record in the test collection of the (default) test database and then
retrieve that document.
> db.test.save( { a: 1 } )
> db.test.find()
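If both commands succeed, find() returns the document you just saved; the ObjectId value will differ on your system:
{ "_id" : ObjectId("..."), "a" : 1 }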
See Also:
mongo (page 1002) and mongo Shell JavaScript Quick Reference (page 982)
1.2 Install MongoDB on Ubuntu
1.2.1 Synopsis
This tutorial outlines the basic process for installing MongoDB on Ubuntu Linux systems. This tutorial uses .deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages for easy installation and management for users of Ubuntu systems. While Ubuntu includes its own MongoDB packages, the 10gen packages are generally more up to date.
This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.
Note: If you use an older version of Ubuntu that does not use Upstart (i.e. any version before 9.10 Karmic), please follow the instructions in the Install MongoDB on Debian (page 9) tutorial.
See Also:
The documentation of the following related processes and concepts.
Other installation tutorials:
Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux (page 3)
Install MongoDB on Debian (page 9)
Install MongoDB on Linux (page 12)
Install MongoDB on OS X (page 13)
Install MongoDB on Windows (page 17)
1.2.2 Package Options
The 10gen repository provides the mongodb-10gen package, which contains the latest stable release. Use this for
production deployments. Additionally, you can install previous releases (page 7) of MongoDB.
You cannot install these packages concurrently with each other or with the mongodb package that your release of
Ubuntu may include.
1.2.3 Installing MongoDB
Configure Package Management System (APT)
The Ubuntu package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG Key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
Create a /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository.
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
Now issue the following command to reload your repository:
sudo apt-get update
Manage Installed Versions
You can use the mongodb-10gen package to install previous versions of MongoDB. To install a specific release,
append the version number to the package name, as in the following example:
apt-get install mongodb-10gen=2.2.3
This will install the 2.2.3 release of MongoDB. You can specify any available version of MongoDB; however, apt-get will upgrade the mongodb-10gen package when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades.
To pin a package, issue the following command at the system prompt to hold the mongodb-10gen package at the currently installed version:
echo "mongodb-10gen hold" | dpkg --set-selections
Install Packages
Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.
1.2.4 Configure MongoDB
These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You will find the control script at /etc/init.d/mongodb.
This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.
Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.
1.2.5 Controlling MongoDB
Starting MongoDB
You can start the mongod (page 989) process by issuing the following command:
sudo service mongodb start
You can verify that mongod (page 989) has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.
Stopping MongoDB
As needed, you may stop the mongod (page 989) process by issuing the following command:
sudo service mongodb stop
Restarting MongoDB
You may restart the mongod (page 989) process by issuing the following command:
sudo service mongodb restart
Controlling mongos
As of the current release, there are no control scripts for mongos (page 999). mongos (page 999) is used only in
sharded deployments and typically does not run on the same systems where mongod (page 989) runs. You can use the
mongodb script referenced above to derive your own mongos (page 999) control script, as in the sketch below.
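For example, the core of such a script might simply invoke mongos (page 999) with your cluster's config server
string; the hostname and port here are placeholders:
mongos --configdb cfg0.example.net:27019 --fork --logpath /var/log/mongodb/mongos.log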
1.2.6 Using MongoDB
Among the tools included with the MongoDB package is the mongo (page 1002) shell. You can connect to your
MongoDB instance by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo (page 1002) prompt,
issue the following two commands to insert a record in the test collection of the (default) test database.
> db.test.save( { a: 1 } )
> db.test.find()
See Also:
mongo (page 1002) and mongo Shell JavaScript Quick Reference (page 982)
1.3 Install MongoDB on Debian
1.3.1 Synopsis
This tutorial outlines the basic installation process for MongoDB on Debian systems. This tutorial uses
.deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages
for easy installation and management for users of Debian systems. While some of these distributions include their
own MongoDB packages, the 10gen packages are generally more up to date.
This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the
process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.
Note: If you're running a version of Ubuntu Linux prior to 9.10 "Karmic," use this tutorial. Other Ubuntu users
should follow the Install MongoDB on Ubuntu (page 6) tutorial.
See Also:
The documentation of following related processes and concepts.
Other installation tutorials:
Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux (page 3)
Install MongoDB on Ubuntu (page 6)
Install MongoDB on Linux (page 12)
Install MongoDB on OS X (page 13)
Install MongoDB on Windows (page 17)
1.3.2 Package Options
The 10gen repository provides the mongodb-10gen package, which contains the latest stable release. Use this for
production deployments. Additionally you can install previous releases (page 10) of MongoDB.
You cannot install these packages concurrently with each other or with the mongodb package that your release of
Debian may include.
1.3.3 Installing MongoDB
Configure Package Management System (APT)
The Debian package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring
that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
Create a /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen
repository.
deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen
Now issue the following command to reload your repository:
sudo apt-get update
Install Packages
Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
When this command completes, you have successfully installed MongoDB!
Manage Installed Versions
You can use the mongodb-10gen package to install previous versions of MongoDB. To install a specific release,
append the version number to the package name, as in the following example:
apt-get install mongodb-10gen=2.2.3
This will install the 2.2.3 release of MongoDB. You can specify any available version of MongoDB; however,
apt-get will upgrade the mongodb-10gen package when a newer version becomes available. Use the following
pinning procedure to prevent unintended upgrades.
To pin a package, issue the following command at the system prompt to pin the version of MongoDB at the currently
installed version:
echo "mongodb-10gen hold" | dpkg --set-selections
1.3.4 Configure MongoDB
These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script.
You can find the control script at /etc/init.d/mongodb.
This MongoDB instance will store its data files in /var/lib/mongodb and its log files in
/var/log/mongodb, and run using the mongodb user account.
Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights
to the /var/lib/mongodb and /var/log/mongodb directories.
1.3.5 Controlling MongoDB
Starting MongoDB
Issue the following command to start mongod (page 989):
sudo /etc/init.d/mongodb start
You can verify that mongod (page 989) has started successfully by checking the contents of the log file at
/var/log/mongodb/mongodb.log.
Stopping MongoDB
Issue the following command to stop mongod (page 989):
sudo /etc/init.d/mongodb stop
Restarting MongoDB
Issue the following command to restart mongod (page 989):
sudo /etc/init.d/mongodb restart
Controlling mongos
As of the current release, there are no control scripts for mongos (page 999). mongos (page 999) is used only in
sharded deployments and typically does not run on the same systems where mongod (page 989) runs. You can use the
mongodb script referenced above to derive your own mongos (page 999) control script.
1.3.6 Using MongoDB
Among the tools included with the MongoDB package is the mongo (page 1002) shell. You can connect to your
MongoDB instance by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo (page 1002) prompt,
issue the following two commands to insert a record in the test collection of the (default) test database.
> db.test.save( { a: 1 } )
> db.test.find()
See Also:
mongo (page 1002) and mongo Shell JavaScript Quick Reference (page 982)
1.4 Install MongoDB on Linux
1.4.1 Synopsis
10gen provides compiled versions of MongoDB for use on Linux, which provide a simple option for users who cannot
use packages. This tutorial outlines the basic installation of MongoDB using these compiled versions and an initial
usage guide.
See Also:
The documentation of following related processes and concepts.
Other installation tutorials:
Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux (page 3)
Install MongoDB on Ubuntu (page 6)
Install MongoDB on Debian (page 9)
Install MongoDB on OS X (page 13)
Install MongoDB on Windows (page 17)
1.4.2 Download MongoDB
Note: You should place the MongoDB binaries in a central location on the file system that is easy to access and
control. Consider /opt or /usr/local/bin.
In a terminal session, begin by downloading the latest release. In most cases you will want to download the 64-bit
version of MongoDB.
curl http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.4.1.tgz > mongodb.tgz
If you need to run the 32-bit version, use the following command.
curl http://downloads.mongodb.org/linux/mongodb-linux-i686-2.4.1.tgz > mongodb.tgz
Once you've downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongodb.tgz
Optional
You may use the following command to copy the extracted folder into a more generic location.
cp -R -n mongodb-linux-????-??-??/ mongodb
You can find the mongod (page 989) binary, and the binaries of all of the associated MongoDB utilities, in the bin/
directory within the extracted directory.
Using MongoDB
Before you start mongod (page 989) for the first time, you will need to create the data directory. By default, mongod
(page 989) writes data to the /data/db/ directory. To create this directory, use the following command:
mkdir -p /data/db
Note: Ensure that the system account that will run the mongod (page 989) process has read and write permissions to
this directory. If mongod (page 989) runs under the mongo user account, issue the following command to change the
owner of this folder:
chown mongo /data/db
If you use an alternate location for your data directory, ensure that this user can write to your chosen data path.
You can specify, and create, an alternate path using the --dbpath (page 991) option to mongod (page 989) and the
above command.
The 10gen builds of MongoDB contain no control scripts or method to control the mongod (page 989)
process. You may wish to create control scripts, modify your path, and/or create symbolic links
to the MongoDB programs in your /usr/local/bin or /usr/bin directory for easier use.
For testing purposes, you can start a mongod (page 989) directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf
Note: The above command assumes that the mongod (page 989) binary is accessible via your system's search path,
and that you have created a default configuration file located at /etc/mongod.conf.
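A minimal sketch of such a configuration file might resemble the following; the paths are illustrative, and the
directories must exist with permissions appropriate for the account that runs mongod (page 989):
dbpath = /data/db
logpath = /var/log/mongodb/mongod.log
logappend = true
fork = true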
Among the tools included with this MongoDB distribution is the mongo (page 1002) shell. You can use this shell to
connect to your MongoDB instance by issuing the following command at the system prompt:
./bin/mongo
Note: The ./bin/mongo command assumes that the mongo (page 1002) binary is in the bin/ sub-directory of
the current directory. This is the directory into which you extracted the .tgz file.
This will connect to the database running on the localhost interface by default. At the mongo (page 1002) prompt,
issue the following two commands to insert a record in the test collection of the (default) test database and then
retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See Also:
mongo (page 1002) and mongo Shell JavaScript Quick Reference (page 982)
1.5 Install MongoDB on OS X
Platform Support
Changed in version 2.4: MongoDB only supports OS X versions 10.6 (Snow Leopard) and later.
1.5.1 Synopsis
This tutorial outlines the basic installation process for deploying MongoDB on Macintosh OS X systems. This tutorial
provides two main methods of installing the MongoDB server (i.e. mongod (page 989)) and associated tools: first
using the community package management tools, and second using builds of MongoDB provided by 10gen.
See Also:
The documentation of following related processes and concepts.
Other installation tutorials:
Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux (page 3)
Install MongoDB on Ubuntu (page 6)
Install MongoDB on Debian (page 9)
Install MongoDB on Linux (page 12)
Install MongoDB on Windows (page 17)
1.5.2 Installing with Package Management
Both community package management tools, Homebrew and MacPorts, require some initial setup and configuration,
which is beyond the scope of this document. You only need to use one of these tools.
If you want to use package management and do not already have a system installed, Homebrew is typically easier and
simpler to use.
Homebrew
Homebrew installs binary packages based on published formulae. Issue the following command at the system shell
to update the brew package manager:
brew update
Use the following command to install the MongoDB package into your Homebrew system.
brew install mongodb
Later, if you need to upgrade MongoDB, you can issue the following sequence of commands to update the MongoDB
installation on your system:
brew update
brew upgrade mongodb
MacPorts
MacPorts distributes build scripts that allow you to easily build packages and their dependencies on your own system.
The compilation process can take a significant period of time depending on your system's capabilities and existing
dependencies. Issue the following command in the system shell:
port install mongodb
Using MongoDB from Homebrew and MacPorts
The packages installed with Homebrew and MacPorts contain no control scripts or interaction with the system's
process manager.
If you have configured Homebrew and MacPorts correctly, including setting your PATH, the MongoDB applications
and utilities will be accessible from the system shell. Start the mongod (page 989) process in a terminal (for testing
or development) or using a process management tool.
mongod
Then open the mongo (page 1002) shell by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo (page 1002) prompt,
issue the following two commands to insert a record in the test collection of the (default) test database and then
retrieve that record.
> db.test.save( { a: 1 } )
> db.test.find()
See Also:
mongo (page 1002) and mongo Shell JavaScript Quick Reference (page 982)
1.5.3 Installing from 10gen Builds
10gen provides compiled binaries of all MongoDB software compiled for OS X, which may provide a more straight-
forward installation process.
Download MongoDB
In a terminal session, begin by downloading the latest release. Use the following command at the system prompt:
curl http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.4.1.tgz > mongodb.tgz
Note: The mongod (page 989) process will not run on older Macintosh computers with PowerPC (i.e. non-Intel)
processors.
Once you've downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongodb.tgz
Optional
You may use the following command to move the extracted folder into a more generic location.
mv -n mongodb-osx-[platform]-[version]/ /path/to/new/location/
Replace [platform] with i386 or x86_64 depending on your system and the version you downloaded, and
[version] with 2.4 or the version of MongoDB that you are installing.
You can find the mongod (page 989) binary, and the binaries of all of the associated MongoDB utilities, in the bin/
directory within the archive.
Using MongoDB from 10gen Builds
Before you start mongod (page 989) for the first time, you will need to create the data directory. By default, mongod
(page 989) writes data to the /data/db/ directory. To create this directory and set the appropriate permissions, use
the following commands:
sudo mkdir -p /data/db
sudo chown `id -u` /data/db
You can specify an alternate path for data les using the --dbpath (page 991) option to mongod (page 989).
The 10gen builds of MongoDB contain no control scripts or method to control the mongod (page 989) process. You
may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your
/usr/local/bin directory for easier use.
For testing purposes, you can start a mongod (page 989) directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf
Note: This command assumes that the mongod (page 989) binary is accessible via your system's search path,
and that you have created a default configuration file located at /etc/mongod.conf.
Among the tools included with this MongoDB distribution is the mongo (page 1002) shell. You can use this shell
to connect to your MongoDB instance by issuing the following command at the system prompt from inside of the
directory where you extracted mongo (page 1002):
./bin/mongo
Note: The ./bin/mongo command assumes that the mongo (page 1002) binary is in the bin/ sub-directory of
the current directory. This is the directory into which you extracted the .tgz file.
This will connect to the database running on the localhost interface by default. At the mongo (page 1002) prompt,
issue the following two commands to insert a record in the test collection of the (default) test database and then
retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See Also:
mongo (page 1002) and mongo Shell JavaScript Quick Reference (page 982)
1.6 Install MongoDB on Windows
1.6.1 Synopsis
This tutorial provides a method for installing and running the MongoDB server (i.e. mongod.exe (page 1008)) on
the Microsoft Windows platform through the Command Prompt and outlines the process for setting up MongoDB as
a Windows Service.
Operating MongoDB with Windows is similar to MongoDB on other platforms. Most components share the same
operational patterns.
1.6.2 Procedure
Download MongoDB for Windows
Download the latest production release of MongoDB from the MongoDB downloads page.
There are three builds of MongoDB for Windows:
MongoDB for Windows Server 2008 R2 edition only runs on Windows Server 2008 R2, Windows 7 64-bit, and
newer versions of Windows. This build takes advantage of recent enhancements to the Windows Platform and
cannot operate on older versions of Windows.
MongoDB for Windows 64-bit runs on any 64-bit version of Windows newer than Windows XP, including
Windows Server 2008 R2 and Windows 7 64-bit.
MongoDB for Windows 32-bit runs on any 32-bit version of Windows newer than Windows XP. 32-bit versions
of MongoDB are only intended for older systems and for use in testing and development systems.
Changed in version 2.2: MongoDB does not support Windows XP. Please use a more recent version of Windows to
use more recent releases of MongoDB.
Note: Always download the correct version of MongoDB for your Windows system. The 64-bit versions of Mon-
goDB will not work with 32-bit Windows.
32-bit versions of MongoDB are suitable only for testing and evaluation purposes and only support databases smaller
than 2GB.
You can find the architecture of your version of Windows using the following command in the Command
Prompt:
wmic os get osarchitecture
In Windows Explorer, find the MongoDB download file, typically in the default Downloads directory. Extract the
archive to C:\ by right clicking on the archive and selecting Extract All and browsing to C:\.
Note: The folder name will be either:
C:\mongodb-win32-i386-[version]
Or:
C:\mongodb-win32-x86_64-[version]
In both examples, replace [version] with the version of MongoDB downloaded.
Set up the Environment
Start the Command Prompt by selecting the Start Menu, then All Programs, then Accessories, then right click
Command Prompt, and select Run as Administrator from the popup menu. In the Command Prompt, issue the
following commands:
cd \
move C:\mongodb-win32-* C:\mongodb
Note: MongoDB is self-contained and does not have any other system dependencies. You can run MongoDB from
any folder you choose. You may install MongoDB in any directory (e.g. D:\test\mongodb).
MongoDB requires a data folder to store its les. The default location for the MongoDB data directory is
C:\data\db. Create this folder using the Command Prompt. Issue the following command sequence:
md data
md data\db
Note: You may specify an alternate path for \data\db with the dbpath (page 1048) setting for mongod.exe
(page 1008), as in the following example:
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
If your path includes spaces, enclose the entire path in double quotations, for example:
C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongo db data"
Start MongoDB
To start MongoDB, execute from the Command Prompt:
C:\mongodb\bin\mongod.exe
This will start the main MongoDB database process. The "waiting for connections" message in the console
output indicates that the mongod.exe process is running successfully.
Note: Depending on the security level of your system, Windows will issue a Security Alert dialog box about blocking
some features of C:\mongodb\bin\mongod.exe from communicating on networks. All users should select
"Private Networks, such as my home or work network" and click "Allow access." For additional
information on security and MongoDB, please read the Security Practices and Management (page 91) page.
Warning: Do not allow mongod.exe (page 1008) to be accessible to public networks without running in
Secure Mode (i.e. auth (page 1048)). MongoDB is designed to be run in trusted environments and the
database does not enable authentication or Secure Mode by default.
Connect to MongoDB using the mongo.exe shell. Open another Command Prompt and issue the following
command:
C:\mongodb\bin\mongo.exe
Note: Executing the command start C:\mongodb\bin\mongo.exe will automatically start the mongo.exe
shell in a separate Command Prompt window.
The mongo.exe shell will connect to mongod.exe (page 1008) running on the localhost interface and port 27017
by default. At the mongo.exe prompt, issue the following two commands to insert a record in the test collection
of the default test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See Also:
mongo (page 1002) and mongo Shell JavaScript Quick Reference (page 982). If you want to develop applications
using .NET, see the documentation of C# and MongoDB for more information.
1.6.3 MongoDB as a Windows Service
New in version 2.0. Set up MongoDB as a Windows Service so that the database will start automatically following
each reboot cycle.
Note: mongod.exe (page 1008) added support for running as a Windows service in version 2.0, and mongos.exe
(page 1009) added support for running as a Windows Service in version 2.1.1.
Congure the System
You should specify two options when running MongoDB as a Windows Service: a path for the log output (i.e.
logpath (page 1047)) and a configuration file (page 1045).
1. Create a specic directory for MongoDB log les:
md C:\mongodb\log
2. Create a configuration file for the logpath (page 1047) option for MongoDB in the Command Prompt by
issuing this command:
echo logpath=C:\mongodb\log\mongo.log > C:\mongodb\mongod.cfg
While these steps are optional, creating a specific location for log files and using the configuration file are
good practice.
Note: Consider setting the logappend (page 1047) option. If you do not, mongod.exe (page 1008) will delete
the contents of the existing log file when starting. Changed in version 2.2: The default logpath (page 1047) and
logappend (page 1047) behavior changed in the 2.2 release.
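For example, a configuration file that also enables logappend (page 1047) and sets the data directory might contain
the following; the paths are illustrative, and the dbpath (page 1048) directory must already exist:
logpath=C:\mongodb\log\mongo.log
logappend=true
dbpath=C:\data\db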
Install and Run the MongoDB Service
Run all of the following commands in Command Prompt with Administrative Privileges:
1. To install the MongoDB service:
C:\mongodb\bin\mongod.exe --config C:\mongodb\mongod.cfg --install
Modify the path to the mongod.cfg file as needed. For the --install (page 1008) option to succeed, you
must specify a logpath (page 1047) setting or the --logpath (page 990) run-time option.
2. To run the MongoDB service:
net start MongoDB
Note: If you wish to use an alternate path for your dbpath (page 1048), specify it in the config file (e.g.
C:\mongodb\mongod.cfg) that you specified in the --install (page 1008) operation. You may also specify
--dbpath (page 991) on the command line; however, always prefer the configuration file.
If the dbpath (page 1048) directory does not exist, mongod.exe (page 1008) will not be able to start. The default
value for dbpath (page 1048) is \data\db.
Stop or Remove the MongoDB Service
To stop the MongoDB service:
net stop MongoDB
To remove the MongoDB service:
C:\mongodb\bin\mongod.exe --remove
1.7 Install MongoDB Enterprise
New in version 2.2. MongoDB Enterprise is available on four platforms and contains support for several features
related to security and monitoring.
1.7.1 Required Packages
Changed in version 2.4: MongoDB Enterprise requires libgsasl. To use MongoDB Enterprise, you must install
several prerequisites. The names of the packages vary by distribution and are as follows:
Ubuntu 12.04 and 11.04 require libssl0.9.8, libgsasl, snmp, and snmpd. Issue a command such as
the following to install these packages:
sudo apt-get install libssl0.9.8 libgsasl7 snmp snmpd
Red Hat Enterprise Linux 6.x series and Amazon Linux AMI require libssl, libgsasl7, net-snmp,
net-snmp-libs, and net-snmp-utils. To download libgsasl you must enable the EPEL repository
by issuing the following sequence of commands to add and update the system repositories:
sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo yum update -y
When you have installed and updated the EPEL repositories, issue the following command to install these packages:
sudo yum install libssl net-snmp net-snmp-libs net-snmp-utils libgsasl
SUSE Enterprise Linux requires libopenssl0_9_8, libsnmp15, slessp1-libsnmp15, and
snmp-mibs. Issue a command such as the following to install these packages:
sudo zypper install libopenssl0_9_8 libsnmp15 slessp1-libsnmp15 snmp-mibs
Note: For the 2.4 release, the MongoDB Enterprise for SUSE requires libgsasl which is not available in the
default repositories for SUSE.
1.7.2 Install MongoDB Enterprise Binaries
When you have installed the required packages and downloaded the Enterprise packages, you can install the packages
using the same procedure as a standard installation of MongoDB on Linux Systems (page 12).
After you have installed MongoDB, consider Getting Started with MongoDB Development (page 21) as you begin to learn about MongoDB.
1.8 Getting Started with MongoDB Development
This tutorial provides an introduction to basic database operations using the mongo (page 1002) shell. mongo
(page 1002) is a part of the standard MongoDB distribution and provides a full JavaScript environment with complete
access to the JavaScript language and all standard functions, as well as a full database interface for MongoDB. See the
mongo JavaScript API documentation and the mongo (page 1002) shell JavaScript Method Reference (page 982).
The tutorial assumes that you're running MongoDB on a Linux or OS X operating system and that you have a running
database server; MongoDB does support Windows and provides a Windows distribution with identical operation. For
instructions on installing MongoDB and starting the database server, see the appropriate installation (page 3) document.
This tutorial addresses the following aspects of MongoDB use:
Connect to a Database (page 21)
Connect to a mongod (page 989) (page 21)
Select a Database (page 22)
Display mongo Help (page 22)
Create a Collection and Insert Documents (page 22)
Insert Individual Documents (page 22)
Insert Multiple Documents Using a For Loop (page 23)
Working with the Cursor (page 24)
Iterate over the Cursor with a Loop (page 24)
Use Array Operations with the Cursor (page 25)
Query for Specic Documents (page 25)
Return a Single Document from a Collection (page 27)
Limit the Number of Documents in the Result Set (page 27)
Next Steps with MongoDB (page 27)
1.8.1 Connect to a Database
In this section you connect to the database server, which runs as mongod (page 989), and begin using the mongo
(page 1002) shell to select a logical database within the database instance and access the help text in the mongo
(page 1002) shell.
Connect to a mongod
From a system prompt, start mongo (page 1002) by issuing the mongo (page 1002) command, as follows:
mongo
By default, mongo (page 1002) looks for a database server listening on port 27017 on the localhost interface. To
connect to a server on a different port or interface, use the --port (page 1003) and --host (page 1003) options.
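For example, the following sketch connects to a mongod (page 989) listening on a non-default port on a remote
host; the hostname and port here are placeholders:
mongo --host mongodb0.example.net --port 28015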
Select a Database
After starting the mongo (page 1002) shell, your session will use the test database for context by default. At any
time, issue the following operation at the mongo (page 1002) prompt to report the current database:
db
db returns the name of the current database.
1. From the mongo (page 1002) shell, display the list of databases with the following operation:
show dbs
2. Switch to a new database named mydb with the following operation:
use mydb
3. Confirm that your session has the mydb database as context, using the db operation, which returns the name of
the current database as follows:
db
At this point, if you issue the show dbs operation again, it will not include mydb, because MongoDB will not create
a database until you insert data into that database. The Create a Collection and Insert Documents (page 22) section
describes the process for inserting data. New in version 2.4: show databases also returns a list of databases.
Display mongo Help
At any point you can access help for the mongo (page 1002) shell using the following operation:
help
Furthermore, you can append the .help() method to some JavaScript methods, any cursor object, as well as the db
and db.collection objects to return additional help information.
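For example, each of the following operations returns help text for the corresponding object; things here is only
an illustrative collection name:
db.help()
db.things.help()
db.things.find().help()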
1.8.2 Create a Collection and Insert Documents
In this section, you insert documents into a new collection named things within the new database named mydb.
MongoDB will create collections and databases implicitly upon their first use: you do not need to create the database
or collection before inserting data. Furthermore, because MongoDB uses dynamic schemas (page 702), you do not
need to specify the structure of your documents before inserting them into the collection.
Insert Individual Documents
1. From the mongo (page 1002) shell, confirm that the current context is the mydb database with the following
operation:
db
2. If mongo (page 1002) does not return mydb for the previous operation, set the context to the mydb database
with the following operation:
use mydb
3. Create two documents, named j and k, with the following sequence of JavaScript operations:
j = { name : "mongo" }
k = { x : 3 }
4. Insert the j and k documents into the collection things with the following sequence of operations:
db.things.insert( j )
db.things.insert( k )
When you insert the first document, the mongod (page 989) will create both the mydb database and the things
collection.
5. Confirm that the collection named things exists using the following operation:
show collections
The mongo (page 1002) shell will return the list of the collections in the current (i.e. mydb) database. At
this point, the only collection is things. All mongod (page 989) databases also have a system.indexes
(page 1131) collection.
6. Confirm that the documents exist in the collection things by issuing a query on the things collection using
the find() (page 910) method, in an operation that resembles the following:
db.things.find()
This operation returns the following results. The ObjectId (page 158) values will be unique:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
All MongoDB documents must have an _id field with a unique value. These operations do not explicitly
specify a value for the _id field, so mongo (page 1002) creates a unique ObjectId (page 158) value for the field
before inserting it into the collection.
Insert Multiple Documents Using a For Loop
1. From the mongo (page 1002) shell, add more documents to the things collection using the following for
loop:
for (var i = 1; i <= 20; i++) db.things.insert( { x : 4 , j : i } )
2. Query the collection by issuing the following command:
db.things.find()
The mongo (page 1002) shell displays the first 20 documents in the collection. Your ObjectId (page 158) values
will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
3. The find() (page 910) method returns a cursor. To iterate the cursor and return more documents, use the it
operation in the mongo (page 1002) shell. The mongo (page 1002) shell will exhaust the cursor and return the
following documents:
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
For more information on inserting new documents, see the insert() (page 168) documentation.
1.8.3 Working with the Cursor
When you query a collection, MongoDB returns a cursor object that contains the results of the query. The mongo
(page 1002) shell then iterates over the cursor to display the results. Rather than returning all results at once, the shell
iterates over the cursor 20 times to display the rst 20 results and then waits for a request to iterate over the remaining
results. This prevents mongo (page 1002) from displaying thousands or millions of results at once.
The it operation allows you to iterate over the next 20 results in the shell. In the previous procedure (page 24), the
cursor only contained two more documents, and so only two more documents displayed.
The procedures in this section show other ways to work with a cursor. For comprehensive documentation on cursors,
see Iterate the Returned Cursor (page 181).
Iterate over the Cursor with a Loop
1. In the MongoDB JavaScript shell, query the things collection and assign the resulting cursor object to the c
variable:
var c = db.things.find()
2. Print the full result set by using a while loop to iterate over the c variable:
while ( c.hasNext() ) printjson( c.next() )
The hasNext() function returns true if the cursor has documents. The next() method returns the next
document. The printjson() method renders the document in a JSON-like format.
The result of this operation follows, although the ObjectId (page 158) values will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
Use Array Operations with the Cursor
You can manipulate a cursor object as if it were an array. Consider the following procedure:
1. In the mongo (page 1002) shell, query the things collection and assign the resulting cursor object to the c
variable:
var c = db.things.find()
2. To find the document at the array index 4, use the following operation:
printjson( c [ 4 ] )
MongoDB returns the following:
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
When you access documents in a cursor using the array index notation, mongo (page 1002) first calls the
cursor.toArray() method and loads into RAM all documents returned by the cursor. The index is then
applied to the resulting array. This operation iterates the cursor completely and exhausts the cursor.
For very large result sets, mongo (page 1002) may run out of available memory.
For more information on the cursor, see Iterate the Returned Cursor (page 181).
Query for Specic Documents
MongoDB has a rich query system that allows you to select and filter the documents in a collection along specific
fields and values. See Query Document (page 128) and Read (page 175) for a full account of queries in MongoDB.
In this procedure, you query for specific documents in the things collection by passing a query document as a
parameter to the find() (page 910) method. A query document specifies the criteria the query must match to return
a document.
To query for specic documents, do the following:
1. In the mongo (page 1002) shell, query for all documents where the name eld has a value of mongo by passing
the { name : "mongo" } query document as a parameter to the find() (page 910) method:
db.things.find( { name : "mongo" } )
MongoDB returns one document that fits these criteria. The ObjectId (page 158) value will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
2. Query for all documents where x has a value of 4 by passing the { x : 4 } query document as a parameter
to find() (page 910):
db.things.find( { x : 4 } )
MongoDB returns the following result set:
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
ObjectId (page 158) values are always unique.
3. Query for all documents where x has a value of 4, as in the previous query, but return only the value of
j. MongoDB will also return the _id field, unless explicitly excluded. To do this, you add the { j : 1 }
document as the projection in the second parameter to find() (page 910). This operation would resemble the
following:
db.things.find( { x : 4 } , { j : 1 } )
MongoDB returns the following results:
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "j" : 20 }
Return a Single Document from a Collection
With the db.collection.findOne() (page 915) method you can return a single document from a MongoDB
collection. The findOne() (page 915) method takes the same parameters as find() (page 910), but returns a
document rather than a cursor.
To retrieve one document from the things collection, issue the following command:
db.things.findOne()
For more information on querying for documents, see the Read (page 175) and Read Operations (page 127)
documentation.
Limit the Number of Documents in the Result Set
You can constrain the size of the result set to increase performance by limiting the amount of data your application
must receive over the network.
To specify the maximum number of documents in the result set, call the limit() (page 895) method on a cursor, as
in the following command:
db.things.find().limit(3)
MongoDB will return the following result, with different ObjectId (page 158) values:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
1.8.4 Next Steps with MongoDB
For more information on manipulating the documents in a database as you continue to learn MongoDB, consider the
following resources:
CRUD Operations for MongoDB (page 167)
SQL to MongoDB Mapping Chart (page 966)
MongoDB Drivers and Client Libraries (page 493)
Getting Started with MongoDB Development (page 21)
Create (page 167)
Read (page 175)
Update (page 185)
Delete (page 191)
CHAPTER 2
Release Notes
You should always install the latest, stable version of MongoDB. Stable versions have an even-numbered minor version
number. For example: v2.4 is the current stable release, v2.2 and v2.0 were previously stable releases, while v2.1 and
v2.3 are development versions.
Current Stable Release:
Release Notes for MongoDB 2.4 (page 1151)
Previous Stable Releases:
Release Notes for MongoDB 2.2 (page 1170)
Release Notes for MongoDB 2.0 (page 1178)
Release Notes for MongoDB 1.8 (page 1184)
Part II
Administration
The documentation in this section outlines core administrative tasks and practices that operators of MongoDB will
want to consider. In addition to the core topics that follow, also consider the relevant documentation in other sections
including: Sharding (page 419), Replication (page 331), and Indexes (page 271).
CHAPTER 3
Run-time Database Configuration
The command line (page 989) and configuration file (page 1045) interfaces provide MongoDB administrators with a
large number of options and settings for controlling the operation of the database system. This document provides an
overview of common configurations and examples of best-practice configurations for common use cases.
While both interfaces provide access to the same collection of options and settings, this document primarily uses
the configuration file interface. If you run MongoDB using a control script or installed from a package for your
operating system, you likely already have a configuration file located at /etc/mongodb.conf. Confirm this by
checking the content of the /etc/init.d/mongod or /etc/rc.d/mongod script to ensure that the control
scripts start the mongod (page 989) with the appropriate configuration file (see below).
To start a MongoDB instance using this configuration, issue a command in one of the following forms:
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf
Modify the values in the /etc/mongodb.conf file on your system to control the configuration of your database
instance.
3.1 Starting, Stopping, and Running the Database
Consider the following basic configuration:
fork = true
bind_ip = 127.0.0.1
port = 27017
quiet = true
dbpath = /srv/mongodb
logpath = /var/log/mongodb/mongod.log
logappend = true
journal = true
For most standalone servers, this is a sufficient base configuration. It makes several assumptions, but consider the
following explanation:
fork (page 1048) is true, which enables a daemon mode for mongod (page 989) that detaches (i.e.
"forks") the MongoDB process from the current session and allows you to run the database as a conventional server.
bind_ip (page 1046) is 127.0.0.1, which forces the server to only listen for requests on the localhost IP.
Only bind to secure interfaces that the application-level systems can access with access control provided by
system network filtering (i.e. firewall).
port (page 1046) is 27017, which is the default MongoDB port for database instances. MongoDB can bind
to any port. You can also filter access based on port using network filtering tools.
Note: UNIX-like systems require superuser privileges to attach processes to ports lower than 1024.
quiet (page 1052) is true. This disables all but the most critical entries in the output/log file. In normal operation
this is the preferable operation to avoid log noise. In diagnostic or testing situations, set this value to false.
Use setParameter (page 878) to modify this setting during run time.
dbpath (page 1048) is /srv/mongodb, which specifies where MongoDB will store its data files.
/srv/mongodb and /var/lib/mongodb are popular locations. The user account that mongod (page 989)
runs under will need read and write access to this directory.
logpath (page 1047) is /var/log/mongodb/mongod.log, which is where mongod (page 989) will write
its output. If you do not set this value, mongod (page 989) writes all output to standard output (e.g. stdout).
logappend (page 1047) is true, which ensures that mongod (page 989) does not overwrite an existing log
file following the server start operation.
journal (page 1049) is true, which enables journaling. Journaling ensures single instance write-durability.
64-bit builds of mongod (page 989) enable journaling by default. Thus, this setting may be redundant.
Given the default configuration, some of these values may be redundant. However, in many situations explicitly stating
the configuration increases overall system intelligibility.
3.2 Security Considerations
The following collection of configuration options is useful for limiting access to a mongod (page 989) instance.
Consider the following:
bind_ip = 127.0.0.1,10.8.0.10,192.168.4.24
nounixsocket = true
auth = true
Consider the following explanation for these configuration decisions:
bind_ip (page 1046) has three values: 127.0.0.1, the localhost interface; 10.8.0.10, a private IP
address typically used for local networks and VPN interfaces; and 192.168.4.24, a private network interface
typically used for local networks.
Because production MongoDB instances need to be accessible from multiple database servers, it is important
to bind MongoDB to multiple interfaces that are accessible from your application servers. At the same time it's
important to limit these interfaces to interfaces controlled and protected at the network layer.
Setting nounixsocket (page 1047) to true disables the UNIX socket, which is otherwise enabled by default.
This limits access on the local system. This is desirable when running MongoDB on systems with shared
access, but in most situations has minimal impact.
Setting auth (page 1048) to true enables the authentication system within MongoDB. If enabled, you will need
to log in by connecting over the localhost interface for the first time to create user credentials, as in the sketch below.
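For example, on MongoDB 2.4 you might connect from the localhost and create the first administrative user as
follows; the username and password here are placeholders:
use admin
db.addUser( { user: "siteUserAdmin", pwd: "<password>", roles: [ "userAdminAnyDatabase" ] } )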
See Also:
Security Practices and Management (page 91)
3.3 Replication and Sharding Configuration
3.3.1 Replication Configuration
Replica set configuration is straightforward, and only requires that the replSet (page 1053) have a value that is
consistent among all members of the set. Consider the following:
replSet = set0
Use descriptive names for sets. Once configured, use the mongo (page 1002) shell to add hosts to the replica set.
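For example, after starting each member with this replSet value, you might initiate the set and add a second
member from the mongo (page 1002) shell; the hostname here is a placeholder:
rs.initiate()
rs.add( "mongodb1.example.net:27017" )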
See Also:
Replica set reconfiguration (page 1061).
To enable authentication for the replica set, add the following option:
keyFile = /srv/mongodb/keyfile
New in version 1.8: for replica sets, and 1.9.1 for sharded replica sets. Setting keyFile (page 1047) enables
authentication and specifies a key file for the replica set members to use when authenticating to each other. The
content of the key file is arbitrary but must be the same on all members of the replica set and on the mongos
(page 999) instances that connect to the set. The key file must be less than one kilobyte in size, may only contain
characters in the base64 set, and must not have group or world permissions on UNIX systems.
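One common way to generate such a key file, assuming OpenSSL is available on your system, is:
openssl rand -base64 741 > /srv/mongodb/keyfile
chmod 600 /srv/mongodb/keyfile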
See Also:
The Replica Set Reconfiguration (page 1061) section for information regarding the process for changing a replica set
during operation.
Additionally, consider the Replica Set Security (page 348) section for information on configuring authentication
with replica sets.
Finally, see the Replication (page 331) index and the Replica Set Fundamental Concepts (page 333) document for
more information on replication in MongoDB and replica set conguration in general.
3.3.2 Sharding Configuration
Sharding requires a number of mongod (page 989) instances with different configurations. The config servers store
the cluster's metadata, while the cluster distributes data among one or more shard servers.
Note: Config servers are not replica sets.
Set up one or three config server instances as normal (page 35) mongod (page 989) instances, then add the
following configuration options:
configsvr = true
bind_ip = 10.8.0.12
port = 27001
This creates a config server running on the private IP address 10.8.0.12 on port 27001. Make sure that there
are no port conflicts, and that your config server is accessible from all of your mongos (page 999) and mongod
(page 989) instances.
To set up shards, configure two or more mongod (page 989) instances using your base configuration (page 35),
adding the shardsvr (page 1054) setting:
shardsvr = true
Finally, to establish the cluster, configure at least one mongos (page 999) process with the following settings:
configdb = 10.8.0.12:27001
chunkSize = 64
You can specify multiple configdb (page 1055) instances by specifying hostnames and ports in the form of a comma
separated list, as in the example below. In general, avoid modifying the chunkSize (page 1055) from the default
value of 64, [1] and ensure this setting is consistent among all mongos (page 999) instances.
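For example, a mongos (page 999) configuration that lists three config servers might resemble the following; the
hostnames are placeholders:
configdb = cfg0.example.net:27001,cfg1.example.net:27001,cfg2.example.net:27001
chunkSize = 64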
See Also:
The Sharding (page 419) section of the manual for more information on sharding and cluster configuration.
3.4 Running Multiple Database Instances on the Same System
In many cases running multiple instances of mongod (page 989) on a single system is not recommended. On some
types of deployments [2] and for testing purposes you may need to run more than one mongod (page 989) on a single
system.
In these cases, use a base configuration (page 35) for each instance, but consider the following configuration values:
dbpath = /srv/mongodb/db0/
pidfilepath = /srv/mongodb/db0.pid
The dbpath (page 1048) value controls the location of the mongod (page 989) instance's data directory. Ensure that
each database has a distinct and well labeled data directory. The pidfilepath (page 1047) controls where the mongod
(page 989) process places its process id file. As this tracks the specific mongod (page 989) process, it is crucial that the file
be unique and well labeled to make it easy to start and stop these processes.
Create additional control scripts and/or adjust your existing MongoDB configuration and control scripts as needed to
control these processes, as in the sketch below.
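The following sketch shows how the configurations for two instances might differ; the ports and paths are illustrative only:
# instance 0
port = 27017
dbpath = /srv/mongodb/db0/
pidfilepath = /srv/mongodb/db0.pid

# instance 1
port = 27018
dbpath = /srv/mongodb/db1/
pidfilepath = /srv/mongodb/db1.pid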
3.5 Diagnostic Configurations
The following configuration options control various mongod (page 989) behaviors for diagnostic purposes. The
following settings have default values that are tuned for general production purposes:
slowms = 50
profile = 3
verbose = true
diaglog = 3
objcheck = true
cpu = true
[1] Chunk size is 64 megabytes by default, which provides the ideal balance between the most even distribution of data, for which smaller chunk
sizes are best, and minimizing chunk migration, for which larger chunk sizes are optimal.
[2] Single-tenant systems with SSD or other high performance disks may provide acceptable performance levels for multiple mongod (page 989)
instances. Additionally, you may find that multiple databases with small working sets may function acceptably on a single system.
Use the base configuration (page 35) and add these options as needed if you are experiencing some unknown issue or
performance problem:
slowms (page 1051) configures the threshold at which the database profiler considers a query slow. The default
value is 100 milliseconds. Set a lower value if the database profiler does not return useful results. See Optimization
Strategies for MongoDB Applications (page 493) for more information on optimizing operations in
MongoDB.
profile (page 1050) sets the database profiler level. The profiler is not active by default because of the
possible impact of the profiler itself on performance. Unless this setting has a value, queries are not profiled.
verbose (page 1045) enables a verbose logging mode that modifies mongod (page 989) output and increases
logging to include a greater number of events. Only use this option if you are experiencing an issue that is not
reflected in the normal logging level. If you require additional verbosity, consider the following options:
v = true
vv = true
vvv = true
vvvv = true
vvvvv = true
Each additional level v adds additional verbosity to the logging. The verbose option is equal to v = true.
diaglog (page 1048) enables diagnostic logging. Level 3 logs all read and write operations.
objcheck (page 1046) forces mongod (page 989) to validate all requests from clients upon receipt. Use this
option to ensure that invalid requests are not causing errors, particularly when running a database with untrusted
clients. This option may affect database performance.
cpu (page 1048) forces mongod (page 989) to report the percentage of the last interval spent in write lock.
The interval is typically 4 seconds, and each output line in the log includes both the actual interval since the last
report and the percentage of time spent in write lock.
CHAPTER 4
Operational Segregation in MongoDB Operations and Deployments
4.1 Operational Overview
MongoDB includes a cluster of features that allow database administrators and developers to segregate application
operations to MongoDB deployments by functional or geographical groupings.
This capability provides data center awareness, which allows applications to target MongoDB deployments with
consideration of the physical location of mongod (page 989) instances. MongoDB supports segmentation of operations
across different dimensions, which may include multiple data centers and geographical regions in multi-data
center deployments, or racks, networks, or power circuits in single data center deployments.
MongoDB also supports segregation of database operations based on functional or operational parameters, to ensure
that certain mongod (page 989) instances are only used for reporting workloads or that certain high-frequency portions
of a sharded collection only exist on specific shards.
Specifically, with MongoDB, you can:
ensure write operations propagate to specific members of a replica set, or to specific members of replica sets.
ensure that specific members of a replica set respond to queries.
ensure that specific ranges of your shard key balance onto and reside on specific shards.
combine the above features in a single distributed deployment, on a per-operation (for read and write operations)
and per-collection (for chunk distribution in sharded clusters) basis.
For full documentation of these features, see the following documentation in the MongoDB Manual:
Read Preferences (page 360), which controls how drivers help applications target read operations to members
of a replica set.
Write Concerns (page 357), which controls how MongoDB ensures that write operations propagate to members
of a replica set.
Replica Set Tags (page 1063), which control how applications create and interact with custom groupings of
replica set members to create custom application-specific read preferences and write concerns.
Tag Aware Sharding (page 466), which allows MongoDB administrators to define an application-specific balancing
policy, to control how documents belonging to specific ranges of a shard key distribute to shards in the
sharded cluster.
See Also:
Before adding operational segregation features to your application and MongoDB deployment, become familiar with
all documentation of replication (page 331) and sharding (page 419), particularly Replica Set Fundamental Concepts
(page 333) and Sharded Cluster Overview (page 421).
CHAPTER 5
Journaling
MongoDB uses write ahead logging to an on-disk journal to guarantee write operation (page 139) durability and to
provide crash resiliency. Before applying a change to the data files, MongoDB writes the change operation to the
journal. If MongoDB should terminate or encounter an error before it can write the changes from the journal to the
data files, MongoDB can re-apply the write operation and maintain a consistent state.
Without a journal, if mongod (page 989) exits unexpectedly, you must assume your data is in an inconsistent state,
and you must run either repair (page 662) or, preferably, resync (page 347) from a clean member of the replica set.
With journaling enabled, if mongod (page 989) stops unexpectedly, the program can recover everything written to the
journal, and the data remains in a consistent state. By default, the greatest extent of lost writes, i.e., those not made to
the journal, is no more than the last 100 milliseconds.
With journaling, if you want a data set to reside entirely in RAM, you need enough RAM to hold the data set plus
the write working set. The write working set is the amount of unique data you expect to see written between
re-mappings of the private view. For information on views, see Storage Views used in Journaling (page 46).
Important: Changed in version 2.0: For 64-bit builds of mongod (page 989), journaling is enabled by default. For
other platforms, see journal (page 1049).
5.1 Procedures
5.1.1 Enable Journaling
Changed in version 2.0: For 64-bit builds of mongod (page 989), journaling is enabled by default. To enable journaling,
start mongod (page 989) with the --journal (page 992) command line option.
If no journal files exist when mongod (page 989) starts, it must preallocate new journal files. During this operation,
the mongod (page 989) does not listen for connections until preallocation completes: for some systems this may take
several minutes. During this period your applications and the mongo (page 1002) shell are not available.
5.1.2 Disable Journaling
Warning: Do not disable journaling on production systems. If your mongod (page 989) instance stops unexpectedly
without shutting down cleanly for any reason (e.g. power failure) and you are not running with journaling,
then you must recover from an unaffected replica set member or backup, as described in repair (page 662).
To disable journaling, start mongod (page 989) with the --nojournal (page 993) command line option.
5.1.3 Get Commit Acknowledgment
You can get commit acknowledgment with the getLastError (page 847) command and the j option. For details,
see Internal Operation of Write Concern (page 141).
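For example, the following mongo (page 1002) shell sketch requests acknowledgment that the last write on the connection has reached the journal:
db.runCommand( { getLastError: 1, j: true } )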
5.1.4 Avoid Preallocation Lag
To avoid preallocation lag (page 45), you can preallocate files in the journal directory by copying them from another
instance of mongod (page 989).
Preallocated files do not contain data. It is safe to later remove them. But if you restart mongod (page 989) with
journaling, mongod (page 989) will create them again.
Example
The following sequence preallocates journal files for an instance of mongod (page 989) running on port 27017 with
a database path of /data/db.
For demonstration purposes, the sequence starts by creating a set of journal files in the usual way.
1. Create a temporary directory into which to create a set of journal les:
mkdir ~/tmpDbpath
2. Create a set of journal files by starting a mongod (page 989) instance that uses the temporary directory:
mongod --port 10000 --dbpath ~/tmpDbpath --journal
3. When you see the following log output, indicating mongod (page 989) has created the journal files, press CONTROL+C to
stop the mongod (page 989) instance:
web admin interface listening on port 11000
4. Preallocate journal files for the new instance of mongod (page 989) by moving the journal files from the data
directory of the existing instance to the data directory of the new instance:
mv ~/tmpDbpath/journal /data/db/
5. Start the new mongod (page 989) instance:
mongod --port 27017 --dbpath /data/db --journal
5.1.5 Monitor Journal Status
Use the following commands and methods to monitor journal status:
serverStatus (page 878)
The serverStatus (page 878) command returns database status information that is useful for assessing
performance.
journalLatencyTest (page 858)
Use journalLatencyTest (page 858) to measure how long it takes on your volume to write to the disk in
an append-only fashion. You can run this command on an idle system to get a baseline sync time for journaling.
You can also run this command on a busy system to see the sync time on a busy system, which may be higher if
the journal directory is on the same volume as the data files.
The journalLatencyTest (page 858) command also provides a way to check if your disk drive is buffering
writes in its local cache. If the number is very low (i.e., less than 2 milliseconds) and the drive is non-SSD, the
drive is probably buffering writes. In that case, enable cache write-through for the device in your operating
system, unless you have a disk controller card with battery-backed RAM.
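For a quick view of journaling activity, you can also inspect the dur section of the serverStatus (page 878) output from the mongo (page 1002) shell, as in this sketch:
db.serverStatus().dur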
5.1.6 Change the Group Commit Interval
Changed in version 2.0. You can set the group commit interval using the --journalCommitInterval (page 992)
command line option. The allowed range is 2 to 300 milliseconds.
Lower values increase the durability of the journal at the expense of disk performance.
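For example, assuming a default data directory, the following invocation lowers the interval to 50 milliseconds for greater durability:
mongod --dbpath /data/db --journal --journalCommitInterval 50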
5.1.7 Recover Data After Unexpected Shutdown
On a restart after a crash, MongoDB replays all journal files in the journal directory before the server becomes available.
If MongoDB must replay journal files, mongod (page 989) notes these events in the log output.
There is no reason to run repairDatabase (page 872) in these situations.
5.2 Journaling Internals
When running with journaling, MongoDB stores and applies write operations (page 139) in memory and in the journal
before the changes are in the data files.
5.2.1 Journal Files
With journaling enabled, MongoDB creates a journal directory within the directory defined by dbpath (page 1048),
which is /data/db by default. The journal directory holds journal files,
which contain write-ahead redo logs. The directory also holds a last-sequence-number file. A clean shutdown removes
all the files in the journal directory.
Journal files are append-only files and have file names prefixed with j._. When a journal file holds 1 gigabyte of data,
MongoDB creates a new journal file. Once MongoDB applies all the write operations in the journal files, it deletes
these files. Unless you write many bytes of data per second, the journal directory should contain only two or three
journal files.
To limit the size of each journal file to 128 megabytes, use the smallfiles (page 1051) run time option when
starting mongod (page 989).
To speed the frequent sequential writes that occur to the current journal file, you can ensure that the journal directory
is on a different filesystem.
Important: If you place the journal on a different filesystem from your data files you cannot use a filesystem snapshot
to capture consistent backups of a dbpath (page 1048) directory.
Note: Depending on your file system, you might experience a preallocation lag the first time you start a mongod
(page 989) instance with journaling enabled. MongoDB preallocates journal files if it is faster on your file system
to create files of a predefined size. The preallocation might last several minutes, during
which you will not be able to connect to the database. This is a one-time preallocation and does not occur with future
invocations.
To avoid preallocation lag, see Avoid Preallocation Lag (page 44).
5.2.2 Storage Views used in Journaling
Journaling adds three storage views to MongoDB.
The shared view stores modified data for upload to the MongoDB data files. The shared view is the only view
with direct access to the MongoDB data files. When running with journaling, mongod (page 989) asks the operating
system to map your existing on-disk data files to the shared view memory view. The operating system maps the
files but does not load them. MongoDB later loads data files to the shared view as needed.
The private view stores data for use in read operations (page 127). MongoDB maps the private view to the
shared view and is the first place MongoDB applies new write operations (page 139).
The journal is an on-disk view that stores new write operations after MongoDB applies the operation to the private
cache but before applying them to the data files. The journal provides durability. If the mongod (page 989) instance
were to crash without having applied the writes to the data files, the journal could replay the writes to the shared
view for eventual upload to the data files.
5.2.3 How Journaling Records Write Operations
MongoDB copies the write operations to the journal in batches called group commits. By default, MongoDB performs
a group commit every 100 milliseconds: as a result MongoDB commits all operations within a 100 millisecond window
in a single batch. These group commits help minimize the performance impact of journaling.
Journaling stores raw operations that allow MongoDB to reconstruct the following:
document insertion/updates
index modifications
changes to the namespace files
As write operations (page 139) occur, MongoDB writes the data to the private view in RAM and then copies the
write operations in batches to the journal. The journal stores the operations on disk to ensure durability. MongoDB
adds the operations as entries on the journal's forward pointer. Each entry describes which bytes the write operation
changed in the data files.
MongoDB next applies the journal's write operations to the shared view. At this point, the shared view
becomes inconsistent with the data files.
At default intervals of 60 seconds, MongoDB asks the operating system to flush the shared view to disk. This
brings the data files up-to-date with the latest write operations.
When MongoDB flushes write operations to the data files, MongoDB removes the write operations from the journal's
behind pointer. The behind pointer is always far back from the advanced pointer.
As part of journaling, MongoDB routinely asks the operating system to remap the shared view to the private
view, for consistency.
Note: The interaction between the shared view and the on-disk data files is similar to how MongoDB works
without journaling: MongoDB asks the operating system to flush in-memory changes back to the data
files every 60 seconds.
CHAPTER 6
Use MongoDB with SSL Connections
This document outlines the use and operation of MongoDB's SSL support. SSL allows MongoDB clients to support
encrypted connections to mongod (page 989) instances.
Note: The default distribution of MongoDB does not contain support for SSL. To use SSL, you must either build
MongoDB locally passing the --ssl option to scons or use MongoDB Enterprise.
These instructions outline the process for getting started with SSL and assume that you have already installed a build
of MongoDB that includes SSL support and that your client driver supports SSL.
6.1 Configure mongod and mongos for SSL
6.1.1 Combine SSL Certificate and Key File
Before you can use SSL, you must have a .pem file that contains the public key certificate and private key. MongoDB
can use any valid SSL certificate. To generate a self-signed certificate and private key, use a command that resembles
the following:
cd /etc/ssl/
openssl req -new -x509 -days 365 -nodes -out mongodb-cert.crt -keyout mongodb-cert.key
This operation generates a new, self-signed certificate with no passphrase that is valid for 365 days. Once you have
the certificate, concatenate the certificate and private key to a .pem file, as in the following example:
cat mongodb-cert.key mongodb-cert.crt > mongodb.pem
6.1.2 Set Up mongod and mongos with SSL Certificate and Key
To use SSL in your MongoDB deployment, include the following run-time options with mongod (page 989) and
mongos (page 999):
sslOnNormalPorts (page 1056)
sslPEMKeyFile (page 1056) with the .pem file that contains the SSL certificate and key.
Consider the following syntax for mongod (page 989):
mongod --sslOnNormalPorts --sslPEMKeyFile <pem>
For example, given an SSL certificate located at /etc/ssl/mongodb.pem,
configure mongod (page 989) to use SSL encryption for all connections with the following command:
mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem
Note:
Specify <pem> with the full path name to the certificate.
If the private key portion of the <pem> is encrypted, specify the encryption password with the
sslPEMKeyPassword (page 1057) option.
You may also specify these options in the configuration file (page 1045), as in the following example:
sslOnNormalPorts = true
sslPEMKeyFile = /etc/ssl/mongodb.pem
To connect to mongod (page 989) and mongos (page 999) instances using SSL, the mongo (page 1002) shell and
MongoDB tools must include the --ssl option. See SSL Configuration for Clients (page 52) for more information
on connecting to mongod (page 989) and mongos (page 999) running with SSL.
6.1.3 Set Up mongod and mongos with Certificate Validation
To set up mongod (page 989) or mongos (page 999) for SSL encryption using an SSL certificate signed by a certificate
authority, include the following run-time options during startup:
sslOnNormalPorts (page 1056)
sslPEMKeyFile (page 1056) with the name of the .pem file that contains the signed SSL certificate and key.
sslCAFile (page 1057) with the name of the .pem file that contains the root certificate chain from the
Certificate Authority.
Consider the following syntax for mongod (page 989):
mongod --sslOnNormalPorts --sslPEMKeyFile <pem> --sslCAFile <ca>
For example, given a signed SSL certificate located at /etc/ssl/mongodb.pem
and the certificate authority file at /etc/ssl/ca.pem, you can configure
mongod (page 989) for SSL encryption as follows:
mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
Note:
Specify the <pem> file and the <ca> file with either the full path name or the relative path name.
If the <pem> is encrypted, specify the encryption password with the sslPEMKeyPassword (page 1057)
option.
You may also specify these options in the configuration file (page 1045), as in the following example:
sslOnNormalPorts = true
sslPEMKeyFile = /etc/ssl/mongodb.pem
sslCAFile = /etc/ssl/ca.pem
To connect to mongod (page 989) and mongos (page 999) instances using SSL, the mongo (page 1002) tools must
include both the --ssl (page 1004) and --sslPEMKeyFile (page 1004) options. See SSL Configuration for
Clients (page 52) for more information on connecting to mongod (page 989) and mongos (page 999) running with
SSL.
Block Revoked Certificates for Clients
To prevent clients with revoked certificates from connecting, include the sslCRLFile (page 1057) to specify a .pem
file that contains revoked certificates.
For example, the following mongod (page 989) SSL configuration includes the sslCRLFile (page 1057)
setting:
mongod --sslOnNormalPorts --sslCRLFile /etc/ssl/ca-crl.pem --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
Clients with revoked certificates in /etc/ssl/ca-crl.pem will
not be able to connect to this mongod (page 989) instance.
Validate Only if a Client Presents a Certificate
In most cases it is important to ensure that clients present valid certificates. However, if you have clients that cannot
present a client certificate, or you are transitioning to using a certificate authority, you may only want to validate certificates
from clients that present a certificate.
If you want to bypass validation for clients that don't present certificates, include the
sslWeakCertificateValidation (page 1057) run-time option with mongod (page 989) and mongos
(page 999). If the client does not present a certificate, no validation occurs. These connections, though not validated,
are still encrypted using SSL.
For example, consider the following mongod (page 989) with an SSL configuration that includes the
sslWeakCertificateValidation (page 1057) setting:
mongod --sslOnNormalPorts --sslWeakCertificateValidation --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
Then, clients can connect either with the option --ssl (page 1004) and no certificate, or with the option --ssl
(page 1004) and a valid certificate. See SSL Configuration for Clients (page 52) for more information on SSL connections
for clients.
Note: If the client presents a certificate, the certificate must be a valid certificate.
All connections, including those that have not presented certificates, are encrypted using SSL.
6.1.4 Run in FIPS Mode
If your mongod (page 989) or mongos (page 999) is running on a system with an OpenSSL library configured
with the FIPS 140-2 module, you can run mongod (page 989) or mongos (page 999) in FIPS mode, with the
sslFIPSMode (page 1058) setting.
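For example, building on the certificate path used above, a start-up sketch might resemble the following:
mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem --sslFIPSMode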
6.2 SSL Configuration for Clients
Clients must have support for SSL to work with a mongod (page 989) or a mongos (page 999) instance that has SSL
support enabled. The current versions of the Python, Java, Ruby, Node.js, .NET, and C++ drivers have support for
SSL, with full support coming in future releases of other drivers.
6.2.1 mongo SSL Configuration
For SSL connections, you must use the mongo (page 1002) shell built with SSL support or distributed with MongoDB
Enterprise. To support SSL, mongo (page 1002) has the following settings:
--ssl (page 1004)
--sslPEMKeyFile (page 1056) with the name of the .pem file that contains the SSL certificate and key.
--sslCAFile (page 1057) with the name of the .pem file that contains the certificate from the Certificate
Authority.
--sslPEMKeyPassword (page 1057) option if the client certificate-key file is encrypted.
Connect to MongoDB Instance with SSL Encryption
To connect to a mongod (page 989) or mongos (page 999) instance that requires only an SSL encryption mode
(page 49), start the mongo (page 1002) shell with --ssl (page 1004), as in the following:
mongo --ssl
Connect to MongoDB Instance that Requires Client Certificates
To connect to a mongod (page 989) or mongos (page 999) that requires CA-signed client certificates (page 50), start
the mongo (page 1002) shell with --ssl (page 1004) and the --sslPEMKeyFile (page 1056) option to specify
the signed certificate-key file, as in the following:
mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem
Connect to MongoDB Instance that Validates when Presented with a Certificate
To connect to a mongod (page 989) or mongos (page 999) instance that only requires valid certificates when the
client presents a certificate (page 51), start the mongo (page 1002) shell either with the --ssl (page 1004) option and no
certificate or with the --ssl (page 1004) option and a valid signed certificate.
For example, if mongod (page 989) is running with weak certificate validation, both of the following mongo
(page 1002) shell clients can connect to that mongod (page 989):
mongo --ssl
mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem
Important: If the client presents a certificate, the certificate must be valid.
6.2.2 MMS
The MMS agent will also have to connect via SSL in order to gather its stats. Because the agent already utilizes SSL
for its communications to the MMS servers, this is just a matter of enabling SSL support in MMS itself on a per-host
basis.
Use the Edit host button (i.e. the pencil icon) on the Hosts page in the MMS console. This support is currently enabled on a
group-by-group basis by 10gen.
Please see the MMS Manual for more information about MMS configuration.
6.2.3 PyMongo
Add the ssl=True parameter to a PyMongo MongoClient to create a MongoDB connection to an SSL MongoDB
instance:
from pymongo import MongoClient
c = MongoClient(host="mongodb.example.net", port=27017, ssl=True)
To connect to a replica set, use the following operation:
from pymongo import MongoReplicaSetClient
c = MongoReplicaSetClient("mongodb.example.net:27017",
replicaSet="mysetname", ssl=True)
PyMongo also supports an ssl=true option for the MongoDB URI:
mongodb://mongodb.example.net:27017/?ssl=true
6.2.4 Java
Consider the following example SSLApp.java class file:
import com.mongodb.*;
import javax.net.ssl.SSLSocketFactory;

public class SSLApp {

    public static void main(String args[]) throws Exception {

        MongoClientOptions o = new MongoClientOptions.Builder()
                .socketFactory(SSLSocketFactory.getDefault())
                .build();

        MongoClient m = new MongoClient("localhost", o);

        DB db = m.getDB( "test" );
        DBCollection c = db.getCollection( "foo" );

        System.out.println( c.findOne() );
    }
}
6.2.5 Ruby
The recent versions of the Ruby driver have support for connections to SSL servers. Install the latest version of the
driver with the following command:
gem install mongo
Then connect to a standalone instance, using the following form:
require 'rubygems'
require 'mongo'

connection = Mongo::Connection.new('localhost', 27017, :ssl => true)
Replace connection with the following if you're connecting to a replica set:
connection = Mongo::ReplSetConnection.new(['localhost:27017'],
                                          ['localhost:27018'],
                                          :ssl => true)
Here, mongod (page 989) instances run on localhost:27017 and localhost:27018.
6.2.6 Node.JS (node-mongodb-native)
In the node-mongodb-native driver, use the following invocation to connect to a mongod (page 989) or mongos
(page 999) instance via SSL:
var db1 = new Db(MONGODB, new Server("127.0.0.1", 27017,
  { auto_reconnect: false, poolSize: 4, ssl: ssl } ));
To connect to a replica set via SSL, use the following form:
var replSet = new ReplSetServers( [
new Server( RS.host, RS.ports[1], { auto_reconnect: true } ),
new Server( RS.host, RS.ports[0], { auto_reconnect: true } ),
],
{rs_name:RS.name, ssl:ssl}
);
6.2.7 .NET
As of release 1.6, the .NET driver supports SSL connections with mongod (page 989) and mongos (page 999)
instances. To connect using SSL, you must add an option to the connection string, specifying ssl=true as follows:
var connectionString = "mongodb://localhost/?ssl=true";
var server = MongoServer.Create(connectionString);
The .NET driver will validate the certificate against the local trusted certificate store, in addition to providing encryption
of the server. This behavior may produce issues during testing if the server uses a self-signed certificate. If
you encounter this issue, add the sslverifycertificate=false option to the connection string to prevent the
.NET driver from validating the certificate, as follows:
var connectionString = "mongodb://localhost/?ssl=true&sslverifycertificate=false";
var server = MongoServer.Create(connectionString);
CHAPTER 7
Use MongoDB with SNMP Monitoring
New in version 2.2.
Enterprise Feature
This feature is only available in MongoDB Enterprise.
This document outlines the use and operation of MongoDB's SNMP extension, which is only available in MongoDB
Enterprise.
7.1 Prerequisites
7.1.1 Install MongoDB Enterprise
MongoDB Enterprise is available on four platforms. For more information, see MongoDB Enterprise.
7.1.2 Included Files
The Enterprise packages contain the following files:
MONGO-MIB.txt:
The MIB file that describes the data (i.e. schema) for MongoDB's SNMP output.
mongod.conf:
The SNMP configuration file for reading the SNMP output of MongoDB. This configuration sets the community
names, permissions, access controls, etc.
7.1.3 Required Packages
To use SNMP, you must install several prerequisites. The names of the packages vary by distribution and are as
follows:
Ubuntu 11.04 requires libssl0.9.8, snmp-mibs-downloader, snmp, and snmpd. Issue a command
such as the following to install these packages:
sudo apt-get install libssl0.9.8 snmp snmpd snmp-mibs-downloader
Red Hat Enterprise Linux 6.x series and Amazon Linux AMI require libssl, net-snmp,
net-snmp-libs, and net-snmp-utils. Issue a command such as the following to install these pack-
ages:
sudo yum install libssl net-snmp net-snmp-libs net-snmp-utils
SUSE Enterprise Linux requires libopenssl0_9_8, libsnmp15, slessp1-libsnmp15, and
snmp-mibs. Issue a command such as the following to install these packages:
sudo zypper install libopenssl0_9_8 libsnmp15 slessp1-libsnmp15 snmp-mibs
7.2 Configure SNMP
7.2.1 Install MIB Configuration Files
Ensure that the MIB directory /usr/share/snmp/mibs exists. If
not, issue the following command:
sudo mkdir -p /usr/share/snmp/mibs
Use the following command to create a symbolic link:
sudo ln -s [/path/to/mongodb/distribution/]MONGO-MIB.txt /usr/share/snmp/mibs/
Replace [/path/to/mongodb/distribution/] with the path to your MONGO-MIB.txt configuration file.
Copy the mongod.conf file into the /etc/snmp directory with the
following command:
cp mongod.conf /etc/snmp/mongod.conf
7.2.2 Start Up
You can control MongoDB Enterprise using default or custom control scripts, just as you can any other mongod:
Use the following command to view all SNMP options available in your MongoDB:
mongod --help | grep snmp
The above command should return the following output:
Module snmp options:
--snmp-subagent run snmp subagent
--snmp-master run snmp as master
Ensure that the following directories exist:
/data/db/ (This is the path where MongoDB stores the data files.)
/var/log/mongodb/ (This is the path where MongoDB writes the log output.)
If they do not, issue the following command:
mkdir -p /var/log/mongodb/ /data/db/
Start the mongod instance with the following command:
mongod --snmp-master --port 3001 --fork --dbpath /data/db/ --logpath /var/log/mongodb/1.log
Optionally, you can set these options in a configuration file (page 1045).
To check if mongod is running with SNMP support, issue the following command:
ps -ef | grep mongod --snmp
The command should return output that includes the following line. This indicates that the proper mongod instance is
running:
systemuser 31415 10260 0 Jul13 pts/16 00:00:00 mongod --snmp-master --port 3001 # [...]
7.2.3 Test SNMP
Check for the snmp agent process listening on port 1161 with the following command:
sudo lsof -i :1161
which should return the following output:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mongod 9238 sysadmin 10u IPv4 96469 0t0 UDP localhost:health-polling
Similarly, this command:
netstat -an | grep 1161
should return the following output:
udp 0 0 127.0.0.1:1161 0.0.0.0:*
7.2.4 Run snmpwalk Locally
snmpwalk provides tools for retrieving and parsing the SNMP data according to the MIB. If you installed all of the
required packages above, your system will have snmpwalk.
Issue the following command to collect data from mongod using SNMP:
snmpwalk -m MONGO-MIB -v 2c -c mongodb 127.0.0.1:1161 1.3.6.1.4.1.37601
You may also choose to specify the path to the MIB file:
snmpwalk -m /usr/share/snmp/mibs/MONGO-MIB -v 2c -c mongodb 127.0.0.1:1161 1.3.6.1.4.1.37601
Use this command only to ensure that you can retrieve and validate SNMP data from MongoDB.
7.3 Troubleshooting
Always check the logs for errors if something does not run as expected; see the log at
/var/log/mongodb/1.log. The presence of the following line indicates
that the mongod cannot read the /etc/snmp/mongod.conf file:
[SNMPAgent] warning: error starting SNMPAgent as master err:1
CHAPTER 8
Monitoring Database Systems
Monitoring is a critical component of all database administration. A firm grasp of MongoDB's reporting will allow you
to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB's
normal operational parameters will allow you to diagnose issues as you encounter them, rather than waiting for a crisis
or failure.
This document provides an overview of the available tools and data provided by MongoDB as well as an introduction
to diagnostic strategies, and suggestions for monitoring instances in MongoDB's replica sets and sharded clusters.
Note: 10gen provides a hosted monitoring service which collects and aggregates these data to provide insight into the
performance and operation of MongoDB deployments. See the MongoDB Monitoring Service (MMS) and the MMS
documentation for more information.
8.1 Monitoring Tools
There are two primary methods for collecting data regarding the state of a running MongoDB instance. First, there
are a set of tools distributed with MongoDB that provide real-time reporting of activity on the database. Second,
several database commands (page 977) return statistics regarding the current database state with greater fidelity. Both
methods allow you to collect data that answers a different set of questions, and are useful in different contexts.
This section provides an overview of these utilities and statistics, along with an example of the kinds of questions that
each method is most suited to help you address.
8.1.1 Utilities
The MongoDB distribution includes a number of utilities that quickly return statistics about instances' performance and
activity. These are typically most useful for diagnosing issues and assessing normal operation.
mongotop
mongotop (page 1034) tracks and reports the current read and write activity of a MongoDB instance. mongotop
(page 1034) provides per-collection visibility into use. Use mongotop (page 1034) to verify that activity and use
match expectations. See the mongotop manual (page 1033) for details.
mongostat
mongostat (page 1029) captures and returns counters of database operations. mongostat (page 1029) reports
operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the
distribution of load on the server. Use mongostat (page 1029) to understand the distribution of operation types and
to inform capacity planning. See the mongostat manual (page 1029) for details.
REST Interface
MongoDB provides a REST interface that exposes diagnostic and monitoring information in a simple web page.
Enable this by setting rest (page 1050) to true, and access this page via the localhost interface using the port
numbered 1000 more than the database port. In default configurations the REST interface is accessible on 28017.
For example, to access the REST interface on a locally running mongod instance: http://localhost:28017
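For instance, the relevant entry in a configuration file (page 1045) is simply:
rest = true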
8.1.2 Statistics
MongoDB provides a number of commands that return statistics about the state of the MongoDB instance. These data
may provide finer granularity regarding the state of the MongoDB instance than the tools above. Consider using their
output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to
the activity of your instance.
serverStatus
Access serverStatus data (page 1080) by way of the serverStatus (page 878) command. This document contains
a general overview of the state of the database, including disk usage, memory use, connections, journaling, and index
accesses. The command returns quickly and does not impact MongoDB performance.
While this output contains a (nearly) complete account of the state of a MongoDB instance, in most cases you
will not run this command directly. Nevertheless, all administrators should be familiar with the data provided by
serverStatus (page 878).
See Also:
db.serverStatus() (page 946) and serverStatus data (page 1080).
replSetGetStatus
View the replSetGetStatus data (page 1106) with the replSetGetStatus (page 874) command (rs.status()
(page 954) from the shell). The document returned by this command reflects the state and configuration of the replica
set. Use this data to ensure that replication is properly configured, and to check the connections between the current
host and the members of the replica set.
dbStats
The dbStats data (page 1098) is accessible by way of the dbStats (page 833) command (db.stats() (page 947)
from the shell). This command returns a document that contains data that reflects the amount of storage used and
data contained in the database, as well as object, collection, and index counters. Use this data to check and track the
state and storage of a specific database. This output also allows you to compare utilization between databases and to
determine the average document size in a database.
collStats
The collStats data (page 1100) is accessible using the collStats (page 825) command
(db.printCollectionStats() (page 943) from the shell). It provides statistics that resemble dbStats
(page 833) on the collection level: this includes a count of the objects in the collection, the size of the collection, the
amount of disk space used by the collection, and information about the indexes.
8.1.3 Introspection Tools
In addition to status reporting, MongoDB provides a number of introspection tools that you can use to diagnose and
analyze performance and operational conditions. Consider the following documentation:
diagLogging (page 833)
Analyze Performance of Database Operations (page 681)
Database Profiler Output (page 1114)
Current Operation Reporting (page 1110)
8.1.4 Third Party Tools
A number of third party monitoring tools have support for MongoDB, either directly, or through their own plugins.
Self Hosted Monitoring Tools
These are monitoring tools that you must install, congure and maintain on your own servers, usually open source.
The following list gives each tool, its plugin, and a description:
Ganglia, mongodb-ganglia: Python script to report operations per second, memory usage, btree statistics,
master/slave status and current connections.
Ganglia, gmond_python_modules: Parses output from the serverStatus (page 878) and replSetGetStatus
(page 874) commands.
Motop, no plugin: Realtime monitoring tool for several MongoDB servers. Shows current operations
ordered by durations every second.
mtop, no plugin: A top like tool.
Munin, mongo-munin: Retrieves server statistics.
Munin, mongomon: Retrieves collection statistics (sizes, index sizes, and each (configured) collection
count for one DB).
Munin, munin-plugins Ubuntu PPA: Some additional munin plugins not in the main distribution.
Nagios, nagios-plugin-mongodb: A simple Nagios check script, written in Python.
Zabbix, mikoomi-mongodb: Monitors availability, resource utilization, health, performance and other important
metrics.
Also consider dex, an index and query analyzing tool for MongoDB that compares MongoDB log files and indexes to
make indexing recommendations.
Hosted (SaaS) Monitoring Tools
These are monitoring tools provided as a hosted service, usually on a subscription billing basis.
Scout: Several plugins including: MongoDB Monitoring, MongoDB Slow Queries, and MongoDB
Replica Set Monitoring.
Server Density: Dashboard for MongoDB, MongoDB specific alerts, replication failover timeline, and iPhone, iPad
and Android mobile apps.
8.2 Process Logging
During normal operation, mongod (page 989) and mongos (page 999) instances report information that reflects current
operation to standard output or a log file. The following runtime settings control these options:
quiet (page 1052). Limits the amount of information written to the log or output.
verbose (page 1045). Increases the amount of information written to the log or output.
You can also specify this as v (as in -v). Set multiple v, as in vvvv = true, for higher levels of verbosity.
You can also change the verbosity of a running mongod (page 989) or mongos (page 999) instance with the
setParameter (page 878) command.
logpath (page 1047). Enables logging to a file, rather than standard output. Specify the full path to the log
file for this setting.
logappend (page 1047). Adds information to a log file instead of overwriting the file.
Note: You can specify these configuration operations as the command line arguments to mongod (page 989) or
mongos (page 999).
Additionally, the following database commands affect logging:
getLog (page 848). Displays recent messages from the mongod (page 989) process log.
logRotate (page 859). Rotates the log files for mongod (page 989) processes only, as in the example below. See Rotate Log Files
(page 685).
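For example, a minimal sketch of rotating logs from the mongo (page 1002) shell:
db.adminCommand( { logRotate : 1 } )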
8.3 Diagnosing Performance Issues
Degraded performance in MongoDB can be the result of an array of causes, and is typically a function of the relationship
among the quantity of data stored in the database, the amount of system RAM, the number of connections to the
database, and the amount of time the database spends in a lock state.
In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability
of hardware on the host system for virtualized environments. Some users also experience performance limitations as a
result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other
situations, performance issues may indicate that the database is operating at capacity and that it is time to add
additional capacity to the database.
8.3.1 Locks
MongoDB uses a locking system to ensure consistency. However, if certain operations are long-running, or a queue
forms, performance slows as requests and operations wait for the lock. Because lock related slowdowns can be
intermittent, look to the data in the globalLock (page 1083) section of the serverStatus (page 878) response to
assess if the lock has been a challenge to your performance. If globalLock.currentQueue.total (page 1084)
is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a
possible concurrency issue that might affect performance.
If globalLock.totalTime (page 1084) is high in context of uptime (page 1081) then the database has existed
in a lock state for a significant amount of time. If globalLock.ratio (page 1084) is also high, MongoDB has
likely been processing a large number of long running queries. Long queries are often the result of a number of factors:
ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient
RAM resulting in page faults (page 63) and disk reads.
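As a quick sketch, you can inspect these lock counters directly from the mongo (page 1002) shell:
db.serverStatus().globalLock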
8.3.2 Memory Usage
Because MongoDB uses memory mapped files to store data, given a data set of sufficient size, the MongoDB process
will allocate all memory available on the system for its use. Because of the way operating systems function, the
amount of allocated RAM is not a useful reflection of MongoDB's state.
While this is part of the design, and affords MongoDB superior performance, the memory mapped files make it difficult
to determine if the amount of RAM is sufficient for the data set. Consider memory usage statuses (page 1085) to better
understand MongoDB's memory utilization. Check the resident memory use (i.e. mem.resident (page 1085)): if
this exceeds the amount of system memory and there's a significant amount of data on disk that isn't in RAM, you
may have exceeded the capacity of your system.
Also check the amount of mapped memory (i.e. mem.mapped (page 1085)). If this value is greater than the amount of
system memory, some operations will require disk access page faults to read data from virtual memory with deleterious
effects on performance.
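A similar one-line sketch retrieves the memory counters named above:
db.serverStatus().mem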
8.3.3 Page Faults
Page faults represent the number of times that MongoDB requires data not located in physical memory, and must
read from virtual memory. To check for page faults, see the extra_info.page_faults (page 1086) value in the
serverStatus (page 878) command. This data is only available on Linux systems.
Alone, page faults are minor and complete quickly; however, in aggregate, large numbers of page faults typically
indicate that MongoDB is reading too much data from disk and can indicate a number of underlying causes and
recommendations. In many situations, MongoDB's read locks will yield after a page fault to allow other processes
to read and avoid blocking while waiting for the next page to read into memory. This approach improves concurrency,
and in high volume systems this also improves overall throughput.
If possible, increasing the amount of RAM accessible to MongoDB may help reduce the number of page faults. If
this is not possible, you may want to consider deploying a sharded cluster and/or adding one or more shards to your
deployment to distribute load among mongod (page 989) instances.
8.3.4 Number of Connections
In some cases, the number of connections between the application layer (i.e. clients) and the database can overwhelm
the ability of the server to handle requests, which can produce performance irregularities. Check the following fields
in the serverStatus (page 1080) document:
globalLock.activeClients (page 1084) contains a counter of the total number of clients with active
operations in progress or queued.
connections (page 1085) is a container for the following two fields:
current (page 1086) the total number of current clients that connect to the database instance.
available (page 1086) the total number of unused connections available for new clients.
Note: Unless limited by system-wide limits, MongoDB has a hard connection limit of 20 thousand
connections. You can modify system limits using the ulimit command, or by editing your system's
/etc/sysctl file.
If requests are high because there are many concurrent application requests, the database may have trouble keeping
up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy
applications increase the size of your replica set and distribute read operations to secondary members. For
write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among
mongod (page 989) instances.
Spikes in the number of connections can also be the result of application or driver errors. All of the MongoDB drivers
supported by 10gen implement connection pooling, which allows clients to use and reuse connections more efficiently.
Extremely high numbers of connections, particularly without corresponding workload, are often indicative of a driver or
other configuration error.
8.3.5 Database Profiling
MongoDB contains a database profiling system that can help identify inefficient queries and operations. Enable the
profiler by setting the profile (page 869) value using the following command in the mongo (page 1002) shell:
db.setProfilingLevel(1)
See Also:
The documentation of db.setProfilingLevel() (page 946) for more information about this command.
Note: Because the database profiler can have an impact on performance, only enable profiling for strategic
intervals and as minimally as possible on production systems.
You may enable profiling on a per-mongod (page 989) basis. This setting will not propagate across a replica set or
sharded cluster.
The following profiling levels are available:
Level 0: Off. No profiling.
Level 1: On. Only includes slow operations.
Level 2: On. Includes all operations.
See the output of the profiler in the system.profile collection of your database. You can specify the
slowms (page 1051) setting to set a threshold above which the profiler considers operations slow and thus includes them
in the level 1 profiling data. You may configure slowms (page 1051) at runtime, as an argument to the
db.setProfilingLevel() (page 946) operation.
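For example, the following sketch enables level 1 profiling and sets the slow-operation threshold to 100 milliseconds in a single call:
db.setProfilingLevel(1, 100)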
Additionally, mongod (page 989) records all slow queries to its log (page 1047), as defined by slowms
(page 1051). The data in system.profile does not persist between mongod (page 989) restarts.
You can view the profiler's output by issuing the show profile command in the mongo (page 1002) shell, or with
the following operation:
db.system.profile.find( { millis : { $gt : 100 } } )
This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (i.e. 100) is
above the slowms (page 1051) threshold.
See Also:
Optimization Strategies for MongoDB Applications (page 493) addresses strategies that may improve the performance
of your database queries and operations.
8.4 Replication and Monitoring
The primary administrative concern that requires monitoring with replica sets, beyond the requirements for any MongoDB
instance, is replication lag. This refers to the amount of time that it takes a write operation on the primary
to replicate to a secondary. Some very small delay period may be acceptable; however, as replication lag grows, two
significant problems emerge:
First, operations that have occurred in the period of lag are not replicated to one or more secondaries. If you're
using replication to ensure data persistence, exceptionally long delays may impact the integrity of your data set.
Second, if the replication lag exceeds the length of the operation log (oplog) then MongoDB will have to perform
an initial sync on the secondary, copying all data from the primary and rebuilding all indexes. In normal
circumstances this is uncommon given the typical size of the oplog, but it's an issue to be aware of.
For causes of replication lag, see Replication Lag (page 349).
Replication issues are most often the result of network connectivity issues between members or the result of a primary
that does not have the resources to support application and replication traffic. To check the status of a replica, use the
replSetGetStatus (page 874) command or the following helper in the shell:
rs.status()
See the Replica Set Status Reference (page 1106) document for a more in-depth overview of this output. In
general, watch the value of optimeDate. Pay particular attention to the difference in time between the primary and
the secondary members.
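The shell also provides a helper that summarizes each secondary's replication lag relative to the primary, which can be a convenient first check:
rs.printSlaveReplicationInfo()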
The size of the operation log is only configurable during the first run using the --oplogSize (page 995) argument
to the mongod (page 989) command, or preferably the oplogSize (page 1053) setting in the MongoDB configuration file.
If you do not specify this on the command line before running with the --replSet (page 995) option, mongod
(page 989) will create a default sized oplog.
By default, the oplog is 5% of total available disk space on 64-bit systems.
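To see the configured oplog size and the window of time it currently covers, a simple check from the mongo (page 1002) shell is:
db.printReplicationInfo()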
See Also:
Change the Size of the Oplog (page 390)
8.5 Sharding and Monitoring
In most cases the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB
instances. Additionally, clusters require monitoring to ensure that data is effectively distributed among nodes and that
sharding operations are functioning appropriately.
See Also:
See the Sharding (page 419) page for more information.
8.5.1 Config Servers
The config database provides a map of documents to shards. The cluster updates this map as chunks move between
shards. When a configuration server becomes inaccessible, some sharding operations like moving chunks and starting
mongos (page 999) instances become unavailable. However, clusters remain accessible from already-running
mongos (page 999) instances.
Because inaccessible configuration servers can have a serious impact on the availability of a sharded cluster, you
should monitor the configuration servers to ensure that the cluster remains well balanced and that mongos (page 999)
instances can restart.
8.5.2 Balancing and Chunk Distribution
The most effective sharded cluster deployments require that chunks are evenly balanced among the shards. MongoDB has a background balancer process that distributes data such that chunks are always optimally distributed among the shards. Issue the db.printShardingStatus() (page 944) or sh.status() (page 963) command to the mongos (page 999) by way of the mongo (page 1002) shell. This returns an overview of the entire cluster, including the database names and a list of the chunks.
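For instance, the following non-interactive invocation (against a hypothetical mongos host) prints the same overview; sh.status() writes its report to standard output, so it works under --eval:
mongo --host mongos0.example.net --port 27017 --eval "sh.status()"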
8.5.3 Stale Locks
In nearly every case, all locks used by the balancer are automatically released when they become stale. However, because any long lasting lock can block future balancing, it's important to ensure that all locks are legitimate. To check the lock status of the database, connect to a mongos (page 999) instance using the mongo (page 1002) shell. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:
use config
db.locks.find()
For active deployments, the above query might return a useful result set. The balancing process, which originates on a randomly selected mongos (page 999), takes a special balancer lock that prevents other balancing activity from transpiring. Use the following command, also issued against the config database, to check the status of the balancer lock:
db.locks.find( { _id : "balancer" } )
If this lock exists, make sure that the balancer process is actively using it.
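As a rough check (the state field convention here is an assumption about the config.locks schema, where 0 indicates an unlocked lock), the following query returns the balancer lock only if it is currently taken; the who and when fields of the returned document identify the holder:
db.locks.find( { _id : "balancer", state : { $gt : 0 } } )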
CHAPTER 9
Importing and Exporting MongoDB Data
Full database instance backups (page 71) are useful for disaster recovery protection and routine database backup operation; however, some cases require additional import and export functionality.
This document provides an overview of the import and export programs included in the MongoDB distribution. These tools are useful when you want to backup or export a portion of your data without capturing the state of the entire database, or for simple data ingestion cases. For more complex data migration tasks, you may want to write your own import and export scripts using a client driver to interact with the database itself.
Warning: Because these tools primarily operate by interacting with a running mongod (page 989) instance, they can impact the performance of your running database.
Not only do these processes create traffic for a running database instance, they also force the database to read all data through memory. When MongoDB reads infrequently used data, it can supplant more frequently accessed data, causing a deterioration in performance for the database's regular workload.
mongoimport (page 1022) and mongoexport (page 1025) do not reliably preserve all rich BSON data types, because BSON is a superset of JSON. Thus, mongoimport (page 1022) and mongoexport (page 1025) cannot represent BSON data accurately in JSON. As a result, data exported or imported with these tools may lose some measure of fidelity. See MongoDB Extended JSON (page 1136) for more information about MongoDB Extended JSON.
See Also:
See the Backup Strategies for MongoDB Systems (page 71) document for more information on backing up MongoDB
instances. Additionally, consider the following references for commands addressed in this document:
mongoexport (page 1025)
mongorestore (page 1014)
mongodump (page 1010)
If you want to transform and process data once you've imported it into MongoDB, consider the topics in Aggregation (page 209), including:
Map-Reduce (page 243) and
Aggregation Framework (page 211).
9.1 Data Type Fidelity
JSON does not have the following data types that exist in BSON documents: data_binary, data_date, data_timestamp, data_regex, data_oid and data_ref. As a result, using any tool that decodes BSON documents into JSON will suffer some loss of fidelity.
If maintaining type fidelity is important, consider writing a data import and export system that does not force BSON documents into JSON form as part of the process. The following list provides examples of how MongoDB renders each of these BSON types in JSON.
data_binary
{ "$binary" : "<bindata>", "$type" : "<t>" }
<bindata> is the base64 representation of a binary string. <t> is the hexadecimal representation of a single
byte indicating the data type.
data_date
Date( <date> )
<date> is the JSON representation of a 64-bit signed integer for milliseconds since epoch.
data_timestamp
Timestamp( <t>, <i> )
<t> is the JSON representation of a 32-bit unsigned integer for seconds since epoch. <i> is a 32-bit unsigned integer for the increment.
data_regex
/<jRegex>/<jOptions>
<jRegex> is a string that may contain valid JSON characters and unescaped double quote (i.e. ") characters, but may not contain unescaped forward slash (i.e. /) characters.
<jOptions> is a string that may contain only the characters g, i, m, and s.
data_oid
ObjectId( "<id>" )
<id> is a 24 character hexadecimal string. These representations require that data_oid values have an associated field named _id.
data_ref
DBRef( "<name>", "<id>" )
<name> is a string of valid JSON characters. <id> is a 24 character hexadecimal string.
See Also:
MongoDB Extended JSON (page 1136)
9.2 Data Import and Export and Backup Operations
For resilient and non-disruptive backups, use a file system or block-level disk snapshot function, such as the methods described in the Backup Strategies for MongoDB Systems (page 71) document. The tools and operations discussed in this document provide functionality that is useful in the context of providing some kinds of backups.
By contrast, use import and export tools to backup a small subset of your data or to move data to or from a third-party system. These backups may capture a small crucial set of data or a frequently modified section of data, for extra insurance, or for ease of access. No matter how you decide to import or export your data, consider the following guidelines:
Label files so that you can identify what point in time the export or backup reflects.
Labeling should describe the contents of the backup, and reflect the subset of the data corpus captured in the backup or export.
Do not create or apply exports if the backup process itself will have an adverse effect on a production system.
Make sure that backups and exports reflect a consistent data state. Export or backup processes can impact data integrity (i.e. type fidelity) and consistency if updates continue during the backup process.
Test backups and exports by restoring and importing to ensure that the backups are useful.
9.3 Human Intelligible Import/Export Formats
This section describes a process to import/export your database, or a portion thereof, to a file in a JSON or CSV format.
See Also:
The mongoimport (page 1022) and mongoexport (page 1025) documents contain complete documentation of these
tools. If you have questions about the function and parameters of these tools not covered here, please refer to these
documents.
If you simply want to copy a database or collection from one instance to another, consider using the copydb (page 829), clone (page 822), or cloneCollection (page 822) commands, which may be more suited to this task. The mongo (page 1002) shell provides the db.copyDatabase() (page 935) method.
These tools may also be useful for importing data into a MongoDB database from third party applications.
9.3.1 Collection Export with mongoexport
With the mongoexport (page 1025) utility you can create a backup file. In its simplest invocation, the command takes the following form:
mongoexport --collection collection --out collection.json
This will export all documents in the collection named collection into the file collection.json. Without the output specification (i.e. --out collection.json (page 1028)), mongoexport (page 1025) writes output to standard output (i.e. stdout). You can further narrow the results by supplying a query filter using the --query (page 1027) option, and limit results to a single database using the --db (page 1027) option. For instance:
mongoexport --db sales --collection contacts --query '{"field": 1}'
This command returns all documents in the sales database's contacts collection with a field named field with a value of 1. Enclose the query in single quotes (e.g. '{"field": 1}') to ensure that it does not interact with your shell environment. The resulting documents will return on standard output.
By default, mongoexport (page 1025) returns one JSON document per MongoDB document. Specify the --jsonArray (page 1027) argument to return the export as a single JSON array. Use the --csv (page 1027) option to return the result in CSV (comma separated values) format.
If your mongod (page 989) instance is not running, you can use the --dbpath (page 1027) option to specify the location of your MongoDB instance's database files. See the following example:
mongoexport --db sales --collection contacts --dbpath /srv/MongoDB/
This reads the data files directly. This locks the data directory to prevent conflicting writes. The mongod (page 989) process must not be running or attached to these data files when you run mongoexport (page 1025) in this configuration.
The --host (page 1026) and --port (page 1026) options allow you to specify a non-local host from which to capture the export. Consider the following example:
mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection contacts --out mdb1-examplenet.json
On any mongoexport (page 1025) command you may, as above, specify username and password credentials.
9.3.2 Collection Import with mongoimport
To restore a backup taken with mongoexport (page 1025), use mongoimport (page 1022). Most of the arguments to mongoexport (page 1025) also exist for mongoimport (page 1022). Consider the following command:
mongoimport --collection collection --file collection.json
This imports the contents of the file collection.json into the collection named collection. If you do not specify a file with the --file (page 1024) option, mongoimport (page 1022) accepts input over standard input (i.e. stdin).
If you specify the --upsert (page 1024) option, all mongoimport (page 1022) operations will attempt to update existing documents in the database and insert other documents. This option will cause some performance impact depending on your configuration.
You can specify the database option --db (page 1023) to import these documents to a particular database. If your MongoDB instance is not running, use the --dbpath (page 1023) option to specify the location of your MongoDB instance's database files. Consider using the --journal (page 1023) option to ensure that mongoimport (page 1022) records its operations in the journal. The mongod process must not be running or attached to these data files when you run mongoimport (page 1022) in this configuration.
Use the --ignoreBlanks (page 1024) option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.
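For example, the following invocation (the database, collection, and CSV file names are illustrative) imports a CSV file whose first line contains field names, skipping blank fields:
mongoimport --db sales --collection contacts --type csv --headerline --ignoreBlanks --file contacts.csv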
CHAPTER 10
Backup Strategies for MongoDB Systems
Backups are an important part of any operational disaster recovery plan. A good backup plan must be able to capture data in a consistent and usable state, and operators must be able to automate both the backup and the recovery operations. Also test all components of the backup system to ensure that you can recover backed up data as needed. If you cannot effectively restore your database from the backup, then your backups are useless. This document addresses higher level backup strategies; for more information on specific backup procedures, consider the following documents:
Use Filesystem Snapshots to Backup and Restore MongoDB Databases (page 677).
Use mongodump and mongorestore to Backup and Restore MongoDB Databases (page 675).
Backup a Small Sharded Cluster with mongodump (page 461)
Create Backup of a Sharded Cluster with Filesystem Snapshots (page 462)
Create Backup of a Sharded Cluster with Database Dumps (page 463)
Schedule Backup Window for Sharded Clusters (page 466)
Restore a Single Shard (page 464)
Restore Sharded Clusters (page 465)
10.1 Backup Considerations
As you develop a backup strategy for your MongoDB deployment consider the following factors:
Geography. Ensure that you move some backups away from your primary database infrastructure.
System errors. Ensure that your backups can survive situations where hardware failures or disk errors impact the integrity or availability of your backups.
Production constraints. Backup operations themselves sometimes require substantial system resources. It is important to consider the time of the backup schedule relative to peak usage and maintenance windows.
System capabilities. Some of the block-level snapshot tools require special support on the operating-system or infrastructure level.
Database configuration. Replication and sharding can affect the process and impact of the backup implementation. See Sharded Cluster Backup Considerations (page 72) and Replica Set Backup Considerations (page 73).
Actual requirements. You may be able to save time, effort, and space by including only crucial data in the most
frequent backups and backing up less crucial data less frequently.
10.2 Approaches to Backing Up MongoDB Systems
There are two main methodologies for backing up MongoDB instances: creating binary dumps of the database using mongodump (page 1010), or creating filesystem-level snapshots. Both methodologies have advantages and disadvantages:
binary database dumps are comparatively small, because they don't include index content, pre-allocated free space, or record padding (page 143). However, it's impossible to capture a copy of a running system that reflects a single moment in time using a binary dump.
filesystem snapshots, sometimes called block level backups, produce larger backup sizes, but complete quickly and can reflect a single moment in time on a running system. However, snapshot systems require filesystem and operating system support and tools.
The best option depends on the requirements of your deployment and disaster recovery needs. Typically, filesystem snapshots are preferred because of their accuracy and simplicity; however, mongodump (page 1010) is a viable option often used to generate backups of MongoDB systems.
The following topics provide details and procedures on the two approaches:
Use Filesystem Snapshots to Backup and Restore MongoDB Databases (page 677).
Use mongodump and mongorestore to Backup and Restore MongoDB Databases (page 675).
In some cases, taking backups is difficult or impossible because of large data volumes, distributed architectures, and data transmission speeds. In these situations, increase the number of members in your replica set or sets.
10.3 Backup Strategies for MongoDB Deployments
10.3.1 Sharded Cluster Backup Considerations
Important: To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.
As distributed systems, sharded clusters complicate backup operations. True point-in-time backups are only possible when stopping all write activity from the application. To create a precise moment-in-time snapshot of a cluster, stop all application write activity to the database, capture a backup, and only allow write operations to the database after the backup is complete.
However, you can capture a backup of a cluster that approximates a point-in-time backup by capturing a backup from a secondary member of the replica sets that provide the shards in the cluster at roughly the same moment. If you decide to use an approximate-point-in-time backup method, ensure that your application can operate using a copy of the data that does not reflect a single moment in time.
The following documents describe sharded cluster related backup procedures:
Backup a Small Sharded Cluster with mongodump (page 461)
Create Backup of a Sharded Cluster with Filesystem Snapshots (page 462)
Create Backup of a Sharded Cluster with Database Dumps (page 463)
Schedule Backup Window for Sharded Clusters (page 466)
Restore a Single Shard (page 464)
Restore Sharded Clusters (page 465)
10.3.2 Replica Set Backup Considerations
In most cases, backing up data stored in a replica set is similar to backing up data stored in a single instance. It
is possible to lock a single secondary database and then create a backup from that instance. When you unlock the
database, the secondary will catch up with the primary. You may also choose to deploy a dedicated hidden member
for backup purposes.
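The lock-and-backup sequence described above might look like the following shell sketch, run against the secondary you intend to back up (this is a non-authoritative outline, not a complete procedure):
db.fsyncLock()     // flush pending writes and block further writes on this member
// take the filesystem snapshot or run mongodump against this member here
db.fsyncUnlock()   // release the lock so the secondary can catch up with the primary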
If you have a sharded cluster where each shard is itself a replica set, you can use this method to create a backup of the entire cluster without disrupting the operation of the node. In these situations you should still turn off the balancer when you create backups.
For any cluster, using a non-primary node to create backups is particularly advantageous in that the backup operation does not affect the performance of the primary. Replication itself provides some measure of redundancy. Nevertheless, keeping point-in-time backups of your cluster to provide for disaster recovery and as an additional layer of protection is crucial.
CHAPTER 11
Linux ulimit Settings
The Linux kernel provides a system to limit and control the number of threads, connections, and open files on a per-process and per-user basis. These limits prevent single users from using too many system resources. Sometimes, these limits, as configured by the distribution developers, are too low for MongoDB and can cause a number of issues in the course of normal MongoDB operation. Generally, MongoDB should be the only user process on a system, to prevent resource contention.
11.1 Resource Utilization
mongod (page 989) and mongos (page 999) each use threads and file descriptors to track connections and manage internal operations. This section outlines the general resource utilization patterns for MongoDB. Use these figures in combination with the actual information about your deployment and its use to determine ideal ulimit settings.
Generally, all mongod (page 989) and mongos (page 999) instances, like other processes:
track each incoming connection with a file descriptor and a thread.
track each internal thread or pthread as a system process.
11.1.1 mongod
1 file descriptor for each data file in use by the mongod (page 989) instance.
1 file descriptor for each journal file used by the mongod (page 989) instance when journal (page 1049) is true.
In replica sets, each mongod (page 989) maintains a connection to all other members of the set.
mongod (page 989) uses background threads for a number of internal processes, including TTL collections (page 517), replication, and replica set health checks, which may require a small number of additional resources.
11.1.2 mongos
In addition to the threads and file descriptors for client connections, mongos (page 999) must maintain connections to all config servers and all shards, which includes all members of all replica sets.
For mongos (page 999), consider the following behaviors:
mongos (page 999) instances maintain a connection pool to each shard so that the mongos (page 999) can reuse connections and quickly fulfill requests without needing to create new connections.
You can limit the number of incoming connections using the maxConns (page 1046) run-time option or the --maxConns command-line option.
By restricting the number of incoming connections you can prevent a cascade effect where the mongos (page 999) creates too many connections on the mongod (page 989) instances.
Note: You cannot set maxConns (page 1046) to a value higher than 20000.
11.2 Review and Set Resource Limits
11.2.1 ulimit
You can use the ulimit command at the system prompt to check system limits, as in the following example:
$ ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 192276
-n: file descriptors 21000
-l: locked-in-memory size (kb) 40000
-v: address space (kb) unlimited
-x: file locks unlimited
-i: pending signals 192276
-q: bytes in POSIX msg queues 819200
-e: max nice 30
-r: max rt priority 65
-N 15: unlimited
ulimit refers to the per-user limitations for various resources. Therefore, if your mongod (page 989) instance
executes as a user that is also running multiple processes, or multiple mongod (page 989) processes, you might see
contention for these resources. Also, be aware that the processes value (i.e. -u) refers to the combined number of
distinct processes and sub-process threads.
You can change ulimit settings by issuing a command in the following form:
ulimit -n <value>
For many distributions of Linux you can change values by replacing -n in the command above with the flag for any of the limits shown in the output of ulimit -a. See your operating system documentation for the precise procedure for changing system limits on running systems.
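Many distributions also support persistent per-user limits via /etc/security/limits.conf. The following entries are a sketch based on the recommended thresholds later in this chapter, assuming mongod runs as a hypothetical mongodb user:
# /etc/security/limits.conf entries (user name is an assumption)
mongodb  soft  nofile  64000
mongodb  hard  nofile  64000
mongodb  soft  nproc   32000
mongodb  hard  nproc   32000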
Note: After changing the ulimit settings, you must restart the process to take advantage of the modified settings. You can use the /proc file system to see the current limitations on a running process.
Depending on your system's configuration and default settings, any change to system limits made using ulimit may revert following a system restart. Check your distribution and operating system documentation for more information.
11.2.2 /proc File System
Note: This section applies only to Linux operating systems.
The /proc file-system stores the per-process limits in the file system object located at /proc/<pid>/limits, where <pid> is the process's PID or process identifier. You can use the following bash function to return the content of the limits object for a process or processes with a given name:
return-limits(){
    for process in $@; do
        # capture the PIDs of all processes with the given name
        process_pids=`ps -C $process -o pid --no-headers | cut -d " " -f 2`
        if [ -z "$process_pids" ]; then
            echo "[no $process running]"
        else
            for pid in $process_pids; do
                echo "[$process #$pid -- limits]"
                cat /proc/$pid/limits
            done
        fi
    done
}
You can copy and paste this function into a current shell session or load it as part of a script. Call the function with one of the following invocations:
return-limits mongod
return-limits mongos
return-limits mongod mongos
The output of the first command may resemble the following:
[mongod #6809 -- limits]
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8720000 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 192276 192276 processes
Max open files 1024 4096 files
Max locked memory 40960000 40960000 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 192276 192276 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 30 30
Max realtime priority 65 65
Max realtime timeout unlimited unlimited us
11.3 Recommended Settings
Every deployment may have unique requirements and settings; however, the following thresholds and settings are particularly important for mongod (page 989) and mongos (page 999) deployments:
-f (file size): unlimited
-t (cpu time): unlimited
-v (virtual memory): unlimited
-n (open files): 64000
-m (memory size): unlimited [1]
-u (processes/threads): 32000
Always remember to restart your mongod (page 989) and mongos (page 999) instances after changing the ulimit
settings to make sure that the settings change takes effect.
[1] If you limit the resident memory size on a system running MongoDB you risk allowing the operating system to terminate the mongod (page 989) process under normal situations. Do not set this value. If the operating system (i.e. Linux) kills your mongod (page 989) with the OOM killer, check the output of serverStatus (page 878) and ensure MongoDB is not leaking memory.
CHAPTER 12
Production Notes
12.1 Overview
This page details system configurations that affect MongoDB, especially in production.
12.2 Backups
To make backups of your MongoDB database, please refer to Backup Strategies for MongoDB Systems (page 71).
12.3 Networking
Always run MongoDB in a trusted environment, with network rules that prevent access from all unknown machines, systems, or networks. As with any sensitive system dependent on network access, your MongoDB deployment should only be accessible to specific systems that require access: application servers, monitoring services, and other MongoDB components.
See documents in the Security (page 89) section for additional information, specifically:
Interfaces and Port Numbers (page 92)
Firewalls (page 93)
Configure Linux iptables Firewall for MongoDB (page 99)
Configure Windows netsh Firewall for MongoDB (page 103)
12.4 MongoDB on Linux
If you use the Linux kernel, the MongoDB user community has recommended Linux kernel 2.6.36 or later for running
MongoDB in production.
Because MongoDB preallocates its database files before using them and because MongoDB uses very large files on average, you should use the Ext4 or XFS file systems if using the Linux kernel:
If you use the Ext4 file system, use at least version 2.6.23 of the Linux Kernel.
If you use the XFS file system, use at least version 2.6.25 of the Linux Kernel.
For MongoDB on Linux use the following recommended configurations:
Turn off atime for the storage volume with the database files.
Set the file descriptor limit and the user process limit above 20,000, according to the suggestions in Linux ulimit Settings (page 75). A low ulimit will affect MongoDB when under heavy use and will produce weird errors.
Do not use hugepages virtual memory pages; MongoDB performs better with normal virtual memory pages.
Disable NUMA in your BIOS. If that is not possible see NUMA (page 81).
Ensure that readahead settings for the block devices that store the database files are acceptable. See the Readahead (page 80) section.
Use NTP to synchronize time among your hosts. This is especially important in sharded clusters.
12.5 Readahead
For random access use patterns, set readahead values low; for example, setting readahead to a small value such as 32 (16 KB) often works well.
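On Linux you can typically inspect and adjust readahead with the blockdev utility; the device name below is a placeholder for your data volume:
blockdev --getra /dev/sdb        # report current readahead, in 512-byte sectors
blockdev --setra 32 /dev/sdb     # set readahead to 32 sectors (16 KB)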
12.6 MongoDB on Virtual Environments
This section describes considerations when running MongoDB in some of the more common virtual environments.
12.6.1 EC2
MongoDB is compatible with EC2 and requires no configuration changes specific to the environment.
12.6.2 VMWare
MongoDB is compatible with VMWare. Some in the MongoDB community have run into issues with VMWare's memory overcommit feature and suggest disabling the feature.
You can clone a virtual machine running MongoDB. You might use this to spin up a new virtual host that will be added as a member of a replica set. If journaling is enabled, the clone snapshot will be consistent. If not using journaling, stop mongod (page 989), clone, and then restart.
12.6.3 OpenVZ
The MongoDB community has encountered issues running MongoDB on OpenVZ.
12.7 Disk and Storage Systems
12.7.1 Swap
Configure swap space for your systems. Having swap can prevent issues with memory contention and can prevent the OOM Killer on Linux systems from killing mongod (page 989). Because of the way mongod (page 989) memory-maps its data files, the operating system will never store MongoDB data in swap.
12.7.2 RAID
Most MongoDB deployments should use disks backed by RAID-10.
RAID-5 and RAID-6 do not typically provide sufficient performance to support a MongoDB deployment.
RAID-0 provides good write performance but provides limited availability, and reduced performance on read operations, particularly using Amazon's EBS volumes: as a result, avoid RAID-0 with MongoDB deployments.
12.7.3 Remote Filesystems
Some versions of NFS perform very poorly with MongoDB and NFS is not recommended for use with MongoDB. Performance problems arise when both the data files and the journal files are hosted on NFS: you may experience better performance if you place the journal on local or iSCSI volumes. If you must use NFS, add the following NFS options to your /etc/fstab file: bg, nolock, and noatime.
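For example, an /etc/fstab entry with these options might resemble the following; the server name and mount point are placeholders:
nfs.example.net:/srv/mongodb  /data/db  nfs  bg,nolock,noatime  0 0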
Many MongoDB deployments work successfully with Amazon's Elastic Block Store (EBS) volumes. There are certain intrinsic performance characteristics of EBS volumes that users should consider.
12.8 Hardware Requirements and Limitations
MongoDB is designed specifically with commodity hardware in mind and has few hardware requirements or limitations. MongoDB core components run on little-endian hardware, primarily x86/x86_64 processors. Client libraries (i.e. drivers) can run on big or little endian systems.
When installing hardware for MongoDB, consider the following:
As with all software, more RAM and a faster CPU clock speed are important to productivity.
Because databases do not perform high amounts of computation, increasing the number of cores helps but does not provide a high level of marginal return.
MongoDB has good results and good price/performance with SATA SSD (Solid State Disk) and with PCI (Peripheral Component Interconnect).
Commodity (SATA) spinning drives are often a good option, as the speed increase for random I/O for more expensive drives is not that dramatic (only on the order of 2x); spending that money on SSDs or RAM may be more effective.
12.8.1 MongoDB on NUMA Hardware
MongoDB and NUMA (Non-Uniform Memory Access) do not work well together. When running MongoDB on NUMA hardware, disable NUMA for MongoDB and run with an interleave memory policy. NUMA can cause a number of operational problems with MongoDB, including slow performance for periods of time or high system processor usage.
Note: On Linux, MongoDB version 2.0 and greater checks these settings on start up and prints a warning if the system is NUMA-based.
To disable NUMA for MongoDB, use the numactl command and start mongod (page 989) in the following manner:
numactl --interleave=all /usr/bin/local/mongod
Adjust the proc settings using the following command:
echo 0 > /proc/sys/vm/zone_reclaim_mode
To fully disable NUMA you must perform both operations. However, you can change zone_reclaim_mode without restarting mongod. For more information, see documentation on /proc/sys/vm.
See The MySQL swap insanity problem and the effects of NUMA post, which describes the effects of NUMA on databases. This blog post addresses the impact of NUMA for MySQL; however, the issues for MongoDB are similar. The post introduces NUMA, its goals, and illustrates how these goals are not compatible with production databases.
12.9 Performance Monitoring
12.9.1 iostat
On Linux, use the iostat command to check if disk I/O is a bottleneck for your database. Specify a number of seconds when running iostat to avoid displaying stats covering the time since server boot.
For example:
iostat -xm 2
Use the mount command to see what device your data directory (page 1048) resides on.
Key fields from iostat:
%util: this is the most useful field for a quick check; it indicates what percent of the time the device/drive is in use.
avgrq-sz: average request size. Smaller numbers for this value reflect more random IO operations.
12.9.2 bwm-ng
bwm-ng is a command-line tool for monitoring network use. If you suspect a network-based bottleneck, you may use
bwm-ng to begin your diagnostic process.
12.10 Production Checklist
12.10.1 64-bit Builds for Production
Always use 64-bit builds for production. MongoDB uses memory-mapped files. See the 32-bit limitations (page 704) for more information.
32-bit builds exist to support use on development machines and also for other miscellaneous things such as replica set arbiters.
12.10.2 BSON Document Size Limit
There is a BSON Document Size (page 1133) limit, at the time of this writing 16MB per document. If you have large objects, use GridFS (page 162) instead.
12.10.3 Set Appropriate Write Concern for Write Operations
See write concern (page 140) for more information.
12.10.4 Dynamic Schema
Data in MongoDB has a dynamic schema. Collections do not enforce document structure. This facilitates iterative development and polymorphism. However, collections often hold documents with highly homogeneous structures. See Data Modeling Considerations for MongoDB Applications (page 147) for more information.
Some operational considerations include:
the exact set of collections to be used
the indexes to be used, which are created explicitly except for the _id index
shard key declarations, which are explicit and quite important as it is hard to change shard keys later
One very simple rule-of-thumb is not to import data from a relational database unmodified: you will generally want to roll up certain data into richer documents that use some embedding of nested documents and arrays (and/or arrays of subdocuments).
12.10.5 Updates by Default Affect Only one Document
Set the multi parameter to true to update (page 933) multiple documents that meet the query criteria. The mongo (page 1002) shell syntax is:
db.my_collection_name.update(my_query, my_update_expression, bool_upsert, bool_multi)
Set bool_multi to true when updating many documents. Otherwise only the first matched document will update.
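A hypothetical invocation of this form, which flags every matching document rather than only the first (the collection and field names are illustrative):
db.contacts.update(
    { status: "inactive" },          // query: select the documents to change
    { $set: { archived: true } },    // update expression
    false,                           // upsert: do not insert if nothing matches
    true                             // multi: update all matching documents
)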
12.10.6 Case Sensitive Strings
MongoDB strings are case sensitive. So a search for "joe" will not find "Joe".
Consider:
storing data in a normalized case format, or
using regular expressions ending with /i (see the example after this list),
and/or using $toLower (page 815) or $toUpper (page 815) in the aggregation framework (page 211)
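The regular-expression approach mentioned above might look like the following; the collection and field names are illustrative:
// matches "joe", "Joe", "JOE", and any other casing
db.people.find( { name: /^joe$/i } )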
12.10.7 Type Sensitive Fields
MongoDB data, which is JSON-style (specifically, BSON format), has several data types.
Consider the following document which has a field x with the string value "123":
{ x : "123" }
Then the following query, which looks for a number value 123, will not return that document:
db.mycollection.find( { x : 123 } )
12.10.8 Locking
Older versions of MongoDB used a global lock; use MongoDB v2.2+ for better results. See the Concurrency
(page 719) page for more information.
12.10.9 Packages
Be sure you have the latest stable release if you are using a package manager. You can see what is current on the
Downloads page, even if you then choose to install via a package manager.
12.10.10 Use Odd Number of Replica Set Members
Replica sets (page 331) perform consensus elections. Use either an odd number of members (e.g., three) or else use
an arbiter to get up to an odd number of votes.
12.10.11 Don't disable journaling
See Journaling (page 43) for more information.
12.10.12 Keep Replica Set Members Up-to-Date
This is important as MongoDB replica sets support automatic failover. Thus you want your secondaries to be up-to-date. You have a few options here:
1. Monitoring and alerts for any lagging can be done via various means. MMS shows a graph of replica set lag.
2. Using getLastError (page 357) with w:majority, you will get a timeout or no return if a majority of the set is lagging. This is thus another way to guard against lag and get some reporting back of its occurrence; see the sketch after this list.
3. Or, if you want to fail over manually, you can set your secondaries to priority:0 in their configuration. Then manual action would be required for a failover. This is practical for a small cluster; for a large cluster you will want automation.
Additionally, see information on replica set rollbacks (page 335).
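A minimal sketch of the getLastError approach from item 2, with an illustrative five-second timeout:
// after a write on the current connection, require majority acknowledgement
db.runCommand( { getLastError: 1, w: "majority", wtimeout: 5000 } )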
12.10.13 Additional Deployment Considerations
Pick your shard keys carefully! There is no way to modify a shard key on a collection that is already sharded.
You cannot shard an existing collection over 256 gigabytes. To shard large amounts of data, create a new empty sharded collection, and ingest the data from the source collection using an application level import operation.
Unique indexes are not enforced across shards except for the shard key itself. See Enforce Unique Keys for Sharded Collections (page 468).
Consider pre-splitting (page 424) a sharded collection before a massive bulk import. Usually this isn't necessary, but for a bulk import of size it is helpful.
Use security/auth (page 91) mode if you need it. By default auth (page 1048) is not enabled and mongod (page 989) assumes a trusted environment.
You do not have fully generalized transactions (page 511). Create rich documents, read the preceding link, and consider the use case; often there is a good fit.
Disable NUMA for best results. If you have NUMA enabled, mongod (page 989) will print a warning when it starts.
Avoid excessive prefetch/readahead on the filesystem. Check your prefetch settings. Note that on Linux the parameter is in sectors, not bytes; 32 KB (a setting of 64 sectors) is pretty reasonable.
Check ulimits (page 75) settings.
Use SSD if available and economical. Spinning disks can work well, but SSDs' capacity for random I/O operations works well with the update model of mongod (page 989). See Remote Filesystems (page 81) for more info.
Ensure that clients keep reasonable pool sizes to avoid overloading the connection tracking capacity of a single mongod (page 989) or mongos (page 999) instance.
See Also:
Replica Set Operation and Management (page 339)
Replica Set Architectures and Deployment Patterns (page 354)
Sharded Cluster Administration (page 424)
Sharded Cluster Architectures (page 429)
Tag Aware Sharding (page 466)
Indexing Overview (page 273)
Indexing Operations (page 283)
Additionally, consider the MongoDB Tutorials (page 657) page that contains a full index of all tutorials available in the MongoDB manual. These documents provide pragmatic instructions for common operational practices and administrative tasks.
Part III
Security
The documents in this section outline basic security practices and risk management strategies. Additionally, this section includes MongoDB Tutorials (page 657) that outline basic network filter and firewall rules to configure trusted environments for MongoDB.
CHAPTER 13
Strategies and Practices
13.1 Security Practices and Management
As with all software running in a networked environment, administrators of MongoDB must consider security and risk exposures for a MongoDB deployment. There are no magic solutions for risk mitigation, and maintaining a secure MongoDB deployment is an ongoing process. This document takes a Defense in Depth approach to securing MongoDB deployments, and addresses a number of different methods for managing risk and reducing risk exposure.
The intent of Defense in Depth approaches is to ensure there are no exploitable points of failure in your deployment that could allow an intruder or un-trusted party to access the data stored in the MongoDB database. The easiest and most effective way to reduce the risk of exploitation is to run MongoDB in a trusted environment, limit access, follow a system of least privilege, and follow best development and deployment practices. See the Strategies for Reducing Risk (page 91) section for more information.
13.1.1 Strategies for Reducing Risk
The most effective way to reduce risk for MongoDB deployments is to run your entire MongoDB deployment, including all MongoDB components (i.e. mongod (page 989), mongos (page 999) and application instances) in a trusted environment. Trusted environments use the following strategies to control access:
network filter (e.g. firewall) rules that block all connections from unknown systems to MongoDB components.
bind mongod (page 989) and mongos (page 999) instances to specific IP addresses to limit accessibility.
limit MongoDB programs to non-public local networks, and virtual private networks.
You may further reduce risk by:
requiring authentication for access to MongoDB instances.
requiring strong, complex, single purpose authentication credentials. This should be part of your internal security policy but is not currently configurable in MongoDB.
deploying a model of least privilege, where all users have only the amount of access they need to accomplish required tasks, and no more.
following the best application development and deployment practices, which includes: validating all inputs, managing sessions, and application-level access control.
Continue reading this document for more information on specific strategies and configurations to help reduce the risk exposure of your application.
13.1.2 Vulnerability Notification
10gen takes the security of MongoDB and associated products very seriously. If you discover a vulnerability in MongoDB or another 10gen product, or would like to know more about our vulnerability reporting and response process, see the Vulnerability Notification (page 96) document.
13.1.3 Networking Risk Exposure
Interfaces and Port Numbers
The following list includes all default ports used by MongoDB. By default, MongoDB listens for connections on the following ports:
27017 This is the default port for mongod (page 989) and mongos (page 999) instances. You can change this port with port (page 1046) or --port (page 990).
27018 This is the default port when running with the --shardsvr (page 997) runtime operation or shardsvr (page 1054) setting.
27019 This is the default port when running with the --configsvr (page 996) runtime operation or configsvr (page 1054) setting.
28017 This is the default port for the web status page. This is always accessible at a port that is 1000 greater than the port determined by port (page 1046).
By default MongoDB programs (i.e. mongos (page 999) and mongod (page 989)) will bind to all available network
interfaces (i.e. IP addresses) on a system. The next section outlines various runtime options that allow you to limit
access to MongoDB programs.
Network Interface Limitation
You can limit the network exposure with the following configuration options:
the nohttpinterface (page 1049) setting for mongod (page 989) and mongos (page 999) instances.
Disables the home status page, which would run on port 28017 by default. The status interface is read-only by default. You may also specify this option on the command line as mongod --nohttpinterface (page 993) or mongos --nohttpinterface (page 1001). Authentication does not control or affect access to this interface.
Important: Disable this option for production deployments. If you do leave this interface enabled, you should only allow trusted clients to access this port.
the port (page 1046) setting for mongod (page 989) and mongos (page 999) instances.
Changes the main port on which the mongod (page 989) or mongos (page 999) instance listens for connections.
Changing the port does not meaningfully reduce risk or limit exposure.
You may also specify this option on the command line as mongod --port (page 990) or mongos --port
(page 999).
Whatever port you attach mongod (page 989) and mongos (page 999) instances to, you should only allow
trusted clients to connect to this port.
92 Chapter 13. Strategies and Practices
MongoDB Documentation, Release 2.4.1
the rest (page 1050) setting for mongod (page 989) and mongos (page 999) instances.
Enables a fully interactive administrative REST interface, which is disabled by default. The status interface, which is enabled by default, is read-only. This configuration makes that interface fully interactive. The REST interface does not support any authentication and you should always restrict access to this interface to only allow trusted clients to connect to this port.
You may also enable this interface on the command line as mongod --rest (page 993).
Important: Disable this option for production deployments. If you do leave this interface enabled, you should only allow trusted clients to access this port.
the bind_ip (page 1046) setting for mongod (page 989) and mongos (page 999) instances.
Limits the network interfaces on which MongoDB programs will listen for incoming connections. You can also specify a number of interfaces by passing bind_ip (page 1046) a comma-separated list of IP addresses. You can use the mongod --bind_ip (page 990) and mongos --bind_ip (page 1000) option on the command line at run time to limit the network accessibility of a MongoDB program.
Important: Make sure that your mongod (page 989) and mongos (page 999) instances are only accessible on trusted networks. If your system has more than one network interface, bind MongoDB programs to the private or internal network interface.
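Taken together, these settings might appear in a configuration file as follows; the addresses are placeholders for your own loopback and private interfaces:
# limit mongod to the loopback and one private interface,
# and disable the HTTP status page
bind_ip = 127.0.0.1,10.8.0.10
port = 27017
nohttpinterface = true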
Firewalls
Firewalls allow administrators to filter and control access to a system by providing granular control over network communications. For administrators of MongoDB, the following capabilities are important:
limiting incoming traffic on a specific port to specific systems.
limiting incoming traffic from untrusted hosts.
On Linux systems, the iptables interface provides access to the underlying netfilter firewall. On Windows systems the netsh command line interface provides access to the underlying Windows Firewall. For additional information about firewall configuration consider the following documents:
Configure Linux iptables Firewall for MongoDB (page 99)
Configure Windows netsh Firewall for MongoDB (page 103)
For best results and to minimize overall exposure, ensure that only traffic from trusted sources can reach mongod (page 989) and mongos (page 999) instances and that the mongod (page 989) and mongos (page 999) instances can only connect to trusted outputs.
See Also:
For MongoDB deployments on Amazon's web services, see the Amazon EC2 page, which addresses Amazon's Security Groups and other EC2-specific security features.
Virtual Private Networks
Virtual private networks, or VPNs, make it possible to link two networks over an encrypted and limited-access trusted network. Typically MongoDB users who use VPNs use SSL rather than IPSEC VPNs for performance reasons.
Depending on configuration and implementation, VPNs provide for certificate validation and a choice of encryption protocols, which requires a rigorous level of authentication and identification of all clients. Furthermore, because
VPNs provide a secure tunnel, by using a VPN connection to control access to your MongoDB instance you can prevent tampering and man-in-the-middle attacks.
13.1.4 Operations
Always run the mongod (page 989) or mongos (page 999) process as a unique user with the minimum required permissions and access. Never run a MongoDB program as a root or administrative user. The system users that run the MongoDB processes should have robust authentication credentials that prevent unauthorized or casual access.
To further limit the environment, you can run the mongod (page 989) or mongos (page 999) process in a chroot environment. Both user-based access restrictions and chroot configuration follow recommended conventions for administering all daemon processes on Unix-like systems.
You can disable anonymous access to the database by enabling authentication using the auth (page 1048) setting as detailed in the Authentication (page 94) section.
13.1.5 Authentication
MongoDB provides basic support for authentication with the auth (page 1048) setting. For multi-instance deployments (i.e. replica sets, and sharded clusters) use the keyFile (page 1047) setting, which implies auth (page 1048), and allows intra-deployment authentication and operation. Be aware of the following behaviors of MongoDB's authentication system:
Authentication is disabled by default.
MongoDB provisions access on a per-database level. Users either have read only access to a database or normal
access to a database that permits full read and write access to the database. Normal access conveys the ability
to add additional users to the database.
The system.users collection in each database stores all credentials. You can query the authorized users
with the following operation:
db.system.users.find()
The admin database is unique. Users with normal access to the admin database have read and write access to
all databases. Users with read only access to the admin database have read only access to all databases, with
the exception of the system.users collection, which is protected to prevent privilege escalation attacks.
Additionally the admin database exposes several commands and functionality, such as listDatabases
(page 859).
Once authenticated a normal user has full read and write access to a database.
If you have authenticated to a database as a normal (read and write) user, authenticating as a read-only user on the same database will invalidate the earlier authentication, leaving the current connection with read-only access.
If you have authenticated to the admin database as a normal (read and write) user, logging into a different database as a read-only user will not invalidate the authentication to the admin database. In this situation, this client will be able to read and write data to this second database.
When setting up authentication for the first time you must either:
1. add at least one user to the admin database before starting the mongod (page 989) instance with auth (page 1048), or
2. add the first user to the admin database when connected to the mongod (page 989) instance from a localhost connection, as in the sketch below. [1]
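A minimal sketch of the second approach, using an illustrative user name and password from a mongo shell connected via localhost:
use admin
db.addUser( { user: "siteAdmin", pwd: "strongPassword", roles: [ "userAdminAnyDatabase" ] } )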
New in version 2.0: Support for authentication with sharded clusters. Before 2.0 sharded clusters had to run with trusted applications and a trusted networking configuration.
Consider the Control Access to MongoDB Instances with Authentication (page 106) document which outlines procedures for configuring and maintaining users and access with MongoDB's authentication system.
13.1.6 Interfaces
Simply limiting access to a mongod (page 989) is not sufficient for totally controlling risk exposure. Consider the recommendations in the following section for limiting exposure to other interface-related risks.
JavaScript and the Security of the mongo Shell
Be aware of the following capabilities and behaviors of the mongo (page 1002) shell:
mongo (page 1002) will evaluate a .js file passed to the mongo --eval (page 1003) option. The mongo (page 1002) shell does not validate JavaScript input passed to --eval (page 1003).
mongo (page 1002) will evaluate a .mongorc.js file before starting. You can disable this behavior by passing the mongo --norc (page 1003) option.
On Linux and Unix systems, mongo (page 1002) reads the .mongorc.js file from $HOME/.mongorc.js (i.e. ~/.mongorc.js), and on Windows mongo.exe reads the .mongorc.js file from %HOME%\.mongorc.js or %HOMEDRIVE%%HOMEPATH%\.mongorc.js.
HTTP Status Interface
The HTTP status interface provides a web-based interface that includes a variety of operational data, logs, and status reports regarding the mongod (page 989) or mongos (page 999) instance. The HTTP interface is always available on the port numbered 1000 greater than the primary mongod (page 989) port. By default this is 28017, but it is indirectly set using the port (page 1046) option, which allows you to configure the primary mongod (page 989) port.
Without the rest (page 1050) setting, this interface is entirely read-only, and limited in scope; nevertheless, this interface may represent an exposure. To disable the HTTP interface, set the nohttpinterface (page 1049) run time option or the --nohttpinterface (page 993) command line option.
REST API
The REST API to MongoDB provides additional information and write access on top of the HTTP Status interface. The REST interface is disabled by default, and is not recommended for production use.
While the REST API does not provide any support for insert, update, or remove operations, it does provide administrative access, and its accessibility represents a vulnerability in a secure environment.
If you must use the REST API, please control and limit access to the REST API. The REST API does not include any support for authentication, even when running with auth (page 1048) enabled.
See the following documents for instructions on restricting access to the REST API interface:
Configure Linux iptables Firewall for MongoDB (page 99)
[1] Because of SERVER-6591, you cannot add the first user to a sharded cluster using the localhost connection in 2.2. If you are running a 2.2 sharded cluster, and want to enable authentication, you must deploy the cluster and add the first user to the admin database before restarting the cluster to run with keyFile (page 1047).
Configure Windows netsh Firewall for MongoDB (page 103)
13.1.7 Data Encryption
To support audit requirements, you may need to encrypt data stored in MongoDB. For best results you can encrypt this data in the application layer, by encrypting the content of fields that hold secure data.
Additionally, 10gen has a partnership with Gazzang to encrypt and secure sensitive data within MongoDB. The solution encrypts data in real time and Gazzang provides advanced key management that ensures only authorized processes can access this data. The Gazzang software ensures that the cryptographic keys remain safe and ensures compliance with standards including HIPAA, PCI-DSS, and FERPA. For more information consider the following resources:
Datasheet
Webinar
13.2 Vulnerability Notification
10gen values the privacy and security of all users of MongoDB, and we work very hard to ensure that MongoDB and
related tools minimize risk exposure and increase the security and integrity of data and environments using MongoDB.
13.2.1 Notification
If you believe you have discovered a vulnerability in MongoDB or a related product or have experienced a security incident related to MongoDB, please report these issues so that 10gen can respond appropriately and work to prevent additional issues in the future. All vulnerability reports should contain as much information as possible so that we can move quickly to resolve the issue. In particular, please include the following:
The name of the product.
Common Vulnerability information, if applicable, including:
CVSS (Common Vulnerability Scoring System) Score.
CVE (Common Vulnerabilities and Exposures) Identifier.
Contact information, including an email address and/or phone number, if applicable.
10gen will respond to all vulnerability notifications within 48 hours.
Jira
10gen prefers jira.mongodb.org for all communication regarding MongoDB and related products.
Submit a ticket in the Core Server Security project, at: <https://jira.mongodb.org/browse/SECURITY/>. The ticket number will become the reference identification for the issue for the lifetime of the issue, and you can use this identifier for tracking purposes.
10gen will respond to any vulnerability notification received in a Jira case posted to the SECURITY project.
Email
While Jira is the preferred communication vector, you may also report vulnerabilities via email to <security@10gen.com>.
You may encrypt email using our public key, to ensure the privacy of any sensitive information in your vulnerability report.
10gen will respond to any vulnerability notification received via email with an email that contains a reference number for a Jira case posted to the SECURITY project.
Evaluation
10gen will validate all submitted vulnerabilities. 10gen will use Jira to track all communications regarding the vulnerability,
which may include requests for clarification and for additional information. If needed, 10gen representatives
can set up a conference call to exchange information regarding the vulnerability.
Disclosure
10gen requests that you do not publicly disclose any information regarding the vulnerability or exploit until 10gen has
had the opportunity to analyze the vulnerability, respond to the notification, and notify key users, customers, and
partners if needed.
The amount of time required to validate a reported vulnerability depends on the complexity and severity of the issue.
10gen takes all reported vulnerabilities very seriously, and will always ensure that there is a clear and open channel of
communication with the reporter of the vulnerability.
After validating the issue, 10gen will coordinate public disclosure of the issue with the reporter in a mutually agreed
timeframe and format. If required or requested, the reporter of a vulnerability will receive credit in the published
security bulletin.
CHAPTER 14
Tutorials
14.1 Configure Linux iptables Firewall for MongoDB
On contemporary Linux systems, the iptables program provides methods for managing the Linux Kernel's
netfilter or network packet filtering capabilities. These firewall rules make it possible for administrators to
control what hosts can connect to the system, and limit risk exposure by limiting the hosts that can connect to a
system.
This document outlines basic firewall configurations for iptables firewalls on Linux. Use these approaches as a
starting point for your larger networking organization. For a detailed overview of security practices and risk management
for MongoDB, see Security Practices and Management (page 91).
See Also:
For MongoDB deployments on Amazon's web services, see the Amazon EC2 page, which addresses Amazon's Security
Groups and other EC2-specific security features.
14.1.1 Overview
Rules in iptables configurations fall into chains, which describe the process for filtering and processing specific
streams of traffic. Chains have an order, and packets must pass through earlier rules in a chain to reach later rules.
This document addresses only the following two chains:
INPUT Controls all incoming traffic.
OUTPUT Controls all outgoing traffic.
Given the default ports (page 92) of all MongoDB processes, you must configure networking rules that permit only
required communication between your application and the appropriate mongod (page 989) and mongos (page 999)
instances.
Be aware that, by default, iptables allows all connections and traffic unless explicitly
disabled. The configuration changes outlined in this document will create rules that explicitly allow traffic from
specific addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed. When
you have properly configured your iptables rules to allow only the traffic that you want to permit, you can Change
Default Policy to DROP (page 102).
14.1.2 Patterns
This section contains a number of patterns and examples for configuring iptables for use with MongoDB deployments.
If you have configured different ports using the port (page 1046) configuration setting, you will need to
modify the rules accordingly.
Traffic to and from mongod Instances
This pattern is applicable to all mongod (page 989) instances running as standalone instances or as part of a replica
set.
The goal of this pattern is to explicitly allow traffic to the mongod (page 989) instance from the application server. In
the following examples, replace <ip-address> with the IP address of the application server:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27017 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27017 -m state --state ESTABLISHED -j ACCEPT
The first rule allows all incoming traffic from <ip-address> on port 27017, which allows the application server
to connect to the mongod (page 989) instance. The second rule allows outgoing traffic from the mongod (page 989)
to reach the application server.
Optional
If you have only one application server, you can replace <ip-address> with either the IP address itself, such as:
198.51.100.55. You can also express this using CIDR notation as 198.51.100.55/32. If you want to permit
a larger block of possible IP addresses, you can allow traffic from a /24 block
using one of the following specifications for the <ip-address>:
10.10.10.10/24
10.10.10.10/255.255.255.0
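For instance, under the assumption that your application servers occupy the hypothetical 198.51.100.0/24 block, the INPUT rule above could be written against the whole block rather than a single host:
iptables -A INPUT -s 198.51.100.0/24 -p tcp --destination-port 27017 -m state --state NEW,ESTABLISHED -j ACCEPT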
Traffic to and from mongos Instances
mongos (page 999) instances provide query routing for sharded clusters. Clients connect to mongos (page 999) instances,
which behave from the client's perspective as mongod (page 989) instances. In turn, the mongos (page 999)
connects to all mongod (page 989) instances that are components of the sharded cluster.
Use the same iptables command to allow traffic to and from these instances as you would from the mongod
(page 989) instances that are members of the replica set. Take the configuration outlined in the Traffic to and from
mongod Instances (page 100) section as an example.
Traffic to and from a MongoDB Config Server
Config servers host the config database that stores metadata for sharded clusters. Each production cluster has three
config servers, initiated using the mongod --configsvr (page 996) option. [1] Config servers listen for connections
on port 27019. As a result, add the following iptables rules to the config server to allow incoming and outgoing
connections on port 27019, for connections to the other config servers.
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27019 -m state --state ESTABLISHED -j ACCEPT
[1] You can also run a config server by setting the configsvr (page 1054) option in a configuration file.
Replace <ip-address> with the address or address space of all the mongod (page 989) instances that provide config servers.
Additionally, config servers need to allow incoming connections from all of the mongos (page 999) instances in the
cluster and all mongod (page 989) instances in the cluster. Add rules that resemble the following:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT
Replace <ip-address> with the addresses of the mongos (page 999) instances and the shard mongod (page 989)
instances.
Traffic to and from a MongoDB Shard Server
Shard servers run as mongod --shardsvr (page 997). [2] Because the default port number when running
with shardsvr (page 1054) is 27018, you must configure the following iptables rules to allow traffic to and
from each shard:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27018 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT
Replace the <ip-address> specification with the IP addresses of all mongod (page 989) instances. This allows you to permit
incoming and outgoing traffic between all shards, including constituent replica set members, to:
all mongod (page 989) instances in the shard's replica sets.
all mongod (page 989) instances in other shards. [3]
Furthermore, shards need to be able to make outgoing connections to:
all mongos (page 999) instances.
all mongod (page 989) instances in the config servers.
Create a rule that resembles the following, and replace the <ip-address> with the addresses of the config servers
and the mongos (page 999) instances:
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT
Provide Access For Monitoring Systems
1. The mongostat (page 1029) diagnostic tool, when running with the --discover (page 1031) option, needs to
be able to reach all components of a cluster, including the config servers, the shard servers, and the mongos
(page 999) instances.
2. If your monitoring system needs access to the HTTP interface, add the following rule to the chain:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 28017 -m state --state NEW,ESTABLISHED -j ACCEPT
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface.
For all deployments, you should restrict access to this port to only the monitoring instance.
Optional
For shard server mongod (page 989) instances running with shardsvr (page 1054), the rule would resemble
the following:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 28018 -m state --state NEW,ESTABLISHED -j ACCEPT
For config server mongod (page 989) instances running with configsvr (page 1054), the rule would resemble
the following:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 28019 -m state --state NEW,ESTABLISHED -j ACCEPT
[2] You can also specify the shard server option using the shardsvr (page 1054) setting in the configuration file. Shard members are also often conventional replica sets using the default port.
[3] All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations.
14.1.3 Change Default Policy to DROP
The default policy for iptables chains is to allow all traffic. After completing all iptables configuration changes,
you must change the default policy to DROP so that all traffic that isn't explicitly allowed as above will not be able to
reach components of the MongoDB deployment. Issue the following commands to change this policy:
iptables -P INPUT DROP
iptables -P OUTPUT DROP
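Be aware that a default DROP policy also affects traffic the rules above do not cover, such as loopback connections that local tools may depend on. As a precautionary sketch beyond the procedure above, you may want rules like the following in place before changing the policy:
# Allow loopback traffic, which local utilities and monitoring often require.
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT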
14.1.4 Manage and Maintain iptables Configuration
This section contains a number of basic operations for managing and using iptables. There are various front-end
tools that automate some aspects of iptables configuration, but at the core all iptables front ends provide the
same basic functionality:
Make all iptables Rules Persistent
By default, all iptables rules are only stored in memory. When your system restarts, your firewall rules will revert
to their defaults. When you have tested a rule set and have guaranteed that it effectively controls traffic, you can use
the following operations to make the rule set persistent.
On Red Hat Enterprise Linux, Fedora Linux, and related distributions you can issue the following command:
service iptables save
On Debian, Ubuntu, and related distributions, you can use the following command to dump the iptables rules to
the /etc/iptables.conf file:
iptables-save > /etc/iptables.conf
Run the following operation to restore the network rules:
iptables-restore < /etc/iptables.conf
Place this command in your rc.local file, or in the /etc/network/if-up.d/iptables
file with other similar operations.
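A minimal sketch of such an /etc/network/if-up.d/iptables script follows (an assumption, not a distribution default; make it executable with chmod +x):
#!/bin/sh
# Restore the persisted rule set whenever a network interface comes up.
iptables-restore < /etc/iptables.conf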
List all iptables Rules
To list all currently applied iptables rules, use the following operation at the system shell:
iptables -L
Flush all iptables Rules
If you make a configuration mistake when entering iptables rules or simply need to revert to the default rule set,
you can use the following operation at the system shell to flush all rules:
iptables -F
If you've already made your iptables rules persistent, you will need to repeat the appropriate procedure in the
Make all iptables Rules Persistent (page 102) section.
14.2 Configure Windows netsh Firewall for MongoDB
On Windows Server systems, the netsh program provides methods for managing the Windows Firewall. These
firewall rules make it possible for administrators to control what hosts can connect to the system, and limit risk
exposure by limiting the hosts that can connect to a system.
This document outlines basic Windows Firewall configurations. Use these approaches as a starting point for your
larger networking organization. For a detailed overview of security practices and risk management for MongoDB, see
Security Practices and Management (page 91).
See Also:
Windows Firewall documentation from Microsoft.
14.2.1 Overview
Windows Firewall processes rules in an order determined by rule type, parsed in the following order:
1. Windows Service Hardening
2. Connection security rules
3. Authenticated Bypass Rules
4. Block Rules
5. Allow Rules
6. Default Rules
By default, the policy in Windows Firewall allows all outbound connections and blocks all incoming connections.
Given the default ports (page 92) of all MongoDB processes, you must configure networking rules that permit only
required communication between your application and the appropriate mongod.exe (page 1008) and mongos.exe
(page 1009) instances.
The configuration changes outlined in this document will create rules which explicitly allow traffic from specific
addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed.
You can configure the Windows Firewall using the netsh command line tool or through a Windows application.
On Windows Server 2008 this application is Windows Firewall With Advanced Security in Administrative Tools. On
previous versions of Windows Server, access the Windows Firewall application in the System and Security control
panel.
The procedures in this document use the netsh command line tool.
14.2.2 Patterns
This section contains a number of patterns and examples for configuring Windows Firewall for use with MongoDB
deployments. If you have configured different ports using the port (page 1046) configuration setting, you will need
to modify the rules accordingly.
Traffic to and from mongod.exe Instances
This pattern is applicable to all mongod.exe (page 1008) instances running as standalone instances or as part of a
replica set. The goal of this pattern is to explicitly allow traffic to the mongod.exe (page 1008) instance from the
application server.
netsh advfirewall firewall add rule name="Open mongod port 27017" dir=in action=allow protocol=TCP localport=27017
This rule allows all incoming traffic to port 27017, which allows the application server to connect to the
mongod.exe (page 1008) instance.
Windows Firewall also allows enabling network access for an entire application rather than to a specific port, as in the
following example:
netsh advfirewall firewall add rule name="Allowing mongod" dir=in action=allow program=" C:\mongodb\bin\mongod.exe"
You can allow all access for a mongos.exe (page 1009) server, with the following invocation:
netsh advfirewall firewall add rule name="Allowing mongos" dir=in action=allow program=" C:\mongodb\bin\mongos.exe"
Traffic to and from mongos.exe Instances
mongos.exe (page 1009) instances provide query routing for sharded clusters. Clients connect to mongos.exe
(page 1009) instances, which behave from the client's perspective as mongod.exe (page 1008) instances. In turn, the
mongos.exe (page 1009) connects to all mongod.exe (page 1008) instances that are components of the sharded
cluster.
Use the same Windows Firewall command to allow traffic to and from these instances as you would from the
mongod.exe (page 1008) instances that are members of the replica set.
netsh advfirewall firewall add rule name="Open mongod shard port 27018" dir=in action=allow protocol=TCP localport=27018
Traffic to and from a MongoDB Config Server
Configuration servers host the config database that stores metadata for sharded clusters. Each production cluster has
three configuration servers, initiated using the mongod --configsvr (page 996) option. [4] Configuration servers
listen for connections on port 27019. As a result, add the following Windows Firewall rules to the config server to
allow incoming and outgoing connections on port 27019, for connections to the other config servers.
netsh advfirewall firewall add rule name="Open mongod config svr port 27019" dir=in action=allow protocol=TCP localport=27019
Additionally, config servers need to allow incoming connections from all of the mongos.exe (page 1009) instances
in the cluster and all mongod.exe (page 1008) instances in the cluster. Add rules that resemble the following:
netsh advfirewall firewall add rule name="Open mongod config svr inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=27019
Replace <ip-address> with the addresses of the mongos.exe (page 1009) instances and the shard
mongod.exe (page 1008) instances.
[4] You can also run a config server by setting the configsvr (page 1054) option in a configuration file.
Traffic to and from a MongoDB Shard Server
Shard servers run as mongod --shardsvr (page 997). [5] Because the default port number when running
with shardsvr (page 1054) is 27018, you must configure the following Windows Firewall rules to allow traffic to
and from each shard:
netsh advfirewall firewall add rule name="Open mongod shardsvr inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=27018
netsh advfirewall firewall add rule name="Open mongod shardsvr outbound" dir=out action=allow protocol=TCP remoteip=<ip-address> localport=27018
Replace the <ip-address> specification with the IP addresses of all mongod.exe (page 1008) instances. This
allows you to permit incoming and outgoing traffic between all shards, including constituent replica set members, to:
all mongod.exe (page 1008) instances in the shard's replica sets.
all mongod.exe (page 1008) instances in other shards. [6]
Furthermore, shards need to be able to make outgoing connections to:
all mongos.exe (page 1009) instances.
all mongod.exe (page 1008) instances in the config servers.
Create a rule that resembles the following, and replace the <ip-address> with the addresses of the config servers
and the mongos.exe (page 1009) instances:
netsh advfirewall firewall add rule name="Open mongod config svr outbound" dir=out action=allow protocol=TCP remoteip=<ip-address> localport=27018
Provide Access For Monitoring Systems
1. The mongostat (page 1029) diagnostic tool, when running with the --discover (page 1031) option, needs to be
able to reach all components of a cluster, including the config servers, the shard servers, and the mongos.exe
(page 1009) instances.
2. If your monitoring system needs access to the HTTP interface, add the following rule to the chain:
netsh advfirewall firewall add rule name="Open mongod HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28017
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface.
For all deployments, you should restrict access to this port to only the monitoring instance.
Optional
For shard server mongod.exe (page 1008) instances running with shardsvr (page 1054), the rule would
resemble the following:
netsh advfirewall firewall add rule name="Open mongos HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28018
For config server mongod.exe (page 1008) instances running with configsvr (page 1054), the rule would
resemble the following:
netsh advfirewall firewall add rule name="Open mongod configsvr HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28019
[5] You can also specify the shard server option using the shardsvr (page 1054) setting in the configuration file. Shard members are also often conventional replica sets using the default port.
[6] All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations.
14.2.3 Manage and Maintain Windows Firewall Configurations
This section contains a number of basic operations for managing and using netsh. While you can use the GUI front
ends to manage the Windows Firewall, all core functionality is accessible from netsh.
Delete all Windows Firewall Rules
To delete the firewall rules allowing mongod.exe (page 1008) traffic:
netsh advfirewall firewall delete rule name="Open mongod port 27017" protocol=tcp localport=27017
netsh advfirewall firewall delete rule name="Open mongod shard port 27018" protocol=tcp localport=27018
List All Windows Firewall Rules
To return a list of all Windows Firewall rules:
netsh advfirewall firewall show rule name=all
Reset Windows Firewall
To reset the Windows Firewall rules:
netsh advfirewall reset
Backup and Restore Windows Firewall Rules
To simplify administration of a larger collection of systems, you can export and import firewall rules between
servers very easily on Windows:
Export all rewall rules with the following command:
netsh advfirewall export "C:\temp\MongoDBfw.wfw"
Replace "C:\temp\MongoDBfw.wfw" with a path of your choosing. You can use a command in the following
form to import a le created using this operation:
netsh advfirewall import "C:\temp\MongoDBfw.wfw"
14.3 Control Access to MongoDB Instances with Authentication
MongoDB provides a basic authentication system that you can enable with the auth (page 1048) and keyFile
(page 1047) configuration settings. [7] See the authentication (page 94) section of the Security Practices and Management
(page 91) document.
This document contains an overview of all operations related to authentication and managing a MongoDB deployment
with authentication.
See Also:
The Security Considerations (page 36) section of the Run-time Database Configuration (page 35) document for more
information on configuring authentication.
[7] Use the --auth (page 991) and --keyFile (page 991) options on the command line.
14.3.1 Add Users
When setting up authentication for the first time, you must either:
1. add at least one user to the admin database before starting the mongod (page 989) instance with auth
(page 1048), or
2. add the first user to the admin database when connected to the mongod (page 989) instance from a
localhost connection. [8]
Begin by setting up the first administrative user for the mongod (page 989) instance.
Add an Administrative User
About administrative users
Administrative users are those users that have normal (read and write) access to the admin database.
If this is the first administrative user, [9] connect to the mongod (page 989) on the localhost interface using the
mongo (page 1002) shell. Then, issue the following command sequence to switch to the admin database context and
add the administrative user:
use admin
db.addUser("<username>", "<password>")
Replace <username> and <password> with the credentials for this administrative user.
Add a Normal User to a Database
To add a user with read and write access to a specific database, in this example the records database, connect to the
mongod (page 989) instance using the mongo (page 1002) shell, and issue the following sequence of operations:
use records
db.addUser("<username>", "<password>")
Replace <username> and <password> with the credentials for this user.
Add a Read Only User to a Database
To add a user with read-only access to a specific database, in this example the records database, connect to the
mongod (page 989) instance using the mongo (page 1002) shell, and issue the following sequence of operations:
use records
db.addUser("<username>", "<password>", true)
Replace <username> and <password> with the credentials for this user.
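To verify any of the accounts created above, you can authenticate as that user from the mongo (page 1002) shell; in this sketch, <username> and <password> again stand in for the actual credentials:
use records
db.auth("<username>", "<password>")
db.auth() returns 1 when authentication succeeds and 0 when it fails.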
[8] Because of SERVER-6591, you cannot add the first user to a sharded cluster using the localhost connection in 2.2. If you are running a 2.2 sharded cluster and want to enable authentication, you must deploy the cluster and add the first user to the admin database before restarting the cluster to run with keyFile (page 1047).
[9] You can also use this procedure if authentication is not enabled, so that your databases have an administrative user when you enable auth (page 1048).
14.3.2 Administrative Access in MongoDB
Although administrative accounts have access to all databases, these users must authenticate against the admin
database before changing contexts to a second database, as in the following example:
Example
Given the superAdmin user with the password Password123 and access to the admin database.
The following operation in the mongo (page 1002) shell will succeed:
use admin
db.auth("superAdmin", "Password123")
However, the following operation will fail:
use test
db.auth("superAdmin", "Password123")
Note: If you have authenticated to the admin database as a normal (read and write) user, logging into a different
database as a read-only user will not invalidate the authentication to the admin database. In this situation, this client
will be able to read and write data to this second database.
14.3.3 Authentication on Localhost
The behavior of mongod (page 989) running with auth (page 1048), when connecting from a client over the localhost
interface (i.e. a client running on the same system as the mongod (page 989)), varies slightly before and after
version 2.2.
In general, if there are no users for the admin database, you may connect via the localhost interface. For sharded
clusters running version 2.2, if mongod (page 989) is running with auth (page 1048) then all users connecting over
the localhost interface must authenticate, even if there aren't any users in the admin database.
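As a sketch of this localhost exception on a standalone instance (the paths and credentials here are hypothetical): with auth (page 1048) enabled and no users yet in the admin database, a client on the same host can create the first user:
mongod --auth --dbpath /data/db --fork --logpath /var/log/mongod.log
mongo localhost/admin --eval 'db.addUser("siteAdmin", "Password123")'
Once this first user exists, all connections, including those over the localhost interface, must authenticate.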
14.3.4 Password Hashing Insecurity
In version 2.2 and earlier:
the normal users of a database all have access to the system.users collection, which contains the user names
and user password hashes. [10]
if a user has the same password for multiple databases, the hash will be the same. A malicious user could exploit
this to gain access on a second database using a different user's credentials.
As a result, always use unique username and password combinations for each database.
Thanks to Will Urbanski, from Dell SecureWorks, for identifying this issue.
14.3.5 Configuration Considerations for Authentication
The following sections outline practices for enabling and managing authentication with specific MongoDB deployments:
Security Considerations for Replica Sets (page 348)
Sharded Cluster Security Considerations (page 427)
[10] Read-only users do not have access to the system.users collection.
14.3.6 Generate a Key File
The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must
not have group or world permissions on UNIX systems. Key file permissions are not checked on Windows systems.
Use the following command at the system shell to generate pseudo-random content for a key file:
openssl rand -base64 753
Note: Be aware that MongoDB strips whitespace characters (e.g. \x0d, \x09, and \x20) for cross-platform convenience.
As a result, the following keys are identical:
echo -e "my secret key" > key1
echo -e "my secret key\n" > key2
echo -e "my secret key" > key3
echo -e "my\r\nsecret\r\nkey\r\n" > key4
14.4 Deploy MongoDB with Kerberos Authentication
New in version 2.4. MongoDB Enterprise supports authentication using a Kerberos service to manage the authentication
process. Kerberos is an industry-standard authentication protocol for large client/server systems. With Kerberos,
MongoDB and application ecosystems can take advantage of existing authentication infrastructure and processes.
Setting up and configuring a Kerberos deployment is beyond the scope of this document. In order to use MongoDB
with Kerberos, you must have a properly configured Kerberos deployment and the ability to generate a valid keytab
file for each mongod (page 989) instance in your MongoDB deployment.
Note: The following assumes that you have a valid Kerberos keytab file for your realm accessible
on your system. The examples below assume that the keytab file is valid and is located at
/opt/mongodb/mongod.keytab and is only accessible to the user that runs the mongod (page 989) process.
14.4.1 Process Overview
To run MongoDB with Kerberos support, you must:
Configure a Kerberos service principal for each mongod (page 989) and mongos (page 999) instance in your
MongoDB deployment.
Generate and distribute keytab files for each MongoDB component (i.e. mongod (page 989) and mongos
(page 999)) in your deployment. Ensure that you only transmit keytab files over secure channels.
Optional. Start the mongod (page 989) instance without auth (page 1048) and create users inside of MongoDB
that you can use to bootstrap your deployment.
Start mongod (page 989) and mongos (page 999) with the KRB5_KTNAME environment variable as well as a
number of required run-time options.
If you did not create Kerberos user accounts, you can use the localhost exception (page 108) to create users at
this point until you create the first user on the admin database.
Authenticate clients, including the mongo (page 1002) shell using Kerberos.
14.4.2 Operations
Create Users and Privilege Documents
For every user that you want to be able to authenticate using Kerberos, you must create corresponding privilege documents
in the system.users (page 120) collection to provision access to users. Consider the following document:
{
user: "application/[email protected]",
roles: ["read"],
userSource: "$external"
}
This grants the Kerberos user principal application/[email protected] read-only access to a
database. The userSource (page 121) $external reference allows mongod (page 989) to consult an external
source (i.e. Kerberos) to authenticate this user.
In the mongo (page 1002) shell you can pass db.addUser() (page 902) a user privilege document to provision
access to users, as in the following operation:
db = db.getSiblingDB("records")
db.addUser( {
"user": "application/[email protected]",
"roles": [ "read" ],
"userSource": "$external"
} )
This operation grants the Kerberos user application/[email protected] access to the records
database.
To remove access to a user, use the remove() (page 929) method, as in the following example:
db.system.users.remove( { user: "application/[email protected]" } )
To modify a user document, use update (page 185) operations on documents in the system.users (page 120)
collection.
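For instance, a sketch of such an update that grants the same principal readWrite in addition to read (the role change itself is illustrative only):
db.system.users.update(
   { user: "application/[email protected]" },
   { $set: { roles: [ "read", "readWrite" ] } }
)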
See Also:
system.user Privilege Documents (page 119) and User Privilege Roles in MongoDB (page 115).
Start mongod with Kerberos Support
Once you have provisioned privileges to users in the mongod (page 989) and obtained a valid keytab file, you must
start mongod (page 989) using a command in the following form:
env KRB5_KTNAME=<path to keytab file> <mongod invocation>
For successful operation with mongod (page 989), use the following run-time options in addition to your normal
default configuration options:
--setParameter with the authenticationMechanisms=GSSAPI argument to enable support for
Kerberos.
--auth (page 991) to enable authentication.
--keyFile (page 991) to allow components of a single MongoDB deployment to communicate with each
other, if needed to support replica set and sharded cluster operations. keyFile (page 1047) implies auth
(page 1048).
For example, consider the following invocation:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
/opt/mongodb/bin/mongod --dbpath /opt/mongodb/data \
--fork --logpath /opt/mongodb/log/mongod.log \
--auth --setParameter authenticationMechanisms=GSSAPI
You can also specify these options using the configuration file, as in the following:
# /opt/mongodb/mongod.conf, Example configuration file.
fork = true
auth = true
dbpath = /opt/mongodb/data
logpath = /opt/mongodb/log/mongod.log
setParameter = authenticationMechanisms=GSSAPI
To use this configuration file, start mongod (page 989) as in the following:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
/opt/mongodb/bin/mongod --config /opt/mongodb/mongod.conf
To start a mongos (page 999) instance using Kerberos, you must create a Kerberos service principal and deploy a
keytab file for this instance, and then start the mongos (page 999) with the following invocation:
env KRB5_KTNAME=/opt/mongodb/mongos.keytab \
/opt/mongodb/bin/mongos \
--configdb shard0.example.net,shard1.example.net,shard2.example.net \
--setParameter authenticationMechanisms=GSSAPI \
--keyFile /opt/mongodb/mongos.keyfile
If you encounter problems when trying to start mongod (page 989) or mongos (page 999), please see the
troubleshooting section (page 112) for more information.
Important: Before users can authenticate to MongoDB using Kerberos, you must create users (page 110) and grant
them privileges within MongoDB. If you have not created users when you start MongoDB with Kerberos, you can
use the localhost authentication exception (page 108) to add users. See the Create Users and Privilege Documents
(page 110) section and the User Privilege Roles in MongoDB (page 115) document for more information.
Authenticate mongo Shell with Kerberos
To connect to a mongod (page 989) instance using the mongo (page 1002) shell, you must begin by using the kinit
program to initialize and authenticate a Kerberos session. Then, start a mongo (page 1002) instance, and use the
db.auth() (page 903) method to authenticate against the special $external database, as in the following operation:
use $external
db.auth( { mechanism: "GSSAPI", user: "application/[email protected]" } )
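The preceding assumes an active Kerberos session. A minimal sketch of initializing one with kinit, using the hypothetical principal from the examples above:
kinit application/[email protected]
kinit prompts for the principal's Kerberos password and caches a ticket-granting ticket for the session.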
Alternately, you can authenticate using command line options to mongo (page 1002), as in the following equivalent
example:
mongo --authenticationMechanism=GSSAPI \
--authenticationDatabase=$external \
--username application/[email protected]
These operations authenticate the Kerberos principal name application/[email protected] to the
connected mongod (page 989), and will automatically acquire all available privileges as needed.
Use MongoDB Drivers to Authenticate with Kerberos
At the time of release, the C++, Java, and C# drivers all provide support for Kerberos authentication to MongoDB.
Consider the following tutorials for more information:
Java
C#
C++
14.4.3 Troubleshooting
Kerberos Configuration Checklist
If you're having trouble getting mongod (page 989) to start with Kerberos, there are a number of Kerberos-specific
issues that can prevent successful authentication. As you begin troubleshooting your Kerberos deployment, ensure
that:
The mongod (page 989) is from MongoDB Enterprise.
You have a valid keytab file specified in the environment running the mongod (page 989). For the
mongod (page 989) instance running on the db0.example.net host, the service principal should be
mongodb/db0.example.net.
DNS allows the mongod (page 989) to resolve the components of the Kerberos infrastructure. You should
have both A and PTR records (i.e. forward and reverse DNS) for the system that runs the mongod (page 989)
instance.
The canonical system hostname of the system that runs the mongod (page 989) instance is the resolvable fully
qualified domain name for this host. Test system hostname resolution with the hostname -f command at the
system prompt.
Both the Kerberos KDC and the system running the mongod (page 989) instance must be able to resolve each other
using DNS. [11]
The system clocks of the systems running the mongod (page 989) instances and the Kerberos infrastructure are
synchronized. Time differences greater than 5 minutes will prevent successful authentication.
If you still encounter problems with Kerberos, you can start both mongod (page 989) and mongo (page 1002) (or
another client) with the environment variable KRB5_TRACE set to different files to produce more verbose logging of
the Kerberos process to help further troubleshooting, as in the following example:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
KRB5_TRACE=/opt/mongodb/log/mongodb-kerberos.log \
/opt/mongodb/bin/mongod --dbpath /opt/mongodb/data \
--fork --logpath /opt/mongodb/log/mongod.log \
--auth --setParameter authenticationMechanisms=GSSAPI
[11] By default, Kerberos attempts to resolve hosts using the content of /etc/krb5.conf before using DNS to resolve hosts.
Common Error Messages
In some situations, MongoDB will return error messages from the GSSAPI interface if there is a problem with the
Kerberos service.
GSSAPI error in client while negotiating security context.
This error occurs on the client and reflects insufficient credentials or a malicious attempt to authenticate.
If you receive this error, ensure that you're using the correct credentials and the correct fully qualified
domain name when connecting to the host.
GSSAPI error acquiring credentials.
This error only occurs when attempting to start the mongod (page 989) or mongos (page 999) and
reflects improper configuration of the system hostname or a missing or incorrectly configured keytab file. If
you encounter this problem, consider all the items in the Kerberos Configuration Checklist (page 112), in
particular:
examine the keytab file with the following command:
klist -k <keytab>
Replace <keytab> with the path to your keytab file.
check the configured hostname for your system with the following command:
hostname -f
Ensure that this name matches the name in the keytab file, or use the saslHostName (page 1068) parameter
to pass MongoDB the correct hostname.
Enable the Traditional MongoDB Authentication Mechanism
For testing and development purposes, you can enable both the Kerberos (i.e. GSSAPI) authentication mechanism in
combination with the traditional MongoDB challenge/response authentication mechanism (i.e. MONGODB-CR), using
the following setParameter (page 1052) run-time option:
mongod --setParameter authenticationMechanisms=GSSAPI,MONGODB-CR
Warning: All keyFile (page 1047) internal authentication between members of a replica set or sharded
cluster still uses the MONGODB-CR authentication mechanism, even if MONGODB-CR is not enabled. All client
authentication will still use Kerberos.
CHAPTER 15
Reference
15.1 User Privilege Roles in MongoDB
New in version 2.4. In version 2.4, MongoDB adds support for the following user roles.
15.1.1 Roles
Changed in version 2.4. Roles in MongoDB provide users with a set of specific privileges on specific logical
databases. Users may have multiple roles and may have different roles on different logical databases. Roles only
grant privileges and never limit access: if a user has read (page 115) permissions on the records database and the
readWriteAnyDatabase (page 119) permission, that user will be able to write data to the records database.
Note: By default, MongoDB 2.4 is backwards-compatible with the MongoDB 2.2 access control
roles. You can, however, explicitly disable this backwards-compatibility by setting the
supportCompatibilityFormPrivilegeDocuments (page 1068) option to 0 during startup, as in the
following command-line invocation of MongoDB:
mongod --setParameter supportCompatibilityFormPrivilegeDocuments=0
In general, you should set this option if your deployment does not need to support legacy user documents. Typically,
legacy user documents are only useful during the upgrade process and while you migrate applications to the updated
privilege document form.
See system.user Privilege Documents (page 119) and Delegated Credentials for MongoDB Authentication (page 121)
for more information about permissions and authentication in MongoDB.
Database User Roles
read
Provides users with the ability to read data from any collection within a specific logical database. This includes
find() (page 910) and the following database commands:
aggregate (page 818)
checkShardingIndex (page 821)
cloneCollectionAsCapped (page 823)
collStats (page 825)
count (page 830)
dataSize (page 832)
dbHash (page 833)
dbStats (page 833)
distinct (page 833)
filemd5 (page 838)
geoNear (page 845)
geoSearch (page 846)
geoWalk (page 846)
group (page 849)
mapReduce (page 860) (inline output only.)
text (page 883) (beta feature.)
readWrite
Provides users with the ability to read from or write to any collection within a specific logical database. Users
with readWrite (page 116) have access to all of the operations available to read (page 115) users, as
well as the following basic write operations: insert() (page 921), remove() (page 929), and update()
(page 933).
Additionally, users with the readWrite (page 116) role have access to the following database commands:
cloneCollection (page 822) (as the target database.)
convertToCapped (page 829)
create (page 831) (and to create collections implicitly.)
drop() (page 907)
dropIndexes (page 835)
emptycapped (page 835)
ensureIndex() (page 908)
findAndModify (page 838)
mapReduce (page 860) (output to a collection.)
renameCollection (page 871) (within the same database.)
Database Administration Roles
dbAdmin
Provides the ability to perform the following set of administrative operations within the scope of this logical
database.
clean (page 821)
collMod (page 823)
collStats (page 825)
compact (page 825)
convertToCapped (page 829)
createCollection
create (page 831)
dbStats (page 833)
drop() (page 907)
dropIndexes (page 835)
ensureIndex
profile (page 869)
reIndex (page 870)
renameCollection (page 871) (within a single database.)
validate (page 887)
Furthermore, only dbAdmin (page 116) has the ability to read the system.profile (page 1131) collection.
userAdmin
Allows users to read and write data to the system.users (page 120) collection of any database. Users with
this role will be able to modify permissions for existing users and create new users. userAdmin (page 117)
does not restrict the permissions that a user can grant, and a userAdmin (page 117) user can grant privileges
to themselves or other users in excess of the userAdmin (page 117) user's current privileges.
userAdmin (page 117) is effectively the superuser role for a specific database. Users with userAdmin
(page 117) can grant themselves all privileges.
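For instance, the following sketch shows that escalation from the mongo (page 1002) shell, with a hypothetical bob user who initially holds only userAdmin (page 117) on the records database:
use records
db.system.users.update(
   { user: "bob" },
   { $set: { roles: [ "readWrite", "dbAdmin", "userAdmin" ] } }
)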
15.1.2 Administrative Roles
clusterAdmin
clusterAdmin (page 117) grants access to several administrative operations as well as replica set and sharded cluster
administrative functions.
clusterAdmin (page 117) is only applicable on the admin database.
Specifically, users with the clusterAdmin (page 117) role have access to the following operations:
addShard (page 817)
closeAllDatabases (page 823)
connPoolStats (page 828)
connPoolSync (page 828)
_cpuProfilerStart
_cpuProfilerStop
cursorInfo (page 832)
diagLogging (page 833)
dropDatabase (page 834)
enableSharding (page 835)
flushRouterConfig (page 843)
fsync (page 844)
db.fsyncUnlock() (page 939)
getCmdLineOpts (page 846)
getLog (page 848)
getParameter (page 848)
getShardMap (page 849)
getShardVersion (page 849)
hostInfo (page 855)
db.currentOp() (page 936)
db.killOp() (page 942)
listDatabases (page 859)
listShards (page 859)
logRotate (page 859)
moveChunk (page 868)
movePrimary (page 868)
netstat (page 869)
removeShard (page 871)
repairDatabase (page 872)
replSetFreeze (page 873)
replSetGetStatus (page 874)
replSetInitiate (page 874)
replSetMaintenance (page 875)
replSetReconfig (page 875)
replSetStepDown (page 876)
replSetSyncFrom (page 876)
resync (page 878)
serverStatus (page 878)
setParameter (page 878)
setShardVersion (page 879)
shardCollection (page 879)
shardingState (page 880)
shutdown (page 880)
splitChunk (page 882)
splitVector (page 882)
split (page 882)
top (page 886)
touch (page 886)
unsetSharding (page 887)
15.1.3 Any Database Roles
Note: You must specify the following any database roles on the admin database. These roles apply to all
databases in a MongoDB instance and are roughly equivalent to their single-database equivalents.
If you add any of these roles to a user privilege document (page 119) outside of the admin database, the privilege will
have no effect.
readAnyDatabase
readAnyDatabase (page 119) provides users with the same read-only permissions as read (page 115),
except it applies to all logical databases in the MongoDB environment.
readWriteAnyDatabase
readWriteAnyDatabase (page 119) provides users with the same read and write permissions as
readWrite (page 116), except it applies to all logical databases in the MongoDB environment.
userAdminAnyDatabase
userAdminAnyDatabase (page 119) provides users with the same access to user administration operations
as userAdmin (page 117), except it applies to all logical databases in the MongoDB environment.
Warning: Because users with userAdminAnyDatabase (page 119) and userAdmin (page 117) have
the ability to create and modify permissions in addition to their own level of access, this role is effectively
the MongoDB system superuser.
dbAdminAnyDatabase
dbAdminAnyDatabase (page 119) provides users with the same access to database administration operations
as dbAdmin (page 116), except it applies to all logical databases in the MongoDB environment.
15.1.4 Combined Access
Some operations are only available to users that have multiple roles. Consider the following:
sh.status() (page 963) Requires clusterAdmin (page 117) and read (page 115) access to the config
(page 1123) database.
applyOps (page 819), eval (page 836), and db.eval() (page 937) Requires readAnyDatabase
(page 119), readWriteAnyDatabase (page 119), userAdminAnyDatabase (page 119),
dbAdminAnyDatabase (page 119) and clusterAdmin (page 117) (on the admin database.)
15.2 system.user Privilege Documents
Changed in version 2.4.
15.2.1 Overview
The documents in the <database>.system.users (page 120) collection store credentials and user privilege
information used by the authentication system to provision access to users in the MongoDB system. See User Privilege
Roles in MongoDB (page 115) for more information about access roles, and Security (page 89) for an overview of security
in MongoDB.
15.2.2 Data Model
<database>.system.users
Changed in version 2.4. Documents in the <database>.system.users (page 120) collection store credentials
and user roles (page 115) for users who have access to the database. Consider the following prototypes
of user privilege documents:
{
user: "<username>",
pwd: "<hash>",
roles: []
}
{
user: "<username>",
userSource: "<database>",
roles: []
}
Note: The pwd (page 120) and userSource (page 121) fields are mutually exclusive. A single document
cannot contain both.
The following privilege document with the otherDBRoles (page 121) field is only supported on the admin
database:
{
user: "<username>",
userSource: "<database>",
otherDBRoles: {
<database0> : [],
<database1> : []
},
roles: []
}
Consider the content of the following fields in the system.users (page 120) documents:
<database>.system.users.user
user (page 120) is a string that identifies each user. Users exist in the context of a single logical database;
however, users from one database may obtain access in another database by way of the otherDBRoles
(page 121) field on the admin database, the userSource (page 121) field, or the Any Database Roles
(page 119).
<database>.system.users.pwd
pwd (page 120) holds a hashed shared secret used to authenticate the user (page 120). The pwd (page 120)
field is mutually exclusive with the userSource (page 121) field.
<database>.system.users.roles
roles (page 120) holds an array of user roles. The available roles are:
read (page 115)
readWrite (page 116)
dbAdmin (page 116)
userAdmin (page 117)
clusterAdmin (page 117)
readAnyDatabase (page 119)
readWriteAnyDatabase (page 119)
userAdminAnyDatabase (page 119)
dbAdminAnyDatabase (page 119)
See Roles (page 115) for full documentation of all available user roles.
<database>.system.users.userSource
A string that holds the name of the database that contains the credentials for the user. If userSource
(page 121) is $external, then MongoDB will use an external resource, such as Kerberos, for authentication
credentials.
Note: In the current release, the only external authentication source is Kerberos, which is only available
in MongoDB Enterprise.
Use userSource (page 121) to ensure that a single user's authentication credentials are only stored in a
single location in a mongod (page 989) instance's data.
A userSource (page 121) and user (page 120) pair identifies a unique user in a MongoDB system.
admin.system.users.otherDBRoles
A document that holds one or more fields, each with a name that is the name of a database in the MongoDB
instance and a value that holds the list of roles this user has on that database. Consider the following
example:
{
user: "admin",
userSource: "$external",
roles: [ "clusterAdmin"],
otherDBRoles:
{
config: [ "read" ],
records: [ "dbadmin" ]
}
}
This user has the following privileges:
clusterAdmin (page 117) on the admin database,
read (page 115) on the config (page 1123) database, and
dbAdmin (page 116) on the records database.
15.2.3 Delegated Credentials for MongoDB Authentication
New in version 2.4. With a new document format in the system.users (page 120) collection, MongoDB now supports
the ability to delegate authentication credentials to other sources and databases. The userSource (page 121)
field in these documents forces MongoDB to use another source for credentials.
Consider the following document in a system.users (page 120) collection in a database named accounts:
{
user: "application0",
pwd: "YvuolxMtaycghk2GMrzmImkG4073jzAw2AliMRul",
roles: []
}
Then, for every database where the application0 user requires access, add documents to the system.users
(page 120) collection that resemble the following:
{
user: "application0",
roles: [ "readWrite" ],
userSource: "accounts"
}
To gain privileges on the databases where application0 has access, you must first authenticate to the accounts
database.
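In the mongo (page 1002) shell, that sequence looks like the following sketch, where <password> stands in for the plaintext behind the stored hash above:
use accounts
db.auth("application0", "<password>")
use records
Once the authentication against accounts succeeds, the delegated readWrite role applies on the records database.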
15.2.4 Disable Legacy Privilege Documents
By default, MongoDB 2.4 includes support for both the new, role-based privilege document style as well as 2.2 and earlier
privilege documents. MongoDB assumes any privilege document without a roles (page 120) field is a 2.2 or earlier
document.
To ensure that mongod (page 989) instances will only provide access to users defined with the new role-based privilege
documents, use the following setParameter (page 1052) run-time option:
mongod --setParameter supportCompatibilityFormPrivilegeDocuments=0
Part IV
Core MongoDB Operations (CRUD)
CRUD stands for create, read, update, and delete, which are the four core database operations used in database-driven
application development. The CRUD Operations for MongoDB (page 167) section provides an introduction to each class
of operation along with complete examples of each operation. The documents in the Read and Write Operations in
MongoDB (page 127) section provide a higher level overview of the behavior and available functionality of these
operations.
CHAPTER 16
Read and Write Operations in
MongoDB
The Read Operations (page 127) and Write Operations (page 139) documents provide higher-level introductions
to and descriptions of the behavior of read and write operations for MongoDB deployments. The BSON
Documents (page 151) section provides an overview of documents and document-orientation in MongoDB.
16.1 Read Operations
Read operations include all operations that return a cursor in response to application requests for data (i.e. queries), and
also include a number of aggregation (page 209) operations that do not return a cursor but have similar properties as
queries. These commands include aggregate (page 818), count (page 830), and distinct (page 833).
This document describes the syntax and structure of the queries applications use to request data from MongoDB, and
how different factors affect the efficiency of reads.
Note: All of the examples in this document use the mongo (page 1002) shell interface. All of these operations are
available in an idiomatic interface for each language by way of the MongoDB Driver (page 493). See your driver
documentation for full API documentation.
16.1.1 Queries in MongoDB
In the mongo (page 1002) shell, the find() (page 910) and findOne() (page 915) methods perform read operations.
The find() (page 910) method has the following syntax: [1]
db.collection.find( <query>, <projection> )
The db.collection object specifies the database and collection to query. All queries in MongoDB address
a single collection.
You can enter db in the mongo (page 1002) shell to return the name of the current database. Use the show
collections operation in the mongo (page 1002) shell to list the current collections in the database.
[1] db.collection.find() (page 910) is a wrapper for the more formal query structure with the $query (page 781) operator.
Queries in MongoDB are BSON objects that use a set of query operators (page 974) to describe query parameters.
The <query> argument of the find() (page 910) method holds this query document. A read operation
without a query document will return all documents in the collection.
The <projection> argument describes the result set in the form of a document. Projections specify or limit
the fields to return.
Without a projection, the operation will return all fields of the documents. Specify a projection if your documents
are larger, or when your application only needs a subset of the available fields.
The order of documents returned by a query is not defined and is not necessarily consistent unless you specify a
sort (sort() (page 901)).
For example, the following operation on the inventory collection selects all documents where the type field
equals 'food' and the price field has a value less than 9.95. The projection limits the response to the item,
qty, and _id fields:
db.inventory.find( { type: 'food', price: { $lt: 9.95 } },
                   { item: 1, qty: 1 } )
The findOne() (page 915) method is similar to the find() (page 910) method, except that findOne() (page 915)
returns a single document from a collection rather than a cursor. The method has the syntax:
db.collection.findOne( <query>, <projection> )
For additional documentation and examples of the main MongoDB read operators, refer to the Read (page 175) page
of the Core MongoDB Operations (CRUD) (page 125) section.
Query Document
This section provides an overview of the query document for MongoDB queries. See the preceding section for more
information on queries in MongoDB (page 127).
The following examples demonstrate the key properties of the query document in MongoDB queries, using the
find() (page 910) method from the mongo (page 1002) shell, and a collection of documents named inventory:
An empty query document ({}) selects all documents in the collection:
db.inventory.find( {} )
Not specifying a query document to find() (page 910) is equivalent to specifying an empty query document.
Therefore the following operation is equivalent to the previous operation:
db.inventory.find()
A single-clause query selects all documents in a collection where a field has a certain value. These are simple
equality queries.
In the following example, the query selects all documents in the collection where the type field has the value
'snacks':
db.inventory.find( { type: "snacks" } )
A single-clause query document can also select all documents in a collection given a condition or set of conditions
for one field in the collection's documents. Use the query operators (page 974) to specify conditions in a
MongoDB query.
In the following example, the query selects all documents in the collection where the value of the type field is
either 'food' or 'snacks':
db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
Note: Although you can express this query using the $or (page 775) operator, choose the $in (page 765) operator rather than the $or (page 775) operator when performing equality checks on the same field.
A compound query can specify conditions for more than one field in the collection's documents. Implicitly, a logical AND conjunction connects the clauses of a compound query so that the query selects the documents in the collection that match all the conditions.
In the following example, the query document specifies an equality match on a single field, followed by a range of values for a second field using a comparison operator (page 974):
db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
This query selects all documents where the type field has the value 'food' and the value of the price field is less than ($lt (page 767)) 9.95.
Using the $or (page 775) operator, you can specify a compound query that joins each clause with a logical OR
conjunction so that the query selects the documents in the collection that match at least one condition.
In the following example, the query document selects all documents in the collection where the field qty has a value greater than ($gt (page 764)) 100 or the value of the price field is less than ($lt (page 767)) 9.95:
db.inventory.find( { $or: [ { qty: { $gt: 100 } },
                            { price: { $lt: 9.95 } } ]
                   } )
With additional clauses, you can specify precise conditions for matching documents. In the following example, the compound query document selects all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt (page 764)) 100 or the value of the price field is less than ($lt (page 767)) 9.95:
db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } },
                                          { price: { $lt: 9.95 } } ]
                   } )
Subdocuments
When the field holds an embedded document (i.e. subdocument), you can either specify the entire subdocument as the value of a field, or reach into the subdocument using dot notation, to specify values for individual fields in the subdocument:
Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order.
In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:
db.inventory.find( {
    producer: {
        company: 'ABC123',
        address: '123 Street'
    }
} )
Equality matches for specific fields within subdocuments select documents when the field in the subdocument contains a field that matches the specified value.
In the following example, the query uses the dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields:
db.inventory.find( { 'producer.company': 'ABC123' } )
Arrays
When the field holds an array, you can query for values in the array, and if the array holds sub-documents, you query for specific fields within the sub-documents using dot notation:
Equality matches can specify an entire array, to select an array that matches exactly. In the following example, the query matches all documents where the value of the field tags is an array and holds three elements, 'fruit', 'food', and 'citrus', in this order:
db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } )
Equality matches can specify a single element in the array. If the array contains at least one element with the specified value, as in the following example: the query matches all documents where the value of the field tags is an array that contains, as one of its elements, the element 'fruit':
db.inventory.find( { tags: 'fruit' } )
Equality matches can also select documents by values in an array using the array index (i.e. position) of the element in the array, as in the following example: the query uses the dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit':
db.inventory.find( { 'tags.0': 'fruit' } )
In the following examples, consider an array that contains subdocuments:
If you know the array index of the subdocument, you can specify the document using the subdocument's position.
The following example selects all documents where the memos field contains an array whose first element (i.e. index 0) is a subdocument with the field by with the value 'shipping':
db.inventory.find( { 'memos.0.by': 'shipping' } )
If you do not know the index position of the subdocument, concatenate the name of the field that contains the array, with a dot (.) and the name of the field in the subdocument.
The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':
db.inventory.find( { 'memos.by': 'shipping' } )
To match by multiple fields in the subdocument, you can use either dot notation or the $elemMatch (page 760) operator:
The following example uses dot notation to query for documents where the value of the memos field is an array that has at least one subdocument that contains the field memo equal to 'on time' and the field by equal to 'shipping':
db.inventory.find(
  {
    'memos.memo': 'on time',
    'memos.by': 'shipping'
  }
)
The following example uses $elemMatch (page 760) to query for documents where the value of the memos field is an array that has at least one subdocument that contains the field memo equal to 'on time' and the field by equal to 'shipping':
db.inventory.find( {
  memos: {
    $elemMatch: {
      memo: 'on time',
      by: 'shipping'
    }
  }
} )
Refer to the Query, Update, Projection, and Aggregation Operators (page 755) document for the complete list of query
operators.
Result Projections
The projection specification limits the fields to return for all matching documents. Restricting the fields to return can minimize network transit costs and the costs of deserializing documents in the application layer.
The second argument to the find() (page 910) method is a projection, and it takes the form of a document with a list of fields for inclusion or exclusion from the result set. You can either specify the fields to include (e.g. { field: 1 }) or specify the fields to exclude (e.g. { field: 0 }). The _id field is, by default, included in the result set. To exclude the _id field from the result set, you need to specify in the projection document the exclusion of the _id field (i.e. { _id: 0 }).
Note: You cannot combine inclusion and exclusion semantics in a single projection with the exception of the _id field.
Consider the following projection specifications in find() (page 910) operations:
If you specify no projection, the find() (page 910) method returns all fields of all documents that match the query.
db.inventory.find( { type: 'food' } )
This operation will return all documents in the inventory collection where the value of the type field is 'food'.
A projection can explicitly include several fields. In the following operation, the find() (page 910) method returns all documents that match the query, and includes only the item and qty fields in each. The results also include the _id field:
db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
You can remove the _id field from the results by specifying its exclusion in the projection, as in the following example:
db.inventory.find( { type: 'food' }, { item: 1, qty: 1, _id: 0 } )
This operation returns all documents that match the query, and only includes the item and qty fields in the result set.
To exclude a single field or group of fields you can use a projection in the following form:
db.inventory.find( { type: 'food' }, { type: 0 } )
This operation returns all documents where the value of the type field is 'food', but does not include the type field in the output.
With the exception of the _id field you cannot combine inclusion and exclusion statements in projection documents.
The $elemMatch (page 793) and $slice (page 797) projection operators provide more control when projecting
only a portion of an array.
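As an illustrative sketch, assuming the documents carry a hypothetical ratings array field, the following projection returns only the first two elements of that array for each matching document:

db.inventory.find( { type: 'food' }, { ratings: { $slice: 2 } } )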
16.1.2 Indexes
Indexes improve the efficiency of read operations by reducing the amount of data that query operations need to process and thereby simplifying the work associated with fulfilling queries within MongoDB. The indexes themselves are a special data structure that MongoDB maintains when inserting or modifying documents, and any given index can support and optimize specific queries, sort operations, and allow for more efficient storage utilization. For more information about indexes in MongoDB see: Indexes (page 271) and Indexing Overview (page 273).
You can create indexes using the db.collection.ensureIndex() (page 908) method in the mongo
(page 1002) shell, as in the following prototype operation:
db.collection.ensureIndex( { <field1>: <order>, <field2>: <order>, ... } )
The field specifies the field to index. The field may be a field from a subdocument, using dot notation to specify subdocument fields.
You can create an index on a single field or a compound index (page 275) that includes multiple fields in the index.
The order option specifies either ascending ( 1 ) or descending ( -1 ) index order.
MongoDB can read the index in either direction. In most cases, you only need to specify indexing order
(page 276) to support sort operations in compound queries.
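For example, the following operations create an ascending single-field index and a compound index on the inventory collection used in earlier examples; the field choices are illustrative:

db.inventory.ensureIndex( { type: 1 } )
db.inventory.ensureIndex( { type: 1, item: 1 } )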
Covering a Query
An index covers (page 290) a query, a covered query, when:
all the fields in the query (page 128) are part of that index, and
all the fields returned in the documents that match the query are in the same index.
For these queries, MongoDB does not need to inspect documents outside of the index, which is often more efficient than inspecting entire documents.
Example
Given a collection inventory with the following index on the type and item fields:
{ type: 1, item: 1 }
This index will cover the following query on the type and item fields, which returns only the item field:
db.inventory.find( { type: "food", item:/^c/ },
{ item: 1, _id: 0 } )
However, this index will not cover the following query, which returns the item field and the _id field:
db.inventory.find( { type: "food", item:/^c/ },
{ item: 1 } )
See Create Indexes that Support Covered Queries (page 290) for more information on the behavior and use of covered
queries.
Measuring Index Use
The explain() (page 893) cursor method allows you to inspect the operation of the query system, and is useful for analyzing the efficiency of queries, and for determining how the query uses the index. Call the explain() (page 893) method on a cursor returned by find() (page 910), as in the following example:
db.inventory.find( { type: 'food' } ).explain()
Note: Only use explain() (page 893) to test the query operation, and not the timing of query performance. Because explain() (page 893) attempts multiple query plans, it does not reflect accurate query performance.
If the above operation could not use an index, the output of explain() (page 893) would resemble the following:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 5,
"nscannedObjects" : 4000006,
"nscanned" : 4000006,
"nscannedObjectsAllPlans" : 4000006,
"nscannedAllPlans" : 4000006,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2,
"nChunkSkips" : 0,
"millis" : 1591,
"indexBounds" : { },
"server" : "mongodb0.example.net:27017"
}
The BasicCursor value in the cursor (page 1119) field confirms that this query does not use an index. The explain.nscannedObjects (page 1119) value shows that MongoDB must scan 4,000,006 documents to return only 5 documents. To increase the efficiency of the query, create an index on the type field, as in the following example:
db.inventory.ensureIndex( { type: 1 } )
Run the explain() (page 893) operation, as follows, to test the use of the index:
db.inventory.find( { type: 'food' } ).explain()
Consider the results:
{
"cursor" : "BtreeCursor type_1",
"isMultiKey" : false,
"n" : 5,
"nscannedObjects" : 5,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 5,
"nscannedAllPlans" : 5,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : { "type" : [
[ "food",
"food" ]
] },
"server" : "mongodbo0.example.net:27017" }
The BtreeCursor value of the cursor (page 1119) field indicates that the query used an index. This query:
returned 5 documents, as indicated by the n (page 1119) field;
scanned 5 documents from the index, as indicated by the nscanned (page 1119) field;
then read 5 full documents from the collection, as indicated by the nscannedObjects (page 1119) field.
Although the query uses an index to find the matching documents, if indexOnly (page 1120) is false then an index could not cover (page 132) the query: MongoDB could not both match the query conditions (page 128) and return the results using only this index. See Create Indexes that Support Covered Queries (page 290) for more information.
Query Optimization
The MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs. The query optimizer occasionally reevaluates query plans as the content of the collection changes to ensure optimal query plans.
To create a new query plan, the query optimizer:
1. runs the query against several candidate indexes in parallel.
2. records the matches in a common results buffer or buffers.
If the candidate plans include only ordered query plans, there is a single common results buffer.
If the candidate plans include only unordered query plans, there is a single common results buffer.
If the candidate plans include both ordered query plans and unordered query plans, there are two common
results buffers, one for the ordered plans and the other for the unordered plans.
If an index returns a result already returned by another index, the optimizer skips the duplicate match. In the
case of the two buffers, both buffers are de-duped.
stops the testing of candidate plans and selects an index when one of the following events occurs:
An unordered query plan has returned all the matching results; or
An ordered query plan has returned all the matching results; or
An ordered query plan has returned a threshold number of matching results:
Version 2.0: Threshold is the query batch size. The default batch size is 101.
Version 2.2: Threshold is 101.
The selected index becomes the index specified in the query plan; future iterations of this query or queries with the same query pattern will use this index. Query pattern refers to query select conditions that differ only in the values, as in the following two queries with the same query pattern:
db.inventory.find( { type: 'food' } )
db.inventory.find( { type: 'utensil' } )
To manually compare the performance of a query using more than one index, you can use the hint() (page 894) and
explain() (page 893) methods in conjunction, as in the following prototype:
db.collection.find().hint().explain()
The following operations each run the same query but will reflect the use of the different indexes:
db.inventory.find( { type: 'food' } ).hint( { type: 1 } ).explain()
db.inventory.find( { type: 'food' } ).hint( { type: 1, name: 1 } ).explain()
This returns the statistics regarding the execution of the query. For more information on the output of explain()
(page 893), see the Explain Output (page 1117).
Note: If you run explain() (page 893) without including hint() (page 894), the query optimizer reevaluates
the query and runs against multiple indexes before returning the query statistics.
As collections change over time, the query optimizer deletes the query plan and reevaluates it after any of the following events:
the collection receives 1,000 write operations.
the reIndex (page 870) rebuilds the index.
you add or drop an index.
the mongod (page 989) process restarts.
For more information, see Indexing Strategies (page 289).
Query Operations that Cannot Use Indexes Effectively
Some query operations cannot use indexes effectively or cannot use indexes at all. Consider the following situations:
The inequality operators $nin (page 773) and $ne (page 771) are not very selective, as they often match a
large portion of the index.
As a result, in most cases, a $nin (page 773) or $ne (page 771) query with an index may perform no better
than a $nin (page 773) or $ne (page 771) query that must scan all documents in a collection.
Queries that specify regular expressions, with inline JavaScript regular expressions or $regex (page 781) operator expressions, cannot use an index. However, a regular expression anchored to the beginning of a string can use an index.
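As a sketch, assuming an index exists on a hypothetical item field, the first query below can use that index because the pattern is anchored with ^, while the second cannot use it efficiently:

db.inventory.find( { item: /^card/ } )   // anchored: can use an index on item
db.inventory.find( { item: /card/ } )    // unanchored: cannot use the index efficiently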
16.1.3 Cursors
The find() (page 910) method returns a cursor to the results; however, in the mongo (page 1002) shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents that match the query. (You can use the DBQuery.shellBatchSize variable to change the number of iterations from the default value of 20. See Executing Queries (page 532) for more information.) Consider the following example:
db.inventory.find( { type: 'food' } );
When you assign the result of find() (page 910) to a variable:
you can call the cursor variable in the shell to iterate up to 20 times and print the matching documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
myCursor
you can use the cursor method next() (page 899) to access the documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor.hasNext() ? myCursor.next() : null;
if (myDocument) {
var myItem = myDocument.item;
print(tojson(myItem));
}
As an alternative print operation, consider the printjson() helper method to replace print(tojson()):
if (myDocument) {
var myItem = myDocument.item;
printjson(myItem);
}
you can use the cursor method forEach() (page 894) to iterate the cursor and access the documents, as in the
following example:
var myCursor = db.inventory.find( { type: 'food' } );
myCursor.forEach(printjson);
See JavaScript cursor methods (page 982) and your driver (page 493) documentation for more information on cursor
methods.
Iterator Index
In the mongo (page 1002) shell, you can use the toArray() (page 902) method to iterate the cursor and return the
documents in an array, as in the following:
var myCursor = db.inventory.find( { type: 'food' } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];
The toArray() (page 902) method loads into RAM all documents returned by the cursor; the toArray()
(page 902) method exhausts the cursor.
Additionally, some drivers (page 493) provide access to the documents by using an index on the cursor (i.e. cursor[index]). This is a shortcut for first calling the toArray() (page 902) method and then using an index on the resulting array.
Consider the following example:
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor[3];
The myCursor[3] is equivalent to the following example:
myCursor.toArray()[3];
Cursor Behaviors
Consider the following behaviors related to cursors:
By default, the server will automatically close the cursor after 10 minutes of inactivity or if the client has exhausted the cursor. To override this behavior, you can specify the noTimeout wire protocol flag in your query; however, you should either close the cursor manually or exhaust the cursor. In the mongo (page 1002) shell, you can set the noTimeout flag:
var myCursor = db.inventory.find().addOption(DBQuery.Option.noTimeout);
See your driver (page 493) documentation for information on setting the noTimeout flag. See Cursor Flags (page 138) for a complete list of available cursor flags.
Because the cursor is not isolated during its lifetime, intervening write operations may result in a cursor that returns a single document more than once. (A single document here is identified relative to the value of the _id field; a cursor cannot return the same document more than once if the document has not changed.) To handle this situation, see the information on snapshot mode (page 713).
The MongoDB server returns the query results in batches:
For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batch size is 4 megabytes. To override the default size of the batch, see batchSize() (page 892) and limit() (page 895).
For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort and will return all documents in the first batch.
Batch size will not exceed the maximum BSON document size (page 1133).
As you iterate through the cursor and reach the end of the returned batch, if there are more results,
cursor.next() (page 899) will perform a getmore operation (page 1111) to retrieve the next
batch.
To see how many documents remain in the batch as you iterate the cursor, you can use the
objsLeftInBatch() (page 899) method, as in the following example:
var myCursor = db.inventory.find();
var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
myCursor.objsLeftInBatch();
You can use the command cursorInfo (page 832) to retrieve the following information on cursors:
total number of open cursors
size of the client cursors in current use
number of timed out cursors since the last server restart
Consider the following example:
db.runCommand( { cursorInfo: 1 } )
The result from the command returns a document of the following form:
{ "totalOpen" : <number>, "clientCursors_size" : <number>, "timedOut" : <number>, "ok" : 1 }
Cursor Flags
The mongo (page 1002) shell provides the following cursor flags:
DBQuery.Option.tailable
DBQuery.Option.slaveOk
DBQuery.Option.oplogReplay
DBQuery.Option.noTimeout
DBQuery.Option.awaitData
DBQuery.Option.exhaust
DBQuery.Option.partial
Aggregation
Changed in version 2.2. MongoDB can perform some basic data aggregation operations on results before returning
data to the application. These operations are not queries; they use database commands rather than queries, and they
do not return a cursor. However, they still require MongoDB to read data.
Running aggregation operations on the database side can be more efficient than running them in the application layer and can reduce the amount of data MongoDB needs to send to the application. These aggregation operations include basic grouping, counting, and even processing data using a map reduce framework. Additionally, in 2.2 MongoDB provides a complete aggregation framework for richer aggregation operations.
The aggregation framework provides users with a pipeline-like framework: documents enter from a collection and then pass through a series of steps by a sequence of pipeline operators (page 228) that manipulate and transform the documents until they're output at the end. The aggregation framework is accessible via the aggregate (page 818) command or the db.collection.aggregate() (page 905) helper in the mongo (page 1002) shell.
For more information on the aggregation framework see Aggregation (page 209).
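As a minimal sketch, assuming the inventory collection and field names used in earlier examples, the following pipeline matches food documents and sums their qty values:

db.inventory.aggregate(
    { $match: { type: 'food' } },
    { $group: { _id: '$type', totalQty: { $sum: '$qty' } } }
)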
Additionally, MongoDB provides a number of commands for simpler data aggregation operations (brief sketches follow this list):
count (page 830) (count() (page 892))
distinct (page 833) (db.collection.distinct() (page 906))
group (page 849) (db.collection.group() (page 918))
mapReduce (page 860). (Also consider mapReduce() (page 922) and Map-Reduce (page 243).)
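For example, brief shell sketches of the first two, again using the inventory collection from earlier examples:

db.inventory.count( { type: 'food' } )   // number of matching documents
db.inventory.distinct( 'type' )          // distinct values of the type field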
16.1.4 Architecture
Read Operations from Sharded Clusters
Sharded clusters allow you to partition a data set among a cluster of mongod (page 989) instances in a way that is nearly transparent to the application. See the Sharding (page 419) section of this manual for additional information about these deployments.
For a sharded cluster, you issue all operations to one of the mongos (page 999) instances associated with the cluster. mongos (page 999) instances route operations to the mongod (page 989) instances in the cluster and behave like mongod (page 989) instances to the application. Read operations to a sharded collection in a sharded cluster are largely the same as operations to a replica set or standalone instances. See the section on Read Operations in Sharded Clusters (page 426) for more information.
In sharded deployments, the mongos (page 999) instance routes the queries from the clients to the mongod (page 989) instances that hold the data, using the cluster metadata stored in the config database (page 425).
For sharded collections, if queries do not include the shard key (page 421), the mongos (page 999) must direct the query to all shards in a collection. These scatter-gather queries can be inefficient, particularly on larger clusters, and are unfeasible for routine operations.
For more information on read operations in sharded clusters, consider the following resources:
An Introduction to Shard Keys (page 421)
Shard Key Internals and Operations (page 431)
Querying Sharded Clusters (page 432)
Sharded Cluster Operations and mongos Instances (page 426)
Read Operations from Replica Sets
Replica sets use read preferences to determine where and how to route read operations to members of the replica set. By default, MongoDB always reads data from a replica set's primary. You can modify that behavior by changing the read preference mode (page 360).
You can configure the read preference mode (page 360) on a per-connection or per-operation basis to allow reads from secondaries to:
reduce latency in multi-data-center deployments,
improve read throughput by distributing high read-volumes (relative to write volume),
for backup operations, and/or
to allow reads during failover (page 334) situations.
Read operations from secondary members of replica sets are not guaranteed to reflect the current state of the primary, and the state of secondaries will trail the primary by some amount of time. Often, applications don't rely on this kind of strict consistency, but application developers should always consider the needs of their application before setting read preference.
For more information on read preferences (page 360) or on the read preference modes, see Read Preference (page 360)
and Read Preference Modes (page 360).
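For example, a sketch in the mongo (page 1002) shell that routes reads to secondaries when possible; the secondaryPreferred mode is used here purely as an illustration:

db.getMongo().setReadPref('secondaryPreferred')
db.inventory.find( { type: 'food' } )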
16.2 Write Operations
All operations that create or modify data in the MongoDB instance are write operations. MongoDB represents data as
BSON documents stored in collections. Write operations target one collection and are atomic on the level of a single
document: no single write operation can atomically affect more than one document or more than one collection.
This document introduces the write operators available in MongoDB as well as presents strategies to increase the efficiency of writes in applications.
16.2.1 Write Operators
For information on write operators and how to write data to a MongoDB database, see the following pages:
16.2. Write Operations 139
MongoDB Documentation, Release 2.4.1
Create (page 167)
Update (page 185)
Delete (page 191)
For information on specific methods used to perform write operations in the mongo (page 1002) shell, see the following (brief sketches follow this list):
db.collection.insert() (page 921)
db.collection.update() (page 933)
db.collection.save() (page 931)
db.collection.findAndModify() (page 912)
db.collection.remove() (page 929)
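The following sketches, using a hypothetical inventory collection and illustrative values, show the first, second, and last of these methods:

db.inventory.insert( { item: 'card', qty: 15 } )               // create a document
db.inventory.update( { item: 'card' }, { $set: { qty: 20 } } ) // modify a matching document
db.inventory.remove( { item: 'card' } )                        // delete matching documents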
For information on how to perform write operations from within an application, see the MongoDB Drivers and Client
Libraries (page 493) documentation or the documentation for your client library.
16.2.2 Write Concern
Note: The driver write concern (page 1194) change created a new connection class in all of the MongoDB drivers, called MongoClient with a different default write concern. See the release notes (page 1194) for this change, and the release notes for the driver you're using for more information about your driver's release.
Operational Considerations and Write Concern
Clients issue write operations with some level of write concern, which describes the level of concern or guarantee the
server will provide in its response to a write operation. Consider the following levels of conceptual write concern:
errors ignored: Write operations are not acknowledged by MongoDB, and may not succeed in the case of connection errors that the client is not yet aware of, or if the mongod (page 989) produces an exception (e.g. a duplicate key exception for unique indexes (page 278)). While this operation is efficient because it does not require the database to respond to every write operation, it also incurs a significant risk with regards to the persistence and durability of the data.
Warning: Do not use this option in normal operation.
unacknowledged: MongoDB does not acknowledge the receipt of write operations as with a write concern level of ignore; however, the driver will receive and handle network errors, as possible given system networking configuration.
Before the releases outlined in Default Write Concern Change (page 1194), this was the default write concern.
receipt acknowledged: The mongod (page 989) will confirm the receipt of the write operation, allowing the client to catch network, duplicate key, and other exceptions. After the releases outlined in Default Write Concern Change (page 1194), this is the default write concern. (The default write concern is to call getLastError (page 847) with no arguments. For replica sets, you can define the default write concern settings in getLastErrorDefaults (page 1061); if getLastErrorDefaults (page 1061) does not define a default write concern setting, getLastError (page 847) defaults to basic receipt acknowledgment.)
journaled: The mongod (page 989) will confirm the write operation only after it has written the operation to the journal. This confirms that the write operation can survive a mongod (page 989) shutdown and ensures that the write operation is durable.
While receipt acknowledged without journaled provides the fundamental basis for write concern, there is a window of up to 100 milliseconds between journal commits during which the write operation is not fully durable. Require journaled as part of the write concern to provide this durability guarantee.
Replica sets present an additional layer of consideration for write concern. Basic write concern levels affect the write
operation on only one mongod (page 989) instance. The w argument to getLastError (page 847) provides a
replica acknowledged level of write concern. With replica acknowledged you can guarantee that the write operation
has propagated to the members of a replica set. See the Write Concern for Replica Sets (page 357) document for more
information.
Note: Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the
primary of the set regardless of the level of replica acknowledged write concern.
Internal Operation of Write Concern
To provide write concern, drivers (page 493) issue the getLastError (page 847) command after a write operation and receive a document with information about the last operation. This document's err field contains either:
The definition of a successful write depends on the arguments specified to getLastError (page 847), or in replica sets, the configuration of getLastErrorDefaults (page 1061). When deciding the level of write concern for your application, become familiar with the Operational Considerations and Write Concern (page 140).
The getLastError (page 847) command has the following options to configure write concern requirements:
j or journal option
This option confirms that the mongod (page 989) instance has written the data to the on-disk journal and ensures data is not lost if the mongod (page 989) instance shuts down unexpectedly. Set to true to enable, as shown in the following example:
db.runCommand( { getLastError: 1, j: true } )
If you set journal (page 1049) to true, and the mongod (page 989) does not have journaling enabled, as with nojournal (page 1049), then getLastError (page 847) will provide basic receipt acknowledgment, and will include a jnote field in its return document.
w option
This option provides the ability to disable write concern entirely, as well as to specify the write concern for operations on replica sets. See Operational Considerations and Write Concern (page 140) for an introduction to the fundamental concepts of write concern. By default, the w option is set to 1, which provides basic receipt acknowledgment on a single mongod (page 989) instance or on the primary in a replica set.
The w option takes the following values:
-1:
Disables all acknowledgment of write operations, and suppresses all errors, including network and socket errors.
0:
Disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors to the application.
Note: If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal commit prevails, and the driver will require that the mongod (page 989) acknowledge the write operation.
1:
Provides acknowledgment of write operations on a standalone mongod (page 989) or the primary in a
replica set.
A number greater than 1:
Guarantees that write operations have propagated successfully to the specified number of replica set members including the primary. If you set w to a number that is greater than the number of set members that hold data, MongoDB waits for the non-existent members to become available, which means MongoDB blocks indefinitely.
majority:
Confirms that write operations have propagated to a majority of the configured replica set: a majority of nodes must acknowledge the write operation before it succeeds. This ensures that the write operation will never be subject to a rollback in the course of normal operation, and furthermore allows you to prevent hard coding assumptions about the size of your replica set into your application.
A tag set:
By specifying a tag set (page 1063) you can have fine-grained control over which replica set members must acknowledge a write operation to satisfy the required level of write concern.
getLastError (page 847) also supports a wtimeout setting which allows clients to specify a timeout for the write concern: if you don't specify wtimeout and the mongod (page 989) cannot fulfill the write concern, the getLastError (page 847) will block, potentially forever.
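For example, an illustrative invocation that waits for acknowledgment from a majority of the set but times out after 5000 milliseconds rather than blocking forever; the timeout value is hypothetical:

db.runCommand( { getLastError: 1, w: 'majority', wtimeout: 5000 } )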
For more information on write concern and replica sets, see Write Concern for Replica Sets (page 357).
In sharded clusters, mongos (page 999) instances will pass write concern on to the shard mongod (page 989) instances.
16.2.3 Bulk Inserts
In some situations you may need to insert or ingest a large amount of data into a MongoDB database. These bulk
inserts have some special considerations that are different from other write operations.
The insert() (page 921) method, when passed an array of documents, will perform a bulk insert, and inserts each document atomically. Drivers (page 493) provide their own interface for this kind of operation. New in version 2.2: insert() (page 921) in the mongo (page 1002) shell gained support for bulk inserts in version 2.2. Bulk insert can significantly increase performance by amortizing write concern (page 140) costs. In the drivers, you can configure write concern for batches rather than on a per-document level.
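For example, a bulk insert sketch in the mongo (page 1002) shell; the documents are illustrative:

db.inventory.insert( [
    { item: 'pen',    qty: 25, type: 'stationery' },
    { item: 'eraser', qty: 15, type: 'stationery' },
    { item: 'ruler',  qty: 5,  type: 'stationery' }
] )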
Drivers also have a ContinueOnError option in their insert operation, so that the bulk operation will continue to
insert remaining documents in a batch even if an insert fails.
Note: New in version 2.0: Support for ContinueOnError depends on version 2.0 of the core mongod (page 989)
and mongos (page 999) components.
If the bulk insert process generates more than one error in a batch job, the client will only receive the most recent
error. All bulk operations to a sharded collection run with ContinueOnError, which applications cannot disable.
142 Chapter 16. Read and Write Operations in MongoDB
MongoDB Documentation, Release 2.4.1
See the Strategies for Bulk Inserts in Sharded Clusters (page 454) section for more information on considerations for bulk inserts in sharded clusters.
For more information see your driver documentation (page 493) for details on performing bulk inserts in your appli-
cation. Also consider the following resources: Sharded Clusters (page 145), Strategies for Bulk Inserts in Sharded
Clusters (page 454), and Importing and Exporting MongoDB Data (page 67).
16.2.4 Indexing
After every insert, update, or delete operation, MongoDB must update every index associated with the collection in addition to the data itself. Therefore, every index on a collection adds some amount of overhead for the performance of write operations. (The overhead for inserts and updates to un-indexed fields is less for sparse indexes (page 278) than for non-sparse indexes. Also, for non-sparse indexes, updates that don't change the record size have less indexing overhead.)
In general, the performance gains that indexes provide for read operations are worth the insertion penalty; however,
when optimizing write performance, be careful when creating new indexes and always evaluate the indexes on the
collection and ensure that your queries are actually using these indexes.
For more information on indexes in MongoDB consider Indexes (page 271) and Indexing Strategies (page 289).
16.2.5 Isolation
When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record.
No other operations are atomic; however, you can attempt to isolate a write operation that affects multiple documents
using the isolation operator (page 766).
To isolate a sequence of write operations from other read and write operations, see Perform Two Phase Commits
(page 503).
16.2.6 Updates
Each document in a MongoDB collection has allocated record space which includes the entire document and a small
amount of padding. This padding makes it possible for update operations to increase the size of a document slightly
without causing the document to outgrow the allocated record size.
Documents in MongoDB can grow up to the full maximum BSON document size (page 1133). However, when documents outgrow their allocated record size MongoDB must allocate a new record and move the document to the new record. Update operations that do not cause a document to grow (i.e. in-place updates) are significantly more efficient than those updates that cause document growth. Use data models (page 147) that minimize the need for document growth when possible.
For complete examples of update operations, see Update (page 185).
16.2.7 Padding Factor
If an update operation does not cause the document to increase in size, MongoDB can apply the update in-place. Some
updates change the size of the document, for example using the $push (page 779) operator to append a sub-document
to an array can cause the top level document to grow beyond its allocated space.
When documents grow, MongoDB relocates the document on disk with enough contiguous space to hold the document. These relocations take longer than in-place updates, particularly if the collection has indexes, since MongoDB must update all index entries. If a collection has many indexes, the move will impact write throughput.
To minimize document movements, MongoDB employs padding. MongoDB adaptively learns if documents in a
collection tend to grow, and if they do, adds a paddingFactor (page 1101) so that the documents have room to
grow on subsequent writes. The paddingFactor (page 1101) indicates the padding for new inserts and moves.
New in version 2.2: You can use the collMod (page 823) command with the usePowerOf2Sizes (page 823) flag so that MongoDB allocates document space in sizes that are powers of 2. This helps ensure that MongoDB can efficiently reuse the space freed as a result of deletions or document relocations. As with all padding, using document space allocations with power of 2 sizes minimizes, but does not eliminate, document movements. To check the current paddingFactor (page 1101) on a collection, you can run the db.collection.stats() (page 932) operation in the mongo (page 1002) shell, as in the following example:
db.myCollection.stats()
Since MongoDB writes each document at a different point in time, the padding for each document will not be the same. You can calculate the padding size by subtracting 1 from the paddingFactor (page 1101), for example:

padding size = (paddingFactor - 1) * <document size>
For example, a paddingFactor (page 1101) of 1.0 specifies no padding whereas a paddingFactor of 1.5 specifies a padding size of 0.5 or 50 percent (50%) of the document size.
Because the paddingFactor (page 1101) is relative to the size of each document, you cannot calculate the exact
amount of padding for a collection based on the average document size and padding factor.
If an update operation causes the document to decrease in size, for instance if you perform an $unset (page 791) or
a $pop (page 778) update, the document remains in place and effectively has more padding. If the document remains
this size, the space is not reclaimed until you perform a compact (page 825) or a repairDatabase (page 872)
operation.
Note: The following operations remove padding:
compact (page 825),
repairDatabase (page 872), and
initial replica sync operations.
However, with the compact (page 825) command, you can run the command with a paddingFactor or a
paddingBytes parameter.
Padding is also removed if you use mongoexport (page 1025) from a collection. If you use mongoimport
(page 1022) into a new collection, mongoimport (page 1022) will not add padding. If you use mongoimport
(page 1022) with an existing collection with padding, mongoimport (page 1022) will not affect the existing padding.
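As an illustration of running compact (page 825) with a padding parameter, the following invocation preserves a padding factor of 1.1; the collection name and value are hypothetical:

db.runCommand( { compact: 'inventory', paddingFactor: 1.1 } )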
When a database operation removes padding, subsequent updates that require changes in record sizes will have reduced throughput until the collection's padding factor grows. Padding does not affect in-place updates. After compact (page 825), repairDatabase (page 872), and replica set initial sync, the collection will require less storage.
See Also:
Can I manually pad documents to prevent moves during updates? (page 714)
Fast Updates with MongoDB with in-place Updates (blog post)
144 Chapter 16. Read and Write Operations in MongoDB
MongoDB Documentation, Release 2.4.1
16.2.8 Architecture
Replica Sets
In replica sets, all write operations go to the set's primary, which applies the write operation then records the operations on the primary's operation log or oplog. The oplog is a reproducible sequence of operations to the data set. Secondary members of the set are continuously replicating the oplog and applying the operations to themselves in an asynchronous process.
Large volumes of write operations, particularly bulk operations, may create situations where the secondary members have difficulty applying the replicating operations from the primary at a sufficient rate: this can cause the secondary's state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for normal operation of the replica set, particularly failover (page 352) in the form of rollbacks (page 335) as well as general read consistency (page 335).
To help avoid this issue, you can customize the write concern (page 140) to return confirmation of the write operation to another member of the replica set every 100 or 1,000 operations. (Calling getLastError (page 847) intermittently with a w value of 2 or majority will slow the throughput of write traffic; however, this practice will allow the secondaries to remain current with the state of the primary.) This provides an opportunity for secondaries to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the secondaries can maintain a largely current state with respect to the primary.
For more information on replica sets and write operations, see Write Concern (page 357), Oplog (page 336), Oplog
Internals (page 366), and Changing Oplog Size (page 347).
Sharded Clusters
In a sharded cluster, MongoDB directs a given write operation to a shard and then performs the write on a particular
chunk on that shard. Shards and chunks are range-based. Shard keys affect how MongoDB distributes documents
among shards. Choosing the correct shard key can have a great impact on the performance, capability, and functioning
of your database and cluster.
For more information, see Sharded Cluster Administration (page 424) and Bulk Inserts (page 142).
CHAPTER 17
Document Orientation Concepts
17.1 Data Modeling Considerations for MongoDB Applications
17.1.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. This means that:
documents in the same collection do not need to have the same set of fields or structure, and
common fields in a collection's documents may hold different types of data.
Each document only needs to contain the fields relevant to the entity or object that the document represents. In practice, most documents in a collection share a similar structure. Schema flexibility means that you can model your documents in MongoDB so that they can closely resemble and reflect application-level objects.
As in all data modeling, when developing data models (i.e. schema designs) for MongoDB, you must consider the inherent properties and requirements of the application objects and the relationships between application objects. MongoDB data models must also reflect:
how data will grow and change over time, and
the kinds of queries your application will perform.
These considerations and requirements force developers to make a number of multi-factored decisions when modeling
data, including:
normalization and de-normalization.
These decisions reflect the degree to which the data model should store related pieces of data in a single document. Fully normalized data models describe relationships using references (page 160) between documents, while de-normalized models may store redundant information across related models.
indexing strategy (page 289).
representation of data in arrays in BSON.
Although a number of data models may be functionally equivalent for a given application, different data models may have significant impacts on MongoDB and application performance.
This document provides a high level overview of these data modeling decisions and factors. In addition, consider the
Data Modeling Patterns and Examples (page 151) section which provides more concrete examples of all the discussed
patterns.
147
MongoDB Documentation, Release 2.4.1
17.1.2 Data Modeling Decisions
Data modeling decisions involve determining how to structure the documents to model the data effectively. The
primary decision is whether to embed (page 148) or to use references (page 148).
Embedding
To de-normalize data, store two related pieces of data in a single document.
Operations within a document are less expensive for the server than operations that involve multiple documents.
In general, use embedded data models when:
you have "contains" relationships between entities. See Model Embedded One-to-One Relationships Between Documents (page 195).
you have one-to-many relationships where the "many" objects always appear with or are viewed in the context of their parent documents. See Model Embedded One-to-Many Relationships Between Documents (page 196).
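For example, a sketch of an embedded one-to-many model; the field names and values are hypothetical:

{
   _id: 1,
   name: 'Joe Bookreader',
   addresses: [
      { street: '123 Fake Street', city: 'Faketon' },
      { street: '1 Some Other Street', city: 'Boston' }
   ]
}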
Embedding provides the following benefits:
generally better performance for read operations.
the ability to request and retrieve related data in a single database operation.
Embedding related data in documents can lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation. Furthermore, documents in MongoDB must be smaller than the maximum BSON document size (page 1133). For larger documents, consider using GridFS (page 162).
For examples in accessing embedded documents, see Subdocuments (page 129).
See Also:
dot notation for information on reaching into embedded sub-documents.
Arrays (page 130) for more examples on accessing arrays
Subdocuments (page 129) for more examples on accessing subdocuments
Referencing
To normalize data, store references (page 160) between two documents to indicate a relationship between the data
represented in each document.
In general, use normalized data models:
when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
to represent more complex many-to-many relationships.
to model large hierarchical data sets. See data-modeling-trees.
Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server.
See Model Referenced One-to-Many Relationships Between Documents (page 197) for an example of referencing.
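As a sketch of the referenced alternative, with hypothetical collections and values, the book document stores the publisher's _id rather than embedding the publisher document:

// document in a hypothetical publishers collection
{ _id: 'oreilly', name: "O'Reilly Media", founded: 1980 }

// document in a hypothetical books collection, referencing the publisher
{ _id: 123456789, title: 'MongoDB: The Definitive Guide', publisher_id: 'oreilly' }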
Atomicity
MongoDB only provides atomic operations on the level of a single document. (Document-level atomic operations include all operations within a single MongoDB document record: operations that affect multiple sub-documents within that single record are still atomic.) As a result, needs for atomic operations influence decisions to use embedded or referenced relationships when modeling data for MongoDB.
Embed fields that need to be modified together atomically in the same document. See Model Data for Atomic Operations (page 199) for an example of atomic updates within a single document.
17.1.3 Operational Considerations
In addition to normalization and de-normalization concerns, a number of other operational factors help shape data modeling decisions in MongoDB. These factors include:
data lifecycle management,
number of collections and
indexing requirements,
sharding, and
managing document growth.
These factors have implications for database and application performance as well as future maintenance and development costs.
Data Lifecycle Management
Data modeling decisions should also take data lifecycle management into consideration.
The Time to Live or TTL feature (page 517) of collections expires documents after a period of time. Consider using
the TTL feature if your application requires some data to persist in the database for a limited period of time.
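For example, a TTL sketch that expires documents one hour after the value of a hypothetical createdAt date field; the collection name is illustrative:

db.log_events.ensureIndex( { createdAt: 1 }, { expireAfterSeconds: 3600 } )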
Additionally, if your application only uses recently inserted documents, consider Capped Collections (page 498). Capped collections provide first-in-first-out (FIFO) management of inserted documents and are optimized to support operations that insert and read documents based on insertion order.
Large Number of Collections
In certain situations, you might choose to store information in several collections rather than in a single collection.
Consider a sample collection logs that stores log documents for various environments and applications. The logs collection contains documents of the following form:
{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ...}
If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs.dev and logs.debug. The logs.dev collection would contain only the documents related to the dev environment.
Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.
When using models that have a large number of collections, consider the following behaviors:
Each collection has a certain minimum overhead of a few kilobytes.
Each index, including the index on _id, requires at least 8KB of data space.
A single <database>.ns file stores all meta-data for each database. Each index and collection has its own entry in the namespace file. MongoDB places limits on the size of namespace files (page 1133).
Because of limits on namespaces (page 1133), you may wish to know the current number of namespaces in
order to determine how many additional namespaces the database can support, as in the following example:
db.system.namespaces.count()
The <database>.ns file defaults to 16 MB. To change the size of the <database>.ns file, pass a new size to the --nssize option <new size MB> (page 993) on server start.
The --nssize (page 993) option sets the size for new <database>.ns files. For existing databases, after starting up the server with --nssize (page 993), run the db.repairDatabase() (page 945) command from the mongo (page 1002) shell.
Indexes
Create indexes to support common queries. Generally, indexes and index use in MongoDB correspond to indexes and index use in relational databases: build indexes on fields that appear often in queries and for all operations that return sorted results. MongoDB automatically creates a unique index on the _id field.
As you create indexes, consider the following behaviors of indexes:
Each index requires at least 8KB of data space.
Adding an index has some negative performance impact for write operations. For collections with a high write-to-read ratio, indexes are expensive as each insert must add keys to each index.
Collections with a high proportion of read operations to write operations often benefit from additional indexes. Indexes do not affect un-indexed read operations.
See Indexing Strategies (page 289) for more information on determining indexes. Additionally, the MongoDB database profiler (page 681) may help identify inefficient queries.
Sharding
Sharding allows users to partition a collection within a database to distribute the collection's documents across a number of mongod (page 989) instances or shards.
The shard key determines how MongoDB distributes data among shards in a sharded collection. Selecting the proper shard key (page 421) has significant implications for performance.
See Sharded Cluster Overview (page 421) for more information on sharding and the selection of the shard key
(page 421).
Document Growth
Certain updates to documents can increase the document size, such as pushing elements to an array and adding new fields. If the document size exceeds the allocated space for that document, MongoDB relocates the document on disk. This internal relocation can be both time and resource consuming.
Although MongoDB automatically provides padding to minimize the occurrence of relocations, you may still need to
manually handle document growth. Refer to Pre-Aggregated Reports (page 567) for an example of the Pre-allocation
approach to handle document growth.
17.1.4 Data Modeling Patterns and Examples
The following documents provide overviews of various data modeling patterns and common schema design considerations:
Model Embedded One-to-One Relationships Between Documents (page 195)
Model Embedded One-to-Many Relationships Between Documents (page 196)
Model Referenced One-to-Many Relationships Between Documents (page 197)
Model Data for Atomic Operations (page 199)
Model Tree Structures with Parent References (page 200)
Model Tree Structures with Child References (page 200)
Model Tree Structures with Materialized Paths (page 202)
Model Tree Structures with Nested Sets (page 203)
For more information and examples of real-world data modeling, consider the following external resources:
Schema Design by Example
Walkthrough MongoDB Data Modeling
Document Design for MongoDB
Dynamic Schema Blog Post
MongoDB Data Modeling and Rails
Ruby Example of Materialized Paths
Sean Cribs Blog Post which was the source for much of the data-modeling-trees content.
17.2 BSON Documents
MongoDB is a document-based database system, and as a result, all records, or data, in MongoDB are documents.
Documents are the default representation of most user-accessible data structures in the database. Documents provide structure for data in the following MongoDB contexts:
- the records (page 153) stored in collections
- the query selectors (page 155) that determine which records to select for read, update, and delete operations
- the update actions (page 155) that specify the particular field updates to perform during an update operation
- the specification of indexes (page 156) for collections
- arguments to several MongoDB methods and operators, including:
  - sort order (page 156) for the sort() (page 901) method.
  - index specification (page 156) for the hint() (page 894) method.
- the output of a number of MongoDB commands and operations, including:
  - the output (page 1100) of the collStats (page 825) command, and
  - the output (page 1080) of the serverStatus (page 878) command.
17.2.1 Structure
Document structures in MongoDB are BSON objects with support for the full range of BSON types; however, BSON documents are conceptually similar to JSON objects and have the following structure:
{
field1: value1,
field2: value2,
field3: value3,
...
fieldN: valueN
}
Having support for the full range of BSON types, MongoDB documents may contain field and value pairs where the value can be another document, an array, or an array of documents, as well as the basic types such as Double, String, and Date. See also BSON Type Considerations (page 157).
Consider the following document that contains values of varying types:
var mydoc = {
_id: ObjectId("5099803df3f4948bd2f98391"),
name: { first: "Alan", last: "Turing" },
               birth: new Date('Jun 23, 1912'),
               death: new Date('Jun 07, 1954'),
contribs: [ "Turing machine", "Turing test", "Turingery" ],
views : NumberLong(1250000)
}
The document contains the following fields:
- _id that holds an ObjectId.
- name that holds a subdocument that contains the fields first and last.
- birth and death, which both have Date types.
- contribs that holds an array of strings.
- views that holds a value of NumberLong type.
All field names are strings in BSON documents. Be aware that there are some restrictions on field names (page 1135) for BSON documents: field names cannot contain null characters, dots (.), or dollar signs ($).
Note: BSON documents may have more than one field with the same name; however, most MongoDB Interfaces (page 493) represent MongoDB with a structure (e.g. a hash table) that does not support duplicate field names. If you need to manipulate documents that have more than one field with the same name, see your driver's documentation for more information.
Some documents created by internal MongoDB processes may have duplicate fields, but no MongoDB process will ever add duplicate keys to an existing user document.
Type Operators
To determine the type of fields, the mongo (page 1002) shell provides the following operators:
- instanceof returns a boolean to test if a value has a specific type.
- typeof returns the type of a field.
Example
Consider the following operations using instanceof and typeof:
The following operation tests whether the _id field is of type ObjectId:
mydoc._id instanceof ObjectId
The operation returns true.
The following operation returns the type of the _id field:
typeof mydoc._id
In this case typeof will return the more generic object type rather than the ObjectId type.
Dot Notation
MongoDB uses dot notation to access the elements of an array and to access the fields of a subdocument.
To access an element of an array by its zero-based index position, you concatenate the array name with the dot (.) and the zero-based index position:
<array>.<index>
To access a field of a subdocument with dot notation, you concatenate the subdocument name with the dot (.) and the field name:
<subdocument>.<field>
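For instance, a brief sketch using the mydoc example above (the people collection name is an illustrative assumption):

db.people.find( { "contribs.0": "Turing machine" } )   // match on the first element of the contribs array
db.people.find( { "name.first": "Alan" } )             // match on a field of the name subdocument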
See Also:
- Subdocuments (page 129) for dot notation examples with subdocuments.
- Arrays (page 130) for dot notation examples with arrays.
17.2.2 Document Types in MongoDB
Record Documents
Most documents in MongoDB collections store data from users' applications.
These documents have the following attributes:
- The maximum BSON document size is 16 megabytes.
  The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles (page 1041) and the documentation for your driver (page 493) for more information about GridFS.
- Documents (page 151) have the following restrictions on field names:
  - The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.
  - The field names cannot start with the $ character.
  - The field names cannot contain the . character.
Note: Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod (page 989) will add the _id field and generate the ObjectId.
The following document specifies a record in a collection:
{
  _id: 1,
  name: { first: 'John', last: 'Backus' },
  birth: new Date('Dec 03, 1924'),
  death: new Date('Mar 17, 2007'),
  contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
  awards: [
            { award: 'National Medal of Science',
              year: 1975,
              by: 'National Science Foundation' },
            { award: 'Turing Award',
              year: 1977,
              by: 'ACM' }
          ]
}
The document contains the following fields:
- _id, which must hold a unique value and is immutable.
- name that holds another document. This subdocument contains the fields first and last, which both hold strings.
- birth and death, which both have date types.
- contribs that holds an array of strings.
- awards that holds an array of documents.
Consider the following behavior and constraints of the _id field in MongoDB documents:
- In documents, the _id field is always indexed for regular collections.
- The _id field may contain values of any BSON data type other than an array.
Consider the following options for the value of an _id field:
- Use an ObjectId. See the ObjectId (page 158) documentation.
- Although it is common to assign ObjectId values to _id fields, if your objects have a natural unique identifier, consider using that for the value of _id to save space and to avoid an additional index.
- Generate a sequence number for the documents in your collection in your application and use this value for the _id value. See the Create an Auto-Incrementing Sequence Field (page 512) tutorial for an implementation pattern.
- Generate a UUID in your application code. For more efficient storage of the UUID values in the collection and in the _id index, store the UUID as a value of the BSON BinData type.
  Index keys that are of the BinData type are more efficiently stored in the index if:
  - the binary subtype value is in the range of 0-7 or 128-135, and
  - the length of the byte array is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, or 32.
- Use your driver's BSON UUID facility to generate UUIDs. Be aware that driver implementations may implement UUID serialization and deserialization logic differently, which may not be fully compatible with other drivers. See your driver documentation for information concerning UUID interoperability.
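As a brief sketch in the mongo (page 1002) shell (the collection name and hex string are illustrative placeholders), the shell's UUID() helper wraps a hexadecimal string as a BSON BinData value:

db.mycollection.insert( { _id: UUID("0123456789abcdeffedcba9876543210") } )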
Query Specification Documents
Query documents specify the conditions that determine which records to select for read, update, and delete operations. You can use <field>:<value> expressions to specify the equality condition and query operator (page 974) expressions to specify additional conditions.
When passed as an argument to methods such as the find() (page 910) method, the remove() (page 929) method,
or the update() (page 933) method, the query document selects documents for MongoDB to return, remove, or
update, as in the following:
db.bios.find( { _id: 1 } )
db.bios.remove( { _id: { $gt: 3 } } )
db.bios.update( { _id: 1, name: { first: 'John', last: 'Backus' } },
<update>,
<options> )
See Also:
- Query Document (page 128) and Read (page 175) for more examples on selecting documents for reads.
- Update (page 185) for more examples on selecting documents for updates.
- Delete (page 191) for more examples on selecting documents for deletes.
Update Specification Documents
Update documents specify the data modifications to perform during an update() (page 933) operation to modify existing records in a collection. You can use update operators (page 976) to specify the exact actions to perform on the document fields.
Consider the update document example:
{
  $set: { 'name.middle': 'Warner' },
  $push: { awards: { award: 'IBM Fellow',
                     year: 1963,
                     by: 'IBM' }
         }
}
When passed as an argument to the update() (page 933) method, the update actions document:
- Modifies the field name whose value is another document. Specifically, the $set (page 785) operator updates the middle field in the name subdocument. The document uses dot notation (page 153) to access a field in a subdocument.
- Adds an element to the field awards whose value is an array. Specifically, the $push (page 779) operator adds another document as an element to the field awards.
db.bios.update(
   { _id: 1 },
   {
     $set: { 'name.middle': 'Warner' },
     $push: { awards: {
                        award: 'IBM Fellow',
                        year: 1963,
                        by: 'IBM'
                      }
            }
   }
)
See Also:
- the update operators (page 976) page for the available update operators and syntax.
- update (page 185) for more examples on update documents.
For additional examples of updates that involve array elements, including where the elements are documents, see the $ (page 778) positional operator.
Index Specification Documents
Index specification documents describe the fields to index on during the index creation (page 908). Indexes optimize a number of key read (page 127) and write (page 139) operations. See indexes (page 273) for an overview of indexes.
Index documents contain field and value pairs, in the following form:
{ field: value }
- field is the field in the documents to index.
- value is either 1 for ascending or -1 for descending.
The following document specifies the compound index (page 277) on the _id field and the last field contained in the subdocument name field. The document uses dot notation (page 153) to access a field in a subdocument:
{ _id: 1, 'name.last': 1 }
When passed as an argument to the ensureIndex() (page 908) method, the index document specifies the index to create:
db.bios.ensureIndex( { _id: 1, 'name.last': 1 } )
Sort Order Specification Documents
Sort order documents specify the order of documents that a query (page 910) returns. Pass sort order specification documents as an argument to the sort() (page 901) method. See the sort() (page 901) page for more information on sorting.
The sort order documents contain field and value pairs, in the following form:
{ field: value }
- field is the field by which to sort documents.
- value is either 1 for ascending or -1 for descending.
The following document specifies the sort order using the fields from a subdocument name: first sort by the last field ascending, then by the first field, also ascending:
{ 'name.last': 1, 'name.first': 1 }
When passed as an argument to the sort() (page 901) method, the sort order document sorts the results of the
find() (page 910) method:
db.bios.find().sort( { 'name.last': 1, 'name.first': 1 } )
17.2.3 BSON Type Considerations
The following BSON types require special consideration:
ObjectId
ObjectIds are: small, likely unique, fast to generate, and ordered. These values consist of 12 bytes, where the first 4 bytes are a timestamp that reflects the ObjectId's creation. Refer to the ObjectId (page 158) documentation for more information.
String
BSON strings are UTF-8. In general, drivers for each programming language convert from the language's string format to UTF-8 when serializing and deserializing BSON. This makes it possible to store most international characters in BSON strings with ease. (Given strings using UTF-8 character sets, using sort() (page 901) on strings will be reasonably correct; however, because internally sort() (page 901) uses the C++ strcmp API, the sort order may handle some characters incorrectly.)
In addition, MongoDB $regex (page 781) queries support UTF-8 in the regex string.
Timestamps
BSON has a special timestamp type for internal MongoDB use that is not associated with the regular Date (page 158) type. Timestamp values are a 64-bit value where:
- the first 32 bits are a time_t value (seconds since the Unix epoch), and
- the second 32 bits are an incrementing ordinal for operations within a given second.
Within a single mongod (page 989) instance, timestamp values are always unique.
In replication, the oplog has a ts field. The values in this field reflect the operation time, which uses a BSON timestamp value.
Note: The BSON Timestamp type is for internal MongoDB use. For most cases, in application development, you will want to use the BSON date type. See Date (page 158) for more information.
If you create a BSON Timestamp using the empty constructor (e.g. new Timestamp()), MongoDB will only generate a timestamp if you use the constructor in the first field of the document. (If the first field in the document is _id, then you can generate a timestamp in the second field of a document.) Otherwise, MongoDB will generate an empty timestamp value (i.e. Timestamp(0, 0)).
Changed in version 2.1: the mongo (page 1002) shell displays the Timestamp value with the wrapper:
Timestamp(<time_t>, <ordinal>)
Prior to version 2.1, the mongo (page 1002) shell displayed the Timestamp value as a document:
{ t : <time_t>, i : <ordinal> }
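A brief shell illustration of the empty-constructor behavior described above (the collection name is an illustrative assumption):

db.tstest.insert( { ts: new Timestamp() } )   // ts among the leading fields: mongod generates an actual timestamp
db.tstest.findOne().ts                        // returns a populated value, e.g. Timestamp(<time_t>, <ordinal>)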
Date
BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). The official BSON specification refers to the BSON Date type as the UTC datetime. Changed in version 2.0: the BSON Date type is signed; negative values represent dates before 1970. (Prior to version 2.0, Date values were incorrectly interpreted as unsigned integers, which affected sorts, range queries, and indexes on Date fields. Because indexes are not recreated when upgrading, please re-index if you created an index on Date values with an earlier version and dates before 1970 are relevant to your application.)
Consider the following examples of BSON Date:
Construct a Date using the new Date() constructor in the mongo (page 1002) shell:
var mydate1 = new Date()
Construct a Date using the ISODate() constructor in the mongo (page 1002) shell:
var mydate2 = ISODate()
Return the Date value as a string:
mydate1.toString()
Return the month portion of the Date value; months are zero-indexed, so that January is month 0:
mydate1.getMonth()
17.3 ObjectId
17.3.1 Overview
ObjectId is a 12-byte BSON type, constructed using:
- a 4-byte timestamp,
- a 3-byte machine identifier,
- a 2-byte process id, and
- a 3-byte counter, starting with a random value.
In MongoDB, documents stored in a collection require a unique _id field that acts as a primary key. Because ObjectIds are small, most likely unique, and fast to generate, MongoDB uses ObjectIds as the default value for the _id field if the _id field is not specified; i.e., the mongod (page 989) adds the _id field and generates a unique ObjectId to assign as its value.
Using ObjectIds for the _id field provides the following additional benefits:
- you can access the timestamp of the ObjectId's creation, using the getTimestamp() (page 889) method.
- sorting on an _id field that stores ObjectId values is equivalent to sorting by creation time.
Also consider the BSON Documents (page 151) section for related information on MongoDB's document orientation.
17.3.2 ObjectId()
The mongo (page 1002) shell provides the ObjectId() wrapper class to generate a new ObjectId, and to provide
the following helper attribute and methods:
str
The hexadecimal string value of the ObjectId() object.
getTimestamp() (page 889)
Returns the timestamp portion of the ObjectId() object as a Date.
toString() (page 890)
Returns the string representation of the ObjectId() object. The returned string literal has the format ObjectId(...). Changed in version 2.2: In previous versions ObjectId.toString() (page 890) returned the value of the ObjectId as a hexadecimal string.
valueOf() (page 890)
Returns the value of the ObjectId() object as a hexadecimal string. The returned string is the str attribute. Changed in version 2.2: In previous versions ObjectId.valueOf() (page 890) returned the ObjectId() object.
17.3.3 Examples
Consider the following uses of the ObjectId() class in the mongo (page 1002) shell:
To generate a new ObjectId, use the ObjectId() constructor with no argument:
x = ObjectId()
In this example, the value of x would be:
ObjectId("507f1f77bcf86cd799439011")
To generate a new ObjectId using the ObjectId() constructor with a unique hexadecimal string:
y = ObjectId("507f191e810c19729de860ea")
In this example, the value of y would be:
ObjectId("507f191e810c19729de860ea")
To return the timestamp of an ObjectId() object, use the getTimestamp() (page 889) method as follows:
ObjectId("507f191e810c19729de860ea").getTimestamp()
This operation will return the following Date object:
ISODate("2012-10-17T20:46:22Z")
Access the str attribute of an ObjectId() object, as follows:
ObjectId("507f191e810c19729de860ea").str
This operation will return the following hexadecimal string:
507f191e810c19729de860ea
To return the string representation of an ObjectId() object, use the toString() (page 890) method as
follows:
ObjectId("507f191e810c19729de860ea").toString()
This operation will return the following output:
ObjectId("507f191e810c19729de860ea")
To return the value of an ObjectId() object as a hexadecimal string, use the valueOf() (page 890) method
as follows:
ObjectId("507f191e810c19729de860ea").valueOf()
This operation returns the following output:
507f191e810c19729de860ea
17.4 Database References
MongoDB does not support joins. In MongoDB, some data is denormalized, or stored with related data in documents, to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
MongoDB applications use one of two methods for relating documents:
1. Manual references (page 160), where you save the _id field of one document in another document as a reference. Then your application can run a second query to return the related data. These references are simple and sufficient for most use cases.
2. DBRefs (page 161) are references from one document to another using the value of the first document's _id field, its collection name, and, optionally, its database name. To resolve DBRefs, your application must perform additional queries to return the referenced documents. Many drivers (page 493) have helper methods that form the query for the DBRef automatically, but the drivers do not automatically resolve DBRefs into documents. (Some community-supported drivers may have alternate behavior and may resolve a DBRef into a document automatically.)
   Use a DBRef when you need to embed documents from multiple collections in documents from one collection. DBRefs also provide a common format and type to represent these relationships among documents. The DBRef format provides common semantics for representing links between documents if your database must interact with multiple frameworks and tools.
Unless you have a compelling reason for using a DBRef, use manual references.
17.4.1 Manual References
Background
Manual references refers to the practice of including one document's _id field in another document. The application can then issue a second query to resolve the referenced fields as needed.
Process
Consider the following operation to insert two documents, using the _id field of the first document as a reference in the second document:
original_id = ObjectId()

db.places.insert({
    "_id": original_id,
    "name": "Broadway Center",
    "url": "bc.example.net"
})

db.people.insert({
    "name": "Erin",
    "places_id": original_id,
    "url": "bc.example.net/Erin"
})
Then, when a query returns the document from the people collection, you can, if needed, make a second query for the document referenced by the places_id field in the places collection.
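For example, a minimal sketch of that second query in the mongo (page 1002) shell, using the documents inserted above:

var person = db.people.findOne( { "name": "Erin" } );
var place  = db.places.findOne( { "_id": person.places_id } );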
Use
For nearly every case where you want to store a relationship between two documents, use manual references
(page 160). The references are simple to create and your application can resolve references as needed.
The only limitation of manual linking is that these references do not convey the database and collection name. If you
have documents in a single collection that relate to documents in more than one collection, you may need to consider
using DBRefs (page 161).
17.4.2 DBRefs
Background
DBRefs are a convention for representing a document, rather than a specific reference type. They include the name of the collection, and in some cases the database, in addition to the value from the _id field.
Format
DBRefs have the following fields:
$ref
The $ref field holds the name of the collection where the referenced document resides.
$id
The $id field contains the value of the _id field in the referenced document.
$db
Optional.
Contains the name of the database where the referenced document resides.
Only some drivers support $db references.
Example
A DBRef document would resemble the following:
{ "$ref" : <value>, "$id" : <value>, "$db" : <value> }
Consider a document from a collection that stored a DBRef in a creator field:
{
  "_id" : ObjectId("5126bbf64aed4daf9e2ab771"),
  // .. application fields
  "creator" : {
                 "$ref" : "creators",
                 "$id" : ObjectId("5126bc054aed4daf9e2ab772"),
                 "$db" : "users"
              }
}
The DBRef in this example points to a document in the creators collection of the users database that has ObjectId("5126bc054aed4daf9e2ab772") in its _id field.
Note: The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
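As a rough sketch of manual DBRef resolution in the mongo (page 1002) shell (the mycollection name is an illustrative assumption; the creator field follows the example above):

var doc = db.mycollection.findOne( { "_id": ObjectId("5126bbf64aed4daf9e2ab771") } );
var ref = doc.creator;
// look up the referenced document in the collection (and database) the DBRef names
db.getSiblingDB( ref.$db ).getCollection( ref.$ref ).findOne( { "_id": ref.$id } );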
Support
C++ The C++ driver contains no support for DBRefs. You can traverse references manually.
C# The C# driver provides access to DBRef objects with the MongoDBRef Class and supplies the FetchDBRef Method for accessing these objects.
Java The DBRef class provides support for DBRefs from Java.
JavaScript The mongo (page 1002) shell's JavaScript (page 982) interface provides a DBRef.
Perl The Perl driver contains no support for DBRefs. You can traverse references manually or use the MongoDBx::AutoDeref CPAN module.
PHP The PHP driver supports DBRefs, including the optional $db reference, through The MongoDBRef class.
Python The Python driver provides the DBRef class, and the dereference method for interacting with DBRefs.
Ruby The Ruby driver supports DBRefs using the DBRef class and the dereference method.
Use
In most cases you should use the manual reference (page 160) method for connecting two or more related documents.
However, if you need to reference documents from multiple collections, consider a DBRef.
17.5 GridFS
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit (page 1133) of 16MB.
Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document. (The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.) By default GridFS limits chunk size to 256k. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
When you query a GridFS store for a file, the driver or client will reassemble the chunks as needed. You can perform range queries on files stored through GridFS. You also can access information from arbitrary sections of files, which allows you to skip into the middle of a video or audio file.
GridFS is useful not only for storing files that exceed 16MB but also for storing any files for which you want access without having to load the entire file into memory. For more information on the indications of GridFS, see When should I use GridFS? (page 708).
17.5.1 Implement GridFS
To store and retrieve files using GridFS, use either of the following:
- A MongoDB driver. See the drivers (page 493) documentation for information on using GridFS with your driver.
- The mongofiles (page 1041) command-line tool. See mongofiles (page 1040) for complete documentation.
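For instance, a brief mongofiles (page 1041) command-line sketch (the database and file names are illustrative assumptions):

mongofiles -d records put largefile.mpg     # store a local file in GridFS
mongofiles -d records list                  # list the files stored in the records database
mongofiles -d records get largefile.mpg     # write the stored file back to the local filesystem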
17.5.2 GridFS Collections
GridFS stores files in two collections:
- chunks stores the binary chunks. For details, see The chunks Collection (page 163).
- files stores the files' metadata. For details, see The files Collection (page 163).
GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections with names prefixed by the fs bucket:
- fs.files
- fs.chunks
You can choose a different bucket name than fs, and create multiple buckets in a single database.
The chunks Collection
Each document in the chunks collection represents a distinct chunk of a file as represented in the GridFS store. The following is a prototype document from the chunks collection:
{
"_id" : <string>,
"files_id" : <string>,
"n" : <num>,
"data" : <binary>
}
A document from the chunks collection contains the following fields:
chunks._id
The unique ObjectID of the chunk.
chunks.files_id
The _id of the parent document, as specied in the files collection.
chunks.n
The sequence number of the chunk. GridFS numbers all chunks, starting with 0.
chunks.data
The chunk's payload as a BSON binary type.
The chunks collection uses a compound index on files_id and n, as described in GridFS Index (page 164).
The files Collection
Each document in the files collection represents a file in the GridFS store. Consider the following prototype of a document in the files collection:
{
  "_id" : <ObjectID>,
  "length" : <num>,
  "chunkSize" : <num>,
  "uploadDate" : <timestamp>,
  "md5" : <hash>,
  "filename" : <string>,
  "contentType" : <string>,
  "aliases" : <string array>,
  "metadata" : <dataObject>
}
Documents in the files collection contain some or all of the following fields. Applications may create additional arbitrary fields:
files._id
The unique ID for this document. The _id is of the data type you chose for the original document. The default
type for MongoDB documents is BSON ObjectID.
files.length
The size of the document in bytes.
files.chunkSize
The size of each chunk. GridFS divides the document into chunks of the size specified here. The default size is 256 kilobytes.
files.uploadDate
The date the document was first stored by GridFS. This value has the Date type.
files.md5
An MD5 hash returned from the filemd5 API. This value has the String type.
files.filename
Optional. A human-readable name for the document.
files.contentType
Optional. A valid MIME type for the document.
files.aliases
Optional. An array of alias strings.
files.metadata
Optional. Any additional information you want to store.
17.5.3 GridFS Index
GridFS uses a unique, compound index on the chunks collection for files_id and n. The index allows efficient retrieval of chunks using the files_id and n values, as shown in the following example:
cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
See the relevant driver (page 493) documentation for the specific behavior of your GridFS application. If your driver does not create this index, issue the following operation using the mongo (page 1002) shell:
db.fs.chunks.ensureIndex( { files_id: 1, n: 1 }, { unique: true } );
17.5.4 Example Interface
The following is an example of the GridFS interface in Java. The example is for demonstration purposes only. For API specifics, see the relevant driver (page 493) documentation.
By default, the interface must support the default GridFS bucket, named fs, as in the following:
GridFS myFS = new GridFS(myDatabase); // returns default GridFS bucket (e.g. "fs" collection)
myFS.storeFile(new File("/tmp/largething.mpg")); // saves the file to "fs" GridFS bucket
Optionally, interfaces may support other additional GridFS buckets as in the following example:
GridFS myContracts = new GridFS(myDatabase, "contracts"); // returns GridFS bucket named "contracts"
myContracts.retrieveFile("smithco", new File("/tmp/smithco.pdf")); // retrieve GridFS object "smithco"
CHAPTER 18
CRUD Operations for MongoDB
These documents provide an overview and examples of common database operations, i.e. CRUD, in MongoDB.
18.1 Create
Of the four basic database operations (i.e. CRUD), create operations are those that add new records or documents to a collection in MongoDB. For general information about write operations and the factors that affect their performance, see Write Operations (page 139); for documentation of the other CRUD operations, see the Core MongoDB Operations (CRUD) (page 125) page.
- Overview (page 167)
- insert() (page 168)
  - Insert the First Document in a Collection (page 168)
  - Insert a Document without Specifying an _id Field (page 169)
  - Bulk Insert Multiple Documents (page 171)
  - Insert a Document with save() (page 172)
- update() Operations with the upsert Flag (page 173)
  - Insert a Document that Contains field and value Pairs (page 173)
  - Insert a Document that Contains Update Operator Expressions (page 174)
  - Update Operations with save() (page 174)
18.1.1 Overview
You can create documents in a MongoDB collection using any of the following basic operations:
- insert (page 168)
- upsert (page 173)
All insert operations in MongoDB exhibit the following properties:
- If you attempt to insert a document without the _id field, the client library or the mongod (page 989) instance will add an _id field and populate the field with a unique ObjectId.
- For operations with write concern (page 140), if you specify an _id field, the _id field must be unique within the collection; otherwise the mongod (page 989) will return a duplicate key exception.
- The maximum BSON document size is 16 megabytes.
  The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles (page 1041) and the documentation for your driver (page 493) for more information about GridFS.
- Documents (page 151) have the following restrictions on field names:
  - The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.
  - The field names cannot start with the $ character.
  - The field names cannot contain the . character.
Note: As of these driver versions (page 1194), all write operations will issue a getLastError (page 847) command to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern (page 140) in the Write Operations (page 139) document for more
information.
18.1.2 insert()
The insert() (page 921) method is the primary method to insert a document or documents into a MongoDB collection, and has the following syntax:
db.collection.insert( <document> )
Corresponding Operation in SQL
The insert() (page 921) method is analogous to the INSERT statement.
Insert the First Document in a Collection
If the collection does not exist, then the insert() (page 921) method creates the collection during the first insert. (You can also view a list of the existing collections in the database using the show collections operation in the mongo (page 1002) shell.) Specifically in the example, if the collection bios does not exist, then the insert operation will create this collection:
db.bios.insert(
   {
     _id: 1,
     name: { first: 'John', last: 'Backus' },
     birth: new Date('Dec 03, 1924'),
     death: new Date('Mar 17, 2007'),
     contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
     awards: [
               {
                 award: 'W.W. McDowell Award',
                 year: 1967,
                 by: 'IEEE Computer Society'
               },
               {
                 award: 'National Medal of Science',
                 year: 1975,
                 by: 'National Science Foundation'
               },
               {
                 award: 'Turing Award',
                 year: 1977,
                 by: 'ACM'
               },
               {
                 award: 'Draper Prize',
                 year: 1993,
                 by: 'National Academy of Engineering'
               }
             ]
   }
)
You can confirm the insert by querying (page 175) the bios collection:
db.bios.find()
This operation returns the following document from the bios collection:
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"birth" : ISODate("1924-12-03T05:00:00Z"),
"death" : ISODate("2007-03-17T04:00:00Z"),
"contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
"awards" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
{
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering"
}
]
}
Insert a Document without Specifying an _id Field
If the new document does not contain an _id field, then the insert() (page 921) method adds the _id field to the document and generates a unique ObjectId for the value:
db.bios.insert(
   {
     name: { first: 'John', last: 'McCarthy' },
     birth: new Date('Sep 04, 1927'),
     death: new Date('Dec 24, 2011'),
     contribs: [ 'Lisp', 'Artificial Intelligence', 'ALGOL' ],
     awards: [
               {
                 award: 'Turing Award',
                 year: 1971,
                 by: 'ACM'
               },
               {
                 award: 'Kyoto Prize',
                 year: 1988,
                 by: 'Inamori Foundation'
               },
               {
                 award: 'National Medal of Science',
                 year: 1990,
                 by: 'National Science Foundation'
               }
             ]
   }
)
You can verify the inserted document by querying the bios collection:
db.bios.find( { name: { first: 'John', last: 'McCarthy' } } )
The returned document contains an _id field with the generated ObjectId value:
{
"_id" : ObjectId("50a1880488d113a4ae94a94a"),
"name" : { "first" : "John", "last" : "McCarthy" },
"birth" : ISODate("1927-09-04T04:00:00Z"),
"death" : ISODate("2011-12-24T05:00:00Z"),
"contribs" : [ "Lisp", "Artificial Intelligence", "ALGOL" ],
"awards" : [
{
"award" : "Turing Award",
"year" : 1971,
"by" : "ACM"
},
{
"award" : "Kyoto Prize",
"year" :1988,
"by" : "Inamori Foundation"
},
{
"award" : "National Medal of Science",
"year" : 1990,
"by" : "National Science Foundation"
}
]
}
Bulk Insert Multiple Documents
If you pass an array of documents to the insert() (page 921) method, the insert() (page 921) method performs a bulk insert into a collection.
The following operation inserts three documents into the bios collection. The operation also illustrates the dynamic schema characteristic of MongoDB. Although the document with _id: 3 contains a field title which does not appear in the other documents, MongoDB does not require the other documents to contain this field:
db.bios.insert(
   [
     {
       _id: 3,
       name: { first: 'Grace', last: 'Hopper' },
       title: 'Rear Admiral',
       birth: new Date('Dec 09, 1906'),
       death: new Date('Jan 01, 1992'),
       contribs: [ 'UNIVAC', 'compiler', 'FLOW-MATIC', 'COBOL' ],
       awards: [
                 {
                   award: 'Computer Sciences Man of the Year',
                   year: 1969,
                   by: 'Data Processing Management Association'
                 },
                 {
                   award: 'Distinguished Fellow',
                   year: 1973,
                   by: 'British Computer Society'
                 },
                 {
                   award: 'W. W. McDowell Award',
                   year: 1976,
                   by: 'IEEE Computer Society'
                 },
                 {
                   award: 'National Medal of Technology',
                   year: 1991,
                   by: 'United States'
                 }
               ]
     },
     {
       _id: 4,
       name: { first: 'Kristen', last: 'Nygaard' },
       birth: new Date('Aug 27, 1926'),
       death: new Date('Aug 10, 2002'),
       contribs: [ 'OOP', 'Simula' ],
       awards: [
                 {
                   award: 'Rosing Prize',
                   year: 1999,
                   by: 'Norwegian Data Association'
                 },
                 {
                   award: 'Turing Award',
                   year: 2001,
                   by: 'ACM'
                 },
                 {
                   award: 'IEEE John von Neumann Medal',
                   year: 2001,
                   by: 'IEEE'
                 }
               ]
     },
     {
       _id: 5,
       name: { first: 'Ole-Johan', last: 'Dahl' },
       birth: new Date('Oct 12, 1931'),
       death: new Date('Jun 29, 2002'),
       contribs: [ 'OOP', 'Simula' ],
       awards: [
                 {
                   award: 'Rosing Prize',
                   year: 1999,
                   by: 'Norwegian Data Association'
                 },
                 {
                   award: 'Turing Award',
                   year: 2001,
                   by: 'ACM'
                 },
                 {
                   award: 'IEEE John von Neumann Medal',
                   year: 2001,
                   by: 'IEEE'
                 }
               ]
     }
   ]
)
Insert a Document with save()
The save() (page 931) method performs an insert if the document to save does not contain the _id field.
The following save() (page 931) operation performs an insert into the bios collection since the document does not contain the _id field:
db.bios.save(
   {
     name: { first: 'Guido', last: 'van Rossum' },
     birth: new Date('Jan 31, 1956'),
     contribs: [ 'Python' ],
     awards: [
               {
                 award: 'Award for the Advancement of Free Software',
                 year: 2001,
                 by: 'Free Software Foundation'
               },
               {
                 award: 'NLUUG Award',
                 year: 2003,
                 by: 'NLUUG'
               }
             ]
   }
)
18.1.3 update() Operations with the upsert Flag
The update() (page 933) operation in MongoDB accepts an upsert flag that modifies the behavior of update() (page 933) from updating existing documents (page 185) to inserting data.
These update() (page 933) operations with the upsert flag eliminate the need to perform an additional operation to check for the existence of a record before performing either an update or an insert operation. These update operations use the <query> argument to determine the write operation:
- If the query matches an existing document(s), the operation is an update (page 185).
- If the query matches no document in the collection, the operation is an insert (page 167).
An upsert operation has the following syntax:
db.collection.update( <query>,
<update>,
{ upsert: true } )
Insert a Document that Contains field and value Pairs
If no document matches the <query> argument, the upsert performs an insert. If the <update> argument includes only field and value pairs, the new document contains the fields and values specified in the <update> argument. If the query does not include an _id field, the operation adds the _id field and generates a unique ObjectId for its value.
The following update inserts a new document into the bios collection:
db.bios.update(
   { name: { first: 'Dennis', last: 'Ritchie' } },
   {
     name: { first: 'Dennis', last: 'Ritchie' },
     birth: new Date('Sep 09, 1941'),
     death: new Date('Oct 12, 2011'),
     contribs: [ 'UNIX', 'C' ],
     awards: [
               {
                 award: 'Turing Award',
                 year: 1983,
                 by: 'ACM'
               },
               {
                 award: 'National Medal of Technology',
                 year: 1998,
                 by: 'United States'
               },
               {
                 award: 'Japan Prize',
                 year: 2011,
                 by: 'The Japan Prize Foundation'
               }
             ]
   },
   { upsert: true }
)
Insert a Document that Contains Update Operator Expressions
If no document matches the <query> argument, the update operation inserts a new document. If the <update> argument includes only update operators (page 976), the new document contains the fields and values from the <query> argument with the operations from the <update> argument applied.
The following operation inserts a new document into the bios collection:
db.bios.update(
   {
     _id: 7,
     name: { first: 'Ken', last: 'Thompson' }
   },
   {
     $set: {
              birth: new Date('Feb 04, 1943'),
              contribs: [ 'UNIX', 'C', 'B', 'UTF-8' ],
              awards: [
                        {
                          award: 'Turing Award',
                          year: 1983,
                          by: 'ACM'
                        },
                        {
                          award: 'IEEE Richard W. Hamming Medal',
                          year: 1990,
                          by: 'IEEE'
                        },
                        {
                          award: 'National Medal of Technology',
                          year: 1998,
                          by: 'United States'
                        },
                        {
                          award: 'Tsutomu Kanai Award',
                          year: 1999,
                          by: 'IEEE'
                        },
                        {
                          award: 'Japan Prize',
                          year: 2011,
                          by: 'The Japan Prize Foundation'
                        }
                      ]
           }
   },
   { upsert: true }
)
Update Operations with save()
The save() (page 931) method is identical to an update operation with the upsert flag (page 173): it performs an upsert if the document to save contains the _id field. To determine whether to perform an insert or an update, the save() (page 931) method queries documents on the _id field.
The following operation performs an upsert that inserts a document into the bios collection since no document in the collection contains an _id field with the value 10:
db.bios.save(
   {
     _id: 10,
     name: { first: 'Yukihiro', aka: 'Matz', last: 'Matsumoto' },
     birth: new Date('Apr 14, 1965'),
     contribs: [ 'Ruby' ],
     awards: [
               {
                 award: 'Award for the Advancement of Free Software',
                 year: 2011,
                 by: 'Free Software Foundation'
               }
             ]
   }
)
18.2 Read
Of the four basic database operations (i.e. CRUD), read operations are those that retrieve records or documents from a collection in MongoDB. For general information about read operations and the factors that affect their performance, see Read Operations (page 127); for documentation of the other CRUD operations, see the Core MongoDB Operations (CRUD) (page 125) page.
- Overview (page 176)
- find() (page 176)
  - Return All Documents in a Collection (page 177)
  - Return Documents that Match Query Conditions (page 178)
    - Equality Matches (page 178)
    - Using Operators (page 178)
    - On Arrays (page 178)
      - Query an Element (page 178)
      - Query Multiple Fields on an Array of Documents (page 178)
    - On Subdocuments (page 179)
      - Exact Matches (page 179)
      - Fields of a Subdocument (page 179)
    - Logical Operators (page 179)
      - OR Disjunctions (page 179)
      - AND Conjunctions (page 180)
  - With a Projection (page 180)
    - Specify the Fields to Return (page 180)
    - Explicitly Exclude the _id Field (page 180)
    - Return All but the Excluded Fields (page 181)
    - On Arrays and Subdocuments (page 181)
  - Iterate the Returned Cursor (page 181)
    - With Variable Name (page 182)
    - With next() Method (page 182)
    - With forEach() Method (page 182)
  - Modify the Cursor Behavior (page 182)
    - Order Documents in the Result Set (page 183)
    - Limit the Number of Documents to Return (page 183)
    - Set the Starting Point of the Result Set (page 183)
    - Combine Cursor Methods (page 183)
- findOne() (page 183)
  - With Empty Query Specification (page 184)
  - With a Query Specification (page 184)
  - With a Projection (page 184)
    - Specify the Fields to Return (page 184)
    - Return All but the Excluded Fields (page 184)
  - Access the findOne Result (page 185)
18.2.1 Overview
You can retrieve documents from MongoDB using either of the following methods:
- find (page 176)
- findOne (page 183)
18.2.2 find()
The find() (page 910) method is the primary method to select documents from a collection. The find() (page 910) method returns a cursor that contains a number of documents. Most drivers (page 493) provide application developers with a native iterable interface for handling cursors and accessing documents. The find() (page 910) method has the following syntax:
db.collection.find( <query>, <projection> )
Corresponding Operation in SQL
The find() (page 910) method is analogous to the SELECT statement, while:
- the <query> argument corresponds to the WHERE statement, and
- the <projection> argument corresponds to the list of fields to select from the result set.
The examples refer to a collection named bios that contains documents with the following prototype:
{
"_id" : 1,
"name" : {
"first" : "John",
"last" :"Backus"
},
"birth" : ISODate("1924-12-03T05:00:00Z"),
"death" : ISODate("2007-03-17T04:00:00Z"),
"contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
"awards" : [
{
"award" : "W.W. McDowellAward",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
{
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
{
"award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering"
}
]
}
Note: In the mongo (page 1002) shell, you can format the output by adding .pretty() to the find() (page 910)
method call.
Return All Documents in a Collection
If there is no <query> argument, the find() (page 910) method selects all documents from a collection.
The following operation returns all documents (or more precisely, a cursor to all documents) in the bios collection:
db.bios.find()
Return Documents that Match Query Conditions
If there is a <query> argument, the find() (page 910) method selects all documents from a collection that satisfy the query specification.
Equality Matches
The following operation returns a cursor to documents in the bios collection where the field _id equals 5:
db.bios.find(
{
_id: 5
}
)
Using Operators
The following operation returns a cursor to all documents in the bios collection where the field _id equals 5 or ObjectId("507c35dd8fada716c89d0013"):
db.bios.find(
{
_id: { $in: [ 5, ObjectId("507c35dd8fada716c89d0013") ] }
}
)
On Arrays
Query an Element The following operation returns a cursor to all documents in the bios collection where the array field contribs contains the element 'UNIX':
db.bios.find(
{
contribs: 'UNIX'
}
)
Query Multiple Fields on an Array of Documents The following operation returns a cursor to all documents in the bios collection where the awards array contains a subdocument element with the award field equal to 'Turing Award' and the year field greater than 1980:
db.bios.find(
{
awards: {
$elemMatch: {
award: 'Turing Award',
year: { $gt: 1980 }
}
}
}
)
On Subdocuments
Exact Matches The following operation returns a cursor to all documents in the bios collection where the subdocument name is exactly { first: 'Yukihiro', last: 'Matsumoto' }, including the order:
db.bios.find(
{
name: {
first: 'Yukihiro',
last: 'Matsumoto'
}
}
)
The name field must match the subdocument exactly, including order. For instance, the query would not match documents with name fields that held either of the following values:
{
  first: 'Yukihiro',
  aka: 'Matz',
  last: 'Matsumoto'
}

{
  last: 'Matsumoto',
  first: 'Yukihiro'
}
Fields of a Subdocument The following operation returns a cursor to all documents in the bios collection where the subdocument name contains a field first with the value 'Yukihiro' and a field last with the value 'Matsumoto'; the query uses dot notation to access fields in a subdocument:
db.bios.find(
{
'name.first': 'Yukihiro',
'name.last': 'Matsumoto'
}
)
The query matches the document where the name field contains a subdocument with the field first with the value 'Yukihiro' and a field last with the value 'Matsumoto'. For instance, the query would match documents with name fields that held either of the following values:
{
  first: 'Yukihiro',
  aka: 'Matz',
  last: 'Matsumoto'
}

{
  last: 'Matsumoto',
  first: 'Yukihiro'
}
Logical Operators
OR Disjunctions The following operation returns a cursor to all documents in the bios collection where either the field first in the subdocument name starts with the letter G or where the field birth is less than new Date('01/01/1945'):
db.bios.find(
{ $or: [
{ 'name.first' : /^G/ },
{ birth: { $lt: new Date('01/01/1945') } }
]
}
)
AND Conjunctions The following operation returns a cursor to all documents in the bios collection where the field first in the subdocument name starts with the letter K and the array field contribs contains the element 'UNIX':
db.bios.find(
{
'name.first': /^K/,
contribs: 'UNIX'
}
)
In this query, the parameters (i.e. the selections of both fields) combine using an implicit logical AND for criteria on the different fields contribs and name.first. For multiple AND criteria on the same field, use the $and (page 756) operator.
With a Projection
If there is a <projection> argument, the find() (page 910) method returns only those fields as specified in the <projection> argument to include or exclude:
Note: The _id field is implicitly included in the <projection> argument. In projections that explicitly include fields, _id is the only field that you can explicitly exclude. Otherwise, you cannot mix include field and exclude field specifications.
Specify the Fields to Return
The following operation finds all documents in the bios collection and returns only the name field, the contribs field, and the _id field:
db.bios.find(
{ },
{ name: 1, contribs: 1 }
)
Explicitly Exclude the _id Field
The following operation finds all documents in the bios collection and returns only the name field and the contribs field:
db.bios.find(
{ },
{ name: 1, contribs: 1, _id: 0 }
)
Return All but the Excluded Fields
The following operation finds the documents in the bios collection where the contribs field contains the element 'OOP' and returns all fields except the _id field, the first field in the name subdocument, and the birth field from the matching documents:
db.bios.find(
{ contribs: 'OOP' },
{ _id: 0, 'name.first': 0, birth: 0 }
)
On Arrays and Subdocuments
The following operation finds all documents in the bios collection and returns the last field in the name subdocument and the first two elements in the contribs array:
db.bios.find(
{ },
{
_id: 0,
'name.last': 1,
contribs: { $slice: 2 }
}
)
See Also:
- dot notation for information on reaching into embedded sub-documents.
- Arrays (page 130) for more examples on accessing arrays.
- Subdocuments (page 129) for more examples on accessing subdocuments.
- the $elemMatch query operator for more information on matching array elements.
- the $elemMatch (page 793) projection operator for additional information on restricting array elements to return.
Iterate the Returned Cursor
The find() (page 910) method returns a cursor to the results; however, in the mongo (page 1002) shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents that match the query, as in the following example:
db.bios.find( { _id: 1 } );
You can use DBQuery.shellBatchSize to change the number of iterations from the default value of 20. See Cursor Flags (page 138) and Cursor Behaviors (page 137) for more information.
With Variable Name
When you assign the cursor returned by find() (page 910) to a variable, you can type the name of the cursor variable to iterate up to 20 times and print the matching documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } );
myCursor
With next() Method
You can use the cursor method next() (page 899) to access the documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } );
var myDocument = myCursor.hasNext() ? myCursor.next() : null;
if (myDocument) {
var myName = myDocument.name;
print (tojson(myName));
}
To print, you can also use the printjson() method instead of print(tojson()):
if (myDocument) {
var myName = myDocument.name;
printjson(myName);
}
With forEach() Method
You can use the cursor method forEach() (page 894) to iterate the cursor and access the documents, as in the
following example:
var myCursor = db.bios.find( { _id: 1 } );
myCursor.forEach(printjson);
For more information on cursor handling, see:
- cursor.hasNext() (page 894)
- cursor.next() (page 899)
- cursor.forEach() (page 894)
- cursors (page 135)
- JavaScript cursor methods (page 982)
Modify the Cursor Behavior
In addition to the <query> and the <projection> arguments, the mongo (page 1002) shell and the drivers (page 493) provide several cursor methods that you can call on the cursor returned by the find() (page 910) method to modify its behavior, such as:
Order Documents in the Result Set
The sort() (page 901) method orders the documents in the result set.
The following operation returns all documents (or more precisely, a cursor to all documents) in the bios collection ordered by the name field ascending:
db.bios.find().sort( { name: 1 } )
sort() (page 901) corresponds to the ORDER BY statement in SQL.
Limit the Number of Documents to Return
The limit() (page 895) method limits the number of documents in the result set.
The following operation returns at most 5 documents (or more precisely, a cursor to at most 5 documents) in the bios
collection:
db.bios.find().limit( 5 )
limit() (page 895) corresponds to the LIMIT statement in SQL.
Set the Starting Point of the Result Set
The skip() (page 900) method controls the starting point of the results set.
The following operation returns all documents, skipping the first 5 documents in the bios collection:
db.bios.find().skip( 5 )
Combine Cursor Methods
You can chain these cursor methods, as in the following examples:
db.bios.find().sort( { name: 1 } ).limit( 5 )
db.bios.find().limit( 5 ).sort( { name: 1 } )
Regardless of the order in which you chain limit() (page 895) and sort() (page 901), the request to the server treats the query and the sort() (page 901) modifier as a single object; therefore, the limit() (page 895) operation is always applied after the sort() (page 901), regardless of the specified order of the operations in the chain. See the meta query operators (page 977) for more information.
See the JavaScript cursor methods (page 982) reference and your driver (page 493) documentation for additional references. See Cursors (page 135) for more information regarding cursors.
18.2.3 findOne()
The findOne() (page 915) method selects a single document from a collection and returns that document. findOne() (page 915) does not return a cursor.
The findOne() (page 915) method has the following syntax:
db.collection.findOne( <query>, <projection> )
Except for the return value, the findOne() (page 915) method is quite similar to the find() (page 910) method; in fact, internally, the findOne() (page 915) method is the find() (page 910) method with a limit of 1.
With Empty Query Specication
If there is no <query> argument, the findOne() (page 915) method selects just one document from a collection.
The following operation returns a single document from the bios collection:
db.bios.findOne()
With a Query Specication
If there is a <query> argument, the findOne() (page 915) method selects the first document from a collection that meets the <query> argument.
The following operation returns the first matching document from the bios collection where either the field first in the subdocument name starts with the letter G or where the field birth is less than new Date('01/01/1945'):
db.bios.findOne(
{
$or: [
{ 'name.first' : /^G/ },
{ birth: { $lt: new Date('01/01/1945') } }
]
}
)
With a Projection
You can pass a <projection> argument to findOne() (page 915) to control the fields included in the result set.
Specify the Fields to Return
The following operation finds a document in the bios collection and returns only the name field, the contribs field, and the _id field:
db.bios.findOne(
{ },
{ name: 1, contribs: 1 }
)
Return All but the Excluded Fields
The following operation returns a document in the bios collection where the contribs field contains the element 'OOP' and returns all fields except the _id field, the first field in the name subdocument, and the birth field from the matching documents:
db.bios.findOne(
{ contribs: 'OOP' },
{ _id: 0, 'name.first': 0, birth: 0 }
)
Access the findOne Result
Although similar to the find() (page 910) method, because the findOne() (page 915) method returns a document rather than a cursor, you cannot apply cursor methods such as limit() (page 895), sort() (page 901), and skip() (page 900) to the result of the findOne() (page 915) method. However, you can access the document directly, as in the example:
var myDocument = db.bios.findOne();
if (myDocument) {
var myName = myDocument.name;
print (tojson(myName));
}
18.3 Update
Of the four basic database operations (i.e. CRUD), update operations are those that modify existing records or documents in a MongoDB collection. For general information about write operations and the factors that affect their performance, see Write Operations (page 139); for documentation of the other CRUD operations, see the Core MongoDB Operations (CRUD) (page 125) page.
- Overview (page 185)
- Update (page 186)
  - Modify with Update Operators (page 186)
    - Update a Field in a Document (page 186)
    - Add a New Field to a Document (page 187)
    - Remove a Field from a Document (page 187)
    - Update Arrays (page 187)
      - Update an Element by Specifying Its Position (page 187)
      - Update an Element without Specifying Its Position (page 187)
      - Update a Document Element without Specifying Its Position (page 188)
      - Add an Element to an Array (page 188)
    - Update Multiple Documents (page 188)
  - Replace Existing Document with New Document (page 188)
  - update() Operations with the upsert Flag (page 189)
- Save (page 190)
  - Save Performs an Update (page 190)
- Update Operators (page 190)
  - Fields (page 190)
  - Array (page 190)
  - Bitwise (page 191)
  - Isolation (page 191)
18.3.1 Overview
An update operation modifies an existing document or documents in a collection. MongoDB provides the following
methods to perform update operations:
update (page 186)
save (page 190)
Note: Consider the following behaviors of MongoDB's update operations.
When performing update operations that increase the document size beyond the allocated space for that document,
the update operation relocates the document on disk and may reorder the document fields depending on
the type of update.
As of these driver versions (page 1194), all write operations will issue a getLastError (page 847) command
to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern (page 140) in the Write Operations (page 139) document for more
information.
18.3.2 Update
The update() (page 933) method is the primary method used to modify documents in a MongoDB collection. By
default, the update() (page 933) method updates a single document, but by using the multi option, update()
(page 933) can update all documents that match the query criteria in the collection. The update() (page 933) method
can either replace the existing document with the new document or update specic elds in the existing document.
The update() (page 933) method has the following syntax:
db.collection.update( <query>, <update>, <options> )
Corresponding operation in SQL
The update() (page 933) method corresponds to the UPDATE operation in SQL, and:
- the <query> argument corresponds to the WHERE clause, and
- the <update> argument corresponds to the SET ... clause.
The default behavior of the update() (page 933) method updates a single document and would correspond to
the SQL UPDATE statement with LIMIT 1. With the multi option, the update() (page 933) method would
correspond to the SQL UPDATE statement without the LIMIT clause.
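As an illustration of this correspondence, the SQL statement UPDATE bios SET turing = true WHERE _id = 1 (against a hypothetical bios table and column) would be expressed in the mongo shell roughly as:
db.bios.update(
   { _id: 1 },                    // WHERE _id = 1
   { $set: { turing: true } }     // SET turing = true; one document by default
)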
Modify with Update Operators
If the <update> argument contains only update operator (page 190) expressions such as the $set (page 785)
operator expression, the update() (page 933) method updates the corresponding fields in the document. To update
fields in subdocuments, MongoDB uses dot notation.
Update a Field in a Document
Use $set (page 785) to update the value of a field.
The following operation queries the bios collection for the first document that has an _id field equal to 1 and sets
the value of the field middle, in the subdocument name, to 'Warner':
db.bios.update(
{ _id: 1 },
{
$set: { 'name.middle': 'Warner' }
}
)
Add a New Field to a Document
If the <update> argument contains fields not currently in the document, the update() (page 933) method adds the
new fields to the document.
The following operation queries the bios collection for the first document that has an _id field equal to 3 and adds
to that document a new mbranch field and a new aka field in the subdocument name:
db.bios.update(
{ _id: 3 },
{ $set: {
mbranch: 'Navy',
'name.aka': 'Amazing Grace'
}
}
)
Remove a Field from a Document
If the <update> argument contains the $unset (page 791) operator, the update() (page 933) method removes the
field from the document.
The following operation queries the bios collection for the first document that has an _id field equal to 3 and
removes the birth field from the document:
db.bios.update(
{ _id: 3 },
{ $unset: { birth: 1 } }
)
Update Arrays
Update an Element by Specifying Its Position If the update operation requires an update of an element in an array
field, the update() (page 933) method can perform the update using the position of the element and dot notation.
Arrays in MongoDB are zero-based.
The following operation queries the bios collection for the first document with the _id field equal to 1 and updates the
second element in the contribs array:
db.bios.update(
{ _id: 1 },
{ $set: { 'contribs.1': 'ALGOL 58' } }
)
Update an Element without Specifying Its Position The update() (page 933) method can perform the update
using the $ (page 778) positional operator if the position is not known. The array field must appear in the query
argument in order to determine which array element to update.
The following operation queries the bios collection for the first document where the _id field equals 3 and the
contribs array contains an element equal to 'compiler'. If found, the update() (page 933) method updates the
first matching element in the array to 'A compiler':
db.bios.update(
{ _id: 3, contribs: 'compiler' },
{ $set: { 'contribs.$': 'A compiler' } }
)
Update a Document Element without Specifying Its Position The update() (page 933) method can perform
the update of an array that contains subdocuments by using the positional operator (i.e. $ (page 778)) and dot
notation.
The following operation queries the bios collection for the first document where the _id field equals 6 and the
awards array contains a subdocument element with the by field equal to 'ACM'. If found, the update() (page 933)
method updates the by field in the first matching subdocument:
db.bios.update(
{ _id: 6, 'awards.by': 'ACM' },
{ $set: { 'awards.$.by': 'Association for Computing Machinery' } }
)
Add an Element to an Array The following operation queries the bios collection for the first document that has
an _id field equal to 1 and adds a new element to the awards field:
db.bios.update(
{ _id: 1 },
{
$push: { awards: { award: 'IBM Fellow', year: 1963, by: 'IBM' } }
}
)
Update Multiple Documents
If the <options> argument contains the multi option set to true or 1, the update() (page 933) method updates
all documents that match the query.
The following operation queries the bios collection for all documents where the awards field contains a subdocument
element with the award field equal to 'Turing' and sets the turing field to true in the matching documents:
db.bios.update(
{ 'awards.award': 'Turing' },
{ $set: { turing: true } },
{ multi: true }
)
Replace Existing Document with New Document
If the <update> argument contains only field and value pairs, the update() (page 933) method replaces the
existing document with the document in the <update> argument, except for the _id field.
The following operation queries the bios collection for the first document that has a name field equal to { first:
'John', last: 'McCarthy' } and replaces all but the _id field in the document with the fields in the
<update> argument:
db.bios.update(
{ name: { first: 'John', last: 'McCarthy' } },
{ name: { first: 'Ken', last: 'Iverson' },
  born: new Date('Dec 17, 1941'),
  died: new Date('Oct 19, 2004'),
  contribs: [ 'APL', 'J' ],
  awards: [
    { award: 'Turing Award',
      year: 1979,
      by: 'ACM' },
    { award: 'Harry H. Goode Memorial Award',
      year: 1975,
      by: 'IEEE Computer Society' },
    { award: 'IBM Fellow',
      year: 1970,
      by: 'IBM' }
]
}
)
18.3.3 update() Operations with the upsert Flag
If you set the upsert option in the <options> argument to true or 1 and no existing document matches the
<query> argument, the update() (page 933) method can insert a new document into the collection.
The following operation queries the bios collection for a document with the _id field equal to 11 and the name
field equal to { first: 'James', last: 'Gosling' }. If the query selects a document, the operation
performs an update operation. If a document is not found, update() (page 933) inserts a new document containing
the fields and values from the <query> argument with the operations from the <update> argument applied. 4
db.bios.update(
{ _id: 11, name: { first: 'James', last: 'Gosling' } },
{
   $set: {
      born: new Date('May 19, 1955'),
      contribs: [ 'Java' ],
      awards: [
         {
            award: 'The Economist Innovation Award',
            year: 2002,
            by: 'The Economist'
         },
         {
            award: 'Officer of the Order of Canada',
            year: 2007,
            by: 'Canada'
}
]
}
},
{ upsert: true }
)
See also Update Operations with the Upsert Flag (page 173) in the Create (page 167) document.
4
If the <update> argument includes only field and value pairs, the new document contains the fields and values specified in the <update>
argument. If the <update> argument includes only update operators (page 190), the new document contains the fields and values from the
<query> argument with the operations from the <update> argument applied.
18.3.4 Save
The save() (page 931) method updates an existing document or inserts a document depending on the _id field of
the document. The save() (page 931) method is equivalent to the update() (page 933) method with the upsert
option and a <query> argument on the _id field.
db.collection.save( <document> )
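For example, a save() call whose document carries an _id field behaves like the following update() operation; this is a sketch of the equivalence described above, using a hypothetical document:
db.bios.save( { _id: 1, name: { first: 'Guido' } } )

// behaves like:
db.bios.update(
   { _id: 1 },                            // <query> on the _id field
   { _id: 1, name: { first: 'Guido' } },  // the replacement document
   { upsert: true }
)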
Save Performs an Update
If the <document> argument contains an _id field that exists in the collection, the save() (page 931) method
performs an update that replaces the existing document with the <document> argument.
The following operation queries the bios collection for a document where the _id equals
ObjectId("507c4e138fada716c89d0014") and replaces the document with the <document> argument:
db.bios.save(
{
_id: ObjectId("507c4e138fada716c89d0014"),
name: { first: 'Martin', last: 'Odersky' },
contribs: [ 'Scala' ]
}
)
See Also:
Insert a Document with save() (page 172) and Update operations with save() (page 174) in the Create (page 167)
section.
18.3.5 Update Operators
Fields
$inc (page 766)
$rename (page 782)
$set (page 785)
$unset (page 791)
Array
$ (page 778)
$addToSet (page 755)
$pop (page 778)
$pullAll (page 779)
$pull (page 779)
$pushAll (page 781)
$push (page 779)
$each (page 759) modifier
$slice (page 787) modifier
$sort (page 788) modifier
Bitwise
$bit (page 757)
Isolation
$isolated (page 766)
18.4 Delete
Of the four basic database operations (i.e. CRUD), delete operations are those that remove documents from a collection
in MongoDB.
For general information about write operations and the factors that affect their performance, see Write Operations
(page 139); for documentation of other CRUD operations, see the Core MongoDB Operations (CRUD) (page 125)
page.
- Overview (page 191)
- Remove All Documents that Match a Condition (page 192)
- Remove a Single Document that Matches a Condition (page 192)
- Remove All Documents from a Collection (page 192)
- Capped Collection (page 192)
- Isolation (page 193)
18.4.1 Overview
The remove() (page 929) method in the mongo (page 1002) shell provides this operation, as do corresponding methods
in the drivers (page 493).
Note: As of these driver versions (page 1194), all write operations will issue a getLastError (page 847) command
to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern (page 140) in the Write Operations (page 139) document for more
information.
Use the remove() (page 929) method to delete documents from a collection. The remove() (page 929) method
has the following syntax:
db.collection.remove( <query>, <justOne> )
Corresponding operation in SQL
The remove() (page 929) method is analogous to the SQL DELETE statement, and:
- the <query> argument corresponds to the WHERE clause, and
- the <justOne> argument takes a Boolean and has the same effect as LIMIT 1.
remove() (page 929) deletes documents from the collection. If you do not specify a query, remove() (page 929)
removes all documents from a collection, but does not remove the indexes. 5
Note: For large deletion operations, it may be more efficient to copy the documents that you want to keep to a new
collection and then use drop() (page 907) on the original collection, as in the sketch below.
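The following sketch outlines that approach; the keep field is a hypothetical predicate that selects the documents to retain, and any indexes the application needs must be recreated afterward:
// copy the documents to keep into a temporary collection
db.bios.find( { keep: true } ).forEach( function(doc) {
   db.bios_tmp.insert(doc);
} );

db.bios.drop();                            // removes the documents and the indexes
db.bios_tmp.renameCollection('bios');      // move the kept documents into place
db.bios.ensureIndex( { 'name.last': 1 } )  // recreate any needed indexes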
18.4.2 Remove All Documents that Match a Condition
If there is a <query> argument, the remove() (page 929) method deletes from the collection all documents that
match the argument.
The following operation deletes all documents from the bios collection where the subdocument name contains a
field first whose value starts with G:
db.bios.remove( { 'name.first' : /^G/ } )
18.4.3 Remove a Single Document that Matches a Condition
If there is a <query> argument and you specify the <justOne> argument as true or 1, remove() (page 929)
deletes only a single document from the collection that matches the query.
The following operation deletes a single document from the bios collection where the turing field equals true:
db.bios.remove( { turing: true }, 1 )
18.4.4 Remove All Documents from a Collection
If there is no <query> argument, the remove() (page 929) method deletes all documents from a collection. The
following operation deletes all documents from the bios collection:
db.bios.remove()
Note: This operation is not equivalent to the drop() (page 907) method.
18.4.5 Capped Collection
You cannot use the remove() (page 929) method with a capped collection.
5
To remove all documents from a collection, it may be more efficient to use the drop() (page 907) method to drop the entire collection,
including the indexes, and then recreate the collection and rebuild the indexes.
18.4.6 Isolation
If the <query> argument to the remove() (page 929) method matches multiple documents in the collection, the
delete operation may interleave with other write operations to that collection. For an unsharded collection, you have
the option to override this behavior with the $isolated (page 766) isolation operator, effectively isolating the delete
operation from other write operations. To isolate the operation, include $isolated: 1 in the <query> parameter
as in the following example:
db.bios.remove( { turing: true, $isolated: 1 } )
CHAPTER 19
Data Modeling Patterns
19.1 Model Embedded One-to-One Relationships Between Documents
19.1.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that uses embedded (page 148) documents to describe relationships between
connected data.
19.1.2 Pattern
Consider the following example that maps patron and address relationships. The example illustrates the advantage of
embedding over referencing if you need to view one data entity in context of the other. In this one-to-one relationship
between patron and address data, the address belongs to the patron.
In the normalized data model, the address contains a reference to the parent.
{
_id: "joe",
name: "Joe Bookreader"
}
{
patron_id: "joe",
street: "123 Fake Street",
city: "Faketon",
state: "MA"
zip: 12345
}
If the address data is frequently retrieved with the name information, then with referencing, your application needs
to issue multiple queries to resolve the reference. The better data model would be to embed the address data in the
patron data, as in the following document:
{
_id: "joe",
name: "Joe Bookreader",
address: {
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: 12345
}
}
With the embedded data model, your application can retrieve the complete patron information with one query.
19.2 Model Embedded One-to-Many Relationships Between Documents
19.2.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that uses embedded (page 148) documents to describe relationships between
connected data.
19.2.2 Pattern
Consider the following example that maps patron and multiple address relationships. The example illustrates the
advantage of embedding over referencing if you need to view many data entities in context of another. In this one-to-
many relationship between patron and address data, the patron has multiple address entities.
In the normalized data model, the address contains a reference to the parent.
{
_id: "joe",
name: "Joe Bookreader"
}
{
patron_id: "joe",
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: 12345
}
{
patron_id: "joe",
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: 12345
}
If your application frequently retrieves the address data with the name information, then your application needs
to issue multiple queries to resolve the references. A better schema would be to embed the address data
entities in the patron data, as in the following document:
{
_id: "joe",
name: "Joe Bookreader",
addresses: [
{
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: 12345
},
{
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: 12345
}
]
}
With the embedded data model, your application can retrieve the complete patron information with one query.
19.3 Model Referenced One-to-Many Relationships Between Documents
19.3.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that uses references (page 148) between documents to describe relationships
between connected data.
19.3.2 Pattern
Consider the following example that maps publisher and book relationships. The example illustrates the advantage of
referencing over embedding to avoid repetition of the publisher information.
Embedding the publisher document inside the book document would lead to repetition of the publisher data, as the
following documents show:
{
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "OReilly Media",
founded: 1980,
location: "CA"
}
}
{
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher: {
name: "OReilly Media",
founded: 1980,
location: "CA"
}
}
To avoid repetition of the publisher data, use references and keep the publisher information in a separate collection
from the book collection.
When using references, the growth of the relationships determines where to store the reference. If the number of books
per publisher is small with limited growth, storing the book references inside the publisher document may sometimes
be useful. Otherwise, if the number of books per publisher is unbounded, this data model would lead to mutable,
growing arrays, as in the following example:
{
name: "OReilly Media",
founded: 1980,
location: "CA",
books: [123456789, 234567890, ...]
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English"
}
To avoid mutable, growing arrays, store the publisher reference inside the book document:
{
_id: "oreilly",
name: "OReilly Media",
founded: 1980,
location: "CA"
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher_id: "oreilly"
}
19.4 Model Data for Atomic Operations
19.4.1 Pattern
Consider the following example that keeps a library book and its checkout information. The example illustrates how
embedding fields related to an atomic update within the same document ensures that the fields are in sync.
Consider the following book document that stores the number of available copies for checkout and the current
checkout information:
book = {
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly",
available: 3,
checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
}
You can use the db.collection.findAndModify() (page 912) method to atomically determine if a book is
available for checkout and update it with the new checkout information. Embedding the available field and the
checkout field within the same document ensures that the updates to these fields are in sync:
db.books.findAndModify( {
query: {
_id: 123456789,
available: { $gt: 0 }
},
update: {
$inc: { available: -1 },
$push: { checkout: { by: "abc", date: new Date() } }
}
} )
19.5 Model Tree Structures with Parent References
19.5.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing references
(page 148) to parent nodes in child nodes.
19.5.2 Pattern
The Parent References pattern stores each tree node in a document; in addition to the tree node, the document stores
the id of the node's parent.
Consider the following example that models a tree of categories using Parent References:
db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "Postgres", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
The query to retrieve the parent of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).parent
You can create an index on the field parent to enable fast search by the parent node:
db.categories.ensureIndex( { parent: 1 } )
You can query by the parent field to find its immediate children nodes:
db.categories.find( { parent: "Databases" } )
The Parent References pattern provides a simple solution to tree storage, but requires multiple queries to retrieve subtrees.
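For example, the following sketch retrieves all descendants of a node by recursively querying for the children of each node it finds; the result order may vary:
function getDescendants(id) {
   var descendants = [];
   db.categories.find( { parent: id } ).forEach( function(child) {
      descendants.push( child._id );
      descendants = descendants.concat( getDescendants( child._id ) );
   } );
   return descendants;
}

getDescendants("Programming")
// [ "Databases", "MongoDB", "Postgres", "Languages" ]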
19.6 Model Tree Structures with Child References
19.6.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing references
(page 148) in parent nodes to child nodes.
19.6.2 Pattern
The Child References pattern stores each tree node in a document; in addition to the tree node, the document stores in
an array the id(s) of the node's children.
Consider the following example that models a tree of categories using Child References:
db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "Postgres", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "Postgres" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
The query to retrieve the immediate children of a node is fast and straightforward:
db.categories.findOne( { _id: "Databases" } ).children
You can create an index on the field children to enable fast search by the child nodes:
db.categories.ensureIndex( { children: 1 } )
You can query for a node in the children field to find its parent node as well as its siblings:
db.categories.find( { children: "MongoDB" } )
The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are
necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple
parents.
19.7 Model Tree Structures with an Array of Ancestors
19.7.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents using references
(page 148) to parent nodes and an array that stores all ancestors.
19.7.2 Pattern
The Array of Ancestors pattern stores each tree node in a document; in addition to the tree node, the document stores in
an array the id(s) of the node's ancestors or path.
Consider the following example that models a tree of categories using Array of Ancestors:
db.categories.insert( { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Postgres", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Programming", ancestors: [ "Books" ], parent: "Books" } )
db.categories.insert( { _id: "Books", ancestors: [ ], parent: null } )
The query to retrieve the ancestors or path of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).ancestors
You can create an index on the field ancestors to enable fast search by the ancestor nodes:
db.categories.ensureIndex( { ancestors: 1 } )
You can query by the ancestors field to find all descendants of a node:
db.categories.find( { ancestors: "Programming" } )
The Array of Ancestors pattern provides a fast and efficient solution to find the descendants and the ancestors of a node
by creating an index on the elements of the ancestors field. This makes Array of Ancestors a good choice for working
with subtrees.
The Array of Ancestors pattern is slightly slower than the Materialized Paths pattern but is more straightforward to
use.
19.8 Model Tree Structures with Materialized Paths
19.8.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing full
relationship paths between documents.
19.8.2 Pattern
The Materialized Paths pattern stores each tree node in a document; in addition to the tree node, the document stores as
a string the id(s) of the node's ancestors or path. Although the Materialized Paths pattern requires additional steps of
working with strings and regular expressions, the pattern also provides more flexibility in working with the path, such
as finding nodes by partial paths.
Consider the following example that models a tree of categories using Materialized Paths; the path string uses the
comma (,) as a delimiter:
db.categories.insert( { _id: "Books", path: null } )
db.categories.insert( { _id: "Programming", path: ",Books," } )
db.categories.insert( { _id: "Databases", path: ",Books,Programming," } )
db.categories.insert( { _id: "Languages", path: ",Books,Programming," } )
db.categories.insert( { _id: "MongoDB", path: ",Books,Programming,Databases," } )
db.categories.insert( { _id: "Postgres", path: ",Books,Programming,Databases," } )
You can query to retrieve the whole tree, sorting by the path:
db.categories.find().sort( { path: 1 } )
You can use regular expressions on the path field to find the descendants of Programming:
db.categories.find( { path: /,Programming,/ } )
You can also retrieve the descendants of Books where Books is at the topmost level of the hierarchy:
db.categories.find( { path: /^,Books,/ } )
To create an index on the field path, use the following invocation:
db.categories.ensureIndex( { path: 1 } )
This index may improve performance, depending on the query:
- For queries of the Books sub-tree (e.g. /^,Books,/), an index on the path field improves the query
  performance significantly.
- For queries of the Programming sub-tree (e.g. /,Programming,/), or similar queries of sub-trees
  where the node might be in the middle of the indexed string, the query must inspect the entire index.
  For these queries an index may provide some performance improvement if the index is significantly smaller
  than the entire collection.
19.9 Model Tree Structures with Nested Sets
19.9.1 Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Considerations for
MongoDB Applications (page 147) for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure that optimizes discovering subtrees at the
expense of tree mutability.
19.9.2 Pattern
The Nested Sets pattern identifies each node in the tree as stops in a round-trip traversal of the tree. The application
visits each node in the tree twice: first during the initial trip, and second during the return trip. The Nested Sets pattern
stores each tree node in a document; in addition to the tree node, the document stores the id of the node's parent, the node's
initial stop in the left field, and its return stop in the right field.
Consider the following example that models a tree of categories using Nested Sets:
db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "Postgres", parent: "Databases", left: 8, right: 9 } )
You can query to retrieve the descendants of a node:
var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } );
The Nested Sets pattern provides a fast and efficient solution for finding subtrees but is inefficient for modifying the
tree structure. As such, this pattern is best for static trees that do not change.
19.10 Model Data to Support Keyword Search
Note: Keyword search is not the same as text search or full text search, and does not provide stemming or other
text-processing features. See the Limitations of Keyword Indexes (page 204) section for more information.
In 2.4, MongoDB provides a text search feature. See Text Search (page 323) for more information.
If your application needs to perform queries on the content of a field that holds text, you can perform exact matches on
the text or use $regex (page 781) to use regular expression pattern matches. However, for many operations on text,
these methods do not satisfy application requirements.
This pattern describes one method for supporting application keyword search in MongoDB: store keywords in an
array in the same document as the text field. Combined with a multi-key index (page 277), this pattern can support
an application's keyword search operations.
19.10.1 Pattern
To add structures to your document to support keyword-based queries, create an array field in your documents and add
the keywords as strings in the array. You can then create a multi-key index (page 277) on the array and create queries
that select values from the array.
Example
Suppose you have a collection of library volumes that you want to make searchable by topics. For each volume, you
add the array topics, and you add as many keywords as needed for a given volume.
For the Moby-Dick volume you might have the following document:
{ title : "Moby-Dick" ,
author : "Herman Melville" ,
published : 1851 ,
ISBN : 0451526996 ,
topics : [ "whaling" , "allegory" , "revenge" , "American" ,
"novel" , "nautical" , "voyage" , "Cape Cod" ]
}
You then create a multi-key index on the topics array:
db.volumes.ensureIndex( { topics: 1 } )
The multi-key index creates separate index entries for each keyword in the topics array. For example, the index
contains one entry for "whaling" and another for "allegory".
You then query based on the keywords. For example:
db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
Note: An array with a large number of elements, such as one with several hundred or several thousand keywords, will
incur greater indexing costs on insertion.
19.10.2 Limitations of Keyword Indexes
MongoDB can support keyword searches using specific data models and multi-key indexes (page 277); however, these
keyword indexes are not sufficient or comparable to full-text products in the following respects:
- Stemming. Keyword queries in MongoDB cannot parse keywords for root or related words.
- Synonyms. Keyword-based search features must provide support for synonym or related queries in the application
  layer.
- Ranking. The keyword lookups described in this document do not provide a way to weight results.
- Asynchronous Indexing. MongoDB builds indexes synchronously, which means that the indexes used for keyword
  indexes are always current and can operate in real-time. However, asynchronous bulk indexes may be
  more efficient for some kinds of content and workloads.
Part V
Aggregation
In version 2.2, MongoDB introduced the aggregation framework (page 211) that provides a powerful and flexible set
of tools to use for many data aggregation tasks. If you're familiar with data aggregation in SQL, consider the SQL to
Aggregation Framework Mapping Chart (page 971) document as an introduction to some of the basic concepts in the
aggregation framework. Consider the full documentation of the aggregation framework here:
CHAPTER 20
Aggregation Framework
New in version 2.1.
20.1 Overview
The MongoDB aggregation framework provides a means to calculate aggregated values without having to use map-reduce.
While map-reduce is powerful, it is often more difficult than necessary for many simple aggregation tasks,
such as totaling or averaging field values.
If you're familiar with SQL, the aggregation framework provides similar functionality to GROUP BY and related
SQL operators as well as simple forms of self joins. Additionally, the aggregation framework provides projection
capabilities to reshape the returned data. Using the projections in the aggregation framework, you can add computed
fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
See Also:
A presentation from MongoSV 2011: MongoDB's New Aggregation Framework.
Additionally, consider Aggregation Framework Examples (page 217) and Aggregation Framework Reference
(page 227) for more documentation.
20.2 Framework Components
This section provides an introduction to the two concepts that underpin the aggregation framework: pipelines and
expressions.
20.2.1 Pipelines
Conceptually, documents from a collection pass through an aggregation pipeline, which transforms these objects as
they pass through. For those familiar with UNIX-like shells (e.g. bash), the concept is analogous to the pipe (i.e. |)
used to string text filters together.
In a shell environment the pipe redirects a stream of characters from the output of one process to the input of the next.
The MongoDB aggregation pipeline streams MongoDB documents from one pipeline operator (page 228) to the next
to process the documents. Pipeline operators can be repeated in the pipe.
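To make the analogy concrete, the following sketch (over a hypothetical records collection with status and score fields) mirrors the shell pipeline cat records | grep ... | sort ... | head -5:
db.records.aggregate(
   { $match: { status: "A" } },   // select matching documents, like grep
   { $sort: { score: -1 } },      // order them, like sort
   { $limit: 5 }                  // keep only the first five, like head -5
)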
All pipeline operators process a stream of documents and the pipeline behaves as if the operation scans a collection and
passes all matching documents into the top of the pipeline. Each operator in the pipeline transforms each document
as it passes through the pipeline.
Note: Pipeline operators need not produce one output document for every input document: operators may also
generate new documents or filter out documents.
Warning: The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey,
DBRef, Code, and CodeWScope.
See Also:
The Aggregation Framework Reference (page 227) includes documentation of the following pipeline operators:
$project (page 811)
$match (page 807)
$limit (page 806)
$skip (page 813)
$unwind (page 815)
$group (page 804)
$sort (page 813)
$geoNear (page 802)
20.2.2 Expressions
Expressions (page 237) produce output documents based on calculations performed on input documents. The
aggregation framework defines expressions in a document format using prefixes.
Expressions are stateless and are only evaluated when seen by the aggregation process. All aggregation expressions
can only operate on the current document in the pipeline, and cannot integrate data from other documents.
The accumulator expressions used in the $group (page 804) operator are an exception: they maintain state (e.g. totals, maximums,
minimums, and related data) as documents progress through the pipeline.
See Also:
Aggregation expressions (page 237) for additional examples of the expressions provided by the aggregation framework.
20.3 Use
20.3.1 Invocation
Invoke an aggregation operation with the aggregate() (page 905) wrapper in the mongo (page 1002) shell or
the aggregate (page 818) database command. Always call aggregate() (page 905) on a collection object that
determines the input documents of the aggregation pipeline. The arguments to the aggregate() (page 905) method
specify a sequence of pipeline operators (page 228), where each operator may have a number of operands.
First, consider a collection of documents named articles using the following format:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The following example aggregation operation pivots data to create a set of author names grouped by tags applied to an
article. Call the aggregation framework by issuing the following command:
db.articles.aggregate(
{ $project : {
author : 1,
tags : 1,
} },
{ $unwind : "$tags" },
{ $group : {
_id : { tags : "$tags" },
authors : { $addToSet : "$author" }
} }
);
The aggregation pipeline begins with the collection articles and selects the author and tags fields using the
$project (page 811) aggregation operator. The $unwind operator produces one output document per tag. Finally,
the $group operator pivots these fields.
20.3.2 Result
The aggregation operation in the previous section returns a document with two fields:
- result, which holds an array of documents returned by the pipeline
- ok, which holds the value 1, indicating success, or another value if there was an error
As a document, the result is subject to the BSON Document size (page 1133) limit, which is currently 16 megabytes.
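In the mongo shell you can capture this document and inspect its fields directly, as in the following sketch using the articles collection above:
var res = db.articles.aggregate( { $match: { author: "bob" } } );
if (res.ok) {
   res.result.forEach( printjson );   // the array of output documents
}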
20.4 Optimizing Performance
Because you will always call aggregate (page 818) on a collection object, which logically inserts the entire collec-
tion into the aggregation pipeline, you may want to optimize the operation by avoiding scanning the entire collection
whenever possible.
20.4.1 Pipeline Operators and Indexes
Depending on the order in which they appear in the pipeline, aggregation operators can take advantage of indexes.
The following pipeline operators take advantage of an index when they occur at the beginning of the pipeline:
$match (page 807)
$sort (page 813)
$limit (page 806)
$skip (page 813).
The above operators can also use an index when placed before the following aggregation operators:
$project (page 811)
$unwind (page 815)
$group (page 804).
New in version 2.4: The $geoNear (page 802) pipeline operator takes advantage of a geospatial index. When using
$geoNear (page 802), the $geoNear (page 802) pipeline operation must appear as the first stage in an aggregation
pipeline.
20.4.2 Early Filtering
If your aggregation operation requires only a subset of the data in a collection, use the $match (page 807) operator
to restrict which items go into the top of the pipeline, as in a query. When placed early in a pipeline, these $match
(page 807) operations use suitable indexes to scan only the matching documents in a collection.
Placing a $match (page 807) pipeline stage followed by a $sort (page 813) stage at the start of the pipeline is
logically equivalent to a single query with a sort, and can use an index.
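For example, the following two operations over the articles collection introduced above return equivalent results, and both can use an index on author:
db.articles.aggregate(
   { $match: { author: "bob" } },
   { $sort: { pageViews: -1 } }
)

// logically equivalent to the query:
db.articles.find( { author: "bob" } ).sort( { pageViews: -1 } )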
In future versions there may be an optimization phase in the pipeline that reorders the operations to increase perfor-
mance without affecting the result. However, at this time place $match (page 807) operators at the beginning of the
pipeline when possible.
20.4.3 Pipeline Sequence Optimization
Changed in version 2.4. Aggregation operations have an optimization phase which attempts to re-arrange the pipeline
for improved performance.
$sort + $skip + $limit Sequence Optimization
When you have a sequence of $sort (page 813) followed by a $skip (page 813) followed by a $limit (page 806),
an optimization occurs whereby the $limit (page 806) moves in front of the $skip (page 813). For example, if the
pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $sort: { age : -1 } },
{ $limit: 15 },
{ $skip: 10 }
Note: The $limit (page 806) value has increased to the sum of the initial value and the $skip (page 813) value.
$limit + $skip + $limit + $skip Sequence Optimization
When you have a continuous sequence of a $limit (page 806) pipeline stage followed by a $skip (page 813) pipeline
stage, the aggregation will attempt to re-arrange the pipeline stages to combine the limits together and the skips
together. For example, if the pipeline consists of the following stages:
{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10 },
{ $skip: 2 }
During the intermediate step, the optimizer reverses the position of the $skip (page 813) followed by a $limit
(page 806) to $limit (page 806) followed by $skip (page 813):
{ $limit: 100 },
{ $limit: 15 },
{ $skip: 5 },
{ $skip: 2 }
The $limit (page 806) value has increased to the sum of the initial value and the $skip (page 813) value. Then,
for the final $limit (page 806) value, the optimizer selects the minimum between the adjacent $limit (page 806)
values. For the final $skip (page 813) value, the optimizer adds the adjacent $skip (page 813) values, to transform
the sequence to the following:
{ $limit: 15 },
{ $skip: 7 }
20.4.4 Memory for Cumulative Operators
Certain pipeline operators require access to the entire input set before they can produce any output. For example,
$sort (page 813) must receive all of the input from the preceding pipeline operator before it can produce its first
output document. The current implementation of $sort (page 813) does not go to disk in these cases: in order to sort
the contents of the pipeline, the entire input must fit in memory.

Changed in version 2.4: When a $sort (page 813) immediately precedes a $limit (page 806) in the pipeline, the
$sort (page 813) operation only maintains the top n results as it progresses, where n is the specified limit. Before
2.4, $sort (page 813) would sort all the results in memory, and then limit the results to n results.

$group (page 804) has similar characteristics: before any $group (page 804) passes its output along the pipeline,
it must receive the entirety of its input. For the $group (page 804) operator, this frequently does not require as
much memory as $sort (page 813), because it only needs to retain one record for each unique key in the grouping
specification.

The current implementation of the aggregation framework logs a warning if a cumulative operator consumes 5% or
more of the physical memory on the host. Cumulative operators produce an error if they consume 10% or more of the
physical memory on the host.
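For example, in the following pipeline the $sort (page 813) stage only tracks the five documents with the highest pageViews values as it consumes its input, rather than holding the entire sorted stream in memory (a sketch using the articles collection above):
db.articles.aggregate(
   { $sort: { pageViews: -1 } },
   { $limit: 5 }
)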
20.5 Sharded Operation
Note: Changed in version 2.1: Some aggregation operations using aggregate (page 818) will cause mongos
(page 999) instances to require more CPU resources than in previous versions. This modified performance profile may
dictate alternate architectural decisions if you use the aggregation framework extensively in a sharded environment.
The aggregation framework is compatible with sharded collections.
When operating on a sharded collection, the aggregation pipeline is split into two parts. The aggregation framework
pushes all of the operators up to the first $group (page 804) or $sort (page 813) operation to each shard. 1 Then,
a second pipeline runs on the mongos (page 999). This pipeline consists of the first $group (page 804) or $sort
(page 813) and any remaining pipeline operators, and runs on the results received from the shards.
The $group (page 804) operator brings in any sub-totals from the shards and combines them: in some cases these
may be structures. For example, the $avg expression maintains a total and count for each shard; mongos (page 999)
combines these values and then divides.
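The following sketch illustrates the arithmetic with hypothetical partial results from two shards; mongos (page 999) sums the totals and counts before dividing, rather than averaging the per-shard averages:
var shardResults = [
   { total: 300, count: 10 },   // partial result from shard A
   { total: 500, count: 40 }    // partial result from shard B
];

var total = 0, count = 0;
shardResults.forEach( function(r) { total += r.total; count += r.count; } );

print( total / count );   // 16, the correct overall average
// naively averaging the per-shard averages (30 and 12.5) would give 21.25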
20.6 Limitations
Aggregation operations with the aggregate (page 818) command have the following limitations:
- The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef,
  Code, CodeWScope.
- Output from the pipeline can only contain 16 megabytes. If your result set exceeds this limit, the aggregate
  (page 818) command produces an error.
- If any single aggregation operation consumes more than 10 percent of system RAM, the operation will produce
  an error.
1
If an early $match (page 807) can exclude shards through the use of the shard key in the predicate, then these operators are only pushed to
the relevant shards.
CHAPTER 21
Aggregation Framework Examples
MongoDB provides flexible data aggregation functionality with the aggregate (page 818) command. For additional
information about aggregation, consider the following resources:
Aggregation Framework (page 211)
Aggregation Framework Reference (page 227)
SQL to Aggregation Framework Mapping Chart (page 971)
This document provides a number of practical examples that display the capabilities of the aggregation framework.
All examples use a publicly available data set of all zipcodes and populations in the United States.
21.1 Requirements
mongod (page 989) and mongo (page 1002), version 2.2 or later.
21.2 Aggregations using the Zip Code Data Set
To run these examples, you will need the zipcode data set. These data are available at media.mongodb.org/zips.json. Use
mongoimport (page 1022) to load this data set into your mongod (page 989) instance.
21.2.1 Data Model
Each document in this collection has the following form:
{
"_id": "10280",
"city": "NEW YORK",
"state": "NY",
"pop": 5574,
"loc": [
-74.016323,
40.710537
]
}
In these documents:
- The _id field holds the zipcode as a string.
- The city field holds the city.
- The state field holds the two letter state abbreviation.
- The pop field holds the population.
- The loc field holds the location as a latitude longitude pair.
All of the following examples use the aggregate() (page 905) helper in the mongo (page 1002) shell.
aggregate() (page 905) provides a wrapper around the aggregate (page 818) database command. See the
documentation for your driver (page 493) for a more idiomatic interface for data aggregation operations.
21.2.2 States with Populations Over 10 Million
To return all states with a population greater than 10 million, use the following aggregation operation:
db.zipcodes.aggregate( { $group :
                         { _id : "$state",
                           totalPop : { $sum : "$pop" } } },
                       { $match : { totalPop : { $gte : 10 * 1000 * 1000 } } } )
Aggregation operations using the aggregate() (page 905) helper process all documents in the zipcodes
collection. aggregate() (page 905) takes a number of pipeline (page 211) operators that define the aggregation process.
In the above example, the pipeline passes all documents in the zipcodes collection through the following steps:
- the $group (page 804) operator collects all documents and creates documents for each state.
  These new per-state documents have one field in addition to the _id field: totalPop, which is a generated field
  using the $sum operation to calculate the total value of all pop fields in the source documents.
After the $group (page 804) operation the documents in the pipeline resemble the following:
{
"_id" : "AK",
"totalPop" : 550043
}
- the $match (page 807) operation filters these documents so that the only documents that remain are those
  where the value of totalPop is greater than or equal to 10 million.
  The $match (page 807) operation does not alter the documents, which have the same format as the documents
  output by $group (page 804).
The equivalent SQL for this operation is:
SELECT state, SUM(pop) AS pop
FROM zips
GROUP BY state
HAVING pop > (10 * 1000 * 1000)
21.2.3 Average City Population by State
To return the average populations for cities in each state, use the following aggregation operation:
db.zipcodes.aggregate( { $group :
{ _id : { state : "$state", city : "$city" },
pop : { $sum : "$pop" } } },
{ $group :
{ _id : "$_id.state",
avgCityPop : { $avg : "$pop" } } } )
Aggregation operations using the aggregate() (page 905) helper process all documents in the zipcodes
collection. aggregate() (page 905) takes a number of pipeline (page 211) operators that define the aggregation process.
In the above example, the pipeline passes all documents in the zipcodes collection through the following steps:
- the $group (page 804) operator collects all documents and creates new documents for every combination of
  the city and state fields in the source documents.
After this stage in the pipeline, the documents resemble the following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
- the second $group (page 804) operator collects documents by the state field and uses the $avg expression
  to compute a value for the avgCityPop field.
The final output of this aggregation operation is:
{
"_id" : "MN",
"avgCityPop" : 5335
},
21.2.4 Largest and Smallest Cities by State
To return the smallest and largest cities by population for each state, use the following aggregation operation:
db.zipcodes.aggregate( { $group:
{ _id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" } } },
{ $sort: { pop: 1 } },
{ $group:
{ _id : "$_id.state",
biggestCity: { $last: "$_id.city" },
biggestPop: { $last: "$pop" },
smallestCity: { $first: "$_id.city" },
smallestPop: { $first: "$pop" } } },
// the following $project is optional, and
// modifies the output format.
{ $project:
{ _id: 0,
state: "$_id",
biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
smallestCity: { name: "$smallestCity", pop: "$smallestPop" } } } )
Aggregation operations using the aggregate() (page 905) helper process all documents in the zipcodes
collection. aggregate() (page 905) takes a number of pipeline (page 211) operators that define the aggregation process.
All documents from the zipcodes collection pass into the pipeline, which consists of the following steps:
- the $group (page 804) operator collects all documents and creates new documents for every combination of
  the city and state fields in the source documents.
  By specifying the value of _id as a sub-document that contains both fields, the operation preserves the state
  field for use later in the pipeline. The documents produced by this stage of the pipeline have a second field,
  pop, which uses the $sum operator to provide the total of the pop fields in the source documents.
At this stage in the pipeline, the documents resemble the following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
- the $sort (page 813) operator orders the documents in the pipeline based on the value of the pop field, from
  smallest to largest. This operation does not alter the documents.
- the second $group (page 804) operator collects the documents in the pipeline by the state field, which is a
  field inside the nested _id document.
  Within each per-state document this $group (page 804) operator specifies four fields: using the $last
  expression, the $group (page 804) operator creates the biggestCity and biggestPop fields that store the
  city with the largest population and that population; using the $first expression, the $group (page 804)
  operator creates the smallestCity and smallestPop fields that store the city with the smallest population
  and that population.
The documents at this stage in the pipeline resemble the following:
{
"_id" : "WA",
"biggestCity" : "SEATTLE",
"biggestPop" : 520096,
"smallestCity" : "BENGE",
"smallestPop" : 2
}
- The final operation is $project (page 811), which renames the _id field to state and moves
  the biggestCity, biggestPop, smallestCity, and smallestPop fields into biggestCity and
  smallestCity sub-documents.
The final output of this aggregation operation is:
{
"state" : "RI",
"biggestCity" : {
"name" : "CRANSTON",
"pop" : 176404
},
"smallestCity" : {
"name" : "CLAYVILLE",
"pop" : 45
}
}
21.3 Aggregation with User Preference Data
21.3.1 Data Model
Consider a hypothetical sports club with a database that contains a users collection that tracks users' join dates and sport
preferences, and stores these data in documents that resemble the following:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
{
_id : "joe",
joined : ISODate("2012-07-02"),
likes : ["tennis", "golf", "swimming"]
}
21.3.2 Normalize and Sort Documents
The following operation returns user names in upper case and in alphabetical order. The aggregation includes user
names for all documents in the users collection. You might do this to normalize user names for processing.
db.users.aggregate(
[
{ $project : { name:{$toUpper:"$_id"} , _id:0 } },
{ $sort : { name : 1 } }
]
)
All documents from the users collection pass through the pipeline, which consists of the following operations:
The $project (page 811) operator:
creates a new field called name.
converts the value of _id to upper case with the $toUpper (page 815) operator, and assigns the result
to the name field.
suppresses the _id field. $project (page 811) will pass the _id field by default, unless explicitly
suppressed.
The $sort (page 813) operator orders the results by the name field.
The results of the aggregation would resemble the following:
{
"name" : "JANE"
},
{
"name" : "JILL"
},
{
"name" : "JOE"
}
21.3.3 Return Usernames Ordered by Join Month
The following aggregation operation returns user names sorted by the month they joined. This kind of aggregation
could help generate membership renewal notices.
db.users.aggregate(
[
{ $project : { month_joined : {
$month : "$joined"
},
name : "$_id",
_id : 0
} },
{ $sort : { month_joined : 1 } }
]
)
The pipeline passes all documents in the users collection through the following operations:
The $project (page 811) operator:
Creates two new fields: month_joined and name.
Suppresses the _id field from the results. The aggregate() (page 905) method includes the _id field
by default, unless explicitly suppressed.
The $month (page 810) operator converts the values of the joined field to integer representations of the
month. Then the $project (page 811) operator assigns those values to the month_joined field.
The $sort (page 813) operator sorts the results by the month_joined field.
The operation returns results that resemble the following:
{
"month_joined" : 1,
"name" : "ruth"
},
{
"month_joined" : 1,
"name" : "harold"
},
{
"month_joined" : 1,
"name" : "kate"
},
{
"month_joined" : 2,
"name" : "jill"
}
21.3.4 Return Total Number of Joins per Month
The following operation shows how many people joined each month of the year. You might use this aggregated data
to inform recruiting and marketing strategies.
db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)
The pipeline passes all documents in the users collection through the following operations:
The $project (page 811) operator creates a new field called month_joined.
The $month (page 810) operator converts the values of the joined field to integer representations of the
month. Then the $project (page 811) operator assigns the values to the month_joined field.
The $group (page 804) operator collects all documents with a given month_joined value and counts how
many documents there are for that value. Specifically, for each unique value, $group (page 804) creates a new
per-month document with two fields:
_id, which contains a nested document with the month_joined field and its value.
number, which is a generated field. The $sum operator increments this field by 1 for every document
containing the given month_joined value.
The $sort (page 813) operator sorts the documents created by $group (page 804) according to the contents
of the month_joined field.
The result of this aggregation operation would resemble the following:
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
21.3.5 Return the Five Most Common Likes
The following aggregation collects top ve most liked activities in the data set. In this data set, you might use an
analysis of this to help inform planning and future development.
db.users.aggregate(
[
{ $unwind : "$likes" },
{ $group : { _id : "$likes" , number : { $sum : 1 } } },
{ $sort : { number : -1 } },
{ $limit : 5 }
]
)
The pipeline begins with all documents in the users collection, and passes these documents through the following
operations:
The $unwind (page 815) operator separates each value in the likes array, and creates a new version of the
source document for every element in the array.
Example
Given the following document from the users collection:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
The $unwind (page 815) operator would create the following documents:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : "golf"
}
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : "racquetball"
}
The $group (page 804) operator collects all documents with the same value for the likes field and counts each
grouping. With this information, $group (page 804) creates a new document with two fields:
_id, which contains the likes value.
number, which is a generated field. The $sum operator increments this field by 1 for every document
containing the given likes value.
The $sort (page 813) operator sorts these documents by the number eld in reverse order.
The $limit (page 806) operator only includes the first 5 result documents.
The results of the aggregation would resemble the following:
{
"_id" : "golf",
"number" : 33
},
{
"_id" : "racquetball",
"number" : 31
},
{
"_id" : "swimming",
"number" : 24
},
{
"_id" : "handball",
"number" : 19
},
{
"_id" : "tennis",
"number" : 18
}
CHAPTER 22
Aggregation Framework Reference
New in version 2.1.0. The aggregation framework provides the ability to project, process, and/or control the output of
the query, without using map-reduce. Aggregation uses a syntax that resembles the syntax and form of regular
MongoDB database queries.
These aggregation operations are all accessible by way of the aggregate() (page 905) method. While all examples
in this document use this method, aggregate() (page 905) is merely a wrapper around the database command
aggregate (page 818). The following prototype aggregation operations are equivalent:
db.people.aggregate( <pipeline> )
db.people.aggregate( [<pipeline>] )
db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )
These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for
the aggregation pipeline definition. aggregate() (page 905) accepts the stages of the pipeline (i.e. <pipeline>)
as an array, or as arguments to the method.
This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as
well as details regarding their use and behavior.
See Also:
Aggregation Framework (page 211) overview, the Aggregation Framework Documentation Index (page 209), and the
Aggregation Framework Examples (page 217) for more information on the aggregation functionality.
Aggregation Operators:
Pipeline (page 228)
Expressions (page 237)
$group Operators (page 237)
Boolean Operators (page 239)
Comparison Operators (page 239)
Arithmetic Operators (page 240)
String Operators (page 241)
Date Operators (page 241)
Conditional Expressions (page 242)
22.1 Pipeline
Warning: The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey,
DBRef, Code, and CodeWScope.
Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All
examples in this section assume that the aggregation pipeline begins with a collection named article that contains
documents that resemble the following:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date() ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The current pipeline operators are:
$project
Reshapes a document stream by renaming, adding, or removing fields. Also use $project (page 811) to
create computed values or sub-objects. Use $project (page 811) to:
Include fields from the original document.
Insert computed fields.
Rename fields.
Create and populate fields that hold sub-documents.
Use $project (page 811) to quickly select the fields that you want to include or exclude from the response.
Consider the following aggregation framework operation.
db.article.aggregate(
{ $project : {
title : 1 ,
author : 1 ,
}}
);
This operation includes the title field and the author field in the documents that return from the aggregation
pipeline.
Note: The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate(
{ $project : {
_id : 0 ,
title : 1 ,
author : 1
}}
);
Here, the projection excludes the _id field but includes the title and author fields.
Projections can also add computed fields to the document stream passing through the pipeline. A computed field
can use any of the expression operators (page 237). Consider the following example:
db.article.aggregate(
{ $project : {
title : 1,
doctoredPageViews : { $add:["$pageViews", 10] }
}}
);
Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the
original field using the $add (page 798) operator.
Note: You must enclose the expression that defines the computed field in braces, so that the expression is a
valid object.
You may also use $project (page 811) to rename fields. Consider the following example:
db.article.aggregate(
{ $project : {
title : 1 ,
page_views : "$pageViews" ,
bar : "$other.foo"
}}
);
This operation renames the pageViews field to page_views, and renames the foo field in the other sub-
document as the top-level field bar. The field references used for renaming fields are direct expressions and do
not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields
in nested documents.
Finally, you can use the $project (page 811) to create and populate new sub-documents. Consider the
following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate(
{ $project : {
title : 1 ,
stats : {
pv : "$pageViews",
foo : "$other.foo",
dpv : { $add:["$pageViews", 10] }
}
}}
);
This projection includes the title field and places $project (page 811) into inclusive mode. Then, it
creates the stats documents with the following fields:
pv which includes and renames the pageViews from the top level of the original documents.
foo which includes the value of other.foo from the original documents.
dpv which is a computed field that adds 10 to the value of the pageViews field in the original document
using the $add (page 798) aggregation expression.
$match
$match (page 807) pipes the documents that match its conditions to the next operator in the pipeline.
The $match (page 807) query syntax is identical to the read operation query (page 128) syntax.
Example
The following operation uses $match (page 807) to perform a simple equality match:
db.articles.aggregate(
{ $match : { author : "dave" } }
);
The $match (page 807) selects the documents where the author eld equals dave, and the aggregation
returns the following:
{ "result" : [
{
"_id" : ObjectId("512bc95fe835e68f199c8686"),
"author": "dave",
"score" : 80
},
{ "_id" : ObjectId("512bc962e835e68f199c8687"),
"author" : "dave",
"score" : 85
}
],
"ok" : 1 }
Example
The following example selects documents to process using the $match (page 807) pipeline operator and then
pipes the results to the $group (page 804) pipeline operator to compute a count of the documents:
db.articles.aggregate( [
{ $match : { score : { $gt : 70, $lte : 90 } } },
{ $group: { _id: null, count: { $sum: 1 } } }
] );
In the aggregation pipeline, $match (page 807) selects the documents where the score is greater than 70 and
less than or equal to 90. These documents are then piped to the $group (page 804) to perform a count. The
aggregation returns the following:
{
"result" : [
{
"_id" : null,
"count" : 3
}
],
"ok" : 1 }
Note:
Place the $match (page 807) as early in the aggregation pipeline as possible. Because $match
(page 807) limits the total number of documents in the aggregation pipeline, earlier $match (page 807)
operations minimize the amount of processing down the pipe.
If you place a $match (page 807) at the very beginning of a pipeline, the query can take advantage
of indexes like any other db.collection.find() (page 910) or db.collection.findOne()
(page 915).
New in version 2.4: $match (page 807) queries can support the geospatial $geoWithin (page 762) opera-
tions.
Warning: You cannot use $where (page 792) in $match (page 807) queries as part of the aggregation
pipeline.
$limit
Restricts the number of documents that pass through the $limit (page 806) in the pipeline.
$limit (page 806) takes a single numeric (positive whole number) value as a parameter. Once the specified
number of documents pass through the pipeline operator, no more will. Consider the following example:
db.article.aggregate(
{ $limit : 5 }
);
This operation returns only the first 5 documents passed to it by the pipeline. $limit (page 806) has no
effect on the content of the documents it passes.
Note: Changed in version 2.4: When a $sort (page 813) immediately precedes a $limit (page 806) in
the pipeline, the $sort (page 813) operation only maintains the top n results as it progresses, where n is the
specified limit. Before 2.4, $sort (page 813) would sort all the results in memory, and then limit the results to
n results.
$skip
Skips over the specified number of documents that pass through the $skip (page 813) in the pipeline before
passing all of the remaining input.
$skip (page 813) takes a single numeric (positive whole number) value as a parameter. Once the operation has
skipped the specified number of documents, it passes all the remaining documents along the pipeline without
alteration. Consider the following example:
db.article.aggregate(
{ $skip : 5 }
);
This operation skips the first 5 documents passed to it by the pipeline. $skip (page 813) has no effect on the
content of the documents it passes along the pipeline.
$unwind
Peels off the elements of an array individually, and returns a stream of documents. $unwind (page 815) returns
one document for every member of the unwound array within every source document. Take the following
aggregation command:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
Note: The dollar sign (i.e. $) must precede the field specification handed to the $unwind (page 815) operator.
In the above aggregation $project (page 811) selects (inclusively) the author, title, and tags fields,
as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind
(page 815) operator, which will unwind the tags field. This operation may return a sequence of documents
that resemble the following for a collection that contains one document holding a tags field with an array of 3
items.
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"OK" : 1
}
A single document becomes 3 documents: each document is identical except for the value of the tags field.
Each value of tags is one of the values in the original tags array.
Note: $unwind (page 815) has the following behaviors:
$unwind (page 815) is most useful in combination with $group (page 804).
You may undo the effects of the unwind operation with the $group (page 804) pipeline operator.
If you specify a target field for $unwind (page 815) that does not exist in an input document, the pipeline
ignores the input document, and will generate no result documents.
If you specify a target field for $unwind (page 815) that is not an array, aggregate() generates an
error.
If you specify a target field for $unwind (page 815) that holds an empty array ([]) in an input document,
the pipeline ignores the input document, and will generate no result documents.
$group
Groups documents together for the purpose of calculating aggregate values based on a collection of documents.
Practically, group often supports tasks such as calculating average page views for each page in a website on a daily basis.
The output of $group (page 804) depends on how you define groups. Begin by specifying an identifier (i.e. an
_id field) for the group you're creating with this pipeline. You can specify a single field from the documents in
the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate
keys may resemble the following document:
{ _id : { author: "$author", pageViews: "$pageViews", posted: "$posted" } }
With the exception of the _id field, $group (page 804) cannot output nested documents.
Every group expression must specify an _id field. You may specify the _id field as a dotted field path
reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.
Note: Use $project (page 811) to rename the grouped field after a $group (page 804) operation,
if necessary.
Consider the following example:
db.article.aggregate(
{ $group : {
_id : "$author",
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}}
);
This groups by the author field and computes two fields: the first, docsPerAuthor, is a counter field
that adds one for each document with a given author field, using the $sum (page 815) function. The
viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.
Each field defined for the $group (page 804) must use one of the group aggregation functions listed below to
generate its composite value:
$addToSet (page 798)
$first (page 802)
$last (page 806)
$max (page 808)
$min (page 808)
$avg (page 798)
$push (page 812)
$sum (page 815)
Warning: The aggregation system currently stores $group (page 804) operations in memory, which may
cause problems when processing a larger number of groups.
$sort
The $sort (page 813) pipeline operator sorts all input documents and returns them to the pipeline in sorted
order. Consider the following prototype form:
db.<collection-name>.aggregate(
{ $sort : { <sort-key> } }
);
This sorts the documents in the collection named <collection-name>, according to the key and specification
in the { <sort-key> } document.
Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an
ascending or descending sort respectively, as in the following example:
db.users.aggregate(
{ $sort : { age : -1, posts: 1 } }
);
This operation sorts the documents in the users collection, in descending order according to the age field
and then in ascending order according to the value in the posts field.
When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest
to highest:
1. MinKey (internal type)
2. Null
3. Numbers (ints, longs, doubles)
4. Symbol, String
5. Object
6. Array
7. BinData
8. ObjectID
9. Boolean
10. Date, Timestamp
11. Regular Expression
12. MaxKey (internal type)
Note: MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo
conversion before comparison.
Note: The $sort (page 813) cannot begin sorting documents until previous operators in the pipeline have
returned all output.
See Also: $skip (page 813)
The $sort (page 813) operator can take advantage of an index when placed at the beginning of the pipeline or
placed before the following aggregation operators:
$project (page 811)
$unwind (page 815)
$group (page 804).
Changed in version 2.4: When a $sort (page 813) immediately precedes a $limit (page 806) in the pipeline,
the $sort (page 813) operation only maintains the top n results as it progresses, where n is the specified limit.
Before 2.4, $sort (page 813) would sort all the results in memory, and then limit the results to n results.
Warning: Changed in version 2.4: Sorts immediately followed by a limit no longer need to fit into memory.
Previously, all sorts had to fit into memory or use an index. Unless the $sort (page 813) operator can use
an index, or immediately precedes a $limit (page 806), the $sort (page 813) operation must fit within
memory.
For $sort (page 813) operations that immediately precede a $limit (page 806) stage, MongoDB only
needs to store the number of items specified by $limit (page 806) in memory.
$geoNear
New in version 2.4. $geoNear (page 802) returns documents in order of nearest to farthest from a specified
point and passes the documents through the aggregation pipeline.
Important:
You can only use $geoNear (page 802) as the first stage of a pipeline.
You must include the distanceField option. The distanceField option specifies the field that
will contain the calculated distance.
The collection must have a geospatial index (page 321).
The $geoNear (page 802) operator accepts the following options:
Fields
near (coordinates) Specifies the coordinates (e.g. [ x, y ]) to use as the center of a
geospatial query.
distanceField (string) Specifies the output field that will contain the calculated distance.
You can use the dot notation to specify a field within a subdocument.
num (number) Optional. Specifies the maximum number of documents to return. The
default value is 100.
maxDistance (number) Optional. Limits the results to the documents within the specified
distance from the center coordinates.
query (document) Optional. Limits the results to the documents that match the query. The
query syntax is identical to the read operation query (page 128) syntax.
spherical (boolean) Optional. Default value is false. When true, MongoDB performs
calculation using spherical geometry.
distanceMultiplier (number) Optional. Specifies a factor to multiply all distances
returned by $geoNear (page 802). For example, use distanceMultiplier to convert
from spherical queries returned in radians to linear units (i.e. miles or kilometers) by
multiplying by the radius of the Earth.
includeLocs (string) Optional. Specifies the output field that identifies the location used to
calculate the distance. This option is useful when a location field contains multiple locations.
You can use the dot notation to specify a field within a subdocument.
uniqueDocs (boolean) Optional. Default value is false. If a location field contains
multiple locations, the default settings will return the document multiple times if more than
one location meets the criteria.
When true, the document will only return once even if the document has multiple locations
that meet the criteria.
Example
The following aggregation finds at most 5 unique documents with a location at most 0.008 from the center
[40.724, -73.997] that have a type equal to public:
db.places.aggregate([
{
$geoNear: {
near: [40.724, -73.997],
distanceField: "dist.calculated",
maxDistance: 0.008,
query: { type: "public" },
includeLocs: "dist.location",
uniqueDocs: true,
num: 5
}
}
])
The aggregation returns the following:
{
"result" : [
{ "_id" : 7,
"name" : "Washington Square",
"type" : "public",
"location" : [
[ 40.731, -73.999 ],
[ 40.732, -73.998 ],
[ 40.730, -73.995 ],
[ 40.729, -73.996 ]
],
"dist" : {
"calculated" : 0.0050990195135962296,
"location" : [ 40.729, -73.996 ]
}
},
{ "_id" : 8,
"name" : "Sara D. Roosevelt Park",
"type" : "public",
"location" : [
[ 40.723, -73.991 ],
[ 40.723, -73.990 ],
[ 40.715, -73.994 ],
[ 40.715, -73.994 ]
],
"dist" : {
"calculated" : 0.006082762530298062,
"location" : [ 40.723, -73.991 ]
}
}
],
"ok" : 1
}
The matching documents in the result field contain two new fields:
dist.calculated field that contains the calculated distance, and
dist.location field that contains the location used in the calculation.
Note: The options for $geoNear (page 802) are similar to the geoNear (page 845) command with the
following exceptions:
distanceField is a mandatory field for the $geoNear (page 802) pipeline operator; the option does
not exist in the geoNear (page 845) command.
includeLocs accepts a string in the $geoNear (page 802) pipeline operator and a boolean in
the geoNear (page 845) command.
22.2 Expressions
These operators calculate values within the aggregation framework.
22.2.1 $group Operators
The $group (page 804) pipeline stage provides the following operations:
$addToSet
Returns an array of all the values found in the selected field among the documents in that group. Every unique
value only appears once in the result set. There is no ordering guarantee for the output documents.
$first
Returns the first value it encounters for its group.
Note: Only use $first (page 802) when the $group (page 804) follows a $sort (page 813) operation.
Otherwise, the result of this operation is unpredictable.
$last
Returns the last value it encounters for its group.
Note: Only use $last (page 806) when the $group (page 804) follows a $sort (page 813) operation.
Otherwise, the result of this operation is unpredictable.
$max
Returns the highest value among all values of the field in all documents selected by this group.
$min
The $min (page 808) operator returns the lowest non-null value of a field in the documents for a $group
(page 804) operation. Changed in version 2.4: If some, but not all, documents for the $min (page 808)
operation have either a null value for the field or are missing the field, the $min (page 808) operator only
considers the non-null and the non-missing values for the field. If all documents for the $min (page 808)
operation have null value for the field or are missing the field, the $min (page 808) operator returns null
for the minimum value. Before 2.4, if any of the documents for the $min (page 808) operation were missing
the field, the $min (page 808) operator would not return any value. If any of the documents for the $min
(page 808) had the value null, the $min (page 808) operator would return a null.
Example
The users collection contains the following documents:
{ "_id" : "abc001", "age" : 25 }
{ "_id" : "abe001", "age" : 35 }
{ "_id" : "efg001", "age" : 20 }
{ "_id" : "xyz001", "age" : 15 }
To find the minimum value of the age field from all the documents, use the $min (page 808) operator:
db.users.aggregate( [ { $group: { _id:0, minAge: { $min: "$age"} } } ] )
The operation returns the minimum value of the age field in the minAge field:
{ "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
To find the minimum value of the age field for only those documents with _id starting with the letter a,
use the $min (page 808) operator after a $match (page 807) operation:
db.users.aggregate( [ { $match: { _id: /^a/ } },
{ $group: { _id: 0, minAge: { $min: "$age"} } }
] )
The operation returns the minimum value of the age field for the two documents with _id starting with
the letter a:
{ "result" : [ { "_id" : 0, "minAge" : 25 } ], "ok" : 1 }
Example
The users collection contains the following documents where some of the documents are either missing the
age field or the age field contains null:
{ "_id" : "abc001", "age" : 25 }
{ "_id" : "abe001", "age" : 35 }
{ "_id" : "efg001", "age" : 20 }
{ "_id" : "xyz001", "age" : 15 }
{ "_id" : "xxx001" }
{ "_id" : "zzz001", "age" : null }
The following operation finds the minimum value of the age field in all the documents:
db.users.aggregate( [ { $group: { _id:0, minAge: { $min: "$age"} } } ] )
Because only some documents for the $min (page 808) operation are missing the age field or have the age
field equal to null, $min (page 808) only considers the non-null and the non-missing values and the
operation returns the following document:
{ "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
The following operation finds the minimum value of the age field for only those documents where the
_id equals "xxx001" or "zzz001":
db.users.aggregate( [ { $match: { _id: {$in: [ "xxx001", "zzz001" ] } } },
{ $group: { _id: 0, minAge: { $min: "$age"} } }
] )
The $min operation returns null for the minimum age since all documents for the $min operation have a
null value for the field age or are missing the field:
{ "result" : [ { "_id" : 0, "minAge" : null } ], "ok" : 1 }
$avg
Returns the average of all the values of the field in all documents selected by this group.
$push
Returns an array of all the values found in the selected field among the documents in that group. A value may
appear more than once in the result set if more than one field in the grouped documents has that value.
$sum
Returns the sum of all the values for a specified field in the grouped documents, as in the second use above.
Alternately, if you specify a value as an argument, $sum (page 815) will increment this field by the specified
value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to
count members of the group.
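For illustration, the following sketch combines several of these accumulators on the article collection from the Pipeline section (page 228); the output field names (titles, distinctViews, totalViews) are illustrative, not part of the reference:
db.article.aggregate(
    { $group : {
        _id : "$author",
        // all titles by this author, duplicates preserved
        titles : { $push : "$title" },
        // distinct pageViews values only
        distinctViews : { $addToSet : "$pageViews" },
        totalViews : { $sum : "$pageViews" }
    }}
);
For each author, this returns the full list of titles, the set of distinct pageViews values, and the total page views.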
22.2.2 Boolean Operators
The three boolean operators accept Booleans as arguments and return Booleans as results.
Note: These operators convert non-booleans to Boolean values according to the BSON standards. Here, null,
undefined, and 0 values become false, while non-zero numeric values and all other types, such as strings, dates,
and objects, become true.
$and
Takes an array of one or more values and returns true if all of the values in the array are true. Otherwise $and
(page 798) returns false.
Note: $and (page 798) uses short-circuit logic: the operation stops evaluation after encountering the first
false expression.
$or
Takes an array of one or more values and returns true if any of the values in the array are true. Otherwise
$or (page 811) returns false.
Note: $or (page 811) uses short-circuit logic: the operation stops evaluation after encountering the first true
expression.
$not
Returns the boolean opposite value passed to it. When passed a true value, $not (page 810) returns false;
when passed a false value, $not (page 810) returns true.
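As a minimal sketch, assuming the article collection from the Pipeline section (page 228) and an illustrative output field name, the following projection computes a boolean that is true only when pageViews is greater than 0 and not greater than 100:
db.article.aggregate(
    { $project : {
        title : 1,
        // true only when 0 < pageViews <= 100
        moderateViews : { $and : [ { $gt : [ "$pageViews", 0 ] },
                                   { $not : [ { $gt : [ "$pageViews", 100 ] } ] } ] }
    }}
);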
22.2.3 Comparison Operators
These operators perform comparisons between two values and return a Boolean, in most cases, reflecting the result of
that comparison.
All comparison operators take an array with a pair of values. You may compare numbers, strings, and dates. Except
for $cmp (page 798), all comparison operators return a Boolean value. $cmp (page 798) returns an integer.
$cmp
Takes two values in an array and returns an integer. The returned value is:
A negative number if the first value is less than the second.
A positive number if the first value is greater than the second.
0 if the two values are equal.
$eq
Takes two values in an array and returns a boolean. The returned value is:
true when the values are equivalent.
false when the values are not equivalent.
$gt
Takes two values in an array and returns a boolean. The returned value is:
true when the first value is greater than the second value.
false when the first value is less than or equal to the second value.
$gte
Takes two values in an array and returns a boolean. The returned value is:
true when the first value is greater than or equal to the second value.
false when the first value is less than the second value.
$lt
Takes two values in an array and returns a boolean. The returned value is:
true when the first value is less than the second value.
false when the first value is greater than or equal to the second value.
$lte
Takes two values in an array and returns a boolean. The returned value is:
true when the first value is less than or equal to the second value.
false when the first value is greater than the second value.
$ne
Takes two values in an array and returns a boolean. The returned value is:
true when the values are not equivalent.
false when the values are equivalent.
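For example, the following sketch (the output field names cmpTo100 and isPopular are illustrative) applies $cmp (page 798) and $gte to the article collection from the Pipeline section (page 228):
db.article.aggregate(
    { $project : {
        title : 1,
        // negative, 0, or positive depending on how pageViews compares to 100
        cmpTo100 : { $cmp : [ "$pageViews", 100 ] },
        isPopular : { $gte : [ "$pageViews", 5 ] }
    }}
);
In each output document, cmpTo100 is a negative number, 0, or a positive number, while isPopular is a boolean.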
22.2.4 Arithmetic Operators
These operators only support numbers.
$add
Takes an array of one or more numbers and adds them together, returning the sum.
$divide
Takes an array that contains a pair of numbers and returns the value of the rst number divided by the second
number.
$mod
Takes an array that contains a pair of numbers and returns the remainder of the rst number divided by the
second number.
See Also:
$mod (page 770)
$multiply
Takes an array of one or more numbers and multiplies them, returning the resulting product.
$subtract
Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.
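As a hedged sketch against the article collection from the Pipeline section (page 228), with illustrative output field names, the following projection exercises each arithmetic operator:
db.article.aggregate(
    { $project : {
        title : 1,
        viewsPlusTen : { $add : [ "$pageViews", 10 ] },
        halfViews : { $divide : [ "$pageViews", 2 ] },
        // remainder of pageViews divided by 3
        viewsMod3 : { $mod : [ "$pageViews", 3 ] },
        doubleViews : { $multiply : [ "$pageViews", 2 ] },
        viewsLessOne : { $subtract : [ "$pageViews", 1 ] }
    }}
);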
22.2.5 String Operators
These operators manipulate strings within projection expressions.
$concat
New in version 2.4. Takes an array of strings, concatenates the strings, and returns the concatenated string.
$concat (page 799) can only accept an array of strings.
Use $concat (page 799) with the following syntax:
{ $concat: [ <string>, <string>, ... ] }
If an array element has a value of null or refers to a field that is missing, $concat (page 799) will return null.
$strcasecmp
Takes in two strings. Returns a number. $strcasecmp (page 814) is positive if the first string is greater
than the second and negative if the first string is less than the second. $strcasecmp (page 814) returns 0
if the strings are identical.
Note: $strcasecmp (page 814) may not make sense when applied to glyphs outside the Roman alphabet.
$strcasecmp (page 814) internally capitalizes strings before comparing them to provide a case-insensitive
comparison. Use $cmp (page 798) for a case sensitive comparison.
$substr
$substr (page 814) takes a string and two numbers. The first number represents the number of bytes in the
string to skip, and the second number specifies the number of bytes to return from the string.
Note: $substr (page 814) is not encoding aware and if used improperly may produce a result string
containing an invalid UTF-8 character sequence.
$toLower
Takes a single string and converts that string to lowercase, returning the result. All uppercase letters become
lowercase.
Note: $toLower (page 815) may not make sense when applied to glyphs outside the Roman alphabet.
$toUpper
Takes a single string and converts that string to uppercase, returning the result. All lowercase letters become
uppercase.
Note: $toUpper (page 815) may not make sense when applied to glyphs outside the Roman alphabet.
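The following sketch applies these operators to the article collection from the Pipeline section (page 228); the output field names are illustrative:
db.article.aggregate(
    { $project : {
        // e.g. "this is my title by bob"
        byline : { $concat : [ "$title", " by ", "$author" ] },
        // first 5 bytes of the title
        firstFive : { $substr : [ "$title", 0, 5 ] },
        lowerAuthor : { $toLower : "$author" },
        upperTitle : { $toUpper : "$title" }
    }}
);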
22.2.6 Date Operators
All date operators take a Date typed value as a single argument and return a number.
$dayOfYear
Takes a date and returns the day of the year as a number between 1 and 366.
$dayOfMonth
Takes a date and returns the day of the month as a number between 1 and 31.
$dayOfWeek
Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday).
$year
Takes a date and returns the full year.
$month
Takes a date and returns the month as a number between 1 and 12.
$week
Takes a date and returns the week of the year as a number between 0 and 53.
Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday
of the year are in week 0. This behavior is the same as the %U operator to the strftime standard library
function.
$hour
Takes a date and returns the hour between 0 and 23.
$minute
Takes a date and returns the minute between 0 and 59.
$second
Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.
$millisecond
Takes a date and returns the millisecond portion of the date as an integer between 0 and 999.
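As a minimal sketch, the following projection extracts several date components from the posted field of the article collection from the Pipeline section (page 228); the output field names are illustrative:
db.article.aggregate(
    { $project : {
        year : { $year : "$posted" },
        month : { $month : "$posted" },
        // 1 (Sunday) through 7 (Saturday)
        dayOfWeek : { $dayOfWeek : "$posted" },
        hour : { $hour : "$posted" }
    }}
);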
22.2.7 Conditional Expressions
$cond
Use the $cond (page 801) operator with the following syntax:
{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first
expression evaluates to true, $cond (page 801) returns the value of the second expression. If the first expression
evaluates to false, $cond (page 801) evaluates and returns the third expression.
$ifNull
Use the $ifNull (page 806) operator with the following syntax:
{ $ifNull: [ <expression>, <replacement-if-null> ] }
Takes an array with two expressions. $ifNull (page 806) returns the first expression if it evaluates to a
non-null value. Otherwise, $ifNull (page 806) returns the second expression's value.
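The following sketch combines both conditional expressions on the article collection from the Pipeline section (page 228); the "high"/"low" labels and output field names are illustrative:
db.article.aggregate(
    { $project : {
        // fall back to "untitled" when title is null or missing
        title : { $ifNull : [ "$title", "untitled" ] },
        popularity : { $cond : [ { $gte : [ "$pageViews", 5 ] },
                                 "high",
                                 "low" ] }
    }}
);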
CHAPTER 23
Map-Reduce
Map-reduce operations can handle complex aggregation tasks. To perform map-reduce operations, MongoDB provides
the mapReduce (page 860) command and, in the mongo (page 1002) shell, the db.collection.mapReduce()
(page 922) wrapper method.
For many simple aggregation tasks, see the aggregation framework (page 211).
23.1 Map-Reduce Examples
This section provides some map-reduce examples in the mongo (page 1002) shell using the
db.collection.mapReduce() (page 922) method:
db.collection.mapReduce(
<mapfunction>,
<reducefunction>,
{
out: <collection>,
query: <document>,
sort: <document>,
limit: <number>,
finalize: <function>,
scope: <document>,
jsMode: <boolean>,
verbose: <boolean>
}
)
For more information on the parameters, see the db.collection.mapReduce() (page 922) reference page.
Consider the following map-reduce operations on a collection orders that contains documents of the following
prototype:
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: "A",
price: 250,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
23.1.1 Return the Total Price Per Customer Id
Perform a map-reduce operation on the orders collection to group by the cust_id and, for each cust_id,
calculate the sum of the price:
1. Define the map function to process each input document:
In the function, this refers to the document that the map-reduce operation is processing.
The function maps the price to the cust_id for each document and emits the cust_id and price
pair.
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
2. Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
The valuesPrices is an array whose elements are the price values emitted by the map function and
grouped by keyCustId.
The function reduces the valuesPrices array to the sum of its elements.
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
3. Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function
and the reduceFunction1 reduce function.
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the
map_reduce_example collection already exists, the operation will replace the contents with the
results of this map-reduce operation.
23.1.2 Calculate the Number of Orders, Total Quantity, and Average Quantity Per
Item
In this example you will perform a map-reduce operation on the orders collection, for all documents that have
an ord_date value greater than 01/01/2012. The operation groups by the item.sku eld, and for each sku
calculates the number of orders and the total quantity ordered. The operation concludes by calculating the average
quantity per order for each sku value:
1. Define the map function to process each input document:
In the function, this refers to the document that the map-reduce operation is processing.
For each item, the function associates the sku with a new object value that contains the count of 1
and the item qty for the order and emits the sku and value pair.
var mapFunction2 = function() {
for (var idx = 0; idx < this.items.length; idx++) {
var key = this.items[idx].sku;
var value = {
count: 1,
qty: this.items[idx].qty
};
emit(key, value);
}
};
2. Define the corresponding reduce function with two arguments keySKU and valuesCountObjects:
valuesCountObjects is an array whose elements are the objects mapped to the grouped keySKU
values passed by the map function to the reducer function.
The function reduces the valuesCountObjects array to a single object reducedValue that
contains the count and the qty fields.
In reducedValue, the count field contains the sum of the count fields from the individual array
elements, and the qty field contains the sum of the qty fields from the individual array elements.
var reduceFunction2 = function(keySKU, valuesCountObjects) {
reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
3. Define a finalize function with two arguments key and reducedValue. The function modifies the
reducedValue object to add a computed field named average and returns the modified object:
var finalizeFunction2 = function (key, reducedValue) {
reducedValue.average = reducedValue.qty/reducedValue.count;
return reducedValue;
};
4. Perform the map-reduce operation on the orders collection using the mapFunction2,
reduceFunction2, and finalizeFunction2 functions.
db.orders.mapReduce( mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example" },
query: { ord_date: { $gt: new Date("01/01/2012") } },
finalize: finalizeFunction2
}
)
This operation uses the query field to select only those documents with ord_date greater than new
Date("01/01/2012"). Then it outputs the results to a collection map_reduce_example. If the
map_reduce_example collection already exists, the operation will merge the existing contents with the
results of this map-reduce operation.
23.2 Incremental Map-Reduce
If the map-reduce dataset is constantly growing, then rather than performing the map-reduce operation over the entire
dataset each time you want to run map-reduce, you may want to perform an incremental map-reduce.
To perform incremental map-reduce:
1. Run a map-reduce job over the current collection and output the result to a separate collection.
2. When you have more data to process, run a subsequent map-reduce job with:
the query parameter that specifies conditions that match only the new documents.
the out parameter that specifies the reduce action to merge the new results into the existing output
collection.
Consider the following example where you schedule a map-reduce operation on a sessions collection to run at the
end of each day.
23.2.1 Data Setup
The sessions collection contains documents that log users' sessions each day, for example:
db.sessions.save( { userid: "a", ts: ISODate(2011-11-03 14:17:00), length: 95 } );
db.sessions.save( { userid: "b", ts: ISODate(2011-11-03 14:23:00), length: 110 } );
db.sessions.save( { userid: "c", ts: ISODate(2011-11-03 15:02:00), length: 120 } );
db.sessions.save( { userid: "d", ts: ISODate(2011-11-03 16:45:00), length: 45 } );
db.sessions.save( { userid: "a", ts: ISODate(2011-11-04 11:05:00), length: 105 } );
db.sessions.save( { userid: "b", ts: ISODate(2011-11-04 13:14:00), length: 120 } );
db.sessions.save( { userid: "c", ts: ISODate(2011-11-04 17:00:00), length: 130 } );
db.sessions.save( { userid: "d", ts: ISODate(2011-11-04 15:37:00), length: 65 } );
23.2.2 Initial Map-Reduce of Current Collection
Run the first map-reduce operation as follows:
1. Define the map function that maps the userid to an object that contains the fields userid, total_time,
count, and avg_time:
var mapFunction = function() {
var key = this.userid;
var value = {
userid: this.userid,
total_time: this.length,
count: 1,
avg_time: 0
};
emit( key, value );
};
2. Define the corresponding reduce function with two arguments key and values to calculate the total time
and the count. The key corresponds to the userid, and the values is an array whose elements correspond
to the individual objects mapped to the userid in the mapFunction.
var reduceFunction = function(key, values) {
var reducedObject = {
userid: key,
total_time: 0,
count:0,
avg_time:0
};
values.forEach( function(value) {
reducedObject.total_time += value.total_time;
reducedObject.count += value.count;
}
);
return reducedObject;
};
3. Define the finalize function with two arguments key and reducedValue. The function modifies the
reducedValue document to compute the avg_time field and returns the modified document.
var finalizeFunction = function (key, reducedValue) {
if (reducedValue.count > 0)
reducedValue.avg_time = reducedValue.total_time / reducedValue.count;
return reducedValue;
};
4. Perform map-reduce on the sessions collection using the mapFunction, the reduceFunction, and the
finalizeFunction functions. Output the results to a collection session_stat. If the session_stat
collection already exists, the operation will reduce the new results against the existing contents:
db.sessions.mapReduce( mapFunction,
reduceFunction,
{
out: { reduce: "session_stat" },
finalize: finalizeFunction
}
)
23.2.3 Subsequent Incremental Map-Reduce
Later, as the sessions collection grows, you can run additional map-reduce operations. For example, add new
documents to the sessions collection:
db.sessions.save( { userid: "a", ts: ISODate(2011-11-05 14:17:00), length: 100 } );
db.sessions.save( { userid: "b", ts: ISODate(2011-11-05 14:23:00), length: 115 } );
db.sessions.save( { userid: "c", ts: ISODate(2011-11-05 15:02:00), length: 125 } );
db.sessions.save( { userid: "d", ts: ISODate(2011-11-05 16:45:00), length: 55 } );
At the end of the day, perform incremental map-reduce on the sessions collection, but use the query field to select
only the new documents. Output the results to the collection session_stat, but reduce the contents with the
results of the incremental map-reduce:
db.sessions.mapReduce( mapFunction,
reduceFunction,
{
query: { ts: { $gt: ISODate("2011-11-05 00:00:00") } },
out: { reduce: "session_stat" },
finalize: finalizeFunction
}
);
23.3 Temporary Collection
The map-reduce operation uses a temporary collection during processing. At completion, the map-reduce operation
renames the temporary collection. As a result, you can perform a map-reduce operation periodically with the same
target collection name without affecting the intermediate states. Use this mode when generating statistical output
collections on a regular basis.
23.4 Concurrency
The map-reduce operation is composed of many tasks, including:
reads from the input collection,
executions of the map function,
executions of the reduce function,
writes to the output collection.
These various tasks take the following locks:
The read phase takes a read lock. It yields every 100 documents.
The insert into the temporary collection takes a write lock for a single write.
If the output collection does not exist, the creation of the output collection takes a write lock.
If the output collection exists, then the output actions (i.e. merge, replace, reduce) take a write lock.
Changed in version 2.4: The V8 JavaScript engine, which became the default in 2.4, allows multiple JavaScript
operations to execute at the same time. Prior to 2.4, JavaScript code (i.e. map, reduce, finalize functions)
executed in a single thread.
Note: The final write lock during post-processing makes the results appear atomically. However, output actions
merge and reduce may take minutes to process. For the merge and reduce, the nonAtomic flag is available.
See the db.collection.mapReduce() (page 922) reference for more information.
23.5 Sharded Cluster
23.5.1 Sharded Input
When using a sharded collection as the input for a map-reduce operation, mongos (page 999) will automatically
dispatch the map-reduce job to each shard in parallel. There is no special option required. mongos (page 999) will wait
for jobs on all shards to finish.
23.5.2 Sharded Output
By default the output collection is not sharded. The process is:
mongos (page 999) dispatches a map-reduce finish job to the shard that will store the target collection.
The target shard pulls results from all other shards, runs a final reduce/finalize operation, and writes to the
output.
If using the sharded option to the out parameter, MongoDB shards the output using the _id field as the shard
key. Changed in version 2.2.
If the output collection does not exist, MongoDB creates and shards the collection on the _id field. If the
collection is empty, MongoDB creates chunks using the result of the first stage of the map-reduce operation.
mongos (page 999) dispatches, in parallel, a map-reduce finish job to every shard that owns a chunk.
Each shard will pull the results it owns from all other shards, run a final reduce/finalize, and write to the output
collection.
Note:
During later map-reduce jobs, MongoDB splits chunks as needed.
Balancing of chunks for the output collection is automatically prevented during post-processing to avoid
concurrency issues.
In MongoDB 2.0:
mongos (page 999) retrieves the results from each shard, performs a merge sort to order the results, and
performs a reduce/finalize as needed. mongos (page 999) then writes the result to the output collection in
sharded mode.
This model requires only a small amount of memory, even for large datasets.
Shard chunks are not automatically split during insertion. This requires manual intervention until the chunks
are granular and balanced.
Warning: For best results, only use the sharded output options for mapReduce (page 860) in version 2.2 or
later.
23.6 Troubleshooting Map-Reduce Operations
You can troubleshoot the map function and the reduce function in the mongo (page 1002) shell.
23.6.1 Troubleshoot the Map Function
You can verify the key and value pairs emitted by the map function by writing your own emit function.
Consider a collection orders that contains documents of the following prototype:
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: "A",
price: 250,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
1. Define the map function that maps the price to the cust_id for each document and emits the cust_id and
price pair:
var map = function() {
emit(this.cust_id, this.price);
};
2. Define the emit function to print the key and value:
var emit = function(key, value) {
print("emit");
print("key: " + key + " value: " + tojson(value));
}
3. Invoke the map function with a single document from the orders collection:
var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } );
map.apply(myDoc);
4. Verify the key and value pair is as you expected.
emit
key: abc123 value: 250
5. Invoke the map function with multiple documents from the orders collection:
var myCursor = db.orders.find( { cust_id: "abc123" } );
while (myCursor.hasNext()) {
var doc = myCursor.next();
print ("document _id= " + tojson(doc._id));
map.apply(doc);
print();
}
6. Verify the key and value pairs are as you expected.
23.6.2 Troubleshoot the Reduce Function
Confirm Output Type
You can test that the reduce function returns a value that is the same type as the value emitted from the map function.
1. Define a reduceFunction1 function that takes the arguments keyCustId and valuesPrices.
valuesPrices is an array of integers:
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
2. Define a sample array of integers:
var myTestValues = [ 5, 5, 10 ];
3. Invoke the reduceFunction1 with myTestValues:
reduceFunction1("myKey", myTestValues);
4. Verify the reduceFunction1 returned an integer:
20
5. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects.
valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
6. Define a sample array of documents:
var myTestObjects = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
{ count: 3, qty: 15 }
];
7. Invoke the reduceFunction2 with myTestObjects:
reduceFunction2("myKey", myTestObjects);
8. Verify the reduceFunction2 returned a document with exactly the count and the qty fields:
{ "count" : 6, "qty" : 30 }
Ensure Insensitivity to the Order of Mapped Values
The reduce function takes a key and a values array as its arguments. You can test that the result of the reduce
function does not depend on the order of the elements in the values array.
1. Define a sample values1 array and a sample values2 array that only differ in the order of the array elements:
var values1 = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
{ count: 3, qty: 15 }
];
var values2 = [
{ count: 3, qty: 15 },
{ count: 1, qty: 5 },
{ count: 2, qty: 10 }
];
2. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects.
valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
3. Invoke the reduceFunction2 first with values1 and then with values2:
reduceFunction2("myKey", values1);
reduceFunction2("myKey", values2);
4. Verify the reduceFunction2 returned the same result:
{ "count" : 6, "qty" : 30 }
Ensure Reduce Function Idempotence
Because the map-reduce operation may call a reduce function multiple times for the same key, the reduce function
must return a value of the same type as the value emitted from the map function, and reducing already-reduced
values must not change the final result. You can test that the reduce function can process reduced values without
affecting the final value.
1. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects.
valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
2. Define a sample key:
var myKey = "myKey";
3. Define a sample valuesIdempotent array that contains an element that is a call to the reduceFunction2
function:
var valuesIdempotent = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
reduceFunction2(myKey, [ { count:3, qty: 15 } ] )
];
4. Define a sample values1 array that combines the values passed to reduceFunction2:
var values1 = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
{ count: 3, qty: 15 }
];
5. Invoke the reduceFunction2 first with myKey and valuesIdempotent and then with myKey and
values1:
reduceFunction2(myKey, valuesIdempotent);
reduceFunction2(myKey, values1);
6. Verify the reduceFunction2 returned the same result:
{ "count" : 6, "qty" : 30 }
In addition to the aggregation framework, MongoDB provides simple aggregation methods and commands (page 255)
that you may find useful for some classes of tasks:
CHAPTER 24
Simple Aggregation Methods and Commands
In addition to the aggregation framework (page 211) and map-reduce, MongoDB provides the following methods and
commands to perform aggregation:
24.1 Count
MongoDB offers the following command and methods to provide count functionality:
count (page 830)
db.collection.count() (page 905)
cursor.count() (page 892)
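For example, the following equivalent operations, sketched against a hypothetical orders collection, each count the documents where status equals "A":
// method, cursor method, and database command forms
db.orders.count( { status: "A" } )
db.orders.find( { status: "A" } ).count()
db.runCommand( { count: "orders", query: { status: "A" } } )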
24.2 Distinct
MongoDB offers the following command and method to provide the distinct functionality:
- distinct (page 833)
- db.collection.distinct() (page 906)
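For example, the following returns the set of distinct values of a sku field in an orders collection (names
illustrative):
db.orders.distinct( "sku" )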
24.3 Group
MongoDB offers the following command and method to provide group functionality:
- group (page 849)
- db.collection.group() (page 918)
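For example, the following sketch groups the documents of a hypothetical orders collection by status and sums
the qty field within each group:
db.orders.group( {
    key: { status: 1 },
    reduce: function( curr, result ) { result.total += curr.qty; },
    initial: { total: 0 }
} )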
Part VI
Text Search
New in version 2.4.
CHAPTER 25
Overview
Text search supports the search of string content in documents of a collection. Text search introduces a new text
(page 282) index type and a new text (page 883) command.
The text search process:
- tokenizes and stems the search term(s) during both the index creation and the text command execution.
- assigns a score to each document that contains the search term in the indexed fields. The score determines the
relevance of a document to a given search query.
By default, the text (page 883) command returns at most the top 100 matching documents as determined by the scores.
CHAPTER 26
Create a text Index
To perform text search, create a text index on the field or fields whose value is a string or an array of string elements.
To create a text index, use the db.collection.ensureIndex() (page 908) method with a document that
contains field and value pairs where the value is the string literal "text".
Important:
- Before you can create a text index (page 323) or run the text command (page 324), you must manually enable
text search. See Enable Text Search (page 519) for information on how to enable the text search feature.
- Text indexes have significant storage requirements and performance costs. See the text index feature (page 282)
for more information.
- A collection can have at most one text index.
The following example creates a text index on the fields subject and content:
db.collection.ensureIndex(
    {
        subject: "text",
        content: "text"
    }
)
This text index catalogs all string data in the subject field and the content field, where the field value is either
a string or an array of string elements.
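MongoDB 2.4 also accepts the wildcard specifier $** in a text index specification to index every field that contains
string data; this sketch assumes you want such a catch-all index and gives it an explicit name:
db.collection.ensureIndex(
    { "$**": "text" },
    { name: "TextIndex" }
)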
See text Index (page 324) for details on the options available when creating text indexes.
Additionally, MongoDB permits compound indexes (page 275) that include text index fields in combination with
ascending/descending index fields. For more information, see:
- Limit the Number of Index Entries Scanned for Text Search (page 524)
- Return Text Queries Using Only a text Index (page 524)
CHAPTER 27
text Command
The text (page 883) command can search for words and phrases. The command matches on the complete stemmed
words. For example, if a document field contains the word blueberry, a search on the term blue will not match
the document. However, a search on either blueberry or blueberries will match.
By default, the text (page 883) command returns the top 100 scoring documents in descending order, but you can
specify a limit option to change the maximum number to return.
Given a collection with a text index, use the runCommand() method to execute the text (page 883) command,
as in:
db.collection.runCommand( "text" , { search: <string> } )
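For example, to return at most ten matching documents for a hypothetical search term:
db.collection.runCommand( "text", { search: "blueberry", limit: 10 } )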
For information and examples on various text search patterns, see Search String Content for Text (page 519).
CHAPTER 28
Text Search Output
The text (page 883) command returns a document that contains the result set.
See Text Search Output (page 1138) for information on the output.
Part VII
Indexes
Indexes provide high performance read operations for frequently used queries. Indexes are particularly useful where
the total size of the documents exceeds the amount of available RAM.
For basic concepts and options, see Indexing Overview (page 273). For procedures and operational concerns, see
Indexing Operations (page 283). For information on how applications might use indexes, see Indexing Strategies
(page 289).
CHAPTER 29
Core MongoDB Indexing Background
29.1 Indexing Overview
This document provides an overview of indexes in MongoDB, including index types and creation options. For
operational guidelines and procedures, see the Indexing Operations (page 283) document. For strategies and practical
approaches, see the Indexing Strategies (page 289) document.
29.1.1 Synopsis
An index is a data structure that allows you to quickly locate documents based on the values stored in certain specified
fields. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports
indexes on any field or sub-field contained in documents within a MongoDB collection.
MongoDB indexes have the following core features:
- MongoDB defines indexes on a per-collection level.
- You can create indexes on a single field or on multiple fields using a compound index (page 275).
- Indexes enhance query performance, often dramatically. However, each index also incurs some overhead for
every write operation. Consider the queries, the frequency of these queries, the size of your working set, the
insert load, and your application's requirements as you create indexes in your MongoDB environment.
- All MongoDB indexes use a B-tree data structure. MongoDB can use this representation of the data to optimize
query responses.
- Every query, including update operations, uses one and only one index. The query optimizer (page 134) selects
the index empirically by occasionally running alternate query plans and by selecting the plan with the best
response time for each query type. You can override the query optimizer using the cursor.hint() (page 894)
method.
- An index covers a query if:
  - all the fields in the query (page 128) are part of that index, and
  - all the fields returned in the documents that match the query are in the same index.
  When an index covers a query, the server can both match the query conditions (page 128) and return the results
  using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query.
  Querying the index can be faster than querying the documents outside of the index.
See Create Indexes that Support Covered Queries (page 290) for more information.
Using queries with good index coverage reduces the number of full documents that MongoDB needs to store in
memory, thus maximizing database performance and throughput.
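For example, the following sketch shows a covered query against a hypothetical inventory collection:
db.inventory.ensureIndex( { type: 1, item: 1 } )
// Covered: the query and the projection use only indexed fields,
// and the projection excludes _id, which is not in the index.
db.inventory.find(
    { type: "food", item: "Banana" },
    { item: 1, _id: 0 }
)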
If an update does not change the size of a document or cause the document to outgrow its allocated area, then
MongoDB will update an index only if the indexed fields have changed. This improves performance. Note that
if the document has grown and must move, all index keys must then update.
29.1.2 Index Types
This section enumerates the types of indexes available in MongoDB. For all collections, MongoDB creates the default
_id index (page 274). You can create additional indexes with the ensureIndex() (page 908) method on any single
field or sequence of fields (page 275) within any document or sub-document (page 275). MongoDB also supports
indexes of arrays, called multi-key indexes (page 277).
_id Index
The _id index is a unique index (page 278) [1] on the _id field, and MongoDB creates this index by default on all
collections. [2] You cannot delete the index on _id.
The _id field is the primary key for the collection, and every document must have a unique _id field. You may store
any unique value in the _id field. The default value of _id is an ObjectId, generated by every db.collection.insert()
operation. An ObjectId is a 12-byte unique identifier suitable for use as the value of an _id field.
Note: In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the
uniqueness of the values in the _id field to prevent errors. This is most often done by using a standard auto-generated
ObjectId.
Secondary Indexes
All indexes in MongoDB are secondary indexes. You can create indexes on any field within any document or
sub-document. Additionally, you can create compound indexes with multiple fields, so that a single query can match
multiple components using the index while scanning fewer whole documents.
In general, you should create indexes that support your primary, common, and user-facing queries. Doing so requires
MongoDB to scan the fewest number of documents possible.
In the mongo (page 1002) shell, you can create an index by calling the ensureIndex() (page 908) method.
Arguments to ensureIndex() (page 908) resemble the following:
{ "field": 1 }
{ "product.quantity": 1 }
{ "product": 1, "quantity": 1 }
For each field in the index, specify either 1 for an ascending order or -1 for a descending order, which represents the
order of the keys in the index. For indexes with more than one key (i.e. compound indexes (page 275)) the sequence
of fields is important.
[1] Although the index on _id is unique, the getIndexes() (page 916) method will not print unique: true in the mongo (page 1002)
shell.
[2] Before version 2.2, capped collections did not have an _id field. In 2.2, all capped collections have an _id field, except those in the local
database. See the release notes (page 1174) for more information.
Indexes on Sub-documents
You can create indexes on fields that hold sub-documents, as in the following example:
Example
Given the following document in the factories collection:
{ "_id": ObjectId(...), metro: { city: "New York", state: "NY" } } )
You can create an index on the metro key. The following queries would then use that index, and both would return
the above document:
db.factories.find( { metro: { city: "New York", state: "NY" } } );
db.factories.find( { metro: { $gte : { city: "New York" } } } );
The second query returns the document because { city: "New York" } is less than { city: "New
York", state: "NY" }. The comparison proceeds in ascending key order, comparing keys in the order they
occur in the BSON document.
Indexes on Embedded Fields
You can create indexes on fields in sub-documents, just as you can index top-level fields in documents. [3] These
indexes allow you to use dot notation to introspect into sub-documents.
[3] Indexes on Sub-documents (page 275), by contrast, allow you to index fields that hold documents, including the full content, up to the
maximum Index Size (page 1134) of the sub-document in the index.
Consider a collection named people that holds documents that resemble the following example document:
{"_id": ObjectId(...)
"name": "John Doe"
"address": {
"street": "Main"
"zipcode": 53511
"state": "WI"
}
}
You can create an index on the address.zipcode field, using the following specification:
db.people.ensureIndex( { "address.zipcode": 1 } )
Compound Indexes
MongoDB supports compound indexes, where a single index structure holds references to multiple fields within a
collection's documents. Consider a collection named products that holds documents that resemble the following
document:
{
    "_id": ObjectId(...),
    "item": "Banana",
    "category": ["food", "produce", "grocery"],
    "location": "4th Street Store",
    "stock": 4,
    "type": "cases",
    "arrival": Date(...)
}
If most applications' queries include the item field and a significant number of queries will also check the stock
field, you can specify a single compound index to support both of these queries:
db.products.ensureIndex( { "item": 1, "location": 1, "stock": 1 } )
Compound indexes support queries on any prefix of the fields in the index. [4] For example, MongoDB can use the
above index to support queries that select the item field and to support queries that select the item field and the
location field. The index, however, would not support queries that select the following:
- only the location field
- only the stock field
- only the location and stock fields
- only the item and stock fields
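For illustration, the first two of the following queries can use the index above, while the third cannot, because
location alone is not a prefix of the index:
db.products.find( { item: "Banana" } )
db.products.find( { item: "Banana", location: "4th Street Store" } )
// Cannot use the compound index efficiently:
db.products.find( { location: "4th Street Store" } )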
Important: You may not create compound indexes that have hashed index fields. You will receive an error if you
attempt to create a compound index that includes a hashed index (page 284).
When creating an index, the number associated with a key specifies the direction of the index. The options are 1
(ascending) and -1 (descending). Direction doesn't matter for single key indexes or for random access retrieval, but it
is important if you are doing sort queries on compound indexes.
The order of fields in a compound index is very important. In the previous example, the index will contain references
to documents sorted first by the values of the item field and, within each value of the item field, sorted by the values
of location, and then sorted by values of the stock field.
Indexes with Ascending and Descending Keys
Indexes store references to fields in either ascending or descending order. For single-field indexes, the order of
keys doesn't matter, because MongoDB can traverse the index in either direction. However, for compound indexes
(page 275), if you need to order results against two fields, sometimes you need the index fields running in opposite
order relative to each other.
To specify an index with a descending order, use the following form:
db.products.ensureIndex( { "field": -1 } )
More typically in the context of a compound index (page 275), the specification would resemble the following
prototype:
db.products.ensureIndex( { "fieldA": 1, "fieldB": -1 } )
Consider a collection of event data that includes both usernames and a timestamp. If you want to return a list of events
sorted by username, with the most recent events first, create the index with the following command:
db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )
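A query such as the following could then use this index both to match documents and to return them in sorted order
(a sketch):
db.events.find().sort( { username: 1, timestamp: -1 } )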
[4] Index prefixes are the beginning subsets of fields. For example, given the index { a: 1, b: 1, c: 1 }, both { a: 1 } and
{ a: 1, b: 1 } are prefixes of the index.
Multikey Indexes
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a multikey index.
Example
Given the following document:
{ "_id" : ObjectId("..."),
"name" : "Warm Weather",
"author" : "Steve",
"tags" : [ "weather", "hot", "record", "april" ] }
Then an index on the tags field would be a multikey index and would include these separate entries:
{ tags: "weather" }
{ tags: "hot" }
{ tags: "record" }
{ tags: "april" }
Queries could use the multikey index to select documents that match any of the above values.
Note: For hashed indexes, MongoDB collapses sub-documents and computes the hash for the entire value, but does
not support multi-key (i.e. arrays) indexes. For fields that hold sub-documents, you cannot use the index to support
queries that introspect the sub-document.
You can use multikey indexes to index fields within objects embedded in arrays, as in the following example:
Example
Consider a feedback collection with documents in the following form:
{
    "_id": ObjectId(...),
    "title": "Grocery Quality",
    "comments": [
        { author_id: ObjectId(...),
          date: Date(...),
          text: "Please expand the cheddar selection." },
        { author_id: ObjectId(...),
          date: Date(...),
          text: "Please expand the mustard selection." },
        { author_id: ObjectId(...),
          date: Date(...),
          text: "Please expand the olive selection." }
    ]
}
An index on the comments.text field would be a multikey index and would add items to the index for all of the
sub-documents in the array.
With an index such as { "comments.text": 1 }, consider the following query:
db.feedback.find( { "comments.text": "Please expand the olive selection." } )
This would select the document whose comments array contains the following sub-document:
29.1. Indexing Overview 277
MongoDB Documentation, Release 2.4.1
{ author_id: ObjectId(...),
  date: Date(...),
  text: "Please expand the olive selection." }
Compound Multikey Indexes May Only Include One Array Field
While you can create multikey compound indexes (page 275), at most one field in a compound index may hold an
array. For example, given an index on { a: 1, b: 1 }, the following documents are permissible:
{a: [1, 2], b: 1}
{a: 1, b: [1, 2]}
However, the following document is impermissible, and MongoDB cannot insert such a document into a collection
with the { a: 1, b: 1 } index:
{a: [1, 2], b: [1, 2]}
If you attempt to insert such a document, MongoDB will reject the insertion and produce an error that says cannot
index parallel arrays. MongoDB does not index parallel arrays because they require the index to include
each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and
difficult-to-maintain indexes.
Unique Indexes
A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field. To
create a unique index on the user_id field of the members collection, use the following operation in the mongo
(page 1002) shell:
db.members.ensureIndex( { "user_id": 1 }, { unique: true } )
By default, unique is false on MongoDB indexes.
If you use the unique constraint on a compound index (page 275), then MongoDB will enforce uniqueness on the
combination of values rather than on the individual values of any one key.
If a document does not have a value for the indexed field in a unique index, the index will store a null value for this
document. Because of this unique constraint, MongoDB will permit only one document without a value for the
indexed field in the collection. You can combine the unique constraint with the sparse index (page 278) option to
filter these null values from the unique index.
You may not specify a unique constraint on a hashed index (page 279).
Sparse Indexes
Sparse indexes only contain entries for documents that have the indexed field. [5] Any document that is missing the
field is not indexed. The index is "sparse" because it does not include all documents of a collection.
By contrast, non-sparse indexes contain all documents in a collection, and store null values for documents that do
not contain the indexed field. To create a sparse index on the xmpp_id field of the members collection, use the
following operation in the mongo (page 1002) shell:
db.members.ensureIndex( { "xmpp_id": 1 }, { sparse: true } )
[5] All documents that have the indexed field are indexed in a sparse index, even if that field stores a null value in some documents.
By default, sparse is false on MongoDB indexes.
Warning: Using these indexes will sometimes result in incomplete results when filtering or sorting results,
because sparse indexes are not complete for all documents in a collection.
Note: Do not confuse sparse indexes in MongoDB with block-level indexes in other databases. Think of them as
dense indexes with a specific filter.
You can combine the sparse index option with the unique index (page 278) option so that mongod (page 989) will
reject documents that have duplicate values for a field but ignore documents that do not have the key.
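For example, the following sketch enforces uniqueness of user_id only among documents that actually contain
that field:
db.members.ensureIndex( { "user_id": 1 }, { unique: true, sparse: true } )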
Hashed Index
New in version 2.4. Hashed indexes maintain entries with hashes of the values of the indexed field. The hashing
function collapses sub-documents and computes the hash for the entire value, but does not support multi-key (i.e.
arrays) indexes.
MongoDB can use the hashed index to support equality queries, but hashed indexes do not support range queries.
You may not create compound indexes that have hashed index fields or specify a unique constraint
on a hashed index; however, you can create both a hashed index and an ascending/descending
(i.e. non-hashed) index on the same field: MongoDB will use the scalar index for range queries.
Warning: hashed indexes truncate floating point numbers to 64-bit integers before hashing. For example,
a hashed index would store the same value for a field that held a value of 2.3, 2.2, and 2.9. To prevent
collisions, do not use a hashed index for floating point numbers that cannot be consistently converted to 64-bit
integers (and then back to floating point). hashed indexes do not support floating point values larger than 2^53.
Create a hashed index using an operation that resembles the following:
db.active.ensureIndex( { a: "hashed" } )
This operation creates a hashed index for the active collection on the a field.
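Because only equality matches can use the hashed index, you might also maintain an ascending index on the same
field to serve range queries; a sketch:
db.active.ensureIndex( { a: 1 } )
// Equality query: can use the hashed index.
db.active.find( { a: 100 } )
// Range query: uses the ascending (scalar) index instead.
db.active.find( { a: { $gt: 100 } } )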
29.1.3 Index Creation Options
You specify index creation options in the second argument in ensureIndex() (page 908).
The options sparse (page 278), unique (page 278), and TTL (page 281) affect the kind of index that MongoDB creates.
This section addresses background construction (page 279) and duplicate dropping (page 281), which affect how
MongoDB builds the indexes.
Background Construction
By default, creating an index is a blocking operation. Building an index on a large collection of data can take a
long time to complete. To resolve this issue, the background option allows you to continue to use your mongod
(page 989) instance during the index build.
For example, to create an index in the background on the zipcode field of the people collection, issue
the following:
db.people.ensureIndex( { zipcode: 1 }, { background: true } )
29.1. Indexing Overview 279
MongoDB Documentation, Release 2.4.1
By default, background is false for building MongoDB indexes.
You can combine the background option with other options, as in the following:
db.people.ensureIndex( { zipcode: 1 }, { background: true, sparse: true } )
Be aware of the following behaviors with background index construction:
- A mongod (page 989) instance can build more than one index in the background concurrently. Changed in
version 2.4: Before 2.4, a mongod (page 989) instance could only build one background index per database at
a time. Changed in version 2.2: Before 2.2, a single mongod (page 989) instance could only build one index at
a time.
- The indexing operation runs in the background so that other database operations can run while creating the
index. However, the mongo (page 1002) shell session or connection where you are creating the index will block
until the index build is complete. Open another connection or mongo (page 1002) instance to continue issuing
commands to the database.
- The background index operation uses an incremental approach that is slower than a normal foreground index
build. If the index is larger than the available RAM, the incremental process can take much longer than
a foreground build.
- If your application includes ensureIndex() (page 908) operations, and an index doesn't exist for other
operational concerns, building the index can have a severe impact on the performance of the database.
Make sure that your application checks for the indexes at startup using the getIndexes() (page 916) method
or the equivalent method for your driver, and terminates if the proper indexes do not exist. Always build indexes
in production instances using separate application code, during designated maintenance windows. A minimal
sketch of such a startup check follows this list.
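The following is a minimal sketch of a startup index check in the mongo shell; the collection and index name are
hypothetical:
// Abort startup if the expected index on people.zipcode is missing.
var hasZipIndex = db.people.getIndexes().some(function(idx) {
    return idx.name === "zipcode_1";
});
if (!hasZipIndex) {
    throw new Error("required index zipcode_1 is missing; aborting startup");
}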
Building Indexes on Secondaries
Background index operations on a replica set primary become foreground indexing operations on secondary members
of the set. All indexing operations on secondaries block replication.
To build large indexes on secondaries, the best approach is to restart one secondary at a time in standalone mode and
build the index. After building the index, restart as a member of the replica set, allow it to catch up with the other
members of the set, and then build the index on the next secondary. When all the secondaries have the new index, step
down the primary, restart it as a standalone, and build the index on the former primary.
Remember, the amount of time required to build the index on a secondary node must be within the window of the
oplog, so that the secondary can catch up with the primary.
Indexes on secondary members in recovering mode are always built in the foreground to allow them to catch up as
soon as possible.
See Build Indexes on Replica Sets (page 288) for a complete procedure for rebuilding indexes on secondaries.
Note: If MongoDB is building an index in the background, you cannot perform other administrative operations
involving that collection, including repairDatabase (page 872), dropping that collection (i.e.
db.collection.drop() (page 907)), and compact (page 825). These operations will return an error during
background index builds.
Queries will not use these indexes until the index build is complete.
Drop Duplicates
MongoDB cannot create a unique index (page 278) on a field that has duplicate values. To force the creation of a
unique index, you can specify the dropDups option, which will only index the first occurrence of a value for the key
and delete all documents from the collection that contain subsequent occurrences of that value.
Warning: As in all unique indexes