MongoDB Manual Master
Release 2.4.3
Contents

I Install MongoDB

1 Installation Guides
  1.1 Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux
  1.2 Install MongoDB on Ubuntu
  1.3 Install MongoDB on Debian
  1.4 Install MongoDB on Linux
  1.5 Install MongoDB on OS X
  1.6 Install MongoDB on Windows
  1.7 Install MongoDB Enterprise
  1.8 Getting Started with MongoDB

2 Release Notes
II Administration

3 Run-time Database Configuration
  3.1 Configure the Database
  3.2 Security Considerations
  3.3 Replication and Sharding Configuration
  3.4 Run Multiple Database Instances on the Same System
  3.5 Diagnostic Configurations
4 Backup and Recovery Operations for MongoDB
  4.1 Backup Strategies for MongoDB Systems
  4.2 Backup and Recovery Procedures
  4.3 Backup and Restore Sharded Clusters

5 Data Center Awareness
  5.1 Operational Segregation in MongoDB Operations and Deployments
  5.2 Tag Aware Sharding
  5.3 Administer and Manage Shard Tags
  5.4 Deploy a Geographically Distributed Replica Set
6 Journaling
  6.1 Procedures
  6.2 Journaling Internals

7 Connect to MongoDB with SSL
  7.1 Configure mongod and mongos for SSL
  7.2 SSL Configuration for Clients

8 Monitor MongoDB with SNMP
  8.1 Prerequisites
  8.2 Configure SNMP
  8.3 Troubleshooting

9 Manage mongod Processes
  9.1 Start mongod
  9.2 Stop mongod
  9.3 Sending a UNIX INT or TERM Signal
10 Rotate Log Files
  10.1 Overview
  10.2 Procedure

11 Monitoring for MongoDB
  11.1 Monitoring Tools
  11.2 Process Logging
  11.3 Diagnosing Performance Issues
  11.4 Replication and Monitoring
  11.5 Sharding and Monitoring
12 Analyze Performance of Database Operations
  12.1 Profiling Levels
  12.2 Enable Database Profiling and Set the Profiling Level
  12.3 View Profiler Data
  12.4 Profiler Overhead
13 Import and Export MongoDB Data
  13.1 Data Type Fidelity
  13.2 Data Import and Export and Backups Operations
  13.3 Human Intelligible Import/Export Formats

14 Linux ulimit Settings
  14.1 Resource Utilization
  14.2 Review and Set Resource Limits
  14.3 Recommended Settings

15 Production Notes
  15.1 Backups
  15.2 Networking
  15.3 MongoDB on Linux
  15.4 Readahead
  15.5 MongoDB on Virtual Environments
  15.6 Disk and Storage Systems
  15.7 Hardware Requirements and Limitations
  15.8 Performance Monitoring
  15.9 Production Checklist
16 Use Database Commands
  16.1 Database Command Form
  16.2 Issue Commands
  16.3 admin Database Commands
  16.4 Command Responses

17 MongoDB Tutorials
  17.1 Getting Started
  17.2 Administration
  17.3 Development Patterns
  17.4 Application Development
  17.5 Text Search Patterns
  17.6 Data Modeling Patterns
  17.7 MongoDB Use Case Studies
III Security

18 Security Concepts and Strategies
  18.1 Security Practices and Management
  18.2 Access Control

19 Tutorials
  19.1 Network Security
  19.2 Access Control

20 Reference
  20.1 User Privilege Roles in MongoDB
  20.2 system.users Privilege Documents
  20.3 Password Hashing Insecurity
IV Core MongoDB Operations (CRUD)

21 Read and Write Operations in MongoDB
  21.1 Read Operations
  21.2 Write Operations
  21.3 Write Concern Reference

22 Fundamental Concepts for Document Databases
  22.1 BSON Documents
  22.2 ObjectId
  22.3 GridFS
  22.4 Database References

23 CRUD Operations for MongoDB
  23.1 Create
  23.2 Read
  23.3 Update
  23.4 Delete
V Data Modeling

24 Background
  24.1 Data Modeling Considerations for MongoDB Applications

25 Data Modeling Patterns
  25.1 Model Embedded One-to-One Relationships Between Documents
  25.2 Model Embedded One-to-Many Relationships Between Documents
  25.3 Model Referenced One-to-Many Relationships Between Documents
  25.4 Model Data for Atomic Operations
  25.5 Model Tree Structures with Parent References
  25.6 Model Tree Structures with Child References
  25.7 Model Tree Structures with an Array of Ancestors
  25.8 Model Tree Structures with Materialized Paths
  25.9 Model Tree Structures with Nested Sets
  25.10 Model Data to Support Keyword Search
VI Aggregation

26 Aggregation Framework
  26.1 Overview
  26.2 Framework Components
  26.3 Use
  26.4 Optimizing Performance
  26.5 Sharded Operation
  26.6 Limitations
27 Aggregation Framework Examples
  27.1 Requirements
  27.2 Aggregations using the Zip Code Data Set
  27.3 Aggregation with User Preference Data

28 Aggregation Framework Reference
  28.1 $add (aggregation)
  28.2 $addToSet (aggregation)
  28.3 $and (aggregation)
  28.4 $avg (aggregation)
  28.5 $cmp (aggregation)
  28.6 $concat (aggregation)
  28.7 $cond (aggregation)
  28.8 $dayOfMonth (aggregation)
  28.9 $dayOfWeek (aggregation)
  28.10 $dayOfYear (aggregation)
  28.11 $divide (aggregation)
  28.12 $eq (aggregation)
  28.13 $first (aggregation)
  28.14 $geoNear (aggregation)
  28.15 $group (aggregation)
  28.16 $gt (aggregation)
  28.17 $gte (aggregation)
  28.18 $hour (aggregation)
  28.19 $ifNull (aggregation)
  28.20 $last (aggregation)
  28.21 $limit (aggregation)
  28.22 $lt (aggregation)
  28.23 $lte (aggregation)
  28.24 $match (aggregation)
  28.25 $max (aggregation)
  28.26 $millisecond (aggregation)
  28.27 $min (aggregation)
  28.28 $minute (aggregation)
  28.29 $mod (aggregation)
  28.30 $month (aggregation)
  28.31 $multiply (aggregation)
  28.32 $ne (aggregation)
  28.33 $not (aggregation)
  28.34 $or (aggregation)
  28.35 $project (aggregation)
  28.36 $push (aggregation)
  28.37 $second (aggregation)
  28.38 $skip (aggregation)
  28.39 $sort (aggregation)
  28.40 $strcasecmp (aggregation)
  28.41 $substr (aggregation)
  28.42 $subtract (aggregation)
  28.43 $sum (aggregation)
  28.44 $toLower (aggregation)
  28.45 $toUpper (aggregation)
  28.46 $unwind (aggregation)
  28.47 $week (aggregation)
  28.48 $year (aggregation)
  28.49 Pipeline
  28.50 Expressions
29 SQL to Aggregation Framework Mapping Chart
  29.1 Examples

30 Map-Reduce
  30.1 Examples
  30.2 Temporary Collection
  30.3 Concurrency
  30.4 Sharded Cluster
  30.5 Troubleshooting Map-Reduce Operations
31 Simple Aggregation Methods and Commands
  31.1 Count
  31.2 Distinct
  31.3 Group

VII Indexes

32 Index Concepts
  32.1 Indexing Overview

33 Indexing Strategies for Applications
  33.1 Indexing Strategies

34 Index Tutorials
  34.1 Indexing Operations

35 Geospatial Indexing
  35.1 Geospatial Indexes and Queries

36 Text Indexing
  36.1 Text Search
VIII Replication

37 Replica Set Use and Operation
  37.1 Replica Set Fundamental Concepts
  37.2 Replica Set Architectures and Deployment Patterns
  37.3 Replica Set Considerations and Behaviors for Applications and Development
  37.4 Replica Set Internals and Behaviors
  37.5 Master Slave Replication

38 Replica Set Tutorials and Procedures
  38.1 Replica Set Administration

39 Replica Set Reference Material
  39.1 Replica Set Configuration
  39.2 Replica Set Commands
  39.3 Replica Set Features and Version Compatibility
IX Sharding

40 Sharding Concepts
  40.1 Sharded Cluster Overview
  40.2 Sharded Cluster Architectures
  40.3 Query Routing in Sharded Clusters
  40.4 Security Practices for Sharded Clusters
  40.5 Sharded Cluster Internals

41 Administration
  41.1 Sharded Cluster Administration

42 Reference
  42.1 Sharding Commands
  42.2 Config Database
X Application Development

43 Development Considerations
  43.1 MongoDB Drivers and Client Libraries
  43.2 Optimization Strategies for MongoDB
  43.3 Capped Collections
  43.4 Server-side JavaScript
  43.5 Store a JavaScript Function on the Server

44 Application Design Patterns for MongoDB
  44.1 Perform Two Phase Commits
  44.2 Create Tailable Cursor
  44.3 Isolate Sequence of Operations
  44.4 Create an Auto-Incrementing Sequence Field
  44.5 Limit Number of Elements in an Array after an Update
  44.6 Expire Data from Collections by Setting TTL
XI The mongo Shell

45 Getting Started with the mongo Shell
  45.1 Start the mongo Shell
  45.2 Executing Queries
  45.3 Print
  45.4 Use a Custom Prompt
  45.5 Use an External Editor in the mongo Shell
  45.6 Exit the Shell
46 Data Types in the mongo Shell
  46.1 Date
  46.2 ObjectId
  46.3 NumberLong
  46.4 NumberInt
47 Access the mongo Shell Help Information
  47.1 Command Line Help
  47.2 Shell Help
  47.3 Database Help
  47.4 Collection Help
  47.5 Cursor Help
  47.6 Type Help
48 Write Scripts for the mongo Shell
  48.1 Opening New Connections
  48.2 Scripting

49 mongo Shell Quick Reference
  49.1 mongo Shell Command History
  49.2 Command Line Options
  49.3 Command Helpers
  49.4 Basic Shell JavaScript Operations
  49.5 Keyboard Shortcuts
  49.6 Queries
  49.7 Error Checking Methods
  49.8 Administrative Command Helpers
  49.9 Opening Additional Connections
  49.10 Miscellaneous
  49.11 Additional Resources
XII Use Cases

50 Operational Intelligence
  50.1 Storing Log Data
  50.2 Pre-Aggregated Reports
  50.3 Hierarchical Aggregation

51 Product Data Management
  51.1 Product Catalog
  51.2 Inventory Management
  51.3 Category Hierarchy

52 Content Management Systems
  52.1 Metadata and Asset Management
  52.2 Storing Comments
53 Python Application Development
  53.1 Write a Tumblelog Application with Django MongoDB Engine
  53.2 Write a Tumblelog Application with Flask and MongoEngine

XIII Frequently Asked Questions

54 FAQ: MongoDB Fundamentals
  54.1 What kind of database is MongoDB?
  54.2 Do MongoDB databases have tables?
  54.3 Do MongoDB databases have schemas?
  54.4 What languages can I use to work with MongoDB?
  54.5 Does MongoDB support SQL?
  54.6 What are typical uses for MongoDB?
  54.7 Does MongoDB support transactions?
  54.8 Does MongoDB require a lot of RAM?
  54.9 How do I configure the cache size?
  54.10 Does MongoDB require a separate caching layer for application-level caching?
  54.11 Does MongoDB handle caching?
  54.12 Are writes written to disk immediately, or lazily?
  54.13 What language is MongoDB written in?
  54.14 What are the limitations of 32-bit versions of MongoDB?

55 FAQ: MongoDB for Application Developers
  55.1 What is a namespace in MongoDB?
  55.2 How do you copy all objects from one collection to another?
  55.3 If you remove a document, does MongoDB remove it from disk?
  55.4 When does MongoDB write updates to disk?
  55.5 How do I do transactions and locking in MongoDB?
  55.6 How do you aggregate data with MongoDB?
  55.7 Why does MongoDB log so many "Connection Accepted" events?
  55.8 Does MongoDB run on Amazon EBS?
  55.9 Why are MongoDB's data files so large?
  55.10 How do I optimize storage use for small documents?
  55.11 When should I use GridFS?
  55.12 How does MongoDB address SQL or Query injection?
  55.13 How does MongoDB provide concurrency?
  55.14 What is the compare order for BSON types?
  55.15 How do I query for fields that have null values?
  55.16 Are there any restrictions on the names of Collections?
  55.17 How do I isolate cursors from intervening write operations?
  55.18 When should I embed documents within other documents?
  55.19 Can I manually pad documents to prevent moves during updates?
56 FAQ: The mongo Shell
56.1 How can I enter multi-line operations in the mongo shell?
56.2 How can I access different databases temporarily?
56.3 Does the mongo shell support tab completion and other keyboard shortcuts?
56.4 How can I customize the mongo shell prompt?
56.5 Can I edit long shell operations with an external text editor?
57 FAQ: Concurrency
57.1 What type of locking does MongoDB use?
57.2 How granular are locks in MongoDB?
57.3 How do I see the status of locks on my mongod instances?
57.4 Does a read or write operation ever yield the lock?
57.5 Which operations lock the database?
57.6 Which administrative commands lock the database?
57.7 Does a MongoDB operation ever lock more than one database?
57.8 How does sharding affect concurrency?
57.9 How does concurrency affect a replica set primary?
57.10 How does concurrency affect secondaries?
57.11 What kind of concurrency does MongoDB provide for JavaScript operations?
58 FAQ: Sharding with MongoDB
58.1 Is sharding appropriate for a new deployment?
58.2 How does sharding work with replication?
58.3 Can I change the shard key after sharding a collection?
58.4 What happens to unsharded collections in sharded databases?
58.5 How does MongoDB distribute data across shards?
58.6 What happens if a client updates a document in a chunk during a migration?
58.7 What happens to queries if a shard is inaccessible or slow?
58.8 How does MongoDB distribute queries among shards?
58.9 How does MongoDB sort queries in sharded environments?
58.10 How does MongoDB ensure unique _id field values when using a shard key other than _id?
58.11 I've enabled sharding and added a second shard, but all the data is still on one server. Why?
58.12 Is it safe to remove old files in the moveChunk directory?
58.13 How does mongos use connections?
58.14 Why does mongos hold connections open?
58.15 Where does MongoDB report on connections used by mongos?
58.16 What does writebacklisten in the log mean?
58.17 How should administrators deal with failed migrations?
58.18 What is the process for moving, renaming, or changing the number of config servers?
58.19 When do the mongos servers detect config server changes?
58.20 Is it possible to quickly update mongos servers after updating a replica set configuration?
58.21 What does the maxConns setting on mongos do?
58.22 How do indexes impact queries in sharded systems?
58.23 Can shard keys be randomly generated?
58.24 Can shard keys have a non-uniform distribution of values?
58.25 Can you shard on the _id field?
58.26 Can shard key be in ascending order, like dates or timestamps?
58.27 What do moveChunk commit failed errors mean?
58.28 How does draining a shard affect the balancing of uneven chunk distribution?
59 FAQ: Replica Sets and Replication in MongoDB
59.1 What kinds of replication does MongoDB support?
59.2 What do the terms primary and master mean?
59.3 What do the terms secondary and slave mean?
59.4 How long does replica set failover take?
59.5 Does replication work over the Internet and WAN connections?
59.6 Can MongoDB replicate over a noisy connection?
59.7 What is the preferred replication method: master/slave or replica sets?
59.8 What is the preferred replication method: replica sets or replica pairs?
59.9 Why use journaling if replication already provides data redundancy?
59.10 Are write operations durable if write concern does not acknowledge writes?
59.11 How many arbiters do replica sets need?
59.12 What information do arbiters exchange with the rest of the replica set?
59.13 Which members of a replica set vote in elections?
59.14 Do hidden members vote in replica set elections?
59.15 Is it normal for replica set members to use different amounts of disk space?
60 FAQ: MongoDB Storage
60.1 What are memory mapped files?
60.2 How do memory mapped files work?
60.3 How does MongoDB work with memory mapped files?
60.4 What are page faults?
60.5 What is the difference between soft and hard page faults?
60.6 What tools can I use to investigate storage use in MongoDB?
60.7 What is the working set?
60.8 Why are the files in my data directory larger than the data in my database?
60.9 How can I check the size of a collection?
60.10 How can I check the size of indexes?
60.11 How do I know when the server runs out of disk space?
61 FAQ: Indexes
61.1 Should you run ensureIndex() after every insert?
61.2 How do you know what indexes exist in a collection?
61.3 How do you determine the size of an index?
61.4 What happens if an index does not fit into RAM?
61.5 How do you know what index a query used?
61.6 How do you determine what fields to index?
61.7 How do write operations affect indexes?
61.8 Will building a large index affect database performance?
61.9 Can I use index keys to constrain query matches?
61.10 Using $ne and $nin in a query is slow. Why?
61.11 Can I use a multi-key index to support a query for a whole array?
61.12 How can I effectively use indexes strategy for attribute lookups?
62 FAQ: MongoDB Diagnostics
62.1 Where can I find information about a mongod process that stopped running unexpectedly?
62.2 Does TCP keepalive time affect sharded clusters and replica sets?
62.3 Memory Diagnostics
62.4 Sharded Cluster Diagnostics
Part XIV Reference

63 MongoDB Interface
63.1 Query, Update and Projection Operators
63.2 Database Commands
63.3 mongo Shell Methods
63.4 SQL to MongoDB Mapping Chart
64 Architecture and Components
64.1 MongoDB Package Components
65 Internal Metadata and Reporting
65.1 The local Database
65.2 System Collections
65.3 Database Profiler Output
65.4 Exit Codes and Statuses
Connection String URI Format
MongoDB Extended JSON
Database References
GridFS Reference
Glossary
Part XV Release Notes

67 Current Stable Release
67.1 Release Notes for MongoDB 2.4
68 Previous Stable Releases
68.1 Release Notes for MongoDB 2.2
68.2 Release Notes for MongoDB 2.0
68.3 Release Notes for MongoDB 1.8
68.4 Release Notes for MongoDB 1.6
68.5 Release Notes for MongoDB 1.4
68.6 Release Notes for MongoDB 1.2.x
69 Other MongoDB Release Notes
69.1 Default Write Concern Change
70 Version Numbers

Part XVI

71 License
72 Editions
73 Version and Revisions
74 Report an Issue or Make a Change Request
75 Contribute to the Documentation
75.1 MongoDB Manual Translation
75.2 About the Documentation Process
Part I
Install MongoDB
CHAPTER 1
Installation Guides
MongoDB runs on most platforms and supports both 32-bit and 64-bit architectures. 10gen, the makers of MongoDB, provide both binaries and packages. Choose your platform below:
mongo-10gen This package contains all MongoDB tools from the latest stable release. Additionally, you can use this package to install tools from a previous release (page 4) of MongoDB. Install this package on all production MongoDB hosts and optionally on other systems from which you may need to administer MongoDB systems.
If you are running a 32-bit system, which isn't recommended for production deployments, place the following configuration in the /etc/yum.repos.d/10gen.repo file:
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
enabled=1
Install Packages Issue the following command (as root or with sudo) to install the latest stable version of MongoDB and the associated tools:
yum install mongo-10gen mongo-10gen-server
When this command completes, you have successfully installed MongoDB! Manage Installed Versions You can use the mongo-10gen and mongo-10gen-server packages to install previous releases of MongoDB. To install a specific release, append the version number, as in the following example:
yum install mongo-10gen-2.2.3 mongo-10gen-server-2.2.3
This installs the mongo-10gen and mongo-10gen-server packages with the 2.2.3 release. You can specify any available version of MongoDB; however, yum will upgrade the mongo-10gen and mongo-10gen-server packages when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades. To pin a package, add the following line to your /etc/yum.conf file:
exclude=mongo-10gen,mongo-10gen-server
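This pin can also be applied from the shell. The following is a sketch that adds the line idempotently; it is shown against a scratch file so it can be run safely, and on a real system you would point YUM_CONF at /etc/yum.conf and run it as root:

```shell
# Create a scratch file standing in for /etc/yum.conf.
YUM_CONF=$(mktemp)

# Append the exclude line only if it is not already present.
grep -q '^exclude=mongo-10gen,mongo-10gen-server' "$YUM_CONF" || \
  echo 'exclude=mongo-10gen,mongo-10gen-server' >> "$YUM_CONF"

# Show the resulting pin.
grep '^exclude=' "$YUM_CONF"
```

Because of the grep guard, running the snippet repeatedly never duplicates the exclude line.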
Start MongoDB Start the mongod (page 1025) process by issuing the following command (as root, or with sudo):
service mongod start
You can verify that the mongod (page 1025) process has started successfully by checking the contents of the log file at /var/log/mongo/mongod.log. You may optionally ensure that MongoDB will start following a system reboot by issuing the following command (with root privileges):
chkconfig mongod on
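The log check described above is easy to script. The following sketch greps for the startup banner; it uses a scratch file seeded with a representative log line, where on a real system you would point LOG at /var/log/mongo/mongod.log:

```shell
# Scratch file standing in for the real mongod log.
LOG=$(mktemp)
echo "[initandlisten] waiting for connections on port 27017" > "$LOG"

# mongod prints "waiting for connections" once it is ready to serve.
if grep -q "waiting for connections" "$LOG"; then
  echo "mongod started successfully"
fi
```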
Stop MongoDB Stop the mongod (page 1025) process by issuing the following command (as root, or with sudo):
service mongod stop
Restart MongoDB You can restart the mongod (page 1025) process by issuing the following command (as root, or with sudo):
service mongod restart
Follow the state of this process by watching the /var/log/mongo/mongod.log file for errors or important messages from the server.
Control mongos As of the current release, there are no control scripts for mongos (page 1036). mongos (page 1036) is only used in sharding deployments and typically does not run on the same systems where mongod (page 1025) runs. You can use the mongodb script referenced above to derive your own mongos (page 1036) control script. SELinux Considerations You must configure SELinux to allow MongoDB to start on Fedora systems. Administrators have two options: enable access to the relevant ports (e.g. 27017) for SELinux, as described in Interfaces and Port Numbers (page 134), which covers MongoDB's default ports; or disable SELinux entirely, which requires a system reboot and may have larger implications for your deployment.
This will connect to the database running on the localhost interface by default. At the mongo (page 1040) prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that document.
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921)
Install MongoDB on Linux (page 11) Install MongoDB on OS X (page 13) Install MongoDB on Windows (page 16)
Install Packages Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions. Manage Installed Versions You can use the mongodb-10gen package to install previous versions of MongoDB. To install a specific release, append the version number to the package name, as in the following example:
apt-get install mongodb-10gen=2.2.3
This will install the 2.2.3 release of MongoDB. You can specify any available version of MongoDB; however, apt-get will upgrade the mongodb-10gen package when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades. To pin the version of MongoDB at the currently installed version, issue the following command at the system prompt:
You can verify that mongod (page 1025) has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log. Stopping MongoDB As needed, you may stop the mongod (page 1025) process by issuing the following command:
sudo service mongodb stop
Restarting MongoDB You may restart the mongod (page 1025) process by issuing the following command:
sudo service mongodb restart
Controlling mongos As of the current release, there are no control scripts for mongos (page 1036). mongos (page 1036) is only used in sharding deployments and typically does not run on the same systems where mongod (page 1025) runs. You can use the mongodb script referenced above to derive your own mongos (page 1036) control script.
This will connect to the database running on the localhost interface by default. At the mongo (page 1040) prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that document.
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921)
Install Packages Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
When this command completes, you have successfully installed MongoDB! Manage Installed Versions You can use the mongodb-10gen package to install previous versions of MongoDB. To install a specific release, append the version number to the package name, as in the following example:
apt-get install mongodb-10gen=2.2.3
This will install the 2.2.3 release of MongoDB. You can specify any available version of MongoDB; however, apt-get will upgrade the mongodb-10gen package when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades. To pin the version of MongoDB at the currently installed version, issue the following command at the system prompt:
echo "mongodb-10gen hold" | dpkg --set-selections
You can verify that mongod (page 1025) has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log. Stopping MongoDB Issue the following command to stop mongod (page 1025):
sudo /etc/init.d/mongodb stop
Restarting MongoDB Issue the following command to restart mongod (page 1025):
sudo /etc/init.d/mongodb restart
Controlling mongos As of the current release, there are no control scripts for mongos (page 1036). mongos (page 1036) is only used in sharding deployments and typically does not run on the same systems where mongod (page 1025) runs. You can use the mongodb script referenced above to derive your own mongos (page 1036) control script.
This will connect to the database running on the localhost interface by default. At the mongo (page 1040) prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that document.
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921)
usage guide. See also: Additional installation tutorials: Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux (page 3) Install MongoDB on Ubuntu (page 6) Install MongoDB on Debian (page 9) Install MongoDB on OS X (page 13) Install MongoDB on Windows (page 16)
If you need to run the 32-bit version, use the following command.
curl http://downloads.mongodb.org/linux/mongodb-linux-i686-2.4.3.tgz > mongodb.tgz
Once you've downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongodb.tgz
Optional You may use the following command to copy the extracted folder into a more generic location.
cp -R -n mongodb-linux-????-??-??/ mongodb
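The download-extract-copy sequence can be sketched end to end. In the following, the archive is a scratch stand-in for the real MongoDB download, and the directory name is illustrative:

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in archive shaped like a MongoDB release tarball.
mkdir -p mongodb-linux-x86_64-2.4.3/bin
touch mongodb-linux-x86_64-2.4.3/bin/mongod
tar -czf mongodb.tgz mongodb-linux-x86_64-2.4.3
rm -r mongodb-linux-x86_64-2.4.3

# Extract the archive and copy the result to a generic location,
# mirroring the tar and cp commands shown above.
tar -zxf mongodb.tgz
cp -R -n mongodb-linux-x86_64-2.4.3/ mongodb
ls mongodb/bin
```

The -n (no-clobber) flag on cp ensures an existing mongodb directory is never overwritten by a re-run.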
You can find the mongod (page 1025) binary, and the binaries for all of the associated MongoDB utilities, in the bin/ directory within the extracted directory. Using MongoDB Before you start mongod (page 1025) for the first time, you will need to create the data directory. By default, mongod (page 1025) writes data to the /data/db/ directory. To create this directory, use the following command:
mkdir -p /data/db
Note: Ensure that the system account that will run the mongod (page 1025) process has read and write permissions to this directory. If mongod (page 1025) runs under the mongodb user account, issue the following command to change the owner of this folder:

chown mongodb /data/db
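Before starting mongod, you can confirm that the data directory exists and is writable. The following quick sketch uses a scratch path in place of /data/db; on a real deployment you would also chown the directory to the service account:

```shell
# Scratch path standing in for /data/db.
DBPATH=$(mktemp -d)/db
mkdir -p "$DBPATH"

# Confirm the directory exists and the current user can write to it.
[ -d "$DBPATH" ] && [ -w "$DBPATH" ] && echo "data directory is writable"
```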
If you use an alternate location for your data directory, ensure that this user can write to your chosen data path. You can specify, and create, an alternate path using the --dbpath (page 1027) option to mongod (page 1025) and the above command. The 10gen builds of MongoDB contain no control scripts or method to control the mongod (page 1025) process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin or /usr/bin directory for easier use. For testing purposes, you can start a mongod (page 1025) directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf
Note: The above command assumes that the mongod (page 1025) binary is accessible via your system's search path, and that you have created a default configuration file located at /etc/mongod.conf. Among the tools included with this MongoDB distribution is the mongo (page 1040) shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt:
./bin/mongo
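The note above assumes a configuration file at /etc/mongod.conf. For reference, a minimal file of that kind might look like the following; these are 2.4-era INI-style settings, and the paths are illustrative and should be adjusted for your system:

```
dbpath = /data/db
logpath = /var/log/mongodb/mongod.log
logappend = true
port = 27017
```

In general, options accepted on the mongod command line can also be set in the configuration file using this key = value form.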
Note: The ./bin/mongo command assumes that the mongo (page 1040) binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file. This will connect to the database running on the localhost interface by default. At the mongo (page 1040) prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921)
1.5.1 Synopsis
This tutorial outlines the basic installation process for deploying MongoDB on Macintosh OS X systems. This tutorial provides two main methods of installing the MongoDB server (i.e. mongod (page 1025)) and associated tools: first using the community package management tools, and second using builds of MongoDB provided by 10gen. See also:
Additional installation tutorials: Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux (page 3) Install MongoDB on Ubuntu (page 6) Install MongoDB on Debian (page 9) Install MongoDB on Linux (page 11) Install MongoDB on Windows (page 16)
Use the following command to install the MongoDB package into your Homebrew system.
brew install mongodb
Later, if you need to upgrade MongoDB, you can issue the following sequence of commands to update the MongoDB installation on your system:
brew update brew upgrade mongodb
MacPorts MacPorts distributes build scripts that allow you to easily build packages and their dependencies on your own system. The compilation process can take a significant period of time depending on your system's capabilities and existing dependencies. Issue the following command in the system shell:
port install mongodb
Using MongoDB from Homebrew and MacPorts The packages installed with Homebrew and MacPorts contain no control scripts or interaction with the system's process manager. If you have configured Homebrew and MacPorts correctly, including setting your PATH, the MongoDB applications and utilities will be accessible from the system shell. Start the mongod (page 1025) process in a terminal (for testing or development) or using a process management tool.
mongod
Then open the mongo (page 1040) shell by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo (page 1040) prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that record.
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921)
Note: The mongod (page 1025) process will not run on older Macintosh computers with PowerPC (i.e. non-Intel) processors. Once you've downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongodb.tgz
Optional You may use the following command to move the extracted folder into a more generic location.
mv -n mongodb-osx-[platform]-[version]/ /path/to/new/location/
Replace [platform] with i386 or x86_64 depending on your system and the version you downloaded, and [version] with 2.4 or the version of MongoDB that you are installing. You can find the mongod (page 1025) binary, and the binaries for all of the associated MongoDB utilities, in the bin/ directory within the archive. Using MongoDB from 10gen Builds Before you start mongod (page 1025) for the first time, you will need to create the data directory. By default, mongod (page 1025) writes data to the /data/db/ directory. To create this directory and set the appropriate permissions, use the following commands:
sudo mkdir -p /data/db
sudo chown `id -u` /data/db
You can specify an alternate path for data files using the --dbpath (page 1027) option to mongod (page 1025). The 10gen builds of MongoDB contain no control scripts or method to control the mongod (page 1025) process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin directory for easier use. For testing purposes, you can start a mongod (page 1025) directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf
Note: This command assumes that the mongod (page 1025) binary is accessible via your system's search path, and that you have created a default configuration file located at /etc/mongod.conf. Among the tools included with this MongoDB distribution is the mongo (page 1040) shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt from inside of the directory where you extracted mongo (page 1040):
./bin/mongo
Note: The ./bin/mongo command assumes that the mongo (page 1040) binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file. This will connect to the database running on the localhost interface by default. At the mongo (page 1040) prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921)
1.6.2 Procedure
Important: If you are running any edition of Windows Server 2008 R2 or Windows 7, please install a hotfix to resolve an issue with memory mapped files on Windows.
Download MongoDB for Windows Download the latest production release of MongoDB from the MongoDB downloads page. There are three builds of MongoDB for Windows: MongoDB for Windows Server 2008 R2 edition (i.e. 2008R2) only runs on Windows Server 2008 R2, Windows 7 64-bit, and newer versions of Windows. This build takes advantage of recent enhancements to the Windows Platform and cannot operate on older versions of Windows. MongoDB for Windows 64-bit runs on any 64-bit version of Windows newer than Windows XP, including Windows Server 2008 R2 and Windows 7 64-bit. MongoDB for Windows 32-bit runs on any 32-bit version of Windows newer than Windows XP. 32-bit versions of MongoDB are only intended for older systems and for use in testing and development systems. Changed in version 2.2: MongoDB does not support Windows XP. Please use a more recent version of Windows to use more recent releases of MongoDB. Note: Always download the correct version of MongoDB for your Windows system. The 64-bit versions of MongoDB will not work with 32-bit Windows. 32-bit versions of MongoDB are suitable only for testing and evaluation purposes and only support databases smaller than 2GB. You can find the architecture of your Windows installation using the following command in the Command Prompt:
wmic os get osarchitecture
In Windows Explorer, find the MongoDB download file, typically in the default Downloads directory. Extract the archive to C:\ by right clicking on the archive and selecting Extract All and browsing to C:\. Note: The folder name will be either:
C:\mongodb-win32-i386-[version]
Or:
C:\mongodb-win32-x86_64-[version]
Set up the Environment
Start the Command Prompt by selecting the Start Menu, then All Programs, then Accessories, then right click Command Prompt, and select Run as Administrator from the popup menu. In the Command Prompt, issue the following commands:
cd \
move C:\mongodb-win32-* C:\mongodb
Note: MongoDB is self-contained and does not have any other system dependencies. You can run MongoDB from any folder you choose. You may install MongoDB in any directory (e.g. D:\test\mongodb). MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db. Create this folder using the Command Prompt. Issue the following command sequence:
md data
md data\db
Note: You may specify an alternate path for \data\db with the dbpath (page 1085) setting for mongod.exe (page 1045), as in the following example:
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
If your path includes spaces, enclose the entire path in double quotations, for example:
C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongo db data"
This will start the main MongoDB database process. The waiting for connections message in the console output indicates that the mongod.exe process is running successfully. Note: Depending on the security level of your system, Windows will issue a Security Alert dialog box about blocking some features of C:\mongodb\bin\mongod.exe from communicating on networks. All users should select Private Networks, such as my home or work network and click Allow access. For additional information on security and MongoDB, please read the Security Practices and Management (page 133) page. Warning: Do not allow mongod.exe (page 1045) to be accessible to public networks without running in Secure Mode (i.e. auth (page 1085)). MongoDB is designed to be run in trusted environments and the database does not enable authentication or Secure Mode by default. Connect to MongoDB using the mongo.exe (page 1040) shell. Open another Command Prompt and issue the following command:
C:\mongodb\bin\mongo.exe
Note: Executing the command start C:\mongodb\bin\mongo.exe will automatically start the mongo.exe shell in a separate Command Prompt window. The mongo.exe (page 1040) shell will connect to mongod.exe (page 1045) running on the localhost interface and port 27017 by default. At the mongo.exe (page 1040) prompt, issue the following two commands to insert a record in the test collection of the default test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921). If you want to develop applications using .NET, see the documentation of C# and MongoDB for more information.
Set up MongoDB as a Windows Service, so that the database will start automatically following each reboot cycle. Note: mongod.exe (page 1045) added support for running as a Windows service in version 2.0, and mongos.exe (page 1046) added support for running as a Windows Service in version 2.1.1.
Configure the System
You should specify two options when running MongoDB as a Windows Service: a path for the log output (i.e. logpath (page 1084)) and a configuration file (page 1082). 1. Create a specific directory for MongoDB log files:
md C:\mongodb\log
2. Create a configuration file for the logpath (page 1084) option for MongoDB in the Command Prompt by issuing this command:
echo logpath=C:\mongodb\log\mongo.log > C:\mongodb\mongod.cfg
While these steps are optional, creating a specific location for log files and using the configuration file are good practice. Note: Consider setting the logappend (page 1084) option. If you do not, mongod.exe (page 1045) will delete the contents of the existing log file when starting. Changed in version 2.2: The default logpath (page 1084) and logappend (page 1084) behavior changed in the 2.2 release.
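Taken together, a C:\mongodb\mongod.cfg along these lines would satisfy the service requirements. The logappend line is an illustrative addition following the note above; the echo command shown earlier creates only the logpath line:

```
logpath=C:\mongodb\log\mongo.log
logappend=true
```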
Install and Run the MongoDB Service
Run all of the following commands in Command Prompt with Administrative Privileges: 1. To install the MongoDB service:
C:\mongodb\bin\mongod.exe --config C:\mongodb\mongod.cfg --install
Modify the path to the mongod.cfg file as needed. For the --install (page 1045) option to succeed, you must specify a logpath (page 1084) setting or the --logpath (page 1026) run-time option. 2. To run the MongoDB service:
net start MongoDB
Note: If you wish to use an alternate path for your dbpath (page 1085), specify it in the config file (e.g. C:\mongodb\mongod.cfg) that you specified in the --install (page 1045) operation. You may also specify --dbpath (page 1027) on the command line; however, always prefer the configuration file. If the dbpath (page 1085) directory does not exist, mongod.exe (page 1045) will not be able to start. The default value for dbpath (page 1085) is \data\db.
Red Hat Enterprise Linux 6.x series and Amazon Linux AMI require libssl, libgsasl7, net-snmp, net-snmp-libs, and net-snmp-utils. To download libgsasl you must enable the EPEL repository by issuing the following sequence of commands to add and update the system repositories:
sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo yum update -y
When you have installed and updated the EPEL repositories, issue the following command to install these packages:
sudo yum install libssl net-snmp net-snmp-libs net-snmp-utils libgsasl
SUSE Enterprise Linux requires libopenssl0_9_8, libsnmp15, slessp1-libsnmp15, and snmp-mibs. Issue a command such as the following to install these packages:
sudo zypper install libopenssl0_9_8 libsnmp15 slessp1-libsnmp15 snmp-mibs
Note: For the 2.4 release, the MongoDB Enterprise for SUSE requires libgsasl which is not available in the default repositories for SUSE.
Download and Extract Package
Use the sequence of commands below to download and extract MongoDB Enterprise packages appropriate for your distribution:
Ubuntu 12.04
Note: Ensure that the system account that will run the mongod (page 1025) process has read and write permissions to this directory. If mongod (page 1025) runs under the mongodb user account, issue the following command to change the owner of this folder:
chown mongodb /data/db
If you use an alternate location for your data directory, ensure that this user can write to your chosen data path. You can specify, and create, an alternate path using the --dbpath (page 1027) option to mongod (page 1025) and the above command. The 10gen builds of MongoDB contain no control scripts or method to control the mongod (page 1025) process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin or /usr/bin directory for easier use. For testing purposes, you can start a mongod (page 1025) directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf
or:

mongod -f /etc/mongod.conf
Note: The above command assumes that the mongod (page 1025) binary is accessible via your system's search path, and that you have created a default configuration file located at /etc/mongod.conf. Among the tools included with this MongoDB distribution is the mongo (page 1040) shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt:
./bin/mongo
Note: The ./bin/mongo command assumes that the mongo (page 1040) binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file. This will connect to the database running on the localhost interface by default. At the mongo (page 1040) prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See also: mongo (page 1040) and mongo Shell Methods (page 921)
This tutorial addresses the following aspects of MongoDB use:
- Connect to a Database (page 23)
- Connect to a mongod (page 1025) (page 23)
- Select a Database (page 23)
- Display mongo Help (page 24)
- Create a Collection and Insert Documents (page 24)
- Insert Individual Documents (page 24)
- Insert Multiple Documents Using a For Loop (page 25)
- Working with the Cursor (page 26)
- Iterate over the Cursor with a Loop (page 26)
- Use Array Operations with the Cursor (page 26)
- Query for Specific Documents (page 27)
- Return a Single Document from a Collection (page 28)
- Limit the Number of Documents in the Result Set (page 28)
- Next Steps with MongoDB (page 29)
By default, mongo (page 1040) looks for a database server listening on port 27017 on the localhost interface. To connect to a server on a different port or interface, use the --port (page 1041) and --host (page 1041) options.

Select a Database
After starting the mongo (page 1040) shell your session will use the test database for context, by default. At any time, issue the following operation at the mongo (page 1040) prompt to report the current database:
db
db returns the name of the current database. 1. From the mongo (page 1040) shell, display the list of databases with the following operation:
show dbs
3. Confirm that your session has the mydb database as context, using the db operation, which returns the name of the current database as follows:
db
At this point, if you issue the show dbs operation again, it will not include mydb, because MongoDB will not create a database until you insert data into that database. The Create a Collection and Insert Documents (page 24) section describes the process for inserting data. New in version 2.4: show databases also returns a list of databases.

Display mongo Help
At any point you can access help for the mongo (page 1040) shell using the following operation:
help
Furthermore, you can append the .help() method to some JavaScript methods, any cursor object, as well as the db and db.collection objects to return additional help information.
2. If mongo (page 1040) does not return mydb for the previous operation, set the context to the mydb database with the following operation:
use mydb
3. Create two documents, named j and k, with the following sequence of JavaScript operations:
j = { name : "mongo" }
k = { x : 3 }
4. Insert the j and k documents into the collection things with the following sequence of operations:
db.things.insert( j )
db.things.insert( k )
When you insert the first document, the mongod (page 1025) will create both the mydb database and the things collection. 5. Confirm that the collection named things exists using the following operation:
show collections
The mongo (page 1040) shell will return the list of the collections in the current (i.e. mydb) database. At this point, the only collection is things. All mongod (page 1025) databases also have a system.indexes (page 1103) collection. 6. Confirm that the documents exist in the collection things by issuing a query on the things collection, using the find() (page 928) method in an operation that resembles the following:
db.things.find()
This operation returns the following results. The ObjectId (page 196) values will be unique:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" } { "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
All MongoDB documents must have an _id field with a unique value. These operations do not explicitly specify a value for the _id field, so mongo (page 1040) creates a unique ObjectId (page 196) value for the field before inserting it into the collection.

Insert Multiple Documents Using a For Loop
1. From the mongo (page 1040) shell, add more documents to the things collection using the following for loop:
for (var i = 1; i <= 20; i++) db.things.insert( { x : 4 , j : i } )
The mongo (page 1040) shell displays the first 20 documents in the collection. Your ObjectId (page 196) values will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
1. The find() (page 928) method returns a cursor. To iterate the cursor and return more documents, use the it operation in the mongo (page 1040) shell. The mongo (page 1040) shell will exhaust the cursor, and return the following documents:
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 } { "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
For more information on inserting new documents, see the insert() (page 204) documentation.
2. Print the full result set by using a while loop to iterate over the c variable:
while ( c.hasNext() ) printjson( c.next() )
The hasNext() function returns true if the cursor has documents. The next() method returns the next document. The printjson() method renders the document in a JSON-like format. The result of this operation follows, although the ObjectId (page 196) values will be unique:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
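The hasNext()/next() loop used above is a general JavaScript pattern that does not depend on MongoDB itself. As a self-contained sketch, the hypothetical makeCursor() helper below (illustrative only, not the shell's actual implementation) exposes the same interface over an in-memory array:

```javascript
// Minimal cursor-like object over an in-memory array, mimicking the
// hasNext()/next() interface that the mongo shell's cursors expose.
function makeCursor(docs) {
  let i = 0;
  return {
    hasNext: function () { return i < docs.length; }, // more documents remain?
    next: function () { return docs[i++]; }           // return the next document
  };
}

const cur = makeCursor([ { x: 4, j: 1 }, { x: 4, j: 2 }, { x: 4, j: 3 } ]);
const seen = [];
// Same loop shape as the while loop shown above:
while (cur.hasNext()) seen.push(cur.next().j);
console.log(seen.join(","));   // → 1,2,3
```

Once the loop finishes, cur.hasNext() returns false: the cursor is exhausted, just as a real mongo shell cursor would be.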
Use Array Operations with the Cursor
You can manipulate a cursor object as if it were an array. Consider the following procedure: 1. In the mongo (page 1040) shell, query the things collection and assign the resulting cursor object to the c variable:
var c = db.things.find()
When you access documents in a cursor using the array index notation, mongo (page 1040) first calls the cursor.toArray() method and loads into RAM all documents returned by the cursor. The index is then applied to the resulting array. This operation iterates the cursor completely and exhausts the cursor. For very large result sets, mongo (page 1040) may run out of available memory. For more information on the cursor, see Iterate the Returned Cursor (page 218).

Query for Specific Documents
MongoDB has a rich query system that allows you to select and filter the documents in a collection along specific fields and values. See Query Document (page 170) and Read (page 211) for a full account of queries in MongoDB. In this procedure, you query for specific documents in the things collection by passing a query document as a parameter to the find() (page 928) method. A query document specifies the criteria the query must match to return a document. To query for specific documents, do the following: 1. In the mongo (page 1040) shell, query for all documents where the name field has a value of mongo by passing the { name : "mongo" } query document as a parameter to the find() (page 928) method:
db.things.find( { name : "mongo" } )
MongoDB returns one document that fits these criteria. The ObjectId (page 196) value will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
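The equality matching used by this query document can be sketched in plain JavaScript. The matches() helper below is a simplified, hypothetical illustration of the idea (a document matches when every field in the query document equals the corresponding field), not MongoDB's actual matching engine:

```javascript
// Simplified sketch of query-document matching: a document matches when
// every field named in the query document has an equal value in the document.
function matches(doc, query) {
  return Object.keys(query).every(function (k) { return doc[k] === query[k]; });
}

// A small in-memory stand-in for the things collection:
const things = [
  { name: "mongo" },
  { x: 3 },
  { x: 4, j: 1 }
];

// Equivalent in spirit to db.things.find( { name : "mongo" } ):
const result = things.filter(function (d) { return matches(d, { name: "mongo" }); });
console.log(result.length);    // → 1
console.log(result[0].name);   // → mongo
```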
2. Query for all documents where x has a value of 4 by passing the { x : 4 } query document as a parameter to find() (page 928):
db.things.find( { x : 4 } )
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
ObjectId (page 196) values are always unique. 3. Query for all documents where x has a value of 4, as in the previous query, but return only the value of j. MongoDB will also return the _id field, unless explicitly excluded. To do this, you add the { j : 1 } document as the projection in the second parameter to find() (page 928). This operation would resemble the following:
db.things.find( { x : 4 } , { j : 1 } )
Return a Single Document from a Collection
With the db.collection.findOne() (page 933) method you can return a single document from a MongoDB collection. The findOne() (page 933) method takes the same parameters as find() (page 928), but returns a document rather than a cursor. To retrieve one document from the things collection, issue the following command:
db.things.findOne()
For more information on querying for documents, see the Read (page 211) and Read Operations (page 169) documentation.

Limit the Number of Documents in the Result Set
You can constrain the size of the result set to increase performance by limiting the amount of data your application must receive over the network.
To specify the maximum number of documents in the result set, call the limit() (page 963) method on a cursor, as in the following command:
db.things.find().limit(3)
MongoDB will return the following result, with different ObjectId (page 196) values:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" } { "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 } { "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
CHAPTER 2
Release Notes
You should always install the latest, stable version of MongoDB. Stable versions have an even-numbered minor version number. For example, v2.4 is the current stable release; v2.2 and v2.0 were previous stable releases, while v2.1 and v2.3 are development versions.
Current Stable Release:
- Release Notes for MongoDB 2.4 (page 1139)
Previous Stable Releases:
- Release Notes for MongoDB 2.2 (page 1161)
- Release Notes for MongoDB 2.0 (page 1171)
- Release Notes for MongoDB 1.8 (page 1177)
Part II
Administration
The documentation in this section outlines core administrative tasks and practices that operators of MongoDB will want to consider.
CHAPTER 3
Run-time Database Configuration
The command line (page 1025) and configuration file (page 1082) interfaces provide MongoDB administrators with a large number of options and settings for controlling the operation of the database system. This document provides an overview of common configurations and examples of best-practice configurations for common use cases. While both interfaces provide access to the same collection of options and settings, this document primarily uses the configuration file interface. If you run MongoDB using a control script or installed from a package for your operating system, you likely already have a configuration file located at /etc/mongodb.conf. Confirm this by checking the content of the /etc/init.d/mongod or /etc/rc.d/mongod script to ensure that the control scripts start the mongod (page 1025) with the appropriate configuration file (see below). To start a MongoDB instance using this configuration, issue a command in the following form:
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf
Modify the values in the /etc/mongodb.conf file on your system to control the configuration of your database instance.
For most standalone servers, this is a sufficient base configuration. It makes several assumptions, but consider the following explanation:
- fork (page 1085) is true, which enables a daemon mode for mongod (page 1025), which detaches (i.e. forks) the MongoDB process from the current session and allows you to run the database as a conventional server.
- bind_ip (page 1083) is 127.0.0.1, which forces the server to only listen for requests on the localhost IP. Only bind to secure interfaces that the application-level systems can access with access control provided by system network filtering (i.e. firewall).
- port (page 1083) is 27017, which is the default MongoDB port for database instances. MongoDB can bind to any port. You can also filter access based on port using network filtering tools. Note: UNIX-like systems require superuser privileges to attach processes to ports lower than 1024.
- quiet (page 1090) is true. This disables all but the most critical entries in the output/log file. In normal operation this is the preferable setting to avoid log noise. In diagnostic or testing situations, set this value to false. Use setParameter (page 872) to modify this setting during run time.
- dbpath (page 1085) is /srv/mongodb, which specifies where MongoDB will store its data files. /srv/mongodb and /var/lib/mongodb are popular locations. The user account that mongod (page 1025) runs under will need read and write access to this directory.
- logpath (page 1084) is /var/log/mongodb/mongod.log, which is where mongod (page 1025) will write its output. If you do not set this value, mongod (page 1025) writes all output to standard output (e.g. stdout).
- logappend (page 1084) is true, which ensures that mongod (page 1025) does not overwrite an existing log file following the server start operation.
- journal (page 1087) is true, which enables journaling. Journaling ensures single instance write-durability. 64-bit builds of mongod (page 1025) enable journaling by default.
Thus, this setting may be redundant. Given the default conguration, some of these values may be redundant. However, in many situations explicitly stating the conguration increases overall system intelligibility.
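As a sketch, a base configuration file consistent with the settings described above might read as follows. The values are the examples from the text; adjust paths for your own system:

```
fork = true
bind_ip = 127.0.0.1
port = 27017
quiet = true
dbpath = /srv/mongodb
logpath = /var/log/mongodb/mongod.log
logappend = true
journal = true
```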
Consider the following explanation for these configuration decisions:
- bind_ip (page 1083) has three values: 127.0.0.1, the localhost interface; 10.8.0.10, a private IP address typically used for local networks and VPN interfaces; and 192.168.4.24, a private network interface typically used for local networks. Because production MongoDB instances need to be accessible from multiple database servers, it is important to bind MongoDB to multiple interfaces that are accessible from your application servers. At the same time, it is important to limit these interfaces to interfaces controlled and protected at the network layer.
- nounixsocket (page 1085) is true, which disables the UNIX socket, which is otherwise enabled by default. This limits access on the local system. This is desirable when running MongoDB on systems with shared access, but in most situations has minimal impact.
- auth (page 1085) is true, which enables the authentication system within MongoDB. If enabled, you will need to log in by connecting over the localhost interface for the first time to create user credentials.
See also: Security Practices and Management (page 133)
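A hedged sketch of how these decisions might appear in the configuration file, using the interface addresses from the text as examples:

```
bind_ip = 127.0.0.1,10.8.0.10,192.168.4.24
nounixsocket = true
auth = true
```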
Use descriptive names for sets. Once configured, use the mongo (page 1040) shell to add hosts to the replica set. See also: Replica set reconfiguration (page 469). To enable authentication for the replica set, add the following option:
keyFile = /srv/mongodb/keyfile
New in version 1.8: for replica sets, and 1.9.1 for sharded replica sets. Setting keyFile (page 1085) enables authentication and specifies a key file for replica set members to use when authenticating to each other. The content of the key file is arbitrary, but must be the same on all members of the replica set and mongos (page 1036) instances that connect to the set. The key file must be less than one kilobyte in size, may only contain characters in the base64 set, and must not have group or world permissions on UNIX systems. See also: The Replica Set Reconfiguration (page 469) section for information regarding the process for changing a replica set during operation. Additionally, consider the Replica Set Security (page 395) section for information on configuring authentication with replica sets. Finally, see the Replication (page 387) index and the Replica Set Fundamental Concepts (page 389) document for more information on replication in MongoDB and replica set configuration in general.
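Combined with the base configuration, a replica set member's configuration file might therefore include lines such as the following. The set name rs0 is illustrative, not prescribed by the text:

```
replSet = rs0
keyFile = /srv/mongodb/keyfile
```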
This creates a config server running on the private IP address 10.8.0.12 on port 27001. Make sure that there are no port conflicts, and that your config server is accessible from all of your mongos (page 1036) and mongod (page 1025) instances. To set up shards, configure two or more mongod (page 1025) instances using your base configuration (page 37), adding the shardsvr (page 1093) setting:
shardsvr = true
Finally, to establish the cluster, configure at least one mongos (page 1036) process with the following settings:
configdb = 10.8.0.12:27001
chunkSize = 64
You can specify multiple configdb (page 1093) instances by specifying hostnames and ports in the form of a comma separated list. In general, avoid modifying the chunkSize (page 1094) from the default value of 64, 1 and ensure this setting is consistent among all mongos (page 1036) instances. See also: The Sharding (page 485) section of the manual for more information on sharding and cluster configuration.
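Putting the pieces together, sketches of the three configuration files for the cluster roles might look like the following. The config server address is the example from the text; every other value is illustrative:

```
# Config server (runs on 10.8.0.12)
configsvr = true
port = 27001

# Shard member: the base configuration, plus
shardsvr = true

# mongos router
configdb = 10.8.0.12:27001
chunkSize = 64
```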
The dbpath (page 1085) value controls the location of the mongod (page 1025) instance's data directory. Ensure that each database has a distinct and well labeled data directory. The pidfilepath (page 1085) controls where the mongod (page 1025) process places its process id file. As this tracks the specific mongod (page 1025) file, it is crucial that the file be unique and well labeled to make it easy to start and stop these processes. Create additional control scripts and/or adjust your existing MongoDB configuration and control script as needed to control these processes.
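For example, two instances on one system might use configuration files that differ only in these values. All paths and ports here are hypothetical:

```
# Instance one
port = 27017
dbpath = /srv/mongodb/db0
pidfilepath = /srv/mongodb/db0.pid
logpath = /var/log/mongodb/db0.log

# Instance two
port = 27018
dbpath = /srv/mongodb/db1
pidfilepath = /srv/mongodb/db1.pid
logpath = /var/log/mongodb/db1.log
```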
Use the base configuration (page 37) and add these options, as needed, if you are experiencing some unknown issue or performance problem: slowms (page 1089) configures the threshold for the database profiler to consider a query slow. The default value is 100 milliseconds. Set a lower value if the database profiler does not return useful results. See Optimization Strategies for MongoDB (page 559) for more information on optimizing operations in MongoDB.
1 Chunk size is 64 megabytes by default, which provides the ideal balance between the most even distribution of data, for which smaller chunk sizes are best, and minimizing chunk migration, for which larger chunk sizes are optimal.
2 Single-tenant systems with SSD or other high performance disks may provide acceptable performance levels for multiple mongod (page 1025) instances. Additionally, you may find that multiple databases with small working sets may function acceptably on a single system.
profile (page 1088) sets the database profiler level. The profiler is not active by default because of its possible impact on performance. Unless this setting has a value, queries are not profiled. verbose (page 1083) enables a verbose logging mode that modifies mongod (page 1025) output and increases logging to include a greater number of events. Only use this option if you are experiencing an issue that is not reflected in the normal logging level. If you require additional verbosity, consider the following options:
v = true
vv = true
vvv = true
vvvv = true
vvvvv = true
Each additional level v adds additional verbosity to the logging. The verbose option is equal to v = true. diaglog (page 1086) enables diagnostic logging. Level 3 logs all read and write operations. objcheck (page 1084) forces mongod (page 1025) to validate all requests from clients upon receipt. Use this option to ensure that invalid requests are not causing errors, particularly when running a database with untrusted clients. This option may affect database performance. cpu (page 1085) forces mongod (page 1025) to report the percentage of the last interval spent in write-lock. The interval is typically 4 seconds, and each output line in the log includes both the actual interval since the last report and the percentage of time spent in write lock.
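A diagnostic configuration combining the options discussed above might look like the following sketch. Treat every value as illustrative, and enable only the options relevant to the problem at hand, since several of them carry a performance cost:

```
slowms = 50
profile = 1
verbose = true
diaglog = 3
objcheck = true
cpu = true
```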
CHAPTER 4
Backup and Recovery Operations for MongoDB
Replica Set Backup Considerations
In most cases, backing up data stored in a replica set is similar to backing up data stored in a single instance. It is possible to lock a single secondary database and then create a backup from that instance. When you unlock the database, the secondary will catch up with the primary. You may also choose to deploy a dedicated hidden member for backup purposes. If you have a sharded cluster where each shard is itself a replica set, you can use this method to create a backup of the entire cluster without disrupting the operation of the node. In these situations you should still turn off the balancer when you create backups. For any cluster, using a non-primary node to create backups is particularly advantageous in that the backup operation does not affect the performance of the primary. Replication itself provides some measure of redundancy. Nevertheless, keeping point-in-time backups of your cluster to provide for disaster recovery and as an additional layer of protection is crucial. For an overview of backup strategies and considerations for all MongoDB deployments, consider Backup Strategies for MongoDB Systems (page 43). For practical instructions and example backup procedures consider the following documents:
4.2 Backup and Recovery Procedures

The mongodump (page 1048) utility can back up data by either: connecting to a running mongod (page 1025) or mongos (page 1036) instance, or accessing data files without an active instance. The utility can create a backup for an entire server, database or collection, or can use a query to back up just part of a collection. When you run mongodump (page 1048) without any arguments, the command connects to the local database instance (e.g. 127.0.0.1 or localhost) on port 27017 and creates a database backup named dump/ in the current directory. To back up data from a mongod (page 1025) or mongos (page 1036) instance running on the same machine and on the default port of 27017, use the following command:
mongodump
Note: The format of data created by the mongodump (page 1048) tool from the 2.2 distribution or later is different from and incompatible with earlier versions of mongod (page 1025). To limit the amount of data included in the database dump, you can specify --db (page 1049) and --collection (page 1050) as options to the mongodump (page 1048) command. For example:
mongodump --dbpath /data/db/ --out /data/backup/
mongodump --host mongodb.example.net --port 27017
mongodump (page 1048) will write BSON files that hold a copy of data accessible via the mongod (page 1025) listening on port 27017 of the mongodb.example.net host.
mongodump --collection collection --db test
This command creates a dump of the collection named collection from the database test in a dump/ subdirectory of the current working directory.
Point in Time Operation Using Oplogs
Use the --oplog (page 1050) option with mongodump (page 1048) to collect the oplog entries to build a point-in-time snapshot of a database within a replica set. With --oplog (page 1050), mongodump (page 1048) copies all the data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the procedure completes. This backup procedure, in conjunction with mongorestore --oplogReplay (page 1054), allows you to restore a backup that reflects a consistent and specific moment in time.
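The dump and restore halves of this procedure pair up as follows. This is a dry-run sketch that only builds and prints the two commands; the /opt/backup/pit output directory is a hypothetical placeholder, not a path from this manual.

```shell
# Dry-run sketch: a matched point-in-time dump/restore pair.
OUT=/opt/backup/pit   # hypothetical output directory

# Capture the data plus the oplog entries written while the dump runs.
DUMP_CMD="mongodump --oplog --out $OUT"
# Replay those oplog entries on restore to reach one consistent moment.
RESTORE_CMD="mongorestore --oplogReplay $OUT"

printf '%s\n' "$DUMP_CMD" "$RESTORE_CMD"
```

Run the first command against the replica set member you are backing up, and the second against the instance you are restoring to.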
Create Backups Without a Running mongod Instance
If your MongoDB instance is not running, you can use the --dbpath (page 1049) option to specify the location of your MongoDB instance's database files. mongodump (page 1048) reads from the data files directly with this operation. This locks the data directory to prevent conflicting writes. The mongod (page 1025) process must not be running or attached to these data files when you run mongodump (page 1048) in this configuration. Consider the following example:
mongodump --dbpath /srv/mongodb
The --host (page 1048) and --port (page 1048) options for mongodump (page 1048) allow you to connect to and back up from a remote host. Consider the following example:
mongodump --host mongodb1.example.net --port 3017 --username user --password pass --out /opt/backup/m
On any mongodump (page 1048) command you may, as above, specify username and password credentials for database authentication.

Restore a Database with mongorestore

The mongorestore (page 1052) utility restores a binary backup created by mongodump (page 1048). By default, mongorestore (page 1052) looks for a database backup in the dump/ directory.
The mongorestore (page 1052) utility can restore data either by: connecting to a running mongod (page 1025) or mongos (page 1036) directly, or writing to a local database path without use of a running mongod (page 1025). The mongorestore (page 1052) utility can restore either an entire database backup or a subset of the backup. A mongorestore (page 1052) command that connects to an active mongod (page 1025) or mongos (page 1036) has the following prototype form:
mongorestore --port <port number> <path to the backup>
A mongorestore (page 1052) command that writes to data les without using a running mongod (page 1025) has the following prototype form:
mongorestore --dbpath <local database path> <path to the backup>
Consider the following example:

mongorestore dump-2012-10-25

Here, mongorestore (page 1052) imports the database backup in the dump-2012-10-25 directory to the mongod (page 1025) instance running on the localhost interface.
Restore Point in Time Oplog Backup
If you created your database dump using the --oplog (page 1050) option to ensure a point-in-time snapshot, call mongorestore (page 1052) with the --oplogReplay (page 1054) option, as in the following example:
mongorestore --oplogReplay
You may also consider using the mongorestore --objcheck (page 1054) option to check the integrity of objects while inserting them into the database, or you may consider the mongorestore --drop (page 1054) option to drop each collection from the database before restoring from backups.
Restore a Subset of Data from a Binary Database Dump
mongorestore (page 1052) also includes the ability to apply a filter to all input before inserting it into the new database. Consider the following example:
mongorestore --filter '{"field": 1}'
Here, mongorestore (page 1052) only adds documents to the database from the dump located in the dump/ folder if the documents have a field named field that holds a value of 1. Enclose the filter in single quotes (e.g. '{"field": 1}') to prevent the filter from interacting with your shell environment.
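The single-quote advice can be verified with the shell alone: without quotes, the shell splits the filter document into separate words before mongorestore would ever see it. This sketch uses `set --` purely to count arguments; no MongoDB tools are involved.

```shell
# Unquoted: the shell splits the filter document into two words.
set -- {"field": 1}
echo "unquoted: $# arguments"

# Single-quoted: the filter survives as a single argument.
set -- '{"field": 1}'
echo "quoted: $# arguments"
```

The unquoted form reports two arguments and the quoted form one, which is why an unquoted filter reaches mongorestore mangled.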
Restore without a Running mongod
mongorestore (page 1052) can write data to MongoDB data files without needing to connect to a mongod (page 1025) directly.
mongorestore --dbpath /srv/mongodb --journal
Here, mongorestore (page 1052) restores the database dump located in the dump/ folder into the data files located at /srv/mongodb. Additionally, the --journal (page 1053) option ensures that mongorestore (page 1052) records all operations in the durability journal. The journal prevents data file corruption if anything (e.g. power failure, disk failure, etc.) interrupts the restore operation. See also: mongodump (page 1048) and mongorestore (page 1051).
Restore Backups to Non-Local mongod Instances
By default, mongorestore (page 1052) connects to a MongoDB instance running on the localhost interface (e.g. 127.0.0.1) and on the default port (27017). If you want to restore to a different host or port, use the --host (page 1052) and --port (page 1052) options. Consider the following example:
mongorestore --host mongodb1.example.net --port 3017 --username user --password pass /opt/backup/mong
As above, you may specify username and password credentials if your mongod (page 1025) requires authentication.
Alternately, store all MongoDB data files on a dedicated device so that you can make backups without duplicating extraneous data. Ensure that you copy data from snapshots onto other systems to ensure that data is safe from site failures. Although different snapshot methods provide different capabilities, the LVM method outlined below does not provide any capacity for capturing incremental backups.
Snapshots With Journaling
If your mongod (page 1025) instance has journaling enabled, then you can use any kind of file system or volume/block level snapshot tool to create backups. If you manage your own infrastructure on a Linux-based system, configure your system with LVM to provide your disk packages and provide snapshot capability. You can also use LVM-based setups within a cloud/virtualized environment. Note: Running LVM provides additional flexibility and enables the possibility of using snapshots to back up MongoDB.
If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As an alternative, you can do one of the following: Flush all writes to disk and create a write lock to ensure consistent state during the backup process. If you choose this option see Create Backups on Instances that do not have Journaling Enabled (page 51). Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create a Snapshot (page 49).

Backup and Restore Using LVM on a Linux System

This section provides an overview of a simple backup process using LVM on a Linux system. While the tools, commands, and paths may be (slightly) different on your system, the following steps provide a high-level overview of the backup operation. Note: Only use the following procedure as a guideline for a backup system and infrastructure. Production backup systems must consider a number of application-specific requirements and factors unique to specific environments.
Create a Snapshot
To create a snapshot with LVM, issue a command as root in the following format:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group.
This example creates a snapshot named mdb-snap01 located at /dev/vg0/mdb-snap01. The location and paths to your system's volume groups and devices may vary slightly depending on your operating system's LVM configuration.
The snapshot is capped at 100 megabytes, because of the parameter --size 100M. This size does not reflect the total amount of data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and the creation of the snapshot (i.e. /dev/vg0/mdb-snap01). Warning: Ensure that you create snapshots with enough space to account for data growth, particularly for the period of time that it takes to copy data out of the system or to a temporary image. If your snapshot runs out of space, the snapshot image becomes unusable. Discard this logical volume and create another. The snapshot will exist when the command returns. You can restore directly from the snapshot at any time or by creating a new logical volume and restoring from this snapshot to the alternate image. While snapshots are great for creating high quality backups very quickly, they are not ideal as a format for storing backup data. Snapshots typically depend and reside on the same storage infrastructure as the original disk images. Therefore, it's crucial that you archive these snapshots and store them elsewhere.
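To act on this warning before a snapshot overflows, you can watch its allocation with the standard lvs reporting tool. This is a dry-run sketch that only prints the command; the volume name matches the example above, and the data_percent reporting field is an assumption that may vary across LVM versions (older releases report snapshot usage as snap_percent).

```shell
# Dry-run sketch: build the command that reports how full the snapshot is.
# vg0/mdb-snap01 matches the example above; field names may vary by LVM version.
MON_CMD="lvs -o lv_name,data_percent vg0/mdb-snap01"
echo "$MON_CMD"
```

Running the printed command periodically while copying data out lets you discard and recreate an undersized snapshot before it becomes unusable.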
Archive a Snapshot
After creating a snapshot, mount the snapshot and move the data to separate storage. Your system might try to compress the backup images as you move them offline. The following procedure fully archives the data from the snapshot:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz
The above command sequence does the following: Ensures that the /dev/vg0/mdb-snap01 device is not mounted. Performs a block-level copy of the entire snapshot image using the dd command and compresses the result in a gzipped file in the current working directory. Warning: This command will create a large gz file in your current working directory. Make sure that you run this command in a file system that has enough free space.
Restore a Snapshot
To restore a snapshot created with the above method, issue the following sequence of commands:
lvcreate --size 1G --name mdb-new vg0
gzip -d -c mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
The above sequence does the following: Creates a new logical volume named mdb-new, in the /dev/vg0 volume group. The path to the new device will be /dev/vg0/mdb-new. Warning: This volume will have a maximum size of 1 gigabyte. The original file system must have had a total size of 1 gigabyte or smaller, or else the restoration will fail. Change 1G to your desired volume size. Uncompresses and unarchives the mdb-snap01.gz into the mdb-new disk image.
Mounts the mdb-new disk image to the /srv/mongodb directory. Modify the mount point to correspond to your MongoDB data file location, or other location as needed. Note: The restored snapshot will have a stale mongod.lock file. If you do not remove this file from the snapshot, MongoDB may assume that the stale lock file indicates an unclean shutdown. If you're running with journal (page 1087) enabled, and you do not use db.fsyncLock() (page 981), you do not need to remove the mongod.lock file. If you use db.fsyncLock() (page 981) you will need to remove the lock.
To restore a backup without writing to a compressed gz file, use the following sequence of commands:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
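The dd/gzip archive-and-restore cycle used above can be exercised end to end on an ordinary file in place of an LVM device, which is a convenient way to sanity-check the pipeline without a volume group. The file contents and temporary paths here are illustrative only.

```shell
# Exercise the archive/restore pipeline on a plain file (no LVM needed).
set -e
SRC=$(mktemp)
RESTORED=$(mktemp)
ARCHIVE=$(mktemp).gz
printf 'mongodb snapshot payload' > "$SRC"

# Archive: block-level copy piped through gzip, as with the snapshot device.
dd if="$SRC" 2>/dev/null | gzip > "$ARCHIVE"
# Restore: decompress and write the image back out with dd.
gzip -d -c "$ARCHIVE" | dd of="$RESTORED" 2>/dev/null

# The restored image must be byte-identical to its source.
cmp -s "$SRC" "$RESTORED" && echo "round-trip intact"
```

The same byte-for-byte comparison (against a checksum of the source volume) is a reasonable check after restoring a real snapshot archive.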
You can implement off-system backups using the combined process (page 51) and SSH. This sequence is identical to procedures explained above, except that it archives and compresses the backup on a remote system using SSH. Consider the following procedure:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh [email protected] gzip > /opt/backup/mdb-snap01.gz
lvcreate --size 1G --name mdb-new vg0
ssh [email protected] gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Create Backups on Instances that do not have Journaling Enabled

If your mongod (page 1025) instance does not run with journaling enabled, or if your journal is on a separate volume, obtaining a functional backup of a consistent state is more complicated. As described in this section, you must flush all writes to disk and lock the database to prevent writes during the backup process. If you have a replica set configuration, then for your backup use a secondary which is not receiving reads (i.e. a hidden member). 1. To flush writes to disk and to lock the database (to prevent further writes), issue the db.fsyncLock() (page 981) method in the mongo (page 1040) shell:
db.fsyncLock();
2. Perform the backup operation described in Create a Snapshot (page 49). 3. To unlock the database after the snapshot has completed, use the following command in the mongo (page 1040) shell:
db.fsyncUnlock();
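Steps 1 through 3 above can be sketched as a single sequence. This dry run only prints the commands it would execute; the hostname and volume name are hypothetical placeholders, and the lvcreate line simply reuses the snapshot example from earlier in this section.

```shell
# Dry-run sketch of the flush-lock, snapshot, unlock sequence.
HOST=backup-secondary.example.net   # hypothetical hidden/secondary member
VOLUME=/dev/vg0/mongodb             # volume from the earlier LVM example

# 1. Flush pending writes and lock the member against new ones.
LOCK_CMD="mongo --host $HOST --eval 'db.fsyncLock()'"
# 2. Take the filesystem snapshot while the member is locked.
SNAP_CMD="lvcreate --size 100M --snapshot --name mdb-snap01 $VOLUME"
# 3. Unlock so the member resumes accepting writes from replication.
UNLOCK_CMD="mongo --host $HOST --eval 'db.fsyncUnlock()'"

printf '%s\n' "$LOCK_CMD" "$SNAP_CMD" "$UNLOCK_CMD"
```

Keep the window between the lock and unlock commands as short as possible, since the member cannot apply writes while locked.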
Note: Changed in version 2.0: MongoDB 2.0 added db.fsyncLock() (page 981) and db.fsyncUnlock() (page 981) helpers to the mongo (page 1040) shell. Prior to this version, use the fsync (page 868) command with the lock option, as follows:
db.runCommand( { fsync: 1, lock: true } );
Note: The database cannot be locked with db.fsyncLock() (page 981) while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock() (page 981). Disable profiling using db.setProfilingLevel() (page 988) as follows in the mongo (page 1040) shell:
db.setProfilingLevel(0)
Warning: Changed in version 2.2: When used in combination with fsync (page 868) or db.fsyncLock() (page 981), mongod (page 1025) may block some reads, including those from mongodump (page 1048), when a queued write operation waits behind the fsync (page 868) lock.
Considerations

You must run copydb (page 866) or clone (page 860) on the destination server. You cannot use copydb (page 866) or clone (page 860) with databases that have a sharded collection in a sharded cluster, or any database via a mongos (page 1036). You can use copydb (page 866) or clone (page 860) with databases that do not have sharded collections in a cluster when you're connected directly to the mongod (page 1025) instance. You can run copydb (page 866) or clone (page 860) commands on a secondary member of a replica set, with properly configured read preference. Each destination mongod (page 1025) instance must have enough free disk space on the destination server for the database you are copying. Use the db.stats() (page 989) operation to check the size of the database on the source mongod (page 1025) instance. For more information, see db.stats() (page 989).
Processes
Copy and Rename a Database
To copy a database from one MongoDB instance to another and rename the database in the process, use the copydb (page 866) command, or the db.copyDatabase() (page 973) helper in the mongo (page 1040) shell. Use the following procedure to copy the database named test on server db0.example.net to the server named db1.example.net and rename it to records in the process: Verify that the database test exists on the source mongod (page 1025) instance running on the db0.example.net host. Connect to the destination server, running on the db1.example.net host, using the mongo (page 1040) shell. Model your operation on the following command:
db.copyDatabase( "test", "records", "db0.example.net" )
Rename a Database
You can also use copydb (page 866) or the db.copyDatabase() (page 973) helper to: rename a database within a single MongoDB instance or create a duplicate database for testing purposes. Use the following procedure to rename the test database to records on a single mongod (page 1025) instance: Connect to the mongod (page 1025) using the mongo (page 1040) shell. Model your operation on the following command:
db.copyDatabase( "test", "records" )
To copy a database from a source MongoDB instance that has authentication enabled, you can specify authentication credentials to the copydb (page 866) command or the db.copyDatabase() (page 973) helper in the mongo (page 1040) shell. In the following operation, you will copy the test database from the mongod (page 1025) running on db0.example.net to the records database on the local instance (e.g. db1.example.net). Because the mongod (page 1025) instance running on db0.example.net requires authentication for all connections, you will need to pass db.copyDatabase() (page 973) authentication credentials, as in the following procedure: Connect to the destination mongod (page 1025) instance running on the db1.example.net host using the mongo (page 1040) shell. Issue the following command:
db.copyDatabase( "test", "records", "db0.example.net", "<username>", "<password>" )
Clone a Database
The clone (page 860) command copies a database between mongod (page 1025) instances like copydb (page 866); however, clone (page 860) preserves the database name from the source instance on the destination mongod (page 1025). For many operations, clone (page 860) is functionally equivalent to copydb (page 866), but it has a simpler syntax and a narrower use. The mongo (page 1040) shell provides the db.cloneDatabase() (page 973) helper as a wrapper around clone (page 860). You can use the following procedure to clone a database from the mongod (page 1025) instance running on db0.example.net to the mongod (page 1025) running on db1.example.net: Connect to the destination mongod (page 1025) instance running on the db1.example.net host using the mongo (page 1040) shell. Issue the following command to specify the name of the database you want to copy:
use records
Use the following operation to initiate the clone (page 860) operation:
db.cloneDatabase( "db0.example.net" )
When you are aware of a mongod (page 1025) instance running without journaling that stops unexpectedly, and you're not running with replication, you should always run the repair operation before starting MongoDB again. If you're using replication, then restore from a backup and allow replication to perform an initial sync (page 413) to restore data.
1 To ensure a clean shutdown, use the mongod --shutdown (page 1032) option, your control script, Control-C (when running mongod (page 1025) in interactive mode), or kill $(pidof mongod) or kill -2 $(pidof mongod). 2 You can also use the db.collection.validate() (page 953) method to test the integrity of a single collection. However, this process is time consuming, and without journaling you can safely assume that the data is in an invalid state and you should either run the repair operation or resync from an intact member of the replica set.
If the mongod.lock file in the data directory specified by dbpath (page 1085), /data/db by default, is not a zero-byte file, then mongod (page 1025) will refuse to start, and you will find a message that contains the following line in your MongoDB log or output:
Unclean shutdown detected.
This indicates that you need to remove the lock file and run repair. If you run repair when the mongod.lock file exists without the mongod --repairpath (page 1031) option, you will see a message that contains the following line:
old lock file: /data/db/mongod.lock. probably means unclean shutdown
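The zero-byte check that mongod performs on mongod.lock can be illustrated with plain shell, simulated here in a temporary directory rather than a real dbpath; the messages printed are illustrative, not mongod's own log lines.

```shell
# Simulate the mongod.lock zero-byte check in a scratch directory.
DBPATH=$(mktemp -d)

check_lock() {
    # A non-empty mongod.lock indicates an unclean shutdown.
    if [ -s "$DBPATH/mongod.lock" ]; then
        echo "unclean shutdown: remove the lock file and run repair"
    else
        echo "clean shutdown: start mongod normally"
    fi
}

: > "$DBPATH/mongod.lock"                 # zero-byte lock: clean shutdown
check_lock
printf '12345\n' > "$DBPATH/mongod.lock"  # stale pid left behind: unclean
check_lock
```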
You must remove the lock file and run the repair operation before starting the database normally, using the following procedure:
Overview
Warning: Recovering a member of a replica set. Do not use this procedure to recover a member of a replica set. Instead you should either restore from a backup (page 43) or perform an initial sync using data from an intact member of the set, as described in Resync a Member of a Replica Set (page 430). There are two processes to repair data files that result from an unexpected shutdown: 1. Use the --repair (page 1030) option in conjunction with the --repairpath (page 1031) option. mongod (page 1025) will read the existing data files, and write the existing data to new data files. This does not modify or alter the existing data files. You do not need to remove the mongod.lock file before using this procedure. 2. Use the --repair (page 1030) option. mongod (page 1025) will read the existing data files, write the existing data to new files, and replace the existing, possibly corrupt, files with new files. You must remove the mongod.lock file before using this procedure. Note: --repair (page 1030) functionality is also available in the shell with the db.repairDatabase() (page 987) helper for the repairDatabase (page 872) command.
Procedures
To repair your data files using the --repairpath (page 1031) option to preserve the original data files unmodified: 1. Start mongod (page 1025) using --repair (page 1030) to read the existing data files.
mongod --dbpath /data/db --repair --repairpath /data/db0
When this completes, the new repaired data files will be in the /data/db0 directory.
2. Start mongod (page 1025) using the following invocation to point the dbpath (page 1085) at /data/db0:
mongod --dbpath /data/db0
Once you confirm that the data files are operational, you may delete or archive the data files in the /data/db directory.
To repair your data files without preserving the original files, do not use the --repairpath (page 1031) option, as in the following procedure: 1. Remove the stale lock file:
rm /data/db/mongod.lock
Replace /data/db with your dbpath (page 1085) where your MongoDB instance's data files reside. Warning: After you remove the mongod.lock file you must run the --repair (page 1030) process before using your database. 2. Start mongod (page 1025) using --repair (page 1030) to read the existing data files.
mongod --dbpath /data/db --repair
When this completes, the repaired data files will replace the original data files in the /data/db directory.
3. Start mongod (page 1025) using the following invocation to point the dbpath (page 1085) at /data/db:
mongod --dbpath /data/db
mongod.lock In normal operation, you should never remove the mongod.lock file and start mongod (page 1025). Instead, consider one of the above methods to recover the database and remove the lock files. In dire situations you can remove the lock file, start the database using the possibly corrupt files, and attempt to recover data from the database; however, it's impossible to predict the state of the database in these situations. If you are not running with journaling, and your database shuts down unexpectedly for any reason, you should always proceed as if your database is in an inconsistent and likely corrupt state. If at all possible restore from backup (page 43) or, if running as a replica set, restore by performing an initial sync using data from an intact member of the set, as described in Resync a Member of a Replica Set (page 430).
Procedure
Capture Data
Note: If you use mongodump (page 1048) without specifying a database or collection, mongodump (page 1048) will capture collection data and the cluster metadata from the config servers (page 502). You cannot use the --oplog (page 1050) option for mongodump (page 1048) when capturing data from mongos (page 1036). This option is only available when running directly against a replica set member. You can perform a backup of a sharded cluster by connecting mongodump (page 1048) to a mongos (page 1036). Use the following operation at your system's prompt:
mongodump --host mongos3.example.net --port 27017
mongodump (page 1048) will write BSON files that hold a copy of data stored in the sharded cluster accessible via the mongos (page 1036) listening on port 27017 of the mongos3.example.net host.
Restore Data
Backups created with mongodump (page 1048) do not reflect the chunks or the distribution of data in the sharded collection or collections. Like all mongodump (page 1048) output, these backups contain separate directories for each database and BSON files for each collection in that database. You can restore mongodump (page 1048) output to any MongoDB instance, including a standalone, a replica set, or a new sharded cluster. When restoring data to a sharded cluster, you must deploy and configure sharding before restoring data from the backup. See Deploy a Sharded Cluster (page 505) for more information.
Procedure

In this procedure, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using a file-system snapshot tool. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time. For approximate point-in-time snapshots, you can improve the quality of the backup while minimizing impact on the cluster by taking the backup from a secondary member of the replica set that provides each shard.
1. Disable the balancer process that equalizes the distribution of data among the shards. To disable the balancer, use the sh.stopBalancer() (page 1007) method in the mongo (page 1040) shell, and see the Disable the Balancer (page 531) procedure. Warning: It is essential that you stop the balancer before creating backups. If the balancer remains active, your resulting backups could have duplicate data or miss some data, as chunks may migrate while recording backups. 2. Lock one member of each replica set in each shard so that your backups reflect the state of your database at the nearest possible approximation of a single moment in time. Lock these mongod (page 1025) instances in as short of an interval as possible. To lock or freeze a sharded cluster, you must: use the db.fsyncLock() (page 981) method in the mongo (page 1040) shell connected to a single secondary member of the replica set that provides each shard. Shut down one of the config servers (page 502), to prevent all metadata changes during the backup process. 3. Use mongodump (page 1048) to back up one of the config servers (page 502). This backs up the cluster's metadata. You only need to back up one config server, as they all hold the same data. Issue this command against one of the config mongod (page 1025) instances or via the mongos (page 1036):
mongodump --db config
4. Back up the replica set members of the shards that you locked. You may back up the shards in parallel. For each shard, create a snapshot. Use the procedures in Use Filesystem Snapshots to Backup and Restore MongoDB Databases (page 48). 5. Unlock all locked replica set members of each shard using the db.fsyncUnlock() (page 981) method in the mongo (page 1040) shell. 6. Re-enable the balancer with the sh.startBalancer() (page 1005) method according to the Disable the Balancer (page 531) procedure. Use the following command sequence when connected to the mongos (page 1036) with the mongo (page 1040) shell:
use config
sh.startBalancer()
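The lock and unlock halves of steps 2 and 5 can be sketched as a dry run over a list of shard secondaries. The script only builds and prints the commands; the hostnames are hypothetical placeholders.

```shell
# Dry-run sketch: build the per-shard lock and unlock commands.
# Hostnames are hypothetical placeholders for one secondary per shard.
SECONDARIES="shard0-sec.example.net shard1-sec.example.net"

LOCK_CMDS=""
UNLOCK_CMDS=""
for HOST in $SECONDARIES; do
    # Lock each shard's secondary in as short an interval as possible.
    LOCK_CMDS="$LOCK_CMDS mongo --host $HOST --eval 'db.fsyncLock()';"
    # Unlock each one after its snapshot completes.
    UNLOCK_CMDS="$UNLOCK_CMDS mongo --host $HOST --eval 'db.fsyncUnlock()';"
done

printf '%s\n' "$LOCK_CMDS"
printf '%s\n' "$UNLOCK_CMDS"
```

Run all the lock commands, take the snapshots, then run all the unlock commands, with the balancer disabled for the entire window.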
Procedure

In this procedure, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using mongodump (page 1048) to capture the backup data. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time. For approximate point-in-time snapshots, you can improve the quality of the backup while minimizing impact on the cluster by taking the backup from a secondary member of the replica set that provides each shard. 1. Disable the balancer process that equalizes the distribution of data among the shards. To disable the balancer, use the sh.stopBalancer() (page 1007) method in the mongo (page 1040) shell, and see the Disable the Balancer (page 531) procedure. Warning: It is essential that you stop the balancer before creating backups. If the balancer remains active, your resulting backups could have duplicate data or miss some data, as chunks migrate while recording backups. 2. Lock one member of each replica set in each shard so that your backups reflect the state of your database at the nearest possible approximation of a single moment in time. Lock these mongod (page 1025) instances in as short of an interval as possible. To lock or freeze a sharded cluster, you must: Shut down one member of each replica set. Ensure that the oplog has sufficient capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See Oplog (page 394) for more information. Shut down one of the config servers (page 502), to prevent all metadata changes during the backup process. 3. Use mongodump (page 1048) to back up one of the config servers (page 502). This backs up the cluster's metadata. You only need to back up one config server, as they all hold the same data.
Issue this command against one of the config mongod (page 1025) instances or via the mongos (page 1036):
mongodump --journal --db config
4. Back up the replica set members of the shards that you shut down using mongodump (page 1048) and specifying the --dbpath (page 1049) option. You may back up the shards in parallel. Consider the following invocation:
mongodump --journal --dbpath /data/db/ --out /data/backup/
You must run this command on the system where the mongod (page 1025) ran. This operation will use journaling and create a dump of the entire mongod (page 1025) instance with data files stored in /data/db/. mongodump (page 1048) will write the output of this dump to the /data/backup/ directory.

5. Restart all stopped replica set members of each shard as normal and allow them to catch up with the state of the primary.

6. Re-enable the balancer with the sh.startBalancer() (page 1005) method according to the Disable the Balancer (page 531) procedure. Use the following command sequence when connected to the mongos (page 1036) with the mongo (page 1040) shell:
use config
sh.startBalancer()
(a) Start the three config servers (page 502) by issuing commands similar to the following, using values appropriate to your configuration:
mongod --configsvr --dbpath /data/configdb --port 27019
(b) Restore the Config Database (page 547) on each config server.

(c) Start one mongos (page 1036) instance.

(d) Update the Config Database (page 547) collection named shards to reflect the new hostnames.

3. Restore the following:

Data files for each server in each shard. Because replica sets provide each production shard, restore all the members of the replica set or use the other standard approaches for restoring a replica set from backup. See the Restore a Snapshot (page 50) and Restore a Database with mongorestore (page 46) sections for details on these procedures.

Data files for each config server (page 502), if you have not already done so in the previous step.

4. Restart all the mongos (page 1036) instances.

5. Restart all the mongod (page 1025) instances.

6. Connect to a mongos (page 1036) instance from a mongo (page 1040) shell and use the db.printShardingStatus() (page 986) method to ensure that the cluster is operational, as follows:
db.printShardingStatus()
show collections
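A balancing window is configured through the activeWindow setting in the config.settings collection. A sketch of the command, run from a mongo (page 1040) shell connected to a mongos (page 1036); the times shown match the window described in the following paragraph, so adjust them to your own schedule:

```javascript
use config
db.settings.update(
   { _id: "balancer" },
   { $set: { activeWindow: { start: "6:00", stop: "23:00" } } },
   true   // upsert, in case the balancer document does not yet exist
)
```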
This operation configures the balancer to run between 6:00am and 11:00pm, server time. Schedule your backup operation to run and complete outside of this time. Ensure that the backup can complete outside the window when the balancer is running and that the balancer can effectively balance the collection among the shards in the window allotted to each.
CHAPTER 5

Data Center Awareness
MongoDB provides a number of features that allow application developers and database administrators to customize the behavior of a sharded cluster or replica set deployment so that MongoDB may be more data center aware, or allow operational and location-based separation. MongoDB also supports segregation based on functional parameters, to ensure that certain mongod (page 1025) instances are only used for reporting workloads or that certain high-frequency portions of a sharded collection only exist on specific shards. Consider the following documents:
Write Concerns (page 400), which controls how MongoDB ensures that write operations propagate to members of a replica set.

Replica Set Tags (page 457), which control how applications create and interact with custom groupings of replica set members to create custom application-specific read preferences and write concerns.

Tag Aware Sharding (page 534), which allows MongoDB administrators to define an application-specific balancing policy, to control how documents belonging to specific ranges of a shard key distribute to shards in the sharded cluster.

See also:

Before adding operational segregation features to your application and MongoDB deployment, become familiar with all documentation of replication (page 387) and sharding (page 485), particularly Replica Set Fundamental Concepts (page 389) and Sharded Cluster Overview (page 487).
Note: Because a single chunk may span different tagged shard key ranges, the balancer may migrate chunks to tagged shards that contain values that exceed the upper bound of the selected tag range.

Example

Given a sharded collection with two configured tag ranges, such that:

Shard key values between 100 and 200 have tags to direct corresponding chunks to shards tagged NYC.

Shard key values between 200 and 300 have tags to direct corresponding chunks to shards tagged SFO.
1 To migrate chunks in a tagged environment, the balancer selects a target shard with a tag range whose upper bound is greater than the migrating chunk's lower bound. If a shard with a matching tagged range exists, the balancer will migrate the chunk to that shard.
In this cluster, the balancer will migrate a chunk with shard key values ranging between 150 and 220 to a shard tagged NYC, since 150 is closer to 200 than 300. After configuring tags on the shards and ranges of the shard key, the cluster may take some time to reach the proper distribution of data, depending on the division of chunks (i.e. splits) and the current distribution of data in the cluster. Once configured, the balancer will respect tag ranges during future balancing rounds (page 499).

See also:

Administer and Manage Shard Tags (page 519)
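The selection rule from the footnote can be sketched in a few lines. This is a simplified model of the balancer's behavior, not its actual implementation; the tag names and ranges come from the example above:

```python
# Simplified model of tag-aware chunk placement: the balancer picks the
# target shard whose tag range has an upper bound greater than the
# migrating chunk's lower bound. Ranges are inclusive of the lower
# bound and exclusive of the upper bound.

TAG_RANGES = [
    # (lower, upper, tag) -- from the example: [100, 200) -> NYC, [200, 300) -> SFO
    (100, 200, "NYC"),
    (200, 300, "SFO"),
]

def target_tag(chunk_lower):
    """Return the tag of the first range whose upper bound exceeds the
    chunk's lower bound, mimicking the rule in the footnote."""
    for lower, upper, tag in sorted(TAG_RANGES):
        if upper > chunk_lower:
            return tag
    return None  # no tagged shard matches; the balancer is unconstrained

# A chunk covering shard key values [150, 220) goes to a NYC-tagged shard,
# because its lower bound (150) falls below the NYC upper bound (200).
print(target_tag(150))  # -> NYC
print(target_tag(220))  # -> SFO
```

This also shows why a chunk such as [150, 220) can land on a shard whose tag range it partially exceeds, as the note above warns.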
You may remove tags from a particular shard using the sh.removeShardTag() (page 1003) method when connected to a mongos (page 1036) instance, as in the following example, which removes the NRT tag from a shard:
sh.removeShardTag("shard0002", "NRT")
Note: Shard ranges are always inclusive of the lower value and exclusive of the upper boundary.
You can find tag ranges for all namespaces in the tags (page 552) collection of the config database. The output of sh.status() (page 1005) displays all tag ranges. To return all shard key ranges tagged with NYC, use the following sequence of operations:
use config
db.tags.find({ tags: "NYC" })
5.4.1 Overview
While replica sets provide basic protection against single-instance failure, when all of the members of a replica set reside in a single facility, the replica set is still susceptible to some classes of errors in that facility, including power outages, networking distortions, and natural disasters. To protect against these classes of failures, deploy a replica set with one or more members in a geographically distinct facility or data center.
5.4.2 Requirements
For a three-member replica set you need two instances in a primary facility (hereafter, Site A) and one member in a secondary facility (hereafter, Site B.) Site A should be the same facility as, or very close to, your primary application infrastructure (i.e. application servers, caching layer, users, etc.)

For a four-member replica set you need two members in Site A, two members in Site B (or one member in Site B and one member in Site C,) and a single arbiter in Site A.

For replica sets with additional members in the secondary facility or with multiple secondary facilities, the requirements are the same as above, with the following notes:

Ensure that a majority of the voting members (page 391) are within Site A. This includes secondary-only members (page 390) and arbiters (page 390). For more information on the need to keep the voting majority on one site, see Elections (page 391).

If you deploy a replica set with an uneven number of members, deploy an arbiter (page 390) on Site A. The arbiter must be on Site A to keep the majority there.

For all configurations in this tutorial, deploy each replica set member on a separate system. Although you may deploy more than one replica set member on a single system, doing so reduces the redundancy and capacity of the replica set. Such deployments are typically for testing purposes and beyond the scope of this tutorial.
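The voting-majority requirement above can be checked mechanically. A minimal sketch (the site names and member counts are illustrative; MongoDB does not provide this function):

```python
def site_a_has_majority(votes_by_site):
    """Return True if Site A holds a strict majority of the voting
    members, which the requirements above demand so that a network
    partition between sites cannot cost Site A its primary."""
    total = sum(votes_by_site.values())
    return votes_by_site.get("A", 0) > total // 2

# Three-member set: two members in Site A, one in Site B.
print(site_a_has_majority({"A": 2, "B": 1}))   # True
# Four-member set plus arbiter in Site A: 3 votes in A, 2 in B.
print(site_a_has_majority({"A": 3, "B": 2}))   # True
# Misconfigured: an even split means neither site has a majority.
print(site_a_has_majority({"A": 2, "B": 2}))   # False
```

The third case is why an even number of data-bearing members requires an arbiter in Site A.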
5.4.3 Procedures
Deploy a Distributed Three-Member Replica Set

A geographically distributed three-member deployment has the following features:

Each member of the replica set resides on its own machine, and the MongoDB processes all bind to port 27017, which is the standard MongoDB port.

Each member of the replica set must be accessible by way of resolvable DNS or hostnames in the following scheme:

mongodb0.example.net
mongodb1.example.net
mongodb2.example.net

Configure DNS names appropriately, or set up your systems' /etc/hosts file to reflect this configuration.

Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.

Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:

Establish a virtual private network between the systems in Site A and Site B so that all traffic between the sites is encrypted and remains private. Ensure that your network topology routes all traffic between members within a single site over the local area network.

Configure authentication using auth (page 1085) and keyFile (page 1085), so that only servers and processes with authentication can connect to the replica set.

Configure networking and firewall rules to permit only traffic (incoming and outgoing packets) on the default MongoDB port (e.g. 27017) from within your deployment.

See also:

For more information on security and firewalls, see Security (page 395).
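The auth and keyFile options mentioned above are set in the same configuration file as the other run-time options. A fragment might look like the following; the key file path is illustrative, and the same key file content must be deployed to every member of the set:

```ini
auth = true
keyFile = /srv/mongodb/keyfile
```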
Specify run-time configuration on each system in a configuration file (page 1082) stored in /etc/mongodb.conf or in a related location. Do not specify run-time configuration through command line options. For each MongoDB instance, use the following configuration, with values set appropriate to your systems:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net
Modify bind_ip (page 1083) to reflect a secure interface on your system that is able to access all other members of the set and that is accessible to all other members of the replica set. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. VPN) to permit this access.

Note: The portion of the replSet (page 1092) value following the / provides a seed list of known members of the replica set. mongod (page 1025) uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet (page 1092) option resemble:
replSet = rs0
For more documentation on the above run-time configurations, as well as additional configuration options, see Configuration File Options (page 1082).

To deploy a geographically distributed three-member set:

1. On each system start the mongod (page 1025) process by issuing a command similar to the following:
mongod --config /etc/mongodb.conf
Note: In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.

2. Open a mongo (page 1040) shell connected to one of the mongod (page 1025) instances:
mongo
3. Use the rs.initiate() (page 992) method on one member to initiate a replica set consisting of the current member and using the default conguration:
rs.initiate()
5. Add the remaining members to the replica set by issuing a sequence of commands similar to the following. The example commands assume the current primary is mongodb0.example.net:
rs.add("mongodb1.example.net") rs.add("mongodb2.example.net")
6. Make sure that you have configured the member located in Site B (i.e. mongodb2.example.net) as a secondary-only member (page 390):

(a) Issue the following command to determine the _id (page 465) value for mongodb2.example.net:
rs.conf()
(b) In the members (page 465) array, save the _id (page 465) value. The example in the next step assumes this value is 2.

(c) In the mongo (page 1040) shell connected to the replica set's primary, issue a command sequence similar to the following:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
Note: In some situations, the rs.reconfig() (page 992) shell method can force the current primary to step down and cause an election. When the primary steps down, all clients will disconnect. This is the intended behavior. While this typically takes only 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

After these commands return you have a geographically distributed three-member replica set.

7. To check the status of your replica set, issue rs.status() (page 994).

See also:

The documentation of the following shell functions for more information: rs.initiate() (page 992), rs.conf() (page 991), rs.reconfig() (page 992), rs.add() (page 990)

Deploy a Distributed Four-Member Replica Set

A geographically distributed four-member deployment has the following features:

Each member of the replica set, except for the arbiter (see below), resides on its own machine, and the MongoDB processes all bind to port 27017, which is the standard MongoDB port.

Each member of the replica set must be accessible by way of resolvable DNS or hostnames in the following scheme:

mongodb0.example.net
mongodb1.example.net
mongodb2.example.net
mongodb3.example.net

Configure DNS names appropriately, or set up your systems' /etc/hosts file to reflect this configuration.

Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.

One host (e.g. mongodb3.example.net) will be an arbiter and can run on a system that is also used for an application server or some other shared purpose.
There are three possible architectures for this replica set:

Two members in Site A, two secondary-only members (page 390) in Site B, and an arbiter in Site A.

Three members in Site A and one secondary-only member in Site B.

Two members in Site A, one secondary-only member in Site B, one secondary-only member in Site C, and an arbiter in Site A.

In most cases the first architecture is preferable because it is the least complex.

Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:

Establish a virtual private network between the systems in Site A and Site B (and Site C if it exists) so that all traffic between the sites is encrypted and remains private. Ensure that your network topology routes all traffic between members within a single site over the local area network.

Configure authentication using auth (page 1085) and keyFile (page 1085), so that only servers and processes with authentication can connect to the replica set.

Configure networking and firewall rules to permit only traffic (incoming and outgoing packets) on the default MongoDB port (e.g. 27017) from within your deployment.

See also:

For more information on security and firewalls, see Security (page 395).

Specify run-time configuration on each system in a configuration file (page 1082) stored in /etc/mongodb.conf or in a related location. Do not specify run-time configuration through command line options. For each MongoDB instance, use the following configuration, with values set appropriate to your systems:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net,mongodb3.example.net
Modify bind_ip (page 1083) to reflect a secure interface on your system that is able to access all other members of the set and that is accessible to all other members of the replica set. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. VPN) to permit this access.

Note: The portion of the replSet (page 1092) value following the / provides a seed list of known members of the replica set. mongod (page 1025) uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet (page 1092) option resemble:
replSet = rs0
For more documentation on the above run-time configurations, as well as additional configuration options, see Configuration File Options (page 1082).

To deploy a geographically distributed four-member set:

1. On each system start the mongod (page 1025) process by issuing a command similar to the following:
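As in the three-member procedure, each instance starts with the shared configuration file; assuming it is stored at /etc/mongodb.conf as described above:

```shell
mongod --config /etc/mongodb.conf
```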
Note: In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.

2. Open a mongo (page 1040) shell connected to this host:
mongo
3. Use rs.initiate() (page 992) to initiate a replica set consisting of the current member and using the default conguration:
rs.initiate()
5. Add the remaining members to the replica set by issuing a sequence of commands similar to the following. The example commands assume the current primary is mongodb0.example.net:
rs.add("mongodb1.example.net") rs.add("mongodb2.example.net") rs.add("mongodb3.example.net")
6. In the same shell session, issue the following command to add the arbiter (e.g. mongodb4.example.net):
rs.addArb("mongodb4.example.net")
7. Make sure that you have configured each member located in Site B (e.g. mongodb3.example.net) as a secondary-only member (page 390):

(a) Issue the following command to determine the _id (page 465) value for the member:
rs.conf()
(b) In the members (page 465) array, save the _id (page 465) value. The example in the next step assumes this value is 2.

(c) In the mongo (page 1040) shell connected to the replica set's primary, issue a command sequence similar to the following:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
Note: In some situations, the rs.reconfig() (page 992) shell method can force the current primary to step down and cause an election. When the primary steps down, all clients will disconnect. This is the intended behavior. While this typically takes only 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

After these commands return you have a geographically distributed four-member replica set.

8. To check the status of your replica set, issue rs.status() (page 994).

See also:

The documentation of the following shell functions for more information: rs.initiate() (page 992)
rs.conf() (page 991), rs.reconfig() (page 992), rs.add() (page 990)

Deploy a Distributed Set with More than Four Members

The procedure for deploying a geographically distributed set with more than four members is similar to the above procedures, with the following differences:

Never deploy more than seven voting members.

Use the procedure for a four-member set if you have an even number of members (see Deploy a Distributed Four-Member Replica Set (page 434)). Ensure that Site A always has a majority of the members by deploying the arbiter within Site A. For six-member sets, deploy at least three voting members in addition to the arbiter in Site A, and the remaining members in alternate sites.

Use the procedure for a three-member set if you have an odd number of members (see Deploy a Distributed Three-Member Replica Set (page 432)). Ensure that Site A always has a majority of the members of the set. For example, if a set has five members, deploy three members within the primary facility and two members in other facilities.

If you have a majority of the members of the set outside of Site A and the network partitions to prevent communication between sites, the current primary in Site A will step down, even if none of the members outside of Site A are eligible to become primary.

Additionally, consider the Write Concern (page 400) and Read Preference (page 404) documents, which address capabilities related to data center awareness.
CHAPTER 6
Journaling
MongoDB uses write ahead logging to an on-disk journal to guarantee write operation (page 181) durability and to provide crash resiliency. Before applying a change to the data files, MongoDB writes the change operation to the journal. If MongoDB should terminate or encounter an error before it can write the changes from the journal to the data files, MongoDB can re-apply the write operation and maintain a consistent state.

Without a journal, if mongod (page 1025) exits unexpectedly, you must assume your data is in an inconsistent state, and you must run either repair (page 54) or, preferably, resync (page 430) from a clean member of the replica set. With journaling enabled, if mongod (page 1025) stops unexpectedly, the program can recover everything written to the journal, and the data remains in a consistent state. By default, the greatest extent of lost writes, i.e., those not made to the journal, are those made in the last 100 milliseconds. See journalCommitInterval (page 1087) for more information on the default.

With journaling, if you want a data set to reside entirely in RAM, you need enough RAM to hold the dataset plus the write working set. The write working set is the amount of unique data you expect to see written between re-mappings of the private view. For information on views, see Storage Views used in Journaling (page 76).

Important: Changed in version 2.0: For 64-bit builds of mongod (page 1025), journaling is enabled by default. For other platforms, see journal (page 1087).
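The write-ahead discipline described above can be sketched in a few lines. This is a conceptual model only, not MongoDB's journal format or recovery logic:

```python
class JournaledStore:
    """Toy write-ahead log: every change is appended to a journal
    before it touches the data files, so a crash between the two
    steps can be repaired by replaying the journal."""

    def __init__(self):
        self.journal = []   # durable log of (key, value) operations
        self.data = {}      # stands in for the data files

    def write(self, key, value, crash_before_apply=False):
        self.journal.append((key, value))   # 1. journal first
        if crash_before_apply:
            return                          # simulated crash: apply never happens
        self.data[key] = value              # 2. then apply to the data files

    def recover(self):
        # Re-apply every journaled operation. Applying an operation twice
        # is safe here because the operations are idempotent overwrites.
        for key, value in self.journal:
            self.data[key] = value

store = JournaledStore()
store.write("a", 1)
store.write("b", 2, crash_before_apply=True)  # crash loses the apply...
store.recover()                               # ...but replay restores it
print(store.data)  # -> {'a': 1, 'b': 2}
```

The key property is that the journal entry always lands before the data-file change, so recovery never has to guess what was in flight.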
6.1 Procedures
6.1.1 Enable Journaling
Changed in version 2.0: For 64-bit builds of mongod (page 1025), journaling is enabled by default. To enable journaling, start mongod (page 1025) with the --journal (page 1029) command line option.

If no journal files exist when mongod (page 1025) starts, it must preallocate new journal files. During this operation, the mongod (page 1025) is not listening for connections until preallocation completes: for some systems this may take several minutes. During this period your applications and the mongo (page 1040) shell are not available.
To disable journaling, start mongod (page 1025) with the --nojournal (page 1030) command line option.
2. Create a set of journal files by starting a mongod (page 1025) instance that uses the temporary directory:
mongod --port 10000 --dbpath ~/tmpDbpath --journal
3. When you see the following log output, indicating mongod (page 1025) has the files, press CONTROL+C to stop the mongod (page 1025) instance:
web admin interface listening on port 11000
4. Preallocate journal les for the new instance of mongod (page 1025) by moving the journal les from the data directory of the existing instance to the data directory of the new instance:
mv ~/tmpDbpath/journal /data/db/
You can also run this command on a busy system to see the sync time, which may be higher if the journal directory is on the same volume as the data files.

The journalLatencyTest (page 918) command also provides a way to check if your disk drive is buffering writes in its local cache. If the number is very low (i.e., less than 2 milliseconds) and the drive is non-SSD, the drive is probably buffering writes. In that case, enable cache write-through for the device in your operating system, unless you have a disk controller card with battery-backed RAM.
During preallocation you will not be able to connect to the database. This is a one-time preallocation and does not occur with future invocations. To avoid preallocation lag, see Avoid Preallocation Lag (page 74).
CHAPTER 7

Connect to MongoDB with SSL
This document outlines the use and operation of MongoDB's SSL support. SSL allows MongoDB clients to support encrypted connections to mongod (page 1025) instances.

Note: The default distribution of MongoDB does not contain support for SSL. To use SSL, you must either build MongoDB locally passing the --ssl option to scons or use MongoDB Enterprise. These instructions outline the process for getting started with SSL and assume that you have already installed a build of MongoDB that includes SSL support and that your client driver supports SSL.
This operation generates a new, self-signed certificate with no passphrase that is valid for 365 days. Once you have the certificate, concatenate the certificate and private key into a .pem file, as in the following example:
cat mongodb-cert.key mongodb-cert.crt > mongodb.pem
7.1.2 Set Up mongod and mongos with SSL Certificate and Key
To use SSL in your MongoDB deployment, include the following run-time options with mongod (page 1025) and mongos (page 1036):

sslOnNormalPorts (page 1095)

sslPEMKeyFile (page 1095) with the .pem file that contains the SSL certificate and key.

Consider the following syntax for mongod (page 1025):
mongod --sslOnNormalPorts --sslPEMKeyFile <pem>
For example, given an SSL certificate located at /etc/ssl/mongodb.pem, configure mongod (page 1025) to use SSL encryption for all connections with the following command:
mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem
Note: Specify <pem> with the full path name to the certificate. If the private key portion of the <pem> is encrypted, specify the encryption password with the sslPEMKeyPassword (page 1095) option.

You may also specify these options in the configuration file (page 1082), as in the following example:
sslOnNormalPorts = true
sslPEMKeyFile = /etc/ssl/mongodb.pem
To connect to mongod (page 1025) and mongos (page 1036) instances using SSL, the mongo (page 1040) shell and MongoDB tools must include the --ssl option. See SSL Configuration for Clients (page 79) for more information on connecting to mongod (page 1025) and mongos (page 1036) running with SSL.
For example, given a signed SSL certificate located at /etc/ssl/mongodb.pem and the certificate authority file at /etc/ssl/ca.pem, you can configure mongod (page 1025) for SSL encryption as follows:
mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
Note: Specify the <pem> file and the <ca> file with either the full path name or the relative path name. If the <pem> is encrypted, specify the encryption password with the sslPEMKeyPassword (page 1095) option.

You may also specify these options in the configuration file (page 1082), as in the following example:
sslOnNormalPorts = true
sslPEMKeyFile = /etc/ssl/mongodb.pem
sslCAFile = /etc/ssl/ca.pem
To connect to mongod (page 1025) and mongos (page 1036) instances using SSL, the mongo (page 1040) tools must include both the --ssl (page 1041) and --sslPEMKeyFile (page 1041) options. See SSL Configuration
for Clients (page 79) for more information on connecting to mongod (page 1025) and mongos (page 1036) running with SSL.

Block Revoked Certificates for Clients

To prevent clients with revoked certificates from connecting, include the sslCRLFile (page 1096) option to specify a .pem file that contains revoked certificates. For example, the following mongod (page 1025) SSL configuration includes the sslCRLFile (page 1096) setting:
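A sketch of such an invocation, extending the CA-signed configuration from above with a revocation list; the file paths are the ones used elsewhere in this section, so verify them for your deployment:

```shell
mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem --sslCRLFile /etc/ssl/ca-crl.pem
```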
Clients with revoked certificates in /etc/ssl/ca-crl.pem will not be able to connect to this mongod (page 1025) instance.

Validate Only if a Client Presents a Certificate

In most cases it is important to ensure that clients present valid certificates. However, if you have clients that cannot present a client certificate, or you are transitioning to using a certificate authority, you may want to validate certificates only from clients that present one. To bypass validation for clients that don't present certificates, include the sslWeakCertificateValidation (page 1096) run-time option with mongod (page 1025) and mongos (page 1036). If the client does not present a certificate, no validation occurs. These connections, though not validated, are still encrypted using SSL.

For example, consider the following mongod (page 1025) SSL configuration that includes the sslWeakCertificateValidation (page 1096) setting:
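A sketch of such an invocation, again extending the CA-signed configuration from above with the weak-validation option; verify the file paths for your deployment:

```shell
mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem --sslWeakCertificateValidation
```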
Then, clients can connect either with the --ssl (page 1041) option and no certificate or with the --ssl (page 1041) option and a valid certificate. See SSL Configuration for Clients (page 79) for more information on SSL connections for clients.

Note: If the client presents a certificate, the certificate must be a valid certificate. All connections, including those that have not presented certificates, are encrypted using SSL.
Connect to MongoDB Instance that Requires Client Certificates

To connect to a mongod (page 1025) or mongos (page 1036) that requires CA-signed client certificates (page 78), start the mongo (page 1040) shell with --ssl (page 1041) and the --sslPEMKeyFile (page 1095) option to specify the signed certificate-key file, as in the following:
mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem
Connect to MongoDB Instance that Validates when Presented with a Certificate

To connect to a mongod (page 1025) or mongos (page 1036) instance that requires valid certificates only when the client presents a certificate (page 79), start the mongo (page 1040) shell either with the --ssl (page 1041) option and no certificate or with the --ssl (page 1041) option and a valid signed certificate.

For example, if mongod (page 1025) is running with weak certificate validation, both of the following mongo (page 1040) shell clients can connect to that mongod (page 1025):
mongo --ssl
mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem
7.2.2 MMS
The MMS agent will also have to connect via SSL in order to gather its stats. Because the agent already uses SSL for its communications to the MMS servers, this is just a matter of enabling SSL support in MMS itself on a per-host basis. Use the Edit host button (i.e. the pencil) on the Hosts page in the MMS console to enable SSL; this capability is currently enabled on a group-by-group basis by 10gen. Please see the MMS Manual for more information about MMS configuration.
7.2.3 PyMongo
Add the ssl=True parameter to a PyMongo MongoClient to create a MongoDB connection to an SSL MongoDB instance:
from pymongo import MongoClient

c = MongoClient(host="mongodb.example.net", port=27017, ssl=True)
7.2.4 Java
Consider the following example SSLApp.java class file:
import com.mongodb.*;
import javax.net.ssl.SSLSocketFactory;

public class SSLApp {

    public static void main(String args[]) throws Exception {

        MongoClientOptions o = new MongoClientOptions.Builder()
                .socketFactory(SSLSocketFactory.getDefault())
                .build();

        MongoClient m = new MongoClient("localhost", o);

        DB db = m.getDB("test");
        DBCollection c = db.getCollection("foo");

        System.out.println(c.findOne());
    }
}
7.2.5 Ruby
The recent versions of the Ruby driver have support for connections to SSL servers. Install the latest version of the driver with the following command:
gem install mongo
7.2.7 .NET
As of release 1.6, the .NET driver supports SSL connections with mongod (page 1025) and mongos (page 1036) instances. To connect using SSL, you must add an option to the connection string, specifying ssl=true as follows:
var connectionString = "mongodb://localhost/?ssl=true"; var server = MongoServer.Create(connectionString);
The .NET driver will validate the certificate against the local trusted certificate store, in addition to providing encryption of the connection to the server. This behavior may produce issues during testing if the server uses a self-signed certificate. If you encounter this issue, add the sslverifycertificate=false option to the connection string to prevent the .NET driver from validating the certificate, as follows:
var connectionString = "mongodb://localhost/?ssl=true&sslverifycertificate=false"; var server = MongoServer.Create(connectionString);
CHAPTER 8
New in version 2.2.

Enterprise Feature: This feature is only available in MongoDB Enterprise.

This document outlines the use and operation of MongoDB's SNMP extension, which is only available in MongoDB Enterprise.
8.1 Prerequisites
8.1.1 Install MongoDB Enterprise
Red Hat Enterprise Linux 6.x series and Amazon Linux AMI require libssl, net-snmp, net-snmp-libs, and net-snmp-utils. Issue a command such as the following to install these packages:
sudo yum install libssl net-snmp net-snmp-libs net-snmp-utils
SUSE Enterprise Linux requires libopenssl0_9_8, libsnmp15, slessp1-libsnmp15, and snmp-mibs. Issue a command such as the following to install these packages:
sudo zypper install libopenssl0_9_8 libsnmp15 slessp1-libsnmp15 snmp-mibs
Replace [/path/to/mongodb/distribution/] with the path to your MONGO-MIB.txt configuration file. Copy the mongod.conf file into the /etc/snmp directory with the following command:
cp mongod.conf /etc/snmp/mongod.conf
8.2.2 Start Up
You can control MongoDB Enterprise using default or custom control scripts, just as with any other mongod. Use the following command to view all SNMP options available in your MongoDB:
mongod --help | grep snmp
Ensure that the following directories exist:

- /data/db/ (the path where MongoDB stores the data files)
- /var/log/mongodb/ (the path where MongoDB writes the log output)

If they do not, issue the following command:
mkdir -p /var/log/mongodb/ /data/db/
Optionally, you can set these options in a configuration file (page 1082). To check if mongod is running with SNMP support, issue the following command:
ps -ef | grep 'mongod.*--snmp'
The command should return output that includes the following line. This indicates that the proper mongod instance is running:
systemuser 31415 10260 0 Jul13 pts/16 00:00:00 mongod --snmp-master --port 3001 # [...]
You may also choose to specify the path to the MIB file:
snmpwalk -m /usr/share/snmp/mibs/MONGO-MIB -v 2c -c mongodb 127.0.0.1:1161 1.3.6.1.4.1.37601
Use this command only to ensure that you can retrieve and validate SNMP data from MongoDB.
8.3 Troubleshooting
Always check the logs for errors if something does not run as expected; see the log at /var/log/mongodb/1.log. The presence of the following line indicates that the mongod cannot read the /etc/snmp/mongod.conf file:
8.3. Troubleshooting
CHAPTER 9
MongoDB runs as a standard program. You can start MongoDB from a command line by issuing the mongod (page 1025) command and specifying options. For a list of options, see mongod (page 1025). MongoDB can also run as a Windows service. For details, see MongoDB as a Windows Service (page 18). To install MongoDB, see Install MongoDB (page 3). The following examples assume the directory containing the mongod (page 1025) process is in your system paths. The mongod (page 1025) process is the primary database process that runs on an individual server. mongos (page 1036) provides a coherent MongoDB interface equivalent to a mongod (page 1025) from the perspective of a client. The mongo (page 1040) binary provides the administrative shell. This document discusses the mongod (page 1025) process; however, some portions of this document may be applicable to mongos (page 1036) instances. See also: Run-time Database Configuration (page 37), mongod (page 1025), mongos (page 1036), and Configuration File Options (page 1082).
87
2. To switch to the admin database and shut down the mongod (page 1025) instance, issue the following commands:
use admin db.shutdownServer()
You may only use db.shutdownServer() (page 989) when connected to the mongod (page 1025) while authenticated to the admin database, or when connected via the localhost interface on systems without authentication. Alternately, you can shut down the mongod (page 1025) instance:
- using the --shutdown (page 1032) option
- from a driver, using the shutdown (page 873) command. For details, see the drivers documentation (page 559) for your driver.
To keep checking the secondaries for a specified number of seconds if none are immediately up-to-date, issue shutdown (page 873) with the timeoutSecs argument. If any of the secondaries catch up within the allotted time, the primary will shut down. If no secondaries catch up, it will not shut down. The following command issues shutdown (page 873) with timeoutSecs set to 5:
db.adminCommand({shutdown : 1, timeoutSecs : 5})
Alternately you can use the timeoutSecs argument with the shutdownServer() (page 989) method:
db.shutdownServer({timeoutSecs : 5})
CHAPTER 10
10.1 Overview
Log rotation archives the current log file and starts a new one. Specifically, log rotation renames the current log file by appending the filename with a timestamp, opens a new log file, and finally closes the old log. MongoDB will only rotate logs when you use the logRotate (page 870) command, or issue the process a SIGUSR1 signal as described in this procedure. See also: For information on logging, see the Process Logging (page 96) section.
10.2 Procedure
The following steps create and rotate a log file: 1. Start a mongod (page 1025) with verbose logging, with appending enabled, and with the following log file:
mongod -v --logpath /var/log/mongodb/server1.log --logappend
3. Rotate the log file using one of the following methods. From the mongo (page 1040) shell, issue the logRotate (page 870) command from the admin database:
use admin db.runCommand( { logRotate : 1 } )
This is the only available method to rotate log files on Windows systems. From the UNIX shell, rotate logs for a single process by issuing the following command:

kill -SIGUSR1 <mongod process id>
From the UNIX shell, rotate logs for all mongod (page 1025) processes on a machine by issuing the following command:
killall -SIGUSR1 mongod
The results are similar to the following, though the timestamps will differ:
server1.log server1.log.2011-11-24T23-30-00
The example results indicate a log rotation performed at exactly 11:30 pm on November 24th, 2011 UTC, which is the local time offset by the local time zone. The original log file is the one with the timestamp. The new log is the server1.log file. If you issue a second logRotate (page 870) command an hour later, an additional file would appear when listing matching files, as in the following example:
server1.log server1.log.2011-11-24T23-30-00 server1.log.2011-11-25T00-30-00
This operation does not modify the server1.log.2011-11-24T23-30-00 file created earlier, while server1.log.2011-11-25T00-30-00 is the previous server1.log file, renamed. server1.log is a new, empty file that receives all new log output.
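The naming scheme above can be sketched in a few lines. The following minimal illustration (the helper name is hypothetical, not part of MongoDB) shows how the archived filename is formed: the UTC timestamp, with colons replaced by hyphens, appended to the original log path.

```python
from datetime import datetime, timezone

def rotated_name(logpath, when=None):
    # Hypothetical helper: mimic the archived filename produced by log
    # rotation, e.g. server1.log -> server1.log.2011-11-24T23-30-00.
    when = when or datetime.now(timezone.utc)
    return logpath + when.strftime(".%Y-%m-%dT%H-%M-%S")

print(rotated_name("/var/log/mongodb/server1.log",
                   datetime(2011, 11, 24, 23, 30, 0)))
```

Running this with the fixed timestamp prints the same suffix format shown in the listings above.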
CHAPTER 11
Monitoring is a critical component of all database administration. A firm grasp of MongoDB's reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB's normal operational parameters will allow you to diagnose issues as you encounter them, rather than waiting for a crisis or failure. This document provides an overview of the available tools and data provided by MongoDB, as well as an introduction to diagnostic strategies and suggestions for monitoring instances in MongoDB's replica sets and sharded clusters. Note: 10gen provides a hosted monitoring service which collects and aggregates these data to provide insight into the performance and operation of MongoDB deployments. See the MongoDB Monitoring Service (MMS) and the MMS documentation for more information.
11.1.1 Utilities
The MongoDB distribution includes a number of utilities that quickly return statistics about instances' performance and activity. These are typically most useful for diagnosing issues and assessing normal operation.

mongotop

mongotop (page 1072) tracks and reports the current read and write activity of a MongoDB instance. mongotop (page 1072) provides per-collection visibility into use. Use mongotop (page 1072) to verify that activity and use match expectations. See the mongotop manual (page 1071) for details.
mongostat

mongostat (page 1067) captures and returns counters of database operations. mongostat (page 1067) reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use mongostat (page 1067) to understand the distribution of operation types and to inform capacity planning. See the mongostat manual (page 1067) for details.

REST Interface

MongoDB provides a REST interface that exposes diagnostic and monitoring information in a simple web page. Enable this interface by setting rest (page 1089) to true, and access the page via the localhost interface using the port numbered 1000 more than the database port. In default configurations the REST interface is accessible on port 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017
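The port rule above is simple arithmetic: the REST interface always listens 1000 ports above the database port. As a quick sketch (the function name is hypothetical):

```python
def rest_port(mongod_port=27017):
    # The REST interface listens 1000 ports above the database port,
    # so the default mongod port 27017 maps to 28017.
    return mongod_port + 1000

print(rest_port())       # 28017
print(rest_port(37017))  # 38017
```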
11.1.2 Statistics
MongoDB provides a number of commands that return statistics about the state of the MongoDB instance. These data may provide finer granularity regarding the state of the MongoDB instance than the tools above. Consider using their output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the activity of your instance.

serverStatus

Access serverStatus data (page 893) by way of the serverStatus (page 893) command. This document contains a general overview of the state of the database, including disk usage, memory use, connections, journaling, and index accesses. The command returns quickly and does not impact MongoDB performance. While this output contains a (nearly) complete account of the state of a MongoDB instance, in most cases you will not run this command directly. Nevertheless, all administrators should be familiar with the data provided by serverStatus (page 893). See also: db.serverStatus() (page 988) and serverStatus data (page 893).

replSetGetStatus

View the replSetGetStatus data (page 844) with the replSetGetStatus (page 844) command (rs.status() (page 994) from the shell). The document returned by this command reflects the state and configuration of the replica set. Use this data to ensure that replication is properly configured, and to check the connections between the current host and the members of the replica set.

dbStats

The dbStats data (page 881) is accessible by way of the dbStats (page 881) command (db.stats() (page 989) from the shell). This command returns a document that contains data that reflects the amount of storage used and data contained in the database, as well as object, collection, and index counters. Use this data to check and track the state and storage of a specific database. This output also allows you to compare utilization between databases and to determine average document size in a database.
94
collStats

The collStats data (page 877) is accessible using the collStats (page 877) command (db.printCollectionStats() (page 986) from the shell). It provides statistics that resemble dbStats (page 881) at the collection level: this includes a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about the indexes.
Notes on third-party monitoring tools (table abridged in this rendering): several plugins are available, including MongoDB Monitoring, MongoDB Slow Queries, and MongoDB Replica Set Monitoring; also a dashboard for MongoDB with MongoDB-specific alerts, a replication failover timeline, and iPhone, iPad, and Android mobile apps.
11.3.1 Locks
MongoDB uses a locking system to ensure consistency. However, if certain operations are long-running, or a queue forms, performance slows as requests and operations wait for the lock. Because lock-related slowdowns can be intermittent, look to the data in the globalLock (page 896) section of the serverStatus (page 893) response to assess if the lock has been a challenge to your performance. If globalLock.currentQueue.total (page 897) is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that might affect performance.
If globalLock.totalTime (page 896) is high in the context of uptime (page 894), then the database has existed in a lock state for a significant amount of time. If globalLock.ratio (page 897) is also high, MongoDB has likely been processing a large number of long-running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults (page 97) and disk reads.
If requests are high because there are many concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members. For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod (page 1025) instances. Spikes in the number of connections can also be the result of application or driver errors. All of the MongoDB drivers supported by 10gen implement connection pooling, which allows clients to use and reuse connections more efficiently. An extremely high number of connections, particularly without corresponding workload, is often indicative of a driver or other configuration error.
See also: The documentation of db.setProfilingLevel() (page 988) for more information about this command. Note: Because the database profiler can have an impact on performance, only enable profiling for strategic intervals and as minimally as possible on production systems. You may enable profiling on a per-mongod (page 1025) basis. This setting will not propagate across a replica set or sharded cluster. The following profiling levels are available:

- Level 0: Off. No profiling.
- Level 1: On. Only includes slow operations.
- Level 2: On. Includes all operations.
See the output of the profiler in the system.profile collection of your database. You can specify the slowms (page 1089) setting to set a threshold above which the profiler considers operations slow and thus includes them in the level 1 profiling data. You may configure slowms (page 1089) at runtime, as an argument to the db.setProfilingLevel() (page 988) operation. Additionally, mongod (page 1025) records all slow queries to its log (page 1084), as defined by slowms (page 1089). The data in system.profile does not persist between mongod (page 1025) restarts. You can view the profiler's output by issuing the show profile command in the mongo (page 1040) shell, or with the following operation:
db.system.profile.find( { millis : { $gt : 100 } } )
This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (i.e. 100) is above the slowms (page 1089) threshold. See also: Optimization Strategies for MongoDB (page 559) addresses strategies that may improve the performance of your database queries and operations.
See the replSetGetStatus (page 844) document for a more in-depth overview of this output. In general, watch the value of optimeDate (page 846). Pay particular attention to the difference in time between the primary and the secondary members. The size of the operation log is only configurable during the first run, using the --oplogSize (page 1032) argument to the mongod (page 1025) command, or preferably the oplogSize (page 1092) setting in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet (page 1032) option, mongod (page 1025) will create a default-sized oplog. By default, the oplog is 5% of total available disk space on 64-bit systems. See also: Change the Size of the Oplog (page 437)
For active deployments, the above query might return a useful result set. The balancing process, which originates on a randomly selected mongos (page 1036), takes a special balancer lock that prevents other balancing activity from transpiring. Use the following command, also against the config database, to check the status of the balancer lock:
db.locks.find( { _id : "balancer" } )
If this lock exists, make sure that the balancer process is actively using this lock.
CHAPTER 12
The database profiler collects fine-grained data about MongoDB write operations, cursors, and database commands on a running mongod (page 1025) instance. You can enable profiling on a per-database or per-instance basis. The profiling level (page 101) is also configurable when enabling profiling. The database profiler writes all the data it collects to the system.profile (page 1103) collection, which is a capped collection (page 562). See Database Profiler Output (page 1104) for an overview of the data in the system.profile (page 1103) documents created by the profiler. This document outlines a number of key administration options for the database profiler. For additional related information, consider the following resources:

- Database Profiler Output (page 1104)
- Profile Command (page 892)
- db.currentOp() (page 975)
To enable profiling and set the profiling level, use the db.setProfilingLevel() (page 988) helper in the mongo (page 1040) shell, passing the profiling level as a parameter. For example, to enable profiling for all database operations, consider the following operation in the mongo (page 1040) shell:
db.setProfilingLevel(2)
The shell returns a document showing the previous level of profiling. The "ok" : 1 key-value pair indicates the operation succeeded:
{ "was" : 0, "slowms" : 100, "ok" : 1 }
To verify the new setting, see the Check Profiling Level (page 102) section.
The was field indicates the current level of profiling. The slowms field indicates how long an operation must run, in milliseconds, for it to pass the slow threshold. MongoDB will log operations that take longer than the threshold if the profiling level is 1. This document returns the profiling level in the was field. For an explanation of profiling levels, see Profiling Levels (page 101). To return only the profiling level, use the db.getProfilingLevel() (page 982) helper in the mongo (page 1040) shell, as in the following:
db.getProfilingLevel()
db.setProfilingLevel(1, 15)

This sets the profiling level to 1, which collects profiling data for slow operations only, and defines slow operations as those that last longer than 15 milliseconds. See also: profile (page 1088) and slowms (page 1089).
To return all operations except command operations ($cmd), run a query similar to the following:

db.system.profile.find( { op: { $ne : 'command' } } ).pretty()
To return operations for a particular collection, run a query similar to the following. This example returns operations in the mydb database's test collection:

db.system.profile.find( { ns : 'mydb.test' } ).pretty()
To return operations slower than 5 milliseconds, run a query similar to the following:
db.system.profile.find( { millis : { $gt : 5 } } ).pretty()
To return information from a certain time range, run a query similar to the following:
db.system.profile.find( { ts : { $gt : new ISODate("2012-12-09T03:00:00Z") , $lt : new ISODate("2012-12-09T03:40:00Z") } } ).pretty()
The following example looks at the time range, suppresses the user field from the output to make it easier to read, and sorts the results by how long each operation took to run:
db.system.profile.find( { ts : { $gt : new ISODate("2011-07-12T03:00:00Z") , $lt : new ISODate("2011-07-12T03:40:00Z") } }, { user : 0 } ).sort( { millis : -1 } )
CHAPTER 13
This document provides an overview of the import and export programs included in the MongoDB distribution. These tools are useful when you want to back up or export a portion of your data without capturing the state of the entire database, or for simple data ingestion cases. For more complex data migration tasks, you may want to write your own import and export scripts using a client driver to interact with the database itself. For disaster recovery protection and routine database backup operation, use full database instance backups (page 43). Warning: Because these tools primarily operate by interacting with a running mongod (page 1025) instance, they can impact the performance of your running database. Not only do these processes create traffic for a running database instance, they also force the database to read all data through memory. When MongoDB reads infrequently used data, it can supplant more frequently accessed data, causing a deterioration in performance for the database's regular workload. mongoimport (page 1060) and mongoexport (page 1063) do not reliably preserve all rich BSON data types, because BSON is a superset of JSON. Thus, mongoimport (page 1060) and mongoexport (page 1063) cannot represent BSON data accurately in JSON. As a result, data exported or imported with these tools may lose some measure of fidelity. See MongoDB Extended JSON (page 1117) for more information about MongoDB Extended JSON. See also: See the Backup Strategies for MongoDB Systems (page 43) document for more information on backing up MongoDB instances. Additionally, consider the following references for commands addressed in this document: mongoexport (page 1063), mongorestore (page 1051), and mongodump (page 1048). If you want to transform and process data once you've imported it in MongoDB, consider the documents in the Aggregation (page 255) section, including Map-Reduce (page 313) and Aggregation Framework (page 257).
If maintaining type fidelity is important, consider writing a data import and export system that does not force BSON documents into JSON form as part of the process. The following list of types contains examples of how MongoDB represents BSON types in JSON. data_binary
{ "$binary" : "<bindata>", "$type" : "<t>" }
<bindata> is the base64 representation of a binary string. <t> is the hexadecimal representation of a single byte indicating the data type. data_date
Date( <date> )
<date> is the JSON representation of a 64-bit signed integer for milliseconds since epoch. data_timestamp
Timestamp( <t>, <i> )
<t> is the JSON representation of a 32-bit unsigned integer for seconds since epoch. <i> is a 32-bit unsigned integer for the increment. data_regex
/<jRegex>/<jOptions>
<jRegex> is a string that may contain valid JSON characters and unescaped double quote (i.e. ") characters, but may not contain unescaped forward slash (i.e. /) characters. <jOptions> is a string that may contain only the characters g, i, m, and s. data_oid
ObjectId( "<id>" )
<id> is a 24 character hexadecimal string. These representations require that data_oid values have an associated field named _id. data_ref
DBRef( "<name>", "<id>" )
<name> is a string of valid JSON characters. <id> is a 24 character hexadecimal string. See also: MongoDB Extended JSON (page 1117)
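The renderings above can be reproduced without a BSON library. The following sketch builds a few of them in plain Python; the helper names are hypothetical and simply mirror the type labels used in this section.

```python
import base64

def data_binary(raw, subtype=0x00):
    # <bindata> is the base64 representation of the binary string;
    # <t> is the hexadecimal representation of a single type byte.
    return {"$binary": base64.b64encode(raw).decode("ascii"),
            "$type": "%02x" % subtype}

def data_oid(id_hex):
    # <id> must be a 24 character hexadecimal string.
    assert len(id_hex) == 24
    return 'ObjectId( "%s" )' % id_hex

def data_regex(pattern, options=""):
    # <jOptions> may contain only the characters g, i, m, and s.
    assert set(options) <= set("gims")
    return "/%s/%s" % (pattern, options)

print(data_binary(b"hello"))
print(data_oid("47cc67093475061e3d95369d"))
print(data_regex("^mongo", "i"))
```

In practice, a driver's own Extended JSON utilities (e.g. PyMongo's json_util) handle these conversions; this sketch only illustrates the shapes documented above.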
- Labeling should describe the contents of the backup, and reflect the subset of the data corpus captured in the backup or export.
- Do not create or apply exports if the backup process itself will have an adverse effect on a production system.
- Make sure that backups reflect a consistent data state. Export or backup processes can impact data integrity (i.e. type fidelity) and consistency if updates continue during the backup process.
- Test backups and exports by restoring and importing to ensure that the backups are useful.
mongoexport --collection collection --out collection.json

This will export all documents in the collection named collection into the file collection.json. Without the output specification (i.e. --out collection.json (page 1066)), mongoexport (page 1063) writes output to standard output (i.e. stdout). You can further narrow the results by supplying a query filter using the --query (page 1065) option and limit results to a single database using the --db (page 1065) option. For instance:
mongoexport --db sales --collection contacts --query '{"field": 1}'

This command returns all documents in the sales database's contacts collection, with a field named field with a value of 1. Enclose the query in single quotes (i.e. ') to ensure that it does not interact with your shell environment. The resulting documents will return on standard output. By default, mongoexport (page 1063) returns one JSON document per MongoDB document. Specify the --jsonArray (page 1065) argument to return the export as a single JSON array. Use the --csv (page 1065) option to return the result in CSV (comma separated values) format. If your mongod (page 1025) instance is not running, you can use the --dbpath (page 1064) option to specify the location of your MongoDB instance's database files. See the following example:
mongoexport --db sales --collection contacts --dbpath /srv/MongoDB/
This reads the data files directly. This locks the data directory to prevent conflicting writes. The mongod (page 1025) process must not be running or attached to these data files when you run mongoexport (page 1063) in this configuration.
The --host (page 1063) and --port (page 1064) options allow you to specify a non-local host to connect to capture the export. Consider the following example:
mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection con
On any mongoexport (page 1063) command you may, as above, specify username and password credentials.
mongoimport --collection collection --file collection.json

This imports the contents of the file collection.json into the collection named collection. If you do not specify a file with the --file (page 1062) option, mongoimport (page 1060) accepts input over standard input (e.g. stdin). If you specify the --upsert (page 1062) option, all mongoimport (page 1060) operations will attempt to update existing documents in the database and insert other documents. This option will cause some performance impact depending on your configuration. You can specify the database option --db (page 1061) to import these documents to a particular database. If your MongoDB instance is not running, use the --dbpath (page 1061) option to specify the location of your MongoDB instance's database files. Consider using the --journal (page 1061) option to ensure that mongoimport (page 1060) records its operations in the journal. The mongod process must not be running or attached to these data files when you run mongoimport (page 1060) in this configuration. Use the --ignoreBlanks (page 1062) option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.
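The options above can be combined. As a hedged illustration (the database, collection, and file names here are hypothetical), a CSV import that updates existing documents and skips blank fields might look like the following:

```shell
mongoimport --db sales --collection contacts --type csv --headerline \
    --file contacts.csv --upsert --ignoreBlanks
```

The --headerline flag tells mongoimport to take the field names from the first line of the CSV file rather than a separate field specification.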
CHAPTER 14
The Linux kernel provides a system to limit and control the number of threads, connections, and open files on a per-process and per-user basis. These limits prevent single users from using too many system resources. Sometimes, these limits, as configured by the distribution developers, are too low for MongoDB and can cause a number of issues in the course of normal MongoDB operation. Generally, MongoDB should be the only user process on a system, to prevent resource contention.
14.1.1 mongod
- 1 file descriptor for each data file in use by the mongod (page 1025) instance.
- 1 file descriptor for each journal file used by the mongod (page 1025) instance when journal (page 1087) is true.
- In replica sets, each mongod (page 1025) maintains a connection to all other members of the set.

mongod (page 1025) uses background threads for a number of internal processes, including TTL collections (page 581), replication, and replica set health checks, which may require a small number of additional resources.
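The items above suggest a rough back-of-the-envelope estimate of a mongod's file descriptor needs. The following sketch is illustrative only (the helper and the sample figures are assumptions, not an official formula), and ignores the small number of descriptors used by background threads:

```python
def estimated_fds(data_files, journal_files, client_connections,
                  replica_set_members=0):
    # One descriptor per data file, per journal file, and per incoming
    # client connection, plus a connection to each other set member.
    peer_connections = max(replica_set_members - 1, 0)
    return data_files + journal_files + client_connections + peer_connections

# e.g. 40 data files, 3 journal files, 500 clients, 3-member replica set
print(estimated_fds(40, 3, 500, replica_set_members=3))  # 545
```

Even a modest deployment can therefore exceed a distribution's default open-files limit of 1024, which is why the recommended settings below raise it.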
14.1.2 mongos
In addition to the threads and file descriptors for client connections, mongos (page 1036) must maintain connections to all config servers and all shards, which includes all members of all replica sets. For mongos (page 1036), consider the following behaviors: mongos (page 1036) instances maintain a connection pool to each shard so that the mongos (page 1036) can reuse connections and quickly fulfill requests without needing to create new connections.
You can limit the number of incoming connections using the maxConns (page 1083) run-time option. By restricting the number of incoming connections you can prevent a cascade effect where the mongos (page 1036) creates too many connections on the mongod (page 1025) instances. Note: You cannot set maxConns (page 1083) to a value higher than 20000.
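maxConns (page 1083) can also be set in the configuration file (page 1082) rather than on the command line; for example, a fragment such as the following (the value 2000 is illustrative) caps incoming connections:

```ini
# /etc/mongodb.conf (excerpt)
maxConns = 2000
```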
ulimit refers to the per-user limitations for various resources. Therefore, if your mongod (page 1025) instance executes as a user that is also running multiple processes, or multiple mongod (page 1025) processes, you might see contention for these resources. Also, be aware that the processes value (i.e. -u) refers to the combined number of distinct processes and sub-process threads. You can change ulimit settings by issuing a command in the following form:
ulimit -n <value>
For many distributions of Linux you can change values by substituting the -n option for any possible value in the output of ulimit -a. See your operating system documentation for the precise procedure for changing system limits on running systems. Note: After changing the ulimit settings, you must restart the process to take advantage of the modified settings. You can use the /proc file system to see the current limitations on a running process. Depending on your system's configuration and default settings, any change to system limits made using ulimit may revert following a system restart. Check your distribution and operating system documentation for more information.
Note: This section applies only to Linux operating systems. The /proc file system stores the per-process limits in the file system object located at /proc/<pid>/limits, where <pid> is the process's PID or process identifier. You can use the following bash function to return the content of the limits object for a process or processes with a given name:
return-limits(){

   for process in $@; do
      process_pids=`ps -C $process -o pid --no-headers | cut -d " " -f 2`

      if [ -z $@ ]; then
         echo "[no $process running]"
      else
         for pid in $process_pids; do
            echo "[$process #$pid -- limits]"
            cat /proc/$pid/limits
         done
      fi

   done

}
You can copy and paste this function into a current shell session or load it as part of a script. Call the function with one of the following invocations:
return-limits mongod
return-limits mongos
return-limits mongod mongos
-f (file size): unlimited
-t (cpu time): unlimited
-v (virtual memory): unlimited 1
-n (open files): 64000
-m (memory size): unlimited 1
-u (processes/threads): 32000

Always remember to restart your mongod (page 1025) and mongos (page 1036) instances after changing the ulimit settings to make sure that the settings change takes effect.
1 If you limit virtual or resident memory size on a system running MongoDB the operating system will refuse to honor additional allocation requests.
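The recommended thresholds above can be applied in the shell that launches mongod; a minimal sketch, assuming you run it as root or from an init script before starting the server:

```shell
# Apply the recommended limits in the shell that will start mongod;
# raising hard limits generally requires root.
ulimit -f unlimited
ulimit -t unlimited
ulimit -v unlimited
ulimit -n 64000
ulimit -m unlimited
ulimit -u 32000
```

These settings affect only the current shell and its children, which is why they typically live in the init script that starts the server.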
CHAPTER 15
Production Notes
This page details system configurations that affect MongoDB, especially in production.
15.1 Backups
To make backups of your MongoDB database, please refer to Backup Strategies for MongoDB Systems (page 43).
15.2 Networking
Always run MongoDB in a trusted environment, with network rules that prevent access from all unknown machines, systems, or networks. As with any sensitive system dependent on network access, your MongoDB deployment should only be accessible to specific systems that require access: application servers, monitoring services, and other MongoDB components.

See documents in the Security (page 131) section for additional information, specifically:

Interfaces and Port Numbers (page 134)
Firewalls (page 135)
Configure Linux iptables Firewall for MongoDB (page 139)
Configure Windows netsh Firewall for MongoDB (page 143)

For Windows users, consider the Windows Server Technet Article on TCP Configuration when deploying MongoDB on Windows.
For MongoDB on Linux use the following recommended configurations:

Turn off atime for the storage volume with the database files.
Set the file descriptor limit and the user process limit above 20,000, according to the suggestions in Linux ulimit Settings (page 111). A low ulimit will affect MongoDB when under heavy use and can produce errors.
Do not use hugepages virtual memory pages; MongoDB performs better with normal virtual memory pages.
Disable NUMA in your BIOS. If that is not possible, see NUMA (page 117).
Ensure that readahead settings for the block devices that store the database files are acceptable. See the Readahead (page 116) section.
Use NTP to synchronize time among your hosts. This is especially important in sharded clusters.
15.4 Readahead
For random access use patterns, set readahead values low; for example, setting readahead to a small value such as 32 (16 KB) often works well.
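On Linux, one way to inspect and set readahead is the blockdev utility; a minimal sketch, assuming the database files live on /dev/sda (a placeholder):

```shell
# Inspect the current readahead of the device holding the database
# files (device path is a placeholder; requires root):
#   blockdev --getra /dev/sda
# Set readahead to 32 sectors:
#   blockdev --setra 32 /dev/sda
# blockdev counts 512-byte sectors, so 32 sectors is 16 KB:
echo "$((32 * 512 / 1024)) KB"
```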
15.5.1 EC2
MongoDB is compatible with EC2 and requires no configuration changes specific to the environment.
15.5.2 VMWare
MongoDB is compatible with VMWare. Some in the MongoDB community have run into issues with VMWare's memory overcommit feature and suggest disabling the feature. You can clone a virtual machine running MongoDB. You might use this to spin up a new virtual host to add as a member of a replica set. If journaling is enabled, the clone snapshot will be consistent. If not using journaling, stop mongod (page 1025), clone, and then restart.
15.5.3 OpenVZ
The MongoDB community has encountered issues running MongoDB on OpenVZ.
15.6.2 RAID
Most MongoDB deployments should use disks backed by RAID-10. RAID-5 and RAID-6 do not typically provide sufficient performance to support a MongoDB deployment. RAID-0 provides good write performance but provides limited availability, and reduced performance on read operations, particularly using Amazon's EBS volumes: as a result, avoid RAID-0 with MongoDB deployments.
To fully disable NUMA you must perform both operations. However, you can change zone_reclaim_mode without restarting mongod. For more information, see documentation on /proc/sys/vm. See The MySQL "swap insanity" problem and the effects of NUMA post, which describes the effects of NUMA on databases. This blog post addresses the impact of NUMA for MySQL; however, the issues for MongoDB are similar. The post introduces NUMA and its goals, and illustrates how these goals are not compatible with production databases.
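A sketch of both operations, assuming mongod is installed at /usr/bin/mongod with its configuration at /etc/mongodb.conf (both placeholders) and the numactl package is available; both commands require root, so they are shown commented:

```shell
# Start mongod with an interleaved memory allocation policy:
#   numactl --interleave=all /usr/bin/mongod -f /etc/mongodb.conf
#
# Disable zone reclaim; this takes effect without restarting mongod:
#   echo 0 > /proc/sys/vm/zone_reclaim_mode
```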
Use the mount command to see what device your data directory (page 1085) resides on.

Key fields from iostat:

%util: this is the most useful field for a quick check; it indicates what percent of the time the device/drive is in use.
avgrq-sz: average request size. Smaller numbers for this value reflect more random IO operations.
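For example, you might first identify the backing device with mount, then sample extended statistics once per second with iostat (part of the sysstat package; flags per common Linux builds):

```shell
# List mounted devices; find the one backing your dbpath:
mount | head -n 3
# Then watch %util and avgrq-sz, refreshing every second, with
# throughput shown in megabytes:
#   iostat -xmt 1
```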
15.8.2 bwm-ng
bwm-ng is a command-line tool for monitoring network use. If you suspect a network-based bottleneck, you may use bwm-ng to begin your diagnostic process.
Set bool_multi to true when updating many documents. Otherwise only the first matched document will update.
Then the following query, which looks for the number value 123, will not return that document:
db.mycollection.find( { x : 123 } )
15.9.8 Locking
Older versions of MongoDB used a global lock; use MongoDB v2.2+ for better results. See the Concurrency (page 731) page for more information.
15.9.9 Packages
Be sure you have the latest stable release if you are using a package manager. You can see what is current on the Downloads page, even if you then choose to install via a package manager.
Disable NUMA for best results. If you have NUMA enabled, mongod (page 1025) will print a warning when it starts.
Avoid excessive prefetch/readahead on the filesystem. Check your prefetch settings. Note that on Linux the parameter is in sectors, not bytes. 32 KB (a setting of 64 sectors) is typically reasonable.
Check ulimit (page 111) settings.
Use SSD if available and economical. Spinning disks can work well, but SSDs' capacity for random I/O operations works well with the update model of mongod (page 1025). See Remote Filesystems (page 117) for more info.
Ensure that clients keep reasonable pool sizes to avoid overloading the connection tracking capacity of a single mongod (page 1025) or mongos (page 1036) instance.
CHAPTER 16
The MongoDB command interface provides access to all non-CRUD database operations. Fetching server stats, initializing a replica set, and running a map-reduce job are all accomplished with commands. See Database Commands (page 811) for a list of all commands sorted by function, and Database Commands (page 811) for a list of all commands sorted alphabetically.
Many drivers (page 559) provide an equivalent for the db.runCommand() (page 988) method. Internally, running commands with db.runCommand() (page 988) is equivalent to a special query against the $cmd collection. Many common commands have their own shell helpers or wrappers in the mongo (page 1040) shell and drivers, such as the db.isMaster() (page 984) method in the mongo (page 1040) JavaScript shell.
However, there's also a command helper that automatically runs the command in the context of the admin database:
db._adminCommand( {buildInfo: 1} )
CHAPTER 17
MongoDB Tutorials
This page lists the tutorials available as part of the MongoDB Manual (page 1). In addition to these documents, you can refer to the introductory MongoDB Tutorial (page 22). If there is a process or pattern that you would like to see included here, please open a Jira Case.
17.2 Administration
17.2.1 Replica Sets
Deploy a Replica Set (page 421)
Convert a Standalone to a Replica Set (page 425)
Add Members to a Replica Set (page 426)
Remove Members from Replica Set (page 429)
Replace a Replica Set Member (page 429)
Adjust Priority for Replica Set Member (page 430)
Resync a Member of a Replica Set (page 430)
Deploy a Geographically Distributed Replica Set (page 431)
Change the Size of the Oplog (page 437)
Force a Member to Become Primary (page 439)
Change Hostnames in a Replica Set (page 442)
Add an Arbiter to Replica Set (page 450)
Convert a Secondary to an Arbiter (page 451)
Configure a Secondary's Sync Target (page 456)
Configure a Delayed Replica Set Member (page 453)
Configure a Replica Set Member as Hidden (page 454)
Configure a Non-Voting Replica Set Member (page 454)
Prevent Replica Set Member from Becoming Primary (page 455)
Configure Replica Set Tag Sets (page 457)
Manage Chained Replication (page 441)
Reconfigure a Replica Set with Unavailable Members (page 460)
Recover MongoDB Data following Unexpected Shutdown (page 54)
Troubleshoot Replica Sets (page 446)
17.2.2 Sharding
Deploy a Sharded Cluster (page 505)
Convert a Replica Set to a Replicated Sharded Cluster (page 513)
Add Shards to a Cluster (page 512)
Remove Shards from an Existing Sharded Cluster (page 532)
Deploy Three Config Servers for Production Deployments (page 520)
Migrate Config Servers with the Same Hostname (page 521)
Migrate Config Servers with Different Hostnames (page 521)
Replace a Config Server (page 522)
Backup Cluster Metadata (page 523)
Backup a Small Sharded Cluster with mongodump (page 56)
Create Backup of a Sharded Cluster with Filesystem Snapshots (page 57)
Create Backup of a Sharded Cluster with Database Dumps (page 58)
Restore a Single Shard (page 60)
Restore Sharded Clusters (page 60)
Schedule Backup Window for Sharded Clusters (page 61)
Administer and Manage Shard Tags (page 519)
Expire Data from Collections by Setting TTL (page 581)
Analyze Performance of Database Operations (page 101)
Rotate Log Files (page 91)
Build Old Style Indexes (page 358)
Manage mongod Processes (page 87)
Use mongodump and mongorestore to Backup and Restore MongoDB Databases (page 45)
Use Filesystem Snapshots to Backup and Restore MongoDB Databases (page 48)
17.2.4 Security
Configure Linux iptables Firewall for MongoDB (page 139)
Configure Windows netsh Firewall for MongoDB (page 143)
Enable Authentication (page 148)
Create a User Administrator (page 148)
Add a User to a Database (page 150)
Generate a Key File (page 151)
Deploy MongoDB with Kerberos Authentication (page 151)
Create a Vulnerability Report (page 146)
Part III
Security
The documentation in this section outlines basic security, risk management, and access control, and includes specific tasks for configuring firewalls, authentication, and system privileges. User roles in MongoDB provide granular control over user authorization and access. If you believe you have discovered a vulnerability in MongoDB, please see Create a Vulnerability Report (page 146).
CHAPTER 18
the rest (page 1089) setting for mongod (page 1025). Enables a fully interactive administrative REST interface, which is disabled by default. The status interface, which is enabled by default, is read-only. This configuration makes that interface fully interactive. The REST interface does not support any authentication and you should always restrict access to this interface to only allow trusted clients to connect to this port. You may also enable this interface on the command line as mongod --rest (page 1030).

Important: Disable this option for production deployments. If you do leave this interface enabled, you should only allow trusted clients to access this port.

the bind_ip (page 1083) setting for mongod (page 1025) and mongos (page 1036) instances. Limits the network interfaces on which MongoDB programs will listen for incoming connections. You can also specify a number of interfaces by passing bind_ip (page 1083) a comma-separated list of IP addresses. You can use the mongod --bind_ip (page 1026) and mongos --bind_ip (page 1036) option on the command line at run time to limit the network accessibility of a MongoDB program.

Important: Make sure that your mongod (page 1025) and mongos (page 1036) instances are only accessible on trusted networks. If your system has more than one network interface, bind MongoDB programs to the private or internal network interface.
Firewalls

Firewalls allow administrators to filter and control access to a system by providing granular control over network communications. For administrators of MongoDB, the following capabilities are important:

limiting incoming traffic on a specific port to specific systems.
limiting incoming traffic from untrusted hosts.

On Linux systems, the iptables interface provides access to the underlying netfilter firewall. On Windows systems, the netsh command line interface provides access to the underlying Windows Firewall. For additional information about firewall configuration consider the following documents:

Configure Linux iptables Firewall for MongoDB (page 139)
Configure Windows netsh Firewall for MongoDB (page 143)

For best results and to minimize overall exposure, ensure that only traffic from trusted sources can reach mongod (page 1025) and mongos (page 1036) instances and that the mongod (page 1025) and mongos (page 1036) instances can only connect to trusted outputs.

See also: For MongoDB deployments on Amazon's web services, see the Amazon EC2 page, which addresses Amazon's Security Groups and other EC2-specific security features.

Virtual Private Networks

Virtual private networks, or VPNs, make it possible to link two networks over an encrypted and limited-access trusted network. Typically MongoDB users who use VPNs use SSL rather than IPSEC VPNs for performance reasons. Depending on configuration and implementation, VPNs provide for certificate validation and a choice of encryption protocols, which require a rigorous level of authentication and identification of all clients. Furthermore, because VPNs provide a secure tunnel, using a VPN connection to control access to your MongoDB instance can prevent tampering and man-in-the-middle attacks.
18.1.5 Operations
Always run the mongod (page 1025) or mongos (page 1036) process as a unique user with the minimum required permissions and access. Never run a MongoDB program as root or as an administrative user. The system users that run the MongoDB processes should have robust authentication credentials that prevent unauthorized or casual access. To further limit the environment, you can run the mongod (page 1025) or mongos (page 1036) process in a chroot environment. Both user-based access restrictions and chroot configuration follow recommended conventions for administering all daemon processes on Unix-like systems. You can disable anonymous access to the database by enabling MongoDB authentication. See Access Control (page 137).
18.1.6 Interfaces
Simply limiting access to a mongod (page 1025) is not sufficient for totally controlling risk exposure. Consider the recommendations in the following section for limiting exposure to other interface-related risks.

JavaScript and the Security of the mongo Shell

Be aware of the following capabilities and behaviors of the mongo (page 1040) shell:

mongo (page 1040) will evaluate a .js file passed to the mongo --eval (page 1041) option. The mongo (page 1040) shell does not validate JavaScript input passed to --eval (page 1041).

mongo (page 1040) will evaluate a .mongorc.js file before starting. You can disable this behavior by passing the mongo --norc (page 1040) option. On Linux and Unix systems, mongo (page 1040) reads the .mongorc.js file from $HOME/.mongorc.js (i.e. ~/.mongorc.js); on Windows, mongo.exe reads the .mongorc.js file from %HOME%\.mongorc.js or %HOMEDRIVE%%HOMEPATH%\.mongorc.js.

HTTP Status Interface

The HTTP status interface provides a web-based interface that includes a variety of operational data, logs, and status reports regarding the mongod (page 1025) or mongos (page 1036) instance. The HTTP interface is always available on the port numbered 1000 greater than the primary mongod (page 1025) port. By default this is 28017, but it is indirectly set using the port (page 1083) option, which allows you to configure the primary mongod (page 1025) port. Without the rest (page 1089) setting, this interface is entirely read-only and limited in scope; nevertheless, this interface may represent an exposure. To disable the HTTP interface, set the nohttpinterface (page 1088) run time option or the --nohttpinterface (page 1030) command line option.

REST API

The REST API to MongoDB provides additional information and write access on top of the HTTP status interface. The REST interface is disabled by default, and is not recommended for production use.
While the REST API does not provide any support for insert, update, or remove operations, it does provide administrative access, and its accessibility represents a vulnerability in a secure environment. If you must use the REST API, please control and limit access to it. The REST API does not include any support for authentication, even when running with auth (page 1085) enabled. See the following documents for instructions on restricting access to the REST API interface:
Configure Linux iptables Firewall for MongoDB (page 139)
Configure Windows netsh Firewall for MongoDB (page 143)
18.2.1 Authentication
MongoDB provides support for basic authentication by:

storing user credentials in a database's system.users (page 162) collection, and
providing the auth (page 1085) and keyFile (page 1085) configuration settings to enable authentication for a given mongod (page 1025) or mongos (page 1036) instance.

Authentication is disabled by default. To enable authentication, see the following:

Enable Authentication (page 148)
Deploy MongoDB with Kerberos Authentication (page 151)
18.2.2 Authorization
MongoDB supports role-based access to databases and database operations by storing each user's roles in a privilege document (page 157) in the system.users (page 162) collection. For a description of privilege documents and of available roles, see User Privilege Roles in MongoDB (page 157).

Changed in version 2.4: The schema of system.users (page 162) changed to accommodate a more sophisticated user privilege model, as defined in privilege documents (page 157).

The system.users (page 162) collection is protected to prevent privilege escalation attacks. To access the collection, you must have the userAdmin (page 159) or userAdminAnyDatabase (page 161) role.
To assign user roles, you must first create an admin user in the database. Then you create additional users, assigning them appropriate user roles. To assign user roles, see the following:

Create a User Administrator (page 148)
Add a User to a Database (page 150)

User Roles in the admin Database

The admin database provides roles not available in other databases, including a role that effectively makes a user a MongoDB system superuser. See Database Administration Roles (page 158) and Administrative Roles (page 159).

Authentication to One Database at a Time

You can log in as only one user for a given database, including the admin database. If you authenticate to a database as one user and later authenticate on the same database as a different user, the second authentication invalidates the first. Logging into a different database, however, does not invalidate authentication on other databases.
CHAPTER 19
Tutorials
Patterns

This section contains a number of patterns and examples for configuring iptables for use with MongoDB deployments. If you have configured different ports using the port (page 1083) configuration setting, you will need to modify the rules accordingly.
Traffic to and from mongod Instances
This pattern is applicable to all mongod (page 1025) instances running as standalone instances or as part of a replica set. The goal of this pattern is to explicitly allow traffic to the mongod (page 1025) instance from the application server. In the following examples, replace <ip-address> with the IP address of the application server:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27017 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27017 -m state --state ESTABLISHED -j ACCEPT
The first rule allows all incoming traffic from <ip-address> on port 27017, which allows the application server to connect to the mongod (page 1025) instance. The second rule allows outgoing traffic from the mongod (page 1025) to reach the application server.

Optional

If you have only one application server, you can replace <ip-address> with either the IP address itself, such as: 198.51.100.55. You can also express this using CIDR notation as 198.51.100.55/32. If you want to permit a larger block of possible IP addresses you can allow traffic from a /24 using one of the following specifications for the <ip-address>, as follows:
10.10.10.10/24
10.10.10.10/255.255.255.0
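For instance, a hypothetical rule pair admitting an entire application-server subnet (10.10.10.0/24 here is a placeholder network) would resemble:

```shell
# Allow inbound connections to mongod from any host in the subnet,
# and the matching outbound replies (requires root to apply):
iptables -A INPUT -s 10.10.10.0/24 -p tcp --destination-port 27017 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d 10.10.10.0/24 -p tcp --source-port 27017 -m state --state ESTABLISHED -j ACCEPT
```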
Traffic to and from mongos Instances

mongos (page 1036) instances provide query routing for sharded clusters. Clients connect to mongos (page 1036) instances, which behave from the client's perspective as mongod (page 1025) instances. In turn, the mongos (page 1036) connects to all mongod (page 1025) instances that are components of the sharded cluster. Use the same iptables command to allow traffic to and from these instances as you would from the mongod (page 1025) instances that are members of the replica set. Take the configuration outlined in the Traffic to and from mongod Instances (page 140) section as an example.
Traffic to and from a MongoDB Config Server
Config servers host the config database that stores metadata for sharded clusters. Each production cluster has three config servers, initiated using the mongod --configsvr (page 1034) option. 1 Config servers listen for connections on port 27019. As a result, add the following iptables rules to the config server to allow incoming and outgoing connection on port 27019, for connection to the other config servers.
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27019 -m state --state ESTABLISHED -j ACCEPT
1 You can also run a config server by setting the configsvr (page 1093) option in a configuration file.
Replace <ip-address> with the address or address space of all the mongod (page 1025) instances that provide config servers.

Additionally, config servers need to allow incoming connections from all of the mongos (page 1036) instances in the cluster and all mongod (page 1025) instances in the cluster. Add rules that resemble the following:

iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT
Replace <ip-address> with the address of the mongos (page 1036) instances and the shard mongod (page 1025) instances.
Traffic to and from a MongoDB Shard Server
For shard servers running as mongod --shardsvr (page 1034) 2, the default port is 27018. You must configure the following iptables rules to allow traffic to and from each shard:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27018 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT
Replace the <ip-address> specification with the IP address of all mongod (page 1025) instances. This allows you to permit incoming and outgoing traffic between all shards, including constituent replica set members, to:

all mongod (page 1025) instances in the shard's replica sets.
all mongod (page 1025) instances in other shards. 3
Furthermore, shards need to be able to make outgoing connections to:

all mongos (page 1036) instances.
all mongod (page 1025) instances in the config servers.

Create a rule that resembles the following, and replace the <ip-address> with the address of the config servers and the mongos (page 1036) instances:
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT
1. The mongostat (page 1067) diagnostic tool, when running with the --discover (page 1069) option, needs to be able to reach all components of a cluster, including the config servers, the shard servers, and the mongos (page 1036) instances.

2. If your monitoring system needs to access the HTTP interface, insert the following rule to the chain:

iptables -A INPUT -s <ip-address> -p tcp --destination-port 28017 -m state --state NEW,ESTABLISHED -j ACCEPT
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface. For all deployments, you should restrict access to this port to only the monitoring instance.

Optional

For shard server mongod (page 1025) instances running with shardsvr (page 1093), the rule would resemble the following:

iptables -A INPUT -s <ip-address> -p tcp --destination-port 28018 -m state --state NEW,ESTABLISHED -j ACCEPT
2 You can also specify the shard server option using the shardsvr (page 1093) setting in the configuration file. Shard members are also often conventional replica sets using the default port.
3 All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations.
For config server mongod (page 1025) instances running with configsvr (page 1093), the rule would resemble the following:

iptables -A INPUT -s <ip-address> -p tcp --destination-port 28019 -m state --state NEW,ESTABLISHED -j ACCEPT
Change Default Policy to DROP

The default policy for iptables chains is to allow all traffic. After completing all iptables configuration changes, you must change the default policy to DROP so that all traffic that isn't explicitly allowed as above will not be able to reach components of the MongoDB deployment. Issue the following commands to change this policy:
iptables -P INPUT DROP iptables -P OUTPUT DROP
Manage and Maintain iptables Configuration

This section contains a number of basic operations for managing and using iptables. There are various front-end tools that automate some aspects of iptables configuration, but at the core all iptables front ends provide the same basic functionality:
Make all iptables Rules Persistent
By default all iptables rules are only stored in memory. When your system restarts, your firewall rules will revert to their defaults. When you have tested a rule set and have guaranteed that it effectively controls traffic, you can use the following operations to make the rule set persistent.

On Red Hat Enterprise Linux, Fedora Linux, and related distributions you can issue the following command:
service iptables save
On Debian, Ubuntu, and related distributions, you can use the following command to dump the iptables rules to the /etc/iptables.conf file:
iptables-save > /etc/iptables.conf
Place this command in your rc.local file, or in the /etc/network/if-up.d/iptables file with other similar operations.
List all iptables Rules
To list all currently applied iptables rules, use the following operation at the system shell:
iptables -L
If you make a configuration mistake when entering iptables rules or simply need to revert to the default rule set, you can use the following operation at the system shell to flush all rules:
iptables -F
If you've already made your iptables rules persistent, you will need to repeat the appropriate procedure in the Make all iptables Rules Persistent (page 142) section.
Patterns

This section contains a number of patterns and examples for configuring Windows Firewall for use with MongoDB deployments. If you have configured different ports using the port (page 1083) configuration setting, you will need to modify the rules accordingly.
Traffic to and from mongod.exe Instances
This pattern is applicable to all mongod.exe (page 1045) instances running as standalone instances or as part of a replica set. The goal of this pattern is to explicitly allow traffic to the mongod.exe (page 1045) instance from the application server.
netsh advfirewall firewall add rule name="Open mongod port 27017" dir=in action=allow protocol=TCP localport=27017
This rule allows all incoming traffic to port 27017, which allows the application server to connect to the mongod.exe (page 1045) instance. Windows Firewall also allows enabling network access for an entire application rather than to a specific port, as in the following example:
netsh advfirewall firewall add rule name="Allowing mongod" dir=in action=allow program="C:\mongodb\bin\mongod.exe"
You can allow all access for a mongos.exe (page 1046) server, with the following invocation:
netsh advfirewall firewall add rule name="Allowing mongos" dir=in action=allow program="C:\mongodb\bin\mongos.exe"
Traffic to and from mongos.exe Instances

mongos.exe (page 1046) instances provide query routing for sharded clusters. Clients connect to mongos.exe (page 1046) instances, which behave from the client's perspective as mongod.exe (page 1045) instances. In turn, the mongos.exe (page 1046) connects to all mongod.exe (page 1045) instances that are components of the sharded cluster. Use the same Windows Firewall command to allow traffic to and from these instances as you would from the mongod.exe (page 1045) instances that are members of the replica set.
netsh advfirewall firewall add rule name="Open mongod shard port 27018" dir=in action=allow protocol=TCP localport=27018
Configuration servers host the config database that stores metadata for sharded clusters. Each production cluster has three configuration servers, initiated using the mongod --configsvr (page 1034) option. 4 Configuration servers listen for connections on port 27019. As a result, add the following Windows Firewall rules to the config server to allow incoming and outgoing connection on port 27019, for connection to the other config servers.
netsh advfirewall firewall add rule name="Open mongod config svr port 27019" dir=in action=allow protocol=TCP localport=27019
Additionally, config servers need to allow incoming connections from all of the mongos.exe (page 1046) instances in the cluster and all mongod.exe (page 1045) instances in the cluster. Add rules that resemble the following:
netsh advfirewall firewall add rule name="Open mongod config svr inbound" dir=in action=allow protocol=TCP localport=27019 remoteip=<ip-address>
Replace <ip-address> with the addresses of the mongos.exe (page 1046) instances and the shard mongod.exe (page 1045) instances.
4 You can also run a config server by setting the configsvr (page 1093) option in a configuration file.
For shard servers running as mongod --shardsvr (page 1034) 5, the default port is 27018. You must configure the following Windows Firewall rules to allow traffic to and from each shard:
netsh advfirewall firewall add rule name="Open mongod shardsvr inbound" dir=in action=allow protocol=TCP localport=27018 remoteip=<ip-address>
netsh advfirewall firewall add rule name="Open mongod shardsvr outbound" dir=out action=allow protocol=TCP localport=27018 remoteip=<ip-address>
Replace the <ip-address> specification with the IP address of all mongod.exe (page 1045) instances. This allows you to permit incoming and outgoing traffic between all shards, including constituent replica set members, to:

all mongod.exe (page 1045) instances in the shard's replica sets.
all mongod.exe (page 1045) instances in other shards. 6
Furthermore, shards need to be able to make outgoing connections to:

all mongos.exe (page 1046) instances.
all mongod.exe (page 1045) instances in the config servers.

Create a rule that resembles the following, and replace the <ip-address> with the address of the config servers and the mongos.exe (page 1046) instances:
netsh advfirewall firewall add rule name="Open mongod config svr outbound" dir=out action=allow protocol=TCP remoteip=<ip-address>
1. The mongostat (page 1067) diagnostic tool, when running with the --discover (page 1069) option, needs to be able to reach all components of a cluster, including the config servers, the shard servers, and the mongos.exe (page 1046) instances.

2. If your monitoring system needs to access the HTTP interface, insert the following rule to the chain:
netsh advfirewall firewall add rule name="Open mongod HTTP monitoring inbound" dir=in action=allow protocol=TCP localport=28017 remoteip=<ip-address>
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface. For all deployments, you should restrict access to this port to only the monitoring instance.

Optional

For shard server mongod.exe (page 1045) instances running with shardsvr (page 1093), the rule would resemble the following:
netsh advfirewall firewall add rule name="Open mongos HTTP monitoring inbound" dir=in action=allow protocol=TCP localport=28018 remoteip=<ip-address>
For config server mongod.exe (page 1045) instances running with configsvr (page 1093), the rule would resemble the following:
netsh advfirewall firewall add rule name="Open mongod configsvr HTTP monitoring inbound" dir=in action=allow protocol=TCP localport=28019 remoteip=<ip-address>
5 You can also specify the shard server option using the shardsvr (page 1093) setting in the configuration file. Shard members are also often conventional replica sets using the default port. 6 All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations.
Manage and Maintain Windows Firewall Configurations This section contains a number of basic operations for managing and using netsh. While you can use the GUI front ends to manage the Windows Firewall, all core functionality is accessible from netsh.
Delete all Windows Firewall Rules
netsh advfirewall firewall delete rule name="Open mongod shard port 27018" protocol=tcp localport=27018
To simplify administration of a larger collection of systems, you can export firewall rules from one server and import them on different servers very easily on Windows. Export all firewall rules with the following command:
netsh advfirewall export "C:\temp\MongoDBfw.wfw"
Replace "C:\temp\MongoDBfw.wfw" with a path of your choosing. You can use a command in the following form to import a file created using this operation:
netsh advfirewall import "C:\temp\MongoDBfw.wfw"
Information to Provide All vulnerability reports should contain as much information as possible so 10gen can move quickly to resolve the issue. In particular, please include the following:
- The name of the product.
- Common Vulnerability information, if applicable, including:
  - CVSS (Common Vulnerability Scoring System) Score.
  - CVE (Common Vulnerability and Exposures) Identifier.
- Contact information, including an email address and/or phone number, if applicable.
Create the Report in Jira 10gen prefers jira.mongodb.org for all communication regarding MongoDB and related products. Submit a ticket in the Core Server Security project at: https://jira.mongodb.org/browse/SECURITY/. The ticket number will become the reference identification for the issue for the lifetime of the issue. You can use this identifier for tracking purposes.
Send the Report via Email While Jira is preferred, you may also report vulnerabilities via email to [email protected]. You may encrypt email using the 10gen public key at http://docs.mongodb.org/10gen-gpg-key.asc. 10gen responds to vulnerability reports sent via email with a response email that contains a reference number for a Jira ticket posted to the SECURITY project.
Evaluation of a Vulnerability Report 10gen validates all submitted vulnerabilities and uses Jira to track all communications regarding a vulnerability, including requests for clarification or additional information. If needed, 10gen representatives set up a conference call to exchange information regarding the vulnerability.
Disclosure 10gen requests that you do not publicly disclose any information regarding the vulnerability or exploit the issue until 10gen has had the opportunity to analyze the vulnerability, to respond to the notification, and to notify key users, customers, and partners.
The amount of time required to validate a reported vulnerability depends on the complexity and severity of the issue. 10gen takes all reported vulnerabilities very seriously and will always ensure that there is a clear and open channel of communication with the reporter. After validating an issue, 10gen coordinates public disclosure of the issue with the reporter in a mutually agreed timeframe and format. If required or requested, the reporter of a vulnerability will receive credit in the published security bulletin.
1. Start the mongod (page 1025) or mongos (page 1036) instance without the auth (page 1085) or keyFile (page 1085) setting.
2. Create the administrator user as described in Create a User Administrator (page 148).
3. Restart the mongod (page 1025) or mongos (page 1036) instance with the auth (page 1085) or keyFile (page 1085) setting.
Enable Authentication and then Create Administrator
1. Start the mongod (page 1025) or mongos (page 1036) instance with the auth (page 1085) or keyFile (page 1085) setting.
2. Connect to the instance on the same system so that you can authenticate using the localhost exception (page 149).
3. Create the administrator user as described in Create a User Administrator (page 148).
Query Authenticated Users If you have the userAdmin (page 159) or userAdminAnyDatabase (page 161) role on a database, you can query authenticated users in that database with the following operation:
db.system.users.find()
This should be the first user created for a MongoDB deployment. This user can then create all other users in the system. Important: The userAdminAnyDatabase (page 161) user can grant itself and any other user full access to the entire MongoDB instance. The credentials to log in as this user should be carefully controlled. Users with the userAdmin (page 159) and userAdminAnyDatabase (page 161) privileges are not the same as the UNIX root superuser in that these roles confer no additional access beyond user administration. These users cannot perform administrative operations or read or write data without first granting themselves additional permissions. Note: The userAdmin (page 159) is a database-specific privilege, and only grants a user the ability to administer users on a single database. However, for the admin database, userAdmin (page 159) allows a user the ability to gain userAdminAnyDatabase (page 161), and so for the admin database only these roles are effectively the same.
Create a User Administrator
1. Connect to the mongod (page 1025) or mongos (page 1036) by either:
- Authenticating as an existing user with the userAdmin (page 159) or userAdminAnyDatabase (page 161) role.
- Authenticating using the localhost exception (page 149). When creating the first user in a deployment, you must authenticate using the localhost exception (page 149).
2. Switch to the admin database:
db = db.getSiblingDB('admin')
3. Add the user with either the userAdmin (page 159) role or userAdminAnyDatabase (page 161) role, and only that role, by issuing a command similar to the following, where <username> is the username and <password> is the password:
db.addUser( { user: "<username>", pwd: "<password>", roles: [ "userAdminAnyDatabase" ] } )
Authenticate with Full Administrative Access via Localhost If there are no users for the admin database, you can connect with full administrative access via the localhost interface. This bypass exists to support bootstrapping new deployments. This approach is useful, for example, if you want to run mongod (page 1025) or mongos (page 1036) with authentication before creating your first user. To authenticate via localhost, connect to the mongod (page 1025) or mongos (page 1036) from a client running on the same system. Your connection will have full administrative access. To disable the localhost bypass, set the enableLocalhostAuthBypass (page 1097) parameter using setParameter (page 1091) during startup:
mongod --setParameter enableLocalhostAuthBypass=0
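The conditions under which the localhost exception applies can be sketched as a small predicate. This is an illustrative model only, not MongoDB internals; the function name and parameters are hypothetical:

```python
def has_full_admin_access(admin_user_count, is_localhost, bypass_enabled=True):
    """Model of the localhost exception: a connection receives full
    administrative access only when the admin database has no users,
    the client connects via the localhost interface, and the bypass
    has not been disabled with enableLocalhostAuthBypass=0."""
    return bypass_enabled and admin_user_count == 0 and is_localhost

# Fresh deployment, local connection: bootstrap access granted.
print(has_full_admin_access(0, True))          # True
# Once the first admin user exists, the bypass no longer applies.
print(has_full_admin_access(1, True))          # False
# With enableLocalhostAuthBypass=0, even a fresh deployment requires auth.
print(has_full_admin_access(0, True, False))   # False
```

The key design point this models: the bypass is meant strictly for bootstrapping, so creating the first admin user immediately closes it.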
Note: For versions of MongoDB 2.2 prior to 2.2.4, if mongos (page 1036) is running with keyFile (page 1085), then all users connecting over the localhost interface must authenticate, even if there aren't any users in the admin database. Connections on localhost are not correctly granted full access on sharded systems that run those versions. MongoDB 2.2.4 resolves this issue.
Note: In version 2.2, you cannot add the first user to a sharded cluster using the localhost connection. If you are running a 2.2 sharded cluster and want to enable authentication, you must deploy the cluster and add the first user to the admin database before restarting the cluster to run with keyFile (page 1085).
Example The following creates a user named Bob in the admin database. The privilege document (page 162) uses Bob's credentials from the products database and assigns him userAdmin privileges.
use admin
db.addUser( { user: "Bob", userSource: "products", roles: [ "userAdmin" ] } )
Example The following creates a user named Carlos in the admin database and gives him readWrite access to the config database, which lets him change certain settings for sharded clusters, such as to disable the balancer.
db = db.getSiblingDB('admin')
db.addUser( {
    user: "Carlos",
    pwd: "Moon1234",
    roles: [ "clusterAdmin" ],
    otherDBRoles: { config: [ "readWrite" ] }
} )
Only the admin database supports the otherDBRoles (page 163) field.
Generate a Key File on a Linux or Unix System Use the following openssl command at the system shell to generate pseudo-random content for a key file for systems that do not have Windows components (i.e. OS X, Unix, or Linux systems):
openssl rand -base64 753
Key File Properties Be aware that MongoDB strips whitespace characters (e.g. \x0d, \x09, and \x20) for cross-platform convenience. As a result, the following operations produce identical keys:
echo -e "my secret key" > key1
echo -e "my secret key\n" > key2
echo -e "my secret key" > key3
echo -e "my\r\nsecret\r\nkey\r\n" > key4
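The whitespace stripping described above can be modeled in a few lines. This is an illustrative sketch, not MongoDB's implementation; it assumes the stripped characters include \x0d, \x09, \x20, and \x0a (newlines appear in the example keys above), and adds a tab-separated variant for illustration:

```python
def normalized_key(contents: str) -> str:
    """Strip the whitespace characters MongoDB ignores in key files
    (carriage return, tab, space, and newline) so that differently
    formatted files yield the same effective key."""
    for ws in ("\x0d", "\x09", "\x20", "\x0a"):
        contents = contents.replace(ws, "")
    return contents

# Variants of the example key files from above all reduce to one key.
files = [
    "my secret key",
    "my secret key\n",
    "my\tsecret\tkey",
    "my\r\nsecret\r\nkey\r\n",
]
keys = {normalized_key(f) for f in files}
print(keys)  # {'mysecretkey'}
```

This is why key files generated on different platforms, with different line endings, still authenticate against each other.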
Process Overview To run MongoDB with Kerberos support, you must:
- Configure a Kerberos service principal for each mongod (page 1025) and mongos (page 1036) instance in your MongoDB deployment.
- Generate and distribute keytab files for each MongoDB component (i.e. mongod (page 1025) and mongos (page 1036)) in your deployment. Ensure that you only transmit keytab files over secure channels.
- Optional. Start the mongod (page 1025) instance without auth (page 1085) and create users inside of MongoDB that you can use to bootstrap your deployment.
- Start mongod (page 1025) and mongos (page 1036) with the KRB5_KTNAME environment variable as well as a number of required run-time options. If you did not create Kerberos user accounts, you can use the localhost exception (page 149) to create users at this point, until you create the first user on the admin database.
- Authenticate clients, including the mongo (page 1040) shell, using Kerberos.
Operations
Create Users and Privilege Documents
For every user that you want to be able to authenticate using Kerberos, you must create corresponding privilege documents in the system.users (page 162) collection to provision access to users. Consider the following document:
{ user: "application/[email protected]", roles: ["read"], userSource: "$external" }
This grants the Kerberos user principal application/[email protected] read-only access to a database. The userSource (page 163) $external reference allows mongod (page 1025) to consult an external source (i.e. Kerberos) to authenticate this user. In the mongo (page 1040) shell you can pass db.addUser() (page 971) a user privilege document to provision access to users, as in the following operation:
db = db.getSiblingDB("records")
db.addUser( {
    "user": "application/[email protected]",
    "roles": [ "read" ],
    "userSource": "$external"
} )
These operations grant the Kerberos user application/[email protected] access to the records database. To remove access from a user, use the remove() (page 948) method, as in the following example:
db.system.users.remove( { user: "application/[email protected]" } )
To modify a user document, use update (page 221) operations on documents in the system.users (page 162) collection. See also: system.users Privilege Documents (page 162) and User Privilege Roles in MongoDB (page 157).
Once you have provisioned privileges to users in the mongod (page 1025), and obtained a valid keytab file, you must start mongod (page 1025) using a command in the following form:
env KRB5_KTNAME=<path to keytab file> <mongod invocation>
For successful operation with mongod (page 1025), use the following run-time options in addition to your normal default configuration options:
- --setParameter (page 1031) with the authenticationMechanisms=GSSAPI argument to enable support for Kerberos.
- --auth (page 1027) to enable authentication.
- --keyFile (page 1027) to allow components of a single MongoDB deployment to communicate with each other, if needed to support replica set and sharded cluster operations. keyFile (page 1085) implies auth (page 1085).
For example, consider the following invocation:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
    /opt/mongodb/bin/mongod --dbpath /opt/mongodb/data \
    --fork --logpath /opt/mongodb/log/mongod.log \
    --auth --setParameter authenticationMechanisms=GSSAPI
You can also specify these options using the configuration file, as in the following:
# /opt/mongodb/mongod.conf, Example configuration file.
fork = true
auth = true
dbpath = /opt/mongodb/data
logpath = /opt/mongodb/log/mongod.log
setParameter = authenticationMechanisms=GSSAPI
To use this configuration file, start mongod (page 1025) as in the following:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
    /opt/mongodb/bin/mongod --config /opt/mongodb/mongod.conf
To start a mongos (page 1036) instance using Kerberos, you must create a Kerberos service principal and deploy a keytab file for this instance, and then start the mongos (page 1036) with the following invocation:
env KRB5_KTNAME=/opt/mongodb/mongos.keytab \
    /opt/mongodb/bin/mongos --configdb shard0.example.net,shard1.example.net,shard2.example.net \
    --setParameter authenticationMechanisms=GSSAPI \
    --keyFile /opt/mongodb/mongos.keyfile
If you encounter problems when trying to start mongod (page 1025) or mongos (page 1036), please see the troubleshooting section (page 154) for more information. Important: Before users can authenticate to MongoDB using Kerberos, you must create users (page 152) and grant them privileges within MongoDB. If you have not created users when you start MongoDB with Kerberos, you can use the localhost authentication exception (page 149) to add users. See the Create Users and Privilege Documents (page 152) section and the User Privilege Roles in MongoDB (page 157) document for more information.
To connect to a mongod (page 1025) instance using the mongo (page 1040) shell, you must begin by using the kinit program to initialize and authenticate a Kerberos session. Then, start a mongo (page 1040) instance and use the db.auth() (page 972) method to authenticate against the special $external database, as in the following operation:
use $external
db.auth( { mechanism: "GSSAPI", user: "application/[email protected]" } )
Alternately, you can authenticate using command line options to mongo (page 1040), as in the following equivalent example:
mongo --authenticationMechanism=GSSAPI --authenticationDatabase=$external \
    --username application/[email protected]
These operations authenticate the Kerberos principal name application/[email protected] to the connected mongod (page 1025), and will automatically acquire all available privileges as needed.
Use MongoDB Drivers to Authenticate with Kerberos
At the time of release, the C++, Java, C#, and Python drivers all provide support for Kerberos authentication to MongoDB. Consider the following tutorials for more information: Java C# C++ Python Troubleshooting
Kerberos Conguration Checklist
If you're having trouble getting mongod (page 1025) to start with Kerberos, there are a number of Kerberos-specific issues that can prevent successful authentication. As you begin troubleshooting your Kerberos deployment, ensure that:
- The mongod (page 1025) is from MongoDB Enterprise.
- You have a valid keytab file specified in the environment running the mongod (page 1025). For the mongod (page 1025) instance running on the db0.example.net host, the service principal should be mongodb/db0.example.net.
- DNS allows the mongod (page 1025) to resolve the components of the Kerberos infrastructure. You should have both A and PTR records (i.e. forward and reverse DNS) for the system that runs the mongod (page 1025) instance.
- The canonical system hostname of the system that runs the mongod (page 1025) instance is the resolvable fully qualified domain name for this host. Test system hostname resolution with the hostname -f command at the system prompt.
- Both the Kerberos KDC and the system running the mongod (page 1025) instance must be able to resolve each other using DNS. 7
- The clocks of the systems running the mongod (page 1025) instances and the Kerberos infrastructure are synchronized. Time differences greater than 5 minutes will prevent successful authentication.
If you still encounter problems with Kerberos, you can start both mongod (page 1025) and mongo (page 1040) (or another client) with the environment variable KRB5_TRACE set to different files to produce more verbose logging of the Kerberos process to help further troubleshooting, as in the following example:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
    KRB5_TRACE=/opt/mongodb/log/mongodb-kerberos.log \
    /opt/mongodb/bin/mongod --dbpath /opt/mongodb/data \
    --fork --logpath /opt/mongodb/log/mongod.log \
    --auth --setParameter authenticationMechanisms=GSSAPI
In some situations, MongoDB will return error messages from the GSSAPI interface if there is a problem with the Kerberos service.
GSSAPI error in client while negotiating security context. This error occurs on the client and reflects insufficient credentials or a malicious attempt to authenticate. If you receive this error, ensure that you're using the correct credentials and the correct fully qualified domain name when connecting to the host.
GSSAPI error acquiring credentials. This error only occurs when attempting to start the mongod (page 1025) or mongos (page 1036) and reflects improper configuration of the system hostname or a missing or incorrectly configured keytab file. If you encounter this problem, consider all the items in the Kerberos Configuration Checklist (page 154). In particular, examine the keytab file with the following command:
klist -k <keytab>
Replace <keytab> with the path to your keytab file. Check the configured hostname for your system with the following command:
hostname -f
Ensure that this name matches the name in the keytab file, or use the saslHostName (page 1098) parameter to pass MongoDB the correct hostname.
Enable the Traditional MongoDB Authentication Mechanism
For testing and development purposes, you can enable the Kerberos (i.e. GSSAPI) authentication mechanism in combination with the traditional MongoDB challenge/response authentication mechanism (i.e. MONGODB-CR), using the following setParameter (page 1091) run-time option:
mongod --setParameter authenticationMechanisms=GSSAPI,MONGODB-CR
7 By default, Kerberos attempts to resolve hosts using the content of /etc/krb5.conf before using DNS to resolve hosts.
Warning: All keyFile (page 1085) internal authentication between members of a replica set or sharded cluster still uses the MONGODB-CR authentication mechanism, even if MONGODB-CR is not enabled. All client authentication will still use Kerberos.
CHAPTER 20
Reference
20.1.1 Roles
Changed in version 2.4. Roles in MongoDB provide users with a set of specific privileges on specific logical databases. Users may have multiple roles and may have different roles on different logical databases. Roles only grant privileges and never limit access: if a user has read (page 157) and readWriteAnyDatabase (page 161) permissions on the records database, that user will be able to write data to the records database. Note: By default, MongoDB 2.4 is backwards-compatible with the MongoDB 2.2 access control roles. You can explicitly disable this backwards-compatibility by setting the supportCompatibilityFormPrivilegeDocuments (page 1099) option to 0 during startup, as in the following command-line invocation of MongoDB:
mongod --setParameter supportCompatibilityFormPrivilegeDocuments=0
In general, you should set this option if your deployment does not need to support legacy user documents. Typically legacy user documents are only useful during the upgrade process and while you migrate applications to the updated privilege document form. See privilege documents (page 162) and Delegated Credentials for MongoDB Authentication (page 164) for more information about permissions and authentication in MongoDB. Database User Roles read Provides users with the ability to read data from any collection within a specific logical database. This includes find() (page 928) and the following database commands: aggregate (page 812) checkShardingIndex (page 851) cloneCollectionAsCapped (page 859)
collStats (page 877) count (page 812) dataSize (page 881) dbHash (page 881) dbStats (page 881) distinct (page 813) filemd5 (page 868) geoNear (page 826) geoSearch (page 827) geoWalk (page 827) group (page 814) mapReduce (page 818) (inline output only.) text (page 836) (beta feature.) readWrite Provides users with the ability to read from or write to any collection within a specific logical database. Users with readWrite (page 158) have access to all of the operations available to read (page 157) users, as well as the following basic write operations: insert() (page 939), remove() (page 948), and update() (page 951). Additionally, users with the readWrite (page 158) have access to the following database commands: cloneCollection (page 860) (as the target database.) convertToCapped (page 865) create (page 866) (and to create collections implicitly.) drop() (page 925) dropIndexes (page 867) emptycapped (page 919) ensureIndex() (page 925) findAndModify (page 829) mapReduce (page 818) (output to a collection.) renameCollection (page 871) (within the same database.) Database Administration Roles dbAdmin Provides the ability to perform the following set of administrative operations within the scope of this logical database. clean (page 859) collMod (page 861) collStats (page 877) compact (page 862)
convertToCapped (page 865) create (page 866) db.createCollection() (page 974) dbStats (page 881) drop() (page 925) dropIndexes (page 867) ensureIndex() (page 925) indexStats (page 887) profile (page 892) reIndex (page 870) renameCollection (page 871) (within a single database.) validate (page 911) Furthermore, only dbAdmin (page 158) has the ability to read the system.profile (page 1103) collection. userAdmin Allows users to read and write data to the system.users (page 162) collection of any database. Users with this role will be able to modify permissions for existing users and create new users. userAdmin (page 159) does not restrict the permissions that a user can grant, and a userAdmin (page 159) user can grant privileges to themselves or other users in excess of the userAdmin (page 159) user's current privileges. Important: userAdmin (page 159) is effectively the superuser role for a specific database. Users with userAdmin (page 159) can grant themselves all privileges. However, userAdmin (page 159) does not explicitly authorize a user for any privileges beyond user administration. Note: The userAdmin (page 159) is a database-specific privilege, and only grants a user the ability to administer users on a single database. However, for the admin database, userAdmin (page 159) allows a user the ability to gain userAdminAnyDatabase (page 161), and so for the admin database only these roles are effectively the same.
_cpuProfilerStop cursorInfo (page 881) diagLogging (page 883) dropDatabase (page 867) enableSharding (page 851) flushRouterConfig (page 852) fsync (page 868) db.fsyncUnlock() (page 981) getCmdLineOpts (page 883) getLog (page 884) getParameter (page 869) getShardMap (page 852) getShardVersion (page 852) hostInfo (page 884) db.currentOp() (page 975) db.killOp() (page 985) listDatabases (page 892) listShards (page 853) logRotate (page 870) moveChunk (page 853) movePrimary (page 854) netstat (page 892) removeShard (page 854) repairDatabase (page 872) replSetFreeze (page 844) replSetGetStatus (page 844) replSetInitiate (page 846) replSetMaintenance (page 847) replSetReconfig (page 847) replSetStepDown (page 848) replSetSyncFrom (page 848) resync (page 849) serverStatus (page 893) setParameter (page 872) setShardVersion (page 855) shardCollection (page 855)
shardingState (page 855) shutdown (page 873) splitChunk (page 856) splitVector (page 857) split (page 857) top (page 910) touch (page 873) unsetSharding (page 858)
20.2.1 Overview
The documents in the <database>.system.users (page 162) collection store credentials and user privilege information used by the authentication system to provision access to users in the MongoDB system. See User Privilege Roles in MongoDB (page 157) for more information about access roles, and Security (page 131) for an overview of security in MongoDB.
Note: The pwd (page 163) and userSource (page 163) fields are mutually exclusive. A single document cannot contain both. The following privilege document with the otherDBRoles (page 163) field is only supported on the admin database:
{ user: "<username>", userSource: "<database>", otherDBRoles: { <database0> : [], <database1> : [] }, roles: [] }
Consider the content of the following fields in the system.users (page 162) documents: <database>.system.users.user user (page 162) is a string that identifies each user. Users exist in the context of a single logical database; however, users from one database may obtain access in another database by way of the otherDBRoles (page 163) field on the admin database, the userSource (page 163) field, or the Any Database Roles (page 161).
<database>.system.users.pwd pwd (page 163) holds a hashed shared secret used to authenticate the user (page 162). The pwd (page 163) field is mutually exclusive with the userSource (page 163) field. <database>.system.users.roles roles (page 163) holds an array of user roles. The available roles are: read (page 157) readWrite (page 158) dbAdmin (page 158) userAdmin (page 159) clusterAdmin (page 159) readAnyDatabase (page 161) readWriteAnyDatabase (page 161) userAdminAnyDatabase (page 161) dbAdminAnyDatabase (page 161) See Roles (page 157) for full documentation of all available user roles. <database>.system.users.userSource A string that holds the name of the database that contains the credentials for the user. If userSource (page 163) is $external, then MongoDB will use an external resource, such as Kerberos, for authentication credentials. Note: In the current release, the only external authentication source is Kerberos, which is only available in MongoDB Enterprise. Use userSource (page 163) to ensure that a single user's authentication credentials are only stored in a single location in a mongod (page 1025) instance's data. A userSource (page 163) and user (page 162) pair identifies a unique user in a MongoDB system. admin.system.users.otherDBRoles A document that holds one or more fields with a name that is the name of a database in the MongoDB instance with a value that holds a list of roles this user has on other databases. Consider the following example:
{ user: "admin", userSource: "$external", roles: [ "clusterAdmin"], otherDBRoles: { config: [ "read" ], records: [ "dbadmin" ] } }
This user has the following privileges: clusterAdmin (page 159) on the admin database, read (page 157) on the config (page 548) database, and dbAdmin (page 158) on the records database.
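The way such a privilege document maps to per-database roles can be sketched as follows. This is an illustrative model; roles_for_database is a hypothetical helper, not part of MongoDB:

```python
def roles_for_database(privilege_doc, db_name, home_db="admin"):
    """Return the roles a user holds on db_name, given a privilege
    document stored in home_db's system.users collection: the `roles`
    array applies to the home database, while `otherDBRoles` (supported
    on the admin database only) grants roles on other databases."""
    if db_name == home_db:
        return list(privilege_doc.get("roles", []))
    return list(privilege_doc.get("otherDBRoles", {}).get(db_name, []))

# The example privilege document from above.
user = {
    "user": "admin",
    "userSource": "$external",
    "roles": ["clusterAdmin"],
    "otherDBRoles": {"config": ["read"], "records": ["dbAdmin"]},
}
print(roles_for_database(user, "admin"))    # ['clusterAdmin']
print(roles_for_database(user, "config"))   # ['read']
print(roles_for_database(user, "reports"))  # []
```

A database absent from otherDBRoles yields no roles at all, which reflects the rule that roles only grant privileges and never limit access.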
Then for every database that the application0 user requires access, add documents to the system.users (page 162) collection that resemble the following:
{ user: "application0", roles: [ "readWrite" ], userSource: "accounts" }
To gain privileges to databases where application0 has access, you must first authenticate to the accounts database.
Part IV
CRUD stands for create, read, update, and delete, which are the four core database operations used in database-driven application development. The CRUD Operations for MongoDB (page 203) section provides an introduction to each class of operation along with complete examples of each operation. The documents in the Read and Write Operations in MongoDB (page 169) section provide a higher-level overview of the behavior and available functionality of these operations.
CHAPTER 21
The Read Operations (page 169) and Write Operations (page 181) documents provide higher-level introductions to and descriptions of the behavior and operations of read and write operations for MongoDB deployments. The BSON Documents (page 189) document provides an overview of documents and document-orientation in MongoDB.
The db.collection object specifies the database and collection to query. All queries in MongoDB address a single collection. You can enter db in the mongo (page 1040) shell to return the name of the current database. Use the show collections operation in the mongo (page 1040) shell to list the current collections in the database. Queries in MongoDB are BSON objects that use a set of query operators (page 767) to describe query parameters. The <query> argument of the find() (page 928) method holds this query document. A read operation without a query document will return all documents in the collection.
1 db.collection.find() (page 928) is a wrapper for the more formal query structure with the $query operator.
The <projection> argument describes the result set in the form of a document. Projections specify or limit the fields to return. Without a projection, the operation will return all fields of the documents. Specify a projection if your documents are larger, or when your application only needs a subset of available fields. The order of documents returned by a query is not defined and is not necessarily consistent unless you specify a sort (sort() (page 968)). For example, the following operation on the inventory collection selects all documents where the type field equals food and the price field has a value less than 9.95. The projection limits the response to the item, qty, and _id fields:
db.inventory.find( { type: 'food', price: { $lt: 9.95 } }, { item: 1, qty: 1 } )
The findOne() (page 933) method is similar to the find() (page 928) method except the findOne() (page 933) method returns a single document from a collection rather than a cursor. The method has the syntax:
db.collection.findOne( <query>, <projection> )
For additional documentation and examples of the main MongoDB read operators, refer to the Read (page 211) page of the Core MongoDB Operations (CRUD) (page 167) section. Query Document This section provides an overview of the query document for MongoDB queries. See the preceding section for more information on queries in MongoDB (page 169). The following examples demonstrate the key properties of the query document in MongoDB queries, using the find() (page 928) method from the mongo (page 1040) shell, and a collection of documents named inventory: An empty query document ({}) selects all documents in the collection:
db.inventory.find( {} )
Not specifying a query document to the find() (page 928) method is equivalent to specifying an empty query document. Therefore the following operation is equivalent to the previous operation:
db.inventory.find()
A single-clause query selects all documents in a collection where a field has a certain value. These are simple equality queries. In the following example, the query selects all documents in the collection where the type field has the value snacks:
db.inventory.find( { type: "snacks" } )
A single-clause query document can also select all documents in a collection given a condition or set of conditions for one field in the collection's documents. Use the query operators (page 767) to specify conditions in a MongoDB query. In the following example, the query selects all documents in the collection where the value of the type field is either food or snacks:
db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
Note: Although you can express this query using the $or (page 774) operator, choose the $in (page 769) operator rather than the $or (page 774) operator when performing equality checks on the same eld.
A compound query can specify conditions for more than one field in the collection's documents. Implicitly, a logical AND conjunction connects the clauses of a compound query so that the query selects the documents in the collection that match all the conditions. In the following example, the query document specifies an equality match on a single field, followed by a range of values for a second field using a comparison operator (page 767):
db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
This query selects all documents where the type field has the value food and the value of the price field is less than ($lt (page 770)) 9.95. Using the $or (page 774) operator, you can specify a compound query that joins each clause with a logical OR conjunction so that the query selects the documents in the collection that match at least one condition. In the following example, the query document selects all documents in the collection where the field qty has a value greater than ($gt (page 768)) 100 or the value of the price field is less than ($lt (page 770)) 9.95:
db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
With additional clauses, you can specify precise conditions for matching documents. In the following example, the compound query document selects all documents in the collection where the value of the type field is food and either the qty field has a value greater than ($gt (page 768)) 100 or the value of the price field is less than ($lt (page 770)) 9.95:
db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
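The query semantics described above can be sketched with a small matcher, purely for illustration: this is not MongoDB's implementation, and it supports only the operators used in these examples ($lt, $gt, $in, $or) plus scalar equality. The `matchField()` and `matches()` helper names are assumptions.

```javascript
// Evaluate one field's condition: either an operator document or scalar equality.
function matchField(value, condition) {
  if (condition !== null && typeof condition === 'object') {
    return Object.keys(condition).every(function (op) {
      if (op === '$lt') return value < condition.$lt;
      if (op === '$gt') return value > condition.$gt;
      if (op === '$in') return condition.$in.indexOf(value) !== -1;
      throw new Error('unsupported operator: ' + op);
    });
  }
  return value === condition; // plain (scalar) equality clause
}

// Implicit AND across fields; $or joins clauses with logical OR.
function matches(doc, query) {
  return Object.keys(query).every(function (field) {
    if (field === '$or') {
      return query.$or.some(function (clause) { return matches(doc, clause); });
    }
    return matchField(doc[field], query[field]);
  });
}

var doc = { type: 'food', qty: 50, price: 5.0 };
matches(doc, { type: 'food', price: { $lt: 9.95 } });                       // true
matches(doc, { type: { $in: [ 'food', 'snacks' ] } });                      // true
matches(doc, { type: 'food',
               $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] }); // true
```

Note how the compound example is true even though qty is only 50: the $or clause needs only one of its conditions to hold.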
Subdocuments
When the field holds an embedded document (i.e. subdocument), you can either specify the entire subdocument as the value of a field, or reach into the subdocument using dot notation to specify values for individual fields in the subdocument: Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order. In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value ABC123 and the field address with the value 123 Street, in that exact order:
db.inventory.find( { producer: { company: 'ABC123', address: '123 Street' } } )
Equality matches for specific fields within subdocuments select documents when the subdocument contains a field that matches the specified value. In the following example, the query uses dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value ABC123 and may contain other fields:

db.inventory.find( { 'producer.company': 'ABC123' } )
Arrays
When the field holds an array, you can query for values in the array, and if the array holds sub-documents, you can query for specific fields within the sub-documents using dot notation: Equality matches can specify an entire array, to select an array that matches exactly. In the following example, the query matches all documents where the value of the field tags is an array that holds exactly three elements, fruit, food, and citrus, in this order:
db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } )
Equality matches can also specify a single element in the array. If the array contains at least one element with the specified value, the query matches the document. In the following example, the query matches all documents where the value of the field tags is an array that contains, as one of its elements, the element fruit:
db.inventory.find( { tags: 'fruit' } )
Equality matches can also select documents by values in an array using the array index (i.e. position) of the element. In the following example, the query uses dot notation to match all documents where the value of the tags field is an array whose first element equals fruit:
db.inventory.find( { 'tags.0': 'fruit' } )
The following examples consider an array that contains subdocuments: If you know the array index of the subdocument, you can specify the document using the subdocument's position. The following example selects all documents where the memos field contains an array whose first element (i.e. index 0) is a subdocument with the field by with the value shipping:
db.inventory.find( { 'memos.0.by': 'shipping' } )
If you do not know the index position of the subdocument, concatenate the name of the field that contains the array with a dot (.) and the name of the field in the subdocument. The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value shipping:
db.inventory.find( { 'memos.by': 'shipping' } )
To match by multiple fields in the subdocument, you can use either dot notation or the $elemMatch (page 787) operator: The following example uses dot notation to query for documents where the value of the memos field is an array that has at least one subdocument that contains the field memo equal to on time and the field by equal to shipping:
db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )
The following example uses $elemMatch (page 787) to query for documents where the value of the memos field is an array that has at least one subdocument containing both the field memo equal to on time and the field by equal to shipping:
db.inventory.find( { memos: { $elemMatch: { memo: 'on time', by: 'shipping' } } } )
Refer to the Query, Update and Projection Operators (page 767) document for the complete list of query operators.
Result Projections
The projection specification limits the fields to return for all matching documents. Restricting the fields to return can minimize network transit costs and the costs of deserializing documents in the application layer. The second argument to the find() (page 928) method is a projection, and it takes the form of a document with a list of fields for inclusion or exclusion from the result set. You can either specify the fields to include (e.g. { field: 1 }) or specify the fields to exclude (e.g. { field: 0 }). The _id field is, by default, included in the result set. To exclude the _id field from the result set, you need to specify in the projection document the exclusion of the _id field (i.e. { _id: 0 }). Note: You cannot combine inclusion and exclusion semantics in a single projection, with the exception of the _id field. Consider the following projection specifications in find() (page 928) operations: If you specify no projection, the find() (page 928) method returns all fields of all documents that match the query.
db.inventory.find( { type: 'food' } )
This operation will return all documents in the inventory collection where the value of the type field is food. A projection can explicitly include several fields. In the following operation, the find() (page 928) method returns all documents that match the query, projecting only the item and qty fields. The results also include the _id field:
db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
You can remove the _id field from the results by specifying its exclusion in the projection, as in the following example:
db.inventory.find( { type: 'food' }, { item: 1, qty: 1, _id: 0 } )
This operation returns all documents that match the query, and only includes the item and qty fields in the result set. To exclude a single field or group of fields, you can use a projection in the following form:
db.inventory.find( { type: 'food' }, { type: 0 } )
This operation returns all documents where the value of the type field is food, but does not include the type field in the output.
With the exception of the _id field, you cannot combine inclusion and exclusion statements in projection documents. The $elemMatch (page 802) and $slice (page 805) projection operators provide more control when projecting only a portion of an array.
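The projection rules above can be summarized in a short sketch, again purely illustrative: `applyProjection()` is a hypothetical helper, not a MongoDB API, and it handles only top-level fields with 1/0 values (no dot notation or projection operators).

```javascript
// Apply a simple inclusion or exclusion projection to one document.
function applyProjection(doc, projection) {
  var excludeId = projection._id === 0;
  var inclusions = Object.keys(projection).filter(function (f) {
    return f !== '_id' && projection[f] === 1;
  });
  var result = {};
  if (inclusions.length > 0) {
    // Inclusion mode: copy only the listed fields, plus _id by default.
    if (!excludeId && '_id' in doc) result._id = doc._id;
    inclusions.forEach(function (f) {
      if (f in doc) result[f] = doc[f];
    });
  } else {
    // Exclusion mode: copy everything except the excluded fields.
    Object.keys(doc).forEach(function (f) {
      if (projection[f] !== 0) result[f] = doc[f];
    });
  }
  return result;
}

var doc = { _id: 10, type: 'food', item: 'apple', qty: 5 };
applyProjection(doc, { item: 1, qty: 1 });          // { _id: 10, item: 'apple', qty: 5 }
applyProjection(doc, { item: 1, qty: 1, _id: 0 });  // { item: 'apple', qty: 5 }
applyProjection(doc, { type: 0 });                  // { _id: 10, item: 'apple', qty: 5 }
```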
21.1.2 Indexes
Indexes improve the efficiency of read operations by reducing the amount of data that query operations need to process, and thereby simplifying the work associated with fulfilling queries within MongoDB. The indexes themselves are a special data structure that MongoDB maintains when inserting or modifying documents, and any given index can support and optimize specific queries and sort operations, and allow for more efficient storage utilization. For more information about indexes in MongoDB see: Indexes (page 329) and Indexing Overview (page 331). You can create indexes using the db.collection.ensureIndex() (page 925) method in the mongo (page 1040) shell, as in the following prototype operation:
db.collection.ensureIndex( { <field1>: <order>, <field2>: <order>, ... } )
The field specifies the field to index. The field may be a field from a subdocument, using dot notation to specify subdocument fields. You can create an index on a single field or a compound index (page 333) that includes multiple fields in the index. The order option specifies either ascending ( 1 ) or descending ( -1 ). MongoDB can read the index in either direction. In most cases, you only need to specify indexing order (page 334) to support sort operations in compound queries.
Covering a Query
An index covers (page 344) a query, a covered query, when: all the fields in the query (page 170) are part of that index, and all the fields returned in the documents that match the query are in the same index. For these queries, MongoDB does not need to inspect documents outside of the index, which is often more efficient than inspecting entire documents.
Example
Given a collection inventory with the following index on the type and item fields:
{ type: 1, item: 1 }
This index will cover the following query on the type and item fields, which returns only the item field:
db.inventory.find( { type: "food", item:/^c/ }, { item: 1, _id: 0 } )
However, this index will not cover the following query, which returns both the item field and the _id field:
db.inventory.find( { type: "food", item:/^c/ }, { item: 1 } )
See Create Indexes that Support Covered Queries (page 344) for more information on the behavior and use of covered queries.
Measuring Index Use
The explain() (page 957) cursor method allows you to inspect the operation of the query system, and is useful for analyzing the efficiency of queries and for determining how the query uses the index. Call the explain() (page 957) method on a cursor returned by find() (page 928), as in the following example:
db.inventory.find( { type: 'food' } ).explain()
Note: Only use explain() (page 957) to test the query operation, not the timing of query performance. Because explain() (page 957) attempts multiple query plans, it does not reflect accurate query performance. If the above operation could not use an index, the output of explain() (page 957) would resemble the following:
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 5,
    "nscannedObjects" : 4000006,
    "nscanned" : 4000006,
    "nscannedObjectsAllPlans" : 4000006,
    "nscannedAllPlans" : 4000006,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 2,
    "nChunkSkips" : 0,
    "millis" : 1591,
    "indexBounds" : { },
    "server" : "mongodb0.example.net:27017"
}
The BasicCursor value in the cursor (page 960) field confirms that this query does not use an index. The explain.nscannedObjects (page 960) value shows that MongoDB must scan 4,000,006 documents to return only 5 documents. To increase the efficiency of the query, create an index on the type field, as in the following example:
db.inventory.ensureIndex( { type: 1 } )
Run the explain() (page 957) operation, as follows, to test the use of the index:
db.inventory.find( { type: 'food' } ).explain()
The BtreeCursor value of the cursor (page 960) field indicates that the query used an index. This query: returned 5 documents, as indicated by the n (page 960) field; scanned 5 documents from the index, as indicated by the nscanned (page 960) field; then read 5 full documents from the collection, as indicated by the nscannedObjects (page 960) field. Although the query uses an index to find the matching documents, if indexOnly (page 960) is false then the index could not cover (page 174) the query: MongoDB could not both match the query conditions (page 170) and return the results using only this index. See Create Indexes that Support Covered Queries (page 344) for more information.
Query Optimization
The MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs. The query optimizer occasionally reevaluates query plans as the content of the collection changes to ensure optimal query plans. To create a new query plan, the query optimizer:
1. runs the query against several candidate indexes in parallel.
2. records the matches in a common results buffer or buffers. If the candidate plans include only ordered query plans, there is a single common results buffer. If the candidate plans include only unordered query plans, there is a single common results buffer. If the candidate plans include both ordered query plans and unordered query plans, there are two common results buffers, one for the ordered plans and the other for the unordered plans. If an index returns a result already returned by another index, the optimizer skips the duplicate match. In the case of the two buffers, both buffers are de-duped.
3. stops the testing of candidate plans and selects an index when one of the following events occurs: an unordered query plan has returned all the matching results; or an ordered query plan has returned all the matching results; or an ordered query plan has returned a threshold number of matching results: Version 2.0: Threshold is the query batch size. The default batch size is 101. Version 2.2: Threshold is 101.
The selected index becomes the index specified in the query plan; future iterations of this query, or queries with the same query pattern, will use this index. Query pattern refers to query select conditions that differ only in the values, as in the following two queries with the same query pattern:
db.inventory.find( { type: 'food' } )
db.inventory.find( { type: 'utensil' } )
To manually compare the performance of a query using more than one index, you can use the hint() (page 962) and explain() (page 957) methods in conjunction, as in the following prototype:
db.collection.find().hint().explain()
The following operations each run the same query but will reflect the use of the different indexes:
db.inventory.find( { type: 'food' } ).hint( { type: 1 } ).explain()
db.inventory.find( { type: 'food' } ).hint( { type: 1, name: 1 } ).explain()
This returns the statistics regarding the execution of the query. For more information on the output of explain() (page 957), see cursor.explain() (page 957). Note: If you run explain() (page 957) without including hint() (page 962), the query optimizer reevaluates the query and runs against multiple indexes before returning the query statistics. As collections change over time, the query optimizer deletes a query plan and reevaluates it after any of the following events: the collection receives 1,000 write operations; the reIndex (page 870) command rebuilds the index; you add or drop an index; or the mongod (page 1025) process restarts. For more information, see Indexing Strategies (page 343).
Query Operations that Cannot Use Indexes Effectively
Some query operations cannot use indexes effectively or cannot use indexes at all. Consider the following situations: The inequality operators $nin (page 771) and $ne (page 771) are not very selective, as they often match a large portion of the index. As a result, in most cases, a $nin (page 771) or $ne (page 771) query with an index may perform no better than a $nin (page 771) or $ne (page 771) query that must scan all documents in a collection. Queries that specify regular expressions, with inline JavaScript regular expressions or $regex (page 778) operator expressions, cannot use an index. However, a regular expression anchored to the beginning of a string can use an index.
21.1.3 Cursors
The find() (page 928) method returns a cursor to the results; however, in the mongo (page 1040) shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times [2] to print up to the first 20 documents that match the query, as in the following example:
db.inventory.find( { type: 'food' } );
When you assign the cursor returned by find() (page 928) to a variable: you can call the cursor variable in the shell to iterate up to 20 times [2] and print the matching documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
myCursor
you can use the cursor method next() (page 966) to access the documents, as in the following example:
[2] You can use the DBQuery.shellBatchSize to change the number of iterations from the default value 20. See Executing Queries (page 588) for more information.
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor.hasNext() ? myCursor.next() : null;
if (myDocument) {
    var myItem = myDocument.item;
    print(tojson(myItem));
}
As an alternative print operation, consider the printjson() helper method to replace print(tojson()):
if (myDocument) {
    var myItem = myDocument.item;
    printjson(myItem);
}
you can use the cursor method forEach() (page 962) to iterate the cursor and access the documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
myCursor.forEach(printjson);
See JavaScript cursor methods (page 955) and your driver (page 559) documentation for more information on cursor methods.
Iterator Index
In the mongo (page 1040) shell, you can use the toArray() (page 969) method to iterate the cursor and return the documents in an array, as in the following:
var myCursor = db.inventory.find( { type: 'food' } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];
The toArray() (page 969) method loads into RAM all documents returned by the cursor; the toArray() (page 969) method exhausts the cursor. Additionally, some drivers (page 559) provide access to the documents by using an index on the cursor (i.e. cursor[index]). This is a shortcut for first calling the toArray() (page 969) method and then using an index on the resulting array. Consider the following example:
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor[3];
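The cursor interface used in the examples above can be sketched with a toy in-memory cursor. This is illustrative only: the `makeCursor()` factory is an assumption, and real MongoDB cursors also fetch results from the server in batches, which this sketch omits.

```javascript
// Toy cursor exposing hasNext(), next(), forEach(), and toArray().
function makeCursor(docs) {
  var i = 0;
  return {
    hasNext: function () { return i < docs.length; },
    next: function () {
      if (i >= docs.length) throw new Error('cursor exhausted');
      return docs[i++];
    },
    forEach: function (fn) { while (this.hasNext()) fn(this.next()); },
    toArray: function () {
      // Like the shell's toArray(): drains (exhausts) the cursor.
      var out = [];
      this.forEach(function (d) { out.push(d); });
      return out;
    }
  };
}

var myCursor = makeCursor([ { item: 'apple' }, { item: 'bread' } ]);
myCursor.next().item;          // 'apple'
var rest = myCursor.toArray(); // [ { item: 'bread' } ]
myCursor.hasNext();            // false: toArray() exhausted the cursor
```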
Cursor Behaviors
Consider the following behaviors related to cursors: By default, the server will automatically close the cursor after 10 minutes of inactivity or if the client has exhausted the cursor. To override this behavior, you can specify the noTimeout wire protocol flag in your query; however, you should either close the cursor manually or exhaust the cursor. In the mongo (page 1040) shell, you can set the noTimeout flag:
var myCursor = db.inventory.find().addOption(DBQuery.Option.noTimeout);
See your driver (page 559) documentation for information on setting the noTimeout flag. See Cursor Flags (page 179) for a complete list of available cursor flags. Because the cursor is not isolated during its lifetime, intervening write operations may result in a cursor that returns a single document [3] more than once. To handle this situation, see the information on snapshot mode (page 725). The MongoDB server returns the query results in batches: For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. The subsequent batch size is 4 megabytes. To override the default size of the batch, see batchSize() (page 956) and limit() (page 963). For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort and will return all documents in the first batch. Batch size will not exceed the maximum BSON document size (page 1109). As you iterate through the cursor and reach the end of the returned batch, if there are more results, cursor.next() (page 966) will perform a getmore operation (page 977) to retrieve the next batch. To see how many documents remain in the batch as you iterate the cursor, you can use the objsLeftInBatch() (page 967) method, as in the following example:
var myCursor = db.inventory.find();
var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
myCursor.objsLeftInBatch();
You can use the command cursorInfo (page 881) to retrieve the following information on cursors: the total number of open cursors, the size of the client cursors in current use, and the number of timed out cursors since the last server restart. Consider the following example:
db.runCommand( { cursorInfo: 1 } )
Cursor Flags
The mongo (page 1040) shell provides the following cursor flags:
DBQuery.Option.tailable
[3] A single document relative to the value of the _id field. A cursor cannot return the same document more than once if the document has not changed.
DBQuery.Option.slaveOk
DBQuery.Option.oplogReplay
DBQuery.Option.noTimeout
DBQuery.Option.awaitData
DBQuery.Option.exhaust
DBQuery.Option.partial
Aggregation
Changed in version 2.2. MongoDB can perform some basic data aggregation operations on results before returning data to the application. These operations are not queries; they use database commands rather than queries, and they do not return a cursor. However, they still require MongoDB to read data. Running aggregation operations on the database side can be more efficient than running them in the application layer and can reduce the amount of data MongoDB needs to send to the application. These aggregation operations include basic grouping, counting, and even processing data using a map-reduce framework. Additionally, in 2.2 MongoDB provides a complete aggregation framework for richer aggregation operations. The aggregation framework provides users with a pipeline-like framework: documents enter from a collection and then pass through a series of steps, via a sequence of pipeline operators (page 293) that manipulate and transform the documents, until they're output at the end. The aggregation framework is accessible via the aggregate (page 812) command or the db.collection.aggregate() (page 922) helper in the mongo (page 1040) shell. For more information on the aggregation framework see Aggregation (page 255). Additionally, MongoDB provides a number of simple operations for more basic data aggregation: count (page 812) (count() (page 956)), distinct (page 813) (db.collection.distinct() (page 924)), group (page 814) (db.collection.group() (page 936)), and mapReduce (page 818). (Also consider mapReduce() (page 941) and Map-Reduce (page 313).)
21.1.4 Architecture
Read Operations from Sharded Clusters
Sharded clusters allow you to partition a data set among a cluster of mongod (page 1025) instances in a way that is nearly transparent to the application. See the Sharding (page 485) section of this manual for additional information about these deployments. For a sharded cluster, you issue all operations to one of the mongos (page 1036) instances associated with the cluster. mongos (page 1036) instances route operations to the mongod (page 1025) instances in the cluster and behave like mongod (page 1025) instances to the application. Read operations to a sharded collection in a sharded cluster are largely the same as operations to a replica set or standalone instance. See the section on Read Operations in Sharded Clusters (page 492) for more information. In sharded deployments, the mongos (page 1036) instance routes the queries from the clients to the mongod (page 1025) instances that hold the data, using the cluster metadata stored in the config database (page 502).
For sharded collections, if queries do not include the shard key (page 487), the mongos (page 1036) must direct the query to all shards in a collection. These scatter gather queries can be inefficient, particularly on larger clusters, and are unfeasible for routine operations. For more information on read operations in sharded clusters, consider the following resources: An Introduction to Shard Keys (page 487), Shard Key Internals and Operations (page 495), Querying Sharded Clusters (page 496), and mongos Operational Overview (page 492).
Read Operations from Replica Sets
Replica sets use read preferences to determine where and how to route read operations to members of the replica set. By default, MongoDB always reads data from a replica set's primary. You can modify that behavior by changing the read preference mode (page 404). You can configure the read preference mode (page 404) on a per-connection or per-operation basis to allow reads from secondaries to: reduce latency in multi-data-center deployments, improve read throughput by distributing high read-volumes (relative to write volume), support backup operations, and/or allow reads during failover (page 391) situations. Read operations from secondary members of replica sets are not guaranteed to reflect the current state of the primary, and the state of secondaries will trail the primary by some amount of time. Often, applications don't rely on this kind of strict consistency, but application developers should always consider the needs of their application before setting read preference. For more information on read preference or on the read preference modes, see Read Preference (page 404) and Read Preference Modes (page 404).
For information on specic methods used to perform write operations in the mongo (page 1040) shell, see the following: db.collection.insert() (page 939) db.collection.update() (page 951) db.collection.save() (page 949) db.collection.findAndModify() (page 929) db.collection.remove() (page 948) For information on how to perform write operations from within an application, see the MongoDB Drivers and Client Libraries (page 559) documentation or the documentation for your client library.
For more information see your driver documentation (page 559) for details on performing bulk inserts in your application. Also consider the following resources: Sharded Clusters (page 185), Strategies for Bulk Inserts in Sharded Clusters (page 527), and Import and Export MongoDB Data (page 107).
21.2.4 Indexing
After every insert, update, or delete operation, MongoDB must update every index associated with the collection in addition to the data itself. Therefore, every index on a collection adds some amount of overhead to the performance of write operations. [4] In general, the performance gains that indexes provide for read operations are worth the insertion penalty; however, when optimizing write performance, be careful when creating new indexes, always evaluate the indexes on the collection, and ensure that your queries actually use these indexes. For more information on indexes in MongoDB consider Indexes (page 329) and Indexing Strategies (page 343).
21.2.5 Isolation
When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record. No other operations are atomic; however, you can attempt to isolate a write operation that affects multiple documents using the isolation operator (page 800). To isolate a sequence of write operations from other read and write operations, see Perform Two Phase Commits (page 567).
21.2.6 Updates
Each document in a MongoDB collection has allocated record space, which includes the entire document and a small amount of padding. This padding makes it possible for update operations to increase the size of a document slightly without causing the document to outgrow the allocated record size. Documents in MongoDB can grow up to the full maximum BSON document size (page 1109). However, when documents outgrow their allocated record size, MongoDB must allocate a new record and move the document to the new record. Update operations that do not cause a document to grow (i.e. in-place updates) are significantly more efficient than those updates that cause document growth. Use data models (page 235) that minimize the need for document growth when possible. For complete examples of update operations, see Update (page 221).
To minimize document movements, MongoDB employs padding. MongoDB adaptively learns if documents in a collection tend to grow, and if they do, adds a paddingFactor (page 878) so that the documents have room to grow on subsequent writes. The paddingFactor (page 878) indicates the padding for new inserts and moves. New in version 2.2: You can use the collMod (page 861) command with the usePowerOf2Sizes (page 861) flag so that MongoDB allocates document space in sizes that are powers of 2. This helps ensure that MongoDB can efficiently reuse the space freed as a result of deletions or document relocations. As with all padding, using document space allocations with power of 2 sizes minimizes, but does not eliminate, document movements. To check the current paddingFactor (page 878) on a collection, you can run the db.collection.stats() (page 950) operation in the mongo (page 1040) shell, as in the following example:
db.myCollection.stats()
Since MongoDB writes each document at a different point in time, the padding for each document will not be the same. You can calculate the padding size by subtracting 1 from the paddingFactor (page 878), for example:
padding size = (paddingFactor - 1) * <document size>.
For example, a paddingFactor (page 878) of 1.0 specifies no padding, whereas a paddingFactor of 1.5 specifies a padding size of 0.5, or 50 percent (50%), of the document size. Because the paddingFactor (page 878) is relative to the size of each document, you cannot calculate the exact amount of padding for a collection based on the average document size and padding factor. If an update operation causes the document to decrease in size, for instance if you perform an $unset (page 792) or a $pop (page 794) update, the document remains in place and effectively has more padding. If the document remains this size, the space is not reclaimed until you perform a compact (page 862) or a repairDatabase (page 872) operation. Note: The following operations remove padding: compact (page 862), repairDatabase (page 872), and initial replica sync operations. However, with the compact (page 862) command, you can run the command with a paddingFactor or a paddingBytes parameter. Padding is also removed if you use mongoexport (page 1063) from a collection. If you use mongoimport (page 1060) into a new collection, mongoimport (page 1060) will not add padding. If you use mongoimport (page 1060) with an existing collection with padding, mongoimport (page 1060) will not affect the existing padding. When a database operation removes padding, subsequent updates that require changes in record sizes will have reduced throughput until the collection's padding factor grows. Padding does not affect in-place updates; after compact (page 862), repairDatabase (page 872), and replica set initial sync, the collection will require less storage. See also: Can I manually pad documents to prevent moves during updates? (page 726) The $inc (page 788) operator for in-place updates.
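The padding arithmetic above can be made concrete with a short worked example; `paddingSize()` is an illustrative helper, not a MongoDB function.

```javascript
// padding size = (paddingFactor - 1) * document size
function paddingSize(paddingFactor, documentSize) {
  return (paddingFactor - 1) * documentSize;
}

paddingSize(1.0, 1024); // 0   -> no padding
paddingSize(1.5, 1024); // 512 -> 50% of the document size
```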
21.2.8 Architecture
Replica Sets
In replica sets, all write operations go to the set's primary, which applies the write operation and then records the operation on the primary's operation log, or oplog. The oplog is a reproducible sequence of operations to the data set. Secondary members of the set continuously replicate the oplog and apply the operations to themselves in an asynchronous process. Large volumes of write operations, particularly bulk operations, may create situations where the secondary members have difficulty applying the replicated operations from the primary at a sufficient rate: this can cause the secondary's state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for normal operation of the replica set, particularly failover (page 391) in the form of rollbacks (page 392), as well as general read consistency (page 392). To help avoid this issue, you can customize the write concern (page 182) to return confirmation of the write operation to another member [5] of the replica set every 100 or 1,000 operations. This provides an opportunity for secondaries to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the secondaries can maintain a largely current state with respect to the primary. For more information on replica sets and write operations, see Replica Acknowledged (page 401), Oplog (page 394), Oplog Internals (page 410), and Change the Size of the Oplog (page 437).
Sharded Clusters
In a sharded cluster, MongoDB directs a given write operation to a shard and then performs the write on a particular chunk on that shard. Shards and chunks are range-based. Shard keys affect how MongoDB distributes documents among shards. Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster. For more information, see Sharded Cluster Administration (page 505) and Bulk Inserts (page 182).
null, which indicates the write operations have completed successfully, or a description of the last error encountered. The definition of a successful write depends on the arguments specified to getLastError (page 834), or in replica sets, the configuration of getLastErrorDefaults (page 468). When deciding the level of write concern for your application, see the introduction to Write Concern (page 400).

The getLastError (page 834) command has the following options to configure write concern requirements:

j or journal option

This option confirms that the mongod (page 1025) instance has written the data to the on-disk journal and ensures data is not lost if the mongod (page 1025) instance shuts down unexpectedly. Set to true to enable, as shown in the following example:
db.runCommand( { getLastError: 1, j: true } )
If you set journal (page 1087) to true, and the mongod (page 1025) does not have journaling enabled, as with nojournal (page 1088), then getLastError (page 834) will provide basic receipt acknowledgment, and will include a jnote field in its return document.

w option

This option provides the ability to disable write concern entirely, as well as to specify the write concern for replica sets. See Write Concern Considerations (page 400) for an introduction to the fundamental concepts of write concern. By default, the w option is set to 1, which provides basic receipt acknowledgment on a single mongod (page 1025) instance or on the primary in a replica set.

The w option takes the following values:

-1: Disables all acknowledgment of write operations, and suppresses all errors, including network and socket errors.

0: Disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors to the application.

Note: If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal commit prevails, and the driver will require that mongod (page 1025) acknowledge the write operation.

1: Provides acknowledgment of write operations on a standalone mongod (page 1025) or the primary in a replica set.

A number greater than 1: Guarantees that write operations have propagated successfully to the specified number of replica set members, including the primary. If you set w to a number that is greater than the number of set members that hold data, MongoDB waits for the non-existent members to become available, which means MongoDB blocks indefinitely.

majority: Confirms that write operations have propagated to the majority of the configured replica set: a majority of the set's configured members must acknowledge the write operation before it succeeds.
This ensures that a write operation will never be subject to a rollback in the course of normal operation, and furthermore allows you to avoid hard-coding assumptions about the size of your replica set into your application.
A tag set: By specifying a tag set (page 457), you can have fine-grained control over which replica set members must acknowledge a write operation to satisfy the required level of write concern.

getLastError (page 834) also supports a wtimeout setting which allows clients to specify a timeout for the write concern: if you don't specify wtimeout and the mongod (page 1025) cannot fulfill the write concern, getLastError (page 834) will block, potentially forever. For more information on write concern and replica sets, see Write Concern for Replica Sets (page 401).

In sharded clusters, mongos (page 1036) instances will pass the write concern on to the shard mongod (page 1025) instances.
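The options above combine into a single command document that a driver sends after a write. The following sketch builds such a document; the `writeConcernCommand` helper is hypothetical (drivers construct this internally), but the `w`, `j`, and `wtimeout` fields are the documented getLastError options.

```javascript
// Hypothetical helper: build the getLastError command document for a
// chosen write concern. Field names (w, j, wtimeout) are the documented
// options; the helper itself is illustrative, not a driver API.
function writeConcernCommand({ w = 1, j = false, wtimeout = null } = {}) {
  const cmd = { getLastError: 1, w: w, j: j };
  if (wtimeout !== null) {
    cmd.wtimeout = wtimeout;   // milliseconds to wait before giving up
  }
  return cmd;
}

// Wait for a majority of the replica set and a journal commit,
// with a 5 second timeout:
const cmd = writeConcernCommand({ w: 'majority', j: true, wtimeout: 5000 });
```

Setting wtimeout here bounds how long the command can block if the requested number of members never acknowledges the write.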
CHAPTER 22

Fundamental Concepts for Document Databases
22.1.1 Structure
Documents in MongoDB are BSON objects with support for the full range of BSON types; however, BSON documents are conceptually similar to JSON objects, and have the following structure:
{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}
Having support for the full range of BSON types, MongoDB documents may contain field and value pairs where the value can be another document, an array, or an array of documents, as well as the basic types such as Double, String, and Date. See also BSON Type Considerations (page 194).

Consider the following document that contains values of varying types:
var mydoc = {
   _id: ObjectId("5099803df3f4948bd2f98391"),
   name: { first: "Alan", last: "Turing" },
   birth: new Date("Jun 23, 1912"),
   death: new Date("Jun 07, 1954"),
   contribs: [ "Turing machine", "Turing test", "Turingery" ],
   views : NumberLong(1250000)
}
The document contains the following fields:

_id that holds an ObjectId.

name that holds a subdocument that contains the fields first and last.

birth and death, which both have Date types.

contribs that holds an array of strings.

views that holds a value of NumberLong type.

All field names are strings in BSON documents. Be aware that there are some restrictions on field names (page 1112) for BSON documents: field names cannot contain null characters, dots (.), or dollar signs ($).

Note: BSON documents may have more than one field with the same name; however, most MongoDB Interfaces (page 559) represent MongoDB documents with a structure (e.g. a hash table) that does not support duplicate field names. If you need to manipulate documents that have more than one field with the same name, see your driver's documentation for more information. Some documents created by internal MongoDB processes may have duplicate fields, but no MongoDB process will ever add duplicate keys to an existing user document.
Type Operators

To determine the type of fields, the mongo (page 1040) shell provides the following operators:

instanceof returns a boolean to test if a value has a specific type.

typeof returns the type of a field.

Example

Consider the following operations using instanceof and typeof:

The following operation tests whether the _id field is of type ObjectId:
mydoc._id instanceof ObjectId
The operation returns true. The following operation returns the type of the _id eld:
typeof mydoc._id
In this case typeof will return the more generic object type rather than the ObjectId type.
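The same distinction between the two operators can be seen with plain JavaScript values, which is useful to keep in mind since the mongo shell is a JavaScript environment. The example below uses Date (rather than the shell-specific ObjectId) so it stands on its own:

```javascript
// typeof reports the generic type; instanceof tests a specific constructor.
const when = new Date();

const generic = typeof when;           // "object" -- the generic type
const specific = when instanceof Date; // true -- the specific constructor
```

This is why `typeof mydoc._id` returns "object" while `mydoc._id instanceof ObjectId` returns true in the shell.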
Dot Notation

MongoDB uses dot notation to access the elements of an array and to access the fields of a subdocument.
To access an element of an array by the zero-based index position, you concatenate the array name with the dot (.) and zero-based index position:
<array>.<index>
To access a field of a subdocument with dot notation, you concatenate the subdocument name with the dot (.) and the field name:
<subdocument>.<field>
See also:

Subdocuments (page 171) for dot notation examples with subdocuments.

Arrays (page 172) for dot notation examples with arrays.
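The two access patterns above can be sketched in plain JavaScript. The `resolveDotted` helper below is illustrative (it is not a MongoDB API): it splits the path on "." and walks the document, which also handles the array case because a numeric segment indexes the array.

```javascript
// Illustrative sketch of dot-notation resolution against a document.
function resolveDotted(doc, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const doc = {
  name: { first: 'Alan', last: 'Turing' },
  contribs: ['Turing machine', 'Turing test'],
};

const last = resolveDotted(doc, 'name.last');          // subdocument field
const firstContrib = resolveDotted(doc, 'contribs.0'); // array element by index
```

Missing path segments resolve to undefined rather than throwing, mirroring how a query on a non-existent dotted field simply matches nothing.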
The document contains the following fields:

_id, which must hold a unique value and is immutable.

name that holds another document. This sub-document contains the fields first and last, which both hold strings.

birth and death that both have date types.

contribs that holds an array of strings.

awards that holds an array of documents.

Consider the following behavior and constraints of the _id field in MongoDB documents:

In documents, the _id field is always indexed for regular collections.

The _id field may contain values of any BSON data type other than an array.

Warning: To ensure functioning replication, do not store values that are of the BSON regular expression type in the _id field.

Consider the following options for the value of an _id field:

Use an ObjectId. See the ObjectId (page 196) documentation. Although it is common to assign ObjectId values to _id fields, if your objects have a natural unique identifier, consider using that for the value of _id to save space and to avoid an additional index.

Generate a sequence number for the documents in your collection in your application and use this value for the _id value. See the Create an Auto-Incrementing Sequence Field (page 576) tutorial for an implementation pattern.

Generate a UUID in your application code. For more efficient storage of the UUID values in the collection and in the _id index, store the UUID as a value of the BSON BinData type. Index keys that are of the BinData type are more efficiently stored in the index if: the binary subtype value is in the range of 0-7 or 128-135, and the length of the byte array is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, or 32.

Use your driver's BSON UUID facility to generate UUIDs. Be aware that driver implementations may implement UUID serialization and deserialization logic differently, which may not be fully compatible with other drivers. See your driver documentation for information concerning UUID interoperability.
Query Specification Documents

Query documents specify the conditions that determine which records to select for read, update, and delete operations. You can use <field>:<value> expressions to specify the equality condition and query operator (page 767) expressions to specify additional conditions. When passed as an argument to methods such as the find() (page 928) method, the remove() (page 948) method, or the update() (page 951) method, the query document selects documents for MongoDB to return, remove, or update, as in the following:
db.bios.find( { _id: 1 } )

db.bios.remove( { _id: { $gt: 3 } } )

db.bios.update( { _id: 1, name: { first: "John", last: "Backus" } },
                <update>,
                <options> )
See also:

Query Document (page 170) and Read (page 211) for more examples on selecting documents for reads.

Update (page 221) for more examples on selecting documents for updates.

Delete (page 229) for more examples on selecting documents for deletes.

Update Specification Documents

Update documents specify the data modifications to perform during an update() (page 951) operation to modify existing records in a collection. You can use update operators (page 788) to specify the exact actions to perform on the document fields. Consider the update document example:
{
  $set: { "name.middle": "Warner" },
  $push: { awards: { award: "IBM Fellow", year: 1963, by: "IBM" } }
}
When passed as an argument to the update() (page 951) method, the update actions document:

Modifies the field name whose value is another document. Specifically, the $set (page 792) operator updates the middle field in the name subdocument. The document uses dot notation (page 190) to access a field in a subdocument.

Adds an element to the field awards whose value is an array. Specifically, the $push (page 796) operator adds another document as an element to the field awards.
db.bios.update(
   { _id: 1 },
   {
     $set: { "name.middle": "Warner" },
     $push: { awards: { award: "IBM Fellow", year: 1963, by: "IBM" } }
   }
)
See also:

The update operators (page 788) page for the available update operators and syntax.

Update (page 221) for more examples on update documents.

For additional examples of updates that involve array elements, including where the elements are documents, see the $ (page 792) positional operator.
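The effect of the update document above can be sketched with a tiny applier. This is an illustration of what $set (with a dotted path) and $push do to a document, not MongoDB's update engine; the `applyUpdate` helper is invented for the example.

```javascript
// Minimal, illustrative sketch: $set writes a (possibly dotted) field,
// $push appends to an array field.
function applyUpdate(doc, update) {
  for (const [path, value] of Object.entries(update.$set || {})) {
    const keys = path.split('.');
    let target = doc;
    for (const key of keys.slice(0, -1)) target = target[key];
    target[keys[keys.length - 1]] = value;   // write the final segment
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    doc[field].push(value);                  // append to the array field
  }
  return doc;
}

const bio = { name: { first: 'John', last: 'Backus' }, awards: [] };
applyUpdate(bio, {
  $set: { 'name.middle': 'Warner' },
  $push: { awards: { award: 'IBM Fellow', year: 1963, by: 'IBM' } },
});
```

After applying, `bio.name` has a middle field and `bio.awards` contains the pushed document, matching the behavior described for the update actions document.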
Index Specification Documents

Index specification documents describe the fields to index on during the index creation (page 925). See indexes (page 331) for an overview of indexes. 1

Index documents contain field and value pairs, in the following form:
{ field: value }
field is the eld in the documents to index. value is either 1 for ascending or -1 for descending. The following document species the multi-key index (page 334) on the _id eld and the last eld contained in the subdocument name eld. The document uses dot notation (page 190) to access a eld in a subdocument:
{ _id: 1, "name.last": 1 }
When passed as an argument to the ensureIndex() (page 925) method, the index document specifies the index to create:
db.bios.ensureIndex( { _id: 1, "name.last": 1 } )
Sort Order Specification Documents

Sort order documents specify the order of documents that a query() (page 928) returns. Pass sort order specification documents as an argument to the sort() (page 968) method. See the sort() (page 968) page for more information on sorting.

The sort order documents contain field and value pairs, in the following form:
{ field: value }
field is the eld by which to sort documents. value is either 1 for ascending or -1 for descending. The following document species the sort order using the elds from a sub-document name rst sort by the last eld ascending, then by the first eld also ascending:
{ "name.last": 1, "name.first": 1 }
When passed as an argument to the sort() (page 968) method, the sort order document sorts the results of the find() (page 928) method:
db.bios.find().sort( { "name.last": 1, "name.first": 1 } )
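The ordering a sort specification produces can be sketched as a comparator: compare documents field by field in specification order, honoring 1 for ascending and -1 for descending. The `compareBySpec` helper is illustrative, not a MongoDB API.

```javascript
// Illustrative sketch of how a sort document orders results.
function compareBySpec(spec) {
  const get = (doc, path) => path.split('.').reduce((v, k) => v[k], doc);
  return (a, b) => {
    for (const [path, dir] of Object.entries(spec)) {
      const x = get(a, path);
      const y = get(b, path);
      if (x < y) return -1 * dir;   // dir 1 = ascending, -1 = descending
      if (x > y) return 1 * dir;
    }
    return 0;                       // equal on all specified fields
  };
}

const people = [
  { name: { first: 'Grace', last: 'Hopper' } },
  { name: { first: 'Alan', last: 'Turing' } },
  { name: { first: 'Ada', last: 'Lovelace' } },
];
people.sort(compareBySpec({ 'name.last': 1, 'name.first': 1 }));
```

Sorting the sample array this way orders the documents Hopper, Lovelace, Turing, just as the equivalent sort() call would order results on name.last then name.first.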
1 Indexes optimize a number of key read (page 169) and write (page 181) operations.
String

BSON strings are UTF-8. In general, drivers for each programming language convert from the language's string format to UTF-8 when serializing and deserializing BSON. This makes it possible to store most international characters in BSON strings with ease. 2 In addition, MongoDB $regex (page 778) queries support UTF-8 in the regex string.

Timestamps

BSON has a special timestamp type for internal MongoDB use that is not associated with the regular Date (page 195) type. Timestamp values are 64-bit values where:

the first 32 bits are a time_t value (seconds since the Unix epoch)

the second 32 bits are an incrementing ordinal for operations within a given second.

Within a single mongod (page 1025) instance, timestamp values are always unique. In replication, the oplog has a ts field. The values in this field reflect the operation time, which uses a BSON timestamp value.

Note: The BSON Timestamp type is for internal MongoDB use. For most cases, in application development, you will want to use the BSON date type. See Date (page 195) for more information.

If you create a BSON Timestamp using the empty constructor (e.g. new Timestamp()), MongoDB will only generate a timestamp if you use the constructor in the first field of the document. 3 Otherwise, MongoDB will generate an empty timestamp value (i.e. Timestamp(0, 0)).

Changed in version 2.1: The mongo (page 1040) shell displays the Timestamp value with the wrapper:
Timestamp(<time_t>, <ordinal>)
Prior to version 2.1, the mongo (page 1040) shell displayed the Timestamp value as a document:
{ t : <time_t>, i : <ordinal> }
Date

BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). The official BSON specification refers to the BSON Date type as the UTC datetime.

Changed in version 2.0: BSON Date type is signed.

Consider the following examples of BSON Date:

Construct a Date using the new Date() constructor in the mongo (page 1040) shell:
var mydate1 = new Date()
Construct a Date using the ISODate() constructor in the mongo (page 1040) shell:
var mydate2 = ISODate()
2 Given strings using UTF-8 character sets, using sort() (page 968) on strings will be reasonably correct; however, because internally sort() (page 968) uses the C++ strcmp api, the sort order may handle some characters incorrectly.

3 If the first field in the document is _id, then you can generate a timestamp in the second field of a document.

4 Prior to version 2.0, Date values were incorrectly interpreted as unsigned integers, which affected sorts, range queries, and indexes on Date fields. Because indexes are not recreated when upgrading, please re-index if you created an index on Date values with an earlier version, and dates before 1970 are relevant to your application.
Return the month portion of the Date value; months are zero-indexed, so that January is month 0:
mydate1.getMonth()
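The zero-indexed month convention can be confirmed with a self-contained example (plain JavaScript Dates behave the same way as the shell's):

```javascript
// Months in JavaScript (and mongo shell) Dates are zero-indexed:
// June is month 5, not 6.
const birth = new Date(1912, 5, 23);  // year, zero-based month, day

const month = birth.getMonth(); // 5 -> June
```

This is a common source of off-by-one errors when building dates programmatically rather than parsing date strings.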
22.2 ObjectId
22.2.1 Overview
ObjectId is a 12-byte BSON type, constructed using:

a 4-byte value representing the seconds since the Unix epoch,

a 3-byte machine identifier,

a 2-byte process id, and

a 3-byte counter, starting with a random value.

In MongoDB, documents stored in a collection require a unique _id field that acts as a primary key. Because ObjectIds are small, most likely unique, and fast to generate, MongoDB uses ObjectIds as the default value for the _id field if the _id field is not specified; i.e., the mongod (page 1025) adds the _id field and generates a unique ObjectId to assign as its value.

Using ObjectIds for the _id field provides the following additional benefits:

in the mongo (page 1040) shell, you can access the creation time of the ObjectId, using the getTimestamp() method.

sorting on an _id field that stores ObjectId values is roughly equivalent to sorting by creation time.

Important: The relationship between the order of ObjectId values and generation time is not strict within a single second. If multiple systems, or multiple processes or threads on a single system, generate values within a single second, ObjectId values do not represent a strict insertion order. Clock skew between clients can also result in non-strict ordering of values, because client drivers generate ObjectId values, not the mongod (page 1025) process.

Also consider the BSON Documents (page 189) section for related information on MongoDB's document orientation.
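The byte layout above is what makes the getTimestamp() helper possible: the leading 4 bytes (the first 8 hex characters of the string form) are a big-endian seconds-since-epoch value. The following sketch extracts it from an ObjectId's hexadecimal string; `objectIdTimestamp` is an illustrative helper, not a shell API.

```javascript
// Illustrative sketch: recover the creation time from an ObjectId's
// hex string. The first 8 hex chars are seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  const seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000);  // Date takes milliseconds
}

// The ObjectId used in the examples below decodes to a 2012 timestamp.
const ts = objectIdTimestamp('507f191e810c19729de860ea');
```

This is also why sorting on ObjectId _id values roughly orders documents by creation time: the most significant bytes are the timestamp.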
22.2.2 ObjectId()
The mongo (page 1040) shell provides the ObjectId() wrapper class to generate a new ObjectId, and to provide the following helper attribute and methods:

str
   The hexadecimal string value of the ObjectId() object.

getTimestamp()
   Returns the timestamp portion of the ObjectId() object as a Date.

toString()
   Returns the string representation of the ObjectId() object. The returned string literal has the format ObjectId(...).
   Changed in version 2.2: In previous versions, ObjectId.toString() returned the value of the ObjectId as a hexadecimal string.

valueOf()
   Returns the value of the ObjectId() object as a hexadecimal string. The returned string is the str attribute.
   Changed in version 2.2: In previous versions, ObjectId.valueOf() returned the ObjectId() object.
22.2.3 Examples
Consider the following uses of the ObjectId() class in the mongo (page 1040) shell:

To generate a new ObjectId, use the ObjectId() constructor with no argument:
x = ObjectId()
To generate a new ObjectId using the ObjectId() constructor with a unique hexadecimal string:
y = ObjectId("507f191e810c19729de860ea")
To return the timestamp of an ObjectId() object, use the getTimestamp() method as follows:
ObjectId("507f191e810c19729de860ea").getTimestamp()
To return the string representation of an ObjectId() object, use the toString() method as follows:
ObjectId("507f191e810c19729de860ea").toString()
To return the value of an ObjectId() object as a hexadecimal string, use the valueOf() method as follows:
ObjectId("507f191e810c19729de860ea").valueOf()
22.3 GridFS
GridFS is a specification for storing and retrieving files that exceed the BSON document size limit (page 1109) of 16MB. Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, 5 and stores each of those chunks as a separate document. By default, GridFS limits chunk size to 256k. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.

When you query a GridFS store for a file, the driver or client will reassemble the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, which allows you to skip into the middle of a video or audio file.

GridFS is useful not only for storing files that exceed 16MB but also for storing any files for which you want access without having to load the entire file into memory. For more information on the indications of GridFS, see When should I use GridFS? (page 720).
5 The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.
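The chunking arithmetic is straightforward and worth making explicit: the number of chunk documents a file produces is its size divided by the chunk size, rounded up. The sketch below assumes the 256k default chunk size stated above; `chunkCount` is an illustrative helper, not a driver API.

```javascript
// Illustrative sketch of GridFS chunking arithmetic.
const DEFAULT_CHUNK_SIZE = 256 * 1024; // 256k, the default in this release

function chunkCount(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileSizeBytes / chunkSize); // last chunk may be partial
}

// A 1 MB file needs 4 chunks of 256k; a 16 MB file needs 64.
const oneMb = chunkCount(1024 * 1024);
const sixteenMb = chunkCount(16 * 1024 * 1024);
```

Each of those chunks becomes one document in the chunks collection, keyed by the file's id and the chunk's sequence number, which is what makes range reads into the middle of a file cheap.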
See the relevant driver (page 559) documentation for the specic behavior of your GridFS application. If your driver does not create this index, issue the following operation using the mongo (page 1040) shell:
db.fs.chunks.ensureIndex( { files_id: 1, n: 1 }, { unique: true } );
Optionally, interfaces may support additional GridFS buckets, as in the following example:
// returns GridFS bucket named "contracts"
GridFS myContracts = new GridFS(myDatabase, "contracts");

// retrieve GridFS object "smithco"
GridFSDBFile file = myContracts.findOne("smithco");

// saves the GridFS file to the file system
file.writeTo(new File("/tmp/smithco.pdf"));
queries to return the referenced documents. Many drivers (page 559) have helper methods that form the query for the DBRef automatically. The drivers 6 do not automatically resolve DBRefs into documents. Use a DBRef when you need to embed documents from multiple collections in documents from one collection. DBRefs also provide a common format and type to represent these relationships among documents. The DBRef format provides common semantics for representing links between documents if your database must interact with multiple frameworks and tools. Unless you have a compelling reason for using a DBRef, use manual references.
Then, when a query returns the document from the people collection you can, if needed, make a second query for the document referenced by the places_id field in the places collection.

Use

For nearly every case where you want to store a relationship between two documents, use manual references (page 1119). The references are simple to create and your application can resolve references as needed.

The only limitation of manual linking is that these references do not convey the database and collection name. If you have documents in a single collection that relate to documents in more than one collection, you may need to consider using DBRefs (page 1120).
6 Some community supported drivers may have alternate behavior and may resolve a DBRef into a document automatically.
22.4.2 DBRefs
Background

DBRefs are a convention for representing a document, rather than a specific reference type. They include the name of the collection, and in some cases the database, in addition to the value from the _id field.

Format

DBRefs have the following fields:

$ref
   The $ref field holds the name of the collection where the referenced document resides.

$id
   The $id field contains the value of the _id field in the referenced document.

$db
   Optional. Contains the name of the database where the referenced document resides. Only some drivers support $db references.

Example

A DBRef document would resemble the following:
{ "$ref" : <value>, "$id" : <value>, "$db" : <value> }
The DBRef in this example points to a document in the creators collection of the users database that has ObjectId("5126bc054aed4daf9e2ab772") in its _id field.

Note: The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
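Because a DBRef is just a convention over a plain document, it can be sketched directly. The `makeDBRef` helper below is illustrative (drivers provide their own classes, as the Support section notes); the $ref/$id/$db field names and their order are the documented convention.

```javascript
// Illustrative sketch of the DBRef convention: $ref (collection),
// $id (referenced _id value), and optionally $db (database), in that order.
function makeDBRef(collection, id, db) {
  const ref = { $ref: collection, $id: id };
  if (db !== undefined) ref.$db = db;  // $db is optional
  return ref;
}

const ref = makeDBRef('creators', '5126bc054aed4daf9e2ab772', 'users');

// Resolving the reference is the application's (or driver helper's) job:
// query ref.$db / ref.$ref for a document whose _id equals ref.$id.
```

Since most drivers do not resolve DBRefs automatically, the resolution step shown in the trailing comment is a second query issued by your application.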
Support

C++ The C++ driver contains no support for DBRefs. You can traverse references manually.

C# The C# driver provides access to DBRef objects with the MongoDBRef Class and supplies the FetchDBRef Method for accessing these objects.

Java The DBRef class provides support for DBRefs from Java.

JavaScript The mongo (page 1040) shell's JavaScript (page 921) interface provides a DBRef.
Perl The Perl driver contains no support for DBRefs. You can traverse references manually or use the MongoDBx::AutoDeref CPAN module.

PHP The PHP driver supports DBRefs, including the optional $db reference, through the MongoDBRef class.

Python The Python driver provides the DBRef class and the dereference method for interacting with DBRefs.

Ruby The Ruby driver supports DBRefs using the DBRef class and the dereference method.

Use

In most cases you should use the manual reference (page 1119) method for connecting two or more related documents. However, if you need to reference documents from multiple collections, consider a DBRef.
CHAPTER 23
These documents provide an overview and examples of common database operations, i.e. CRUD, in MongoDB.
23.1 Create
Of the four basic database operations (i.e. CRUD), create operations are those that add new records or documents to a collection in MongoDB. For general information about write operations and the factors that affect their performance, see Write Operations (page 181); for documentation of the other CRUD operations, see the Core MongoDB Operations (CRUD) (page 167) page.

Overview (page 203)

insert() (page 204)
   Insert the First Document in a Collection (page 204)
   Insert a Document without Specifying an _id Field (page 205)
   Bulk Insert Multiple Documents (page 207)
   Insert a Document with save() (page 208)

update() Operations with the upsert Flag (page 209)
   Insert a Document that Contains field and value Pairs (page 209)
   Insert a Document that Contains Update Operator Expressions (page 210)
   Update operations with save() (page 211)
23.1.1 Overview
You can create documents in a MongoDB collection using any of the following basic operations:

insert (page 204)

updates with the upsert option (page 209)

All insert operations in MongoDB exhibit the following properties:

If you attempt to insert a document without the _id field, the client library or the mongod (page 1025) instance will add an _id field and populate the field with a unique ObjectId.

For operations with write concern (page 400), if you specify an _id field, the _id field must be unique within the collection; otherwise the mongod (page 1025) will return a duplicate key exception.

The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles (page 1079) and the documentation for your driver (page 559) for more information about GridFS.

Documents (page 189) have the following restrictions on field names:

The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.

Field names cannot start with the $ character.

Field names cannot contain the . character.

Note: As of these driver versions (page 1189), all write operations will issue a getLastError (page 834) command to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern (page 182) in the Write Operations (page 181) document for more information.
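The field-name restrictions above can be expressed as a small validator. The `validFieldName` helper is illustrative (drivers and the server perform their own validation); it encodes the three documented rules: no leading $, no ".", no null character.

```javascript
// Illustrative validator for the documented field-name rules:
// top-level names may not start with "$", and no name may contain
// "." or the null character.
function validFieldName(name) {
  return !name.startsWith('$') &&
         !name.includes('.') &&
         !name.includes('\u0000');
}

const ok = validFieldName('name');          // plain name: allowed
const dollar = validFieldName('$set');      // reserved operator prefix
const dotted = validFieldName('name.last'); // "." conflicts with dot notation
```

The "." rule is why dotted strings like "name.last" appear only as quoted keys in query, update, index, and sort specification documents, never as stored field names.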
23.1.2 insert()
The insert() (page 939) method is the primary method to insert a document or documents into a MongoDB collection, and has the following syntax:
db.collection.insert( <document> )
Corresponding Operation in SQL The insert() (page 939) method is analogous to the INSERT statement.
Insert the First Document in a Collection

If the collection does not exist 1, then the insert() (page 939) method creates the collection during the first insert. Specifically in the example, if the collection bios does not exist, then the insert operation will create this collection:
db.bios.insert(
   {
     _id: 1,
     name: { first: "John", last: "Backus" },
     birth: new Date("Dec 03, 1924"),
     death: new Date("Mar 17, 2007"),
     contribs: [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
     awards: [
               { award: "W.W. McDowell Award", year: 1967, by: "IEEE Computer Society" },
               { award: "National Medal of Science", year: 1975, by: "National Science Foundation" },
               { award: "Turing Award", year: 1977, by: "ACM" },
               { award: "Draper Prize", year: 1993, by: "National Academy of Engineering" }
             ]
   }
)

1 You can also view a list of the existing collections in the database using the show collections operation in the mongo (page 1040) shell.
You can confirm the insert by querying (page 211) the bios collection:
db.bios.find()
This operation returns the following document from the bios collection:
{
  "_id" : 1,
  "name" : { "first" : "John", "last" : "Backus" },
  "birth" : ISODate("1924-12-03T05:00:00Z"),
  "death" : ISODate("2007-03-17T04:00:00Z"),
  "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
  "awards" : [
    { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" },
    { "award" : "National Medal of Science", "year" : 1975, "by" : "National Science Foundation" },
    { "award" : "Turing Award", "year" : 1977, "by" : "ACM" },
    { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" }
  ]
}
Insert a Document without Specifying an _id Field

If the new document does not contain an _id field, then the insert() (page 939) method adds the _id field to the document and generates a unique ObjectId for the value:
db.bios.insert(
   {
     name: { first: "John", last: "McCarthy" },
     birth: new Date("Sep 04, 1927"),
     death: new Date("Dec 24, 2011"),
     contribs: [ "Lisp", "Artificial Intelligence", "ALGOL" ],
     awards: [
               { award: "Turing Award", year: 1971, by: "ACM" },
               { award: "Kyoto Prize", year: 1988, by: "Inamori Foundation" },
               { award: "National Medal of Science", year: 1990, by: "National Science Foundation" }
             ]
   }
)
You can verify the inserted document by querying the bios collection:
db.bios.find( { name: { first: "John", last: "McCarthy" } } )
The returned document contains an _id field with the generated ObjectId value:
{
  "_id" : ObjectId("50a1880488d113a4ae94a94a"),
  "name" : { "first" : "John", "last" : "McCarthy" },
  "birth" : ISODate("1927-09-04T04:00:00Z"),
  "death" : ISODate("2011-12-24T05:00:00Z"),
  "contribs" : [ "Lisp", "Artificial Intelligence", "ALGOL" ],
  "awards" : [
    { "award" : "Turing Award", "year" : 1971, "by" : "ACM" },
    { "award" : "Kyoto Prize", "year" : 1988, "by" : "Inamori Foundation" },
    { "award" : "National Medal of Science", "year" : 1990, "by" : "National Science Foundation" }
  ]
}
Bulk Insert Multiple Documents

If you pass an array of documents to the insert() (page 939) method, the insert() (page 939) performs a bulk insert into a collection.

The following operation inserts three documents into the bios collection. The operation also illustrates the dynamic schema characteristic of MongoDB. Although the document with _id: 3 contains a field title which does not appear in the other documents, MongoDB does not require the other documents to contain this field:
db.bios.insert(
   [
     {
       _id: 3,
       name: { first: "Grace", last: "Hopper" },
       title: "Rear Admiral",
       birth: new Date("Dec 09, 1906"),
       death: new Date("Jan 01, 1992"),
       contribs: [ "UNIVAC", "compiler", "FLOW-MATIC", "COBOL" ],
       awards: [
                 { award: "Computer Sciences Man of the Year", year: 1969, by: "Data Processing Management Association" },
                 { award: "Distinguished Fellow", year: 1973, by: "British Computer Society" },
                 { award: "W. W. McDowell Award", year: 1976, by: "IEEE Computer Society" },
                 { award: "National Medal of Technology", year: 1991, by: "United States" }
               ]
     },
     {
       _id: 4,
       name: { first: "Kristen", last: "Nygaard" },
       birth: new Date("Aug 27, 1926"),
       death: new Date("Aug 10, 2002"),
       contribs: [ "OOP", "Simula" ],
       awards: [
                 { award: "Rosing Prize", year: 1999, by: "Norwegian Data Association" },
                 { award: "Turing Award", year: 2001, by: "ACM" },
                 { award: "IEEE John von Neumann Medal", year: 2001, by: "IEEE" }
               ]
     },
     {
       _id: 5,
       name: { first: "Ole-Johan", last: "Dahl" },
       birth: new Date("Oct 12, 1931"),
       death: new Date("Jun 29, 2002"),
       contribs: [ "OOP", "Simula" ],
       awards: [
                 { award: "Rosing Prize", year: 1999, by: "Norwegian Data Association" },
                 { award: "Turing Award", year: 2001, by: "ACM" },
                 { award: "IEEE John von Neumann Medal", year: 2001, by: "IEEE" }
               ]
     }
   ]
)
Insert a Document with save()

The save() (page 949) method performs an insert if the document to save does not contain the _id field.

The following save() (page 949) operation performs an insert into the bios collection since the document does not contain the _id field:
db.bios.save(
   {
     name: { first: "Guido", last: "van Rossum" },
     birth: new Date("Jan 31, 1956"),
     contribs: [ "Python" ],
     awards: [
               { award: "Award for the Advancement of Free Software", year: 2001, by: "Free Software Foundation" },
               { award: "NLUUG Award", year: 2003, by: "NLUUG" }
             ]
   }
)
Insert a Document that Contains field and value Pairs

If no document matches the <query> argument, the upsert performs an insert. If the <update> argument includes only field and value pairs, the new document contains the fields and values specified in the <update> argument. If the query does not include an _id field, the operation adds the _id field and generates a unique ObjectId for its value.

The following update inserts a new document into the bios collection 2:
db.bios.update(
   { name: { first: 'Dennis', last: 'Ritchie' } },
   {
     name: { first: 'Dennis', last: 'Ritchie' },
     birth: new Date('Sep 09, 1941'),
     death: new Date('Oct 12, 2011'),
     contribs: [ 'UNIX', 'C' ],
     awards: [
       { award: 'Turing Award', year: 1983, by: 'ACM' },
       { award: 'National Medal of Technology', year: 1998, by: 'United States' },
       { award: 'Japan Prize', year: 2011, by: 'The Japan Prize Foundation' }
     ]
   },
   { upsert: true }
)

2 Prior to version 2.2, in the mongo (page 1040) shell, you would specify the upsert and the multi options in the update() (page 951) method as positional boolean options. See update() (page 951) for details.
Insert a Document that Contains Update Operator Expressions

If no document matches the <query> argument, the update operation inserts a new document. If the <update> argument includes only update operators (page 788), the new document contains the fields and values from the <query> argument with the operations from the <update> argument applied. The following operation inserts a new document into the bios collection 2:
db.bios.update(
   { _id: 7, name: { first: 'Ken', last: 'Thompson' } },
   {
     $set: {
        birth: new Date('Feb 04, 1943'),
        contribs: [ 'UNIX', 'C', 'B', 'UTF-8' ],
        awards: [
          { award: 'Turing Award', year: 1983, by: 'ACM' },
          { award: 'IEEE Richard W. Hamming Medal', year: 1990, by: 'IEEE' },
          { award: 'National Medal of Technology', year: 1998, by: 'United States' },
          { award: 'Tsutomu Kanai Award', year: 1999, by: 'IEEE' },
          { award: 'Japan Prize', year: 2011, by: 'The Japan Prize Foundation' }
        ]
     }
   },
   { upsert: true }
)
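The two upsert behaviors above can be sketched in plain JavaScript (a simplified illustration, not the server's implementation; only the $set operator is modeled here):

```javascript
// Simplified sketch of how an upsert builds the new document:
// a replacement-style <update> becomes the document itself, while an
// operator-style <update> starts from the query fields and applies
// the operators on top of them.
function upsertDocument(query, update) {
  var hasOperators = Object.keys(update).some(function (k) {
    return k.charAt(0) === "$";
  });
  if (!hasOperators) {
    // Field/value pairs: the new document is the <update> itself.
    return JSON.parse(JSON.stringify(update));
  }
  // Operator expressions: start from the <query> fields, then apply.
  var doc = JSON.parse(JSON.stringify(query));
  Object.keys(update.$set || {}).forEach(function (field) {
    doc[field] = update.$set[field];
  });
  return doc;
}

// Replacement-style upsert: the document is the <update> argument.
console.log(upsertDocument({ name: "Dennis" },
                           { name: "Dennis", contribs: ["C"] }));
// Operator-style upsert: query fields are kept, then $set is applied.
console.log(upsertDocument({ _id: 7 }, { $set: { contribs: ["B"] } }));
```

Note that the real server also handles nested operators, dot notation, and non-$set operators, which this sketch omits.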
Update operations with save()

The save() (page 949) method is identical to an update operation with the upsert flag (page 209): it performs an upsert if the document to save contains the _id field. To determine whether to perform an insert or an update, the save() (page 949) method queries documents on the _id field. The following operation performs an upsert that inserts a document into the bios collection since no document in the collection contains an _id field with the value 10:
db.bios.save(
   {
      _id: 10,
      name: { first: 'Yukihiro', aka: 'Matz', last: 'Matsumoto' },
      birth: new Date('Apr 14, 1965'),
      contribs: [ 'Ruby' ],
      awards: [
        { award: 'Award for the Advancement of Free Software', year: 2011, by: 'Free Software Foundation' }
      ]
   }
)
23.2 Read
Of the four basic database operations (i.e. CRUD), read operations are those that retrieve records or documents from a collection in MongoDB. For general information about read operations and the factors that affect their performance, see Read Operations (page 169); for documentation of the other CRUD operations, see the Core MongoDB Operations (CRUD) (page 167) page.
- Overview (page 212)
- find() (page 212)
  - Return All Documents in a Collection (page 213)
  - Return Documents that Match Query Conditions (page 214)
    * Equality Matches (page 214)
    * Using Operators (page 214)
    * Query for Ranges (page 214)
    * On Arrays (page 215)
      - Query an Element (page 215)
      - Query Multiple Fields on an Array of Documents (page 215)
    * On Subdocuments (page 215)
      - Exact Matches (page 215)
      - Fields of a Subdocument (page 216)
    * Logical Operators (page 216)
      - OR Disjunctions (page 216)
      - AND Conjunctions (page 216)
  - With a Projection (page 217)
    * Specify the Fields to Return (page 217)
    * Explicitly Exclude the _id Field (page 217)
    * Return All but the Excluded Fields (page 217)
    * On Arrays and Subdocuments (page 217)
  - Iterate the Returned Cursor (page 218)
    * With Variable Name (page 218)
    * With next() Method (page 218)
    * With forEach() Method (page 219)
  - Modify the Cursor Behavior (page 219)
    * Order Documents in the Result Set (page 219)
    * Limit the Number of Documents to Return (page 219)
    * Set the Starting Point of the Result Set (page 220)
    * Combine Cursor Methods (page 220)
- findOne() (page 220)
  - With Empty Query Specification (page 220)
  - With a Query Specification (page 220)
  - With a Projection (page 221)
    * Specify the Fields to Return (page 221)
    * Return All but the Excluded Fields (page 221)
  - Access the findOne Result (page 221)
23.2.1 Overview
You can retrieve documents from MongoDB using either of the following methods:

- find (page 212)
- findOne (page 220)
23.2.2 find()
The find() (page 928) method is the primary method to select documents from a collection. The find() (page 928) method returns a cursor that contains a number of documents. Most drivers (page 559) provide application developers with a native iterable interface for handling cursors and accessing documents. The find() (page 928) method has the following syntax:
db.collection.find( <query>, <projection> )
Corresponding Operation in SQL

The find() (page 928) method is analogous to the SELECT statement, while:

- the <query> argument corresponds to the WHERE statement, and
- the <projection> argument corresponds to the list of fields to select from the result set.

The examples refer to a collection named bios that contains documents with the following prototype:
{
   "_id" : 1,
   "name" : {
      "first" : "John",
      "last" : "Backus"
   },
   "birth" : ISODate("1924-12-03T05:00:00Z"),
   "death" : ISODate("2007-03-17T04:00:00Z"),
   "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
   "awards" : [
      { "award" : "W. W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" },
      { "award" : "National Medal of Science", "year" : 1975, "by" : "National Science Foundation" },
      { "award" : "Turing Award", "year" : 1977, "by" : "ACM" },
      { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" }
   ]
}
Note: In the mongo (page 1040) shell, you can format the output by adding .pretty() to the find() (page 928) method call.
Return All Documents in a Collection

If there is no <query> argument, the find() (page 928) method selects all documents from a collection. The following operation returns all documents (or more precisely, a cursor to all documents) in the bios collection:
db.bios.find()
Return Documents that Match Query Conditions

If there is a <query> argument, the find() (page 928) method selects all documents from a collection that satisfy the query specification.
Equality Matches
The following operation returns a cursor to documents in the bios collection where the field _id equals 5:
db.bios.find( { _id: 5 } )
Using Operators
The following operation returns a cursor to all documents in the bios collection where the field _id equals 5 or ObjectId("507c35dd8fada716c89d0013"):
db.bios.find(
   { _id: { $in: [ 5, ObjectId("507c35dd8fada716c89d0013") ] } }
)
Query for Ranges

You can combine comparison operators to specify a range. The following statement returns all documents with field between value1 and value2:

db.collection.find( { field: { $gt: value1, $lt: value2 } } )

Note: If the field contains an array and the query has multiple conditional operators, the field as a whole will match if either a single array element meets the conditions or a combination of array elements meets the conditions.

Example

Query a field that contains an array. A collection students contains the following documents where the score field contains an array of values:
{ "_id" : 1, "score" : [ -1, 3 ] } { "_id" : 2, "score" : [ 1, 5 ] } { "_id" : 3, "score" : [ 5, 5 ] }
The following query uses conditional operators on the array field:

db.students.find( { score: { $gt: 0, $lt: 2 } } )
In the document with _id equal to 1, the score: [ -1, 3 ] as a whole meets the specified conditions since the element -1 meets the $lt: 2 condition and the element 3 meets the $gt: 0 condition. In the document with _id equal to 2, the score: [ 1, 5 ] as a whole meets the specified conditions since the element 1 meets both the $lt: 2 condition and the $gt: 0 condition.
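This matching rule can be sketched in JavaScript (an illustration of the semantics, not the server's algorithm): an array satisfies a set of conditions if each condition is met by some element, not necessarily the same one.

```javascript
// Each condition must be satisfied by at least one element of the
// array; different elements may satisfy different conditions.
function arrayMatches(arr, conditions) {
  return conditions.every(function (cond) {
    return arr.some(cond);
  });
}

// Conditions corresponding to { score: { $gt: 0, $lt: 2 } }
var conditions = [
  function (x) { return x > 0; },  // $gt: 0
  function (x) { return x < 2; }   // $lt: 2
];

console.log(arrayMatches([-1, 3], conditions)); // true: -1 meets $lt, 3 meets $gt
console.log(arrayMatches([1, 5], conditions));  // true: 1 meets both
console.log(arrayMatches([5, 5], conditions));  // false: no element meets $lt: 2
```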
On Arrays
Query an Element

The following operation returns a cursor to all documents in the bios collection where the array field contribs contains the element 'UNIX':
db.bios.find( { contribs: 'UNIX' } )
Query Multiple Fields on an Array of Documents

The following operation returns a cursor to all documents in the bios collection where the awards array contains a subdocument element that contains the award field equal to 'Turing Award' and the year field greater than 1980:
db.bios.find(
   { awards: { $elemMatch: { award: 'Turing Award', year: { $gt: 1980 } } } }
)
On Subdocuments
Exact Matches

The following operation returns a cursor to all documents in the bios collection where the subdocument name is exactly { first: 'Yukihiro', last: 'Matsumoto' }, including the order:
db.bios.find( { name: { first: 'Yukihiro', last: 'Matsumoto' } } )
The name field must match the sub-document exactly, including order. For instance, the query would not match documents with name fields that held either of the following values:
{ first: 'Yukihiro', aka: 'Matz', last: 'Matsumoto' }
{ last: 'Matsumoto', first: 'Yukihiro' }
Fields of a Subdocument

The following operation returns a cursor to all documents in the bios collection where the subdocument name contains a field first with the value 'Yukihiro' and a field last with the value 'Matsumoto'; the query uses dot notation to access fields in a subdocument:
db.bios.find( { 'name.first': 'Yukihiro', 'name.last': 'Matsumoto' } )
The query matches the document where the name field contains a subdocument with the field first with the value 'Yukihiro' and a field last with the value 'Matsumoto'. For instance, the query would match documents with name fields that held either of the following values:
{ first: 'Yukihiro', aka: 'Matz', last: 'Matsumoto' }
{ last: 'Matsumoto', first: 'Yukihiro' }
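The difference between the two forms can be sketched in JavaScript (a simplified illustration; the server compares BSON documents, not JSON strings):

```javascript
// Exact subdocument match: the full set of fields and their order
// must be identical. Dot-notation match: only the named fields must
// have the given values; extra fields and ordering do not matter.
function exactMatch(sub, spec) {
  return JSON.stringify(sub) === JSON.stringify(spec);
}
function dotMatch(sub, spec) {
  return Object.keys(spec).every(function (k) {
    return sub[k] === spec[k];
  });
}

var spec = { first: "Yukihiro", last: "Matsumoto" };
var reordered = { last: "Matsumoto", first: "Yukihiro" };
var extraField = { first: "Yukihiro", aka: "Matz", last: "Matsumoto" };

console.log(exactMatch(reordered, spec));  // false: field order differs
console.log(exactMatch(extraField, spec)); // false: extra field present
console.log(dotMatch(reordered, spec));    // true
console.log(dotMatch(extraField, spec));   // true
```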
Logical Operators
OR Disjunctions

The following operation returns a cursor to all documents in the bios collection where either the field first in the sub-document name starts with the letter G or where the field birth is less than new Date('01/01/1945'):
db.bios.find(
   { $or: [
        { 'name.first' : /^G/ },
        { birth: { $lt: new Date('01/01/1945') } }
   ] }
)
AND Conjunctions

The following operation returns a cursor to all documents in the bios collection where the field first in the subdocument name starts with the letter K and the array field contribs contains the element 'UNIX':
db.bios.find( { 'name.first': /^K/, contribs: 'UNIX' } )
In this query, the parameters (i.e. the selections of both fields) combine using an implicit logical AND for criteria on the different fields contribs and name.first. For multiple AND criteria on the same field, use the $and (page 772) operator.

With a Projection

If there is a <projection> argument, the find() (page 928) method returns only those fields as specified in the <projection> argument to include or exclude.

Note: The _id field is implicitly included in the <projection> argument. In projections that explicitly include fields, _id is the only field that you can explicitly exclude. Otherwise, you cannot mix include field and exclude field specifications.
Specify the Fields to Return

The following operation finds all documents in the bios collection and returns only the name field, the contribs field, and the _id field:
db.bios.find( { }, { name: 1, contribs: 1 } )
Explicitly Exclude the _id Field

The following operation finds all documents in the bios collection and returns only the name field and the contribs field:
db.bios.find( { }, { name: 1, contribs: 1, _id: 0 } )
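These inclusion rules can be sketched as follows (a simplified JavaScript illustration of inclusion projections only, not MongoDB's implementation):

```javascript
// Apply an inclusion projection: keep only the listed fields, and
// keep _id unless it is explicitly excluded with _id: 0.
function project(doc, projection) {
  var result = {};
  if (projection._id !== 0) {
    result._id = doc._id;
  }
  Object.keys(projection).forEach(function (field) {
    if (projection[field] === 1 && field in doc) {
      result[field] = doc[field];
    }
  });
  return result;
}

var doc = { _id: 1, name: { last: "Backus" }, contribs: ["Fortran"], birth: 1924 };
console.log(project(doc, { name: 1, contribs: 1 }));
// { _id: 1, name: { last: 'Backus' }, contribs: [ 'Fortran' ] }
console.log(project(doc, { name: 1, contribs: 1, _id: 0 }));
// { name: { last: 'Backus' }, contribs: [ 'Fortran' ] }
```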
Return All but the Excluded Fields

The following operation finds the documents in the bios collection where the contribs field contains the element 'OOP' and returns all fields except the _id field, the first field in the name subdocument, and the birth field from the matching documents:
db.bios.find(
   { contribs: 'OOP' },
   { _id: 0, 'name.first': 0, birth: 0 }
)
On Arrays and Subdocuments

The following operation finds all documents in the bios collection and returns the last field in the name subdocument and the first two elements in the contribs array:

db.bios.find(
   { },
   { _id: 0, 'name.last': 1, contribs: { $slice: 2 } }
)
See also:

- dot notation for information on reaching into embedded sub-documents.
- Arrays (page 172) for more examples on accessing arrays.
- Subdocuments (page 171) for more examples on accessing subdocuments.
- $elemMatch (page 787) query operator for more information on matching array elements.
- $elemMatch (page 802) projection operator for additional information on restricting array elements to return.

Iterate the Returned Cursor

The find() (page 928) method returns a cursor to the results; however, in the mongo (page 1040) shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times 3 to print up to the first 20 documents that match the query, as in the following example:
db.bios.find( { _id: 1 } );
When you assign the find() (page 928) result to a variable, you can type the name of the cursor variable to iterate up to 20 times 3 and print the matching documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } ); myCursor
You can use the cursor method next() (page 966) to access the documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } ); var myDocument = myCursor.hasNext() ? myCursor.next() : null; if (myDocument) { var myName = myDocument.name; print (tojson(myName)); }
To print, you can also use the printjson() method instead of print(tojson()):

if (myDocument) {
   var myName = myDocument.name;
   printjson(myName);
}
3 You can use the DBQuery.shellBatchSize to change the number of iterations from the default value 20. See Cursor Flags (page 179) and Cursor Behaviors (page 178) for more information.
You can use the cursor method forEach() (page 962) to iterate the cursor and access the documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } ); myCursor.forEach(printjson);
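The cursor interface used above can be sketched with a minimal JavaScript object (an illustration of the iteration pattern, not the shell's actual cursor implementation):

```javascript
// A minimal cursor exposing hasNext(), next(), and forEach(),
// mirroring the iteration interface of the shell's cursors.
function Cursor(docs) {
  var position = 0;
  this.hasNext = function () { return position < docs.length; };
  this.next = function () {
    if (!this.hasNext()) { throw new Error("no more documents"); }
    return docs[position++];
  };
  this.forEach = function (callback) {
    while (this.hasNext()) { callback(this.next()); }
  };
}

var myCursor = new Cursor([{ _id: 1 }, { _id: 2 }]);
var ids = [];
myCursor.forEach(function (doc) { ids.push(doc._id); });
console.log(ids); // [ 1, 2 ]
```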
For more information on cursor handling, see:

- cursor.hasNext() (page 962)
- cursor.next() (page 966)
- cursor.forEach() (page 962)
- cursors (page 177)
- JavaScript cursor methods (page 955)

Modify the Cursor Behavior

In addition to the <query> and the <projection> arguments, the mongo (page 1040) shell and the drivers (page 559) provide several cursor methods that you can call on the cursor returned by the find() (page 928) method to modify its behavior, such as sort() (page 968), limit() (page 963), and skip() (page 967).
Order Documents in the Result Set
The sort() (page 968) method orders the documents in the result set. The following operation returns all documents (or more precisely, a cursor to all documents) in the bios collection ordered by the name field ascending:
db.bios.find().sort( { name: 1 } )
Limit the Number of Documents to Return

The limit() (page 963) method limits the number of documents in the result set. The following operation returns at most 5 documents (or more precisely, a cursor to at most 5 documents) in the bios collection:
db.bios.find().limit( 5 )
Set the Starting Point of the Result Set

The skip() (page 967) method controls the starting point of the result set. The following operation returns all documents, skipping the first 5 documents in the bios collection:
db.bios.find().skip( 5 )
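How these modifiers combine can be sketched in JavaScript (an in-memory illustration, not the query engine): the server applies sort() before limit() regardless of chaining order, with skip() applied between them.

```javascript
// Emulate db.collection.find().sort(...).skip(n).limit(m) on an
// in-memory array: sort, then skip, then limit.
function applyCursorModifiers(docs, compare, skipCount, limitCount) {
  return docs
    .slice()               // leave the input untouched
    .sort(compare)         // sort()
    .slice(skipCount)      // skip()
    .slice(0, limitCount); // limit()
}

var docs = [{ name: "C" }, { name: "A" }, { name: "B" }, { name: "D" }];
var byName = function (a, b) { return a.name < b.name ? -1 : 1; };
console.log(applyCursorModifiers(docs, byName, 1, 2));
// [ { name: 'B' }, { name: 'C' } ]
```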
See the JavaScript cursor methods (page 955) reference and your driver (page 559) documentation for additional references. See Cursors (page 177) for more information regarding cursors.
23.2.3 findOne()
The findOne() (page 933) method selects a single document from a collection and returns that document. findOne() (page 933) does not return a cursor. The findOne() (page 933) method has the following syntax:
db.collection.findOne( <query>, <projection> )
Except for the return value, the findOne() (page 933) method is quite similar to the find() (page 928) method; in fact, internally, the findOne() (page 933) method is the find() (page 928) method with a limit of 1.

With Empty Query Specification

If there is no <query> argument, the findOne() (page 933) method selects just one document from a collection. The following operation returns a single document from the bios collection:
db.bios.findOne()
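The relationship between findOne() and find() with a limit of 1 can be sketched in JavaScript (an in-memory illustration, not the driver implementation):

```javascript
// findOne behaves like find(...).limit(1), returning the document
// itself (or null) instead of a cursor.
function find(docs, predicate) {
  return docs.filter(predicate);
}
function findOne(docs, predicate) {
  var results = find(docs, predicate).slice(0, 1); // limit(1)
  return results.length > 0 ? results[0] : null;
}

var bios = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
console.log(findOne(bios, function (d) { return d._id > 1; })); // { _id: 2 }
console.log(findOne(bios, function (d) { return d._id > 9; })); // null
```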
With a Query Specification

If there is a <query> argument, the findOne() (page 933) method selects the first document from a collection that meets the <query> argument. The following operation returns the first matching document from the bios collection where either the field first in the subdocument name starts with the letter G or where the field birth is less than new Date('01/01/1945'):
db.bios.findOne(
   {
     $or: [
        { 'name.first' : /^G/ },
        { birth: { $lt: new Date('01/01/1945') } }
     ]
   }
)

4 Regardless of the order you chain the limit() (page 963) and the sort() (page 968), the request to the server has a structure that treats the query and the sort() (page 968) modifier as a single object. Therefore, the limit() (page 963) operation is always applied after the sort() (page 968) regardless of the specified order of the operations in the chain. See the meta query operators (page 805) for more information.
With a Projection

You can pass a <projection> argument to findOne() (page 933) to control the fields included in the result set.
Specify the Fields to Return
The following operation finds a document in the bios collection and returns only the name field, the contribs field, and the _id field:
db.bios.findOne( { }, { name: 1, contribs: 1 } )
Return All but the Excluded Fields

The following operation returns a document in the bios collection where the contribs field contains the element 'OOP' and returns all fields except the _id field, the first field in the name subdocument, and the birth field from the matching documents:
db.bios.findOne(
   { contribs: 'OOP' },
   { _id: 0, 'name.first': 0, birth: 0 }
)
Access the findOne Result

Although similar to the find() (page 928) method, because the findOne() (page 933) method returns a document rather than a cursor, you cannot apply the cursor methods such as limit() (page 963), sort() (page 968), and skip() (page 967) to the result of the findOne() (page 933) method. However, you can access the document directly, as in the example:
var myDocument = db.bios.findOne(); if (myDocument) { var myName = myDocument.name; print (tojson(myName)); }
23.3 Update
Of the four basic database operations (i.e. CRUD), update operations are those that modify existing records or documents in a MongoDB collection. For general information about write operations and the factors that affect their performance, see Write Operations (page 181); for documentation of other CRUD operations, see the Core MongoDB Operations (CRUD) (page 167) page.
- Overview (page 222)
- Update (page 222)
  - Modify with Update Operators (page 223)
    * Update a Field in a Document (page 223)
    * Add a New Field to a Document (page 223)
    * Remove a Field from a Document (page 224)
    * Update Arrays (page 224)
      - Update an Element by Specifying Its Position (page 224)
      - Update an Element without Specifying Its Position (page 224)
      - Update a Document Element without Specifying Its Position (page 224)
      - Add an Element to an Array (page 225)
    * Update Multiple Documents (page 225)
    * Replace Existing Document with New Document (page 225)
- update() Operations with the upsert Flag (page 226)
- Save (page 226)
  - Behavior (page 227)
  - Save Performs an Update (page 227)
- Update Operators (page 228)
  - Fields (page 228)
  - Array (page 228)
    * Operators (page 228)
    * Modifiers (page 228)
  - Bitwise (page 228)
  - Isolation (page 229)
23.3.1 Overview
An update operation modifies an existing document or documents in a collection. MongoDB provides the following methods to perform update operations:

- update (page 222)
- save (page 226)

Note: Consider the following behaviors of MongoDB's update operations.

- When performing update operations that increase the document size beyond the allocated space for that document, the update operation relocates the document on disk and may reorder the document fields depending on the type of update.
- As of these driver versions (page 1189), all write operations will issue a getLastError (page 834) command to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern (page 182) in the Write Operations (page 181) document for more information.
23.3.2 Update
The update() (page 951) method is the primary method used to modify documents in a MongoDB collection. By default, the update() (page 951) method updates a single document, but by using the multi option, update()
(page 951) can update all documents that match the query criteria in the collection. The update() (page 951) method can either replace the existing document with the new document or update specific fields in the existing document. The update() (page 951) method has the following syntax 5:
db.collection.update( <query>, <update>, <options> )
Corresponding operation in SQL

The update() (page 951) method corresponds to the UPDATE operation in SQL, and:

- the <query> argument corresponds to the WHERE statement, and
- the <update> argument corresponds to the SET ... statement.

The default behavior of the update() (page 951) method updates a single document and would correspond to the SQL UPDATE statement with LIMIT 1. With the multi option, the update() (page 951) method would correspond to the SQL UPDATE statement without the LIMIT clause.
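The single-document default and the multi option can be sketched in JavaScript (a simplified in-memory illustration, not the update engine):

```javascript
// By default only the first matching document is modified; with
// { multi: true } every matching document is modified. Returns the
// number of documents modified.
function update(docs, predicate, modify, options) {
  var modified = 0;
  for (var i = 0; i < docs.length; i++) {
    if (predicate(docs[i])) {
      modify(docs[i]);
      modified++;
      if (!options || !options.multi) { break; } // LIMIT 1 behavior
    }
  }
  return modified;
}

var docs = [{ x: 1 }, { x: 1 }, { x: 2 }];
console.log(update(docs, function (d) { return d.x === 1; },
                   function (d) { d.seen = true; }));                  // 1
console.log(update(docs, function (d) { return d.x === 1; },
                   function (d) { d.seen = true; }, { multi: true })); // 2
```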
Modify with Update Operators

If the <update> argument contains only update operator (page 228) expressions such as the $set (page 792) operator expression, the update() (page 951) method updates the corresponding fields in the document. To update fields in subdocuments, MongoDB uses dot notation.
Update a Field in a Document
Use $set (page 792) to update the value of a field. The following operation queries the bios collection for the first document that has an _id field equal to 1 and sets the value of the field middle, in the subdocument name, to 'Warner':
db.bios.update(
   { _id: 1 },
   { $set: { 'name.middle': 'Warner' } }
)
Add a New Field to a Document

If the <update> argument contains fields not currently in the document, the update() (page 951) method adds the new fields to the document. The following operation queries the bios collection for the first document that has an _id field equal to 3 and adds to that document a new mbranch field and a new aka field in the subdocument name:
db.bios.update(
   { _id: 3 },
   { $set: {
        mbranch: 'Navy',
        'name.aka': 'Amazing Grace'
     }
   }
)

5 This example uses the interface added in MongoDB 2.2 to specify the multi and the upsert options in a document form. Prior to version 2.2, in the mongo (page 1040) shell, you would specify the upsert and the multi options in the update() (page 951) method as positional boolean options. See update() (page 951) for details.
Remove a Field from a Document

If the <update> argument contains the $unset (page 792) operator, the update() (page 951) method removes the field from the document. The following operation queries the bios collection for the first document that has an _id field equal to 3 and removes the birth field from the document:
db.bios.update( { _id: 3 }, { $unset: { birth: 1 } } )
Update Arrays
Update an Element by Specifying Its Position

If the update operation requires an update of an element in an array field, the update() (page 951) method can perform the update using the position of the element and dot notation. Arrays in MongoDB are zero-based. The following operation queries the bios collection for the first document with _id field equal to 1 and updates the second element in the contribs array:
db.bios.update(
   { _id: 1 },
   { $set: { 'contribs.1': 'ALGOL 58' } }
)
Update an Element without Specifying Its Position

The update() (page 951) method can perform the update using the $ (page 792) positional operator if the position is not known. The array field must appear in the query argument in order to determine which array element to update. The following operation queries the bios collection for the first document where the _id field equals 3 and the contribs array contains an element equal to 'compiler'. If found, the update() (page 951) method updates the first matching element in the array to 'A compiler' in the document:
db.bios.update(
   { _id: 3, contribs: 'compiler' },
   { $set: { 'contribs.$': 'A compiler' } }
)
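The effect of the positional $ operator can be sketched in JavaScript (an array-level illustration, not the server's implementation): only the first element that matched the query condition is updated.

```javascript
// Update the first array element matching a condition, as the
// positional $ operator does for the first query match.
function updateFirstMatch(arr, predicate, newValue) {
  for (var i = 0; i < arr.length; i++) {
    if (predicate(arr[i])) {
      arr[i] = newValue;
      break; // only the first match is updated
    }
  }
  return arr;
}

var contribs = ["UNIVAC", "compiler", "FLOW-MATIC", "COBOL"];
updateFirstMatch(contribs, function (c) { return c === "compiler"; },
                 "A compiler");
console.log(contribs); // [ 'UNIVAC', 'A compiler', 'FLOW-MATIC', 'COBOL' ]
```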
Update a Document Element without Specifying Its Position

The update() (page 951) method can perform the update of an array that contains subdocuments by using the positional operator (i.e. $ (page 792)) and the dot notation. The following operation queries the bios collection for the first document where the _id field equals 6 and the awards array contains a subdocument element with the by field equal to 'ACM'. If found, the update() (page 951) method updates the by field in the first matching subdocument:
db.bios.update(
   { _id: 6, 'awards.by': 'ACM' },
   { $set: { 'awards.$.by': 'Association for Computing Machinery' } }
)
Add an Element to an Array

The following operation queries the bios collection for the first document that has an _id field equal to 1 and adds a new element to the awards field:
db.bios.update(
   { _id: 1 },
   { $push: { awards: { award: 'IBM Fellow', year: 1963, by: 'IBM' } } }
)
Update Multiple Documents

If the <options> argument contains the multi option set to true or 1, the update() (page 951) method updates all documents that match the query. The following operation queries the bios collection for all documents where the awards field contains a subdocument element with the award field equal to 'Turing' and sets the turing field to true in the matching documents 6:
db.bios.update(
   { 'awards.award': 'Turing' },
   { $set: { turing: true } },
   { multi: true }
)
Replace Existing Document with New Document

If the <update> argument contains only field and value pairs, the update() (page 951) method replaces the existing document with the document in the <update> argument, except for the _id field. The following operation queries the bios collection for the first document that has a name field equal to { first: 'John', last: 'McCarthy' } and replaces all but the _id field in the document with the fields in the <update> argument:
db.bios.update(
   { name: { first: 'John', last: 'McCarthy' } },
   {
     name: { first: 'Ken', last: 'Iverson' },
     born: new Date('Dec 17, 1941'),
     died: new Date('Oct 19, 2004'),
     contribs: [ 'APL', 'J' ],
     awards: [
        { award: 'Turing Award', year: 1979, by: 'ACM' },
        { award: 'Harry H. Goode Memorial Award', year: 1975, by: 'IEEE Computer Society' }
     ]
   }
)
6 Prior to version 2.2, in the mongo (page 1040) shell, you would specify the upsert and the multi options in the update() (page 951) method as positional boolean options. See update() (page 951) for details.
23.3.3 update() Operations with the upsert Flag

See also Update Operations with the Upsert Flag (page 209) in the Create (page 203) document.
23.3.4 Save
The save() (page 949) method performs a special type of update() (page 951), depending on the _id field of the specified document. The save() (page 949) method has the following syntax:
db.collection.save( <document> )
7 Prior to version 2.2, in the mongo (page 1040) shell, you would specify the upsert and the multi options in the update() (page 951) method as positional boolean options. See update() (page 951) for details.

8 If the <update> argument includes only field and value pairs, the new document contains the fields and values specified in the <update> argument. If the <update> argument includes only update operators (page 228), the new document contains the fields and values from the <query> argument with the operations from the <update> argument applied.
Behavior

If you specify a document with an _id field, save() (page 949) performs an update() (page 951) with the upsert option set: if an existing document in the collection has the same _id, save() (page 949) updates that document, and inserts the document otherwise. If you do not specify a document with an _id field, save() (page 949) performs an insert() (page 939) operation. That is, the save() (page 949) method is equivalent to the update() (page 951) method with the upsert option and a <query> argument with an _id field.

Example

Consider the following pseudocode explanation of save() (page 949) as an illustration of its behavior:
function save( doc ) { if( doc["_id"] ) { update( {_id: doc["_id"] }, doc, { upsert: true } ); } else { insert(doc); } }
Save Performs an Update

If the <document> argument contains an _id field that exists in the collection, the save() (page 949) method performs an update that replaces the existing document with the <document> argument. The following operation queries the bios collection for a document where the _id equals ObjectId("507c4e138fada716c89d0014") and replaces the document with the <document> argument:
db.bios.save(
   {
      _id: ObjectId("507c4e138fada716c89d0014"),
      name: { first: 'Martin', last: 'Odersky' },
      contribs: [ 'Scala' ]
   }
)
See also: Insert a Document with save() (page 208) and Update operations with save() (page 211) in the Create (page 203) section.
23.3.5 Update Operators

Fields

$inc
    Increments the value of the field by the specified amount.
$rename
    Renames a field.
$setOnInsert
    Sets the value of a field upon document creation during an upsert. Has no effect on update operations that modify existing documents.
$set (page 792)
    Sets the value of a field in an existing document.
$unset (page 792)
    Removes the specified field from an existing document.

Array

Operators

$ (page 792)
    Acts as a placeholder to update the first element that matches the query condition in an update.
$addToSet (page 793)
    Adds elements to an existing array only if they do not already exist in the set.
$pop (page 794)
    Removes the first or last item of an array.
$pullAll (page 794)
    Removes multiple values from an array.
$pull (page 795)
    Removes items from an array that match a query statement.
$pushAll (page 795)
    Deprecated. Adds several items to an array.
$push (page 796)
    Adds an item to an array.

Modifiers

$each (page 797)
    Modifies the $push (page 796) and $addToSet (page 793) operators to append multiple items for array updates.
$slice (page 797)
    Modifies the $push (page 796) operator to limit the size of updated arrays.
$sort (page 798)
    Modifies the $push (page 796) operator to reorder documents stored in an array.

Bitwise

$bit (page 799)
    Performs bitwise AND and OR updates of integer values.

Isolation

$isolated (page 800)
    Modifies behavior of multi-updates to improve the isolation of the operation.
23.4 Delete
Of the four basic database operations (i.e. CRUD), delete operations are those that remove documents from a collection in MongoDB. For general information about write operations and the factors that affect their performance, see Write Operations (page 181); for documentation of other CRUD operations, see the Core MongoDB Operations (CRUD) (page 167) page.

- Overview (page 229)
- Remove All Documents that Match a Condition (page 230)
- Remove a Single Document that Matches a Condition (page 230)
- Remove All Documents from a Collection (page 230)
- Capped Collection (page 230)
- Isolation (page 230)
23.4.1 Overview
The remove() (page 229) method in the mongo (page 1040) shell provides this operation, as do corresponding methods in the drivers (page 559).

Note: As of these driver versions (page 1189), all write operations will issue a getLastError (page 834) command to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern (page 182) in the Write Operations (page 181) document for more information. Use the remove() (page 948) method to delete documents from a collection. The remove() (page 948) method has the following syntax:
db.collection.remove( <query>, <justOne> )
Corresponding operation in SQL

The remove() (page 948) method is analogous to the DELETE statement, and:

- the <query> argument corresponds to the WHERE statement, and
- the <justOne> argument takes a Boolean and has the same effect as LIMIT 1.

remove() (page 948) deletes documents from the collection. If you do not specify a query, remove() (page 948) removes all documents from a collection, but does not remove the indexes. 9
9 To remove all documents from a collection, it may be more efficient to use the drop() (page 925) method to drop the entire collection, including the indexes, and then recreate the collection and rebuild the indexes.
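The <justOne> behavior can be sketched in JavaScript (a simplified in-memory illustration, not the server's delete path):

```javascript
// remove() deletes every matching document unless justOne is true,
// in which case only the first match is deleted. Returns the
// documents that remain.
function remove(docs, predicate, justOne) {
  var kept = [];
  var removed = 0;
  docs.forEach(function (doc) {
    if (predicate(doc) && (!justOne || removed === 0)) {
      removed++;
    } else {
      kept.push(doc);
    }
  });
  return kept;
}

var docs = [{ x: 1 }, { x: 1 }, { x: 2 }];
console.log(remove(docs, function (d) { return d.x === 1; }, true));
// [ { x: 1 }, { x: 2 } ]
console.log(remove(docs, function (d) { return d.x === 1; }));
// [ { x: 2 } ]
```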
Note: For large deletion operations, it may be more efficient to copy the documents that you want to keep to a new collection and then use drop() (page 925) on the original collection.
Note: This operation is not equivalent to the drop() (page 925) method.
23.4.6 Isolation
If the <query> argument to the remove() (page 948) method matches multiple documents in the collection, the delete operation may interleave with other write operations to that collection. For an unsharded collection, you have the option to override this behavior with the $isolated (page 800) isolation operator, effectively isolating the delete operation from other write operations. To isolate the operation, include $isolated: 1 in the <query> parameter as in the following example:
db.bios.remove( { turing: true, $isolated: 1 } )
Part V
Data Modeling
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Although you may be able to use different structures for a single data set in MongoDB, different data models may have significant impacts on MongoDB and application performance. Consider Data Modeling Considerations for MongoDB Applications (page 235) for a conceptual overview of data modeling problems in MongoDB, and the Data Modeling Patterns (page 241) documents for examples of different approaches to data models.

See also:

Use Cases (page 611) for overviews of application design, including data models, with MongoDB.
CHAPTER 24
Background
Atomicity MongoDB only provides atomic operations on the level of a single document. 1 As a result, the need for atomic operations influences decisions about using embedded or referenced relationships when modeling data for MongoDB. Embed fields that need to be modified together atomically in the same document. See Model Data for Atomic Operations (page 245) for an example of atomic updates within a single document.
If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs.dev and logs.debug. The logs.dev collection would contain only the documents related to the dev environment. Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing. When using models that have a large number of collections, consider the following behaviors: Each collection has a certain minimum overhead of a few kilobytes.
1 Document-level atomic operations include all operations within a single MongoDB document record: operations that affect multiple subdocuments within that single record are still atomic.
Each index, including the index on _id, requires at least 8KB of data space. A single <database>.ns file stores all meta-data for each database. Each index and collection has its own entry in the namespace file, and MongoDB places limits on the size of namespace files (page 1109). Because of limits on namespaces (page 1109), you may wish to know the current number of namespaces in order to determine how many additional namespaces the database can support, as in the following example:
db.system.namespaces.count()
The <database>.ns file defaults to 16 MB. To change the size of the <database>.ns file, pass a new size to the --nssize <new size MB> option (page 1030) on server start. --nssize (page 1030) sets the size for new <database>.ns files. For existing databases, after starting up the server with --nssize (page 1030), run the db.repairDatabase() (page 987) command from the mongo (page 1040) shell.
Indexes Create indexes to support common queries. Generally, indexes and index use in MongoDB correspond to indexes and index use in relational databases: build indexes on fields that appear often in queries and for all operations that return sorted results. MongoDB automatically creates a unique index on the _id field. As you create indexes, consider the following behaviors of indexes: Each index requires at least 8KB of data space. Adding an index has some negative performance impact for write operations. For collections with a high write-to-read ratio, indexes are expensive as each insert must add keys to each index. Collections with a high proportion of read operations to write operations often benefit from additional indexes. Indexes do not affect un-indexed read operations. See Indexing Strategies (page 343) for more information on determining indexes. Additionally, the MongoDB database profiler (page 101) may help identify inefficient queries.
Sharding Sharding allows users to partition a collection within a database to distribute the collection's documents across a number of mongod (page 1025) instances or shards. The shard key determines how MongoDB distributes data among shards in a sharded collection. Selecting the proper shard key (page 487) has significant implications for performance. See Sharded Cluster Overview (page 487) for more information on sharding and the selection of the shard key (page 487).
Document Growth Certain updates to documents can increase the document size, such as pushing elements to an array and adding new fields.
If the document size exceeds the allocated space for that document, MongoDB relocates the document on disk. This internal relocation can be both time and resource consuming. Although MongoDB automatically provides padding to minimize the occurrence of relocations, you may still need to manually handle document growth. Refer to Pre-Aggregated Reports (page 623) for an example of the pre-allocation approach to handling document growth.
CHAPTER 25
Data Modeling Patterns
25.1.2 Pattern
Consider the following example that maps patron and address relationships. The example illustrates the advantage of embedding over referencing if you need to view one data entity in context of the other. In this one-to-one relationship between patron and address data, the address belongs to the patron. In the normalized data model, the address contains a reference to the parent.
{
   _id: "joe",
   name: "Joe Bookreader"
}

{
   patron_id: "joe",
   street: "123 Fake Street",
   city: "Faketon",
   state: "MA",
   zip: 12345
}
If the address data is frequently retrieved with the name information, then with referencing, your application needs to issue multiple queries to resolve the reference. The better data model would be to embed the address data in the patron data, as in the following document:
{
   _id: "joe",
   name: "Joe Bookreader",
   address: {
      street: "123 Fake Street",
      city: "Faketon",
      state: "MA",
      zip: 12345
   }
}
With the embedded data model, your application can retrieve the complete patron information with one query.
25.2.2 Pattern
Consider the following example that maps patron and multiple address relationships. The example illustrates the advantage of embedding over referencing if you need to view many data entities in context of another. In this one-to-many relationship between patron and address data, the patron has multiple address entities. In the normalized data model, the address contains a reference to the parent.
{
   _id: "joe",
   name: "Joe Bookreader"
}

{
   patron_id: "joe",
   street: "123 Fake Street",
   city: "Faketon",
   state: "MA",
   zip: 12345
}

{
   patron_id: "joe",
   street: "1 Some Other Street",
   city: "Boston",
   state: "MA",
   zip: 12345
}
If your application frequently retrieves the address data with the name information, then your application needs to issue multiple queries to resolve the references. A better schema would be to embed the address data entities in the patron data, as in the following document:
{
   _id: "joe",
   name: "Joe Bookreader",
   addresses: [
      {
         street: "123 Fake Street",
         city: "Faketon",
         state: "MA",
         zip: 12345
      },
      {
         street: "1 Some Other Street",
         city: "Boston",
         state: "MA",
         zip: 12345
      }
   ]
}
With the embedded data model, your application can retrieve the complete patron information with one query.
25.3.2 Pattern
Consider the following example that maps publisher and book relationships. The example illustrates the advantage of referencing over embedding to avoid repetition of the publisher information. Embedding the publisher document inside the book document would lead to repetition of the publisher data, as the following documents show:
{
   title: "MongoDB: The Definitive Guide",
   author: [ "Kristina Chodorow", "Mike Dirolf" ],
   published_date: ISODate("2010-09-24"),
   pages: 216,
   language: "English",
   publisher: {
      name: "O'Reilly Media",
      founded: 1980,
      location: "CA"
   }
}
{
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English",
   publisher: {
      name: "O'Reilly Media",
      founded: 1980,
      location: "CA"
   }
}
To avoid repetition of the publisher data, use references and keep the publisher information in a separate collection from the book collection. When using references, the growth of the relationships determines where to store the reference. If the number of books per publisher is small with limited growth, storing the book references inside the publisher document may sometimes be useful. Otherwise, if the number of books per publisher is unbounded, this data model would lead to mutable, growing arrays, as in the following example:
{
   name: "O'Reilly Media",
   founded: 1980,
   location: "CA",
   books: [123456789, 234567890, ...]
}

{
   _id: 123456789,
   title: "MongoDB: The Definitive Guide",
   author: [ "Kristina Chodorow", "Mike Dirolf" ],
   published_date: ISODate("2010-09-24"),
   pages: 216,
   language: "English"
}

{
   _id: 234567890,
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English"
}
To avoid mutable, growing arrays, store the publisher reference inside the book document:
{
   _id: "oreilly",
   name: "O'Reilly Media",
   founded: 1980,
   location: "CA"
}

{
   _id: 123456789,
   title: "MongoDB: The Definitive Guide",
   author: [ "Kristina Chodorow", "Mike Dirolf" ],
   published_date: ISODate("2010-09-24"),
   pages: 216,
   language: "English",
   publisher_id: "oreilly"
}

{
   _id: 234567890,
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English",
   publisher_id: "oreilly"
}
You can use the db.collection.findAndModify() (page 929) method to atomically determine if a book is available for checkout and update it with the new checkout information. Embedding the available field and the checkout field within the same document ensures that the updates to these fields are in sync:
db.books.findAndModify( {
   query: {
      _id: 123456789,
      available: { $gt: 0 }
   },
   update: {
      $inc: { available: -1 },
      $push: { checkout: { by: "abc", date: new Date() } }
   }
} )
25.5.2 Pattern
The Parent References pattern stores each tree node in a document; in addition to the tree node, the document stores the id of the node's parent. Consider the following example that models a tree of categories using Parent References:
db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "Postgres", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
You can create an index on the field parent to enable fast search by the parent node:
db.categories.ensureIndex( { parent: 1 } )
You can query by the parent field to find its immediate children nodes:
db.categories.find( { parent: "Databases" } )
The Parent Links pattern provides a simple solution to tree storage, but requires multiple queries to retrieve subtrees.
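To make that multiple-query cost concrete, the following plain-JavaScript sketch (not mongo shell code) simulates the collection above as an in-memory array; the `descendants` helper is a hypothetical illustration showing that each level of the tree requires another `find()` on the parent field.

```javascript
// Stand-in for the categories collection defined above.
var categories = [
  { _id: "MongoDB",     parent: "Databases" },
  { _id: "Postgres",    parent: "Databases" },
  { _id: "Databases",   parent: "Programming" },
  { _id: "Languages",   parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books",       parent: null }
];

// One "query" per tree level: find the children of a node, then recurse.
function descendants(node) {
  var children = categories
    .filter(function (doc) { return doc.parent === node; })
    .map(function (doc) { return doc._id; });
  return children.reduce(function (acc, child) {
    return acc.concat(child, descendants(child));
  }, []);
}

descendants("Programming");
// depth-first: [ "Databases", "MongoDB", "Postgres", "Languages" ]
```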
25.6.2 Pattern
The Child References pattern stores each tree node in a document; in addition to the tree node, the document stores in an array the id(s) of the node's children.
Consider the following example that models a tree of categories using Child References:
db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "Postgres", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "Postgres" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
The query to retrieve the immediate children of a node is fast and straightforward:
db.categories.findOne( { _id: "Databases" } ).children
You can create an index on the field children to enable fast search by the child nodes:
db.categories.ensureIndex( { children: 1 } )
You can query for a node in the children field to find its parent node as well as its siblings:
db.categories.find( { children: "MongoDB" } )
The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple parents.
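As a quick plain-JavaScript illustration (not mongo shell code): matching a value against the children array is exactly how the parent lookup above works, and because any document containing the value matches, this also naturally supports graphs where a node has multiple parents.

```javascript
// Stand-in for the categories collection defined above.
var categories = [
  { _id: "MongoDB",     children: [] },
  { _id: "Postgres",    children: [] },
  { _id: "Databases",   children: [ "MongoDB", "Postgres" ] },
  { _id: "Languages",   children: [] },
  { _id: "Programming", children: [ "Databases", "Languages" ] },
  { _id: "Books",       children: [ "Programming" ] }
];

// Equivalent of db.categories.find( { children: "MongoDB" } ):
// match any document whose children array contains the value.
function parentsOf(node) {
  return categories
    .filter(function (doc) { return doc.children.indexOf(node) !== -1; })
    .map(function (doc) { return doc._id; });
}

parentsOf("MongoDB");   // [ "Databases" ]
```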
25.7.2 Pattern
The Array of Ancestors pattern stores each tree node in a document; in addition to the tree node, the document stores in an array the id(s) of the node's ancestors or path. Consider the following example that models a tree of categories using Array of Ancestors:
db.categories.insert( { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Postgres", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Programming", ancestors: [ "Books" ], parent: "Books" } )
db.categories.insert( { _id: "Books", ancestors: [ ], parent: null } )
The query to retrieve the ancestors or path of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).ancestors
You can create an index on the field ancestors to enable fast search by the ancestor nodes:
db.categories.ensureIndex( { ancestors: 1 } )
The Array of Ancestors pattern provides a fast and efficient solution to find the descendants and the ancestors of a node by creating an index on the elements of the ancestors field. This makes Array of Ancestors a good choice for working with subtrees. The Array of Ancestors pattern is slightly slower than the Materialized Paths pattern but is more straightforward to use.
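A small plain-JavaScript sketch (not mongo shell code) of why a single query suffices for whole subtrees with this pattern: matching a node against the ancestors array returns every descendant at once.

```javascript
// Stand-in for the categories collection defined above.
var categories = [
  { _id: "MongoDB",     ancestors: [ "Books", "Programming", "Databases" ] },
  { _id: "Postgres",    ancestors: [ "Books", "Programming", "Databases" ] },
  { _id: "Databases",   ancestors: [ "Books", "Programming" ] },
  { _id: "Languages",   ancestors: [ "Books", "Programming" ] },
  { _id: "Programming", ancestors: [ "Books" ] },
  { _id: "Books",       ancestors: [] }
];

// Equivalent of db.categories.find( { ancestors: node } ):
// one query returns the entire subtree under the node.
function descendantsOf(node) {
  return categories
    .filter(function (doc) { return doc.ancestors.indexOf(node) !== -1; })
    .map(function (doc) { return doc._id; });
}

descendantsOf("Programming");
// [ "MongoDB", "Postgres", "Databases", "Languages" ]
```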
25.8.2 Pattern
The Materialized Paths pattern stores each tree node in a document; in addition to the tree node, the document stores as a string the id(s) of the node's ancestors or path. Although the Materialized Paths pattern requires additional steps of working with strings and regular expressions, the pattern also provides more flexibility in working with the path, such as finding nodes by partial paths. Consider the following example that models a tree of categories using Materialized Paths; the path string uses the comma , as a delimiter:
db.categories.insert( { _id: "Books", path: null } )
db.categories.insert( { _id: "Programming", path: ",Books," } )
db.categories.insert( { _id: "Databases", path: ",Books,Programming," } )
db.categories.insert( { _id: "Languages", path: ",Books,Programming," } )
db.categories.insert( { _id: "MongoDB", path: ",Books,Programming,Databases," } )
db.categories.insert( { _id: "Postgres", path: ",Books,Programming,Databases," } )
You can query to retrieve the whole tree, sorting by the path:
db.categories.find().sort( { path: 1 } )
You can use regular expressions on the path field to find the descendants of Programming:
db.categories.find( { path: /,Programming,/ } )
You can also retrieve the descendants of Books, where Books is at the topmost level of the hierarchy:
db.categories.find( { path: /^,Books,/ } )
You can create an index on the field path to enable fast search:

db.categories.ensureIndex( { path: 1 } )
This index may improve performance, depending on the query: For queries of the Books sub-tree (e.g. /^,Books,/), an index on the path field improves the query performance significantly. For queries of the Programming sub-tree (e.g. /,Programming,/), or similar queries of sub-trees where the node might be in the middle of the indexed string, the query must inspect the entire index. For these queries an index may provide some performance improvement if the index is significantly smaller than the entire collection.
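The regular-expression behavior described above is easy to check in plain JavaScript (this is an illustrative sketch, not mongo shell code): an anchored pattern selects a whole top-level subtree, while an unanchored pattern matches the node anywhere in the path.

```javascript
// Stand-in for the categories collection defined above.
var categories = [
  { _id: "Books",       path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Databases",   path: ",Books,Programming," },
  { _id: "Languages",   path: ",Books,Programming," },
  { _id: "MongoDB",     path: ",Books,Programming,Databases," },
  { _id: "Postgres",    path: ",Books,Programming,Databases," }
];

// Equivalent of db.categories.find( { path: re } ) for a regex query.
function matchPath(re) {
  return categories
    .filter(function (doc) { return doc.path !== null && re.test(doc.path); })
    .map(function (doc) { return doc._id; });
}

matchPath(/,Programming,/);  // descendants of Programming
matchPath(/^,Books,/);       // descendants of the top-level Books node
```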
25.9.2 Pattern
The Nested Sets pattern identifies each node in the tree as stops in a round-trip traversal of the tree. The application visits each node in the tree twice; first during the initial trip, and second during the return trip. The Nested Sets pattern stores each tree node in a document; in addition to the tree node, the document stores the id of the node's parent, the node's initial stop in the left field, and its return stop in the right field. Consider the following example that models a tree of categories using Nested Sets:
db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "Postgres", parent: "Databases", left: 8, right: 9 } )
The Nested Sets pattern provides a fast and efficient solution for finding subtrees but is inefficient for modifying the tree structure. As such, this pattern is best for static trees that do not change.
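The subtree lookup the pattern enables can be sketched in plain JavaScript (an illustration, not mongo shell code): a node's descendants are exactly the documents whose left and right values fall strictly inside the node's own interval.

```javascript
// Stand-in for the categories collection defined above.
var categories = [
  { _id: "Books",       left: 1,  right: 12 },
  { _id: "Programming", left: 2,  right: 11 },
  { _id: "Languages",   left: 3,  right: 4 },
  { _id: "Databases",   left: 5,  right: 10 },
  { _id: "MongoDB",     left: 6,  right: 7 },
  { _id: "Postgres",    left: 8,  right: 9 }
];

// Sketch of: var n = db.categories.findOne( { _id: node } );
//            db.categories.find( { left: { $gt: n.left },
//                                  right: { $lt: n.right } } )
function subtree(node) {
  var n = categories.filter(function (d) { return d._id === node; })[0];
  return categories
    .filter(function (d) { return d.left > n.left && d.right < n.right; })
    .map(function (d) { return d._id; });
}

subtree("Databases");   // [ "MongoDB", "Postgres" ]
```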
In 2.4, MongoDB provides a text search feature. See Text Search (page 373) for more information. If your application needs to perform queries on the content of a field that holds text, you can perform exact matches on the text or use $regex (page 778) for regular expression pattern matches. However, for many operations on text, these methods do not satisfy application requirements. This pattern describes one method for supporting keyword search with MongoDB: store keywords in an array in the same document as the text field. Combined with a multi-key index (page 334), this pattern can support an application's keyword search operations.
25.10.1 Pattern
To add structures to your document to support keyword-based queries, create an array field in your documents and add the keywords as strings in the array. You can then create a multi-key index (page 334) on the array and create queries that select values from the array. Example: Suppose you have a collection of library volumes for which you want to provide topic-based search. For each volume, you add the array topics, and you add as many keywords as needed for a given volume. For the Moby-Dick volume you might have the following document:
{
  title : "Moby-Dick" ,
  author : "Herman Melville" ,
  published : 1851 ,
  ISBN : 0451526996 ,
  topics : [ "whaling" , "allegory" , "revenge" , "American" ,
             "novel" , "nautical" , "voyage" , "Cape Cod" ]
}
The multi-key index creates separate index entries for each keyword in the topics array. For example, the index contains one entry for whaling and another for allegory. You then query based on the keywords. For example:
db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
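The matching behavior of that multi-key query can be simulated in plain JavaScript (the second volume below is a hypothetical document added purely for illustration):

```javascript
var volumes = [
  { title: "Moby-Dick", author: "Herman Melville", published: 1851,
    topics: [ "whaling", "allegory", "revenge", "American",
              "novel", "nautical", "voyage", "Cape Cod" ] },
  // Hypothetical second volume, for illustration only.
  { title: "Two Years Before the Mast", author: "Richard Henry Dana",
    published: 1840,
    topics: [ "sailing", "memoir", "American", "nautical", "voyage" ] }
];

// Equivalent of db.volumes.find( { topics: keyword } ):
// a document matches if its topics array contains the keyword.
function byTopic(keyword) {
  return volumes
    .filter(function (v) { return v.topics.indexOf(keyword) !== -1; })
    .map(function (v) { return v.title; });
}

byTopic("whaling");   // [ "Moby-Dick" ]
```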
Note: An array with a large number of elements, such as one with several hundred or several thousand keywords, will incur greater indexing costs on insertion.
Asynchronous Indexing. MongoDB builds indexes synchronously, which means that the indexes used for keyword search are always current and can operate in real time. However, asynchronous bulk indexes may be more efficient for some kinds of content and workloads.
Part VI
Aggregation
In version 2.2, MongoDB introduced the aggregation framework (page 257), which provides a powerful and flexible set of tools for many data aggregation tasks. If you're familiar with data aggregation in SQL, consider the SQL to Aggregation Framework Mapping Chart (page 309) document as an introduction to some of the basic concepts in the aggregation framework. Consider the full documentation of the aggregation framework and other data aggregation tools for MongoDB here:
CHAPTER 26
Aggregation Framework
26.1 Overview
The MongoDB aggregation framework provides a means to calculate aggregated values without having to use mapreduce. While map-reduce is powerful, it is often more difficult than necessary for many simple aggregation tasks, such as totaling or averaging field values. If you're familiar with SQL, the aggregation framework provides similar functionality to GROUP BY and related SQL operators as well as simple forms of self joins. Additionally, the aggregation framework provides projection capabilities to reshape the returned data. Using the projections in the aggregation framework, you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results. See also: Consider Aggregation Framework Examples (page 263) and Aggregation Framework Reference (page 273) for more documentation.
26.2.1 Pipelines
Conceptually, documents from a collection pass through an aggregation pipeline, which transforms these objects as they pass through. For those familiar with UNIX-like shells (e.g. bash), the concept is analogous to the pipe (i.e. |) used to string text filters together. In a shell environment the pipe redirects a stream of characters from the output of one process to the input of the next. The MongoDB aggregation pipeline streams MongoDB documents from one pipeline operator (page 293) to the next to process the documents. Pipeline operators can be repeated in the pipe. All pipeline operators process a stream of documents and the pipeline behaves as if the operation scans a collection and passes all matching documents into the top of the pipeline. Each operator in the pipeline transforms each document as it passes through the pipeline. Note: Pipeline operators need not produce one output document for every input document: operators may also
generate new documents or filter out documents. Warning: The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope. See also: The Aggregation Framework Reference (page 273) includes documentation of the following pipeline operators: $project (page 287) $match (page 283) $limit (page 282) $skip (page 289) $unwind (page 292) $group (page 280) $sort (page 289) $geoNear (page 278)
26.2.2 Expressions
Expressions (page 302) produce output documents based on calculations performed on input documents. The aggregation framework defines expressions using a document format using prefixes. Expressions are stateless and are only evaluated when seen by the aggregation process. All aggregation expressions can only operate on the current document in the pipeline, and cannot integrate data from other documents. The accumulator expressions used in the $group (page 280) operator maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline. See also: Aggregation expressions (page 302) for additional examples of the expressions provided by the aggregation framework.
26.3 Use
26.3.1 Invocation
Invoke an aggregation operation with the aggregate() (page 922) wrapper in the mongo (page 1040) shell or the aggregate (page 812) database command. Always call aggregate() (page 922) on a collection object that determines the input documents of the aggregation pipeline. The arguments to the aggregate() (page 922) method specify a sequence of pipeline operators (page 293), where each operator may have a number of operands. First, consider a collection of documents named articles using the following format:
{
  title : "this is my title" ,
  author : "bob" ,
  posted : new Date () ,
  pageViews : 5 ,
  tags : [ "fun" , "good" , "fun" ] ,
  comments : [
    { author : "joe" , text : "this is cool" } ,
    { author : "sam" , text : "this is bad" }
  ],
  other : { foo : 5 }
}
The following example aggregation operation pivots data to create a set of author names grouped by tags applied to an article. Call the aggregation framework by issuing the following command:
db.articles.aggregate(
  { $project : {
     author : 1,
     tags : 1
  } },
  { $unwind : "$tags" },
  { $group : {
     _id : { tags : "$tags" },
     authors : { $addToSet : "$author" }
  } }
);
The aggregation pipeline begins with the collection articles and selects the author and tags fields using the $project (page 287) aggregation operator. The $unwind (page 292) operator produces one output document per tag. Finally, the $group (page 280) operator pivots these fields.
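To make the pivot concrete, here is a hand-rolled plain-JavaScript version of the same three stages, run on two illustrative article documents (the field values are hypothetical):

```javascript
var articles = [
  { author: "bob", tags: [ "fun", "good", "fun" ] },
  { author: "joe", tags: [ "fun" ] }
];

var authorsByTag = {};
articles.forEach(function (a) {          // $project kept only author and tags
  a.tags.forEach(function (t) {          // $unwind: one document per tag
    authorsByTag[t] = authorsByTag[t] || [];
    if (authorsByTag[t].indexOf(a.author) === -1) {
      authorsByTag[t].push(a.author);    // $addToSet: no duplicate authors
    }
  });
});

// authorsByTag is { fun: [ "bob", "joe" ], good: [ "bob" ] }
```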
26.3.2 Result
The aggregation operation in the previous section returns a document with two fields: result, which holds an array of documents returned by the pipeline, and ok, which holds the value 1, indicating success. Changed in version 2.4: If an error occurs, the aggregate() (page 922) helper throws an exception. In previous versions, the helper returned a document with the error message and code, and an ok status field not equal to 1, same as the aggregate (page 812) command. As a document, the result is subject to the BSON Document size (page 1109) limit, which is currently 16 megabytes.
$skip (page 289). The above operators can also use an index when placed before the following aggregation operators: $project (page 287), $unwind (page 292), and $group (page 280). New in version 2.4: The $geoNear (page 278) pipeline operator takes advantage of a geospatial index. When using $geoNear (page 278), the $geoNear (page 278) pipeline operation must appear as the first stage in an aggregation pipeline.
During the optimization phase, the optimizer transforms the sequence to the following:
{ $sort: { age : -1 } },
{ $limit: 15 },
{ $skip: 10 }
Note: The $limit (page 282) value has increased to the sum of the initial value and the $skip (page 289) value.
$limit + $skip + $limit + $skip Sequence Optimization When you have a continuous sequence of a $limit (page 282) pipeline stage followed by a $skip (page 289) pipeline stage, the aggregation attempts to re-arrange the pipeline stages to combine the limits together and the skips together. For example, if the pipeline consists of the following stages:
{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10 },
{ $skip: 2 }
During the intermediate step, the optimizer reverses each $skip (page 289) followed by a $limit (page 282) into a $limit (page 282) followed by a $skip (page 289):
{ $limit: 100 },
{ $limit: 15 },
{ $skip: 5 },
{ $skip: 2 }
The $limit (page 282) value has increased to the sum of the initial value and the $skip (page 289) value. Then, for the final $limit (page 282) value, the optimizer selects the minimum between the adjacent $limit (page 282) values. For the final $skip (page 289) value, the optimizer adds the adjacent $skip (page 289) values, to transform the sequence to the following:
{ $limit: 15 }, { $skip: 7 }
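The whole rule can be expressed as tracking the window of input positions that survives the sequence. The following plain-JavaScript sketch (not MongoDB's actual optimizer code) reproduces the transformation above:

```javascript
// Track the half-open window [start, end) of input documents that survive
// a sequence of $limit / $skip stages, then emit the coalesced pair.
function coalesceLimitSkip(stages) {
  var start = 0, end = Infinity;
  stages.forEach(function (stage) {
    if (stage.$limit !== undefined) {
      end = Math.min(end, start + stage.$limit);   // limit caps the window
    }
    if (stage.$skip !== undefined) {
      start = Math.min(start + stage.$skip, end);  // skip moves the start
    }
  });
  return [ { $limit: end }, { $skip: start } ];
}

coalesceLimitSkip([ { $limit: 100 }, { $skip: 5 },
                    { $limit: 10 },  { $skip: 2 } ]);
// → [ { $limit: 15 }, { $skip: 7 } ]
```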
The aggregation framework is compatible with sharded collections. When operating on a sharded collection, the aggregation pipeline is split into two parts. The aggregation framework pushes all of the operators up to the first $group (page 280) or $sort (page 289) operation to each shard. 1 Then, a second pipeline on the mongos (page 1036) runs. This pipeline consists of the first $group (page 280) or $sort (page 289) and any remaining pipeline operators, and runs on the results received from the shards. The $group (page 280) operator brings in any sub-totals from the shards and combines them: in some cases these may be structures. For example, the $avg (page 274) expression maintains a total and count for each shard; mongos (page 1036) combines these values and then divides.
26.6 Limitations
Aggregation operations with the aggregate (page 812) command have the following limitations: The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope. Output from the pipeline cannot exceed 16 megabytes; if your result set exceeds this limit, the aggregate (page 812) command produces an error. If any single aggregation operation consumes more than 10 percent of system RAM, the operation produces an error.
1 If an early $match (page 283) can exclude shards through the use of the shard key in the predicate, then these operators are only pushed to the relevant shards.
CHAPTER 27
Aggregation Framework Examples
MongoDB provides flexible data aggregation functionality with the aggregate (page 812) command. For additional information about aggregation, consider the following resources: Aggregation Framework (page 257), Aggregation Framework Reference (page 273), and SQL to Aggregation Framework Mapping Chart (page 309). This document provides a number of practical examples that display the capabilities of the aggregation framework. All examples use a publicly available data set of all zipcodes and populations in the United States.
27.1 Requirements
mongod (page 1025) and mongo (page 1040), version 2.2 or later.
The city field holds the city. The state field holds the two letter state abbreviation. The pop field holds the population. The loc field holds the location as a latitude longitude pair. All of the following examples use the aggregate() (page 922) helper in the mongo (page 1040) shell. aggregate() (page 922) provides a wrapper around the aggregate (page 812) database command. See the documentation for your driver (page 559) for a more idiomatic interface for data aggregation operations.
Aggregation operations using the aggregate() (page 922) helper process all documents in the zipcodes collection. aggregate() (page 922) connects a number of pipeline (page 257) operators, which define the aggregation process. In the above example, the pipeline passes all documents in the zipcodes collection through the following steps: the $group (page 280) operator collects all documents and creates documents for each state. These new per-state documents have one field in addition to the _id field: totalPop, which is a generated field that uses the $sum (page 291) operation to calculate the total value of all pop fields in the source documents. After the $group (page 280) operation, the documents in the pipeline resemble the following:
{ "_id" : "AK", "totalPop" : 550043 }
the $match (page 283) operation filters these documents so that the only documents that remain are those where the value of totalPop is greater than or equal to 10 million. The $match (page 283) operation does not alter the documents, which have the same format as the documents output by $group (page 280). The equivalent SQL for this operation is:
SELECT state, SUM(pop) AS pop FROM zips GROUP BY state HAVING pop > (10*1000*1000)
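The same $group-then-$match logic can be written out in plain JavaScript over a few illustrative (not real) zipcode documents:

```javascript
// Illustrative stand-ins for zipcode documents; the pop values are made up.
var zips = [
  { _id: "10001", state: "NY", pop: 9000000 },
  { _id: "10002", state: "NY", pop: 2000000 },
  { _id: "99501", state: "AK", pop: 550043 }
];

// $group with $sum: total the pop field per state.
var totals = {};
zips.forEach(function (z) {
  totals[z.state] = (totals[z.state] || 0) + z.pop;
});

// $match: keep only states with at least 10 million people.
var result = Object.keys(totals)
  .map(function (s) { return { _id: s, totalPop: totals[s] }; })
  .filter(function (d) { return d.totalPop >= 10 * 1000 * 1000; });

// result is [ { _id: "NY", totalPop: 11000000 } ]
```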
Aggregation operations using the aggregate() (page 922) helper process all documents in the zipcodes collection. aggregate() (page 922) connects a number of pipeline (page 257) operators that define the aggregation process. In the above example, the pipeline passes all documents in the zipcodes collection through the following steps: the $group (page 280) operator collects all documents and creates new documents for every combination of the city and state fields in the source document. After this stage in the pipeline, the documents resemble the following:
{ "_id" : { "state" : "CO", "city" : "EDGEWATER" }, "pop" : 13154 }
the second $group (page 280) operator collects documents by the state field and uses the $avg (page 274) expression to compute a value for the avgCityPop field. The final output of this aggregation operation is:
{ "_id" : "MN", "avgCityPop" : 5335 },
Aggregation operations using the aggregate() (page 922) helper process all documents in the zipcodes collection. aggregate() (page 922) connects a number of pipeline (page 257) operators that define the aggregation process. All documents from the zipcodes collection pass into the pipeline, which consists of the following steps:
the $group (page 280) operator collects all documents and creates new documents for every combination of the city and state fields in the source documents. By specifying the value of _id as a sub-document that contains both fields, the operation preserves the state field for use later in the pipeline. The documents produced by this stage of the pipeline have a second field, pop, which uses the $sum (page 291) operator to provide the total of the pop fields in the source documents. At this stage in the pipeline, the documents resemble the following:
{ "_id" : { "state" : "CO", "city" : "EDGEWATER" }, "pop" : 13154 }
the $sort (page 289) operator orders the documents in the pipeline based on the value of the pop field from largest to smallest. This operation does not alter the documents. the second $group (page 280) operator collects the documents in the pipeline by the state field, which is a field inside the nested _id document. Within each per-state document this $group (page 280) operator specifies four fields: Using the $last (page 282) expression, the $group (page 280) operator creates the biggestCity and biggestPop fields that store the city with the largest population and that population. Using the $first (page 278) expression, the $group (page 280) operator creates the smallestCity and smallestPop fields that store the city with the smallest population and that population. The documents, at this stage in the pipeline, resemble the following:
{ "_id" : "WA", "biggestCity" : "SEATTLE", "biggestPop" : 520096, "smallestCity" : "BENGE", "smallestPop" : 2 }
The final operation is $project (page 287), which renames the _id field to state and moves the biggestCity, biggestPop, smallestCity, and smallestPop into biggestCity and smallestCity sub-documents. The final output of this aggregation operation is:
{ "state" : "RI", "biggestCity" : { "name" : "CRANSTON", "pop" : 176404 }, "smallestCity" : { "name" : "CLAYVILLE", "pop" : 45 } }
All documents from the users collection pass through the pipeline, which consists of the following operations:
The $project (page 287) operator:
converts the value of the _id to upper case with the $toUpper (page 292) operator, and creates a new field, name, to hold this value.
suppresses the _id field. $project (page 287) will pass the _id field by default, unless explicitly suppressed.
The $sort (page 289) operator orders the results by the name field.
The results of the aggregation would resemble the following:
{ "name" : "JANE" }, { "name" : "JILL" }, { "name" : "JOE" }
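The projection and sort just described can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample users below are hypothetical stand-ins for documents in the users collection:

```javascript
// Plain-JS sketch of the pipeline above: project each user's _id to an
// upper-cased "name" field ($project with $toUpper), then sort by name.
const users = [ { _id: "jill" }, { _id: "joe" }, { _id: "jane" } ];

const result = users
  .map(u => ({ name: u._id.toUpperCase() }))   // $project + $toUpper
  .sort((a, b) => (a.name < b.name ? -1 : 1)); // $sort by name, ascending

console.log(result);
// [ { name: 'JANE' }, { name: 'JILL' }, { name: 'JOE' } ]
```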
The pipeline passes all documents in the users collection through the following operations:
The $project (page 287) operator:
Creates two new fields: month_joined and name.
Suppresses the _id field from the results. The aggregate() (page 922) method includes the _id field, unless explicitly suppressed.
The $month (page 286) operator converts the values of the joined field to integer representations of the month. Then the $project (page 287) operator assigns those values to the month_joined field.
The $sort (page 289) operator sorts the results by the month_joined field.
The operation returns results that resemble the following:
{ "month_joined" : 1, "name" : "ruth" }, { "month_joined" : 1, "name" : "harold" }, { "month_joined" : 1, "name" : "kate" } { "month_joined" : 2, "name" : "jill" }
{ $sort : { "_id.month_joined" : 1 } } ] )
The pipeline passes all documents in the users collection through the following operations:
The $project (page 287) operator creates a new field called month_joined. The $month (page 286) operator converts the values of the joined field to integer representations of the month. Then the $project (page 287) operator assigns the values to the month_joined field.
The $group (page 280) operator collects all documents with a given month_joined value and counts how many documents there are for that value. Specifically, for each unique value, $group (page 280) creates a new per-month document with two fields:
_id, which contains a nested document with the month_joined field and its value.
number, which is a generated field. The $sum (page 291) operator increments this field by 1 for every document containing the given month_joined value.
The $sort (page 289) operator sorts the documents created by $group (page 280) according to the contents of the month_joined field.
The result of this aggregation operation would resemble the following:
{ "_id" : { "month_joined" : 1 }, "number" : 3 }, { "_id" : { "month_joined" : 2 }, "number" : 9 }, { "_id" : { "month_joined" : 3 }, "number" : 5 }
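The grouping-and-counting step can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample input is hypothetical:

```javascript
// Plain-JS sketch of $group with { $sum: 1 }: one output document per
// distinct month_joined value, with number counting the documents.
const joined = [
  { month_joined: 1 }, { month_joined: 1 }, { month_joined: 1 },
  { month_joined: 2 }
];

const groups = new Map();
for (const doc of joined) {
  // $sum: 1 increments the counter for every document in the group
  groups.set(doc.month_joined, (groups.get(doc.month_joined) || 0) + 1);
}

const result = [...groups.entries()]
  .map(([m, n]) => ({ _id: { month_joined: m }, number: n }))
  .sort((a, b) => a._id.month_joined - b._id.month_joined);  // $sort
```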
The pipeline begins with all documents in the users collection, and passes these documents through the following operations: The $unwind (page 292) operator separates each value in the likes array, and creates a new version of the source document for every element in the array. Example Given the following document from the users collection:
{ _id : "jane", joined : ISODate("2011-03-02"), likes : ["golf", "racquetball"] }
The $unwind (page 292) operator would create the following documents:
{ _id : "jane", joined : ISODate("2011-03-02"), likes : "golf" } { _id : "jane", joined : ISODate("2011-03-02"), likes : "racquetball" }
The $group (page 280) operator collects all documents with the same value for the likes field and counts each grouping. With this information, $group (page 280) creates a new document with two fields:
_id, which contains the likes value.
number, which is a generated field. The $sum (page 291) operator increments this field by 1 for every document containing the given likes value.
The $sort (page 289) operator sorts these documents by the number field in reverse order.
The $limit (page 282) operator only includes the first 5 result documents.
The results of aggregation would resemble the following:
{ "_id" : "golf", "number" : 33 }, { "_id" : "racquetball", "number" : 31 }, { "_id" : "swimming", "number" : 24 }, { "_id" : "handball", "number" : 19 }, { "_id" : "tennis",
"number" : 18 }
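The whole pipeline can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample users are hypothetical:

```javascript
// Plain-JS sketch of the pipeline above: unwind each likes array, count
// documents per like ($group with $sum: 1), sort descending, keep at most 5.
const users = [
  { _id: "jane", likes: ["golf", "racquetball"] },
  { _id: "joe",  likes: ["tennis", "golf", "swimming"] },
  { _id: "pat",  likes: ["golf"] }
];

const unwound = users.flatMap(u =>
  u.likes.map(like => ({ ...u, likes: like })));          // $unwind

const counts = new Map();
for (const doc of unwound)                                 // $group / $sum: 1
  counts.set(doc.likes, (counts.get(doc.likes) || 0) + 1);

const result = [...counts.entries()]
  .map(([like, n]) => ({ _id: like, number: n }))
  .sort((a, b) => b.number - a.number)                     // $sort, descending
  .slice(0, 5);                                            // $limit: 5
```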
CHAPTER 28 Aggregation Framework Reference
New in version 2.1.0. The aggregation framework provides the ability to project, process, and/or control the output of the query, without using map-reduce. Aggregation uses a syntax that resembles the syntax and form of regular MongoDB database queries. These aggregation operations are all accessible by way of the aggregate() (page 922) method. While all examples in this document use this method, aggregate() (page 922) is merely a wrapper around the database command aggregate (page 812). The following prototype aggregation operations are equivalent:
db.people.aggregate( <pipeline> ) db.people.aggregate( [<pipeline>] ) db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )
These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for the aggregation pipeline definition. aggregate() (page 922) accepts the stages of the pipeline (i.e. <pipeline>) as an array, or as arguments to the method. This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior. See also: Aggregation Framework (page 257) overview, the Aggregation Framework Documentation Index (page 255), and the Aggregation Framework Examples (page 263) for more information on the aggregation functionality. Aggregation Operators: Pipeline (page 293) Expressions (page 302) $group Operators (page 302) Boolean Operators (page 304) Comparison Operators (page 305) Arithmetic Operators (page 306) String Operators (page 306) Date Operators (page 307) Conditional Expressions (page 308)
If an array element has a value of null or refers to a field that is missing, $concat (page 274) will return null.
Example Project new concatenated values. A collection menu contains documents that store information on menu items separately in the sec, the category, and the type fields of the item sub-document, as in the following:
{ _id: 1, item: { sec: "dessert", category: "pie", type: "apple" } }
{ _id: 2, item: { sec: "dessert", category: "pie", type: "cherry" } }
{ _id: 3, item: { sec: "main", category: "pie", type: "shepherds" } }
{ _id: 4, item: { sec: "main", category: "pie", type: "chicken pot" } }
The following operation uses $concat (page 274) to concatenate the type field from the sub-document item, a space, and the category field from the sub-document item to project a new food field:
db.menu.aggregate( { $project: { food: { $concat: [ "$item.type", " ", "$item.category" ] } } } )
The operation returns the following result set where the food eld contains the concatenated strings:
{ "result" : [
    { "_id" : 1, "food" : "apple pie" },
    { "_id" : 2, "food" : "cherry pie" },
    { "_id" : 3, "food" : "shepherds pie" },
    { "_id" : 4, "food" : "chicken pot pie" }
  ],
  "ok" : 1
}
Example Group by a concatenated string. A collection menu contains documents that store information on menu items separately in the sec, the category, and the type fields of the item sub-document, as in the following:
{ _id: 1, item: { sec: "dessert", category: "pie", type: "apple" } }
{ _id: 2, item: { sec: "dessert", category: "pie", type: "cherry" } }
{ _id: 3, item: { sec: "main", category: "pie", type: "shepherds" } }
{ _id: 4, item: { sec: "main", category: "pie", type: "chicken pot" } }
The following aggregation uses $concat (page 274) to concatenate the sec field from the sub-document item, the string ": ", and the category field from the sub-document item to group by the new concatenated string and perform a count:
db.menu.aggregate( { $group: { _id:
                                 { $concat: [ "$item.sec",
                                              ": ",
                                              "$item.category"
                                            ]
                                 },
                               count: { $sum: 1 }
                             }
                   }
                 )
Example Concatenate null or missing values. A collection menu contains documents that store information on menu items separately in the sec, the category, and the type fields of the item sub-document. Not all documents have all three fields. For example, the document with _id equal to 5 is missing the category field:
{ _id: 1, item: { sec: "dessert", category: "pie", type: "apple" } }
{ _id: 2, item: { sec: "dessert", category: "pie", type: "cherry" } }
{ _id: 3, item: { sec: "main", category: "pie", type: "shepherds" } }
{ _id: 4, item: { sec: "main", category: "pie", type: "chicken pot" } }
{ _id: 5, item: { sec: "beverage", type: "coffee" } }
The following aggregation uses the $concat (page 274) to concatenate the type field from the sub-document item, a space, and the category field from the sub-document item:
db.menu.aggregate( { $project: { food: { $concat: [ "$item.type", " ", "$item.category" ] } } } )
Because the document with _id equal to 5 is missing the category field in the item sub-document, $concat (page 274) returns the value null as the concatenated value for the document:
{ "result" : [
    { "_id" : 1, "food" : "apple pie" },
    { "_id" : 2, "food" : "cherry pie" },
    { "_id" : 3, "food" : "shepherds pie" },
    { "_id" : 4, "food" : "chicken pot pie" },
    { "_id" : 5, "food" : null }
  ],
  "ok" : 1
}
To handle possible missing fields, you can use $ifNull (page 282) with $concat (page 274), as in the following example which substitutes <unknown type> if the field type is null or missing, and <unknown category> if the field category is null or missing:
db.menu.aggregate( { $project: { food:
                                   { $concat: [ { $ifNull: [ "$item.type", "<unknown type>" ] },
                                                " ",
                                                { $ifNull: [ "$item.category", "<unknown category>" ] }
                                              ]
                                   }
                               }
                   }
                 )
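The null-propagation behavior of $concat and the $ifNull workaround can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample items are hypothetical:

```javascript
// Plain-JS sketch: $concat returns null if any argument is null or refers
// to a missing field; $ifNull substitutes a replacement for null/missing.
const concat = (...parts) =>
  parts.some(p => p === null || p === undefined) ? null : parts.join("");

const ifNull = (value, replacement) =>
  (value === null || value === undefined) ? replacement : value;

const items = [
  { type: "apple",  category: "pie" },
  { type: "coffee" }                      // category is missing
];

const plain = items.map(i => concat(i.type, " ", i.category));
// → [ "apple pie", null ]

const guarded = items.map(i =>
  concat(ifNull(i.type, "<unknown type>"),
         " ",
         ifNull(i.category, "<unknown category>")));
// → [ "apple pie", "coffee <unknown category>" ]
```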
Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond (page 277) returns the value of the second expression. If the first expression evaluates to false, $cond (page 277) evaluates and returns the third expression.
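The three-expression semantics can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); note the sketch evaluates all three array members eagerly, whereas MongoDB only evaluates the branch it returns. The qty threshold is a hypothetical example:

```javascript
// Plain-JS sketch of $cond: [ predicate, then-expression, else-expression ].
const cond = ([pred, thenExpr, elseExpr]) => (pred ? thenExpr : elseExpr);

// Hypothetical use: label an order as "bulk" or "retail" by quantity.
const label = qty => cond([qty >= 250, "bulk", "retail"]);

console.log(label(300)); // "bulk"
console.log(label(100)); // "retail"
```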
near (coordinates) Specifies the coordinates (e.g. [ x, y ]) to use as the center of a geospatial query.
distanceField (string) Specifies the output field that will contain the calculated distance. You can use the dot notation to specify a field within a subdocument.
limit (number) Optional. Specifies the maximum number of documents to return. The default value is 100. See also the num option.
num (number) Optional. Synonym for the limit option. If both num and limit are included, the num value overrides the limit value.
maxDistance (number) Optional. Limits the results to the documents within the specified distance from the center coordinates.
query (document) Optional. Limits the results to the documents that match the query. The query syntax is identical to the read operation query (page 170) syntax.
spherical (boolean) Optional. Default value is false. When true, MongoDB performs calculation using spherical geometry.
distanceMultiplier (number) Optional. Specifies a factor to multiply all distances returned by $geoNear (page 278). For example, use distanceMultiplier to convert from spherical queries returned in radians to linear units (i.e. miles or kilometers) by multiplying by the radius of the Earth.
includeLocs (string) Optional. Specifies the output field that identifies the location used to calculate the distance. This option is useful when a location field contains multiple locations. You can use the dot notation to specify a field within a subdocument.
uniqueDocs (boolean) Optional. Default value is false. If a location field contains multiple locations, the default settings will return the document multiple times if more than one location meets the criteria. When true, the document will only return once even if the document has multiple locations that meet the criteria.
Example The following aggregation finds at most 5 unique documents with a location at most 0.008 from the center [40.724, -73.997] that have type equal to public:
db.places.aggregate([
   { $geoNear: { near: [40.724, -73.997],
                 distanceField: "dist.calculated",
                 maxDistance: 0.008,
                 query: { type: "public" },
                 includeLocs: "dist.location",
                 uniqueDocs: true,
                 num: 5 } }
])
"type" : "public", "location" : [ [ 40.731, -73.999 ], [ 40.732, -73.998 ], [ 40.730, -73.995 ], [ 40.729, -73.996 ] ], "dist" : { "calculated" : 0.0050990195135962296, "location" : [ 40.729, -73.996 ] } }, { "_id" : 8, "name" : "Sara D. Roosevelt Park", "type" : "public", "location" : [ [ 40.723, -73.991 ], [ 40.723, -73.990 ], [ 40.715, -73.994 ], [ 40.715, -73.994 ] ], "dist" : { "calculated" : 0.006082762530298062, "location" : [ 40.723, -73.991 ] } } ], "ok" : 1 }
The matching documents in the result field contain two new fields: a dist.calculated field that contains the calculated distance, and a dist.location field that contains the location used in the calculation. Note: The options for $geoNear (page 278) are similar to the geoNear (page 826) command with the following exceptions: distanceField is a mandatory field for the $geoNear (page 278) pipeline operator; the option does not exist in the geoNear (page 826) command. includeLocs accepts a string in the $geoNear (page 278) pipeline operator and a boolean in the geoNear (page 826) command.
With the exception of the _id field, $group (page 280) cannot output nested documents. Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value. Note: Use $project (page 287) to rename the grouped field after a $group (page 280) operation, if necessary. Consider the following example:
db.article.aggregate( { $group : { _id : "$author", docsPerAuthor : { $sum : 1 }, viewsPerAuthor : { $sum : "$pageViews" } }} );
This groups by the author field and computes two fields. The first, docsPerAuthor, is a counter field that adds one for each document with a given author field using the $sum (page 291) function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group. Each field defined for the $group (page 280) must use one of the group aggregation functions listed below to generate its composite value: $addToSet (page 274) $first (page 278) $last (page 282) $max (page 284) $min (page 285) $avg (page 274) $push (page 289) $sum (page 291) Warning: The aggregation system currently stores $group (page 280) operations in memory, which may cause problems when processing a larger number of groups.
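The per-author grouping can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample articles are hypothetical:

```javascript
// Plain-JS sketch of the $group example above: group articles by author,
// counting documents ($sum : 1) and summing pageViews ($sum : "$pageViews").
const articles = [
  { author: "bob",  pageViews: 5 },
  { author: "bob",  pageViews: 7 },
  { author: "dave", pageViews: 20 }
];

const byAuthor = new Map();
for (const a of articles) {
  const g = byAuthor.get(a.author) ||
            { _id: a.author, docsPerAuthor: 0, viewsPerAuthor: 0 };
  g.docsPerAuthor += 1;             // $sum : 1
  g.viewsPerAuthor += a.pageViews;  // $sum : "$pageViews"
  byAuthor.set(a.author, g);
}

const result = [...byAuthor.values()];
// → [ { _id: 'bob', docsPerAuthor: 2, viewsPerAuthor: 12 },
//     { _id: 'dave', docsPerAuthor: 1, viewsPerAuthor: 20 } ]
```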
Takes an array with two expressions. $ifNull (page 282) returns the first expression if it evaluates to a non-null value. Otherwise, $ifNull (page 282) returns the second expression's value.
This operation returns only the first 5 documents passed to it by the pipeline. $limit (page 282) has no effect on the content of the documents it passes. Note: Changed in version 2.4: When a $sort (page 289) immediately precedes a $limit (page 282) in the pipeline, the $sort (page 289) operation only maintains the top n results as it progresses, where n is the
specified limit. Before 2.4, $sort (page 289) would sort all the results in memory, and then limit the results to n results.
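The top-n optimization described in the note can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required). This is an illustration of the idea, not MongoDB's actual implementation:

```javascript
// Plain-JS sketch: when a sort immediately precedes a limit of n, only the
// top n documents need to be kept in memory while scanning, instead of
// sorting the entire input first.
function topN(docs, compare, n) {
  const buffer = [];                       // holds at most n documents
  for (const doc of docs) {
    buffer.push(doc);
    buffer.sort(compare);
    if (buffer.length > n) buffer.pop();   // discard anything beyond top n
  }
  return buffer;
}

const scores = [ { score: 80 }, { score: 95 }, { score: 60 }, { score: 85 } ];
const top2 = topN(scores, (a, b) => b.score - a.score, 2);
// → [ { score: 95 }, { score: 85 } ]
```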
The $match (page 283) selects the documents where the author field equals dave, and the aggregation returns the following:
{ "result" : [ { "_id" : ObjectId("512bc95fe835e68f199c8686"), "author": "dave", "score" : 80 }, { "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85 } ], "ok" : 1 }
Example The following example selects documents to process using the $match (page 283) pipeline operator and then pipes the results to the $group (page 280) pipeline operator to compute a count of the documents:
db.articles.aggregate( [ { $match : { score : { $gt : 70, $lte : 90 } } }, { $group: { _id: null, count: { $sum: 1 } } } ] );
In the aggregation pipeline, $match (page 283) selects the documents where the score is greater than 70 and less than or equal to 90. These documents are then piped to the $group (page 280) to perform a count. The aggregation returns the following:
{ "result" : [ { "_id" : null, "count" : 3 } ], "ok" : 1 }
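The filter-then-count pipeline can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample scores are hypothetical:

```javascript
// Plain-JS sketch of the $match/$group pipeline above: keep documents with
// score in (70, 90], then count them into a single document with _id: null.
const articles = [
  { score: 50 }, { score: 80 }, { score: 85 }, { score: 90 }, { score: 95 }
];

const matched = articles.filter(d => d.score > 70 && d.score <= 90); // $match
const result = { _id: null, count: matched.length };                 // $group / $sum: 1
// → { _id: null, count: 3 }
```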
Note: Place the $match (page 283) as early in the aggregation pipeline as possible. Because $match (page 283) limits the total number of documents in the aggregation pipeline, earlier $match (page 283) operations minimize the amount of processing down the pipe. If you place a $match (page 283) at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() (page 928) or db.collection.findOne() (page 933). New in version 2.4: $match (page 283) queries can support the geospatial $geoWithin (page 782) operations. Warning: You cannot use $where (page 779) in $match (page 283) queries as part of the aggregation pipeline.
To find the minimum value of the age field from all the documents, use the $min (page 285) operator:
db.users.aggregate( [ { $group: { _id:0, minAge: { $min: "$age"} } } ] )
The operation returns the minimum value of the age field in the minAge field:
{ "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
To find the minimum value of the age field for only those documents with _id starting with the letter a, use the $min (page 285) operator after a $match (page 283) operation:
db.users.aggregate( [ { $match: { _id: /^a/ } }, { $group: { _id: 0, minAge: { $min: "$age"} } } ] )
The operation returns the minimum value of the age field for the two documents with _id starting with the letter a:
{ "result" : [ { "_id" : 0, "minAge" : 25 } ], "ok" : 1 }
Example The users collection contains the following documents where some of the documents are either missing the age field or the age field contains null:
{ "_id" : "abc001", "age" : 25 }
{ "_id" : "abe001", "age" : 35 }
{ "_id" : "efg001", "age" : 20 }
{ "_id" : "xyz001", "age" : 15 }
{ "_id" : "xxx001" }
{ "_id" : "zzz001", "age" : null }
The following operation finds the minimum value of the age field in all the documents:
db.users.aggregate( [ { $group: { _id: 0, minAge: { $min: "$age"} } } ] )
Because only some of the documents for the $min (page 285) operation are missing the age field or have the age field equal to null, $min (page 285) only considers the non-null and the non-missing values, and the operation returns the following document:
{ "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
The following operation finds the minimum value of the age field for only those documents where the _id equals "xxx001" or "zzz001":
db.users.aggregate( [ { $match: { _id: {$in: [ "xxx001", "zzz001" ] } } }, { $group: { _id: 0, minAge: { $min: "$age"} } } ] )
The $min (page 285) operation returns null for the minimum age since all documents for the $min (page 285) operation have a null value for the field age or are missing the field:
{ "result" : [ { "_id" : 0, "minAge" : null } ], "ok" : 1 }
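The null-handling rules of $min can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample users follow the documents above:

```javascript
// Plain-JS sketch of $min's null handling: null and missing values are
// ignored; if every value in the group is null or missing, the result is null.
function minAge(docs) {
  const ages = docs.map(d => d.age).filter(a => a !== null && a !== undefined);
  return ages.length ? Math.min(...ages) : null;
}

const users = [
  { _id: "abc001", age: 25 }, { _id: "xyz001", age: 15 },
  { _id: "xxx001" },            // age missing
  { _id: "zzz001", age: null }  // age null
];

console.log(minAge(users)); // 15
console.log(minAge(users.filter(u => u._id === "xxx001" ||
                                     u._id === "zzz001"))); // null
```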
This operation includes the title field and the author field in the document that returns from the aggregation pipeline. Note: The _id field is always included by default. You may explicitly exclude _id as follows:
Here, the projection excludes the _id field but includes the title and author fields. Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators (page 302). Consider the following example:
db.article.aggregate( { $project : { title : 1, doctoredPageViews : { $add:["$pageViews", 10] } }} );
Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add (page 274). Note: You must enclose the expression that defines the computed field in braces, so that the expression is a valid object. You may also use $project (page 287) to rename fields. Consider the following example:
db.article.aggregate( { $project : { title : 1 , page_views : "$pageViews" , bar : "$other.foo" }} );
This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents. Finally, you can use the $project (page 287) to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate( { $project : { title : 1 , stats : { pv : "$pageViews", foo : "$other.foo", dpv : { $add:["$pageViews", 10] } } }} );
This projection includes the title field and places $project (page 287) into inclusive mode. Then, it creates the stats documents with the following fields: pv which includes and renames the pageViews from the top level of the original documents.
foo which includes the value of other.foo from the original documents. dpv which is a computed field that adds 10 to the value of the pageViews field in the original document using the $add (page 274) aggregation expression.
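The sub-document projection can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample article follows the collection used in these examples:

```javascript
// Plain-JS sketch of the $project example above: keep title, and build a
// stats sub-document with renamed, copied, and computed ($add) fields.
const article = { title: "this is my title", pageViews: 5, other: { foo: 5 } };

const projected = {
  title: article.title,           // title : 1
  stats: {
    pv: article.pageViews,        // "$pageViews" (rename)
    foo: article.other.foo,       // "$other.foo" (dotted path)
    dpv: article.pageViews + 10   // { $add: ["$pageViews", 10] }
  }
};
// → { title: 'this is my title', stats: { pv: 5, foo: 5, dpv: 15 } }
```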
This operation skips the first 5 documents passed to it by the pipeline. $skip (page 289) has no effect on the content of the documents it passes along the pipeline.
This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document. Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
This operation sorts the documents in the users collection, in descending order by the age field and then in ascending order according to the value in the posts field. When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:
1. MinKey (internal type)
2. Null
3. Numbers (ints, longs, doubles)
4. Symbol, String
5. Object
6. Array
7. BinData
8. ObjectID
9. Boolean
10. Date, Timestamp
11. Regular Expression
12. MaxKey (internal type)
Note: MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.
Note: The $sort (page 289) cannot begin sorting documents until previous operators in the pipeline have returned all output.
The $sort (page 289) operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators: $project (page 287), $unwind (page 292), and $group (page 280).
Changed in version 2.4: When a $sort (page 289) immediately precedes a $limit (page 282) in the pipeline, the $sort (page 289) operation only maintains the top n results as it progresses, where n is the specified limit. Before 2.4, $sort (page 289) would sort all the results in memory, and then limit the results to n results.
Warning: Changed in version 2.4: Sorts immediately preceded by a limit no longer need to fit into memory. Previously, all sorts had to fit into memory or use an index. Unless the $sort (page 289) operator can use an index, or immediately precedes a $limit (page 282), the $sort (page 289) operation must fit within memory. For $sort (page 289) operations that immediately precede a $limit (page 282) stage, MongoDB only needs to store the number of items specified by $limit (page 282) in memory.
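The type-ordering rule can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required). Only a few of the BSON types are modeled; this is an illustration of the ordering, not MongoDB's comparator:

```javascript
// Plain-JS sketch: rank values by the BSON comparison order listed above,
// then sort a mixed-type array by rank.
const typeRank = v => {
  if (v === null) return 2;               // Null
  if (typeof v === "number") return 3;    // Numbers
  if (typeof v === "string") return 4;    // Symbol, String
  if (Array.isArray(v)) return 6;         // Array (checked before Object)
  if (typeof v === "boolean") return 9;   // Boolean
  if (typeof v === "object") return 5;    // Object
  return 10;                              // everything else, e.g. Date
};

const mixed = [true, "a", 7, null, [1]];
mixed.sort((a, b) => typeRank(a) - typeRank(b));
// → [ null, 7, 'a', [ 1 ], true ]
```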
Note: The dollar sign (i.e. $) must precede the field specification handed to the $unwind (page 292) operator. In the above aggregation $project (page 287) selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind (page 292) operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.
{ "result" : [ { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "fun" }, { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "good" }, { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "fun" } ],
"ok" : 1 }
A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original tags array. Note: $unwind (page 292) has the following behaviors: $unwind (page 292) is most useful in combination with $group (page 280). You may undo the effects of the unwind operation with the $group (page 280) pipeline operator. If you specify a target field for $unwind (page 292) that does not exist in an input document, the pipeline ignores the input document, and will generate no result documents. If you specify a target field for $unwind (page 292) that is not an array, db.collection.aggregate() (page 922) generates an error.
If you specify a target field for $unwind (page 292) that holds an empty array ([]) in an input document, the pipeline ignores the input document, and will generate no result documents.
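The behaviors listed above can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required); the sample documents are hypothetical:

```javascript
// Plain-JS sketch of $unwind: one output document per array element;
// documents whose target field is missing or an empty array produce no
// output; a non-array, non-missing value is an error.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue;           // missing field: no output
    if (!Array.isArray(value))
      throw new Error("$unwind target is not an array");
    for (const element of value)                 // empty array: loop body never runs
      out.push({ ...doc, [field]: element });
  }
  return out;
}

const docs = [
  { _id: 1, tags: ["fun", "good"] },
  { _id: 2, tags: [] },
  { _id: 3 }
];
const unwound = unwind(docs, "tags");
// → [ { _id: 1, tags: 'fun' }, { _id: 1, tags: 'good' } ]
```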
28.49 Pipeline
Warning: The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope. Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All examples in this section assume that the aggregation pipeline begins with a collection named article that contains documents that resemble the following:
{ title : "this is my title" , author : "bob" , posted : new Date() , pageViews : 5 , tags : [ "fun" , "good" , "fun" ] , comments : [
{ author :"joe" , text : "this is cool" } , { author :"sam" , text : "this is bad" } ], other : { foo : 5 } }
The current pipeline operators are: $project Reshapes a document stream by renaming, adding, or removing fields. Also use $project (page 287) to create computed values or sub-objects. Use $project (page 287) to: Include fields from the original document. Insert computed fields. Rename fields. Create and populate fields that hold sub-documents. Use $project (page 287) to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.
db.article.aggregate( { $project : { title : 1 , author : 1 }} );
This operation includes the title field and the author field in the document that returns from the aggregation pipeline. Note: The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate( { $project : { _id : 0 , title : 1 , author : 1 }} );
Here, the projection excludes the _id field but includes the title and author fields. Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators (page 302). Consider the following example:
db.article.aggregate( { $project : { title : 1, doctoredPageViews : { $add:["$pageViews", 10] } }} );
Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add (page 274). Note: You must enclose the expression that defines the computed field in braces, so that the expression is a valid object.
You may also use $project (page 287) to rename fields. Consider the following example:
db.article.aggregate( { $project : { title : 1 , page_views : "$pageViews" , bar : "$other.foo" }} );
This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents. Finally, you can use the $project (page 287) to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate( { $project : { title : 1 , stats : { pv : "$pageViews", foo : "$other.foo", dpv : { $add:["$pageViews", 10] } } }} );
This projection includes the title field and places $project (page 287) into inclusive mode. Then, it creates the stats documents with the following fields: pv which includes and renames the pageViews from the top level of the original documents. foo which includes the value of other.foo from the original documents. dpv which is a computed field that adds 10 to the value of the pageViews field in the original document using the $add (page 274) aggregation expression. $match $match (page 283) pipes the documents that match its conditions to the next operator in the pipeline. The $match (page 283) query syntax is identical to the read operation query (page 170) syntax. Example The following operation uses $match (page 283) to perform a simple equality match:
db.articles.aggregate( { $match : { author : "dave" } } );
The $match (page 283) selects the documents where the author field equals dave, and the aggregation returns the following:
{ "result" : [
    { "_id" : ObjectId("512bc95fe835e68f199c8686"), "author": "dave", "score" : 80 },
    { "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85 }
  ],
  "ok" : 1
}
Example The following example selects documents to process using the $match (page 283) pipeline operator and then pipes the results to the $group (page 280) pipeline operator to compute a count of the documents:
db.articles.aggregate( [ { $match : { score : { $gt : 70, $lte : 90 } } }, { $group: { _id: null, count: { $sum: 1 } } } ] );
In the aggregation pipeline, $match (page 283) selects the documents where the score is greater than 70 and less than or equal to 90. These documents are then piped to the $group (page 280) to perform a count. The aggregation returns the following:
{ "result" : [ { "_id" : null, "count" : 3 } ], "ok" : 1 }
Note: Place the $match (page 283) as early in the aggregation pipeline as possible. Because $match (page 283) limits the total number of documents in the aggregation pipeline, earlier $match (page 283) operations minimize the amount of processing down the pipe. If you place a $match (page 283) at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() (page 928) or db.collection.findOne() (page 933). New in version 2.4: $match (page 283) queries can support the geospatial $geoWithin (page 782) operations. Warning: You cannot use $where (page 779) in $match (page 283) queries as part of the aggregation pipeline. $limit Restricts the number of documents that pass through the $limit (page 282) in the pipeline. $limit (page 282) takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:
db.article.aggregate( { $limit : 5 } );
This operation returns only the first 5 documents passed to it by the pipeline. $limit (page 282) has no effect on the content of the documents it passes. Note: Changed in version 2.4: When a $sort (page 289) immediately precedes a $limit (page 282) in the pipeline, the $sort (page 289) operation only maintains the top n results as it progresses, where n is the specified limit. Before 2.4, $sort (page 289) would sort all the results in memory, and then limit the results to n results. $skip Skips over the specified number of documents that pass through the $skip (page 289) in the pipeline before passing all of the remaining input. $skip (page 289) takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:
db.article.aggregate( { $skip : 5 } );
This operation skips the first 5 documents passed to it by the pipeline. $skip (page 289) has no effect on the content of the documents it passes along the pipeline. $unwind Peels off the elements of an array individually, and returns a stream of documents. $unwind (page 292) returns one document for every member of the unwound array within every source document. Take the following aggregation command:
db.article.aggregate( { $project : { author : 1 , title : 1 , tags : 1 }}, { $unwind : "$tags" } );
Note: The dollar sign (i.e. $) must precede the field specification handed to the $unwind (page 292) operator.

In the above aggregation, $project (page 287) selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind (page 292) operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items:
{
  "result" : [
    {
      "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
      "title" : "this is my title",
      "author" : "bob",
      "tags" : "fun"
    },
    {
      "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
      "title" : "this is my title",
      "author" : "bob",
      "tags" : "good"
    },
    {
      "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
      "title" : "this is my title",
      "author" : "bob",
      "tags" : "fun"
    }
  ],
  "ok" : 1
}
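The unwinding shown above can be modeled in plain JavaScript (run under Node, outside the mongo shell). The unwind helper below is an illustrative sketch of the stage's semantics, not the server's implementation:

```javascript
// Illustrative model of $unwind: one output document per element of the
// named array field, with the array replaced by that single element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const arr = doc[field];
    if (arr === undefined) continue;        // missing field: document ignored
    if (!Array.isArray(arr)) {
      throw new Error(field + " is not an array");  // mirrors aggregate() error
    }
    for (const value of arr) {              // empty array: yields no documents
      out.push({ ...doc, [field]: value }); // copy doc, swap array for element
    }
  }
  return out;
}

const input = [{ _id: 1, title: "this is my title", author: "bob",
                 tags: ["fun", "good", "fun"] }];
console.log(unwind(input, "tags").map(d => d.tags)); // [ 'fun', 'good', 'fun' ]
```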
A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original tags array.

Note: $unwind (page 292) has the following behaviors:

$unwind (page 292) is most useful in combination with $group (page 280).

You may undo the effects of an unwind operation with the $group (page 280) pipeline operator.

If you specify a target field for $unwind (page 292) that does not exist in an input document, the pipeline ignores the input document, and will generate no result documents.

If you specify a target field for $unwind (page 292) that is not an array, db.collection.aggregate() (page 922) generates an error.
If you specify a target field for $unwind (page 292) that holds an empty array ([]) in an input document, the pipeline ignores the input document, and will generate no result documents.

$group
Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis.

The output of $group (page 280) depends on how you define groups. Begin by specifying an identifier (i.e. an _id field) for the group you're creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate keys may resemble the following document:
{ _id : { author: "$author", pageViews: "$pageViews", posted: "$posted" } }
With the exception of the _id field, $group (page 280) cannot output nested documents.

Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.

Note: Use $project (page 287) as needed to rename the grouped field after a $group (page 280) operation.

Consider the following example:
db.article.aggregate( { $group : { _id : "$author", docsPerAuthor : { $sum : 1 }, viewsPerAuthor : { $sum : "$pageViews" } }} );
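As a sketch of what this stage accumulates, the following plain-JavaScript model mirrors the two $sum accumulators. It is illustrative only; groupByAuthor is a hypothetical helper, not a MongoDB API:

```javascript
// Illustrative model of the $group stage above: group on author and
// accumulate { $sum : 1 } (a counter) and { $sum : "$pageViews" }.
function groupByAuthor(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.author;                 // _id : "$author"
    const g = groups.get(key) ||
      { _id: key, docsPerAuthor: 0, viewsPerAuthor: 0 };
    g.docsPerAuthor += 1;                   // docsPerAuthor : { $sum : 1 }
    g.viewsPerAuthor += doc.pageViews;      // viewsPerAuthor : { $sum : "$pageViews" }
    groups.set(key, g);
  }
  return [...groups.values()];
}

const articles = [
  { author: "bob",  pageViews: 5 },
  { author: "bob",  pageViews: 10 },
  { author: "anna", pageViews: 7 },
];
console.log(groupByAuthor(articles));
// bob: 2 docs, 15 views; anna: 1 doc, 7 views
```

Like the real stage in this release, the sketch keeps every group in memory at once, which is why large numbers of distinct groups are a concern (see the warning below).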
This operation groups by the author field and computes two fields. The first, docsPerAuthor, is a counter field that adds one for each document with a given author field, using the $sum (page 291) function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.

Each field defined for the $group (page 280) must use one of the group aggregation functions listed below to generate its composite value:

$addToSet (page 274)
$first (page 278)
$last (page 282)
$max (page 284)
$min (page 285)
$avg (page 274)
$push (page 289)
$sum (page 291)

Warning: The aggregation system currently stores $group (page 280) operations in memory, which may cause problems when processing a larger number of groups.

$sort
The $sort (page 289) pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:
db.<collection-name>.aggregate( { $sort : { <sort-key> } } );
This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document.

Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
db.users.aggregate( { $sort : { age : -1, posts: 1 } } );
This operation sorts the documents in the users collection, in descending order according to the age field and then in ascending order according to the value in the posts field.

When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:

1. MinKey (internal type)
2. Null
3. Numbers (ints, longs, doubles)
4. Symbol, String
5. Object
6. Array
7. BinData
8. ObjectID
9. Boolean
10. Date, Timestamp
11. Regular Expression
12. MaxKey (internal type)

Note: MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.

Note: The $sort (page 289) cannot begin sorting documents until previous operators in the pipeline have returned all output.

$skip (page 289) $sort (page 28
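The cross-type ordering above can be sketched as a ranking function in plain JavaScript. bsonTypeRank is an illustrative helper (not a MongoDB API), and it covers only the value types a script is likely to hold; the server implements the full BSON comparison:

```javascript
// Illustrative ranking of values by the BSON cross-type comparison order
// listed above (2 = Null ... 11 = Regular Expression).
function bsonTypeRank(v) {
  if (v === null || v === undefined) return 2;   // Null
  if (typeof v === "number") return 3;           // Numbers (ints, longs, doubles)
  if (typeof v === "string") return 4;           // Symbol, String
  if (Array.isArray(v)) return 6;                // Array
  if (typeof v === "boolean") return 9;          // Boolean
  if (v instanceof Date) return 10;              // Date, Timestamp
  if (v instanceof RegExp) return 11;            // Regular Expression
  if (typeof v === "object") return 5;           // Object
  throw new Error("unhandled type");
}

// In an ascending sort, null sorts before numbers, numbers before
// strings, strings before objects and arrays, and so on.
const mixed = [true, "a", 7, null, [1]];
mixed.sort((a, b) => bsonTypeRank(a) - bsonTypeRank(b));
console.log(mixed); // [ null, 7, 'a', [ 1 ], true ]
```

Values of the same rank would then be compared by value, which this sketch does not attempt.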