2.10. Documents migration
This is an overview of the process and tools that MongoEngine provides to help you make changes to your schema and run migrations of documents stored in the database.
The general idea is to keep it simple and explicit. Migrations of documents stored in MongoDB can be complex and may require custom code, but MongoEngine tries to help with the most common cases.
2.10.1. Example 1: Addition of a field
Before making a simple change, such as adding a new field to a Document, let's first take a look at the initial state of the schema:
class User(Document):
    name = StringField()
Documents are stored in the database as follows:
# {_id: ..., name: "John"}
In the next step, you want to add an enabled field with a default value of
True. You assume that existing documents should also have this field set to
True. The following is the new Document definition:
class User(Document):
    name = StringField()
    enabled = BooleanField(default=True)
When adding fields/modifying default values, you can use any of the following to do the migration as a standalone script:
# Use mongoengine to set a default value for a given field
User.objects().update(enabled=True)
# or use pymongo directly
user_coll = User._get_collection()
user_coll.update_many({}, {'$set': {'enabled': True}})
If you change a field to a non-required field (with no default value), existing documents without that field will work fine. But if you want to set a default value for existing documents that don't have the field, you can use the above approach.
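Another pitfall: default values are applied when MongoEngine loads a document, not written to the database,
so they can have unexpected consequences for querying. Before the migration, a loaded user reports enabled as True, yet User.objects(enabled=True) does not match it because the stored document has no enabled field.
After the migration, User.objects(enabled__exists=False) should return 0 results.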
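To avoid overwriting values that newer application code may already have written, the backfill can be restricted to documents that lack the field. The following is a minimal sketch, assuming a pymongo-style collection object; the helper name backfill_field is hypothetical and not part of MongoEngine:

```python
def backfill_field(collection, field, value):
    """Set `field` to `value` on documents that do not have it yet.

    `collection` is assumed to be a pymongo Collection, for example the
    object returned by User._get_collection(). This helper is hypothetical.
    """
    result = collection.update_many(
        {field: {"$exists": False}},  # only documents missing the field
        {"$set": {field: value}},
    )
    # UpdateResult.modified_count is the number of documents changed
    return result.modified_count
```

For the example above it could be called as backfill_field(User._get_collection(), "enabled", True). Running it a second time should report 0 modified documents, which makes the script safe to re-run.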
2.10.2. Example 2: Inheritance change
Let's start with the following schema:
class Human(Document):
    name = StringField()
    meta = {"allow_inheritance": True}

class Jedi(Human):
    dark_side = BooleanField()
    light_saber_color = StringField()

Jedi(name="Darth Vader", dark_side=True, light_saber_color="red").save()
Jedi(name="Obi Wan Kenobi", dark_side=False, light_saber_color="blue").save()
assert Human.objects.count() == 2
assert Jedi.objects.count() == 2
# Let's check how these documents got stored in mongodb
print(Jedi.objects.as_pymongo())
# [
# {'_id': ..., '_cls': 'Human.Jedi', 'name': 'Darth Vader', 'dark_side': True, ...},
# {'_id': ..., '_cls': 'Human.Jedi', 'name': 'Obi Wan Kenobi', 'dark_side': False, ...}
# ]
As you can observe, when you use inheritance, MongoEngine stores a field named
_cls behind the scenes to keep track of the Document class. Let's now take the
scenario that you want to refactor the inheritance schema and:
- Have the Jedis with dark_side=False/True become GoodJedi/BadSith
- Get rid of the dark_side field
# unchanged
class Human(Document):
    name = StringField()
    meta = {"allow_inheritance": True}

# attribute 'dark_side' removed
class GoodJedi(Human):
    light_saber_color = StringField()

# new class
class BadSith(Human):
    light_saber_color = StringField()
MongoEngine doesn't know about the change or how to map the new classes to the existing data, so if you don't apply a migration you will observe strange behavior, as if the collection were suddenly empty.
The migration script should update the _cls fields and remove the
dark_side field:
humans_coll = Human._get_collection()
old_class = 'Human.Jedi'
good_jedi_class = 'Human.GoodJedi'
bad_sith_class = 'Human.BadSith'

# Step 1: rename _cls for GoodJedi
humans_coll.update_many(
    {'_cls': old_class, 'dark_side': False},
    {'$set': {'_cls': good_jedi_class}, '$unset': {'dark_side': 1}}
)

# Step 2: rename _cls for BadSith
humans_coll.update_many(
    {'_cls': old_class, 'dark_side': True},
    {'$set': {'_cls': bad_sith_class}, '$unset': {'dark_side': 1}}
)
After the migration you can verify:
jedi = GoodJedi.objects().first()
assert jedi.name == "Obi Wan Kenobi"
sith = BadSith.objects().first()
assert sith.name == "Darth Vader"
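Alternatively, the same migration can be performed one document at a time, which is slower but easier to customize. Run it instead of the update_many calls above, not after them:
for doc in humans_coll.find():
    if doc['_cls'] == 'Human.Jedi':
        doc['_cls'] = 'Human.BadSith' if doc['dark_side'] else 'Human.GoodJedi'
        doc.pop('dark_side')
        humans_coll.replace_one({'_id': doc['_id']}, doc)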
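The per-document transformation in this example can also be factored into a pure function, so it can be unit-tested on plain dictionaries before it touches the database. This is only a sketch: the name migrate_jedi_doc is hypothetical, and treating a missing dark_side value as False is an assumption that may not suit your data.

```python
def migrate_jedi_doc(doc):
    """Return a migrated copy of a raw 'Human.Jedi' document.

    Documents of any other class are returned as an unchanged copy.
    """
    new_doc = dict(doc)  # shallow copy, the input is left untouched
    if new_doc.get("_cls") != "Human.Jedi":
        return new_doc
    # pop() with a default tolerates documents that lack 'dark_side';
    # defaulting to False is an assumption of this sketch
    dark_side = new_doc.pop("dark_side", False)
    new_doc["_cls"] = "Human.BadSith" if dark_side else "Human.GoodJedi"
    return new_doc


vader = {"_id": 1, "_cls": "Human.Jedi", "name": "Darth Vader",
         "dark_side": True, "light_saber_color": "red"}
print(migrate_jedi_doc(vader)["_cls"])  # Human.BadSith
```

The returned dictionary can then be passed to replace_one({'_id': doc['_id']}, new_doc) inside the loop.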
2.10.3. Example 3: Index removal
If you remove an index from your Document class, or remove an indexed Field from your Document class, you'll need to manually drop the corresponding index. MongoEngine will not do that for you.
The way to deal with this case is to identify the name of the index to drop with
index_information(), and then drop it with drop_index(). Let's
assume that you start with the following Document class:
class User(Document):
    name = StringField(unique=True)
Running User._get_collection().index_information() would give you something
like:
{
    '_id_': {'key': [('_id', 1)], ...},
    'name_1': {'key': [('name', 1)], 'unique': True, ...}
}
This shows two indexes: _id_, which is the default index, and name_1, which is our
custom index. If you remove the name field or its index, you have
to call:
User._get_collection().drop_index('name_1')
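Note: when you add a new index, or a new field that carries one, MongoEngine creates the index for you (unless auto_create_index is disabled).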
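When a collection has many indexes, the lookup step can be scripted. The following is a small sketch that works on the dictionary returned by index_information(); the helper name indexes_on_field is hypothetical, and the sample data mirrors the output shown above.

```python
def indexes_on_field(index_info, field):
    """Return the sorted names of indexes whose key includes `field`.

    `index_info` is assumed to have the shape returned by pymongo's
    Collection.index_information(): {name: {'key': [(field, direction), ...]}}.
    """
    names = []
    for name, spec in index_info.items():
        if name == "_id_":
            continue  # the default _id index cannot be dropped
        if any(key_field == field for key_field, _direction in spec["key"]):
            names.append(name)
    return sorted(names)


info = {
    "_id_": {"key": [("_id", 1)]},
    "name_1": {"key": [("name", 1)], "unique": True},
}
print(indexes_on_field(info, "name"))  # ['name_1']
```

Each returned name can then be passed to drop_index(). Compound indexes that include the field are also returned, so review the list before dropping anything.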
2.10.4. Recommendations
- Write migration scripts whenever you make changes to the model schemas.
- Using DynamicDocument or meta = {"strict": False} may help to avoid some migrations or to have 2 versions of your application co-exist.
- Write post-processing checks to verify that migration scripts worked (see below).
2.10.5. Post-processing checks
The following recipe can be used to sanity check a Document collection after you have applied a migration. It does not make any assumption about what was migrated: it fetches up to 1000 documents at random and runs some quick checks on them to make sure they look OK. As written, it fails on the first error, but this can be adapted to your needs:
import random

def check_collection(DocClass, sample_size=1000):
    """Sanity checks on a Document collection after migration."""
    count = DocClass.objects.count()
    print(f"Total documents: {count}")
    # Pick random positions in the collection (at most sample_size of them)
    sample = random.sample(range(count), min(sample_size, count))
    for i in sample:
        # Loading the document already exercises MongoEngine's deserialization
        doc = DocClass.objects.skip(i).first()
        try:
            # Validate the loaded document against the current schema
            doc.validate()
        except Exception as e:
            print(f"Validation error on document {doc.id}: {e}")
            raise
    print(f"Checked {len(sample)} documents: all OK.")

check_collection(User)
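If a full report is more useful than stopping at the first failure, the check can collect errors instead of re-raising them. The following is a minimal sketch of that adaptation; collect_validation_errors is a hypothetical helper that only assumes each document exposes validate() and id, as MongoEngine documents do.

```python
def collect_validation_errors(docs):
    """Validate every document and return a list of (id, message) pairs."""
    errors = []
    for doc in docs:
        try:
            doc.validate()
        except Exception as exc:  # MongoEngine raises ValidationError here
            errors.append((doc.id, str(exc)))
    return errors
```

It could be called as collect_validation_errors(User.objects.limit(1000)); an empty list means every checked document passed validation.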