Actively migrate away from RocksDB based ID tracker#6579
Conversation
| let id_tracker = create_rocksdb_id_tracker(db_builder.require()?)?; | ||
|
|
||
| // Actively migrate RocksDB based ID tracker into mutable ID tracker | ||
| if feature_flags().migrate_rocksdb_id_tracker { | ||
| let id_tracker = migrate_rocksdb_id_tracker_to_mutable(id_tracker, segment_path)?; | ||
| return Ok(sp(IdTrackerEnum::MutableIdTracker(id_tracker))); | ||
| } | ||
|
|
||
| return Ok(sp(IdTrackerEnum::RocksDbIdTracker(id_tracker))); |
There was a problem hiding this comment.
Note: this is the only place left where we load the RocksDB based ID tracker. It is loaded if it still exists on disk.
Since this is the only call site, we can migrate the ID tracker here.
db7b9f0 to
a8a75de
Compare
This comment was marked as resolved.
This comment was marked as resolved.
|
|
||
| // Copy all mappings into it | ||
| for (external_id, internal_id) in old_id_tracker.iter_from(None) { | ||
| let version = old_id_tracker.internal_version(internal_id).unwrap_or(0); |
There was a problem hiding this comment.
Why are we creating a synthetic version here?
If the version is missing it means the point is not well formed I believe.
There was a problem hiding this comment.
Good question.
We can do two things here:
- ignore the point not having a version
- keep the point with the lowest possible version, a higher point version will have priority
I did pick the second approach. If there's a newer version of such point in another segment, it is used instead.
Mappings are always flushed before versions. If the point was just created and we abort flushing half way through, it may not have a version yet. Our WAL replay will then update this very point in-place and give it the correct version.
If we would ignore (delete) the point not having the version, our WAL replay would still put it back for us - but in a different place. So that would work as well.
If the point was just deleted we should not be able to get into this situation, because the mapping is dropped/flushed first.
Please correct me if you think my logic is wrong here. This is always quite convoluted. Thoughts?
|
|
||
| // Actively migrate RocksDB based ID tracker into mutable ID tracker | ||
| if common::flags::feature_flags().migrate_rocksdb_id_tracker { | ||
| let id_tracker = migrate_rocksdb_id_tracker_to_mutable(id_tracker, segment_path)?; |
There was a problem hiding this comment.
what happens if we fail here?
I assume the segment won't be loaded at all instead of keeping the current rocksdb id tracker.
Is it what we want?
Can this crash loop on restart?
There was a problem hiding this comment.
Correct. The collection would fail to load.
The goal here is to be sure we don't leave anything RocksDB behind.
I'm not sure if simply chugging along on failure is better. If we'd log the problem, I'm not sure if it's ever seen. It would be a major problem if we drop the RocksDB dependency in the next release.
We could pick a phased approach instead:
- now: if migration fails, still load old tracker and print a warning
- in the next minor release: force migrate and crash if it fails
- in the minor release after: fully drop RocksDB dependency
It would prolong the whole process of removing RocksDB though, which I'm not a fan of.
Alternatively we can force optimization here. If migration fails we keep the RocksDB index, then we force the optimizer to pick up all not-yet-migrated-segments which would also move segments away from the RocksDB ID tracker. Though this is less reliable, and it feels like a hack.
Thoughts?
There was a problem hiding this comment.
Also possible: we can keep the feature flag and enable it by default, it would still crash on failure like you describe. But it would allow people to explicitly disable the flag to bypass the forced migration - if that is ever needed.
There was a problem hiding this comment.
I am ok with migration failure, in this case, assuming we can always start previous version and it won't corrupt the storage in the middle of process
There was a problem hiding this comment.
To improve it a bit I've added additional cleanup in 9e2313a
If migrating to the mutable ID tracker failed, it now cleans up the files it created as part of it.
a1802b0 to
9e2313a
Compare
* Actively migrate RocksDB ID tracker to new format on segment load * Extract ID tracker migration logic to function * Feature flag ID tracker migration * Simplify ID tracker migration by moving it deeper into load function * Move migrate function to the bottom * Add test to assert RocksDB to mutable ID tracker migration * Assert new mutable ID tracker is empty * Review remarks * On RocksDB to mutable ID tracker migration failure, clean up files * Demote empty mutable ID tracker to debug assertion * Copy all point versions, including deleted, set known mappings * read links and versions separatelly --------- Co-authored-by: generall <[email protected]>
Actively migrate away from ID trackers that still use RocksDB. On segment load, it is replaced with our new mutable ID tracker.
The RocksDB ID tracker has been disabled for a long time. However, there may be very old deployments with a static collection that still use it. An optimization run on all segments was required to migrate it, with this change we will now actively migrate it on segment load.
This is part of a bigger effort to fully remove RocksDB from our code base.
The migration is behind a runtime feature flag (
migrate_rocksdb_id_tracker). We'll likely enable it by default in one of the upcoming releases.Tasks
All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --all --all-featurescommand?