Feature/etcm 967 db consistency check #1070
| hash <- blockNumberMappingStorage.get(idx) | ||
| _ <- blockHeadersStorage.get(hash) | ||
| } yield ()).fold { | ||
| log.error("Database seems to be in inconsistent state, shutting down") |
Shouldn't we include information on how the user should act when this happens? I guess how to delete the DB?
Is this a good place to add such documentation for the client? I'm not convinced; I think we have a Mantis docs project.
We don't have a policy for such situations.
In that case, advise the user to check the website. I think that saying nothing is the worst option :)
I think we should log the missing block's hash and number. I'm not sure it would really help us when this happens, but it does not hurt.
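A minimal sketch of what logging the missing block's hash and number could look like, assuming the check is restructured to return the failure instead of a bare `None`. The names `ConsistencyCheckSketch`, `Inconsistency`, `checkBlock` and `describe` are hypothetical, and the two storages are replaced by plain functions so the example runs without Mantis:

```scala
object ConsistencyCheckSketch {
  // Assumption: hashes and headers are simplified to String for illustration.
  type Hash = String

  // Which of the two lookups failed, and for which block.
  sealed trait Inconsistency
  final case class MissingMapping(number: BigInt) extends Inconsistency
  final case class MissingHeader(number: BigInt, hash: Hash) extends Inconsistency

  // hashByNumber stands in for blockNumberMappingStorage.get,
  // headerByHash stands in for blockHeadersStorage.get.
  def checkBlock(
      number: BigInt,
      hashByNumber: BigInt => Option[Hash],
      headerByHash: Hash => Option[String]
  ): Either[Inconsistency, Unit] =
    for {
      hash <- hashByNumber(number).toRight[Inconsistency](MissingMapping(number))
      _ <- headerByHash(hash).toRight[Inconsistency](MissingHeader(number, hash))
    } yield ()

  // Builds the text that could be appended to the existing log.error message.
  def describe(problem: Inconsistency): String = problem match {
    case MissingMapping(n) => s"no hash stored for block number $n"
    case MissingHeader(n, h) => s"no header stored for block number $n (hash $h)"
  }
}
```

When only the number-to-hash mapping is missing there is no hash to report, which is why the two cases are kept separate.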
Is the long-term plan for Mantis to be able to fix this inconsistency itself?
@LeonorLunatech depends, we don't know what the setting is. We have not had a user report this as an issue, so it might be specific to the test environment.
OK, I get that. In that case, because I'm also wondering whether there's a performance impact, would it make sense to enable this only when running on test networks?
| loadGenesisData() | ||
| private[this] def startSyncController(): Unit = syncController ! SyncProtocol.Start | ||
| StorageConsistencyChecker.checkStorageConsistency( |
Do you mind wrapping this in a startConsistencyCheck() or something similar?
| blockHeadersStorage, | ||
| shutdown | ||
| )(log) | ||
| timers.startSingleTimer(Tick, 5.seconds) |
It's a bit confusing. Are we doing the check every 10 minutes (https://github.com/input-output-hk/mantis/pull/1070/files#diff-a8c02d6306b1c231f0efde1b3f46fa9cc6659899bfa882c317479432df489a7aR23) or 5 seconds?
It's probably a good idea to make this configurable, unless it's just a temporary measure
Good catch, this should be a method.
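A rough sketch of how the two durations could be made configurable and named, so the 5 seconds versus 10 minutes question answers itself. This assumes (not confirmed in this thread) that 5 seconds is the delay before the first check and 10 minutes is the interval between later checks; `CheckScheduleSketch`, `ConsistencyCheckConfig` and `delayBeforeRun` are hypothetical names:

```scala
import scala.concurrent.duration._

object CheckScheduleSketch {
  // Assumed meaning of the two values discussed above; both are overridable.
  final case class ConsistencyCheckConfig(
      initialDelay: FiniteDuration = 5.seconds,
      interval: FiniteDuration = 10.minutes
  )

  // Delay to use when scheduling the next check, given how many have already run.
  def delayBeforeRun(config: ConsistencyCheckConfig, runsSoFar: Int): FiniteDuration =
    if (runsSoFar == 0) config.initialDelay else config.interval
}
```

The value returned by `delayBeforeRun` could then be passed to `timers.startSingleTimer(Tick, ...)` in place of the literal `5.seconds`.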
| shutdown: ShutdownOp, | ||
| step: Int = DefaultStep | ||
| )(implicit log: Logger): Unit = | ||
| Range(0, bestBlockNumber.intValue, step).foreach { idx => |
I'm wondering if it makes sense to check every 1000th block. I imagine a case where the check passes but the db is inconsistent.
Perhaps it would be worthwhile to occasionally check the whole range?
This is a first pass. I want to start identifying the circumstances in which clients lose consistency.
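To make the sampling concern concrete, here is a small sketch of how a stepped `Range` can miss a gap that a full scan would find. The step of 1000 comes from the comment above, not from `DefaultStep`, whose value is not shown here; `SamplingSketch`, `sampledIndices` and `firstMissing` are hypothetical names, and header lookup is reduced to a predicate:

```scala
object SamplingSketch {
  // Same shape as the loop in the diff: indices 0, step, 2*step, ...
  // The end of a Range is exclusive, so bestBlockNumber itself is never visited.
  def sampledIndices(bestBlockNumber: BigInt, step: Int): Seq[Int] =
    Range(0, bestBlockNumber.intValue, step)

  // First visited index for which no header is found, if any.
  def firstMissing(indices: Seq[Int], hasHeader: Int => Boolean): Option[Int] =
    indices.find(idx => !hasHeader(idx))
}
```

With a best block of 3000 and block 1500 missing, the sampled indices are 0, 1000 and 2000, so the sampled check passes, while the same call with a step of 1 reports 1500.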
ea77359 to 70f9679
Check whether the DB is consistent. DBs have been observed to break on Sagano; this check should help figure out when that happens.