-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
We've been seeing an issue where a process does not always have read-your-writes consistency, and a previous version of a row was sometimes being returned.
The general scenario is events being processed to update a read-side projection. Events are processed one at a time, in order, retrieving the current value, and then applying a function to the event and the current value to create the next value. Sometimes, not often or consistently, it looked like a write was lost — the commit was successful but processing of the next event would not see the updated value, but be based on the value before.
We tracked this down to the query last result cache and a race condition around the modification id counters. The modification ids for tables are updated before a transaction is committed. The race condition is that a concurrent query can fall between the modification counter update and the commit to store, so that it associates the previous result with the next modification id, and on subsequent checks doesn't realise that the cache is outdated. It's slim, but possible in what we've seen.
It needed a particular combination to reproduce: simultaneous queries on the row being updated, and connection pooling (we're using R2DBC). We haven't created a reproducer for H2 directly, but here's the pattern we were seeing in more detail:
- projected value will be a concatenation of the event values (
A|B|C|D|...) - value for the projected row is currently
A - modification id counter is currently
N - process event
B- update value fromAtoA|Band commit transaction (on connection1) - modification counter is incremented to
N+1, before the transaction commit to store - before value
A|Bis actually stored, there's a concurrent query for this row (on connection2) - the query returns current value
A, but associates this with the updated modification counter ofN+1<-- here's the bug - value
A|Bis committed to storage - process next event
C- but on pooled connection2now, where the query last result cache is incorrect - query for the current value before update sees that the last modification id counter
N+1is up-to-date - query returns cached previous result of
Ainstead of retrieving the updated valueA|B - event processor upserts to
A|Cinstead ofA|B|C
This could potentially be resolved by simply reversing the order here:
h2database/h2/src/main/org/h2/engine/SessionLocal.java
Lines 682 to 683 in 4daeb2b
| markUsedTablesAsUpdated(); | |
| transaction.commit(); |
so that the transaction is committed first, before the table modification ids are updated. That would avoid the stale data issue. Instead, the race condition would be flipped, and the last result cache could have an updated value associated with the previous modification id. That shouldn't be problematic for consistency — only that the cache would be considered stale on the next query, even though it has the newer value, and the query fetches the result again.
We've worked around this for now by setting OPTIMIZE_REUSE_RESULTS=FALSE.