Skip to content

Race condition causing stale data in query last result cache #3894

@pvlugter

Description

@pvlugter

We've been seeing an issue where a process does not always have read-your-writes consistency, and a previous version of a row was sometimes being returned.

The general scenario is events being processed to update a read-side projection. Events are processed one at a time, in order, retrieving the current value, and then applying a function to the event and the current value to create the next value. Sometimes, not often or consistently, it looked like a write was lost — the commit was successful but processing of the next event would not see the updated value, but be based on the value before.

We tracked this down to the query last result cache and a race condition around the modification id counters. The modification ids for tables are updated before a transaction is committed. The race condition is that a concurrent query can fall between the modification counter update and the commit to store, so that it associates the previous result with the next modification id, and on subsequent checks doesn't realise that the cache is outdated. It's slim, but possible in what we've seen.

It needed a particular combination to reproduce: simultaneous queries on the row being updated, and connection pooling (we're using R2DBC). We haven't created a reproducer for H2 directly, but here's the pattern we were seeing in more detail:

  • projected value will be a concatenation of the event values (A|B|C|D|...)
  • value for the projected row is currently A
  • modification id counter is currently N
  • process event B - update value from A to A|B and commit transaction (on connection 1)
  • modification counter is incremented to N+1, before the transaction commit to store
  • before value A|B is actually stored, there's a concurrent query for this row (on connection 2)
  • the query returns current value A, but associates this with the updated modification counter of N+1 <-- here's the bug
  • value A|B is committed to storage
  • process next event C - but on pooled connection 2 now, where the query last result cache is incorrect
  • query for the current value before update sees that the last modification id counter N+1 is up-to-date
  • query returns cached previous result of A instead of retrieving the updated value A|B
  • event processor upserts to A|C instead of A|B|C

This could potentially be resolved by simply reversing the order here:

markUsedTablesAsUpdated();
transaction.commit();

so that the transaction is committed first, before the table modification ids are updated. That would avoid the stale data issue. Instead, the race condition would be flipped, and the last result cache could have an updated value associated with the previous modification id. That shouldn't be problematic for consistency — only that the cache would be considered stale on the next query, even though it has the newer value, and the query fetches the result again.

We've worked around this for now by setting OPTIMIZE_REUSE_RESULTS=FALSE.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions