
Implementing the WAITAOF command (issue #10505)#11713

Merged
oranagra merged 21 commits into redis:unstable from slavak:waitaof_infra on Mar 14, 2023

Conversation

@slavak
Contributor

@slavak slavak commented Jan 15, 2023

Implementing the WAITAOF functionality requested in issue #10505, which would allow the user to block until a specified number of Redises have fsynced all previous write commands to the AOF.

Syntax: WAITAOF <num_local> <num_replicas> <timeout>
Response: an array containing two elements: num_local, num_replicas
num_local is always either 0 or 1, representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged that the replication
offset of the last write has been fsynced to their AOF.

Returns an error when called on replicas, or when called with a non-zero
num_local on a master with AOF disabled; in all other cases the response
just contains the number of fsynced copies.
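
For illustration only (this sketch is not part of the PR), here is how a client might consume the reply, assuming the hiredis C client; the host, port, key, and timeout are made up:

#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *ctx = redisConnect("127.0.0.1", 6379);
    if (ctx == NULL || ctx->err) return 1;

    redisReply *set = redisCommand(ctx, "SET key value");
    if (set) freeReplyObject(set);

    /* Block up to 1000 ms until the local AOF and one replica AOF
     * have fsynced the write above. */
    redisReply *reply = redisCommand(ctx, "WAITAOF 1 1 1000");
    if (reply && reply->type == REDIS_REPLY_ARRAY && reply->elements == 2) {
        printf("num_local=%lld num_replicas=%lld\n",
               reply->element[0]->integer, reply->element[1]->integer);
    }
    if (reply) freeReplyObject(reply);
    redisFree(ctx);
    return 0;
}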

Main changes:

  • Added code to keep track of replication offsets that are confirmed to have been fsynced to disk.
  • Keep advancing master_repl_offset even when replication is disabled and there's no replication backlog (but only if AOF is enabled).
    This way we can use this command and its mechanisms even when replication is disabled.
  • Extend REPLCONF ACK to REPLCONF ACK <ofs> FACK <ofs>; the FACK is appended only if there's an AOF on the replica, and is already ignored by old masters, thus backwards compatible (see the sketch after this list).
  • WAIT no longer waits for the replication offset after your last command, but rather the replication offset after your last write (or read command that caused propagation, e.g. lazy expiry).
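
As a rough sketch of the extended ACK (simplified to the inline protocol; the real replication.c code builds a proper RESP command, and this helper name is made up):

#include <stdio.h>

/* Illustrative helper: format the periodic ACK a replica sends to its
 * master, appending FACK only when the replica has an AOF enabled. */
int formatReplAck(char *buf, size_t size, long long ack_ofs,
                  long long fack_ofs, int aof_enabled) {
    if (aof_enabled)
        return snprintf(buf, size, "REPLCONF ACK %lld FACK %lld\r\n",
                        ack_ofs, fack_ofs);
    return snprintf(buf, size, "REPLCONF ACK %lld\r\n", ack_ofs);
}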

Unrelated changes:

  • WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)

Implementation details:

  • Add an atomic var named fsynced_reploff_pending that's updated (usually by the bio thread) and later copied to the main fsynced_reploff variable (only if the AOF base file exists); see the sketch after this list.
    I.e. during the initial AOF rewrite it will not be used as the fsynced offset, since the AOF base is still missing.
  • Replace the close+fsync bio job with a new BIO_CLOSE_AOF (AOF-specific) job that also updates the fsynced offset field.
  • Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
    worker thread, to impose ordering on their execution. This solves a
    race condition where a job could set fsynced_reploff_pending to a higher
    value than another pending fsync job, resulting in indicating an offset
    for which parts of the data have not yet actually been fsynced.
    Imposing an ordering on the jobs guarantees that fsync jobs are executed
    in increasing order of replication offset.
  • Drain bio jobs when switching appendfsync to "always"
    This should prevent a write race between updates to fsynced_reploff_pending
    in the main thread (flushAppendOnlyFile when set to ALWAYS fsync), and
    those done in the bio thread.
  • Drain the pending fsync when starting over a new AOF to avoid race conditions
    with the previous AOF offsets overriding the new one (e.g. after switching to replicate from a new master).
  • Make sure to update the fsynced offset at the end of the initial AOF rewrite.
    This is a must in case there are no additional writes that trigger a periodic fsync, specifically for a replica that does a full sync.
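
A simplified model of the offset tracking described above, using C11 atomics instead of Redis's atomicvar.h macros; the scaffolding and function names are illustrative, not the actual aof.c code:

#include <stdatomic.h>

static _Atomic long long fsynced_reploff_pending; /* written by the bio thread */
static long long fsynced_reploff;                 /* consulted by WAITAOF */
static int aof_base_exists;                       /* illustrative flag */

/* Bio thread: called after an fsync covering data up to `offset`. */
void onBioFsyncDone(long long offset) {
    atomic_store_explicit(&fsynced_reploff_pending, offset,
                          memory_order_relaxed);
}

/* Main thread: promote the pending value, but only once the AOF base
 * file exists, i.e. not during the initial AOF rewrite. */
void promoteFsyncedReploff(void) {
    if (!aof_base_exists) return;
    long long pending = atomic_load_explicit(&fsynced_reploff_pending,
                                             memory_order_relaxed);
    if (pending > fsynced_reploff) fsynced_reploff = pending;
}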

Limitations:
It is possible to write a module or a Lua script that propagates to the AOF but doesn't
propagate to the replication stream; see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result in two bad cases.
The scenario is that the user executes a command that only propagates to the AOF, then immediately
issues a WAITAOF, and there are no further writes on the replication stream after that.

  1. if the last thing that happened on the replication stream is a PING (which increased the replication offset but won't trigger an fsync on the replica), then the client would hang forever (it will wait for a fack that the replica will never send, since it doesn't trigger any fsyncs).
  2. if the last thing that happened is a write command that got propagated properly, then WAITAOF will be released immediately, without waiting for an fsync (since the offset didn't change).

Refactoring:

  • Plumbing to allow a bio worker to handle multiple job types
    This introduces the infrastructure necessary to allow BIO workers to
    not have a 1-1 mapping of worker to job-type. This makes it possible in
    the future to assign multiple job types to a single worker (as sketched
    below), either as a performance/resource optimization, or as a way of
    enforcing ordering between specific classes of jobs.
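
An illustrative sketch of that decoupling; the job-type constants mirror the ones discussed in this PR, but the table layout is an assumption rather than the final bio.c code:

#define BIO_CLOSE_FILE 0
#define BIO_AOF_FSYNC  1
#define BIO_CLOSE_AOF  2
#define BIO_NUM_OPS    3

/* Job types map onto workers through a table instead of 1-1: both AOF
 * job types share worker 1, which enforces ordering between fsync and
 * close-AOF jobs while other job types keep their own thread. */
static const unsigned int bio_job_to_worker[BIO_NUM_OPS] = {
    [BIO_CLOSE_FILE] = 0,
    [BIO_AOF_FSYNC]  = 1,
    [BIO_CLOSE_AOF]  = 1,
};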

Discussion:
[Note: currently this PR implements the most explicit approach of the options presented below.]

In its most general form, the command can be used to wait for fsync by replicas, by the master (or the only Redis, if replication is not used), or some combination of the above. A few use-cases I can think of:

  1. I have no replication, and wish to wait for all of my previous writes to be persisted to disk before I continue.
  2. I have a master and n replicas. I wish to wait for at least m of those replicas to persist my writes to disk before I continue.
  3. I have a master and n replicas. I wish to wait until at least m Redises (including both master and replicas) have persisted my writes to disk before I continue.

The simplest approach would be to extend the existing WAIT command with an optional AOF keyword:

WAIT AOF <num> <timeout>

In which case we need to define whether <num> includes the master or not. If it does, use-cases 1 and 3 are covered, but use-case 2 (which would probably be a pretty common one) is impossible; if it does not, 2 is supported but 1 and 3 are not. This could be extended by giving "0" the special meaning of "wait for the current Redis only" to enable use-case 1, but this might be somewhat surprising semantics.

Another option that crossed my mind is to use positive/negative numbers for the different meanings. e.g.: "-5" means "wait for any 5 Redises" while "5" means "wait for 5 replicas." This is simple and supports all 3 use-cases above, but might not be intuitive or particularly easy to remember.

Yet another possibility is of course to split this functionality into a separate command entirely, in which case we need to define the arguments and semantics of that new command.

Todo:

Member

@oranagra oranagra left a comment


depending on the type of interface we choose for WAITAOF, maybe the race you presented is irrelevant?
i.e. if the caller explicitly asks to wait for AOF files on replicas, it should hang.

maybe another solution besides the one you presented in the commit comment could be to somehow bump pot_fsynced_reploff to the right offset (last one that existed) on the next fsync?
maybe we should discuss that on a call.

@slavak
Contributor Author

slavak commented Feb 5, 2023

maybe another solution besides the one you presented in the commit comment could be to somehow bump pot_fsynced_reploff to the right offset (last one that existed) on the next fsync?
maybe we should discuss that on a call.

I don't think that would help if you already have clients waiting on a WAITAOF, since they would be waiting on an offset < the last updated server.master_repl_offset. If we artificially bump up pot_fsynced_reploff without an actual fsync taking place, those clients will be released on the next check, regardless of whether the changes they care about have actually been written to disk.

Added code to keep track of replication offsets that are confirmed
to have been fsynced to disk. The REPLCONF ACK command is extended to
also report this offset, reported as REPLCONF FACK <offset>.

This REPLCONF extension is backwards-compatible because the REPLCONF
command handling terminates after encountering the ACK offset, so
older Redises will simply ignore the unknown FACK that follows it.
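
A schematic view of why that works, as a simplification of the real argument loop in replication.c (the function below is illustrative, not the actual replconfCommand):

#include <strings.h>
#include <stdlib.h>

void handleReplconf(int argc, char **argv) {
    for (int j = 1; j < argc; j += 2) {
        if (!strcasecmp(argv[j], "ack")) {
            long long offset = strtoll(argv[j+1], NULL, 10);
            (void)offset; /* ... record the replica's acked offset ... */
            /* A FACK-aware server also consumes an optional trailing
             * "FACK <ofs>" pair here; an old server returns without
             * ever examining it. */
            return;
        }
    }
}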
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
- Replace the close+fsync job with a new BIO_CLOSE_AOF job that will also
  update the `pot_fsynced_reploff` field.
- Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
  worker thread, to impose ordering on their execution. This solves a
  race condition where a job could set `pot_fsynced_reploff` to a higher
  value than another pending fsync job, resulting in indicating an offset
  for which parts of the data have not yet actually been fsynced.
  Imposing an ordering on the jobs guarantees that fsync jobs are executed
  in increasing order of replication offset.
This should prevent a write race between updates to `pot_fsynced_reploff`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
When replication is disabled, the `server.master_repl_offset` we usually
rely on is not being updated, so we need an alternative means of
tracking fsyncs.

In this case, we use `pot_fsynced_reploff` as a basic "fsync epoch"
counter instead, incrementing it each time the AOF is fsynced.

This is inspired by antirez's original approach to the problem, here:
redis/redis@unstable...wait-aof

This change introduces a possible race in a potential implementation of
a `WAIT AOF` command:
1. Client executes `WAIT AOF` to wait for an fsync of its own server.
   Replication is currently enabled. A client waiting for fsync of the
   current `master_repl_offset` is added to the `clients_waiting_acks` list.
2. Suppose `server.pot_fsynced_reploff == server.master_repl_offset - n`.
3. We now disable replication.
4. Now every AOF fsync will simply increment `server.pot_fsynced_reploff`.
   As a result, the blocked client will unnecessarily wait `n` fsyncs
   before it is released.

To resolve this race, we might have to:
* When disabling replication, iterate over the `clients_waiting_acks` list and
  set the relevant offsets to the value of `pot_fsynced_reploff`.
This reverts the previous commit that tracked fsync "epochs", as
it was non-ideal and suffered from some serious bugs.

Instead I've modified the code to continue incrementing the replication
offset as long as AOF is configured, even with no replication backlog.
This will allow us to rely on a single field for tracking AOF fsync in
all scenarios, with or without replication.
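
A minimal sketch of that rule, with a mock struct standing in for the real server global:

/* Mock of the relevant server fields, for illustration only. */
static struct {
    long long master_repl_offset;
    void *repl_backlog;   /* NULL when replication is disabled */
    int aof_enabled;
} server;

/* When feeding the replication stream: even with no backlog, keep
 * counting as long as AOF is enabled, so WAITAOF can rely on
 * master_repl_offset in every configuration. */
void advanceReplOffset(long long nwritten) {
    if (server.repl_backlog != NULL || server.aof_enabled)
        server.master_repl_offset += nwritten;
}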
@slavak slavak force-pushed the waitaof_infra branch 2 times, most recently from 0cba22e to 66c87f5 on February 5, 2023 13:48
* Revert refactor of if-else to switch to reduce diff size.
* Add missing initialization to `startAppendOnly` to resolve a bug.
slavak and others added 3 commits February 5, 2023 16:08
Iterating over the worker's entire jobs list wasn't very efficient.
This commit adds some minimal bookkeeping to maintain a running counter
per job type. This allows `bioPendingJobsOfType` to become an O(1)
operation.
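
A sketch of that bookkeeping under assumed names (the real bio.c structures differ):

#include <pthread.h>

#define BIO_NUM_OPS 3

static pthread_mutex_t bio_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long bio_pending[BIO_NUM_OPS]; /* running counter per type */

void bioSubmitJob(int type /*, job payload */) {
    pthread_mutex_lock(&bio_mutex);
    /* ... append the job to its worker's queue ... */
    bio_pending[type]++; /* incremented on submit, decremented by the
                          * worker when the job completes */
    pthread_mutex_unlock(&bio_mutex);
}

/* O(1): read the counter instead of walking the whole jobs list. */
unsigned long bioPendingJobsOfType(int type) {
    pthread_mutex_lock(&bio_mutex);
    unsigned long n = bio_pending[type];
    pthread_mutex_unlock(&bio_mutex);
    return n;
}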
Syntax:
WAITAOF <num_local> <num_replicas> <timeout>
Response:
Array containing num_local, num_replicas

Returns an error when called on replicas, or when called with a non-zero
num_local on a master with AOF disabled; in all other cases the response
just contains the number of fsynced copies.

Changes
* Make sure for the replica or master to update the fsynced offset at
  the end of the initial AOF rewrite (even if there are no additional
  writes that trigger a periodic fsync).
* Drain the pending fsync when starting over a new AOF to avoid race
  conditions with the previous AOF offsets overriding the new one.
* Implement the receiving side of `REPLCONF ACK <ofs> FACK <ofs>`; the FACK
  is appended only if there's an AOF on the replica, and ignored on
  old masters.
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)

Minor notes:
* rename pot_fsynced_reploff to fsynced_reploff_pending
* when an initial rewrite ends, update fsynced_reploff right away, don't
  wait for the next beforeSleep.
@oranagra oranagra changed the title from "Implementing the WAIT AOF command (issue #10505)" to "Implementing the WAITAOF command (issue #10505)" on Feb 6, 2023
@oranagra
Member

oranagra commented Feb 7, 2023

@redis/core-team this is ready for review, and the top comment is updated.

@oranagra oranagra added the state:major-decision (Requires core team consensus) and release-notes (indication that this issue needs to be mentioned in the release notes) labels on Feb 7, 2023
@sundb
Collaborator

sundb commented Feb 8, 2023

The following steps may cause the WAITAOF command to be blocked forever, until the AOF is saved again.

  1. add a log in waitaofCommand():
     ackreplicas = replicationCountAOFAcksByOffset(c->woff);
     acklocal = server.fsynced_reploff >= c->woff;
     printf("server.fsynced_reploff: %lld, c->woff: %lld\n", server.fsynced_reploff, c->woff);
  2. start the master server:
     ./src/redis-server --appendonly yes
  3. start the slave server:
     ./src/redis-server --port 6380
  4. execute the SLAVEOF command from the slave client:
     ./src/redis-cli -h 127.0.0.1 -p 6380
     slaveof 127.0.0.1 6379
  5. execute the WAITAOF command from the master client:
     while true; do echo "waitaof 1 0 0"; done | ./src/redis-cli -x

master logs:

115202:C 08 Feb 2023 12:16:17.615 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
115202:C 08 Feb 2023 12:16:17.615 # Redis version=255.255.255, bits=64, commit=909f091f, modified=1, pid=115202, just started
115202:C 08 Feb 2023 12:16:17.615 # Configuration loaded
115202:M 08 Feb 2023 12:16:17.615 * monotonic clock: POSIX clock_gettime
115202:M 08 Feb 2023 12:16:17.616 * Running mode=standalone, port=6379.
115202:M 08 Feb 2023 12:16:17.616 # Server initialized
115202:M 08 Feb 2023 12:16:17.616 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
115202:M 08 Feb 2023 12:16:17.616 * Reading RDB base file on AOF loading...
115202:M 08 Feb 2023 12:16:17.616 * Loading RDB produced by version 255.255.255
115202:M 08 Feb 2023 12:16:17.616 * RDB age 1225 seconds
115202:M 08 Feb 2023 12:16:17.616 * RDB memory usage when created 0.82 Mb
115202:M 08 Feb 2023 12:16:17.616 * RDB is base AOF
115202:M 08 Feb 2023 12:16:17.617 * Done loading RDB, keys loaded: 0, keys expired: 0.
115202:M 08 Feb 2023 12:16:17.617 * DB loaded from base file appendonly.aof.1.base.rdb: 0.000 seconds
115202:M 08 Feb 2023 12:16:17.617 * DB loaded from append only file: 0.000 seconds
115202:M 08 Feb 2023 12:16:17.617 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
115202:M 08 Feb 2023 12:16:17.617 * Ready to accept connections tcp
115202:M 08 Feb 2023 12:16:25.929 * Replica 127.0.0.1:6380 asks for synchronization
115202:M 08 Feb 2023 12:16:25.929 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c6eb517826ab88f5c97a2dd31195c079650a18d8', my replication IDs are 'd6b625e0f3fc09a312e26c775a5681df6f0691df' and '0000000000000000000000000000000000000000')
115202:M 08 Feb 2023 12:16:25.929 * Replication backlog created, my new replication IDs are '114682cd6adfd9801b4c0d3e02413e85234e6a84' and '0000000000000000000000000000000000000000'
115202:M 08 Feb 2023 12:16:25.929 * Delay next BGSAVE for diskless SYNC
server.fsynced_reploff: 0, c->woff: 0
... (the line above repeated)
server.fsynced_reploff: 0, c->woff: 14
115202:M 08 Feb 2023 12:16:30.651 * Starting BGSAVE for SYNC with target: replicas sockets
115202:M 08 Feb 2023 12:16:30.652 * Background RDB transfer started by pid 115369
115369:C 08 Feb 2023 12:16:30.653 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
115202:M 08 Feb 2023 12:16:30.654 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
115202:M 08 Feb 2023 12:16:30.659 * Background RDB transfer terminated with success
115202:M 08 Feb 2023 12:16:30.659 * Streamed RDB transfer with replica 127.0.0.1:6380 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
115202:M 08 Feb 2023 12:16:30.659 * Synchronization with replica 127.0.0.1:6380 succeeded

The main reason is that WAIT[AOF] waits for the repl offset as of the previous command this client executed.
But in reality the increase in repl offset is not always due to that command; it may be due to the synchronization process between master and slave.

@oranagra
Member

oranagra commented Feb 8, 2023

I should have seen this coming.
The periodic PINGs increase the replication offset but they're not triggering an fsync (on either the master or replica), so a client that didn't send a write command before WAITAOF (or one that sent a read command between the write command and the WAITAOF, with a PING slipping in between its write and read) would hang until someone else triggers an fsync.

one clear solution for that is replication stream multiplexing (tracked in #8440), where PINGs will not increase the replication offset.
another possibility is to update c->woff only after write commands, specifically after ones that triggered propagation.
then, if the client issued a read after the write, it won't be a problem.
in addition to that, we can state that sending a WAIT[AOF] without any write command (or actually a modification) on that connection is invalid.

looking for other advice.
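
A minimal sketch of the second idea above, assuming a simplified command loop; the struct and function are made up, only the c->woff and master_repl_offset names mirror the internals:

typedef struct client { long long woff; } client; /* mock client struct */
static long long master_repl_offset;

/* Called at the end of command execution: latch the client's wait offset
 * only when the command actually propagated, so offsets bumped by
 * periodic PINGs are never picked up via read commands. */
void updateClientWoff(client *c, int command_propagated) {
    if (command_propagated) c->woff = master_repl_offset;
}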

Comment on lines +3497 to +3499
/* Return the number of replicas that already acknowledged the specified
* replication offset being AOF fsynced. */
int replicationCountAOFAcksByOffset(long long offset) {
Member


Turns out it's possible to write a module or a Lua script that propagates to the AOF but doesn't propagate to the replication stream; see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.

That can result in two bad cases. The scenario: the user executes a command that only propagates to the AOF, then immediately issues a WAITAOF, and there are no further writes on the replication stream after that.

  1. if the last thing that happened on the replication stream is a PING (which increased the replication offset but won't trigger an fsync on the replica), then the client would hang forever (it will wait for a fack that the replica will never send, since it doesn't trigger any fsyncs).
  2. if the last thing that happened is a write command that got propagated properly, then WAITAOF will be released immediately, without waiting for an fsync (since the offset didn't change).

We need to decide if we want to handle that in some way, or ignore it as an abusive case (i'm not sure what the justification is for someone to explicitly propagate only to the AOF).

On the other hand, if the client asked WAITAOF to wait for remote AOF files, and propagated a command only to the local AOF, maybe it deserves to get blocked forever (even if there are further writes), since its write didn't go to the replicas and it waits for something that won't happen.

So the other scenario we should look at is a case where there's no replication, only a master with AOF, and a command that's explicitly only propagating to the AOF, and then the user uses WAITAOF to wait for the local AOF file.
in that case, with the current code, we won't increment the replication offset and the client will get released before the fsync. that's probably easy to solve (increment the replication offset in that scenario).

@oranagra
Member

oranagra commented Mar 9, 2023

@redis/core-team reminder to take a look / approve this one.

@madolson
Contributor

I generally dislike WAIT; I've seen a lot of individuals use it with a 3-node configuration and just wait for one replica. While that technically increases real-world durability, they can still lose writes to the last node. When they set up WAIT with 2 nodes, they inevitably have a prolonged outage when one of the replicas dies. Now we're just taking more dependencies on disks, which makes me think more people are going to slam into these sharp edges. The answer to durability is not just to write data to more disks, but to write it intelligently in a way that can be recovered.

With that said, I feel like that is kind of what Redis does, so I'm practically OK with it.

@oranagra
Member

PR approved in a core-team meeting.
@slavak maybe you wanna make a redis-doc PR?

@oranagra
Member

@oranagra oranagra merged commit 9344f65 into redis:unstable Mar 14, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Mar 15, 2023
WAITAOF was added in redis#11713; its return is an array.
But we forgot to handle WAITAOF in last_offset and last_numreplicas,
causing WAITAOF to return a WAIT-like reply.

Tests were added to validate that case (both WAIT and WAITAOF).
@slavak slavak deleted the waitaof_infra branch March 15, 2023 07:00
@slavak
Contributor Author

slavak commented Mar 15, 2023

@oranagra Will do.

oranagra pushed a commit that referenced this pull request Mar 15, 2023
WAITAOF was added in #11713; its return is an array.
But we forgot to handle WAITAOF in last_offset and last_numreplicas,
causing WAITAOF to return a WAIT-like reply.

Tests were added to validate that case (both WAIT and WAITAOF).
This PR also refactored processClientsWaitingReplicas a bit for better
maintainability and readability.
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Mar 15, 2023
There can be a situation that satisfies WAIT and then wrongly unblocks
WAITAOF, because we mix use of last_offset and last_numreplicas.

We update last_offset and last_numreplicas only when the condition
matches, i.e. when the output of either replicationCountAOFAcksByOffset or
replicationCountAcksByOffset is right.

In this case, we need to have separate last_ variables for each of
them. Added a last_aof_offset and last_aof_numreplicas for WAITAOF.

WAITAOF was added in redis#11713. Found while coding redis#11917.
A test was added to validate that case.
oranagra pushed a commit that referenced this pull request Mar 15, 2023
There can be a situation that satisfies WAIT and then wrongly unblocks
WAITAOF, because we mix use of last_offset and last_numreplicas.

We update last_offset and last_numreplicas only when the condition
matches, i.e. when the output of either replicationCountAOFAcksByOffset or
replicationCountAcksByOffset is right.

In this case, we need to have separate last_ variables for each of
them. Added a last_aof_offset and last_aof_numreplicas for WAITAOF.

WAITAOF was added in #11713. Found while coding #11917.
A test was added to validate that case.
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Mar 26, 2023
…YNC_ALWAYS

This PR fixes two bugs:

1. About the MP-AOF one: that code assumes that we started a new AOF file just
now (when we have a new base into which we're gonna incrementally write), but the
fact is that with multipart-aof, the fork done handler doesn't really affect the
incremental file being maintained by the parent process; there's no reason to re-issue
SELECT, and I suppose there's no reason to update any of the fsync metrics.
This should have been deleted with MP-AOF (introduced in redis#9788, 7.0).
The aof_fsync_offset it updates will cause us to miss an fsync in flushAppendOnlyFile
if we stop write commands in AOF_FSYNC_EVERYSEC.

2. About the AOF_FSYNC_ALWAYS one: it follows redis#6053 (the details are in redis#5985);
we also check AOF_FSYNC_ALWAYS to prevent users from switching from everysec to always.

We detected these problems with the WAITAOF tests (Redis 7.2), introduced in redis#11713.
oranagra pushed a commit that referenced this pull request Mar 29, 2023
…SYNC_ALWAYS (#11973)

This PR fixes several unrelated bugs that were discovered by the same set of tests
(the WAITAOF tests in #11713) and could make the `WAITAOF` test hang.

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`; that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometimes hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about `AOF_FSYNC_ALWAYS`, follows #6053
(the details are in #5985): we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
oranagra pushed a commit that referenced this pull request Apr 17, 2023
…SYNC_ALWAYS (#11973)


(cherry picked from commit cb17178)

Labels

  • release-notes (indication that this issue needs to be mentioned in the release notes)
  • state:major-decision (Requires core team consensus)
  • state:needs-doc-pr (requires a PR to redis-doc repository)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants