Implementing the WAITAOF command (issue #10505) #11713
oranagra merged 21 commits into redis:unstable
Conversation
oranagra
left a comment
depending on the type of interface we choose for WAITAOF maybe the race you presented is irrelevant?
i.e. if the caller explicitly asks to wait for AOF files on replicas, it should hang.
maybe another solution besides the one you presented in the commit comment could be to somehow bump pot_fsynced_reploff to the right offset (last one that existed) on the next fsync?
maybe we should discuss that on a call.
I don't think that would help if you already have clients waiting on a
Added code to keep track of replication offsets that are confirmed to have been fsynced to disk. The REPLCONF ACK command is extended to also report this offset, reported as `REPLCONF FACK <offset>`. This REPLCONF extension is backwards-compatible because the REPLCONF command handling terminates after encountering the ACK offset, so older Redises will simply ignore the unknown FACK that follows it.
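The backwards-compatible parsing described above can be sketched roughly as follows. This is an assumed illustration, not the actual Redis code: a master scans option/value pairs, and an old master that only understands `ACK` simply never looks at the trailing `FACK` pair.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical sketch of parsing "REPLCONF ACK <ofs> FACK <ofs>".
 * Unknown option names are skipped, so a trailing FACK pair is harmless
 * to a master that doesn't know about it. */
typedef struct {
    long long ack_offset;   /* offset acknowledged as received */
    long long fack_offset;  /* offset acknowledged as fsynced, -1 if absent */
} ReplAck;

ReplAck parse_replconf_ack(int argc, const char **argv) {
    ReplAck r = { -1, -1 };
    for (int i = 0; i + 1 < argc; i += 2) {
        if (!strcasecmp(argv[i], "ACK"))
            r.ack_offset = strtoll(argv[i + 1], NULL, 10);
        else if (!strcasecmp(argv[i], "FACK"))
            r.fack_offset = strtoll(argv[i + 1], NULL, 10);
        /* else: unknown option, ignore (mirrors REPLCONF leniency) */
    }
    return r;
}
```

A replica talking to an old master loses nothing: the `FACK` pair is ignored rather than rejected.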
This introduces infrastructure necessary to allow BIO workers to not have a 1-1 mapping of worker to job-type. This makes it possible in the future to assign multiple job types to a single worker, either as a performance/resource optimization, or as a way of enforcing ordering between specific classes of jobs.
- Replace the close+fsync job with a new BIO_CLOSE_AOF job that will also update the field.
- Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio worker thread, to impose ordering on their execution. This solves a race condition where a job could set `pot_fsynced_reploff` to a higher value than another pending fsync job, resulting in indicating an offset for which parts of the data have not yet actually been fsynced. Imposing an ordering on the jobs guarantees that fsync jobs are executed in increasing order of replication offset.
This should prevent a write race between updates to `pot_fsynced_reploff` in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and those done in the bio thread.
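One way to picture the race being prevented here is as a monotonic update to the pending fsynced offset. The sketch below is an assumed illustration (the name `fsynced_reploff_pending` is taken from the text; the actual Redis fix relies on job ordering rather than this exact loop): a compare-and-swap loop guarantees the offset can never move backwards even if the main thread and the bio thread race.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative sketch, not the actual Redis code: both the main thread
 * (appendfsync always) and the bio thread call this after an fsync. */
static _Atomic long long fsynced_reploff_pending = 0;

void set_fsynced_reploff_pending(long long offset) {
    long long cur = atomic_load(&fsynced_reploff_pending);
    /* Retry while our offset is still ahead of the stored one; a failed
     * CAS refreshes `cur` with the value written by the other thread. */
    while (cur < offset &&
           !atomic_compare_exchange_weak(&fsynced_reploff_pending,
                                         &cur, offset)) {
        /* loop */
    }
}

long long get_fsynced_reploff_pending(void) {
    return atomic_load(&fsynced_reploff_pending);
}
```

A stale fsync job completing late can thus never overwrite a newer offset with an older one.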
When this is the case, the `server.master_repl_offset` we usually rely on is not being updated, so we need an alternative means of tracking fsyncs. In this case, we use `pot_fsynced_reploff` as a basic "fsync epoch" counter instead, incrementing it each time the AOF is fsynced. This is inspired by antirez's original approach to the problem, here: redis/redis@unstable...wait-aof

This change introduces a possible race in a potential implementation of a `WAIT AOF` command:

1. Client executes `WAIT AOF` to wait for an fsync of its own server. Replication is currently enabled. A client waiting for fsync of the current `master_repl_offset` is added to the `clients_waiting_acks` list.
2. Suppose `server.pot_fsynced_reploff == server.master_repl_offset - n`.
3. We now disable replication.
4. Now every AOF fsync will simply increment `server.pot_fsynced_reploff`. As a result, the blocked client will unnecessarily wait `n` fsyncs before it is released.

To resolve this race, we might have to:

* When disabling replication, iterate over the `clients_waiting_acks` list and set relevant offsets to the value of `pot_fsynced_reploff`.
This reverts the previous commit that tracked fsync "epochs", as it was non-ideal and suffered from some serious bugs. Instead I've modified the code to continue incrementing the replication offset as long as AOF is configured, even with no replication backlog. This will allow us to rely on a single field for tracking AOF fsync in all scenarios, with or without replication.
* Revert refactor of if-else to switch to reduce diff size.
* Add missing initialization to `startAppendOnly` to resolve bug.
Iterating over the worker's entire jobs list wasn't very efficient. This commit adds some minimal bookkeeping to maintain a running counter per job type. This allows `bioPendingJobsOfType` to become an O(1) operation.
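As a rough sketch (assumed names and structure, not the actual Redis bio.c), the bookkeeping can look like this: one counter per job type, bumped on submit and dropped on completion, so the pending-count query never walks the job list.

```c
#include <assert.h>
#include <pthread.h>

/* Sketch of per-job-type counters making bioPendingJobsOfType O(1). */
enum { BIO_CLOSE_FILE, BIO_AOF_FSYNC, BIO_LAZY_FREE, BIO_NUM_OPS };

static pthread_mutex_t bio_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long bio_pending[BIO_NUM_OPS];

void bio_submit_job(int type) {
    pthread_mutex_lock(&bio_mutex);
    bio_pending[type]++;          /* job queued for some worker */
    pthread_mutex_unlock(&bio_mutex);
}

void bio_job_done(int type) {
    pthread_mutex_lock(&bio_mutex);
    bio_pending[type]--;          /* worker finished the job */
    pthread_mutex_unlock(&bio_mutex);
}

/* O(1): read the counter instead of iterating the jobs list. */
unsigned long bio_pending_jobs_of_type(int type) {
    pthread_mutex_lock(&bio_mutex);
    unsigned long n = bio_pending[type];
    pthread_mutex_unlock(&bio_mutex);
    return n;
}
```

Because the counter is keyed by job type rather than by worker, it keeps working when multiple job types share one worker thread.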
Syntax: WAITAOF <num_local> <num_replicas> <timeout>
Response: Array containing num_local, num_replicas.
Returns an error when called on replicas, or when called with non-zero num_local on a master with AOF disabled; in all other cases the response just contains the number of fsync copies.

Changes:
* Make sure the replica or master updates the fsynced offset at the end of the initial AOF rewrite (even if there are no additional writes that trigger a periodic fsync).
* Drain the pending fsync when starting over a new AOF to avoid race conditions with the previous AOF offsets overriding the new one.
* Implement the receiving side of `REPLCONF ACK <ofs> FACK <ofs>`, which will be appended only if there's an AOF on the replica, and ignored by old masters.
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI).

Minor notes:
* Rename pot_fsynced_reploff to fsynced_reploff_pending.
* When an initial rewrite ends, update fsynced_reploff right away; don't wait for the next beforeSleep.
@redis/core-team this is ready for review, and the top comment is updated.
The following steps may cause the
ackreplicas = replicationCountAOFAcksByOffset(c->woff);
acklocal = server.fsynced_reploff >= c->woff;
printf("server.fsynced_reploff: %lld, c->woff: %lld\n", server.fsynced_reploff, c->woff);
./src/redis-server --appendonly yes
./src/redis-server --port 6380
./src/redis-cli -h 127.0.0.1 -p 6380
slaveof 127.0.0.1 6379
while true; do echo "waitaof 1 0 0"; done | ./src/redis-cli -x

master logs:

115202:C 08 Feb 2023 12:16:17.615 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
115202:C 08 Feb 2023 12:16:17.615 # Redis version=255.255.255, bits=64, commit=909f091f, modified=1, pid=115202, just started
115202:C 08 Feb 2023 12:16:17.615 # Configuration loaded
115202:M 08 Feb 2023 12:16:17.615 * monotonic clock: POSIX clock_gettime
115202:M 08 Feb 2023 12:16:17.616 * Running mode=standalone, port=6379.
115202:M 08 Feb 2023 12:16:17.616 # Server initialized
115202:M 08 Feb 2023 12:16:17.616 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
115202:M 08 Feb 2023 12:16:17.616 * Reading RDB base file on AOF loading...
115202:M 08 Feb 2023 12:16:17.616 * Loading RDB produced by version 255.255.255
115202:M 08 Feb 2023 12:16:17.616 * RDB age 1225 seconds
115202:M 08 Feb 2023 12:16:17.616 * RDB memory usage when created 0.82 Mb
115202:M 08 Feb 2023 12:16:17.616 * RDB is base AOF
115202:M 08 Feb 2023 12:16:17.617 * Done loading RDB, keys loaded: 0, keys expired: 0.
115202:M 08 Feb 2023 12:16:17.617 * DB loaded from base file appendonly.aof.1.base.rdb: 0.000 seconds
115202:M 08 Feb 2023 12:16:17.617 * DB loaded from append only file: 0.000 seconds
115202:M 08 Feb 2023 12:16:17.617 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
115202:M 08 Feb 2023 12:16:17.617 * Ready to accept connections tcp
115202:M 08 Feb 2023 12:16:25.929 * Replica 127.0.0.1:6380 asks for synchronization
115202:M 08 Feb 2023 12:16:25.929 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c6eb517826ab88f5c97a2dd31195c079650a18d8', my replication IDs are 'd6b625e0f3fc09a312e26c775a5681df6f0691df' and '0000000000000000000000000000000000000000')
115202:M 08 Feb 2023 12:16:25.929 * Replication backlog created, my new replication IDs are '114682cd6adfd9801b4c0d3e02413e85234e6a84' and '0000000000000000000000000000000000000000'
115202:M 08 Feb 2023 12:16:25.929 * Delay next BGSAVE for diskless SYNC
server.fsynced_reploff: 0, c->woff: 0
....repeated the line above
server.fsynced_reploff: 0, c->woff: 14
115202:M 08 Feb 2023 12:16:30.651 * Starting BGSAVE for SYNC with target: replicas sockets
115202:M 08 Feb 2023 12:16:30.652 * Background RDB transfer started by pid 115369
115369:C 08 Feb 2023 12:16:30.653 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
115202:M 08 Feb 2023 12:16:30.654 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
115202:M 08 Feb 2023 12:16:30.659 * Background RDB transfer terminated with success
115202:M 08 Feb 2023 12:16:30.659 * Streamed RDB transfer with replica 127.0.0.1:6380 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
115202:M 08 Feb 2023 12:16:30.659 * Synchronization with replica 127.0.0.1:6380 succeeded

The main reason is that WAIT[AOF] waits for the repl offset of the previous command this client executed.
I should have seen this coming. One clear solution for that is replication stream multiplexing (tracked in #8440), where PINGs will not increase the replication offset. Looking for other advice.
/* Return the number of replicas that already acknowledged the specified
 * replication offset being AOF fsynced. */
int replicationCountAOFAcksByOffset(long long offset) {
Turns out it's possible to write a module and a Lua script that propagate to the AOF and don't propagate to the replication stream. See REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
That can result in two bad cases, scenario:
User executes command that only propagates to AOF, and then immediately issues a WAITAOF, and there's no further writes on the replication stream after that.
- if the last thing that happened on the replication stream is a PING (which increased the replication offset but won't trigger an fsync on the replica), then the client would hang forever (it will wait for an fack that the replica will never send since it doesn't trigger any fsyncs).
- if the last thing that happened is a write command that got propagated properly, then WAITAOF will be released immediately, without waiting for an fsync (since the offset didn't change)
We need to decide if we want to handle that in some way, or ignore it as an abusive case (I'm not sure what the justification is for someone to explicitly propagate only to AOF).
On the other hand, if the client asked WAITAOF to wait for remote AOF files, and propagated a command only to the local AOF, maybe it deserves to get blocked forever (even if there are further writes), since its write didn't go to replicas and it waits for something that wouldn't happen.
So the other scenario we should look at is a case where there's no replication, only a master with AOF, and a command that's explicitly only propagating to AOF, and then user uses WAITAOF to wait for the local AOF file.
in that case, with the current code, we won't increment the replication offset and the client will get released before the fsync. that's probably easy to solve (increment the replication offset in that scenario).
@redis/core-team reminder to take a look / approve this one.
I generally dislike wait, I've seen a lot of individuals use it with a 3 node configuration and just wait for one replica. While that technically increases the real world durability, they can still lose writes to the last node. When they set up wait with 2 nodes, they inevitably have a prolonged outage when one of the replicas dies. Now we're just taking more dependencies on disks, which makes me think more people are going to slam into these sharp edges. The answer to durability is not just to write data to more disks, but to write it intelligently in a way that can be recovered. With that said, I feel like that is kind of what Redis does, so practically OK with it.
FULL CI (of the one test unit): https://github.com/redis/redis/actions/runs/4418135344 |
@oranagra Will do.
WAITAOF was added in #11713 and its return is an array, but we forgot to handle WAITAOF in last_offset and last_numreplicas, causing WAITAOF to return a WAIT-like reply. Tests were added to validate that case (both WAIT and WAITAOF). This PR also refactored processClientsWaitingReplicas a bit for better maintainability and readability.
There can be a situation that satisfies WAIT but then wrongly unblocks WAITAOF, because we mix-use last_offset and last_numreplicas. We should update last_offset and last_numreplicas only when the condition matches, i.e. when the output of either replicationCountAOFAcksByOffset or replicationCountAcksByOffset is right. For this we need separate last_ variables for each of them, so a last_aof_offset and last_aof_numreplicas were added for WAITAOF. WAITAOF was added in #11713. Found while coding #11917. A test was added to validate that case.
…YNC_ALWAYS This PR fixes two bugs: 1. The MP-AOF one: that code assumes we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with multipart-aof, the fork done handler doesn't really affect the incremental file being maintained by the parent process; there's no reason to re-issue SELECT, and I suppose there's no reason to update any of the fsync metrics. This should have been deleted with MP-AOF (introduced in redis#9788, 7.0). The aof_fsync_offset it updates will cause us to miss an fsync in flushAppendOnlyFile if we stop write commands in AOF_FSYNC_EVERYSEC. 2. The AOF_FSYNC_ALWAYS one follows redis#6053 (the details are in redis#5985): we also check AOF_FSYNC_ALWAYS to prevent users from switching from everysec to always. We detected these problems with the WAITAOF test (redis 7.2), introduced in redis#11713.
…SYNC_ALWAYS (#11973) This PR fixes several unrelated bugs that were discovered by the same set of tests (WAITAOF tests in #11713) and could make the `WAITAOF` test hang. The change in `backgroundRewriteDoneHandler` is about MP-AOF. That leftover / old code assumes that we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with MP-AOF, the fork done handler doesn't really affect the incremental file being maintained by the parent process; there's no reason to re-issue `SELECT`, and no reason to update any of the fsync variables in that flow. This should have been deleted with MP-AOF (introduced in #9788, 7.0). The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync in `flushAppendOnlyFile`; that happens if we stop write commands in `AOF_FSYNC_EVERYSEC` while an AOFRW is in progress. This caused a new `WAITAOF` test to sometimes hang forever. Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset` and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared `aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be that the total AOF file is smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced. The change in `flushAppendOnlyFile`, about `AOF_FSYNC_ALWAYS`, follows #6053 (the details are in #5985): we also check `AOF_FSYNC_ALWAYS` to handle a case where appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
Implementing the WAITAOF functionality requested in issue #10505, which would allow the user to block until a specified number of Redises have fsynced all previous write commands to the AOF.
Syntax:
WAITAOF <num_local> <num_replicas> <timeout>

Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the replication offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero num_local on a master with AOF disabled; in all other cases the response just contains the number of fsync copies.
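The unblock condition behind this response can be sketched from the debug snippet quoted earlier (`acklocal = server.fsynced_reploff >= c->woff`). The names below are assumptions based on that snippet, not the actual Redis code: the local AOF counts as one fsynced copy once the fsynced replication offset reaches the client's write offset, and each replica whose FACKed offset reached it counts as another.

```c
#include <assert.h>

/* Hypothetical sketch of the WAITAOF unblock check. */
int waitaof_satisfied(long long fsynced_reploff,  /* local fsynced offset */
                      long long client_woff,      /* client's last write offset */
                      int fack_replicas,          /* replicas with FACK >= woff */
                      int num_local, int num_replicas) {
    int acklocal = fsynced_reploff >= client_woff; /* 0 or 1 */
    return acklocal >= num_local && fack_replicas >= num_replicas;
}
```

This also shows why num_local can only be 0 or 1: there is exactly one local AOF.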
Main changes:
This way we can use this command and its mechanisms even when replication is disabled.
`REPLCONF ACK <ofs> FACK <ofs>`, the FACK will be appended only if there's an AOF on the replica, and already ignored on old masters (thus backwards compatible).

Unrelated changes:
Implementation details:
`fsynced_reploff_pending` that's updated (usually by the bio thread) and later copied to the main `fsynced_reploff` variable (only if the AOF base file exists). I.e. during the initial AOF rewrite it will not be used as the fsynced offset since the AOF base is still missing.
Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio worker thread, to impose ordering on their execution. This solves a race condition where a job could set `fsynced_reploff_pending` to a higher value than another pending fsync job, resulting in indicating an offset for which parts of the data have not yet actually been fsynced. Imposing an ordering on the jobs guarantees that fsync jobs are executed in increasing order of replication offset.
`appendfsync` set to "always": This should prevent a write race between updates to `fsynced_reploff_pending` in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and those done in the bio thread.
with the previous AOF offsets overriding the new one (e.g. after switching to replicate from a new master).
a must in case there are no additional writes that trigger a periodic fsync, specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and don't propagate to the replication stream. See REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result in two bad cases.
The scenario is that the user executes a command that only propagates to AOF, and then immediately issues a WAITAOF, and there are no further writes on the replication stream after that.
Refactoring:
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Discussion:
[Note: currently this PR implements the most explicit approach of the options presented below]
In its most general form, the command can be used to wait for fsync by replicas, by the master (or only Redis if replication is not used), or some combination of the above. A few use-cases I can think of:
The simplest approach would be to extend the existing `WAIT` command with an optional `AOF` keyword. In which case we need to define whether `<num>` includes the master or not. If it does, use-cases 1 and 3 are covered, but use-case 2 (which would probably be a pretty common one) is impossible; if it does not, 2 is supported but 1 and 3 are not. This could be extended by giving "0" the special meaning of "wait for current Redis only" to enable use-case 1, but this might be somewhat surprising semantics.

Another option that crossed my mind is to use positive/negative numbers for the different meanings. e.g.: "-5" means "wait for any 5 Redises" while "5" means "wait for 5 replicas." This is simple and supports all 3 use-cases above, but might not be intuitive or particularly easy to remember.
Yet another possibility is of course to split this functionality into a separate command entirely, in which case we need to define the arguments and semantics of that new command.
Todo: