Skip to content

[CRASH] Failure to rebalance on 7.0.0-rc2 #11014

@tbream

Description

@tbream

Running redis 7.0.0-rc2 on ubuntu in AWS.

During a rebalance operation after adding nodes, this particular master containing these slots failed continuously on each rebalance attempt of the cluster.

=== REDIS BUG REPORT START: Cut & paste starting from here ===
9412:M 18 Jul 2022 15:59:55.278 # === ASSERTION FAILED ===
9412:M 18 Jul 2022 15:59:55.278 # ==> blocked.c:624 'server.also_propagate.numops == 0' is not true

------ STACK TRACE ------

Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](handleClientsBlockedOnKeys+0x254)[0x4cecb4]
/usr/local/bin/redis-server *:6379 [cluster](beforeSleep+0xe8)[0x446c58]
/usr/local/bin/redis-server *:6379 [cluster](aeProcessEvents+0x3aa)[0x442b2a]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x1d)[0x442d4d]
/usr/local/bin/redis-server *:6379 [cluster](main+0x409)[0x43ecc9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fa725305840]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x29)[0x43f099]

------ INFO OUTPUT ------
# Server
redis_version:6.9.242
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:50a29168b229999b
redis_mode:cluster
os:Linux 4.4.0-1060-aws x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:5.4.0
process_id:9412
process_supervised:no
run_id:5762aa4fe3d56afa0a26fa03a8dc80727c19652a
tcp_port:6379
server_time_usec:1658159995278425
uptime_in_seconds:7266858
uptime_in_days:84
hz:10
configured_hz:10
lru_clock:13992827
executable:/usr/local/bin/redis-server
config_file:/etc/redis/6379.conf
io_threads_active:0

# Clients
connected_clients:1341
cluster_connections:70
maxclients:150000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:20504
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:45638469256
used_memory_human:42.50G
used_memory_rss:57257009152
used_memory_rss_human:53.32G
used_memory_peak:59139479568
used_memory_peak_human:55.08G
used_memory_peak_perc:77.17%
used_memory_overhead:29487727300
used_memory_startup:9423688
used_memory_dataset:16150741956
used_memory_dataset_perc:35.40%
allocator_allocated:45639175592
allocator_active:56496828416
allocator_resident:57283624960
total_system_memory:133661978624
total_system_memory_human:124.48G
used_memory_lua:37888
used_memory_vm_eval:37888
used_memory_lua_human:37.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:37888
used_memory_vm_total:75776
used_memory_vm_total_human:74.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:106929582899
maxmemory_human:99.59G
maxmemory_policy:noeviction
allocator_frag_ratio:1.24
allocator_frag_bytes:10857652824
allocator_rss_ratio:1.01
allocator_rss_bytes:786796544
rss_overhead_ratio:1.00
rss_overhead_bytes:-26615808
mem_fragmentation_ratio:1.25
mem_fragmentation_bytes:11618329144
mem_not_counted_for_evict:3584
mem_replication_backlog:104858916
mem_total_replication_buffers:105206024
mem_clients_slaves:348424
mem_clients_normal:7086552
mem_cluster_links:76160
mem_aof_buffer:3584
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:10423

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:21839092757
rdb_bgsave_in_progress:0
rdb_last_save_time:1650893137
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:410
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:6788845568
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:120527186
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:433
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:1626
aof_last_write_status:ok
aof_last_cow_size:31108997120
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:16708212897
aof_base_size:15920075489
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:2

# Stats
total_connections_received:1493483
total_commands_processed:21833443018
instantaneous_ops_per_sec:4734
total_net_input_bytes:1661867530597
total_net_output_bytes:21996788022
instantaneous_input_kbps:881.31
instantaneous_output_kbps:1865.17
rejected_connections:0
sync_full:1
sync_partial_ok:1
sync_partial_err:1
expired_keys:1009194
expired_stale_perc:11.69
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:4354
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:18147550
keyspace_misses:4305787
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:792783
total_forks:1627
migrate_cached_sockets:1
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:1139
dump_payload_sanitizations:0
total_reads_processed:10690047223
total_writes_processed:34801822
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:27565
reply_buffer_expands:3151

# Replication
role:master
connected_slaves:2
min_slaves_good_slaves:2
slave0:ip=20.0.83.95,port=6379,state=online,offset=44091096078,lag=0
slave1:ip=20.0.83.12,port=6379,state=online,offset=44091048369,lag=0
master_failover_state:no-failover
master_replid:440b19326917cca6521c88c32af2eaa8336145c9
master_replid2:f7e7603168b44d0a9db4fdd96cf1b907beb3ea41
master_repl_offset:44091170424
second_repl_offset:43585485265
repl_backlog_active:1
repl_backlog_size:104857600
repl_backlog_first_byte_offset:43986311517
repl_backlog_histlen:104858908

# CPU
used_cpu_sys:137268.968000
used_cpu_user:180039.168000
used_cpu_sys_children:39060.212000
used_cpu_user_children:782172.496000
used_cpu_sys_main_thread:133114.124000
used_cpu_user_main_thread:179873.748000

# Modules

# Commandstats
cmdstat_quit:calls=125459,usec=50132,usec_per_call=0.40,rejected_calls=0,failed_calls=0
cmdstat_function|dump:calls=3,usec=10,usec_per_call=3.33,rejected_calls=0,failed_calls=0
cmdstat_config|get:calls=484333,usec=1952954,usec_per_call=4.03,rejected_calls=0,failed_calls=0
cmdstat_client|setname:calls=29306,usec=71317,usec_per_call=2.43,rejected_calls=0,failed_calls=0
cmdstat_psync:calls=2,usec=792987,usec_per_call=396493.50,rejected_calls=0,failed_calls=0
cmdstat_select:calls=7,usec=7,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_dbsize:calls=21,usec=20,usec_per_call=0.95,rejected_calls=0,failed_calls=0
cmdstat_readonly:calls=24241,usec=13403,usec_per_call=0.55,rejected_calls=0,failed_calls=0
cmdstat_replconf:calls=2367,usec=1441,usec_per_call=0.61,rejected_calls=0,failed_calls=0
cmdstat_get:calls=16108776,usec=52277307,usec_per_call=3.25,rejected_calls=137,failed_calls=0
cmdstat_ping:calls=838630,usec=269615,usec_per_call=0.32,rejected_calls=32,failed_calls=0
cmdstat_multi:calls=1095071,usec=216485,usec_per_call=0.20,rejected_calls=0,failed_calls=0
cmdstat_migrate:calls=228217,usec=44765074,usec_per_call=196.15,rejected_calls=0,failed_calls=2
cmdstat_del:calls=10735503943,usec=51089041306,usec_per_call=4.76,rejected_calls=0,failed_calls=0
cmdstat_info:calls=1098655,usec=72714853,usec_per_call=66.19,rejected_calls=0,failed_calls=0
cmdstat_cluster|nodes:calls=776,usec=85949,usec_per_call=110.76,rejected_calls=3,failed_calls=0
cmdstat_cluster|forget:calls=31,usec=2708,usec_per_call=87.35,rejected_calls=0,failed_calls=0
cmdstat_cluster|countkeysinslot:calls=4,usec=7,usec_per_call=1.75,rejected_calls=0,failed_calls=0
cmdstat_cluster|info:calls=484335,usec=20396392,usec_per_call=42.11,rejected_calls=11,failed_calls=0
cmdstat_cluster|getkeysinslot:calls=228229,usec=1237624,usec_per_call=5.42,rejected_calls=0,failed_calls=0
cmdstat_cluster|replicate:calls=2,usec=226,usec_per_call=113.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|setslot:calls=30,usec=109,usec_per_call=3.63,rejected_calls=0,failed_calls=0
cmdstat_cluster|meet:calls=1,usec=12,usec_per_call=12.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|reset:calls=1,usec=232112873,usec_per_call=232112880.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|slots:calls=4517,usec=338211,usec_per_call=74.88,rejected_calls=0,failed_calls=0
cmdstat_cluster|delslots:calls=2,usec=55,usec_per_call=27.50,rejected_calls=0,failed_calls=0
cmdstat_debug:calls=0,usec=0,usec_per_call=0.00,rejected_calls=2,failed_calls=0
cmdstat_set:calls=11076073412,usec=55942994477,usec_per_call=5.05,rejected_calls=952,failed_calls=0
cmdstat_restore-asking:calls=17576,usec=39680,usec_per_call=2.26,rejected_calls=0,failed_calls=0
cmdstat_exec:calls=1095071,usec=49235816,usec_per_call=44.96,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_ASK:count=161
errorstat_ERR:count=4
errorstat_LOADING:count=46
errorstat_MOVED:count=928

# Latencystats
latency_percentiles_usec_quit:p50=0.001,p99=6.015,p99.9=9.023
latency_percentiles_usec_function|dump:p50=3.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_config|get:p50=3.007,p99=13.055,p99.9=74.239
latency_percentiles_usec_client|setname:p50=2.007,p99=10.047,p99.9=22.015
latency_percentiles_usec_psync:p50=39.167,p99=796917.759,p99.9=796917.759
latency_percentiles_usec_select:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_dbsize:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_readonly:p50=1.003,p99=2.007,p99.9=9.023
latency_percentiles_usec_replconf:p50=1.003,p99=1.003,p99.9=6.015
latency_percentiles_usec_get:p50=3.007,p99=10.047,p99.9=21.119
latency_percentiles_usec_ping:p50=0.001,p99=4.015,p99.9=11.007
latency_percentiles_usec_multi:p50=0.001,p99=1.003,p99.9=10.047
latency_percentiles_usec_migrate:p50=214.015,p99=280.575,p99.9=2883.583
latency_percentiles_usec_del:p50=4.015,p99=19.071,p99.9=44.031
latency_percentiles_usec_info:p50=64.255,p99=116.223,p99.9=187.391
latency_percentiles_usec_cluster|nodes:p50=105.471,p99=218.111,p99.9=242.687
latency_percentiles_usec_cluster|forget:p50=89.087,p99=122.367,p99.9=122.367
latency_percentiles_usec_cluster|countkeysinslot:p50=2.007,p99=2.007,p99.9=2.007
latency_percentiles_usec_cluster|info:p50=40.191,p99=65.023,p99.9=79.359
latency_percentiles_usec_cluster|getkeysinslot:p50=5.023,p99=10.047,p99.9=15.039
latency_percentiles_usec_cluster|replicate:p50=87.039,p99=139.263,p99.9=139.263
latency_percentiles_usec_cluster|setslot:p50=3.007,p99=6.015,p99.9=6.015
latency_percentiles_usec_cluster|meet:p50=12.031,p99=12.031,p99.9=12.031
latency_percentiles_usec_cluster|reset:p50=1002438.655,p99=1002438.655,p99.9=1002438.655
latency_percentiles_usec_cluster|slots:p50=59.135,p99=225.279,p99.9=323.583
latency_percentiles_usec_cluster|delslots:p50=20.095,p99=35.071,p99.9=35.071
latency_percentiles_usec_set:p50=3.007,p99=25.087,p99.9=145.407
latency_percentiles_usec_restore-asking:p50=2.007,p99=11.007,p99.9=22.015
latency_percentiles_usec_exec:p50=40.191,p99=101.375,p99.9=137.215

# Cluster
cluster_enabled:1

# Keyspace
db0:keys=259699940,expires=259699940,avg_ttl=91233303

------ CLIENT LIST OUTPUT ------
...

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
sanitize-dump-payload no
repl-diskless-sync yes
slave-read-only yes
io-threads-do-reads no
lazyfree-lazy-server-del no
client-query-buffer-limit 1gb
proto-max-bulk-len 512mb
list-compress-depth 0
replica-read-only yes
lazyfree-lazy-user-flush no
io-threads 1
repl-diskless-load disabled
lazyfree-lazy-expire no
lazyfree-lazy-user-del no
activedefrag no
lazyfree-lazy-eviction no

------ FAST MEMORY TEST ------
9412:M 18 Jul 2022 15:59:55.339 # Bio thread for job type #0 terminated
9412:M 18 Jul 2022 15:59:55.339 # Bio thread for job type #1 terminated
9412:M 18 Jul 2022 15:59:55.339 # Bio thread for job type #2 terminated
*** Preparing to test memory region 893000 (2297856 bytes)
*** Preparing to test memory region 2554000 (135168 bytes)
*** Preparing to test memory region 7f95cc800000 (8388608 bytes)
*** Preparing to test memory region 7f95cd000000 (28169080832 bytes)
*** Preparing to test memory region 7f9c60200000 (2097152 bytes)
*** Preparing to test memory region 7f9c60600000 (8388608 bytes)
*** Preparing to test memory region 7f9c60e00000 (2097152 bytes)
*** Preparing to test memory region 7f9c61200000 (8388608 bytes)
*** Preparing to test memory region 7f9c61a00000 (46172995584 bytes)
*** Preparing to test memory region 7fa721cfd000 (8388608 bytes)
*** Preparing to test memory region 7fa7224fe000 (8388608 bytes)
*** Preparing to test memory region 7fa722cff000 (8388608 bytes)
*** Preparing to test memory region 7fa723500000 (8388608 bytes)
*** Preparing to test memory region 7fa723d00000 (22020096 bytes)
*** Preparing to test memory region 7fa7256ab000 (16384 bytes)
*** Preparing to test memory region 7fa7258c8000 (16384 bytes)
*** Preparing to test memory region 7fa7261f5000 (24576 bytes)
*** Preparing to test memory region 7fa726208000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions