Skip to content

[CRASH] While running the rebalancing command in a cluster mixture of 7.0.8 and 7.2.1 #12603

@salarali

Description

@salarali

Crash report

=== REDIS BUG REPORT START: Cut & paste starting from here ===
64949:M 22 Sep 2023 18:45:29.872 # === ASSERTION FAILED ===
64949:M 22 Sep 2023 18:45:29.872 # ==> cluster.c:4926 'nodes_for_slot != NULL' is not true

------ STACK TRACE ------

Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](clusterDelSlot+0xf4)[0x4f4984]
/usr/local/bin/redis-server *:6379 [cluster](clusterCommand+0x9f9)[0x4fb459]
/usr/local/bin/redis-server *:6379 [cluster](call+0x141)[0x465d61]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0xae8)[0x466f88]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xe1)[0x48a011]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x330)[0x48a530]
/usr/local/bin/redis-server *:6379 [cluster][0x578828]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0xe9)[0x45c659]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3d6)[0x451af6]
/lib64/libc.so.6(+0x3feb0)[0x7f5d45c3feb0]
/lib64/libc.so.6(__libc_start_main+0x80)[0x7f5d45c3ff60]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x25)[0x452245]

------ INFO OUTPUT ------
# Server
redis_version:7.2.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:72ee6a8e2f5852a7
redis_mode:cluster
os:Linux 6.1.52-71.125.amzn2023.x86_64 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:11.4.1
process_id:64949
process_supervised:systemd
run_id:953c8a90925e7690fd1ddf146f1e232e08602ee4
tcp_port:6379
server_time_usec:1695408329872592
uptime_in_seconds:742
uptime_in_days:0
hz:100
configured_hz:100
lru_clock:909513
executable:/usr/local/bin/redis-server
config_file:/etc/redis/6379.conf
io_threads_active:0
listener0:name=tcp,bind=*,bind=-::*,port=6379

# Clients
connected_clients:3
cluster_connections:12
maxclients:20000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0

# Memory
used_memory:3553432
used_memory_human:3.39M
used_memory_rss:8355840
used_memory_rss_human:7.97M
used_memory_peak:3707280
used_memory_peak_human:3.54M
used_memory_peak_perc:95.85%
used_memory_overhead:2606308
used_memory_startup:2123600
used_memory_dataset:947124
used_memory_dataset_perc:66.24%
allocator_allocated:2657992
allocator_active:3031040
allocator_resident:8732672
total_system_memory:267397464064
total_system_memory_human:249.03G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:246960619520
maxmemory_human:230.00G
maxmemory_policy:volatile-lfu
allocator_frag_ratio:1.14
allocator_frag_bytes:373048
allocator_rss_ratio:2.88
allocator_rss_bytes:5701632
rss_overhead_ratio:0.96
rss_overhead_bytes:-376832
mem_fragmentation_ratio:3.41
mem_fragmentation_bytes:5906024
mem_not_counted_for_evict:0
mem_replication_backlog:389580
mem_total_replication_buffers:389576
mem_clients_slaves:0
mem_clients_normal:62088
mem_cluster_links:12864
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:148
rdb_bgsave_in_progress:0
rdb_last_save_time:1695407587
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:397312
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:57
total_commands_processed:839
instantaneous_ops_per_sec:0
total_net_input_bytes:417210
total_net_output_bytes:561195
total_net_repl_input_bytes:0
total_net_repl_output_bytes:986
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:1
sync_partial_ok:0
sync_partial_err:1
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:3
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:246
total_forks:1
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:2
dump_payload_sanitizations:0
total_reads_processed:770
total_writes_processed:245
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:117
reply_buffer_expands:73
eventloop_cycles:77292
eventloop_duration_sum:415380
eventloop_duration_cmd_sum:13068
instantaneous_eventloop_cycles_per_sec:102
instantaneous_eventloop_duration_usec:3
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:ed8492580ce2ad290acea75cf1520bf0dd86be3b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:380143
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:380143

# CPU
used_cpu_sys:0.344065
used_cpu_user:0.458312
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000840
used_cpu_sys_main_thread:0.346191
used_cpu_user_main_thread:0.453631

# Modules

# Commandstats
cmdstat_asking:calls=2,usec=1,usec_per_call=0.50,rejected_calls=0,failed_calls=0
cmdstat_replconf:calls=523,usec=734,usec_per_call=1.40,rejected_calls=0,failed_calls=0
cmdstat_select:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|forget:calls=1,usec=49,usec_per_call=49.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|setslot:calls=1,usec=6,usec_per_call=6.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|nodes:calls=18,usec=1128,usec_per_call=62.67,rejected_calls=0,failed_calls=0
cmdstat_cluster|countkeysinslot:calls=2,usec=2,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|meet:calls=1,usec=17,usec_per_call=17.00,rejected_calls=0,failed_calls=0
cmdstat_restore-asking:calls=148,usec=1144,usec_per_call=7.73,rejected_calls=0,failed_calls=0
cmdstat_readonly:calls=40,usec=78,usec_per_call=1.95,rejected_calls=0,failed_calls=0
cmdstat_dbsize:calls=2,usec=2,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=19,usec=38,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_psync:calls=1,usec=63,usec_per_call=63.00,rejected_calls=0,failed_calls=0
cmdstat_evalsha:calls=0,usec=0,usec_per_call=0.00,rejected_calls=2,failed_calls=0
cmdstat_info:calls=80,usec=9803,usec_per_call=122.54,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_CLUSTERDOWN:count=2

# Latencystats
latency_percentiles_usec_asking:p50=0.001,p99=1.003,p99.9=1.003
latency_percentiles_usec_replconf:p50=1.003,p99=10.047,p99.9=25.087
latency_percentiles_usec_select:p50=3.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_cluster|forget:p50=49.151,p99=49.151,p99.9=49.151
latency_percentiles_usec_cluster|setslot:p50=6.015,p99=6.015,p99.9=6.015
latency_percentiles_usec_cluster|nodes:p50=59.135,p99=99.327,p99.9=99.327
latency_percentiles_usec_cluster|countkeysinslot:p50=0.001,p99=2.007,p99.9=2.007
latency_percentiles_usec_cluster|meet:p50=17.023,p99=17.023,p99.9=17.023
latency_percentiles_usec_restore-asking:p50=7.007,p99=23.039,p99.9=29.055
latency_percentiles_usec_readonly:p50=2.007,p99=6.015,p99.9=6.015
latency_percentiles_usec_dbsize:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_ping:p50=1.003,p99=8.031,p99.9=8.031
latency_percentiles_usec_psync:p50=63.231,p99=63.231,p99.9=63.231
latency_percentiles_usec_info:p50=127.487,p99=162.815,p99.9=182.271

# Cluster
cluster_enabled:1

# Keyspace
db0:keys=148,expires=148,avg_ttl=1339600

# Cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:7
cluster_size:3
cluster_current_epoch:35
cluster_my_epoch:0
cluster_stats_messages_ping_sent:725
cluster_stats_messages_pong_sent:686
cluster_stats_messages_meet_sent:1
cluster_stats_messages_sent:1412
cluster_stats_messages_ping_received:686
cluster_stats_messages_pong_received:726
cluster_stats_messages_received:1412
total_cluster_links_buffer_limit_exceeded:0

------ CLUSTER NODES OUTPUT ------
8d4910a3e12225982e1fe2ef85c2050fe2a3a644 172.30.74.6:6379@16379,,tls-port=0,shard-id=f3e80bd96b1406c4fc58519c97ee80060afdc6fd myself,master - 0 1695408325000 0 connected [3800-<-5ebca6e2424e3007d364b6aeeaf0ab0639e03e99]
6b9b491a28d0f186cf8f8e2ac718d2f74161f066 172.30.78.223:6379@16379,,tls-port=0,shard-id=7aa6f7c58b6540e890cb0ee2bce847c7975dc75f master - 0 1695408328000 32 connected 0-203 3277-3799 4097-4915 5735-6331 9788-9829 9831-13106
978b84b7d47f5df4efde141bb88f197da6fef2c2 172.30.75.17:6379@16379,,tls-port=0,shard-id=9438032f1b120effa2f8729573269ff25419a918 slave 5ebca6e2424e3007d364b6aeeaf0ab0639e03e99 0 1695408327000 34 connected
7fdd721c765e0bdf3eb81507a5bc77fe9e8bb3c3 172.30.85.245:6379@16379,,tls-port=0,shard-id=85a8651ae45c47446acb31c845eb02987cdb8e40 slave 6b9b491a28d0f186cf8f8e2ac718d2f74161f066 0 1695408329195 32 connected
09a30cccec92498c638c17909d6e0dab7415c8aa 172.30.83.247:6379@16379,,tls-port=0,shard-id=ca9f01669a79f43998ef055cc09165cfe989c5ac master - 0 1695408327180 31 connected 204-3276 4916-5734 8218-9787
5ebca6e2424e3007d364b6aeeaf0ab0639e03e99 172.30.90.72:6379@16379,,tls-port=0,shard-id=bc788d107abec337f8df48d87b99cf942afc8a2f master - 0 1695408326000 34 connected 3800-4096 6332-8217 9830 13107-16383
b73ffb1cf6f3a6f63f32bffa29f17f04c8950204 172.30.78.185:6379@16379,,tls-port=0,shard-id=046bf30e794bcd54cffda278f2a11b629e13992b slave 09a30cccec92498c638c17909d6e0dab7415c8aa 0 1695408328188 31 connected

------ CLIENT LIST OUTPUT ------
id=45 addr=127.0.0.1:64662 laddr=127.0.0.1:6379 fd=11 name= age=736 idle=9 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1928 events=r cmd=info user=default redir=-1 resp=2 lib-name= lib-ver=
id=58 addr=172.30.74.6:34048 laddr=172.30.74.6:6379 fd=12 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=97 qbuf-free=20377 argv-mem=62 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22502 events=r cmd=cluster|setslot user=default redir=-1 resp=2 lib-name= lib-ver=
id=59 addr=172.30.90.72:23524 laddr=172.30.74.6:6379 fd=13 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=0 obl=0 oll=0 omem=0 tot-mem=37760 events=r cmd=restore-asking user=default redir=-1 resp=2 lib-name= lib-ver=

------ CURRENT CLIENT INFO ------
id=58 addr=172.30.74.6:34048 laddr=172.30.74.6:6379 fd=12 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=97 qbuf-free=20377 argv-mem=62 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22502 events=r cmd=cluster|setslot user=default redir=-1 resp=2 lib-name= lib-ver=
argc: '5'
argv[0]: '"CLUSTER"'
argv[1]: '"SETSLOT"'
argv[2]: '"3800"'
argv[3]: '"node"'
argv[4]: '"8d4910a3e12225982e1fe2ef85c2050fe2a3a644"'

------ EXECUTING CLIENT INFO ------
id=58 addr=172.30.74.6:34048 laddr=172.30.74.6:6379 fd=12 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=97 qbuf-free=20377 argv-mem=62 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22502 events=r cmd=cluster|setslot user=default redir=-1 resp=2 lib-name= lib-ver=
argc: '5'
argv[0]: '"CLUSTER"'
argv[1]: '"SETSLOT"'
argv[2]: '"3800"'
argv[3]: '"node"'
argv[4]: '"8d4910a3e12225982e1fe2ef85c2050fe2a3a644"'

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
activedefrag yes
proto-max-bulk-len 512mb
lazyfree-lazy-user-del no
io-threads-do-reads yes
sanitize-dump-payload no
list-compress-depth 0
lazyfree-lazy-server-del no
lazyfree-lazy-user-flush no
client-query-buffer-limit 1gb
repl-diskless-load on-empty-db
replica-read-only yes
lazyfree-lazy-expire yes
slave-read-only yes
io-threads 6
lazyfree-lazy-eviction no
repl-diskless-sync yes

------ FAST MEMORY TEST ------
64949:M 22 Sep 2023 18:45:29.873 # Bio worker thread #0 terminated
64949:M 22 Sep 2023 18:45:29.873 # Bio worker thread #1 terminated
64949:M 22 Sep 2023 18:45:29.873 # Bio worker thread #2 terminated
64949:M 22 Sep 2023 18:45:29.873 # IO thread(tid:140038304155200) terminated
64949:M 22 Sep 2023 18:45:29.873 # IO thread(tid:140038293665344) terminated
64949:M 22 Sep 2023 18:45:29.873 # IO thread(tid:140038283175488) terminated
64949:M 22 Sep 2023 18:45:29.873 # IO thread(tid:140038272685632) terminated
64949:M 22 Sep 2023 18:45:29.874 # IO thread(tid:140038262195776) terminated
*** Preparing to test memory region 725000 (2273280 bytes)
*** Preparing to test memory region 2274000 (135168 bytes)
*** Preparing to test memory region 7f5d319f8000 (10485760 bytes)
*** Preparing to test memory region 7f5d323f9000 (10485760 bytes)
*** Preparing to test memory region 7f5d32dfa000 (10485760 bytes)
*** Preparing to test memory region 7f5d337fb000 (10485760 bytes)
*** Preparing to test memory region 7f5d37400000 (6291456 bytes)
*** Preparing to test memory region 7f5d37b80000 (2621440 bytes)
*** Preparing to test memory region 7f5d45400000 (8388608 bytes)
*** Preparing to test memory region 7f5d45dfb000 (53248 bytes)
*** Preparing to test memory region 7f5d45eb3000 (20480 bytes)
*** Preparing to test memory region 7f5d45efc000 (4096 bytes)
*** Preparing to test memory region 7f5d45fe1000 (4096 bytes)
*** Preparing to test memory region 7f5d460c2000 (8192 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

Additional information

  1. Amazon Linux 2023, x86
  2. Steps to reproduce (if any)

I have 6 nodes currently in a clustered state, running at 7.0.8 version. I added 1 more node, 7.2.1. When I tried to migrate the data over, got this crash, while running the following command

redis-cli --cluster rebalance 172.30.74.6:6379 --cluster-use-empty-masters --cluster-weight 5ebca6e2424e3007d364b6aeeaf0ab0639e03e99=0 --cluster-pipeline 1000

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions