Add config set command to modify corresponding redis nodes dynamically#95
Add config set command to modify corresponding redis nodes dynamically#95doyoubi merged 4 commits intoeleme:masterfrom
config set command to modify corresponding redis nodes dynamically#95Conversation
|
Could you add some tests for this feature please? |
|
Thanks for the pull request. |
|
config.node need by modified in config_add, update_slots and setremotes. all reading stages accquire a read lock using a rwlock, is this acceptable? |
|
When worker threads are processing commands, they will choose a redis connection and may access |
|
fine. I will make a lock-free implement then. |
|
now it's lock free impl. is it necessary to write a separately test? |
src/command.c
Outdated
| const size_t CMD_NUM = sizeof(cmds) / sizeof(struct cmd_item); | ||
| static struct dict command_map; | ||
| //need not destroy in linux | ||
| static pthread_mutex_t lock_setremotes = PTHREAD_MUTEX_INITIALIZER; |
There was a problem hiding this comment.
Even if it's ok not to destroy mutex, I think we should do it explicitly. Not destroying it is confusing.
src/command.c
Outdated
|
|
||
| if (strcasecmp(type, "INFO") == 0) { | ||
| return cmd_proxy_info(cmd); | ||
| } else if (strcasecmp(type, "SETREMOTES") == 0) { |
There was a problem hiding this comment.
Consider using the CONFIG SET command. It's more intuitive to use redis command to telll the user what exactly this command do. In fact I didn't realize what SETREMOTES is until I read the code.
There was a problem hiding this comment.
sure to change this?
because the PR in redisctl almost be merged so we must decide it quickly.
CONFIG SET NODE HOST1 PORT1 HOST2 PORT2 ... ?
There was a problem hiding this comment.
this cmd is implemented to support operations. It's mainly for REDIS-CTL.
There was a problem hiding this comment.
Yes, because later we may let more config options changable in run time. I think we should keep all this to only one CONFIG SET command.
There was a problem hiding this comment.
ok, no problem
the cmd's syntax will be "CONFIG SET NODE ip:port,ip1:port1", is that ok?
and i want to figure out if there is a method to check "cluster_ok" in proxy, need by redisctl.
There was a problem hiding this comment.
CONFIG SET NODE ip port ip port... is easier to parse. And I suggest NODE should not be case sensitive.
What does cluster_ok mean?
There was a problem hiding this comment.
"CONFIG SET NODE ip:port,ip1:port1" is that i want to reuse config_add and easy to understand comparing to config syntax, is this ok? Maybe we will reuse more in config_add when we add more functions.
there's a polling monitor in REDIS-CTL and it tries to fetch the status of cluster in proxy. The value will turn to false when cluster is down or update-slots-thread fails.
There was a problem hiding this comment.
- OK.
- Corvus doesn't check the status of cluster.
There was a problem hiding this comment.
- Ok. Maybe i will always set this to "ok" for corvus proxy.
src/command.c
Outdated
| //lock global | ||
| pthread_mutex_lock(&lock_setremotes); | ||
| if (data->elements % 2 != 0 || data->elements < 4) { | ||
| cmd_mark_fail(cmd, rep_addr_err); |
There was a problem hiding this comment.
Please correct your indention.
src/command.c
Outdated
| return cmd_proxy_info(cmd); | ||
| } else if (strcasecmp(type, "SETREMOTES") == 0) { | ||
| //lock global | ||
| pthread_mutex_lock(&lock_setremotes); |
There was a problem hiding this comment.
The critical section here is too large. A smaller one provides better performance and makes it more easier to understand what it protects.
src/command.c
Outdated
| } | ||
| if (pos_to_str(&data->element[i + 1].pos, port_s) == CORVUS_ERR) { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
When the input is illegal, reject it with error message instead of just ignoring it.
src/command.c
Outdated
| } | ||
| { | ||
| struct node_conf *newnode = cv_malloc(sizeof(struct node_conf)); | ||
| memset(newnode, 0, sizeof(struct node_conf)); |
There was a problem hiding this comment.
I think cv_calloc is better.
src/connection.c
Outdated
| for (i = 0; i < config.node.len; i++) { | ||
| server = conn_get_server_from_pool(ctx, &config.node.addr[i], false); | ||
| struct node_conf *node = config.node; | ||
| conf_node_inc_ref(node); |
There was a problem hiding this comment.
Note that there is race condition here. After you get node, other threads may call conf_node_dec_ref and destroy it which will cause corruption in conf_node_inc_ref(node). We do need lock here and this is not likely to bring notable performance problem since the critical section here is so small.
There was a problem hiding this comment.
to avoid locking, is it possible to increase refcount first then get pointer?
{
ATOMIC_INC(ref)
return ref
}
There was a problem hiding this comment.
No... Before you increase reference count, you have to get its address first. But between them, other threads may free the content of that address. ATOMIC_INC does not guarantee the address is safe to use.
There was a problem hiding this comment.
Done with a mutex lock and change inc() to return a pointer
| struct context *contexts = get_contexts(); | ||
|
|
||
| memcpy(&config.node, &conf, sizeof(config.node)); | ||
| config.node = cv_malloc(sizeof(struct node_conf)); |
There was a problem hiding this comment.
Have you used valgrind to check whether there is memory leak in test program?
Please add more complete tests.
There was a problem hiding this comment.
yeah, checked for newest version and change all malloc to calloc
15393e3 to
8d187eb
Compare
|
hi, there is a new feature is needed to support running in dockers: is that in the corvus road map? and surely some basic configurations' modifying in runtime can be made together. |
|
We originally don't have plans for overriding configurations on the fly. But it's indeed a must have feature for running in docker etc. It's quite honoured to have your pull request. In regard to hot configuration modifying, we currently lack enough man power to fully test this feature, so it'll be sometime before we're fully confident to merge this request. But
seems a simple feature, your pull request is quite welcome. |
|
@wooparadog |
|
when would this PR be merged ... Now corvus automatically discovers redis nodes, and when I use |
|
@tonicbupt what kind of cluster do you deploy? generally multi (M-S) or more slaves in a "slot group" may not cause such problem. The slave of the broken master should become master automatically. And once a request failed corvus would to update the mapping once. @wooparadog am i right? I just say it according to the debug log and i remember having done such experiment before. |
|
Corvus will redirect and retry when receive |
|
@jasonjoo2010 here's how I reproduce this issue: |
|
@tonicbupt In your case corvus will not redirect |
|
@doyoubi |
|
@doyoubi maybe i know what tonic is saying about. when he remote a node from cluster using ruskit, the node removed is still accessable (no conn err and no redirect error), so corvus will not retry during this request. All other request will succeed. maybe corvus only update its map with a update thread timeout(does it exists?) |
|
@tonicbupt I reproduce your case and get |
|
@doyoubi actually I mean after removing nodes, |
|
@tonicbupt Of course, corvus will find the new node by redirecting according to |
|
@jasonjoo2010 Hey, Just in case that you forgot this. This feature is officially in our schedule, there are still changes requested, if you are not active on this PR anymore, we'll continue your work by ourselves. |
|
@tevino Or is there any more reviews? |
doyoubi
left a comment
There was a problem hiding this comment.
Thanks to your contribution. Sorry for leaving this PR for such a long time.
src/command.c
Outdated
| const size_t CMD_NUM = sizeof(cmds) / sizeof(struct cmd_item); | ||
| static struct dict command_map; | ||
|
|
||
| static pthread_mutex_t lock_config = PTHREAD_MUTEX_INITIALIZER; |
There was a problem hiding this comment.
Is this lock used to guarantee that there is always only one thread modifying the config.node? I think we can do this in another place. See the reason below.
src/command.c
Outdated
| pthread_mutex_lock(&lock_config); | ||
| if (strcasecmp(option, "NODE") == 0) { | ||
| // config set node host:port,host1:port1 | ||
| char value[data->element[3].pos.str_len + 1]; |
There was a problem hiding this comment.
Here the length of element may be less than 4. I think we should add check before this.
src/command.c
Outdated
| if (data->elements != 4) { | ||
| cmd_mark_fail(cmd, rep_config_parse_err); | ||
| } else if (data->element[3].pos.str_len | ||
| < 9|| pos_to_str(&data->element[3].pos, value) != CORVUS_OK) { |
There was a problem hiding this comment.
Would you add some comment about that 9?
src/corvus.c
Outdated
| config.bind = 12345; | ||
| memset(&config.node, 0, sizeof(struct node_conf)); | ||
| config.node = cv_calloc(1, sizeof(struct node_conf)); | ||
| memset(config.node, 0, sizeof(struct node_conf)); |
src/corvus.c
Outdated
| if (socket_parse_ip(p, &config.node.addr[config.node.len]) == -1) { | ||
| cv_free(config.node.addr); | ||
| return -1; | ||
| addr = cv_realloc(addr, sizeof(struct address) * (addr_cnt + 1)); |
There was a problem hiding this comment.
Please correct your indention.
src/corvus.c
Outdated
| newnode->len = addr_cnt; | ||
| newnode->refcount = 1; | ||
| struct node_conf *oldnode = config.node; | ||
| config.node = newnode; |
There was a problem hiding this comment.
Actually we should add lock to protect the swapping pointers here. This is corresponding to the locking in conf_node_inc_ref;
Think about this case:
(1) oldnode get its value.
(2) conf_node_dec_ref is called somewhere else and destroy this oldnode.
Well, even though once initialization, config.node will only be modified by cmd_config protected by lock_config, conf_node_dec_ref also exists in conn_get_raw_server which may be confusing - Is this place will free the current config.node?
Further the the critical section protected by lock_config is a little bit long.
There was a problem hiding this comment.
node_dec_ref has to be reserved to make things simple if you just take it as "i dont want to use it anymore".
lock_config has been removed and it considered to be a config cmd lock. but no logic cannot be multi-thead currently i just remove it now.
i add same lock in pointer swapping.
| void conf_node_dec_ref(struct node_conf *node) | ||
| { | ||
| pthread_mutex_lock(&lock_conf_node); | ||
| int refcount = ATOMIC_DEC(node->refcount, 1); |
There was a problem hiding this comment.
I think we can safely remove the lock here. Thanks to the atomic operation multiple calling of conf_node_dec_ref will end up with only one zero refcount;
There was a problem hiding this comment.
yeah, so just remove it
src/command.c
Outdated
| return CORVUS_ERR; | ||
| } | ||
| if (strcasecmp(type, "SET") == 0) { | ||
| // `config set` generelly need global lock |
There was a problem hiding this comment.
This comment is outdated now.
src/corvus.c
Outdated
| addr_cnt++; | ||
| p = strtok(NULL, ","); | ||
| } | ||
| { |
There was a problem hiding this comment.
Remember to correct your indention.
There was a problem hiding this comment.
And remember to fix conflicts.
There was a problem hiding this comment.
What is that new PR mainly about? Is that a new feature?
I recommend fixing the conflicts here and creating a separate PR for new features.
There was a problem hiding this comment.
yeah, conflicts have been resolved
There was a problem hiding this comment.
@doyoubi OK, now
did these slowly due to poor internet trafic
src/corvus.c
Outdated
| pthread_mutex_lock(&lock_conf_node); | ||
| struct node_conf *oldnode = config.node; | ||
| config.node = newnode; | ||
| conf_node_dec_ref(oldnode); |
There was a problem hiding this comment.
conf_node_dec_ref(oldnode); can be safely moved outside of the critical section.
src/slot.c
Outdated
| memcpy(&node_list.nodes[i], &node->addr[i], sizeof(struct address)); | ||
| node_list.len++; | ||
| } | ||
| conf_node_dec_ref(node); |
There was a problem hiding this comment.
Please move line 40 and 46 outside of the critical section.
src/command.c
Outdated
| } | ||
| } | ||
| } else { | ||
| cmd_mark_fail(cmd, rep_addr_err); |
There was a problem hiding this comment.
I think what you want here is rep_config_err.
8269e60 to
a1642dd
Compare
| cmd_mark_fail(cmd, rep_addr_err); | ||
| } | ||
| } else if (strcasecmp(type, "GET") == 0) { | ||
| if (strcasecmp(option, "NODE") == 0) { |
There was a problem hiding this comment.
We'd better check data->elements here.
There was a problem hiding this comment.
aha, sorry about changes after merging.
I change the assert for elements judging ensure that.
proxy setremotes commandconfig set command to modify corresponding redis nodes dynamically
|
Thanks for your contribution @jasonjoo2010 . We will finally merge this PR after we finish testing. |
|
@jasonjoo2010 Thanks for your work! 😋 |
|
🍻 |
for redis-ctl monitor system
add setremotes support for op purpose