Random Smooth Weighted Round Robin
Nginx's Smooth Weighted Round Robin (SWRR) algorithm exhibits an initial herding effect: in cluster deployments, the multiple machines in the cluster, and the multiple WORKER processes on each machine, all make the same scheduling choices.
The idea here is to introduce some randomness on top of SWRR to break these runs of identical selections, working around SWRR's inability to perceive the scheduling decisions made by other instances.
If O(1) scheduling lookups are needed, it suffices to swap this randomized version into the scheduling-sequence-table computation of the VNSWRR algorithm.
```c
for (peer = rrp->peers->peer, i = 0; peer; peer = peer->next, i++) {
    if (peer->down) { continue; }
    if (peer->max_fails && peer->fails >= peer->max_fails
        && now - peer->checked <= peer->fail_timeout) { continue; }
    if (peer->max_conns && peer->conns >= peer->max_conns) { continue; }
    peer->current_weight += peer->effective_weight;
    total += peer->effective_weight;
    if (peer->effective_weight < peer->weight) {
        peer->effective_weight++;
    }
    if (best == NULL || peer->current_weight > best->current_weight) {
        best = peer;
        p = i;
    }
}
best->current_weight -= total;
return best;
```

This is the core of Nginx's SWRR scheduling implementation. In each round it adds every RS's effective weight to its current weight, picks the node with the largest current weight, and finally subtracts the total weight from `best`.
Two details of this loop deserve attention: the total weight is re-accumulated on every full traversal, and the effective weight itself changes over time.
```c
total += peer->effective_weight;
if (peer->effective_weight < peer->weight) {
    peer->effective_weight++;
}
```

- `peer->weight`: the user-configured weight.
- `peer->effective_weight`: the dynamic effective weight, initialized to `weight`. In Nginx's design it is decreased when communication with the RS fails, reducing the chance of continuing to forward to that node, and then it gradually recovers.
Refer to the initialization in `ngx_http_upstream_init_round_robin`, which assigns `peer.weight = peer.effective_weight = server.weight`.
At system initialization, setting every `effective_weight` to 1 both mitigates SWRR's herding effect and amplifies the effect of the randomness introduced below.
Two approaches to introducing the random factor are considered. They are equivalent in terms of randomness; the difference lies in where the random-number overhead is paid:

- Add a random tie-break when selecting `best`: when `peer->current_weight == best->current_weight`, decide randomly whether to perform `best = peer`. The overhead is one random number generated and compared on every scheduling decision.
- Shuffle the order of the candidate RS list, so that when several RS share the same current weight, the selection is randomized by the random traversal order. The overhead is paid in the scheduler's preparation stage, where the list can be shuffled in O(N) with Fisher-Yates or a similar method.
Here, the random factor is introduced through the random-threshold comparison.
```c
peer.weight = server.weight;
peer.effective_weight = 1;
random.seed(lb_sync_ip:tid);  /* different in LBs and WORKERs */

for all RS {
    peer->current_weight += peer->effective_weight;
    total += peer->effective_weight;
    if (peer->effective_weight < peer->weight) {
        peer->effective_weight++;
    }
    if (best == NULL
        || peer->current_weight > best->current_weight
        || (peer->current_weight == best->current_weight
            && random.next() > THRESHOLD))
    {
        best = peer;
    }
}
best->current_weight -= total;
return best;
```

In the initialization we set `effective_weight` to 1 and `weight` to the configured value.
Random numbers then enter the comparison: when the current weights of the two RS being compared are equal, `best` is replaced only when the random draw exceeds the threshold, which perturbs the scheduling order. Everything else follows Nginx: `effective_weight` recovers step by step, and the winner's current weight is decremented by the total.
The resulting scheduling sequence splits into two phases:
- a growth phase, while the total weight is still increasing toward the configured sum
- a stable phase, once the total weight has reached it
We take three servers as an example, with configured weights 2, 3, and 4.
In the tables below, `[current_weight]` marks the RS selected in that round: either it has the unique largest current weight, or it is one of several tied maxima and happens to win the random tie-break. `(total_weight)` is the total weight accumulated in that round. The tables show one possible outcome of the random draws.
Each round shows `eff` (the effective weight), `carry` (the current weight entering the round, i.e. after the previous winner's subtraction), and `cur` (the current weight after adding `eff`, from which the winner is chosen); round 1 has no carry column since all current weights start at 0.

| Server | eff | cur | eff | carry | cur | eff | carry | cur | eff | carry | cur |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A(2) | 1 | [1] | 2 | -2 | 0 | 2 | 0 | 2 | 2 | 2 | [4] |
| B(3) | 1 | 1 | 2 | 1 | [3] | 3 | -3 | 0 | 3 | 0 | 3 |
| C(4) | 1 | 1 | 2 | 1 | 3 | 3 | 3 | [6] | 4 | -2 | 2 |
| total | | (3) | | | (6) | | | (8) | | | (9) |
While the total weight is still growing toward the configured sum, the scheduling selection is A-B-C.
During this growth phase the initial scheduling results are effectively perturbed and evenly spread: since every RS starts with the same effective weight, any of them can win the early rounds.
The stable phase then looks as follows. Each scheduling step occupies two columns: the current weights after adding the effective weights (step *n*, where the bracketed maximum is selected), then after subtracting the total from the winner (step *n*′); the last column shows the first selection of the next cycle.

| Server | 1 | 1′ | 2 | 2′ | 3 | 3′ | 4 | 4′ | 5 | 5′ | 6 | 6′ | 7 | 7′ | 8 | 8′ | 9 | 9′ | next |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A(2) | [4] | -5 | -3 | -3 | -1 | -1 | 1 | 1 | 3 | 3 | [5] | -4 | -2 | -2 | 0 | 0 | 2 | 2 | [4] |
| B(3) | 3 | 3 | [6] | -3 | 0 | 0 | 3 | 3 | [6] | -3 | 0 | 0 | 3 | 3 | [6] | -3 | 0 | 0 | 3 |
| C(4) | 2 | 2 | 6 | 6 | [10] | 1 | [5] | -4 | 0 | 0 | 4 | 4 | [8] | -1 | 3 | 3 | [7] | -2 | 2 |
Once the weights have grown to the configured values, the algorithm runs like normal SWRR; the selection over one full cycle is A-B-C-C-B-A-C-B-C.
The resulting complete schedule is quite well spread.
The advantage of this scheme is that the logic is simple and it breaks identical scheduling decisions across the cluster; it does not, however, provide the precomputed O(1) scheduling of VNSWRR.
The other way to achieve randomness is to shuffle the order of the server nodes before scheduling, so that when several nodes share the same current weight, the one encountered first, and therefore selected, is random.
The advantage of this approach is that the random work is performed once per shuffle rather than at every comparison, which reduces the overhead of random number generation.
When VNSWRR-style precomputation is used to reach O(1) scheduling, the shuffle scheme is the better fit: once the algorithm reaches its stable state it settles into a fixed scheduling-sequence cycle, so the sequence table does not have to be recomputed over and over, as continual random-threshold draws would force.
To reduce VNSWRR's precomputation overhead, a step-by-step calculation method may be introduced. To shorten the first step, `effective_weight` can be initialized to the minimum configured weight instead of 1. This reduces the length of the first step from `max_weight - 1` to `max_weight - min_weight`, and since every `effective_weight` still starts at the same value, the goal of introducing randomness is preserved.