Add dqn and ac for VM scheduling #358

Closed
MicrosoftHam wants to merge 3 commits into microsoft:v0.2_rl_refinement from MicrosoftHam:v0.2_rl_refinement_vm

Conversation

@MicrosoftHam
Contributor

Description

Implement reinforcement learning algorithm for VM scheduling simulation.

Linked issue(s)/Pull request(s)

Type of Change

  • Non-breaking bug fix
  • Breaking bug fix
  • New feature
  • Test
  • Doc update
  • Docker update

Related Component

  • Simulation toolkit
  • RL toolkit
  • Distributed toolkit

Has Been Tested

  • OS:
    • Windows
    • Mac OS
    • Linux
  • Python version:
    • 3.6
    • 3.7
  • Key information snapshot(s):

Needs Follow Up Actions

  • New release package
  • New docker image

Checklist

  • Add/update the related comments
  • Add/update the related tests
  • Add/update the related documentations
  • Update the dependent downstream modules usage

@Jinyu-W Jinyu-W requested review from Jinyu-W and ysqyang June 10, 2021 09:38

action_prob = Categorical(self.forward(states, critic=False)[0] * legal_action) # (batch_size, action_space_size)
action = action_prob.sample()
log_p = action_prob.log_prob(action)
Contributor

I'm wondering whether we should compute log_p from the action_prob without multiplying by legal_action.

Similar concern as the reward-shaping case we discussed a few days ago (when only postpone is valid but the model output prefers some PM).
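To make the reviewer's question concrete, here is a minimal sketch (the numbers are purely illustrative, not from this PR) of how masking before constructing the `Categorical` changes the log-probability versus computing it from the raw policy output:

```python
import torch
from torch.distributions import Categorical

# Hypothetical example: 3 actions, action 2 is illegal.
probs = torch.tensor([0.5, 0.3, 0.2])
legal = torch.tensor([1.0, 1.0, 0.0])

# Masked distribution: Categorical renormalizes the masked probs.
masked_dist = Categorical(probs * legal)
# Unmasked distribution, i.e. the raw policy output.
raw_dist = Categorical(probs)

action = torch.tensor(0)
log_p_masked = masked_dist.log_prob(action)  # log(0.5 / 0.8)
log_p_raw = raw_dist.log_prob(action)        # log(0.5)
```

The gradient flowing into the policy differs between the two, which is the substance of the question: whether the policy-gradient term should be taken with respect to the renormalized legal distribution or the raw one.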

class SequenceNet(AbsBlock):
"""Fully connected network with optional batch normalization, activation and dropout components.

Args:
Contributor

The docstring does not correspond to the actual parameters. Absence is better than errors.

from maro.rl import AbsBlock


class SequenceNet(AbsBlock):
Contributor

Where is this used?

from maro.rl import AbsBlock


class SequenceNet(AbsBlock):
Contributor

There doesn't seem to be a need to subclass AbsBlock. Can you try inheriting from AbsCoreModel and use PM and VM as components?

self.component["critic"](states) if critic else None
)

def get_action(self, states, legal_action, training=True):
Contributor

Can you make legal_action a part of states so we don't have to change the call interface? There is no restriction on what type states should be in forward.

Contributor Author

Use a tuple (states, legal_action) as the states input to the function.
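The pattern the author describes can be sketched as follows; the class and attribute names here are hypothetical stand-ins for the PR's model, not its actual code:

```python
import torch
from torch.distributions import Categorical

class ACModelSketch:
    """Hypothetical sketch: forward accepts a (state, legal_action) tuple,
    so get_action keeps the original single-argument interface."""

    def forward(self, states, critic=False):
        state_tensor, legal_action = states  # unpack inside the model
        # Placeholder policy head: uniform logits over the action space.
        logits = torch.ones(state_tensor.shape[0], legal_action.shape[-1])
        probs = torch.softmax(logits, dim=-1) * legal_action
        return probs, None

    def get_action(self, states):
        action_prob = Categorical(self.forward(states)[0])
        action = action_prob.sample()
        return action, action_prob.log_prob(action)
```

Since `forward` places no restriction on the type of `states`, packing the mask into the state avoids changing every caller of `get_action`.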

self._skip_connection = skip_connection

# build the pm sequence net
pm_dims = [self._pm_input_dim*self._pm_num] + self._hidden_dims[:2]
Contributor

pm_dims = [self._pm_input_dim * self._pm_num] + self._hidden_dims[:2]

self._name = name

def forward(self, x):
pm_info_input = x[:, :self._pm_state_dim].view(-1, self._pm_window_size, self._pm_num * self._pm_input_dim)
Contributor

why previous _pm_state_dim? (why not x.view(...))

# self._pm_sequence_rnn.flatten_parameters()
# pm_sequence_feature, _ = self._pm_sequence_rnn(pm_info_feature)

vm_info_input = x[:, -self._vm_state_dim:].view(-1, self._vm_window_size, self._vm_input_dim)
Contributor

similar question to the pm one

log_p_new = torch.clamp(log_p_new, min=-20)

if self.config.clip_ratio is not None:
ratio = torch.exp(log_p_new - log_p)
Contributor

why design the ratio like this? what's the meaning of it?

Contributor Author

to use the PPO algorithm
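For context, `exp(log_p_new - log_p)` is the PPO importance ratio pi_new(a|s) / pi_old(a|s). A minimal sketch of the standard clipped surrogate follows (the function name and default clip value are illustrative, not from this PR); note that in the standard form the *clamped ratio* multiplies the advantages, whereas the snippet above multiplies `clip_ratio * advantages` directly:

```python
import torch

def ppo_actor_loss(log_p_new, log_p_old, advantages, clip_ratio=0.2):
    # Importance ratio pi_new / pi_old, computed in log space for stability.
    ratio = torch.exp(log_p_new - log_p_old)
    # Clamp the ratio so a single update cannot move the policy too far.
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio)
    return -(torch.min(ratio * advantages, clipped * advantages)).mean()
```

When the new and old policies coincide the ratio is 1 and the loss reduces to the plain policy-gradient surrogate.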

actor_loss = -(torch.min(ratio * advantages, clip_ratio * advantages)).mean()
else:
dist = Categorical(action_probs)
actor_loss = -(log_p_new * advantages + 10 * dist.entropy()).mean()
Contributor

to encourage bigger entropy?

Contributor Author

to prevent the action probability from converging too fast
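The entropy term rewards spread-out action distributions, which is what slows the collapse to a deterministic policy; a quick illustration (values are made up for the example; also worth noting that a coefficient of 10 is far larger than the 0.01 to 0.1 range commonly used for entropy bonuses):

```python
import torch
from torch.distributions import Categorical

uniform = Categorical(torch.tensor([0.25, 0.25, 0.25, 0.25]))
peaked = Categorical(torch.tensor([0.97, 0.01, 0.01, 0.01]))

# The uniform distribution attains the maximum entropy log(4); a
# near-deterministic policy has much less. Subtracting entropy from the
# loss therefore pushes against premature convergence.
```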

from collections import defaultdict

from maro.rl import ExperienceSet
from examples.vm_scheduling.refine_rl.common import VMEnvWrapper
Contributor

Is it still runnable? examples.vm_scheduling.refine_rl.common does not seem to exist (at least in this PR).

Contributor Author

It is runnable in my environment, since I have examples.vm_scheduling.refine_rl.common locally; that's why I didn't notice this.

del buf["states"][:-1]
del buf["actions"][:-1]
del buf["rewards"][:-1]
del buf["info"][:-1]
Contributor

return buf["info"][1:] but del buf["info"][:-1]?

Contributor Author

info stores the legal_action. In DQN, the exp_set should store next_legal_action just like next_states. But legal_action is unused in the AC training process, so I applied the same treatment as in DQN.
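The keep-the-last-element pattern being discussed can be sketched in isolation (the buffer contents here are dummy placeholders, not the PR's data): after an ExperienceSet is built, everything except the final step is deleted so the next rollout can pair the kept entry with its successor.

```python
buf = {
    "states": ["s0", "s1", "s2"],
    "actions": ["a0", "a1", "a2"],
    "rewards": [1.0, 0.5, 2.0],
    "info": ["legal0", "legal1", "legal2"],
}

# Keep only the last element of each list; it becomes the first entry of
# the next rollout, so next_states / next_legal_action can still be formed.
for key in buf:
    del buf[key][:-1]
```

This makes the reviewer's point visible: `buf["info"][1:]` is returned while `buf["info"][:-1]` is deleted, so the two slices are offset by one element.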

Contributor

If it is unused, why add it to buf["info"] then?

__all__ = [
"ACNet",
"VMActorCritic",
"CombineNet", "SequenceNet"
Contributor

There are no CombineNet and SequenceNet anymore.

Contributor

I'll fix it

total_pm_info[:, 3] /= self._max_memory_capacity

# get the remaining cpu and memory of the pms
remain_cpu = (1 - total_pm_info[:, 2]).reshape(1, self._pm_num, 1)
Contributor

For the situation where PM capacities vary, this calculation is wrong.

remain_cpu = (total_pm_info[:, 0] - total_pm_info[:, 2]) / max_cpu_capacity
remain_memory = (total_pm_info[:, 1] - total_pm_info[:, 3]) / max_memory_capacity

would be better
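The suggestion can be written out in numpy as a self-contained sketch; the column layout `[cpu_capacity, memory_capacity, cpu_allocated, memory_allocated]` and the sample values are assumptions for illustration:

```python
import numpy as np

# Hypothetical PM snapshot:
# columns = [cpu_capacity, mem_capacity, cpu_allocated, mem_allocated]
total_pm_info = np.array([
    [32.0, 128.0, 8.0, 32.0],
    [16.0, 64.0, 4.0, 16.0],  # capacities differ across PMs
])
max_cpu_capacity = total_pm_info[:, 0].max()
max_memory_capacity = total_pm_info[:, 1].max()

# Subtract allocation from each PM's own capacity, then normalize by the
# global max so heterogeneous capacities remain comparable.
remain_cpu = (total_pm_info[:, 0] - total_pm_info[:, 2]) / max_cpu_capacity
remain_memory = (total_pm_info[:, 1] - total_pm_info[:, 3]) / max_memory_capacity
```

With `1 - allocated_fraction`, both PMs above would report the same remaining share even though one has twice the absolute headroom, which is the flaw the reviewer points out.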

Contributor

You're right, I'll fix it

self._pm_num = pm_num
self._durations = durations
self._vm_states = np.load(vm_state_path)
self._dim = (pm_num * 2) * pm_window_size + len(VM_ATTRIBUTES) * vm_window_size
Contributor

Add comment for the PM attributes used (why * 2)

Contributor

I'll add a variable PM_DIM and explain the variable

total_pm_info = np.concatenate((remain_cpu, remain_memory), axis=2)

# get the sequence pms' information
self._history_pm_state = np.concatenate((self._history_pm_state, total_pm_info), axis=0)
Contributor

keep all the history?

Contributor

history_pm_state stores the full history. Although some of the history will never be used, it is difficult to shrink a numpy array incrementally, so I chose to store it all.
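If only a fixed window of PM history is ever read, one bounded alternative (a sketch under assumed shapes, not the PR's implementation) is a `collections.deque` with `maxlen`, which drops the oldest snapshot automatically instead of growing forever:

```python
from collections import deque
import numpy as np

pm_window_size = 3  # hypothetical window length
history = deque(maxlen=pm_window_size)  # old snapshots dropped automatically

for tick in range(10):
    snapshot = np.full((2, 2), float(tick))  # fake per-tick PM state
    history.append(snapshot)

# Stack the retained window into one array: (window, pm_num, features).
window = np.stack(history)
```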

vm_info = np.array([
decision_event.vm_cpu_cores_requirement,
decision_event.vm_memory_requirement,
min(self._durations - env.tick, decision_event.vm_lifetime) / 200,
Contributor

200 or adjusted based on the duration in the config?

Contributor

I'll add a new variable to replace the 200

vm_info[2] = (vm_info[2] * 1.0) / 200
vm_info[3] = (self._durations - vm_info[3]) * 1.0 / 200
else:
vm_info = np.zeros(len(VM_ATTRIBUTES), dtype=np.float)
Contributor

Since total_vm_info is already initialized with zeros, there is no need to assign zeros again.

Contributor

I'll remove the else condition


total_vm_info[self._vm_window_size - (idx - self._st + 1), :] = vm_info

self._st = (self._st + 1) % self._vm_states.shape[0]
Contributor

Potential bug: there is no VM info/order checking. If self._vm_states contains fewer requests than the simulation actually issues, wrong VM state info (restarting from index 0) would be used.

Contributor

I'll add a check to determine whether the VM states and the current simulation match.
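One form such a guard could take (class and method names are hypothetical; the point is to fail fast instead of wrapping via the modulo back to index 0):

```python
import numpy as np

class VMStateReaderSketch:
    """Hypothetical reader that refuses to wrap around the dumped VM table."""

    def __init__(self, vm_states):
        self._vm_states = vm_states
        self._st = 0

    def next_state(self):
        if self._st >= self._vm_states.shape[0]:
            raise IndexError(
                "vm_states exhausted: the dumped table has fewer requests "
                "than the simulation"
            )
        row = self._vm_states[self._st]
        self._st += 1  # no modulo: wrapping would silently reuse stale requests
        return row
```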

PM_ATTRIBUTES = ["cpu_cores_capacity", "memory_capacity", \
"cpu_cores_allocated", "memory_allocated"]

VM_ATTRIBUTES = ["cpu_cores_requirement", "memory_requirement", "lifetime", "remain_time", "total_income"]
Contributor

Since the latter three are actual information obtained by peeking, how about adding switches in the config to decide whether to use them?

Contributor

@hamcoder hamcoder Jul 11, 2021

I'll add a switch to determine whether to peek or not.
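A minimal form such a switch could take (the flag name and list split are illustrative assumptions, not the PR's config): when peeking is disabled, the attributes that leak future information are simply excluded from the feature list.

```python
# Hypothetical config switch controlling whether peeked (future) info is used.
ALLOW_PEEK = False

BASE_VM_ATTRIBUTES = ["cpu_cores_requirement", "memory_requirement"]
# These three are only knowable by peeking ahead in the trace.
PEEKED_VM_ATTRIBUTES = ["lifetime", "remain_time", "total_income"]

VM_ATTRIBUTES = BASE_VM_ATTRIBUTES + (PEEKED_VM_ATTRIBUTES if ALLOW_PEEK else [])
```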

import pandas as pd


PM_ATTRIBUTES = ["cpu_cores_capacity", "memory_capacity", \
Contributor

The PM_ATTRIBUTES here are not aligned with the VM_ATTRIBUTES: the former is used to extract features from the snapshot_list, while the latter is the list of features actually used.

Contributor

I will add PM_EXTRACTED_ATTRIBUTES to represent the features extracted from the snapshot_list, and PM_ATTRIBUTES to represent the list of features used.

return total_vm_info

def _get_legal_pm(self, decision_event, total_pm_info):
# get the legal pm
Contributor

# get the legal pm
legal_pm = np.zeros(self._pm_num + 1)
legal_pm[self._pm_num] = 1
if len(decision_event.valid_pms) > 0:
    remain_cpu_dict = dict()
    for pm in decision_event.valid_pms:
        # if two PMs have the same remaining cpu, only choose the one with the smaller id
        if total_pm_info[-1, pm, 0] not in remain_cpu_dict:
            remain_cpu_dict[total_pm_info[-1, pm, 0]] = 1
            legal_pm[pm] = 1

would be enough

TICKS_PER_HOUR: 12

# Path of the vm table data.
VM_TABLE: "maro/tests/data/vm_scheduling/vmtable_short.bin"
Contributor

This file does not exist.

VM_TABLE: "maro/tests/data/vm_scheduling/vmtable_short.bin"

# Path of the cpu readings file.
CPU_READINGS: "maro/tests/data/vm_scheduling/vm_cpu_readings-file-2-of-short.bin"
Contributor

This file does not exist.


vm_table, vm, cpu = [], [], []

vmtable_data_path = "data/vmtable_10k.csv"
Contributor

where does this file come from?

@@ -0,0 +1,66 @@
import pandas as pd
Contributor

@Jinyu-W Jinyu-W Jun 16, 2021

Add comments for this file: what is it used for?

Or you can add a README for this whole example to introduce the files in this folder and the steps (generating data, getting VM states, running, ...).

plt.switch_backend('agg')


class VMLearner:
Contributor

So you implemented a new Learner instead of using the original one?

@staticmethod
def _get_td_errors(
q_values, next_q_values,
rewards, gamma, loss_func
Contributor

Why break the line here?

@Jinyu-W Jinyu-W added in progress Not finished items/bugs. vm scheduling data center scenario. labels Jun 25, 2021
@ysqyang ysqyang mentioned this pull request Jul 19, 2021
@Jinyu-W
Contributor

Jinyu-W commented Sep 6, 2021

Closed since the RL examples for VM are added in another PR: #375

@Jinyu-W Jinyu-W closed this Sep 6, 2021
