Artificial Intelligence
Q learning
Pham Viet Cuong
Dept. Control Engineering & Automation, FEEE
Ho Chi Minh City University of Technology
Q learning
ü Supervised learning: Classification, regression
ü Unsupervised learning: Clustering
ü Reinforcement learning:
v More general than supervised/unsupervised learning
v Learn from interaction with the environment (perform actions and observe rewards) to achieve a goal
v Goal: Learn a policy to maximize some measure of long-term reward
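The slide does not pin down the "measure of long-term reward"; one common choice (an assumption here, written in LaTeX) is the discounted return that the policy should maximize:

G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad \gamma \in [0, 1)

where \gamma is the discount factor (the worked example later on these slides uses \gamma = 0.8).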
Q learning
ü Examples: video games (illustrative application figures omitted)
Q learning
ü Example:
v Put an agent in any room
v Goal: go to Room 5 by the fastest route
[figure: floor plan of Rooms 0, 1, 2, 3, 4, 5 and the doors connecting them]
Q learning
ü State: Room 0, Room 1, . . ., Room 5
ü Action: Go to Room 0, Go to Room 1, . . ., Go to Room 5
ü Reward: matrix R
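The slide presents R as a figure. Below is a minimal NumPy sketch of one plausible R, assuming the door layout of the path-finding tutorial listed in the references (Rooms 0-4 inside a building, Room 5 the goal): -1 marks "no door", 0 a door, and 100 a door that leads directly to Room 5. Treat the exact layout as an assumption, not something stated on this slide.

import numpy as np

# Reward matrix R: rows = current state (room), columns = action "go to that room".
# -1 = no door, 0 = a door exists, 100 = door into the goal (Room 5).
R = np.array([
    [-1, -1, -1, -1,  0,  -1],   # from Room 0
    [-1, -1, -1,  0, -1, 100],   # from Room 1
    [-1, -1, -1,  0, -1,  -1],   # from Room 2
    [-1,  0,  0, -1,  0,  -1],   # from Room 3
    [ 0, -1, -1,  0, -1, 100],   # from Room 4
    [-1,  0, -1, -1,  0, 100],   # from Room 5 (the goal)
])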
Q learning
ü Matrix Q: the memory of what the agent has learned through experience
v The agent starts out knowing nothing
v Q is initialized to zero
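A one-line sketch of this initialization, assuming the 6-state, 6-action setup above:

import numpy as np

# Q has the same shape as R: one row per state, one column per action.
# All zeros: the agent starts out knowing nothing.
Q = np.zeros((6, 6))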
Q learning
ü Defined so far:
v States
v Actions
v Reward matrix R
v Matrix Q
ü Training in progress:
v Updating matrix Q
Q learning
ü Utilize the Q matrix:
v Step 1: Set current state = initial state.
v Step 2: From current state, find the action with the highest Q value.
v Step 3: Perform the action chosen in Step 2.
v Step 4: Set current state = next state.
v Step 5: Repeat Steps 2, 3 and 4 until current state = goal state.
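A short sketch of these five steps, assuming the Q and R arrays from the earlier sketches and Room 5 as the goal (my rendering, not code from the slides):

import numpy as np

def follow_policy(Q, start, goal=5):
    """Steps 1-5: greedily follow the trained Q matrix from `start` to `goal`."""
    state = start                          # Step 1: current state = initial state
    path = [state]
    while state != goal:                   # Step 5: repeat until the goal state is reached
        action = int(np.argmax(Q[state]))  # Step 2: action with the highest Q value
        state = action                     # Steps 3-4: perform it; that room becomes the current state
        path.append(state)
    return path

With a trained Q this traces a room-to-room route ending at Room 5; with an untrained (all-zero) Q it would loop, so it is meant to be used only after training.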
Q learning
ü Q learning algorithm (update rule applied in the worked examples below):
v For each episode: start from an initial state, then repeatedly choose a possible action, move to the next state, and update Q until the goal state is reached
v Q(state, action) = R(state, action) + gamma * max[ Q(next state, all actions) ]
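A runnable sketch of this training loop, assuming the R matrix sketched earlier, gamma = 0.8 (as on the following slides), random starting rooms, and random action selection; the episode count and random seed are arbitrary choices of mine:

import numpy as np

GAMMA = 0.8        # discount factor used on the slides
GOAL = 5           # Room 5 is the goal state
EPISODES = 1000    # arbitrary number of training episodes

rng = np.random.default_rng(0)
Q = np.zeros_like(R, dtype=float)              # R is the 6x6 reward matrix sketched earlier

for _ in range(EPISODES):
    state = int(rng.integers(0, 6))            # start each episode in a random room
    while state != GOAL:
        actions = np.flatnonzero(R[state] >= 0)   # only moves through existing doors
        action = int(rng.choice(actions))         # explore: pick one possible action at random
        next_state = action                       # "go to room a" lands the agent in room a
        # Core update: immediate reward plus discounted best value of the next state
        Q[state, action] = R[state, action] + GAMMA * Q[next_state].max()
        state = next_state

After training, the follow_policy sketch above can be used to read the fastest route out of Q.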
Q learning
ü Q learning algorithm: gamma = 0.8, episode 1, initial state: 1
state = 1, action: go to 5, next_state = 5
Q(1, 5) = R(1, 5) + 0.8 * max[ Q(5, all actions) ] = 100 + 0.8 * 0 = 100
Q learning
ü Q learning algorithm: episode 2, initial state = 3
state = 3, action: go to 1, next_state = 1
Q(3, 1) = R(3, 1) + 0.8 * max[ Q(1, all actions) ] = 0 + 0.8 * 100 = 80
Q learning
ü Q learning algorithm: episode 2, initial state = 3
state = 1, action: go to 5, next_state = 5
Q(1, 5) = R(1, 5) + 0.8 * max[ Q(5, all actions) ] = 100 + 0.8 * 0 = 100 (unchanged)
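These hand updates can be reproduced with a few lines (assuming the R matrix sketched earlier; Q starts from all zeros):

import numpy as np

Q = np.zeros((6, 6))
# Episode 1: state 1, go to 5
Q[1, 5] = R[1, 5] + 0.8 * Q[5].max()    # 100 + 0.8 * 0   = 100.0
# Episode 2: state 3, go to 1 ...
Q[3, 1] = R[3, 1] + 0.8 * Q[1].max()    # 0   + 0.8 * 100 = 80.0
# ... then state 1, go to 5
Q[1, 5] = R[1, 5] + 0.8 * Q[5].max()    # 100 + 0.8 * 0   = 100.0 (unchanged)
print(Q[1, 5], Q[3, 1])                  # 100.0 80.0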
Q learning
ü References
v http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture14.pdf
v http://mnemstudio.org/path-finding-q-learning-tutorial.htm