Lecture Plan
| Week | Topic | Contents | Download |
|---|---|---|---|
| 01 | Introduction to Reinforcement Learning | Introduction, Basic Ideas, RL Fields | |
| 02 | What does RL Learn? | Key Concepts, Trajectory, Discount Factor, Value Functions | |
| 03 | Taxonomy & MDP Basics | RL Taxonomy, MDP Basic Theory | |
| 04 | Markov Decision Process in RL | Concept of MDP, Relationship between MDP and RL | |
| 05 | Dynamic Programming in RL | Recap DP, Bellman Equation, Value Propagation, Policy and Value Iteration | |
| 06 | Monte Carlo Methods in RL | DP Limitations, Sample Return (G_t), First-visit, Every-visit, MC Control, Limitations of MC | |
| 07 | Temporal Difference | TD Prediction and Control, SARSA, Q-learning, Practical Advantages of TD | |
| 08 | MAB, Exploration vs. Exploitation | Multi-Armed Bandit Problem, Exploration, Exploitation | |
| 09 | Multi-Armed Bandit Practice | Multi-Armed Bandit Implementation, Analytics | ◦ Guide Page ◦ Code Exercise: Python script, notebook |
| 10 | Q-Learning | Bellman and Q-learning Theory | |
| 11 | DQN | Deep Q-learning & Practice | ◦ DQN Theory (.pdf) ◦ DQN Practice (.pdf) ◦ Codes (.zip) |
| 12 | Coding Practice | Practice for Code Repair using RL & LLM | ◦ Guide Page ◦ Codes (.zip) |
| 13 | Double DQN | Double Q-learning & Practice | ◦ Theory ◦ Practice Guide ◦ Codes (.zip) |
| 14 | Dueling DQN | Dueling Deep Q-learning Theory | ◦ Theory (.pdf) |
| 15 | Policy Gradient | Policy Optimization, Derivation, Tricks, Intro to Advanced Policy Gradient Methods | ◦ Theory (.pdf) |