Lecture Plan

Week	Topic	Contents	Download
01	Introduction to Reinforcement Learning	Introduction, Basic Ideas, RL Fields	pdf
02	What does RL Learn?	Key Concepts, Trajectory, Discount Factor, Value Functions	pdf
03	Taxonomy & MDP Basics	RL Taxonomy, MDP Basic Theory	pdf
04	Markov Decision Process in RL	Concept of MDP, Relationship between MDP and RL	pdf
05	Dynamic Programming in RL	Recap DP, Bellman Equation, Value Propagation, Policy and Value Iteration	pdf
06	Monte Carlo Methods in RL	DP Limitations, Sample Return (Gt), First-visit, Every-visit, MC Controls, Limitations of MC	pdf
07	Temporal Difference	TD prediction and controls, SARSA, Q-learning, Practical Advantage of TD	pdf
08	MAB, Exploration vs. Exploitation	Multi-Armed Bandit Problem, Exploration, Exploitation	pdf
09	Multi-Armed Bandit Practice	Multi-Armed Bandit Implementation, Analytics	Guide Page Code Exercise: ◦ python script ◦ notebook
10	Q-Learning	Bellman and Q-learning Theory	pdf
11	DQN	Deep Q-learning & Practice	◦ DQN Theory (pdf) ◦ DQN Practice (pdf) ◦ Codes (.zip)
12	Coding Practice	Practice for Code Repair using RL & LLM	◦ Guide Page ◦ Codes (.zip)
13	Double DQN	Double Q-learning & Practice	◦ Theory ◦ Practice Guide ◦ Codes (.zip)
14	Dueling DQN	Dueling Deep Q-learning Theory	◦ Theory (.pdf)
15	Policy Gradient	Policy Optimization, Derivation, Tricks, Intro to Advanced Policy Gradient Methods	◦ Theory (.pdf)
16	Play with SB3	Introduction to SB3, Gymnasium Integration, Practical RL Experiments	◦ Theory (.pdf) ◦ Codes (.zip)
17	Actor-Critic Algorithm	Actor-Critic basics: critic-based bootstrapping, baseline/advantage with TD error (A2C), and A2C/A3C with practical stability ideas.	pdf
18	DDPG, TD3, SAC	Continuous control problems, deterministic policy gradient, actor-critic structure, TD3 stabilization tricks, and SAC entropy-regularized policy optimization	pdf
19	TRPO, PPO	Safe Region, KL-Divergence, Computational Issues, Policy Ratio, Clipping	pdf
20	IL, RLHF	Imitation Learning, Reinforcement Learning from Human Feedback	pdf
