(Graduate) Reinforcement Learning

Prof. Giseop Noh

RL Theory | Q-learning | PPO | RLHF | Imitation Learning

Lecture plan

Weekly materials

01 Introduction to RL PDF
02 What does RL Learn? PDF
03 RL Taxonomy & MDP Basics PDF
04 Markov Decision Process in RL PDF
05 Dynamic Programming in RL PDF
06 Monte Carlo Methods in RL PDF
07 Temporal Difference PDF
08 MAB, Exploration vs. Exploitation PDF
09 Multi-Armed Bandit Practice MD
10 Q-Learning PDF
11 Deep Q-Network (DQN) MD
12 Practice for Code Repair using RL & LLM MD
13 Double DQN PDF
14 Dueling DQN PDF
15 Policy Gradient PDF
16 Play with Stable Baselines 3 MD
17 Actor-Critic Algorithm PDF
18 DDPG, TD3, SAC Algorithms PDF
19 TRPO and PPO Algorithms PDF
20 Imitation Learning & RL from Human Feedback PDF
