My notes from the AI Alignment Course by BlueDot Impact
Week 1 - AI and the Years Ahead
Week 3 - Reinforcement Learning from Human/AI Feedback
Week 5 - Mechanistic Interpretability
Week 6 - Technical Governance Approaches
[OLD]
Week 2 - Reward Misspecification and Instrumental Convergence