The aim is to understand:
How reward functions can fail to specify our intentions
What reinforcement learning from human feedback (RLHF) is
What inverse reinforcement learning (IRL) is
What instrumental goals are and how ML systems could seek power
Specification gaming: the flip side of AI ingenuity
https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/