The aim is to understand:

How reward functions can fail to specify our intentions
What reinforcement learning from human feedback (RLHF) is
What inverse reinforcement learning (IRL) is
What instrumental goals are and how ML systems could seek power

Specification gaming: the flip side of AI ingenuity

https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/
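
To make the first aim concrete, here is a minimal sketch of how a reward function can diverge from the designer's intention. Everything in it is a hypothetical toy made up for illustration, not taken from the linked post: a one-dimensional track, a proxy reward that pays for entering a checkpoint cell, and two hand-written policies. The designer intends the agent to reach the end of the track, but the proxy reward never says so.

```python
# Hypothetical toy environment (an assumption for illustration only).
# Intended goal: reach the last cell of the track.
# Proxy reward actually specified: +1 every time the agent enters the checkpoint.

TRACK_END = 5    # index of the finish cell
CHECKPOINT = 2   # cell that pays the proxy reward
HORIZON = 20     # maximum number of steps per episode


def run(policy, horizon=HORIZON):
    """Roll out a policy; return (total proxy reward, whether it finished)."""
    pos, proxy_reward, finished = 0, 0, False
    for t in range(horizon):
        move = policy(pos, t)                        # -1 (left) or +1 (right)
        pos = max(0, min(TRACK_END, pos + move))     # stay on the track
        if pos == CHECKPOINT:
            proxy_reward += 1
        if pos == TRACK_END:
            finished = True
            break                                    # episode ends at the finish
    return proxy_reward, finished


def intended_policy(pos, t):
    """Does what the designer wants: always drive towards the finish."""
    return +1


def gaming_policy(pos, t):
    """Exploits the proxy: steps back and forth over the checkpoint forever."""
    return +1 if pos < CHECKPOINT else -1


if __name__ == "__main__":
    print("intended:", run(intended_policy))
    print("gaming:  ", run(gaming_policy))
```

Under these assumptions the gaming policy collects far more proxy reward than the intended one while never finishing the track. An optimizer that sees only the proxy reward would therefore prefer the behaviour the designer did not want, which is the failure pattern the term "specification gaming" describes.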