Expand each heading to view my notes.
What is Mechanistic Interpretability?
An Introduction to Circuits
Toy Models of Superposition
Towards Monosemanticity: Decomposing Language Models with Dictionary Learning