A downloadable project

Causal reasoning is a crucial part of how we humans safely and robustly think about the world. Can we tell whether LLMs perform causal reasoning? Marius Hobbhahn and Tom Lieberum (2022, Alignment Forum) approached this question with probing. For this hackathon, we follow up on that work with a mechanistic interpretability analysis of causal reasoning in GPT-2 Small (roughly 80 million parameters) using Neel Nanda’s Easy Transformer package.
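
As a rough illustration of the setup, the sketch below loads GPT-2 Small through Easy Transformer and compares the model's next-token probabilities for a causally consistent versus an inconsistent completion. The prompt and the specific calls are assumptions based on the package's public interface (since renamed TransformerLens), not code taken from the notebook.

```python
# Minimal sketch, assuming the Easy Transformer API (now TransformerLens).
# The prompt and candidate completions are illustrative, not taken from
# the project's Ball prompts.csv.
from easy_transformer import EasyTransformer

model = EasyTransformer.from_pretrained("gpt2")  # GPT-2 Small

prompt = "Ball A rolled into Ball B, so the ball that moved was Ball"
logits, cache = model.run_with_cache(prompt)  # cache holds intermediate activations

# Compare the model's next-token probability for the causally consistent
# answer (" B") against the inconsistent one (" A").
probs = logits[0, -1].softmax(dim=-1)
for candidate in [" B", " A"]:
    token_id = model.tokenizer.encode(candidate)[0]
    print(f"{candidate!r}: {probs[token_id].item():.4f}")
```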

Download

Mechanisms of Causal Reasoning in LLMs - Ben, Jacy, Mark, Sky - Interpretability Hackathon.pdf (111 kB)
Mechanisms_of_Causal_Reasoning_in_LLMs.ipynb (164 kB)
Ball prompts.csv (1 kB)
