My interest is in learning and generalization dynamics as they pertain to (mis)alignment. My favourite writings on this topic are Paul Christiano's essays from 2018 to 2019; in some ways, these ideas were what got me into AI safety.
At MATS, I'm developing a new finetuning-based propensity auditing method with Boyd Kane, advised by Alex Turner and Alex Cloud. Our work was selected as one of eight Spotlight talks at the end-of-fellowship MATS Symposium. Previously, I researched value generalization from reinforcement learning with James Evans and Austin Kozlowski at UChicago Knowledge Lab.
Narrow RL Induces Broad Behavior Changes in LLMs
, Austin C. Kozlowski, James Evans
NeurIPS 2025 Workshop on LLM Evaluation