Research

My interest is in learning and generalization dynamics as they pertain to (mis)alignment. My favourite writings on this topic are Paul Christiano's essays from 2018 to 2019; in some ways, these ideas were what got me into AI safety.

At MATS, I'm developing a new finetuning-based propensity auditing method with Boyd Kane, advised by Alex Turner and Alex Cloud. Our work was selected for one of eight Spotlight talks at the end-of-fellowship MATS Symposium. Previously, I researched value generalization from reinforcement learning with James Evans and Austin Kozlowski at the UChicago Knowledge Lab.

Narrow RL Induces Broad Behavior Changes in LLMs

Jo J. Jiao, Austin C. Kozlowski, James Evans

NeurIPS 2025 Workshop on LLM Evaluation