Jo Jiao

I do research in AI safety. I'm most interested in how models generalize and misgeneralize. My favourite writings on this topic are Paul Christiano's essays circa 2018–19. These ideas, as much as anything, got me into AI safety.

Uncovering Hidden Propensities in Language Models via Limited-Parameter Finetuning

Boyd Kane*, Jo J. Jiao*, Bryce Woodworth, Elizabeth Donoway, Alex Cloud, Alexander Matt Turner

Preprint Under Review

At MATS, my collaborator and I developed a new finetuning-based propensity auditing method. Our work was selected as one of eight Spotlight talks at the end-of-fellowship MATS Symposium.

This project began as an attempt to apply Elizabeth Donoway's excess description length (EDL) to propensity evaluations. We ended up somewhere quite different, but I still think EDL is important for understanding what the method measures. Some notes & links to her papers here.

Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Dewi Gould, Francis Rhys Ward, Anders Cairns Woodruff, Rauno Arike, Josh Hills, Alex Serrano, Ida Caspary, Jason Ross Brown, Jo J. Jiao, Patrick Leask, Twm Stone, Ram Potham, Ionut Gabriel Stan, Harry Mayne, Simeon Hellsten, Shubhorup Biswas, Ariana Azarbal, William L. Anderson, Elle Najt, Ryan Greenblatt, Julian Stastny

arXiv:2606.07157

Narrow RL Induces Broad Behavior Changes in LLMs

Jo J. Jiao, Austin C. Kozlowski, James Evans

NeurIPS 2025 Workshop on LLM Evaluation

Research