Previously, I was a member of the technical staff at Safe Superintelligence. Before that, I was a researcher at Scale AI, and earlier a PhD candidate at Harvard advised by David Parkes.
Outside of work, I'm a lifelong Go player (in fact, seeing AlphaGo beat Lee Sedol was the origin of my interest in AI). I also co-founded The Gradient, a digital magazine focusing on AI.
Reconstructed o1 test-time scaling laws using public API access to o1-mini.
Humanity's Last Exam
Long Phan*, Alice Gatti*, Ziwen Han*, Nathaniel Li*, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, and 1109 others
Nature, 2026
website
We demonstrate that multi-turn human jailbreaks can achieve >70% success rates against LLM defenses that report single-digit success rates for automated single-turn attacks.
Training on chains of thought that lead to a correct answer can help an LLM self-improve and generalize far beyond its original capabilities in the toy environment of addition.
A unified algorithm for both reinforcement learning and game theory. It can solve MDPs as fast as RL methods and imperfect-information games as fast as CFR using a single set of hyperparameters.
A novel no-regret learning procedure that converges to correlated and coarse-correlated equilibria several orders of magnitude faster than previous methods in randomly generated normal-form games.
Existing language models can generate either high quality or diverse utterances, but not both simultaneously. How can we measure that in a single metric?