Hugh Zhang
I recently joined Scale AI to help kickstart our open source AI research efforts. As such, I'm currently on indefinite leave from Harvard.
Previously, I was a software engineer at Asana (during a gap year before college), studied Economics at Stanford, and have worked at Google Brain, Meta AI, and Google DeepMind. At Harvard, I was a PhD candidate in the EconCS group, advised by David Parkes and supported by the NSF Graduate Research Fellowship Program and a Kempner Institute Graduate Fellowship.
My current research interests revolve around teaching large language models to reason and plan. Previously, I did similar work as part of the CICERO project, the first AI agent to achieve human-level performance in the game of Diplomacy.
In my spare time, I play Go (I've been a lifelong player; in fact, seeing AlphaGo beat Lee Sedol was the origin of my interest in AI). I also co-founded The Gradient, a digital magazine focusing on AI.
Email / CV / Google Scholar / Twitter / Goodreads / Github
Research
In the past, I've been interested in multi-agent reinforcement learning, game theory, and language models. Much of my research tries to figure out how to cram all three into one project. * denotes equal contribution or alphabetical ordering.
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
A lightweight alternative to fine-tuning that performs better than LoRA on very small datasets and requires only minimal model access.
Chain-of-Thought Reasoning is a Policy Improvement Operator
Hugh Zhang, David C. Parkes
Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023
twitter / slides / poster
AlphaZero-like self-improvement for language models.
Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization
Luca D'Amico-Wong*, Hugh Zhang*, Marc Lanctot, David C. Parkes
code
A unified algorithm for both reinforcement learning and game theory: it can solve MDPs as fast as RL methods and imperfect-information games as fast as CFR, using the exact same set of hyperparameters.
Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning
Anton Bakhtin*, Noam Brown*, Emily Dinan*, Gabriele Farina*, Colin Flaherty*, Daniel Fried*, Andrew Goff*, Jonathan Gray*, Hengyuan Hu*, Athul Paul Jacob*, Mojtaba Komeili*, Karthik Konath*, Adam Lerer*, Mike Lewis*, Alexander H. Miller*, Sasha Mitts*, Adithya Renduchintala*, Stephen Roller*, Dirk Rowe*, Weiyan Shi*, Joe Spisak*, Alexander Wei*, David Wu*, Hugh Zhang*, Markus Zijlstra*
Science, 2022
paper / blog / nyt / economist / gizmodo / forbes / new scientist / ars technica / mit tech review / kotaku / engadget / register / hacker news / reddit
Human-level performance in the game of Diplomacy, where the agent negotiates with humans in natural language.
Equilibrium Finding in Normal-Form Games via Greedy Regret Minimization
Hugh Zhang, Adam Lerer, Noam Brown
Association for the Advancement of Artificial Intelligence (AAAI), 2022
A novel no-regret learning procedure that converges to correlated and coarse-correlated equilibria several orders of magnitude faster than previous methods in randomly generated normal-form games.
Trading Off Diversity and Quality in Natural Language Generation
Hugh Zhang*, Daniel Duckworth*, Daphne Ippolito, Arvind Neelakantan
Workshop on Human Evaluation of Natural Language Processing Systems at the Conference of the European Chapter of the Association for Computational Linguistics (HumEval Workshop @ EACL), 2021
The first large-scale evaluation of decoding methods for large language models along the entire quality-diversity spectrum.
A Simple Adaptive Procedure Converging to Forgiving Correlated Equilibria
Hugh Zhang (advised by Gabriel Carroll)
Stanford Senior Honors Thesis in Economics, 2020
(John G. Sobieski Award for Creative Thinking)
Alongside concurrent work by Celli et al. (2020), this paper gives the first internal regret minimization dynamics for extensive-form games.
Unifying Human and Statistical Evaluation for Natural Language Generation
Tatsunori Hashimoto*, Hugh Zhang*, Percy Liang
North American Chapter of the Association for Computational Linguistics (NAACL), 2019 (Oral Presentation)
Existing language models can generate utterances that are either high-quality or diverse, but not both simultaneously. How can we measure this trade-off with a single metric?