About me

I am a first-year PhD student at UW-Madison, where I am fortunate to be advised by Prof. Sharon Li. I received my B.E. and M.E. in Computer Science and Technology from HUST (Huazhong University of Science and Technology).
My current research interests lie in RL algorithms for large-scale models and their downstream applications, such as agentic systems.

🔥 News

2026/04 Two papers were accepted at ACL 2026.
2026/01 GEB (General Exploratory Bonus for RLHF) was accepted to ICLR2026.
2025/11 GEB (General Exploratory Bonus for RLHF) was selected for an oral presentation at ResponsibleFM@NeurIPS2026.
2025/08 I started my PhD journey at UW-Madison.
2025/05 Free Process Rewards without Process Labels was accepted to ICML2025.
2025/01 PQM (Process Reward Model with Q-value Rankings) was accepted to ICLR2025.
2024/02 One paper was accepted to Findings of NAACL2024.
2023/10 I received the National Scholarship (top 3% nationwide).
2023/05 One paper was accepted to the ACL 2023 main conference.

📝 Selected Publications

(See the full list in the publications section or on [Google Scholar])

LAD: Learning Advantage Distribution for Reasoning [pdf] [code] (ACL2026)
Wendi Li, Sharon Li
General Exploratory Bonus for Optimistic Exploration in RLHF [pdf] [code] (ICLR2026, ResponsibleFM@NeurIPS2026 Oral)
Wendi Li, Changdae Oh, Yixuan Li
Process Reinforcement through Implicit Rewards [pdf] [code]
Ganqu Cui*, Lifan Yuan*, Zefan Wang*, Hanbin Wang*, Wendi Li*, Bingxiang He*, Yuchen Fan*, Tianyu Yu*, Qixin Xu*, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding
Free Process Rewards without Process Labels [pdf] [code] (ICML2025)
Lifan Yuan*, Wendi Li*, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng
Process Reward Model with Q-value Rankings [pdf] [code] (ICLR2025)
Wendi Li, Yixuan Li
Reinforcement Learning with Token-level Feedback for Controllable Text Generation [pdf] [code] (NAACL2024)
Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng
TREA: Tree-Structure Reasoning Schema for Conversational Recommendation [pdf] [code] (ACL2023)
Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen

Wendi Li
