(* denotes equal contribution)
Publications
- LAD: Learning Advantage Distribution for Reasoning [pdf] [code] (ACL 2026)
Wendi Li, Sharon Li
- Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents [pdf] (ACL 2026)
Changdae Oh, Seongheon Park, To Eun Kim, Jiatong Li, Wendi Li, Samuel Yeh, Xuefeng Du, Hamed Hassani, Paul Bogdan, Dawn Song, Sharon Li
- Process Reinforcement through Implicit Rewards [pdf] [code] (TMLR 2026)
Ganqu Cui*, Lifan Yuan*, Zefan Wang*, Hanbin Wang*, Wendi Li*, Bingxiang He*, Yuchen Fan*, Tianyu Yu*, Qixin Xu*, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding
- General Exploratory Bonus for Optimistic Exploration in RLHF [pdf] [code] (ICLR 2026; ResponsibleFM@NIPS2026 Oral)
Wendi Li, Changdae Oh, Yixuan Li
- Free Process Rewards without Process Labels [pdf] [code] (ICML 2025)
Lifan Yuan*, Wendi Li*, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng
- Process Reward Model with Q-value Rankings [pdf] [code] (ICLR 2025)
Wendi Li, Yixuan Li
- Reinforcement Learning with Token-level Feedback for Controllable Text Generation [pdf] [code] (Findings of NAACL 2024)
Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng
- Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue [pdf] [code] (IJCAI 2024)
Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen
- TREA: Tree-Structure Reasoning Schema for Conversational Recommendation [pdf] [code] (ACL 2023)
Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen
- Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning [pdf] [code] (IJCAI 2023)
Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen