(* denotes equal contribution)

Publications

  • LAD: Learning Advantage Distribution for Reasoning [pdf] [code] (ACL 2026)
    Wendi Li, Sharon Li


  • Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents [pdf] (ACL 2026)
    Changdae Oh, Seongheon Park, To Eun Kim, Jiatong Li, Wendi Li, Samuel Yeh, Xuefeng Du, Hamed Hassani, Paul Bogdan, Dawn Song, Sharon Li


  • Process Reinforcement through Implicit Rewards [pdf] [code] (TMLR 2026)
    Ganqu Cui*, Lifan Yuan*, Zefan Wang*, Hanbin Wang*, Wendi Li*, Bingxiang He*, Yuchen Fan*, Tianyu Yu*, Qixin Xu*, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding


  • General Exploratory Bonus for Optimistic Exploration in RLHF [pdf] [code] (ICLR 2026; ResponsibleFM@NIPS2026 Oral)
    Wendi Li, Changdae Oh, Yixuan Li


  • Free Process Rewards without Process Labels [pdf] [code] (ICML 2025)
    Lifan Yuan*, Wendi Li*, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng


  • Process Reward Model with Q-value Rankings [pdf] [code] (ICLR 2025)
    Wendi Li, Yixuan Li

  • Reinforcement Learning with Token-level Feedback for Controllable Text Generation [pdf] [code] (Findings of NAACL 2024)
    Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng


  • Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue [pdf] [code] (IJCAI 2024)
    Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen


  • TREA: Tree-Structure Reasoning Schema for Conversational Recommendation [pdf] [code] (ACL 2023)
    Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen


  • Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning [pdf] [code] (IJCAI 2023)
    Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen