Kai Yu
Ph.D. (Cantab), FISCA, FIEEE
Distinguished Professor
Head of the Cross-media Language Intelligence (X-LANCE) Lab (formerly SpeechLab)
Director of the Machine Intelligence Institute
School of Computer Science, Shanghai Jiao Tong University
Email: kai.yu [AT] sjtu [DOT] edu [DOT] cn
Address: School of Computer Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
I am currently a distinguished professor and the director of the Machine Intelligence Institute of the School of Computer Science at Shanghai Jiao Tong University (SJTU), as well as the co-founder and chief scientist of AISpeech. I am a fellow of ISCA (International Speech Communication Association), a fellow of IEEE (Institute of Electrical and Electronics Engineers) and a distinguished member of CCF (China Computer Federation).
My academic journey began at the Department of Automation at Tsinghua University, where I completed my bachelor's and master's degrees in 1999 and 2002 respectively. I obtained my PhD at the Machine Intelligence Lab of the Engineering Department, Cambridge University, U.K. in 2006 and then worked there as a senior research associate. I joined SJTU in 2012 and founded SpeechLab at SJTU. SpeechLab was later extended and renamed the Cross-media Language Intelligence (X-LANCE) Lab, its current name. I have served as a member of the IEEE Speech and Language Processing Technical Committee (2017-2019), a board member of the IEEE Signal Processing Society Conferences Board (2024-2025), and an associate editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024). I am currently a board member of the IEEE Signal Processing Society Membership Board. I am also a council member of the CCF (China Computer Federation) and serve as the director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF.
My research interests lie primarily in conversational AI, including many aspects of speech and language processing as well as multi-modal linguistic computing. The goal of my research is to build cognitive conversational agents that can operate in complex real-world environments, deal with uncertainty, deliver information in a human-like way and evolve by interacting with their environment. I have published over 300 peer-reviewed journal and conference papers and won numerous paper awards. I have served as program chair for Interspeech, ICMI and SigDial, as general chair for the National Conference on Man-machine Communication (the largest domestic speech conference in China), and as area chair for speech processing or dialogue systems at Interspeech, ACL, EMNLP, etc.
The outcomes of my research have been both recognized in academia and successfully industrialized. I founded AISpeech to commercialize state-of-the-art speech and language processing technology. AISpeech was included in the “AI Key Players” list in the Goldman Sachs equity research report on AI in 2016 and was named one of the Cool Vendors for AI (East Asia) by Gartner in 2017. On behalf of AISpeech, I am also leading the National AI Open Innovation Platform on Language Computing, granted by the Ministry of Science and Technology of China in 2022.
Natural Language Processing and Conversational Agent
structured language understanding, KBQA and machine reading comprehension, statistical dialogue systems, multi-lingual language processing, LLM, conversational agent system
Multi-modal LLM and Interaction
multi-modal LLM, embodied agent, digital avatar, GUI understanding and manipulation, AGI for science
LALM A Survey on Speech Large Language Models for Understanding
Jing Peng, Yucheng Wang, Bohan Li, Yiwei Guo, Hankun Wang, Yangui Fang, Yu Xi, Haoyu Li, Xu Li, Ke Zhang, Shuai Wang and Kai Yu
IEEE Journal of Selected Topics in Signal Processing (JSTSP), vol. 20, no. 1, pp. 2-31, 2026
Speech Recent Advances in Discrete Speech Tokens: A Review
Yiwei Guo, Zhihan Li, Hankun Wang, Bohan Li, Chongtian Shao, Hanglei Zhang, Chenpeng Du, Xie Chen, Shujie Liu and Kai Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 48, pp. 4184-4204, 2026
TSE Detect, Attend and Extract: Keyword Guided Target Speaker Extraction
Haoyu Li, Yu Xi, Yidi Jiang, Shuai Wang, Kate Knill, Mark Gales, Haizhou Li and Kai Yu
IJCAI-ECAI 2026
LALM TASU: Text-only alignment for speech understanding
Jing Peng, Yi Yang, Xu Li, Yu Xi, Quanwei Tang, Yangui Fang, Junjie Li and Kai Yu
ICASSP 2026
LALM AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
Yiwei Guo, Bohan Li, Hankun Wang, Zhihan Li, Shuai Wang, Xie Chen and Kai Yu
AAAI 2026
TTS VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen and Kai Yu
ICASSP 2025
ASR TDT-KWS: Fast and Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu and Kai Yu
ICASSP 2024
Agent DiSRouter: Distributed Self-Routing for LLM Selections
Hang Zheng, Hongshen Xu, Yongkai Lin, Shuai Fan, Lu Chen and Kai Yu
ICLR 2026
Agent Reducing Tool Hallucination via Reliability Alignment
Hongshen Xu, Zichen Zhu, Lei Pan, Zihan Wang, Su Zhu, Da Ma, Ruisheng Cao, Lu Chen and Kai Yu
ICML 2025
Agent Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen and Kai Yu
COLM 2024
LLM Developing ChemDFM as a large language foundation model for chemistry
Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Yi Xia, Bo Chen, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Kai Yu and Xin Chen
Cell Reports Physical Science, vol. 6, no. 4, pp. 102523, 2025
LLM Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao and Kai Yu
NeurIPS 2023
NLP A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL
Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang and Kai Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 11, pp. 13796-13813, 2023
GUI MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, Situo Zhang, Liangtai Sun, Yixiao Wang, Yuheng Sun, Lu Chen and Kai Yu
NAACL 2025
Avatar VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
Tao Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, Xie Chen and Kai Yu
AAAI 2025
MLLM ChemDFM-X: towards large multimodal model for chemistry
Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Yansi Li, Zhongyang Dai, Xin Chen and Kai Yu
Science China Information Sciences, 67: 220109, 2024
GUI Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu and Kai Yu
EMNLP 2022