Kai Yu, Shanghai Jiao Tong University


Kai Yu

Ph.D. (Cantab) FISCA FIEEE

Distinguished Professor
Head of the Cross-media Language Intelligence (X-LANCE) Lab (Former SpeechLab)
Director of the Machine Intelligence Institute

School of Computer Science
Shanghai Jiao Tong University

Email: kai.yu [AT] sjtu [DOT] edu [DOT] cn
Address: School of Computer Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China

Biography

I am currently a distinguished professor and the director of the Machine Intelligence Institute of the School of Computer Science at Shanghai Jiao Tong University (SJTU), as well as the co-founder and chief scientist of AISpeech. I am a fellow of ISCA (International Speech Communication Association), a fellow of IEEE (Institute of Electrical and Electronics Engineers) and a distinguished member of CCF (China Computer Federation).

My academic journey began at the Department of Automation at Tsinghua University, where I completed my bachelor's and master's degrees in 1999 and 2002 respectively. I obtained my PhD at the Machine Intelligence Lab of the Engineering Department, Cambridge University, U.K. in 2006 and then worked as a senior research associate there. I joined SJTU in 2012 and founded SpeechLab at SJTU. SpeechLab was later extended and renamed the Cross-media Language Intelligence (X-LANCE) Lab, as it is known now. I have served as a member of the IEEE Speech and Language Processing Technical Committee (2017-2019), a board member of the IEEE Signal Processing Society Conferences Board (2024-2025), and an associate editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024). I am currently a board member of the IEEE Signal Processing Society Membership Board. I am also a council member of the CCF (China Computer Federation) and serve as the director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF.

My research interests lie primarily in the field of conversational AI, including rich aspects of speech and language processing as well as multi-modal linguistic computing. The goal of my research is to build cognitive conversational agents that can operate in complex real-world environments, deal with uncertainty, deliver information in a human-like way and evolve by interacting with their environment. I have published over 300 peer-reviewed journal and conference papers and won numerous paper awards. I have served as program chair for Interspeech, ICMI and SigDial, general chair for the National Conference on Man-machine Communication (the largest domestic speech conference in China), and area chair of speech processing or dialogue systems for Interspeech, ACL, EMNLP, etc.

The outcomes of my research have been both recognized in academia and successfully industrialized. I founded AISpeech to commercialize state-of-the-art speech and language processing technology. AISpeech was included in the “AI Key Players” list in the Equity Research Report of AI by Goldman Sachs in 2016 and was named one of the Cool Vendors for AI (East Asia) by Gartner in 2017. On behalf of AISpeech, I am also leading the National AI Open Innovation Platform on Language Computing, granted by the Ministry of Science and Technology of China in 2022.

SJTU X-LANCE Lab

We are looking for self-motivated Ph.D./master/undergraduate students and postdocs interested in speech and language processing. Please send your CV to me if you want to join us.

Research Interests

Speech and Audio Processing
neural speech signal processing, robust speech and speaker recognition, high-fidelity speech synthesis, audio analysis and auditory cognition, multi-modal speech processing and universal audio model
Natural Language Processing and Conversational Agent
structured language understanding, KBQA and machine reading comprehension, statistical dialogue systems, multi-lingual language processing, LLM, conversational agent system
Multi-modal LLM and Interaction
multi-modal LLM, embodied agent, digital avatar, GUI understanding and manipulation, AGI for science

Selected Publications [Google Scholar][More Papers]

Review and Perspective

LALM A Survey on Speech Large Language Models for Understanding
Jing Peng, Yucheng Wang, Bohan Li, Yiwei Guo, Hankun Wang, YanGui Fang, Yu Xi, Haoyu Li, Xu Li, Ke Zhang, Shuai Wang and Kai Yu
IEEE Journal of Selected Topics in Signal Processing, (JSTSP), vol. 20, no. 1, pp. 2-31, 2026
Speech Recent Advances in Discrete Speech Tokens: A Review
Yiwei Guo, Zhihan Li, Hankun Wang, Bohan Li, Chongtian Shao, Hanglei Zhang, Chenpeng Du, Xie Chen, Shujie Liu and Kai Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 48, pp. 4184-4204, 2026

Speech and Audio Processing

TSE Detect, Attend and Extract: Keyword Guided Target Speaker Extraction
Haoyu Li, Yu Xi, Yidi Jiang, Shuai Wang, Kate Knill, Mark Gales, Haizhou Li and Kai Yu
IJCAI-ECAI 2026
LALM TASU: Text-only alignment for speech understanding
Jing Peng, Yi Yang, Xu Li, Yu Xi, Quanwei Tang, Yangui Fang, Junjie Li and Kai Yu
ICASSP 2026
LALM AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
Yiwei Guo, Bohan Li, Hankun Wang, Zhihan Li, Shuai Wang, Xie Chen and Kai Yu
AAAI 2026
TTS VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen and Kai Yu
ICASSP 2025
ASR TDT-KWS: Fast and Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu and Kai Yu
ICASSP 2024

Natural Language Processing and Conversational Agent

Agent DiSRouter: Distributed Self-Routing for LLM Selections
Hang Zheng, Hongshen Xu, Yongkai Lin, Shuai Fan, Lu Chen and Kai Yu
ICLR 2026
Agent Reducing Tool Hallucination via Reliability Alignment
Hongshen Xu, Zichen Zhu, Lei Pan, Zihan Wang, Su Zhu, Da Ma, Ruisheng Cao, Lu Chen and Kai Yu
ICML 2025
Agent Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen and Kai Yu
COLM 2024
LLM Developing ChemDFM as a large language foundation model for chemistry
Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Yi Xia, Bo Chen, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Kai Yu and Xin Chen
Cell Reports Physical Science, vol. 6, no. 4, 102523, 2025
LLM Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao and Kai Yu
NeurIPS 2023
NLP A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL
Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang and Kai Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 11, pp. 13796-13813, 2023

Multi-modal LLM and Interaction

GUI MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, Situo Zhang, Liangtai Sun, Yixiao Wang, Yuheng Sun, Lu Chen and Kai Yu
NAACL 2025
Avatar VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
Tao Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, Xie Chen and Kai Yu
AAAI 2025
MLLM ChemDFM-X: towards large multimodal model for chemistry
Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Yansi Li, Zhongyang Dai, Xin Chen and Kai Yu
Science China Information Sciences, 67: 220109, 2024
GUI Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu and Kai Yu
EMNLP 2022

Professional Qualification and Service

Institute of Electrical and Electronics Engineers (IEEE)

Fellow of IEEE
Board Member of IEEE Signal Processing Society Conferences Board (2024-2025)
Board Member of IEEE Signal Processing Society Membership Board (2024-2026)
Member of IEEE Speech and Language Processing Technical Committee (2017-2019)
Associate Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024)
General Chair of ICASSP 2025 Satellite Event in Suzhou

International Speech Communication Association (ISCA)

Fellow of ISCA
Program Chair of Interspeech 2020

China Computer Federation (CCF)

Distinguished Member of CCF
Member of the 13th Council of CCF
Director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF
Standing Committee Member of the Large Model Forum of CCF

Chinese Information Processing Society of China (CIPSC)

Member of the 9th Council of CIPSC
Associate Director of the Speech Information Processing Technical Committee of CIPSC

Industry Service

Director of the National AI Open Innovation Platform on Language Computing, Ministry of Science and Technology of China (MOST)
Member of the AI Key Technology and Application Evaluation Academic Committee of the Key Laboratory of the Ministry of Industry and Information Technology of China
Member of the Information System User Interfaces Branch (TC28/SC35) of the National Information Technology Standardization Technical Committee
Member of the 4th National Computer Science and Technology Terminology Approval Committee
Director of the Academic and Intellectual Property Working Group of the China Artificial Intelligence Industry Alliance (AIIA)
Associate Director of the Technical Committee of the Alliance of Intelligent Speech Technology Industry of China

Other Service

Vice President of the Shanghai Overseas Returned Scholar Association (SORSA)
Chairman of the AI Branch of SORSA
Vice President of the Shanghai Tsinghua Alumni Association
Member of the Young Scientists Committee of the World Laureates Forum

Academic Conference Service

ICASSP

IEEE SLTC Member
General Chair of ICASSP 2025 Satellite Event

Interspeech

Program Chair, Area Chair (Speech Recognition/Dialogue Systems)

EUSIPCO

Area Chair (Speech Processing)

ACL

(Senior) Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems/Spoken Language Technology)

NAACL

Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems)

EMNLP

Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems)

NeurIPS

Area Chair

SigDial

Program Chair

ICMI

Program Chair

NCMMSC

General Chair, Program Chair

Reviewer Service

Journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Signal Processing Letters
IEEE Signal Processing Magazine
Speech Communication
Computer Speech and Language
Journal of Computer Science (Chinese)
Journal of Automation (Chinese)

Conference

ICASSP, Interspeech, IEEE ASRU, IEEE SLT, APSIPA, ISCSLP, NCMMSC
ACL/NAACL/EACL, EMNLP, SigDial
AAAI, NeurIPS

Proposal and Award

EPSRC, U.K.
Science and Engineering Research Council, Agency for Science, Technology and Research (A*STAR), Singapore
Israel Science Foundation (ISF), Israel
Foundation for Polish Science
Research Grants Council (RGC) of Hong Kong
National Natural Science Foundation of China
Ministry of Science and Technology of China
Ministry of Industry and Information Technology of China
Ministry of Education of China
Chinese Academy of Sciences

Award

Best Paper Award

EURASIP Speech Communication Best Paper Award
International Symposium on Chinese Spoken Language Processing Best Paper Award
ISCA Computer Speech and Language Best Paper Award
Interspeech Best Paper Award
IEEE SLT Best Paper Award
NCMMSC Best Paper Award

National and Provincial Award and Honor

Leading Talents in Scientific and Technological Innovation by Ministry of Science and Technology of China
Excellent Young Researcher Fund by National Natural Science Foundation of China (NSFC)
Chinese Patent Excellence Award by China National Intellectual Property Administration
Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning by Shanghai Municipal Education Commission

Professional Society Academic Award and Honor

Distinguished Lecturer by International Speech Communication Association (ISCA)
Bamboo Award by China Computer Federation (CCF)
Distinguished Speaker of Advanced Disciplines Lectures by China Computer Federation (CCF)
Second Prize for Scientific and Technological Progress, WuWenJun AI Science and Technology Award by Chinese Association for Artificial Intelligence (CAAI)
First Prize for Natural Science, WuWenJun AI Science and Technology Award by Chinese Association for Artificial Intelligence (CAAI)

Other Award and Honor

Scientific Chinese (2016) Person of the Year by Scientific Chinese Magazine

Last updated on 2026-05-15.