Welcome to SJTU X-LANCE Lab!


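The full name of X-LANCE Lab is the Cross Media (X-) Language Intelligence Lab (跨媒体语言智能实验室). It was originally founded in 2012 and, in 2020, was formed through the merger of the former SpeechLab and E-learning Lab. It currently has 5 faculty members and more than 80 Ph.D., master's and undergraduate students.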

X-LANCE Lab Groups

Automatic Speech Recognition

2 Ph.D | 6 Masters | 6 Bachelors

Automatic speech recognition (ASR) converts human speech waveforms into text. Our work focuses on statistical ASR approaches, with HMM-based acoustic modelling, statistical language modelling and decoding algorithms as the main areas. Research topics include, but are not limited to, adaptation, low-resource ASR, robust and multilingual ASR, deep learning, discriminative training and software engineering for ASR.
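
As general background, and not a description of this group's particular systems, the standard textbook decision rule for statistical ASR shows how these three areas fit together. Given an acoustic observation sequence O, the recogniser looks for the word sequence W with the highest posterior probability; by Bayes' rule, and because p(O) does not depend on W, this reduces to:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{p(O \mid W)\, P(W)}{p(O)}
        = \arg\max_{W} \underbrace{p(O \mid W)}_{\text{acoustic model}} \, \underbrace{P(W)}_{\text{language model}}
```

Here p(O | W) is supplied by the (e.g. HMM-based) acoustic model, P(W) by the language model, and the decoding algorithm carries out the search over candidate word sequences.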

Statistical Speech Synthesis

1 Ph.D | 2 Masters | 2 Bachelors

Speech synthesis is the technique of producing natural-sounding human speech. It mainly covers text-to-speech (TTS) and voice conversion (VC). A TTS system generates human speech from natural-language text. We follow the latest end-to-end techniques (e.g. Tacotron, WaveNet) to improve the quality and expressiveness of the generated waveform. A VC system converts a speech waveform from a source style to a target style (e.g. speaker, emotion). Our research aims to improve the naturalness of the converted speech and its similarity to the target.

Spoken Dialogue System

2 Ph.D | 0 Masters | 7 Bachelors

Spoken dialogue system (SDS) research mainly focuses on applying statistical approaches to speech understanding and dialogue management. We also study SDS architecture, joint optimisation and system engineering. The aim is to build intelligent end-to-end systems, especially task-oriented ones, that explicitly handle the uncertainty arising in human-machine interaction and correctly understand users' intentions.
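
One common way for a statistical dialogue system to handle this uncertainty explicitly is to keep a belief, i.e. a probability distribution over what the user might want, and update it with Bayes' rule after each noisy turn. The sketch below is a minimal, hypothetical illustration for a single slot; the function name `update_belief`, the simple observation model and the restaurant-domain values are all assumptions, not a description of the lab's systems.

```python
from typing import Dict


def update_belief(belief: Dict[str, float], observed: str, confidence: float) -> Dict[str, float]:
    """One Bayesian belief update for a single slot.

    Assumed (hypothetical) observation model: the understanding component reports
    the true value with probability `confidence`, otherwise one of the other
    values uniformly at random.
    """
    n = len(belief)
    unnormalised = {}
    for value, prior in belief.items():
        likelihood = confidence if value == observed else (1.0 - confidence) / (n - 1)
        unnormalised[value] = prior * likelihood  # numerator of Bayes' rule
    total = sum(unnormalised.values())
    return {value: p / total for value, p in unnormalised.items()}  # normalise


# Hypothetical restaurant-domain slot with a uniform prior over three cuisines.
belief = {"chinese": 1 / 3, "italian": 1 / 3, "indian": 1 / 3}
belief = update_belief(belief, "chinese", confidence=0.6)  # "chinese" rises to 0.6
belief = update_belief(belief, "chinese", confidence=0.6)  # and then to 0.36 / 0.44, about 0.82
print(belief)
```

Two moderately confident observations of the same value push the belief much higher than either would alone, while the system still keeps some probability on the alternatives in case the understanding was wrong.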

Spoken Language Understanding

1 Ph.D | 1 Master | 2 Bachelors

Spoken language understanding (SLU) serves as the interface between ASR and the SDS: it converts a recognised sentence into a structured representation of the user's meaning. Unlike general-domain natural language understanding (NLU), SLU (at the current state of technology) focuses only on specific application domains. SLU typically comprises three tasks: domain classification, intent detection and slot filling. Our main research interests include deep learning for SLU, SLU domain adaptation and transfer, SLU that is robust to ASR errors, deeper understanding, and end-to-end SLU.
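
To make "structured representation" concrete, here is a minimal sketch of the output of the three SLU tasks for one utterance. It assumes the common BIO tagging scheme for slot filling; the `SemanticFrame` class, the `bio_to_slots` helper and all domain, intent and slot labels are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticFrame:
    """Structured meaning of one utterance: domain, intent and slot-value pairs."""
    domain: str
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def bio_to_slots(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Turn per-token BIO tags into slot-value pairs.

    "B-x" begins a value for slot x, "I-x" continues it and "O" is outside any slot.
    An "I-x" that does not continue an open slot x is treated as a new "B-x".
    """
    slots: Dict[str, str] = {}
    name, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == name:
            words.append(token)  # continue the currently open slot
            continue
        if name is not None:
            slots[name] = " ".join(words)  # close the open slot
        if tag.startswith(("B-", "I-")):
            name, words = tag[2:], [token]  # open a new slot
        else:
            name, words = None, []  # "O": outside any slot
    if name is not None:
        slots[name] = " ".join(words)
    return slots


# Hypothetical example: a real system would predict the domain and intent with
# classifiers and the tags with a sequence labeller; here they are written by hand.
tokens = ["book", "a", "flight", "from", "shanghai", "to", "new", "york"]
tags = ["O", "O", "O", "O", "B-from_city", "O", "B-to_city", "I-to_city"]
frame = SemanticFrame(domain="flight", intent="book_flight", slots=bio_to_slots(tokens, tags))
print(frame)
# SemanticFrame(domain='flight', intent='book_flight', slots={'from_city': 'shanghai', 'to_city': 'new york'})
```

Domain classification and intent detection label the whole utterance, while slot filling labels individual tokens; the frame simply gathers all three results into a form a dialogue manager can act on.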

Rich Audio Analysis

2 Ph.D | 2 Masters | 3 Bachelors

Rich audio analysis (RAA) focuses on analysing and classifying the non-textual information in human speech, such as speaker identity, emotion, noise and speaking style. Pronunciation assessment and oral communication skill assessment are related research topics. The aim is to use intelligent speech technology to assist language learning and examination.

Language Model

1 Ph.D | 3 Masters | 1 Bachelor

Language model (LM) research studies the statistical probability distribution of human language. LMs are widely used in natural language processing, speech recognition, machine translation, handwriting recognition and other applications. Our aim is to develop general LMs for both evaluation and generation. We currently focus on combining traditional statistical LMs with deep learning or reinforcement learning. We are also interested in structured LSTM LMs and large-vocabulary LM applications.
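
As a concrete example of a "traditional statistical LM", the sketch below builds a bigram model with add-one (Laplace) smoothing, a classic textbook technique chosen purely for illustration. It approximates the chain rule P(w1 ... wn) as the product of P(wi | wi-1). The toy corpus and the function names `train_bigram` and `sentence_log_prob` are hypothetical and do not represent the lab's models.

```python
import math
from collections import Counter
from typing import Iterable, Set, Tuple


def train_bigram(sentences: Iterable[str]) -> Tuple[Counter, Counter, Set[str]]:
    """Count bigrams and their histories; each sentence is padded with <s> and </s>."""
    history_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token the model may have to predict
        history_counts.update(tokens[:-1])  # how often each token acts as a history
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, vocab


def sentence_log_prob(sentence: str, history_counts: Counter, bigram_counts: Counter, vocab: Set[str]) -> float:
    """log P(sentence) under the bigram approximation with add-one smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    log_p = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + v)
        log_p += math.log(p)  # sum logs to avoid underflow on long sentences
    return log_p


# Toy corpus (hypothetical): a fluent word order should score higher than a scrambled one.
model = train_bigram(["i like speech", "i like language", "we like speech"])
print(math.exp(sentence_log_prob("i like speech", *model)))  # roughly 0.0156 (= 1/64)
print(math.exp(sentence_log_prob("speech like i", *model)))  # far smaller
```

Add-one smoothing is the simplest way to give unseen word pairs a non-zero probability; practical statistical LMs typically use stronger smoothing or back-off schemes, and neural LMs replace the count tables altogether.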