🎬 Videos

Videos of previous talks, lectures, and presentations.

Xmart Student Forum

Session 14 Yuancheng Wang: Towards Natural and Efficient Speech Synthesis — Perspectives on Modeling, Alignment, and Representation

Session 13 Dongchao Yang: Towards Multi-task Audio Foundation Models — An Audio Generation Perspective

Session 12 Junzuo Zhou & Yong Ren: Traceable Protection of Speech — Research on Audio Watermarking

Session 11 Shengpeng Ji: Opportunities and Challenges in the Era of End-to-End Spoken Dialogue

Session 10 Ruibin Yuan: Scaling Open Foundation Models for Music

Session 9 Shaolei Zhang: Toward Real-time Cross-Language Communication — Challenges, Techniques, and Future of Real-time Speech Models

Session 8 Junbin Xiao & Leilei Li: Research and Outlook on First-Person Perspective Problems

Session 7 Zirui Guo: From Retrieval-Augmented Generation to Graph-Augmented Generation — Exploring Next-Generation Intelligent Q&A Systems

Session 6 Haohe Liu: Latent Diffusion Model as a Versatile Coarse-to-Fine Audio Decoder

Session 5 Tianbao Xie: OSWorld — Benchmarking Multimodal Agents for Open-Ended Tasks in a Real Computer Environment

Session 4 Yuchen Hu: Post-Training Alignment of Large Speech Models

Session 3 Junyi Ao: SD-Eval, a New Benchmark for Equipping Large Speech Interaction Models with Cognitive and Emotional Intelligence

Session 2 Keqi Deng: Label-synchronous Neural Transducer

Session 1 Dong Zhang: Building End-to-End Spoken Dialogue Large Models

Xmart Frontier Talks

Session 7 Kele Xu: Multimodal Machine Learning for Sound Understanding

Session 6 Cewu Lu: Embodied Intelligence Scaling Laws and Scalable Data

Session 5 Wenwu Wang: Large Language-Audio Models and Their Applications

Session 4 Xipeng Qiu: From Large Language Models to World Models

Session 3 Tianfan Fu: Applications of Deep Learning in Drug Discovery and Development

Session 2 Hung-yi Lee: Challenges of Teaching New Skills to Foundation Models

Session 1 Haofen Wang: Knowledge Retrieval Augmentation — Paradigms and Key Technologies