Xmart Student Forum
Session 14 Yuancheng Wang: Towards Natural and Efficient Speech Synthesis — Perspectives on Modeling, Alignment, and Representation
Session 13 Dongchao Yang: Towards Multi-task Audio Foundation Models — An Audio Generation Perspective
Session 12 Junzuo Zhou & Yong Ren: Traceable Protection of Speech — Research on Audio Watermarking
Session 11 Shengpeng Ji: Opportunities and Challenges in the Era of End-to-End Spoken Dialogue
Session 10 Ruibin Yuan: Scaling Open Foundation Models for Music
Session 9 Shaolei Zhang: Toward Real-time Cross-Language Communication — Challenges, Techniques, and Future of Real-time Speech Models
Session 8 Junbin Xiao & Leilei Li: Research and Outlook on First-Person Perspective Problems
Session 7 Zirui Guo: From Retrieval-Augmented Generation to Graph-Augmented Generation — Exploring Next-Generation Intelligent Q&A Systems
Session 6 Haohe Liu: Latent Diffusion Model as a Versatile Coarse-to-Fine Audio Decoder
Session 5 Tianbao Xie: OSWorld — Benchmarking Multimodal Agents for Open-Ended Tasks in a Real Computer Environment
Session 4 Yuchen Hu: Post-Training Alignment of Large Speech Models
Session 3 Junyi Ao: SD-Eval, a New Benchmark — Equipping Large Speech Interaction Models with Cognitive and Emotional Intelligence
Session 2 Keqi Deng: Label-synchronous Neural Transducer
Session 1 Dong Zhang: Building End-to-End Spoken Dialogue Large Models
Xmart Frontier Talks
Session 7 Kele Xu: Multimodal Machine Learning for Sound Understanding
Session 6 Cewu Lu: Embodied Intelligence Scaling Laws and Scalable Data
Session 5 Wenwu Wang: Large Language-Audio Models and Their Applications
Session 4 Xipeng Qiu: From Large Language Models to World Models
Session 3 Tianfan Fu: Applications of Deep Learning in Drug Discovery and Development
Session 2 Hung-yi Lee: Challenges of Teaching New Skills to Foundation Models
Session 1 Haofen Wang: Knowledge Retrieval Augmentation — Paradigms and Key Technologies