📃Papers | X

2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer

Xun Gong , Yu Wu , Jinyu Li , Shujie Liu , Rui Zhao , Xie Chen, and Yanmin Qian

IEEE ACM Trans. Audio Speech Lang. Process., 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Wenxi Chen , Yuzhe Liang , Ziyang Ma , Zhisheng Zheng , and Xie Chen

CoRR, 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

Yakun Song , Zhuo Chen , Xiaofei Wang , Ziyang Ma , and Xie Chen

CoRR, 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models

Zhisheng Zheng , Puyuan Peng , Ziyang Ma , Xie Chen, Eunsol Choi , and David Harwath

CoRR, 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Ziyang Ma , Guanrou Yang , Yifan Yang , Zhifu Gao , Jiaming Wang , Zhihao Du , Fan Yu , Qian Chen , Siqi Zheng , Shiliang Zhang , and Xie Chen

CoRR, 2024
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning

Xuenan Xu , Zeyu Xie , Mengyue Wu, and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., 2024
Towards Weakly Supervised Text-to-Audio Grounding

Xuenan Xu , Ziyang Ma , Mengyue Wu, and Kai Yu

CoRR, 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

Chenpeng Du , Yiwei Guo , Hankun Wang , Yifan Yang , Zhikang Niu , Shuai Wang , Hui Zhang , Xie Chen, and Kai Yu

CoRR, 2024
ChemDFM: Dialogue Foundation Model for Chemistry

Zihan Zhao , Da Ma , Lu Chen, Liangtai Sun , Zihao Li , Hongshen Xu , Zichen Zhu , Su Zhu , Shuai Fan , Guodong Shen , Xin Chen , and Kai Yu

CoRR, 2024
MULTI: Multimodal Understanding Leaderboard with Text and Images

Zichen Zhu, Yang Xu , Lu Chen, Jingkai Yang , Yichuan Ma , Yiming Sun , Hailin Wen , Jiaqi Liu , Jinyu Cai , Yingzi Ma , Situo Zhang , Zihan Zhao , Liangtai Sun , and Kai Yu

CoRR, 2024

2023

A Unified Framework From Face Image Restoration to Data Augmentation Using Generative Prior

Jiawei You , Ganyu Huang , Tianyuan Han , Haoze Yang , and Liping Shen

IEEE Access, 2023
Human Pose Estimation with Combined Feature Maps and Joint Embeddings

Tianyuan Han , Ganyu Huang , Chunhui Li , and Liping Shen

In Proceedings of the 2023 International Conference on Advances in Artificial Intelligence and Applications, AAIA 2023, Wuhan, China, November 18-20, 2023 , 2023
Assessing and Enhancing LLMs: A Physics and History Dataset and One-More-Check Pipeline Method

Chaofan He , Chunhui Li , Tianyuan Han , and Liping Shen

In Neural Information Processing - 30th International Conference, ICONIP 2023, Changsha, China, November 20-23, 2023, Proceedings, Part XIII , 2023
GAN Latent Space Manipulation Based Augmentation for Unbalanced Emotion Datasets

Yuhan Xiong , Jiawei You , and Liping Shen

In International Joint Conference on Neural Networks, IJCNN 2023, Gold Coast, Australia, June 18-23, 2023 , 2023
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer

Xun Gong , Yu Wu , Jinyu Li , Shujie Liu , Rui Zhao , Xie Chen, and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR

Xun Gong , Wei Wang , Hang Shao , Xie Chen, and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Exploring Binary Classification Loss for Speaker Verification

Bing Han , Zhengyang Chen , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Improving Dino-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training

Bing Han , Wen Huang , Zhengyang Chen , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , 2023
Robust Audio-Visual ASR with Unified Cross-Modal Attention

Jiahong Li , Chenda Li , Yifei Wu , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Target Sound Extraction with Variable Cross-Modality Clues

Chenda Li , Yao Qian , Zhuo Chen , Dongmei Wang , Takuya Yoshioka , Shujie Liu , Yanmin Qian , and Michael Zeng

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Predictive Skim: Contrastive Predictive Coding for Low-Latency Online Speech Separation

Chenda Li , Yifei Wu , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge

Tao Liu , Zhengyang Chen , Yanmin Qian , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition

Hang Shao , Tian Tan , Wei Wang , Xun Gong , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Lowbit Neural Network Quantization for Speaker Verification

Haoyu Wang , Bei Liu , Yifei Wu , Zhengyang Chen , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , 2023
Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit

Hongji Wang , Chengdong Liang , Shuai Wang , Zhengyang Chen , Binbin Zhang , Xu Xiang , Yanlei Deng , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit Bert for Robust Speech Recognition

Wei Wang and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation

Yifei Wu , Chenda Li , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , 2023
Code-Switching Text Generation and Injection in Mandarin-English ASR

Haibin Yu , Yuxuan Hu , Yao Qian , Ma Jin , Linquan Liu , Shujie Liu , Yu Shi , Yanmin Qian , Edward Lin , and Michael Zeng

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
Adaptive Large Margin Fine-Tuning For Robust Speaker Verification

Leying Zhang , Zhengyang Chen , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

Chenyang Le , Yao Qian , Long Zhou , Shujie Liu , Yanmin Qian , Michael Zeng , and Xuedong Huang

In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023 , 2023
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Yoshiki Masuyama , Xuankai Chang , Wangyou Zhang , Samuele Cornell , Zhong-Qiu Wang , Nobutaka Ono , Yanmin Qian , and Shinji Watanabe

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023, New Paltz, NY, USA, October 22-25, 2023 , 2023
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310) (Version 1)

Yen-Ju Lu , Xuankai Chang , Chenda Li , Wangyou Zhang , Samuele Cornell , Zhaoheng Ni , Yoshiki Masuyama , Brian Yan , Robin Scheibler , Zhong-Qiu Wang , Yu Tsao , Yanmin Qian , and Shinji Watanabe

Oct 2023

Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification

Bing Han , Zhengyang Chen , and Yanmin Qian

CoRR, Oct 2023
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

Zhengyang Chen , Bing Han , Shuai Wang , and Yanmin Qian

CoRR, Oct 2023
Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

Hang Shao , Wei Wang , Bei Liu , Xun Gong , Haoyu Wang , and Yanmin Qian

CoRR, Oct 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Wangyou Zhang and Yanmin Qian

CoRR, Oct 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

Chenda Li , Yao Qian , Zhuo Chen , Naoyuki Kanda , Dongmei Wang , Takuya Yoshioka , Yanmin Qian , and Michael Zeng

CoRR, Oct 2023
InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

Bing Han , Junyu Dai , Xuchen Song , Weituo Hao , Xinyan He , Dong Guo , Jitong Chen , Yuxuan Wang , and Yanmin Qian

CoRR, Oct 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

Zhengyang Chen , Bing Han , Shuai Wang , and Yanmin Qian

CoRR, Oct 2023
USED: Universal Speaker Extraction and Diarization

Junyi Ao , Mehmet Sinan Yildirim , Meng Ge , Shuai Wang , Ruijie Tao , Yanmin Qian , Liqun Deng , Longshuai Xiao , and Haizhou Li

CoRR, Oct 2023
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

Shuai Wang , Qibing Bai , Qi Liu , Jianwei Yu , Zhengyang Chen , Bing Han , Yanmin Qian , and Haizhou Li

CoRR, Oct 2023
The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

Yuhao Liang , Mohan Shi , Fan Yu , Yangze Li , Shiliang Zhang , Zhihao Du , Qian Chen , Lei Xie , Yanmin Qian , Jian Wu , Zhuo Chen , Kong Aik Lee , Zhijie Yan , and Hui Bu

CoRR, Oct 2023
Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Leying Zhang , Yao Qian , Linfeng Yu , Heming Wang , Xinkai Wang , Hemin Yang , Long Zhou , Shujie Liu , Yanmin Qian , and Michael Zeng

CoRR, Oct 2023
Toward Universal Speech Enhancement for Diverse Input Conditions

Wangyou Zhang , Kohei Saijo , Zhong-Qiu Wang , Shinji Watanabe , and Yanmin Qian

CoRR, Oct 2023
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models

Hang Shao , Bei Liu , and Yanmin Qian

CoRR, Oct 2023
FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition

Dongning Yang , Wei Wang , and Yanmin Qian

CoRR, Oct 2023
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature

Chenpeng Du , Yiwei Guo , Xie Chen, and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2023
Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning

Guanrou Yang , Ziyang Ma , Zhisheng Zheng , Yakun Song , Zhikang Niu , and Xie Chen

In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023 , Oct 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation

Qi Chen , Ziyang Ma , Tao Liu , Xu Tan , Qu Lu , Kai Yu , and Xie Chen

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , Oct 2023
Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition

Xie Chen, Ziyang Ma , Changli Tang , Yujin Wang , and Zhisheng Zheng

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , Oct 2023
Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , Oct 2023
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

Chenpeng Du , Qi Chen , Tianyu He , Xu Tan , Xie Chen, Kai Yu, Sheng Zhao , and Jiang Bian

In Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023 - 3 November 2023 , Oct 2023
Blank-regularized CTC for Frame Skipping in Neural Transducer

Yifan Yang , Xiaoyu Yang , Liyong Guo , Zengwei Yao , Wei Kang , Fangjun Kuang , Long Lin , Xie Chen, and Daniel Povey

CoRR, Oct 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

Chenpeng Du , Yiwei Guo , Feiyu Shen , Zhijun Liu , Zheng Liang , Xie Chen, Shuai Wang , Hui Zhang , and Kai Yu

CoRR, Oct 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Zheng Liang , Zheshu Song , Ziyang Ma , Chenpeng Du , Kai Yu , and Xie Chen

CoRR, Oct 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

Ziyang Ma , Zhisheng Zheng , Guanrou Yang , Yu Wang , Chao Zhang , and Xie Chen

CoRR, Oct 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

Mingyu Cui , Jiawen Kang , Jiajun Deng , Xi Yin , Yutao Xie , Xie Chen, and Xunying Liu

CoRR, Oct 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Sen Liu , Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu

CoRR, Oct 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

Zhisheng Zheng , Ziyang Ma , Yu Wang , and Xie Chen

CoRR, Oct 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

Yiwei Guo , Chenpeng Du , Ziyang Ma , Xie Chen, and Kai Yu

CoRR, Oct 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Yifan Yang , Feiyu Shen , Chenpeng Du , Ziyang Ma , Kai Yu, Daniel Povey , and Xie Chen

CoRR, Oct 2023
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

Peng Wang , Yifan Yang , Zheng Liang , Tian Tan , Shiliang Zhang , and Xie Chen

CoRR, Oct 2023
Improved Factorized Neural Transducer Model For text-only Domain Adaptation

Junzhe Liu , Jianwei Yu , and Xie Chen

CoRR, Oct 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

Ziyang Ma , Wen Wu , Zhisheng Zheng , Yiwei Guo , Qian Chen , Shiliang Zhang , and Xie Chen

CoRR, Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens

Feiyu Shen , Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu

CoRR, Oct 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations

Hanglei Zhang , Yiwei Guo , Sen Liu , Xie Chen, and Kai Yu

CoRR, Oct 2023
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Ziyang Ma , Zhisheng Zheng , Jiaxin Ye , Jinchao Li , Zhifu Gao , Shiliang Zhang , and Xie Chen

CoRR, Oct 2023
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue

Zhi Chen , Yuncong Liu , Lu Chen , Su Zhu , Mengyue Wu, and Kai Yu

Trans. Assoc. Comput. Linguistics, Oct 2023
Transcribing Vocal Communications of Domestic Shiba Inu Dogs

Jieyi Huang , Chunhao Zhang , Mengyue Wu , and Kenny Q. Zhu

In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023 , Oct 2023
Detection of Multiple Mental Disorders from Social Media with Two-Stream Psychiatric Experts

Siyuan Chen , Zhiling Zhang , Mengyue Wu , and Kenny Q. Zhu

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023 , Oct 2023
Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation

Zhiling Zhang , Mengyue Wu , and Kenny Q. Zhu

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023 , Oct 2023
Diverse and Vivid Sound Generation from Text Descriptions

Guangwei Li , Xuenan Xu , Lingfeng Dai , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , Oct 2023
Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning

Xuenan Xu , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , Oct 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data

Xuenan Xu , Zhiling Zhang , Zelin Zhou , Pingyue Zhang , Zeyu Xie , Mengyue Wu , and Kenny Q. Zhu

In Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023 - 3 November 2023 , Oct 2023
LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation

Siyuan Chen , Mengyue Wu , Kenny Q. Zhu , Kunyao Lan , Zhiling Zhang , and Lyuchun Cui

CoRR, Oct 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection

Zeyu Xie , Xuenan Xu , Mengyue Wu, and Kai Yu

CoRR, Oct 2023
Improving Audio Caption Fluency with Automatic Error Correction

Hanxue Zhang , Zeyu Xie , Xuenan Xu , Mengyue Wu, and Kai Yu

CoRR, Oct 2023
A Large-scale Dataset for Audio-Language Representation Learning

Luoyi Sun , Xuenan Xu , Mengyue Wu, and Weidi Xie

CoRR, Oct 2023
Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners

Jieyi Huang , Chunhao Zhang , Yufei Wang , Mengyue Wu , and Kenny Q. Zhu

CoRR, Oct 2023
Towards Lexical Analysis of Dog Vocalizations via Online Videos

Yufei Wang , Chunhao Zhang , Jieyi Huang , Mengyue Wu , and Kenny Q. Zhu

CoRR, Oct 2023
PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health

Haoan Jin , Siyuan Chen , Mengyue Wu , and Kenny Q. Zhu

CoRR, Oct 2023
A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL

Ruisheng Cao , Lu Chen, Jieyu Li , Hanchong Zhang , Hongshen Xu , Wangyou Zhang , and Kai Yu

IEEE Trans. Pattern Anal. Mach. Intell., Oct 2023
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking

Wenbin Jiang and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2023
SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling

Sheng Jiang , Su Zhu , Ruisheng Cao , Qingliang Miao , and Kai Yu

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, ACL 2023, Toronto, Canada, July 9-14, 2023 , Oct 2023
Exploring Schema Generalizability of Text-to-SQL

Jieyu Li , Lu Chen, Ruisheng Cao , Su Zhu , Hongshen Xu , Zhi Chen , Hanchong Zhang , and Kai Yu

In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023 , Oct 2023
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation

Yiming Ai , Zhiwei He , Kai Yu, and Rui Wang

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9-14, 2023 , Oct 2023
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset

Hanchong Zhang , Jieyu Li , Lu Chen, Ruisheng Cao , Yunyan Zhang , Yu Huang , Yefeng Zheng , and Kai Yu

In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023 , Oct 2023
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought

Hanchong Zhang , Ruisheng Cao , Lu Chen, Hongshen Xu , and Kai Yu

In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023 , Oct 2023
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

Chenpeng Du , Yiwei Guo , Feiyu Shen , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , Oct 2023
DiffVoice: Text-to-Speech with Latent Diffusion

Zhijun Liu , Yiwei Guo , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , Oct 2023
Large Language Models Are Semi-Parametric Reinforcement Learning Agents

Danyang Zhang , Lu Chen, Situo Zhang , Hongshen Xu , Zihan Zhao , and Kai Yu

In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023 , Oct 2023
Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction

Danyang Zhang , Lu Chen, and Kai Yu

CoRR, Oct 2023
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Liangtai Sun , Yang Han , Zihan Zhao , Da Ma , Zhennan Shen , Baocai Chen , Lu Chen, and Kai Yu

CoRR, Oct 2023
ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL

Ruisheng Cao , Hanchong Zhang , Hongshen Xu , Jieyu Li , Da Ma , Lu Chen, and Kai Yu

CoRR, Oct 2023
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder

Tao Liu , Chenpeng Du , Shuai Fan , Feilong Chen , and Kai Yu

CoRR, Oct 2023
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention

Junjie Li , Yiwei Guo , Xie Chen, and Kai Yu

CoRR, Oct 2023

2022

Heterogeneous Graph Representation for Knowledge Tracing

Jisen Chen , Jian Shen , Ting Long , Liping Shen, Weinan Zhang , and Yong Yu

In Neural Information Processing - 29th International Conference, ICONIP 2022, Virtual Event, November 22-26, 2022, Proceedings, Part I , Oct 2022
A simple but practical method: How to improve the usage of entities in the Chinese question generation

Haoze Yang , Kunyao Lan , Jiawei You , and Liping Shen

In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022 , Oct 2022
From Uniform Models To Generic Representations: Stock Return Prediction With Pre-training

Jiawei You , Tianyuan Han , and Liping Shen

In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022 , Oct 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Sanyuan Chen , Chengyi Wang , Zhengyang Chen , Yu Wu , Shujie Liu , Zhuo Chen , Jinyu Li , Naoyuki Kanda , Takuya Yoshioka , Xiong Xiao , Jian Wu , Long Zhou , Shuo Ren , Yanmin Qian , Yao Qian , Jian Wu , Michael Zeng , Xiangzhan Yu , and Furu Wei

IEEE J. Sel. Top. Signal Process., Oct 2022
Optimizing Data Usage for Low-Resource Speech Recognition

Yanmin Qian and Zhikai Zhou

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation

Chenda Li , Zhuo Chen , and Yanmin Qian

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

Yanmin Qian , Xun Gong , and Houjun Huang

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party

Wangyou Zhang , Xuankai Chang , Christoph Böddeker , Tomohiro Nakatani , Shinji Watanabe , and Yanmin Qian

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
Time-Domain Audio-Visual Speech Separation on Low Quality Videos

Yifei Wu , Chenda Li , Jinfeng Bai , Zhongqin Wu , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Skim: Skipping Memory Lstm for Low-Latency Real-Time Continuous Speech Separation

Chenda Li , Lei Yang , Weiqin Wang , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

Zhengyang Chen , Sanyuan Chen , Yu Wu , Yao Qian , Chengyi Wang , Shujie Liu , Yanmin Qian , and Michael Zeng

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Local Information Modeling with Self-Attention for Speaker Verification

Bing Han , Zhengyang Chen , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Punctuation Prediction for Streaming On-Device Speech Recognition

Zhikai Zhou , Tian Tan , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification

Bing Han , Zhengyang Chen , Bei Liu , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Self-Knowledge Distillation via Feature Enhancement for Speaker Verification

Bei Liu , Haoyu Wang , Zhengyang Chen , Shuai Wang , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding

Wei Wang , Shuo Ren , Yao Qian , Shujie Liu , Yu Shi , Yanmin Qian , and Michael Zeng

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Exploring Effective Data Utilization for Low-Resource Speech Recognition

Zhikai Zhou , Wei Wang , Wangyou Zhang , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

Fan Yu , Shiliang Zhang , Pengcheng Guo , Yihui Fu , Zhihao Du , Siqi Zheng , Weilong Huang , Lei Xie , Zheng-Hua Tan , DeLiang Wang , Yanmin Qian , Kong Aik Lee , Zhijie Yan , Bin Ma , Xin Xu , and Hui Bu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021

Wei Wang , Xun Gong , Yifei Wu , Zhikai Zhou , Chenda Li , Wangyou Zhang , Bing Han , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Attentive Feature Fusion for Robust Speaker Verification

Bei Liu , Zhengyang Chen , and Yanmin Qian

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Dual Path Embedding Learning for Speaker Verification with Triplet Attention

Bei Liu , Zhengyang Chen , and Yanmin Qian

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design

Bei Liu , Zhengyang Chen , Shuai Wang , Haoyu Wang , Bing Han , and Yanmin Qian

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification

Leying Zhang , Zhengyang Chen , and Yanmin Qian

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild

Tao Liu , Shuai Fan , Xu Xiang , Hongbo Song , Shaoxiong Lin , Jiaqi Sun , Tianyuan Han , Siyuan Chen , Binwei Yao , Sen Liu , Yifei Wu , Yanmin Qian , and Kai Yu

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

Xun Gong , Zhikai Zhou , and Yanmin Qian

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction

Bing Han , Zhengyang Chen , and Yanmin Qian

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Separating Long-Form Speech with Group-wise Permutation Invariant Training

Wangyou Zhang , Zhuo Chen , Naoyuki Kanda , Shujie Liu , Jinyu Li , Sefik Emre Eskimez , Takuya Yoshioka , Xiong Xiao , Zhong Meng , Yanmin Qian , and Furu Wei

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

Yen-Ju Lu , Xuankai Chang , Chenda Li , Wangyou Zhang , Samuele Cornell , Zhaoheng Ni , Yoshiki Masuyama , Brian Yan , Robin Scheibler , Zhong-Qiu Wang , Yu Tsao , Yanmin Qian , and Shinji Watanabe

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models

Bowen Qu , Chenda Li , Jinfeng Bai , and Yanmin Qian

In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition

Wei Wang , Wangyou Zhang , Shaoxiong Lin , and Yanmin Qian

In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
Medical Difficult Airway Detection using Speech Technology

Zhikai Zhou , Shuang Cao , Zhengyang Chen , Bei Liu , Ming Xia , Hong Jiang , and Yanmin Qian

In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
Speaking style compensation on synthetic audio for robust keyword spotting

Houjun Huang and Yanmin Qian

In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

Gaofeng Cheng , Yifan Chen , Runyan Yang , Qingxuan Li , Zehui Yang , Lingxuan Ye , Pengyuan Zhang , Qingqing Zhang , Lei Xie , Yanmin Qian , Kong Aik Lee , and Yonghong Yan

In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022

Tao Liu , Xu Xiang , Zhengyang Chen , Bing Han , Kai Yu, and Yanmin Qian

In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
End-to-End Multi-Speaker ASR with Independent Vector Analysis

Robin Scheibler , Wangyou Zhang , Xuankai Chang , Shinji Watanabe , and Yanmin Qian

In IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023 , Oct 2022
A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning

Zhengyang Chen , Yao Qian , Bing Han , Yanmin Qian , and Michael Zeng

In IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023 , Oct 2022
The SJTU X-LANCE Lab System for CNSRC 2022

Zhengyang Chen , Bei Liu , Bing Han , Leying Zhang , and Yanmin Qian

CoRR, Oct 2022
SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022

Zhengyang Chen , Bing Han , Xu Xiang , Houjun Huang , Bei Liu , and Yanmin Qian

CoRR, Oct 2022
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

Zhengyang Chen , Bing Han , Xu Xiang , Houjun Huang , Bei Liu , and Yanmin Qian

CoRR, Oct 2022
Factorized Neural Transducer for Efficient Language Model Adaptation

Xie Chen, Zhong Meng , Sarangarajan Parthasarathy , and Jinyu Li

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

Chenpeng Du , Yiwei Guo , Xie Chen, and Kai Yu

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Zhong Meng , Yashesh Gaur , Naoyuki Kanda , Jinyu Li , Xie Chen , Yu Wu , and Yifan Gong

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

Yujin Wang , Changli Tang , Ziyang Ma , Zhisheng Zheng , Xie Chen, and Wei-Qiang Zhang

CoRR, Oct 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

Ziyang Ma , Zhisheng Zheng , Changli Tang , Yujin Wang , and Xie Chen

CoRR, Oct 2022
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu

CoRR, Oct 2022
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models

Changli Tang , Yujin Wang , Xie Chen, and Wei-Qiang Zhang

CoRR, Oct 2022
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat

Binwei Yao , Chao Shi , Likai Zou , Lingfeng Dai , Mengyue Wu, Lu Chen, Zhen Wang , and Kai Yu

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
Symptom Identification for Interpretable Detection of Multiple Mental Disorders on Social Media

Zhiling Zhang , Siyuan Chen , Mengyue Wu , and Kenny Q. Zhu

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
Category-Adapted Sound Event Enhancement with Weakly Labeled Data

Guangwei Li , Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition

Xuenan Xu , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Can Audio Captions Be Evaluated With Image Caption Metrics?

Zelin Zhou , Zhiling Zhang , Xuenan Xu , Zeyu Xie , Mengyue Wu , and Kenny Q. Zhu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Navigating Audio-Visual Event Detection Across Mismatched Modalities

Guangwei Li , Xuenan Xu , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Audio-Text Retrieval in Context

Siyu Lou , Xuenan Xu , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Climate and Weather: Inspecting Depression Detection via Emotion Recognition

Wen Wu , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression

Zhiling Zhang , Siyuan Chen , Mengyue Wu , and Kenny Q. Zhu

In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022 , Oct 2022
A Comprehensive Survey of Automated Audio Captioning

Xuenan Xu , Mengyue Wu, and Kai Yu

CoRR, Oct 2022
DialogZoo: Large-Scale Dialog-Oriented Task Learning

Zhi Chen , Jijia Bao , Lu Chen, Yuncong Liu , Da Ma , Bei Chen , Mengyue Wu , Su Zhu , Jian-Guang Lou , and Kai Yu

CoRR, Oct 2022
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler

Bo Chen , Zhihang Xu , and Kai Yu

Speech Commun., Oct 2022
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis

Chenpeng Du , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
Neural Fusion for Voice Cloning

Bo Chen , Chenpeng Du , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI

Liangtai Sun , Xingyu Chen , Lu Chen, Tianle Dai , Zichen Zhu, and Kai Yu

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
AdapterShare: Task Correlation Modeling with Adapter Differentiation

Zhi Chen , Bei Chen , Lu Chen, Kai Yu, and Jian-Guang Lou

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition

Lingfeng Dai , Lu Chen, Zhikai Zhou , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Text Adaptive Detection for Customizable Keyword Spotting

Yu Xi , Tian Tan , Wangyou Zhang , Baochen Yang , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis

Yiwei Guo , Chenpeng Du , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
The AISP-SJTU Simultaneous Translation System for IWSLT 2022

Qinpei Zhu , Renshou Wu , Guangfeng Liu , Xinyu Zhu , Xingyu Chen , Yang Zhou , Qingliang Miao , Rui Wang , and Kai Yu

In Proceedings of the 19th International Conference on Spoken Language Translation, IWSLT@ACL 2022, Dublin, Ireland (in-person and online), May 26-27, 2022 , Oct 2022
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages

Zihan Zhao , Lu Chen, Ruisheng Cao , Hongshen Xu , Xingyu Chen , and Kai Yu

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022 , Oct 2022
UniDU: Towards A Unified Generative Dialogue Understanding Framework

Zhi Chen , Lu Chen , Bei Chen , Libo Qin , Yuncong Liu , Su Zhu , Jian-Guang Lou , and Kai Yu

In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2022, Edinburgh, UK, 07-09 September 2022 , Oct 2022
The AISP-SJTU Translation System for WMT 2022

Guangfeng Liu , Qinpei Zhu , Xingyu Chen , Renjie Feng , Jianxin Ren , Renshou Wu , Qingliang Miao , Rui Wang , and Kai Yu

In Proceedings of the Seventh Conference on Machine Translation, WMT 2022, Abu Dhabi, United Arab Emirates (Hybrid), December 7-8, 2022 , Oct 2022

2021

Modified Magnitude-Phase Spectrum Information for Spoofing Detection

Jichen Yang , Hongji Wang , Rohan Kumar Das , and Yanmin Qian

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
Audio-Visual Deep Neural Network for Robust Person Verification

Yanmin Qian , Zhengyang Chen , and Shuai Wang

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
Dual-Path Modeling for Long Recording Speech Separation in Meetings

Chenda Li , Zhuo Chen , Yi Luo , Cong Han , Tianyan Zhou , Keisuke Kinoshita , Marc Delcroix , Shinji Watanabe , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification

Zhengyang Chen , Shuai Wang , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification

Chenpeng Du , Bing Han , Shuai Wang , Yanmin Qian , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification

Houjun Huang , Xu Xiang , Fei Zhao , Shuai Wang , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge

Houjun Huang , Xu Xiang , Yexin Yang , Rao Ma , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
AISpeech-SJTU ASR System for the Accented English Speech Recognition Challenge

Tian Tan , Yizhou Lu , Rao Ma , Sen Zhu , Jiaqi Guo , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
Towards Data Selection on TTS Data for Children’s Speech Recognition

Wei Wang , Zhikai Zhou , Yizhou Lu , Hongji Wang , Chenpeng Du , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend

Wangyou Zhang , Christoph Böddeker , Shinji Watanabe , Tomohiro Nakatani , Marc Delcroix , Keisuke Kinoshita , Tsubasa Ochiai , Naoyuki Kamo , Reinhold Haeb-Umbach , and Yanmin Qian

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods

Xian Shi , Fan Yu , Yizhou Lu , Yuhao Liang , Qiangze Feng , Daliang Wang , Yanmin Qian , and Lei Xie

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation

Christoph Böddeker , Wangyou Zhang , Tomohiro Nakatani , Keisuke Kinoshita , Tsubasa Ochiai , Marc Delcroix , Naoyuki Kamo , Yanmin Qian , and Reinhold Haeb-Umbach

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

Xun Gong , Yizhou Lu , Zhikai Zhou , and Yanmin Qian

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification

Leying Zhang , Zhengyang Chen , and Yanmin Qian

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition

Zhengxi Liu , and Yanmin Qian

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
The SJTU System for Short-Duration Speaker Verification Challenge 2021

Bing Han , Zhengyang Chen , Zhikai Zhou , and Yanmin Qian

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party

Yifei Wu , Chenda Li , Song Yang , Zhongqin Wu , and Yanmin Qian

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
Speaker Embedding Augmentation with Noise Distribution Matching

Xun Gong , Zhengyang Chen , Yexin Yang , Shuai Wang , Lan Wang , and Yanmin Qian

In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021 , Oct 2021
Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning

Shuai Wang , Yexin Yang , Yanmin Qian , and Kai Yu

In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021 , Oct 2021
Data Augmentation for end-to-end Code-Switching Speech Recognition

Chenpeng Du , Hao Li , Yizhou Lu , Lan Wang , and Yanmin Qian

In IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021 , Oct 2021
Dual-Path RNN for Long Recording Speech Separation

Chenda Li , Yi Luo , Cong Han , Jinyu Li , Takuya Yoshioka , Tianyan Zhou , Marc Delcroix , Keisuke Kinoshita , Christoph Böddeker , Yanmin Qian , Shinji Watanabe , and Zhuo Chen

In IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021 , Oct 2021
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

Wangyou Zhang , Jing Shi , Chenda Li , Shinji Watanabe , and Yanmin Qian

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021, New Paltz, NY, USA, October 17-20, 2021 , Oct 2021
Towards Duration Robust Weakly Supervised Sound Event Detection

Heinrich Dinkel , Mengyue Wu, and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training

Heinrich Dinkel , Shuai Wang , Xuenan Xu , Mengyue Wu, and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
Building Interpretable Interaction Trees for Deep NLP Models

Die Zhang , Hao Zhang , Huilin Zhou , Xiaoyi Bao , Da Huo , Ruizhao Chen , Xu Cheng , Mengyue Wu, and Quanshi Zhang

In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021 , Oct 2021
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL

Zhi Chen , Lu Chen, Hanqi Li , Ruisheng Cao , Da Ma , Mengyue Wu, and Kai Yu

In Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021 , Oct 2021
Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging

Zhiling Zhang , Zelin Zhou , Haifeng Tang , Guangwei Li , Mengyue Wu , and Kenny Q. Zhu

In CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021 , Oct 2021
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Xuenan Xu , Heinrich Dinkel , Mengyue Wu, Zeyu Xie , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
A Lightweight Framework for Online Voice Activity Detection in the Wild

Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
Audio Caption in a Car Setting with a Sentence-Level Loss

Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu

In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021 , Oct 2021
DEPA: Self-Supervised Audio Embedding for Depression Detection

Pingyue Zhang , Mengyue Wu, Heinrich Dinkel , and Kai Yu

In MM ’21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021 , Oct 2021
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching

Boer Lyu , Lu Chen , Su Zhu , and Kai Yu

In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021 , Oct 2021
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations

Ruisheng Cao , Lu Chen , Zhi Chen , Yanbin Zhao , Su Zhu , and Kai Yu

In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021 , Oct 2021
WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Xingyu Chen , Zihan Zhao , Lu Chen, Jiabao Ji , Danyang Zhang , Ao Luo , Yuxuan Xiong , and Kai Yu

In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021 , Oct 2021
Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction

Boer Lyu , Lu Chen, and Kai Yu

In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021 , Oct 2021
Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR

Lingfeng Dai , Qi Liu , and Kai Yu

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network

Chenpeng Du , and Kai Yu

In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser

Zhi Chen , Lu Chen, Yanbin Zhao , Ruisheng Cao , Zihan Xu , Su Zhu , and Kai Yu

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021 , Oct 2021
Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF

Su Zhu , Lu Chen, Ruisheng Cao , Zhi Chen , Qingliang Miao , and Kai Yu

In Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part I , Oct 2021
Relation-Aware Multi-hop Reasoning for Visual Dialog

Yao Zhao , Lu Chen, and Kai Yu

In Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part I , Oct 2021
Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis

Chenpeng Du , and Kai Yu

CoRR, Oct 2021
Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling

Chenpeng Du , and Kai Yu

CoRR, Oct 2021

2020

Improving End-to-End Single-Channel Multi-Talker Speech Recognition

Wangyou Zhang , Xuankai Chang , Yanmin Qian , and Shinji Watanabe

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition

Shuai Wang , Yexin Yang , Zhanghao Wu , Yanmin Qian , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
End-To-End Multi-Speaker Speech Recognition With Transformer

Xuankai Chang , Wangyou Zhang , Yanmin Qian , Jonathan Le Roux , and Shinji Watanabe

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings

Yexin Yang , Shuai Wang , Xun Gong , Yanmin Qian , and Kai Yu

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training

Zhengyang Chen , Shuai Wang , Yanmin Qian , and Kai Yu

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Deep Audio-Visual Speech Separation with Attention Mechanism

Chenda Li , and Yanmin Qian

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition

Wangyou Zhang , and Yanmin Qian

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming

Wangyou Zhang , Aswin Shanmugam Subramanian , Xuankai Chang , Shinji Watanabe , and Yanmin Qian

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection

Hongji Wang , Heinrich Dinkel , Shuai Wang , Yanmin Qian , and Kai Yu

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation

Chenda Li , and Yanmin Qian

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
Multi-Modality Matters: A Performance Leap on VoxCeleb

Zhengyang Chen , Shuai Wang , and Yanmin Qian

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network

Zhengyang Chen , Shuai Wang , and Yanmin Qian

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts

Yizhou Lu , Mingkun Huang , Hao Li , Jiaqi Guo , and Yanmin Qian

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
End-to-End Speaker-Dependent Voice Activity Detection

Yefei Chen , Shuai Wang , Yanmin Qian , and Kai Yu

CoRR, Oct 2020
A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning

Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu

In Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), Tokyo, Japan (full virtual), November 2-4, 2020 , Oct 2020
Multiple Sound Sources Localization from Coarse to Fine

Rui Qian , Di Hu , Heinrich Dinkel , Mengyue Wu, Ning Xu , and Weiyao Lin

In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XX , Oct 2020
Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection

Yefei Chen , Heinrich Dinkel , Mengyue Wu, and Kai Yu

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection

Heinrich Dinkel , Yefei Chen , Mengyue Wu, and Kai Yu

CoRR, Oct 2020
Interpreting Hierarchical Linguistic Interactions in DNNs

Die Zhang , Huilin Zhou , Xiaoyi Bao , Da Huo , Ruizhao Chen , Xu Cheng , Hao Zhang , Mengyue Wu, and Quanshi Zhang

CoRR, Oct 2020
Towards a new generation of artificial intelligence in China

Fei Wu , Cewu Lu , Mingjie Zhu , Hao Chen , Jun Zhu , Kai Yu, Lei Li , Ming Li , Qianfeng Chen , Xi Li , Xudong Cao , Zhongyuan Wang , Zhengjun Zha , Yueting Zhuang , and Yunhe Pan

Nat. Mach. Intell., Oct 2020
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding

Su Zhu , Zijian Zhao , Rao Ma , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
Dual Learning for Semi-Supervised Natural Language Understanding

Su Zhu , Ruisheng Cao , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model

Qi Liu , Zhehuai Chen , Hao Li , Mingkun Huang , Yizhou Lu , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management

Zhi Chen , Lu Chen, Xiaoyuan Liu , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
Neural Network Language Model Compression With Product Quantization and Soft Binarization

Kai Yu, Rao Ma , Kaiyu Shi , and Qi Liu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks

Lu Chen, Boer Lv , Chi Wang , Su Zhu , Bowen Tan , and Kai Yu

In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020 , Oct 2020
Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders

Yanbin Zhao , Lu Chen , Zhi Chen , and Kai Yu

In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020 , Oct 2020
Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks

Yanbin Zhao , Lu Chen , Zhi Chen , Ruisheng Cao , Su Zhu , and Kai Yu

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
Neural Graph Matching Networks for Chinese Short Text Matching

Lu Chen, Yanbin Zhao , Boer Lyu , Lesheng Jin , Zhi Chen , Su Zhu , and Kai Yu

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing

Ruisheng Cao , Su Zhu , Chenyu Yang , Chen Liu , Rao Ma , Yanbin Zhao , Lu Chen, and Kai Yu

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking

Su Zhu , Jieyu Li , Lu Chen, and Kai Yu

In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 , Oct 2020
Duration Robust Weakly Supervised Sound Event Detection

Heinrich Dinkel , and Kai Yu

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Investigation of Specaugment for Deep Speaker Embedding Learning

Shuai Wang , Johan Rohdin , Oldřich Plchot , Lukáš Burget , Kai Yu, and Jan Černocký

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Speaker Augmentation for Low Resource Speech Recognition

Chenpeng Du , and Kai Yu

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Neural Lattice Search for Speech Recognition

Rao Ma , Hao Li , Qi Liu , Lu Chen, and Kai Yu

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
A Hierarchical Tracker for Multi-Domain Dialogue State Tracking

Jieyu Li , Su Zhu , and Kai Yu

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings

Rao Ma , Lesheng Jin , Qi Liu , Lu Chen, and Kai Yu

In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs

Han Zhao , Weihao Cui , Quan Chen , Jingwen Leng , Kai Yu, Deze Zeng , Chao Li , and Minyi Guo

In 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020, Singapore, November 29 - December 1, 2020 , Oct 2020
Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding

Chen Liu , Su Zhu , Zijian Zhao , Ruisheng Cao , Lu Chen, and Kai Yu

In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
Memory Attention Neural Network for Multi-domain Dialogue State Tracking

Zihan Xu , Zhi Chen , Lu Chen , Su Zhu , and Kai Yu

In Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
Robust Spoken Language Understanding with RL-Based Value Error Recovery

Chen Liu , Su Zhu , Lu Chen, and Kai Yu

In Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models

Zihan Zhao , Yuncong Liu , Lu Chen, Qi Liu , Rao Ma , and Kai Yu

In Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
An Investigation on Deep Learning with Beta Stabilizer

Qi Liu , Tian Tan , and Kai Yu

CoRR, Oct 2020
Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding

Su Zhu , Ruisheng Cao , Lu Chen, and Kai Yu

CoRR, Oct 2020
Deep Reinforcement Learning for On-line Dialogue State Tracking

Zhi Chen , Lu Chen, Xiang Zhou , and Kai Yu

CoRR, Oct 2020
Structured Hierarchical Dialogue Policy with Graph Neural Networks

Zhi Chen , Xiaoyuan Liu , Lu Chen, and Kai Yu

CoRR, Oct 2020
Dual Learning for Dialogue State Tracking

Zhi Chen , Lu Chen, Yanbin Zhao , Su Zhu , and Kai Yu

CoRR, Oct 2020
CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking

Zhi Chen , Lu Chen, Zihan Xu , Yanbin Zhao , Su Zhu , and Kai Yu

CoRR, Oct 2020

2019

Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem

Yanmin Qian , Chao Weng , Xuankai Chang , Shuai Wang , and Dong Yu

Frontiers Inf. Technol. Electron. Eng., Oct 2019
Binary neural networks for speech recognition

Yanmin Qian and Xu Xiang

Frontiers Inf. Technol. Electron. Eng., Oct 2019
Data augmentation using generative adversarial networks for robust speech recognition

Yanmin Qian , Hu Hu , and Tian Tan

Speech Commun., Oct 2019
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification

Shuai Wang , Zili Huang , Yanmin Qian , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2019
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Xu Xiang , Shuai Wang , Houjun Huang , Yanmin Qian , and Kai Yu

In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Lanzhou, China, November 18-21, 2019 , Oct 2019
GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition

Peiyao Sheng , Zhuolin Yang , and Yanmin Qian

In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition

Xuankai Chang , Wangyou Zhang , Yanmin Qian , Jonathan Le Roux , and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
Exploring Model Units and Training Strategies for End-to-End Speech Recognition

Mingkun Huang , Yizhou Lu , Lan Wang , Yanmin Qian , and Kai Yu

In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform

Wangyou Zhang , Man Sun , Lan Wang , and Yanmin Qian

In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
Knowledge Distillation for Small Foot-print Deep Speaker Embedding

Shuai Wang , Yexin Yang , Tianzhe Wang , Yanmin Qian , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
End-to-end Monaural Multi-speaker ASR System without Pretraining

Xuankai Chang , Yanmin Qian , Kai Yu, and Shinji Watanabe

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge

Yexin Yang , Hongji Wang , Heinrich Dinkel , Zhengyang Chen , Shuai Wang , Yanmin Qian , and Kai Yu

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction

Shuai Wang , Johan Rohdin , Lukáš Burget , Oldřich Plchot , Yanmin Qian , Kai Yu, and Jan Černocký

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification

Zhanghao Wu , Shuai Wang , Yanmin Qian , and Kai Yu

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
Joint Decoding of CTC Based Systems for Speech Recognition

Jiaqi Guo , Yongbin You , Yanmin Qian , and Kai Yu

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System

Wangyou Zhang , Xuankai Chang , and Yanmin Qian

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking

Wangyou Zhang , Ying Zhou , and Yanmin Qian

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training

Hongji Wang , Heinrich Dinkel , Shuai Wang , Yanmin Qian , and Kai Yu

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech

Chenda Li and Yanmin Qian

In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
Audio Caption: Listen and Tell

Mengyue Wu, Heinrich Dinkel , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
Text-based Depression Detection: What Triggers An Alert

Heinrich Dinkel , Mengyue Wu, and Kai Yu

CoRR, Oct 2019
What does a Car-ssette tape tell?

Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu

CoRR, Oct 2019
AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning

Lu Chen , Zhi Chen , Bowen Tan , Sishan Long , Milica Gasic , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2019
Semantic Parsing with Dual Learning

Ruisheng Cao , Su Zhu , Chen Liu , Jieyu Li , and Kai Yu

In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers , Oct 2019
Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training

Rao Ma , Qi Liu , and Kai Yu

In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
Data Augmentation with Atomic Templates for Spoken Language Understanding

Zijian Zhao , Su Zhu , and Kai Yu

In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019 , Oct 2019
A Hierarchical Decoding Model for Spoken Language Understanding from Unaligned Data

Zijian Zhao , Su Zhu , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
CATSLU: The 1st Chinese Audio-Textual Spoken Language Understanding Challenge

Su Zhu , Zijian Zhao , Tiejun Zhao , Chengqing Zong , and Kai Yu

In International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019 , Oct 2019
Robust Spoken Language Understanding with Acoustic and Domain Knowledge

Hao Li , Chen Liu , Su Zhu , and Kai Yu

In International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019 , Oct 2019
Cross Aggregation of Multi-head Attention for Neural Machine Translation

Juncheng Cao , Hai Zhao , and Kai Yu

In Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9-14, 2019, Proceedings, Part I , Oct 2019
International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019

Oct 2019

2018

Past review, current progress, and challenges ahead on the cocktail party problem

Yanmin Qian , Chao Weng , Xuankai Chang , Shuai Wang , and Dong Yu

Frontiers Inf. Technol. Electron. Eng., Oct 2018
Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem

Yanmin Qian , Chao Weng , Xuankai Chang , Shuai Wang , and Dong Yu

Frontiers Inf. Technol. Electron. Eng., Oct 2018
Sequence discriminative training for deep learning based acoustic keyword spotting

Zhehuai Chen , Yanmin Qian , and Kai Yu

Speech Commun., Oct 2018
Single-channel multi-talker speech recognition with permutation invariant training

Yanmin Qian , Xuankai Chang , and Dong Yu

Speech Commun., Oct 2018
Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition

Tian Tan , Yanmin Qian , Hu Hu , Ying Zhou , Wen Ding , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection

Heinrich Dinkel , Yanmin Qian , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
Robust Mask Estimation By Integrating Neural Network-Based and Clustering-Based Approaches for Adaptive Acoustic Beamforming

Ying Zhou and Yanmin Qian

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition

Tian Tan , Yanmin Qian , and Dong Yu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker Verification

Zili Huang , Shuai Wang , and Yanmin Qian

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Generative Adversarial Networks Based Data Augmentation for Noise Robust Speech Recognition

Hu Hu , Tian Tan , and Yanmin Qian

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification

Shuai Wang , Yanmin Qian , and Kai Yu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Noise Robust Speech Recognition on Aurora4 by Humans and Machines

Yanmin Qian , Tian Tan , Hu Hu , and Qi Liu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Fast Adaptation on Deep Mixture Generative Network Based Acoustic Modeling

Wen Ding , Tian Tan , and Yanmin Qian

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition

Xuankai Chang , Yanmin Qian , and Dong Yu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation

Lianwu Chen , Meng Yu , Yanmin Qian , Dan Su , and Dong Yu

In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures

Jun Wang , Jie Chen , Dan Su , Lianwu Chen , Meng Yu , Yanmin Qian , and Dong Yu

In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks

Xuankai Chang , Yanmin Qian , and Dong Yu

In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
Knowledge Distillation for Sequence Model

Mingkun Huang , Yongbin You , Zhehuai Chen , Yanmin Qian , and Kai Yu

In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
Covariance Based Deep Feature for Text-Dependent Speaker Verification

Shuai Wang , Heinrich Dinkel , Yanmin Qian , and Kai Yu

In Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers , Oct 2018
Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition

Peiyao Sheng , Zhuolin Yang , Hu Hu , Tian Tan , and Yanmin Qian

In 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition

Shuai Wang , Zili Huang , Yanmin Qian , and Kai Yu

In 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification

Yexin Yang , Shuai Wang , Man Sun , Yanmin Qian , and Kai Yu

In 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
Rich Short Text Conversation Using Semantic-Key-Controlled Sequence Generation

Kai Yu, Zijian Zhao , Xueyang Wu , Hongtao Lin , and Xuan Liu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
Structured Dialogue Policy with Graph Neural Networks

Lu Chen, Bowen Tan , Sishan Long , and Kai Yu

In Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018 , Oct 2018
Towards Universal Dialogue State Tracking

Liliang Ren , Kaige Xie , Lu Chen, and Kai Yu

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018 , Oct 2018
On Modular Training of Neural Acoustics-to-Word Model for LVCSR

Zhehuai Chen , Qi Liu , Hao Li , and Kai Yu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Semi-Supervised Training Using Adversarial Multi-Task Learning for Spoken Language Understanding

Ouyu Lan , Su Zhu , and Kai Yu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management

Lu Chen, Cheng Chang , Zhi Chen , Bowen Tan , Milica Gasic , and Kai Yu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
Robust Spoken Language Understanding with Unsupervised ASR-Error Adaptation

Su Zhu , Ouyu Lan , and Kai Yu

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
MLN: Moment Localization Network and Samples Selection for Moment Retrieval

Bo Huang , Ya Zhang , and Kai Yu

In Proceedings of the 2nd International Conference on Video and Image Processing, ICVIP 2018, Hong Kong, China, December 29-31, 2018 , Oct 2018
Angular Softmax for Short-Duration Text-independent Speaker Verification

Zili Huang , Shuai Wang , and Kai Yu

In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
Joint Spoken Language Understanding and Domain Adaptive Language Modeling

Huifeng Zhang , Su Zhu , Shuai Fan , and Kai Yu

In Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers , Oct 2018
Binarized LSTM Language Model

Xuan Liu , Di Cao , and Kai Yu

In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers) , Oct 2018
Cost-Sensitive Active Learning for Dialogue State Tracking

Kaige Xie , Cheng Chang , Liliang Ren , Lu Chen, and Kai Yu

In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 , Oct 2018
Concept Transfer Learning for Adaptive Language Understanding

Su Zhu and Kai Yu

In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 , Oct 2018
Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers

Oct 2018

2017

Phone Synchronous Speech Recognition With CTC Lattices

Zhehuai Chen , Yimeng Zhuang , Yanmin Qian , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2017
Deep Feature Engineering for Noise Robust Spoofing Detection

Yanmin Qian , Nanxin Chen , Heinrich Dinkel , and Zhizheng Wu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2017
Integrating online i-vector into GMM-UBM for text-dependent speaker verification

Xiaowei Jiang , Shuai Wang , Xu Xiang , and Yanmin Qian

In 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017, Kuala Lumpur, Malaysia, December 12-15, 2017 , Oct 2017
Future vector enhanced LSTM language model for LVCSR

Qi Liu , Yanmin Qian , and Kai Yu

In 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017 , Oct 2017
Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR

Yue Wu , Tianxing He , Zhehuai Chen , Yanmin Qian , and Kai Yu

In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 16th China National Conference, CCL 2017, - and - 5th International Symposium, NLP-NABD 2017, Nanjing, China, October 13-15, 2017, Proceedings , Oct 2017
End-to-end spoofing detection with raw waveform CLDNNs

Heinrich Dinkel , Nanxin Chen , Yanmin Qian , and Kai Yu

In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
Small-footprint convolutional neural network for spoofing detection

Heinrich Dinkel , Yanmin Qian , and Kai Yu

In 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, May 14-19, 2017 , Oct 2017
Binary Deep Neural Networks for Speech Recognition

Xu Xiang , Yanmin Qian , and Kai Yu

In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
What Does the Speaker Embedding Encode?

Shuai Wang , Yanmin Qian , and Kai Yu

In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
Recognizing Multi-Talker Speech with Permutation Invariant Training

Dong Yu , Xuankai Chang , and Yanmin Qian

In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
A Unified Confidence Measure Framework Using Auxiliary Normalization Graph

Zhehuai Chen , Yanmin Qian , and Kai Yu

In Intelligence Science and Big Data Engineering - 7th International Conference, IScIDE 2017, Dalian, China, September 22-23, 2017, Proceedings , Oct 2017
Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition

Khe Chai Sim , Yanmin Qian , Gautam Mantena , Lahiru Samarakoon , Souvik Kundu , and Tian Tan

In New Era for Robust Speech Recognition, Exploiting Deep Learning , Oct 2017
On-line Dialogue Policy Learning with Companion Teaching

Lu Chen, Runzhe Yang , Cheng Chang , Zihao Ye , Xiang Zhou , and Kai Yu

In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 2: Short Papers , Oct 2017
Affordable On-line Dialogue Policy Learning

Cheng Chang , Runzhe Yang , Lu Chen, Xiang Zhou , and Kai Yu

In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 , Oct 2017
Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning

Lu Chen, Xiang Zhou , Cheng Chang , Runzhe Yang , and Kai Yu

In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 , Oct 2017
Confidence measures for CTC-based phone synchronous decoding

Zhehuai Chen , Yimeng Zhuang , and Kai Yu

In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding

Su Zhu and Kai Yu

In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
Discrete Duration Model for Speech Synthesis

Bo Chen , Tianling Bian , and Kai Yu

In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
Deep Attentive Structured Language Model Based on LSTM

Di Cao and Kai Yu

In Intelligence Science and Big Data Engineering - 7th International Conference, IScIDE 2017, Dalian, China, September 22-23, 2017, Proceedings , Oct 2017
splab at the NTCIR-13 STC-2 Task

Xuan Liu , Xueyang Wu , Ruinian Chen , Zijian Zhao , Hongtao Lin , and Kai Yu

In The 13th NTCIR Conference, Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, December 5-8, 2017 , Oct 2017

2016

Deep features for automatic spoofing detection

Yanmin Qian , Nanxin Chen , and Kai Yu

Speech Commun., Oct 2016
Cluster Adaptive Training for Deep Neural Network Based Acoustic Model

Tian Tan , Yanmin Qian , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2016
Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition

Yanmin Qian , Tian Tan , and Dong Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2016
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition

Yanmin Qian , Mengxiao Bi , Tian Tan , and Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., Oct 2016
Overview of BTAS 2016 speaker anti-spoofing competition

Pavel Korshunov , Sébastien Marcel , Hannah Muckenhirn , André R. Gonçalves , A. G. Souza Mello , Ricardo Paranhos Velloso Violato , Flávio Olmos Simões , M. U. Neto , Marcus Assis Angeloni , José Augusto Stuchi , Heinrich Dinkel , Nanxin Chen , Yanmin Qian , Dipjyoti Paul , Goutam Saha , and Md. Sahidullah

In 8th IEEE International Conference on Biometrics Theory, Applications and Systems, BTAS 2016, Niagara Falls, NY, USA, September 6-9, 2016 , Oct 2016
Joint acoustic factor learning for robust deep neural network based automatic speech recognition

Souvik Kundu , Gautam Mantena , Yanmin Qian , Tian Tan , Marc Delcroix , and Khe Chai Sim

In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
Speaker-aware training of LSTM-RNNs for acoustic modelling

Tian Tan , Yanmin Qian , Dong Yu , Souvik Kundu , Liang Lu , Khe Chai Sim , Xiong Xiao , and Yu Zhang

In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
Improved DNN-based segmentation for multi-genre broadcast audio

Linlin Wang , Chao Zhang , Philip C. Woodland , Mark J. F. Gales , Panagiota Karanasou , Pierre Lanchantin , Xunying Liu , and Yanmin Qian

In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
An investigation into using parallel data for far-field speech recognition

Yanmin Qian , Tian Tan , and Dong Yu

In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
Integrated adaptation with multi-factor joint-learning for far-field speech recognition

Yanmin Qian , Tian Tan , Dong Yu , and Yu Zhang

In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC

Yimeng Zhuang , Xuankai Chang , Yanmin Qian , and Kai Yu

In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016 , Oct 2016
Multi-task joint-learning for robust voice activity detection

Yimeng Zhuang , Sibo Tong , Maofan Yin , Yanmin Qian , and Kai Yu

In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
Very deep convolutional neural networks for robust speech recognition

Yanmin Qian and Philip C. Woodland

In 2016 IEEE Spoken Language Technology Workshop, SLT 2016, San Diego, CA, USA, December 13-16, 2016 , Oct 2016
Evolvable dialogue state tracking for statistical dialogue management

Kai Yu, Lu Chen, Kai Sun , Qizhe Xie , and Su Zhu

Frontiers Comput. Sci., Oct 2016
Discriminatively trained joint speaker and environment representations for adaptation of deep neural network acoustic models

Maofan Yin , Sunil Sivadas , Kai Yu, and Bin Ma

In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
A comparative study of robustness of deep learning approaches for VAD

Sibo Tong , Hao Gu , and Kai Yu

In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
Phone Synchronous Decoding with CTC Lattice

Zhehuai Chen , Wei Deng , Tao Xu , and Kai Yu

In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016 , Oct 2016
Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues

Kai Sun , Su Zhu , Lu Chen, Siqiu Yao , Xueyang Wu , and Kai Yu

In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016 , Oct 2016
On training bi-directional neural network language model with noise contrastive estimation

Tianxing He , Yu Zhang , Jasha Droppo , and Kai Yu

In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
Rich punctuations prediction using large-scale deep learning

Xueyang Wu , Su Zhu , Yue Wu , and Kai Yu

In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
Directed automatic speech transcription error correction using bidirectional LSTM

Da Zheng , Zhehuai Chen , Yue Wu , and Kai Yu

In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
The splab at the NTCIR-12 Short Text Conversation Task

Ke Wu , Xuan Liu , and Kai Yu

In Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016 , Oct 2016

2015

Deep feature for text-dependent speaker verification

Yuan Liu , Yanmin Qian , Nanxin Chen , Tianfan Fu , Ya Zhang , and Kai Yu

Speech Commun., Oct 2015
Multi-task joint-learning of deep neural networks for robust speech recognition

Yanmin Qian , Maofan Yin , Yongbin You , and Kai Yu

In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
Cambridge University transcription systems for the multi-genre broadcast challenge

Philip C. Woodland , Xunying Liu , Yanmin Qian , Chao Zhang , Mark J. F. Gales , Penny Karanasou , Pierre Lanchantin , and Linlin Wang

In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
The development of the Cambridge University alignment systems for the multi-genre broadcast challenge

Pierre Lanchantin , Mark J. F. Gales , Penny Karanasou , Xunying Liu , Yanmin Qian , Linlin Wang , Philip C. Woodland , and Chao Zhang

In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
Speaker diarisation and longitudinal linking in multi-genre broadcast data

Penny Karanasou , Mark J. F. Gales , Pierre Lanchantin , Xunying Liu , Yanmin Qian , Linlin Wang , Philip C. Woodland , and Chao Zhang

In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
Local trajectory based speech enhancement for robust speech recognition with deep neural network

Yongbin You , Yanmin Qian , and Kai Yu

In IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015, Chengdu, China, July 12-15, 2015 , Oct 2015
An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition

Yongbin You , Yanmin Qian , Tianxing He , and Kai Yu

In IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015, Chengdu, China, July 12-15, 2015 , Oct 2015
Cluster adaptive training for deep neural network

Tian Tan , Yanmin Qian , Maofan Yin , Yimeng Zhuang , and Kai Yu

In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015 , Oct 2015
A novel static parameter calculation method for model compensation

Suliang Bu , Yunxin Zhao , Yanmin Qian , and Kai Yu

In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015 , Oct 2015
Recurrent neural network language model with structured word embeddings for speech recognition

Tianxing He , Xu Xiang , Yanmin Qian , and Kai Yu

In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015 , Oct 2015
Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition

Yanmin Qian , Tianxing He , Wei Deng , and Kai Yu

In 2015 International Joint Conference on Neural Networks, IJCNN 2015, Killarney, Ireland, July 12-17, 2015 , Oct 2015
Multi-task learning for text-dependent speaker verification

Nanxin Chen , Yanmin Qian , and Kai Yu

In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 , Oct 2015
Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge

Nanxin Chen , Yanmin Qian , Heinrich Dinkel , Bo Chen , and Kai Yu

In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 , Oct 2015
Very deep convolutional neural networks for LVCSR

Mengxiao Bi , Yanmin Qian , and Kai Yu

In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015
Paragraph vector based topic model for language model adaptation

Wengong Jin , Tianxing He , Yanmin Qian , and Kai Yu

In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015
Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking

Kai Yu, Kai Sun , Lu Chen , and Su Zhu

IEEE ACM Trans. Audio Speech Lang. Process., 2015
An investigation of context clustering for statistical speech synthesis with deep neural network

Bo Chen , Zhehuai Chen , Jiachen Xu , and Kai Yu

In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015
Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers

Qizhe Xie , Kai Sun , Su Zhu , Lu Chen, and Kai Yu

In Proceedings of the SIGDIAL 2015 Conference, The 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2-4 September 2015, Prague, Czech Republic
Hyper-parameter Optimisation of Gaussian Process Reinforcement Learning for Statistical Dialogue Management

Lu Chen, Pei-Hao Su , and Milica Gasic

In Proceedings of the SIGDIAL 2015 Conference, The 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2-4 September 2015, Prague, Czech Republic

2014

Stochastic data sweeping for fast DNN training

Wei Deng , Yanmin Qian , Yuchen Fan , Tianfan Fu , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014
Reshaping deep neural network for fast decoding by node-pruning

Tianxing He , Yuchen Fan , Yanmin Qian , Tian Tan , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014
Second order vector taylor series based robust speech recognition

Suliang Bu , Yanmin Qian , Khe Chai Sim , Yongbin You , and Kai Yu

In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014
Speaker verification with deep features

Yuan Liu , Tianfan Fu , Yuchen Fan , Yanmin Qian , and Kai Yu

In 2014 International Joint Conference on Neural Networks, IJCNN 2014, Beijing, China, July 6-11, 2014
Tandem deep features for text-dependent speaker verification

Tianfan Fu , Yanmin Qian , Yuan Liu , and Kai Yu

In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014
A novel dynamic parameters calculation approach for model compensation

Suliang Bu , Yanmin Qian , and Kai Yu

In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014
Acoustic emotion recognition using deep neural network

Jianwei Niu , Yanmin Qian , and Kai Yu

In The 9th International Symposium on Chinese Spoken Language Processing, Singapore, September 12-14, 2014
The SJTU System for Dialog State Tracking Challenge 2

Kai Sun , Lu Chen , Su Zhu , and Kai Yu

In Proceedings of the SIGDIAL 2014 Conference, The 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 18-20 June 2014, Philadelphia, PA, USA
A generalized rule based tracker for dialogue state tracking

Kai Sun , Lu Chen , Su Zhu , and Kai Yu

In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014
Semantic parser enhancement for dialogue domain extension with little data

Su Zhu , Lu Chen, Kai Sun , Da Zheng , and Kai Yu

In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014

2013

Combination of data borrowing strategies for low-resource LVCSR

Yanmin Qian , Kai Yu, and Jia Liu

In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 8-12, 2013
MLP-HMM two-stage unsupervised training for low-resource languages on conversational telephone speech recognition

Yanmin Qian and Jia Liu

In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013
A New Word Language Model Evaluation Metric for Character Based Languages

Peilu Wang , Ruihua Sun , Hai Zhao , and Kai Yu

In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 12th China National Conference, CCL 2013 and First International Symposium, NLP-NABD 2013, Suzhou, China, October 10-12, 2013. Proceedings

2012

Introduction to the Issue on Advances in Spoken Dialogue Systems and Mobile Interface

Jason D. Williams , Kai Yu, Brahim Chaib-draa , Oliver Lemon , Roberto Pieraccini , Olivier Pietquin , Pascal Poupart , and Steve J. Young

IEEE J. Sel. Top. Signal Process., 2012
ICMI’12 grand challenge: haptic voice recognition

Khe Chai Sim , Shengdong Zhao , Kai Yu, and Hank Liao

In International Conference on Multimodal Interaction, ICMI ’12, Santa Monica, CA, USA, October 22-26, 2012
Development of the 2012 SJTU HVR system

Hainan Xu , Yuchen Fan , and Kai Yu

In International Conference on Multimodal Interaction, ICMI ’12, Santa Monica, CA, USA, October 22-26, 2012