Heinrich Dinkel, Mengyue Wu and Kai Yu | Towards Duration Robust Weakly Supervised Sound Event Detection | Journal | TASLP | |
Yanmin Qian, Zhengyang Chen and Shuai Wang | Audio-Visual Deep Neural Network for Robust Person Verification | Journal | TASLP | |
Xuenan Xu, Heinrich Dinkel, Mengyue Wu and Kai Yu | Audio Caption in a Car Setting with a Sentence-Level Loss | Conference | ISCSLP | |
Xun Gong, Zhengyang Chen, Yexin Yang, Shuai Wang, Lan Wang and Yanmin Qian | Speaker Embedding Augmentation with Noise Distribution Matching | Conference | ISCSLP | |
Shuai Wang, Yexin Yang, Yanmin Qian and Kai Yu | Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning | Conference | ISCSLP | |
Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen and Shinji Watanabe | ESPnet-SE: end-to-end speech enhancement and separation toolkit designed for ASR integration | Conference | SLT | |
Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Boeddeker, Yanmin Qian, Shinji Watanabe and Zhuo Chen | Dual-path RNN for Long Recording Speech Separation | Conference | SLT | |
Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang and Yanmin Qian | Data Augmentation for End-to-end Code-Switching Speech Recognition | Conference | SLT |
Chen Zhang, Daihui Peng, Lu Lv, Kaiming Zhuo, Kai Yu, Tian Shen, Yifeng Xu and Zhen Wang | Individual Perceived Stress Mediates Psychological Distress in Medical Workers During COVID-19 Epidemic Outbreak in Wuhan | Journal | Neuropsychiatric Disease and Treatment | |
Kai Yu, Rao Ma, Kaiyu Shi and Qi Liu | Neural Network Language Model Compression With Product Quantization and Soft Binarization | Journal | TASLP | |
Zhi Chen, Lu Chen, Xiaoyuan Liu and Kai Yu | Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management | Journal | TASLP | |
Shuai Wang, Yexin Yang, Zhanghao Wu, Yanmin Qian and Kai Yu | Data Augmentation using Deep Generative Models for Embedding based Speaker Recognition | Journal | TASLP | |
Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu and Kai Yu | Modular End-to-end Automatic Speech Recognition Framework for Acoustic-to-word Model | Journal | TASLP | |
Su Zhu, Ruisheng Cao and Kai Yu | Dual Learning for Semi-Supervised Natural Language Understanding | Journal | TASLP | |
Wangyou Zhang, Xuankai Chang, Yanmin Qian and Shinji Watanabe | Improving End-to-End Single-Channel Multi-Talker Speech Recognition | Journal | TASLP | |
Su Zhu, Zijian Zhao, Rao Ma and Kai Yu | Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding | Journal | TASLP | |
Wangyou Zhang and Yanmin Qian | Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition | Conference | INTERSPEECH | |
Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe and Yanmin Qian | End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming | Conference | INTERSPEECH | |
Zhijun Liu, Kuan Chen and Kai Yu | Neural Homomorphic Vocoder | Conference | INTERSPEECH | |
Yefei Chen, Heinrich Dinkel, Mengyue Wu and Kai Yu | Voice activity detection in the wild via weakly supervised sound event detection | Conference | INTERSPEECH | |
Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen and Kai Yu | Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding | Conference | INTERSPEECH | |
Yizhou Lu, Mingkun Huang, Hao Li, Jiaqi Guo and Yanmin Qian | Bi-encoder Transformer Network for Mandarin-English Code-switching Speech Recognition using Mixture of Experts | Conference | INTERSPEECH | |
Zhengyang Chen, Shuai Wang and Yanmin Qian | Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network | Conference | INTERSPEECH | |
Zhengyang Chen, Shuai Wang and Yanmin Qian | Multi-Modality Matters: A Performance Leap on VoxCeleb | Conference | INTERSPEECH | |
Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian and Kai Yu | Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection | Conference | INTERSPEECH | |
Chenda Li and Yanmin Qian | Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation | Conference | INTERSPEECH | |
Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma and Kai Yu | An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models | Conference | NLPCC | |
Zihan Xu, Zhi Chen, Lu Chen, Su Zhu and Kai Yu | Memory Attention Neural Network For Multi-Domain Dialogue State Tracking | Conference | NLPCC | |
Chen Liu, Su Zhu, Lu Chen and Kai Yu | Robust Spoken Language Understanding with RL-Based Value Error Recovery | Conference | NLPCC | |
Xuenan Xu, Heinrich Dinkel, Mengyue Wu and Kai Yu | A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning | Conference | DCASE | |
Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu and Weiyao Lin | Multiple Sound Sources Localization from Coarse to Fine | Conference | ECCV | |
Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu and Weiyao Lin | A Two-Stage Framework for Multiple Sound-Source Localization | Conference | CVPR | |
Lu Chen, Yanbin Zhao, Boer Lv, Lesheng Jin, Zhi Chen, Su Zhu and Kai Yu | Neural Graph Matching Networks for Chinese Short Text Matching | Conference | ACL | |
Yanbin Zhao, Lu Chen, Zhi Chen, Ruisheng Cao, Su Zhu and Kai Yu | Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks | Conference | ACL | |
Ruisheng Cao, Su Zhu, Chenyu Yang, Chen Liu, Rao Ma, Yanbin Zhao, Lu Chen and Kai Yu | Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing | Conference | ACL | |
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux and Shinji Watanabe | End-To-End Multi-Speaker Speech Recognition With Transformer | Conference | ICASSP | |
Heinrich Dinkel and Kai Yu | Duration Robust Weakly Supervised Sound Event Detection | Conference | ICASSP | |
Yexin Yang, Shuai Wang, Xun Gong, Yanmin Qian and Kai Yu | Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings | Conference | ICASSP | |
Chenda Li and Yanmin Qian | Deep Audio-Visual Speech Separation with Attention Mechanism | Conference | ICASSP | |
Zhengyang Chen, Shuai Wang, Yanmin Qian and Kai Yu | Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training | Conference | ICASSP | |
Chenpeng Du and Kai Yu | Speaker Augmentation for Low Resource Speech Recognition | Conference | ICASSP | |
Shuai Wang, Johan Rohdin, Lukáš Burget, Oldřich Plchot, Kai Yu and Jan Černocký | Investigation of SpecAugment for Deep Speaker Embedding Learning | Conference | ICASSP | |
Federico Landini, Shuai Wang, Mireia Diez, Lukáš Burget, Pavel Matějka, Kateřina Žmolíková, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Ondřej Novotný, Hossein Zeinali and Johan Rohdin | BUT System for the Second DIHARD Speech Diarization Challenge | Conference | ICASSP | |
Mireia Diez, Lukáš Burget, Federico Landini, Shuai Wang and Honza Černocký | Optimizing Bayesian HMM Based X-Vector Clustering for the Second DIHARD Speech Diarization Challenge | Conference | ICASSP | |
Rao Ma, Hao Li, Qi Liu, Lu Chen and Kai Yu | Neural Lattice Search for Speech Recognition | Conference | ICASSP | |
Rao Ma, Lesheng Jin, Qi Liu, Lu Chen and Kai Yu | Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings | Conference | ICASSP | |
Lu Chen, Boer Lv, Chi Wang, Su Zhu, Bowen Tan and Kai Yu | Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks | Conference | AAAI | |
Yanbin Zhao, Lu Chen, Zhi Chen and Kai Yu | Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders | Conference | AAAI |
Yanmin Qian and Xu Xiang | Binary Neural Networks for Speech Recognition | Journal | FITEE | |
Yanmin Qian, Hu Hu and Tian Tan | Data Augmentation Using Generative Adversarial Networks for Robust Speech Recognition | Journal | SC | |
Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic and Kai Yu | AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning | Journal | TASLP | |
Shuai Wang, Zili Huang, Yanmin Qian and Kai Yu | Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification | Journal | TASLP | |
Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian and Kai Yu | Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition | Conference | APSIPA | |
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux and Shinji Watanabe | MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition | Conference | ASRU | |
Wangyou Zhang, Man Sun, Lan Wang and Yanmin Qian | End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform | Conference | ASRU | |
Mingkun Huang, Yizhou Lu, Lan Wang, Yanmin Qian and Kai Yu | Exploring Model Units and Training Strategies for End-to-End Speech Recognition | Conference | ASRU | |
Rao Ma, Qi Liu and Kai Yu | Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training | Conference | ASRU | |
Peiyao Sheng, Zhuolin Yang and Yanmin Qian | GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition | Conference | ASRU | |
Zijian Zhao, Su Zhu and Kai Yu | Data Augmentation with Atomic Templates for Spoken Language Understanding | Conference | EMNLP | |
Yefei Chen, Shuai Wang, Yanmin Qian and Kai Yu | End-to-End Speaker-Dependent Voice Activity Detection | Conference | NCMMSC | |
Hao Li, Zhehuai Chen, Qi Liu, Yanmin Qian and Kai Yu | OOV Words Extension for Modular Neural Acoustics-to-Word Model | Conference | NCMMSC | |
Hao Li, Chen Liu, Su Zhu and Kai Yu | Robust Spoken Language Understanding with Acoustic and Domain Knowledge | Conference | ICMI | |
Su Zhu, Zijian Zhao, Tiejun Zhao, Chengqing Zong and Kai Yu | CATSLU: The 1st Chinese Audio-Textual Spoken Language Understanding Challenge | Conference | ICMI | |
Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael Seltzer and Christian Fuegen | Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR | Conference | InterSpeech | |
Jiaqi Guo, Yongbin You, Yanmin Qian and Kai Yu | Joint Decoding of CTC Based Systems for Speech Recognition | Conference | InterSpeech | |
Chenda Li and Yanmin Qian | Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech | Conference | InterSpeech | |
Wangyou Zhang, Xuankai Chang and Yanmin Qian | Knowledge Distillation for End-to-End Monaural Multi-talker ASR System | Conference | InterSpeech | |
Wangyou Zhang, Ying Zhou and Yanmin Qian | Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking | Conference | InterSpeech | |
Zhanghao Wu, Shuai Wang, Yanmin Qian and Kai Yu | Data Augmentation using Variational Autoencoder for Embedding based Speaker Verification | Conference | InterSpeech | |
Yexin Yang, Hongji Wang, Heinrich Dinkel, Zhengyang Chen, Shuai Wang, Yanmin Qian and Kai Yu | The SJTU Robust Anti-spoofing System for the ASVspoof 2019 Challenge | Conference | InterSpeech | |
Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian and Kai Yu | Cross-domain Replay Spoofing Attack Detection using Domain Adversarial Training | Conference | InterSpeech | |
Shuai Wang, Johan Rohdin, Lukáš Burget, Oldřich Plchot, Yanmin Qian, Kai Yu and Jan Černocký | On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction | Conference | InterSpeech | |
Mireia Diez, Lukáš Burget, Shuai Wang, Johan Rohdin and Jan Černocký | Bayesian HMM based x-vector clustering for Speaker Diarization | Conference | InterSpeech | |
Ruisheng Cao, Su Zhu, Chen Liu, Jieyu Li and Kai Yu | Semantic Parsing with Dual Learning | Conference | ACL | |
Mengyue Wu, Heinrich Dinkel and Kai Yu | Audio Caption: Listen and Tell | Conference | ICASSP | |
Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael Seltzer and Christian Fuegen | End-to-end Contextual Speech Recognition using Class Language Models and a Token Passing Decoder | Conference | ICASSP | |
Zijian Zhao, Su Zhu and Kai Yu | A Hierarchical Decoding Model for Spoken Language Understanding from Unaligned Data | Conference | ICASSP | |
Shuai Wang, Yexin Yang, Tianzhe Wang, Yanmin Qian and Kai Yu | Knowledge Distillation for Small Foot-print Deep Speaker Embedding | Conference | ICASSP | |
Xuankai Chang, Yanmin Qian, Kai Yu and Shinji Watanabe | End-to-end Monaural Multi-speaker ASR System without Pretraining | Conference | ICASSP |
Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong | Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition | Journal | TASLP | |
Tian Tan, Yanmin Qian, Hu Hu, Wen Ding, Ying Zhou, Kai Yu | Adaptive very deep convolutional residual network for noise robust speech recognition | Journal | TASLP | |
Kai Yu, Zijian Zhao, Xueyang Wu, Hongtao Lin and Xuan Liu | Rich Short Text Conversation Using Semantic Key Controlled Sequence Generation | Journal | TASLP | |
Xuan Liu, Di Cao and Kai Yu | Binarized LSTM Language Model | Conference | NAACL | |
Ying Zhou and Yanmin Qian | Robust Mask Estimation by Integrating Neural Network-Based and Clustering-Based Approaches for Adaptive Acoustic Beamforming | Conference | ICASSP | |
Lu Chen, Cheng Chang, Zhi Chen, Bowen Tan, Milica Gasic and Kai Yu | Policy Adaptation for Deep Reinforcement Learning-based Dialogue Management | Conference | ICASSP | |
Shuai Wang, Yanmin Qian and Kai Yu | Focal KL-Divergence based Dilated Convolutional Neural Networks for Co-channel Speaker Identification | Conference | ICASSP | |
Ruinian Chen and Kai Yu | Fast OOV Words Incorporation using Structured Word Embeddings for Neural Network Language Model | Conference | ICASSP | |
Hu Hu, Tian Tan and Yanmin Qian | Generative Adversarial Networks based Data Augmentation for Noise Robust Speech Recognition | Conference | ICASSP | |
Wen Ding, Tian Tan and Yanmin Qian | Fast Adaptation on Deep Mixture Generative Network Based Acoustic Modeling | Conference | ICASSP | |
Xuankai Chang, Yanmin Qian and Dong Yu | Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition | Conference | ICASSP | |
Tian Tan, Yanmin Qian and Dong Yu | Knowledge Transfer in Permutation Invariant Training for Single-channel Multi-talker Speech Recognition | Conference | ICASSP | |
Zhehuai Chen, Qi Liu, Hao Li, Kai Yu | On Modular Training of Neural Acoustics-to-word Model for LVCSR | Conference | ICASSP | |
Zhehuai Chen, Jasha Droppo | Sequence Modeling in Unsupervised Single-channel Overlapped Speech Recognition | Conference | ICASSP | |
Su Zhu, Ouyu Lan, Kai Yu | Robust Spoken Language Understanding with Unsupervised ASR-error Adaptation | Conference | ICASSP | |
Ouyu Lan, Su Zhu, Kai Yu | Semi-Supervised Training Using Adversarial Multi-Task Learning for Spoken Language Understanding | Conference | ICASSP | |
Zili Huang, Shuai Wang and Yanmin Qian | Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker Verification | Conference | ICASSP |
Yanmin Qian, Nanxin Chen, Heinrich Dinkel and Zhizheng Wu | Deep Feature Engineering for Noise Robust Spoofing Detection | Journal | TASLP | |
Zhehuai Chen, Yimeng Zhuang, Yanmin Qian and Kai Yu | Phone Synchronous Speech Recognition with CTC Lattices | Journal | TASLP | |
Ruinian Chen, Ying Zhou and Yanmin Qian | Emotion Recognition Using Support Vector Machine and Deep Neural Network | Conference | NCMMSC | |
Cheng Chang, Huifeng Zhang, Zhangxuan Gu and Yanmin Qian | Fusion Model for Speech Emotion Recognition with Low Level Descriptor Features | Conference | NCMMSC | |
Yue Wu, Qi Liu, Kai Yu | The adaptive adjustment of learning rate is applied in the language model | Conference | NCMMSC | |
Xuan Liu, Kai Yu | GLEU-Guided Multi-resolution Network for Short Text Conversation | Conference | NCMMSC | |
Kaiyu Shi, Xuan Liu and Yanmin Qian | Speech Emotion Recognition Based on SVM and GMM-HMM Hybrid System | Conference | NCMMSC | |
Yue Wu, Tianxing He, Zhehuai Chen, Yanmin Qian, Kai Yu | Multi-view LSTM Language Model with Word-synchronized Auxiliary Feature for LVCSR | Conference | CCL | |
Zhehuai Chen, Yanmin Qian and Kai Yu | A unified confidence measure framework using auxiliary normalization graph | Conference | IScIDE | |
Di Cao and Kai Yu | Deep Attentive Structured Language Model Based on LSTM | Conference | IScIDE | |
Xiaowei Jiang, Shuai Wang, Xu Xiang and Yanmin Qian | Integrating Online i-vector into GMM-UBM for Text-dependent Speaker Verification | Conference | APSIPA | |
Qi Liu, Yanmin Qian and Kai Yu | Future Vector Enhanced LSTM Language Model for LVCSR | Conference | ASRU | |
Lu Chen, Xiang Zhou, Cheng Chang, Runzhe Yang and Kai Yu | Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning | Conference | EMNLP | |
Cheng Chang, Runzhe Yang, Lu Chen, Xiang Zhou and Kai Yu | Affordable On-line Dialogue Policy Learning | Conference | EMNLP | |
Xu Xiang, Yanmin Qian and Kai Yu | Binary Deep Neural Networks for Speech Recognition | Conference | InterSpeech | |
Dong Yu, Xuankai Chang and Yanmin Qian | Recognizing Multi-Talker Speech with Permutation Invariant Training | Conference | InterSpeech | |
Bo Chen, Jiahao Lai and Kai Yu | Comparison of Modeling Target in LSTM-RNN Duration Model | Conference | InterSpeech | |
Bo Chen, Tianling Bian and Kai Yu | Discrete Duration Model For Speech Synthesis | Conference | InterSpeech | |
Shuai Wang, Yanmin Qian and Kai Yu | What Does the Speaker Embedding Encode? | Conference | InterSpeech | |
Heinrich Dinkel, Yanmin Qian, Kai Yu | Small-footprint convolutional neural network for spoofing detection | Conference | IJCNN | |
Lu Chen, Runzhe Yang, Cheng Chang, Zihao Ye, Xiang Zhou and Kai Yu | On-line Dialogue Policy Learning with Companion Teaching | Conference | EACL | |
Heinrich Dinkel, Nanxin Chen, Yanmin Qian and Kai Yu | End-To-End Spoofing Detection With Raw Waveform CLDNNs | Conference | ICASSP | |
Su Zhu and Kai Yu | Encoder-decoder with Focus-mechanism for Sequence Labelling Based Spoken Language Understanding | Conference | ICASSP | |
Zhehuai Chen, Yimeng Zhuang and Kai Yu | Confidence Measures for CTC-based Phone Synchronous Decoding | Conference | ICASSP |