📃Papers
Publications are listed in reverse chronological order.
2024
- Advanced Long-Content Speech Recognition With Factorized Neural Transducer. IEEE ACM Trans. Audio Speech Lang. Process., 2024
- EAT: Self-Supervised Pre-Training with Efficient Audio Transformer. CoRR, 2024
- ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering. CoRR, 2024
- BAT: Learning to Reason about Spatial Sounds with Large Language Models. CoRR, 2024
- An Embarrassingly Simple Approach for LLM with Strong ASR Capacity. CoRR, 2024
- Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning. IEEE ACM Trans. Audio Speech Lang. Process., 2024
- Towards Weakly Supervised Text-to-Audio Grounding. CoRR, 2024
- VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech. CoRR, 2024
- ChemDFM: Dialogue Foundation Model for Chemistry. CoRR, 2024
- MULTI: Multimodal Understanding Leaderboard with Text and Images. CoRR, 2024
2023
- A Unified Framework From Face Image Restoration to Data Augmentation Using Generative Prior. IEEE Access, 2023
- Human Pose Estimation with Combined Feature Maps and Joint Embeddings. In Proceedings of the 2023 International Conference on Advances in Artificial Intelligence and Applications, AAIA 2023, Wuhan, China, November 18-20, 2023, 2023
- Assessing and Enhancing LLMs: A Physics and History Dataset and One-More-Check Pipeline Method. In Neural Information Processing - 30th International Conference, ICONIP 2023, Changsha, China, November 20-23, 2023, Proceedings, Part XIII, 2023
- GAN Latent Space Manipulation Based Augmentation for Unbalanced Emotion Datasets. In International Joint Conference on Neural Networks, IJCNN 2023, Gold Coast, Australia, June 18-23, 2023, 2023
- LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Exploring Binary Classification Loss for Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Improving Dino-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training. In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023, 2023
- Robust Audio-Visual ASR with Unified Cross-Modal Attention. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Target Sound Extraction with Variable Cross-Modality Clues. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Predictive Skim: Contrastive Predictive Coding for Low-Latency Online Speech Separation. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Lowbit Neural Network Quantization for Speaker Verification. In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023, 2023
- Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit Bert for Robust Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023, 2023
- Code-Switching Text Generation and Injection in Mandarin-English ASR. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- Adaptive Large Margin Fine-Tuning For Robust Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, 2023
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, 2023
- Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023, New Paltz, NY, USA, October 22-25, 2023, 2023
- Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310) (Version 1). Oct 2023
- Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification. CoRR, Oct 2023
- Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor. CoRR, Oct 2023
- Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR. CoRR, Oct 2023
- Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition. CoRR, Oct 2023
- Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. CoRR, Oct 2023
- InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models. CoRR, Oct 2023
- Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer. CoRR, Oct 2023
- USED: Universal Speaker Extraction and Diarization. CoRR, Oct 2023
- Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition. CoRR, Oct 2023
- The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR. CoRR, Oct 2023
- Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction. CoRR, Oct 2023
- Toward Universal Speech Enhancement for Diverse Input Conditions. CoRR, Oct 2023
- One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models. CoRR, Oct 2023
- FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition. CoRR, Oct 2023
- Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2023
- Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning. In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023, Oct 2023
- Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, Oct 2023
- Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, Oct 2023
- Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, Oct 2023
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. In Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023 - 3 November 2023, Oct 2023
- Blank-regularized CTC for Frame Skipping in Neural Transducer. CoRR, Oct 2023
- UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding. CoRR, Oct 2023
- Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation. CoRR, Oct 2023
- Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation. CoRR, Oct 2023
- Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems. CoRR, Oct 2023
- DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. CoRR, Oct 2023
- Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition. CoRR, Oct 2023
- VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching. CoRR, Oct 2023
- Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. CoRR, Oct 2023
- Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer. CoRR, Oct 2023
- Improved Factorized Neural Transducer Model For text-only Domain Adaptation. CoRR, Oct 2023
- Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition. CoRR, Oct 2023
- Acoustic BPE for Speech Generation with Discrete Tokens. CoRR, Oct 2023
- Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations. CoRR, Oct 2023
- emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation. CoRR, Oct 2023
- OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue. Trans. Assoc. Comput. Linguistics, Oct 2023
- Transcribing Vocal Communications of Domestic Shiba Inu Dogs. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, Oct 2023
- Detection of Multiple Mental Disorders from Social Media with Two-Stream Psychiatric Experts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Oct 2023
- Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Oct 2023
- Diverse and Vivid Sound Generation from Text Descriptions. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, Oct 2023
- Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning. In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023, Oct 2023
- BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data. In Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023 - 3 November 2023, Oct 2023
- LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation. CoRR, Oct 2023
- Enhance Temporal Relations in Audio Captioning with Sound Event Detection. CoRR, Oct 2023
- Improving Audio Caption Fluency with Automatic Error Correction. CoRR, Oct 2023
- A Large-scale Dataset for Audio-Language Representation Learning. CoRR, Oct 2023
- Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners. CoRR, Oct 2023
- Towards Lexical Analysis of Dog Vocalizations via Online Videos. CoRR, Oct 2023
- PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health. CoRR, Oct 2023
- A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL. IEEE Trans. Pattern Anal. Mach. Intell., Oct 2023
- Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2023
- SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, ACL 2023, Toronto, Canada, July 9-14, 2023, Oct 2023
- Exploring Schema Generalizability of Text-to-SQL. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, Oct 2023
- TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, Oct 2023
- CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, Oct 2023
- ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, Oct 2023
- Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, Oct 2023
- DiffVoice: Text-to-Speech with Latent Diffusion. In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023, Oct 2023
- Large Language Models Are Semi-Parametric Reinforcement Learning Agents. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, Oct 2023
- Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction. CoRR, Oct 2023
- SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. CoRR, Oct 2023
- ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL. CoRR, Oct 2023
- DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder. CoRR, Oct 2023
- SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention. CoRR, Oct 2023
2022
- Heterogeneous Graph Representation for Knowledge Tracing. In Neural Information Processing - 29th International Conference, ICONIP 2022, Virtual Event, November 22-26, 2022, Proceedings, Part I, Oct 2022
- A simple but practical method: How to improve the usage of entities in the Chinese question generation. In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022, Oct 2022
- From Uniform Models To Generic Representations: Stock Return Prediction With Pre-training. In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022, Oct 2022
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. IEEE J. Sel. Top. Signal Process., Oct 2022
- Optimizing Data Usage for Low-Resource Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
- Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
- Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
- End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
- Time-Domain Audio-Visual Speech Separation on Low Quality Videos. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Skim: Skipping Memory Lstm for Low-Latency Real-Time Continuous Speech Separation. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Local Information Modeling with Self-Attention for Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Punctuation Prediction for Streaming On-Device Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Self-Knowledge Distillation via Feature Enhancement for Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Exploring Effective Data Utilization for Low-Resource Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Attentive Feature Fusion for Robust Speaker Verification. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Dual Path Embedding Learning for Speaker Verification with Triplet Attention. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- MSDWild: Multi-modal Speaker Diarization Dataset in the Wild. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Separating Long-Form Speech with Group-wise Permutation Invariant Training. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models. In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022, Oct 2022
- Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition. In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022, Oct 2022
- Medical Difficult Airway Detection using Speech Technology. In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022, Oct 2022
- Speaking style compensation on synthetic audio for robust keyword spotting. In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022, Oct 2022
- The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines. In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022, Oct 2022
- The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022. In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022, Oct 2022
- End-to-End Multi-Speaker ASR with Independent Vector Analysis. In IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023, Oct 2022
- A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning. In IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023, Oct 2022
- The SJTU X-LANCE Lab System for CNSRC 2022. CoRR, Oct 2022
- SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022. CoRR, Oct 2022
- Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022. CoRR, Oct 2022
- Factorized Neural Transducer for Efficient Language Model Adaptation. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition. In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, Oct 2022
- Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. CoRR, Oct 2022
- MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. CoRR, Oct 2022
- EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance. CoRR, Oct 2022
- Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models. CoRR, Oct 2022
- D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Oct 2022
- Symptom Identification for Interpretable Detection of Multiple Mental Disorders on Social Media. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Oct 2022
- Category-Adapted Sound Event Enhancement with Weakly Labeled Data. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Can Audio Captions Be Evaluated With Image Caption Metrics? In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Navigating Audio-Visual Event Detection Across Mismatched Modalities. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Audio-Text Retrieval in Context. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Climate and Weather: Inspecting Depression Detection via Emotion Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, Oct 2022
- A Comprehensive Survey of Automated Audio Captioning. CoRR, Oct 2022
- DialogZoo: Large-Scale Dialog-Oriented Task Learning. CoRR, Oct 2022
- Data augmentation based non-parallel voice conversion with frame-level speaker disentangler. Speech Commun., Oct 2022
- Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
- Neural Fusion for Voice Cloning. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Oct 2022
- AdapterShare: Task Correlation Modeling with Adapter Differentiation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Oct 2022
- LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Text Adaptive Detection for Customizable Keyword Spotting. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, Oct 2022
- The AISP-SJTU Simultaneous Translation System for IWSLT 2022. In Proceedings of the 19th International Conference on Spoken Language Translation, IWSLT@ACL 2022, Dublin, Ireland (in-person and online), May 26-27, 2022, Oct 2022
- TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022, Oct 2022
- UniDU: Towards A Unified Generative Dialogue Understanding Framework. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2022, Edinburgh, UK, 07-09 September 2022, Oct 2022
- The AISP-SJTU Translation System for WMT 2022. In Proceedings of the Seventh Conference on Machine Translation, WMT 2022, Abu Dhabi, United Arab Emirates (Hybrid), December 7-8, 2022, Oct 2022
2021
- Modified Magnitude-Phase Spectrum Information for Spoofing Detection. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
- Audio-Visual Deep Neural Network for Robust Person Verification. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
- Dual-Path Modeling for Long Recording Speech Separation in Meetings. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- AISpeech-SJTU ASR System for the Accented English Speech Recognition Challenge. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- Towards Data Selection on TTS Data for Children’s Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition. In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, Oct 2021
- Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification. In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, Oct 2021
- Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition. In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, Oct 2021
- The SJTU System for Short-Duration Speaker Verification Challenge 2021. In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, Oct 2021
- Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party. In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, Oct 2021
- Speaker Embedding Augmentation with Noise Distribution Matching. In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021, Oct 2021
- Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning. In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021, Oct 2021
- Data Augmentation for end-to-end Code-Switching Speech Recognition. In IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021, Oct 2021
- Dual-Path RNN for Long Recording Speech Separation. In IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021, Oct 2021
- Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021, New Paltz, NY, USA, October 17-20, 2021, Oct 2021
- Towards Duration Robust Weakly Supervised Sound Event Detection. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
- Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training. IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
- Building Interpretable Interaction Trees for Deep NLP Models. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, Oct 2021
- Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL. In Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, Oct 2021
- Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging. In CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, Oct 2021
- Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, Oct 2021
- A Lightweight Framework for Online Voice Activity Detection in the Wild. In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, Oct 2021
- Audio Caption in a Car Setting with a Sentence-Level LossIn 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021 , Oct 2021
- DEPA: Self-Supervised Audio Embedding for Depression DetectionIn MM ’21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021 , Oct 2021
- LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text MatchingIn Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021 , Oct 2021
- LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local RelationsIn Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021 , Oct 2021
- WebSRC: A Dataset for Web-Based Structural Reading ComprehensionIn Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021 , Oct 2021
- Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe PredictionIn Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021 , Oct 2021
- Class-Based Neural Network Language Model for Second-Pass Rescoring in ASRIn Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
- Rich Prosody Diversity Modelling with Phone-Level Mixture Density NetworkIn Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
- ShadowGNN: Graph Projection Neural Network for Text-to-SQL ParserIn Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021 , Oct 2021
- Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRFIn Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part I , Oct 2021
- Relation-Aware Multi-hop Reasoning for Visual DialogIn Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part I , Oct 2021
- Mixture Density Network for Phone-Level Prosody Modelling in Speech SynthesisCoRR, Oct 2021
- Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody ModellingCoRR, Oct 2021
2020
- Improving End-to-End Single-Channel Multi-Talker Speech RecognitionIEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
- Data Augmentation Using Deep Generative Models for Embedding Based Speaker RecognitionIEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
- End-To-End Multi-Speaker Speech Recognition With TransformerIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Text Adaptation for Speaker Verification with Speaker-Text Factorized EmbeddingsIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial TrainingIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Deep Audio-Visual Speech Separation with Attention MechanismIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Learning Contextual Language Embeddings for Monaural Multi-Talker Speech RecognitionIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- End-to-End Far-Field Speech Recognition with Unified Dereverberation and BeamformingIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- Dual-Adversarial Domain Adaptation for Generalized Replay Attack DetectionIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech SeparationIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- Multi-Modality Matters: A Performance Leap on VoxCelebIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- Adversarial Domain Adaptation for Speaker Verification Using Partially Shared NetworkIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of ExpertsIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- End-to-End Speaker-Dependent Voice Activity DetectionCoRR, Oct 2020
- A CRNN-GRU Based Reinforcement Learning Approach to Audio CaptioningIn Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), Tokyo, Japan (full virtual), November 2-4, 2020 , Oct 2020
- Multiple Sound Sources Localization from Coarse to FineIn Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XX , Oct 2020
- Voice Activity Detection in the Wild via Weakly Supervised Sound Event DetectionIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detectionCoRR, Oct 2020
- Interpreting Hierarchical Linguistic Interactions in DNNsCoRR, Oct 2020
- Towards a new generation of artificial intelligence in ChinaNat. Mach. Intell., Oct 2020
- Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language UnderstandingIEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
- Dual Learning for Semi-Supervised Natural Language UnderstandingIEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
- Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word ModelIEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
- Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue ManagementIEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
- Neural Network Language Model Compression With Product Quantization and Soft BinarizationIEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
- Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural NetworksIn The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020 , Oct 2020
- Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising AutoencodersIn The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020 , Oct 2020
- Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention NetworksIn Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
- Neural Graph Matching Networks for Chinese Short Text MatchingIn Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
- Unsupervised Dual Paraphrasing for Two-stage Semantic ParsingIn Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
- Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State TrackingIn Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 , Oct 2020
- Duration Robust Weakly Supervised Sound Event DetectionIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Investigation of Specaugment for Deep Speaker Embedding LearningIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Speaker Augmentation for Low Resource Speech RecognitionIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Neural Lattice Search for Speech RecognitionIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- A Hierarchical Tracker for Multi-Domain Dialogue State TrackingIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense EmbeddingsIn 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
- CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU JobsIn 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020, Singapore, November 29 - December 1, 2020 , Oct 2020
- Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language UnderstandingIn Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
- Memory Attention Neural Network for Multi-domain Dialogue State TrackingIn Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
- Robust Spoken Language Understanding with RL-Based Value Error RecoveryIn Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
- An Investigation on Different Underlying Quantization Schemes for Pre-trained Language ModelsIn Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
- An Investigation on Deep Learning with Beta StabilizerCoRR, Oct 2020
- Vector Projection Network for Few-shot Slot Tagging in Natural Language UnderstandingCoRR, Oct 2020
- Deep Reinforcement Learning for On-line Dialogue State TrackingCoRR, Oct 2020
- Structured Hierarchical Dialogue Policy with Graph Neural NetworksCoRR, Oct 2020
- Dual Learning for Dialogue State TrackingCoRR, Oct 2020
- CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State TrackingCoRR, Oct 2020
2019
- Erratum to: Past review, current progress, and challenges ahead on the cocktail party problemFrontiers Inf. Technol. Electron. Eng., Oct 2019
- Binary neural networks for speech recognitionFrontiers Inf. Technol. Electron. Eng., Oct 2019
- Data augmentation using generative adversarial networks for robust speech recognitionSpeech Commun., Oct 2019
- Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker VerificationIEEE ACM Trans. Audio Speech Lang. Process., Oct 2019
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker RecognitionIn 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Lanzhou, China, November 18-21, 2019 , Oct 2019
- GANs for Children: A Generative Data Augmentation Strategy for Children Speech RecognitionIn IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
- MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech RecognitionIn IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
- Exploring Model Units and Training Strategies for End-to-End Speech RecognitionIn IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
- End-to-End Overlapped Speech Detection and Speaker Counting with Raw WaveformIn IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
- Knowledge Distillation for Small Foot-print Deep Speaker EmbeddingIn IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
- End-to-end Monaural Multi-speaker ASR System without PretrainingIn IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
- The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 ChallengeIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- On the Usage of Phonetic Information for Text-Independent Speaker Embedding ExtractionIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- Data Augmentation Using Variational Autoencoder for Embedding Based Speaker VerificationIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- Joint Decoding of CTC Based Systems for Speech RecognitionIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- Knowledge Distillation for End-to-End Monaural Multi-Talker ASR SystemIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency MaskingIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial TrainingIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children SpeechIn Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
- Audio Caption: Listen and TellIn IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
- Text-based Depression Detection: What Triggers An AlertCoRR, Oct 2019
- What does a Car-ssette tape tell?CoRR, Oct 2019
- AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement LearningIEEE ACM Trans. Audio Speech Lang. Process., Oct 2019
- Semantic Parsing with Dual LearningIn Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers , Oct 2019
- Highly Efficient Neural Network Language Model Compression Using Soft Binarization TrainingIn IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
- Data Augmentation with Atomic Templates for Spoken Language UnderstandingIn Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019 , Oct 2019
- A Hierarchical Decoding Model for Spoken Language Understanding from Unaligned DataIn IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
- CATSLU: The 1st Chinese Audio-Textual Spoken Language Understanding ChallengeIn International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019 , Oct 2019
- Robust Spoken Language Understanding with Acoustic and Domain KnowledgeIn International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019 , Oct 2019
- Cross Aggregation of Multi-head Attention for Neural Machine TranslationIn Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9-14, 2019, Proceedings, Part I , Oct 2019
- International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019, Oct 2019
2018
- Past review, current progress, and challenges ahead on the cocktail party problemFrontiers Inf. Technol. Electron. Eng., Oct 2018
- Erratum to: Past review, current progress, and challenges ahead on the cocktail party problemFrontiers Inf. Technol. Electron. Eng., Oct 2018
- Sequence discriminative training for deep learning based acoustic keyword spottingSpeech Commun., Oct 2018
- Single-channel multi-talker speech recognition with permutation invariant trainingSpeech Commun., Oct 2018
- Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech RecognitionIEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
- Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing DetectionIEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
- Robust Mask Estimation By Integrating Neural Network-Based and Clustering-Based Approaches for Adaptive Acoustic BeamformingIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech RecognitionIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker VerificationIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Generative Adversarial Networks Based Data Augmentation for Noise Robust Speech RecognitionIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker IdentificationIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Noise Robust Speech Recognition on Aurora4 by Humans and MachinesIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Fast Adaptation on Deep Mixture Generative Network Based Acoustic ModelingIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech RecognitionIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Permutation Invariant Training of Generative Adversarial Network for Monaural Speech SeparationIn Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
- Deep Extractor Network for Target Speaker Recovery from Single Channel Speech MixturesIn Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
- Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional NetworksIn Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
- Knowledge Distillation for Sequence ModelIn Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
- Covariance Based Deep Feature for Text-Dependent Speaker VerificationIn Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers , Oct 2018
- Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech RecognitionIn 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
- Deep Discriminant Analysis for i-vector Based Robust Speaker RecognitionIn 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
- Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker VerificationIn 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
- Rich Short Text Conversation Using Semantic-Key-Controlled Sequence GenerationIEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
- Structured Dialogue Policy with Graph Neural NetworksIn Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018 , Oct 2018
- Towards Universal Dialogue State TrackingIn Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018 , Oct 2018
- On Modular Training of Neural Acoustics-to-Word Model for LVCSRIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Semi-Supervised Training Using Adversarial Multi-Task Learning for Spoken Language UnderstandingIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Policy Adaptation for Deep Reinforcement Learning-Based Dialogue ManagementIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- Robust Spoken Language Understanding with Unsupervised ASR-Error AdaptationIn 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
- MLN: Moment Localization Network and Samples Selection for Moment RetrievalIn Proceedings of the 2nd International Conference on Video and Image Processing, ICVIP 2018, Hong Kong, China, December 29-31, 2018 , Oct 2018
- Angular Softmax for Short-Duration Text-independent Speaker VerificationIn Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
- Joint Spoken Language Understanding and Domain Adaptive Language ModelingIn Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers , Oct 2018
- Binarized LSTM Language ModelIn Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers) , Oct 2018
- Cost-Sensitive Active Learning for Dialogue State TrackingIn Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 , Oct 2018
- Concept Transfer Learning for Adaptive Language UnderstandingIn Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 , Oct 2018
- Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers, Oct 2018
2017
- Phone Synchronous Speech Recognition With CTC LatticesIEEE ACM Trans. Audio Speech Lang. Process., Oct 2017
- Deep Feature Engineering for Noise Robust Spoofing DetectionIEEE ACM Trans. Audio Speech Lang. Process., Oct 2017
- Integrating online i-vector into GMM-UBM for text-dependent speaker verificationIn 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017, Kuala Lumpur, Malaysia, December 12-15, 2017 , Oct 2017
- Future vector enhanced LSTM language model for LVCSRIn 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017 , Oct 2017
- Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSRIn Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 16th China National Conference, CCL 2017, - and - 5th International Symposium, NLP-NABD 2017, Nanjing, China, October 13-15, 2017, Proceedings , Oct 2017
- End-to-end spoofing detection with raw waveform CLDNNsIn 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
- Small-footprint convolutional neural network for spoofing detectionIn 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, May 14-19, 2017 , Oct 2017
- Binary Deep Neural Networks for Speech RecognitionIn Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
- What Does the Speaker Embedding Encode?In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
- Recognizing Multi-Talker Speech with Permutation Invariant TrainingIn Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
- A Unified Confidence Measure Framework Using Auxiliary Normalization GraphIn Intelligence Science and Big Data Engineering - 7th International Conference, IScIDE 2017, Dalian, China, September 22-23, 2017, Proceedings , Oct 2017
- Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech RecognitionIn New Era for Robust Speech Recognition, Exploiting Deep Learning , Oct 2017
- On-line Dialogue Policy Learning with Companion TeachingIn Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 2: Short Papers , Oct 2017
- Affordable On-line Dialogue Policy LearningIn Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 , Oct 2017
- Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy LearningIn Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 , Oct 2017
- Confidence measures for CTC-based phone synchronous decodingIn 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
- Encoder-decoder with focus-mechanism for sequence labelling based spoken language understandingIn 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
- Discrete Duration Model for Speech SynthesisIn Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
- Deep Attentive Structured Language Model Based on LSTMIn Intelligence Science and Big Data Engineering - 7th International Conference, IScIDE 2017, Dalian, China, September 22-23, 2017, Proceedings , Oct 2017
- splab at the NTCIR-13 STC-2 TaskIn The 13th NTCIR Conference, Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, December 5-8, 2017 , Oct 2017
2016
- Deep features for automatic spoofing detection. Speech Commun., 2016
- Cluster Adaptive Training for Deep Neural Network Based Acoustic Model. IEEE ACM Trans. Audio Speech Lang. Process., 2016
- Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process., 2016
- Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process., 2016
- Overview of BTAS 2016 speaker anti-spoofing competition. In 8th IEEE International Conference on Biometrics Theory, Applications and Systems, BTAS 2016, Niagara Falls, NY, USA, September 6-9, 2016, 2016
- Joint acoustic factor learning for robust deep neural network based automatic speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, 2016
- Speaker-aware training of LSTM-RNNS for acoustic modelling. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, 2016
- Improved DNN-based segmentation for multi-genre broadcast audio. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, 2016
- An investigation into using parallel data for far-field speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, 2016
- Integrated adaptation with multi-factor joint-learning for far-field speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, 2016
- Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC. In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, 2016
- Multi-task joint-learning for robust voice activity detection. In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016, 2016
- Very deep convolutional neural networks for robust speech recognition. In 2016 IEEE Spoken Language Technology Workshop, SLT 2016, San Diego, CA, USA, December 13-16, 2016, 2016
- Evolvable dialogue state tracking for statistical dialogue management. Frontiers Comput. Sci., 2016
- Discriminatively trained joint speaker and environment representations for adaptation of deep neural network acoustic models. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, 2016
- A comparative study of robustness of deep learning approaches for VAD. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, 2016
- Phone Synchronous Decoding with CTC Lattice. In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, 2016
- Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues. In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, 2016
- On training bi-directional neural network language model with noise contrastive estimation. In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016, 2016
- Rich punctuations prediction using large-scale deep learning. In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016, 2016
- Directed automatic speech transcription error correction using bidirectional LSTM. In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016, 2016
- The splab at the NTCIR-12 Short Text Conversation Task. In Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016, 2016
2015
- Deep feature for text-dependent speaker verification. Speech Commun., 2015
- Multi-task joint-learning of deep neural networks for robust speech recognition. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, 2015
- Cambridge university transcription systems for the multi-genre broadcast challenge. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, 2015
- The development of the cambridge university alignment systems for the multi-genre broadcast challenge. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, 2015
- Speaker diarisation and longitudinal linking in multi-genre broadcast data. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, 2015
- Local trajectory based speech enhancement for robust speech recognition with deep neural network. In IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015, Chengdu, China, July 12-15, 2015, 2015
- An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition. In IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015, Chengdu, China, July 12-15, 2015, 2015
- Cluster adaptive training for deep neural network. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, 2015
- A novel static parameter calculation method for model compensation. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, 2015
- Recurrent neural network language model with structured word embeddings for speech recognition. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, 2015
- Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition. In 2015 International Joint Conference on Neural Networks, IJCNN 2015, Killarney, Ireland, July 12-17, 2015, 2015
- Multi-task learning for text-dependent speaker verification. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, 2015
- Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, 2015
- Very deep convolutional neural networks for LVCSR. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, 2015
- Paragraph vector based topic model for language model adaptation. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, 2015
- Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking. IEEE ACM Trans. Audio Speech Lang. Process., 2015
- An investigation of context clustering for statistical speech synthesis with deep neural network. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, 2015
- Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers. In Proceedings of the SIGDIAL 2015 Conference, The 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2-4 September 2015, Prague, Czech Republic, 2015
- Hyper-parameter Optimisation of Gaussian Process Reinforcement Learning for Statistical Dialogue Management. In Proceedings of the SIGDIAL 2015 Conference, The 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2-4 September 2015, Prague, Czech Republic, 2015
2014
- Stochastic data sweeping for fast DNN training. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, 2014
- Reshaping deep neural network for fast decoding by node-pruning. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, 2014
- Second order vector taylor series based robust speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, 2014
- Speaker verification with deep features. In 2014 International Joint Conference on Neural Networks, IJCNN 2014, Beijing, China, July 6-11, 2014, 2014
- Tandem deep features for text-dependent speaker verification. In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, 2014
- A novel dynamic parameters calculation approach for model compensation. In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, 2014
- Acoustic emotion recognition using deep neural network. In The 9th International Symposium on Chinese Spoken Language Processing, Singapore, September 12-14, 2014, 2014
- The SJTU System for Dialog State Tracking Challenge 2. In Proceedings of the SIGDIAL 2014 Conference, The 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 18-20 June 2014, Philadelphia, PA, USA, 2014
- A generalized rule based tracker for dialogue state tracking. In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014, 2014
- Semantic parser enhancement for dialogue domain extension with little data. In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014, 2014
2013
- Combination of data borrowing strategies for low-resource LVCSR. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 8-12, 2013, 2013
- MLP-HMM two-stage unsupervised training for low-resource languages on conversational telephone speech recognition. In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, 2013
- A New Word Language Model Evaluation Metric for Character Based Languages. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 12th China National Conference, CCL 2013 and First International Symposium, NLP-NABD 2013, Suzhou, China, October 10-12, 2013. Proceedings, 2013
2012
- Introduction to the Issue on Advances in Spoken Dialogue Systems and Mobile Interface. IEEE J. Sel. Top. Signal Process., 2012
- ICMI’12 grand challenge: haptic voice recognition. In International Conference on Multimodal Interaction, ICMI ’12, Santa Monica, CA, USA, October 22-26, 2012, 2012
- Development of the 2012 SJTU HVR system. In International Conference on Multimodal Interaction, ICMI ’12, Santa Monica, CA, USA, October 22-26, 2012, 2012