National Dong Hwa University Library

Man-machine speech communication = 17th National Conference, NCMMSC 2022, Hefei, China, December 15-18, 2022 : proceedings /

Record type:	Bibliographic - Electronic resource : Monograph/item
Title/Author:	Man-machine speech communication / edited by Ling Zhenhua ... [et al.].
Other title:	17th National Conference, NCMMSC 2022, Hefei, China, December 15-18, 2022 : proceedings
Other title:	NCMMSC 2022
Other author:	Zhenhua, Ling.
Corporate author:	NCMMSC (Conference)
Publisher:	Singapore : Springer Nature Singapore : Imprint: Springer, 2023.
Physical description:	1 online resource (xi, 332 p.) : ill., digital ; 24 cm.
Contents note:	MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation -- Baby Cry Recognition Based on Acoustic Segment Model -- A Multi-feature Sets Fusion Strategy with Similar Samples Removal for Snore Sound Classification -- Multi-Hypergraph Neural Networks for Emotion Recognition in Multi-Party Conversations -- Using Emoji as an Emotion Modality in Text-Based Depression Detection -- Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis -- Semantic enhancement framework for robust speech recognition -- Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model -- Predictive AutoEncoders are Context-Aware Unsupervised Anomalous Sound Detectors -- A pipelined framework with serialized output training for overlapping speech recognition -- Adversarial Training Based on Meta-Learning in Unseen Domains for Speaker Verification -- Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement -- Multiple Confidence Gates for Joint Training of SE and ASR -- Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion -- Pre-training Techniques For Improving Text-to-Speech Synthesis By Automatic Speech Recognition Based Data Enhancement -- A Time-Frequency Attention Mechanism with Subsidiary Information for Effective Speech Emotion Recognition -- Interplay between prosody and syntax-semantics: Evidence from the prosodic features of Mandarin tag questions -- Improving Fine-grained Emotion Control and Transfer with Gated Emotion Representations in Speech Synthesis -- Violence Detection through Fusing Visual Information to Auditory Scene -- Mongolian Text-to-Speech Challenge under Low-Resource Scenario for NCMMSC2022 -- VC-AUG Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification -- Transformer-based potential emotional relation mining network for emotion recognition in conversation -- FastFoley Non-Autoregressive Foley Sound Generation Based On Visual Semantics -- Structured Hierarchical Dialogue Policy with Graph Neural Networks -- Deep Reinforcement Learning for On-line Dialogue State Tracking -- Dual Learning for Dialogue State Tracking -- Automatic Stress Annotation and Prediction For Expressive Mandarin TTS -- MnTTS2 An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.
Contained By:	Springer Nature eBook
Subject:	Computational linguistics - Congresses.
Electronic resource:	https://doi.org/10.1007/978-981-99-2401-1
ISBN:	9789819924011

Man-machine speech communication [electronic resource] : 17th National Conference, NCMMSC 2022, Hefei, China, December 15-18, 2022 : proceedings / edited by Ling Zhenhua ... [et al.]. - Singapore : Springer Nature Singapore : Imprint: Springer, 2023. - 1 online resource (xi, 332 p.) : ill., digital ; 24 cm. - (Communications in computer and information science, 1865-0937 ; 1765).

This book constitutes the refereed proceedings of the 17th National Conference on Man-Machine Speech Communication, NCMMSC 2022, held in Hefei, China, on December 15-18, 2022. The 21 full papers and 7 short papers included in this book were carefully reviewed and selected from 108 submissions. The papers run from "MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation" and "Baby Cry Recognition Based on Acoustic Segment Model" through "MnTTS2 An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset".

ISBN: 9789819924011

Standard No.: 10.1007/978-981-99-2401-1 (doi)

Subjects--Topical Terms: Computational linguistics--Congresses.

LC Class. No.: QA76.9.N38

Dewey Class. No.: 006.35

LDR:04167nmm a2200349 a 4500
001 2318489
003 DE-He213
005 20230509085529.0
006 m d
007 cr nn 008maaau
008 230902s2023 si s 0 eng d
020 $a 9789819924011 $q (electronic bk.)
020 $a 9789819924004 $q (paper)
024 7 $a 10.1007/978-981-99-2401-1 $2 doi
035 $a 978-981-99-2401-1
040 $a GP $c GP
041 0 $a eng
050 4 $a QA76.9.N38
072 7 $a UYQV $2 bicssc
072 7 $a COM012000 $2 bisacsh
072 7 $a UYQV $2 thema
082 0 4 $a 006.35 $2 23
090 $a QA76.9.N38 $b N337 2022
111 2 $a NCMMSC (Conference) $n (17th : $d 2022 : $c Hefei Shi, China) $3 3633513
245 1 0 $a Man-machine speech communication $h [electronic resource] : $b 17th National Conference, NCMMSC 2022, Hefei, China, December 15-18, 2022 : proceedings / $c edited by Ling Zhenhua ... [et al.].
246 3 $a NCMMSC 2022
260 $a Singapore : $b Springer Nature Singapore : $b Imprint: Springer, $c 2023.
300 $a 1 online resource (xi, 332 p.) : $b ill., digital ; $c 24 cm.
490 1 $a Communications in computer and information science, $x 1865-0937 ; $v 1765
505 0 $a MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation -- Baby Cry Recognition Based on Acoustic Segment Model -- A Multi-feature Sets Fusion Strategy with Similar Samples Removal for Snore Sound Classification -- Multi-Hypergraph Neural Networks for Emotion Recognition in Multi-Party Conversations -- Using Emoji as an Emotion Modality in Text-Based Depression Detection -- Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis -- Semantic enhancement framework for robust speech recognition -- Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model -- Predictive AutoEncoders are Context-Aware Unsupervised Anomalous Sound Detectors -- A pipelined framework with serialized output training for overlapping speech recognition -- Adversarial Training Based on Meta-Learning in Unseen Domains for Speaker Verification -- Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement -- Multiple Confidence Gates for Joint Training of SE and ASR -- Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion -- Pre-training Techniques For Improving Text-to-Speech Synthesis By Automatic Speech Recognition Based Data Enhancement -- A Time-Frequency Attention Mechanism with Subsidiary Information for Effective Speech Emotion Recognition -- Interplay between prosody and syntax-semantics: Evidence from the prosodic features of Mandarin tag questions -- Improving Fine-grained Emotion Control and Transfer with Gated Emotion Representations in Speech Synthesis -- Violence Detection through Fusing Visual Information to Auditory Scene -- Mongolian Text-to-Speech Challenge under Low-Resource Scenario for NCMMSC2022 -- VC-AUG Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification -- Transformer-based potential emotional relation mining network for emotion recognition in conversation -- FastFoley Non-Autoregressive Foley Sound Generation Based On Visual Semantics -- Structured Hierarchical Dialogue Policy with Graph Neural Networks -- Deep Reinforcement Learning for On-line Dialogue State Tracking -- Dual Learning for Dialogue State Tracking -- Automatic Stress Annotation and Prediction For Expressive Mandarin TTS -- MnTTS2 An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.
520 $a This book constitutes the refereed proceedings of the 17th National Conference on Man-Machine Speech Communication, NCMMSC 2022, held in China, in December 2022. The 21 full papers and 7 short papers included in this book were carefully reviewed and selected from 108 submissions. They were organized in topical sections as follows: MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation -- Baby Cry Recognition Based on Acoustic Segment Model, MnTTS2 An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.
650 0 $a Computational linguistics $x Congresses. $3 582158
650 0 $a Natural language processing (Computer science) $v Congresses. $3 752585
650 0 $a Human-computer interaction $x Congresses. $3 705966
650 1 4 $a Computer Vision. $3 3538524
650 2 4 $a Natural Language Processing (NLP) $3 3381674
650 2 4 $a Signal, Speech and Image Processing. $3 3592727
650 2 4 $a Artificial Intelligence. $3 769149
650 2 4 $a User Interfaces and Human Computer Interaction. $3 892554
700 1 $a Zhenhua, Ling. $3 3633514
710 2 $a SpringerLink (Online service) $3 836513
773 0 $t Springer Nature eBook
830 0 $a Communications in computer and information science ; $v 1765. $3 3633515
856 4 0 $u https://doi.org/10.1007/978-981-99-2401-1
950 $a Computer Science (SpringerNature-11645)