Time-Domain Deep Neural Networks for Speech Separation.
Record type: Bibliographic - electronic resource : Monograph/item
Title/Author: Time-Domain Deep Neural Networks for Speech Separation. / Sun, Tao.
Author: Sun, Tao.
Description: 1 online resource (102 pages)
Notes: Source: Dissertations Abstracts International, Volume: 84-11, Section: B.
Contained by: Dissertations Abstracts International, 84-11B.
Subject: Computer science.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30509742 (click for full text, PQDT)
ISBN: 9798379445409
Thesis (Ph.D.)--Ohio University, 2022.
Includes bibliographical references.

Abstract:
Speech separation extracts the speech of interest from background noise (speech enhancement) or from interfering speech (speaker separation). While the human auditory system has extraordinary speech separation capabilities, designing artificial models with similar functions has proven very challenging. Recently, waveform (time-domain) deep neural networks (DNNs) have become the dominant approach to speech separation, with great success.

Improving speech quality and intelligibility is a primary goal of speech separation. Integrating human speech elements into waveform DNNs has proven to be a simple yet effective strategy for boosting the objective performance (including speech quality and intelligibility) of speech separation models. This dissertation proposes three solutions that integrate human speech elements into waveform speech separation models in an effective manner.

First, we propose a knowledge-assisted framework that integrates pretrained self-supervised speech representations to boost the performance of speech enhancement networks. To improve output intelligibility, we design auxiliary perceptual loss functions that rely on speech representations pretrained on large datasets, ensuring that the denoised network outputs sound like clean human speech. Our second solution targets speaker separation: we design a speaker-conditioned model that adopts a pretrained speaker identification model to generate speaker embeddings rich in speech information. Our third solution takes a different approach to improving speaker separation. To suppress information from non-target speakers in auxiliary-loss-based solutions, we introduce a loss function that maximizes the distance between the speech representations of the separated speech and those of clean non-target speakers.

This dissertation also addresses a practical issue in frame-based DNN speech enhancement (SE) solutions: frame stitching. The input context a network can observe is often limited, resulting in boundary discontinuities in network outputs. We use a recurrent neural network (RNN) to connect depthwise fully convolutional networks (FCNs), allowing temporal information to be propagated across the per-frame networks. Our FCN + RNN model demonstrates an excellent smoothing effect on short frames, enabling speech enhancement systems with very short delays. (Illustrative sketches of these ideas follow the record display below.)
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2023.
Mode of access: World Wide Web.
ISBN: 9798379445409
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Speech separation
Index Terms--Genre/Form: Electronic books.
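To make the auxiliary perceptual loss idea from the abstract concrete, here is a minimal sketch, not the dissertation's actual implementation: it assumes a hypothetical frozen pretrained self-supervised encoder callable `ssl_encoder` (e.g., a wav2vec-style model) and adds a representation-space distance to an ordinary waveform loss, so the enhanced output is pulled toward clean-sounding speech rather than merely matching samples.

```python
import torch
import torch.nn.functional as F

def perceptual_loss(enhanced: torch.Tensor,
                    clean: torch.Tensor,
                    ssl_encoder,
                    alpha: float = 1.0) -> torch.Tensor:
    """Waveform loss plus an auxiliary perceptual term.

    `ssl_encoder` is a hypothetical frozen, pretrained self-supervised
    speech encoder. The auxiliary term pulls the enhanced output toward
    clean speech in the encoder's representation space, not just at the
    sample level.
    """
    wave_loss = F.l1_loss(enhanced, clean)        # sample-level fit
    with torch.no_grad():
        target_repr = ssl_encoder(clean)          # fixed clean reference
    enhanced_repr = ssl_encoder(enhanced)         # gradients flow to `enhanced`
    repr_loss = F.l1_loss(enhanced_repr, target_repr)
    return wave_loss + alpha * repr_loss
```

During training, only the enhancement network's parameters would be updated; the encoder stays frozen so the representation space remains a stable reference.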
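The third contribution, pushing separated speech away from non-target speakers, can likewise be sketched as a hinge on embedding similarity. This is an illustrative guess at the form such a loss could take, assuming both signals have already been mapped to fixed-size representations (e.g., by a pretrained speaker identification model):

```python
import torch
import torch.nn.functional as F

def non_target_dissimilarity_loss(sep_repr: torch.Tensor,
                                  non_target_repr: torch.Tensor) -> torch.Tensor:
    """Penalize similarity between representations of the separated
    speech and clean speech of non-target speakers; minimizing this
    term maximizes the distance between the two sets of embeddings."""
    sim = F.cosine_similarity(sep_repr, non_target_repr, dim=-1)
    return F.relu(sim).mean()  # zero once the embeddings point apart
```

In practice such a term would be added, with a small weight, to the main separation loss rather than used on its own.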
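Finally, the FCN + RNN frame-stitching idea can be illustrated with the toy module below. The layer sizes and the choice of a GRU are assumptions for the sketch, not the dissertation's architecture; the point is that each short frame is processed by weight-shared convolutions while recurrent state carries temporal information across frames, smoothing frame boundaries.

```python
import torch
import torch.nn as nn

class FCNRNNEnhancer(nn.Module):
    """Per-frame fully convolutional nets linked by an RNN (toy sketch)."""

    def __init__(self, frame_len: int = 64, hidden: int = 16):
        super().__init__()
        self.frame_len, self.hidden = frame_len, hidden
        # weight-shared per-frame encoder/decoder (the "FCN" part)
        self.encode = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.decode = nn.Conv1d(hidden, 1, kernel_size=5, padding=2)
        # the RNN carries state across frames to smooth boundaries
        self.rnn = nn.GRU(hidden * frame_len, hidden * frame_len,
                          batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, frame_len) noisy waveform frames
        b, n, t = frames.shape
        x = self.encode(frames.reshape(b * n, 1, t))       # per-frame FCN
        x, _ = self.rnn(x.reshape(b, n, self.hidden * t))  # cross-frame state
        x = self.decode(x.reshape(b * n, self.hidden, t))  # per-frame output
        return x.reshape(b, n, t)                          # enhanced frames

# usage: two utterances, ten 64-sample frames each
model = FCNRNNEnhancer()
enhanced = model(torch.randn(2, 10, 64))   # -> shape (2, 10, 64)
```

Because the recurrence runs frame by frame, such a model can emit each enhanced frame as soon as it arrives, which is what enables the very short delays mentioned in the abstract.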
MARC record:
LDR  03790nmm a2200421K 4500
001  2360775
005  20231015184520.5
006  m o d
007  cr mn ---uuuuu
008  241011s2022 xx obm 000 0 eng d
020  $a 9798379445409
035  $a (MiAaPQ)AAI30509742
035  $a (MiAaPQ)OhioLINKohiou1647344440927022
035  $a AAI30509742
040  $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Sun, Tao. $3 1677649
245 10 $a Time-Domain Deep Neural Networks for Speech Separation.
264  0 $c 2022
300  $a 1 online resource (102 pages)
336  $a text $b txt $2 rdacontent
337  $a computer $b c $2 rdamedia
338  $a online resource $b cr $2 rdacarrier
500  $a Source: Dissertations Abstracts International, Volume: 84-11, Section: B.
500  $a Advisor: Liu, Jundong.
502  $a Thesis (Ph.D.)--Ohio University, 2022.
504  $a Includes bibliographical references
520  $a Speech separation separates the speech of interest from background noise (speech enhancement) or interfering speech (speaker separation). While the human auditory system has extraordinary speech separation capabilities, designing artificial models with similar functions has proven to be very challenging. Recently, waveform deep neural network (DNN) has become the dominant approach for speech separation with great success. Improving speech quality and intelligibility is a primary goal for the speech separation tasks. Integrating human speech elements into waveform DNNs has proven to be a simple yet effective strategy to boost objective performance (including speech quality and intelligibility) of speech separation models. In this dissertation, three solutions are proposed to integrate human speech elements into waveform speech separation solutions in an effective manner. First, we propose a knowledge-assisted framework to integrate pretrained self-supervised speech representations to boost the performance of speech enhancement networks. To enhance the output intelligibility, we design auxiliary perceptual loss functions that rely on speech representations pretrained on large datasets, to ensure the denoised network outputs sound like clean human speeches. Our second solution is for speaker separation, where we design a speaker-conditioned model that adopts a pretrained speaker identification model to generate speaker embeddings with rich speech information. Our third solution takes a different approach to improve speaker separation solutions. To suppress information of non-target speakers in auxiliary-loss based solutions, we introduce a loss function that can maximize the distance between speech representations of separated speeches and speeches of clean non-target speakers. In this dissertation, we also address a practical issue in frame-based DNN SE solution: frame stitching, where the input context to be observed in a network is often limited, resulting in boundary discontinuities in network outputs. We use recurrent neural network (RNN) to connect depthwise fully convolution networks (FCNs), allowing temporal information to be propagated along the networks on individual frames. Our FCN + RNN model demonstrates excellent smoothing effect on short frames, enabling speech enhancement systems with very short delays.
533  $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538  $a Mode of access: World Wide Web
650  4 $a Computer science. $3 523869
650  4 $a Information technology. $3 532993
650  4 $a Electrical engineering. $3 649834
653  $a Speech separation
653  $a Deep neural networks
653  $a Self-supervised learning
653  $a Speech enhancement
653  $a Speaker separation
655  7 $a Electronic books. $2 lcsh $3 542853
690  $a 0984
690  $a 0800
690  $a 0489
690  $a 0544
710 2  $a ProQuest Information and Learning Co. $3 783688
710 2  $a Ohio University. $b Electrical Engineering & Computer Science (Engineering and Technology). $3 3281613
773 0  $t Dissertations Abstracts International $g 84-11B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30509742 $z click for full text (PQDT)
Holdings (1 item):
Barcode: W9483131
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0