Reconstruction of Human Faces from Voice.
Record type:
Bibliographic - electronic resource : Monograph/item
Title / author:
Reconstruction of Human Faces from Voice.
Author:
Wen, Yandong.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2022
Extent:
117 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-11, Section: B.
Contained By:
Dissertations Abstracts International 83-11B.
Subject:
Artificial intelligence.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29162698
ISBN:
9798438749660
Thesis (Ph.D.)--Carnegie Mellon University, 2022.
This item must not be sold to any third party vendors.
Voices and faces play pivotal roles in our social interactions. Despite their different physical manifestations, voices and faces contain highly similar types of information, including linguistic information (phonemes for voices and visemes for faces), affective state, and identity characteristics (weight, gender, age, etc.). For this reason, the associations between voices and faces have attracted significant research interest in psychology, cognitive science, artificial intelligence, and many other fields.

In this thesis, we attempt to explore the identity associations between voices and faces by developing computational mechanisms for reconstructing faces from voices. More specifically, the task is designed to answer the question: Given an unheard audio clip spoken by an unseen person, can we algorithmically picture a face that has as many associations as possible with the speaker, in terms of identity?

The link between voice and face has been established from many perspectives. Direct relationships include the effect of the underlying skeletal and articulator structure of the face and the tissue covering them, all of which govern the shapes, sizes, and acoustic properties of the vocal tract that produces the voice. Less directly, the same genetic, physical, and environmental influences that affect the development of the face also affect the voice. Given these demonstrable dependencies, it is reasonable to hypothesize that it may be possible to reconstruct faces from voice signals algorithmically. Our hypothesis is that if any facial parameter influences the speaker's voice, its effects on the voice must be discoverable by a properly designed computational model.

This thesis presents how we approach the goal of generating faces from voices in three stages. First, we consider the cross-modal matching problem: given a voice recording, one must select the speaker's face from a gallery of face images. To this end, we propose disjoint mapping networks to learn representations of voices and faces in a shared space, such that their representations can be compared to one another. The results of matching empirically demonstrate the possibility of disambiguating faces from the voice. Second, we address the problem of reconstructing 2D face images from voices. We propose a simple but effective computational framework based on generative adversarial networks (GANs). The generated face images are visually plausible and have identity associations with the true speaker. Last, we investigate the problem of reconstructing 3D facial shapes from voices. We propose an anthropometry-guided framework that identifies which anthropometric measurements (AMs) are predictable from voice, and then reconstructs the 3D facial shapes from those predictable AMs. Compared to baseline methods, our results demonstrate notable improvements, especially in reconstructing the shapes of speakers' noses.
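The cross-modal matching step described in the abstract (pick the speaker's face from a gallery once voices and faces live in a shared space) can be illustrated with a minimal sketch. This is not the thesis's disjoint mapping network: the embeddings below are hand-written, hypothetical vectors, and cosine similarity is an assumed choice of comparison. The names `cosine`, `match_face`, `voice`, and `gallery` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_face(voice_embedding, face_gallery):
    """Return the gallery key whose face embedding is closest to the voice."""
    return max(
        face_gallery,
        key=lambda name: cosine(voice_embedding, face_gallery[name]),
    )


# Hypothetical embeddings; a trained model would produce these in practice.
voice = [0.9, 0.1, 0.2]
gallery = {
    "face_a": [0.1, 0.9, 0.3],
    "face_b": [0.8, 0.2, 0.1],
}

best = match_face(voice, gallery)
print(best)
```

With these made-up vectors the voice points in nearly the same direction as `face_b`, so that entry is selected. The sketch only covers the retrieval step; learning embeddings that make such comparisons meaningful is the part the thesis addresses.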
ISBN: 9798438749660
Subjects--Topical Terms:
516317
Artificial intelligence.
Subjects--Index Terms:
Associations
LDR :04008nmm a2200361 4500
001 2350409
005 20221020125756.5
008 241004s2022 ||||||||||||||||| ||eng d
020 $a 9798438749660
035 $a (MiAaPQ)AAI29162698
035 $a AAI29162698
040 $a MiAaPQ $c MiAaPQ
100 1 $a Wen, Yandong. $0 (orcid)0000-0001-6330-7438 $3 3689889
245 1 0 $a Reconstruction of Human Faces from Voice.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2022
300 $a 117 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-11, Section: B.
500 $a Includes supplementary digital materials.
500 $a Advisor: Singh, Rita.
502 $a Thesis (Ph.D.)--Carnegie Mellon University, 2022.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0041.
650 4 $a Artificial intelligence. $3 516317
650 4 $a Computer science. $3 523869
650 4 $a Physical anthropology. $3 518358
653 $a Associations
653 $a Faces
653 $a Voices
690 $a 0800
690 $a 0984
690 $a 0327
710 2 $a Carnegie Mellon University. $b Electrical and Computer Engineering. $3 2094139
773 0 $t Dissertations Abstracts International $g 83-11B.
790 $a 0041
791 $a Ph.D.
792 $a 2022
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29162698
Holdings:
Barcode: W9472847
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0