Multimodal Object Representation Learning in Haptic, Auditory, and Visual Domains.
Record type:
Bibliographic record (electronic resource) : Monograph/item
Title/Author:
Multimodal Object Representation Learning in Haptic, Auditory, and Visual Domains.
Author:
Heravi, Negin.
Extent:
1 online resource (117 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 84-05, Section: A.
Contained By:
Dissertations Abstracts International, 84-05A.
Subject:
Augmented reality.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29755715 (click for full text, PQDT)
ISBN:
9798357500373
Heravi, Negin.
Multimodal Object Representation Learning in Haptic, Auditory, and Visual Domains. - 1 online resource (117 pages)
Source: Dissertations Abstracts International, Volume: 84-05, Section: A.
Thesis (Ph.D.)--Stanford University, 2022.
Includes bibliographical references.
Humans frequently use all their senses to understand and interact with their environments. Our multi-modal mental priors of how objects and materials respond to physical interactions enable us to succeed in many of our everyday tasks. For example, to find a glass in the back of a dark, cluttered cabinet, we rely heavily on our senses of touch and hearing as well as our prior knowledge of how a glass feels and sounds. This observation about human behavior motivates us to develop effective ways of modeling the multi-modal signals of vision, haptics, and audio. Such models have applications in robotics as well as in Augmented and Virtual Reality (AR/VR).

Similar to humans, robots can benefit from the capability to infer and use multi-modal signals of vision, haptics, and audio in their manual tasks. For example, they too can take advantage of haptic and auditory signals where their visual perception fails in cluttered, dark environments, such as inside a kitchen cabinet, or during contact-rich manipulation tasks such as key insertion.

Given that our real-life experiences are multi-modal, effective AR/VR environments should be multi-modal as well. With the commercialization of several AR/VR devices over the past few decades, a variety of applications in areas such as e-commerce, gaming, education, and medicine have emerged. However, current AR/VR environments lack rich multi-modal sensory responses, which reduces their realism.

For a model to efficiently render appropriate multi-modal signals in response to user interactions, or for a robot to use high-dimensional sensory observations in a meaningful way, this data needs to be encoded in low-dimensional representations. This motivates us to develop effective ways of learning representations of these different modalities, which is a challenging goal. From a modeling standpoint, the visual cues of an object and its haptic and auditory feedback are heterogeneous, requiring domain-specific knowledge to design the appropriate perceptual modules for each. Furthermore, these representations should ideally be either task-agnostic or easily generalizable to new tasks and scenarios, since collecting a new dataset per task or object is expensive and impossible to scale. This motivates us to explore physically interpretable and object-aware representations. In this dissertation, we demonstrate how object-aware, learning-based representations can be used to learn appropriate representations in different modalities.

In the first part, we focus on the modality of touch and use deep-learning-based methods for haptic texture rendering. We present a learned action-conditional model for haptic textures that takes data from a vision-based tactile sensor (GelSight) and a user's action as input. This model predicts an induced acceleration that is used to provide haptic vibration feedback to a user, inducing the sensation of a virtual texture. We show that our model outperforms previous state-of-the-art methods through a quantitative comparison between the predicted and ground-truth signals. We further demonstrate the performance of our model for real-time haptic texture rendering, as well as its generalization to unseen textures, through human user studies.

In the second part of this thesis, we explore processing audio signals. We develop a fully differentiable model for rendering and identification of impact sounds, called DiffImpact.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2023.
Mode of access: World Wide Web.
ISBN: 9798357500373
Subjects--Topical Terms: Augmented reality.
Index Terms--Genre/Form: Electronic books.
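The abstract above describes an action-conditional haptic texture model: a tactile image from a vision-based sensor (GelSight) plus a user's action is mapped to a predicted acceleration signal that drives vibrotactile feedback. The short PyTorch sketch below illustrates only that input/output interface under stated assumptions; the class name HapticTextureModel, the choice of scan speed and normal force as the action encoding, and all layer sizes are hypothetical and are not the dissertation's actual architecture.

import torch
import torch.nn as nn

class HapticTextureModel(nn.Module):
    """Toy action-conditional texture model: a GelSight image plus a user
    action (assumed here to be [scan speed, normal force]) is mapped to a
    short window of predicted acceleration for vibrotactile rendering."""

    def __init__(self, action_dim: int = 2, out_samples: int = 100):
        super().__init__()
        # Small CNN encoder for the tactile image (sizes are illustrative only).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (batch, 32)
        )
        # MLP head conditioned on the user's action.
        self.head = nn.Sequential(
            nn.Linear(32 + action_dim, 128), nn.ReLU(),
            nn.Linear(128, out_samples),                    # predicted acceleration window
        )

    def forward(self, gelsight_image, action):
        texture_code = self.image_encoder(gelsight_image)   # low-dimensional texture representation
        return self.head(torch.cat([texture_code, action], dim=-1))

# Example call with dummy data: one 64x64 RGB GelSight frame and an
# action vector of [speed in m/s, normal force in N].
model = HapticTextureModel()
accel = model(torch.randn(1, 3, 64, 64), torch.tensor([[0.05, 1.5]]))
print(accel.shape)  # torch.Size([1, 100])

According to the abstract, a model of this kind is evaluated against ground-truth acceleration signals and through human user studies; the sketch fixes only the interface, not the training procedure or the dissertation's actual network design.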
Multimodal Object Representation Learning in Haptic, Auditory, and Visual Domains.
LDR  04786nmm a2200397K 4500
001  2362352
005  20231027104011.5
006  m o d
007  cr mn ---uuuuu
008  241011s2022 xx obm 000 0 eng d
020    $a 9798357500373
035    $a (MiAaPQ)AAI29755715
035    $a (MiAaPQ)STANFORDsj589ft0971
035    $a AAI29755715
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Heravi, Negin. $3 3703069
245 10 $a Multimodal Object Representation Learning in Haptic, Auditory, and Visual Domains.
264  0 $c 2022
300    $a 1 online resource (117 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 84-05, Section: A.
500    $a Advisor: Bohg, Jeannette; Okamura, Allison.
502    $a Thesis (Ph.D.)--Stanford University, 2022.
504    $a Includes bibliographical references.
520    $a
Humans frequently use all their senses to understand and interact with their environments. Our multi-modal mental priors of how objects and materials respond to physical interactions enable us to succeed in many of our everyday tasks. For example, to find a glass in the back of a dark, cluttered cabinet, we rely heavily on our senses of touch and hearing as well as our prior knowledge of how a glass feels and sounds. This observation about human behavior motivates us to develop effective ways of modeling the multi-modal signals of vision, haptics, and audio. Such models have applications in robotics as well as in Augmented and Virtual Reality (AR/VR). Similar to humans, robots can benefit from the capability to infer and use multi-modal signals of vision, haptics, and audio in their manual tasks. For example, they too can take advantage of haptic and auditory signals where their visual perception fails in cluttered, dark environments, such as inside a kitchen cabinet, or during contact-rich manipulation tasks such as key insertion. Given that our real-life experiences are multi-modal, effective AR/VR environments should be multi-modal as well. With the commercialization of several AR/VR devices over the past few decades, a variety of applications in areas such as e-commerce, gaming, education, and medicine have emerged. However, current AR/VR environments lack rich multi-modal sensory responses, which reduces their realism. For a model to efficiently render appropriate multi-modal signals in response to user interactions, or for a robot to use high-dimensional sensory observations in a meaningful way, this data needs to be encoded in low-dimensional representations. This motivates us to develop effective ways of learning representations of these different modalities, which is a challenging goal. From a modeling standpoint, the visual cues of an object and its haptic and auditory feedback are heterogeneous, requiring domain-specific knowledge to design the appropriate perceptual modules for each. Furthermore, these representations should ideally be either task-agnostic or easily generalizable to new tasks and scenarios, since collecting a new dataset per task or object is expensive and impossible to scale. This motivates us to explore physically interpretable and object-aware representations. In this dissertation, we demonstrate how object-aware, learning-based representations can be used to learn appropriate representations in different modalities. In the first part, we focus on the modality of touch and use deep-learning-based methods for haptic texture rendering. We present a learned action-conditional model for haptic textures that takes data from a vision-based tactile sensor (GelSight) and a user's action as input. This model predicts an induced acceleration that is used to provide haptic vibration feedback to a user, inducing the sensation of a virtual texture. We show that our model outperforms previous state-of-the-art methods through a quantitative comparison between the predicted and ground-truth signals. We further demonstrate the performance of our model for real-time haptic texture rendering, as well as its generalization to unseen textures, through human user studies. In the second part of this thesis, we explore processing audio signals. We develop a fully differentiable model for rendering and identification of impact sounds, called DiffImpact.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538    $a Mode of access: World Wide Web
650  4 $a Augmented reality. $3 1620831
650  4 $a Physics. $3 516296
650  4 $a Real time. $3 3562675
650  4 $a Neural networks. $3 677449
650  4 $a Sensors. $3 3549539
650  4 $a Robots. $3 529507
650  4 $a Storytelling. $3 535033
650  4 $a Localization. $3 3560711
650  4 $a Research & development--R&D. $3 3554335
650  4 $a Performance evaluation. $3 3562292
650  4 $a Feedback. $3 677181
650  4 $a Virtual reality. $3 527460
650  4 $a Sound. $3 542298
650  4 $a Robotics. $3 519753
650  4 $a Acoustics. $3 879105
650  4 $a Information technology. $3 532993
655  7 $a Electronic books. $2 lcsh $3 542853
690    $a 0771
690    $a 0605
690    $a 0986
690    $a 0800
690    $a 0505
690    $a 0489
690    $a 0338
710 2  $a ProQuest Information and Learning Co. $3 783688
710 2  $a Stanford University. $3 754827
773 0  $t Dissertations Abstracts International $g 84-05A.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29755715 $z click for full text (PQDT)
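For readers less familiar with MARC, the following plain-Python sketch shows how the tag / indicator / subfield layout of the record above can be read programmatically. The field values are copied from the record; the parse_field helper, its fixed-column input format, and the literal "$" subfield markers (as rendered in this display, rather than real ISO 2709 delimiters) are assumptions made for illustration only.

from dataclasses import dataclass, field

@dataclass
class MarcField:
    tag: str                                        # e.g. "245"
    indicators: str = "  "                          # two indicator characters
    subfields: dict = field(default_factory=dict)   # subfield code -> value

def parse_field(line: str) -> MarcField:
    """Parse one line of the reconstructed display, e.g.
    '245 10 $a Multimodal Object Representation Learning ...'."""
    tag, indicators, rest = line[:3], line[4:6], line[7:]
    subs = {}
    for chunk in rest.split("$")[1:]:               # each chunk starts with its subfield code
        code, _, value = chunk.partition(" ")
        subs[code] = value.strip()
    return MarcField(tag, indicators, subs)

record = [
    parse_field("100 1  $a Heravi, Negin. $3 3703069"),
    parse_field("245 10 $a Multimodal Object Representation Learning in Haptic, Auditory, and Visual Domains."),
    parse_field("856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29755715 $z click for full text (PQDT)"),
]

# Pull the title proper (245 $a) and the full-text link (856 $u).
title = next(f.subfields["a"] for f in record if f.tag == "245")
link = next(f.subfields["u"] for f in record if f.tag == "856")
print(title)
print(link)

A production system would work from the record's binary or MARCXML form with an established MARC library rather than from this textual display.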
Holdings (1 item):
Barcode: W9484708
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: General use (Normal)
Loan status: On shelf
Holds: 0