Saha, Rudra.
Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title proper/Author:
Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation./
Author:
Saha, Rudra.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2018
Extent:
79 p.
Notes:
Source: Masters Abstracts International, Volume: 80-06.
Contained By:
Masters Abstracts International 80-06.
Subject:
Computer Engineering.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10980267
ISBN:
9780438713840
Thesis (M.S.)--Arizona State University, 2018.
This item must not be sold to any third party vendors.
Multimodal representation learning is a multidisciplinary research field that aims to integrate information from multiple communicative modalities in a meaningful manner to help solve a downstream task. These modalities can be visual, acoustic, linguistic, haptic, etc. What counts as a 'meaningful integration of information from different modalities' remains modality- and task-dependent. The downstream task can range from understanding one modality in the presence of information from other modalities to translating input from one modality to another. This thesis investigates the utility of multimodal representation learning both for understanding one modality given corresponding information in other modalities, namely image understanding for visual reasoning, and for translating from one modality to another, specifically text-to-image translation.
Visual reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize, and recognize objects, regions, and their attributes in an image in order to comprehend the image itself. One way of building a visual reasoning system is to ask the system to answer questions about the image that require attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is thought to have a proper grasp of the image if it can answer such questions correctly and provide valid reasoning for its answers. This work investigates how such a system can be built by learning a multimodal representation between the image and the questions. It also demonstrates how background knowledge, specifically scene-graph information, can be incorporated into existing image understanding models when it is available.
Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also makes it possible to learn a shared representation between these varied modalities and to assign meaning to what this shared representation should capture. Using the surrogate task of text-to-image translation, this work investigates neural-network-based architectures for learning a shared representation between these two modalities. It also proposes that such a shared representation can capture parts of different modalities that are equivalent in some sense. Specifically, given an image and a semantic description of certain objects present in the image, it demonstrates a shared representation between the text and image modalities that captures the parts of the image mentioned in the text. This capability is showcased on a publicly available dataset.
ISBN: 9780438713840
Subjects--Topical Terms:
1567821
Computer Engineering.
LDR :03902nmm a2200325 4500
001 2209642
005 20191104073756.5
008 201008s2018 ||||||||||||||||| ||eng d
020 $a 9780438713840
035 $a (MiAaPQ)AAI10980267
035 $a (MiAaPQ)asu:18394
035 $a AAI10980267
040 $a MiAaPQ $c MiAaPQ
100 1 $a Saha, Rudra. $3 3436739
245 1 0 $a Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 79 p.
500 $a Source: Masters Abstracts International, Volume: 80-06.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Yang, Yezhou.
502 $a Thesis (M.S.)--Arizona State University, 2018.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0010.
650 4 $a Computer Engineering. $3 1567821
650 4 $a Artificial intelligence. $3 516317
690 $a 0464
690 $a 0800
710 2 $a Arizona State University. $b Computer Engineering. $3 3289092
773 0 $t Masters Abstracts International $g 80-06.
790 $a 0010
791 $a M.S.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10980267
Holdings (1 item):
Barcode: W9386191
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: General use (Normal)
Loan status: On shelf
Hold status: 0