Learning to Describe Images via Natural Language.
Record type: Bibliographic – Electronic resource : Monograph/item
Title/Author: Learning to Describe Images via Natural Language.
Author: Yan, An.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2024
Extent: 81 p.
Notes: Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
Contained by: Dissertations Abstracts International, 85-10B.
Subject: Computer science.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30994065
ISBN: 9798382223483
Thesis (Ph.D.)--University of California, San Diego, 2024.
This item must not be sold to any third party vendors.
Teaching machines to describe visual images is one of the longest-standing challenges in machine learning. This thesis tackles the problem of describing images via natural language: how to build machine learning models that read visual images, describe their content, and answer relevant questions. From an application perspective, strong image captioning systems can contribute to applications such as visual question answering, dialogue systems, and vision-based robotics. As a long-term goal, if we can build such systems (e.g., GPT-4V and beyond), they would be a crucial step towards Artificial General Intelligence: computers that can perceive and explore the world as humans do.
This thesis focuses on neural models: building vision understanding and language generation models with deep neural networks. It consists of three parts. First, we introduce concept bottleneck models, a class of models that build concept layers for visual understanding. We present our work on learning a concise concept space, and follow-up applications to medical imaging to gain robustness. In the second part of the thesis, we investigate how to build practical image captioning systems based on different neural text generation architectures, from LSTMs to transformers and pre-trained language models. In particular, we cover four tasks: 1) describing the visual difference between two images; 2) writing medical reports from chest X-rays to assist doctors; 3) generating personalized explanations for recommender systems; and 4) augmenting text generation with visual imagination produced by vision diffusion models. In the third part, we discuss recent advances, future directions, and open questions in this field, focusing on datasets, models, and applications. We also introduce some of our ongoing attempts in these directions: for example, navigating phone screens and completing mobile tasks with GPT-4V.
In summary, my research contributes to the field of vision and language, specifically visual understanding via natural language, from the aspects of data curation, algorithm design, model training, and various downstream applications.
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Machine learning
LDR    03500nmm a2200409 4500
001    2397077
005    20240617111732.5
006    m o d
007    cr#unu||||||||
008    251215s2024 ||||||||||||||||| ||eng d
020    $a 9798382223483
035    $a (MiAaPQ)AAI30994065
035    $a AAI30994065
040    $a MiAaPQ $c MiAaPQ
100 1  $a Yan, An. $3 1257371
245 10 $a Learning to Describe Images via Natural Language.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2024
300    $a 81 p.
500    $a Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
500    $a Advisor: McAuley, Julian.
502    $a Thesis (Ph.D.)--University of California, San Diego, 2024.
506    $a This item must not be sold to any third party vendors.
520    $a [Abstract; given in full above.]
590    $a School code: 0033.
650  4 $a Computer science. $3 523869
650  4 $a Systematic biology. $3 3173492
650  4 $a Medical imaging. $3 3172799
653    $a Machine learning
653    $a Visual images
653    $a Concept bottleneck models
653    $a Concise concept space
653    $a Diagnostic datasets
690    $a 0984
690    $a 0423
690    $a 0574
690    $a 0800
710 2  $a University of California, San Diego. $b Computer Science and Engineering. $3 1018473
773 0  $t Dissertations Abstracts International $g 85-10B.
790    $a 0033
791    $a Ph.D.
792    $a 2024
793    $a English
856 40 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30994065
Holdings (1 item):
Barcode: W9505397
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0