Food Image Retrieval and Generation.
Record type:
Bibliographic - electronic resource : Monograph/item
Title/Author:
Food Image Retrieval and Generation.
Author:
Han, Fangda.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2022
Extent:
118 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-08, Section: B.
Contained By:
Dissertations Abstracts International 83-08B.
Subject:
Computer science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28962934
ISBN:
9798790631016
Thesis (Ph.D.)--Rutgers The State University of New Jersey, School of Graduate Studies, 2022.
This item must not be sold to any third party vendors.
The domain of analysis and synthesis of food images is gaining increasing research interest due to its widespread applications in cooking and diet management. For instance, retrieval of food images from textual prompts can help speed up the cooking process. Likewise, extracting nutritional information from meal images can help monitor daily nutrient intake, facilitating diet management. The computational synthesis of photo-realistic food images is complementary to food image analysis. As an essential element of modeling the food image data, the synthesis also enables novel applications such as augmenting cooking instructions with multimedia content and gamification of the food creation process to promote healthy eating habits amongst children. This dissertation focuses on developing computational tools for food image retrieval and generation.

Our food image retrieval algorithm leverages the auxiliary information capturing relationships between related text-image pairs to regularize the latent space of food instructions and food images. Specifically, we develop a Coherence Aware Module (CAM) to augment the traditional text-to-image retrieval pipeline. The CAM is then trained to predict the auxiliary coherence relations that systematically characterize possible forms of relationship between related text-image pairs. Capturing these coherence relations has the effect of regularizing the learning of latent space embeddings of text-image pairs, resulting in accurate retrieval. Moreover, we show how CAM can be used to refine queries during inference using the process of Selective Similarity Refinement (SSR). Both CAM and SSR lead to significant performance improvements in general text-image retrieval systems.

Next, we develop a food image generation algorithm to generate images conditioned on multiple ingredients. First, we propose CookGAN, a novel extension of StackGANv2 with an explicit regularization in the form of Cycle Consistent Constraint (Cyc-constraint). Specifically, Cyc-constraint utilizes the pre-trained retrieval system discussed above to regularize the generation process and helps the model in generating images that more accurately reflect the desired content. However, CookGAN suffers from image blurring due to the limitation of model capacity. To address this problem, we propose Multi-ingredient Pizza Generator (MPG), an image synthesis approach that extends the StyleGAN2 architecture using a controllable conditioning input paradigm. Specifically, the control of ingredients relies on Scalewise Label Encoder (SLE) which helps the model to be strongly conditioned on the input ingredients while maintaining StyleGAN2's excellent image quality. To verify the efficacy of MPG, we validate it on Pizza10, which is a carefully annotated dataset of multi-ingredient pizza images. We show that MPG can successfully generate photo-realistic pizza images with the desired ingredients.

However, while MPG can generate content-specific food images, it cannot control other image variation factors, such as the pizza shape, scale, or position, which are not available in the training data. To solve this problem, we propose Multi-attribute Pizza Generator (MPG2), together with Multi-Scale Multi-Attribute Encoder (MSMAE) and Attribute Regularizer (AR), targeting control of both ingredients and geometric attributes. We propose a cross-domain training schema to synthesize pizza images with the view attributes absent in the training dataset. This schema combines fully controllable computer graphics generated images (CGIs) with the partially annotated real-world data. To this end, we employ a view attribute regressor estimated on the CGI data to regularize the real-world food image generation process, thereby bridging the real-world and CGI training domains.
ISBN: 9798790631016
Subjects--Topical Terms:
523869
Computer science.
Subjects--Index Terms:
Computer vision
LDR 04972nmm a2200373 4500
001 2351438
005 20221107085644.5
008 241004s2022 ||||||||||||||||| ||eng d
020 $a 9798790631016
035 $a (MiAaPQ)AAI28962934
035 $a AAI28962934
040 $a MiAaPQ $c MiAaPQ
100 1 $a Han, Fangda. $0 (orcid)0000-0002-8663-2185 $3 3691010
245 1 0 $a Food Image Retrieval and Generation.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2022
300 $a 118 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-08, Section: B.
500 $a Advisor: Pavlovic, Vladimir.
502 $a Thesis (Ph.D.)--Rutgers The State University of New Jersey, School of Graduate Studies, 2022.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0190.
650 4 $a Computer science. $3 523869
650 4 $a Food science. $3 3173303
650 4 $a Artificial intelligence. $3 516317
650 4 $a Information science. $3 554358
653 $a Computer vision
653 $a Deep learning
653 $a Generative model
653 $a Machine learning
690 $a 0984
690 $a 0723
690 $a 0800
690 $a 0359
710 2 $a Rutgers The State University of New Jersey, School of Graduate Studies. $b Computer Science. $3 3428998
773 0 $t Dissertations Abstracts International $g 83-08B.
790 $a 0190
791 $a Ph.D.
792 $a 2022
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28962934
Holdings:
Barcode: W9473876
Location: Electronic resource
Circulation category: 11.Online viewing_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Reservation status: 0