Learning and Composing Primitives for the Visual World.
Record type:
Bibliographic record - Electronic resource : Monograph/item
Title/Author:
Learning and Composing Primitives for the Visual World. / Gupta, Kamal.
Author:
Gupta, Kamal.
Extent:
1 online resource (176 pages)
Note:
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Contained By:
Dissertations Abstracts International, 84-12B.
Subject:
Computer science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30421606 (click for full text, PQDT)
ISBN:
9798379759858
Thesis (Ph.D.)--University of Maryland, College Park, 2023.
Includes bibliographical references
Compositionality is at the core of how humans understand and recreate the visual world. It is what allows us to express infinitely many concepts using finite primitives. For example, we understand images as combinations of objects and videos as compositions of actions, and we generate 3D animations by rendering 3D surfaces with textures, materials, and lighting. It is unsurprising that composition also appears in almost all human-created art forms, such as language, music, design, and even mathematics. Although compositionality seems an obvious and prevalent way for humans to consume and create data, it often eludes computational approaches such as deep learning. Current systems often assume the availability of exhaustively labeled concepts or primitives during training and fail to generalize to new compositions during inference. In this dissertation, we propose to discover compositional primitives from data with little to no supervision and show how these primitives can improve generalization in real-world applications such as classification, correspondence, and 2D/3D synthesis.
In the first half of this dissertation, I propose two complementary approaches to discovering compositional discrete primitives from visual data. Given a large collection of images without labels, I propose a generative and a contrastive way of recognizing discriminative parts of an image that are useful for visual recognition. In the generative approach, I take inspiration from Bayesian methods such as variational autoencoders to develop a system that can express images in the form of a discrete, language-like representation. In the contrastive approach, I play a referential game between two neural-network agents to learn meaningful discrete concepts from images. I further show applications of these approaches in image and video editing by learning a dense correspondence of primitives across images.
In the second half, I focus on learning how to compose primitives for both 2D and 3D visual data. By expressing scenes as assemblies of smaller parts, we can easily perform generation from scratch or from partial scenes as input. I present two works: one on composing multiple viewpoints to synthesize 3D objects, and another on composing bounding boxes or cuboids to generate scene layouts. I also review a work on discovering a data-driven ordering for traversing an image or a scene for composition. I show applications of these works in image/video compression as well as 2D and 3D content creation.
Electronic reproduction. Ann Arbor, Mich.: ProQuest, 2023.
Mode of access: World Wide Web.
ISBN: 9798379759858
Subjects--Topical Terms:
Computer science.
Subjects--Index Terms:
Computer graphics
Index Terms--Genre/Form:
Electronic books.
LDR
:04012nmm a2200433K 4500
001
2364803
005
20231212064424.5
006
m o d
007
cr mn ---uuuuu
008
241011s2023 xx obm 000 0 eng d
020
$a
9798379759858
035
$a
(MiAaPQ)AAI30421606
035
$a
AAI30421606
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Gupta, Kamal.
$3
1558612
245
1 0
$a
Learning and Composing Primitives for the Visual World.
264
0
$c
2023
300
$a
1 online resource (176 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
500
$a
Advisor: Shrivastava, Abhinav;Davis, Larry S.
502
$a
Thesis (Ph.D.)--University of Maryland, College Park, 2023.
504
$a
Includes bibliographical references
520
$a
Compositionality is at the core of how humans understand and recreate the visual world. It is what allows us to express infinitely many concepts using finite primitives. For example, we understand images as combinations of objects and videos as compositions of actions, and we generate 3D animations by rendering 3D surfaces with textures, materials, and lighting. It is unsurprising that composition also appears in almost all human-created art forms, such as language, music, design, and even mathematics. Although compositionality seems an obvious and prevalent way for humans to consume and create data, it often eludes computational approaches such as deep learning. Current systems often assume the availability of exhaustively labeled concepts or primitives during training and fail to generalize to new compositions during inference. In this dissertation, we propose to discover compositional primitives from data with little to no supervision and show how these primitives can improve generalization in real-world applications such as classification, correspondence, and 2D/3D synthesis.
In the first half of this dissertation, I propose two complementary approaches to discovering compositional discrete primitives from visual data. Given a large collection of images without labels, I propose a generative and a contrastive way of recognizing discriminative parts of an image that are useful for visual recognition. In the generative approach, I take inspiration from Bayesian methods such as variational autoencoders to develop a system that can express images in the form of a discrete, language-like representation. In the contrastive approach, I play a referential game between two neural-network agents to learn meaningful discrete concepts from images. I further show applications of these approaches in image and video editing by learning a dense correspondence of primitives across images.
In the second half, I focus on learning how to compose primitives for both 2D and 3D visual data. By expressing scenes as assemblies of smaller parts, we can easily perform generation from scratch or from partial scenes as input. I present two works: one on composing multiple viewpoints to synthesize 3D objects, and another on composing bounding boxes or cuboids to generate scene layouts. I also review a work on discovering a data-driven ordering for traversing an image or a scene for composition. I show applications of these works in image/video compression as well as 2D and 3D content creation.
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2023
538
$a
Mode of access: World Wide Web
650
4
$a
Computer science.
$3
523869
650
4
$a
Computer engineering.
$3
621879
650
4
$a
Information technology.
$3
532993
653
$a
Computer graphics
653
$a
Computer vision
653
$a
Deep learning
653
$a
Generative modeling
653
$a
Machine learning
653
$a
Natural language processing
653
$a
Visual data
655
7
$a
Electronic books.
$2
lcsh
$3
542853
690
$a
0984
690
$a
0489
690
$a
0464
690
$a
0800
710
2
$a
ProQuest Information and Learning Co.
$3
783688
710
2
$a
University of Maryland, College Park.
$b
Computer Science.
$3
1018451
773
0
$t
Dissertations Abstracts International
$g
84-12B.
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30421606
$z
click for full text (PQDT)
Holdings:
Barcode: W9487159
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0