Learning to Explore Knowledges from Cross Modality Data.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Learning to Explore Knowledges from Cross Modality Data.
Author:
Huang, Huaiyi.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2020
Extent:
125 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Contained By:
Dissertations Abstracts International 83-03B.
Subject:
Design.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28735928
ISBN:
9798535514826
Thesis (Ph.D.)--Hong Kong University of Science and Technology (Hong Kong), 2020.
Big data plays a crucial role in the success of deep learning, with its large scale and variety of data modalities. Modeling the relations among data from a particular domain is often the key to understanding that domain. In this thesis, we construct two large datasets with multi-faceted information, and explore and leverage relations in cross-modality data for different tasks. In addition, we design a generic framework for probabilistic inference that can handle any combination of inputs and outputs.
In the first part of this thesis, we construct a large place dataset and aim to understand places from different aspects. Place is an important element in visual understanding. Given a photo of a building, people can often tell its functionality (e.g. a restaurant or a shop), its cultural style (e.g. Asian or European), and its economic type (e.g. industry-oriented or tourism-oriented). While place recognition has been widely studied in previous work, comprehensive place understanding remains a long way off: it goes far beyond categorizing a place from an image and requires information on multiple aspects. In this work, we contribute Placepedia, a large-scale place dataset with more than 35M photos from 240K unique places. Besides the photos, each place also comes with extensive multi-faceted information (e.g. GDP and population) and labels at multiple levels, including function, city, and country. This dataset, with its large amount of data and rich annotations, allows various studies to be conducted. In particular, in our studies we develop 1) PlaceNet, a unified framework for multi-level place recognition, and 2) a method for city embedding, which produces a vector representation for a city that captures both visual and multi-faceted side information. Such studies not only reveal the key challenges in place understanding, but also allow us to establish connections between visual observations and the underlying socioeconomic or cultural implications.
In the second part of this thesis, we build a news dataset and explore background knowledge from news text to help face recognition in news photos. Despite remarkable progress in recent years, visual recognition remains challenging in the wild due to the inherent ambiguities of visual cues. Contextual information has recently been actively exploited to assist recognition. However, such efforts are subject to an important limitation: the relational models usually need to be learned from a well-annotated dataset. This limitation makes it difficult to apply such methods to large-scale real-world applications, e.g. open online services. To move beyond this limitation, we explore the possibility of acquiring reliable background knowledge from massive unannotated open data and leveraging it to help visual recognition. Hence, we build a news dataset called NewsNet, which contains nearly 6 million pieces of news together with 9 million news photos. This dataset, with its large amount of data in both the visual and textual domains, the inherent connections between the text and the photos, and the rich information contained in the news, allows various studies to be conducted at large scale. In this paper, we present a study in which we discover relations between people from the news and leverage them for face recognition, obtaining encouraging results. This is just a taste of NewsNet, but it already shows the dataset's strong potential.
In the third part of this thesis, we propose a new general-purpose framework for probabilistic inference that combines the versatility of probabilistic graphical models with the expressive power of deep neural networks. Specifically, graphical models have long been plagued by suboptimal accuracy, overly slow learning procedures, and weak scalability, while deep neural networks are often tailored to specific tasks. In this work, we explore a generic design of inference networks, which can respond to queries with different combinations of inputs and outputs via a set of shared inference units. On top of this formulation, we also derive a self-supervised learning algorithm with a new learning objective, the expected query loss, which directly aligns with the inference tasks. The proposed framework provides a unique combination of versatility, efficiency, and accuracy, which distinguishes it from existing methods: (1) Unlike typical deep neural networks, a network from our method is trained only once yet can respond to arbitrary queries. (2) On large real-world datasets, our method substantially reduces both training and inference time. (3) It also brings a remarkable improvement in inference accuracy, reducing perplexity by over 80% on a number of tasks.
Subjects--Topical Terms:
518875
Design.
Subjects--Index Terms:
Knowledge exploration
LDR 05866nmm a2200361 4500
001 2346037
005 20220613064842.5
008 241004s2020 ||||||||||||||||| ||eng d
020 $a 9798535514826
035 $a (MiAaPQ)AAI28735928
035 $a AAI28735928
040 $a MiAaPQ $c MiAaPQ
100 1 $a Huang, Huaiyi. $3 3685070
245 1 0 $a Learning to Explore Knowledges from Cross Modality Data.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 125 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
500 $a Advisor: Dahua, Lin.
502 $a Thesis (Ph.D.)--Hong Kong University of Science and Technology (Hong Kong), 2020.
590 $a School code: 1223.
650 4 $a Design. $3 518875
650 4 $a Information science. $3 554358
650 4 $a Multimedia communications. $3 590562
650 4 $a Artificial intelligence. $3 516317
650 4 $a Information technology. $3 532993
650 4 $a Socioeconomic factors. $3 3435444
650 4 $a Accuracy. $3 3559958
650 4 $a Deep learning. $3 3554982
650 4 $a Photographs. $3 627415
650 4 $a Culture. $3 517003
650 4 $a Datasets. $3 3541416
650 4 $a Boxes. $3 3564918
650 4 $a Actors. $3 641271
650 4 $a Opening hours. $3 3685071
650 4 $a Genre. $3 2191767
650 4 $a Power. $3 518736
650 4 $a Experiments. $3 525909
650 4 $a Knowledge. $3 872758
650 4 $a Neural networks. $3 677449
650 4 $a Variables. $3 3548259
650 4 $a Algorithms. $3 536374
650 4 $a Annotations. $3 3561780
650 4 $a Queries. $3 3564462
650 4 $a Cities. $3 3544022
653 $a Knowledge exploration
653 $a Cross modality data
653 $a Deep learning
690 $a 0389
690 $a 0558
690 $a 0723
690 $a 0489
690 $a 0800
710 2 $a Hong Kong University of Science and Technology (Hong Kong). $3 1022235
773 0 $t Dissertations Abstracts International $g 83-03B.
790 $a 1223
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28735928
Holdings:
Barcode: W9468475
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0