Dong Hwa University Library (東華大學圖書館)

Regularized Deep Network Learning for Multi-Label Visual Recognition.

Record type:	Bibliographic record - electronic resource : Monograph/item
Title/Author:	Regularized Deep Network Learning for Multi-Label Visual Recognition / Guo, Hao.
Author:	Guo, Hao.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2021
Pagination:	135 p.
Notes:	Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Contained By:	Dissertations Abstracts International, 83-03B.
Subject:	Computer science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28412475
ISBN:	9798538155446

Thesis (Ph.D.)--University of South Carolina, 2021.

This item must not be sold to any third party vendors.

This dissertation focuses on multi-label visual recognition, a fundamental task in computer vision. The task is to determine which of multiple visual classes are present in an input image, where the visual classes, such as objects, scenes, and attributes, are usually defined as image labels. Owing to the success of deep networks, this task has been widely studied and significantly improved in recent years. However, it remains challenging because of the appearance complexity of multiple visual contents co-occurring in one image. This research explores ways to regularize deep network learning for multi-label visual recognition.

First, an attention concentration method is proposed to refine deep network learning for human attribute recognition, a challenging instance of multi-label visual recognition. Here the visual attention of deep networks, expressed as attention maps, imitates human attention in visual recognition. Derived by the deep network with only label-level supervision, attention maps highlight, in an interpretable way, the regions that contribute most to the final network prediction. Based on the observation that human attributes are usually depicted by local image regions, the added attention concentration enhances deep network learning for human attribute recognition by forcing the recognition to rely on compact attribute-relevant regions.

Second, inspired by the consistent relevance between a visual class and an image region, an attention consistency strategy is explored and enforced during deep network learning for human attribute recognition. Specifically, two kinds of attention consistency are studied: equivariance under spatial transforms, such as flipping, scaling, and rotation, and invariance between different networks recognizing the same attribute from the same image. These two kinds of attention consistency are formulated as a unified attention consistency loss and combined with the traditional classification loss for network learning. Experiments on public datasets verify its effectiveness, achieving new state-of-the-art performance for human attribute recognition.

Finally, to address the long-tailed category distribution of multi-label visual recognition, collaborative learning between uniform and re-balanced sampling is proposed to regularize network training. Uniform sampling leads to relatively low performance on tail classes; re-balanced sampling can improve performance on tail classes but may hurt performance on head classes because of label co-occurrence. This research proposes a new approach that trains on both class-biased samplings collaboratively, improving performance for both head and tail classes. Based on a two-branch network that takes the uniform sampling and the re-balanced sampling as inputs, respectively, a cross-branch loss enforces consistency when the same input goes through the two branches. The experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art methods on long-tailed multi-label visual recognition.
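
The attention consistency idea described in the abstract (equivariance of attention maps under a spatial transform, combined with a classification loss) can be illustrated with a minimal sketch. This record does not give the dissertation's actual formulation, so everything here is an assumption made for illustration: attention maps are plain 2-D lists, the only transform shown is a horizontal flip, the penalty is a mean squared difference, and the names `hflip`, `flip_consistency_loss`, and `total_loss` are hypothetical.

```python
from typing import List

# One attention map: rows x columns of non-negative scores (illustrative type).
Map = List[List[float]]


def hflip(m: Map) -> Map:
    """Mirror a 2-D map left to right."""
    return [row[::-1] for row in m]


def mean_squared_diff(a: Map, b: Map) -> float:
    """Mean squared difference between two non-empty maps of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    if not diffs:
        raise ValueError("maps must not be empty")
    return sum(diffs) / len(diffs)


def flip_consistency_loss(attn_original: Map, attn_of_flipped: Map) -> float:
    """Equivariance penalty (assumed form, not the author's).

    If attention is equivariant to flipping, the map computed on the flipped
    image should equal the flipped map of the original image.
    """
    return mean_squared_diff(hflip(attn_original), attn_of_flipped)


def total_loss(classification_loss: float, consistency_loss: float,
               weight: float = 1.0) -> float:
    """Classification loss plus a weighted consistency term (weight is assumed)."""
    return classification_loss + weight * consistency_loss


# Toy 2x2 attention map that peaks in the right-hand column.
attn = [[0.0, 1.0],
        [0.0, 0.5]]

# Perfectly equivariant case: the flipped image yields the mirrored map.
loss_equivariant = flip_consistency_loss(attn, hflip(attn))

# Non-equivariant case: the network attends to the same pixels after the flip.
loss_violated = flip_consistency_loss(attn, attn)
```

In this toy setting the penalty is zero when the attention map moves with the image and positive when it does not, which is the behavior a consistency regularizer of this kind is meant to encourage. The same `mean_squared_diff` helper could compare maps from two different networks for the invariance case, again only as an assumed stand-in for the unified loss the abstract mentions.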

ISBN: 9798538155446

Subjects--Topical Terms:
Computer science. (523869)

Subjects--Index Terms:
Computer vision

Regularized Deep Network Learning for Multi-Label Visual Recognition.
LDR:04274nmm a2200325 4500
001 2348574
005 20220912135609.5
008 241004s2021 ||||||||||||||||| ||eng d
020 $a 9798538155446
035 $a (MiAaPQ)AAI28412475
035 $a AAI28412475
040 $a MiAaPQ $c MiAaPQ
100 1 $a Guo, Hao. $3 3687938
245 1 0 $a Regularized Deep Network Learning for Multi-Label Visual Recognition.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 135 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
500 $a Advisor: Wang, Song.
502 $a Thesis (Ph.D.)--University of South Carolina, 2021.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0202.
650 4 $a Computer science. $3 523869
650 4 $a Dissertations & theses. $3 3560115
650 4 $a Maps. $3 544078
650 4 $a Hair. $3 823182
650 4 $a Methods. $3 3560391
650 4 $a Datasets. $3 3541416
650 4 $a Collaborative learning. $3 3543645
650 4 $a Experiments. $3 525909
650 4 $a Neural networks. $3 677449
650 4 $a Semantics. $3 520060
650 4 $a Classification. $3 595585
653 $a Computer vision
653 $a Deep network learning
653 $a Human attribute recognition
690 $a 0984
710 2 $a University of South Carolina. $b Computer Science & Engineering. $3 1024028
773 0 $t Dissertations Abstracts International $g 83-03B.
790 $a 0202
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28412475