National Dong Hwa University Library

Object Localization with Deep Learning Techniques.

Record type:	Bibliographic - electronic resource : Monograph/item
Title proper/Author:	Object Localization with Deep Learning Techniques.
Author:	Li, Siyang.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2018.
Physical description:	129 p.
Note:	Source: Dissertations Abstracts International, Volume: 80-08, Section: A.
Contained By:	Dissertations Abstracts International 80-08A.
Subject:	Electrical engineering.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=11017104


Object Localization with Deep Learning Techniques. - Ann Arbor : ProQuest Dissertations & Theses, 2018 - 129 p.

Source: Dissertations Abstracts International, Volume: 80-08, Section: A.

Thesis (Ph.D.)--University of Southern California, 2018.

This item must not be sold to any third party vendors.

Object localization is a crucial step for computers to understand an image. An object localizer typically takes an image as input and outputs the bounding boxes of objects. Some applications require finer localization, namely delineating the shape of objects, which is called "object segmentation". In this dissertation, three problems related to object localization are studied: 1) improving the accuracy of object proposals, 2) reducing the labeling effort for object detector training, and 3) segmenting moving objects in videos.

Object proposal generation has been an important pre-processing step for object detectors in general and for convolutional neural network (CNN) detectors in particular. However, some object proposal methods suffer from the "localization bias" problem, in which recall drops rapidly as the required localization accuracy increases. Since contours offer a powerful cue for accurate localization, we propose a box refinement method that, for each initial bounding box, searches for the optimal contour, i.e., the one that minimizes a contour cost. The box is then aligned with this contour. Experiments on the PASCAL VOC2007 test set show that our box refinement method can significantly improve object recall at a high overlap threshold while maintaining similar recall at a loose one. Given 1000 proposals, integrating our box refinement process increases the average recall of several existing methods by more than 5%.

The second research problem is motivated by the fact that CNN-based object detectors usually require a large number of accurately annotated object bounding boxes. In contrast, image-level labels are much cheaper to obtain. Thus, we supervise the detectors with image-level labels only. A common drawback of this training setting is that the detector tends to output bounding boxes of discriminative object parts (e.g., a box around a cat's face).
To address this challenge, we incorporate object segmentation into detector training, which guides the model to localize full objects correctly. We propose the multiple instance curriculum learning (MICL) method, which injects curriculum learning (CL) into the multiple instance learning (MIL) framework. The MICL method starts by automatically selecting easy training examples, in which the extent of the segmentation mask agrees with the detection bounding box. The training set is then gradually expanded to include harder examples, producing strong detectors that can handle complex images. With segmentation in the loop, the proposed MICL method outperforms state-of-the-art weakly supervised object detectors by a substantial margin on the PASCAL VOC datasets.

In the third part, we propose a method for unsupervised video object segmentation that transfers the knowledge encapsulated in image-based instance embedding networks. The instance embedding network produces an embedding vector for each pixel, which makes it possible to identify all pixels belonging to the same object. Although trained on static images, the instance embeddings are stable over consecutive video frames. To reduce false positives from static objects, a motion-based bilateral network is trained to estimate the background; its output is then integrated with the instance embeddings into a graph. We classify the graph nodes by defining and minimizing a cost function, and segment the video frames based on the node labels. The proposed method outperforms previous state-of-the-art unsupervised video object segmentation methods on several benchmark datasets.

Subjects--Topical Terms:

Electrical engineering.
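The abstract's "localization bias" and "average recall" can be made concrete with a minimal Python sketch. It computes proposal recall as a function of the overlap threshold. Matching by intersection over union (IoU) and averaging over the thresholds 0.50, 0.55, ..., 0.95 are assumptions based on a common convention. The record does not state the dissertation's exact evaluation protocol. The sketch also does not implement the dissertation's contour-based box refinement. The helper names `iou`, `recall_at` and `average_recall` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(gts: Sequence[Box], proposals: Sequence[Box], thr: float) -> float:
    """Fraction of ground-truth boxes matched by at least one proposal with IoU >= thr."""
    if not gts:
        return 0.0
    hits = sum(1 for g in gts if any(iou(g, p) >= thr for p in proposals))
    return hits / len(gts)


def average_recall(
    gts: Sequence[Box],
    proposals: Sequence[Box],
    thresholds: Optional[List[float]] = None,
) -> float:
    """Recall averaged over IoU thresholds (assumed grid: 0.50, 0.55, ..., 0.95)."""
    if thresholds is None:
        thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(recall_at(gts, proposals, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0)]
    loose = [(1.0, 1.0, 11.0, 11.0)]  # IoU about 0.68 with the ground truth
    tight = [(0.0, 0.0, 10.0, 11.0)]  # IoU about 0.91 with the ground truth
    print(recall_at(gts, loose, 0.5), recall_at(gts, loose, 0.7))  # 1.0 0.0
    print(average_recall(gts, loose))  # 0.4
    print(average_recall(gts, tight))  # 0.9
```

The loosely placed proposal counts as a hit at IoU 0.5 but is lost from 0.7 upward. This is the rapid recall drop the abstract calls localization bias. A better-aligned box raises the averaged recall even though recall at the loose threshold is unchanged.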

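The abstract describes MICL as first picking easy examples, where the segmentation mask's extent agrees with the detection box, and then gradually adding harder ones. The sketch below illustrates only that selection schedule. It makes two hypothetical choices: agreement is scored as the IoU between the mask's tight bounding box and the detector's box, and the thresholds follow a fixed decreasing schedule. It is not the dissertation's implementation and omits the MIL detector training itself. The names `mask_extent` and `curriculum_stages` are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Mask = List[List[int]]                   # binary mask indexed as mask[y][x]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_extent(mask: Mask) -> Optional[Box]:
    """Tight box (x1, y1, x2, y2) around the foreground pixels, or None for an empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def curriculum_stages(
    examples: Dict[str, Tuple[Mask, Box]],
    thresholds: Sequence[float] = (0.7, 0.5, 0.0),  # hypothetical easy-to-hard schedule
) -> List[List[str]]:
    """Return the example ids used at each stage, highest mask/box agreement first.

    Thresholds decrease, so every stage contains all examples of the previous one.
    """
    agreement = {}
    for eid, (mask, box) in examples.items():
        extent = mask_extent(mask)
        agreement[eid] = iou(extent, box) if extent is not None else 0.0
    return [sorted(e for e, a in agreement.items() if a >= t) for t in thresholds]


if __name__ == "__main__":
    full = [[1] * 4 for _ in range(4)]  # object fills a 4x4 image
    examples = {
        "a": (full, (0, 0, 4, 4)),  # box covers the whole object: easy
        "b": (full, (0, 0, 2, 2)),  # box covers only a part: hard
        "c": (full, (0, 0, 3, 3)),  # partial agreement: medium
    }
    print(curriculum_stages(examples))  # [['a'], ['a', 'c'], ['a', 'b', 'c']]
```

Lowering the threshold stage by stage makes the selected set grow monotonically. This mirrors the gradual expansion of the training set described in the abstract. In practice, a detector would be retrained or fine-tuned on each stage before moving to the next.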
LDR 04551nmm a2200301 4500
001 2207852
005 20190923114242.5
008 201008s2018 ||||||||||||||||| ||eng d
035 $a (MiAaPQ)AAI11017104
035 $a (MiAaPQ)US_Calif_46749
035 $a AAI11017104
040 $a MiAaPQ $c MiAaPQ
100 1 $a Li, Siyang. $3 3434853
245 1 0 $a Object Localization with Deep Learning Techniques.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 129 p.
500 $a Source: Dissertations Abstracts International, Volume: 80-08, Section: A.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Kuo, C-C Jay.
502 $a Thesis (Ph.D.)--University of Southern California, 2018.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0208.
650 4 $a Electrical engineering. $3 649834
690 $a 0544
710 2 $a University of Southern California. $3 700129
773 0 $t Dissertations Abstracts International $g 80-08A.
790 $a 0208
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=11017104