Learning Single-Image 3D from the Internet.
Record type:
Bibliographic, electronic resource : Monograph/item
Title/Author:
Learning Single-Image 3D from the Internet.
Author:
Chen, Weifeng.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2020.
Description:
106 p.
Notes:
Source: Dissertations Abstracts International, Volume: 82-07, Section: B.
Contained by:
Dissertations Abstracts International, 82-07B.
Subject:
Computer science.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28240124
ISBN:
9798684623592
LDR
:04899nmm a2200457 4500
001
2281891
005
20210927083422.5
008
220723s2020 ||||||||||||||||| ||eng d
020
$a
9798684623592
035
$a
(MiAaPQ)AAI28240124
035
$a
(MiAaPQ)umichrackham003382
035
$a
AAI28240124
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Chen, Weifeng.
$3
3220195
245
1 0
$a
Learning Single-Image 3D from the Internet.
260
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2020
300
$a
106 p.
500
$a
Source: Dissertations Abstracts International, Volume: 82-07, Section: B.
500
$a
Advisor: Deng, Jia; Fouhey, David Ford.
502
$a
Thesis (Ph.D.)--University of Michigan, 2020.
506
$a
This item must not be sold to any third party vendors.
506
$a
This item must not be added to any third party search indexes.
520
$a
Single-image 3D refers to the task of recovering 3D properties such as depth and surface normals from an RGB image. It is one of the fundamental problems in computer vision, and its progress has the potential to bring major advances to various other fields in vision. Although significant progress has been made in this field, the best current systems still struggle to perform well on arbitrary images "in the wild", i.e., images that depict all kinds of contents and scenes. One major obstacle is the lack of diverse training data. This dissertation makes contributions towards solving the data issue by extracting 3D supervision from the Internet, and by proposing novel algorithms that learn from Internet 3D to significantly advance single-view 3D perception.

First, we have constructed "Depth in the Wild" (DIW), a depth dataset consisting of 0.5 million diverse images. Each image is manually annotated with randomly sampled points and their relative depth. After benchmarking state-of-the-art single-view 3D systems on DIW, we found that even though current methods perform well on existing datasets, they perform poorly on images in the wild. We then propose a novel algorithm that learns to estimate depth using annotations of relative depth. Compared to the state of the art, our algorithm is simpler and performs better. Experiments show that our algorithm, combined with existing RGB-D data and our new relative depth annotations, significantly improves single-image depth perception in the wild. (A minimal code sketch of this relative-depth ranking idea appears after the record below.)

Second, we have constructed "Surface Normals in the Wild" (SNOW), a dataset with 60K Internet images, each manually annotated with the surface normal for one randomly sampled point. We explore advancing depth perception in the wild using surface normals as supervision. To train networks with surface normal annotations, we propose two novel losses, one that emphasizes depth accuracy and another that emphasizes surface normal accuracy. Experiments show that our approach significantly improves the quality of depth estimation in the wild.

Third, we have constructed "Open Annotations of Single-Image Surfaces" (OASIS), a large-scale dataset for single-image 3D in the wild. It consists of pixel-wise reconstructions of 3D surfaces for 140K randomly sampled Internet images. Six types of 3D properties are manually annotated for each image: occlusion boundary (depth discontinuity), fold boundary (normal discontinuity), surface normal, relative depth, relative normal (orthogonal, parallel, or neither), and planarity (planar or not). The rich annotations of human 3D perception in OASIS open up new research opportunities on a spectrum of single-image 3D tasks -- they provide in-the-wild ground truths either for the first time, or at a much larger scale than prior work. By benchmarking leading deep learning models on a variety of 3D tasks, we observe large room for performance improvement, pointing to ample research opportunities for designing new learning algorithms for single-image 3D.

Finally, we have constructed "YouTube3D", a large-scale dataset with relative depth annotations for 795K images spanning 121K videos. YouTube3D is collected fully automatically with a pipeline based on Structure from Motion (SfM). The key component is a novel Quality Assessment Network that identifies high-quality reconstructions obtained from SfM; it successfully eliminates erroneous reconstructions to guarantee data quality. Experiments demonstrate that YouTube3D is useful in advancing single-view depth estimation in the wild.
590
$a
School code: 0127.
650
4
$a
Computer science.
$3
523869
650
4
$a
Artificial intelligence.
$3
516317
650
4
$a
Design.
$3
518875
650
4
$a
Web studies.
$3
2122754
650
4
$a
Systems science.
$3
3168411
650
4
$a
Information technology.
$3
532993
650
4
$a
Optics.
$3
517925
653
$a
3D reconstruction
653
$a
Internet
653
$a
Single-image 3D
653
$a
RGB image
653
$a
3D systems
653
$a
YouTube3D
690
$a
0984
690
$a
0790
690
$a
0389
690
$a
0752
690
$a
0489
690
$a
0800
690
$a
0646
710
2
$a
University of Michigan.
$b
Computer Science & Engineering.
$3
3285590
773
0
$t
Dissertations Abstracts International
$g
82-07B.
790
$a
0127
791
$a
Ph.D.
792
$a
2020
793
$a
English
856
4 0
$u
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28240124
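As a companion to the abstract in field 520 above: the sketch below illustrates, in PyTorch, how relative-depth annotations of the kind DIW provides can be turned into a training signal through a pairwise ranking loss. The function name, tensor shapes, and the logistic form of the loss are illustrative assumptions, not the dissertation's exact formulation.

import torch

def relative_depth_loss(pred, annotations):
    """Pairwise ranking loss over ordinal (relative) depth annotations.

    pred:        (H, W) tensor of predicted depths.
    annotations: iterable of (i1, j1, i2, j2, r) tuples, where r = +1 if
                 point 1 is annotated as farther than point 2, -1 if it is
                 closer, and 0 if the two points are roughly equal in depth.
    (All names and shapes here are assumptions for illustration.)
    """
    losses = []
    for i1, j1, i2, j2, r in annotations:
        d = pred[i1, j1] - pred[i2, j2]   # predicted depth difference
        if r == 0:
            losses.append(d ** 2)         # pull equal-depth pairs together
        else:
            # Logistic ranking term: small when the predicted ordering agrees
            # with the human annotation, growing as the model disagrees.
            losses.append(torch.log1p(torch.exp(-r * d)))
    return torch.stack(losses).mean()

# Usage: one annotation saying pixel (10, 20) is farther than pixel (30, 40).
pred = torch.rand(240, 320, requires_grad=True)
loss = relative_depth_loss(pred, [(10, 20, 30, 40, +1)])
loss.backward()

Note that only the ordering of the two predicted depths is supervised, which is what makes ordinal annotations usable without metric ground truth; a margin-based hinge would be an equally plausible choice for the ranking term.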
Holdings:
Barcode: W9433624
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0