Deep Learning for Scene Perception and Understanding.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title proper / author:
Deep Learning for Scene Perception and Understanding./
Author:
Wang, Jun.
Extent:
1 online resource (132 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Contained By:
Dissertations Abstracts International 84-12B.
Subject:
Electrical engineering.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30421569 click for full text (PQDT)
ISBN:
9798379752811
Thesis (Ph.D.)--University of Maryland, College Park, 2023.
Includes bibliographical references
The ability to accurately perceive objects and capture motion information from the environment is crucial in many real-world applications, including autonomous driving, augmented reality, and robotics. This dissertation focuses on some fundamental challenges in scene perception, scene understanding, and learning-based autonomous systems. We first address the problem of developing a good representation of 3D sensor data for solving scene perception tasks. We start by focusing on learning how to explore the environment of a 3D perception system, including accurately perceiving objects and understanding the motion of dynamic objects. For example, it is critical for robotic agents to develop a good understanding of the objects in their environment. We investigate and tackle this problem through different computer vision tasks using a variety of input data. Compared with images, 3D point clouds provide reliable depth and precise geometric information; however, they are generally sparse, with varying densities. To handle these challenges, we present a number of methods for efficient object detection and motion learning on large-scale LiDAR point cloud data. In the first part, we consider the problem of 3D point cloud density, a characteristic that has not been well explored for the task of 3D object detection. Our proposed InfoFocus method improves detection by adaptively refining features, guided by point cloud density information, in an end-to-end manner. Inspired by the success of transformer-based architectures in a variety of computer vision tasks, we then present another method, M3DETR, which unifies multiple point cloud representations and feature scales and simultaneously models mutual relationships between point clouds using transformers for 3D object detection. We also consider the problem of understanding dynamic 3D environments and identifying the motion information of objects, which is critical for 3D perception.
In the third part, we focus on a temporal sequence of 3D point clouds to extract point-wise motion information. Specifically, we propose a point-based spatiotemporal pyramid architecture, PointMotionNet, which handles multiple frames and large-scale scenes, avoids discretization, and explicitly learns from the temporal ordering. We note that a deeper, holistic understanding of the environment is important for navigating safely through complex traffic scenarios. Besides accurately classifying and locating objects and predicting their behaviors, it is crucial for an autonomous system to understand the traffic rules of the road, for example by spotting traffic signals or temporary road signs. The long-term goal is to build a perception system that can reason about the environment and adaptively make plans under uncertainty in real time. To reason and make real-time adjustments, the system needs to be able to develop a good understanding of road sign information. Here we address the task of Text-VQA, which aims at answering questions that require understanding the textual cues in an image. In the fourth part of the thesis, we develop a method to generate high-quality and rich question-answer (QA) pairs by explicitly utilizing the existing rich text available in the scene context of the input image. The proposed architecture, TAG, exploits underexplored scene text information and enhances the scene understanding of Text-VQA models by producing meaningful and accurate QA samples using a multimodal transformer. This method has the potential to be applied to identifying challenging traffic situations that autonomous vehicles will encounter on roads, such as traffic signs (stop/speed limit), one-way streets, or evolving street conditions, including road closures and construction zones.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2023
Mode of access: World Wide Web
ISBN: 9798379752811
Subjects--Topical Terms: 649834 Electrical engineering.
Subjects--Index Terms: Deep learning
Index Terms--Genre/Form: 542853 Electronic books.
LDR 05174nmm a2200397K 4500
001 2360087
005 20230925052821.5
006 m o d
007 cr mn ---uuuuu
008 241011s2023 xx obm 000 0 eng d
020 $a 9798379752811
035 $a (MiAaPQ)AAI30421569
035 $a AAI30421569
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Wang, Jun. $3 892864
245 1 0 $a Deep Learning for Scene Perception and Understanding.
264 0 $c 2023
300 $a 1 online resource (132 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
500 $a Advisor: Davis, Larry S.; JaJa, Joseph F.
502 $a Thesis (Ph.D.)--University of Maryland, College Park, 2023.
504 $a Includes bibliographical references
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538 $a Mode of access: World Wide Web
650 4 $a Electrical engineering. $3 649834
650 4 $a Computer engineering. $3 621879
653 $a Deep learning
653 $a Scene perception
653 $a Capture motion information
653 $a 3D perception system
653 $a Sensor data
655 7 $a Electronic books. $2 lcsh $3 542853
690 $a 0544
690 $a 0800
690 $a 0464
710 2 $a ProQuest Information and Learning Co. $3 783688
710 2 $a University of Maryland, College Park. $b Electrical Engineering. $3 1018746
773 0 $t Dissertations Abstracts International $g 84-12B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30421569 $z click for full text (PQDT)
Holdings (1 item):
Barcode: W9482443
Location: Electronic resources
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0