Video representation for fine-grained action recognition.
Record type:
Bibliographic record - Electronic resource : Monograph/item
Title/Author:
Video representation for fine-grained action recognition.
Author:
Zhou, Yang.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2016.
Pagination:
124 p.
Notes:
Source: Dissertations Abstracts International, Volume: 78-05, Section: B.
Contained By:
Dissertations Abstracts International, 78-05B.
Subject:
Computer science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10151035
ISBN:
9781369057997
Abstract:
Fine-grained action analysis has recently attracted considerable research interest because of its potential applications in smart homes, medical surveillance, daily-living assistance, and child/elderly care, where action videos are captured indoors with fixed cameras. Although background motion, one of the main challenges in general action recognition, is more controlled in these settings, fine-grained action recognition remains very challenging due to large intra-class variability, small inter-class variability, a large variety of action categories, complex motions, and complicated interactions. Fine-grained actions, especially manipulation sequences, involve many interactions between hands and objects, so modeling the interactions between human hands and objects (i.e., context) plays an important role in action representation and recognition. We propose to discover the objects manipulated by humans by modeling which objects are being manipulated and how they are being operated.

First, we propose a representation and classification pipeline that seamlessly incorporates localized semantic information into every processing step for fine-grained action recognition. In the feature extraction stage, we explore the geometric information between local motion features and the surrounding objects. In the feature encoding stage, we develop a semantic-grouped locality-constrained linear coding (SG-LLC) method that captures the joint distribution between motion and object-in-use information. Finally, we propose a semantic-aware multiple kernel learning framework (SA-MKL) that utilizes the empirical joint distribution between action and object type for more discriminative action classification. This approach can discover and model the interactions between humans and objects. However, it requires detailed knowledge of pre-detected objects (e.g., drawer and refrigerator). Thus, the performance of action recognition is constrained by object recognition, and object detection requires tedious human labor for annotation.

Second, we propose a mid-level video representation suited to fine-grained action classification. Given an input video sequence, we densely sample a large number of spatio-temporal motion parts by combining temporal and spatial segmentation, and represent them with local motion features. The dense mid-level candidate parts are rich in localized motion information, which is crucial to fine-grained action recognition. From the candidate spatio-temporal parts, we apply an unsupervised approach to discover and learn representative part detectors for the final video representation. By utilizing the dense spatio-temporal motion parts, we highlight the human-object interactions and localized, delicate motion in local spatio-temporal sub-volumes of the video.

Third, we propose a novel fine-grained action recognition pipeline based on interaction part proposal and discriminative mid-level part mining. We first generate a large number of candidate object regions using an off-the-shelf object proposal tool, e.g., BING. These object regions are then matched and tracked across frames to form a large spatio-temporal graph, based on appearance matching and the dense motion trajectories passing through them. We then propose an efficient approximate graph segmentation algorithm to partition and filter the graph into consistent, locally dense sub-graphs. These sub-graphs, which are spatio-temporal sub-volumes, represent our candidate interaction parts. Finally, we mine discriminative mid-level part detectors from features computed over the candidate interaction parts, and compute bag-of-detection scores based on a novel Max-N pooling scheme as the action representation for a video sample.

Finally, we address the first-person (egocentric) action recognition problem, which involves many hand-object interactions. On one hand, we propose a novel end-to-end trainable semantic parsing network for hand segmentation. On the other hand, we propose a second end-to-end deep convolutional network that maximally utilizes the contextual information among hand, foreground object, and motion for interactional foreground object detection.
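The abstract names SG-LLC without specifying it. As a rough illustration of the idea, the sketch below restricts standard locality-constrained linear coding (the analytic LLC approximation of Wang et al.) to a per-object-class sub-codebook, so a motion descriptor is encoded only against codewords associated with the object in use; `grouped_codebooks`, `object_label`, and the group layout are hypothetical stand-ins, not the dissertation's actual data structures.

```python
import numpy as np

def llc_encode(x, codebook, k=5, lam=1e-4):
    """Analytic locality-constrained linear coding for a single descriptor:
    reconstruct x from its k nearest codewords with codes that sum to one."""
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]              # k nearest codewords
    B = codebook[idx] - x                    # shift the local bases to x
    C = B @ B.T                              # k x k local covariance
    C += lam * np.trace(C) * np.eye(k)       # regularize for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                             # shift-invariance constraint
    code = np.zeros(len(codebook))
    code[idx] = w
    return code

def sg_llc_encode(x, object_label, grouped_codebooks, k=5):
    """Hypothetical semantic grouping: encode the motion descriptor x only
    against the sub-codebook of the co-occurring object class, embedding the
    local code at that group's offset in the full code vector."""
    sub_codebook, offset, total_dim = grouped_codebooks[object_label]
    local = llc_encode(x, sub_codebook, k=k)
    code = np.zeros(total_dim)
    code[offset:offset + len(sub_codebook)] = local
    return code
```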
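SA-MKL is likewise described only as exploiting the empirical action/object joint distribution inside multiple kernel learning. A minimal sketch under that reading, assuming one precomputed Gram matrix per object channel: the learned kernel weights of true MKL are replaced here by fixed weights taken from empirical co-occurrence counts, which is a deliberate simplification, not the dissertation's optimization.

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(channel_kernels, cooccurrence, action):
    """Weight each object channel's Gram matrix by the empirical probability
    that `action` co-occurs with that object, then sum the channels.
    channel_kernels: {object: (n, n) Gram matrix}; cooccurrence:
    {action: {object: count}}. All names are illustrative."""
    counts = cooccurrence[action]
    total = float(sum(counts.values()))
    return sum((counts.get(obj, 0) / total) * K
               for obj, K in channel_kernels.items())

# One-vs-rest classifier for a single action class (sketch):
# K_train = combine_kernels(channel_kernels, cooccurrence, "open_drawer")
# clf = SVC(kernel="precomputed").fit(K_train, y_binary)
```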
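For the third pipeline, the abstract describes tracking object proposals into a spatio-temporal graph and partitioning it into dense sub-graphs. The toy version below links proposals in adjacent frames by cosine similarity of appearance features and takes connected components via union-find; this stands in for the paper's combination of appearance matching, dense-trajectory links, and approximate graph segmentation, and the thresholds are invented.

```python
import numpy as np

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i
    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def group_proposals(features, frames, sim_thresh=0.8, min_size=5):
    """Link object proposals in neighbouring frames whose appearance
    features are similar, then keep connected components that persist long
    enough to serve as candidate interaction parts. features: (n, d) array;
    frames: length-n array of frame indices."""
    n = len(features)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    uf = UnionFind(n)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(frames[i] - frames[j]) == 1 and feats[i] @ feats[j] > sim_thresh:
                uf.union(i, j)
    groups = {}
    for i in range(n):
        groups.setdefault(uf.find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_size]
```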
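Finally, the Max-N pooling behind the bag-of-detection scores reads as averaging each part detector's top N responses over a video's candidate parts; since the abstract does not define the rule, this is a guess at it rather than the actual scheme.

```python
import numpy as np

def max_n_pooling(scores, n=5):
    """Pool a (num_detectors, num_parts) matrix of detection scores into one
    fixed-length vector per video: for each mid-level part detector, average
    its n largest responses over the candidate interaction parts."""
    n = min(n, scores.shape[1])
    top_n = np.sort(scores, axis=1)[:, -n:]   # n largest responses per detector
    return top_n.mean(axis=1)                 # one pooled score per detector
```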
MARC record:
LDR    05549nmm a2200373 4500
001    2267226
005    20200623064711.5
008    220629s2016 ||||||||||||||||| ||eng d
020    $a 9781369057997
035    $a (MiAaPQ)AAI10151035
035    $a (MiAaPQ)utsa:11963
035    $a AAI10151035
040    $a MiAaPQ $c MiAaPQ
100 1  $a Zhou, Yang. $3 3281209
245 10 $a Video representation for fine-grained action recognition.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2016
300    $a 124 p.
500    $a Source: Dissertations Abstracts International, Volume: 78-05, Section: B.
500    $a Publisher info.: Dissertation/Thesis.
500    $a Advisor: Tian, Qi.
502    $a Thesis (Ph.D.)--The University of Texas at San Antonio, 2016.
506    $a This item must not be added to any third party search indexes.
506    $a This item must not be sold to any third party vendors.
590    $a School code: 1283.
650  4 $a Computer science. $3 523869
653    $a Action recognition
653    $a Action videos
653    $a Fine-Grained actions
653    $a Semantic information
690    $a 0984
710 2  $a The University of Texas at San Antonio. $b Computer Science. $3 1065531
773 0  $t Dissertations Abstracts International $g 78-05B.
790    $a 1283
791    $a Ph.D.
792    $a 2016
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10151035
Holdings:
Barcode: W9419460
Location: Electronic Resources
Circulation category: 11. Online Reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0