National Dong Hwa University Library

Lo, Shin-Lian.

High-dimensional classification and attribute-based forecasting.

Record type:	Bibliographic - language material, printed : Monograph/item
Title/Author:	High-dimensional classification and attribute-based forecasting./
Author:	Lo, Shin-Lian.
Extent:	133 p.
Note:	Source: Dissertation Abstracts International, Volume: 72-06, Section: B, page: .
Contained By:	Dissertation Abstracts International 72-06B.
Subject:	Statistics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3451262
ISBN:	9781124564111

High-dimensional classification and attribute-based forecasting.
Lo, Shin-Lian.

High-dimensional classification and attribute-based forecasting. - 133 p.

Source: Dissertation Abstracts International, Volume: 72-06, Section: B, page: .

Thesis (Ph.D.)--Georgia Institute of Technology, 2010.

This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments. The second part deals with forecasting problems with a large number of categories in predictors.

ISBN: 9781124564111

Subjects--Topical Terms:

517247
Statistics.

High-dimensional classification and attribute-based forecasting.
LDR:06335nam 2200349 4500
001 1402510
005 20111102140017.5
008 130515s2010 ||||||||||||||||| ||eng d
020 $a 9781124564111
035 $a (UMI)AAI3451262
035 $a AAI3451262
040 $a UMI $c UMI
100 1 $a Lo, Shin-Lian. $3 1681704
245 1 0 $a High-dimensional classification and attribute-based forecasting.
300 $a 133 p.
500 $a Source: Dissertation Abstracts International, Volume: 72-06, Section: B, page: .
500 $a Advisers: Kwok-Leung Tsui; Ying Hung.
502 $a Thesis (Ph.D.)--Georgia Institute of Technology, 2010.
520 $a This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments. The second part deals with forecasting problems with a large number of categories in predictors.
520 $a The first part of this thesis contains four chapters. The first chapter provides an overall introduction to microarray experiments and associated classification issues. The second chapter reviews some existing variable selection and classification methods. The third chapter develops a new classification approach to maintain variable selection consistency and classification accuracy in high dimensionality. The fourth chapter proposes a new classification method that accounts for differing variability among experimental observations. The second part of this thesis is included in chapter five, where a new forecasting approach that deals with a large number of categories in predictors and takes into account predictor structures is developed.
520 $a Classification problems in microarray experiments refer to discriminating subjects with different biologic phenotypes or known tumor subtypes as well as to predicting the clinical outcomes or the prognostic stages of subjects. A typical microarray experiment monitors the expression levels of thousands of genes taken from tens of subjects. Due to the large number of genes with a relatively small sample size, most traditional classification methods require preliminary variable selection before being employed for classification. As a result, the classification accuracy of such methods strongly relies on the choice of the pre-selected variables. Unlike traditional classification methods, the penalized logistic regression method is known for simultaneous variable selection and classification. However, the performance of this method declines as the number of variables increases. To address this concern, in chapter three, we propose a new classification approach that employs the penalized logistic regression method iteratively with a controlled size of gene subsets to maintain variable selection consistency and classification accuracy. Moreover, we incorporate a randomized heuristic algorithm that efficiently searches for the optimal gene subset without an exhaustive search. The performance of the new classification approach is evaluated and compared with existing methods on four real-world microarray datasets and in a simulation study. The results show that the new approach outperforms the existing methods in terms of gene selection and classification accuracy.
520 $a The research described in chapter four is motivated by a modern microarray experiment that includes two layers of replicates. This new experimental setting makes most existing classification methods, including penalized logistic regression, inappropriate to apply directly because the correlations among replicates violate the assumption of independent observations in penalized logistic regression. To solve this problem, we propose a new classification method by incorporating random effects into penalized logistic regression such that the heterogeneity among different experimental subjects and the correlations from repeated measurements can be taken into account. The proposed method, however, poses computational challenges because the high-dimensional integrals over the distribution of random effects cannot be expressed in a closed form. Therefore, an efficient hybrid algorithm is introduced to tackle the difficulties in estimation and integration over random effect distributions. Theoretical results on variable selection consistency are also presented, and the finite sample performance is examined via a simulation study. Applications to a modern microarray experiment in a breast cancer study show that the proposed classification method obtains smaller models with higher prediction accuracy than the method based on the assumption of independent observations.
520 $a In chapter five, we propose a new forecasting approach for large-scale datasets associated with a large number of predictor categories and with observed predictor structures. The new approach is similar to tree-based methods that grow a number of nodes through splitting and adopt piecewise constant prediction at terminal nodes. However, conventional tree-based methods do not accommodate intrinsic predictor structures, and they are not generally considered efficient at handling a large number of categorical values in predictors. Beyond the conventional tree-based methods, the new approach incorporates observed predictor structures by a general linear model and multi-way hierarchical splits to make the grown trees more comprehensive, efficient, and interpretable. Through an empirical study of a capacity forecasting problem in the air cargo industry, we show that the new approach has higher forecasting accuracy and higher computational efficiency than existing tree-based methods consistently over time. Furthermore, we investigate the performance of the new approach under different circumstances via a simulation study. The simulation results show that the forecasting accuracy and the computational efficiency of the new approach are less influenced by the number of predictor categories and by irrelevant predictors than those of existing tree-based methods.
590 $a School code: 0078.
650 4 $a Statistics. $3 517247
650 4 $a Engineering, Industrial. $3 626639
650 4 $a Biology, Bioinformatics. $3 1018415
690 $a 0463
690 $a 0546
690 $a 0715
710 2 $a Georgia Institute of Technology. $3 696730
773 0 $t Dissertation Abstracts International $g 72-06B.
790 1 0 $a Tsui, Kwok-Leung, $e advisor
790 1 0 $a Hung, Ying, $e advisor
790 $a 0078
791 $a Ph.D.
792 $a 2010
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3451262