Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model./
Author:	Lu, Xiang.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2016
Extent:	114 p.
Note:	Source: Dissertations Abstracts International, Volume: 77-09, Section: B.
Contained By:	Dissertations Abstracts International, 77-09B.
Subject:	Biostatistics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10036394
ISBN:	9781339545691

Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model. - Ann Arbor : ProQuest Dissertations & Theses, 2016 - 114 p.

Source: Dissertations Abstracts International, Volume: 77-09, Section: B.

Thesis (Ph.D.)--University of California, Los Angeles, 2016.

This item must not be sold to any third party vendors.

We developed an imputation model, based on factor analysis and a linear mixed-effects model, that addresses the missing-data problem in high-dimensional longitudinal data sets with mixed data types (continuous and ordinal). Markov chain Monte Carlo is used to fit the model, iteratively drawing parameters, latent variables, and missing values. The imputation model is implemented as an R package. We tested the newly developed imputation model on simulated data sets under 32 scenarios and 2 hypothetical missing-data mechanisms. Two competing methods, PAN (Multiple Imputation for Multivariate Panel or Clustered Data) and MICE (Multiple Imputation using Chained Equations), were tested in the same way for comparison, to show the necessity of addressing the high dimensionality and the mix of continuous and ordinal data types. Part of our effort went into accelerating the simulation by using C++ (a lower-level, compiled language) and parallel computing on the Hoffman2 Cluster. Compared with running the simulation evaluation as an R program on a single computer, the program we used runs approximately 600 times faster. We also tested the robustness of the new imputation model when its assumptions are violated. We found that assuming fewer factors than the true number leads to invalid inferences, while assuming more factors than the true number still yields reasonable inferences. We also found that omitting an underlying quadratic trend in the factor scores hurts inferences based on the imputation only when that trend is very strong. In the most unfavorable scenario we tested, in which the underlying quadratic coefficient is as large as 0.8 of the linear coefficient, the actual coverage rates of 95% interval estimates start to fall below 90%. An application to a dentistry data set is shown, with comparison to PAN, NORM, and a forerunner of the newly developed method.
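The abstract evaluates imputation quality by the actual coverage rate of nominal 95% interval estimates. As a rough illustration of what that metric measures, the sketch below pools estimates from several imputed data sets and counts how often the pooled interval contains the true value. It is not the dissertation's code: the abstract does not state how estimates were pooled, so the use of Rubin's rules with a normal (z = 1.96) approximation is an assumption, and the function names `pool_rubin` and `coverage_rate` are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances, z=1.96):
    """Pool m completed-data estimates (assumed: Rubin's rules, normal approximation).

    estimates: point estimates, one per imputed data set (m >= 2)
    variances: squared standard errors, one per imputed data set
    Returns (pooled_estimate, lower, upper).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    half_width = z * math.sqrt(total)
    return q_bar, q_bar - half_width, q_bar + half_width


def coverage_rate(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


if __name__ == "__main__":
    # Made-up numbers for illustration only: three imputations of one replicate.
    est, lower, upper = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, lower, upper)
    # Made-up intervals from four simulation replicates, true value 1.0.
    print(coverage_rate([(0, 1), (2, 3), (0.5, 1.5), (-1, 0.9)], 1.0))
```

In a simulation study of this kind, `coverage_rate` would be computed over many replicates per scenario. A value well below 0.95, such as the sub-90% rates reported for strong omitted quadratic trends, signals that the intervals are too narrow or centered on biased estimates.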

ISBN: 9781339545691

Subjects--Topical Terms:

1002712
Biostatistics.
Subjects--Index Terms:

Factor analysis

LDR:03187nmm a2200373 4500
001 2268989
005 20200908082306.5
008 220629s2016 ||||||||||||||||| ||eng d
020 $a 9781339545691
035 $a (MiAaPQ)AAI10036394
035 $a (MiAaPQ)ucla:14328
035 $a AAI10036394
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lu, Xiang. $3 1911329
245 1 0 $a Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2016
300 $a 114 p.
500 $a Source: Dissertations Abstracts International, Volume: 77-09, Section: B.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Belin, Thomas R.
502 $a Thesis (Ph.D.)--University of California, Los Angeles, 2016.
506 $a This item must not be sold to any third party vendors.
520 $a We developed an imputation model, based on factor analysis and a linear mixed-effects model, that addresses the missing-data problem in high-dimensional longitudinal data sets with mixed data types (continuous and ordinal). Markov chain Monte Carlo is used to fit the model, iteratively drawing parameters, latent variables, and missing values. The imputation model is implemented as an R package. We tested the newly developed imputation model on simulated data sets under 32 scenarios and 2 hypothetical missing-data mechanisms. Two competing methods, PAN (Multiple Imputation for Multivariate Panel or Clustered Data) and MICE (Multiple Imputation using Chained Equations), were tested in the same way for comparison, to show the necessity of addressing the high dimensionality and the mix of continuous and ordinal data types. Part of our effort went into accelerating the simulation by using C++ (a lower-level, compiled language) and parallel computing on the Hoffman2 Cluster. Compared with running the simulation evaluation as an R program on a single computer, the program we used runs approximately 600 times faster. We also tested the robustness of the new imputation model when its assumptions are violated. We found that assuming fewer factors than the true number leads to invalid inferences, while assuming more factors than the true number still yields reasonable inferences. We also found that omitting an underlying quadratic trend in the factor scores hurts inferences based on the imputation only when that trend is very strong. In the most unfavorable scenario we tested, in which the underlying quadratic coefficient is as large as 0.8 of the linear coefficient, the actual coverage rates of 95% interval estimates start to fall below 90%. An application to a dentistry data set is shown, with comparison to PAN, NORM, and a forerunner of the newly developed method.
590 $a School code: 0031.
650 4 $a Biostatistics. $3 1002712
653 $a Factor analysis
653 $a High-dimensional
653 $a Imputation
653 $a Longitudinal
653 $a Missing data
690 $a 0308
710 2 $a University of California, Los Angeles. $b Biostatistics. $3 3280770
773 0 $t Dissertations Abstracts International $g 77-09B.
790 $a 0031
791 $a Ph.D.
792 $a 2016
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10036394