Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values.

Record type:	Bibliographic record, electronic resource: Monograph/item
Title/Author:	Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values.
Author:	Cao, Jian.
Publisher:	Ann Arbor: ProQuest Dissertations & Theses, 2018
Extent:	138 p.
Notes:	Source: Dissertations Abstracts International, Volume: 80-04, Section: A.
Contained By:	Dissertations Abstracts International, 80-04A.
Subject:	Economics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10831323
ISBN:	9780438447608

Thesis (Ph.D.)--The Florida State University, 2018.

This item must not be sold to any third party vendors.

Without proper treatment, direct analysis of data sets with missing or suppressed values can lead to biased results. Among missing-data handling methods, multiple imputation (MI) methods are regarded as the state of the art. The multiply imputed data sets can, on the one hand, generate unbiased estimates and, on the other hand, provide a reliable way to adjust standard errors for missing-data uncertainty. Despite these advantages, existing MI methods perform poorly on complicated multi-scale data, especially when the data set is large. The large data set of interest to us is the Quarterly Census of Employment and Wages (QCEW), which records the employment and wages of every establishment in the US. These detailed data are aggregated up through three scales: industry structure, geographic level, and time. The QCEW data contain as many as 210 × 2217 × 3193 ≈ 1.5 billion observations. Because of privacy concerns the data are heavily suppressed, and this missingness can appear anywhere in this complicated structure. The existing methods are either accurate or fast, but not both, in handling the QCEW data. Our goal is to develop an MI method capable of handling the missing-value problem of large multi-scale data sets both accurately and efficiently. This research addresses this goal in three directions. First, I improve the accuracy of the fastest MI method, the Bootstrapping-based Expectation Maximization (EMB) algorithm, by equipping it with a Multi-Scale Updating step. This updating step uses the information from the singular covariance matrix to take the multi-scale structure into account and to simulate more accurate imputations. Second, I improve the MI method by using a Quasi-Monte Carlo technique to accelerate its convergence. Finally, I develop a Sequential Parallel Imputation method that can detect the structure and missing pattern of large data sets and partition them into smaller data sets automatically. The resulting Parallel Sequential Multi-Scale Bootstrapping Expectation Maximization Multiple Imputation (PSI-MBEMMI) method is accurate, very fast, and can be applied to very large data sets.
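
The abstract notes that multiply imputed data sets give a way to adjust standard errors for missing-data uncertainty. A minimal sketch of the standard pooling step (Rubin's rules) is shown here for orientation only; this record does not say which pooling formula the dissertation uses, and the function name `pool_rubin` and the sample numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.

    Assumption: Rubin's rules, the textbook pooling step for multiple
    imputation. Not taken from the catalogued dissertation.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divides by m - 1)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, math.sqrt(t)


# Hypothetical results from m = 3 imputed data sets.
estimates = [10.0, 12.0, 11.0]
variances = [1.0, 1.0, 1.0]
pooled, se = pool_rubin(estimates, variances)
print(pooled, se)
```

The pooled standard error exceeds the per-imputation standard error whenever the estimates disagree across imputations, which is how the uncertainty due to missing values enters the final inference.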

ISBN: 9780438447608

Subjects--Topical Terms:

517137
Economics.
Subjects--Index Terms:

Bayesian inference

Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values.
LDR:03436nmm a2200385 4500
001 2269023
005 20200908082313.5
008 220629s2018 ||||||||||||||||| ||eng d
020 $a 9780438447608
035 $a (MiAaPQ)AAI10831323
035 $a (MiAaPQ)fsu:14706
035 $a AAI10831323
040 $a MiAaPQ $c MiAaPQ
100 1 $a Cao, Jian. $3 1057892
245 1 0 $a Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 138 p.
500 $a Source: Dissertations Abstracts International, Volume: 80-04, Section: A.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Beaumont, Paul.
502 $a Thesis (Ph.D.)--The Florida State University, 2018.
506 $a This item must not be sold to any third party vendors.
520 $a Without proper treatment, direct analysis of data sets with missing or suppressed values can lead to biased results. Among missing-data handling methods, multiple imputation (MI) methods are regarded as the state of the art. The multiply imputed data sets can, on the one hand, generate unbiased estimates and, on the other hand, provide a reliable way to adjust standard errors for missing-data uncertainty. Despite these advantages, existing MI methods perform poorly on complicated multi-scale data, especially when the data set is large. The large data set of interest to us is the Quarterly Census of Employment and Wages (QCEW), which records the employment and wages of every establishment in the US. These detailed data are aggregated up through three scales: industry structure, geographic level, and time. The QCEW data contain as many as 210 × 2217 × 3193 ≈ 1.5 billion observations. Because of privacy concerns the data are heavily suppressed, and this missingness can appear anywhere in this complicated structure. The existing methods are either accurate or fast, but not both, in handling the QCEW data. Our goal is to develop an MI method capable of handling the missing-value problem of large multi-scale data sets both accurately and efficiently. This research addresses this goal in three directions. First, I improve the accuracy of the fastest MI method, the Bootstrapping-based Expectation Maximization (EMB) algorithm, by equipping it with a Multi-Scale Updating step. This updating step uses the information from the singular covariance matrix to take the multi-scale structure into account and to simulate more accurate imputations. Second, I improve the MI method by using a Quasi-Monte Carlo technique to accelerate its convergence. Finally, I develop a Sequential Parallel Imputation method that can detect the structure and missing pattern of large data sets and partition them into smaller data sets automatically. The resulting Parallel Sequential Multi-Scale Bootstrapping Expectation Maximization Multiple Imputation (PSI-MBEMMI) method is accurate, very fast, and can be applied to very large data sets.
590 $a School code: 0071.
650 4 $a Economics. $3 517137
653 $a Bayesian inference
653 $a Bootstrapping
653 $a Expectation maximization
653 $a Large data analysis
653 $a Multiple imputation
653 $a Quasi-Monte Carlo
690 $a 0501
710 2 $a The Florida State University. $b Economics. $3 3172919
773 0 $t Dissertations Abstracts International $g 80-04A.
790 $a 0071
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10831323