Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values.
Record Type: Electronic resources : Monograph/item
Title/Author: Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values.
Author: Cao, Jian.
Published: Ann Arbor : ProQuest Dissertations & Theses, 2018
Description: 138 p.
Notes: Source: Dissertations Abstracts International, Volume: 80-04, Section: A.
Contained By: Dissertations Abstracts International, 80-04A.
Subject: Economics.
Online resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10831323
ISBN: 9780438447608
Thesis (Ph.D.)--The Florida State University, 2018.
This item must not be sold to any third party vendors.
Without proper treatment, direct analysis of data sets with missing or suppressed values can lead to biased results. Among all of the missing data handling methods, multiple imputation (MI) methods are regarded as the state of the art. Multiply imputed data sets can, on the one hand, generate unbiased estimates and, on the other hand, provide a reliable way to adjust standard errors for missing data uncertainty. Despite many advantages, existing MI methods perform poorly on complicated multi-scale data, especially when the data set is large. The large data set of interest to us is the Quarterly Census of Employment and Wages (QCEW), which records the employment and wages of every establishment in the US. These detailed data are aggregated up through three scales: industry structure, geographic levels and time. The size of the QCEW data is as large as 210 × 2217 × 3193 ≈ 1.5 billion observations. For privacy reasons the data are heavily suppressed, and this missingness can appear anywhere in this complicated structure. The existing methods are either accurate or fast, but not both, in handling the QCEW data. Our goal is to develop an MI method that is capable of handling the missing value problem of large multi-scale data sets both accurately and efficiently. This research addresses this goal in three directions. First, I improve the accuracy of the fastest MI method, the Bootstrapping-based Expectation Maximization (EMB) algorithm, by equipping it with a Multi-Scale Updating step. This updating step uses the information from the singular covariance matrix to take multi-scale structure into account and to simulate more accurate imputations. Second, I improve the MI method by using a Quasi-Monte Carlo technique to accelerate its convergence speed. Finally, I develop a Sequential Parallel Imputation method which can detect the structure and missing pattern of large data sets and partition them into small data sets automatically. The resulting Parallel Sequential Multi-Scale Bootstrapping Expectation Maximization Multiple Imputation (PSI-MBEMMI) method is accurate, very fast, and can be applied to very large data sets.
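The abstract notes that multiply imputed data sets give a way to adjust standard errors for missing data uncertainty. As a rough illustration only, the sketch below applies the standard combining rules for multiple imputation (Rubin's rules) to per-imputation estimates. It is not taken from the dissertation and is not the PSI-MBEMMI method; the function name `pool_rubin` and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation results with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance
    return q_bar, sqrt(total)


# Hypothetical numbers: three imputed data sets, same within-imputation variance.
estimate, std_error = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, std_error)
```

The between-imputation term is what inflates the standard error relative to a single imputed data set, which is the adjustment the abstract refers to.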
ISBN: 9780438447608
Subjects--Topical Terms:
517137
Economics.
Subjects--Index Terms:
Bayesian inference
LDR :03436nmm a2200385 4500
001 2269023
005 20200908082313.5
008 220629s2018 ||||||||||||||||| ||eng d
020 $a 9780438447608
035 $a (MiAaPQ)AAI10831323
035 $a (MiAaPQ)fsu:14706
035 $a AAI10831323
040 $a MiAaPQ $c MiAaPQ
100 1 $a Cao, Jian. $3 1057892
245 1 0 $a Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 138 p.
500 $a Source: Dissertations Abstracts International, Volume: 80-04, Section: A.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Beaumont, Paul.
502 $a Thesis (Ph.D.)--The Florida State University, 2018.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0071.
650 4 $a Economics. $3 517137
653 $a Bayesian inference
653 $a Bootstrapping
653 $a Expectation maximization
653 $a Large data analysis
653 $a Multiple imputation
653 $a Quasi-Monte Carlo
690 $a 0501
710 2 $a The Florida State University. $b Economics. $3 3172919
773 0 $t Dissertations Abstracts International $g 80-04A.
790 $a 0071
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10831323
Inventory Number: W9421257
Location Name: Electronic resources
Item Class: 11. Online viewing (線上閱覽_V)
Material type: E-book
Call number: EB
Usage Class: Normal
Loan Status: On shelf
No. of reservations: 0