Statistical and Computational Approaches for Data Integration and Constrained Variable Selection in Large Datasets.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: Statistical and Computational Approaches for Data Integration and Constrained Variable Selection in Large Datasets.
Author: Tran, Lam.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2023
Pagination: 129 p.
Note: Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
Contained by: Dissertations Abstracts International, 85-03B.
Subject: Biostatistics.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30747918
ISBN: 9798380371766
LDR :04839nmm a2200421 4500
001 2394932
005 20240513061048.5
006 m o d
007 cr#unu||||||||
008 251215s2023 ||||||||||||||||| ||eng d
020 $a 9798380371766
035 $a (MiAaPQ)AAI30747918
035 $a (MiAaPQ)umichrackham005103
035 $a AAI30747918
040 $a MiAaPQ $c MiAaPQ
100 1 $a Tran, Lam. $3 3764429
245 1 0 $a Statistical and Computational Approaches for Data Integration and Constrained Variable Selection in Large Datasets.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300 $a 129 p.
500 $a Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
500 $a Advisor: Jiang, Hui.
502 $a Thesis (Ph.D.)--University of Michigan, 2023.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
520 $a With the number of covariates, sample size, and heterogeneity in datasets continuously increasing, the incorporation of prior domain knowledge or the addition of structural constraints in a model represents an attractive means to perform informed variable selection on high numbers of potential predictors. The growing complexity of individual datasets has been accompanied by their increasing availability, as researchers nowadays can access ever-expanding biobanks and other large clinical datasets. Integration of external datasets can increase the generalizability of locally gathered data, but these datasets can be affected by context-specific confounders, necessitating weighted integration methods to differentiate datasets of variable quality.
In Chapter 2, we present a method to perform weighted data integration based on minimizing the local data leave-one-out cross-validation (LOOCV) error, under the assumption that the local data is generated from the set of unknown true parameters. We demonstrate how the optimization of the LOOCV error for various models can be written as functions of external dataset weights. Furthermore, we develop an accompanying reduced space approach that reduces the weighted integration of any number of external datasets to a two-parameter optimization. The utility of the weighted data integration method in comparison to existing methods is shown through extensive simulation work mimicking heterogeneous clinical data, as well as in two real-world examples. The first examines kidney transplant patients from the Scientific Registry of Transplant Recipients and the second looks at the genomic data of bladder cancer patients from The Cancer Genome Atlas. Ongoing work on calculating standard error estimates and developing significance testing under a false discovery rate framework is also presented.
In Chapter 3, we devise a fast solution to the equality-constrained lasso problem with a two-stage algorithm: first obtaining candidate covariate subsets of increasing size from unconstrained lasso problems and then leveraging an efficient alternating direction method of multipliers (ADMM) algorithm. Our "candidate subset approach" produces the same solution path as solving the constrained lasso over the entire predictor space, and in simulation studies, our approach is over an order of magnitude faster than existing methods. The ability to solve the equality-constrained lasso with multiple constraints and with a large number of potential predictors is demonstrated in a microbiome regression analysis and a myeloma survival analysis, neither of which could be solved by naively fitting the constrained lasso on all predictors.
In Chapter 4, we aim to extend the candidate subset approach for constrained variable selection to accommodate different penalty functions and inequality constraints. Despite its desirable selection properties, it is well known that the lasso is biased for large regression coefficients; to address this shortcoming, we consider our approach with two non-convex penalty functions, SCAD and MCP. Furthermore, we also consider the approach with inequality constraints and dual equality/inequality constraints, which greatly increases the number of potential applications. We demonstrate that the properties of the candidate subset approach, in terms of its speed and producing the same solution over the whole predictor space, additionally hold for these two extensions.
590 $a School code: 0127.
650 4 $a Biostatistics. $3 1002712
650 4 $a Bioinformatics. $3 553671
650 4 $a Information technology. $3 532993
653 $a Data integration
653 $a Variable selection
653 $a Computational approaches
653 $a External datasets
653 $a Genomic data
690 $a 0308
690 $a 0489
690 $a 0715
710 2 $a University of Michigan. $b Biostatistics. $3 3352160
773 0 $t Dissertations Abstracts International $g 85-03B.
790 $a 0127
791 $a Ph.D.
792 $a 2023
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30747918
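For orientation only: the 520 abstract above describes (Chapter 3) a fast two-stage solver for the equality-constrained lasso built around ADMM. The sketch below shows one textbook ADMM formulation of that optimization problem, minimize 0.5*||y - Xb||^2 + lam*||b||_1 subject to Cb = d, in Python/NumPy. It is a generic illustration, not the dissertation's algorithm: the candidate-subset screening that provides the reported speedup, the solution-path equivalence argument, and the SCAD/MCP and inequality-constraint extensions are all omitted, and every name here (constrained_lasso_admm, soft_threshold, lam, rho) is invented for this sketch.

```python
# Minimal, generic sketch only -- NOT the dissertation's implementation.
# Solves: minimize 0.5*||y - X b||^2 + lam*||b||_1  subject to  C b = d
# with a standard ADMM splitting (b = z). The candidate-subset screening
# and SCAD/MCP extensions described in the abstract are not reproduced.
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def constrained_lasso_admm(X, y, C, d, lam, rho=1.0, n_iter=1000, tol=1e-8):
    """Equality-constrained lasso via ADMM on the splitting b = z."""
    n, p = X.shape
    m = C.shape[0]
    z = np.zeros(p)   # sparse copy of the coefficients
    u = np.zeros(p)   # scaled dual variable for the consensus constraint b = z
    # The b-update is an equality-constrained quadratic program; its KKT
    # matrix is fixed across iterations.
    K = np.block([[X.T @ X + rho * np.eye(p), C.T],
                  [C, np.zeros((m, m))]])
    Xty = X.T @ y
    for _ in range(n_iter):
        rhs = np.concatenate([Xty + rho * (z - u), d])
        b = np.linalg.solve(K, rhs)[:p]            # b-update (multipliers dropped)
        z_new = soft_threshold(b + u, lam / rho)   # z-update: prox of the l1 term
        u = u + b - z_new                          # dual update
        if np.linalg.norm(z_new - z) <= tol * (1.0 + np.linalg.norm(z)):
            z = z_new
            break
        z = z_new
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    beta = np.zeros(20)
    beta[:3] = [1.0, -1.0, 0.5]                    # sparse ground truth
    y = X @ beta + 0.1 * rng.standard_normal(100)
    C = np.ones((1, 20))                           # toy constraint: coefficients sum to zero
    d = np.zeros(1)
    b_hat = constrained_lasso_admm(X, y, C, d, lam=5.0)
    print("active set:", np.flatnonzero(b_hat))
    print("sum of coefficients:", float(b_hat.sum()))  # approximately 0 at convergence
```

At convergence the b- and z-iterates coincide, so the returned sparse vector approximately satisfies the linear constraint; a practical implementation would additionally cache a factorization of the KKT matrix and monitor primal/dual residuals.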
Holdings (1 item)
Barcode: W9503252
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0