Statistical and Machine Learning Methods for Multi-Study Prediction and Causal Inference.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Statistical and Machine Learning Methods for Multi-Study Prediction and Causal Inference.
Author:
Wang, Cathy.
Extent:
1 online resource (137 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 84-05, Section: B.
Contained By:
Dissertations Abstracts International 84-05B.
Subject:
Biostatistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29252400 (click for full text, PQDT)
ISBN:
9798357575784
Thesis (Ph.D.)--Harvard University, 2022.
Includes bibliographical references
In many areas of biomedical research, exponential advances in technology and the facilitation of systematic data sharing have increased access to multiple studies. This dissertation proposes and compares methods to address three challenges in multi-study learning. First, personalized cancer risk assessment is key to early prevention, but studies typically report aggregated risk information. We address this challenge by proposing a method that integrates and deconvolves aggregated risk, allowing for heterogeneity in study populations, design, and risk measures, to provide personalized risk estimates that comprehensively reflect the best available data. Second, prediction models are widely used to evaluate disease risk and inform decisions about treatment, but models trained on a single study generally perform worse on out-of-study samples. To address this challenge, we compare two strategies for training prediction models on multiple studies to improve generalizability: merging and ensembling. In practice, our theory can help guide the choice between the two strategies. Third, heterogeneous treatment effect estimation is central to personalizing treatment and improving clinical practice, but existing approaches for synthesizing evidence across multiple studies do not account for between-study heterogeneity. We address this challenge by proposing a flexible method that estimates heterogeneous treatment effects from multiple studies, including evidence from randomized controlled trials and real-world data, while appropriately accounting for between-study differences in the propensity score and outcome models.
In Chapter 1, we propose a meta-analytic approach for deconvolving aggregated risks to provide age-, gene-, and sex-specific cancer risk. Carriers of pathogenic variants in mismatch repair (MMR) genes benefit from reliable information about their cancer risk to better inform targeted surveillance strategies for colorectal cancer (CRC), but published estimates vary. This variation could arise from differences in study designs, selection criteria for molecular testing, and statistical adjustments for ascertainment. Previous meta-analyses of CRC risk are based on studies that report gene- and sex-specific risk. This may exclude studies that provide cancer risk aggregated across sex and genes, and so lead to bias. To address this challenge, our meta-analytic approach can deconvolve aggregated risks, allowing us to use all of the information available in the literature and provide more comprehensive penetrance estimates. This method can be applied in the future to other gene/cancer combinations without restriction on the mutation.
In Chapter 2, we compare methods for training gradient boosting models on multiple studies. When training and test studies come from different distributions, prediction models trained on a single study generally perform worse on out-of-study samples due to heterogeneity in study design, data collection methods, and sample characteristics. Training prediction models on multiple studies can address this challenge and improve cross-study replicability of predictions. We focus on two strategies for training cross-study replicable models: 1) merging all studies and training a single model, and 2) multi-study ensembling, which involves training a separate model on each study and combining the resulting predictions. We study boosting algorithms in a regression setting and compare cross-study replicability of merging vs. multi-study ensembling both empirically and theoretically. In particular, we characterize an analytical transition point beyond which ensembling exhibits lower prediction error than merging for boosting with linear learners. We verify the theoretical transition point empirically and illustrate how it may guide practitioners' choice regarding merging vs. ensembling in a breast cancer application.
In Chapter 3, we propose an approach for estimating heterogeneous treatment effects in multiple studies. Heterogeneous treatment effect estimation is central to many modern statistical applications, such as precision medicine. Despite increased access to multiple studies, existing methods for heterogeneous treatment effect estimation are largely rooted in theory based on a single study. These methods generally rely on the assumption that the heterogeneous treatment effect is the same across studies. However, this assumption may be untenable under potential heterogeneity in study design, data collection methods, and sample characteristics across multiple studies. To address this challenge, we propose the multi-study R-learner for estimating heterogeneous treatment effects in the presence of between-study heterogeneity. This method allows information to be borrowed across multiple studies and allows flexible modeling of the nuisance components with machine learning methods. We show analytically that optimizing the multi-study R-loss is equivalent to optimizing the oracle loss up to an error that diminishes at a relatively fast rate with the sample size. Under the series estimation framework, we derive a pointwise normality result for the multi-study R-learner estimator. Empirically, we show via simulations and a breast cancer application that, as between-study heterogeneity increases, the multi-study R-learner yields lower estimation error than the R-learner.
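The two Chapter 2 training strategies, merging and multi-study ensembling, can be illustrated with a minimal Python sketch. This is not the dissertation's code: the simulated data, the function names, the number of boosting rounds, the shrinkage value, and the equal-weight averaging of per-study predictions are all assumptions made for illustration. The base learner is a plain least-squares fit, used here as a simple instance of boosting with linear learners.

```python
import numpy as np

def boost_linear(X, y, n_rounds=50, shrinkage=0.1):
    """L2 boosting with a linear base learner (illustrative).

    Each round fits least squares to the current residuals and adds a
    shrunken copy of that fit to the running coefficient vector.
    """
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef = np.zeros(Xb.shape[1])
    for _ in range(n_rounds):
        residual = y - Xb @ coef
        step, *_ = np.linalg.lstsq(Xb, residual, rcond=None)
        coef += shrinkage * step
    return coef

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new covariates."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def merged_predict(studies, X_new):
    """Strategy 1: stack all studies and train a single model."""
    X_all = np.vstack([X for X, _ in studies])
    y_all = np.concatenate([y for _, y in studies])
    return predict(boost_linear(X_all, y_all), X_new)

def ensemble_predict(studies, X_new):
    """Strategy 2: train one model per study, then average predictions.

    Equal weights are an assumption; other weighting schemes are possible.
    """
    preds = [predict(boost_linear(X, y), X_new) for X, y in studies]
    return np.mean(preds, axis=0)

def simulate_studies(n_studies=4, n=100, p=3, heterogeneity=1.0, seed=0):
    """Hypothetical data: each study's coefficients are perturbed around a
    shared vector, with `heterogeneity` scaling the perturbation."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p)
    studies = []
    for _ in range(n_studies):
        X = rng.normal(size=(n, p))
        beta_k = beta + heterogeneity * rng.normal(size=p)
        y = X @ beta_k + rng.normal(size=n)
        studies.append((X, y))
    return studies
```

One way to use the sketch is to hold out one simulated study, train on the rest with both strategies, and compare mean squared prediction error on the held-out study while varying `heterogeneity`. The sketch makes no claim about where the transition point described in the abstract falls.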
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2023.
Mode of access: World Wide Web
ISBN: 9798357575784
Subjects--Topical Terms:
Biostatistics.
Subjects--Index Terms:
Causal inference
Index Terms--Genre/Form:
Electronic books.
LDR :06747nmm a2200373K 4500
001 2359804
005 20230917195251.5
006 m o d
007 cr mn ---uuuuu
008 241011s2022 xx obm 000 0 eng d
020 $a 9798357575784
035 $a (MiAaPQ)AAI29252400
035 $a AAI29252400
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Wang, Cathy. $3 3700420
245 1 0 $a Statistical and Machine Learning Methods for Multi-Study Prediction and Causal Inference.
264 0 $c 2022
300 $a 1 online resource (137 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 84-05, Section: B.
500 $a Advisor: Parmigiani, Giovanni.
502 $a Thesis (Ph.D.)--Harvard University, 2022.
504 $a Includes bibliographical references
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538 $a Mode of access: World Wide Web
650 4 $a Biostatistics. $3 1002712
653 $a Causal inference
653 $a Machine learning
653 $a Meta-analysis
653 $a Multi-study
653 $a Prediction modeling
655 7 $a Electronic books. $2 lcsh $3 542853
690 $a 0308
710 2 $a ProQuest Information and Learning Co. $3 783688
710 2 $a Harvard University. $b Biostatistics. $3 2104931
773 0 $t Dissertations Abstracts International $g 84-05B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29252400 $z click for full text (PQDT)
Holdings (1 item):
Barcode: W9482160
Location: Electronic resource
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Holds: 0