A Gaussian-Process Framework for Nonlinear Statistical Inference Using Modern Machine Learning Models.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: A Gaussian-Process Framework for Nonlinear Statistical Inference Using Modern Machine Learning Models. / Deng, Wenying.
Author: Deng, Wenying.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2023.
Physical description: 213 p.
Note: Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Contained by: Dissertations Abstracts International, 84-12B.
Subject: Biostatistics.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30491637
ISBN: 9798379613006
Dissertation note: Thesis (Ph.D.)--Harvard University, 2023.
Restrictions: This item must not be sold to any third party vendors.
Abstract: Gaussian Process Regression has become widely used in biomedical research in recent years, particularly for studying the intricate and nonlinear impacts of multivariate genetic or environmental exposures. This dissertation proposes methods for estimating and testing nonlinear effects and for estimating variable importance with uncertainty quantification. Chapter 1 focuses on hypothesis testing, while Chapters 2 and 3 address general variable importance estimation problems.
In Chapter 1, we develop an R package, CVEK, which introduces a suite of flexible machine learning models and robust hypothesis tests for learning the joint nonlinear effects of multiple covariates in limited samples. It implements the Cross-Validated Ensemble of Kernels (CVEK), an ensemble-based kernel machine learning method that adaptively learns the joint nonlinear effect of multiple covariates from data, and provides powerful hypothesis tests for both main effects of features and interactions among features. The R package CVEK provides a flexible, easy-to-use implementation of CVEK and offers a wide range of choices for the kernel family (for instance, polynomial, radial basis function, Matérn, neural network, and others), the model selection criterion, the ensembling method (averaging, exponential weighting, cross-validated stacking), and the type of hypothesis test (asymptotic or parametric bootstrap). Through extensive simulations we demonstrate the validity and robustness of this approach, and we provide practical guidelines on how to design an estimation strategy for optimal performance in different data scenarios.
In Chapter 2, we propose a simple and unified framework for nonlinear variable importance estimation that incorporates uncertainty in the prediction function and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods, and neural networks). In particular, for a learned nonlinear model f(x), we consider quantifying the importance of an input variable x_j using the integrated partial derivative η_j = ‖ ∂f(x)/∂x_j ‖²_{P_X}. We then (1) provide a principled approach for quantifying uncertainty in variable importance by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulations and experiments on healthcare benchmark datasets confirm that the proposed algorithm outperforms existing classical and recent variable selection methods.
In Chapter 3, we develop a versatile framework that can be applied to continuous, count, and binary responses. The primary aim is to estimate variable importance scores using various machine learning models, including tree ensembles, kernel methods, neural networks, and others. Additionally, the proposed framework accounts for the impact of confounding variables and provides a way to assess the uncertainty associated with the variable importance scores. We present a systematic method to estimate the uncertainty in variable importance by computing its posterior distribution, and we derive Bayesian nonparametric theorems that ensure the posterior consistency and asymptotic uncertainty of the proposed approach. The efficacy of the proposed algorithm is validated through comprehensive simulations and experiments on socioeconomic benchmark datasets, indicating superior performance compared to existing traditional and contemporary variable selection techniques.
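The Chapter 1 idea (an ensemble of candidate kernel machines weighted by cross-validated error) can be given a rough numerical illustration. The sketch below is only an illustration of the ensembling step under stated assumptions, not the CVEK package or its hypothesis tests: the candidate kernels, regularization values, and the exponential-weighting temperature lam are hypothetical, and scikit-learn's KernelRidge stands in for the package's kernel machines.

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(150, 3))
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=150)

# A small library of candidate kernels (CVEK itself offers many more choices).
candidates = {
    "rbf": KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5),
    "poly2": KernelRidge(kernel="polynomial", alpha=0.1, degree=2),
    "laplacian": KernelRidge(kernel="laplacian", alpha=0.1, gamma=0.5),
}

# Cross-validated error per kernel, then exponential weighting of the learners.
cv_mse = {name: -cross_val_score(m, X, y, cv=5,
                                 scoring="neg_mean_squared_error").mean()
          for name, m in candidates.items()}
lam = 1.0  # hypothetical temperature for the exponential weighting
w = np.array([np.exp(-lam * cv_mse[k]) for k in candidates])
w /= w.sum()

# Fit each learner on the full data; the ensemble prediction is a weighted average.
fitted = [m.fit(X, y) for m in candidates.values()]
y_hat = sum(wi * m.predict(X) for wi, m in zip(w, fitted))
print(dict(zip(candidates, np.round(w, 3))))

The Chapter 2 measure admits a similarly rough illustration. The following minimal sketch, again not the dissertation's implementation, fits an off-the-shelf Gaussian-process regressor, approximates ∂f(x)/∂x_j by central finite differences on joint posterior draws of f, and averages the squared derivative over the observed inputs, giving one draw of η_j per posterior sample and hence a crude uncertainty interval. The toy data, the helper name importance_draws, and all tuning values (step size, number of draws) are hypothetical.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy data: x0 has a nonlinear effect, x1 is pure noise.
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=200)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(X, y)

def importance_draws(gp, X, j, n_draws=100, eps=1e-3):
    # Posterior draws of eta_j = E_X[(df/dx_j)^2], via central finite differences.
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[:, j] += eps
    X_minus[:, j] -= eps
    # Sample f jointly at the shifted inputs so each column is one posterior function.
    f = gp.sample_y(np.vstack([X_plus, X_minus]), n_samples=n_draws, random_state=0)
    n = X.shape[0]
    grads = (f[:n, :] - f[n:, :]) / (2.0 * eps)   # shape (n, n_draws)
    return (grads ** 2).mean(axis=0)              # one eta_j value per draw

for j in range(X.shape[1]):
    draws = importance_draws(gp, X, j)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"x{j}: eta_hat = {draws.mean():.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")

Finite differences stand in here for the closed-form derivatives of a Gaussian-process posterior that a full treatment would more likely use; the abstract also indicates the approach extends to non-differentiable learners such as tree ensembles, which this toy sketch does not attempt to show.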
Subjects--Index Terms: Gaussian Process Regression
MARC record:
LDR 04944nmm a2200397 4500
001 2393046
005 20231130111624.5
006 m o d
007 cr#unu||||||||
008 251215s2023 ||||||||||||||||| ||eng d
020 $a 9798379613006
035 $a (MiAaPQ)AAI30491637
035 $a AAI30491637
040 $a MiAaPQ $c MiAaPQ
100 1 $a Deng, Wenying. $0 (orcid)0000-0003-0671-0102 $3 3762488
245 1 2 $a A Gaussian-Process Framework for Nonlinear Statistical Inference Using Modern Machine Learning Models.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300 $a 213 p.
500 $a Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
500 $a Advisor: Coull, Brent.
502 $a Thesis (Ph.D.)--Harvard University, 2023.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0084.
650 4 $a Biostatistics. $3 1002712
650 4 $a Statistics. $3 517247
653 $a Gaussian Process Regression
653 $a Hypothesis testing
653 $a Machine learning
653 $a Statistical inference
653 $a Variable selection
690 $a 0308
690 $a 0463
690 $a 0800
710 2 $a Harvard University. $b Biostatistics. $3 2104931
773 0 $t Dissertations Abstracts International $g 84-12B.
790 $a 0084
791 $a Ph.D.
792 $a 2023
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30491637
Holdings (1 record):
Barcode: W9501366
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0