National Dong Hwa University Library

Flexible Bayesian Methods for High Dimensional Data.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Flexible Bayesian Methods for High Dimensional Data.
Author:	Saha, Enakshi.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2021
Pagination:	159 p.
Notes:	Source: Dissertations Abstracts International, Volume: 83-01, Section: B.
Contained By:	Dissertations Abstracts International, 83-01B.
Subject:	Statistics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28494838
ISBN:	9798516959271

Flexible Bayesian Methods for High Dimensional Data. - Ann Arbor : ProQuest Dissertations & Theses, 2021 - 159 p.

Source: Dissertations Abstracts International, Volume: 83-01, Section: B.

Thesis (Ph.D.)--The University of Chicago, 2021.

This item must not be sold to any third party vendors.

We study flexible Bayesian methods that are amenable to a wide range of learning problems involving complex high dimensional data structures, with minimal tuning. We consider parametric and semiparametric Bayesian models that are applicable to both static and dynamic data arising from areas such as economics, finance, and marketing. Special emphasis is placed on deriving probabilistic guarantees for these models that corroborate their strong empirical performance and can potentially provide insight into interesting avenues for future research.

Chapter 1 describes the broader theme of our research. We focus on two important domains of Bayesian statistics: Bayesian ensemble learning and latent factor models. As part of the first topic, we explore the theoretical properties and empirical adaptability of Bayesian trees and their additive ensembles, along with their multiple incarnations. In the second part of our research, we propose a sparse factor analysis model for dynamic data that is suitable for discovering latent structures in multivariate time series arising from a wide range of real-life applications.

Bayesian additive regression trees (BART) is an ensemble learning technique that has been adapted to a wide range of high dimensional learning tasks. In Chapter 2, we demonstrate that the BART model has a near-optimal posterior concentration rate when the underlying regression function is Hölder continuous. In Chapter 3, we demonstrate that this theoretical guarantee extends beyond the regression problem to encompass response variables belonging to the exponential family, thereby including variants of BART that are adaptable to other important applications, such as classification and count regression. We also prove that these results hold not only for Hölder continuous functions but also when the regression function is a step function or a monotone function.

In Chapter 4, we demonstrate the scope of BART for discrete choice modeling. We show that BART exhibits superior predictive accuracy on several benchmark datasets compared to some popular discrete choice models.

In Chapter 5, we propose a Bayesian sparse factor analysis model for high dimensional dynamic data. We address some important challenges that often hinder the practical deployment of many existing dynamic factor analysis tools. First, our model infers the number of latent factors from the data instead of fixing this number to a user-defined value. Moreover, both the number of latent factors and the factor loadings are allowed to vary over time. Second, we propose an EM implementation that requires minimal identification constraints and is considerably faster than the MCMC sampler in high dimensional applications. To demonstrate the efficacy of our model, we study a large-scale US macroeconomic dataset with a special focus on the 2008 financial crisis.

Finally, Chapter 6 concludes with a discussion of possible implications of our work and some promising future research directions.

ISBN: 9798516959271

Subjects--Topical Terms:

517247
Statistics.
Subjects--Index Terms:

Bayesian Statistics

LDR:04266nmm a2200397 4500
001 2342344
005 20220318093118.5
008 241004s2021 ||||||||||||||||| ||eng d
020 $a 9798516959271
035 $a (MiAaPQ)AAI28494838
035 $a AAI28494838
040 $a MiAaPQ $c MiAaPQ
100 1 $a Saha, Enakshi. $3 3680689
245 1 0 $a Flexible Bayesian Methods for High Dimensional Data.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 159 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-01, Section: B.
500 $a Advisor: Rockova, Veronika.
502 $a Thesis (Ph.D.)--The University of Chicago, 2021.
506 $a This item must not be sold to any third party vendors.
520 $a We study flexible Bayesian methods that are amenable to a wide range of learning problems involving complex high dimensional data structures, with minimal tuning. We consider parametric and semiparametric Bayesian models that are applicable to both static and dynamic data arising from areas such as economics, finance, and marketing. Special emphasis is placed on deriving probabilistic guarantees for these models that corroborate their strong empirical performance and can potentially provide insight into interesting avenues for future research. Chapter 1 describes the broader theme of our research. We focus on two important domains of Bayesian statistics: Bayesian ensemble learning and latent factor models. As part of the first topic, we explore the theoretical properties and empirical adaptability of Bayesian trees and their additive ensembles, along with their multiple incarnations. In the second part of our research, we propose a sparse factor analysis model for dynamic data that is suitable for discovering latent structures in multivariate time series arising from a wide range of real-life applications. Bayesian additive regression trees (BART) is an ensemble learning technique that has been adapted to a wide range of high dimensional learning tasks. In Chapter 2, we demonstrate that the BART model has a near-optimal posterior concentration rate when the underlying regression function is Hölder continuous. In Chapter 3, we demonstrate that this theoretical guarantee extends beyond the regression problem to encompass response variables belonging to the exponential family, thereby including variants of BART that are adaptable to other important applications, such as classification and count regression. We also prove that these results hold not only for Hölder continuous functions but also when the regression function is a step function or a monotone function. In Chapter 4, we demonstrate the scope of BART for discrete choice modeling. We show that BART exhibits superior predictive accuracy on several benchmark datasets compared to some popular discrete choice models. In Chapter 5, we propose a Bayesian sparse factor analysis model for high dimensional dynamic data. We address some important challenges that often hinder the practical deployment of many existing dynamic factor analysis tools. First, our model infers the number of latent factors from the data instead of fixing this number to a user-defined value. Moreover, both the number of latent factors and the factor loadings are allowed to vary over time. Second, we propose an EM implementation that requires minimal identification constraints and is considerably faster than the MCMC sampler in high dimensional applications. To demonstrate the efficacy of our model, we study a large-scale US macroeconomic dataset with a special focus on the 2008 financial crisis. Finally, Chapter 6 concludes with a discussion of possible implications of our work and some promising future research directions.
590 $a School code: 0330.
650 4 $a Statistics. $3 517247
650 4 $a Statistical physics. $3 536281
650 4 $a Artificial intelligence. $3 516317
650 4 $a Information science. $3 554358
650 4 $a Sparsity. $3 3680690
650 4 $a Research. $3 531893
650 4 $a Datasets. $3 3541416
650 4 $a Generalized linear models. $3 3561810
650 4 $a Feature selection. $3 3560270
650 4 $a Data analysis. $2 bisacsh $3 3515250
650 4 $a Time series. $3 3561811
650 4 $a Mathematical problems. $3 3680544
650 4 $a Survival analysis. $3 3566266
650 4 $a Competition. $3 537031
650 4 $a Simulation. $3 644748
650 4 $a Classification. $3 595585
650 4 $a Variables. $3 3548259
650 4 $a Trees. $3 516384
650 4 $a Algorithms. $3 536374
650 4 $a Popularity. $3 3564342
653 $a Bayesian Statistics
653 $a Factor Analysis
653 $a High Dimensional Data
653 $a Machine Learning
653 $a Posterior Concentration
653 $a Time Series Analysis
690 $a 0463
690 $a 0723
690 $a 0800
690 $a 0217
710 2 $a The University of Chicago. $b Statistics. $3 1673632
773 0 $t Dissertations Abstracts International $g 83-01B.
790 $a 0330
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28494838