Why do Gradient Methods Work in Optimization and Sampling?
Record Type:
Electronic resources : Monograph/item
Title/Author:
Why do Gradient Methods Work in Optimization and Sampling? / Chatterji, Niladri S.
Author:
Chatterji, Niladri S.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2021.
Description:
185 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Contained By:
Dissertations Abstracts International, 83-03B.
Subject:
Statistics.
Online resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28498319
ISBN:
9798535563213
Chatterji, Niladri S.
Why do Gradient Methods Work in Optimization and Sampling? - Ann Arbor : ProQuest Dissertations & Theses, 2021 - 185 p.
Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Thesis (Ph.D.)--University of California, Berkeley, 2021.
This item must not be sold to any third party vendors.
Modern machine learning models are complex, hierarchical, and large-scale and are trained using non-convex objective functions. The algorithms used to train these models, however, are incremental, first-order gradient-based algorithms like gradient descent and Langevin Monte Carlo. Why and when do these seemingly simple algorithms succeed? This question is the focus of this thesis. We will consider three problems. The first problem involves the training of deep neural network classifiers using the logistic loss function with gradient descent. We establish conditions under which gradient descent drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations to the ReLU activation function, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses. The second pertains to the problem of sampling from a strongly log-concave density. We provide an information theoretic lower bound on the number of stochastic gradient queries of the log density needed to generate a sample. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information theoretic limit for all of these algorithms. The final problem involves sampling from a distribution whose log-density is non-smooth. We show that a slight modification of the Langevin Monte Carlo algorithm can be used to generate samples from such distributions in polynomial time. We also provide non-asymptotic guarantees on the rate of convergence of this algorithm.
ISBN: 9798535563213
Subjects--Topical Terms: Statistics.
Subjects--Index Terms: Gradient descent
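The first problem described in the abstract above concerns gradient descent on the logistic loss for networks with smooth approximations to ReLU such as Swish. As a point of reference only, here is a minimal Python sketch of that kind of setup; it is not the construction analyzed in the thesis, and the toy data, two-layer architecture, width, step size, and iteration count are assumptions chosen purely for illustration.

import numpy as np

# Minimal illustrative sketch (not the thesis's construction): full-batch
# gradient descent on the logistic loss for a two-layer network with the
# Swish activation, a smooth approximation to ReLU mentioned in the abstract.
# The toy data, width, step size, and iteration count are assumptions.

rng = np.random.default_rng(0)

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def swish(z):
    return z * sigmoid(z)

def swish_grad(z):
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

# Linearly separable toy data with labels in {-1, +1}.
n, d, width = 200, 5, 64
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

W1 = rng.standard_normal((d, width)) / np.sqrt(d)    # input-to-hidden weights
w2 = rng.standard_normal(width) / np.sqrt(width)     # hidden-to-output weights

step = 0.1
for it in range(2000):
    Z = X @ W1                                   # pre-activations, shape (n, width)
    H = swish(Z)                                 # hidden activations
    f = H @ w2                                   # network outputs, shape (n,)
    margins = y * f
    loss = np.mean(np.logaddexp(0.0, -margins))  # logistic loss
    g = -y * sigmoid(-margins) / n               # d(loss)/d(f) per example
    grad_w2 = H.T @ g
    grad_W1 = X.T @ (g[:, None] * w2[None, :] * swish_grad(Z))
    W1 -= step * grad_W1
    w2 -= step * grad_w2
    if it % 500 == 0:
        print(f"iteration {it}: logistic loss = {loss:.4f}")

On separable toy data like this, the printed loss should decrease steadily; the thesis quantifies conditions and rates for that qualitative behavior in the deep-network setting.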
LDR 03035nmm a2200397 4500
001 2283822
005 20211115071656.5
008 220723s2021 ||||||||||||||||| ||eng d
020 $a 9798535563213
035 $a (MiAaPQ)AAI28498319
035 $a AAI28498319
040 $a MiAaPQ $c MiAaPQ
100 1 $a Chatterji, Niladri S. $3 3562864
245 1 0 $a Why do Gradient Methods Work in Optimization and Sampling?
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 185 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
500 $a Advisor: Bartlett, Peter L.; DeWeese, Michael R.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2021.
506 $a This item must not be sold to any third party vendors.
520 $a Modern machine learning models are complex, hierarchical, and large-scale and are trained using non-convex objective functions. The algorithms used to train these models, however, are incremental, first-order gradient-based algorithms like gradient descent and Langevin Monte Carlo. Why and when do these seemingly simple algorithms succeed? This question is the focus of this thesis. We will consider three problems. The first problem involves the training of deep neural network classifiers using the logistic loss function with gradient descent. We establish conditions under which gradient descent drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations to the ReLU activation function, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses. The second pertains to the problem of sampling from a strongly log-concave density. We provide an information theoretic lower bound on the number of stochastic gradient queries of the log density needed to generate a sample. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information theoretic limit for all of these algorithms. The final problem involves sampling from a distribution whose log-density is non-smooth. We show that a slight modification of the Langevin Monte Carlo algorithm can be used to generate samples from such distributions in polynomial time. We also provide non-asymptotic guarantees on the rate of convergence of this algorithm.
590 $a School code: 0028.
650 4 $a Statistics. $3 517247
650 4 $a Artificial intelligence. $3 516317
650 4 $a Computer science. $3 523869
650 4 $a Research. $3 531893
650 4 $a Internships. $3 3560137
650 4 $a Accuracy. $3 3559958
650 4 $a Random variables. $3 646291
650 4 $a Experiments. $3 525909
650 4 $a Optimization. $3 891104
650 4 $a Neural networks. $3 677449
650 4 $a Approximation. $3 3560410
650 4 $a Noise. $3 598816
650 4 $a Algorithms. $3 536374
650 4 $a Advisors. $3 3560734
650 4 $a Distance learning. $3 3557921
650 4 $a Physics. $3 516296
653 $a Gradient descent
653 $a Langevin Dynamics
653 $a Neural networks
653 $a Optimization
653 $a Sampling
653 $a Machine learning
690 $a 0463
690 $a 0800
690 $a 0984
690 $a 0605
710 2 $a University of California, Berkeley. $b Physics. $3 1671059
773 0 $t Dissertations Abstracts International $g 83-03B.
790 $a 0028
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28498319
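The abstract (MARC field 520 above) also concerns sampling with Langevin Monte Carlo, which queries gradients of the log-density. Below is a minimal sketch of the unadjusted Langevin update for a standard Gaussian target; the target density, step size, and iteration count are illustrative assumptions and are not taken from the thesis, which studies the harder lower-bound and non-smooth settings.

import numpy as np

# Minimal illustrative sketch (not taken from the thesis): the unadjusted
# Langevin Monte Carlo update
#     x_{k+1} = x_k - eta * grad_U(x_k) + sqrt(2 * eta) * N(0, I),
# which approximately samples from a density proportional to exp(-U(x)).
# The target (a standard Gaussian, U(x) = ||x||^2 / 2), step size, and
# iteration count are assumptions made only for illustration.

def grad_U(x):
    # Gradient of the negative log-density; equals x for a standard Gaussian.
    return x

def langevin_monte_carlo(x0, step=0.01, n_iters=5000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters,) + x.shape)
    for k in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

if __name__ == "__main__":
    chain = langevin_monte_carlo(np.zeros(2))
    # After a burn-in, the empirical moments should be close to those of N(0, I).
    print("mean:", chain[1000:].mean(axis=0), "variance:", chain[1000:].var(axis=0))

Each step moves against the gradient of the potential U and injects Gaussian noise scaled by sqrt(2 * step), so for small step sizes the chain's stationary distribution approximates exp(-U(x)).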
Items
Inventory Number: W9435555
Location Name: 電子資源 (electronic resources)
Item Class: 11.線上閱覽_V (online viewing)
Material type: 電子書 (e-book)
Call number: EB
Usage Class: 一般使用 (Normal)
Loan Status: On shelf
No. of reservations: 0