Why do Gradient Methods Work in Optimization and Sampling?
Record Type:
Electronic resources : Monograph/item
Title/Author:
Why do Gradient Methods Work in Optimization and Sampling? / Chatterji, Niladri S.
Author:
Chatterji, Niladri S.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2021.
Description:
185 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Contained By:
Dissertations Abstracts International, 83-03B.
Subject:
Statistics.
Online resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28498319
ISBN:
9798535563213
Chatterji, Niladri S.
Why do Gradient Methods Work in Optimization and Sampling? - Ann Arbor : ProQuest Dissertations & Theses, 2021 - 185 p.
Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Thesis (Ph.D.)--University of California, Berkeley, 2021.
This item must not be sold to any third party vendors.
Modern machine learning models are complex, hierarchical, and large-scale and are trained using non-convex objective functions. The algorithms used to train these models, however, are incremental, first-order gradient-based algorithms like gradient descent and Langevin Monte Carlo. Why and when do these seemingly simple algorithms succeed? This question is the focus of this thesis. We will consider three problems. The first problem involves the training of deep neural network classifiers using the logistic loss function with gradient descent. We establish conditions under which gradient descent drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations to the ReLU activation function, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses. The second pertains to the problem of sampling from a strongly log-concave density. We provide an information theoretic lower bound on the number of stochastic gradient queries of the log density needed to generate a sample. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information theoretic limit for all of these algorithms. The final problem involves sampling from a distribution whose log-density is non-smooth. We show that a slight modification of the Langevin Monte Carlo algorithm can be used to generate samples from such distributions in polynomial time. We also provide non-asymptotic guarantees on the rate of convergence of this algorithm.
ISBN: 9798535563213
Subjects--Topical Terms: Statistics.
Subjects--Index Terms: Gradient descent
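The first problem described in the abstract above concerns gradient descent on the logistic loss for networks with smooth approximations to ReLU such as Swish. As a point of reference only, here is a minimal Python sketch of that kind of setup; it is not the construction analyzed in the thesis, and the toy data, two-layer architecture, width, step size, and iteration count are assumptions chosen purely for illustration.

import numpy as np

# Minimal illustrative sketch (not the thesis's construction): full-batch
# gradient descent on the logistic loss for a two-layer network with the
# Swish activation, a smooth approximation to ReLU mentioned in the abstract.
# The toy data, width, step size, and iteration count are assumptions.

rng = np.random.default_rng(0)

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def swish(z):
    return z * sigmoid(z)

def swish_grad(z):
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

# Linearly separable toy data with labels in {-1, +1}.
n, d, width = 200, 5, 64
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

W1 = rng.standard_normal((d, width)) / np.sqrt(d)    # input-to-hidden weights
w2 = rng.standard_normal(width) / np.sqrt(width)     # hidden-to-output weights

step = 0.1
for it in range(2000):
    Z = X @ W1                                   # pre-activations, shape (n, width)
    H = swish(Z)                                 # hidden activations
    f = H @ w2                                   # network outputs, shape (n,)
    margins = y * f
    loss = np.mean(np.logaddexp(0.0, -margins))  # logistic loss
    g = -y * sigmoid(-margins) / n               # d(loss)/d(f) per example
    grad_w2 = H.T @ g
    grad_W1 = X.T @ (g[:, None] * w2[None, :] * swish_grad(Z))
    W1 -= step * grad_W1
    w2 -= step * grad_w2
    if it % 500 == 0:
        print(f"iteration {it}: logistic loss = {loss:.4f}")

On separable toy data like this, the printed loss should decrease steadily; the thesis quantifies conditions and rates for that qualitative behavior in the deep-network setting.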
LDR 03035nmm a2200397 4500
001 2283822
005 20211115071656.5
008 220723s2021 ||||||||||||||||| ||eng d
020 $a 9798535563213
035 $a (MiAaPQ)AAI28498319
035 $a AAI28498319
040 $a MiAaPQ $c MiAaPQ
100 1 $a Chatterji, Niladri S. $3 3562864
245 1 0 $a Why do Gradient Methods Work in Optimization and Sampling?
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 185 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
500 $a Advisor: Bartlett, Peter L.; DeWeese, Michael R.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2021.
506 $a This item must not be sold to any third party vendors.
520 $a Modern machine learning models are complex, hierarchical, and large-scale and are trained using non-convex objective functions. The algorithms used to train these models, however, are incremental, first-order gradient-based algorithms like gradient descent and Langevin Monte Carlo. Why and when do these seemingly simple algorithms succeed? This question is the focus of this thesis. We will consider three problems. The first problem involves the training of deep neural network classifiers using the logistic loss function with gradient descent. We establish conditions under which gradient descent drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations to the ReLU activation function, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses. The second pertains to the problem of sampling from a strongly log-concave density. We provide an information theoretic lower bound on the number of stochastic gradient queries of the log density needed to generate a sample. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information theoretic limit for all of these algorithms. The final problem involves sampling from a distribution whose log-density is non-smooth. We show that a slight modification of the Langevin Monte Carlo algorithm can be used to generate samples from such distributions in polynomial time. We also provide non-asymptotic guarantees on the rate of convergence of this algorithm.
590 $a School code: 0028.
650 4 $a Statistics. $3 517247
650 4 $a Artificial intelligence. $3 516317
650 4 $a Computer science. $3 523869
650 4 $a Research. $3 531893
650 4 $a Internships. $3 3560137
650 4 $a Accuracy. $3 3559958
650 4 $a Random variables. $3 646291
650 4 $a Experiments. $3 525909
650 4 $a Optimization. $3 891104
650 4 $a Neural networks. $3 677449
650 4 $a Approximation. $3 3560410
650 4 $a Noise. $3 598816
650 4 $a Algorithms. $3 536374
650 4 $a Advisors. $3 3560734
650 4 $a Distance learning. $3 3557921
650 4 $a Physics. $3 516296
653 $a Gradient descent
653 $a Langevin Dynamics
653 $a Neural networks
653 $a Optimization
653 $a Sampling
653 $a Machine learning
690 $a 0463
690 $a 0800
690 $a 0984
690 $a 0605
710 2 $a University of California, Berkeley. $b Physics. $3 1671059
773 0 $t Dissertations Abstracts International $g 83-03B.
790 $a 0028
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28498319
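The abstract (MARC field 520 above) also concerns sampling with Langevin Monte Carlo, which queries gradients of the log-density. Below is a minimal sketch of the unadjusted Langevin update for a standard Gaussian target; the target density, step size, and iteration count are illustrative assumptions and are not taken from the thesis, which studies the harder lower-bound and non-smooth settings.

import numpy as np

# Minimal illustrative sketch (not taken from the thesis): the unadjusted
# Langevin Monte Carlo update
#     x_{k+1} = x_k - eta * grad_U(x_k) + sqrt(2 * eta) * N(0, I),
# which approximately samples from a density proportional to exp(-U(x)).
# The target (a standard Gaussian, U(x) = ||x||^2 / 2), step size, and
# iteration count are assumptions made only for illustration.

def grad_U(x):
    # Gradient of the negative log-density; equals x for a standard Gaussian.
    return x

def langevin_monte_carlo(x0, step=0.01, n_iters=5000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters,) + x.shape)
    for k in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

if __name__ == "__main__":
    chain = langevin_monte_carlo(np.zeros(2))
    # After a burn-in, the empirical moments should be close to those of N(0, I).
    print("mean:", chain[1000:].mean(axis=0), "variance:", chain[1000:].var(axis=0))

Each step moves against the gradient of the potential U and injects Gaussian noise scaled by sqrt(2 * step), so for small step sizes the chain's stationary distribution approximates exp(-U(x)).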
Items
Inventory Number: W9435555
Location Name: 電子資源 (electronic resources)
Item Class: 11.線上閱覽_V (online viewing)
Material type: 電子書 (e-book)
Call number: EB
Usage Class: 一般使用 (Normal)
Loan Status: On shelf
No. of reservations: 0