Computational and Statistical Theories for Large-Scale Neural Networks.
Record type:
Bibliographic - electronic resource : Monograph/item
Title/Author:
Computational and Statistical Theories for Large-Scale Neural Networks./
Author:
Mei, Song.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2020
Pagination:
274 p.
Notes:
Source: Dissertations Abstracts International, Volume: 82-02, Section: B.
Contained By:
Dissertations Abstracts International, 82-02B.
Subject:
Computer engineering.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28103930
ISBN:
9798662510883
Computational and Statistical Theories for Large-Scale Neural Networks.
Mei, Song.
Computational and Statistical Theories for Large-Scale Neural Networks.
- Ann Arbor : ProQuest Dissertations & Theses, 2020 - 274 p.
Source: Dissertations Abstracts International, Volume: 82-02, Section: B.
Thesis (Ph.D.)--Stanford University, 2020.
This item must not be sold to any third party vendors.
Deep learning methods operate in regimes that defy the traditional computational and statistical mindset. Despite the non-convexity of empirical risks and the huge complexity of neural network architectures, stochastic gradient algorithms can often find an approximate global minimizer of the training loss and achieve small generalization error on test data. In recent years, an important research direction has been to explain theoretically this observed optimization efficiency and generalization efficacy of neural network systems. This thesis tackles these challenges in the model of two-layer neural networks by analyzing their computational and statistical properties in various scaling limits. On the computational side, we introduce two competing theories for neural network dynamics: the mean field theory and the tangent kernel theory. These two theories characterize the training dynamics of neural networks in different regimes that exhibit different behaviors. In the mean field framework, the training dynamics in the large-neuron limit is captured by a particular non-linear partial differential equation. This characterization allows us to prove global convergence of the dynamics in certain scenarios. In comparison, the tangent kernel theory characterizes the same dynamics in a different scaling limit and provides global convergence guarantees in more general scenarios. On the statistical side, we study the generalization properties of neural networks trained in the two regimes described above. We first show that, in the high-dimensional limit, neural tangent kernels are no better than polynomial regression, while neural networks trained in the mean field regime can potentially perform better. Next, we study more carefully the random features model, which is equivalent to a two-layer neural network in the kernel regime. We compute the precise asymptotics of its test error in the high-dimensional limit and confirm that it exhibits the interesting double-descent curve observed in experiments.
ISBN: 9798662510883
Subjects--Topical Terms:
Computer engineering.
Subjects--Index Terms:
Neural networks
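The abstract above refers to the random features model (a two-layer network whose first-layer weights are fixed at random and only the second layer is trained) and to the double-descent shape of its test error. Below is a minimal illustrative sketch of such an experiment, not code from the thesis: the dimensions, sample sizes, target function, and ridge penalty are assumptions chosen only to make the demo run quickly, and the precise high-dimensional asymptotics derived in the dissertation are not reproduced here.

```python
# Illustrative sketch (not from the thesis): random features ridge regression.
# Sweep the number of features N past the number of samples n and watch the
# test error around the interpolation threshold N = n.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test, lam = 20, 200, 2000, 1e-6   # assumed demo parameters

# Assumed ground-truth target: a simple noisy linear function of x.
beta = rng.normal(size=d) / np.sqrt(d)

def sample(m):
    X = rng.normal(size=(m, d)) / np.sqrt(d)
    y = X @ beta + 0.1 * rng.normal(size=m)
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(n_test)

def random_features_error(N):
    """Train ridge regression on N random ReLU features, return test MSE."""
    W = rng.normal(size=(d, N)) / np.sqrt(d)     # fixed random first layer
    Z_train = np.maximum(X_train @ W, 0.0)       # ReLU feature map
    Z_test = np.maximum(X_test @ W, 0.0)
    # Ridge solution for the second layer: a = (Z^T Z + lam I)^{-1} Z^T y
    a = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(N),
                        Z_train.T @ y_train)
    return np.mean((Z_test @ a - y_test) ** 2)

for N in [20, 50, 100, 150, 190, 200, 210, 250, 400, 800, 1600]:
    print(f"N = {N:5d}  test MSE = {random_features_error(N):.4f}")
# With lam close to zero, the test error typically spikes near N = n (the
# interpolation threshold) and then decreases again as N grows, giving the
# double-descent shape mentioned in the abstract.
```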
LDR 03180nmm a2200349 4500
001 2279765
005 20210823083439.5
008 220723s2020 ||||||||||||||||| ||eng d
020 $a 9798662510883
035 $a (MiAaPQ)AAI28103930
035 $a (MiAaPQ)STANFORDmm676zm0933
035 $a AAI28103930
040 $a MiAaPQ $c MiAaPQ
100 1 $a Mei, Song. $3 3558238
245 1 0 $a Computational and Statistical Theories for Large-Scale Neural Networks.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 274 p.
500 $a Source: Dissertations Abstracts International, Volume: 82-02, Section: B.
500 $a Advisor: Montanari, Andrea; Johnstone, Iain; Ying, Lexing.
502 $a Thesis (Ph.D.)--Stanford University, 2020.
506 $a This item must not be sold to any third party vendors.
520 $a Deep learning methods operate in regimes that defy the traditional computational and statistical mindset. Despite the non-convexity of empirical risks and the huge complexity of neural network architectures, stochastic gradient algorithms can often find an approximate global minimizer of the training loss and achieve small generalization error on test data. In recent years, an important research direction has been to explain theoretically this observed optimization efficiency and generalization efficacy of neural network systems. This thesis tackles these challenges in the model of two-layer neural networks by analyzing their computational and statistical properties in various scaling limits. On the computational side, we introduce two competing theories for neural network dynamics: the mean field theory and the tangent kernel theory. These two theories characterize the training dynamics of neural networks in different regimes that exhibit different behaviors. In the mean field framework, the training dynamics in the large-neuron limit is captured by a particular non-linear partial differential equation. This characterization allows us to prove global convergence of the dynamics in certain scenarios. In comparison, the tangent kernel theory characterizes the same dynamics in a different scaling limit and provides global convergence guarantees in more general scenarios. On the statistical side, we study the generalization properties of neural networks trained in the two regimes described above. We first show that, in the high-dimensional limit, neural tangent kernels are no better than polynomial regression, while neural networks trained in the mean field regime can potentially perform better. Next, we study more carefully the random features model, which is equivalent to a two-layer neural network in the kernel regime. We compute the precise asymptotics of its test error in the high-dimensional limit and confirm that it exhibits the interesting double-descent curve observed in experiments.
590 $a School code: 0212.
650 4 $a Computer engineering. $3 621879
650 4 $a Systems science. $3 3168411
650 4 $a Artificial intelligence. $3 516317
653 $a Neural networks
653 $a Tangent kernel theory
690 $a 0464
690 $a 0800
690 $a 0790
710 2 $a Stanford University. $3 754827
773 0 $t Dissertations Abstracts International $g 82-02B.
790 $a 0212
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28103930
Holdings (1 record)
Barcode: W9431498
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0