Optimization and High-Dimensional Loss Landscapes in Deep Learning.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Optimization and High-Dimensional Loss Landscapes in Deep Learning.
Author:
Larsen, Brett William.
Extent:
1 online resource (243 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 84-09, Section: B.
Contained by:
Dissertations Abstracts International, 84-09B.
Subject:
Propagation.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30306115 (click for full text, PQDT)
ISBN:
9798374478679
LDR  04456nmm a2200325K 4500
001  2358375
005  20230731112633.5
006  m o d
007  cr mn ---uuuuu
008  241011s2022 xx obm 000 0 eng d
020    $a 9798374478679
035    $a (MiAaPQ)AAI30306115
035    $a (MiAaPQ)STANFORDyj314kt7539
035    $a AAI30306115
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Larsen, Brett William. $3 3698907
245 10 $a Optimization and High-Dimensional Loss Landscapes in Deep Learning.
264  0 $c 2022
300    $a 1 online resource (243 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 84-09, Section: B.
500    $a Advisor: Druckmann, Shaul; Ganguli, Surya.
502    $a Thesis (Ph.D.)--Stanford University, 2022.
504    $a Includes bibliographical references.
520    $a Deep learning broadly refers to the practice of training a deep neural network architecture (i.e. a function approximator defined by a set of weights and biases) on a task using a set of training data. Through a remarkable set of engineering advances over the past decade, deep learning has achieved outstanding performance on a wide range of tasks. These include image classification (e.g. ResNets [62] and Vision Transformers [26] on ImageNet), image generation from text and vice versa (e.g. DALL-E [139] and CLIP [137]), strategic game play (e.g. AlphaGo [151] and CICERO), protein structure prediction (e.g. AlphaFold [75]), mathematical reasoning (e.g. Minerva [101]), and natural language modeling (e.g. GPT-3 [15], PaLM [20], Flan-T5 [21], Chinchilla [66], OPT [183], and ChatGPT). Despite deep learning's impressive success, many questions remain concerning how training such high-dimensional models behaves in practice and why it reliably produces useful networks. The goal of this thesis, then, is to contribute to the development of a principled understanding through the lens of the high-dimensional loss landscape.
       In this pursuit, we employ a scientific approach to deep learning in which we aim to emulate the discovery process taken in the natural sciences for understanding a complex system that we can empirically probe. (This approach has also been referred to as an "empirical theory of deep learning" [121] or simply the "science of deep learning.") As illustrated in Figure 1.1, there are conceptually two interacting pieces to this approach. The first is some complex system with a number of parameters that we can control while observing their effect on the system's output. In our case, we are probing deep neural networks, and we can control many different parameters related to the architecture, initialization, dataset, and training procedure (optimizer, loss, learning rate, etc.). The output we then observe is the performance of the neural network on a task, or any other function of the weights we want to measure.
       The second part is our mathematical framework, which represents our current understanding of the relation between these inputs and outputs in the complex system. The goal here is to provide insight at a level of abstraction that contributes to our understanding of the system. For neural networks, we have access to the configuration of all the weights at any time during training, but this tends to yield little understanding of the behaviors that occur in the system as a whole. This is analogous to statistical mechanics, in which it is far more useful to develop tools that help us understand emergent, high-level phenomena than it is to know the velocity and position of every particle in the system. Developing this framework will often necessitate approximations, but we aim to choose our approximations carefully to keep the framework grounded in the real behaviors presented by the system.
       The key to the scientific approach is that we keep these two parts closely interacting in a virtuous cycle. Our observations of the complex system provide feedback on the predictive success of our theory, revealing its current strengths and weaknesses, and identify new phenomena to understand.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538    $a Mode of access: World Wide Web.
650  4 $a Propagation. $3 3680519
650  4 $a Phase transitions. $3 3560387
650  4 $a Deep learning. $3 3554982
650  4 $a Algorithms. $3 536374
650  4 $a Success. $3 518195
650  4 $a Neural networks. $3 677449
655  7 $a Electronic books. $2 lcsh $3 542853
690    $a 0800
710 2  $a ProQuest Information and Learning Co. $3 783688
710 2  $a Stanford University. $3 754827
773 0  $t Dissertations Abstracts International $g 84-09B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30306115 $z click for full text (PQDT)
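The abstract above frames a trained network as an empirically probeable system: one controls inputs such as architecture, initialization, dataset, and training procedure, then measures the performance or any other function of the weights. As an illustration only (a minimal sketch in plain NumPy on synthetic data, not code from the thesis), the snippet below trains a tiny two-layer network by gradient descent and then evaluates the loss along a random direction in weight space, the simplest one-dimensional slice of a high-dimensional loss landscape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic regression task (for illustration only).
X = rng.normal(size=(256, 8))
y = np.sin(X @ rng.normal(size=8))

# A tiny two-layer network: a function approximator defined by weights and biases.
params = [rng.normal(scale=0.3, size=(8, 16)), np.zeros(16),   # W1, b1
          rng.normal(scale=0.3, size=(16, 1)), np.zeros(1)]    # W2, b2

def loss(ps):
    W1, b1, W2, b2 = ps
    h = np.tanh(X @ W1 + b1)         # hidden layer
    pred = (h @ W2 + b2).ravel()     # linear readout
    return np.mean((pred - y) ** 2)  # mean squared error

def loss_and_grad(ps):
    W1, b1, W2, b2 = ps
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    dpred = (2.0 * err / len(y))[:, None]  # dL/dpred, shape (256, 1)
    dh = (dpred @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
    return np.mean(err ** 2), [X.T @ dh, dh.sum(0), h.T @ dpred, dpred.sum(0)]

# Training procedure: plain gradient descent with a fixed learning rate,
# one of the controllable inputs the abstract mentions.
for step in range(500):
    _, grads = loss_and_grad(params)
    params = [p - 0.1 * g for p, g in zip(params, grads)]

# Probe: a 1-D slice of the loss landscape around the trained weights,
# L(alpha) = loss(theta_trained + alpha * d) for a fixed random direction d.
d = [rng.normal(size=p.shape) for p in params]
for alpha in np.linspace(-1.0, 1.0, 9):
    shifted = [p + alpha * di for p, di in zip(params, d)]
    print(f"alpha = {alpha:+.2f}   loss = {loss(shifted):.4f}")
```

Sweeping alpha traces out L(alpha) around the trained weights; richer probes (two-dimensional slices, curvature measurements, varying the dataset or optimizer) follow the same pattern of controlling inputs and measuring functions of the weights.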
Holdings (1 item)
Barcode: W9480731
Location: Electronic Resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0