Convex Optimization for Neural Networks.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Convex Optimization for Neural Networks.
Author:
Ergen, Tolga.
Description:
1 online resource (266 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
Contained By:
Dissertations Abstracts International, 85-03B.
Subject:
Neural networks.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30561729 (click for full text, PQDT)
ISBN:
9798380264563
Thesis (Ph.D.)--Stanford University, 2023.
Includes bibliographical references
Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical tuning to produce successful models and lack a clear theoretical foundation. In this thesis, we examine the use of convex optimization theory to improve the training of neural networks and to provide a better interpretation of their optimal weights. We focus on two-layer neural networks with piecewise linear activations and show that they can be formulated as finite-dimensional convex programs with a sparsity-promoting regularization term that is a variant of the group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite-width neural networks and then describe these architectures equivalently as high-dimensional convex models. Remarkably, the worst-case complexity of solving the convex program is polynomial in the number of samples and the number of neurons when the rank of the data matrix is bounded, which is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyperparameter tuning, unlike non-convex methods. Due to convexity, optimizer hyperparameters such as initialization, batch sizes, and step-size schedules have no effect on the final model. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters.

In the remaining parts of the thesis, we first extend the analysis to show that certain standard two- and three-layer Convolutional Neural Networks (CNNs) can be globally optimized in fully polynomial time. Unlike the fully connected networks studied in the first part, we prove that these equivalent characterizations of CNNs have fully polynomial complexity in all input dimensions without resorting to any approximation techniques, and therefore enjoy significant computational complexity improvements. We then discuss extensions of our convex analysis to various neural network architectures, including vector-output networks, batch normalization, Generative Adversarial Networks (GANs), deeper architectures, and threshold networks.
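[Editor's note] As a concrete illustration of the reformulation summarized in the abstract, the convex program below follows the formulation published in the author's related work on two-layer ReLU networks (Pilanci and Ergen, ICML 2020); the exact notation and scaling used in the thesis may differ. The non-convex, weight-decay-regularized training problem

\min_{\{(u_j,\alpha_j)\}_{j=1}^{m}} \; \frac{1}{2}\Big\| \sum_{j=1}^{m} (X u_j)_+ \,\alpha_j - y \Big\|_2^2 + \frac{\beta}{2} \sum_{j=1}^{m} \big( \|u_j\|_2^2 + \alpha_j^2 \big)

is equivalent, once the width m exceeds a finite critical value, to the finite-dimensional convex program

\min_{\{(v_i,w_i)\}_{i=1}^{P}} \; \frac{1}{2}\Big\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2 + \beta \sum_{i=1}^{P} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.} \quad (2D_i - I_n) X v_i \ge 0, \; (2D_i - I_n) X w_i \ge 0, \; i = 1,\dots,P,

where the diagonal matrices D_i = \mathrm{diag}(\mathbf{1}[X u \ge 0]) enumerate the distinct ReLU activation patterns of the data X, P is the number of such patterns, and the group-\ell_1 penalty \sum_i (\|v_i\|_2 + \|w_i\|_2) is the group-Lasso-style regularizer mentioned above. When the data matrix has fixed rank r, P grows only polynomially in the number of samples n, which is the source of the polynomial-complexity claim.

A minimal numerical sketch of how such a program can be handed to a standard convex solver is given below, using cvxpy and randomly subsampled activation patterns; the variable names, problem sizes, and subsampling strategy are illustrative assumptions, not code from the thesis.

# Sketch: group-Lasso-type convex program for a two-layer ReLU network,
# with randomly subsampled activation patterns (illustrative, not the thesis's code).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, P = 50, 10, 30          # samples, features, sampled activation patterns
beta = 1e-3                   # group-lasso regularization strength

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Sample diagonal arrangement matrices D_i = diag(1[X u_i >= 0]).
U = rng.standard_normal((d, P))
D = (X @ U >= 0).astype(float)    # shape (n, P); column i is the diagonal of D_i

V = cp.Variable((d, P))           # weights entering with positive output coefficient
W = cp.Variable((d, P))           # weights entering with negative output coefficient

fit = 0
constraints = []
for i in range(P):
    di = D[:, i]
    fit = fit + cp.multiply(di, X @ (V[:, i] - W[:, i]))
    # (2 D_i - I) X v >= 0 keeps each block consistent with its sampled ReLU pattern
    constraints += [cp.multiply(2 * di - 1, X @ V[:, i]) >= 0,
                    cp.multiply(2 * di - 1, X @ W[:, i]) >= 0]

objective = 0.5 * cp.sum_squares(fit - y) + beta * (
    cp.sum(cp.norm(V, 2, axis=0)) + cp.sum(cp.norm(W, 2, axis=0)))

prob = cp.Problem(cp.Minimize(objective), constraints)
prob.solve()    # any standard conic solver; no learning rate, batch size, or initialization to tune
print("optimal objective value:", prob.value)

Because the problem is convex, the solver returns a global optimum, which is what removes the dependence on optimizer hyperparameters described in the abstract.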
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2023.
Mode of access: World Wide Web.
Subjects--Topical Terms: Neural networks.
Index Terms--Genre/Form: Electronic books.
LDR  03721nmm a2200325K 4500
001  2363228
005  20231116093831.5
006  m o d
007  cr mn ---uuuuu
008  241011s2023 xx obm 000 0 eng d
020    $a 9798380264563
035    $a (MiAaPQ)AAI30561729
035    $a (MiAaPQ)STANFORDsv935mh9248
035    $a AAI30561729
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Ergen, Tolga. $3 3703982
245 10 $a Convex Optimization for Neural Networks.
264  0 $c 2023
300    $a 1 online resource (266 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
500    $a Advisor: Pilanci, Mert; Weissman, Tsachy; Boyd, Stephen.
502    $a Thesis (Ph.D.)--Stanford University, 2023.
504    $a Includes bibliographical references
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538    $a Mode of access: World Wide Web
650  4 $a Neural networks. $3 677449
655  7 $a Electronic books. $2 lcsh $3 542853
690    $a 0800
710 2  $a ProQuest Information and Learning Co. $3 783688
710 2  $a Stanford University. $3 754827
773 0  $t Dissertations Abstracts International $g 85-03B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30561729 $z click for full text (PQDT)
Items:
Inventory Number: W9485584
Location Name: Electronic resources (電子資源)
Item Class: 11. Online reading (11.線上閱覽_V)
Material Type: E-book (電子書)
Call Number: EB
Usage Class: Normal (一般使用)
Loan Status: On shelf
No. of Reservations: 0