Methods for Large Scale Nonlinear and Stochastic Optimization.
Record type: Bibliographic - electronic resource : Monograph/item
Title/Author: Methods for Large Scale Nonlinear and Stochastic Optimization. / Berahas, Albert S.
Author: Berahas, Albert S.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2018
Description: 227 p.
Note: Source: Dissertations Abstracts International, Volume: 79-10, Section: B.
Advisor: Nocedal, Jorge.
Institution: Northwestern University, Engineering Sciences and Applied Mathematics.
Language: English
Contained by: Dissertations Abstracts International, 79-10B.
Subjects: Applied Mathematics; Industrial engineering.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10749340
ISBN: 9780355824803
Dissertation note: Thesis (Ph.D.)--Northwestern University, 2018.
Restrictions on access: This item must not be sold to any third party vendors.
The goal of this thesis is to develop practical algorithms with sound theoretical properties for three different subfields of nonlinear optimization: (i) optimization algorithms for supervised machine learning; (ii) derivative-free optimization; and (iii) distributed optimization. As such, the thesis is divided into three main chapters.

The focus of Chapter 2 is on optimization methods for supervised machine learning. Section 2.1 describes a batch method that uses a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employs second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. We show how to perform stable quasi-Newton updating in the multi-batch setting, study the convergence properties for both convex and nonconvex functions, and illustrate the behavior of the algorithm on a distributed computing platform on logistic regression (binary classification) and neural network training tasks that arise in machine learning. In Section 2.2 we investigate the concepts of sketching and subsampling in conjunction with optimization. Specifically, we study Newton-Sketch and Subsampled Newton methods for the finite-sum optimization problem. We consider practical versions of the two methods in which the Newton equations are solved approximately using the conjugate gradient (CG) method or a stochastic gradient iteration. We establish new complexity results for the Subsampled Newton-CG method that exploit the spectral properties of CG. Controlled numerical experiments compare the relative strengths of Newton-Sketch and Subsampled Newton methods, and show that for many finite-sum problems they are far more efficient than popular first-order methods.

Minimizing a noisy black-box function with many variables remains an open problem in optimization. Current methodology is based on derivative-free methods designed for small-scale deterministic problems. In Chapter 3, we advance the state of the art by proposing a finite-difference quasi-Newton algorithm that can deal with stochastic noise and that scales into the thousands of variables. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval h based on an estimate of the noise level in the functions. This noise estimation procedure and the selection of h are inexpensive but not always accurate, and to prevent failures the algorithm incorporates a recovery procedure that takes appropriate action when the line search is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a model-based trust-region method are presented.
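The abstract does not spell out the adaptive choice of the differencing interval. As a rough, generic illustration of the trade-off it alludes to (not the procedure developed in the thesis), the classical forward-difference analysis balances truncation error against noise error; the helper names below (forward_diff_interval, fd_gradient) and the noisy quadratic test function are hypothetical.

```python
import numpy as np

def forward_diff_interval(noise_level, curvature_scale):
    # Classical rule of thumb: truncation error ~ (h/2)*|f''|, noise error ~ 2*eps_f/h;
    # their sum is minimized at h = 2*sqrt(eps_f / |f''|).  Generic illustration only.
    return 2.0 * np.sqrt(noise_level / max(curvature_scale, 1e-16))

def fd_gradient(f, x, h):
    # Forward-difference gradient estimate of a (possibly noisy) scalar function f.
    f0 = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f0) / h
    return g

# Hypothetical usage: a quadratic contaminated by additive noise of level ~1e-6.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(x @ x) + 1e-6 * rng.standard_normal()
x = np.ones(5)
h = forward_diff_interval(noise_level=1e-6, curvature_scale=1.0)
print(h, fd_gradient(f, x, h))
```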
Methods for distributed optimization have received significant attention in recent years owing to their wide applicability. A distributed optimization method typically consists of two key components: communication and computation. The standard way of judging an algorithm by the number of iterations alone overlooks the complexity associated with each iteration. Motivated by this discrepancy, in Chapter 4 we propose an adaptive cost framework that adjusts the cost measure depending on the features of various applications. We present a flexible algorithmic framework in which communication and computation steps are explicitly decomposed to enable algorithm customization for various applications. We apply this framework to the well-known distributed gradient descent (DGD) method and propose a class of customized algorithms that compare favorably to their base algorithms, both theoretically and empirically. Numerical experiments on quadratic functions and classification problems that arise in machine learning illustrate the performance and flexibility of the methods, as well as practical variants, in terms of iterations, gradient evaluations, communications, and the proposed cost framework.
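Since Chapter 4 builds on distributed gradient descent, the following is a minimal sketch of the base DGD iteration with its communication step (mixing with a doubly stochastic weight matrix W) and computation step (a local gradient step) written out separately. It is only a textbook baseline under assumed quadratic local objectives and uniform mixing weights, not the customized algorithms or the adaptive cost framework proposed in the thesis; the names dgd, grads, and W are hypothetical.

```python
import numpy as np

def dgd(grads, W, x0, alpha=0.1, iters=200):
    # Plain DGD: each agent mixes its iterate with its neighbours' (communication)
    # and then steps along its own local gradient (computation).  With a constant
    # step size the iterates reach a neighbourhood of the consensus minimizer.
    X = np.array(x0, dtype=float)             # row i holds agent i's local iterate
    for _ in range(iters):
        mixed = W @ X                          # communication: consensus averaging
        local = np.array([g(X[i]) for i, g in enumerate(grads)])
        X = mixed - alpha * local              # computation: local gradient step
    return X

# Hypothetical example: 3 agents with f_i(x) = 0.5*||x - c_i||^2, so the minimizer
# of the sum is the mean of the c_i; complete graph with uniform mixing weights.
c = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [lambda x, ci=ci: x - ci for ci in c]
W = np.full((3, 3), 1.0 / 3.0)
print(dgd(grads, W, np.zeros((3, 2))))
```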
Holdings (1 record)
Barcode: W9386863
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0