Second-Order Methods for Stochastic and Nonsmooth Optimization.
Record type:
Bibliographic record - electronic resource : Monograph/item
Title / Author:
Second-Order Methods for Stochastic and Nonsmooth Optimization.
Author:
Keskar, Nitish Shirish.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2017
Pagination:
120 p.
Notes:
Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
Contained by:
Dissertation Abstracts International, 78-10B(E).
Subject:
Operations research.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10265605
ISBN:
9781369817867
LDR  04301nmm a2200325 4500
001  2156370
005  20180517112609.5
008  190424s2017 ||||||||||||||||| ||eng d
020    $a 9781369817867
035    $a (MiAaPQ)AAI10265605
035    $a (MiAaPQ)northwestern:13629
035    $a AAI10265605
040    $a MiAaPQ $c MiAaPQ
100 1  $a Keskar, Nitish Shirish. $3 3344136
245 10 $a Second-Order Methods for Stochastic and Nonsmooth Optimization.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2017
300    $a 120 p.
500    $a Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
500    $a Advisers: Andreas Waechter; Jorge Nocedal.
502    $a Thesis (Ph.D.)--Northwestern University, 2017.
520    $a The goal of this thesis is to design practical algorithms for nonlinear optimization in the case when the objective function is stochastic or nonsmooth. The thesis is divided into three chapters. Chapter 1 describes an active-set method for the minimization of an objective function that is structurally nonsmooth, viz., it is the sum of a smooth convex function and an $\ell_1$-regularization term. Problems of this nature primarily arise, e.g., in machine learning, when sparse solutions are desired. A distinctive feature of the method is the way in which active-set identification and second-order subspace minimization steps are integrated to combine the predictive power of the two approaches. At every iteration, the algorithm selects a candidate set of free and fixed variables, performs an (inexact) subspace phase, and then assesses the quality of the new active set. If it is not judged to be acceptable, then the set of free variables is restricted and a new active-set prediction is made. We establish global convergence for our approach, and compare an implementation of the new method against state-of-the-art numerical codes to demonstrate its competitiveness.
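The $\ell_1$-regularized problems described in this abstract have the generic form min_x f(x) + lam*||x||_1. As an illustration of the problem class only, and not of the thesis's active-set algorithm, a minimal proximal-gradient (ISTA) sketch for a least-squares instance might look like this (all names here are illustrative assumptions):

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, iters=500):
    # Proximal-gradient (ISTA) baseline for min 0.5*||Ax - b||^2 + lam*||x||_1.
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the
    # smooth part's gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)          # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

Active-set approaches such as the one summarized above go further by exploiting the fact that the zero pattern of x stabilizes, so second-order steps can be restricted to the free variables.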
520    $a Chapter 2 outlines an algorithm for minimizing a continuous function that may be nonsmooth and nonconvex, subject to bound constraints. We propose an algorithm that uses the L-BFGS quasi-Newton approximation of the problem's curvature together with a variant of a weak Wolfe line search. The key ingredient of the method is an active-set selection strategy that defines the subspace in which search directions are computed. To overcome the inherent shortsightedness of the gradient for a nonsmooth function, we propose two strategies. The first relies on an approximation of the $\epsilon$-minimum norm subgradient, and the second uses an iterative corrective loop that augments the active set based on the resulting search directions. We describe a Python implementation of the proposed algorithm and present numerical results on a set of standard test problems to illustrate the efficacy of our approach.
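The "weak Wolfe" line search mentioned here requires only a sufficient-decrease condition plus a one-sided curvature condition, which makes it usable when the objective is nonsmooth. A minimal bisection version in the style of Lewis and Overton, shown as a sketch rather than the thesis's exact variant, is:

```python
import numpy as np

def weak_wolfe_search(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=50):
    # Bisection line search for the weak Wolfe conditions along direction d:
    #   f(x + t*d) <= f(x) + c1*t*g0   (sufficient decrease / Armijo)
    #   grad(x + t*d) @ d >= c2*g0     (weak, one-sided curvature)
    # where g0 = grad(x) @ d < 0 for a descent direction.
    g0 = grad(x) @ d
    lo, hi, t = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + t * d) > f(x) + c1 * t * g0:
            hi = t                         # Armijo failed: step too long
        elif grad(x + t * d) @ d < c2 * g0:
            lo = t                         # curvature failed: step too short
        else:
            return t                       # both weak Wolfe conditions hold
        t = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * t
    return t
```

Unlike the strong Wolfe conditions, the curvature bound here is one-sided, so the bisection never needs a derivative sign change and tolerates kinks in the objective.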
520    $a Chapter 3 investigates the gap in statistical generalization performance between large- and small-batch methods in the task of training state-of-the-art deep neural network models. The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say 32--512 data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the model, as measured by its ability to generalize. We investigate the cause for this generalization drop in the large-batch regime and present numerical evidence that supports the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions, and as is well known, sharp minima lead to poorer generalization. In contrast, small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to the inherent noise in the gradient estimation. We also discuss several strategies to attempt to help large-batch methods eliminate this generalization gap.
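The contrast between "sharp" and "flat" minimizers can be made concrete with a crude random-perturbation proxy. The metric studied in this chapter maximizes the loss over a neighborhood using an inexact solver; the sampling version below is only a sketch of the idea, and its name and normalization are illustrative assumptions:

```python
import numpy as np

def sharpness_proxy(loss, w_star, radius=1e-2, samples=200, seed=0):
    # Crude sharpness proxy: largest loss increase observed over random
    # perturbations of Euclidean norm `radius` around the minimizer w_star.
    rng = np.random.default_rng(seed)
    base = loss(w_star)
    worst = 0.0
    for _ in range(samples):
        d = rng.standard_normal(w_star.shape)
        d *= radius / np.linalg.norm(d)   # project onto the sphere of radius `radius`
        worst = max(worst, loss(w_star + d) - base)
    # Report a relative increase, mirroring the spirit of the metric.
    return worst / (1.0 + abs(base))
```

On this proxy a high-curvature basin scores higher than a shallow one, matching the observation that large-batch training tends to land in sharper basins that generalize worse.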
590    $a School code: 0163.
650  4 $a Operations research. $3 547123
650  4 $a Applied mathematics. $3 2122814
690    $a 0796
690    $a 0364
710 2  $a Northwestern University. $b Industrial Engineering and Management Sciences. $3 1023502
773 0  $t Dissertation Abstracts International $g 78-10B(E).
790    $a 0163
791    $a Ph.D.
792    $a 2017
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10265605
Holdings (1 item):
Barcode: W9355917
Location: Electronic resources
Circulation class: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0