Efficient and Scalable Optimization Methods for Training Large-Scale Machine Learning Models.
Record type: Bibliographic - electronic resource : Monograph/item
Title/Author: Efficient and Scalable Optimization Methods for Training Large-Scale Machine Learning Models.
Author: Jahani, Majid.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2021
Description: 232 p.
Note: Source: Dissertations Abstracts International, Volume: 82-12, Section: B.
Contained by: Dissertations Abstracts International, 82-12B.
Subject: Industrial engineering.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28496808
ISBN: 9798515201036
Jahani, Majid. Efficient and Scalable Optimization Methods for Training Large-Scale Machine Learning Models. - Ann Arbor : ProQuest Dissertations & Theses, 2021. - 232 p.
Source: Dissertations Abstracts International, Volume: 82-12, Section: B.
Thesis (Ph.D.)--Lehigh University, 2021.
This item must not be sold to any third party vendors.
Many important problems in machine learning (ML) and data science are formulated as optimization problems and solved using optimization algorithms. Given the scale of modern datasets and the ill-conditioned nature of real-world problems, powerful new algorithms are needed to address these issues. The goal of this thesis is to design and develop efficient and scalable optimization algorithms, with corresponding convergence guarantees, for the ill-conditioned problems that often arise in large-scale ML. The thesis contains three main parts spanning five chapters: Chapter 1 deals with novel accelerated gradient methods with optimality certificates, Chapters 2 and 3 present new quasi-Newton algorithms, and Chapters 4 and 5 deal with efficient and scalable distributed optimization algorithms.

The focus of Chapter 1 is the question of how to construct an appropriate sequence of lower bounds for strongly convex smooth functions and for strongly convex composite functions. We propose several first-order methods for minimizing strongly convex functions in both the smooth and composite cases, and show that all presented algorithms converge linearly, with the accelerated variants enjoying the optimal linear rate of convergence.

Chapter 2 is concerned with two sampled quasi-Newton methods: sampled LBFGS and sampled LSR1. In contrast to the classical variants of these methods, which sequentially build (inverse) Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate to produce these approximations. As a result, the constructed approximations use more reliable (recent and local) information and do not depend on past information that could be significantly stale.

Chapter 3 describes a new optimization algorithm that bridges the gap between first- and second-order methods by computing a search direction that uses a second-order-type update in one subspace, coupled with a scaled steepest-descent step in the orthogonal complement subspace. Partial curvature information is thus incorporated to help with ill-conditioning, while simultaneously allowing the algorithm to scale to the large problem dimensions often encountered in machine learning applications.

In Chapter 4, a novel approach is proposed in which the sample size is gradually increased to quickly obtain a solution whose empirical loss is within a satisfactory statistical accuracy. The proposed method is multistage: the solution of each stage serves as a warm start for the next stage, which contains more samples (including the samples from the previous stage). The multistage algorithm reduces the number of passes over the data needed to achieve the approximate statistical accuracy of the full training set. Moreover, the algorithm is naturally easy to distribute and exhibits the strong-scaling property, so additional computing nodes are expected to yield further acceleration.

Chapter 5 presents a scalable distributed implementation of the Sampled Limited-memory Symmetric Rank-1 (S-LSR1) algorithm. First, we show that a naive distributed implementation of S-LSR1 requires multiple rounds of expensive communication at every iteration and is thus inefficient. We then propose DS-LSR1, a communication-efficient variant that (i) drastically reduces the amount of data communicated at every iteration, (ii) has favorable workload balancing across nodes, and (iii) is matrix-free and inverse-free.
ISBN: 9798515201036
Subjects--Topical Terms: Industrial engineering.
Subjects--Index Terms: Deep learning
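The sampled quasi-Newton idea summarized in the abstract (Chapter 2) can be illustrated with a short sketch: instead of reusing curvature pairs accumulated from past iterates, the pairs are formed from points drawn at random around the current iterate. The helper below is only a minimal illustration of that construction, assuming a Gaussian sampling radius `r` and a user-supplied gradient function `grad`; the function name and parameters are hypothetical, and this is not the dissertation's exact algorithm.

```python
import numpy as np

def sampled_curvature_pairs(grad, x, m=10, r=0.1, seed=None):
    """Form m curvature pairs (S, Y) by sampling points around the current
    iterate x: s_i is a random displacement, y_i = grad(x + s_i) - grad(x).
    Illustrative sketch of the sampled-LBFGS/LSR1 idea, not the exact method."""
    rng = np.random.default_rng(seed)
    n = x.size
    g0 = grad(x)                      # gradient at the current iterate
    S = np.empty((n, m))
    Y = np.empty((n, m))
    for i in range(m):
        s = r * rng.standard_normal(n)    # random point sampled around x
        S[:, i] = s
        Y[:, i] = grad(x + s) - g0        # local curvature information
    return S, Y

# Toy check on a quadratic f(x) = 0.5 x^T A x, where grad(x) = A x,
# so the sampled pairs satisfy Y = A S exactly.
if __name__ == "__main__":
    A = np.diag([1.0, 10.0, 100.0])       # ill-conditioned toy Hessian
    grad = lambda x: A @ x
    S, Y = sampled_curvature_pairs(grad, np.ones(3), m=5, seed=0)
    print(np.allclose(Y, A @ S))          # True: pairs capture the curvature of A
```

In a full method, the resulting (S, Y) pairs would then be fed into standard limited-memory BFGS or SR1 update formulas to build the Hessian approximation used for the next step.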
LDR   04688nmm a2200373 4500
001   2282373
005   20211001100834.5
008   220723s2021 ||||||||||||||||| ||eng d
020     $a 9798515201036
035     $a (MiAaPQ)AAI28496808
035     $a AAI28496808
040     $a MiAaPQ $c MiAaPQ
100 1   $a Jahani, Majid. $3 3561176
245 1 0 $a Efficient and Scalable Optimization Methods for Training Large-Scale Machine Learning Models.
260 1   $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300     $a 232 p.
500     $a Source: Dissertations Abstracts International, Volume: 82-12, Section: B.
500     $a Advisor: Takac, Martin.
502     $a Thesis (Ph.D.)--Lehigh University, 2021.
506     $a This item must not be sold to any third party vendors.
520     $a Many important problems in machine learning (ML) and data science are formulated as optimization problems and solved using optimization algorithms. Given the scale of modern datasets and the ill-conditioned nature of real-world problems, powerful new algorithms are needed to address these issues. The goal of this thesis is to design and develop efficient and scalable optimization algorithms, with corresponding convergence guarantees, for the ill-conditioned problems that often arise in large-scale ML. The thesis contains three main parts spanning five chapters: Chapter 1 deals with novel accelerated gradient methods with optimality certificates, Chapters 2 and 3 present new quasi-Newton algorithms, and Chapters 4 and 5 deal with efficient and scalable distributed optimization algorithms. The focus of Chapter 1 is the question of how to construct an appropriate sequence of lower bounds for strongly convex smooth functions and for strongly convex composite functions. We propose several first-order methods for minimizing strongly convex functions in both the smooth and composite cases, and show that all presented algorithms converge linearly, with the accelerated variants enjoying the optimal linear rate of convergence. Chapter 2 is concerned with two sampled quasi-Newton methods: sampled LBFGS and sampled LSR1. In contrast to the classical variants of these methods, which sequentially build (inverse) Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate to produce these approximations. As a result, the constructed approximations use more reliable (recent and local) information and do not depend on past information that could be significantly stale. Chapter 3 describes a new optimization algorithm that bridges the gap between first- and second-order methods by computing a search direction that uses a second-order-type update in one subspace, coupled with a scaled steepest-descent step in the orthogonal complement subspace. Partial curvature information is thus incorporated to help with ill-conditioning, while simultaneously allowing the algorithm to scale to the large problem dimensions often encountered in machine learning applications. In Chapter 4, a novel approach is proposed in which the sample size is gradually increased to quickly obtain a solution whose empirical loss is within a satisfactory statistical accuracy. The proposed method is multistage: the solution of each stage serves as a warm start for the next stage, which contains more samples (including the samples from the previous stage). The multistage algorithm reduces the number of passes over the data needed to achieve the approximate statistical accuracy of the full training set. Moreover, the algorithm is naturally easy to distribute and exhibits the strong-scaling property, so additional computing nodes are expected to yield further acceleration. Chapter 5 presents a scalable distributed implementation of the Sampled Limited-memory Symmetric Rank-1 (S-LSR1) algorithm. First, we show that a naive distributed implementation of S-LSR1 requires multiple rounds of expensive communication at every iteration and is thus inefficient. We then propose DS-LSR1, a communication-efficient variant that (i) drastically reduces the amount of data communicated at every iteration, (ii) has favorable workload balancing across nodes, and (iii) is matrix-free and inverse-free.
590     $a School code: 0105.
650   4 $a Industrial engineering. $3 526216
650   4 $a Artificial intelligence. $3 516317
650   4 $a Applied mathematics. $3 2122814
653     $a Deep learning
653     $a Distributed optimization
653     $a High-performance computing
653     $a Machine learning
653     $a Nonlinear optimization
690     $a 0546
690     $a 0800
690     $a 0364
710 2   $a Lehigh University. $b Industrial Engineering. $3 2092122
773 0   $t Dissertations Abstracts International $g 82-12B.
790     $a 0105
791     $a Ph.D.
792     $a 2021
793     $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28496808
Holdings (1 item)

Barcode: W9434106
Location: Electronic resources
Circulation category: 11. Online reading (線上閱覽_V)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Reservations: 0