Data Reduction for Communication-Efficient Machine Learning.
Record type:
Bibliographic record - electronic resource : Monograph/item
Title/Author:
Data Reduction for Communication-Efficient Machine Learning.
Author:
Lu, Hanlin.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Description:
126 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Contained by:
Dissertations Abstracts International, 83-03B.
Subject:
Construction.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28841713
ISBN:
9798460447800
LDR 04365nmm a2200325 4500
001 2283887
005 20211115071711.5
008 220723s2021 ||||||||||||||||| ||eng d
020 $a 9798460447800
035 $a (MiAaPQ)AAI28841713
035 $a (MiAaPQ)PennState_23883hzl263
035 $a AAI28841713
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lu, Hanlin. $3 3562962
245 1 0 $a Data Reduction for Communication-Efficient Machine Learning.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 126 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
500 $a Advisor: He, Ting.
502 $a Thesis (Ph.D.)--The Pennsylvania State University, 2021.
506 $a This item must not be sold to any third party vendors.
520 $a In recent years, we have observed a dramatic growth of data generation in edge-based machine learning applications. Motivated by the need to solve machine learning problems over distributed datasets, we aim to reduce the size of the datasets while minimizing the resulting degradation in machine learning performance. A given dataset P can be represented as a data cube with three dimensions: cardinality n, number of features d, and number of precision bits b. In this dissertation, we explore data reduction techniques along these three dimensions and take three steps toward reducing the total size of the dataset. In the first step, we use coresets to reduce the cardinality of the collected dataset. A coreset is a small weighted dataset that acts as a proxy for the original dataset. However, existing coreset construction algorithms are each tailor-made for a specific machine learning problem, so different coresets must be constructed to support different machine learning models. We resolve this dilemma by developing robust coreset construction algorithms based on k-clustering, which provably give a guaranteed approximation for a broad range of machine learning problems with sufficiently continuous cost functions. In the second step, we propose the first framework that incorporates quantization into coreset construction. Specifically, we theoretically analyze the ML error caused by combining coreset construction with quantization and, based on this analysis, formulate an optimization problem that minimizes the ML error under a fixed communication budget. To improve scalability for large datasets, we identify two proxies of the original objective function for which efficient algorithms are developed. For data residing on multiple nodes, we further design a novel algorithm that allocates communication budgets across nodes while minimizing the overall ML error. In the third step, we consider solving edge-based k-means on a large, high-dimensional dataset, where data sources offload machine learning computation to nearby edge servers under limited communication budgets and computation power. We propose to construct small data summaries with fewer data samples (via Cardinality Reduction (CR)), fewer features (via Dimensionality Reduction (DR)), and fewer precision bits (via Quantization (QT)). By analyzing the complexity, communication cost, and approximation error of k-means algorithms built on state-of-the-art data reduction methods, we show that: (i) a near-optimal approximation can be achieved at near-linear complexity and constant communication cost; (ii) the order in which DR and CR are applied trades off complexity against communication cost; and (iii) combining DR/CR methods with a properly selected quantizer further reduces the communication cost without compromising the other performance metrics. In each step, the effectiveness of our analysis is verified through extensive experiments on multiple real datasets and machine learning problems. (A minimal illustrative sketch of this data-reduction pipeline is given after the MARC record below.)
590 $a School code: 0176.
650 4 $a Construction. $3 3561054
650 4 $a Cameras. $3 524039
650 4 $a Deep learning. $3 3554982
650 4 $a Datasets. $3 3541416
650 4 $a Communication. $3 524709
650 4 $a Power. $3 518736
650 4 $a Bandwidths. $3 3560998
650 4 $a Autonomous vehicles. $3 2179092
650 4 $a Optimization. $3 891104
650 4 $a Neural networks. $3 677449
650 4 $a Sensors. $3 3549539
650 4 $a Internet of Things. $3 3538511
650 4 $a Algorithms. $3 536374
650 4 $a Surveillance. $3 3559358
650 4 $a Computer science. $3 523869
653 $a Machine learning
690 $a 0459
690 $a 0984
710 2 $a The Pennsylvania State University. $3 699896
773 0 $t Dissertations Abstracts International $g 83-03B.
790 $a 0176
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28841713
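The dissertation abstract above (MARC field 520) outlines a three-step data-reduction pipeline: build a weighted coreset via k-clustering to shrink the cardinality n, quantize the summary to fewer precision bits b, and then solve k-means (or another ML task) on the small summary instead of the full dataset. The following is a minimal sketch of that idea, assuming standard NumPy and scikit-learn; the function names, parameters, and synthetic data are illustrative assumptions, not the dissertation's actual algorithms or code.

# Minimal sketch (not the dissertation's code): summarize a dataset by a
# k-clustering coreset, quantize the summary, then fit weighted k-means on it.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_coreset(P, k, seed=0):
    """Reduce cardinality n -> k: cluster centers serve as coreset points,
    each weighted by the number of original points it represents."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(P)
    weights = np.bincount(km.labels_, minlength=k).astype(float)
    return km.cluster_centers_, weights

def quantize(X, bits):
    """Reduce precision b -> `bits`: uniform per-feature quantization."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / (2 ** bits - 1), 1.0)
    return np.round((X - lo) / scale) * scale + lo

# Illustrative usage on synthetic data: only the small quantized summary
# (k points, 4 bits per coordinate, plus weights) would need to be sent to
# an edge server, which then fits k-means using the coreset weights.
rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 8))          # original dataset: n=10000, d=8
C, w = kmeans_coreset(P, k=200)           # cardinality reduction (CR)
Cq = quantize(C, bits=4)                  # quantization (QT)
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(Cq, sample_weight=w)
print(model.cluster_centers_.shape)       # (5, 8)

In the multi-node setting the abstract describes, each data source would send only its own quantized coreset and weights to the edge server, which merges the summaries before clustering.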
Holdings (1 record):
Barcode: W9435620
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0