Active Resource Partitioning and Planning for Storage Systems Using Time Series Forecasting and Machine Learning Techniques.
Record type:
Bibliographic record - Electronic resource : Monograph/item
Title/Author:
Active Resource Partitioning and Planning for Storage Systems Using Time Series Forecasting and Machine Learning Techniques.
Author:
Kachmar, Maher Amine.
Description:
1 online resource (98 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 83-02, Section: A.
Contained by:
Dissertations Abstracts International, 83-02A.
Subject:
Computer engineering.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28648537 (click for full text, PQDT)
ISBN:
9798534699203
Thesis (Ph.D.)--Northeastern University, 2021.
Includes bibliographical references
In today's enterprise storage systems, supported data services such as snapshot delete or drive rebuild can result in tremendous performance overhead if executed inline along with heavy foreground IO, often leading to missed Service Level Objectives (SLOs). Moreover, new classes of data services, such as thin provisioning, instant volume snapshots, and data reduction features, make capacity planning and drive wear-out prediction quite challenging. Having enough free storage pool capacity available ensures that the storage system operates in favorable conditions during heavy foreground IO cycles. This enables the storage system to defer background work to a future idle cycle. Static partitioning of storage system resources such as CPU cores or memory caches may lead to missed data reduction rate (DRR) guarantees. However, typical storage system applications such as Virtual Desktop Infrastructure (VDI) or web services follow a repetitive workload pattern that can be learned and forecasted. Learning these workload patterns allows us to address several storage system resource partitioning and planning challenges that cannot be overcome with traditional manual tuning and primitive feedback mechanisms.

First, we propose a priority-based background scheduler that learns this pattern and allows storage systems to maintain peak performance and meet SLOs while supporting a number of data services. When foreground IO demand intensifies, system resources are dedicated to servicing foreground IO requests. Any background processing that can be deferred is recorded and processed in future idle cycles, as long as our forecaster predicts that the storage pool has remaining capacity. A smart background scheduler can adopt a resource partitioning model that allows both foreground and background IO to execute together, as long as foreground IOs are not impacted, harnessing any free cycles to clear background debt. Using traces from VDI and web services applications, we show how our technique outperforms a static policy that sets fixed limits on the deferred background debt, reducing SLO violations from 54.6% (when using a fixed background debt watermark) to only 6.2% when employing our dynamic smart background scheduler.

Second, we propose a smart capacity planning and recommendation tool that ensures the right number of drives is available in the storage pool to meet both capacity and performance constraints without over-provisioning storage. Equipped with forecasting models that characterize workload patterns, we can predict future storage pool utilization and drive wear-out. Similarly, to meet SLOs, the tool recommends expanding pool space in order to defer more background work through larger debt bins. Overall, our capacity planning tool provides a day/hour countdown to the next Data Unavailability/Data Loss (DU/DL) event, accurately predicting DU/DL events over a future 12-hour time window.

Moreover, supported services such as data deduplication are becoming a common feature in the data center, especially as new storage technologies mature. Static partitioning of storage system resources, such as memory caches, may lead to missed SLOs, such as the data reduction rate (DRR) or IO latency. Lastly, we propose a Content-Aware Learning Cache (CALC) that uses online reinforcement learning models (Q-Learning, SARSA, and Actor-Critic) to actively partition the storage system cache between a deduplicated data digest cache, a content cache, and an address-based data cache, improving cache hit performance while maximizing data reduction rates. Using traces from popular storage applications, we show that our machine learning approach is robust and outperforms an iterative search method for various datasets and cache sizes. Our content-aware learning cache improves hit rates by 7.1% compared to iterative search methods, and by 18.2% compared to a traditional LRU-based data cache implementation.
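The Content-Aware Learning Cache idea in the abstract can be illustrated with a toy Q-learning loop that shifts cache capacity between the three partitions the abstract names (digest cache, content cache, address-based cache). Everything below — the state encoding, the action set, and the simulated reward — is an illustrative assumption for this sketch, not the dissertation's implementation; in a real system the reward would come from measured hit rates and data-reduction statistics gathered over an epoch.

```python
import random

PARTS = ("digest", "content", "address")
# Actions: move one 5% slice of capacity from `src` to `dst`, plus a no-op.
ACTIONS = [(dst, src) for dst in PARTS for src in PARTS if dst != src] + [(None, None)]
STEP = 0.05

def simulated_reward(split):
    # Stand-in for hit rate + data-reduction rate; this made-up workload
    # benefits most from a larger content cache.
    return 0.5 * split["content"] + 0.3 * split["digest"] + 0.2 * split["address"]

def partition_with_q_learning(epochs=2000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    split = {p: 1.0 / 3.0 for p in PARTS}   # start from an even partition
    q = {}                                   # (state, action index) -> value

    def state():
        # Discretize the current split so the Q-table stays small.
        return tuple(round(split[p], 2) for p in PARTS)

    for _ in range(epochs):
        s = state()
        if rng.random() < eps:               # explore
            a = rng.randrange(len(ACTIONS))
        else:                                # exploit best known action
            a = max(range(len(ACTIONS)), key=lambda i: q.get((s, i), 0.0))
        dst, src = ACTIONS[a]
        if dst is not None and split[src] - STEP > -1e-9:  # keep sizes non-negative
            split[dst] += STEP
            split[src] -= STEP
        r = simulated_reward(split)
        s2 = state()
        best_next = max(q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
        # Standard Q-learning update.
        q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))
    return split

if __name__ == "__main__":
    final = partition_with_q_learning()
    print({p: round(v, 2) for p, v in final.items()})
```

The SARSA and Actor-Critic variants the abstract mentions would reuse the same loop with a different update rule; the interesting design choice is that the agent repartitions online, so the split tracks the repetitive workload patterns the dissertation exploits rather than being fixed at configuration time.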
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2023.
Mode of access: World Wide Web.
ISBN: 9798534699203
Subjects--Topical Terms: Computer engineering.
Subjects--Index Terms: Priority-based background scheduler
Index Terms--Genre/Form: Electronic books.
LDR
:05474nmm a2200397K 4500
001
2357220
005
20230622065019.5
006
m o d
007
cr mn ---uuuuu
008
241011s2021 xx obm 000 0 eng d
020
$a
9798534699203
035
$a
(MiAaPQ)AAI28648537
035
$a
AAI28648537
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Kachmar, Maher Amine.
$3
3697750
245
1 0
$a
Active Resource Partitioning and Planning for Storage Systems Using Time Series Forecasting and Machine Learning Techniques.
264
0
$c
2021
300
$a
1 online resource (98 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertations Abstracts International, Volume: 83-02, Section: A.
500
$a
Advisor: Kaeli, David.
502
$a
Thesis (Ph.D.)--Northeastern University, 2021.
504
$a
Includes bibliographical references
520
$a
In today's enterprise storage systems, supported data services such as snapshot delete or drive rebuild can result in tremendous performance overhead if executed inline along with heavy foreground IO, often leading to missed Service Level Objectives (SLOs). Moreover, new classes of data services, such as thin provisioning, instant volume snapshots, and data reduction features, make capacity planning and drive wear-out prediction quite challenging. Having enough free storage pool capacity available ensures that the storage system operates in favorable conditions during heavy foreground IO cycles. This enables the storage system to defer background work to a future idle cycle. Static partitioning of storage system resources such as CPU cores or memory caches may lead to missed data reduction rate (DRR) guarantees. However, typical storage system applications such as Virtual Desktop Infrastructure (VDI) or web services follow a repetitive workload pattern that can be learned and forecasted. Learning these workload patterns allows us to address several storage system resource partitioning and planning challenges that cannot be overcome with traditional manual tuning and primitive feedback mechanisms. First, we propose a priority-based background scheduler that learns this pattern and allows storage systems to maintain peak performance and meet SLOs while supporting a number of data services. When foreground IO demand intensifies, system resources are dedicated to servicing foreground IO requests. Any background processing that can be deferred is recorded and processed in future idle cycles, as long as our forecaster predicts that the storage pool has remaining capacity. A smart background scheduler can adopt a resource partitioning model that allows both foreground and background IO to execute together, as long as foreground IOs are not impacted, harnessing any free cycles to clear background debt.
Using traces from VDI and web services applications, we show how our technique outperforms a static policy that sets fixed limits on the deferred background debt, reducing SLO violations from 54.6% (when using a fixed background debt watermark) to only 6.2% when employing our dynamic smart background scheduler. Second, we propose a smart capacity planning and recommendation tool that ensures the right number of drives is available in the storage pool to meet both capacity and performance constraints without over-provisioning storage. Equipped with forecasting models that characterize workload patterns, we can predict future storage pool utilization and drive wear-out. Similarly, to meet SLOs, the tool recommends expanding pool space in order to defer more background work through larger debt bins. Overall, our capacity planning tool provides a day/hour countdown to the next Data Unavailability/Data Loss (DU/DL) event, accurately predicting DU/DL events over a future 12-hour time window. Moreover, supported services such as data deduplication are becoming a common feature in the data center, especially as new storage technologies mature. Static partitioning of storage system resources, such as memory caches, may lead to missed SLOs, such as the data reduction rate (DRR) or IO latency. Lastly, we propose a Content-Aware Learning Cache (CALC) that uses online reinforcement learning models (Q-Learning, SARSA, and Actor-Critic) to actively partition the storage system cache between a deduplicated data digest cache, a content cache, and an address-based data cache, improving cache hit performance while maximizing data reduction rates. Using traces from popular storage applications, we show that our machine learning approach is robust and outperforms an iterative search method for various datasets and cache sizes.
Our content-aware learning cache improves hit rates by 7.1% compared to iterative search methods, and by 18.2% compared to a traditional LRU-based data cache implementation.
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2023
538
$a
Mode of access: World Wide Web
650
4
$a
Computer engineering.
$3
621879
650
4
$a
Artificial intelligence.
$3
516317
650
4
$a
Electrical engineering.
$3
649834
650
4
$a
Agreements.
$3
3559354
650
4
$a
Violations.
$3
3558956
650
4
$a
Metadata.
$3
590006
650
4
$a
Datasets.
$3
3541416
650
4
$a
Response time.
$3
3562212
650
4
$a
Forecasting.
$3
547120
650
4
$a
Iterative methods.
$3
3686120
650
4
$a
Dissertations & theses.
$3
3560115
650
4
$a
Data analysis.
$2
bisacsh
$3
3515250
650
4
$a
Time series.
$3
3561811
650
4
$a
Data compression.
$3
3681696
650
4
$a
Sanitation services.
$3
3560997
650
4
$a
Data integrity.
$3
2142314
650
4
$a
Neural networks.
$3
677449
650
4
$a
Design.
$3
518875
650
4
$a
Methods.
$3
3560391
650
4
$a
Algorithms.
$3
536374
653
$a
Priority-based background scheduler
653
$a
Smart capacity planning and recommendation tool
653
$a
Content-Aware Learning Cache
653
$a
Service level objectives
655
7
$a
Electronic books.
$2
lcsh
$3
542853
690
$a
0464
690
$a
0544
690
$a
0800
690
$a
0389
710
2
$a
ProQuest Information and Learning Co.
$3
783688
710
2
$a
Northeastern University.
$b
Electrical and Computer Engineering.
$3
1018491
773
0
$t
Dissertations Abstracts International
$g
83-02A.
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28648537
$z
click for full text (PQDT)
Holdings
Barcode: W9479576
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0