National Dong Hwa University Library

Automated Runtime Data Analysis for System Reliability Management.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Automated Runtime Data Analysis for System Reliability Management.
Author:	He, Pinjia.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2018.
Extent:	197 p.
Note:	Source: Dissertations Abstracts International, Volume: 79-12, Section: B.
Contained By:	Dissertations Abstracts International 79-12B.
Subject:	Systems science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10902155
ISBN:	9780438105812


Automated Runtime Data Analysis for System Reliability Management. - Ann Arbor : ProQuest Dissertations & Theses, 2018 - 197 p.


Thesis (Ph.D.)--The Chinese University of Hong Kong (Hong Kong), 2018.

This item must not be sold to any third party vendors.

Runtime data are data generated by systems or programs during their execution. Typical runtime data include system logs and Quality-of-Service (QoS) values, which developers widely use in system reliability management tasks such as anomaly detection, operational issue handling, and performance prediction. However, traditional reliability management methods have become inefficient and error-prone as modern systems grow more complex and the volume of runtime data grows rapidly. In this thesis, we propose automated data analysis methods that effectively utilize runtime data in reliability management tasks.
Firstly, we conduct an evaluation study of existing data-driven log parsing methods. Log parsing is the first step of many log-based reliability management methods: it transforms unstructured raw log messages into structured event sequences. Although log parsing has been widely studied, a comprehensive benchmark and an open-source toolkit have been lacking. We implement four representative log parsing methods, evaluate their accuracy, efficiency, and effectiveness on reliability management tasks, report six insightful findings, and release the implementations as open source for reuse.
Secondly, we propose a parallel log parsing method for large-scale log data analysis. When system logs grow to a large scale, existing log parsing methods fail to complete within a reasonable time, making log parsing the bottleneck of reliability management tasks. Because timely reliability management is important, an efficient log parsing method that can accurately parse large-scale log data is in high demand. Our proposed parallel log parser, POP, employs specially designed heuristic rules and a clustering algorithm, and it is optimized on top of Spark, a large-scale data processing platform. POP can therefore use the computing power of computer clusters to handle large-scale logs efficiently.
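
As an illustration of the input and output of log parsing described above, the following Python sketch turns raw messages into event templates and an event-ID sequence. It is not one of the four evaluated methods, nor POP: the masking rules in `VARIABLE_PATTERNS`, the `<*>` wildcard, and the `E1`, `E2`, ... event IDs are assumptions chosen for this example, and the approach only groups messages whose masked text is identical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical masking rules: a token matching any pattern is treated as a variable.
VARIABLE_PATTERNS = [
    re.compile(r"^\d+(\.\d+){3}(:\d+)?$"),  # IPv4 address with optional port
    re.compile(r"^0x[0-9a-fA-F]+$"),        # hexadecimal value
    re.compile(r"^-?\d+(\.\d+)?$"),         # integer or decimal number
]


def to_template(message: str) -> str:
    """Replace variable-looking tokens with the wildcard <*>."""
    return " ".join(
        "<*>" if any(p.match(tok) for p in VARIABLE_PATTERNS) else tok
        for tok in message.split()
    )


def parse_logs(messages: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map each raw message to an event ID; return (event ID -> template, event sequence)."""
    template_to_id: Dict[str, str] = {}
    sequence: List[str] = []
    for msg in messages:
        template = to_template(msg)
        if template not in template_to_id:
            # A template not seen before becomes a new event type.
            template_to_id[template] = f"E{len(template_to_id) + 1}"
        sequence.append(template_to_id[template])
    return {eid: tpl for tpl, eid in template_to_id.items()}, sequence


if __name__ == "__main__":
    raw = [
        "Connection from 10.0.0.1 closed after 35 ms",
        "Connection from 10.0.0.2 closed after 120 ms",
        "Disk quota exceeded for user 1001",
    ]
    events, seq = parse_logs(raw)
    print(events)  # {'E1': 'Connection from <*> closed after <*> ms', 'E2': 'Disk quota exceeded for user <*>'}
    print(seq)     # ['E1', 'E1', 'E2']
```
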
Thirdly, we propose an online log parsing method that parses raw log messages in a streaming manner. Most existing log parsing methods focus on offline, batch processing of logs. However, log collection in modern systems is typically online, which makes an online log parser more suitable than offline ones. In addition, an online log parsing method can keep updating its parsing model with newly collected log messages. Using a fixed-depth parse tree, our proposed online log parsing method parses log messages efficiently in a streaming manner.
Fourthly, we propose an operational issue prioritization method based on hierarchical log clustering. Developers of modern systems handle issues reported by their users daily. To gain insight into these issues and find solutions, they often need to inspect large volumes of logs generated during system runtime. Our proposed method substantially facilitates operational issue handling by clustering similar issues into the same group based on their corresponding log sequences and recommending the largest issue groups to developers. Specifically, our method consists of a coarse-grained clustering based on the event appearance matrix and a fine-grained clustering based on the event count matrix.
Lastly, we propose a QoS prediction method for Web service recommendation. A typical modern system built on Web services needs to regularly switch its service components based on their QoS values (e.g., response time) to avoid potential system failures and maintain system performance. However, it is difficult for service users to monitor the QoS values of all candidate services. To predict these QoS values accurately, our proposed QoS prediction method applies matrix factorization to the existing sparse QoS values, and it encodes the locations of service users and providers in the matrix factorization model to improve prediction accuracy.
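
The online parser described above relies on a fixed-depth parse tree. The Python sketch below assumes a Drain-style layout (root, then token count, then first token, then a leaf list of log groups) and a token-level similarity threshold; the class names, the default threshold of 0.5, and the first-token routing rule are hypothetical choices for illustration, not the thesis's parameters. Because the depth is fixed, routing a message costs the same two lookups however many messages have been parsed; only the scan of the leaf list grows with the number of groups.

```python
from typing import Dict, List, Optional

WILDCARD = "<*>"


class LogGroup:
    """A leaf entry: an event template plus the IDs of the messages assigned to it."""

    def __init__(self, tokens: List[str], group_id: int) -> None:
        self.template = list(tokens)
        self.group_id = group_id
        self.members: List[int] = []


class FixedDepthParser:
    """Online parser with a fixed-depth tree: root -> token count -> first token -> groups."""

    def __init__(self, sim_threshold: float = 0.5) -> None:
        self.sim_threshold = sim_threshold  # hypothetical default, not from the thesis
        self.tree: Dict[int, Dict[str, List[LogGroup]]] = {}
        self.groups: List[LogGroup] = []

    @staticmethod
    def _route_key(tokens: List[str]) -> str:
        # A first token containing digits is likely a variable, so it goes to a shared branch.
        if not tokens:
            return ""
        first = tokens[0]
        return WILDCARD if any(ch.isdigit() for ch in first) else first

    @staticmethod
    def _similarity(template: List[str], tokens: List[str]) -> float:
        # Fraction of positions where the message token equals the template token.
        if not tokens:
            return 1.0
        same = sum(1 for a, b in zip(template, tokens) if a == b)
        return same / len(tokens)

    def parse(self, message_id: int, message: str) -> int:
        """Assign one raw message to a group, creating a new group or generalizing a template."""
        tokens = message.split()
        # Two fixed internal levels: token count, then first token.
        leaf = self.tree.setdefault(len(tokens), {}).setdefault(self._route_key(tokens), [])
        best: Optional[LogGroup] = None
        best_sim = -1.0
        for group in leaf:
            sim = self._similarity(group.template, tokens)
            if sim > best_sim:
                best, best_sim = group, sim
        if best is None or best_sim < self.sim_threshold:
            best = LogGroup(tokens, len(self.groups))
            leaf.append(best)
            self.groups.append(best)
        else:
            # Positions where the message disagrees with the template become wildcards.
            best.template = [t if t == tok else WILDCARD for t, tok in zip(best.template, tokens)]
        best.members.append(message_id)
        return best.group_id


if __name__ == "__main__":
    parser = FixedDepthParser()
    stream = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Disk quota exceeded for user 1001",
        "Connection from 10.0.0.3 closed",
    ]
    for i, line in enumerate(stream):
        parser.parse(i, line)
    for g in parser.groups:
        print(g.group_id, " ".join(g.template), g.members)
    # 0 Connection from <*> closed [0, 1, 3]
    # 1 Disk quota exceeded for user 1001 [2]
```

Message 2 keeps the literal token `1001` in its template, because in this sketch a template is generalized only after a second similar message arrives in the same leaf.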
In summary, this thesis targets the design of data-driven techniques that use system runtime data to automate labor-intensive reliability management tasks. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed methods.
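
The operational issue prioritization method summarized above clusters issues using an event appearance matrix (coarse-grained) and an event count matrix (fine-grained), then recommends the largest groups. The thesis's actual clustering algorithms are not given here, so this Python sketch makes two labeled assumptions: the coarse step groups issues with identical appearance rows, and the fine step uses a simple leader clustering on count vectors with a hypothetical Euclidean threshold `max_dist`. It shows only how the two matrices feed a two-level grouping and a size-based ranking.

```python
import math
from typing import Dict, List, Sequence, Tuple


def event_matrices(
    sequences: Sequence[Sequence[str]], events: Sequence[str]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (appearance, counts): one row per issue, one column per event."""
    column = {event: j for j, event in enumerate(events)}
    counts = [[0] * len(events) for _ in sequences]
    for i, seq in enumerate(sequences):
        for event in seq:
            counts[i][column[event]] += 1
    # Appearance matrix: 1 if the event occurs at least once in the issue's log sequence.
    appearance = [[1 if c > 0 else 0 for c in row] for row in counts]
    return appearance, counts


def prioritize(
    sequences: Sequence[Sequence[str]], events: Sequence[str], max_dist: float = 2.0
) -> List[List[int]]:
    """Group issues by their log sequences and return the groups, largest first."""
    appearance, counts = event_matrices(sequences, events)

    # Coarse-grained step (simplifying assumption): identical appearance rows share a group.
    coarse: Dict[Tuple[int, ...], List[int]] = {}
    for i, row in enumerate(appearance):
        coarse.setdefault(tuple(row), []).append(i)

    # Fine-grained step (leader clustering, swapped in for illustration): an issue joins the
    # first cluster whose leader's count vector is within max_dist (Euclidean distance).
    groups: List[List[int]] = []
    for members in coarse.values():
        clusters: List[List[int]] = []
        for i in members:
            for cluster in clusters:
                if math.dist(counts[i], counts[cluster[0]]) <= max_dist:
                    cluster.append(i)
                    break
            else:
                clusters.append([i])
        groups.extend(clusters)

    # Largest groups first; sorted() is stable, so equal-sized groups keep discovery order.
    return sorted(groups, key=len, reverse=True)


if __name__ == "__main__":
    events = ["E1", "E2", "E3"]
    issues = [
        ["E1", "E2"],
        ["E1", "E2", "E2"],
        ["E1", "E3"],
        ["E1", "E2"],
        ["E1"] + ["E2"] * 6,
    ]
    print(prioritize(issues, events))  # [[0, 1, 3], [4], [2]]
```

Issue 4 shares the appearance pattern of issues 0, 1, and 3 but logs `E2` far more often, so the fine-grained step separates it from them.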

ISBN: 9780438105812
Subjects--Topical Terms: Systems science.

LDR 05238nmm a2200325 4500
001 2207954
005 20190929184018.5
008 201008s2018 ||||||||||||||||| ||eng d
020 $a 9780438105812
035 $a (MiAaPQ)AAI10902155
035 $a AAI10902155
040 $a MiAaPQ $c MiAaPQ
100 1 $a He, Pinjia. $3 3434952
245 1 0 $a Automated Runtime Data Analysis for System Reliability Management.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 197 p.
500 $a Source: Dissertations Abstracts International, Volume: 79-12, Section: B.
500 $a Publisher info.: Dissertation/Thesis.
502 $a Thesis (Ph.D.)--The Chinese University of Hong Kong (Hong Kong), 2018.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
590 $a School code: 1307.
650 4 $a Systems science. $3 3168411
650 4 $a Artificial intelligence. $3 516317
650 4 $a Computer science. $3 523869
690 $a 0790
690 $a 0800
690 $a 0984
710 2 $a The Chinese University of Hong Kong (Hong Kong). $b Computer Science and Engineering. $3 3428136
773 0 $t Dissertations Abstracts International $g 79-12B.
790 $a 1307
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10902155