Data-driven approaches to improve dependability of cloud services.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Data-driven approaches to improve dependability of cloud services.
Author:
Potharaju, Rahul.
Description:
158 p.
Notes:
Source: Dissertation Abstracts International, Volume: 76-02(E), Section: B.
Contained By:
Dissertation Abstracts International 76-02B(E).
Subject:
Computer Science.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3636490
ISBN:
9781321181326
Thesis (Ph.D.)--Purdue University, 2014.
This item must not be sold to any third party vendors.
The growing demand for always-on and low-latency cloud services is driving the creation of globally distributed datacenters. A major factor affecting service availability is the reliability of the network, both inside the datacenters and on the wide-area links connecting them. While several research efforts focus on building scale-out datacenter networks, little has been reported on real network failures and how they impact geo-distributed services. Towards improving the dependability of the underlying datacenter networks, in this dissertation we make one of the first attempts to characterize intra-datacenter and inter-datacenter network failures from a service perspective. Specifically, we make the following contributions:
1. Analysis Methodology for Structured Data: Our dataset includes multiple sources of structured network telemetry data spanning three years, logged in monitoring servers of a large cloud provider comprising 100k+ servers, 10k+ core network devices, 2k+ middleboxes and 100k+ network links across 10+ datacenters. This dataset covers a wide range of network data sources, including syslog and SNMP alerts, and traffic carried by links. To this end, we describe a systematic methodology for analyzing this structured data based on event processing to extract events having service-level impact.
2. Analysis Methodology for Unstructured Data: Our dataset also includes an important piece of operational knowledge: network trouble tickets, which are diaries written by network operators to keep track of their troubleshooting efforts while fixing a problem. To this end, we take a practical step towards automatically analyzing natural language text in network trouble tickets to infer the problem symptoms, troubleshooting activities and resolution actions. Our system, NetSieve, combines statistical natural language processing (NLP), knowledge representation, and ontology modeling to achieve these goals.
3. Data-Driven Approaches to Deriving Actionable Insights: Our overarching goal in this dissertation is to enable operators to understand global problem trends instead of making decisions based on isolated incidents. We outline several analyses rooted in reliability analysis and applied statistics for characterizing network failures and deriving actionable insights from them. Our study reveals several important findings on (a) the failure characteristics of network elements, (b) the availability of network domains, (c) service impact, (d) causes of network failures, (e) effectiveness of repairs, and (f) modeling failures.
LDR 04889nmm a2200325 4500
001 2056296
005 20150505071908.5
008 170521s2014 ||||||||||||||||| ||eng d
020 $a 9781321181326
035 $a (MiAaPQ)AAI3636490
035 $a AAI3636490
040 $a MiAaPQ $c MiAaPQ
100 1 $a Potharaju, Rahul. $3 3170043
245 1 0 $a Data-driven approaches to improve dependability of cloud services.
300 $a 158 p.
500 $a Source: Dissertation Abstracts International, Volume: 76-02(E), Section: B.
500 $a Adviser: Cristina Nita-Rotaru.
502 $a Thesis (Ph.D.)--Purdue University, 2014.
506 $a This item must not be sold to any third party vendors.
520 $a As part of this dissertation, we have built a broad range of systems including real-time network dashboards, a big data analytics system for analyzing network telemetry data, and an inference tool for root cause analysis in network troubleshooting. Several components of the dissertation work either have undergone a tech-transfer or are being used by multiple business groups inside Microsoft. NetWiser, a Microsoft Research project entailing this dissertation, was awarded the Microsoft Trustworthy Computing Reliability Award for 2013.
520 $a The problem inference system part of this dissertation, NetSieve, is currently being used across different teams within Microsoft to improve network management: the Network Architecture team for comparing device reliability across platforms and vendors, the Capacity Planning team for understanding why network redundancy is ineffective in masking failures, and the Incident Management and Operations team for finding the top-k problems and failing components while troubleshooting devices and determining whether past repairs were effective. Since its inception, NetSieve has also been used to automate root cause analysis of security incidents within Microsoft's datacenters and recently found its way into commercial use through Microsoft's System Center Advisor (http://www.systemcenteradvisor.com).
590 $a School code: 0183.
650 4 $a Computer Science. $3 626642
650 4 $a Information Science. $3 1017528
650 4 $a Information Technology. $3 1030799
690 $a 0984
690 $a 0723
690 $a 0489
710 2 $a Purdue University. $b Computer Sciences. $3 1019069
773 0 $t Dissertation Abstracts International $g 76-02B(E).
790 $a 0183
791 $a Ph.D.
792 $a 2014
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3636490
Items
Inventory Number:
W9288775
Location Name:
Electronic resources
Item Class:
11. Online viewing_V
Material type:
E-book
Call number:
EB
Usage Class:
Normal
Loan Status:
On shelf
No. of reservations:
0