An evolutionary machine learning framework for big data sequence mining.
Record Type:
Language materials, printed : Monograph/item
Title/Author:
An evolutionary machine learning framework for big data sequence mining.
Author:
Kamath, Uday Krishna.
Description:
177 p.
Notes:
Source: Dissertation Abstracts International, Volume: 75-07(E), Section: B.
Contained By:
Dissertation Abstracts International 75-07B(E).
Subject:
Computer Science.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3615022
ISBN:
9781303807275
Thesis (Ph.D.)--George Mason University, 2014.
Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are important in understanding and predicting future sequences. However, finding these relationships is proven to be an NP-hard problem. When we use naive enumerations of combinations of elements or "brute force" iterative approaches for defining these features they often result in poor predictions. Some algorithms which perform well in prediction lack transparency, i.e., the discriminating features generated by these methods are not easily identifiable. In addition, the size of the sequence-based datasets presents practical challenges to most learning algorithms. Most sequence-based datasets contain millions or even billions of instances, for example, the genome-wide sequences of organisms in bioinformatics. At these sizes, classic learning algorithms often become prohibitively expensive, making scalability an important issue. Therefore, there is a need for an approach that can help find features/signals in complex sequences, offer meaningful discriminators, produce good predictions, and can scale well in time and space. This dissertation addresses the above issues by designing a comprehensive approach in the form of the Evolutionary Machine Learner (EML) framework. This framework can be employed on sequence-based datasets to generate explicit, human-recognizable features while solving the scalability issue. EML framework consists of a novel EA-based feature generation (EFG) algorithm for automatic feature construction.
By modeling four complex sequencing problems in bioinformatics and generating meaningful, human-understandable features with comparable or better accuracy than the state of the art algorithms, the power and usefulness of the EFG algorithm is demonstrated. The EFG algorithm is also validated by applying it to time series classification problems showing the generic nature of the algorithm in finding the important discriminating patterns that assist in modeling sequence based data. EML framework addresses the scalability issue by means of a novel, parallel scalable machine learning algorithm (PSBML) based on spatially structured evolutionary algorithms. PSBML is validated on real-world "big data" classification problems for various properties of meta-learning, scalability and noise resilience using well known benchmark datasets. The PSBML algorithm is also proven theoretically to be a large margin classifier with linear scalability in training time and space, giving it a unique distinction among the existing large scale learning algorithms. Finally, the EML framework is validated on a large genome-wide bioinformatics classification problem and a large time series problem, showing that the combined algorithms achieve higher predictive performance, training time speed up, and the ability to produce human-understandable discriminating signals as features.
ISBN: 9781303807275
Subjects--Topical Terms:
626642
Computer Science.
An evolutionary machine learning framework for big data sequence mining.
LDR :04071nam a2200289 4500
001 1963672
005 20141007080205.5
008 150210s2014 ||||||||||||||||| ||eng d
020 $a 9781303807275
035 $a (MiAaPQ)AAI3615022
035 $a AAI3615022
040 $a MiAaPQ $c MiAaPQ
100 1 $a Kamath, Uday Krishna. $3 2099979
245 1 3 $a An evolutionary machine learning framework for big data sequence mining.
300 $a 177 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-07(E), Section: B.
500 $a Adviser: Kenneth A. De Jong.
502 $a Thesis (Ph.D.)--George Mason University, 2014.
520 $a Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are important in understanding and predicting future sequences. However, finding these relationships is proven to be an NP-hard problem. When we use naive enumerations of combinations of elements or "brute force" iterative approaches for defining these features they often result in poor predictions. Some algorithms which perform well in prediction lack transparency, i.e., the discriminating features generated by these methods are not easily identifiable. In addition, the size of the sequence-based datasets presents practical challenges to most learning algorithms. Most sequence-based datasets contain millions or even billions of instances, for example, the genome-wide sequences of organisms in bioinformatics. At these sizes, classic learning algorithms often become prohibitively expensive, making scalability an important issue. Therefore, there is a need for an approach that can help find features/signals in complex sequences, offer meaningful discriminators, produce good predictions, and can scale well in time and space. This dissertation addresses the above issues by designing a comprehensive approach in the form of the Evolutionary Machine Learner (EML) framework. This framework can be employed on sequence-based datasets to generate explicit, human-recognizable features while solving the scalability issue. EML framework consists of a novel EA-based feature generation (EFG) algorithm for automatic feature construction. By modeling four complex sequencing problems in bioinformatics and generating meaningful, human-understandable features with comparable or better accuracy than the state of the art algorithms, the power and usefulness of the EFG algorithm is demonstrated. The EFG algorithm is also validated by applying it to time series classification problems showing the generic nature of the algorithm in finding the important discriminating patterns that assist in modeling sequence based data. EML framework addresses the scalability issue by means of a novel, parallel scalable machine learning algorithm (PSBML) based on spatially structured evolutionary algorithms. PSBML is validated on real-world "big data" classification problems for various properties of meta-learning, scalability and noise resilience using well known benchmark datasets. The PSBML algorithm is also proven theoretically to be a large margin classifier with linear scalability in training time and space, giving it a unique distinction among the existing large scale learning algorithms. Finally, the EML framework is validated on a large genome-wide bioinformatics classification problem and a large time series problem, showing that the combined algorithms achieve higher predictive performance, training time speed up, and the ability to produce human-understandable discriminating signals as features.
590 $a School code: 0883.
650 4 $a Computer Science. $3 626642
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Information Science. $3 1017528
690 $a 0984
690 $a 0715
690 $a 0723
710 2 $a George Mason University. $b Information Technology. $3 2095426
773 0 $t Dissertation Abstracts International $g 75-07B(E).
790 $a 0883
791 $a Ph.D.
792 $a 2014
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3615022
Items
Inventory Number:
W9258670
Location Name:
Electronic resources
Item Class:
11. Online viewing_V
Material type:
E-book
Call number:
EB
Usage Class:
General use (Normal)
Loan Status:
On shelf
No. of reservations:
0