An evolutionary machine learning framework for big data sequence mining.

Record type:	Bibliographic - language material, printed : Monograph/item
Title/Author:	An evolutionary machine learning framework for big data sequence mining./
Author:	Kamath, Uday Krishna.
Physical description:	177 p.
Note:	Source: Dissertation Abstracts International, Volume: 75-07(E), Section: B.
Contained By:	Dissertation Abstracts International 75-07B(E).
Subject:	Computer Science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3615022
ISBN:	9781303807275

Thesis (Ph.D.)--George Mason University, 2014.

Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, sequence data contains no "explicit" features or signals that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships among its elements that are important in understanding and predicting future sequences. However, finding these relationships has been proven to be an NP-hard problem. Naive enumeration of combinations of elements, or "brute force" iterative approaches to defining these features, often results in poor predictions. Some algorithms that perform well in prediction lack transparency, i.e., the discriminating features generated by these methods are not easily identifiable. In addition, the size of sequence-based datasets presents practical challenges to most learning algorithms. Most sequence-based datasets contain millions or even billions of instances, for example, the genome-wide sequences of organisms in bioinformatics. At these sizes, classic learning algorithms often become prohibitively expensive, making scalability an important issue. Therefore, there is a need for an approach that can find features/signals in complex sequences, offer meaningful discriminators, produce good predictions, and scale well in time and space. This dissertation addresses the above issues by designing a comprehensive approach in the form of the Evolutionary Machine Learner (EML) framework. This framework can be employed on sequence-based datasets to generate explicit, human-recognizable features while solving the scalability issue. The EML framework consists of a novel EA-based feature generation (EFG) algorithm for automatic feature construction.
The power and usefulness of the EFG algorithm are demonstrated by modeling four complex sequencing problems in bioinformatics and generating meaningful, human-understandable features with accuracy comparable to or better than that of state-of-the-art algorithms. The EFG algorithm is also validated on time series classification problems, showing the generic nature of the algorithm in finding the important discriminating patterns that assist in modeling sequence-based data. The EML framework addresses the scalability issue by means of a novel, parallel scalable machine learning algorithm (PSBML) based on spatially structured evolutionary algorithms. PSBML is validated on real-world "big data" classification problems for various properties of meta-learning, scalability, and noise resilience using well-known benchmark datasets. The PSBML algorithm is also proven theoretically to be a large margin classifier with linear scalability in training time and space, giving it a unique distinction among existing large-scale learning algorithms. Finally, the EML framework is validated on a large genome-wide bioinformatics classification problem and a large time series problem, showing that the combined algorithms achieve higher predictive performance, training-time speedup, and the ability to produce human-understandable discriminating signals as features.

ISBN: 9781303807275

Subjects--Topical Terms:

626642
Computer Science.

LDR:04071nam a2200289 4500
001 1963672
005 20141007080205.5
008 150210s2014 ||||||||||||||||| ||eng d
020 $a 9781303807275
035 $a (MiAaPQ)AAI3615022
035 $a AAI3615022
040 $a MiAaPQ $c MiAaPQ
100 1 $a Kamath, Uday Krishna. $3 2099979
245 1 3 $a An evolutionary machine learning framework for big data sequence mining.
300 $a 177 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-07(E), Section: B.
500 $a Adviser: Kenneth A. De Jong.
502 $a Thesis (Ph.D.)--George Mason University, 2014.
520 $a Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, sequence data contains no "explicit" features or signals that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships among its elements that are important in understanding and predicting future sequences. However, finding these relationships has been proven to be an NP-hard problem. Naive enumeration of combinations of elements, or "brute force" iterative approaches to defining these features, often results in poor predictions. Some algorithms that perform well in prediction lack transparency, i.e., the discriminating features generated by these methods are not easily identifiable. In addition, the size of sequence-based datasets presents practical challenges to most learning algorithms. Most sequence-based datasets contain millions or even billions of instances, for example, the genome-wide sequences of organisms in bioinformatics. At these sizes, classic learning algorithms often become prohibitively expensive, making scalability an important issue. Therefore, there is a need for an approach that can find features/signals in complex sequences, offer meaningful discriminators, produce good predictions, and scale well in time and space. This dissertation addresses the above issues by designing a comprehensive approach in the form of the Evolutionary Machine Learner (EML) framework. This framework can be employed on sequence-based datasets to generate explicit, human-recognizable features while solving the scalability issue. The EML framework consists of a novel EA-based feature generation (EFG) algorithm for automatic feature construction.
The power and usefulness of the EFG algorithm are demonstrated by modeling four complex sequencing problems in bioinformatics and generating meaningful, human-understandable features with accuracy comparable to or better than that of state-of-the-art algorithms. The EFG algorithm is also validated on time series classification problems, showing the generic nature of the algorithm in finding the important discriminating patterns that assist in modeling sequence-based data. The EML framework addresses the scalability issue by means of a novel, parallel scalable machine learning algorithm (PSBML) based on spatially structured evolutionary algorithms. PSBML is validated on real-world "big data" classification problems for various properties of meta-learning, scalability, and noise resilience using well-known benchmark datasets. The PSBML algorithm is also proven theoretically to be a large margin classifier with linear scalability in training time and space, giving it a unique distinction among existing large-scale learning algorithms. Finally, the EML framework is validated on a large genome-wide bioinformatics classification problem and a large time series problem, showing that the combined algorithms achieve higher predictive performance, training-time speedup, and the ability to produce human-understandable discriminating signals as features.
590 $a School code: 0883.
650 4 $a Computer Science. $3 626642
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Information Science. $3 1017528
690 $a 0984
690 $a 0715
690 $a 0723
710 2 $a George Mason University. $b Information Technology. $3 2095426
773 0 $t Dissertation Abstracts International $g 75-07B(E).
790 $a 0883
791 $a Ph.D.
792 $a 2014
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3615022