National Dong Hwa University Library

Public Health Surveillance using Social Media: Short Text Classification and Trend Analysis of Nonstationary Time Series.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Public Health Surveillance using Social Media: Short Text Classification and Trend Analysis of Nonstationary Time Series.
Author:	Dai, Xiangfeng.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2017
Physical description:	112 p.
Note:	Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
Contained By:	Dissertation Abstracts International 78-10B(E).
Subject:	Computer science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10264204
ISBN:	9781369851076

Public Health Surveillance using Social Media: Short Text Classification and Trend Analysis of Nonstationary Time Series. - Ann Arbor : ProQuest Dissertations & Theses, 2017 - 112 p.

Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.

Thesis (Ph.D.)--North Carolina Agricultural and Technical State University, 2017.

Traditional public health surveillance is often limited by the time required to collect data. Social media (e.g., Twitter) provide a low-cost alternative data source for public health surveillance. In this dissertation, we develop a set of methods based on short text classification and trend analysis. First, we propose a hybrid classification method for collecting disease-related data from social media. The proposed method combines basic Natural Language Processing (NLP), rule-based classifiers and supervised machine learning classifiers. This method is efficient and achieves better results than any single approach. To generalize the method, we also propose a word-embedding-based clustering method for text classification. Word embedding is an NLP method that captures the semantic information of words as vectors. A text can be represented as a set of word vectors and divided into clusters of similar words. Based on the similarity measures of all the clusters, the text can then be classified as related or unrelated to a topic (e.g., influenza). Our simulations show good performance, with a best accuracy of 87.1%. Because the proposed method is unsupervised, it requires no manual labeling of training data and can be readily extended to other classification problems or other diseases.
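The word-embedding clustering classifier described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's code. The record does not specify the clustering algorithm or the similarity measure. The 3-dimensional vectors in `EMBEDDINGS` are hypothetical stand-ins for trained word embeddings. The greedy cosine-threshold clustering, the averaged seed-word topic vector and both thresholds are likewise illustrative choices.

```python
import numpy as np

# Hypothetical toy embeddings; a real system would load trained word vectors.
EMBEDDINGS = {
    "flu":     np.array([0.9, 0.1, 0.0]),
    "fever":   np.array([0.8, 0.2, 0.1]),
    "cough":   np.array([0.85, 0.15, 0.05]),
    "game":    np.array([0.0, 0.9, 0.3]),
    "tonight": np.array([0.1, 0.8, 0.4]),
    "coffee":  np.array([0.1, 0.3, 0.9]),
}


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cluster_words(vectors, threshold=0.9):
    """Greedy clustering (an assumed stand-in): each vector joins the first
    cluster whose centroid is at least `threshold` cosine-similar to it,
    otherwise it starts a new cluster. Returns the cluster centroids."""
    clusters = []
    for v in vectors:
        for c in clusters:
            if cosine(np.mean(c, axis=0), v) >= threshold:
                c.append(v)
                break
        else:  # no existing cluster was close enough
            clusters.append([v])
    return [np.mean(c, axis=0) for c in clusters]


def classify(text, topic_words, threshold=0.8):
    """Label `text` as topic-related if any of its word clusters is close
    to the averaged embedding of the hypothetical seed words."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return False  # no known words, so no evidence of relatedness
    topic = np.mean([EMBEDDINGS[w] for w in topic_words], axis=0)
    centroids = cluster_words(vecs)
    return max(cosine(c, topic) for c in centroids) >= threshold
```

For example, `classify("fever and cough tonight", ["flu"])` places fever and cough in one cluster that lies close to the flu vector and returns `True`. `classify("coffee before the game tonight", ["flu"])` returns `False`. No labeled training data is involved, which matches the unsupervised framing above. Accuracy depends entirely on the quality of the embeddings and on the chosen thresholds.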

ISBN: 9781369851076

Subjects--Topical Terms:
Computer science.

LDR:03670nmm a2200325 4500
001 2126598
005 20171128150727.5
008 180830s2017 ||||||||||||||||| ||eng d
020 $a 9781369851076
035 $a (MiAaPQ)AAI10264204
035 $a AAI10264204
040 $a MiAaPQ $c MiAaPQ
100 1 $a Dai, Xiangfeng. $3 3288704
245 1 0 $a Public Health Surveillance using Social Media: Short Text Classification and Trend Analysis of Nonstationary Time Series.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2017
300 $a 112 p.
500 $a Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
500 $a Adviser: Marwan Bikdash.
502 $a Thesis (Ph.D.)--North Carolina Agricultural and Technical State University, 2017.
520 $a The collected temporal disease-related social media data are quite noisy and nonstationary. To detect the onset of a disease from social media, we apply a distance-based outlier method to transform the noisy social media data into regions of inliers and outliers, and then perform region-based hypothesis testing for outbreak detection. We then propose a Hypothesis testing-based Adaptive Spline Filtering (HASF) method, which breaks the nonstationary time series into sections of adapted lengths, each of which is curve-fitted with a cubic spline. The method allows the imposition of appropriate constraints, such as continuity and smoothness between the sections and minimum or maximum section length. The number of sections and the nodes between them are adapted from the data by testing hypotheses regarding the second-order statistics of the residuals computed using different configurations of nodes. The resulting cubic-spline curve can therefore be interpreted as capturing the disease trends while turning the residuals into a stationary process as much as possible. The HASF approach is extended to handle missing data in time series. Three "filling" variants are considered, and the most promising variant fills big gaps with linear splines while maintaining smoothness and continuity between the sections.
590 $a School code: 1544.
650 4 $a Computer science. $3 523869
650 4 $a Artificial intelligence. $3 516317
650 4 $a Mining engineering. $3 788403
650 4 $a Public health. $3 534748
690 $a 0984
690 $a 0800
690 $a 0551
690 $a 0573
710 2 $a North Carolina Agricultural and Technical State University. $b Computational Science and Engineering. $3 2105946
773 0 $t Dissertation Abstracts International $g 78-10B(E).
790 $a 1544
791 $a Ph.D.
792 $a 2017
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10264204
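The knot-selection loop of an adaptive spline filter like the HASF method summarized in the abstract can be sketched as below. This is not the HASF implementation. The record does not give the exact test on the residuals' second-order statistics, so this sketch substitutes a standard nested-model F-test on residual variance. It also uses a cubic regression spline in the truncated-power basis, which gives continuity and smoothness between sections automatically. The names `adaptive_spline` and `fit_cubic_spline` and the parameters `min_len`, `f_crit` and `max_knots` are hypothetical.

```python
import numpy as np


def fit_cubic_spline(t, y, knots):
    """Least-squares cubic regression spline in the truncated-power basis
    1, t, t^2, t^3, (t - k)^3_+ for each interior knot k. Any curve in this
    basis has continuous first and second derivatives at the knots.
    Returns (residual sum of squares, fitted values)."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    return float(resid @ resid), fitted


def adaptive_spline(y, min_len=10, f_crit=4.0, max_knots=10):
    """Greedy knot insertion (a hypothetical simplification): add the knot
    that most reduces the residual sum of squares, but only while an F-test
    on the residual variance says the reduction is significant."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.linspace(0.0, 1.0, n)  # rescaled time axis for numerical stability
    knots = []                     # interior knots, stored as sample indices
    current, _ = fit_cubic_spline(t, y, t[knots])
    while len(knots) < max_knots:
        best = None
        for i in range(min_len, n - min_len):
            # Enforce a minimum section length between neighbouring knots.
            if any(abs(i - j) < min_len for j in knots):
                continue
            cand = sorted(knots + [i])
            r, _ = fit_cubic_spline(t, y, t[cand])
            if best is None or r < best[0]:
                best = (r, cand)
        if best is None:
            break
        new_rss, cand = best
        dof = n - (4 + len(cand))  # residual degrees of freedom, larger model
        if dof <= 0:
            break
        # F statistic for one extra parameter; guard against a zero residual.
        f_stat = (current - new_rss) / max(new_rss / dof, np.finfo(float).tiny)
        if f_stat < f_crit:
            break
        knots, current = cand, new_rss
    _, fitted = fit_cubic_spline(t, y, t[knots])
    return knots, fitted
```

A call such as `knots, trend = adaptive_spline(daily_counts)` returns the selected section boundaries as sample indices along with the fitted trend. In that call, `daily_counts` is a hypothetical 1-D series. The residual `y - trend` is the part the abstract describes as being pushed toward a stationary process. The best knot is chosen before the test is applied, so the nominal significance level is optimistic. `f_crit=4.0` is only roughly the 5% critical value of F(1, ~100) for a single prespecified knot.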