Dong Hwa University Library

Modern Survey Estimation with Social Media and Auxiliary Data.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Modern Survey Estimation with Social Media and Auxiliary Data.
Author:	Ferg, Robyn A.
Published:	Ann Arbor : ProQuest Dissertations & Theses, 2020
Description:	128 p.
Notes:	Source: Dissertations Abstracts International, Volume: 82-07, Section: B.
Contained By:	Dissertations Abstracts International, 82-07B.
Subject:	Mass communications.
Online resource:	https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28240109
ISBN:	9798684618499

Thesis (Ph.D.)--University of Michigan, 2020.

This item must not be sold to any third party vendors.

Traditional survey methods have been successful for nearly a century, but recently response rates have been declining and costs have been increasing, making the future of survey science uncertain. At the same time, new media sources are generating new forms of data, population data is increasingly readily available, and sophisticated machine learning algorithms are being created. This dissertation uses modern data sources and tools to improve survey estimates and advance the field of survey science.

We begin by exploring the challenges of using data from new media, demonstrating how relationships between social media data and survey responses can appear deceptively strong. We examine a previously observed relationship between sentiment of "jobs" tweets and consumer confidence, performing a sensitivity analysis on how sentiment of tweets is calculated and sorting "jobs" tweets into categories based on their content, concluding that the original observed relationship was merely a chance occurrence. Next we track the relationship between sentiment of "Trump" tweets and presidential approval. We develop a framework to interpret the strength of this observed relationship by implementing placebo analyses, in which we perform the same analysis but with tweets assumed to be unrelated to presidential approval, concluding that our observed relationship is not strong. Failing to find a meaningful signal, we next propose following a set of users over time. For a set of politically active users, we are able to find evidence of a political signal in terms of frequency and sentiment of their tweets around the 2016 presidential election.

In a given corpus of tweets, there are likely to be several topics present, which has the potential to introduce bias when using the corpus to track survey responses. To help discover and sort tweets into these topics, we create a clustering-based topic modeling algorithm. Using the entire corpus, we create distances between words based on how often they appear together in the same tweet, create distances between tweets based on the distance between words in the tweets, and perform clustering on the resulting distances. We show that this method is effective using a validation set of tweets and apply it to the corpus of tweets from politically active users and "jobs" tweets.

Finally, we use population auxiliary data and machine learning algorithms to improve survey estimates. We develop an imputation-based estimation method that produces an unbiased estimate of the mean response of a finite population from a simple random sample when population auxiliary data are available. Our method allows for any prediction function or machine learning algorithm to be used to predict the response for out-of-sample observations, and is therefore able to accommodate a high dimensional setting and all covariate types. Exact unbiasedness is guaranteed by estimating the bias of the prediction function using subsamples of the original simple random sample. Importantly, the unbiasedness property does not depend on the accuracy of the imputation method. We apply this estimation method to simulated data, college tuition data, and the American Community Survey.
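
Illustrative note: the imputation idea in the abstract's final paragraph can be made concrete with a small sketch. The code below is not taken from the dissertation. It is one assumed construction, a leave-one-out variant in which each sampled unit is held out in turn, a predictor is fitted on the remaining sampled units, and the held-out residual is scaled up to correct the bias of the imputed values. The function names (`fit_least_squares`, `fit_constant_100`, `loo_imputation_mean`), the least-squares stand-in predictor, and the synthetic population are all hypothetical.

```python
import itertools
import numpy as np

def fit_least_squares(X, y):
    """Stand-in predictor (an assumption): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def fit_constant_100(X, y):
    """Deliberately poor predictor: always predicts 100."""
    return lambda X_new: np.full(len(X_new), 100.0)

def loo_imputation_mean(X_pop, sample_idx, y_sample, fit):
    """Leave-one-out imputation estimate of a finite-population mean.

    X_pop      : (N, p) auxiliary data for every unit in the population
    sample_idx : indices of the simple random sample (without replacement)
    y_sample   : responses observed for the sampled units, in the same order
    fit        : callable (X, y) -> predict function; any learner may be used
    """
    N = len(X_pop)
    sample_idx = np.asarray(sample_idx)
    y_sample = np.asarray(y_sample, dtype=float)
    n = len(sample_idx)
    estimates = []
    for k in range(n):
        keep = np.arange(n) != k              # subsample without unit k
        train_idx = sample_idx[keep]
        predict = fit(X_pop[train_idx], y_sample[keep])
        outside = np.ones(N, dtype=bool)      # units not in the subsample
        outside[train_idx] = False
        imputed_total = predict(X_pop[outside]).sum()
        # The held-out unit is a uniform draw from the "outside" units, so its
        # residual times their count (N - n + 1) estimates the imputation bias.
        residual = y_sample[k] - predict(X_pop[[sample_idx[k]]])[0]
        total = y_sample[keep].sum() + imputed_total + (N - n + 1) * residual
        estimates.append(total / N)
    return float(np.mean(estimates))

# Tiny synthetic population (made-up data, for illustration only).
rng = np.random.default_rng(0)
N, n = 7, 3
X_pop = rng.normal(size=(N, 1))
y_pop = 2.0 + 3.0 * X_pop[:, 0] ** 2 + rng.normal(size=N)

def average_over_all_samples(fit):
    """Average the estimator over every possible sample of size n."""
    ests = [loo_imputation_mean(X_pop, s, y_pop[list(s)], fit)
            for s in itertools.combinations(range(N), n)]
    return float(np.mean(ests))

print("population mean:        ", y_pop.mean())
print("least-squares predictor:", average_over_all_samples(fit_least_squares))
print("constant predictor:     ", average_over_all_samples(fit_constant_100))
```

Because every possible sample is equally likely under simple random sampling, averaging the estimator over all samples gives its exact expectation. In this sketch that average matches the population mean up to floating-point error for both predictors, including the deliberately poor one, which mirrors the abstract's claim that unbiasedness does not depend on the accuracy of the imputation method.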

ISBN: 9798684618499

Subjects--Topical Terms:

3422380
Mass communications.
Subjects--Index Terms:

Twitter data

Modern Survey Estimation with Social Media and Auxiliary Data.
LDR:04794nmm a2200529 4500
001 2281890
005 20210927083422.5
008 220723s2020 ||||||||||||||||| ||eng d
020 $a 9798684618499
035 $a (MiAaPQ)AAI28240109
035 $a (MiAaPQ)umichrackham003170
035 $a AAI28240109
040 $a MiAaPQ $c MiAaPQ
100 1 $a Ferg, Robyn A. $3 3560599
245 1 0 $a Modern Survey Estimation with Social Media and Auxiliary Data.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 128 p.
500 $a Source: Dissertations Abstracts International, Volume: 82-07, Section: B.
500 $a Advisor: Gagnon Bartsch, Johann.
502 $a Thesis (Ph.D.)--University of Michigan, 2020.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
520 $a Traditional survey methods have been successful for nearly a century, but recently response rates have been declining and costs have been increasing, making the future of survey science uncertain. At the same time, new media sources are generating new forms of data, population data is increasingly readily available, and sophisticated machine learning algorithms are being created. This dissertation uses modern data sources and tools to improve survey estimates and advance the field of survey science. We begin by exploring the challenges of using data from new media, demonstrating how relationships between social media data and survey responses can appear deceptively strong. We examine a previously observed relationship between sentiment of "jobs" tweets and consumer confidence, performing a sensitivity analysis on how sentiment of tweets is calculated and sorting "jobs" tweets into categories based on their content, concluding that the original observed relationship was merely a chance occurrence. Next we track the relationship between sentiment of "Trump" tweets and presidential approval. We develop a framework to interpret the strength of this observed relationship by implementing placebo analyses, in which we perform the same analysis but with tweets assumed to be unrelated to presidential approval, concluding that our observed relationship is not strong. Failing to find a meaningful signal, we next propose following a set of users over time. For a set of politically active users, we are able to find evidence of a political signal in terms of frequency and sentiment of their tweets around the 2016 presidential election. In a given corpus of tweets, there are likely to be several topics present, which has the potential to introduce bias when using the corpus to track survey responses. To help discover and sort tweets into these topics, we create a clustering-based topic modeling algorithm. Using the entire corpus, we create distances between words based on how often they appear together in the same tweet, create distances between tweets based on the distance between words in the tweets, and perform clustering on the resulting distances. We show that this method is effective using a validation set of tweets and apply it to the corpus of tweets from politically active users and "jobs" tweets. Finally, we use population auxiliary data and machine learning algorithms to improve survey estimates. We develop an imputation-based estimation method that produces an unbiased estimate of the mean response of a finite population from a simple random sample when population auxiliary data are available. Our method allows for any prediction function or machine learning algorithm to be used to predict the response for out-of-sample observations, and is therefore able to accommodate a high dimensional setting and all covariate types. Exact unbiasedness is guaranteed by estimating the bias of the prediction function using subsamples of the original simple random sample. Importantly, the unbiasedness property does not depend on the accuracy of the imputation method. We apply this estimation method to simulated data, college tuition data, and the American Community Survey.
590 $a School code: 0127.
650 4 $a Mass communications. $3 3422380
650 4 $a Statistics. $3 517247
650 4 $a Computer science. $3 523869
650 4 $a Artificial intelligence. $3 516317
650 4 $a Web studies. $3 2122754
650 4 $a Political science. $3 528916
650 4 $a Information technology. $3 532993
650 4 $a Social research. $3 2122687
650 4 $a Information science. $3 554358
653 $a Twitter data
653 $a Survey estimation
653 $a Topic modeling
653 $a Predictive modeling
653 $a Politically active users
653 $a 2016 presidential election
653 $a Auxiliary data
653 $a College tuition data
653 $a Machine learning
653 $a Donald Trump
690 $a 0463
690 $a 0489
690 $a 0984
690 $a 0344
690 $a 0800
690 $a 0723
690 $a 0646
690 $a 0615
690 $a 0708
710 2 $a University of Michigan. $b Statistics. $3 3279150
773 0 $t Dissertations Abstracts International $g 82-07B.
790 $a 0127
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28240109