Dong Hwa University Library

Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.

Record type:	Bibliographic record, electronic resource : Monograph/item
Title/Author:	Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.
Author:	Pechenick, Eitan Adam.
Extent:	109 p.
Note:	Source: Dissertation Abstracts International, Volume: 76-08(E), Section: B.
Contained By:	Dissertation Abstracts International 76-08B(E).
Subject:	Applied mathematics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3688844
ISBN:	9781321677225

Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.
Pechenick, Eitan Adam.

Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution. - 109 p.

Source: Dissertation Abstracts International, Volume: 76-08(E), Section: B.

Thesis (Ph.D.)--The University of Vermont and State Agricultural College, 2015.

This item must not be sold to any third party vendors.

The Google Books corpus contains millions of books in a variety of languages. Due to this incredible volume and its free availability, it is a treasure trove that has inspired a plethora of linguistic research.

ISBN: 9781321677225

Subjects--Topical Terms:

2122814
Applied mathematics.

Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.
LDR:03585nmm a2200337 4500
001 2060371
005 20150828095347.5
008 170521s2015 ||||||||||||||||| ||eng d
020 $a 9781321677225
035 $a (MiAaPQ)AAI3688844
035 $a AAI3688844
040 $a MiAaPQ $c MiAaPQ
100 1 $a Pechenick, Eitan Adam. $3 3174519
245 1 0 $a Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.
300 $a 109 p.
500 $a Source: Dissertation Abstracts International, Volume: 76-08(E), Section: B.
500 $a Advisers: Peter S. Dodds; Christopher M. Danforth.
502 $a Thesis (Ph.D.)--The University of Vermont and State Agricultural College, 2015.
506 $a This item must not be sold to any third party vendors.
520 $a The Google Books corpus contains millions of books in a variety of languages. Due to this incredible volume and its free availability, it is a treasure trove that has inspired a plethora of linguistic research.
520 $a It is tempting to treat frequency trends from Google Books data sets as indicators for the true popularity of various words and phrases. Doing so allows us to draw novel conclusions about the evolution of public perception of a given topic. However, sampling published works by availability and ease of digitization leads to several important effects, which have typically been overlooked in previous studies. One of these is the ability of a single prolific author to noticeably insert new phrases into a language. A greater effect arises from scientific texts, which have become increasingly prolific in the last several decades and are heavily sampled in the corpus. The result is a surge of phrases typical of academic articles but less common in general, such as references to time in the form of citations. We highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800-2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets.
520 $a We critique a method used by authors of an earlier work to determine the birth and death rates of words in a given linguistic data set. While intriguing, the method in question appears to produce an artificial surge in the death rate at the end of the observed period of time. In order to avoid boundary effects in our own analysis of asymmetries in language dynamics, we observe the volume of word flux across various relative frequency thresholds (in both directions) for the second English Fiction data set. We then use the contributions of the words crossing these thresholds to the Jensen-Shannon divergence between consecutive decades to resolve major factors driving the flux.
520 $a Having established careful information-theoretic techniques to resolve important features in the evolution of the data set, we validate and refine our methods by analyzing the effects of major exogenous factors, specifically wars. This approach leads to a uniquely comprehensive set of methods for harnessing the Google Books corpus and exploring socio-cultural and linguistic evolution.
590 $a School code: 0243.
650 4 $a Applied mathematics. $3 2122814
650 4 $a Sociolinguistics. $3 524467
650 4 $a Computer science. $3 523869
690 $a 0364
690 $a 0636
690 $a 0984
710 2 $a The University of Vermont and State Agricultural College. $b Mathematical Sciences. $3 3173812
773 0 $t Dissertation Abstracts International $g 76-08B(E).
790 $a 0243
791 $a Ph.D.
792 $a 2015
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3688844
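
The abstract above refers to per-word contributions to the Jensen-Shannon divergence between decades and to word flux across relative frequency thresholds. The following is a minimal sketch of those two ideas, not the dissertation's own code: the function names (normalize, jsd_contributions, threshold_flux), the toy word counts, and the 0.25 threshold are all assumptions made for illustration, and the divergence uses the standard equal-weight, base-2 definition, which may differ in detail from the formulation in the thesis.

```python
import math


def normalize(counts):
    """Turn raw word counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def jsd_contributions(p, q):
    """Per-word contributions to the Jensen-Shannon divergence (in bits).

    p and q map words to relative frequencies; a missing word has frequency 0.
    The contributions sum to the total divergence between p and q.
    """
    contributions = {}
    for word in set(p) | set(q):
        pi, qi = p.get(word, 0.0), q.get(word, 0.0)
        mi = 0.5 * (pi + qi)  # frequency in the equal-weight mixture
        term = 0.0
        if pi > 0:
            term += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            term += 0.5 * qi * math.log2(qi / mi)
        contributions[word] = term
    return contributions


def threshold_flux(p, q, threshold):
    """Words crossing a relative frequency threshold between p and q.

    Returns (up, down): words that rise to or above the threshold,
    and words that fall below it.
    """
    up = {w for w in q if q[w] >= threshold and p.get(w, 0.0) < threshold}
    down = {w for w in p if p[w] >= threshold and q.get(w, 0.0) < threshold}
    return up, down


# Hypothetical toy counts standing in for two consecutive decades.
decade_a = normalize({"the": 50, "war": 10, "peace": 40})
decade_b = normalize({"the": 50, "war": 40, "peace": 10})

contributions = jsd_contributions(decade_a, decade_b)
total_jsd = sum(contributions.values())
ranked = sorted(contributions, key=contributions.get, reverse=True)
up, down = threshold_flux(decade_a, decade_b, threshold=0.25)
```

In this toy case the word whose frequency is unchanged contributes nothing to the divergence, while the two words that swap frequencies contribute equally. Ranking words by contribution is one way to see which terms drive the difference between two periods.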