Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.
Record type: Bibliographic - electronic resource : Monograph/item
Title/Author: Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.
Author: Pechenick, Eitan Adam.
Extent: 109 p.
Note: Source: Dissertation Abstracts International, Volume: 76-08(E), Section: B.
Contained By: Dissertation Abstracts International 76-08B(E).
Subject: Applied mathematics.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3688844
ISBN: 9781321677225
LDR  03585nmm a2200337 4500
001  2060371
005  20150828095347.5
008  170521s2015 ||||||||||||||||| ||eng d
020  $a 9781321677225
035  $a (MiAaPQ)AAI3688844
035  $a AAI3688844
040  $a MiAaPQ $c MiAaPQ
100  1  $a Pechenick, Eitan Adam. $3 3174519
245  1 0  $a Exploring the Google Books Corpus: An Information-Theoretic Approach to Linguistic Evolution.
300  $a 109 p.
500  $a Source: Dissertation Abstracts International, Volume: 76-08(E), Section: B.
500  $a Advisers: Peter S. Dodds; Christopher M. Danforth.
502  $a Thesis (Ph.D.)--The University of Vermont and State Agricultural College, 2015.
506  $a This item must not be sold to any third party vendors.
520  $a The Google Books corpus contains millions of books in a variety of languages. Due to this incredible volume and its free availability, it is a treasure trove that has inspired a plethora of linguistic research.
520  $a It is tempting to treat frequency trends from Google Books data sets as indicators for the true popularity of various words and phrases. Doing so allows us to draw novel conclusions about the evolution of public perception of a given topic. However, sampling published works by availability and ease of digitization leads to several important effects, which have typically been overlooked in previous studies. One of these is the ability of a single prolific author to noticeably insert new phrases into a language. A greater effect arises from scientific texts, which have become increasingly numerous in the last several decades and are heavily sampled in the corpus. The result is a surge of phrases typical of academic articles but less common in general, such as references to time in the form of citations. We highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800–2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets.
520  $a We critique a method used by authors of an earlier work to determine the birth and death rates of words in a given linguistic data set. While intriguing, the method in question appears to produce an artificial surge in the death rate at the end of the observed period of time. In order to avoid boundary effects in our own analysis of asymmetries in language dynamics, we observe the volume of word flux across various relative frequency thresholds (in both directions) for the second English Fiction data set. We then use the contributions of the words crossing these thresholds to the Jensen-Shannon divergence between consecutive decades to resolve major factors driving the flux.
520  $a Having established careful information-theoretic techniques to resolve important features in the evolution of the data set, we validate and refine our methods by analyzing the effects of major exogenous factors, specifically wars. This approach leads to a uniquely comprehensive set of methods for harnessing the Google Books corpus and exploring socio-cultural and linguistic evolution.
590  $a School code: 0243.
650  4  $a Applied mathematics. $3 2122814
650  4  $a Sociolinguistics. $3 524467
650  4  $a Computer science. $3 523869
690  $a 0364
690  $a 0636
690  $a 0984
710  2  $a The University of Vermont and State Agricultural College. $b Mathematical Sciences. $3 3173812
773  0  $t Dissertation Abstracts International $g 76-08B(E).
790  $a 0243
791  $a Ph.D.
792  $a 2015
793  $a English
856  4 0  $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3688844
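The abstract describes counting words whose relative frequency crosses a threshold between consecutive decades and then weighing those words by their contribution to the Jensen-Shannon divergence (JSD). The JSD splits word by word: with m_i = (p_i + q_i)/2, word i contributes ½ p_i log2(p_i/m_i) + ½ q_i log2(q_i/m_i), and these non-negative terms sum to the JSD in bits. Below is a minimal Python sketch of both steps, assuming each decade has already been reduced to a dictionary of word counts. The function names, the 0.2 threshold, and the toy counts are illustrative assumptions, not the thesis's own code, parameters, or data.

```python
import math

# Hypothetical helpers for illustration only; not taken from the thesis.


def normalize(counts):
    """Convert raw word counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jsd_contributions(p, q):
    """Per-word contributions (in bits) to the Jensen-Shannon divergence.

    p and q map words to relative frequencies; a word missing from one
    distribution is treated as having frequency 0 there. Each term is
    0.5*p*log2(p/m) + 0.5*q*log2(q/m) with m = (p + q) / 2, using the
    convention 0*log(0) = 0, so every term is non-negative and the terms
    sum to the JSD.
    """
    contrib = {}
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        term = 0.0
        if pw > 0:
            term += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            term += 0.5 * qw * math.log2(qw / m)
        contrib[w] = term
    return contrib


def threshold_flux(p, q, theta):
    """Split words that cross relative frequency theta going from p to q.

    Upward words satisfy p < theta <= q; downward words satisfy
    q < theta <= p. Returns the two sets as (upward, downward).
    """
    up, down = set(), set()
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        if pw < theta <= qw:
            up.add(w)
        elif qw < theta <= pw:
            down.add(w)
    return up, down


if __name__ == "__main__":
    # Toy counts for two hypothetical decades (invented for illustration).
    earlier = normalize({"the": 60, "war": 10, "telegraph": 30})
    later = normalize({"the": 60, "war": 30, "radio": 10})

    contrib = jsd_contributions(earlier, later)
    up, down = threshold_flux(earlier, later, theta=0.2)

    print("JSD (bits):", round(sum(contrib.values()), 4))  # prints 0.2377
    print("upward:", sorted(up), "share:", sum(contrib[w] for w in up))
    print("downward:", sorted(down), "share:", sum(contrib[w] for w in down))
```

Summing the contributions of the upward and downward sets separately indicates which direction of flux accounts for more of the change between two decades; a fuller analysis in this spirit would repeat the calculation over a range of thresholds and every pair of consecutive decades.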
Holdings (1 item)
Barcode: W9293029
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0