東華大學圖書館 |

Beyond Discourse Computational Text Analysis and Material Historical Processes.

紀錄類型:	書目-電子資源 : Monograph/item
正題名/作者:	Beyond Discourse Computational Text Analysis and Material Historical Processes./
作者:	Atria, Jose Tomas.
出版者:	Ann Arbor : ProQuest Dissertations & Theses, : 2018,
面頁冊數:	207 p.
附註:	Source: Dissertations Abstracts International, Volume: 80-04, Section: A.
Contained By:	Dissertations Abstracts International80-04A.
標題:	History. -
電子資源:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10930483
ISBN:	9780438394872

Beyond Discourse Computational Text Analysis and Material Historical Processes.
Atria, Jose Tomas.

Beyond Discourse Computational Text Analysis and Material Historical Processes. - Ann Arbor : ProQuest Dissertations & Theses, 2018 - 207 p.

Source: Dissertations Abstracts International, Volume: 80-04, Section: A.

Thesis (Ph.D.)--Columbia University, 2018.

This item must not be sold to any third party vendors.

This dissertation proposes a general methodological framework for the application of computational text analysis to the study of long duration material processes of transformation, beyond their traditional application to the study of discourse and rhetorical action. Over a thin theory of the linguistic nature of social facts, the proposed methodology revolves around the compilation of term co-occurrence matrices and their projection into different representations of an hypothetical semantic space. These representations offer solutions to two problems inherent to social scientific research: that of "mapping" features in a given representation to theoretical entities and that of "alignment" of the features seen in models built from different sources in order to enable their comparison. The data requirements of the exercise are discussed through the introduction of the notion of a "narrative horizon", the extent to which a given source incorporates a narrative account in its rendering of the context that produces it. Useful primary data will consist of text with short narrative horizons, such that the ideal source will correspond to a continuous archive of institutional, ideally bureaucratic text produced as mere documentation of a definite population of more or less stable and comparable social facts across a couple of centuries. Such a primary source is available in the Proceedings of the Old Bailey (POB), a collection of transcriptions of 197,752 criminal trials seen by the Old Bailey and the Central Criminal Court of London and Middlesex between 1674 and 1913 that includes verbatim transcriptions of witness testimony. The POB is used to demonstrate the proposed framework, starting with the analysis of the evolution of an historical corpus to illustrate the procedure by which provenance data is used to construct longitudinal and cross-sectional comparisons of different corpus segments. The co-occurrence matrices obtained from the POB corpus are used to demonstrate two different projections: semantic networks that model different notions of similarity between the terms in a corpus' lexicon as an adjacency matrix describing a graph and semantic vector spaces that approximate a lower-dimensional representation of an hypothetical semantic space from its empirical effects on the co-occurrence matrix. Semantic networks are presented as discrete mathematical objects that offer a solution to the mapping problem through operation that allow for the construction of sets of terms over which an order can be induced using any measure of significance of the strength of association between a term set and its elements. Alignment can then be solved through different similarity measures computed over the intersection and union of the sets under comparison. Semantic vector spaces are presented as continuous mathematical objects that offer a solution to the mapping problem in the linear structures contained in them. This include, in all cases, a meaningful metric that makes it possible to define neighbourhoods and regions in the semantic space and, in some cases, a meaningful orientation that makes it possible to trace dimensions across them. Alignment can then proceed endogenously in the case of oriented vector spaces for relative comparisons, or through the construction of common basis sets for non-oriented semantic spaces for absolute comparisons. The dissertation concludes with the proposition of a general research program for the systematic compilation of text distributional patterns in order to facilitate a much needed process of calibration required by the techniques discussed in the previous chapters. Two specific avenues for further research are identified. First, the development of incremental methods of projection that allow a semantic model to be updated as new observations come along, an area that has received considerable attention from the field of electronic finance and the pervasive use of Gentleman's algorithm for matrix factorisation. Second, the development of additively decomposable models that may be combined or disaggregated to obtain a similar result to the one that would have been obtained had the model being computed from the union or difference of their inputs. This is established to be dependent on whether the functions that actualise a given model are associative under addition or not.

ISBN: 9780438394872Subjects--Topical Terms:

516518
History.

Beyond Discourse Computational Text Analysis and Material Historical Processes.
LDR:05475nmm a2200337 4500 001 2208469
005 20191021073444.5
008 201008s2018 ||||||||||||||||| ||eng d
020 $a 9780438394872
035 $a (MiAaPQ)AAI10930483
035 $a (MiAaPQ)columbia:14862
035 $a AAI10930483
040 $a MiAaPQ $c MiAaPQ
100 1 $a Atria, Jose Tomas. $3 3435504
245 1 0 $a Beyond Discourse Computational Text Analysis and Material Historical Processes.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 207 p.
500 $a Source: Dissertations Abstracts International, Volume: 80-04, Section: A.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Bearman, Peter S.
502 $a Thesis (Ph.D.)--Columbia University, 2018.
506 $a This item must not be sold to any third party vendors.
520 $a This dissertation proposes a general methodological framework for the application of computational text analysis to the study of long duration material processes of transformation, beyond their traditional application to the study of discourse and rhetorical action. Over a thin theory of the linguistic nature of social facts, the proposed methodology revolves around the compilation of term co-occurrence matrices and their projection into different representations of an hypothetical semantic space. These representations offer solutions to two problems inherent to social scientific research: that of "mapping" features in a given representation to theoretical entities and that of "alignment" of the features seen in models built from different sources in order to enable their comparison. The data requirements of the exercise are discussed through the introduction of the notion of a "narrative horizon", the extent to which a given source incorporates a narrative account in its rendering of the context that produces it. Useful primary data will consist of text with short narrative horizons, such that the ideal source will correspond to a continuous archive of institutional, ideally bureaucratic text produced as mere documentation of a definite population of more or less stable and comparable social facts across a couple of centuries. Such a primary source is available in the Proceedings of the Old Bailey (POB), a collection of transcriptions of 197,752 criminal trials seen by the Old Bailey and the Central Criminal Court of London and Middlesex between 1674 and 1913 that includes verbatim transcriptions of witness testimony. The POB is used to demonstrate the proposed framework, starting with the analysis of the evolution of an historical corpus to illustrate the procedure by which provenance data is used to construct longitudinal and cross-sectional comparisons of different corpus segments. The co-occurrence matrices obtained from the POB corpus are used to demonstrate two different projections: semantic networks that model different notions of similarity between the terms in a corpus' lexicon as an adjacency matrix describing a graph and semantic vector spaces that approximate a lower-dimensional representation of an hypothetical semantic space from its empirical effects on the co-occurrence matrix. Semantic networks are presented as discrete mathematical objects that offer a solution to the mapping problem through operation that allow for the construction of sets of terms over which an order can be induced using any measure of significance of the strength of association between a term set and its elements. Alignment can then be solved through different similarity measures computed over the intersection and union of the sets under comparison. Semantic vector spaces are presented as continuous mathematical objects that offer a solution to the mapping problem in the linear structures contained in them. This include, in all cases, a meaningful metric that makes it possible to define neighbourhoods and regions in the semantic space and, in some cases, a meaningful orientation that makes it possible to trace dimensions across them. Alignment can then proceed endogenously in the case of oriented vector spaces for relative comparisons, or through the construction of common basis sets for non-oriented semantic spaces for absolute comparisons. The dissertation concludes with the proposition of a general research program for the systematic compilation of text distributional patterns in order to facilitate a much needed process of calibration required by the techniques discussed in the previous chapters. Two specific avenues for further research are identified. First, the development of incremental methods of projection that allow a semantic model to be updated as new observations come along, an area that has received considerable attention from the field of electronic finance and the pervasive use of Gentleman's algorithm for matrix factorisation. Second, the development of additively decomposable models that may be combined or disaggregated to obtain a similar result to the one that would have been obtained had the model being computed from the union or difference of their inputs. This is established to be dependent on whether the functions that actualise a given model are associative under addition or not.
590 $a School code: 0054.
650 4 $a History. $3 516518
650 4 $a Sociology. $3 516174
650 4 $a Sociolinguistics. $3 524467
690 $a 0578
690 $a 0626
690 $a 0636
710 2 $a Columbia University. $b Sociology. $3 2049901
773 0 $t Dissertations Abstracts International $g 80-04A.
790 $a 0054
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10930483