New Book Announcements
Literary Detective Work on the Computer
Published: 2015-12-17

[Book Description]

Computational linguistics can be used to uncover mysteries in text which are not always obvious to visual inspection. For example, the computer analysis of writing style can show who might be the true author of a text in cases of disputed authorship or suspected plagiarism. The theoretical background to authorship attribution is presented in a step-by-step manner, and comprehensive reviews of the field are given in two specialist areas: the writings of William Shakespeare and his contemporaries, and the various writing styles seen in religious texts. The final chapter looks at the progress computers have made in the decipherment of lost languages. This book is written for students and researchers of general linguistics, computational and corpus linguistics, and computer forensics. It will inspire future researchers to study these topics for themselves, and gives sufficient detail on the methods and resources to get them started.

[Table of Contents]
Preface                                            ix
Chapter 1 Author identification                    1  (58)
  1 Introduction                                   1  (4)
  2 Feature selection                              5  (6)
    2.1 Evaluation of feature sets for             8  (3)
    authorship attribution
  3 Inter-textual distances                        11 (19)
    3.1 Manhattan distance and Euclidean           12 (2)
    distance
    3.2 Labbe and Labbe's measure                  14 (1)
    3.3 Chi-squared distance                       15 (1)
    3.4 The cosine similarity measure              16 (2)
    3.5 Kullback-Leibler Divergence (KLD)          18 (1)
    3.6 Burrows' Delta                             18 (5)
    3.7 Evaluation of feature-based measures       23 (3)
    for inter-textual distance
    3.8 Inter-textual distance by semantic         26 (2)
    similarity
    3.9 Stemmatology as a measure of               28 (2)
    inter-textual distance
  4 Clustering techniques                          30 (17)
    4.1 Introduction to factor analysis            31 (4)
    4.2 Matrix algebra                             35 (3)
    4.3 Use of matrix algebra for PCA              38 (6)
    4.4 PCA case studies                           44 (1)
    4.5 Correspondence analysis                    45 (2)
  5 Comparisons of classifiers                     47 (3)
  6 Other tasks related to authorship              50 (8)
    6.1 Stylochronometry                           50 (3)
    6.2 Affect dictionaries and psychological      53 (5)
    profiling
    6.3 Evaluation of author profiling             58 (1)
  7 Conclusion                                     58 (1)
Chapter 2 Plagiarism and spam filtering            59 (40)
  1 Introduction                                   59 (3)
  2 Plagiarism detection software                  62 (24)
    2.1 Collusion and plagiarism, external and     63 (1)
    intrinsic
    2.2 Preprocessing of corpora and feature       63 (1)
    extraction
    2.3 Sequence comparison and exact match        64 (1)
    2.4 Source-suspicious document similarity      65 (1)
    measures
    2.5 Fingerprinting                             66 (1)
    2.6 Language models                            67 (1)
    2.7 Natural language processing                68 (2)
    2.8 Intrinsic plagiarism detection             70 (3)
    2.9 Plagiarism of program code                 73 (1)
    2.10 Distance between translated and           74 (2)
    original text
    2.11 Direction of plagiarism                   76 (2)
    2.12 The search engine-based approach used     78 (3)
    at PAN-13
    2.13 Case study 1: Hidden influences from      81 (2)
    printed sources in the Gaelic tales of
    Duncan and Neil MacDonald
    2.14 Case study 2: General George Pickett      83 (1)
    and related writings
    2.15 Evaluation methods                        84 (1)
    2.16 Conclusion                                85 (1)
  3 Spam filters                                   86 (12)
    3.1 Content-based techniques                   87 (1)
    3.2 Building a labeled corpus for training     87 (1)
    3.3 Exact matching techniques                  88 (1)
    3.4 Rule-based methods                         89 (1)
    3.5 Machine learning                           90 (2)
    3.6 Unsupervised machine learning approaches   92 (1)
    3.7 Other spam-filtering problems              93 (1)
    3.8 Evaluation of spam filters                 94 (1)
    3.9 Non-linguistic techniques                  94 (3)
    3.10 Conclusion                                97 (1)
  4 Recommendations for further reading            98 (1)
Chapter 3 Computer studies of Shakespearean        99 (50)
authorship
  1 Introduction                                   99 (2)
  2 Shakespeare, Wilkins and "Pericles"            101(7)
    2.1 Correspondence analysis for "Pericles"     105(3)
    and related texts
  3 Shakespeare, Fletcher and "The Two Noble       108(2)
  Kinsmen"
  4 "King John"                                    110(1)
  5 "The Raigne of King Edward III"                111(7)
    5.1 Neural networks in stylometry              111(2)
    5.2 Cusum charts in stylometry                 113(3)
    5.3 Burrows' Zeta and Iota                     116(2)
  6 Hand D in "Sir Thomas More"                    118(14)
    6.1 Elliott, Valenza and the Earl of Oxford    118(3)
    6.2 Elliott and Valenza: Hand D                121(1)
    6.3 Bayesian approach to questions of          122(5)
    Shakespearian authorship
    6.4 Bayesian analysis of Shakespeare's         127(4)
    second person pronouns
    6.5 Vocabulary differences, LDA and the        130
    authorship of Hand D
    6.6 Hand D: Conclusions                        131(1)
  7 The three parts of "Henry VI"                  132(1)
  8 "Timon of Athens"                              132(1)
  9 "The Puritan" and "A Yorkshire Tragedy"        133(1)
  10 "Arden of Faversham"                          134(2)
  11 Estimation of the extent of Shakespeare's     136(5)
  vocabulary and the authorship of the "Taylor"
  poem
  12 The chronology of Shakespeare                 141(6)
  13 Conclusion                                    147(2)
Chapter 4 Stylometric analysis of religious        149(58)
texts
  1 Introduction                                   149(41)
    1.1 Overview of the New Testament by           151(2)
    correspondence analysis
    1.2 Q                                          153(16)
    1.3 Luke and Acts                              169(2)
    1.4 Recent approaches to New Testament         171(4)
    stylometry
    1.5 The Pauline Epistles                       175(13)
    1.6 Hebrews                                    188(1)
    1.7 The Signs Gospel                           188(2)
  2 Stylometric analysis of the Book of Mormon     190(8)
  3 Stylometric studies of the Qu'ran              198(8)
  4 Conclusion                                     206(1)
Chapter 5 Computers and decipherment               207(52)
  1 Introduction                                   207(17)
    1.1 Differences between cryptography and       208(1)
    decipherment
    1.2 Cryptological techniques for automatic     209(3)
    language recognition
    1.3 Dictionary approaches to language          212(1)
    recognition
    1.4 Sinkov's test                              212(1)
    1.5 Index of coincidence                       213(1)
    1.6 The log-likelihood ratio                   214(1)
    1.7 The chi-squared test statistic             215(1)
    1.8 Entropy of language                        215(3)
    1.9 Zipf's Law and Heaps' Law coefficients     218(1)
    1.10 Modal token length                        219(1)
    1.11 Autocorrelation analysis                  220(1)
    1.12 Vowel identification                      221(3)
  2 Rongorongo                                     224(19)
    2.1 History of Rongorongo                      224(2)
    2.2 Characteristics of Rongorongo              226(1)
    2.3 Obstacles to decipherment                  227(1)
    2.4 Encoding of Rongorongo symbols             227(1)
    2.5 The "Mamari" lunar calendar                228(1)
    2.6 Basic statistics of the Rongorongo         228(1)
    corpus
    2.7 Alignment of the Rongorongo corpus         229(2)
    2.8 A concordance for Rongorongo               231(2)
    2.9 Collocations and collostructions           233(1)
    2.10 Classification by genre                   234(3)
    2.11 Vocabulary richness                       237(4)
    2.12 Pozdniakov's approach to matching         241(2)
    frequency curves
  3 The Indus Valley texts                         243(9)
    3.1 Why decipherment of the Indus texts is     243(1)
    difficult
    3.2 Are the Indus texts writing?               244(4)
    3.3 Other evidence for the Indus Script        248(1)
    being writing
    3.4 Determining the order of the Markov        248(1)
    model
    3.5 Missing symbols                            249(1)
    3.6 Text segmentation and the                  249(2)
    log-likelihood measure
    3.7 Network analysis of the Indus Signs        251(1)
  4 Linear A                                       252(3)
  5 The Phaistos disk                              255(1)
  6 Iron Age Pictish symbols                       256(1)
  7 Mayan glyphs                                   256(1)
  8 Conclusion                                     257(2)
References                                         259(22)
Index                                              281



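Copyright: Xi'an Jiaotong University Library      Design and production: Xi'an Jiaotong University Data and Information Center
Address: 28 Xianning West Road, Beilin District, Xi'an, Shaanxi Province     Postal code: 710049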
