Text Authorship Identified Using the Dynamics of Word Co-Occurrence Networks
- PMID: 28125703
- PMCID: PMC5268788
- DOI: 10.1371/journal.pone.0170527
Abstract
Automatic identification of authorship in disputed documents has benefited from complex network theory, as this approach does not require human expertise or detailed semantic knowledge. Networks modeling entire books can be used to discriminate texts from different sources and to understand network growth mechanisms, but only a few studies have probed the suitability of networks for modeling small chunks of text to grasp stylistic features. In this study, we introduce a methodology based on the dynamics of word co-occurrence networks representing written texts to classify a corpus of 80 texts by 8 authors. The texts were divided into sections with an equal number of linguistic tokens, from which time series were created for 12 topological metrics. Since 73% of all series were stationary (ARIMA(p, 0, q)) and the remainder were integrated of first order (ARIMA(p, 1, q)), probability distributions could be obtained for the global network metrics. The metrics exhibit bell-shaped non-Gaussian distributions, and therefore distribution moments were used as learning attributes. With an optimized supervised learning procedure based on a nonlinear transformation performed by Isomap, 71 out of 80 texts were correctly classified using the K-nearest neighbors algorithm, i.e., an 88.75% author-matching success rate was achieved. Hence, purely dynamic fluctuations in network metrics can characterize authorship, thus paving the way for a robust description of large texts in terms of small evolving networks.
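The feature-extraction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: words are linked when they are adjacent in the text, only two topological metrics are computed instead of the 12 used in the study, the first four moments stand in for the "distribution moments", and all function names are hypothetical. Text preprocessing, the ARIMA stationarity analysis, the Isomap transformation, and the K-nearest neighbors classification are left out.

```python
from collections import defaultdict
from statistics import mean, pstdev


def split_into_sections(tokens, section_size):
    """Split tokens into consecutive sections of equal length (incomplete tail dropped)."""
    n_sections = len(tokens) // section_size
    return [tokens[i * section_size:(i + 1) * section_size] for i in range(n_sections)]


def cooccurrence_network(section):
    """Build an undirected, unweighted network linking adjacent distinct words (assumption)."""
    adjacency = defaultdict(set)
    for first, second in zip(section, section[1:]):
        if first != second:
            adjacency[first].add(second)
            adjacency[second].add(first)
    return dict(adjacency)


def average_degree(adjacency):
    """Mean number of neighbors per word."""
    return mean(len(nbrs) for nbrs in adjacency.values()) if adjacency else 0.0


def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with fewer than two neighbors count as 0."""
    coefficients = []
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        nbrs = list(neighbors)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adjacency[nbrs[i]]
        )
        coefficients.append(2.0 * links / (k * (k - 1)))
    return mean(coefficients) if coefficients else 0.0


def moments(series):
    """Return (mean, variance, skewness, kurtosis) using population formulas."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return (mu, 0.0, 0.0, 0.0)
    skewness = mean(((x - mu) / sigma) ** 3 for x in series)
    kurtosis = mean(((x - mu) / sigma) ** 4 for x in series)
    return (mu, sigma ** 2, skewness, kurtosis)


def text_features(tokens, section_size):
    """Map each metric name to the moments of its time series over the text's sections."""
    networks = [cooccurrence_network(s) for s in split_into_sections(tokens, section_size)]
    series = {
        "avg_degree": [average_degree(net) for net in networks],
        "clustering": [average_clustering(net) for net in networks],
    }
    return {name: moments(values) for name, values in series.items()}
```

Under these assumptions, each text is reduced to a fixed-length vector (the moments of every metric series), which is the kind of attribute set a supervised classifier could then be trained on.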
Conflict of interest statement
The authors have declared that no competing interests exist.
