The System for Automatic Stylometric Analysis of Ukrainian Media Texts TextAttributor 1.0 (Techniques, Means, Functionality)

  • Nataliia Darchuk
  • Oksana Zuban
  • Valentyna Robeiko
  • Yuliia Tsyhvintseva
  • Victor Sorokin
  • Mykola Sazhok
Keywords: Computational linguistics, Ukrainian language, sentiment analysis, authorship attribution, stylometry, text classification

Abstract

This paper presents the structure, algorithms, implementation, and experimental results of TextAttributor, an automatic system developed by the authors for the statistical parameterisation of Ukrainian-language texts. The system uses a multiparametric set of statistical indices that characterise an author’s text style and are applicable to authorship attribution tasks. Drawing on the linguistic resources and software created for it, the system produces a linguistic analysis from the calculated indices and performs a comparative study of two texts. An additional criterion for statistical indexing is the text toxicity index, calculated by a method of verbal identification of toxic sentiment. Authorship and toxicity detection tasks are addressed using two methods: dictionary- and rule-based statistical calculations, and machine learning. The paper examines the findings implemented so far in the beta version of TextAttributor.
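As an illustration only, the dictionary-based side of such a pipeline might be sketched as follows. This is not the TextAttributor implementation: the particular indices (type-token ratio, mean word length, mean sentence length), the toy lexicon, and the function names `style_indices` and `compare` are all assumptions made for this example, and the toxicity index is assumed to be the share of tokens found in a lexicon.

```python
import re

# Hypothetical toy lexicon; the system's actual toxicity dictionary
# is not given in the abstract.
TOXIC_LEXICON = {"idiot", "stupid", "hate"}


def tokenize(text):
    """Lowercase word tokens; \\w matches Cyrillic letters in Python 3.
    Note: words containing an apostrophe are split by this simple pattern."""
    return re.findall(r"\w+", text.lower())


def style_indices(text, lexicon=TOXIC_LEXICON):
    """Compute a small, assumed set of statistical style indices."""
    tokens = tokenize(text)
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(tokens)
    if n == 0:
        return {
            "type_token_ratio": 0.0,
            "mean_word_length": 0.0,
            "mean_sentence_length": 0.0,
            "toxicity_index": 0.0,
        }
    return {
        "type_token_ratio": len(set(tokens)) / n,
        "mean_word_length": sum(len(t) for t in tokens) / n,
        "mean_sentence_length": n / len(sentences),
        # Assumed definition: share of tokens found in the lexicon.
        "toxicity_index": sum(t in lexicon for t in tokens) / n,
    }


def compare(text_a, text_b):
    """Absolute per-index difference between two texts."""
    a, b = style_indices(text_a), style_indices(text_b)
    return {name: abs(a[name] - b[name]) for name in a}
```

Calling `compare(text_a, text_b)` returns one absolute difference per index, so values near zero suggest similar style on these particular measures. A real system would use a far richer index set, lemmatisation, and a statistically grounded comparison.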
