Conference material: "Proceedings of the 7th International Conference “Futurity designing. Digital reality problems” (February 15-16, 2024, Moscow)"
Authors: Gromov V.A., Borodin N.S., Kogan A.S., Dang Q.N., Yerbolova A.S., Bayan H.
Spot the bot: large-scale natural language structure
Abstract:
In the modern world, specialized programs (bots) write comments, news items, and reviews that may contain false information. It is therefore extremely important to be able to tell whether a given text was written by a real person or by a bot. This work addresses the problem by studying the semantic trajectories of natural-language texts. The study uses vector embeddings and their n-grams, together with methods for (1) clustering the semantic space, (2) analysing the position of texts on the 'entropy-complexity' plane, (3) estimating the intrinsic dimensionality of vector language representations, and (4) topological data analysis.
Keywords:
semantic trajectories, natural language processing, bots, clustering, entropy-complexity plane, intrinsic dimensionality, topological data analysis
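To make method (2) from the abstract more concrete, the sketch below computes the two coordinates of the entropy-complexity plane in its standard Bandt-Pompe form: the normalized permutation entropy H and the Jensen-Shannon statistical complexity C of a scalar series. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a multidimensional semantic trajectory is reduced to a series or which embedding order and delay are used, so the input here is simply any list of numbers, and the function names and the defaults `order=3`, `delay=1` are hypothetical choices.

```python
import math
import random
from collections import Counter
from itertools import permutations


def ordinal_distribution(series, order=3, delay=1):
    """Bandt-Pompe probabilities of all order! ordinal patterns in `series`.

    Hypothetical helper: the input is assumed to be a scalar series; how a
    semantic trajectory would be mapped to it is not specified here.
    """
    span = (order - 1) * delay
    if len(series) <= span:
        raise ValueError("series is too short for the chosen order/delay")
    counts = Counter()
    for i in range(len(series) - span):
        window = series[i : i + span + 1 : delay]  # `order` samples
        # Ordinal pattern = argsort of the window; stable sort keeps index order on ties.
        counts[tuple(sorted(range(order), key=lambda k: window[k]))] += 1
    total = sum(counts.values())
    # Include patterns that never occur, so the distribution has order! entries.
    return [counts[p] / total for p in permutations(range(order))]


def shannon(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return sum(-x * math.log(x) for x in p if x > 0)


def entropy_complexity(series, order=3, delay=1):
    """Return (H, C): normalized permutation entropy and statistical complexity."""
    p = ordinal_distribution(series, order, delay)
    n = len(p)  # n = order!
    uniform = [1.0 / n] * n
    h = shannon(p) / math.log(n)
    # Jensen-Shannon divergence between p and the uniform distribution.
    mix = [(a + b) / 2 for a, b in zip(p, uniform)]
    js = shannon(mix) - shannon(p) / 2 - shannon(uniform) / 2
    # Q0 rescales js to [0, 1]; its maximum is reached when p is a single pattern.
    q0 = -2.0 / ((n + 1) / n * math.log(n + 1) - 2 * math.log(2 * n) + math.log(n))
    return h, q0 * js * h


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.random() for _ in range(5000)]
    trend = list(range(100))
    print("noise:", entropy_complexity(noise))
    print("trend:", entropy_complexity(trend))
```

Uncorrelated noise lands near H = 1 with low C, while a strictly monotone series produces a single ordinal pattern and sits at H = 0, C = 0. Structured signals fall in between, which is what makes the plane a candidate tool for separating texts with different generating processes. Whether human-written and bot-generated texts actually separate this way is a question for the paper itself, not something this sketch demonstrates.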