Publication

KIAM Preprint No. 16, Moscow, 2024
Authors: Kislitsyna M.Y., Orlov Y.N.
The distribution of ordinal frequencies of consonants as an invariant of a language group
Abstract:
Statistics on the frequency distribution of consonant letters are collected for the main modern languages of the Indo-European family. The distributions of frequencies sorted in descending order are studied, based on an analysis of literary texts about 1 million characters long. It is shown that an invariant can be introduced for the Germanic, Romance, Slavic and Baltic language groups, defined as the distance between members of a group in the L1 norm. The threshold distance at which languages are grouped into fully connected subgraphs is 0.14. It is also shown that the structures of the graphs of near and far neighbors correspond to a model of dependent random variables.
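The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made here for illustration only: the consonant alphabet passed in by the caller, relative frequencies normalized to sum to 1, zero-padding when two languages have different numbers of consonants, and an edge whenever the distance is less than or equal to the threshold. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, zip_longest


def ordered_frequencies(text, consonants):
    """Relative consonant frequencies, sorted in descending order.

    `consonants` is a caller-supplied set of letters (an assumption:
    the preprint's exact alphabets are not given in the abstract).
    """
    counts = Counter(ch for ch in text.lower() if ch in consonants)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no letters from the consonant set")
    return sorted((n / total for n in counts.values()), reverse=True)


def l1_distance(p, q):
    """L1 distance between two ordered profiles.

    The shorter profile is padded with zeros (an assumption).
    """
    return sum(abs(a - b) for a, b in zip_longest(p, q, fillvalue=0.0))


def threshold_graph(profiles, threshold):
    """Edges between languages whose L1 distance does not exceed `threshold`."""
    return {
        frozenset((a, b))
        for a, b in combinations(profiles, 2)
        if l1_distance(profiles[a], profiles[b]) <= threshold
    }


def is_fully_connected(group, edges):
    """True if every pair of languages in `group` is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(group, 2))


if __name__ == "__main__":
    # Toy profiles with made-up numbers, NOT data from the preprint.
    toy = {"A": [0.6, 0.4], "B": [0.55, 0.45], "C": [0.9, 0.1]}
    edges = threshold_graph(toy, threshold=0.14)
    print(is_fully_connected(["A", "B"], edges))
    print(is_fully_connected(["A", "B", "C"], edges))
```

Sorting the frequencies before comparing them discards letter identity, so two languages are compared by the shape of their distributions rather than by which consonant holds each rank. With real data, each profile would be built from a text of about 1 million characters, as stated in the abstract.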
Keywords:
machine classification, text preprocessing, ordered frequency distribution, nearest neighbor graph
Publication language: Russian, pages: 18
Research direction:
Mathematical modelling in topical problems of science and technology
About authors:
  • Kislitsyna Maria Yurievna, orcid.org/0000-0002-2542-8914, KIAM RAS
  • Orlov Yurii Nikolaevich, orcid.org/0000-0002-1356-5137, KIAM RAS