This article reviews modern metrics for evaluating generative models. Particular attention is paid to metrics used in natural language processing: BLEU (evaluates quality by comparing the output of a model with text produced by a person), NIST (based on the BLEU metric), METEOR (based on the harmonic mean of unigram precision and recall), and ROUGE. The article presents a new metric based on subjective assessments, which are collected through pairwise comparisons recorded on rating scales. The article also proposes an algorithm for generating music that combines automatic models for working with ABC notation, distributional semantics models, and deep generative neural networks (Transformers). The new quality metric (SS-metric) is used to assess the quality of the proposed music generation algorithm in comparison with solutions produced by humans and by baseline models. Music generation with the baseline model builds a continuation of a musical fragment by randomly selecting bars from the first half of that fragment. The experiments showed that the SS-metric makes it possible to formalize and generalize subjective assessments, so it can be used to assess the quality of various objects.
Keywords:
metrics, generative models, analysis of objects of complex structure, SS-metric, music generation, machine learning