Publication

KIAM Preprint № 53, Moscow, 2024
Authors: Chaynikov Y.S., Sudakov V.A.
On the estimation of integral risk of predictor Lipschitz functions in machine learning models
Abstract:
In most practical machine learning problems, class imbalance in the available training samples makes it harder to train predictors that generalize patterns from the training dataset to the general population. This paper investigates the theoretical foundations for the effectiveness of adding synthetic data to the training set. In the estimate of total risk, two types of error are distinguished: representation error and deviation error. Practical recommendations are formulated for creating synthetic samples whose distribution deviates from that of representative samples with respect to the density of the argument: samples are drawn more frequently in regions where the density of the argument is relatively low, which reduces the size of the corresponding Voronoi cells and the contribution of deviation error to the total risk.
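The geometric intuition in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' method or their definition of deviation error; it only assumes that, for a Lipschitz function with constant L, the discrepancy between a point and its nearest training sample is at most L times the distance between them, so shrinking Voronoi cells in sparse regions lowers that bound. The Beta(2, 8) input density, the uniform placement of synthetic points on [0.5, 1], the value L = 3 and all function names are hypothetical choices made for illustration.

```python
import bisect
import random


def nearest_distance(sorted_samples, x):
    """Distance from x to its nearest sample, i.e. how far x lies
    from the site of the Voronoi cell that contains it."""
    i = bisect.bisect_left(sorted_samples, x)
    candidates = []
    if i < len(sorted_samples):
        candidates.append(sorted_samples[i] - x)
    if i > 0:
        candidates.append(x - sorted_samples[i - 1])
    return min(candidates)


def worst_case_bound(samples, lipschitz_const, grid):
    """Largest value of L * (distance to nearest sample) over the grid."""
    ordered = sorted(samples)
    return max(lipschitz_const * nearest_distance(ordered, x) for x in grid)


rng = random.Random(0)

# Assumed imbalanced inputs on [0, 1]: Beta(2, 8) has mean 0.2,
# so the region near 1 is sparsely covered.
train = [rng.betavariate(2, 8) for _ in range(200)]

# Hypothetical synthetic inputs placed where the original density is low.
synthetic = [rng.uniform(0.5, 1.0) for _ in range(50)]

grid = [k / 1000 for k in range(1001)]
L = 3.0  # assumed Lipschitz constant

before = worst_case_bound(train, L, grid)
after = worst_case_bound(train + synthetic, L, grid)
print(f"bound before: {before:.3f}, after: {after:.3f}")
```

Adding points can never increase the distance to the nearest sample, so the bound after augmentation is never larger than before. How much it drops depends on whether the synthetic points actually land in the sparse regions.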
Keywords:
synthetic data, machine learning, Voronoi cells, predictor, training sample, total risk, empirical risk, representation error, deviation error
Publication language: Russian, pages: 12
Research direction:
Mathematical modelling in current problems of science and technology
About authors:
  • Chaynikov Yuri Sergeevich, orcid.org/0009-0000-0720-5189, Moscow Aviation Institute
  • Sudakov Vladimir Anatolievich, orcid.org/0000-0002-1658-1941, KIAM RAS