Conference material: "Proceedings of the International Conference on Computer Graphics and Vision “Graphicon” (19-21 September 2022, Ryazan)"
Authors: Makarova E.A., Lagerev D.G.
Using Visual Models for Exploratory Analysis of Semi-structured Text Data
Abstract:
Processing semi-structured text data for use in DM models is labor-intensive: beyond its material cost, it can lengthen model building and, as a result, reduce the efficiency of decision-making. This article presents visual models of semi-structured text data and methods for processing them at the stage of exploratory analysis. Exploratory analysis reduces the time needed to select significant variables at the initial stage of a study and later avoids processing redundant or insignificant variables. Visualization helps ensure that only data that improve the quality of a DM model are included in it and processed. The article describes how text-data visualization is used during exploratory analysis and how two types of visual models are constructed: interactive 'quantitative' visualization, and visualization of relationships between words and other variables in the data under study. The developed models are validated on the example of labor market analysis. Examples visualize the content of the 'soft skills' field from CVs and vacancies, showing both the skills most often mentioned by applicants from various professional fields and the effect of mentioning these skills on whether applicants are invited for interviews. The experiment showed that the developed visual models make it possible to determine, at the stage of exploratory analysis, whether a text variable needs to be included in the DM model.
Keywords:
Natural language processing, data visualization, exploratory data analysis, correlation coefficient, labor market analysis
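Illustrative sketch (not taken from the paper): the two kinds of quantities the abstract describes, skill frequencies per professional field and the relationship between mentioning a skill and an interview invitation, can be computed along the following lines. The toy records, the function names, and the choice of the phi coefficient (Pearson correlation for two binary variables) are assumptions made for illustration; the abstract names a correlation coefficient only in the keywords and does not say which one the authors use.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy records, invented for illustration:
# (professional field, 'soft skills' text, invited to interview: 1 or 0)
RECORDS = [
    ("IT", "teamwork, responsibility", 1),
    ("IT", "teamwork, communication", 1),
    ("IT", "responsibility", 0),
    ("Sales", "communication, stress resistance", 1),
    ("Sales", "responsibility", 0),
    ("Sales", "communication, teamwork", 1),
]


def tokenize(text):
    """Split a comma-separated skills field into normalized skill phrases."""
    return [s.strip().lower() for s in text.split(",") if s.strip()]


def skill_counts_by_field(records):
    """'Quantitative' view: number of records per field that mention each skill."""
    counts = {}
    for field, text, _invited in records:
        counts.setdefault(field, Counter()).update(set(tokenize(text)))
    return counts


def phi_coefficient(records, skill):
    """Phi coefficient between 'skill mentioned' and 'invited' (both binary)."""
    n11 = n10 = n01 = n00 = 0
    for _field, text, invited in records:
        mentioned = skill in tokenize(text)
        if mentioned and invited:
            n11 += 1
        elif mentioned:
            n10 += 1
        elif invited:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Return 0.0 when a variable is constant and the coefficient is undefined.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


if __name__ == "__main__":
    for field, counter in skill_counts_by_field(RECORDS).items():
        print(field, counter.most_common(3))
    for skill in ("teamwork", "responsibility"):
        print(skill, round(phi_coefficient(RECORDS, skill), 3))
```

Under these assumptions, a text variable whose skills show coefficients near zero would be a candidate for exclusion from the DM model, while clearly positive or negative values suggest it carries signal worth keeping.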