
Conference paper: Proceedings of the International Conference on Computer Graphics and Vision "GraphiCon", CEUR
Authors: Korogodina O., Koulichenko V., Карпик О.В., Klyshinsky E.
Evaluation of Vector Transformations for Russian Static and Contextualized Embeddings
The authors of Word2Vec claimed that their technology could solve the word analogy problem by means of vector transformations in the vector space it introduced. The same is generally assumed to hold for both static and contextualized models. In practice, however, this approach sometimes fails. In this paper, we investigate several static and contextualized models trained for the Russian language and identify the reasons for this inconsistency. We found that words of different categories behave differently in the semantic space. Contextualized models tend to find phonological and lexical analogies, while static models are better at finding relations among geographical proper names. In most cases, the average accuracy of contextualized models is higher than that of static ones. Our experiments have demonstrated that in some cases the lengths of the vectors can differ by more than a factor of two, while for some categories most of the vectors can be perpendicular to the vector connecting the average beginning and ending points.
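The vector-transformation idea behind word analogies, and the two diagnostics the abstract mentions (offset lengths and the angle between each offset and the mean offset), can be illustrated with a minimal sketch. It is not the authors' code: the tiny 3-dimensional vectors in `EMB` are hypothetical toy values rather than real embeddings, the analogy is solved with the common additive offset rule (d ≈ b − a + c, ranked by cosine similarity), and `solve_analogy` and `offset_stats` are illustrative helper names, assumed for this example only.

```python
import math

# Hypothetical toy embeddings (NOT from any real model); real models
# use hundreds of dimensions.
EMB = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [3.0, 0.0, 1.0],
    "queen": [3.0, 1.0, 1.0],
    "apple": [0.0, 0.3, 3.0],
}

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def cosine(u, v):
    # Guard against zero-length vectors to avoid division by zero.
    denom = norm(u) * norm(v)
    return 0.0 if denom == 0 else sum(a * b for a, b in zip(u, v)) / denom

def solve_analogy(a, b, c, emb=EMB):
    """Return d such that a:b :: c:d, using the additive offset d ~ b - a + c."""
    target = add(sub(emb[b], emb[a]), emb[c])
    # The three query words are excluded from the candidates, as is customary.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def offset_stats(pairs, emb=EMB):
    """For each (start, end) pair, return (offset length, cosine between the
    offset and the vector joining the mean start and mean end points)."""
    offsets = [sub(emb[e], emb[s]) for s, e in pairs]
    n, dim = len(pairs), len(offsets[0])
    mean_start = [sum(emb[s][i] for s, _ in pairs) / n for i in range(dim)]
    mean_end = [sum(emb[e][i] for _, e in pairs) / n for i in range(dim)]
    mean_offset = sub(mean_end, mean_start)
    return [(norm(o), cosine(o, mean_offset)) for o in offsets]

if __name__ == "__main__":
    print(solve_analogy("man", "woman", "king"))
    print(offset_stats([("man", "woman"), ("king", "queen")]))
```

In this toy space both offsets are identical, so each has length 1 and a cosine of 1 with the mean offset. In the situation the abstract describes, the lengths within a category would differ by more than a factor of two and many cosines would be close to 0 (offsets nearly perpendicular to the mean offset), which is when the simple additive rule tends to fail.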
Keywords:
Word Embeddings, Vector Space, Vector Transformation, Word Analogies
Language of publication: English; pages: 9 (pp. 349-357)
About the authors:
  • Korogodina Olga,  orcid.org/0000-0003-3601-4677,  National Research University Higher School of Economics
  • Koulichenko Vladimir,  orcid.org/0000-0003-3256-8955,  National Research University Higher School of Economics
  • Карпик Олеся Владимировна,  orcid.org/0000-0002-0477-1502,  Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences
  • Klyshinsky Eduard,  orcid.org/0000-0002-4020-488X,  National Research University Higher School of Economics