Conference proceedings: "Proceedings of the International Conference on Computer Graphics and Vision "Graphicon", CEUR"
Authors: Korogodina O., Koulichenko V., Karpik O.V., Klyshinsky E.
Evaluation of Vector Transformations for Russian Static and Contextualized Embeddings
Abstract:
The authors of Word2Vec claimed that their technology could solve the word analogy problem using vector transformations in the introduced vector space. In principle, the same should hold for both static and contextualized models. However, practice demonstrates that such an approach sometimes fails. In this paper, we investigate several static and contextualized models trained for the Russian language and identify the reasons for such inconsistency. We found that words of different categories demonstrate different behavior in the semantic space. Contextualized models tend to find phonological and lexical analogies, while static models are better at finding relations among geographical proper names. In most cases, the average accuracy of contextualized models is higher than that of static ones. Our experiments demonstrated that in some cases the lengths of the vectors could differ by more than a factor of two, while for some categories most of the vectors could be perpendicular to the vector connecting the average beginning and ending points.
Keywords:
Word Embeddings, Vector Space, Vector Transformation, Word Analogies
Publication language: English, pages: 9 (pp. 349-357)
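The abstract refers to solving word analogies by a vector offset (b* ≈ b + (a* − a)) and to comparing the lengths and directions of per-pair offsets. The following is a minimal, hypothetical Python sketch of such an evaluation; it is not code from the paper, and the toy vocabulary, random vectors, and function names are assumptions standing in for a real Russian embedding model.

```python
# Hypothetical sketch of a vector-offset analogy test and of the length/angle
# statistics described in the abstract. Toy random vectors replace a real model.
import numpy as np

def analogy(emb: dict, a: str, a_star: str, b: str) -> str:
    """Return the vocabulary word closest (by cosine) to b + (a_star - a)."""
    target = emb[b] + (emb[a_star] - emb[a])
    target /= np.linalg.norm(target)
    best_word, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, a_star, b):          # exclude the query words themselves
            continue
        sim = np.dot(vec, target) / np.linalg.norm(vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

def offset_stats(emb: dict, pairs: list) -> tuple:
    """Lengths of per-pair offsets and their angles (in degrees) to the mean
    offset, mirroring the length/perpendicularity checks mentioned above."""
    offsets = np.stack([emb[w2] - emb[w1] for w1, w2 in pairs])
    mean_offset = offsets.mean(axis=0)
    lengths = np.linalg.norm(offsets, axis=1)
    cosines = offsets @ mean_offset / (lengths * np.linalg.norm(mean_offset))
    return lengths, np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = ["москва", "россия", "париж", "франция", "берлин", "германия"]
    emb = {w: rng.normal(size=50) for w in vocab}   # toy stand-in vectors
    print(analogy(emb, "москва", "россия", "париж"))
    print(offset_stats(emb, [("москва", "россия"), ("париж", "франция")]))
```

With a trained static model, `emb` would be replaced by its word-to-vector lookup; for a contextualized model, vectors for the test words would first have to be extracted from the corresponding sentences.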