Evaluation of Vector Transformations for Russian Static and Contextualized Embeddings

Korogodina O.; Koulichenko V.; Karpik O.V.; Klyshinsky E.

Abstract:

The authors of Word2Vec claimed that their technology could solve the word analogy problem using vector transformations in the resulting vector space. By default, the same is assumed to hold for both static and contextualized models. In practice, however, this approach sometimes fails. In this paper, we investigate several static and contextualized models trained for the Russian language and identify the reasons for this inconsistency. We found that words of different categories behave differently in the semantic space. Contextualized models tend to find phonological and lexical analogies, while static models are better at finding relations among geographical proper names. In most cases, the average accuracy of contextualized models is higher than that of static ones. Our experiments show that in some cases the lengths of the vectors can differ by more than a factor of two, while for some categories most of the vectors can be perpendicular to the vector connecting the average start and end points.
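
The quantities mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the 2-D embeddings, the word list, and the helper names (cosine, solve_analogy, offset_stats) are all hypothetical and chosen only to show how an analogy is solved by a vector offset, how offset lengths are compared, and how the angle between each offset and the vector connecting the average start and end points could be measured.

```python
import numpy as np

# Hypothetical toy embeddings (2-D for readability); real models use
# hundreds of dimensions and these values are made up for illustration.
emb = {
    "moscow": np.array([1.0, 0.0]),
    "russia": np.array([1.0, 2.0]),
    "paris": np.array([3.0, 0.0]),
    "france": np.array([3.0, 2.0]),
    "rome": np.array([5.0, 0.5]),
    "italy": np.array([6.0, 0.5]),  # deliberately "off-direction" pair
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(a, b, c, emb):
    """Word closest (by cosine) to emb[b] - emb[a] + emb[c], excluding a, b, c."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def offset_stats(pairs, emb):
    """Offset lengths and cosine of each offset to the mean offset."""
    offsets = np.array([emb[end] - emb[start] for start, end in pairs])
    # The mean offset equals mean(end points) - mean(start points).
    mean_offset = offsets.mean(axis=0)
    lengths = np.linalg.norm(offsets, axis=1)
    cosines = [cosine(o, mean_offset) for o in offsets]
    return lengths, cosines

pairs = [("moscow", "russia"), ("paris", "france"), ("rome", "italy")]
answer = solve_analogy("moscow", "russia", "paris", emb)
lengths, cosines = offset_stats(pairs, emb)
length_ratio = lengths.max() / lengths.min()

print(answer, length_ratio, cosines)
```

In this toy data the third pair has an offset half as long as the others and pointing in a different direction, so its cosine to the mean offset is low. For a contextualized model, the vectors would first have to be obtained per word in some context, a step this sketch does not cover.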

Keywords:

Word Embeddings, Vector Space, Vector Transformation, Word Analogies

Publication language: English, pages: 9 (pp. 349-357)

About the authors:

Korogodina Olga, National Research University Higher School of Economics

Koulichenko Vladimir, National Research University Higher School of Economics

Karpik Olesya Vladimirovna

Klyshinsky Eduard, National Research University Higher School of Economics
