Visualization of the Evolutionary Trajectory: Application of Reduced Amino Acid Alphabets and Word2Vec Embedding
Analysis of viral evolution is a key element of epidemiological surveillance and control. One of the fundamental tools which is widely used to illustrate evolutionary history is the phylogenetic tree. Recently, we have proposed an alternative visualization for the phylogenetic tree using the evolutionary trajectory of its taxa. An evolutionary trajectory is a path starting from a taxon and ending at the root of the tree. In this paper, we propose an embedding of tree nodes by encoding their genetic sequence using a reduced amino acid alphabet and employing the Word2Vec framework. The suggested visualization maintains the phylogenetic relationship between nodes, while their proximity in 3D space depends on three factors: the type of reduced amino acid alphabet; fixed-length genetic patterns used in Word2Vec; and the neighbor effect of adjacent signatures. The results of our experiments showed that the majority of evolutionary history can be described in the embedded space. Moreover, they suggest potential application of our approach as an explanatory tool in studying various aspects: evolutionary dynamics; evolutionary deviation of viral variants; and phylogenetic characteristics, such as formation of new clades. Besides the usual local analysis of point mutations, the developed framework enables studying these aspects based on a more comprehensive global context, including neighboring effects, genetic signatures.
Visualization, Evolution, Word2Vec, Simplified Amino Acid Alphabet, Phylogenetic Tree