
KIAM Preprint № 225, Moscow, 2018
Authors: Perepelkina A.Y., Levchenko V.D.
The DiamondCandy Algorithm for Maximum Performance Vectorized Cross-Stencil Computation
This preprint presents an advance in the search for a 4D time-space decomposition that yields an efficient vectorized cross-stencil implementation. The new algorithm, called DiamondCandy, is built from the dependency and influence conoids of the scheme stencil. It has high locality in terms of operational intensity, supports SIMD parallelism, and is easy to implement. Implementation details illustrate how both instruction-level and data-level parallelism are used on many-core CPUs. Test runs show that it performs an order of magnitude better than the traditional approach and that performance does not decline as the data size increases.
Keywords: Stencil, LRnLA, Wave Equation, time skewing, many-core
Publication language: English, pages: 23
Research direction:
Programming, parallel computing, multimedia