High performance FDTD code implementation for GPGPU supercomputers
An implementation of FDTD (Finite Difference Time Domain) method for solution of optical and other electrodynamic problems of high computational cost is described. The implementation is based on LRnLA (Locally Recursive non-Locally Asynchronous) algorithm DiamondTorre, which is developed specifically for GPGPU (General Purpose Graphical Processing Unit) hardware. The specifics of the DiamondTorre algorithms for staggered grid (Yee cell) and many-GPU devices are shown. The algorithm is implemented in software for real physics calculation. The software performance is estimated through algorithms parameters and computer model. The real performance is tested on one GPU device, as well as on many-GPU cluster. The performance of up to 0.65・1012 cell updates per second for 3D domain with 0.3・1012 Yee cells total is achieved.