Talk on asynchronous DG by Shubham Goswami
Posted on: 16 Apr 2024
Shubham Goswami is a doctoral candidate at CDS, IISc who is working on asynchronous methods geared towards massively parallel simulations. The idea is to reduce the amount of time ranks have to wait for data, which is a major bottleneck for large simulations.
A scalable asynchronous discontinuous Galerkin method for massively parallel flow simulations
Shubham Kumar Goswami
CDS, IISc, Bangalore
17 April 2024 at 2 PM
TIFR-CAM, Auditorium and on ZoomAccurate simulations of turbulent flows are crucial for comprehending complex phenomena in engineered systems and natural processes.These simulations are often computationally expensive and require the use of supercomputers, where scalability at extreme scales is significantly affected by communication overhead. To address this, an asynchronous computing approach for time-dependent partial differential equations (PDEs) that relaxes communication/synchronization at a mathematical level has been developed with finite difference schemes that are ideal for structured meshes. This work proposes an asynchronous discontinuous Galerkin (ADG) method, which has the potential to provide high-order accurate solutions for various flow problems on both structured and unstructured meshes, and demonstrates its scalability. We first propose a new method that combines asynchrony-tolerant and low-storage explicit RK schemes with reduced communication effort. The accuracy of this method is assessed both theoretically and numerically, and its scalability is demonstrated through simulations of the decaying turbulence. Subsequently, we introduce the asynchronous discontinuous Galerkin method, which combines the benefits of the DG method with asynchronous computing. The numerical properties of the proposed method are investigated, including local conservation,stability, and accuracy, where the method is shown to be, at most, first-order accurate. To recover accuracy, we developed new asynchrony-tolerant (AT) fluxes that utilize data from multiple time levels. To validate these theoretical findings, several numerical experiments are conducted based on both linear and nonlinear problems. Finally, we develop a parallel PDE solver based on the ADG method within an open-source finite element library deal.II using a communication-avoiding algorithm. Accuracy validation and scalability benchmarks of the solver are performed, demonstrating a speedup of up to 80% with the ADG method at an extreme scale with 9216 cores. The overall work highlights the potential benefits of the asynchronous approach for the development of accurate and scalable PDE solvers, paving the way for simulations of complex physical systems on massively parallel supercomputers.