Center for Simulation and Modeling, University of Pittsburgh
Reliable High Performance Computing Through Migratable Objects
Wednesday, November 20, 2013, 3:00 PM
Seminar Room 5317, Sennott Square
ABSTRACT: Supercomputers are a fundamental tool for solving many of the most important scientific problems. However, larger and more powerful machines are necessary to keep pushing the envelope in computational science. These extreme-scale systems will have to assemble an immense number of components. The direct effects of so many parts in a single machine are a high failure rate and tight power and energy constraints. These challenges must therefore be addressed to provide a functional extreme-scale supercomputer. In this talk, we will describe the migratable-objects computational model and how it can be leveraged to provide a resilient and energy-efficient scheme for large supercomputers. We will review a collection of fault-tolerance techniques that are enabled by this model and see the substantial benefit of using migratable objects. We will use an analytical model throughout to make performance and energy projections for extreme-scale machines.
BIOGRAPHY: Esteban Meneses is a Research Assistant Professor in the Center for Simulation and Modeling at the University of Pittsburgh. His research focuses on load-balancing and fault-tolerance techniques for large-scale parallel applications. He holds a PhD in Computer Science from the University of Illinois at Urbana-Champaign. In his doctoral dissertation, he proposed a collection of strategies to decrease the memory overhead of message-logging protocols. He showed how those protocols, coupled with the migratable-objects computational model, provide an energy-efficient mechanism to tolerate the high frequency of failures expected in extreme-scale supercomputers.