ASCR Discovery

At exascale, being oblivious
to a fault keeps apps running

Posted December 12, 2012

[Image: The Fault-Oblivious eXtreme-scale (FOX) software stack.]

Computer scientist Maya Gokhale is optimistic about exascale computing’s fast, bright future. But to achieve that success, the Lawrence Livermore National Laboratory researcher and her colleagues focus on an exascale computer’s inevitable failures, in everything from processor cores to memory and communications links – the unprecedented millions of hardware parts. The failure of any one of these elements could hobble a 10^18 floating-point-operation scientific calculation.

“Failure will be a constant companion in exascale computing,” says Gokhale, a 30-year veteran in thinking about the silicon edge of high-performance computing (HPC). “We expect applications will have to function in an environment of near-continual failure: something, somewhere in the machine will always be malfunctioning, either temporarily, or simply breaking.”

Yet just as the most successful entrepreneurs learn and grow from overcoming failure, HPC scientists like Gokhale believe planning for breakdowns at the exascale will not only improve scientific computing; it may be a tipping point toward a new paradigm in how supercomputers are programmed.

As part of its effort to outsmart faults at the exascale, the DOE Office of Science is looking to FOX – the Fault-Oblivious eXtreme-scale execution environment project.

Led by Gokhale, FOX is an ambitious project, now in the last of its three years, with a transformational vision. To succeed at the exascale, FOX’s creators believe, HPC must fundamentally change its approach to failure – from one of seeking hardware perfection at all costs to one of embracing hardware failure when designing a new generation of operating systems.

The failure frontier

With 100 million to a billion processing cores spread among millions of nodes, an exascale supercomputer will represent an unprecedented opportunity for scientific computing. But it also will spawn a new frontier in terms of failure rates.
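To see why node counts in the millions push failure rates into new territory, consider a rough back-of-the-envelope sketch. It assumes nodes fail independently at a constant rate, so the machine-wide failure rate is simply the per-node rate multiplied by the number of nodes. The five-year per-node mean time between failures (MTBF) and the function name below are illustrative assumptions, not figures or code from the FOX project.

```python
HOURS_PER_YEAR = 365 * 24


def system_mtbf_hours(node_mtbf_hours, node_count):
    """Estimate the mean time between failures for a whole machine.

    Assumes independent nodes with constant (exponential) failure rates,
    so per-node failure rates add: system MTBF = node MTBF / node count.
    """
    if node_mtbf_hours <= 0 or node_count <= 0:
        raise ValueError("node_mtbf_hours and node_count must be positive")
    return node_mtbf_hours / node_count


if __name__ == "__main__":
    # Hypothetical per-node MTBF of five years; not a measured value.
    node_mtbf = 5 * HOURS_PER_YEAR
    for nodes in (1_000, 100_000, 1_000_000):
        minutes = system_mtbf_hours(node_mtbf, nodes) * 60
        print(f"{nodes:>9,} nodes: one failure every {minutes:,.1f} minutes on average")
```

Under these assumed numbers, a million-node machine would see some node fail about every 2.6 minutes on average, even though each individual node runs for years between failures.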


Exascale Science looks at challenges in building and the scientific possibilities of next-generation computers that will operate at exaflops – processing power many times that of today’s fastest machines.

 

CONTACT

Maya B. Gokhale
Lawrence Livermore National Laboratory
gokhale2@llnl.gov


 

RELATED LINK

FOX project
