ASCR Discovery


Breaking the bottleneck in computer data flow

Posted February 12, 2009

If bytes were pennies, a stack of one petabyte of them would reach from Earth to the sun and back, with coins to spare.
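The penny comparison is easy to check with a few lines of arithmetic. This sketch assumes values the article does not give: a US cent 1.52 mm thick, a decimal petabyte of 10^15 bytes, and an Earth-sun distance of one astronomical unit.

```python
# Sanity-check the penny analogy: one penny per byte, one petabyte of bytes.
# Assumptions (not from the article): a US cent is 1.52 mm thick, a petabyte
# is 10**15 bytes (decimal), and the Earth-sun distance is 1 AU.

PENNY_THICKNESS_M = 1.52e-3        # meters per penny
BYTES_PER_PETABYTE = 10**15        # decimal petabyte
AU_M = 149_597_870_700             # meters in one astronomical unit


def stack_height_m(n_bytes: int) -> float:
    """Height in meters of a stack holding one penny per byte."""
    return n_bytes * PENNY_THICKNESS_M


def round_trips_to_sun(n_bytes: int) -> float:
    """Number of Earth-to-sun-and-back trips the stack would cover."""
    return stack_height_m(n_bytes) / (2 * AU_M)


if __name__ == "__main__":
    height = stack_height_m(BYTES_PER_PETABYTE)
    trips = round_trips_to_sun(BYTES_PER_PETABYTE)
    print(f"{height:.3e} m, {trips:.1f} round trips")
```

Under these assumptions the stack is about 1.5 trillion meters tall, roughly five round trips to the sun, so "with coins to spare" is an understatement.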

Given such a tower of data – an amount of information appearing in modern scientific computing with increasing frequency – the trick becomes moving it and moving it quickly. It can be done, but only with the right software.

Moving data depends on a sometimes-neglected aspect of computer systems known as input/output, or I/O. “We teach about computer architecture, starting with the processor,” says Wu Feng, associate professor in the departments of computer science and electrical and computer engineering at Virginia Tech in Blacksburg. “Then, we teach about the memory hierarchy, including caches and virtual memory. By the time we get to the end of a semester and have just a couple of classes left, I/O gets thrown into the mix as an afterthought.”

In large-scale, parallel-processing computers, though, every aspect of a system must be balanced to get top speed. Many of today’s systems stumble in the I/O realm, making it a bottleneck that can back up entire computations.
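Why an unbalanced I/O path limits a whole parallel computer can be illustrated with a simplified model in the spirit of Amdahl's law. This is a sketch, not a model taken from the article: it assumes, hypothetically, that computation divides perfectly across processors while I/O time stays fixed because every processor shares the same storage path. The job sizes are made-up numbers.

```python
# Simplified, Amdahl's-law-style model of an I/O bottleneck.
# Hypothetical assumptions: compute time divides perfectly across processors,
# but I/O time is fixed because all processors share one storage path.


def runtime(compute_s: float, io_s: float, processors: int) -> float:
    """Wall-clock seconds when compute scales with processors but I/O does not."""
    return compute_s / processors + io_s


def speedup(compute_s: float, io_s: float, processors: int) -> float:
    """Speedup relative to running the same job on one processor."""
    return runtime(compute_s, io_s, 1) / runtime(compute_s, io_s, processors)


if __name__ == "__main__":
    # Hypothetical job: 1,000 s of computation plus 10 s of unscalable I/O.
    for p in (1, 10, 100, 1000, 10000):
        print(p, round(speedup(1000.0, 10.0, p), 1))
```

With these numbers the speedup can never exceed about 101, no matter how many processors are added, because the 10 seconds of I/O never shrink. Speeding up I/O raises that ceiling.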

One of the first approaches to advanced parallel I/O came from Rob Ross, a computer scientist at Argonne National Laboratory (ANL), who led the development of the Parallel Virtual File System (PVFS). As Ross's webpage puts it, he and colleagues have used PVFS to achieve something like pulling a few DVDs' worth of data into your computer in one second. But I/O must also work far beyond a scientist's desktop.
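To put "a few DVDs' worth of data in a second" in numbers, here is some rough arithmetic under assumptions that are not in the article: "a few" is taken as three, and each disc is a single-layer DVD holding 4.7 GB (decimal).

```python
# Rough arithmetic for "a few DVDs' worth of data in a second."
# Assumptions (not from the article): "a few" means three, and each disc is
# a single-layer DVD holding 4.7 GB (decimal gigabytes).

DVD_BYTES = 4.7e9       # bytes on a single-layer DVD
PETABYTE = 1e15         # decimal petabyte


def throughput_bytes_per_s(dvds_per_second: float) -> float:
    """Data rate implied by reading this many DVDs' worth of data each second."""
    return dvds_per_second * DVD_BYTES


def hours_to_read(n_bytes: float, bytes_per_s: float) -> float:
    """Hours needed to read n_bytes at a sustained rate."""
    return n_bytes / bytes_per_s / 3600


if __name__ == "__main__":
    rate = throughput_bytes_per_s(3)
    print(f"{rate / 1e9:.1f} GB/s")
    print(f"{hours_to_read(PETABYTE, rate):.1f} hours per petabyte")
```

That works out to roughly 14 GB/s. Even at that rate, reading a full petabyte would take most of a day.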

Today, scientists often rely on wide-area networks to send information to high-performance data centers, sometimes around the world. To make that possible in a reasonable amount of time, Feng, Pavan Balaji (an assistant computer scientist in ANL's Mathematics and Computer Science Division and a fellow of the Computation Institute at the University of Chicago), and their colleagues developed ParaMEDIC (parallel metadata environment for distributed I/O and computing). The researchers also performed a biology experiment in which they dramatically improved the speed of I/O, even when moving data around the world.
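A back-of-the-envelope calculation shows why sending large datasets across a wide-area network is hard. The link speeds below are hypothetical, and the sketch optimistically assumes the link is fully used with no protocol overhead or retransmissions, so real transfers would be slower.

```python
# Back-of-the-envelope time to move data across a wide-area network.
# Hypothetical assumptions (not from the article): the link runs at its full
# rated speed, with no protocol overhead, congestion, or retransmissions.

SECONDS_PER_DAY = 86_400


def transfer_days(n_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to send n_bytes over a link of the given bit rate."""
    seconds = n_bytes * 8 / link_bits_per_s
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    petabyte = 1e15
    for gbps in (1, 10, 100):
        print(f"{gbps:>3} Gb/s: {transfer_days(petabyte, gbps * 1e9):.1f} days")
```

Even over an idealized 10 Gb/s link, a petabyte needs more than nine days, which suggests why raw bandwidth alone is not enough and software must help decide what actually crosses the network.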

Biology’s boost to I/O

One of many areas in need of improved I/O is biology, where giant genomic datasets are becoming the norm. Moreover, biologists often compare data across organisms. For example, the popular BLAST (basic local alignment search tool) software compares biological sequences, such as the strings of bases, or nucleotides, that make up DNA. The amount of data to search, meanwhile, keeps increasing. GenBank, the National Institutes of Health’s genetic sequence database, holds more than 240 billion bases and counting. So as the data grow, BLAST must work faster to keep searches doable. 
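BLAST-style tools avoid comparing every position of a query against every position of a database by first looking for short exact word matches, or "seeds," and then extending only the promising ones. The sketch below illustrates just that seeding idea on toy DNA strings. It is not BLAST: real BLAST adds scoring, extension, and statistics, and the word length `k=4` and the function names here are arbitrary, hypothetical choices for illustration.

```python
# Simplified illustration of the seed step behind BLAST-style searches.
# NOT BLAST itself: no scoring, extension, or statistics. The word length k
# and the function names are hypothetical choices for this example.

from collections import defaultdict


def build_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def find_seeds(query: str, database: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, database_pos) pairs where a length-k word matches exactly."""
    index = build_index(database, k)
    hits: list[tuple[int, int]] = []
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):
            hits.append((q, d))
    return hits


if __name__ == "__main__":
    print(find_seeds("GATTACA", "CCGATTACAGG", k=4))
```

Here every hit lies on one diagonal (database position minus query position equals 2), the kind of cluster a BLAST-style tool would try to extend into a longer alignment. The index grows with the database, so larger collections such as GenBank mean more data to read and search.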

