• SC Home
  • SC Organization
  • SC Jobs
  • Contact SC
  • DOE Quick Links
  • DOE Home
ASCRlogo ASCR Discovery home page 
  • Feature
    • Energy-materials research a test case for big-data flood
    • Archive
  • Kernels
    • Research out to optimize uncertain energy future
    • Archive
  • Big Iron
    • Petaflops performance scored running universe simulation
    • Archive
  • At the Universities
    • To rid water of salt, MIT group taps thin carbon and computing
    • Archive
  • Synchronized
    • Modernizing old codes for a new era of scientific computing
    • Archive
  • Genealogy
    • Mathematician makes most of abiding passion for optimization
    • Archive
  • New Faces
    • Of colorful candies and fluid dynamics
    • Archive
  • Exascale Science
    • As climate changes, so must the tools to model it
    • Archive


FastBit: Digging through
databases faster

Posted Dec 23, 2008

Imagine trying to find someone in a New York City telephone book if the names were listed randomly.

Scientific databases prove even more daunting, perhaps holding a quadrillion bytes of data – about 5,000 desktop hard drives – or more. As databases grow, quickly finding specific information would be unmanageable without some data organization that facilitates searching.

Around the turn of the millennium, a team of scientists at Lawrence Berkeley National Laboratory (LBNL) began work on this database-indexing problem with something called FastBit. Since then, the technology has sped large data searches across the spectrum, from sifting through Enron e-mails and hunting new-drug molecules to tracking burn episodes in combustion simulations. The work earned the Berkeley researchers a 2008 R&D 100 Award for technology advances.

Many database indexes use tree data structures, such as B-trees. These look like upside-down trees, branching out into ever-more categories as the index goes deeper into a database. A search goes down the tree to find a specific section, sections within a section and so on. Such indexes work well when data are frequently added or deleted.

“B-trees are designed to support changes made to the data,” says Arie Shoshani, senior staff scientist at LBNL’s High Performance Computing Research Department in the Computational Research Division. “They make it possible to go down the tree and find the page or section where a change is made, and you only touch that section.”

But he adds, “The cost is that more space is needed for accommodating data changes. For scientific data that do not change over time, the extra space is wasteful and slows down the search.”

Bitmaps provide an alternative indexing approach. “We have been working on the idea of using bitmaps for scientific applications,” says Kesheng “John” Wu, who works in the Scientific Data Management Research Group at LBNL. “This method is known to work well for low-cardinality data, like the gender of a customer: male, female or unknown.”

For the most part, though, bitmaps only work efficiently when the cardinality is below, say, 100 distinct values for an attribute. “We wanted a way to make bitmaps work when the cardinality is higher, such as with instruments recording temperature or pressure,” Wu says.

That desire spawned FastBit.

1   |   2   |   3   |   4   |   5   |   Print     Next »

Genealogy brings to life the evolution, present and future of scientific computing and simulation.


 

Contact

Kesheng "John" Wu
john.wu@acm.org

 

Related links

FastBit Home Page



COMPUTER SCIENCE

Home Contact Us Archive Subscribe ASCR Home About ASCR Press Center
  • SC Jobs
  • Contact SC
  • SC Web Policies
  • DOE Phone Book
  • DOE Employment
  • DOE FOIA
  • DOE Privacy Policy
  • DOE Web Policies
  • DOE No Fear Act
  • DOE Small Business
  • DOE Information Quality
  • E-Gov
  • The White House
  • USA Gov