FastBit: Digging through databases faster
The upshot: FastBit outruns other search indexes by a factor of 10 to 100 and needs little more storage than the original data. With one million events from STAR, for example, Wu and his colleagues took the dozen most commonly queried attributes (from more than 500), built a bitmap for each and then stored the bit sequences in several ways, including BBC and WAH. On average, searches over the WAH-compressed bitmaps ran 12 times faster than searches over the BBC-compressed ones.
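To make the idea of storing a bit sequence in compressed form concrete, here is a minimal Python sketch of a word-aligned scheme in the spirit of WAH. It is an illustration under stated assumptions, not FastBit's actual code: the function names are hypothetical, and the layout (32-bit words, 31 payload bits per literal word, fill words that count repeated all-zero or all-one groups) follows the general WAH scheme in simplified form.

```python
WORD_BITS = 32
GROUP_BITS = WORD_BITS - 1           # 31 payload bits per literal word
FILL_FLAG = 1 << 31                  # top bit set marks a fill word
FILL_VALUE = 1 << 30                 # second bit holds the repeated bit value
MAX_RUN = (1 << 30) - 1              # largest group count one fill word can hold
ALL_ONES = (1 << GROUP_BITS) - 1


def build_bitmap(values, target):
    """One bit per row: 1 where the row's attribute equals target."""
    return [1 if v == target else 0 for v in values]


def wah_encode(bits):
    """Compress a list of 0/1 bits into 32-bit words (simplified WAH)."""
    words = []
    for start in range(0, len(bits), GROUP_BITS):
        group = bits[start:start + GROUP_BITS]
        group = group + [0] * (GROUP_BITS - len(group))  # pad the last group
        literal = 0
        for b in group:
            literal = (literal << 1) | b
        if literal in (0, ALL_ONES):
            # A uniform group: extend the previous fill word if it matches.
            fill_bit = FILL_VALUE if literal else 0
            prev = words[-1] if words else 0
            same_fill = (prev & FILL_FLAG) and (prev & FILL_VALUE) == fill_bit
            if same_fill and (prev & MAX_RUN) < MAX_RUN:
                words[-1] = prev + 1
            else:
                words.append(FILL_FLAG | fill_bit | 1)
        else:
            # A mixed group is stored verbatim as a literal word.
            words.append(literal)
    return words


def wah_decode(words, n_bits):
    """Expand the words back into the original list of n_bits bits."""
    bits = []
    for w in words:
        if w & FILL_FLAG:
            bit = 1 if w & FILL_VALUE else 0
            bits.extend([bit] * (GROUP_BITS * (w & MAX_RUN)))
        else:
            bits.extend((w >> i) & 1 for i in range(GROUP_BITS - 1, -1, -1))
    return bits[:n_bits]
```

Because every word is either a whole literal group or a whole run of groups, a search can process the compressed bitmap one machine word at a time without unpacking individual bits. Sparse bitmaps, which are mostly long runs of zeros, shrink to a handful of words.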
The scientists received a patent for the WAH concept on Dec. 14, 2004.
Road-testing FastBit
FastBit already speeds up a wide range of searches. For example, the LBNL group tested FastBit on a portion of an Enron employee e-mail database made public during 2003 legal proceedings against the company. In January 2006, Wu, Shoshani and colleagues reported on a search-speed test on 250,000 Enron e-mails indexed with FastBit versus the open-source database management system MySQL. Depending on the specific parameters of a search, FastBit outran MySQL by 10 to 1,000 times.
“Most of the queries people use so far are on single sets of data,” Wu says. “The idea of exploring the Enron data was to work with more than one set, more than one table of data.” So to mine the entire collection of Enron data, tables had to be combined.
“Doing a join in databases is always a challenging problem,” Wu says. In addition, the bulk of Enron data is text instead of numbers. “A typical search on text data is the keyword search, and we wanted to explore that.”
FastBit also makes it easier to search simulation data, such as the output of models that track the chemical and physical features of combustion. “Here you are simulating what happened in space and time,” Wu says. For example, scientists might look for where the burning goes on and where it doesn’t and how that affects the byproducts generated in the process.
“You get lots of numbers, like temperature, but no one number that says, ‘I’m burning,’” Wu explains. “Even the highest temperature is not an indication of burning. The definition of burning involves a bunch of variables, and it’s complicated.” FastBit helps combustion scientists search through databases for the collection of parameters that do indicate burning and the quality of the burn.
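A condition that spans several variables can be answered by building one bitmap per variable and combining them with a bitwise AND. The Python sketch below illustrates that idea only. The variable names, thresholds and cell values are made up for the example and are not a real definition of burning, and a production index such as FastBit precomputes and compresses its bitmaps rather than scanning the columns at query time as this sketch does.

```python
def range_bitmap(column, low, high):
    """Bitmap stored as an int: bit i is set when low <= column[i] < high."""
    bitmap = 0
    for i, value in enumerate(column):
        if low <= value < high:
            bitmap |= 1 << i
    return bitmap


def matching_rows(bitmap):
    """Row numbers of the set bits, in ascending order."""
    rows = []
    i = 0
    while bitmap:
        if bitmap & 1:
            rows.append(i)
        bitmap >>= 1
        i += 1
    return rows


# Hypothetical simulation cells; real runs track many more variables.
temperature = [300.0, 1500.0, 1800.0, 2100.0, 900.0, 1700.0]
fuel_fraction = [0.20, 0.02, 0.08, 0.00, 0.15, 0.06]

hot = range_bitmap(temperature, 1400.0, float("inf"))
has_fuel = range_bitmap(fuel_fraction, 0.05, 1.0)

# A cell counts as "burning" here only if both conditions hold.
burning = hot & has_fuel
print(matching_rows(burning))  # prints [2, 5]
```

Note that the hottest cell (row 3) is not selected, because it has no fuel left. That mirrors Wu's point that a single variable, even the highest temperature, does not identify burning on its own.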



