FastBit: Digging through
databases faster
(page 2 of 5)
Bitmap basics
A bitmap consists of a sequence of bits: 0s and 1s. Then, a set of bitmaps makes up a bitmap index. Typically, a bitmap index includes one bitmap for every value of an attribute. If you have a gender database of subjects, for example, it needs two bitmaps: one for female and one for male. It might look like this:
| Subject Gender | Female Bitmap | Male Bitmap |
|---|---|---|
In combination, the female bitmap (10001) and the male bitmap (01100) make up the bitmap index that represents the data for these five subjects.
In scientific databases, an attribute – say, temperature – could easily require hundreds or thousands of distinct values. To encode temperatures from 0 to 100 as integers, for instance, requires 101 bitmaps. Many more values might be used if temperatures are computed as floating-point values in a simulation, and the range of measurements can be far greater than 0 to 100. These databases, therefore, could have bitmaps containing millions or billions of 0s and 1s.
Consequently, some approaches put the raw data in bins. In the temperature example, the bins might be for temperatures of 0-5 degrees, 5-10, 10-15 and so on, which would reduce the number of bitmaps to 20. Such binning, though, reduces the accuracy of later analysis if only the index is used.
Unlike tree structures, however, bitmap indexes do not work well with databases that change frequently. A modification to a record might lead to changes in a number of different bitmaps. When the bitmaps are compressed, as they usually are, these changes could require a significant amount of work. On the other hand, updating the bitmap indexes after appending a larger number of records is typically more efficient than updating tree-based indexes.
The example above also shows that bitmap indexes often contain many 0s, providing an avenue to significantly compress them. FastBit researchers found a way to specially compress the bitmaps, thus permitting more refined bins.
Building the index
A team of scientists – Wu, Shoshani and two LBNL scientists, Ekow Otoo and Kurt Stockinger – developed FastBit. To keep the size of FastBit manageable, it needed compressing. Wu and his colleagues tried a few approaches, including the byte-aligned bitmap code (BBC), which is used in Oracle.
But Wu and his colleagues saw a problem with using bytes. “Inside a computer, if you tell it to work with a byte, like with BBC, it pulls in an entire word and then extracts the byte. While byte alignment can be more compact, we thought we should use the word as the basic unit, and ensure that we do not break the word. That is the basic idea of word-aligned hybrid code, or WAH.”
It turns out that the feature of compressing the bitmaps without breaking the word boundaries greatly reduces the time needed to perform operations on the bitmaps, at a small cost of increasing the index size. That alone, however, wasn’t enough to achieve high search efficiency. The compression into words required a special selection of a word length.

