Berkeley Lab researcher applies
graph theory to biofuels search
Posted September 4, 2013
Graph clustering, personalized. Click image to enlarge and for more information.
Energy scientists hope new plant varieties will yield large amounts of biomaterial, which new types of microbes will quickly and cheaply convert into clean-burning fuel. To achieve these goals, researchers will need novel algorithms that enable supercomputers to sift through the combined genomes of several hundred plants or microbes in search of useful genes.
Aydın Buluç a computational research scientist in Lawrence Berkeley National Laboratory’s Computational Research Division, will use support from a Department of Energy Early Career Award to develop faster and more energy-efficient data-mining algorithms than now available in the quest for new biofuels. (The dotless i in Buluç’s first name springs from the alphabet used in his native Turkey, as does the c-cedilla in his last name.)
The challenge and promise of new algorithms have the same root: graph theory. The theory’s tenets as an abstract way to think about information are tried, true and, in computational sciences, ubiquitous. Each data point is represented by a vertex, and each relationship, if any, between vertices is represented by a line, or edge. Assigning different weights to different edges to represent distances between markers lets a programmer create a network representing all markers and the distances between them. Then the analysis begins.
The World Wide Web is a familiar graph. Each page is a vertex and each link is an edge. Road maps also are graphs – ones analyzed many times a day. Each time someone asks Google, MapQuest, Yahoo! Maps or a similar website for driving directions, the server performs a point-to-point shortest-path query – a graph algorithm.
“Graphs are a great way to represent data,” Buluç says. “For example, in genetic mapping, you have lots of polymorphisms (genetic variations) within a species that act as genetic markers. You need to find the locations of these markers in the genome to build a scaffold that guides the whole-genome assembly. How do you order the markers? You use a graph algorithm.”
Graph theory lets computers do wonderful things, but graphs have disadvantages. An algorithm’s progress through a graph is like a treasure hunt. The algorithm starts at one vertex and traverses the edges from one vertex to the next. It never sees the whole graph and never sees more than one step ahead. Instead, it discovers each next destination only after it has arrived at the latest vertex and finished the task at hand. Then the algorithm proceeds to the next vertex, even though it may be – and often is – the most costly step to take.