Computer Science
February 2016

Moving target

PNNL-led Analysis in Motion group comes up with algorithms that assist real-time interactions with data.

Analysis in Motion’s compressive analysis project supports rapid detection and categorization of events or data of interest, which speeds streaming-data processing or pre-processing of large volumes of static data. This image represents, in grayscale, 24 hours of electrical frequency data sampled at 2 megahertz. Each color overlay represents a single video compression metadata value generated by converting the electrical data into video and then compressing it. Image courtesy of Pacific Northwest National Laboratory.

When thinking about the Analysis in Motion (AIM) program at Pacific Northwest National Laboratory (PNNL), it helps to compare it to driving a car or flying a plane, says Mark Greaves, the initiative’s leader.

In both situations, “you make decisions on the fly given the stream of data which is coming at you,” Greaves explains in a video PNNL made about the program. “You don’t collect a lot of data and then figure out a week later what you should have done.”

The approximately 40 PNNL investigators and the faculty at five universities who make up AIM want scientists to think the same way about the overwhelming streams of information their experiments can produce. The team is devising new ways computer algorithms can help scientists interact with that data in real time, allowing them to change the experiment’s course if necessary.

By enhancing our ability to “explore data and make decisions on the fly during an experiment, AIM should allow us to accelerate discovery,” Greaves says.

Rob Jasper, AIM’s chief scientist, says the program targets cases “where the data is flowing and changing over time, and people have to interact with (it) as it happens,” such as controlling large computer or power networks or tracing output from complex physics experiments.

AIM’s goal is to develop new methods that let humans interact with incoming information and dynamically explore hypotheses as it flows by. AIM combines research on streaming analysis algorithms with advanced work on human-computer interaction in near-real-time environments. Researchers usually analyze information from experiments after it has been collected and structured in a database. After the data are collected, “the scientists spend a lot of time – often months – analyzing it,” Greaves says. “Because the data isn’t changing, there are no real-time constraints on analysis speed.”

In contrast, the planned five-year AIM initiative, which began in 2013, develops technologies to “explore phenomena in real time using the data as it is being collected, and dynamically steer an experiment based on advanced analytics,” Greaves explains. That means, Jasper says, the underlying algorithms must work on data streams at high speeds and quickly adapt to the scientist’s changing needs.

Despite the metaphor Greaves used in his video, the initiative isn’t necessarily interested in control problems for self-driving vehicles or the latest computerized airliners and fighter jets. “We understand a great deal about specific high-value situations, like those faced by drivers and pilots,” he says. “But, broadly speaking, our knowledge about the ways that experts can use analytics to interact with streaming phenomena is quite shallow. So AIM is trying to push the whole state of the art in interactive streaming analytics.”

The program focuses its research on three areas that involve humans interacting dynamically with streaming information. In each, people must use incoming data to make essential steering decisions.

First, AIM researchers are working with another PNNL team that’s building an electron microscope able “to take molecular movies and watch chemistry in action,” Greaves says. It “will create an enormous amount of very high-rate imagery data. How can we support a scientist interacting with that imagery stream so that she is quickly able to focus on specific areas she identifies?” By providing software to interpret that information and explore phenomena in real time, “the microscope will become far more interactive, and the science work will be accelerated.”

Second, AIM is supporting analysts who track illicit proliferation of nuclear materials. But rather than focus on sensing the presence of unaccounted-for material, the goal is tracing “all of the things that might happen in the proliferation network long before the nuclear materials ever get moved,” AIM founder Bill Pike says.

Following how illicit property is moved yields a great deal of streaming data, Greaves adds. “If you’re going to make prohibited nuclear materials, you’re going to need to acquire and transport the components and expertise to outfit a production facility.” The AIM team wants to help analysts find early signs of that behavior.

Third, AIM wants to find better ways to support the specialists who defend complex computer networks. Today’s automatic defenses weed out many common attacks, Greaves says, but “for sophisticated threats like those posed by insiders, cyber defenders need to work with the streaming data, probing to understand what the data is showing them, and using all their background knowledge and intuition to try and figure out what is happening.”

AIM researchers are targeting these areas with automated and semi-automated approaches to support dynamic interaction with mind-boggling volumes of streaming data. The mathematical methods analyze data at high speeds to construct hypotheses and continuously generate predictions that are more useful to humans than the raw information.
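
As a rough illustration of what analyzing a stream as it arrives can look like, the Python sketch below keeps a running mean and standard deviation and flags readings that stray far from the baseline seen so far. It is not AIM's method, which the article does not describe in detail. The names `RunningStats` and `flag_events`, the threshold of 3 standard deviations and the 10-sample warm-up are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and variance via Welford's online algorithm.

    Illustrative helper (hypothetical name): it stores three numbers,
    so the stream itself never has to be kept in memory.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until two values have arrived.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_events(stream, threshold=3.0, warmup=10):
    """Yield (index, value) for readings far from the running baseline.

    `threshold` (in standard deviations) and `warmup` (samples to see
    before flagging anything) are assumed values, not taken from AIM.
    """
    stats = RunningStats()
    for i, x in enumerate(stream):
        if stats.n >= warmup and stats.std > 0:
            if abs(x - stats.mean) / stats.std > threshold:
                yield i, x
        # Every reading, flagged or not, is folded into the baseline.
        stats.update(x)
```

Because `flag_events` is a generator, a caller can act on each flagged reading as soon as it appears rather than waiting for the stream to end. That is the on-the-fly decision-making the program is after, shown here on a far simpler problem than the ones AIM addresses.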

AIM researchers are also evaluating ways to visualize the data as they stream in, putting them into forms scientists can use more easily. “Essentially we want these visualizations to tell stories with data,” Pike says.

Finally, the initiative evaluates its systems in carefully structured experiments that compare human performance in streaming situations with and without AIM tools.

The net result, Greaves says, is that “AIM’s research should give scientists and others the ability to make sense of data at larger volumes and faster speeds.”