For decades, computers were protected by anti-virus software loaded with the signatures of previously identified malicious code. It’s an effective technique as long as most attacks have already been seen on other systems.
But attackers have gotten smarter, regularly adapting and morphing their attacks. And while human beings readily grasp near-equality when comparing two items, computers have a much harder time of it: even minor changes to the code mean an attack no longer matches a prior signature.
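To illustrate why exact signatures are brittle, here is a minimal sketch (the payload strings are invented for illustration): a classic signature is just a hash of a sample’s bytes, so flipping a single byte produces an entirely different signature.

```python
import hashlib

def signature(payload: bytes) -> str:
    """Classic signature: a cryptographic hash of the sample's bytes."""
    return hashlib.sha256(payload).hexdigest()

# Hypothetical malicious payload and a trivially morphed variant.
original = b"connect('evil.example', 443); exfiltrate('/etc/passwd')"
variant  = b"connect('evil.example', 444); exfiltrate('/etc/passwd')"  # one byte changed

# The hashes no longer match, so the variant slips past a signature database.
print(signature(original) == signature(variant))  # False
```

A one-character tweak that leaves the attack’s behavior intact is enough to defeat the lookup, which is what pushes defenders toward similarity-based approaches.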
So researchers are working on a more granular approach, digging into components of attacks to find similarities, exploiting the fact that attackers rarely start completely from scratch — it’s expensive to create fresh code every time. But teaching computers to find patterns in parts of an attack isn’t easy, either.
“It’s terribly hard, because programming is a personal kind of thing,” said Alan Paller, director of research at the SANS Institute, Bethesda, Md. “You may think you’re looking at the same thing, but if it’s from two different people, it will be very different.”
The critical part of the process is figuring out how to cluster things effectively to ensure that useful connections are drawn.
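One common way to compare parts of samples rather than whole files, sketched here with invented payloads, is to break each sample into overlapping byte n-grams and measure how many “components” two samples share (Jaccard similarity). This is a generic technique, not a description of GTRI’s actual method.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Sliding byte n-grams: a crude stand-in for 'components' of a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard similarity over n-gram sets: shared parts / all parts."""
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

base    = b"connect('evil.example', 443); exfiltrate('/etc/passwd')"
tweaked = b"connect('evil.example', 444); exfiltrate('/etc/passwd')"
fresh   = b"completely unrelated benign configuration file contents"

print(jaccard(base, tweaked))  # high: most components are reused
print(jaccard(base, fresh))    # low: almost no shared code
```

Because attackers reuse most of their code, a morphed variant still shares the bulk of its n-grams with the original, while unrelated code shares almost none.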
Georgia Tech Research Institute (GTRI) has developed a technique to do just that, implementing clustering into its Apiary system.
“We’re trying to derive relationships,” said Christopher Smoak, one of the lead developers of Apiary at GTRI.
It’s part of the group’s effort to ratchet up defense capabilities in the face of the growing onslaught of attacks.
“It’s a sad state of affairs; for $25 to $50 you can attack anything that you want,” Smoak said.
The system GTRI is developing can be calibrated to a desired level of similarity between attack samples. Finding that threshold — strict enough to be meaningful, loose enough to catch variants — is the Goldilocks problem researchers must solve to make the intelligence they generate useful.
GTRI isn’t alone in trying to take advantage of computational power to find connections. Researchers at Mandiant’s labs started working on the issue to solve the problem of having too many malware samples to analyze.
A client arrived with a “bucket” of more than 500 samples they wanted deconstructed and analyzed, said Michael Sikorski, a technical director with the firm. The researchers in the lab didn’t think that they could analyze that many samples independently, so they started to create a machine learning system to find patterns in the samples.
The system they built revealed that the bucket contained only about 40 meaningfully different strains, allowing the researchers to focus their efforts on those.
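The collapse from hundreds of samples to a handful of strains can be sketched with a simple greedy clustering pass over n-gram similarity. The samples and the 0.5 threshold here are invented for illustration; Mandiant’s actual system is not described in this detail.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Sliding byte n-grams used as the sample's 'components'."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity over n-gram sets."""
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def cluster(samples: list, threshold: float = 0.5) -> list:
    """Greedy single-pass clustering: each sample joins the first cluster
    whose representative is similar enough, otherwise it founds a new
    cluster (a new 'strain')."""
    clusters = []  # list of (representative, members) pairs
    for s in samples:
        for rep, members in clusters:
            if similarity(s, rep) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters

# Hypothetical bucket: ten samples, only two underlying strains.
strain_a = [b"downloader v%d fetching http://evil.example/a" % i for i in range(5)]
strain_b = [b"keylogger build %d posting to 10.0.0.%d" % (i, i) for i in range(5)]
groups = cluster(strain_a + strain_b)
print(len(groups))  # far fewer clusters than samples
```

Raising the threshold splits clusters apart; lowering it merges them — which is exactly the calibration knob the researchers describe tinkering with.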
“It is very difficult, and for the classifying, you need to have very good data,” said Sikorski, author of the book “Practical Malware Analysis.”
“For the clustering, it takes a lot of tinkering to make sure that the algorithms are good,” he said.
For Mandiant, which spends most of its time helping clients who have been attacked, using tools to find similar attacks helps experts determine who is attacking and how they should respond.
“We have a lot of knowledge of how these different groups operate; the quicker we can realize who we’re dealing with, the quicker we can solve a client’s problems,” Sikorski said.
Still, researchers said using techniques that automatically block similar threats is problematic, as human judgment is still an important calibration tool.
And even if a system could become intelligent enough to respond on the fly, a more granular anti-virus model may be of little utility, Paller said.
“It might quadruple the value, but it still doesn’t get up to the behavior test or anywhere near the white listing,” he said. “It will make a weak system stronger. It’s still a weak system.”
Instead, Paller said, the real improvements will come from whitelisting, a technique in which only programs that have been certified in a directory are allowed to run on a computer. Attacks can still get through, but it’s the best technique that’s been developed thus far, he said.
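At its core, whitelisting inverts the signature model: instead of blocking known-bad code, the system refuses to run anything not explicitly certified. A minimal sketch, with made-up program contents standing in for real binaries:

```python
import hashlib

# Hypothetical allowlist: SHA-256 digests of certified programs.
CERTIFIED = {
    hashlib.sha256(b"trusted-editor-binary").hexdigest(),
    hashlib.sha256(b"trusted-browser-binary").hexdigest(),
}

def may_run(program_bytes: bytes) -> bool:
    """Allow execution only if the program's digest is on the certified list."""
    return hashlib.sha256(program_bytes).hexdigest() in CERTIFIED

print(may_run(b"trusted-editor-binary"))      # True: certified
print(may_run(b"social-engineered-dropper"))  # False: not on the list
```

A socially engineered dropper fails the check by default, no matter how novel its code is — though, as Paller notes, attacks can still get through, for instance by compromising a program that is already certified.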
“Well over 90 percent of the attacks that matter are started by a social engineering of one kind or another that caused somebody to put a program on the system that went ahead and did bad things,” Paller said. “The nature of that attack is that it uses an attack that cannot be stopped, as long as a victim is alive and on the Internet.” ■