Rebecca Garcia is Director of Sales, SAS Federal (Courtesy photo)
After multiple attacks on U.S. embassies in the 1990s, the FBI created an Investigative Services Division in 1999. Its focus was collating counterterrorism data.
Staffed with a team of analysts, the division sorted through file after file searching for clues that could predict the next terrorist attack. But the analysts could not review what they never saw: one critical file sat in a storage room on the first floor of FBI headquarters and was never considered. It contained a 1998 report by an FBI pilot who had suspicions about Middle Eastern flight students in Oklahoma City. The report never made it into the new division's predictive assessments.
How many more clues are hidden in filing cabinets, storage rooms or siloed systems waiting to be uncovered — or might be missed?
Since that time, most data has gone digital, but it has also multiplied: ninety percent of the world's data has been created in the last two years. With such an explosion of information, those who depend on it find it increasingly difficult to make sense of it. There is too much noise and not enough signal.
The problem impacts every enterprise but is particularly acute in the intelligence community, where information collection is outstripping analysis at an alarming rate.
Consider the amount of data now being produced by UAVs over Afghanistan. According to Department of Defense estimates, the more than 700 HD video cameras aboard those aircraft are collecting the equivalent of more than 700 years of video annually.
Data isn’t static, and yet the systems and processes used to analyze data assume that we’re operating in a static environment with significant time to perform analysis. But the clock is ticking; every minute that passes, a critical piece of intelligence may go unseen. Analysts need the ability to focus more time on uncovering threats and less time on finding clues in existing data.
Today, most data collection is a two-step process. The information is collected and then immediately stored. In this model, intelligence agencies become information hoarders, where every bit of information has equal importance once it is stored.
Making sense of the data, categorizing it and assessing its value is largely still a time-consuming manual process conducted by the analyst. With Big Data and intelligent automation, however, the tables are beginning to turn.
Using event stream processing or implementing a grid, agencies can turn traditional data collection on its head. The new techniques automatically review data before it is stored and score only the most relevant information for each user.
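A minimal sketch of that idea, scoring documents as they arrive and storing only those that clear a relevance bar. The topic keyword lists and the threshold value here are illustrative assumptions, not any agency's actual configuration or a specific product's API:

```python
# Illustrative sketch: score a document stream before storage.
# Topic keywords and the threshold are assumptions for demonstration.

TOPICS = {
    "flight_training": {"flight", "pilot", "aviation", "training"},
    "finance": {"transfer", "wire", "account", "funds"},
}

THRESHOLD = 0.2  # store a document only if some topic scores above this


def score(text, keywords):
    """Fraction of a document's words that match a topic's keywords."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in keywords) / len(words)


def process_stream(documents):
    """Score each incoming document; yield only the relevant ones."""
    for doc in documents:
        scores = {t: score(doc, kw) for t, kw in TOPICS.items()}
        best_topic, best = max(scores.items(), key=lambda kv: kv[1])
        if best >= THRESHOLD:
            yield doc, best_topic, round(best, 2)


stream = [
    "pilot enrolled in flight training near the airfield",
    "weekly cafeteria menu and parking notices",
]
for doc, topic, s in process_stream(stream):
    print(topic, s, "->", doc)
```

The second document never matches a topic of interest, so it is filtered out before an analyst ever sees it; a production system would use far richer scoring, but the inversion is the same: evaluate first, store second.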
Where before all information was stored and was therefore effectively equal in weight, applying scoring to the data assigns value based on mission relevance. Data scientists call this contextual scoring, and there are several distinct capabilities that it enables:
■ Data normalization: The information is organized to improve cohesion (the degree to which data belongs together) and eliminate redundancy.
■ Content categorization: Natural language processing identifies key topics and phrases in electronic text. Once common phrases, names, places, organizations, or topics are found, the content can be automatically categorized.
■ Data enrichment: Data is appended to include important meta-data, such as geographic, demographic and linguistic details.
■ Relevance scoring: A score is derived for each topic based on how relevant a document is to the information an analyst needs. If a document has little relevance to topic A, its score for topic A is low; if the same document is highly relevant to topic B, it receives a high score for topic B.
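The four steps above can be sketched as a small pipeline applied to one record. The field names, topic keywords, and gazetteer lookup are assumptions made for illustration, not a real system's schema:

```python
# Hedged sketch of normalization, categorization, enrichment, and
# scoring applied to a single record. All names are illustrative.
import re

GAZETTEER = {"oklahoma city": {"country": "US", "region": "Oklahoma"}}
TOPICS = {"aviation": {"flight", "pilot", "aircraft"}}


def normalize(record):
    """Data normalization: trim, lowercase, collapse whitespace."""
    text = re.sub(r"\s+", " ", record["text"].strip().lower())
    return {**record, "text": text}


def categorize(record):
    """Content categorization: tag topics whose keywords appear."""
    words = set(record["text"].split())
    record["topics"] = [t for t, kw in TOPICS.items() if words & kw]
    return record


def enrich(record):
    """Data enrichment: append geographic metadata for known places."""
    record["geo"] = [meta for place, meta in GAZETTEER.items()
                     if place in record["text"]]
    return record


def score(record):
    """Relevance scoring: per-topic keyword density."""
    words = record["text"].split()
    record["scores"] = {
        t: round(sum(w in TOPICS[t] for w in words) / len(words), 2)
        for t in record["topics"]}
    return record


record = {"text": "  Flight  student observed near Oklahoma City  "}
print(score(enrich(categorize(normalize(record)))))
```

Each stage adds structure the next can rely on, which is why the ordering matters: categorization and enrichment both assume normalized text.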
Rather than analysts searching for information, these capabilities mean the information searches for them. Just as popular retail websites suggest new products to shoppers based on their previous purchases, this system suggests new data to analysts based on the documents and emails they’ve been using. By bringing statistical and semantic analysis to the workflow, scoring the data through use of a grid or event stream processing ensures that analysts are provided relevant information first.
Moreover, as the system learns what is of interest to each person, the relevance becomes even more refined. An analyst could even request only documents on one or more topics that have relevance above a specific score, or documents that contain all topics of interest above a certain score. This would provide the analyst with only the information most relevant to the task at hand.
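Those two kinds of threshold queries can be sketched directly. The document names and per-topic scores below are placeholder values, not real data:

```python
# Sketch of threshold queries over pre-scored documents:
# documents with ANY topic of interest above a score, or ALL of them.
# Scores are illustrative placeholders.

docs = {
    "report_101": {"aviation": 0.82, "finance": 0.10},
    "report_102": {"aviation": 0.65, "finance": 0.71},
    "report_103": {"aviation": 0.05, "finance": 0.90},
}


def any_above(scores_by_doc, topics, threshold):
    """Documents where at least one topic of interest exceeds threshold."""
    return sorted(d for d, s in scores_by_doc.items()
                  if any(s.get(t, 0) > threshold for t in topics))


def all_above(scores_by_doc, topics, threshold):
    """Documents where every topic of interest exceeds threshold."""
    return sorted(d for d, s in scores_by_doc.items()
                  if all(s.get(t, 0) > threshold for t in topics))


print(any_above(docs, ["aviation"], 0.5))             # → ['report_101', 'report_102']
print(all_above(docs, ["aviation", "finance"], 0.5))  # → ['report_102']
```

The second query is the more selective one: only documents that score well on every topic of interest surface, which is how an analyst narrows a flood of records to a handful worth reading.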
Intelligence organizations can no longer afford to have their best people searching through hundreds or thousands of documents to find relevant information. The volume and velocity of data are simply outstripping the capacity of intelligence officers to make sense of it.
These new developments promise faster and more informed decision-making. More importantly, they suggest a day when important warnings — like the one filed by the FBI pilot — make their way to the top of the queue and provide decision-makers with the critical insight needed to thwart the next terrorist attack.