Airmen of the 497th Intelligence, Surveillance and Reconnaissance Group work at the unit's complex at Langley Air Force Base, Va. More than 80 miles of fiber-optic cable transmit intelligence data there.
With an almost infinite number of data sources at its disposal, the intelligence community (IC) has no trouble aggregating massive amounts of information. But putting all of that Big Data to productive use is another story.
Over the past few years, Big Data tools have become essential for sorting through and making sense of a growing mountain of intelligence data. Yet to keep pace with skyrocketing volumes of raw information, the IC must continually update and augment its analytical capabilities.
“The IC will always be pushing the envelope on data needs, which means the solutions they leverage will be those that can really scale,” said Bob Gourley, former chief technology officer of the Defense Intelligence Agency, now CTO at Crucial Point, a technology research, consulting and services firm.
“The community also has big needs for speed and agility in solutions, so in this Big Data world, open source software is a key need.”
“The IC sectors that stand to benefit most from Big Data are organizations with detailed analytics at the core of their mission — signals intelligence, electronic intelligence, even human intelligence — as well as those sectors where massive amounts of data underpin their products,” such as the National Geospatial-Intelligence Agency, said Greg Gardner, chief architect for defense solutions at NetApp, which provides analytics for extremely large datasets.
Most IC Big Data analytics tools are built on Apache Hadoop, an open source software framework that allows users to store, process and gain insight from Big Data on a huge scale. Key Hadoop tools include the Hadoop Distributed File System, YARN (a job scheduler and resource manager), MapReduce (for parallel processing), HBase (a distributed, nonrelational database), Hive (for data warehousing), Mahout (for machine learning), Pig (for data flow) and ZooKeeper (for high-performance coordination).
“All these tools come tested and bundled together in a Hadoop distribution like CDH,” Gourley said, referring to Cloudera's distribution of Hadoop.
The IC is also turning to commercial suppliers to help it handle Big Data performance and scalability challenges.
“We are seeing the emergence of an Open Systems Interconnection-like stack for applications development,” Gardner said. “Platform as a Service, a category of cloud computing services that provides a computing platform and a solution stack as a service, bundles a wide range of data services from a variety of vendors.”
Data experts needed
To keep on top of Big Data trends, and to identify and acquire powerful new analytical tools, IC members are steadily adding data scientists to their teams. A data scientist, generally someone with advanced skills in statistics, data mining and machine learning, should be able to derive unique value from data or propose changes in the processes that the data supports.
“Data scientists are crucial, as they serve as the glue that holds together various technological innovations from both government labs and industry and tunes them to the particular needs of the organizations they support,” Gardner said. “There are no simple, easy, cookie-cutter approaches at this level of complexity and abstraction, and machine speed and processing power, while important, are no substitute for human creativity and intuition.”
“The IC has long leveraged this unique blend of savvy leader, and I’m sure always will,” Gourley noted. “But it is also important to empower the individual analyst, the person who might know a bit about computers and statistics, but is really paid to research, think and produce knowledge.”
“They need an ability to do that fast and in their Web browser via capabilities like Platfora,” Gourley said.
In the years ahead, the most useful Big Data technologies will be systems focused on performance and scale, such as infrastructures based on software-defined storage, Gardner said.
“That’s a streamlined storage consumption, monitoring, management, metering and protection framework for highly scalable storage services,” he said. “This storage operating environment supports not only Big Data, but High Performance Computing and more mundane, day-to-day functionality, like email, ERP applications and file and print as well.”