Tom Ryan wanted to build something that could identify criminal behavior inside massive mobile networks, stock trading services, ecommerce sites, and other online operations. So he turned to a pair of familiar names for help: Facebook and the NSA.
He didn't exactly knock on Facebook's front door, let alone the NSA's. But he did adopt a pair of sweeping software systems built by these giants of the online age, systems that help them juggle the massive amounts of digital information streaming into their computer data centers.
Ryan grabbed an NSA tool called Accumulo, which likely plays a key role in the agency's notoriously widespread efforts to monitor internet traffic in the name of national security, and he paired it with a Facebook tool called Presto, used to quickly analyze the way people, ads, and all sorts of other things behave on the world's largest social network. Both Facebook and the NSA, you see, have open sourced their software, meaning these tools are freely available to the world at large.
Ryan is the CEO of a small Silicon Valley startup called Argyle Data. Over the past sixteen months, he and his engineering team used Accumulo and Presto to fashion software that can root out fraud inside today's massive online operations, and they've already deployed it with at least a few companies, including Vodafone, the British telecommunications giant that runs mobile phone networks across Europe.
Argyle is a nicely rounded metaphor for the recent evolution of the data-juggling technologies that drive our modern businesses. Over the past several years, massive web companies such as Google and Facebook, as well as similarly ambitious operations like the NSA, have built a new breed of software that can store and analyze data across tens, hundreds, and even thousands of machines, and now these software tools are trickling down to the rest of the business world. "As a startup," Ryan says, "you want to build on what's new, not what's old."
The poster child for this movement is a software system called Hadoop, which was inspired by work originally done at Google. But Hadoop, at least as it was originally conceived, is now giving way to tools that operate at much faster speeds. Hadoop is a batch system, meaning you assign it a task and then wait a good while for the answer to come back. Newer systems are built to return answers far more quickly, in some cases in near real-time.
Argyle's software is a prime example. Using machine learning and what's called deep packet inspection, it analyzes the individual packets of data that stream across a network, and if a piece of data meets certain criteria (i.e., sets off certain flags), it gets shuttled into Accumulo, a massive database that can extend across myriad machines. "It helps us scan tens of millions to hundreds of millions of transactions a second," Ryan says. Companies can then use a version of Presto to further analyze this data, executing specific queries in near real-time.
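The flag-and-store step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Argyle's implementation: the `Packet` fields, the two flag rules, and the in-memory `FlaggedStore` (standing in for a distributed database such as Accumulo) are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    # Hypothetical, simplified view of an inspected packet.
    source: str
    destination: str
    payload_bytes: int


def flags_for(packet, blocked_destinations, max_payload_bytes):
    """Return the list of flags a packet sets off (empty if none)."""
    flags = []
    if packet.destination in blocked_destinations:
        flags.append("blocked_destination")
    if packet.payload_bytes > max_payload_bytes:
        flags.append("oversized_payload")
    return flags


@dataclass
class FlaggedStore:
    # In-memory stand-in for the database; a real deployment would
    # write to a distributed store instead of a Python list.
    rows: list = field(default_factory=list)

    def write(self, packet, flags):
        self.rows.append((packet, tuple(flags)))


def ingest(packets, store, blocked_destinations, max_payload_bytes):
    """Store only the packets that set off at least one flag."""
    for packet in packets:
        flags = flags_for(packet, blocked_destinations, max_payload_bytes)
        if flags:
            store.write(packet, flags)
    return store
```

In this sketch, only flagged packets ever reach the store, which keeps the stored data small relative to the traffic scanned; follow-up queries then run against the flagged rows rather than the full stream.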
Christopher Nguyen, the CEO of a data analysis startup called Adatao who once worked with similar big data software inside Google, says that Argyle's method isn't necessarily the best way to analyze such massive amounts of information at speed. But he agrees that this is part of a much larger movement towards real-time big data tools, tools that also include something called Spark, developed at the University of California at Berkeley, and various other software contraptions.
At the same time, Argyle's story underlines another aspect of this movement. At the NSA, you see, Accumulo is likely part of a surveillance effort that undermines our online privacy, and as tools like this make it easier to collect and analyze such enormous amounts of data, they may help push us towards a world where privacy is eroded even further. Vodafone, after all, is using Argyle's software to closely analyze data streaming across European wireless networks used by the general public.
According to Seth Schoen, a staff technologist with the Electronic Frontier Foundation, laws typically allow companies to use tools along the lines of Argyle's, including deep packet inspection, to do things like fight fraud. But in the end, their effect on privacy boils down to the policy of each individual company. The good news with Argyle, as Ryan points out, is that the NSA built Accumulo so that organizations can closely control who, within their operation, has access to each individual piece of data. "It's a trade-off," Ryan says. "Privacy is so important. But with more data-enrichment, you can improve the results of your analytics."
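The idea of controlling access to each individual piece of data can be illustrated with a small Python sketch. It is a deliberately simplified model, not Accumulo's API: the `Cell` class, the `required_labels` field, and the `visible_cells` function are hypothetical, and the sketch assumes a reader must hold every label attached to a cell, whereas a real system may allow richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    # One stored value plus the labels a reader must hold to see it.
    row: str
    column: str
    value: str
    required_labels: frozenset


def visible_cells(cells, authorizations):
    """Return only the cells whose required labels the reader holds."""
    auths = frozenset(authorizations)
    # A cell is visible when its labels are a subset of the reader's.
    return [cell for cell in cells if cell.required_labels <= auths]
```

Under this model, a fraud analyst holding only a `fraud_team` label could see a transaction amount but not a subscriber's identity if that cell also required a `pii` label.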