Big Data Under Attack
Once again, everybody is talking about China. On Feb. 19, Mandiant, an American security company, issued a startling report, the result of a six-year investigation, claiming that the United States is in a cyber war with a 12-story building in Shanghai. The private security firm concluded that the building is home to China’s stealth cyber war division, People’s Liberation Army Unit 61398.
If this sounds like the movie WarGames, make no mistake: this is real. According to Mandiant, over the last seven years Chinese hackers have stolen data from at least 141 companies across 20 major industries, including critical infrastructure sectors such as energy and telecommunications. At least 115 of the companies were in the United States. Sen. Dianne Feinstein (D-CA), chairwoman of the Senate Intelligence Committee, said classified intelligence documents support Mandiant’s claim.
Last year, we proclaimed this the Era of Big Data, and, in light of the dramatic events of the last few weeks, we thought it was an appropriate time to consider what’s happened since. In order to understand this from the inside, we invited a leading big data expert, Brian Greenberg, VP of Technology Operations at Total Attorneys and Founder of General System Dynamics, to help us parse fact from fiction or fantasy.
First off, it doesn’t take China to do this. Hacking isn’t only done by governments and individuals; there are zombie hacker armies at work everywhere (Brian spent part of last week fighting off Russian hackers). What’s more, hackers will sometimes automate their efforts by writing programs that will try a laundry list of known security assaults (or exploits) on random computers throughout the globe — thousands of exploits on millions of computers. Once they hack their way into a computer, they automatically plant a seed of code on that computer that will spread the hacking maneuvers to more computers. Soon, thousands of computers are hacked and hacking thousands more computers. Most of the time, the owner of the hacked computer doesn’t even know she’s been hacked and is now part of a large zombie hacker army.
The Data Drain
As we face threats and theft, consider this: less than one percent of the world’s data is analyzed and less than 20 percent is protected. Part of the problem is sheer volume. It’s estimated that the amount of global data will reach 40 zettabytes (ZB) by 2020 (that’s 40 billion terabytes), an amount that exceeds previous forecasts by 5 ZB. That represents 50-fold growth from the beginning of 2010, and even that number is probably a lowball estimate.
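To make those numbers concrete, here is a rough back-of-the-envelope check in Python. It assumes decimal (SI) units, where 1 TB is 10^12 bytes and 1 ZB is 10^21 bytes; the function names are ours, purely for illustration. The 2010 baseline and the earlier forecast are simply what the figures above imply, not independently sourced numbers.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# units such as TiB/ZiB would give slightly different results).
TB = 10**12
ZB = 10**21


def zb_to_tb(zettabytes: int) -> int:
    """Convert zettabytes to terabytes using decimal units."""
    return zettabytes * ZB // TB


def implied_baseline_zb(forecast_zb: float, growth_factor: float) -> float:
    """Starting volume implied by a forecast and a growth multiple."""
    return forecast_zb / growth_factor


forecast_zb = 40      # projected global data by 2020, from the text
growth_factor = 50    # "50-fold growth from the beginning of 2010"
forecast_bump_zb = 5  # amount by which the new forecast exceeds the old one

print(f"{forecast_zb} ZB = {zb_to_tb(forecast_zb):,} TB")
print(f"Implied 2010 baseline: {implied_baseline_zb(forecast_zb, growth_factor)} ZB")
print(f"Implied earlier forecast: {forecast_zb - forecast_bump_zb} ZB")
```

In other words, if the forecast holds, the world began 2010 with a bit less than one zettabyte of data.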
Data collection and acquisition are harder today because of the explosive growth of mobile devices and multimedia. For example, a 1GB memory card will hold about 715 pictures taken on a 4-megapixel (MP) camera. However, the same 1GB card will hold only about 130 pictures taken with a newer 22MP camera. Consider how many pictures you take a month with your current camera or camera phone. Now think about how often you get a new camera or camera phone; for most people it’s every two years, as cheaper ones come to market. Each upgrade means you’ll need that much more storage on your home computer to store and back things up, not to mention in the cloud, where things are expected to reside in perpetuity (consider that over 250M pictures are uploaded to Facebook every day, at ever-higher resolution).
Now extend that to video cameras. The city of London alone has over two million CCTV cameras. These are mostly security cameras installed at stores, office complexes, and ATMs, in subways, and at traffic lights. As newer, cheaper technology becomes available, these cameras are replaced with higher-resolution models that produce better-quality pictures. Data storage requirements are already exploding into the petabyte (PB) range (a petabyte is a thousand terabytes) and increasing every year.
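The memory-card arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes a decimal gigabyte (1 GB = 1,000 MB), the helper names are made up for this example, and the figure of 200 photos a month is a hypothetical habit, not a number from any study.

```python
GB_IN_MB = 1000  # assumption: decimal gigabyte


def avg_photo_mb(photos_per_gb: float) -> float:
    """Average file size implied by how many photos fit on a 1GB card."""
    return GB_IN_MB / photos_per_gb


def monthly_storage_gb(photos_per_month: float, photos_per_gb: float) -> float:
    """Storage consumed each month at a given shooting rate."""
    return photos_per_month / photos_per_gb


OLD_CAMERA_PHOTOS_PER_GB = 715  # 4MP camera, from the text
NEW_CAMERA_PHOTOS_PER_GB = 130  # 22MP camera, from the text
PHOTOS_PER_MONTH = 200          # hypothetical shooting habit

# How many times more storage the newer camera needs per picture.
storage_ratio = OLD_CAMERA_PHOTOS_PER_GB / NEW_CAMERA_PHOTOS_PER_GB

print(f"4MP photo:  ~{avg_photo_mb(OLD_CAMERA_PHOTOS_PER_GB):.2f} MB")
print(f"22MP photo: ~{avg_photo_mb(NEW_CAMERA_PHOTOS_PER_GB):.2f} MB")
print(f"Storage per photo grows {storage_ratio:.1f}x")
print(f"Monthly storage, old camera: {monthly_storage_gb(PHOTOS_PER_MONTH, OLD_CAMERA_PHOTOS_PER_GB):.2f} GB")
print(f"Monthly storage, new camera: {monthly_storage_gb(PHOTOS_PER_MONTH, NEW_CAMERA_PHOTOS_PER_GB):.2f} GB")
```

Note that the storage needed per photo grows by the same factor as the megapixel count (22 / 4 = 5.5), so every camera upgrade multiplies your storage bill along with your resolution.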
So, why are corporations so vulnerable to hackers? Because they have more data than ever. Like people who hang on to clothes that don’t fit, most of America’s premier companies are starting to hoard data. The reason? They recognize the opportunity cost of not collecting it. Often, the chance to capture data happens only once: when the event occurs. This is true whether the data comes from video surveillance, pictures from a cell phone, people browsing an online store, GPS tracking information coming from a car, recorded tweets, or scientific research.
But what if you don’t have the algorithms and the requisite supernerds to analyze that data? This is a real problem. The field of data science, which explores how to mine data for patterns, discover emergent properties, and generate useful information, is still quite young and consists mostly of what the industry calls quants: quantitative analysts, usually with a Ph.D. in math or physics and extensive skills in computer programming. But data mining grows more sophisticated every day, and the quants are forced to keep shifting the goal posts.
So, if you don’t yet know what to do with all that data, put it in storage. Archive it with some metadata (data about the data collected), and then use it sometime in the future. This is where the sci-fi and cyberpunk elements enter the picture: projecting ahead to a time when a team of data scientists, a quantum computer, an exabyte of collected data, and the right algorithms will discover new business strategies, reveal hidden genetic clues that vanquish devastating diseases, or combat terrorism by analyzing patterns in global events before terrorists have a chance to strike. It’s all possible.
Lastly, as the China controversy has shown us, privacy remains a huge concern. Even if the data collected is ‘Greeked’ or ‘anonymized,’ people’s identities can often be inferred by correlating the large volumes of data collected with publicly available data from social media and other sources. Companies don’t always effectively protect and secure the private information they collect, making them prime targets for hackers, who sell it or freely publish it on websites. Considering all of the advances in the way companies can collect and sell your data, we need more advocacy groups and organizations such as the Electronic Frontier Foundation (EFF) to do the critical work of defending free speech, privacy, and consumer rights.
This article was co-written by:
- Brian Greenberg, Vice President of Technology Operations at Total Attorneys and Founder & CEO of General System Dynamics. A thought leader in data storage, litigation readiness, IT Strategy, data protection, and disaster recovery, he has worked with major corporations, including Motorola, Microsoft, Dell, Fujitsu, and Bloomberg. Follow Brian at @bjgreenberg and on his blog at http://briangreenberg.net
- Tom Silva, Senior Vice President of Marketing and Strategy for The Alter Group. He is a graduate of Long Island University and is currently part of the Master of Arts program in the Humanities at the University of Chicago. Tom was the recipient of the CoreNet Global Chicago Chapter’s inaugural Michael Kaczmareck Leadership Award. His work has been recognized by the Publicity Club of Chicago, the International Academy of the Visual Arts and the Web Marketing Association. For more information about Tom, please check out silvabrand.com and follow him at @silvabrnd.
Thanks for reading!
— Brian & Tom