Big Data and Your Privacy: How Concerned Should You Really Be?

Rini Dantes Apr 23 - 7 min read

Audio : Listen to This Blog.

Today, every IT-related service online or offline is driven by data. In the last few years alone, explosion of social media has given rise to a humongous amount of data, which is sort of impossible to manipulate without specific high-end computing systems. In general, normal people like us are familiar with kilobytes, megabytes, and gigabytes of data, some even terabytes of data. But when it comes to the Internet, data is measured in entirely different scales. There are petabytes, exabytes, zettabytes, and yottabytes. A petabyte is a million gigabyte, an exabyte is a billion gigabyte, and so on.

A Few Interesting Statistics

Let me pique your interest with a few statistics here from various sources:

    • 90 percent of data in existence in the world was created in the last two years alone.
    • 90 percent of data in existence in the world was created in the last two years alone.
    • The reason why Amazon sells five times Wal-Mart, Target, and combined is because the company steadily grew to be of 74 billion dollar revenue from a miniature bookseller by incorporating all the statistical customer data it gathered since 1994. In a week, Amazon targets close to 130 million customers—imagine the enormous amount of big data it can gather from them.

Google’s former CEO and current executive chairman, Eric Schmidt, once said: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” The significance of this statement is evident when you realize the magnitude of data that the search giant crunches every second. In its expansive index, Google has stored anywhere between 15 to 20 billion web pages, as in this statistic.

On a daily basis, Google processes five billion queries. Beyond these, through numerous Google apps that you continuously use, such as Gmail, Maps, Android, Google+, Places, Blogger, News, YouTube, Play, Drive, Calendar, etc., Google is collecting data about you on a huge scale.

All of this data is known in the industry circles as “big data.” Processing such huge chunks of data is not really possible with your existing hardware and software. That’s the reason why there are industry-standard algorithms for the purpose. Apache Hadoop, which Google also uses, is one such system. Various components of Hadoop–HDFS, MapReduce, YARN, etc.–are capable of intense data manipulation and processing capabilities. Similar to Hadoop, Apache Storm is a big data processing technology used by Twitter, Groupon, and Alibaba (the largest online retailer in the world).

The effects and business benefits of big data can be quite significant. Imagine the growth of Amazon in the last few years. In that ginormous article, George Packer gives a “brief” account of Amazon’s growth in the past few years: from “the largest” bookseller to the multi-product online-retail behemoth it is today. What made that happen? In essence, the question is what makes the internet giants they are today? Companies such as Facebook, Google, Amazon, Microsoft, Apple, Twitter, etc., have reached the position they are today by systematically processing the big data generated by their users–including you.

In essence, data processing is an essential tool for success in today’s Internet. How is the processing of your data affecting your privacy? Some of these internet giants gather and process more data than all governments combined. There really is a concern for your privacy, isn’t there?

Look at National Security Agency of the US. It’s estimated that NSA has a tap on every smartphone communication that happens across the world, through any company that has been established in the United States. NSA is the new CIA, at least in the world of technology. Remember about PRISM program that the NSA contractor Edward Snowden blew the whistle on. For six years, PRISM remained under cover; now we know the extent of data collected by this program is several times in magnitude in comparison to the data collected by any technology company. Not only that, NSA, as reported by the Washington Post, has a surveillance system that can record hundred percent of telephone calls from any country, not only the United States. Also, NSA allegedly has the capability to remotely install a spy app (known as Dropoutjeep) in all iPhones. The spy app can then activate iPhone’s camera and microphone to gather real-time intelligence about the owner’s conversations. An independent security analyst and hacker Jacob Appelbaum reported this capability of the NSA.

NSA gets a recording of every activity you do online: telephone and VoIP conversations, browsing history, messages, email, online purchases, etc. In essence, this big data collection is the biggest breach of personal privacy in human history. While the government assures that the entire process is for national security, there are definitely concerns from the general public.

Privacy Concerns

While on one side companies are using your data to grow their profit, governments are using this big data to further surveillance. In a nutshell, this could all mean one thing: no privacy for the average individual. As far back as 2001, industry analyst Doug Laney signified big data with three v’s: volume, velocity, and variety. Volume for the vastness of the data that comes from the peoples of the world (which we saw earlier); velocity to mean the breathtaking speeds it takes for the data to arrive; and variety to mean the sizeable metadata used to categorize the raw data.
What real danger is there in sharing your data with the world? For one thing, if you are strongly concerned about your own privacy, you shouldn’t be doing anything online or over your phone. While sharing your data can help companies like Google, Facebook, and Microsoft show you relevant ads (while increasing their advertising revenues), there virtually is no downside for you. The sizeable data generated by your activities goes into a processing phase wherein it is amalgamated to the big data generated by other users like you. It’s hence in many ways similar to disappearing in a crowd, something people like us do in the real world on a daily basis.

However, online, there is always a trace that goes back to you, through your country’s internet gateway, your specific ISP, and your computer’s specific IP address (attached to a timestamp if you have dynamic IP). So, it’s entirely possible to create a log of all activities you do online. Facebook and Google already have a log, a thing you call your “timeline.” Now, the timeline is a simple representation of your activities online, attached to a social media profile, but with a trace on your computer’s web access, the data generated is pretty much your life’s log. Then it becomes sort of scary.

You are under trace not only while you are in front of your computer but also when you move around with your smartphone. The phone can virtually be tapped to get every bit of your conversations, and its hardware components–camera, GPS, and microphone–can be used to trace your every movement.

When it comes to online security, the choice is between your privacy and better services. If you divulge your information, companies will be able to provide you with some useful ads of the products that you may really like (and act God on your life!). On the other hand, there is always an inner fear that you are being watched–your every movement. To avoid it, you may have to do things you want to keep secret offline, not nearby any connected digital device–in essence, any device that has a power source attached.

In an article that I happened to read some time back, it was mentioned that the only way to bypass security surveillance is removing a battery from your smartphone.

The question remains, how you can trust any technology. I mean, there are a huge number of surveillance technologies and projects that people don’t know about even now. With PRISM, we came to know about NSA’s tactics, although most of them are an open secret. Which other countries engage in such tactics is still unknown.

Leave a Reply

MSys developed Big Data ETL Workflow design for one of the world’s largest software companies. This group of experts was focused on building new products for their Hadoop Cloud offering. To know more about this customer story download our case study on Big Data ETL Workflow Designer