Big Data - What Is It and How Does It Affect Me?
What Is “Big Data”?
“Big Data” is a computer term that refers to very large data sets. These sets are so large that the typical database management tools aren’t able to process them in a reasonable amount of time. The complexity of the data is also a defining characteristic of Big Data.
“Big” means different things to different organizations. For small companies used to dealing with data in spreadsheets, a relational database in the dozens of gigabytes may be considered “big” since they don’t have the tools or experience to efficiently deal with it. For larger organizations like Google or Facebook, dealing with many hundreds of terabytes is routine.
Who Uses Big Data?
The amount of data being captured continues to increase steadily. So too do computing power and processing techniques. This has increased the number of groups that can take advantage of what can be found in Big Data.
These groups include:
- Governments
- Scientists
- Business
- Military
- Industry
Data Size Terminology
Size | Term |
---|---|
1,000 bytes | 1 Kilobyte (KB) |
1,000 KB | 1 Megabyte (MB) |
1,000 MB | 1 Gigabyte (GB) |
1,000 GB | 1 Terabyte (TB) |
1,000 TB | 1 Petabyte (PB) |
1,000 PB | 1 Exabyte (EB) |
1 EB | 1,000,000,000,000,000,000 bytes |
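To make the table concrete, here is a small Python sketch that turns a raw byte count into the largest fitting unit. It assumes the decimal steps of 1,000 used in the table, and the function name `describe_size` is made up for this illustration.

```python
# Decimal units from the table: each step up is a factor of 1,000.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def describe_size(num_bytes):
    """Return the size in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    unit_index = 0
    # Stop at EB because it is the largest unit in the table.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:g} {UNITS[unit_index]}"


print(describe_size(1_000))                  # 1 KB
print(describe_size(2_500_000_000_000_000))  # 2.5 PB
print(describe_size(10**18))                 # 1 EB
```

Some software counts in steps of 1,024 instead of 1,000, so the sizes your computer reports may differ slightly from these.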
What Does Big Data Do?
By itself, Big Data doesn’t do anything. There may be lots of useful information hidden away in the data, but it just sits there until someone uses tools to discover it. These analytical tools can find trends and correlations that may not be noticeable with smaller sets of data.
One such tool is Apache Hadoop.
Hadoop is an open source framework for processing large data sets using clusters of commonly available hardware. It uses HDFS (Hadoop Distributed File System) to spread the data across multiple computers and replicates the data in multiple places to protect against loss in case of hardware failure. Hadoop also spreads the processing of the data over multiple computers - possibly thousands - to crunch the data.
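The idea of splitting the work up and then combining the results can be sketched in a few lines of Python. This is only an illustration of the map-and-reduce style of processing that Hadoop is known for, not Hadoop's own API: everything runs in a single process, the "workers" are simulated, and the function names and sample search terms are invented for the example.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, num_workers):
    """Deal the records out round-robin, one chunk per simulated worker."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_chunk(chunk):
    """Map step: a worker counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Reduce step: merge the per-worker counts into one total."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Hypothetical sample data standing in for a huge log of search terms.
search_terms = [
    "big data tools",
    "weather forecast",
    "big data privacy",
    "weather radar",
]

chunks = split_into_chunks(search_terms, num_workers=2)
totals = reduce_counts(map_chunk(chunk) for chunk in chunks)
print(totals["big"], totals["weather"], totals["radar"])  # 2 2 1
```

In a real cluster each chunk would live on a different computer, and the map step would run where the data is stored rather than copying the data to one place.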
Where Does This Data Come From?
The world today is very connected and very monitored. Data is constantly being generated and captured in many different ways. A short list of examples includes:
- Internet search terms
- Transaction information from stores
- Traffic monitors in cities and on highways
- Weather stations
- Telephone call logging
- Television viewing information from cable companies
- Medical tests
- Scientific instruments
- Usage data from electric and gas utilities
Examples of Big Data in Action
The following examples are just a few ways that different groups are using Big Data for different purposes.
Google
Google is one of the original Big Data companies. The size of Google’s data collection isn’t publicly acknowledged. Because it constantly scans and indexes almost everything accessible on the Internet, it is likely to have at least hundreds of terabytes of data.
They also closely track the people who use their services. You pay for these “free” services with information about your Internet habits gleaned from your search terms and browser cookies. This information becomes part of the Big Data and is used to select advertisements to display to you. In effect, the customer that Google sells to is its advertisers and the product it sells is information about you!
Large Hadron Collider
The Large Hadron Collider, built and operated by CERN, was designed for experiments in particle and high-energy physics. In 2012, data about the particle collisions was being generated at a rate of 25 petabytes per year. This Big Data is analyzed using the world’s largest computing grid, made up of 170 facilities in a worldwide network across 36 countries.
Among other things, this analysis has confirmed the existence of the elementary particle called the Higgs boson (the “god particle”), which was first theorized in 1964.
Amazon
Amazon has transaction records for over 215 million active customer accounts and 1.5 billion items in its on-line store. It also has information about shipping, product availability, product reviews, supply, demand, pricing, and many other things. It uses this information to give its customers a better shopping experience and to make suggestions to get them to spend more. Amazon also packages this information and sells it to marketers who use it to display advertising tailored to you.
Health Care
Big Data is used in many areas of health care. There are four general sources of this data:
- Clinical data (e.g. patient records)
- Pharmaceutical research data (e.g. clinical trial results)
- Activity and cost data
- Patient behavior data
Big Data provides the tools to correlate information from these different sources to identify patients who are at greater risk for certain medical conditions. It helps researchers understand which treatments are more or less effective for certain conditions and certain people. The cost effectiveness of different treatments is also researched.
Weather
Weather monitoring and prediction is a Big Data application. Weather is very complex and the more information you have from as many monitoring sources as possible, the better the predictions will be.
As of 2013, The Weather Company, the parent company of The Weather Channel and other weather-related outlets, takes in 2.2 million weather data points from around the world 4 times per hour. That’s over 211 million data points daily. The new system that they’re in the process of implementing will increase that to 2.5 billion data points 15 times per hour, or 900 billion data points daily - an increase of more than 4,200 times.
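For anyone who wants to check the math, the daily totals and the size of the increase can be worked out in a few lines of Python. The variable names are just for this illustration; the figures are the ones quoted above.

```python
HOURS_PER_DAY = 24

# Current system: 2.2 million data points, 4 times per hour.
current_per_day = 2_200_000 * 4 * HOURS_PER_DAY

# Planned system: 2.5 billion data points, 15 times per hour.
planned_per_day = 2_500_000_000 * 15 * HOURS_PER_DAY

increase = planned_per_day / current_per_day

print(f"{current_per_day:,}")  # 211,200,000
print(f"{planned_per_day:,}")  # 900,000,000,000
print(f"{increase:,.0f}")      # 4,261
```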
In addition to using all this data to predict the weather, The Weather Company also uses it to select and present relevant advertising to local areas affected by the weather. If rain is in your forecast, don’t be surprised to see advertisements for umbrellas.
Wal-Mart
One of the largest retail chains in the world, Wal-Mart handles more than 1 million customer transactions every hour. These transactions are fed into databases estimated at more than 2.5 petabytes and include information on the purchasing activity of over 145 million Americans. Wal-Mart uses this information in its customer relationship management tools not only to track all of your purchases, but also to make predictions about your future interests.
A division of Wal-Mart called @WalmartLabs developed a large database called the “Social Genome”. They describe the Social Genome as “a vast, constantly changing, up-to-date knowledge base with hundreds of millions of entities and relationships. We then use the Social Genome to perform semantic analysis of social media and to power a broad array of e-commerce applications.”
Among other things, this information is used to determine how to best market products to its customers and the best time to mark down prices in different locations to maximize sales.
Privacy Concerns
While there is no question that there are benefits to Big Data, there are also risks. The biggest concern that most people have is privacy.
Because of recent leaks, it is now known that the US government, in cooperation with the governments of Canada, Great Britain, Australia, and New Zealand, has been gathering huge amounts of data regarding phone calls, and information from Google and Yahoo accounts and other sources. While the stated purpose of this Big Data is to track down terrorists, it includes information about many millions of ordinary citizens.
All the Big Data being processed and sold to marketers for the purpose of getting you to buy their products also seems to border on invasions of privacy. In one famous case, the department store chain Target was able to use the data it gathered about the purchasing habits of a high school girl to determine that she was pregnant. Her father, who was unaware of her condition, was quite surprised when she started receiving baby-related coupons.
I recently encountered a web site that had records including my name, age, address, and phone number, along with the same information about 10 relatives in my state, all grouped as probable family members. This was all found in publicly available records - Big Data.
How Can I Protect Myself?
There is little you can do to stay completely out of these data sets without impacting your usual way of doing business and interacting on-line, but there are a few things you can do that help.
- Pay Cash
Paying by cash instead of using credit or debit cards will allow you to avoid having certain transactions tracked. While this works for brick-and-mortar stores, it doesn't help when shopping on-line.
- Disable Browser Cookies
You can disable cookies in your browser, but this is likely to affect your browsing experience. Some sites won't work at all. Plus, there are ways other than cookies that web sites use to track you.
- Keep Your Phone Number To Yourself
Many stores routinely ask you for your phone number at the cash register. I'm always surprised at how many people automatically give it out. Cashiers are often shocked when I refuse to give it to them.
- Don't Use Store "Customer Loyalty" Cards
The sole purpose of these cards is to feed as much information as possible about you into the store's database. I don't mind losing the discount to keep a little privacy.
- Social Media
Be mindful of what you post on social media sites. Assume that everything you post will be examined by Big Data tools for marketing and other purposes. Social media marketing is one use of Big Data.
Conclusion
Big Data is a relatively new thing; we’ve only been able to process such large amounts of data in a useful time frame for a few years. Like all tools, Big Data can be used for both positive and negative purposes. Society will need to decide what kind of limitations it wants to put on its use and pressure government to enact laws enforcing those limits.