Big Data: What Is It?
What is Big Data?
Big Data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making and process automation.
The 4 V's of Big Data
Velocity:
is the idea that data is being generated extremely fast, in a process that never stops.
Attributes include near-real-time or real-time streaming, and local and cloud-based technologies that can process information very quickly.
Volume:
is the amount of data generated, measured in units such as exabytes, zettabytes, and yottabytes.
Drivers of volume are the increase in data sources, higher-resolution sensors, and scalable infrastructure.
Veracity:
is the quality and origin of data.
Attributes include consistency, completeness, integrity, and ambiguity.
Drivers include cost and the need for traceability.
Variety:
is the idea that data comes from different sources (machines, people, and processes), both internal and external to organizations.
Attributes include the degree of structure and complexity.
Drivers include mobile technologies, social media, wearable technologies, geo technologies, video, and many more.
A fifth V, value, is often added to these four.
Let's look at some examples of the V's in action.
Velocity:
Every 60 seconds, hours of footage are uploaded to YouTube.
This amount of data is generated every minute.
So think about how much accumulates over hours, days, and years.
Volume:
Every day we create approximately 2.5 quintillion bytes of data.
At 25 GB per single-layer disc, that is on the order of 100 million Blu-ray discs every day.
The world population is approximately seven billion people, and the vast majority of people are now using digital devices.
These devices all generate, capture, and store data.
And with many people using more than one device (for example, mobile devices, desktop computers, and laptops), we're seeing even more data being produced.
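The disc comparison is a simple division, and its result depends on the disc capacity assumed. A rough sketch, assuming decimal units and a single-layer 25 GB Blu-ray disc (both are assumptions for illustration, not figures from the article's sources):

```python
# Back-of-the-envelope conversion of the daily data volume into familiar units.
BYTES_PER_DAY = 2_500_000_000_000_000_000   # 2.5 quintillion bytes, as quoted above
EXABYTE = 10**18                            # decimal (SI) exabyte
BLU_RAY_BYTES = 25 * 10**9                  # assumption: single-layer 25 GB disc


def discs_needed(total_bytes: int, disc_bytes: int) -> int:
    """Return how many full discs of the given capacity total_bytes would fill."""
    return total_bytes // disc_bytes


exabytes_per_day = BYTES_PER_DAY / EXABYTE
discs_per_day = discs_needed(BYTES_PER_DAY, BLU_RAY_BYTES)

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{discs_per_day:,} single-layer discs per day")
```

Changing `BLU_RAY_BYTES` to a dual-layer 50 GB disc halves the disc count, which is why such comparisons should always state the capacity they assume.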
Variety:
Let's think about the different types of data: text, pictures, and film.
There is also sound, health data from wearable devices, and many different types of data from devices connected to the Internet of Things.
Veracity:
80% of data is considered to be unstructured, and we must devise ways to produce reliable and accurate insights from it.
The data must be categorized, analyzed and visualized.
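Two of the veracity attributes mentioned earlier, completeness and consistency, can be measured with very little code. The sketch below is only illustrative: the records, field names, and plausible ranges are all hypothetical.

```python
# Hypothetical wearable-device readings; None marks a missing value.
records = [
    {"heart_rate": 72.0, "temperature_c": 36.6},
    {"heart_rate": None, "temperature_c": 36.9},
    {"heart_rate": 250.0, "temperature_c": 37.1},  # implausible heart rate
    {"heart_rate": 65.0, "temperature_c": None},
]

# Assumed plausible ranges, used here as a simple consistency rule.
VALID_RANGES = {"heart_rate": (30.0, 220.0), "temperature_c": (30.0, 45.0)}


def completeness(rows, field):
    """Fraction of rows in which the field is present (not None)."""
    present = sum(1 for row in rows if row.get(field) is not None)
    return present / len(rows)


def out_of_range(rows, field):
    """Rows whose value for the field is present but outside the assumed range."""
    low, high = VALID_RANGES[field]
    return [
        row for row in rows
        if row.get(field) is not None and not (low <= row[field] <= high)
    ]


for field in VALID_RANGES:
    print(field, completeness(records, field), len(out_of_range(records, field)))
```

Checks like these are usually run before analysis, so that missing or implausible values are flagged rather than silently skewing the results.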
The emerging V is value.
This V refers to our ability and need
to turn data into value.
Value isn't just profit.
It may be medical or social benefits,
or customer, employee, or personal satisfaction.
The main reason people invest time in understanding Big Data is to derive value from it.
What is Hadoop and why is it considered a great Big Data solution?
Hadoop is an open-source software framework used to store and process huge amounts of data.
It is implemented as several distinct, specialized modules:
Storage, principally the Hadoop Distributed File System (HDFS),
Resource management and scheduling for computational tasks (YARN),
Distributed processing based on the MapReduce programming model,
Common utilities and software libraries needed by the entire Hadoop platform (Hadoop Common).
Hadoop is written in Java and was originally developed by Doug Cutting, who named it after his son's toy elephant.
Its processing model is based on the MapReduce programming model published by Google.
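The MapReduce idea can be illustrated with the classic word-count example. The sketch below is a single-process Python illustration of the map, shuffle, and reduce phases. It does not use Hadoop's actual Java API, and the function names are chosen here for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    # On a real cluster, each split would be mapped on a different node in parallel.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each map call only sees its own split and each reduce call only sees one key, both phases can be spread across many machines, which is what makes the model suitable for very large datasets.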
How is Big Data Used?
Companies like Amazon, Netflix and Spotify use algorithms based on big data
to make specific recommendations based on customer preferences and historical behavior.
Personal assistants like Siri on Apple devices use big data to devise answers
to the infinite number of questions end users may ask.
Google Now makes recommendations based on the data on a user's device.
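Recommendations based on historical behavior, like those described above for Amazon, Netflix and Spotify, can be sketched as a tiny "users with similar histories also chose" scorer. This is a toy illustration with hypothetical data, not a description of how any of those companies actually implement their systems.

```python
from collections import Counter

# Hypothetical purchase histories: user -> set of items.
histories = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"book", "lamp"},
    "dave": {"lamp", "desk"},
}


def recommend(user, histories, top_n=2):
    """Suggest items owned by users with overlapping histories.

    Each candidate item is scored by the size of the overlap between the
    target user's history and the history of every user who owns the item.
    """
    owned = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # no shared behavior, so this user tells us nothing
        for item in items - owned:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("bob", histories))
print(recommend("carol", histories))
```

Production systems work on far larger histories and richer signals, but the principle is the same: past behavior of similar users is used to rank items the current user has not seen yet.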
Now that we have an idea of how consumers are using big data, let's take a look at how big data is impacting business.
In 2011, McKinsey & Company said that big data was going to become the key basis of competition supporting new waves of productivity growth and innovation.
In 2013, UPS announced that it was using data from customers, drivers and vehicles in a new route guidance system aimed at saving time, money and fuel.
Initiatives like this one support the statement that big data will fundamentally change the way businesses compete and operate.
Privacy and Security Issues in the Age of Big Data
Big data can enable “invasions of privacy, invasive marketing, decreased civil liberties, and increased state and corporate control”. The amount of information collected on each individual can be processed to provide a surprisingly complete picture. As a result, organizations that own data are legally responsible for the security and the usage policies they apply to their data.
Attempts to anonymize specific data often fail to protect privacy, because so much data is available that some of it can be correlated with other sources to identify individuals.
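The correlation risk can be illustrated with a simple linkage sketch: records stripped of names are joined to a public dataset on shared attributes (often called quasi-identifiers). All data, names, and field choices below are hypothetical.

```python
# Hypothetical "anonymized" medical records: names removed, other attributes kept.
anonymized = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02138", "birth_year": 1980, "sex": "M", "diagnosis": "flu"},
]

# Hypothetical public dataset (for example, a voter roll) that includes names.
public = [
    {"name": "Jane Roe", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "John Doe", "zip": "02139", "birth_year": 1980, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(row):
    """Build the join key from the quasi-identifier fields."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(anonymized_rows, public_rows):
    """Re-identify rows whose quasi-identifiers match exactly one named person."""
    names_by_key = {}
    for person in public_rows:
        names_by_key.setdefault(key(person), []).append(person["name"])
    matches = {}
    for row in anonymized_rows:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches[names[0]] = row["diagnosis"]
    return matches


print(link(anonymized, public))
```

Removing the name column was not enough here: two of the three records are re-identified purely from attributes that look harmless in isolation.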
Users' data are also constantly in transit, being accessed by inside users and outside contractors, government agencies, and business partners sharing data for research.
SOLUTION: Privacy must, for legal reasons, be preserved even at a cost, not only in money but also in system performance. Developing approaches include “differential privacy”, a formal and proven model that comes with a great deal of system overhead, and an emerging technology known as homomorphic encryption, which allows analytics to work on encrypted data. Older, more standard solutions include encryption of data within the database, access control, and stringent authorization policies. Keeping security patches up to date, another piece of standard wisdom, is also important.
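To make differential privacy more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names and sample data are hypothetical, and the floating-point sampling used here is for illustration only and is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Release a noisy count using the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one person
    changes the true count by at most 1. The noise scale is therefore
    1 / epsilon, so a smaller epsilon means more noise and more privacy.
    """
    rng = rng or random.Random()
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1.0 / epsilon, rng)


ages = [23, 35, 41, 29, 52, 61]  # hypothetical data
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

The analyst sees only the noisy answer, so the presence or absence of any single person has a bounded effect on what is released. The price is accuracy, and in larger systems the bookkeeping needed to track the privacy budget across many queries adds further overhead.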
An important consideration for implementing privacy policies is that legal requirements vary from country to country, and it is necessary to comply with the policies of the countries where you are active.