The term “Big Data” is being used a lot these days. People from the IT industry may know a thing or two about it, but to non-technical people it can sound like a complex, out-of-this-world term.
So what exactly is Big Data? I’ll try to explain its meaning and applications in simpler terms.
Big Data is a collection of data so large that it cannot be managed by traditional database management tools and techniques. Although it seems like a new concept, the idea has been around since at least 2001, when Doug Laney described the challenge of managing it in terms of the 3 V’s: Volume (how much data there is), Velocity (how fast it arrives) and Variety (how many different forms it takes).
Companies generate a lot of data these days through customer and corporate transactions, social media, videos, web traffic analysis, etc., and they need tools to turn this data into a readable form for business planning and forecasting. For large companies, the days of storing data in Microsoft Office applications are long gone.
Big data is not only for large corporations, though; small and medium-sized companies can also use big data analysis to their benefit.
50 GB of data looked like a giant set of information in the early 2000s, but now it is just a fraction of the thousands of terabytes of data generated every month. How is it possible to read and make sense of such a huge set of information? That’s where the interpretive process of big data analysis comes into the picture.
Interpreting data at this scale is what makes big data a unique concept; it is something traditional database techniques would find impossible to achieve.
Apache Hadoop is one technology that takes on the behemoth task of interpreting large volumes of data, and it has become the software most commonly associated with big data. Hadoop handles both data storage (through the Hadoop Distributed File System, HDFS) and data processing, and both happen in a distributed fashion across a cluster of machines to improve efficiency. Processing is done with MapReduce: the input is split into chunks that are processed in parallel on different nodes of the cluster (the map step), and the intermediate results are then grouped and summarized (the reduce step).
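To make the map and reduce steps more concrete, here is a minimal sketch of the classic word-count example written in plain Python. It is not Hadoop code and does not use Hadoop’s API; it only simulates the idea on one machine. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the `num_chunks` parameter are hypothetical, and the chunks merely stand in for pieces of data that Hadoop would spread across different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_pairs):
    """Shuffle step: group all intermediate values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: summarize each group, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, num_chunks=3):
    # Split the input into chunks; these stand in for data blocks
    # that a real cluster would store on different nodes (hypothetical setup).
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]
    # On a cluster each chunk would be mapped in parallel on its own node;
    # this sketch simply maps the chunks one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    logs = [
        "big data needs big tools",
        "hadoop stores data",
        "hadoop processes data",
    ]
    print(word_count(logs))
```

Running it counts “data” three times and “big” and “hadoop” twice each. The point of the design is that each map call only needs its own chunk, so the map work can be spread across many machines, and only the much smaller grouped results have to be brought together for the reduce step.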
The reason Hadoop is so popular is that it’s open source and available for free, although many companies have also released their own versions of Hadoop at a premium. Using Hadoop and big data requires experts who are well versed in mathematics and statistics and who understand the business in which the application is used. The job market for big data analysts is booming and is slated for a 500% increase by year end. As more and more data is generated, we are going to need more such applications and analysts to work on it.