Big Data has always been a hot topic and in 2014 it came to play. ‘Big Data’ has developed, evolved and matured to give significant value to ‘Business Intelligence’. However there is so much more to big data than meets the eye. Understanding enormous amounts of unstructured data is not easy by any means; yet once that data is analysed and understood, organisations have started to value its importance and need. ‘Big data’ has helped create a number of opportunities which range from new platforms, tools, technologies; to improved economic performances in different industries; through development of specialist skills, job creation and business growth. Let’s do a quick recap of 2014 and on what Big Data has offered to the tech world from the perspective of a tech publisher.
The term ‘Data Science’ has been around for sometime admittedly, yet in 2014 it received a lot more attention thanks to the demands created by ‘Big Data’. Looking at Data Science from a tech publisher’s point of view, it’s a concept which has rapidly been adopted with potential for greater levels of investment and growth. To address the needs of Big data, Data science has been split into four key categories, which are; Data mining, Data analysis, Data visualization and Machine learning. Equally we have important topics which fit inbetween those such as: Data cleaning (Munging) which I believe takes up majority of a data scientist time.
The rise in jobs for data scientists has exploded in recent times and will continue to do so, according to global management firm McKinsey & Company there will be a shortage of 140,000 to 190,000 data scientists due to the continued rise of ‘big data’ and also has been described as the ‘Sexiest job of 21st century’.
Real time Analytics
The competitive battle in Big data throughout 2014 was focused around how fast data could be streamed to achieve real time performance. Real-time analytics most important feature is gaining instant access and querying data as soon as it comes through.
The concept is applicable to different industries and supports the growth of new technologies and ideas. Live analytics are more valuable to social media sites and marketers in order to provide actionable intelligence. Likewise Real time data is becoming increasing important with the phenomenon known as ‘the internet of things’. The ability to make decisions instantly and plan outcome in real time is possible now than before; thanks to development of technologies like Spark and Storm and NoSQL databases like the Apache Cassandra, enable organisations to rapidly retrieve data and allow fault tolerant performance.
Machine learning (Ml) became the new black and is in constant demand by many organisations especially new startups. However even though Machine learning is gaining adoption and improved appreciation of its value; the concept Deep Learning seems to be the one that’s really pushed on in 2014. Now granted both Ml and Deep learning might have been around for some time, we are looking at the topics in terms of current popularity levels and adoption in tech publishing.
Deep learning is a subset of machine learning which refers to the use of artificial neural networks composed of many layers. The idea is based around a complex set of techniques for finding information to generate greater accuracy of data and results. The value gained from Deep learning is the information (from hierarchical data models) helps AI machines move towards greater efficiency and accuracy that learn to recognize and extract information by themselves and unsupervised!
The popularity around Deep learning has seen large organisations invest heavily, such as: Googles acquisition of Deepmind for $400 million and Twitter’s purchase of Madbits, they are just few of the high profile investments amongst many, watch this space in 2015!
New Hadoop and Data platforms
Hadoop best associated with big data has adopted and changed its batch processing techniques from MapReduce to what’s better known as YARN towards the end of 2013 with Hadoop V2. MapReduce demonstrated the value and benefits of large scale, distributed processing. However as big data demands increased and more flexibility, multiple data models and visual tool became a requirement, Hadoop introduced Yarn to address these problems.
YARN stands for ‘Yet-Another-Resource-Negotiator’. In 2014, the emergence and adoption of Yarn allows users to carryout multiple workloads such as: streaming, real-time, generic distributed applications of any kind (Yarn handles and supervises their execution!) alongside the MapReduce models. The biggest trend I’ve seen with the change in Hadoop in 2014 would be the transition from MapReduce to YARN. The real value in big data and data platforms are the analytics, and in my opinion that would be the primary point of focus and improvement in 2015.
Rise of NoSQL
NoSQL also interpreted as ‘Not Only SQL’ has exploded with a wide variety of databases coming to maturity in 2014. NoSQL databases have grown in popularity thanks to big data. There are many ways to look at data stored, but it is very difficult to process, manage, store and query huge sets of messy, complex and unstructured data. Traditional SQL systems just wouldn’t allow that, so NoSQL was created to offer a way to look at data with no restrictive schemas. The emergence of ‘Graph’, ‘Document’, ‘Wide column’ and ‘Key value store’ databases have showed no slowdown and the growth continues to attract a higher level of adoption. However NoSQL seems to be taking shape and settling on a few major players such as: Neo4j, MongoDB, Cassandra etc, whatever 2015 brings, I am sure it would be faster, bigger and better!