Over the past few decades, the most talked-about terms in the field of information technology have been big data and analytics. Consulting firms and large organizations that provide big data professional services must adopt new and emerging skills in order to gain a competitive advantage.
The purpose of this blog is to examine how big data analytics has evolved from where it began, what lies ahead, and how it will continue to take shape in the future.
Types of Data Analytics
Data analytics, at its most basic, is the thorough analysis of a business's data to produce actionable, meaningful insights that ease decision making and maximize the business's profit.
Descriptive analytics presents what has happened in the past in an accurate, easy-to-understand way, helping a business better understand the changes it has undergone.
Predictive analytics enables organizations to understand the likelihood of future outcomes and performance, using statistical and modelling techniques to make predictions.
Prescriptive analytics suggests what should happen next, recommending the best course of action based on a review of previous iterations.
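To make the three types concrete, here is a toy sketch in Python; the sales figures and the stocking rule are invented purely for illustration.

```python
# Toy illustration of the three analytics types on made-up monthly sales
# figures (all numbers and thresholds here are invented for the example).
from statistics import mean

sales = [100, 110, 125, 140, 160]  # units sold in the last five months

# Descriptive: summarize what has already happened.
avg = mean(sales)

# Predictive: fit a simple linear trend and extrapolate one month ahead.
n = len(sales)
xs = range(n)
x_bar, y_bar = mean(xs), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / \
        sum((x - x_bar) ** 2 for x in xs)
forecast = y_bar + slope * (n - x_bar)

# Prescriptive: recommend an action based on the forecast.
action = "increase stock" if forecast > avg else "hold stock level"

print(avg, forecast, action)
```

The same three-step progression (summarize, predict, recommend) underlies far more sophisticated real-world pipelines.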
Phases of Data Analytics
Statistics-based analytics has existed for decades; some historians even suspect that the ancient Egyptians used statistics in the construction of the pyramids. The main reason businesses adopted data analytics and machine learning was to find more reliable, error-free solutions than trial and error, which cost them both time and resources.
The statistics, mathematics, and technology stacks chosen were also kept simple, both to ease development and explanation and to avoid overwhelming business leaders who were unfamiliar with these technologies.
Early use cases were also less complex than today's, being more exploratory than predictive. The availability of products and analysts in the market also played an important role.
Stage 1 - The Rise in the Awareness of Business Intelligence
At this stage, people became aware of business intelligence, and many organizations started to collect their data, whether daily transactions, production processes, or long-term contracts, storing it in a centralized data warehouse and performing analytics on it.
Many software products and companies came into the spotlight for data processing, with SAS being one of the most popular workhorses of the time for exploratory data analysis (EDA). Some companies also used SAS and cloud computing for enterprise data warehousing (EDW) to build data warehouses.
A material advancement began, providing factual information to business decision makers so that they could make fact-based decisions for their companies. A further digital transformation then turned these enterprise data warehouses into enterprise consolidated data warehouses (ECDW), with the help of ETL and business intelligence tools.
Descriptive analytics was carried out during this time, mainly describing what happened and why, to improve decision making. It was delivered chiefly through reports, mostly built in Excel and VBA, although some large organizations used tools such as MicroStrategy and Cognos.
The main drawback of this stage was that the information was used mainly within an organization, and it focused only on past patterns, offering no predictions of future trends based on historical data. Most organizations at the time hired people with statistics degrees and trained them in programming skills such as SQL, and sometimes even VBA, to carry out their analytics.
Stage 2 - Awareness of Data Science and Enriching Data
After realizing the drawbacks of the previous stage, many companies started to develop a different form of analytics solution. They wanted a new strategy for tapping data from digital sources such as social media and the wider internet, and customers responded well to it. With this new strategy came the need for new big data technologies, whose role was to generate data from different sources for large companies that could take advantage of it.
These firms started to invest in analyzing data related to customer experience, products, cloud computing, and the services they offered, and began attracting heavy traffic to their websites by building statistical models, better search algorithms, targeted ads, and customer loyalty programs.
As a result, these companies made far more money than others that had no idea how to use their data properly, and the field of "data science" was formed, teaching how to extract actionable insights from big data using scientific and statistical methods.
The job market saw new roles such as data engineer and Hadoop administrator, which became very important to many IT organizations. These well-performing organizations went on to develop new technologies able to ingest, transform, and process data from data lakes, and also integrated predictive analytics, whose task was to predict future trends rooted in the findings of descriptive analytics.
Stage 3 - Data Analytics and its Offerings
After stage 2, data analytics became widespread, and many companies, regardless of market size, started to use it to their advantage. Furthermore, online analytical processing began for companies' products and services and spread into many industries, from pharmaceuticals to clothing.
Key tools used for Data Analytics
In the early 2010s, Apache Hadoop and Spark took up the mantle of distributed processing frameworks. Hadoop was widely used because it was scalable, flexible, and low cost, but mainly because of MapReduce-based tools such as Mahout and Hive; its main drawback was that, while it could collect data from disparate sources, it was limited compared with relational database systems.
Even though Hadoop was widely used, its limitations affected its efficiency: its MapReduce jobs had to read and write data to disk at every step, which took a great deal of time when handling iterative processing, machine learning, or real-time data.
To address this issue, Apache released Spark, an in-memory distributed framework that can hold the data in memory across the whole operation, unlike Hadoop, making it much faster. Developers also switched to Spark because it is compatible with many programming languages, such as Python and Java, giving them the freedom they want.
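The map/shuffle/reduce model that Hadoop popularized can be sketched in plain Python (no cluster); the input lines are invented for illustration.

```python
# Minimal sketch of the map/shuffle/reduce word count that Hadoop
# popularized. In real Hadoop, the intermediate pairs produced by the map
# step were written to disk before the reduce step read them back; Spark's
# speedup comes largely from keeping such intermediates in memory instead.
from collections import defaultdict

lines = ["big data analytics", "big data tools", "data lakes"]

# Map: emit a (word, 1) pair for every word in every line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the pairs by key (the word).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: sum the counts for each word.
word_counts = {word: sum(counts) for word, counts in grouped.items()}

print(word_counts)
```

On a cluster, the map and reduce steps run in parallel across machines, and the shuffle moves data between them; that intermediate handoff is exactly where disk I/O made Hadoop slow for iterative workloads.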
R and Python are the top choices of data analysts for data analytics today: they are open source, can be adapted to your needs thanks to their advanced libraries, and can integrate with other platforms, such as visualization tools, to create visual reports.
Python is preferred by developers for building applications because of its general-purpose programming functionality, whereas R is used to produce data-driven insights for companies through data mining and exploratory analysis. Both tools are quite distinct, appear to be here to stay, and have made their mark on the big data and data mining industries.
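As a flavour of the exploratory analysis mentioned above, here is a tiny summary in Python using only the standard library; in practice, a pandas DataFrame's describe() method produces a similar summary in one call. The order values are invented for illustration.

```python
# A tiny exploratory-analysis sketch using only the standard library.
# The order values below are made up purely for the example.
from statistics import mean, median, stdev

order_values = [12.0, 15.5, 9.8, 22.1, 18.4, 30.0, 11.2]

summary = {
    "count": len(order_values),
    "mean": round(mean(order_values), 2),
    "median": median(order_values),
    "stdev": round(stdev(order_values), 2),
    "min": min(order_values),
    "max": max(order_values),
}
print(summary)
```

Such a summary is the first step of EDA: it flags skew and outliers (here the 30.0 order pulls the mean above the median) before any modelling begins.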
Meanwhile, predictive analytics has also undergone many changes. More complex techniques, collectively called "deep learning", are now used in place of the "shallow learning" of the previous phase.
These techniques can be used to train models that provide businesses with solutions such as natural language processing (NLP), which also plays a key role in writing blogs and articles. They can foresee complex trends with higher accuracy than their counterparts from the previous phase.
Many new organizations have also emerged that derive business value by providing analytics through visualization services such as Tableau and QlikView. Many developers are likewise shifting to open-source visualization tools such as Angular and Candela, which can be customized as needed and are low cost; the main drawback is that they require developers or data scientists to implement them.
As more and more industries adopted data analytics, they needed individuals well versed in data analysis and machine learning, who could use data modeling to combine data with a trained model and produce precise predictions.
Hence, a next generation of data-oriented roles emerged, such as data analyst, data engineer, data architect, and data scientist, who were both computationally and analytically well versed.
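The data-modeling workflow these roles carry out can be sketched in a few lines: split the data, fit a model on the training part, and judge its predictions on the held-out part. A simple least-squares line stands in for the model here, and the numbers are invented for illustration.

```python
# Sketch of the train/evaluate workflow: split the data, fit on the
# training split, and score predictions on the held-out split. The
# (feature, target) pairs below are made up for the example.
from statistics import mean

data = [(1, 2.1), (2, 4.0), (3, 6.2), (4, 7.9), (5, 10.1), (6, 12.0)]
train, test = data[:4], data[4:]

# Fit y = a + b*x on the training split by least squares.
xs, ys = zip(*train)
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Evaluate on the held-out split with mean absolute error.
mae = mean(abs((a + b * x) - y) for x, y in test)
print(round(b, 2), round(mae, 2))
```

Real pipelines swap the least-squares line for richer models, but the discipline of evaluating on data the model has never seen is the same at any scale.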
Stage 4 - The Future of Big Data Analytics & Data Scientists
With ongoing research and development in big data, it is clear that data volumes are growing exponentially, making data ever "bigger". Newer and more powerful models will therefore be needed to handle and access all this unstructured data, and from this came the need for automation.
Many IoT devices and fully automated solutions have already been developed, such as chatbots and neural machine translation, which we are already using to complete our sentences! These solutions are here to stay, especially those that can improve decision making and recommend actions on their own.
Alongside these fully automated solutions, new technologies have also emerged, such as Google's Go and the Julia programming language. They offer better execution speeds, but are not yet widely used, mainly because they need further development and more mature libraries compared with counterparts from the previous phase such as Scala and Python.
As discussed, AI is being implemented in many applications, such as image recognition, and with further evolution we will see more refined and widely adopted versions of these applications. Likewise, big data visualization will incorporate AI to build more interactive visualizations and deliver cognitive analytics.
From collecting information stored on punched cards with tabulating machines to AI workbenches that automatically identify and run the best model for a business goal, it has been a long road.
It is quite amazing how data analytics has evolved over the past decades, and at this exponential rate, where might it take us in the next phase? Some people fear that it will shrink the human workforce and that artificial intelligence will take over completely.
But looking back at this journey, technology has made our lives easier and driven drastic improvements across many industries. In my opinion, the integration of artificial intelligence into data analytics will be fruitful: it will not replace the human workforce but will make its work easier.