Today, people constantly generate large quantities and many varieties of data. When customers use a company’s service, their service requests generate data about their needs and preferences, as well as their quality requirements. For companies that provide virtual services, customers must stay connected through their devices so that the provider knows their location and the other data needed to keep the service available, running, and ready to use.
Organizations have recognized the potential value of this data and are acquiring the technologies, recruiting the skilled manpower, and putting in place the processes required to tap into the opportunities. They can use this big data to create, stimulate, and improve a wide range of services. These include designing more competitive service offers and packages, communicating seamlessly with customers about their usage, determining the areas of service that need improvement, and swiftly rectifying reported problems. All these activities improve user experience, increase customer satisfaction, and enable the provision of smarter services. The key to deriving value from big data is the use of analytics. Just collecting and storing data creates little value. It is only when data is analyzed and the results are applied by the company that the data generates value.
In the past, people were hindered from benefiting from the value of big data because its sheer volume, velocity, and variety were simply overwhelming. Many techniques for analyzing data, such as machine learning, simulation, and regression analysis, have long been available. What is new is the advances in computer technology and better software, new sources of data (e.g. social media), and new business opportunities. The capability to store large amounts of data wasn’t the problem. The crux is the ability to get something meaningful out of the data quickly and cost-effectively. What we are dealing with now is the exploration of new techniques for analyzing those large data sets. Data processing challenges have now largely been solved with the introduction of readily available data processing and management tools.
However, these tools are simply tools for data processing and management. The real value of the data comes from knowing which combination of the multiple data elements will produce the desired insights and predictions. This is where deep analytics skills and operational expertise are of paramount value. As with a picture puzzle, only when these key variables are connected can the necessary insights, such as customer behavior, product performance, and feedback about newly launched products, be gained.
What Is Big Data?
Data is said to be big when its volume, velocity, or variety exceeds the storage and processing capacity of relational database management systems (RDBMSs). Though many companies have the equipment to handle large quantities of structured data, they do not have the ability to mine meaningful insights and predictions from big data. Also, data is being generated in volumes and at speeds that traditional analytics cannot keep up with. This calls for new and faster types of data processing and analytic tools and solutions.
Another way to distinguish big data is to characterize it according to the three Vs:
- Volume: Big data comes in large quantities.
- Velocity: This is the rate at which data is created. Vast amounts of data can be created during a particular period or event.
- Variety: There are different types of data. These include text, images, video, and audio.
One might be tempted to define big data simply as large and varied data, but any size or numerical definition given to big data is likely to change over time as we collect, store, and analyze more and more data.
Competitive Advantage in Data Analytics
Modern data analytics delivers competitive advantage in two major ways compared to the traditional data analytics model. First, modern data analytics uses a simple model that is applied efficiently to volumes of data that would be too large for traditional computer systems. A simple algorithm applied to a large volume of data is often easier to manage and more accurate than a sophisticated algorithm applied to a small sample. The algorithm is not the competitive advantage here. The real competitive advantage is the ability to apply it to huge amounts of data and get reliable results without compromising performance.
Second, competitive advantage depends on the sophistication of the analytics system itself. More analysis algorithms are being provided directly by Database Management System (DBMS) manufacturers. A company’s ability to process and analyze large amounts of data is limited by the sophistication of the system it is using. To rise above the limits of vendor equipment, companies must go well beyond what is provided and innovate by experimenting with newer methods that can handle more data analysis.
In short, the main goals of big data analytics are to optimize computation speed, avoid data sampling, reduce the need for data transfer and replication, and apply analysis as close as possible to the data.
Big Data Analysis Requirements
a. Minimize Data Movement:
In traditional computer processing, data is fed into the computer through various means, processed, and then sent on to the next destination for further processing or use. However, as the volume of data expands, this processing model becomes increasingly inefficient because moving so much input and output data around is slow and costly. It therefore makes more sense to store and process the data in the same location.
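The idea of processing data where it is stored can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `requests` table, its columns, and the sample rows are hypothetical, and an in-memory database stands in for a large data store. The sketch compares pulling every row to the client with asking the database to aggregate in place.

```python
import sqlite3

# Hypothetical in-memory table standing in for a large data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (region TEXT, latency_ms REAL)")
rows = [("north", 120.0), ("north", 80.0), ("south", 200.0), ("south", 100.0)]
conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)

# Approach 1: move every row to the client, then aggregate there.
all_rows = conn.execute("SELECT region, latency_ms FROM requests").fetchall()
totals = {}
for region, latency in all_rows:
    count, total = totals.get(region, (0, 0.0))
    totals[region] = (count + 1, total + latency)
client_side = {region: total / count for region, (count, total) in totals.items()}

# Approach 2: aggregate where the data lives; only one row per region moves.
in_database = dict(
    conn.execute("SELECT region, AVG(latency_ms) FROM requests GROUP BY region")
)

print(len(all_rows), "rows moved vs", len(in_database), "rows moved")
```

Both approaches give the same averages, but the second transfers one row per region instead of one row per request, and that gap widens as the table grows.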
b. Use Existing Skills:
As new sources and forms of data emerge, new skills are required to analyze them. Sometimes the existing workforce that is skilled in handling data can run the analysis. When the required skills are lacking, the gap can be addressed by training existing staff, hiring new workers, and acquiring new equipment with better processing power.
c. Data Security:
This is also essential for many corporate applications. Data warehouse users are accustomed to adhering to a reliable set of administration policies and security control measures. These rigorous checks are often lacking with unstructured and open source data analysis tools. Close attention should be paid to the security and data management requirements of each data analysis project.
Types of Data Analytics
On its own, stored data does not generate any value. This is true of databases, data warehouses, and the new systems for storing big data. However, when the data is appropriately stored, it can be processed and analyzed to create value.
There are four distinguishable types of data analytics with each having different implications for the type of data, technologies and architectures they’re used on. Some types of analytics are better applied on some data or technological platforms than on others.
a. Descriptive Analytics:
This is used to review events or transactions that have already happened. Examples include dashboards/scorecards, reporting, and data visualization. Descriptive analytics tends to be backward looking, i.e. it ‘looks back’ and reveals what has happened.
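As a rough illustration of the backward-looking reporting described here, the sketch below summarizes past transactions by month. The transaction records and the `monthly_report` helper are hypothetical and exist only for this example. A real dashboard would read from a data warehouse instead of a hard-coded list.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical past transactions: (month, product, amount).
transactions = [
    ("2024-01", "basic", 10.0),
    ("2024-01", "premium", 30.0),
    ("2024-02", "basic", 10.0),
    ("2024-02", "premium", 30.0),
    ("2024-02", "premium", 30.0),
]

def monthly_report(records):
    """Summarize what already happened: sales count, revenue, and average sale per month."""
    by_month = defaultdict(list)
    for month, _product, amount in records:
        by_month[month].append(amount)
    return {
        month: {"sales": len(amounts), "revenue": sum(amounts), "average": mean(amounts)}
        for month, amounts in sorted(by_month.items())
    }

report = monthly_report(transactions)
for month, row in report.items():
    print(month, row)
```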
b. Predictive Analytics:
This attempts to predict events that will occur in the future. Examples include machine learning, regression analysis, and neural networks. Software products such as SAS Enterprise Miner have made predictive analysis much easier. Marketing is the main target for predictive analytics applications, where the goal is to better understand customers’ needs and preferences.
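Regression analysis, one of the techniques named above, can be sketched in a few lines. This is a minimal illustration assuming a single predictor and made-up monthly usage figures. `fit_line` is a hypothetical helper that implements ordinary least squares for a straight line and is not taken from any particular product.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: data usage (GB) of one customer over five months.
months = [1, 2, 3, 4, 5]
usage_gb = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, usage_gb)
forecast = slope * 6 + intercept  # predicted usage for month 6
print(f"trend: {slope:.1f} GB/month, forecast for month 6: {forecast:.1f} GB")
```

Real customer data is rarely this tidy, so a practical model would also report how well the line fits before anyone acts on the forecast.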
c. Discovery Analytics:
This is used to find patterns or relationships in big data that were not previously known. The ability to mine and analyze new big data sources creates additional opportunities for companies that have large amounts of customer data. It can be used to identify patterns of events or activities that foretell customer actions, e.g. closing a bank account. When a company can predict customer behavior, it can take remedial action to change the anticipated behavior if it is unfavorable.
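A very small sketch of this kind of pattern search follows, using the bank account example. The customer histories, event names, and the `closure_rate_by_event` helper are all hypothetical. The code simply measures, for each event type, what share of the customers who experienced it later closed their account.

```python
from collections import Counter

# Hypothetical event histories per customer, and whether the account was later closed.
histories = {
    "c1": (["login", "fee_complaint", "large_withdrawal"], True),
    "c2": (["login", "deposit"], False),
    "c3": (["fee_complaint", "large_withdrawal"], True),
    "c4": (["login", "large_withdrawal"], False),
    "c5": (["login", "deposit", "fee_complaint"], False),
}

def closure_rate_by_event(data):
    """For each event type, the fraction of customers with that event who closed."""
    seen = Counter()
    closed = Counter()
    for events, did_close in data.values():
        for event in set(events):  # count each event once per customer
            seen[event] += 1
            if did_close:
                closed[event] += 1
    return {event: closed[event] / seen[event] for event in seen}

rates = closure_rate_by_event(histories)
for event, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{event}: {rate:.0%} of customers with this event closed their account")
```

With so few customers the rates are only suggestive. On real data, events with unusually high closure rates would be candidates for closer investigation.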
d. Prescriptive Analytics:
This is used to identify the best course of action or the best decisions one can make. Prescriptive analytics can be used to identify optimal solutions, to help make the best choice out of many options, and for the allocation of scarce resources.
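The allocation of scarce resources can be illustrated with a toy budgeting problem. The options, costs, and expected returns below are invented for the example, and `best_allocation` is a hypothetical helper that tries every combination. That is workable for a handful of options, whereas realistic problems would normally need an optimization solver.

```python
from itertools import combinations

# Hypothetical options: name -> (cost, expected_return), in arbitrary units.
options = {
    "email_campaign": (20, 30),
    "loyalty_discount": (50, 80),
    "network_upgrade": (60, 90),
    "call_center_training": (30, 40),
}

def best_allocation(candidates, budget):
    """Exhaustively find the affordable subset with the highest total expected return."""
    best_names, best_value = (), 0
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            cost = sum(candidates[name][0] for name in combo)
            value = sum(candidates[name][1] for name in combo)
            if cost <= budget and value > best_value:
                best_names, best_value = combo, value
    return best_names, best_value

choice, value = best_allocation(options, budget=100)
print("fund:", ", ".join(choice), "| expected return:", value)
```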
Steps to Analyzing Data
When we use Online Analytical Processing (OLAP) tools to generate sales forecasts, or SQL queries to check financial numbers, we know beforehand what kind of data we have and what it can generate. Big data analysis, by contrast, involves “digging gold” out of large volumes of raw, varied data.
In many cases, before starting out to analyze any data, one first has to find out the basic indices of the data and how different sets of data relate to each other. One must figure these out through a process of discovery and exploration.
It is not advisable to process all types or handle multiple sources of data simultaneously because establishing the actual relationship between various sources/types of data is often unpredictable. Handling random types of data sometimes leads down a path that turns out to be a dead end. It is advisable to start with small, well-defined projects. One can learn from each iteration and gradually move on to the next type/source of data or field of inquiry.
c. Flexible Schedule:
Because big data analysis is demanding, one should be prepared to spend more time and use more resources to solve the difficult challenges that may come up.
d. Decision Management:
The transaction volume and velocity of data should be considered before starting any data analysis. If one plans to use big data analytics to drive various operational decisions (such as running a dynamic website or informing client companies about their customer habits and behavioral trends), it will be better to automate and optimize the implementation of all those actions.
Big data analysis is not always plain black and white. One can’t always know how the various data elements relate to each other. As data is being mined to discover patterns and relationships, predictive analytics can yield the insights that are needed.
Tools & Systems for Processing Data
Below are some tools and systems used for processing and analyzing big data.
a. Hadoop:
Apache’s Hadoop is used for pre-processing data before applying advanced forms of analytics in order to identify macro trends or find pieces of valuable information. The pieces of information gained are then stitched together to get the desired result. It has helped immensely in unlocking potential value from new data using inexpensive servers.
b. Revolution Analytics (R):
Microsoft Revolution Analytics’ product ScaleR is used to process big data through XDF, a high-performance data file format designed for fast processing of large data sets across clusters. ScaleR builds on the open source R language, and it can also be combined with MapReduce, Apache’s Hadoop, and some distributed file systems.
c. MapReduce:
MapReduce is used to divide large data sets among multiple machines and process them in a fault-tolerant manner. The results produced by the ‘reducers’ can be combined to create a final report, which is written to the Hadoop Distributed File System (HDFS). Failed or aborted tasks can be re-run if desired. Scheduling, monitoring, and other resource management tasks are handled automatically by the system.
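The flow of a MapReduce job can be sketched on a single machine with a word count. This is a simplified illustration and does not use the Hadoop API. The function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and in a real cluster each chunk would be processed on a different node, with failed tasks rescheduled by the framework.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_chunks):
    """Group all emitted values by key so each reducer sees one key's values."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input, already split into chunks as a cluster would do.
chunks = ["big data big value", "data needs analysis", "big analysis"]

mapped = [map_phase(chunk) for chunk in chunks]  # independent, could run in parallel
grouped = shuffle(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(word_counts)
```

Because each mapper only sees its own chunk and each reducer only sees one key, a failed task can be rerun on its own without repeating the whole job.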
d. Iterative Processing:
As the name implies, iterative processing subjects a set of data to repeated analysis in order to get the best value out of it. Each iteration may apply the same set of algorithms to the output of the previous pass. When no further information can be extracted from the data, the results are sorted according to relevance.
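One common form of iterative processing is an algorithm that makes repeated passes over the data until its result stops changing. The sketch below uses one-dimensional k-means clustering purely as an example of that pattern. The data points, starting centers, and the `kmeans_1d` helper are assumptions made for illustration.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Repeat assign-and-update passes until the cluster centers stop moving."""
    for iteration in range(1, max_iter + 1):
        # Pass over the data: assign each value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: another pass would change nothing
            return centers, iteration
        centers = new_centers
    return centers, max_iter

# Hypothetical measurements that fall into two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, passes = kmeans_1d(values, [0.0, 5.0])
print("centers:", centers, "after", passes, "passes")
```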
e. Batch Processing:
Large chunks of data are sorted according to set parameters and then processed in various batches. Hadoop is an example of batch oriented processing where jobs are queued and then processed.
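The queue-then-process pattern can be shown with a short sketch. The batch size, the record values, and the `make_batches` helper are hypothetical. A real batch system such as Hadoop would also handle scheduling and failures, which are left out here.

```python
from collections import deque

def make_batches(records, batch_size):
    """Split an ordered list of records into fixed-size batches (the last may be smaller)."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

# Hypothetical records, sorted before batching.
records = sorted(range(1, 11))

# Queue the batches as jobs, then process them one at a time in arrival order.
job_queue = deque(make_batches(records, batch_size=4))
results = []
while job_queue:
    batch = job_queue.popleft()
    results.append(sum(batch))  # stand-in for the real per-batch computation

print(results)
```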
Requirements for Data Analytics
a. Business Case:
Data processing projects should be business driven rather than technology driven. They should address a business need such as creating new business ideas, seizing discovered opportunities and solving a problem.
b. Strong, Committed Sponsorship:
Most medium to large ICT projects require committed sponsorship. It is difficult to start up and run IT projects, including big data analytics projects, without strong financial backing because of the high cost of equipment and of hiring specialist manpower. To avoid frustration, secure robust and committed backing before starting any ICT project.
c. Alignment with Business:
It is imperative that data analytics projects support the business line of the company. Without analytics as an enabler, many modern business operations can hardly succeed. A good example is online retail stores that use data analytics to study which products their customers buy most and which price ranges sell best. This helps the company adjust its stocking and pricing policies in line with customer preferences.
d. Reliable ICT/ Data Infrastructure:
Some business owners assume that spending heavily on advanced ICT infrastructure is wasteful because it does not directly drive sales. They prefer to focus on marketing instead. This assumption is short-sighted: without a reliable ICT and database infrastructure, business operations will be slow and disorganized, which will eventually result in the loss of present and potential customers to competitors that have reliable ICT infrastructure. Servers with multiple CPUs that can process large amounts of data are also becoming more affordable for small businesses.
With the introduction of innovative ICT services such as Infrastructure as a Service (IaaS), Software as a Service (SaaS), and Platform as a Service (PaaS), businesses can outsource their ICT requirements to specialized service providers at very affordable rates.
e. Skilled Data Analytics Manpower:
The final requirement for success with big data analytics is to have people with the necessary skills. Business decision makers use data-related information through detailed reports, OLAP, and data visualization tools such as dashboards/scorecards. They are mainly information consumers; they do not process or create the information. It is therefore imperative to have data scientists at the back end and analysts skilled at analyzing big data at the front end.
Data scientists at the back end are highly skilled and experienced professionals whose main job is to discover new insights in big data: to find patterns and relationships in seemingly meaningless data and turn these discoveries into information that creates value for the organization. They need a robust data store (e.g. an RDBMS or Hadoop), and they write code (e.g. in Python, Java, or R), access the data through SQL or Hive, and finally analyze it.
Data scientists are typically technically minded people who like to solve difficult problems. They usually have advanced degrees in analytical fields such as statistics, computer science, and mathematics. As a result, many of them have limited business experience.
To make up for this, data scientists are paired with business analysts, who have business sense and work in business units such as sales and marketing. Both data scientists and business analysts are producers of information rather than information consumers. They prepare the final reports that the management board uses to make final policy decisions.
One of the most controversial issues with big data analytics is individual privacy. People know very little about how their data is captured or how organizations are using it. What data should organizations be allowed to collect, and what checks should be in place to prevent inappropriate use of people’s data? The extent to which big data analytics raises privacy challenges varies, depending on the kind of data being captured and on the as yet unknown uses of the data itself. This makes some people feel very uneasy, while others see no problem with the use of big data because it results in better customer service and appealing product and service offers.
While ICT security challenges can often be addressed with well-defined and established rules, the challenges associated with privacy have many different aspects. Because data usage changes so rapidly, they require new measures that protect privacy and address issues such as the misuse of personal data, as well as the development of privacy-enhancing technologies.
Organizations are gaining unprecedented insights into customers and operations as a result of the ability to analyze new data sources and large volumes of highly varied data. The results and value generated bring more context and insight to business decision making. When properly captured, stored, and analyzed, big data can provide unique insights into customer trends, improve business operations, lower costs, maximize profits, and help make important business decisions.
However, it is not easy to run big data analytics projects because there are specific requirements that must be met. It is better to start with specific, well-defined business objectives. There must be committed sponsorship to support the operations as well as reliable ICT infrastructure to enable smooth operations.
Big data has created a new variety of data management technologies, platforms, and systems. These can be blended with traditional platforms in a way that meets organizational specifications cost-effectively. The analysis of big data requires workbenches like SAS Enterprise Miner, traditional analytical tools like SQL, and data analysis languages like R. Careful consideration needs to be given to the skills, experience, and opinions of the personnel, and to how the analytics project will be turned into a successful business model. There should be a fact-based operations culture, with final decisions made after iterative experimentation to see what works best.
- Big Data Analytics – Actionable Insights For The Communication Service Provider; Ericsson White Paper, Uen 288 23-3211 Rev B | October 2015.
- Revolution Analytics White Paper: Advanced ‘Big Data’ Analytics with R and Hadoop; Revolution Analytics, 2011.
- Robin Bloor: Big Data Analytics – This Time It’s Personal; The Bloor Group White Paper.
- Watson, Hugh J: Tutorial: Big Data Analytics: Concepts, Technologies, and Applications; Communications of the Association for Information Systems: Vol. 34, Article 65, 2014. Available at: http://aisel.aisnet.org/cais/vol34/iss1/65.