More Intelligence for Faster Operational Intelligence Resolutions

Author


 

Karthikeyan Paramasivam

Lead – Analytics, Opus Consulting

New smart devices and fast Internet speeds have raised user expectations. No one wants to wait even a few seconds for a device screen to load or refresh. Users also expect any information technology (IT) issue that interferes with their access to online services and transactions to be addressed immediately; otherwise they will switch to a competitor that offers better service, preferably reliable 24x7 service. Many monitoring tools are available for detecting operational issues, but most provide insight only into the one system they are designed to monitor. As described here, a holistic view of your e-commerce operations is achievable and can reduce the time to resolution (TTR).

What is Operational Intelligence?

Operational Intelligence provides real-time insight into the different functions of an application system, helping IT staff take immediate action to keep the business running. Recent advances in computer technology have enabled the development of this new field of IT support services. It differs from Business Intelligence (BI) in that BI reports, even when generated in real time, are based on historical data. Examples of applications for Operational Intelligence reports include customer experience management, fraud detection and prevention, application-systems monitoring, and process visibility.

Weakness of Current Monitoring Systems

The diagram in Figure 1 shows the component IT systems used to support a simple financial transaction. The software application is built on top of several systems: webserver, application server, database, firewall, load balancers and so on. Different tools monitor the health of these systems and record the related data. To resolve any business issue that occurs in this application, we need to check each of these systems with its respective tool.

For instance, let’s say we use a tool to monitor server health. A high-priority alert is sent when CPU usage rises above 95%. The IT support team can then launch the tool to see the trend of CPU usage and how the server is performing at that moment. The tool cannot, however, provide any insight into which aspect of the business is causing CPU usage to exceed 95%. To answer this question, we need a holistic view of the application.

To get this holistic view, the data from all the component systems must be collected so that we can build a dashboard that simultaneously displays trend reports for the status of each system. This is now possible using tools like Elasticsearch, Logstash and Kibana from Elasticsearch BV, Splunk from Splunk, Inc. and Storm from Liquid Web, Inc.
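As a rough illustration of the collection step, the sketch below merges records from several systems into one time-ordered view. It is plain Python and does not use or imitate the API of any of the tools named above; the Event fields, system names and sample readings are all hypothetical.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """One monitoring record; the field names are hypothetical."""
    timestamp: datetime
    system: str
    metric: str
    value: float


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge per-system streams, each already sorted by time, into one timeline."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Made-up sample readings standing in for what each monitoring tool records.
webserver = [
    Event(datetime(2024, 1, 1, 9, 0), "webserver", "requests_per_min", 1200.0),
    Event(datetime(2024, 1, 1, 9, 2), "webserver", "requests_per_min", 2600.0),
]
app_server = [
    Event(datetime(2024, 1, 1, 9, 1), "app_server", "cpu_percent", 62.0),
    Event(datetime(2024, 1, 1, 9, 3), "app_server", "cpu_percent", 96.0),
]
database = [
    Event(datetime(2024, 1, 1, 9, 4), "database", "active_connections", 180.0),
]

timeline = list(merge_streams(webserver, app_server, database))
for event in timeline:
    print(f"{event.timestamp:%H:%M} {event.system:<10} {event.metric} = {event.value}")
```

Reading the merged timeline top to bottom places the rise in webserver requests next to the CPU reading that follows it, which is the kind of side-by-side view a dashboard would chart.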

The Operational Intelligence Advantage

Let’s now examine the advantage that operational intelligence technology can bring to the monitoring of this software application. If CPU usage in one system triggers an alert, then instead of taking immediate action we could view a “business situation report” that shows an increase in users of the registration process elsewhere in the application system. We can see the correlation between the business situation and the higher CPU usage, which reveals that the server behavior is normal. We can then keep an eye on operations until the system returns to its “normal user-load” state.
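One way to make that correlation concrete is to compute a Pearson coefficient between a business metric and CPU usage over the same time window. The sketch below is only an illustration: the per-minute sample values are invented, the 0.9 cutoff is an arbitrary assumption, and a high coefficient suggests rather than proves that user load explains the alert.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Hypothetical per-minute samples taken over the same eight minutes.
registrations_per_min = [40, 42, 45, 120, 180, 210, 90, 44]
cpu_percent = [35.0, 36.0, 38.0, 78.0, 92.0, 97.0, 66.0, 37.0]

r = pearson(registrations_per_min, cpu_percent)

# Assumed cutoff: treat r above 0.9 as "CPU is tracking user load".
load_explains_alert = r > 0.9
print(f"correlation = {r:.2f}, load explains alert: {load_explains_alert}")
```

If the coefficient were low while CPU stayed high, that would point away from user load and toward a fault worth investigating with the server-specific tool.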

Or let’s assume the application was very slow to launch and there was no alert from any of the monitoring tools. The number of users seemed to be normal. To troubleshoot, the technical team started with the webserver – its performance was normal. The team then checked the application server – JVM, CPU and memory usage seemed to be normal, and the database connections and server were also operating normally. By this time, the team had spent 2 to 3 hours without uncovering the source of the issue. Eventually, the source of the problem was determined to be an infinite redirect loop in LDAP that slowed down the authentication step, preventing users from launching the application.

In either scenario, had there been a dashboard that showed real-time metrics from all these systems simultaneously, we could have identified the faulty system very quickly and then used the system-specific tools to isolate and resolve the issue.
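The idea of spotting the faulty system at a glance can be sketched as a comparison of each system's latest reading against its own recent baseline. This is a minimal sketch under stated assumptions: the response times are invented, the system names mirror the second scenario above, and the z-score cutoff of 3.0 is an arbitrary choice rather than a recommended setting.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def deviating_systems(
    baselines: Dict[str, List[float]],
    latest: Dict[str, float],
    z_threshold: float = 3.0,
) -> List[Tuple[str, float]]:
    """Return (system, z-score) pairs whose latest reading is far from baseline."""
    flagged = []
    for system, history in baselines.items():
        sigma = stdev(history)
        if sigma == 0:
            continue  # a flat baseline gives no scale to measure deviation
        z_score = (latest[system] - mean(history)) / sigma
        if abs(z_score) >= z_threshold:
            flagged.append((system, z_score))
    # Largest deviation first, so the most suspicious system leads the list.
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical response times in milliseconds for each component system.
baselines = {
    "webserver": [20.0, 22.0, 21.0, 19.0, 23.0],
    "app_server": [110.0, 105.0, 115.0, 108.0, 112.0],
    "database": [8.0, 9.0, 7.0, 8.0, 9.0],
    "ldap": [15.0, 14.0, 16.0, 15.0, 15.0],
}
latest = {"webserver": 21.0, "app_server": 111.0, "database": 8.0, "ldap": 900.0}

flagged = deviating_systems(baselines, latest)
for system, z_score in flagged:
    print(f"{system}: latest reading is {z_score:.0f} standard deviations from baseline")
```

With these sample numbers only the LDAP reading stands out, which would send the team to the LDAP-specific tool first instead of working through the webserver, application server and database in turn.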

An End-to-End Operational Intelligence Solution

While there are a few companies that provide tools to monitor all systems, purchasing their complete suite is not cost-effective for many firms. In addition, the tools in a single-vendor suite may not all be the most effective monitoring tools available. Experience has led us to the conclusion that selecting system-specific tools and using new technologies like Splunk, Elasticsearch, etc. to integrate the related log files, databases and processing queues creates an effective Operational Intelligence solution tailored to the needs of the business.

Elasticsearch and Kibana are trademarks of Elasticsearch BV, registered in the U.S. and in other countries; logstash is a trademark of Elasticsearch BV. Splunk is a registered trademark of Splunk, Inc.
