5 AIOps advances that might just rock your world

David Linthicum Chief Cloud Strategy Officer, Deloitte Consulting
 

AIOps remains in the pilot phase within many organizations that have yet to determine what it can offer. The issue most face is how to find the time and money to select an AIOps technology stack that will succeed the first time. The technology itself is scary. 

However, reports of AIOps' possibilities and accomplishments continue to generate cautious optimism and a growing sense of urgency. It's no longer a question of if you'll move to AIOps; it's a question of when.

We can trace AIOps' recent growth to two key drivers. First, as enterprises move to complex architectures such as multi-cloud, they eventually hit a wall when they can't manage the increasing number of heterogeneous systems with the same resources and budget. Second, there's increasing paranoia around security, plus dawning knowledge that today's security operations need more proactive monitoring and responses. 

Most existing monitoring tools have begun to adopt AIOps features. This means the monitoring tool you’ve used since the '80s is suddenly an AIOps tool. You already pay for its features and functions, so you might as well leverage them. Cloud monitoring tools that leveraged AIOps from the start are becoming more strategically focused, with many beginning to connect to more traditional platforms as well as cloud-based systems.

Today many tech managers and staff feel the pressure to make a call about AIOps in their enterprise. This is what I call a forced march to AIOps. If you have been forced on this march, it's time to learn about the pragmatic use of AIOps technology so you can make fair and informed recommendations. This new knowledge should include an understanding of current AIOps technology and what it offers and, more important, the technology's short- and long-term outlooks. This education will become more critical as many enterprises find their way to AIOps in 2022 and 2023. 

Here are four advances in AIOps that will happen in 2022—plus one that's probably further off—and how your CloudOps team could benefit. 

1. Better SecOps integration

While this is a no-brainer for most people who do general and security operations, most AIOps tools do not yet focus on providing better security.

Many enterprises leverage their AIOps tool's automation systems as well as their connectivity APIs to form new roles for AIOps. For example, a common new role is to provide visibility into systems' telemetry, such as network performance, I/O issues, database operations, or any other data that can indicate if a system is likely under attack. 

Too many users and/or operations teams discover security breaches when something occurs that is outside the norm. This might include CPU and I/O saturation as the result of a malicious binary that resides on some computer, or even on IoT-based systems such as appliances and vehicles. In the past, this resulted in a phone call to the security team, which did battle with the attacking systems. 

A better approach is to be proactive. Proactive measures require that security systems have visibility into more than the verification systems, including key metrics that may indicate an attack. AIOps has the potential to provide those metrics in real time.

Many enterprises will use AIOps tools to integrate with existing security systems to provide this visibility, and those on the leading edge now demand that this feature be included on tool developers' critical path. In addition to identifying the breach, in 2022 you can also look forward to AIOps tools that will carry out an automated fix. 

2. Performance operations improvements

Performance operations generally include activities to monitor the overall performance of systems such as CPUs, storage, databases, and applications. In the past, ops teams would get complaints from users about slow response times and then react. These days, it's more about spotting problems before users feel them and then fixing the issues using manual or automated processes.

Most AIOps tools can spot performance issues using a pre-defined set of limits that alert ops teams when an issue exists. However, most tools don't provide pre-defined processes as to how each performance issue should be corrected. In the world of operations, there could be hundreds of possible problems and fixes.

For example, say the database server's response time slowed from 0.3 seconds to 0.9 seconds. Because that time falls outside of its threshold, the AIOps tool generates an alert about the performance problem.
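A threshold check like the one in the database example can be sketched in a few lines of Python. This is only an illustration: the 0.5-second limit, the metric name, and the `Alert` and `check_threshold` names are all assumptions, not part of any particular AIOps product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    metric: str
    observed: float
    threshold: float


def check_threshold(metric: str, observed: float, threshold: float) -> Optional[Alert]:
    """Return an Alert when the observed value exceeds the threshold, else None."""
    if observed > threshold:
        return Alert(metric, observed, threshold)
    return None


# Assumed limit of 0.5 seconds; the 0.3 and 0.9 readings come from the example.
normal = check_threshold("db_response_seconds", 0.3, threshold=0.5)
alert = check_threshold("db_response_seconds", 0.9, threshold=0.5)
```

Here `normal` is `None` and `alert` carries the offending reading. Note that the check only says that something is wrong, not why.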

But now what? There could be hundreds of reasons that the performance of the database server dropped, both internal to the server and external, such as network issues. Finding that issue is the hard nut to crack for most AIOps technology providers and/or users who leverage automation to create processes that identify problems. 

In 2022, that conundrum will result in improvements around how to find performance issues and, more importantly, how to determine a root cause and automatically fix the problem. This is where the value of AIOps tools will truly shine. 

Event-driven operations already include alerting to system issues. Soon they will include the ability to carry out a sophisticated set of diagnostics to determine the cause and deploy an automated fix. The ops team will not even be involved with these processes. 
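One way to picture that flow is a runbook that pairs each possible cause with a diagnostic and a fix, and walks the list until a diagnostic matches. The Python sketch below is a simplified assumption of how such a loop might look; the telemetry keys, limits, and fix descriptions are hypothetical, and a real tool would run far more sophisticated diagnostics.

```python
from typing import Callable, Dict, List, Optional, Tuple

Telemetry = Dict[str, float]
# Each runbook entry: (cause name, diagnostic check, automated fix).
RunbookEntry = Tuple[str, Callable[[Telemetry], bool], Callable[[], str]]


def diagnose_and_fix(telemetry: Telemetry, runbook: List[RunbookEntry]) -> Optional[str]:
    """Apply the fix for the first diagnostic that matches the telemetry."""
    for cause, is_cause, apply_fix in runbook:
        if is_cause(telemetry):
            return f"{cause}: {apply_fix()}"
    return None  # no known cause matched; a human still has to look


# Hypothetical diagnostics and fixes, for illustration only.
runbook: List[RunbookEntry] = [
    ("network latency",
     lambda t: t.get("network_rtt_ms", 0.0) > 100.0,
     lambda: "rerouted traffic"),
    ("stale index",
     lambda t: t.get("index_fragmentation_pct", 0.0) > 30.0,
     lambda: "scheduled index rebuild"),
]

result = diagnose_and_fix(
    {"network_rtt_ms": 12.0, "index_fragmentation_pct": 45.0}, runbook
)
```

In this run the network check passes, the index check fails, and the index fix is applied. When nothing matches, the function returns `None`, which is the case where the ops team would still get the call.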

3. Auto-interface updates and fixes

If there is one thing that goes wrong with most old and new AIOps systems, it's the interface that deals with all systems under management. Sometimes these interfaces are cryptic APIs or out-of-the-box adapters that translate the differences between the systems that are part of the operating collection into how the AIOps tools deal with them. Or they might have the ability to deal with all kinds of system components, including cloud and non-cloud components, databases, applications, etc., in the same way.

When something goes wrong with the interface, AIOps users are typically forced to stop systems or update adapters or other interfaces. Most of these problems arise when platforms automatically install fixes and improvements that end up breaking the adapters and interfaces, for one reason or another. It's ironic that an AIOps tool, which can provide self-healing capabilities, can't yet self-heal. 

Many changes in 2022 will focus on the automation of updates and fixes to these interfaces. AIOps users and ops teams will no longer need to deal with the hundreds of interfaces the AIOps systems leverage. Those pain-in-the-neck tasks that made the ops team members' jobs even harder will go away. 

4. New governance integration

News flash: Most AIOps tools are not governance-aware. Some AIOps tools interact with security systems that allow or deny users access to things. However, governance is the ability to monitor how users and applications consume resources and to place limits on their usage. 

For example, there are governance systems that place limits on the use of APIs or the use of data or that enforce policies around regulatory compliance. Governance systems can monitor anything where systems and humans want to use something, but you want them to do so in pre-defined ways and include limitations on what they can do. 

One of the more common examples of this is cost governance, which is sometimes called financial operations or FinOps. Here a monitoring system, typically a public cloud system, can provide usage-based billing by user, division, company, and so forth. There are any number of ways a business might want to track cost consumption for both cloud and non-cloud systems.

AIOps tools will need to talk to governance systems because there could be operational aspects as to how things are governed. In the case of cost governance, outages and performance issues may also play a role in tracking. If a system is paid for by using a flat rate, subscription-based pricing approach and that system is down 30% of the time, then it may be fair to credit that percentage back to the users, the department, and/or the company for the downtime that is reported by the AIOps tool. 
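The arithmetic behind such a credit is simple. As a rough Python sketch, assuming a hypothetical flat fee of 1,000 per billing period and a downtime fraction reported by the AIOps tool (the function name and the fee are illustrative assumptions):

```python
def downtime_credit(flat_fee: float, downtime_fraction: float) -> float:
    """Credit the share of a flat fee that corresponds to reported downtime."""
    if not 0.0 <= downtime_fraction <= 1.0:
        raise ValueError("downtime_fraction must be between 0 and 1")
    return round(flat_fee * downtime_fraction, 2)


# A system that was down 30% of the time on an assumed flat fee of 1,000.
credit = downtime_credit(1000.0, 0.30)
```

With these inputs the credit is 300.0. The hard part in practice is not the formula but getting trustworthy downtime figures, which is where the AIOps tool comes in.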

Other values of integration with governance tools include the ability to report API issues or to let the data governance systems know when databases are down or have other issues. 

5. Nested, specialized knowledge bases

While knowledge bases are a net-new feature for most AIOps players, leveraging a single generalized knowledge base is just not good enough for most enterprises. What's on the horizon is the concept of nested knowledge bases, or general repositories that can leverage other specialized information for a more structured approach to finding and solving operational problems. 

Let's use the poorly performing database server from the earlier example. The AIOps tool may leverage a generalized knowledge base to determine that a performance problem exists by using both user-defined thresholds and a larger set of stored experiences that provide the "knowledge" to the AIOps tool. 

A separate knowledge base can store and retrieve information about a specific system such as a networking switch, cloud-based storage systems, applications, or, in this case, a poorly performing database server. 

The idea is to determine the root cause of issues that are related to specific system components and store complex knowledge related to each component. In the case of our poorly performing database server, the knowledge base could provide the experiences showing that most of the performance problems for that specific database server are related to the indexing system and to look there first for a root cause of the performance problem. 

Perhaps it could automatically download and install a fix that nobody has bothered to install yet. Or it could be one of 400 other things the system needs to consider. 
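To illustrate the nesting idea, here is a minimal Python sketch in which a general knowledge base holds specialized knowledge bases keyed by component and consults the specialized one first. The `KnowledgeBase` class, the component name `db-server-01`, and the hints are all hypothetical; no vendor's actual design is implied.

```python
from typing import Dict, List, Optional


class KnowledgeBase:
    """A knowledge base that may nest specialized ones, keyed by component."""

    def __init__(
        self,
        name: str,
        hints: Dict[str, List[str]],
        children: Optional[Dict[str, "KnowledgeBase"]] = None,
    ) -> None:
        self.name = name
        self.hints = hints              # symptom -> places to look
        self.children = children or {}  # component -> specialized knowledge base

    def suggest(self, component: str, symptom: str) -> List[str]:
        """Return component-specific hints first, then general ones."""
        suggestions: List[str] = []
        child = self.children.get(component)
        if child is not None:
            suggestions.extend(child.hints.get(symptom, []))
        for hint in self.hints.get(symptom, []):
            if hint not in suggestions:
                suggestions.append(hint)
        return suggestions


# Hypothetical contents: experience says this server's problems start in indexing.
db_kb = KnowledgeBase("db-server-01", {"slow_response": ["check indexing system"]})
general_kb = KnowledgeBase(
    "general",
    {"slow_response": ["check network latency", "check CPU and I/O saturation"]},
    children={"db-server-01": db_kb},
)

steps = general_kb.suggest("db-server-01", "slow_response")
```

For the database server, the indexing hint comes first, followed by the general checks. A component with no specialized knowledge base simply falls back to the general hints.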

While this level of AIOps is a bit of science fiction for now, most of those who leverage AIOps systems want (and soon the general market will demand) tools that provide general knowledge around operations, as well as the ability to focus on specific components. AIOps tool providers could sell these specialized knowledge bases as an option so you don’t have to build these repositories or replicate work that someone else has already done. 

More knowledge is good

Any one of the five advances reviewed above could have a profound effect on the ways you design and implement AIOps within your enterprise. Users and management should understand what is likely to happen with AIOps in the near term, whether or not you've deployed your AIOps tool kit yet. It's good to know what's coming.
