Can Machine Learning Prevent App Downtime?

Business users expect immediate access to data, all the time and without interruption. But reality does not always meet expectations. IT leaders must constantly perform intricate forensic work to unravel the maze of issues that impact data delivery to applications. This performance gap between the data and the application, the app-data gap, creates a bottleneck that hurts productivity and ultimately damages a business's ability to operate effectively.

Storage, including both hardware and software issues, is normally the first suspect when identifying the culprit for this performance gap. But problems can also result from configuration issues, interoperability issues, and failure to follow best practices.

Point monitoring and troubleshooting solutions typically show only a slice of the problem, or how a problem relates to just one part of the infrastructure stack. Predictive analytics techniques that have visibility across the entire stack, by contrast, can identify issues no matter where they originate.

Infrastructure solutions should utilise data science and machine learning

To boost performance and significantly reduce the chances of downtime in the environment, companies should change how they evaluate key infrastructure products. Evaluating solutions based solely on speeds and feeds or price is no longer adequate. Nor is it sufficient to rely on traditional models of infrastructure reliability and high availability, which depend primarily on redundancy of each component but do little to ensure that all components interoperate correctly.

Companies should instead validate solutions that utilise machine learning and predictive analytics to provide the following capabilities:

  • Downtime prediction
    Infrastructure must be able to predict potential causes of slowness and downtime well before they occur.
  • Automatic downtime prevention
    Once predicted, tools should be able to prevent the adverse situation automatically through machine learning. Traditional infrastructure comes with reactive monitoring, which provides little relief other than flagging the problem.
  • Prescriptive resolution
    For the rare occasion where the infrastructure cannot automatically prevent an issue, it should lead to a clear and prescriptive resolution. The days of searching online forums and calling support to help resolve issues, together with the long delays involved, are over. Those approaches cost productivity and significantly slow down time to resolution.
  • Rapid root-cause analysis
    For rare occasions where no automatic prescription is available, the infrastructure should rapidly identify the root cause so that the problem can be quickly resolved. Traditional root-cause analysis involves numerous cycles of troubleshooting, problem recreation, log capture, and engineering analysis, adding up to weeks of time and frustration.
  • Cross-stack application of analytics
    The predictive analytics capability should include the knowledge and the ability to collect information across the infrastructure stack. If a product is not analysing interactions across the ecosystem, it is missing out on a big part of the picture and a major cause of the app-data gap.
  • Analytics-driven tech support
    Advanced analytics can eliminate the need for frontline, level-1 and level-2 support engineers. Frontline engineers spend most of their time documenting the issue, collecting data, and performing initial triage, all of which can be automated through predictive analytics. For the small percentage of problems that require an engineer, a customer can immediately reach a level-3 engineer who has pre-collected telemetry to rapidly resolve even the most complex issue.
  • Measured availability metrics
    Availability should not be a theoretical number based on a system's design. Rather, it should be measured in real-world environments across an entire customer base.
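
To make the last point concrete, here is a minimal sketch of how a measured availability figure could be derived from observed outage records across a fleet, rather than from a design specification. The record layout, system names, and numbers are hypothetical and purely illustrative; they are not taken from the whitepaper or from any vendor's tooling.

```python
from dataclasses import dataclass


@dataclass
class SystemRecord:
    """Observed figures for one deployed system (hypothetical layout)."""
    name: str
    observed_hours: float   # hours the system was in service during the window
    downtime_hours: float   # hours of unplanned outage recorded in that window


def measured_availability(records):
    """Return fleet-wide availability as a fraction of total observed hours."""
    total = sum(r.observed_hours for r in records)
    if total <= 0:
        raise ValueError("no observed hours to measure")
    down = sum(r.downtime_hours for r in records)
    return (total - down) / total


# Illustrative numbers only, not real measurements.
fleet = [
    SystemRecord("array-a", 8760.0, 0.5),
    SystemRecord("array-b", 8760.0, 0.0),
    SystemRecord("array-c", 4380.0, 2.0),
]
print(f"Measured availability: {measured_availability(fleet):.5%}")
```

Weighting by observed hours, as this sketch does, means a system that has only been deployed for half the window contributes proportionally less to the fleet-wide figure than one that ran for the full window.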

Data science and machine learning, when used together in a predictive analytics solution, improve the performance and availability of applications by closing the performance gap. Employing leading-edge machine learning technologies to manage the infrastructure makes the business more productive, and it frees up IT to partner with the business on high-value initiatives.

We partner with Hewlett Packard Enterprise, one of the few vendors truly innovating with new technologies that can help with Big Data.

‘Can Machine Learning Prevent App Downtime?’ contains excerpts taken from the Hewlett Packard Enterprise whitepaper ‘Can Machine Learning Prevent Application Downtime’.

Download Whitepaper

Nimble Storage uncovers the true cause of application disruptions and slowdowns through installed-base learning. Find out more by downloading the whitepaper.
