I Don’t Know What I Don’t Know

What happens when the IT person doesn’t know where the source of a system failure is? What happens when that same IT person has to go and explain to an executive that the source of the problem cannot be found quickly, due to an organically-evolved legacy system or diagnostic tools that cannot reveal what is truly happening under the hood? What kind of answer can be given when the boss asks why we didn’t see this coming?

Such scenarios are enough to make most IT managers wake up at 3 a.m. in a cold sweat.

Minutes count, but seconds count even more. A sudden surge in social media activity, for example, might place greater demand on a system which, if properly balanced and prepared, would likely be able to handle the increased load. But if a monitoring system is slow, outdated, or vague, or if it does not provide advance alerts based on current needs and past performance, then the crash comes first and a frenzied diagnosis follows.
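To make the idea of alerting on past performance concrete, here is a minimal Python sketch. It is not taken from any particular monitoring product: the function name `exceeds_baseline`, the three-standard-deviation threshold, and the request counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def exceeds_baseline(history, current, k=3.0):
    """Return True when `current` is more than k standard deviations
    above the mean of the historical readings (hypothetical rule)."""
    if len(history) < 2:
        # Not enough past data to estimate a spread, so do not alert.
        return False
    return current > mean(history) + k * stdev(history)

# Hypothetical requests-per-second readings from earlier intervals.
history = [120, 130, 125, 135, 128, 122, 131]

print(exceeds_baseline(history, 129))  # a reading close to the usual load
print(exceeds_baseline(history, 400))  # a reading during a sudden surge
```

A real system would likely account for time of day, trends, and seasonality; the point here is only that an alert can be derived from past readings rather than from a fixed limit.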

This is not a healthy situation for either humans or machines, which is why many admins prefer to maintain an active state of overall awareness, using a combination of monitoring and pattern-catching technologies to identify and resolve problems before they turn into SNAFUs.

This approach sounds like common sense, of course, but many admins discover too late that the monitoring systems they have in place are not up to the task of delivering real-time, proactive information. Furthermore, they discover that upgrades would prove too costly per machine.

So instead, the IT department resorts to overprovisioning: building in extra space or load-bearing capacity just in case, with the resulting increase in cost either passed on to the customer or simply eating into profits. (The concept of overprovisioning is covered in an earlier CloudTweaks post.)

According to Eric Anderson, CTO and co-founder of Austin, TX-based CopperEgg (www.copperegg.com), the tools that most organizations currently have are not built for the dynamic nature of the cloud and the new way developers are building applications. He states that “admins need to be ready for a new wave of application architectures that do not fit well with traditional monitoring and performance tools.”

Specifically, Anderson refers to granularity. “Admins are feeling more pressure to see more fine-grained visibility into the applications they are keeping alive,” he says, “and their systems are becoming more complex, driving them to need higher frequency and more detailed tools to bring clarity when there are application level issues.”

Anderson pointed out that one of the best things he had ever done on this front was to show a client a screenshot contrasting 5-minute, 1-minute and 5-second granularity in monitoring: “the key is that with finer grained resolution you don’t miss the details.” He points to an example in which, at 1 minute, a 20-second spike in CPU will get washed out to a little bump on everyone else’s monitoring system, but with CopperEgg, at 5 seconds, you see it. “Let me ask,” he says, “if you sit at a website for 5 seconds and nothing happens, do you wait? I don’t. I bounce. So why is monitoring that same service at 1 minute acceptable? It isn’t. You need monitoring at the granularity your customers care about. And that’s seconds, not minutes.” That, he says, is one of the ways to find what you need to know about your systems.
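The washing-out effect Anderson describes can be reproduced with simple arithmetic. The Python sketch below is an illustration only: it assumes a monitoring tool reports the average of one-second CPU readings over each interval, which may not match how any given product (CopperEgg included) aggregates data, and the readings themselves are made up.

```python
def downsample(samples, window):
    """Average consecutive fixed-size windows of 1-second samples."""
    averaged = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

# Hypothetical data: 5 minutes of 1-second CPU readings at 10%,
# with a 20-second spike to 100% starting at second 60.
cpu = [10.0] * 300
for second in range(60, 80):
    cpu[second] = 100.0

peak_5s = max(downsample(cpu, 5))    # highest 5-second average
peak_1m = max(downsample(cpu, 60))   # highest 1-minute average
peak_5m = max(downsample(cpu, 300))  # the single 5-minute average

print(peak_5s, peak_1m, peak_5m)  # prints 100.0 40.0 16.0
```

Under these assumptions, the same 20-second spike shows up as 100% at 5-second granularity, 40% at 1-minute granularity, and 16% at 5-minute granularity, which is the "little bump" from the example above.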

CopperEgg offers a free trial of its monitoring and optimization solutions.

By Steve Prentice

Post Sponsored By Copperegg