We see so many disillusioned customers coming to us thinking that the “cloud” is unreliable and expensive because of poor experiences with our competitors. What they don’t know is that they haven’t been using a cloud. It occurred to me that in the context of the contact center industry, there is still a lot of confusion in the market about the term “cloud”.
How can you quickly and easily distinguish between a hosted infrastructure and a cloud infrastructure? How do they differ and, most importantly, why should you care?
Let’s first deal with why you should care. If you are not using a cloud contact center, expect downtime and disruption. If your contact center isn’t mission critical, then you don’t have a problem. But if uptime, the ability to communicate with your customers and the productivity of your staff are important, you need to be able to spot the fake “cloud”.
Let’s look at the hosted model, which is often incorrectly called cloud. Most of us will be familiar with the old way of buying contact center technology and installing it on a server in our server room. A hosted solution works the same way. The only difference is that the equipment is hosted in a data center remote from the location where the agent consumes the service. The agent is connected to the hosted data center via a dedicated data connection.
Many established suppliers provide customers with a hosted solution that operates on exactly this principle. There is a problem, however. A hosted solution has more points of failure than an on-premises solution, so to provide resiliency, the supplier will offer a failover data center to manage any incidents. If the primary data center runs into problems, the supplier can migrate your services to the second data center. The problem with this approach is that, for many reasons, it doesn’t work.
For example, how long will it take to complete the process of moving you from the first data center to the second data center? In other words, how long will your contact center be down while your supplier moves you? The technical term that describes the service level for this process is Recovery Time Objective (RTO). The fastest we’ve seen available in the marketplace (other than our own) is around 15 minutes. The average is 1-4 hours. That’s 15 minutes of downtime in the best case, and 1-4 hours on average, built into a hosted solution. And remember, these are service level targets: the best your supplier thinks they can do. Often, they don’t achieve these timings. No wonder customers are frustrated! What would it mean to you for your contact center to be down for 5 minutes, let alone possibly hours? Yet we still see decision makers elect this route for their contact center…
The second challenge presented by the hosted solution takes place after moving services from the primary data center to a secondary center. The moment the move is completed, services will more than likely roll back to a previous version that was backed up from the first data center to the second data center. How old is the backup of your service? We know contact centers operate in real time, constantly recruiting, assigning and swapping agents between groups and skill sets, building and adding new contact queues and adding or changing IVR announcements and options.
The trouble with a second data center is that, other than at Cirrus, backups typically don’t happen in real time. They take place overnight or on a schedule, timed for when the infrastructure isn’t under heavy load. So, when your services are moved from the first to the second data center, how old is the version going to be? Will the changes you made this morning, yesterday or, in extreme cases, this week or even this month be there when you fail over? The technical term for this measure is the Recovery Point Objective (RPO). Most suppliers will typically quote an RPO in hours, ranging from 4 to 8 hours.
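To make the RPO concrete, here is a minimal sketch that lists which configuration changes would be missing after a failover, assuming a single overnight backup. The function name, the timestamps and the change descriptions are all invented for illustration and do not describe any particular supplier’s system.

```python
from datetime import datetime


def changes_lost_on_failover(changes, last_backup, failure_time):
    """Return descriptions of changes made after the last backup, up to the failure.

    `changes` is a list of (timestamp, description) tuples.
    """
    return [
        description
        for timestamp, description in changes
        if last_backup < timestamp <= failure_time
    ]


# Hypothetical day: the backup ran at 02:00 and the primary data center failed at 14:30.
last_backup = datetime(2024, 1, 10, 2, 0)
failure_time = datetime(2024, 1, 10, 14, 30)
changes = [
    (datetime(2024, 1, 9, 16, 0), "added new sales queue"),
    (datetime(2024, 1, 10, 9, 15), "created logins for three new agents"),
    (datetime(2024, 1, 10, 11, 40), "updated IVR opening-hours announcement"),
]

lost = changes_lost_on_failover(changes, last_backup, failure_time)
data_loss_window = failure_time - last_backup  # how far back the failover rolls you

print(lost)
print(data_loss_window)
```

In this made-up scenario the queue added the previous afternoon survives, but the new agent logins and the IVR update made that morning do not, which is exactly the rollback described above.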
The RTO pain you will experience is obvious. However, RPO pain is equally significant. When you are at a point of crisis in your contact center, a poor RPO results in confusion and even more frustration, as you will have agents who can’t log in, queues that don’t reflect your current operation and an IVR that is playing callers the wrong information. Therefore, your actual recovery time extends beyond what is quoted by your supplier, because you have rolled backwards. This is the challenge and risk involved in having a hosted environment. It is a known problem with the hosted model, which is often confused with a cloud model by many, including suppliers and customers who don’t know how to tell the two apart.
What is the difference between a cloud and a hosted model?
Let’s start with, ‘what is a cloud’. By way of example, when you search Google, you as the user are oblivious to which server is processing your request, what type of traffic it’s handling and where your request will be contained and processed, simply because you don’t need to know. Google manages the entire infrastructure in the background for you. You may go to one of several different locations, depending on the infrastructure Google is providing at that point in time. When was the last time you tried to search Google and couldn’t because Google was unavailable? This is an example of a cloud.
In a contact center context, the difference between a hosted contact center and a cloud contact center is 1) how many data centers the agent connects to and 2) how the agent gets connected. In a cloud contact center model, the agent is connected to multiple data centers simultaneously. For example, when an agent signs into their Cirrus service, the software connects to six servers at three different data centers, using six different internet service providers, at the same time. Calls, emails, chats, and social media contacts are served to the agent via all three data centers. There is no single point of failure. However, being connected to six servers at three different data centers that aren’t aware of each other would cause a problem, so in the background, we replicate all data across all servers and all data centers within 0.1 of a second. That means that your services and your data are backed up to three locations in less than a second. All changes, updates, new agents, queues and IVR options are replicated in under a second. With Cirrus, your RPO service level is therefore 60 seconds! This is “Cloud Contact Centre” and it is unique to Cirrus.
No other “Cloud” contact center supplier offers triple live data center infrastructure. Some suppliers claim to provide the service on a dual live data center basis, or “Live-Live”. The agent connects to two data centers simultaneously, and calls, emails, web chat, SMS, and social media contacts route into both data centers and then down to the agent. The challenge with this Live-Live topology is that, when a supplier has two data centers servicing a customer base, they need to ensure that neither data center is ever more than 50% utilized, so that either one can take on the other’s load in a failover. If they attempt to move more than 50% of capacity from one data center to the other, the surviving data center will be oversubscribed and won’t be able to accommodate the load. The entire customer base is then affected and you will experience downtime. It’s easy for a supplier to plan to manage capacity so that neither data center is ever utilized more than 50%, but we live in the real world, where big orders land and need to be fulfilled, and it can take time to scale a data center up when that happens. So inevitably, there are occasions when a data center will be oversubscribed, and the risk of downtime is therefore very real with the Live-Live model.
Live-Live presents a challenge for maintenance as well. If the two data centers are operating at more than 50% utilization, how do you take one down to perform maintenance tasks? The supplier can’t move the traffic to the other data center, as it would be oversubscribed. That is why Cirrus operates a triple live or “Live-Live-Live” architecture. It’s also why our last Priority 1 incident was in January 2014.
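The 50% rule can be checked with some simple arithmetic. The sketch below is a simplified model, not a description of how Cirrus or any other supplier actually plans capacity: it assumes every data center has the same capacity and that a failed site’s load is spread evenly across the survivors. The function names and the utilization figures are hypothetical.

```python
def max_safe_utilization(site_count):
    """Highest per-site utilization at which the other sites can absorb one lost site.

    Assumes equal-capacity sites and an even spread of the failed site's load.
    """
    if site_count < 2:
        raise ValueError("need at least two sites to fail over")
    return (site_count - 1) / site_count


def survives_single_failure(utilizations):
    """Return True if losing any one site leaves every survivor at or below 100%."""
    survivors = len(utilizations) - 1
    if survivors < 1:
        return False
    for failed_index, failed_load in enumerate(utilizations):
        extra_per_survivor = failed_load / survivors
        for index, load in enumerate(utilizations):
            if index != failed_index and load + extra_per_survivor > 1.0:
                return False
    return True


print(max_safe_utilization(2))                  # ceiling for Live-Live
print(max_safe_utilization(3))                  # ceiling for Live-Live-Live
print(survives_single_failure([0.6, 0.6]))      # two sites at 60% each
print(survives_single_failure([0.6, 0.6, 0.6])) # three sites at 60% each
```

Under these assumptions, two sites running at 60% cannot cover for each other, while three sites at 60% can still lose one, or have one taken down for maintenance, without oversubscribing the rest.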
In closing, how can you quickly determine the infrastructure model your supplier offers? There are three simple questions you can ask any potential supplier.
1) “What is your Recovery Time Objective?”
2) “What is your Recovery Point Objective?”
If the answer you get to either of the above is anything longer than 60 seconds, you know that the supplier is providing you with a hosted environment.
The final question you should ask is:
3) “What are your downtime windows for planned maintenance?”
If the answer is anything other than “None”, then you will know that the supplier is operating two or fewer data centers in the infrastructure supporting you.
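If you are comparing several suppliers, the three questions can be turned into a simple checklist. The sketch below only encodes the rules of thumb from this post (the 60-second threshold and the “None” answer on maintenance windows); the function name and the sample answers are hypothetical.

```python
def assess_supplier(rto_seconds, rpo_seconds, has_planned_downtime):
    """Apply the three-question rule of thumb and return a list of warning signs."""
    findings = []
    if rto_seconds > 60 or rpo_seconds > 60:
        findings.append("RTO or RPO over 60 seconds: likely a hosted environment")
    if has_planned_downtime:
        findings.append("planned maintenance downtime: likely two or fewer data centers")
    return findings


# Hypothetical supplier quoting a 15 minute RTO, a 4 hour RPO and a monthly maintenance window.
for finding in assess_supplier(15 * 60, 4 * 60 * 60, True):
    print(finding)
```

An empty list means the supplier’s answers passed all three questions; it says nothing about whether those answers are honoured in practice.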
To sum all of this up, I’m reminded of a comment from Jon Dawson, Cirrus’ Operations Director, made six months after he joined Cirrus from a competitor in 2015. He said: “At Cirrus, I don’t have conversations with customers about downtime. The Cirrus solution just works. That leaves me and my team free to work with customers on continuous improvement and transformation in their contact centers. I’ve never had that anywhere else I’ve worked, and it’s amazing what a difference it makes”.
This week’s blog post was written by Jason Roos, CEO.