By Steve Walker
The industry trend towards Voice over IP (VoIP)-based PBXs is causing a shift in the technology underpinning the call center business. VoIP PBXs bring with them tremendous features and flexibility, but they also create some unique technical challenges. Since a VoIP PBX is essentially software running on a computer, how will you keep your agents active and phone lines up when your VoIP telephony environment goes down?
Computer problems are not a question of if, but when, and VoIP PBXs are no exception. VoIP PBXs may encounter problems on their own (for example, a hard disk failure), network problems may block them, or they may go idle in the event of a local VoIP carrier problem. (VoIP call centers on the East Coast of the United States will remember an extended outage of a particular carrier in November 2017.) Any one of these events (and more) can bring your VoIP telephony environment to a halt and idle your agents.
If you’re planning to deploy a VoIP-based PBX, you need to ensure that you implement high availability (HA). In simplest terms, HA means that if one PBX fails for any reason, another will rapidly take its place and restore telephony services. This is normally achieved through “clustering,” which means having a standby PBX ready to take over for the primary PBX if things go wrong.
If you ask your IT person about HA or clustering, you might get an answer well-suited to an office computer but not appropriate to a telephony environment. To design an HA solution suitable for a mission-critical telephony environment, you need to consider the following six criteria:
1. Autonomy
This criterion is the most important requirement when designing an HA telephony environment. It means that damage to or failure of one PBX in the cluster cannot negatively affect the others; they must be autonomous (share nothing). Simple or cheap solutions share hardware, software, and disk drives between primary and standby PBXs. But enterprise-caliber solutions, including those serving public safety answering points (PSAPs), must have fully autonomous cluster members. Make sure your clustered PBXs are fully autonomous.
2. Data Synchronization
The information held in the PBX must be kept consistent between the primary and standby PBXs in the cluster, so that either can take over for the other at a moment’s notice. Solutions that share data break the first rule of autonomy, but solutions that synchronize data are ideal. Look for a solution that synchronizes data, not one that shares a data storage device. Just as important, ensure that the PBXs will automatically turn off synchronization if one of them is in poor health. Synchronizing data that may have been corrupted by a failing PBX can damage the healthy one, taking the call center offline.
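To illustrate the idea of suspending synchronization when a PBX is in poor health, here is a minimal Python sketch. The PbxNode class, its healthy flag, and the synchronize function are hypothetical names invented for this example; they do not correspond to any particular PBX product, and a real solution would replicate far more than a simple dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class PbxNode:
    """Hypothetical stand-in for one PBX in the cluster."""
    name: str
    healthy: bool = True
    records: dict = field(default_factory=dict)


def synchronize(source: PbxNode, target: PbxNode) -> bool:
    """Copy records from source to target only when both nodes are healthy.

    Returns True if data was copied, False if synchronization was suspended.
    """
    if not (source.healthy and target.healthy):
        # Suspend synchronization so possibly corrupt data never spreads.
        return False
    target.records.update(source.records)
    return True


primary = PbxNode("primary", records={"ext-100": "alice"})
standby = PbxNode("standby")

synchronize(primary, standby)            # both healthy: data is copied

primary.healthy = False                  # primary starts to fail
primary.records["ext-100"] = "corrupt"   # and its data can no longer be trusted
synchronize(primary, standby)            # suspended: standby keeps its good copy
```

The point of the design is that each node keeps its own copy of the data, and the copy operation is gated on health rather than running unconditionally.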
3. Failure Detection
Simplistic HA solutions define failure as a black-or-white scenario (for example, a power outage affecting the building shuts down everything). But VoIP PBXs fail in their own unique ways. A software bug might prevent the PBX from connecting calls, or a memory error may prevent calls from reaching agents. Enterprise-caliber solutions require sophisticated health sensing and failure detection. This ensures that the PBX is running and telephony services are fully functional. Avoid solutions with simplistic failure detection.
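As a rough illustration of health sensing that goes beyond "is the machine powered on," the following Python sketch combines several independent probes into one verdict. The probe names (process_running, can_place_test_call, agents_reachable) are assumptions made up for this example, and the lambdas are stubs; a real deployment would replace them with checks against its own PBX.

```python
from typing import Callable, Dict

Probe = Callable[[], bool]


def evaluate_health(probes: Dict[str, Probe]) -> Dict[str, bool]:
    """Run every probe; a probe that raises an exception counts as a failure."""
    results = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    return results


def is_healthy(results: Dict[str, bool]) -> bool:
    """Healthy only if at least one probe ran and every probe passed."""
    return bool(results) and all(results.values())


# Stub probes: the PBX process is up, but it cannot connect a test call.
probes = {
    "process_running": lambda: True,
    "can_place_test_call": lambda: False,
    "agents_reachable": lambda: True,
}

results = evaluate_health(probes)
print(results)
print("healthy" if is_healthy(results) else "unhealthy: consider failover")
```

In this example a simple "is it running?" check would pass, while the combined check reports the PBX as unhealthy because calls cannot be connected.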
4. PBX Separation
While putting the primary and secondary PBXs side by side is convenient, it limits the magnitude of failures the cluster can withstand. Instead, you will want to place one PBX in your primary call center and the other far away, perhaps in a different state. That way, if you suffer a local or regional power or carrier outage, the backup PBX running far away can take over. Then agents can connect with mobile phones or work from home. Note as well that simplistic synchronization solutions break down when the two PBXs are placed far apart or when one is placed in the cloud. Therefore, make sure your synchronization solution can handle any degree of physical separation of the two PBXs.
5. Rapid Detection and Failover
Your call center will suffer immensely if it takes fifteen minutes for your PBX to detect that something went wrong, and it will suffer again if it takes another twenty minutes to switch to the backup. A lengthy outage may also put your call center SLAs (service level agreements) or contracts at risk. Ensure that your HA solution can rapidly fail over from one PBX to the other and that failure detection (health monitoring) can trigger a failover in under one second if things go wrong.
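One way to reason about the sub-second target is to treat detection time as roughly the heartbeat interval multiplied by the number of consecutive missed heartbeats tolerated before failover. The Python sketch below shows that arithmetic together with a simple miss counter. The FailoverMonitor class and the 200 ms and 3-miss values are illustrative assumptions, not figures from any specific product, and the estimate ignores network and processing delays.

```python
class FailoverMonitor:
    """Hypothetical monitor that promotes the standby after repeated misses."""

    def __init__(self, interval_ms: int, missed_limit: int):
        self.interval_ms = interval_ms
        self.missed_limit = missed_limit
        self.missed = 0
        self.active = "primary"

    def approx_detection_ms(self) -> int:
        """Approximate time from failure to failover trigger."""
        return self.interval_ms * self.missed_limit

    def record_heartbeat(self, received: bool) -> str:
        """Record one heartbeat interval; return which PBX is now active."""
        if self.active != "primary":
            return self.active  # already failed over
        # A received heartbeat resets the counter; a miss increments it.
        self.missed = 0 if received else self.missed + 1
        if self.missed >= self.missed_limit:
            self.active = "standby"
        return self.active


monitor = FailoverMonitor(interval_ms=200, missed_limit=3)
print(monitor.approx_detection_ms())  # 600 (milliseconds)

for received in (True, False, False, False):
    active = monitor.record_heartbeat(received)
print(active)  # standby
```

Requiring several consecutive misses trades a little detection time for fewer false failovers caused by a single dropped heartbeat.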
6. Encrypted Communications
If your call center handles personal health information (e.g., for a medical facility), then information contained in the PBX (such as voicemails) may be protected health information (PHI). Voicemails synchronized between the two PBXs may be deemed “ePHI in transit,” and sending them unprotected could violate rules pertaining to the protection of this information. Regulations like HIPAA in the USA, PHIPA in Canada, PDPA in Singapore, and so forth may impact your HA solution. You must ensure that communications between the two PBXs are encrypted to secure that information; this will also help protect the PBXs from internet hackers.
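As a small illustration of encrypting the link between the two PBXs, the following Python sketch builds a TLS client configuration using the standard library's ssl module. The function name make_sync_tls_context is hypothetical, and the choice of TLS 1.2 as the minimum version is an assumption for this example; how a given PBX product secures its synchronization traffic will differ, and this sketch is not compliance guidance.

```python
import ssl


def make_sync_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context for a hypothetical PBX sync link."""
    # The default context verifies the peer certificate and its hostname.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_sync_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

A context like this would then be used to wrap the socket that carries synchronization traffic, so that voicemails and other records are encrypted in transit and the standby only accepts data from a peer with a valid certificate.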
These six criteria define a minimum set of capabilities your HA environment must meet to ensure you maximize PBX uptime and maintain the productivity of your call center. Since VoIP PBXs are fundamentally software running on a computer, you will find a range of HA solutions from free and open-source (generic computer HA) to commercial products specifically for PBXs.
As you select your HA solution, evaluate your options using these criteria to find the solution that’s right for you. Don’t wait until your first VoIP PBX outage to start implementing a high-availability solution.
Steve Walker is the CTO at Telium, a manufacturer of telephony and telematics solutions specializing in VoIP.