On The Reliability of Voice Over IP (VoIP) Telephony

Sumalya Pal, Raviteja Gadde and Haniph A. Latchman

University Of Florida

Department of Electrical and Computer engineering

Abstract

Voice over Internet Protocol (VoIP) for telephony is

becoming important and widespread with the use of the

Internet for multimedia traffic and particularly voice

communication. The Public Switched Telephone Network

(PSTN) system is widely recognized as having set a very

high bar in expected telephony performance - an overall

availability of 0.99999 (“five nines”). For VoIP telephony

to replace traditional systems they should be able to

provide a comparable level of availability to the users as

on a PSTN system - a formidable challenge for VoIP

systems. This paper reviews the “five nines” availability

for PSTN systems and explores how emerging VoIP

systems may achieve this level of performance. We

propose a Kamailio and Freeswich combination (an

open-source SIP Router and VoIP application server,

respectively), operating in a Linux environment in

conjunction with a load balancer and ENUM to provide

high availability VoIP service.

Key Words: Alpine Linux, Availability, reliability,

redundancy, Five-nines, Kamailio, FreeSwitch, LVS,

Ultra Monkey, Virtual Servers. VoIP.

1. INTRODUCTION

IN THE world of traditional telephony, the providers of

the particular telephone services often quote a service

level of “five nines” or an availability of 0.99999 as a

standard signifying quality and assurance. This is

sometimes used as a bragging point about their individual

products and service offerings. It also means that any new

competitors in the same field have to achieve this goal in

order to compete effectively. Hence emerging Voice over

Internet Protocol (VoIP) telephony – both for long

distance or toll calls, as well as for local inter- or intra-

office calls (an alternative for traditional telephone

switches or Private Branch exchanges (PBX‟s)) – faces

the “five nines” hurdle.

Reliability, a concept that generally deals with the

continuous operation of a service, depends on the

hardware and software elements of the system, while

availability, is a measure of fraction of time that the

service is usable. Reliability can be defined as the

calculated value, which represents how often the system

fails as compared to the percentage of time the system, is

available. Availability on the other hand depends largely

on the probability of the failure of a hardware component.

It‟s calculated by counting the number of components in a

system and then calculating the overall mean time between

failures (MTBF). It is represented by this formula [1]:

Availability = MTBF / (MTBF+MTTR)

Where MTTR = mean time to repair. Typically MTBF

increases as the number of components in the system

increases. The most important factor is the MTTR [1],

which literally decides the availability of a certain IP

telephony connection. For example let‟s take MTBF =

400,000 Hours and MTTR = 1 Hour. Then the

Availability will be approximately 99.9997%, which is

excellent from any industrial standard. But let‟s take

MTTR to be 96 hours then the Availability will be 99.7%

approximately. .

Traditional telephony achieved the impressive “five-

nines” level of availability mainly from hardware

redundancy and reliability of components. Several

organizations have been working towards making the

availability of IP telephony approach that of the PSTN.

Some of the most prominent IP telephony companies

working towards this goal are Cisco, ShoreTel, Nokia and

AT&T.

In Section 2 we explore in greater detail the

implications of the nines‟ criteria, while in Section 3 we

will discuss some reasons why VoIP still lags behind the

PSTN when it comes to availability of the communication

system. Section 4 presents several recent positive

developments in IP telephony systems from some of the

companies working in this field and in Section 5 we

examine the VoIP frameworks that are being created for

providing high availability. Section 6 identifies some of

deficiencies in present approaches to VoIP systems and in

Section 7 we propose a framework for highly availably

VoIP systems based on open-source and an Internet

inspired approach. Section 9 presents our conclusion and

anticipates further work towards high availability open-

source based VoIP systems.

2. THE NINES’ CRITERIA

There are different figures of „nines‟ that are used in

demarcating the availability of a device. The one nine

represents a 9.0 % availability, which translates on an

annual basis to 332 days of downtime in a year, which

means that the system will function properly on average

for only one month in a year. The two-nine reliability or

99%, availability implies 3 days and 15 hours seconds of

downtime in a year. The three-nines or 99.9% availability,

signifies downtime of eight hours and forty-six minutes in

a year. The four-nines criteria or 99.99% availability

means that there will be a downtime of fifty two minutes

and thirty six seconds in a year. Now in today‟s age of

telecommunications the five-nines criteria has become the

holy grail of telecommunications industries.

Five nine criteria is generally used to represent a target

availability figure. Some people even go to the extent of

calling it a catchall since with high availability we get

high reliability and vice versa. Five-nine availability (or

99.999%) means a downtime of five minutes and fifteen

seconds or less per year. The PSTN service is now

shooting for six nines availability, which means an

availability of 99.9999%. This means a downtime of

thirty-two seconds or even less, per year.

3. REASONS WHY VOIP STILL LAGS

BEHIND PSTN

Consumers expect same level of reliability as they expect

from traditional PSTN service in VoIP. PSTN services

have matured into a highly reliable system providing

99.999% of availability. It has been noted that the PSTN

system can be characterized as an upper bound of

dependability that a distributed computing system can

achieve.

The major factors that affect the VoIP system are network

outages and SIP server outages. The service availability of

the system drops to 98% when network outages interrupt

calls [12]. In the following table we can see the service

availability of VoIP systems in various networks.

Network/path type

Call success probability

All

99.53%

Internet2

99.52%

Internet2+

99.56%

Commercial

99.51%

Domestic (US)

99.45%

International

99.58%

Domestic Commercial

99.39%

International Commercial

99.59%

Table 1 CALL SUCCESS PROBABILITY ON FIRST CALL

ATTEMPT WITH RESPECT TO NETWORK/PATH TYPE [12]

Also Packet Loss causes considerable quality

degradations for the users, which can be equated to a

dropped call. Even when the system has packet loss below

0%, it was observed that service availability that can be

achieved is a maximum of 97.7%. [12]

Network Outages are not a fleeting occurrence but a

common place in present Internet. The advantage the

PSTN is that once a call is established it is ensured quality

but on other hand Internet provides a best effort service.

There is also a overall call abortion probability of 1.5%

giving 98% service availability which is falls far short of

achieving 99.999% availability [12]. Given all the above

reasons we can see that achieving five-nines in a normal

Internet conditions is difficult.

Many VoIP systems often feature a central Session

Initiation Server (SIP) or other type of VoIP Server.

Server outages may occur because a software problem or

a failure in the server hardware. To achieve Five Nines

availability the system needs to have MTBF of 2,400,000

hours and MTTR of 24 hours. Even if MTTR is reduced

to 4 hours, we need to have a MTBF of 400,000 hours.

The state of the art system provides a MTBF of 100,000

hours and MTTR of 24 hours [1].

This means we cannot design a high availability system

with a single component. Highly available systems are

stand-by. When the primary fails, the system falls back to

secondary. A similar approach is needed for high

availability VoIP.

4. IP TELEPHONY RELIABILITY

FRAMEWORKS

Accessibility, continuity and fulfillment are the main

factors in IP telephony. Accessibility is the ability to

initiate a voice call when desired; continuity is the ability

to finish the call successfully without jitter upon

successful access; and fulfillment is the desired call

quality by the customer upon successfully establishing the

call [3].

The basic function of IP network is to carry data and

traffic without experiencing disruptions due to increased

delay and packet loss. IP backbone routers inherently lack

carrier grade reliability [3]. In order to overcome this

weakness in IP telephony networks, a complete redundant

connection between backbone routers, which are grouped

in pairs, needs to be established.

A network architecture having a fully redundant

backbone guarantees network stability and minimum

delay for the rerouted traffic. This is because in the event

when one backbone router fails, the other backbone

routers, which are grouped together, will be able to

reroute any traffic. However such network architecture

involves high expenditure, which is a tradeoff that the

architecture imposes for high network availability. In

essence the high availability is to a large extent dependent

on the design of the network infrastructure.

The architectures that we use for network applications

follows IP distribution routing protocols like that of OSPF

(open shortest path first). Under such conditions the

rerouting delay is pretty high, and so these are not ideal

for robust Internet telephony applications. Cognitive

networking is becoming a necessity and newer ways to

reroute traffic are already under research and

development. One such protocol is MPLS (multiprotocol

label switching). In this protocol architecture, rerouting

paths are assessed in advance and whenever there is a

failure in network, immediately this alternate path is

invoked [6].

However further investigation is needed to verify that

such protocols can be used in a more cost effective way.

This is because it might not be possible to implement such

fast routing protocols in every class of networks due to

cost or may be due to security considerations.

5. AN EVALUATION OF THE

APPROACHES TOWARDS “FIVE-NINE”

PERFORMANCE FOR VOIP

There have been several attempts by industries and

academic institutions to find a robust and cost effective

mechanism for making VoIP five-nines available. Some of

the approaches include server redundancy (1+1)

mechanism by industries, software based approaches

using SIP protocol and different software tools like Ultra

Monkey [25], Piranha and other similar technologies

based on Unix based systems or even using peer-to-peer

(P2P) as a mechanism for effective VoIP usage.

In the following discussion we will evaluate these

approaches as to how effective they are in the search for a

five-nine-availability VoIP solution.

Session Initiation Protocol (SIP) based approach

a. The SIP Architecture

The Session Initiation Protocol (SIP) is a more

flexible and simpler solution compared to H.323

architecture [23], for handling multimedia sessions of

VoIP. SIP is defined by IETF signaling protocol as a

text based protocol (that uses UTF-8 encoding) which

incorporates several elements of the Hyper Text

Transfer Protocol (HTTP) and Simple Mail Transfer

Protocol (SMTP) and is widely used for controlling

multimedia communications such as voice and video

calls over Internet Protocol (IP). SIP is an application

layer protocol and is independent of the underlying

transport layer. SIP can run efficiently on

Transmission Layer Protocol (TCP), User Datagram

Protocol (UDP) and also on Stream Control

Transmission Protocol (SCTP). It uses port 5060 for

UDP as well as for TCP [24]. SIP acts as an enabling

protocol for VoIP telephony services.

Fig -2 Basic SIP architecture for VoIP

As an alternative to classical PSTN, SIP based

telephony services are being considered since it offers

a number of advantages over the former.

There are many impressive features of SIP that made

it a possible choice for VoIP. Some of these features

are:

1. SIP makes sure that the call made by the user

reaches its destination, without the restriction

of the respective party‟s location.

2. This takes into account that not all party (in a

conference call) can support all the features,

for example video. Hence negotiating feature

is highly regarded in SIP.

3. Call cancellation and call hold is completely

supported by SIP. A user can add more users

to the call or terminate any user. Call

transferring is also supported in SIP.

4. Flexibility in call features is one of the major

advantages of SIP. The user can start using

another program while maintaining the call.

For example a user can start using video

while on phone.

5. Less advanced devices can be incorporated to

make a call since SIP allows usage of several

different codecs, which enable negotiation of

the media in a call.

b. SIP availability

Wenyu Jiang and Henry Schulzrinne of Columbia

University [12] conducted an experiment in which

they set up a group of test clients in different

networks and simulated automatically random calls

between these clients. The overall service availability

as calculated by them was 0.98.

They stated that in order to calculate the correct

measure of availability in VoIP system, we must

consider the call abortion probability of the users.

They concluded that although SIP does not really

provide five-nines availability like the PSTN, it does

come very close to mobile networks‟ availability,

which is 0.97 to 0.99.

N+1 Redundancy approach

This method is followed by several

networking/hardware industries in order to make their

products five-nines available. Ciscos, Nokia,

ShoreTel are some of the companies who have

invested considerable amounts in this technique.

Cisco has made considerable advancement in

achieving five-nines availability in their IP telephony

systems. Taking a leaf from the definition of legacy

PBX connections, which over all neglects the

influence of non-redundant components of an IP

telephony system, Cisco claims to have theoretically

proven that their IP telephony solution meets the five-

nines standard. They tested their IP telephony

solution for hardware reliability, software reliability,

link/carrier reliability, power/environment reliability

and network design reliability. For this they largely

included the parts count method by Telcordia, which

is used, for MTBF calculations [2]. Hardware

reliability was assessed using the CI infrastructure

model with redundant Catalyst 6509 chassis, power

supplies and supervisor modules for Cisco Call

Manager access switches, which were all redundant in

nature [2]. Based on the estimated software forced

reloads the theoretical software reliability for the IP

telephony product was measured [2].

Cisco typically enforced N+1 redundancy technique

to their IP telephony environment.

The overall availability has been calculated using the

equation:

System Availability = π

i=1

= availability (i)

Where i = number of components from 1 to n. This

number is then multiplied by the availability of each

component to give the overall system availability

[2][4].

Now in case there is another failure while the

repair for the first failure is still under way then

calculation of availability of a redundant parallel

system has to be conducted. Cisco performed this by

using this equation:

Parallel availability = 1 – [ π

i=1

(1 – component

availability(i))] [2]

In essence Cisco, Nokia and Shore-Tell calculated

the parallel availability/reliability of each component

in a redundant system thus ignoring the non-

redundant parts of the system and then calculated the

overall system reliability by adding up the parallel

reliability of the redundant system.

This process of redundancy is very effective in five-

nines availability assessment of any system, but on

the flip side, it increases the cost of manufacturing the

system.

6. PROBLEMS WITH CURRENTLY

AVAILABLE APPROACHES IN

ACHIEVING FIVE-NINES

AVAILABILITY

The redundant mechanism followed by most industries in

order to raise the service availability of traditional VoIP

by making use of some redundant service hosts

coordinated by central load balancer, in which each

component is coupled with a stand by partner is not a very

cost effective mechanism.

Another approach is to make use of a layer-4

switch in addition to the available switching device

between SIP clients and a set of SIP servers to dispatch

service load without any modifications to the SIP service

components. An example of such an approach is the

development of a hardware-based mechanism like BIG-IP

[18]. Such an approach might be able to improve the

availability of a system but will cost a lot in

implementation.

Software based approach developed on a Linux

Virtual Server (LVS) [19], can achieve the required

availability, but may cause delay as the SIP messages

traverse through intermediate nodes.

There exists a Peer-to-Peer (P2P) approach [20]

[21] towards making VoIP more accessible and available.

In such a mechanism, there exists no centralized manager

and the peers themselves act as servers as well as clients.

However there are numerous security concerns with such

an approach. Any peer with malicious intent can alter the

SIP messages by other peers. Then again there is a

problem of anonymity, as the peers might not want to

share SIP messages, which can be read by other peers.

Such a mechanism also suffers from SIP message

transversal delays, which might cause additional packet

loss and poor connectivity.

7. KAMAILIO – FREESWITCH BASED

VOIP FRAMEWORK

Here we propose a mechanism for attaining five-nines

availability of VoIP telephony using RTP and SIP. The

basic setup consists of Kamailio or SIP Router [26],

which is an open source SIP server, and FreeSwitch [27],

which is an open-source telephony platform, designed to

facilitate the creation of multimedia-messaging (voice

video and text) driven products.

The basic setup consists of two or more Kamailio virtual

servers and one or more FreeSwitch virtual server running

on Linux platform. Each of the Kamailio virtual servers

has a pre-determined number of supportable clients based

on the CPU and memory limits. The X-Lite [28]

softphone or any number of SIP Compliant IP telephones

are used as the primary IP Telephony devices. Call are

routed from the clients to the Kamailio servers using a

distributed DNS-based ENUM (E.164 Number to URI

Mapping) system with priority settings. If at any moment a

particular Kamailio virtual server is down or running at its

full capacity then it will reroute the calls to other Kamailio

virtual server 2 or to the FreeSwitch virtual server(s)

which will primarily serve the purpose of voice mail and

music on hold, an automated attendant.

Fig 4: Flowchart of our proposed setup.

To give hardware redundancy, it is proposed to use Ultra

Monkey as a load balancer on the Linux Virtual Server

(LVS), which is essential for creating highly available

network services.

Fig 5 – Ultra Monkey setup.

Ultra Monkey is an open source project to provide

flexible high availability frameworks. This can be used for

developing both single server based high availability

systems as well as multiple servers based high availability

systems. It uses the Heartbeat protocol [29] to monitor

the servers.

Heartbeat is a protocol that monitors messages sent at

regular interval between two servers and at any point of

time if messages are not received from one server, it is

assumed that the server in question has failed and some

form of evasive mechanism is followed in order to rectify

this. Heartbeat protocol can send heartbeat messages over

both serial links and Ethernet interfaces. It uses an IPfail

plugin [30] that helps in determining which nodes should

be active.

Fig 6 – Ultramonkey with ipfail plugin [30]

After configuring Heartbeat, we designate a master node

which when Heartbeat starts up, designates an interface

for a virtual IP address which may be accessed by external

end users. Under any situation if this node fails then in

order to ensure that this machine receives all traffic bound

for this address, another node in the heartbeat cluster will

start up an interface for this IP address. This process is

carried out using Gratuitous ARP [31]. This process is

called IP failover takeover.

Ultra monkey makes use of Heartbeat to manage IP

addresses on the host on which Linux Virtual Server runs.

This also monitors the ultimate destination of a connection

made to a virtual service using IPaddr2 resource.

FreeSwitch is used as a voicemail, music-on-hold,

conferencing and automated attendant server. From Figure

6 we can see how the whole setup works. The Kamailio

servers if unable to handle the calls at any time will

redirect the calls to the FreeSwitch server, which will

either offer an appropriate service.

8. CONCLUSION AND IMPLICATIONS

The main hurdle in providing five-nines-availability over

VoIP networks is that a server failure at any one point,

shuts down the system for a certain amount of time in

order to recover from the failure. This typically decreases

the MTBF. If we can somehow can take the load off the

failed server and divide it to several working servers, the

MTBF will certainly increase; thus increasing the

availability of the VoIP network.

Ultramonkey and ENUM provides load balancing to the

servers using heartbeat and priority routing. Hence at any

time when there is a load on one server (in our case

kamailio servers), it will aptly reallocate the load to the

server, which is comparatively free. Heartbeat

continuously monitors the messages sent between the two

servers, and at any point when it finds that there has been

a communication break down, it takes evasive action to

rectify the failure.

A working model of the Kamailio-FreeSwich Utltra

Monkey and ENUM-based VoIP system has been built

and work is continuing on formal theoretical study of the

overall availability as well as on generating empirical

results as the system is scaled in terms of the number of

supportable clients as well as in geographical and server

count scope.

9. Reference

[1] Ensuring Reliability in IP Telephony – Shore Tel-

IP Telephony from A-Z e-book

[2] IP Telephony: The Five Nine Story – Cisco systems

white paper

[3] VOIP Reliability: A Service provider’s perspective

- Carolyn R. Johnson, Yakov Kogan. Yonatan Levy,

Farhad Saheban and Percy Tarapore, AT&T Labs

[4] High Availability Solutions For SIP Enabled

Voice-Over-IP Networks – Cisco Systems whitepaper.

[5] Exploring the challenges to powering the future as

telecommunications transitions to IP based networks –

Nicholas Osifchin International power strategies, USA

[6] IP Telephony: Reliability You can count on – Shore

Tel white paper

[7] Power and Cooling for VOIP and IP telephony

Applications – Viswas Purani

[8] Convergence: the business case for IP Telephony –

Bob Emmerson.

[9] VOIP and IP Telephony: Planning for convergence

in state government – Nascio Whitepaper

[10] Self-Admission Control for IP Telephony using

Early Quality Estimation - Olof Hagsand1, Ignacio

M´as1, Ian Marsh2, and Gunnar Karlsson1-1 Department

of Microelectronics and Information Technology Royal

Institute of Technology (KTH) S-16440 Kista, Sweden 2

Swedish Institute of Computer Science Box 1263 SE-164

29 Kista, Sweden

[11] IP Telephony Security: an overview -Cisco

Systems.

[12] Assessment of VoIP Service Availability in the

Current Internet –Wenyu Jiang, Henning Schulzrinne

[13] Design and Implementation of a Low Cost DNS-

based Load Balancing Solution for the SIP-based VoIP

Service - Jenq-Shiou Leu, Hui-Ching Hsieh, Yen-Chiu

Chen, and Yuan-Po Chi

[14] High-Availability Solutions for SIP Enabled Voice-

over-IP Networks –CISCO

[15] Design and Implementation of a High Availability

SIP Server Architecture- Diplomarbeit , Nils Ohlmeier

[16] Lessons from the PSTN for Dependable Computing

-Patricia Enriquez, Aaron Brown, David Patterson

[17] Network Performance Analysis of Internet

Telephony on SIP in ENUM Implementation - Yudha

Indah Prihatini, Adi Permadi, Wahyu Novian Condro

Murwanto, Rendy Munadi

[18] “BIG-IP” - http://www.f5.com/products/big-ip/

[19] “IP Virtual Server” -

http://www.linuxvirtualserver.org/

[20] D. Bryan, B. Lowekamp, C. Jennings, “A P2P

Approach to SIPRegistration and Resource Location,”

draft-bryan-sipping-p2p-02, IETF, March 5, 2006.

[21] D. Bryan, B. Lowekamp, C. Jennings, “SOSIMPLE:

A Serverless, Standards-based, P2P SIP Communication

System,”International Workshop on Advanced

Architectures and Algorithms for Internet Delivery and

Applications (AAA-IDEA), June 2005.

[22] http://www.networkdictionary.com/Telecom/VOIP-

Architecture-based-SIP.php

[23] SIP- Session Initiation Protocol - J. Rosenberg, H.

Schulzrinne , G. Camarillo, A. R. Johnston ,J. Peterson,

R. Sparks , M.Handley, and E. Schooler.

[24] http://www.voip-info.org/wiki/view/SIP

[25] http://www.ultramonkey.org/

[26] http://www.Kamailio.org/w/

[27] http://www.freeswitch.org/

[28] http://www.counterpath.com/x-lite.html&active=4

[39] http://www.linux-ha.org/wiki/Heartbeat

[30] http://www.ultramonkey.org/3/ipfail.html

[31] http://wiki.wireshark.org/Gratuitous_ARP