Vici load balancing. Redundancy / Scalability

Postby artimus » Wed Mar 12, 2008 7:42 am

I'm looking into upgrading our current vicidial dialer. Our current setup runs on 2.0.3 with the following hardware:
1 Asterisk gateway (Trixbox) w/ 4 T1's. Dell 1750, 1 cpu.
2 MySQL Servers (Active/Passive)
1 Web Server
1 Vicidial Server. Dell 1750 w/ 2 3G Xeons and 4G Ram.

The above solution has proven to be unstable with 30 agents. We underestimated the load put on by the gateway and the dialer. Vici is an excellent project, and overall we are very happy with it. However, it is also extremely inefficient and needs some good hardware to support it. Under high load we get crossed calls, which invites some serious legal issues. I am currently running 15 agents, but would like to ramp it up over time to about 100 agents. My goal for the next solution is to build in redundancy and scalability. Ideally I would like to be able to add/remove vicidial servers without incurring downtime. I know this is probably an unreasonable request at this point in development, but what is the best solution I can put in place?


My Thoughts for the new setup are still a work in progress, but what I'm thinking so far is:
Gigabit Switch for dialer only traffic.
2-3 Asterisk gateways (may or may not be trixbox anymore) Each with 4 T1s.
2 MySQL Servers (Active/Passive)
2 Web Servers
2 Vicidial Servers.


The last part is where I'm not sure what I need to do. Is it possible to have two vicidial servers running together? Ideally we don't want them to work as two separate instances; they should work together, sharing lists and campaigns. In the worst case scenario we'll have to cut lists in half and split them between the two, which will cause several problems.

Any thoughts?
Slackware 12 - Linux 2.6.21.5 SMP
Asterisk 1.2.19
Zaptel 1.2.19 (ztdummy) - libpri 1.2.5 - spandsp 0.0.3
IAX2 trunk to trixbox on the same LAN.
VICI / astguiclient 2.0.3
artimus
 
Posts: 38
Joined: Wed Sep 19, 2007 9:54 am

Postby mflorell » Wed Mar 12, 2008 8:17 am

I am very confused how 15 agents on a 2 Xeon server doing only VICIDIAL/Asterisk could be overloading it. We have had 40 agents on similar systems with no issues.

The Trixbox gateway is one weak point since Trixbox has only half the functional capacity of a standard Asterisk server install on the same hardware.

Are you doing full recording?

What is your loadavg when you run into problems?

Are you using ULAW for calls and agents on the VICIDIAL server?

As for multiple VICIDIAL servers, they can function as one dialing system with no issues, dialing from the same lists and campaigns at the same time and sharing calls across each other depending on the next agent to receive the call.

MySQL master/slave is not too difficult to set up. Two web servers is a bit overkill for 30 agents.

I think you should try to figure out why your existing system is performing so badly first though.
mflorell
Site Admin
 
Posts: 18387
Joined: Wed Jun 07, 2006 2:45 pm
Location: Florida

Postby artimus » Wed Mar 12, 2008 3:49 pm

Currently our load avg is 0.18. We usually stay below 0.5 at all times. As the load gets higher with more agents, we notice that the cpu load becomes a cascading problem. When the load avg hits 2.0 we get major problems. We've been able to "manage" the problem by moving mysql and apache off the box. Moving apache made a huge difference. What I should also have noted was that we do not have an accurate gauge on what two processors can handle. Our vici box originally had only 1 cpu. We added the second at the same time that we reduced the number of agents.
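To put the figures above (0.18 idle, trouble at 2.0 on a 2-CPU box) in context, here is a minimal sketch that compares the 1-minute load average with the CPU count. The function name `load_status` and the threshold of 1.0 runnable process per CPU are assumptions chosen for illustration, not anything VICIDIAL defines.

```python
import os


def load_status(load1, cpu_count, warn_per_cpu=1.0):
    """Classify a 1-minute load average relative to the number of CPUs.

    warn_per_cpu is an assumed threshold: 1.0 means one runnable
    process per CPU on average.
    """
    per_cpu = load1 / cpu_count
    if per_cpu >= warn_per_cpu:
        return "overloaded"
    if per_cpu >= warn_per_cpu / 2:
        return "busy"
    return "ok"


if __name__ == "__main__":
    load1, _, _ = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    print(f"loadavg {load1:.2f} on {cpus} CPU(s): {load_status(load1, cpus)}")
```

With these assumed thresholds, 0.18 on two CPUs classifies as "ok" and 2.0 on two CPUs as "overloaded", which matches the point where the problems were reported.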

I'm not about to bash the vici project as we appreciate it very much. Clearly a lot of effort and dedication has gone into it. I was also very surprised by the number of features that were built into it (and actually work too). Unfortunately I do believe that the overall problem has to do with the way it was written (readline / screen / perl scripts).

I'm sure I'm also behind with bug fixes. I haven't patched it since the initial install, because we have entrenched a lot of changes and customizations.


I am aware of the overhead which trixbox causes, but that part we can scale. We keep a close eye on the performance. We are currently using ulaw, but we have tried forcing gsm. We noticed a lot of echoes with gsm.

I plan on throwing as many as 100 people on the dialer, which is partially why I want to be able to use more than one server. The other reason is that if there is a hardware failure, I don't want to have a call center full of people with no one on the phone.

How would you suggest to implement a second server?

Postby mflorell » Wed Mar 12, 2008 6:53 pm

I still do not understand why you are having problems with so few agents. We have set up dozens of systems with equal or less CPU resources that reliably handle at least double the number of seats you claim to have problems with.

When we moved the call logging to FastAGI we saw a 50% reduction in load, and when we move the inbound and outbound AGI scripts to FastAGI at some point in the future we expect about a 5-10% reduction in load as well.

As for Perl being a limiter, that has some validity, but only for the agi-VDADtransfer scripts under the current releases. The other scripts are constantly running, so they do not incur as much overhead and add little if any delay compared to compiled programs.

We have seen ztdummy as a cause for all kinds of problems on some systems. Switching to a different timing source usually fixes the issues.

Postby artimus » Wed Mar 12, 2008 10:32 pm

Unfortunately it is difficult to identify what "fixed" our problem as we acted on several things at the same time. We reduced the number of clients, added ram and a cpu, and also added a zaptel card at the same time. It's been a while since I looked closely at what we believed was the source of the problem, so unfortunately I cannot provide enough detail there.

Given what you have said:
Probably about 95% of our agents are using ulaw currently. It was our understanding that ulaw uses more bandwidth but less load, whereas gsm uses much less bandwidth but requires a little more cpu power.
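A rough sketch of the bandwidth side of that trade-off, using the nominal codec bit rates (64 kbit/s for G.711 ulaw, 13 kbit/s for GSM full rate). The figure of 96 concurrent calls is an assumption (4 T1s at 24 channels each), and the numbers cover audio payload only; IAX2/UDP/IP headers add more on top.

```python
# Nominal codec payload bit rates in kbit/s (packet headers not included).
CODEC_KBPS = {"ulaw": 64.0, "gsm": 13.0}


def payload_mbps(codec, concurrent_calls):
    """Payload bandwidth in Mbit/s for one direction of the given calls."""
    return CODEC_KBPS[codec] * concurrent_calls / 1000.0


if __name__ == "__main__":
    calls = 96  # assumed: 4 T1s x 24 channels
    for codec in CODEC_KBPS:
        print(f"{codec}: {payload_mbps(codec, calls):.3f} Mbit/s for {calls} calls")
```

Under these assumptions ulaw needs roughly 6 Mbit/s of payload per direction and gsm a bit over 1 Mbit/s. Either is small next to a dedicated LAN link, so on a local network the CPU cost of transcoding is likely the more important factor.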

I will have to look at my config when I get a chance to see whether we're using fastagi in the right places. I will try to clean up the config and post it tomorrow.

Please note that our current setup will allow for more than 15 agents. Although we can't accurately determine how many it can support, we believe the max is currently more than 30. The problem is that pushing the limit by 1 agent causes the system to go from a slightly high load average to right through the roof.

Postby mflorell » Wed Mar 12, 2008 10:43 pm

A couple of other things to do to lower your load:
- use a RAM drive for recordings, if recording all calls on outbound campaign, this can lower your loadavg by 90%
- use the 1.2.16.2 version of Asterisk and use the new enter/leave meetme sounds, yes this actually can lower your load slightly.
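For the RAM drive suggestion, a back-of-the-envelope sizing sketch may help. It assumes recordings are written as 8 kHz, 16-bit mono PCM (16000 bytes per second) and that a recording stays on the RAM drive only while the call is in progress; both are assumptions to check against the actual recording format and cleanup schedule, and `ramdrive_bytes` is a hypothetical helper name.

```python
def ramdrive_bytes(concurrent_recordings, max_call_seconds,
                   bytes_per_second=16000, headroom=2.0):
    """Estimate the RAM drive size needed for in-progress recordings.

    bytes_per_second defaults to 8 kHz 16-bit mono PCM (an assumption).
    headroom is a safety multiplier for files not yet moved to disk.
    """
    return int(concurrent_recordings * max_call_seconds
               * bytes_per_second * headroom)


if __name__ == "__main__":
    size = ramdrive_bytes(concurrent_recordings=30, max_call_seconds=600)
    print(f"{size / (1024 * 1024):.0f} MiB")
```

For 30 simultaneous recordings of up to 10 minutes each, this estimate comes to 576,000,000 bytes, or about 549 MiB.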

Postby gardo » Thu Mar 13, 2008 2:31 pm

With your current hardware, you should be able to support 30 agents without any issues. It's probably the way your Vicidial has been set up that is causing the challenges you're having. As Matt mentioned, you can create ramdrives to drastically reduce server load if you're doing full recordings. If your agents are on the same network as your Vicidial server, their phones should be using ulaw/alaw.

Below is one of our setups for 30 agents:

vicidial/asterisk server:
cpu: core 2 duo e6600
ram: 2 gig
hd: 80 gig sata/ide raid1
os: centos 5.1 64bit
vicidial: version 2.0.3/2.0.4

mysql/apache server:
cpu: core 2 quad q6600
ram: 4 gig
hd: 320 sata raid1
os: centos 5.1 64bit

We have full recordings enabled on ramdrive and ulaw as codecs. It's very stable and can be quickly scaled to more than 30 agents.
http://goautodial.com
Empowering the next generation contact centers
gardo
 
Posts: 1926
Joined: Fri Sep 15, 2006 10:24 am
Location: Manila, 1004

Postby artimus » Thu Mar 13, 2008 6:17 pm

I believe 30 "should" be ok for my setup.
We have a pretty stripped down kernel, v 2.6.21.5
Asterisk 1.2.19 on Slack 12
This was set up directly from the scratch install.

As I said mysql and apache are other boxes.
As for recordings, it looks like we do about 5 to 26 recordings a day, the largest being about 9Mb. It appears that recording isn't often enough to be our main issue.

top shows asterisk as #1, taking up to 20% cpu.
I do see several perl processes going defunct pretty often: ast_update, vd_hopper, etc. Please keep in mind that right now the system is behaving.
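One way to track how often processes go defunct is to count zombie entries in `ps` output over time. The sketch below parses the output of `ps -eo stat,comm`; the helper names are hypothetical, and it assumes the procps convention of a state code starting with `Z` and a ` <defunct>` suffix on the command name.

```python
import subprocess


def count_defunct(ps_output):
    """Count zombie processes per command name from `ps -eo stat,comm` text."""
    counts = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Zombie processes have a state code beginning with "Z".
        if len(parts) == 2 and parts[0].startswith("Z"):
            name = parts[1].replace("<defunct>", "").strip()
            counts[name] = counts.get(name, 0) + 1
    return counts


def current_defunct():
    """Run ps and return the zombie counts for the live system."""
    result = subprocess.run(["ps", "-eo", "stat,comm"],
                            capture_output=True, text=True, check=True)
    return count_defunct(result.stdout)
```

Calling `current_defunct()` from cron every minute and logging the result next to the load average would show whether the defunct perl processes line up with the load spikes.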

I'm wondering if a good chunk of the problem may be the IAX trunks to my pbx? However I imagine that they should be better on load than putting the T1 card directly into the dialer.



In planning the new system, I would like to have two or more vici servers. My question is how do I manage the extensions? Are some agents bound to one dialer, and some to another? Will vici properly manage the number of logged in agents on each box? Will campaigns need to be split up between servers? How are incoming calls handled? Do I pass them to both servers using a round robin method?

Postby enjay » Thu Mar 13, 2008 6:21 pm

Personally I would put the trunks on the VICIDIAL server and have your trixbox server dial through IAX trunks to it (into a different context that does not inherit the call_log stuff). The primary reason is that the VICIDIAL server will generate a lot of traffic, and trixbox insists on running its agiparties crap on every call, which in turn puts lots of load on that server and ultimately can result in everyone on both Trixbox and VICI having bad quality.

just an opinion.
Last edited by enjay on Thu Mar 13, 2008 6:53 pm, edited 1 time in total.
enjay
 
Posts: 806
Joined: Mon Jun 19, 2006 12:40 pm
Location: Utah

Postby mflorell » Thu Mar 13, 2008 6:29 pm

Are you using ULAW for the IAX trunk?

What is the loadavg of the server on the other side of the IAX trunk?

Postby artimus » Tue Mar 25, 2008 9:03 am

Sorry for the delay.

Yes IAX is using ULAW. Is this the best option to keep load down? My thoughts were that gsm would be more cpu intensive while ulaw needs more bandwidth.

Trixbox and vici are connected via crossover cable on a second nic.
In my new dialer setup, trixbox will be used only for extensions. I will have multiple boxes each with 2-4 T1's in them for the outbound calls (asterisk not trixbox).

Mysql(2 boxes) and apache(2 boxes) will also be separate.

All these boxes will be on their own switch or vlan, so they won't be affected by the rest of the network.


But my same question remains. If I have more than one vici box (say 3), how should they be set up to share the load? Here are the concerns which we want to eliminate.

Will we have to split our lists in three and load one on each server?
Will we have to split our agents into three groups and bind each group to a specific server?
How can we easily manage extensions? My thought was to move the agents to trixbox. Will everything function the same if the agent extensions are not on the same box?

Also, assuming that agents would be on trixbox, would it be safe to assume that agiparties would not be an issue, since they are technically only taking one 8 hour call a day?

Postby mflorell » Tue Mar 25, 2008 7:20 pm

Ulaw works best for low load.

I would recommend keeping your lists on the same database and not splitting them up. Also, you can have agents on one server and outgoing lines on another server. Those agents' phones can even be on a third server if you want. There are many ways of spreading the load of a multi-server VICIDIAL system.

For myself, I prefer splitting up the agents equally across all of the dialing servers to increase efficiency of the system, but there isn't much difference in load if you have all agents on one server and other dialing-only servers.
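As an illustration of splitting agents equally across dialing servers, here is a minimal round-robin sketch. The agent and server names are hypothetical, and the function only plans the split; how each agent's phone is actually tied to a server is configured in VICIDIAL itself and is not shown here.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(agents, servers):
    """Map each agent to a server, cycling through the servers in order."""
    server_cycle = cycle(servers)
    return {agent: next(server_cycle) for agent in agents}


if __name__ == "__main__":
    agents = [f"agent{n:03d}" for n in range(1, 101)]  # hypothetical names
    servers = ["vici1", "vici2", "vici3"]              # hypothetical names
    plan = assign_round_robin(agents, servers)
    print(Counter(plan.values()))
```

With 100 agents and 3 servers the split comes out as 34, 33 and 33, so losing one server would take down about a third of the seats rather than all of them.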

