
Nasty Nasty Issues with Vicibox 5.0.3 & asterisk 1.8.24-vici

Posted: Thu Nov 21, 2013 10:46 pm
by amjohnson
I have a fairly large Vicidial cluster of 12 servers. Over the years, as we added servers and upgraded, we wound up running everything from Vicibox 3.0.1 to 5.0.something or other. So a few weeks ago we backed up the database, wiped all the servers and reinstalled.

I created a backup, ran the svn update to 2040 and applied the schema changes to the database. I then created another backup to restore after updating. The schema version is now 1359 on the database server.

Code: Select all
Version:    2.8b0.5
SVN Version:   2040
DB Schema Version:    1359
DB Schema Update Date:    2013-11-08 01:40:22


Anyway, other than some random OS crap I had to deal with, the upgrade went OK. The OS crap had to do with the software RAID on the clustered servers: the new version of openSUSE had problems mounting the arrays at install. Once that was handled, everything appeared to install wonderfully.

I also dropped the asterisk database created by the DB install on the database server before restoring my backup to it. I changed the usernames and passwords in astguiclient.conf to match what I was previously using, flushed the MySQL privileges and rebooted. The server came up online without any issues. I wrote a SQL script, loosely based on the SQL in the change server IP script, to DELETE the servers out of the database (I did leave the phones and carriers) to ensure all options were installed correctly. Everything installed and worked as planned.

However, the system was suddenly plagued by random MySQL connect errors in both the admin and client URLs on all 3 of my web servers. Asterisk was giving errors about being unable to destroy channels, and I had a few servers that racked up 8000 trunks before they just died (Asterisk segfaulted). Rebalancing my outbound calling helped with the channels, but the MySQL connect errors were still popping up here and there.

I found a post here in the forums and ran this on ALL of my servers, and also added it to the /etc/rc.d/boot.local file:
Code: Select all
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle


The MySQL errors appeared to be caused by the MySQL clients AND the server running out of TCP sockets. The problem didn't go away completely until I applied the change to all my servers.

On my newly installed servers this fixed the MySQL errors. I manage a total of 5 large clusters, so I also applied it to one of the others and saw a 10% increase in performance on Vicibox v3 and v4, i.e. I could add more outbound lines before the servers started to freak out.

That left me with just the Asterisk 1.8 issue. I should also note that during the upgrade each server was unracked and cleaned. Each server had a Sangoma A102 which was no longer being used for T1s, so it was also removed and a blank cover plate was installed. I know DAHDI was using them for timing, and meetme requires DAHDI for conferences, but this has never been an issue with Asterisk 1.4 with or without hardware timing.
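For anyone wanting to confirm the same diagnosis before touching kernel settings, here is a minimal sketch (not something from the forum post I found) that counts sockets sitting in TIME_WAIT and reports how many local ports are available. The function names count_time_wait and port_range_size are made up for illustration. It assumes a Linux box with the standard /proc/net/tcp layout, where field 4 is the socket state in hex and 06 means TIME_WAIT, and it only looks at IPv4 sockets.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ViciDial or Vicibox.

# Count IPv4 TCP sockets in TIME_WAIT.
# Reads a /proc/net/tcp-style table: line 1 is a header, field 4 is the
# socket state in hex, and 06 is TIME_WAIT.
count_time_wait() {
    awk 'NR > 1 && $4 == "06" { n++ } END { print n + 0 }' "${1:-/proc/net/tcp}"
}

# Number of ephemeral ports available for outgoing connections.
# The file holds two numbers: the lowest and the highest local port.
port_range_size() {
    awk '{ print $2 - $1 + 1 }' "${1:-/proc/sys/net/ipv4/ip_local_port_range}"
}
```

Called with no arguments, both functions read the live /proc files. If the TIME_WAIT count on a web server or the database server sits close to the port range size, socket exhaustion is a plausible cause of the connect errors.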

We are also using g729 compression on one of our sip carriers.

I tried every trick in my book to get asterisk to play nice. I reinstalled vicibox without doing the zypper up, I compiled asterisk 1.8 from the sources in the vicibox downloads section and even tried stock 1.8.24 but at the end of the day I was forced to downgrade to asterisk v1.4.44-vici from the repo by running:

Code: Select all
zypper install --oldpackage asterisk-dahdi=1.4.44-32.43 asterisk=1.4.44-32.43


After changing the settings in the admin webui and the astguiclient.conf to reflect asterisk 1.4 everything now works 100% as expected.

Ok if you've read this far now for the actual question:
What causes asterisk 1.8 to behave like that?

It appears it has a threshold for how many channels per minute it can destroy. My servers typically run at about 25-35% processor usage, so I do not believe it's a CPU issue. I run only 100 outbound lines on a server with 0 agents, and 0 to 40 lines on a server with agents, depending on how many agents it has. Asterisk was unable to close both IAX2 and SIP channels, and only my SIP channels are compressed, so I don't think it was the compression module. I keep the max trunks at or under 200. (When channels started not closing, I had to set the server to 0 outbound for up to 5 minutes to allow the channels to close so the server didn't crash.) My database server showed 20 long queries out of about 200,000.

Bottom line:
asterisk 1.4 Max outbound calls I can run on a server with no agents without any problems: 100-125
asterisk 1.8 Max outbound calls I can run on a server with no agents without any problems: 50-75

My telephony servers in this cluster are ASUS RS100-X7 1U servers with Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz processors and 8 GB of Unbuffered DDR3 memory.

We typically dial up to about 6 to 1 with adpt_average dialing. 25 sec call timer.

Running Asterisk v1.4.44-vici - 95 agents and 300-450 outbound lines
Code: Select all
SERVER     DESCRIPTION         IP     ACT   LOAD        CHAN   DISK   OUTBOUND   INBOUND
database   Server 90vicidial   X.80   Y     154 - 17%   8      9%     LINK       LINK
viciout1   Server vici1        X.81   Y     63 - 17%    178    3%     LINK       LINK
viciout2   Server vici2        X.82   Y     33 - 2%     66     7%     LINK       LINK
viciout3   Server vici3        X.83   Y     37 - 11%    208    3%     LINK       LINK
viciout4   Server vici4        X.84   Y     14 - 10%    145    4%     LINK       LINK
viciout5   Server vici5        X.85   Y     49 - 14%    167    11%    LINK       LINK
viciout6   Server vici6        X.86   Y     8 - 4%      25     4%     LINK       LINK
viciout7   Server vici7        X.87   Y     60 - 14%    75     9%     LINK       LINK
viciout8   Server vici8        X.88   Y     31 - 10%    201    3%     LINK       LINK


Exact same settings with 1.8.24-vici would crash .81, .83, .85 and .88 within 20 minutes.

Is this problem I was having with asterisk 1.8 fixable? I remember Michael saying at training that there was an issue with 1.8 not closing the channels but that he was working on a patch for it.

Re: Nasty Nasty Issues with Vicibox 5.0.3 & asterisk 1.8.24-

Posted: Fri Nov 22, 2013 8:57 am
by mcargile
A few questions:

What is your calls per second set to?

Do you have the output from 'core show channels'?

Do you get any error messages on the asterisk console?

What are you using for meetme timing now?


In general I have seen a reduction in load from Asterisk 1.4 to Asterisk 1.8 and considerably less crashing of 1.8 servers. As for the channel issue, I was having an issue with local channels getting stuck open during development, but I worked with the meetme developer to get that worked out.
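To capture that kind of data the next time it happens, a minimal sketch along these lines could log the channel total from 'core show channels' at intervals. The helper name active_channels is made up for illustration, and it assumes the command output ends with a summary line such as "178 active channels", which is what Asterisk 1.4 and 1.8 print.

```shell
#!/bin/sh
# Hypothetical helper, not part of ViciDial or Asterisk.

# Pull the channel total out of 'core show channels' output on stdin.
# The summary line looks like "178 active channels" (or "1 active channel").
active_channels() {
    awk '/ active channels?/ { print $1 }'
}
```

On a dialer it would be fed from the Asterisk CLI, for example: asterisk -rx 'core show channels' | active_channels. Running that once a minute from cron with a timestamp should show whether the channel count keeps climbing while outbound dialing stays level.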

Re: Nasty Nasty Issues with Vicibox 5.0.3 & asterisk 1.8.24-

Posted: Fri Nov 22, 2013 2:11 pm
by Kumba
The newer versions of ViciDial have been converted to MySQLi for the PHP connector, and we have seen an increase in port usage at all levels. There is also a large increase in the amount of tracking that iptables/netfilter has to do, for reasons that we don't fully understand just yet. We had to do the following on large cluster installs because the kernel was running out of shared memory for netfilter:

net.netfilter.nf_conntrack_max=262144

Putting it in sysctl.conf also doesn't seem to work. We had to add 'sysctl net.netfilter.nf_conntrack_max=262144' to /etc/init.d/after.local and set after.local to run. That works on reboot. Basically it increases the shared memory used by the netfilter tracking mechanisms from 64-megs to 256-megs.
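As a rough way to check whether a server is actually getting near the limit, here is a minimal sketch that compares the current number of tracked connections against the maximum. The function name conntrack_usage is made up for illustration; it assumes the nf_conntrack module is loaded so that the two /proc files exist.

```shell
#!/bin/sh
# Hypothetical helper, not part of ViciDial or Vicibox.

# Print tracked connections, the configured maximum, and the percentage used.
# Defaults to the live kernel counters; both paths can be overridden.
conntrack_usage() {
    count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
    max_file=${2:-/proc/sys/net/netfilter/nf_conntrack_max}
    count=$(cat "$count_file")
    max=$(cat "$max_file")
    echo "$count / $max ($(( count * 100 / max ))%)"
}
```

Called with no arguments it reads the live values. A percentage that stays near 100 during peak dialing would suggest the table is filling up and the higher nf_conntrack_max is worth applying.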

Re: Nasty Nasty Issues with Vicibox 5.0.3 & asterisk 1.8.24-

Posted: Tue Nov 26, 2013 2:15 pm
by amjohnson
It varies based on the server, but each server is set to between 15 and 20 calls per second. I did lower the calls per second down to 10 and I was still experiencing channel issues with 1.8.

-Andrew

Re: Nasty Nasty Issues with Vicibox 5.0.3 & asterisk 1.8.24-

Posted: Tue Nov 26, 2013 2:35 pm
by amjohnson
Unfortunately I did not save any of the output from Asterisk 1.8. I'll be loading servers this weekend, as I am going to be moving one of my clusters from on premise to a colo site and completely rebuilding the cluster. It's looking like I might have to go to a scratch install on my dialers though, since the kernel in openSUSE doesn't seem to like my ASUS RS100-E8 servers. It runs and everything, but when you type reboot it shuts the server down. Not good.

I'm going to throw an Ubuntu kickstart file together to install all the dependencies, plus some scripting to set up the svn and whatnot. If you are interested, just tell me whom to send them to.

-Andrew