I created a backup, ran the svn update to 2040 and applied the schema changes to the database. Then created another backup to restore after updating. The schema version is now 1359 on the database server.
- Code: Select all
Version: 2.8b0.5
SVN Version: 2040
DB Schema Version: 1359
DB Schema Update Date: 2013-11-08 01:40:22
Anyways, other than some random OS crap I had to deal with, the upgrade went OK. The OS crap had to do with the software RAID on the clustered servers and the new version of openSUSE having problems mounting the arrays at install. Once that was handled everything appeared to install wonderfully.
I also dropped the asterisk database created by the DB install on the database server before restoring my backup to it, and changed the usernames and passwords in astguiclient.conf to match what I was previously using. I flushed the MySQL privileges and rebooted, and the server came up online without any issues. I wrote a SQL script loosely based on the SQL in the change server IP script to DELETE the servers out of the database (I did leave the phones and carriers) so as to ensure all options were installed correctly. Everything installed and worked as planned.
However, the system was suddenly plagued by random MySQL connect errors in both the admin and client URLs on all 3 of my web servers. Asterisk was giving errors about being unable to destroy channels, and I had a few servers that racked up 8000 trunks before they just died (Asterisk segfaulted). Rebalancing my outbound calling helped with the channels, but the MySQL connect errors were still popping up here and there.
I found a post here in the forums and ran this on ALL of my servers, and also added it to the /etc/rc.d/boot.local file so it survives a reboot:
- Code: Select all
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
The MySQL errors appeared to be the MySQL clients AND the server running out of TCP sockets. The problem didn't go away completely until I applied the change to all my servers.
On my newly installed servers this fixed the MySQL errors. I have a total of 5 large clusters I manage, so I also applied it to one of them and saw a 10% increase in performance on vicibox v3 and v4, i.e. I could add more outbound lines before the servers started to freak out.
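For anyone wanting to check whether they are hitting the same socket exhaustion, here is a rough sketch that counts sockets sitting in TIME-WAIT and compares that against the size of the ephemeral port range. It assumes `ss` is installed; the function names are made up for illustration, and the comparison is only a rough indicator since ports are tracked per connection tuple. Also worth knowing: tcp_tw_recycle is known to cause dropped connections for clients behind NAT and was removed in Linux 4.12, so tcp_tw_reuse is the safer of the two settings.

```shell
#!/bin/sh
# Hypothetical helper: count sockets in TIME-WAIT, given `ss -tan`
# output on stdin (first line is the header, state is column 1).
count_time_wait() {
    awk 'NR > 1 && $1 == "TIME-WAIT" { n++ } END { print n + 0 }'
}

# Hypothetical helper: size of the ephemeral port range, given the two
# numbers from /proc/sys/net/ipv4/ip_local_port_range on stdin.
port_range_size() {
    awk '{ print $2 - $1 + 1 }'
}

# Example use on a live box (commented out so the helpers can be sourced):
# tw=$(ss -tan | count_time_wait)
# ports=$(port_range_size < /proc/sys/net/ipv4/ip_local_port_range)
# echo "TIME-WAIT sockets: $tw of $ports ephemeral ports"
```

If the TIME-WAIT count sits close to the port range size on the web or telephony servers while the connect errors are happening, that points at the same cause.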
That left me with just the Asterisk 1.8 issue. I should also note that during the upgrade each server was unracked and cleaned. Each server had a Sangoma A102 which was no longer being used for T1s, so it was also removed and a blank cover plate was installed. I know DAHDI was using them for timing and meetme requires DAHDI for conferences, but this has never been an issue with Asterisk 1.4 with or without hardware timing.
We are also using g729 compression on one of our sip carriers.
I tried every trick in my book to get asterisk to play nice. I reinstalled vicibox without doing the zypper up, I compiled asterisk 1.8 from the sources in the vicibox downloads section and even tried stock 1.8.24 but at the end of the day I was forced to downgrade to asterisk v1.4.44-vici from the repo by running:
- Code: Select all
zypper install --oldpackage asterisk-dahdi=1.4.44-32.43 asterisk=1.4.44-32.43
After changing the settings in the admin webui and the astguiclient.conf to reflect asterisk 1.4 everything now works 100% as expected.
OK, if you've read this far, now for the actual question:
What causes asterisk 1.8 to behave like that?
It appears to have a threshold for how many channels per minute it can destroy. My servers typically run at about 25-35% processor usage, so I do not believe it's a CPU issue. I run only 100 outbound lines per server with 0 agents, and 0 to 40 lines if the server has agents on it, depending on how many. Asterisk was unable to close both IAX2 and SIP channels, and only my SIP channels are compressed, so I don't think it was the compression module. I keep the max trunks at or under 200 (unless channels stopped closing, in which case I had to set the server to 0 outbound for up to 5 minutes to let the channels close so the server didn't crash). My database server showed 20 long queries out of about 200,000.
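To catch the channel pile-up before a server falls over, something along these lines could poll the active channel count once a minute. This is only a sketch: it assumes the output of `asterisk -rx "core show channels"` includes an "N active channels" summary line, the helper names are hypothetical, and the 200 limit is just my max trunks setting.

```shell
#!/bin/sh
# Hypothetical helper: pull the active-channel count out of
# "core show channels" output supplied on stdin; prints 0 if the
# summary line is missing.
active_channels() {
    awk '/active channel/ { print $1; found = 1 } END { if (!found) print 0 }'
}

# Hypothetical helper: print a warning when the count is above a limit.
check_channels() {
    count=$1
    limit=$2
    if [ "$count" -gt "$limit" ]; then
        echo "WARN: $count channels, limit $limit"
    else
        echo "OK: $count channels"
    fi
}

# Example polling loop (commented out; needs a running Asterisk):
# while sleep 60; do
#     n=$(asterisk -rx "core show channels" | active_channels)
#     echo "$(date '+%F %T') $(check_channels "$n" 200)"
# done
```

A count that keeps climbing while outbound is set to 0 would match the "unable to destroy channels" behavior described above.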
Bottom line:
asterisk 1.4 Max outbound calls I can run on a server with no agents without any problems: 100-125
asterisk 1.8 Max outbound calls I can run on a server with no agents without any problems: 50-75
My telephony servers in this cluster are ASUS RS100-X7 1U servers with Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz processors and 8 GB of Unbuffered DDR3 memory.
We typically dial up to about 6 to 1 with adpt_average dialing. 25 sec call timer.
Running Asterisk v1.4.44-vici - 95 agents and 300-450 outbound lines
- Code: Select all
SERVER + DESCRIPTION        IP    ACT  LOAD       CHAN  DISK  OUTBOUND  INBOUND
database Server 90vicidial  X.80  Y    154 - 17%  8     9%    LINK      LINK
viciout1 Server vici1       X.81  Y    63 - 17%   178   3%    LINK      LINK
viciout2 Server vici2       X.82  Y    33 - 2%    66    7%    LINK      LINK
viciout3 Server vici3       X.83  Y    37 - 11%   208   3%    LINK      LINK
viciout4 Server vici4       X.84  Y    14 - 10%   145   4%    LINK      LINK
viciout5 Server vici5       X.85  Y    49 - 14%   167   11%   LINK      LINK
viciout6 Server vici6       X.86  Y    8 - 4%     25    4%    LINK      LINK
viciout7 Server vici7 i     X.87  Y    60 - 14%   75    9%    LINK      LINK
viciout8 Server vici8       X.88  Y    31 - 10%   201   3%    LINK      LINK
Exact same settings with 1.8.24-vici would crash .81, .83, .85 and .88 within 20 minutes.
Is this problem I was having with asterisk 1.8 fixable? I remember Michael saying at training that there was an issue with 1.8 not closing the channels but that he was working on a patch for it.