Serious Connectivity Issues


Post by Davidian » Fri Apr 27, 2012 10:55 am

Hi there,

We have had some seriously terrible connectivity issues over the last month on our Vicidial server.

It started with us not connecting to any calls for 6-8 minutes at 12:20, 2:20, 3:20 and 4:20.

Our provider told us it was because our DB server was old and could not handle the 8 GB database, so we upgraded to 2 brand new servers from them.

We still got the same issues.

They then said that it was because the dialler was being run too hard (even though we have always run it the same way: between 6-8 for most of the day, and sometimes up to 15 when we only have our troublesome areas left).

The problems have worsened, to the point where every 20-30 minutes we lose connectivity for 10 mins or so...
When this happens our IP phones take 45s+ to even get a dial tone and start ringing.

Our provider said that it was a problem on our network, but we went through everything and it wasn't. Then it was a problem with our leased line, which we checked and got our provider to check. Basically they kept throwing it back as being everyone else's problem, yet we and others have checked everything else and cannot find any issues.

Our provider have now said that the issue is we run the dialler too high and have bad data again.

They also don’t like “random new 2”: this chooses one random lead, then one new lead, whereas downcount only dials the least called leads.

Downcount does not work well for us because some of our areas have had a few calls and do not get called until the other lists are off, whereas random new 2 dials well.

When we are not suffering these issues, I can have the dialler on 6-8 Adapt hard limit, random new 2, with a wait time under 20 seconds.

If the issue is DATA and having the dialler up too hard, why would we STILL have had connection issues when we had the dialler on Ratio 4 Downcount for an hour (on their advice)? We even saw someone at nobel turn the dialler up to 6, and we were still not connecting.

During this time, if we tried to dial out from our phones it would take about a minute to even start ringing, which is why the dialler is returning our data as NAs.
In some instances it connected calls then cut them off.

We manually rang a batch of data that the dialler had gone through in this time. It had put it all as NA. In some instances it should have been DC, and some should have been actual connections, so the dialler was marking leads as NA even though they weren't. (As I said, Ratio 4 Downcount was returning the same issues as having it up higher.)

So the DATA is not the issue, as we tested it (yes, of course a percentage is bad, but not all), and how hard we are running the dialler is also not the issue.

The dialler has been run much higher than I run it in the past without issues.

Today we ran it slower than normal, the way our provider wanted us to, and still had the same issues.
I did turn it up after that so that when we have our “good patches” we got more calls.

One thing we did notice is that when there were fewer people on the server we didn’t seem to get any issues.

With a 20 Mb leased line and new servers I can’t imagine there are any issues at our end. In fact we have been pinging the server from our end, and during the connectivity issues the ping has remained the same, usually around 2 ms.

That leaves our provider's systems not being able to cope with the amount of data we are trying to put across them.

There are plenty of posts on Vicidial with people getting wait times under 15 seconds and no connection issues, yet our provider seems to think we should aim for a minute between calls. This again makes me think their systems cannot handle the data load.

We tried making the max calls per server 30 instead of 60 (so overall 90 instead of 180). Wait times worsened, with the same connection issues.

I’m not sure what is left apart from a new dialler or provider.

Any ideas? (thanks in advance for your help)
Davidian
 
Posts: 10
Joined: Fri Apr 27, 2012 10:23 am

Re: Serious Connectivity Issues

Post by williamconley » Sun Apr 29, 2012 12:28 pm

I see a lot of interesting troubleshooting talk ... but I'm missing some very necessary information. Baby steps.

when you post, please post your entire configuration including (but not limited to) your installation method and vicidial version with build.

this IS a requirement for posting along with reading the stickies (at the top of each forum) and the manager's manual (available on EFLO.net, both free and paid versions)

You should also post: Asterisk version, telephony hardware (model number is helpful here), cluster information if you have one, and whether any other software is installed in the box. If your installation method is "from scratch" you must post your operating system and should also post the .iso version from which you installed your original operating system.

If this is a "Cloud" or "Virtual" server, please note the technology involved along with the version of that technology (ie: VMware Server Version 2.0.2). If it is not, merely stating the Motherboard model # and CPU would be helpful.

Similar to This:

Vicibox X.X from .iso | Vicidial X.X.X-XXX Build XXXXXX-XXXX | Asterisk X.X.X | Single Server | No Digium/Sangoma Hardware | No Extra Software After Installation | Intel DG35EC | Core2Quad Q6600

Next:
...seriously terrible connectivity...
...to the point where every 20-30 minutes we lose connectivity for 10 mins or so...
...the issue is we run the dialler too high and have bad data again...

None of these contains any useful technical information.

Please: During "problem moments":

Can your agents see the login screen?

Is sound quality bad?

Do agent phones refuse to register?

Is the admin screen available?

What is the server load on each of the machines?

And here's my favorite question: Why is your database 8GB? Are you archiving your old logs using the archive script provided in the crontab for high-volume systems?
Vicidial Installation and Repair, plus Hosting and Colocation
Newest Product: Vicidial Agent Only Beep - Beta
http://www.PoundTeam.com # 352-269-0000 # +44(203) 769-2294
williamconley
 
Posts: 20258
Joined: Wed Oct 31, 2007 4:17 pm
Location: Davenport, FL (By Disney!)

Re: Serious Connectivity Issues

Post by Davidian » Mon Apr 30, 2012 4:02 am

Hi, having only worked in this company for a month I haven't got the answers to most of these questions. Everything was set up originally years ago by our Vicidial provider (the provider I mentioned above, I wasn't talking about our ISP).

Rather than give you half the answer now, I will speak to them and our chaps in IT and try to describe the whole picture for you. Hopefully I will have an answer by the end of the day, although our provider is rather slow at getting back to us on any of the queries we submit.

Re: Serious Connectivity Issues

Post by Davidian » Mon Apr 30, 2012 6:38 am

Hmm, our provider has suddenly come back saying they have found a rogue bit of code that they think might have been causing the issue, but they don't want to pin themselves down to that and we will keep our eyes on it...

So after a month of them blaming everyone else, they seem to have found a problem at their end... if only they had looked into it that long ago we wouldn't be in this situation... who knows, it might not be the issue. They found the error after we got the 10.20 blip, and we didn't get our 11.20 or 12.20 blip today.

The issue seems resolved, though I would like the post left open for a little while just in case, if that is okay?
As you took the time out to respond to me, I thought I would provide you with as much information as I can at this stage.

No one has the answer on how it was installed, so I cannot provide that bit of information for you.

Vicidial: VERSION: 2.4-364a BUILD: 120409-1136

We have 3 dial servers running CentOS 5.5 Final
1 DB server running CentOS 6.2 Final
- The spec for the DB server is as follows (I imagine the others are similar, but without taking apart the rack to get the product number we don't know)
- HP ProLiant DL360 G7 E5606 1P 4GB-R P410i/ZM 4 SFF 460W RPS Server (633778-421) - I did provide a URL to it here but I'm not allowed to as a new user...
- Ours has actually been upgraded to 12 GB of RAM

1 backup DB server running CentOS 5.8 Final - this is where we run reports and things from to minimise the load
1 webserver - although since we got the new servers in we are not sure if this is doing anything

They all go into a switch connected to a JMC router (for our internal network); the router is a Cisco 1800 series.
This then connects to a Cisco 1900 series, which connects to our Zen 20 Mb leased line.

This has a fiber link to a BT 21CN box.

From there we are not sure of the "flight path", but it goes to a server at our provider, which the calls then connect through. We have no info on the server at their end.

Now, to your questions

"to the point where every 20-30 minutes we lose connectivity for 10 mins or so..."

When this happens we can see the congestion numbers shooting up on the carriers.
When we try to dial from our IP phones we don't even get a dial tone for about a minute. We believe this to be caused by the congestion.

Please: During "problem moments":

Can your agents see the login screen?
Yes.

Is sound quality bad?
Most of the time no, but there have been a few instances where we got "crackly" calls.

Do agent phones refuse to register?
Sometimes it does say there is no connectivity.

Is the admin screen available?
Yes.

What is the server load on each of the machines?
Fine for the most part, but the load does spike during these errors. Oddly everything else remains rather low, e.g. 2% processor usage.

And here's my favorite question: Why is your database 8GB? Are you archiving your old logs using the archive script provided in the crontab for high-volume systems?
I believe the DB is that big because we keep the old customers' data so we do not call them again. We do not know how the logs are archived.

What do you think it should be? What is normal? If you believe the DB could be an issue, we could look into "thinning" it out.

Re: Serious Connectivity Issues

Post by Davidian » Mon Apr 30, 2012 6:57 am

I think I spoke too soon.

Whereas before, when we got the 20 past issue, the carrier stats would highlight a lot of congestion issues....

this issue is highlighting a lot of CHANNELUNAVAILABLE.

Now our provider is trying to say this is due to poor data and that they are all disconnects, but from reading this forum that is not what channel unavailable means at all.

Oddly, all calls during this time return a disposition of No Answer AutoDial. In trying them out, some get through to customers and some don't, so I need to find out what this channel unavailable issue is. Any ideas?


CARRIER STATS:

HANGUP STATUS   24 HOURS   6 HOURS   1 HOUR   15 MIN   5 MIN   1 MIN
ANSWER             13320     13320     4058      655       1       1
BUSY                 292       292       87       10       0       0
CANCEL              8106      8106     2050      368       1       0
CHANUNAVAIL         9999      9999     4568     2095    1113     236
CONGESTION           321       321        1        0       0       0
NOANSWER               9         9        2        0       0       0
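
To put those carrier stats in perspective, here is a rough Python sketch that works out what share of calls in each time window ended as CHANUNAVAIL. The counts are copied from the figures posted above; the names WINDOWS, CARRIER_STATS and status_share are made up for this illustration and are not part of Vicidial.

```python
# Carrier hangup counts copied from the posted stats; one value per window.
WINDOWS = ["24 HOURS", "6 HOURS", "1 HOUR", "15 MIN", "5 MIN", "1 MIN"]
CARRIER_STATS = {
    "ANSWER":      [13320, 13320, 4058, 655, 1, 1],
    "BUSY":        [292, 292, 87, 10, 0, 0],
    "CANCEL":      [8106, 8106, 2050, 368, 1, 0],
    "CHANUNAVAIL": [9999, 9999, 4568, 2095, 1113, 236],
    "CONGESTION":  [321, 321, 1, 0, 0, 0],
    "NOANSWER":    [9, 9, 2, 0, 0, 0],
}


def status_share(stats, status):
    """Return {window: fraction of all calls in that window with `status`}."""
    shares = {}
    for i, window in enumerate(WINDOWS):
        total = sum(counts[i] for counts in stats.values())
        shares[window] = stats[status][i] / total if total else 0.0
    return shares


if __name__ == "__main__":
    for window, share in status_share(CARRIER_STATS, "CHANUNAVAIL").items():
        print(f"{window:>8}: {share:6.1%}")
```

On these figures CHANUNAVAIL is about 31% of the 24-hour total but over 99% of the calls in the last five minutes, which is what a "problem moment" looks like in the numbers.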

Re: Serious Connectivity Issues

Post by williamconley » Mon Apr 30, 2012 8:19 am

look in your crontab -e entries at all the entries available. this is for two purposes: first, to see if anything appears to be running at an interval that may coincide with your error; second, for education about the scripts that are available, especially the ones that are NOT being used. check those out, you may find them useful, especially the ones that say archive. the archiving script does not delete, it merely moves data from live tables to archive tables. this reduces the "live" table size and seriously improves performance. also check /etc/crontab for possible entries (that's non-standard, but you apparently have a very non-standard system).
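
As a rough illustration of the first check (which entries line up with the error time), here is a minimal Python sketch. It only interprets the minute field of a standard cron line, the sample crontab is hypothetical and not taken from the poster's system, and the helper names minute_matches and jobs_firing_at are invented for the example.

```python
def minute_matches(field, minute):
    """True if a cron minute field ('20', '*/10', '5,20', '10-50/5') fires at `minute`."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = 0, 59
        elif "-" in part:
            low, high = part.split("-", 1)
            start, end = int(low), int(high)
        else:
            start = end = int(part)
        if start <= minute <= end and (minute - start) % step == 0:
            return True
    return False


def jobs_firing_at(crontab_text, minute):
    """List cron lines that fire at `minute`, ignoring jobs that run every minute."""
    hits = []
    for line in crontab_text.splitlines():
        line = line.strip()
        fields = line.split(None, 5)
        # Skip blanks, comments, variable assignments and @reboot-style entries.
        if len(fields) < 6 or not (fields[0][0].isdigit() or fields[0][0] == "*"):
            continue
        every_minute = all(minute_matches(fields[0], m) for m in range(60))
        if minute_matches(fields[0], minute) and not every_minute:
            hits.append(line)
    return hits


# Hypothetical crontab text, for illustration only.
SAMPLE = """\
MAILTO=""
# comment line
* * * * * /path/to/every_minute_job
20 * * * * /path/to/hourly_job
*/10 * * * * /path/to/ten_minute_job
1 1 * * * /path/to/nightly_job
"""

if __name__ == "__main__":
    for job in jobs_firing_at(SAMPLE, 20):
        print(job)
```

On a real box the text could come from the output of crontab -l or from /etc/crontab instead of the sample string. Jobs that run every minute are left out because they cannot explain a problem that only shows up at 20 past.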

it could be mysql replication or backup that runs and slams your system on a timed basis. in which case, obviously, this will need to be turned off during work-hours until the issue is resolved. by that same token, it could be ANY process that runs ... so you may have a bit of a hunt on your hands.

also consider checking your logs. there are some logs in /var/log that will indicate the "firing" of cron jobs. this could be very handy to watch just about the moment you expect your problem to occur.
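
The log check could be sketched in Python along these lines. This assumes the cron log uses the common syslog-style layout shown in the comment; the exact layout and file name under /var/log vary by distribution, so treat the pattern as an assumption to adjust. The sample lines are invented for illustration and the function name commands_near_minute is hypothetical.

```python
import re
from collections import Counter

# Assumed line layout (adjust to what your log actually contains):
#   "Apr 30 10:20:01 host1 crond[1003]: (root) CMD (/path/to/job)"
CRON_LINE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):\d{2}\s+\S+\s+\S+:\s+\((\S+)\)\s+CMD\s+\((.*)\)\s*$"
)


def commands_near_minute(log_lines, minute, slack=1):
    """Count cron commands started within `slack` minutes of `minute` past any hour."""
    counts = Counter()
    for line in log_lines:
        match = CRON_LINE.match(line)
        if not match:
            continue
        fired_minute = int(match.group(2))
        # Distance on a 60-minute clock, so minute 59 and minute 0 are neighbours.
        distance = min((fired_minute - minute) % 60, (minute - fired_minute) % 60)
        if distance <= slack:
            counts[match.group(4)] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "Apr 30 10:19:01 host1 crond[1001]: (root) CMD (/path/to/every_minute_job)",
    "Apr 30 10:20:01 host1 crond[1002]: (root) CMD (/path/to/every_minute_job)",
    "Apr 30 10:20:01 host1 crond[1003]: (root) CMD (/path/to/hourly_job)",
    "Apr 30 10:45:01 host1 crond[1004]: (root) CMD (/path/to/other_job)",
    "Apr 30 11:20:02 host1 crond[1005]: (root) CMD (/path/to/hourly_job)",
]

if __name__ == "__main__":
    for command, count in commands_near_minute(SAMPLE_LOG, 20).most_common():
        print(count, command)
```

Anything that shows up around 20 past every hour, but not at other times, is a candidate to look at more closely.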

also seriously consider that you may simply be under a DoS or brute force attack if you are not using a whitelist access method. in which case you'll never resolve it until you secure the system. that, however, should show in your logs (either for ssh attempted logins or asterisk attempted registrations).

channelunavailable is an indication that asterisk had no way to dial the number in question. look at your asterisk cli for a single instance and see what the full number dialed was (including dial prefix and dial code) and duplicate it when you are not having the problem. there are a great many ways that could lead you.

happy hunting ;)

Re: Serious Connectivity Issues

Post by Davidian » Tue May 01, 2012 4:57 am

Well, the 20 past error was definitely solved by them blanking out that erroneous code.

The other errors are the CHANUNAVAIL results and sometimes the dialler not even calling anything for a while.

We don't currently have access to those logs etc., but we have spoken to our provider, who are under the impression we have numbers in our DB that are not valid and are therefore not being attempted, or if they are attempted they go into CHANUNAVAIL.

I have changed the scripts on the list creation so we will only be pulling in potentially valid numbers now, and I am going through the DB looking for the problems and fixing/deleting them.

If that doesn't solve it then they are going to have to come up with something else.

Thank you very much for your help, you have definitely given me some great pointers on what I need to look into :)

