ERROR with audio quality problems : channel.c:2290 ast_write

Posted: Thu Sep 16, 2010 8:18 am
by ronator
Hello guys,

once again I have an error, and I have no clue when it first occurred. I guess I was too busy to search the logs, and my agents reported it quite late. I've been told that sometimes (well, around 100-250 times a day) there is a short period (one or two seconds) where the customer cannot be understood because of a short sound peak. I could relate it to an error message on the CLI and searched the web and this forum for hints. I found something, but it was all very vague. I get the following error when the audio quality problem occurs:

channel.c:2290 ast_write: Thread -1231029360 Blocking 'SIP/2018-099d74a8', already blocked by thread -1255416944 in procedure ast_waitfor_nandfds
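
For anyone who wants to pull such warnings apart automatically, here is a minimal Python sketch that splits the message above into its fields. The regular expression is written only against the single line quoted in this post, so other Asterisk versions may word it differently. The function name `parse_blocking_warning` and the field names are illustrative choices, not anything defined by Asterisk.

```python
import re

# Pattern built from the one warning quoted in this thread (an assumption:
# other Asterisk versions may format the message differently).
BLOCKING_WARNING = re.compile(
    r"channel\.c:(?P<source_line>\d+) ast_write: "
    r"Thread (?P<writer_thread>-?\d+) Blocking '(?P<channel>[^']+)', "
    r"already blocked by thread (?P<holder_thread>-?\d+) "
    r"in procedure (?P<procedure>\w+)"
)


def parse_blocking_warning(text):
    """Return the fields of an ast_write blocking warning, or None if no match."""
    match = BLOCKING_WARNING.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric fields are easier to compare and count as integers.
    for key in ("source_line", "writer_thread", "holder_thread"):
        fields[key] = int(fields[key])
    return fields
```

Grouping the parsed results by `channel` or `procedure` would show whether the warnings cluster on particular extensions or always name the same procedure.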

So right now I am unsure whether I have misconfigured something. I have a rough idea what the error means but no idea how to get rid of it. I still use pretty much the same AGI strings for queueing calls as in vici 1.1.

If anyone has an idea what I should check or look out for, please let me know.

######
ViciDialNow 1.3
Asterisk 1.2.30.2
AstGuiClient VERSION: 2.2.1-237, BUILD: 100510-2015

All the best wishes,
r0n

Posted: Thu Sep 16, 2010 11:27 am
by williamconley
do you have any other software in your box? have you modified it after the initial installation? (vicidial/asterisk upgrades?)

what codec are you using for these calls?

what is your load level during these occurrences?

what is your timing source? (ztdummy ...?)

description

Posted: Mon Sep 20, 2010 3:12 am
by ronator
Hello and thanks for your answer. Here are mine:

1) No, no software added. But yes, the installed version 1.3 was modified by an SVN upgrade to astgui 2.2 with its related MySQL changes (updating the tables). No system upgrades, no other software upgrades. The modification was that I had to get the extensions.conf of the old system running on the new system. I had to struggle to get call transfer running because of system changes (which is now done by DIDs).

2) We use alaw for all calls.

3) I do not see high load levels when this error occurs; I use "top" and asterisk never consumes more than 9 to 15 percent. If there is another way to check the load, let me know.

4) Yes, the timing source is ztdummy (with which I had problems: after a server restart, the file /etc/rc.d/rc.local was not executable, so the system did not load the needed drivers. Setting the executable bit helped me out.)
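
The rc.local fix from point 4 can be checked from a script as well. The following is only a sketch in Python; `ensure_executable` is a made-up helper name, and changing /etc/rc.d/rc.local this way would need root rights.

```python
import os
import stat


def ensure_executable(path):
    """Set the execute bits on `path` if any are missing.

    Returns True if the mode was changed, False if it was already executable
    for user, group and others.
    """
    mode = os.stat(path).st_mode
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if mode & exec_bits == exec_bits:
        return False
    os.chmod(path, mode | exec_bits)
    return True
```

Calling `ensure_executable("/etc/rc.d/rc.local")` once after an upgrade would catch the case where the file silently lost its executable bit and the ztdummy module is therefore not loaded at boot.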

Furthermore, I cannot see any pattern in when this error happens; it does not matter which campaign, agent or in-group, so it seems to happen system-wide. Can this error be caused by conflicting (identical) entries in .conf files and their related AUTO-conf files (e.g. extensions-vicidial.conf)? Or what about a SIP account which is named exactly like an IAX account? Could this provoke errors?

<---->

A) Is there a way to get more information about these processes, which are just random numbers from my point of view?
B) Do you have any idea why there might be processes trying to act on already running channels / calls?
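
A possible starting point for question A, under a loud assumption: the negative numbers in the message look like 32-bit thread handles printed as signed integers. If that guess is right, they can be converted back to the unsigned/hex form that other tools usually display. `to_unsigned32` is just an illustrative helper, and nothing here is confirmed by the Asterisk source.

```python
def to_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned (assumes a 32-bit handle)."""
    return value & 0xFFFFFFFF


# The two thread numbers from the warning quoted in the first post.
for thread_id in (-1231029360, -1255416944):
    print(thread_id, "->", hex(to_unsigned32(thread_id)))
```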


Thank you for debugging me and my system ;-)

Posted: Mon Sep 20, 2010 8:25 am
by williamconley
you can also use the Admin->Server modification option to view the server load.

i prefer htop (similar to top, but has interesting graphics and shows load on a per cpu basis in addition to the total).

it's not about how much Asterisk is using, it's about total load on the server at that moment.
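
if htop is not available, a rough per-cpu view of total load can be had from /proc/stat. this is a minimal python sketch, assuming a Linux box where the first eight numbers on each cpu line are user, nice, system, idle, iowait, irq, softirq and steal; the function names are made up for illustration.

```python
import time


def read_cpu_times(stat_text):
    """Return {cpu_name: (busy_ticks, total_ticks)} parsed from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Only user..steal; later fields (guest) are already counted in user.
        values = [int(v) for v in fields[1:9]]
        total = sum(values)
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        times[fields[0]] = (total - idle, total)
    return times


def cpu_usage(before, after):
    """Return {cpu_name: percent busy} between two read_cpu_times() samples."""
    usage = {}
    for name, (busy1, total1) in after.items():
        if name not in before:
            continue
        busy0, total0 = before[name]
        delta_total = total1 - total0
        usage[name] = 100.0 * (busy1 - busy0) / delta_total if delta_total > 0 else 0.0
    return usage


def sample(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return per-cpu usage."""
    with open(path) as handle:
        before = read_cpu_times(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = read_cpu_times(handle.read())
    return cpu_usage(before, after)
```

the "cpu" entry is the total over all cores and "cpu0", "cpu1", ... are the individual cores, so one saturated core shows up even when the total looks harmless.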

that particular error looks more like a notification to me unless it can be demonstrated to occur at the same time as the sound quality issues (and NOT when there are NO sound quality issues).

sound quality issues are ordinarily CPU or bandwidth related and can be "battled" with more bandwidth, less bandwidth usage (get everyone to stop using the internet for anything except asterisk and/or compress your calls), or more CPU (faster processor or more processors) or by adding a timing source or transcoder to the server. or some combination thereof.

on rare occasions, of course, other things have been the problem: poor performance by a carrier (always have TWO or MORE so you can test this theory easily) or a bad internal network router or switch or NIC on the server.

troubleshooting

Posted: Mon Sep 20, 2010 11:42 am
by ronator
Dear William,

yes, I also prefer htop but on vici 1.3 there ain't no apt-get and yum seems to not have the corresponding repositories; that's why I use top (and if u press "1", you also see all CPUs, but no "GUI-style" ;-) .

The error message occurs on the CLI in the very second when the agent hears something like a beep, or at least a sound that makes it impossible to understand the caller for at least one second. I checked that by having agents knock on the table when it occurred while I was watching the CLI, and I could clearly connect the error message with the short sound problem. I checked the load in the web interface, trying to interpret the HELP; as far as I understood, the max possible (but not desired) number is [number of cores x 100] ?!?

System Load: 268 - 21%
Live Channels: 132

While I was writing this post, the load level started at 96 and went up to 268. The system (all in one: asterisk, http, mysql) is a dual quad-core with 2.6 GHz (8 cores), so I think adding more CPU is not really needed. But what do you think? Are these numbers too high? And what the heck is this procedure "ast_waitfor_nandfds" doing? I mean, what does "nandfds" stand for?
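
To put the load numbers into perspective, here is the arithmetic as a small Python sketch. It relies entirely on my reading of the HELP text (maximum = number of cores x 100), which may be wrong, and the percentage shown next to "System Load" in the web interface may well be a different metric. `load_share` is a name made up for this example.

```python
def load_share(system_load, cores):
    """Percent of the assumed ceiling (cores x 100) that a load value represents."""
    return 100.0 * system_load / (cores * 100)


# Load values reported in this post, on an 8-core machine.
for load in (96, 268):
    print("load", load, "->", round(load_share(load, 8), 1), "% of assumed maximum")
```

Under that assumption, even the peak of 268 is well below the ceiling of 800.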

What you said about bandwidth shouldn't be a problem: all calls are routed from a computing centre and the link has 1 Gbit of bandwidth with QoS. All "ordinary" internet access is routed another way through a different gateway, so internet connections cannot interfere with the VoIP packets ... No network hardware was changed internally. I guess the only thing that remains is adding a timing source. Although I dutifully read the (full) vicidial manuals, I have no idea how to do this. And actually, I don't know what a transcoder is or how to combine both ideas ... We only use IAX and SIP. Furthermore, I was quite happy with /dev/zap/pseudo and that it did what it was designed for, although I never really needed to understand completely how and why it does what it does.
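
As a rough sanity check on the bandwidth claim, here is a back-of-the-envelope estimate in Python. The assumptions are mine: every live channel is treated as one alaw (G.711, 64 kbit/s) RTP stream with 20 ms packets over IPv4/UDP, Ethernet framing is ignored, and IAX framing is not modelled. The real figure will differ because not every channel carries a network stream.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, no options


def g711_stream_bps(ptime_ms=20, codec_bps=64000):
    """Bits per second for one direction of one G.711 RTP stream at the IP layer."""
    payload_bytes = codec_bps * ptime_ms // 8000   # 160 bytes at 20 ms
    packets_per_second = 1000 // ptime_ms          # 50 packets at 20 ms
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second


def total_mbps(streams, ptime_ms=20):
    """Aggregate Mbit/s per direction for `streams` simultaneous streams."""
    return streams * g711_stream_bps(ptime_ms) / 1e6


# 132 is the "Live Channels" figure reported above.
print(total_mbps(132), "Mbit/s per direction")
```

Even as an upper bound, this is a small fraction of a 1 Gbit link, which fits the impression that raw bandwidth is not the bottleneck here.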

The point about the carrier is also a good hint, because from time to time it seems our provider is least-cost-routing our calls, which produced quality problems in the past. I will try to run a test with another carrier, but first I have to check the logs to see when this error occurred for the first time.
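
For the log check, something like the following Python sketch could find the first occurrence and count the warnings per day. It assumes each log line starts with a timestamp like "Sep 16 08:18:01" (optionally in square brackets), which I have not verified for this Asterisk version; lines without such a prefix are counted under "unknown". The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed timestamp prefix, e.g. "Sep 16 08:18:01" or "[Sep 16 08:18:01]".
TIMESTAMP = re.compile(r"^\[?([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\]?")
MARKER = "already blocked by thread"


def summarize_blocking_warnings(lines):
    """Return (first matching line or None, Counter of warnings per day)."""
    per_day = Counter()
    first = None
    for line in lines:
        if "ast_write" not in line or MARKER not in line:
            continue
        if first is None:
            first = line.rstrip("\n")
        match = TIMESTAMP.match(line)
        day = "{} {}".format(match.group(1), int(match.group(2))) if match else "unknown"
        per_day[day] += 1
    return first, per_day
```

Usage would be along the lines of `with open(path) as f: first, per_day = summarize_blocking_warnings(f)`, where `path` is wherever the Asterisk log files live on the system, rotated files included if the problem is older.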

So, you'd say it should not be a configuration problem (except the point of the timing source) ?

Thank you [for naming those different possibilities] and all the best wishes,
Ron Salvatore


P.S.: I re-checked the load level. Some agents logged off and now I have
System Load: 115 - 11%
Live Channels: 95

but the message still occurs (but not that often)

Posted: Mon Sep 20, 2010 3:48 pm
by williamconley
i would certainly try another carrier (even if only for a few calls) to see if those calls are not susceptible to this issue (rule out carrier, in other words).

thx

Posted: Wed Oct 06, 2010 8:59 am
by ronator
well, my company is moving to another building, and then we change the carrier. otherwise i'd have no possibility to change the carrier, since we have no other :/

thank you

Posted: Wed Oct 06, 2010 9:14 am
by williamconley
if your server has internet, you can try another carrier (not necessarily for production, but at least for a call or three). By carrier I meant the carrier for the calls, not for the internet. You can check the internet reliability with many tools available from many places.

And yet I have bumped into issues similar to that before, but they generally resulted in "dropped calls" all at once and required the replacement of a router.

Does this situation happen to all agents simultaneously or does it "wander" around the room to random individuals?