vicidial.org

Posted: **Thu Aug 03, 2006 2:15 pm**

Experiencing an issue with hotkeys and hangups

When hotkeys are used and the previous customer is not hungup on right away so when the agent gets another call (immediately) the calls overlap. Is there a way to force a hangup any faster?

-enjay

Posted: **Thu Aug 03, 2006 2:23 pm**

What is the loadavg on the server?

The other way to hangup a call that is not hungup is to use the CALLS IN THIS SESSION link at the bottom of vicdial.php and HANGUP the customer channel manually.

Posted: **Thu Aug 03, 2006 2:33 pm**

Gotcha, when we experience this the load is between 4.5-6

Posted: **Thu Aug 03, 2006 2:40 pm**

That'll do it. One of the many things that gets bad when load consistently stays over 200%

Posted: **Thu Aug 03, 2006 2:45 pm**

Wierd that my system load stays so high though my typical operating range is 2.5-5.0

I dont have anything special running I removed the lame process from cron...

here is a little snippet from 'top'

Code: Select all: top - 12:47:02 up 15:28, 3 users, load average: 4.41, 3.52, 3.11 Tasks: 117 total, 8 running, 100 sleeping, 9 stopped, 0 zombie Cpu(s): 10.0% us, 17.1% sy, 9.5% ni, 51.4% id, 11.1% wa, 0.2% hi, 0.8% si Mem: 2074900k total, 1774952k used, 299948k free, 52132k buffers Swap: 0k total, 0k used, 0k free, 1546868k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2748 root -11 0 63508 19m 5140 S 39.4 0.9 34:15.38 asterisk 2962 root 16 0 7016 5436 1872 S 7.9 0.3 12:49.06 AST_update.pl 25272 root 22 2 6224 4604 1836 S 3.3 0.2 0:00.10 AST_send_action 25294 root 22 2 6224 4628 1840 S 3.3 0.2 0:00.10 AST_send_action 25275 root 22 2 6220 4604 1836 S 3.0 0.2 0:00.09 AST_send_action 25246 root 22 2 6220 4600 1836 S 2.6 0.2 0:00.09 AST_send_action 25249 root 22 2 6220 4628 1844 S 2.0 0.2 0:00.06 AST_send_action 25265 root 20 2 6224 4628 1844 S 2.0 0.2 0:00.06 AST_send_action 2954 root 16 0 5968 4296 1820 S 1.3 0.2 2:03.80 AST_VDauto_dial 25256 root 22 2 6220 4596 1836 S 1.3 0.2 0:00.04 AST_send_action 9014 root 17 2 2692 1168 864 S 0.7 0.1 0:06.04 screen 24840 root 16 0 2020 1000 784 R 0.7 0.0 0:00.09 top 2573 root 17 0 4548 1148 740 S 0.3 0.1 0:00.70 crond 9015 root 17 2 7472 5872 1852 S 0.3 0.3 0:57.86 AST_manager_lis 9020 root 17 2 7668 6028 1860 S 0.3 0.3 0:57.66 AST_manager_lis 9026 root 17 2 5596 4008 1832 S 0.3 0.2 0:50.23 AST_manager_sen 1 root 16 0 1748 572 492 S 0.0 0.0 0:01.86 init 2 root RT 0 0 0 0 S 0.0 0.0 0:00.14 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0

Posted: **Thu Aug 03, 2006 3:27 pm**

how many active channels do you have going on that server?

If you are recording all calls the loadavg can be inflated above what it should be as well.

Posted: **Mon Aug 07, 2006 6:26 pm**

Perhaps Im not reading this right but if I have 77% of idle cpu on average why would I have calls overlap it appears that I only have this problem with answering machines and non-solicitation recording hangups.

Code: Select all: ---------- TOTALS, PEAKS and AVERAGES Total Calls in/out on this server: 0 (obviously something broken here :)) Total Off-Hook time on this server (min): Average channels in use for server: 58.7789 Average load for server: 157.4635 Peak load for server: 841 Average USER process cpu percentage: 15.2217 % Average SYSTEM process cpu percentage: 6.7950 % Average IDLE process cpu percentage: 77.9676 %

Posted: **Mon Aug 07, 2006 6:52 pm**

IDLE cpu really doesn't mean anything when it comes to Asterisk and VICIDIAL. it's the average load and the amount of concurrent time that the loadavg spends above 100%, that is an indicator of the capacity of the machine.

Posted: **Mon Aug 07, 2006 8:52 pm**

Should my h lines be at the end of my context does that make any difference at all? I had an instance today with 7 answering machines in a conference at once because it wouldnt hang them up automatically. The load at that time was 2.78..

Posted: **Mon Aug 07, 2006 9:07 pm**

There do seem to be issues with AMD on heavily loaded servers sometimes locking up channels. My suggestion then would be to try not using AMD and see if the load goes down or other problems go away.

Posted: **Wed Aug 09, 2006 12:51 pm**

We arent actually using AMD, but when an agent gets an answering machine and dispositions using hotkeys (or hangup customer) the next call will come in while the answering machine is still running.

There are 2 local and 2 long distance t1s on this server, and only a max of 15 agents. We are recording all calls, but they are being converted to mp3 on another system.

Is 15 agents with recording too much for 1 server?

Posted: **Wed Aug 09, 2006 1:07 pm**

We have done 15 agents on one server recording all calls before. The system was a 3.2GHz P4 with a SCSI RAID 10 and the load averaged around 90-110% most of the time.

The load on the Asterisk server, Web server and DB server all matter when it comes to the slow execution of Asterisk Manager API actions. One of those is your bottleneck.

Posted: **Wed Aug 09, 2006 1:13 pm**

Our server is a 3.0 ghz XEON with 2gb memory RAID 0 15k Scsi Drives

Our DB server is a Dual 3.2ghz XEON with 4gb memory RAID 5 15k Scsi Drives

Posted: **Wed Aug 09, 2006 2:39 pm**

What is the loadavg of each when this happens? (Asterisk/web/DB)

Posted: **Wed Aug 09, 2006 9:55 pm**

VICIDIAL server - bounces between 1.5 and 3.0

Database Server - ~0.5

-enjay

Posted: **Wed Aug 09, 2006 10:36 pm**

What about your web server? is that on another machine?

Posted: **Thu Aug 10, 2006 10:20 am**

The database and web are the same box.

Posted: **Thu Aug 10, 2006 10:29 am**

how many do you get per agent/hour?

Posted: **Thu Aug 10, 2006 10:37 am**

Thats really hard to judge, its happening on almost every answering machine and "this number does not accept solicitation" messages they get. During the day time hours, it can easily be 3/4 or more of the calls they take.

If they get someone on the phone, and that person hangs up, then it accepts that as a hang up and moves on.

Posted: **Thu Aug 10, 2006 11:13 am**

Have you tried running mtop or mytop and watching when this happens to see if any queries are getting stuck on your database server?

Also, you should take a look at your /home/cron/action_full* log to see if there are any issues with the Hangups being delayed or not being accepted.

Posted: **Thu Aug 10, 2006 2:20 pm**

Thanks Matt, I was looking through that log but I am not exactly sure what I am looking for. For example, I see:

|
Action: Hangup
Channel: Local/78600065@default-3ddb,2

DONE execute time: 5 seconds
2006-08-09 10:44:42|618404

Does that mean it took 5 seconds to actually hang up?

Posted: **Thu Aug 10, 2006 4:22 pm**

nope, the process hangs around for 5 seconds to make sure it was executed. what you need to look for is the Response:

Response: Success
Message: Originate successfully queued

for the specific action ID that was sent. in the action_full.2006-08-10 file you should see the Response after the Action.

Posted: **Thu Aug 10, 2006 4:48 pm**

We see in excess of 2000+

2006-08-09 10:20:11|1|616335
|Response: Error
Message: No channel specified

While there are Significantly more Successes and Goodbye's there are 2000+ errors.

-enjay

Posted: **Thu Aug 10, 2006 5:00 pm**

Well, there's a start, look for the Action that corresponds to the 616335 manid in that file to see what was sent.

Posted: **Thu Aug 10, 2006 5:03 pm**

Yup

Code: Select all: 2006-08-09 10:20:11|0|616335 | Action: Hangup Channel: 2006-08-09 10:20:11|1|616335 |Response: Error Message: No channel specified 2006-08-09 10:20:11|2|616335 |Response: Goodbye Message: Thanks for all the fish.

Why would it fail to find the proper channel associated with it I assume it extracts this info from the live_channels table?

-enjay

EDIT: However today I see this in one of them..

Code: Select all: 2006-08-10 10:02:36|0|642128 | Action: Hangup Channel: Zap/7-1 2006-08-10 10:02:36|1|642128 |Response: Error Message: No such channel 2006-08-10 10:02:36|2|642128 |Response: Goodbye Message: Thanks for all the fish.

odd...

Posted: **Thu Aug 10, 2006 8:14 pm**

If you can, use the manid to do a select * from vicidial_manager where manid='manid';

you have to do it fairly quick since they are deleted after a half hour or so, but that may give some insight into what's going on.

Posted: **Fri Aug 11, 2006 10:49 am**

deleted from the database? I have 669498 entries in vicidial_manager does that seem abnormally high (assuming they should be deleted hourly)

per my example yesterday

Code: Select all: +--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | man_id | uniqueid | entry_date | status | response | server_ip | channel | action | callerid | cmd_line_b | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k | +--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | 616335 | 0.0000000 | 2006-08-09 10:20:01 | DEAD | N | 192.168.0.5 | Privilege: call,all | Hangup | RH1234591155144001 | Channel: | | | | | | | | | | +--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ 1 row in set (0.00 sec)

-enjay

Posted: **Fri Aug 11, 2006 11:46 am**

What version of Asterisk are you using?

What do you have in the asterisk version field of the servers table?

Posted: **Fri Aug 11, 2006 11:55 am**

Everything is now 1.2.10

servers field shows old version 1.2.7 and 1.2.6

Posted: **Fri Aug 11, 2006 12:24 pm**

Somehow you are getting "Privilege: call,all" in the channel field. That is either a misconfiguration or a problem with the AST_manager_listen script.

Can you post the corresponding output from the listen.2006-08-XX file so I can check it?

Posted: **Fri Aug 11, 2006 12:47 pm**

I could not correlate anything in the listen file.. based on todays errors I've seen heres an output for:

MANID: 672016

Code: Select all: 2006-08-11 10:39:47|0|672016 | Action: Hangup Channel: Local/78600061@default-fb19,2

Code: Select all: 2006-08-11 10:39:47|1|672016 |Response: Error Message: No such channel

Code: Select all: +--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | man_id | uniqueid | entry_date | status | response | server_ip | channel | action | callerid | cmd_line_b | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k | +--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | 672016 | 0.0000000 | 2006-08-11 10:39:41 | SENT | N | 192.168.0.5 | | Hangup | RH1234511553179810 | Channel: Local/78600061@default-fb19,2 | | | | | | | | | | +--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+

MANID: 672012

Code: Select all: 2006-08-11 10:39:47|0|672012 | Action: Hangup Channel: Zap/8-1

Code: Select all: 2006-08-11 10:39:47|1|672012 |Response: Error Message: No such channel

Code: Select all: +--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | man_id | uniqueid | entry_date | status | response | server_ip | channel | action | callerid | cmd_line_b | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k | +--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | 672012 | 0.0000000 | 2006-08-11 10:39:38 | DEAD | N | 192.168.0.5 | Privilege: call,all | Hangup | HLvdcW11553179743213 | Channel: Zap/8-1 | | | | | | | | | | +--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+

I still cannot find correlatting entries in the listen file..

-enjay

Posted: **Fri Aug 11, 2006 1:51 pm**

It's not always easy but you have to look around the same time period in the listen file(before the hangup obviously) for the channel that you are trying to hangup and try to trace the progress of that call back to it's beginning. You may have to match up the callerIDname in some cases and the uniqueid of the call leg in others. Manager output is not the easiest thing to parse, but it's all we've got.

Also could you post a large chunk of your listen file to a web or FTP site somewhere for me to download and look at? (you can email me off-list with the address if you like)

Posted: **Fri Aug 11, 2006 1:58 pm**

sent via pm

Posted: **Fri Aug 11, 2006 2:16 pm**

looks like there's lots of new Manager output in 1.2.10 that I haven't had problems with on my systems. I'm off for the weekend, but will dive into this on Monday.

It's just an issue of the listen script not catching the output properly so it shouldn't take more than a couple days.

Posted: **Fri Aug 11, 2006 4:33 pm**

rock on! Talk to you monday have a great weekend..

Posted: **Mon Aug 14, 2006 10:23 am**

Monday's back (unfortunately) This is pretty much the last thing I have on my list to get resolved to close my project.

Posted: **Mon Aug 14, 2006 10:29 am**

I looked at the output and can't figure out why this is happening to you. I will comb over the overlapping sections and post if I find any issues shortly.

Posted: **Mon Aug 14, 2006 10:34 am**

If you need any output or system access from me let me know..

Posted: **Mon Aug 14, 2006 10:48 am**

Does your AST_update.pl script ever die around the time these things happen?

You can tell by monitoring the screen -r PID of the ASTupdate screen throughout the day, if it is the same number through the whole day then it is not dying, if it is a different number then it does.

Posted: **Mon Aug 14, 2006 10:50 am**

Stays up

Code: Select all: root 2962 1.5 0.2 7016 5436 pts/4 Ss+ Aug02 252:56 /usr/bin/perl /home/cron/AST_update.pl

EDIT: AST_manager_listen is running twice is that abnormal? If so could that cause this behavior

Code: Select all: root 2959 0.0 0.0 2696 1228 ? Ss Aug02 0:34 /usr/bin/SCREEN -d -m -S ASTupdate /home/cron/AST_update.pl root 2962 1.5 0.2 7016 5436 pts/4 Ss+ Aug02 252:56 /usr/bin/perl /home/cron/AST_update.pl root 9084 0.0 0.0 2692 1164 ? Ss Aug03 0:55 /usr/bin/SCREEN -d -m -S ASTVDremote /home/cron/AST_VDremote_agents.pl root 9085 0.0 0.1 5708 4088 pts/6 Ss+ Aug03 8:42 /usr/bin/perl /home/cron/AST_VDremote_agents.pl root 20281 0.0 0.0 2692 1168 ? Ss Aug07 0:31 /usr/bin/SCREEN -d -m -S ASTVDauto /home/cron/AST_VDauto_dial.pl root 20282 0.2 0.2 5968 4304 pts/5 Ss+ Aug07 26:57 /usr/bin/perl /home/cron/AST_VDauto_dial.pl root 20134 0.0 0.0 2696 1184 ? SNs 05:33 0:00 /usr/bin/SCREEN -d -m -S ASTlisten /home/cron/AST_manager_listen.pl root 20135 0.0 0.2 6648 4972 pts/1 SNs+ 05:33 0:01 /usr/bin/perl /home/cron/AST_manager_listen.pl root 20140 0.0 0.0 2692 1168 ? SNs 05:33 0:00 /usr/bin/SCREEN -d -m -S ASTlisten /home/cron/AST_manager_listen.pl root 20141 0.0 0.2 6652 4952 pts/2 SNs+ 05:33 0:01 /usr/bin/perl /home/cron/AST_manager_listen.pl root 20145 0.0 0.0 2692 1164 ? SNs 05:33 0:00 /usr/bin/SCREEN -d -m -S ASTsend /home/cron/AST_manager_send.pl root 20146 0.0 0.1 5592 3940 pts/3 SNs+ 05:33 0:04 /usr/bin/perl /home/cron/AST_manager_send.pl root 13416 0.5 0.1 5444 3692 ? Ss 08:50 0:00 /usr/bin/perl /home/cron/AST_manager_kill_hung_congested.pl

another thing that has changed from the original is the startup for asterisk

Code: Select all: #screen -d -m -S asterisk -L /usr/sbin/asterisk -vvvvvvvvvvvvvvvvvvvvvvvvgcp screen -d -m -S asterisk -L /usr/sbin/asterisk -gcp

-enjay

vicidial.org

Hotkeys & Hangup

Hotkeys & Hangup