Page 1 of 2

Hotkeys & Hangup

PostPosted: Thu Aug 03, 2006 2:15 pm
by enjay
Experiencing an issue with hotkeys and hangups

When hotkeys are used and the previous customer is not hungup on right away so when the agent gets another call (immediately) the calls overlap. Is there a way to force a hangup any faster?

-enjay

PostPosted: Thu Aug 03, 2006 2:23 pm
by mflorell
What is the loadavg on the server?

The other way to hangup a call that is not hungup is to use the CALLS IN THIS SESSION link at the bottom of vicdial.php and HANGUP the customer channel manually.

PostPosted: Thu Aug 03, 2006 2:33 pm
by enjay
Gotcha, when we experience this the load is between 4.5-6

PostPosted: Thu Aug 03, 2006 2:40 pm
by mflorell
That'll do it. One of the many things that gets bad when load consistently stays over 200%

PostPosted: Thu Aug 03, 2006 2:45 pm
by enjay
Wierd that my system load stays so high though my typical operating range is 2.5-5.0

I dont have anything special running I removed the lame process from cron...


here is a little snippet from 'top'

Code: Select all
top - 12:47:02 up 15:28,  3 users,  load average: 4.41, 3.52, 3.11
Tasks: 117 total,   8 running, 100 sleeping,   9 stopped,   0 zombie
Cpu(s): 10.0% us, 17.1% sy,  9.5% ni, 51.4% id, 11.1% wa,  0.2% hi,  0.8% si
Mem:   2074900k total,  1774952k used,   299948k free,    52132k buffers
Swap:        0k total,        0k used,        0k free,  1546868k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2748 root     -11   0 63508  19m 5140 S 39.4  0.9  34:15.38 asterisk
 2962 root      16   0  7016 5436 1872 S  7.9  0.3  12:49.06 AST_update.pl
25272 root      22   2  6224 4604 1836 S  3.3  0.2   0:00.10 AST_send_action
25294 root      22   2  6224 4628 1840 S  3.3  0.2   0:00.10 AST_send_action
25275 root      22   2  6220 4604 1836 S  3.0  0.2   0:00.09 AST_send_action
25246 root      22   2  6220 4600 1836 S  2.6  0.2   0:00.09 AST_send_action
25249 root      22   2  6220 4628 1844 S  2.0  0.2   0:00.06 AST_send_action
25265 root      20   2  6224 4628 1844 S  2.0  0.2   0:00.06 AST_send_action
 2954 root      16   0  5968 4296 1820 S  1.3  0.2   2:03.80 AST_VDauto_dial
25256 root      22   2  6220 4596 1836 S  1.3  0.2   0:00.04 AST_send_action
 9014 root      17   2  2692 1168  864 S  0.7  0.1   0:06.04 screen
24840 root      16   0  2020 1000  784 R  0.7  0.0   0:00.09 top
 2573 root      17   0  4548 1148  740 S  0.3  0.1   0:00.70 crond
 9015 root      17   2  7472 5872 1852 S  0.3  0.3   0:57.86 AST_manager_lis
 9020 root      17   2  7668 6028 1860 S  0.3  0.3   0:57.66 AST_manager_lis
 9026 root      17   2  5596 4008 1832 S  0.3  0.2   0:50.23 AST_manager_sen
    1 root      16   0  1748  572  492 S  0.0  0.0   0:01.86 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.14 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0

PostPosted: Thu Aug 03, 2006 3:27 pm
by mflorell
how many active channels do you have going on that server?

If you are recording all calls the loadavg can be inflated above what it should be as well.

PostPosted: Mon Aug 07, 2006 6:26 pm
by enjay
Perhaps Im not reading this right but if I have 77% of idle cpu on average why would I have calls overlap it appears that I only have this problem with answering machines and non-solicitation recording hangups.

Code: Select all
---------- TOTALS, PEAKS and AVERAGES
Total Calls in/out on this server:                 0 (obviously something broken here :))
Total Off-Hook time on this server (min):           
Average channels in use for server:          58.7789
Average load for server:                    157.4635
Peak load for server:                            841
Average USER process cpu percentage:         15.2217 %
Average SYSTEM process cpu percentage:        6.7950 %
Average IDLE process cpu percentage:         77.9676 %

PostPosted: Mon Aug 07, 2006 6:52 pm
by mflorell
IDLE cpu really doesn't mean anything when it comes to Asterisk and VICIDIAL. it's the average load and the amount of concurrent time that the loadavg spends above 100%, that is an indicator of the capacity of the machine.

PostPosted: Mon Aug 07, 2006 8:52 pm
by enjay
Should my h lines be at the end of my context does that make any difference at all? I had an instance today with 7 answering machines in a conference at once because it wouldnt hang them up automatically. The load at that time was 2.78..

PostPosted: Mon Aug 07, 2006 9:07 pm
by mflorell
There do seem to be issues with AMD on heavily loaded servers sometimes locking up channels. My suggestion then would be to try not using AMD and see if the load goes down or other problems go away.

PostPosted: Wed Aug 09, 2006 12:51 pm
by lkeller
We arent actually using AMD, but when an agent gets an answering machine and dispositions using hotkeys (or hangup customer) the next call will come in while the answering machine is still running.

There are 2 local and 2 long distance t1s on this server, and only a max of 15 agents. We are recording all calls, but they are being converted to mp3 on another system.

Is 15 agents with recording too much for 1 server?

PostPosted: Wed Aug 09, 2006 1:07 pm
by mflorell
We have done 15 agents on one server recording all calls before. The system was a 3.2GHz P4 with a SCSI RAID 10 and the load averaged around 90-110% most of the time.

The load on the Asterisk server, Web server and DB server all matter when it comes to the slow execution of Asterisk Manager API actions. One of those is your bottleneck.

PostPosted: Wed Aug 09, 2006 1:13 pm
by enjay
Our server is a 3.0 ghz XEON with 2gb memory RAID 0 15k Scsi Drives

Our DB server is a Dual 3.2ghz XEON with 4gb memory RAID 5 15k Scsi Drives

PostPosted: Wed Aug 09, 2006 2:39 pm
by mflorell
What is the loadavg of each when this happens? (Asterisk/web/DB)

PostPosted: Wed Aug 09, 2006 9:55 pm
by enjay
VICIDIAL server - bounces between 1.5 and 3.0

Database Server - ~0.5

-enjay

PostPosted: Wed Aug 09, 2006 10:36 pm
by mflorell
What about your web server? is that on another machine?

PostPosted: Thu Aug 10, 2006 10:20 am
by lkeller
The database and web are the same box.

PostPosted: Thu Aug 10, 2006 10:29 am
by mflorell
how many do you get per agent/hour?

PostPosted: Thu Aug 10, 2006 10:37 am
by lkeller
Thats really hard to judge, its happening on almost every answering machine and "this number does not accept solicitation" messages they get. During the day time hours, it can easily be 3/4 or more of the calls they take.

If they get someone on the phone, and that person hangs up, then it accepts that as a hang up and moves on.

PostPosted: Thu Aug 10, 2006 11:13 am
by mflorell
Have you tried running mtop or mytop and watching when this happens to see if any queries are getting stuck on your database server?

Also, you should take a look at your /home/cron/action_full* log to see if there are any issues with the Hangups being delayed or not being accepted.

PostPosted: Thu Aug 10, 2006 2:20 pm
by lkeller
Thanks Matt, I was looking through that log but I am not exactly sure what I am looking for. For example, I see:

|
Action: Hangup
Channel: Local/78600065@default-3ddb,2

DONE execute time: 5 seconds
2006-08-09 10:44:42|618404


Does that mean it took 5 seconds to actually hang up?

PostPosted: Thu Aug 10, 2006 4:22 pm
by mflorell
nope, the process hangs around for 5 seconds to make sure it was executed. what you need to look for is the Response:

Response: Success
Message: Originate successfully queued

for the specific action ID that was sent. in the action_full.2006-08-10 file you should see the Response after the Action.

PostPosted: Thu Aug 10, 2006 4:48 pm
by enjay
We see in excess of 2000+

2006-08-09 10:20:11|1|616335
|Response: Error
Message: No channel specified


While there are Significantly more Successes and Goodbye's there are 2000+ errors.

-enjay

PostPosted: Thu Aug 10, 2006 5:00 pm
by mflorell
Well, there's a start, look for the Action that corresponds to the 616335 manid in that file to see what was sent.

PostPosted: Thu Aug 10, 2006 5:03 pm
by enjay
Yup

Code: Select all
2006-08-09 10:20:11|0|616335
|
Action: Hangup
Channel:


2006-08-09 10:20:11|1|616335
|Response: Error
Message: No channel specified

2006-08-09 10:20:11|2|616335
|Response: Goodbye
Message: Thanks for all the fish.


Why would it fail to find the proper channel associated with it I assume it extracts this info from the live_channels table?

-enjay

EDIT: However today I see this in one of them..

Code: Select all
2006-08-10 10:02:36|0|642128
|
Action: Hangup
Channel: Zap/7-1

2006-08-10 10:02:36|1|642128
|Response: Error
Message: No such channel

2006-08-10 10:02:36|2|642128
|Response: Goodbye
Message: Thanks for all the fish.


odd...

PostPosted: Thu Aug 10, 2006 8:14 pm
by mflorell
If you can, use the manid to do a select * from vicidial_manager where manid='manid';

you have to do it fairly quick since they are deleted after a half hour or so, but that may give some insight into what's going on.

PostPosted: Fri Aug 11, 2006 10:49 am
by enjay
deleted from the database? I have 669498 entries in vicidial_manager does that seem abnormally high (assuming they should be deleted hourly)

per my example yesterday

Code: Select all
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| man_id | uniqueid  | entry_date          | status | response | server_ip   | channel             | action | callerid           | cmd_line_b | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| 616335 | 0.0000000 | 2006-08-09 10:20:01 | DEAD   | N        | 192.168.0.5 | Privilege: call,all | Hangup | RH1234591155144001 | Channel:   |            |            |            |            |            |            |            |            |            |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
1 row in set (0.00 sec)


-enjay

PostPosted: Fri Aug 11, 2006 11:46 am
by mflorell
What version of Asterisk are you using?

What do you have in the asterisk version field of the servers table?

PostPosted: Fri Aug 11, 2006 11:55 am
by enjay
Everything is now 1.2.10

servers field shows old version 1.2.7 and 1.2.6

PostPosted: Fri Aug 11, 2006 12:24 pm
by mflorell
Somehow you are getting "Privilege: call,all" in the channel field. That is either a misconfiguration or a problem with the AST_manager_listen script.

Can you post the corresponding output from the listen.2006-08-XX file so I can check it?

PostPosted: Fri Aug 11, 2006 12:47 pm
by enjay
I could not correlate anything in the listen file.. based on todays errors I've seen heres an output for:

MANID: 672016

Code: Select all
2006-08-11 10:39:47|0|672016
|
Action: Hangup
Channel: Local/78600061@default-fb19,2


Code: Select all
2006-08-11 10:39:47|1|672016
|Response: Error
Message: No such channel


Code: Select all
+--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| man_id | uniqueid  | entry_date          | status | response | server_ip   | channel | action | callerid           | cmd_line_b                             | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k |
+--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| 672016 | 0.0000000 | 2006-08-11 10:39:41 | SENT   | N        | 192.168.0.5 |         | Hangup | RH1234511553179810 | Channel: Local/78600061@default-fb19,2 |            |            |            |            |            |            |            |            |            |
+--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+





MANID: 672012

Code: Select all
2006-08-11 10:39:47|0|672012
|
Action: Hangup
Channel: Zap/8-1


Code: Select all
2006-08-11 10:39:47|1|672012
|Response: Error
Message: No such channel


Code: Select all
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| man_id | uniqueid  | entry_date          | status | response | server_ip   | channel             | action | callerid             | cmd_line_b       | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| 672012 | 0.0000000 | 2006-08-11 10:39:38 | DEAD   | N        | 192.168.0.5 | Privilege: call,all | Hangup | HLvdcW11553179743213 | Channel: Zap/8-1 |            |            |            |            |            |            |            |            |            |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+


I still cannot find correlatting entries in the listen file..

-enjay

PostPosted: Fri Aug 11, 2006 1:51 pm
by mflorell
It's not always easy but you have to look around the same time period in the listen file(before the hangup obviously) for the channel that you are trying to hangup and try to trace the progress of that call back to it's beginning. You may have to match up the callerIDname in some cases and the uniqueid of the call leg in others. Manager output is not the easiest thing to parse, but it's all we've got.

Also could you post a large chunk of your listen file to a web or FTP site somewhere for me to download and look at? (you can email me off-list with the address if you like)

PostPosted: Fri Aug 11, 2006 1:58 pm
by enjay
sent via pm

PostPosted: Fri Aug 11, 2006 2:16 pm
by mflorell
looks like there's lots of new Manager output in 1.2.10 that I haven't had problems with on my systems. I'm off for the weekend, but will dive into this on Monday.

It's just an issue of the listen script not catching the output properly so it shouldn't take more than a couple days.

PostPosted: Fri Aug 11, 2006 4:33 pm
by enjay
rock on! Talk to you monday have a great weekend..

PostPosted: Mon Aug 14, 2006 10:23 am
by enjay
Monday's back (unfortunately) This is pretty much the last thing I have on my list to get resolved to close my project.

PostPosted: Mon Aug 14, 2006 10:29 am
by mflorell
I looked at the output and can't figure out why this is happening to you. I will comb over the overlapping sections and post if I find any issues shortly.

PostPosted: Mon Aug 14, 2006 10:34 am
by enjay
If you need any output or system access from me let me know..

PostPosted: Mon Aug 14, 2006 10:48 am
by mflorell
Does your AST_update.pl script ever die around the time these things happen?

You can tell by monitoring the screen -r PID of the ASTupdate screen throughout the day, if it is the same number through the whole day then it is not dying, if it is a different number then it does.

PostPosted: Mon Aug 14, 2006 10:50 am
by enjay
Stays up

Code: Select all
root      2962  1.5  0.2   7016  5436 pts/4    Ss+  Aug02 252:56 /usr/bin/perl /home/cron/AST_update.pl


EDIT: AST_manager_listen is running twice is that abnormal? If so could that cause this behavior

Code: Select all
root      2959  0.0  0.0   2696  1228 ?        Ss   Aug02   0:34 /usr/bin/SCREEN -d -m -S ASTupdate /home/cron/AST_update.pl
root      2962  1.5  0.2   7016  5436 pts/4    Ss+  Aug02 252:56 /usr/bin/perl /home/cron/AST_update.pl
root      9084  0.0  0.0   2692  1164 ?        Ss   Aug03   0:55 /usr/bin/SCREEN -d -m -S ASTVDremote /home/cron/AST_VDremote_agents.pl
root      9085  0.0  0.1   5708  4088 pts/6    Ss+  Aug03   8:42 /usr/bin/perl /home/cron/AST_VDremote_agents.pl
root     20281  0.0  0.0   2692  1168 ?        Ss   Aug07   0:31 /usr/bin/SCREEN -d -m -S ASTVDauto /home/cron/AST_VDauto_dial.pl
root     20282  0.2  0.2   5968  4304 pts/5    Ss+  Aug07  26:57 /usr/bin/perl /home/cron/AST_VDauto_dial.pl
root     20134  0.0  0.0   2696  1184 ?        SNs  05:33   0:00 /usr/bin/SCREEN -d -m -S ASTlisten /home/cron/AST_manager_listen.pl
root     20135  0.0  0.2   6648  4972 pts/1    SNs+ 05:33   0:01 /usr/bin/perl /home/cron/AST_manager_listen.pl
root     20140  0.0  0.0   2692  1168 ?        SNs  05:33   0:00 /usr/bin/SCREEN -d -m -S ASTlisten /home/cron/AST_manager_listen.pl
root     20141  0.0  0.2   6652  4952 pts/2    SNs+ 05:33   0:01 /usr/bin/perl /home/cron/AST_manager_listen.pl
root     20145  0.0  0.0   2692  1164 ?        SNs  05:33   0:00 /usr/bin/SCREEN -d -m -S ASTsend /home/cron/AST_manager_send.pl
root     20146  0.0  0.1   5592  3940 pts/3    SNs+ 05:33   0:04 /usr/bin/perl /home/cron/AST_manager_send.pl
root     13416  0.5  0.1   5444  3692 ?        Ss   08:50   0:00 /usr/bin/perl /home/cron/AST_manager_kill_hung_congested.pl



another thing that has changed from the original is the startup for asterisk

Code: Select all
#screen -d -m -S asterisk -L /usr/sbin/asterisk -vvvvvvvvvvvvvvvvvvvvvvvvgcp
screen -d -m -S asterisk -L /usr/sbin/asterisk -gcp


-enjay