Page 1 of 2
Hotkeys & Hangup
Posted:
Thu Aug 03, 2006 2:15 pm
by enjay
Experiencing an issue with hotkeys and hangups
When hotkeys are used and the previous customer is not hungup on right away so when the agent gets another call (immediately) the calls overlap. Is there a way to force a hangup any faster?
-enjay
Posted:
Thu Aug 03, 2006 2:23 pm
by mflorell
What is the loadavg on the server?
The other way to hangup a call that is not hungup is to use the CALLS IN THIS SESSION link at the bottom of vicdial.php and HANGUP the customer channel manually.
Posted:
Thu Aug 03, 2006 2:33 pm
by enjay
Gotcha, when we experience this the load is between 4.5-6
Posted:
Thu Aug 03, 2006 2:40 pm
by mflorell
That'll do it. One of the many things that gets bad when load consistently stays over 200%
Posted:
Thu Aug 03, 2006 2:45 pm
by enjay
Wierd that my system load stays so high though my typical operating range is 2.5-5.0
I dont have anything special running I removed the lame process from cron...
here is a little snippet from 'top'
- Code: Select all
top - 12:47:02 up 15:28, 3 users, load average: 4.41, 3.52, 3.11
Tasks: 117 total, 8 running, 100 sleeping, 9 stopped, 0 zombie
Cpu(s): 10.0% us, 17.1% sy, 9.5% ni, 51.4% id, 11.1% wa, 0.2% hi, 0.8% si
Mem: 2074900k total, 1774952k used, 299948k free, 52132k buffers
Swap: 0k total, 0k used, 0k free, 1546868k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2748 root -11 0 63508 19m 5140 S 39.4 0.9 34:15.38 asterisk
2962 root 16 0 7016 5436 1872 S 7.9 0.3 12:49.06 AST_update.pl
25272 root 22 2 6224 4604 1836 S 3.3 0.2 0:00.10 AST_send_action
25294 root 22 2 6224 4628 1840 S 3.3 0.2 0:00.10 AST_send_action
25275 root 22 2 6220 4604 1836 S 3.0 0.2 0:00.09 AST_send_action
25246 root 22 2 6220 4600 1836 S 2.6 0.2 0:00.09 AST_send_action
25249 root 22 2 6220 4628 1844 S 2.0 0.2 0:00.06 AST_send_action
25265 root 20 2 6224 4628 1844 S 2.0 0.2 0:00.06 AST_send_action
2954 root 16 0 5968 4296 1820 S 1.3 0.2 2:03.80 AST_VDauto_dial
25256 root 22 2 6220 4596 1836 S 1.3 0.2 0:00.04 AST_send_action
9014 root 17 2 2692 1168 864 S 0.7 0.1 0:06.04 screen
24840 root 16 0 2020 1000 784 R 0.7 0.0 0:00.09 top
2573 root 17 0 4548 1148 740 S 0.3 0.1 0:00.70 crond
9015 root 17 2 7472 5872 1852 S 0.3 0.3 0:57.86 AST_manager_lis
9020 root 17 2 7668 6028 1860 S 0.3 0.3 0:57.66 AST_manager_lis
9026 root 17 2 5596 4008 1832 S 0.3 0.2 0:50.23 AST_manager_sen
1 root 16 0 1748 572 492 S 0.0 0.0 0:01.86 init
2 root RT 0 0 0 0 S 0.0 0.0 0:00.14 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
Posted:
Thu Aug 03, 2006 3:27 pm
by mflorell
how many active channels do you have going on that server?
If you are recording all calls the loadavg can be inflated above what it should be as well.
Posted:
Mon Aug 07, 2006 6:26 pm
by enjay
Perhaps Im not reading this right but if I have 77% of idle cpu on average why would I have calls overlap it appears that I only have this problem with answering machines and non-solicitation recording hangups.
- Code: Select all
---------- TOTALS, PEAKS and AVERAGES
Total Calls in/out on this server: 0 (obviously something broken here :))
Total Off-Hook time on this server (min):
Average channels in use for server: 58.7789
Average load for server: 157.4635
Peak load for server: 841
Average USER process cpu percentage: 15.2217 %
Average SYSTEM process cpu percentage: 6.7950 %
Average IDLE process cpu percentage: 77.9676 %
Posted:
Mon Aug 07, 2006 6:52 pm
by mflorell
IDLE cpu really doesn't mean anything when it comes to Asterisk and VICIDIAL. it's the average load and the amount of concurrent time that the loadavg spends above 100%, that is an indicator of the capacity of the machine.
Posted:
Mon Aug 07, 2006 8:52 pm
by enjay
Should my h lines be at the end of my context does that make any difference at all? I had an instance today with 7 answering machines in a conference at once because it wouldnt hang them up automatically. The load at that time was 2.78..
Posted:
Mon Aug 07, 2006 9:07 pm
by mflorell
There do seem to be issues with AMD on heavily loaded servers sometimes locking up channels. My suggestion then would be to try not using AMD and see if the load goes down or other problems go away.
Posted:
Wed Aug 09, 2006 12:51 pm
by lkeller
We arent actually using AMD, but when an agent gets an answering machine and dispositions using hotkeys (or hangup customer) the next call will come in while the answering machine is still running.
There are 2 local and 2 long distance t1s on this server, and only a max of 15 agents. We are recording all calls, but they are being converted to mp3 on another system.
Is 15 agents with recording too much for 1 server?
Posted:
Wed Aug 09, 2006 1:07 pm
by mflorell
We have done 15 agents on one server recording all calls before. The system was a 3.2GHz P4 with a SCSI RAID 10 and the load averaged around 90-110% most of the time.
The load on the Asterisk server, Web server and DB server all matter when it comes to the slow execution of Asterisk Manager API actions. One of those is your bottleneck.
Posted:
Wed Aug 09, 2006 1:13 pm
by enjay
Our server is a 3.0 ghz XEON with 2gb memory RAID 0 15k Scsi Drives
Our DB server is a Dual 3.2ghz XEON with 4gb memory RAID 5 15k Scsi Drives
Posted:
Wed Aug 09, 2006 2:39 pm
by mflorell
What is the loadavg of each when this happens? (Asterisk/web/DB)
Posted:
Wed Aug 09, 2006 9:55 pm
by enjay
VICIDIAL server - bounces between 1.5 and 3.0
Database Server - ~0.5
-enjay
Posted:
Wed Aug 09, 2006 10:36 pm
by mflorell
What about your web server? is that on another machine?
Posted:
Thu Aug 10, 2006 10:20 am
by lkeller
The database and web are the same box.
Posted:
Thu Aug 10, 2006 10:29 am
by mflorell
how many do you get per agent/hour?
Posted:
Thu Aug 10, 2006 10:37 am
by lkeller
Thats really hard to judge, its happening on almost every answering machine and "this number does not accept solicitation" messages they get. During the day time hours, it can easily be 3/4 or more of the calls they take.
If they get someone on the phone, and that person hangs up, then it accepts that as a hang up and moves on.
Posted:
Thu Aug 10, 2006 11:13 am
by mflorell
Have you tried running mtop or mytop and watching when this happens to see if any queries are getting stuck on your database server?
Also, you should take a look at your /home/cron/action_full* log to see if there are any issues with the Hangups being delayed or not being accepted.
Posted:
Thu Aug 10, 2006 2:20 pm
by lkeller
Thanks Matt, I was looking through that log but I am not exactly sure what I am looking for. For example, I see:
|
Action: Hangup
Channel: Local/78600065@default-3ddb,2
DONE execute time: 5 seconds
2006-08-09 10:44:42|618404
Does that mean it took 5 seconds to actually hang up?
Posted:
Thu Aug 10, 2006 4:22 pm
by mflorell
nope, the process hangs around for 5 seconds to make sure it was executed. what you need to look for is the Response:
Response: Success
Message: Originate successfully queued
for the specific action ID that was sent. in the action_full.2006-08-10 file you should see the Response after the Action.
Posted:
Thu Aug 10, 2006 4:48 pm
by enjay
We see in excess of 2000+
2006-08-09 10:20:11|1|616335
|Response: Error
Message: No channel specified
While there are Significantly more Successes and Goodbye's there are 2000+ errors.
-enjay
Posted:
Thu Aug 10, 2006 5:00 pm
by mflorell
Well, there's a start, look for the Action that corresponds to the 616335 manid in that file to see what was sent.
Posted:
Thu Aug 10, 2006 5:03 pm
by enjay
Yup
- Code: Select all
2006-08-09 10:20:11|0|616335
|
Action: Hangup
Channel:
2006-08-09 10:20:11|1|616335
|Response: Error
Message: No channel specified
2006-08-09 10:20:11|2|616335
|Response: Goodbye
Message: Thanks for all the fish.
Why would it fail to find the proper channel associated with it I assume it extracts this info from the live_channels table?
-enjay
EDIT: However today I see this in one of them..
- Code: Select all
2006-08-10 10:02:36|0|642128
|
Action: Hangup
Channel: Zap/7-1
2006-08-10 10:02:36|1|642128
|Response: Error
Message: No such channel
2006-08-10 10:02:36|2|642128
|Response: Goodbye
Message: Thanks for all the fish.
odd...
Posted:
Thu Aug 10, 2006 8:14 pm
by mflorell
If you can, use the manid to do a select * from vicidial_manager where manid='manid';
you have to do it fairly quick since they are deleted after a half hour or so, but that may give some insight into what's going on.
Posted:
Fri Aug 11, 2006 10:49 am
by enjay
deleted from the database? I have 669498 entries in vicidial_manager does that seem abnormally high (assuming they should be deleted hourly)
per my example yesterday
- Code: Select all
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| man_id | uniqueid | entry_date | status | response | server_ip | channel | action | callerid | cmd_line_b | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| 616335 | 0.0000000 | 2006-08-09 10:20:01 | DEAD | N | 192.168.0.5 | Privilege: call,all | Hangup | RH1234591155144001 | Channel: | | | | | | | | | |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+--------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
1 row in set (0.00 sec)
-enjay
Posted:
Fri Aug 11, 2006 11:46 am
by mflorell
What version of Asterisk are you using?
What do you have in the asterisk version field of the servers table?
Posted:
Fri Aug 11, 2006 11:55 am
by enjay
Everything is now 1.2.10
servers field shows old version 1.2.7 and 1.2.6
Posted:
Fri Aug 11, 2006 12:24 pm
by mflorell
Somehow you are getting "Privilege: call,all" in the channel field. That is either a misconfiguration or a problem with the AST_manager_listen script.
Can you post the corresponding output from the listen.2006-08-XX file so I can check it?
Posted:
Fri Aug 11, 2006 12:47 pm
by enjay
I could not correlate anything in the listen file.. based on todays errors I've seen heres an output for:
MANID: 672016
- Code: Select all
2006-08-11 10:39:47|0|672016
|
Action: Hangup
Channel: Local/78600061@default-fb19,2
- Code: Select all
2006-08-11 10:39:47|1|672016
|Response: Error
Message: No such channel
- Code: Select all
+--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| man_id | uniqueid | entry_date | status | response | server_ip | channel | action | callerid | cmd_line_b | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k |
+--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| 672016 | 0.0000000 | 2006-08-11 10:39:41 | SENT | N | 192.168.0.5 | | Hangup | RH1234511553179810 | Channel: Local/78600061@default-fb19,2 | | | | | | | | | |
+--------+-----------+---------------------+--------+----------+-------------+---------+--------+--------------------+----------------------------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
MANID: 672012
- Code: Select all
2006-08-11 10:39:47|0|672012
|
Action: Hangup
Channel: Zap/8-1
- Code: Select all
2006-08-11 10:39:47|1|672012
|Response: Error
Message: No such channel
- Code: Select all
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| man_id | uniqueid | entry_date | status | response | server_ip | channel | action | callerid | cmd_line_b | cmd_line_c | cmd_line_d | cmd_line_e | cmd_line_f | cmd_line_g | cmd_line_h | cmd_line_i | cmd_line_j | cmd_line_k |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| 672012 | 0.0000000 | 2006-08-11 10:39:38 | DEAD | N | 192.168.0.5 | Privilege: call,all | Hangup | HLvdcW11553179743213 | Channel: Zap/8-1 | | | | | | | | | |
+--------+-----------+---------------------+--------+----------+-------------+---------------------+--------+----------------------+------------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
I still cannot find correlatting entries in the listen file..
-enjay
Posted:
Fri Aug 11, 2006 1:51 pm
by mflorell
It's not always easy but you have to look around the same time period in the listen file(before the hangup obviously) for the channel that you are trying to hangup and try to trace the progress of that call back to it's beginning. You may have to match up the callerIDname in some cases and the uniqueid of the call leg in others. Manager output is not the easiest thing to parse, but it's all we've got.
Also could you post a large chunk of your listen file to a web or FTP site somewhere for me to download and look at? (you can email me off-list with the address if you like)
Posted:
Fri Aug 11, 2006 1:58 pm
by enjay
sent via pm
Posted:
Fri Aug 11, 2006 2:16 pm
by mflorell
looks like there's lots of new Manager output in 1.2.10 that I haven't had problems with on my systems. I'm off for the weekend, but will dive into this on Monday.
It's just an issue of the listen script not catching the output properly so it shouldn't take more than a couple days.
Posted:
Fri Aug 11, 2006 4:33 pm
by enjay
rock on! Talk to you monday have a great weekend..
Posted:
Mon Aug 14, 2006 10:23 am
by enjay
Monday's back (unfortunately) This is pretty much the last thing I have on my list to get resolved to close my project.
Posted:
Mon Aug 14, 2006 10:29 am
by mflorell
I looked at the output and can't figure out why this is happening to you. I will comb over the overlapping sections and post if I find any issues shortly.
Posted:
Mon Aug 14, 2006 10:34 am
by enjay
If you need any output or system access from me let me know..
Posted:
Mon Aug 14, 2006 10:48 am
by mflorell
Does your AST_update.pl script ever die around the time these things happen?
You can tell by monitoring the screen -r PID of the ASTupdate screen throughout the day, if it is the same number through the whole day then it is not dying, if it is a different number then it does.
Posted:
Mon Aug 14, 2006 10:50 am
by enjay
Stays up
- Code: Select all
root 2962 1.5 0.2 7016 5436 pts/4 Ss+ Aug02 252:56 /usr/bin/perl /home/cron/AST_update.pl
EDIT: AST_manager_listen is running twice is that abnormal? If so could that cause this behavior
- Code: Select all
root 2959 0.0 0.0 2696 1228 ? Ss Aug02 0:34 /usr/bin/SCREEN -d -m -S ASTupdate /home/cron/AST_update.pl
root 2962 1.5 0.2 7016 5436 pts/4 Ss+ Aug02 252:56 /usr/bin/perl /home/cron/AST_update.pl
root 9084 0.0 0.0 2692 1164 ? Ss Aug03 0:55 /usr/bin/SCREEN -d -m -S ASTVDremote /home/cron/AST_VDremote_agents.pl
root 9085 0.0 0.1 5708 4088 pts/6 Ss+ Aug03 8:42 /usr/bin/perl /home/cron/AST_VDremote_agents.pl
root 20281 0.0 0.0 2692 1168 ? Ss Aug07 0:31 /usr/bin/SCREEN -d -m -S ASTVDauto /home/cron/AST_VDauto_dial.pl
root 20282 0.2 0.2 5968 4304 pts/5 Ss+ Aug07 26:57 /usr/bin/perl /home/cron/AST_VDauto_dial.pl
root 20134 0.0 0.0 2696 1184 ? SNs 05:33 0:00 /usr/bin/SCREEN -d -m -S ASTlisten /home/cron/AST_manager_listen.pl
root 20135 0.0 0.2 6648 4972 pts/1 SNs+ 05:33 0:01 /usr/bin/perl /home/cron/AST_manager_listen.pl
root 20140 0.0 0.0 2692 1168 ? SNs 05:33 0:00 /usr/bin/SCREEN -d -m -S ASTlisten /home/cron/AST_manager_listen.pl
root 20141 0.0 0.2 6652 4952 pts/2 SNs+ 05:33 0:01 /usr/bin/perl /home/cron/AST_manager_listen.pl
root 20145 0.0 0.0 2692 1164 ? SNs 05:33 0:00 /usr/bin/SCREEN -d -m -S ASTsend /home/cron/AST_manager_send.pl
root 20146 0.0 0.1 5592 3940 pts/3 SNs+ 05:33 0:04 /usr/bin/perl /home/cron/AST_manager_send.pl
root 13416 0.5 0.1 5444 3692 ? Ss 08:50 0:00 /usr/bin/perl /home/cron/AST_manager_kill_hung_congested.pl
another thing that has changed from the original is the startup for asterisk
- Code: Select all
#screen -d -m -S asterisk -L /usr/sbin/asterisk -vvvvvvvvvvvvvvvvvvvvvvvvgcp
screen -d -m -S asterisk -L /usr/sbin/asterisk -gcp
-enjay