
lagged pause

PostPosted: Wed Jun 16, 2010 6:46 am
by vasix
Hi guys

I have a problem regarding LAGGED status.
I have 25-35 agents logged in during the day and taking calls for several inbound groups, and everything is OK.
However, during the night shifts (when few calls or none are received) or during the day (for 3 agents logged in on some Ingroup with 1 call per hour) I experience LAGGED pauses constantly.
I checked the network connectivity (which is OK, as other agents from previous shifts have no problems at all when calls are flowing), checked the standby properties of the workstations (actually disabled standby), checked the antivirus software, checked the caching of the browser, yelled at agents not to navigate back and forth in the page while they are logged in, and checked the database transactions for locked tables; the problem remains.

I am out of ideas.
Long story short: if the agent is taking calls intensively, everything is OK. If there are no calls for a long period of time, the agent is put to LAGGED.

Please advise, where should I look, what should I check?
Thank you in advance.

PS: using vicidial 2.2.0-234 on asterisk 1.2 (standard vicibox server install), 3 dialer servers, 1 DB server and 1 WEB server
PS2: load is under 0.5 on all machines

Vasix

PostPosted: Wed Jun 16, 2010 9:41 am
by mflorell
How long is a "long period of time" receiving calls?

After how long does the agent go LAGGED?

PostPosted: Wed Jun 16, 2010 12:03 pm
by vasix
Hi Matt

a long period is about 10 to 12 hours, everything is OK as our customers call in constantly and agents have activity;
the agents with longer wait time without calls (approx 30-45 minutes without calls) are put to LAGGED by the system
they change places often, so the problem is not related to a specific seat/group of seats

I tried to understand the algorithm behind the lagging detection (found something in AST_VDauto_dial.pl and AST_cleanup_agent_log.pl, but unfortunately I did not understand it). Is there a variable that can be defined to control this process?
On the other hand, can it be disabled? And if yes, what is the worst that can happen?

best regards,

PostPosted: Wed Jun 16, 2010 3:27 pm
by mflorell
There are several factors to an agent going LAGGED, but they are almost all caused by the agent interface losing contact with the webserver for more than 30 seconds so that their vicidial_live_agents record is not being updated.

PostPosted: Wed Jun 16, 2010 8:35 pm
by williamconley
are these agents "logged in" when they leave, or do they log out of their stations?

perhaps you could describe how it happens a bit more precisely (just to be sure)

and: are your servers and workstations all sync'd solidly to a single time source?

PostPosted: Thu Jun 17, 2010 1:41 am
by vasix
the agents log out when they leave
the problem exists only for agents with few or no calls; if they are just waiting for calls for more than 30-45 minutes they go lagged
all systems in our company are sync'ed to a single internal NTP server

William, the scenario is like this:
1 agent taking call after call, with no time to wait, has no problems at all, everything works as expected
1 agent waiting for calls (logged on 1 ingroup with little traffic) goes lagged after 30-45 minutes of waiting and it is automatically paused by system

Matt, in a previous post you said the time for lagging an agent is 20 seconds
Now you say it is 10 seconds, did you modify it in this build or is it something relative?

PostPosted: Thu Jun 17, 2010 7:16 am
by mflorell
Actually, both numbers are wrong :) In the code it's 30 seconds, I will update my posting above.

At 10 seconds of non-communication the agent would not receive any calls, and at 30 seconds of non-communication the agent would be set to LAGGED.
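
As a rough illustration of the two thresholds described above, here is a minimal sketch. The function and constant names are hypothetical and not taken from the VICIDIAL source, and treating both limits as inclusive is an assumption.

```javascript
// Thresholds taken from the post above; all names here are hypothetical
// and do not come from the VICIDIAL source.
const NO_CALLS_AFTER_SECONDS = 10;
const LAGGED_AFTER_SECONDS = 30;

// Classify an agent by the seconds elapsed since the last successful update.
// Assumption: both limits are inclusive ("at 10 seconds", "at 30 seconds").
function classifyAgent(secondsSinceLastUpdate) {
  if (secondsSinceLastUpdate >= LAGGED_AFTER_SECONDS) {
    return 'LAGGED';
  }
  if (secondsSinceLastUpdate >= NO_CALLS_AFTER_SECONDS) {
    return 'NO_CALLS';
  }
  return 'OK';
}

console.log([5, 10, 29, 30].map(classifyAgent));
```

The point of the sketch is that an agent can silently stop receiving calls well before being marked LAGGED.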

PostPosted: Thu Jun 17, 2010 8:31 am
by williamconley
vasix wrote:William, the scenario is like this:
1 agent taking call after call, with no time to wait, has no problems at all, everything works as expected
1 agent waiting for calls (logged on 1 ingroup with little traffic) goes lagged after 30-45 minutes of waiting and it is automatically paused by system
perhaps your workstations are "sleeping" after that period of time?

has the agent been actively using the computer during this period?

PostPosted: Thu Jun 17, 2010 8:41 am
by vasix
so I have 30 seconds of inactivity if they go lagged...ouch

the workstations are not sleeping as they all have standby disabled from windows
more, the agents are continuously using them, as they work on emails/trouble tickets and other electronic cases

any idea?

PostPosted: Thu Jun 17, 2010 8:50 am
by williamconley
mflorell wrote:... at 30 seconds of non-communication the agent would be set to LAGGED.
non-communication, not inactivity.

when their browser loses communication with the server, the clock starts ticking. if they are not generating mysql events that demonstrate "connectivity", the system thinks they are "gone". 30 seconds of "gone" = lagged.

these are AJAX communications. perhaps you have other software that interferes with the AJAX, or other windows fill the live memory and this page is pushed to virtual memory and incapable of generating AJAX calls for a minute or two.

PostPosted: Mon Jun 21, 2010 8:49 am
by aouyar
I have registered an issue in Mantis, because we've done some thorough testing and determined that even the slightest disruption of communications (less than 1 second in duration) might cause the agent interface to disconnect, and the interface does not recover until the agent logs out and logs in again:
http://www.eflo.net/VICIDIALmantis/view.php?id=360

I think we've pinpointed the bug in vicidial.php and I am working on a solution.

PostPosted: Mon Jun 21, 2010 9:08 am
by vasix
great news!
waiting for the fix with interest

PostPosted: Mon Jun 21, 2010 11:18 am
by aouyar
I've just posted a patch aimed at fixing the issue to Mantis. The patch implements the following logic:
* When the web server returns an error code or when the TCP connection is rejected the agent interface tries again at next poll cycle (1 second).
* When packets do not get to the destination, the remote end is totally silent, agent interface aborts request after 3 seconds and tries again.

The patch has only been tested on Firefox 3.6 browser. I will be testing it on Internet Explorer soon and I will send the results.
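
The two rules above can be modelled without any network access. This is only a sketch of the described behaviour, not the patch itself: the state object, the event names and the function names are made up for illustration, and one call to tick() stands for one 1-second poll cycle.

```javascript
// Hypothetical model of the retry logic described above. It simulates the
// 1-second poll cycle without touching the network; names are illustrative.
const ABORT_AFTER_TICKS = 3; // "aborts request after 3 seconds"

function createPoller() {
  return { inFlight: false, waited: 0, sent: 0, aborted: 0 };
}

// Advance the model by one poll cycle (1 second).
// `event` describes what happened to the pending request during this cycle:
//   'ok'     - the server answered with HTTP 200
//   'error'  - the server returned an error or rejected the connection
//   'silent' - nothing came back (packets dropped)
function tick(state, event) {
  if (!state.inFlight) {
    // No pending request: send a new one on this cycle.
    state.inFlight = true;
    state.waited = 0;
    state.sent += 1;
  }
  if (event === 'ok' || event === 'error') {
    // Any completed response frees the slot, so the next cycle sends again.
    state.inFlight = false;
    return state;
  }
  // Silent remote end: keep waiting, then give up once the limit is reached.
  state.waited += 1;
  if (state.waited >= ABORT_AFTER_TICKS) {
    state.inFlight = false;
    state.aborted += 1;
  }
  return state;
}

const demo = createPoller();
['error', 'ok', 'silent', 'silent', 'silent', 'ok'].forEach((e) => tick(demo, e));
console.log(demo.sent, demo.aborted);
```

In this run an error is retried on the very next cycle, while a silent server holds the request for three cycles before it is aborted and a fresh one is sent.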

PostPosted: Mon Jun 21, 2010 12:26 pm
by vasix
thanks! I patched the file and I will keep you posted if anything goes wrong

PostPosted: Mon Jun 21, 2010 1:06 pm
by williamconley
Testing a manual version of the patch in 2.0.5 on a client box, too. :)

If someone reminds me, I'll try to remember to post the results (and if it works, create a 2.0.5 patch!)

PostPosted: Mon Jun 21, 2010 1:23 pm
by aouyar
Done some testing with IE7, seems to be working for IE7 too.

PostPosted: Thu Jun 24, 2010 2:46 pm
by aouyar
Anyone else testing the patch?

Matt, do you think you could give a look at this issue; it seems to be affecting many people.

PostPosted: Thu Jun 24, 2010 3:10 pm
by williamconley
the client i tested with on 2.0.5 likes the results. he apparently has networking issues (new fiber goes in tomorrow!) and this patch (he says THANKS!!) dropped the occurrences of the "automatically logged out agents" with dead clocks on their screens to about 20% of the original.

he says it used to happen to some agents half a dozen times or more during a 30 minute period, but it's down closer to once per half hour instead. (Much easier to deal with while awaiting his new network connection.)

but make no mistake, this fix allows vicidial to "recover" from network issues that should not be there (thus the new fiber installation).

PostPosted: Thu Jun 24, 2010 3:59 pm
by mflorell
I posted to the ticket in the tracker.

If added to the codebase, at first this would probably be an option with an easy off/on switch in the vicidial.php(possibly in the options.php file) so that those wishing to test it could do so and revert back if there were any issues, or there could easily be a different version of the script on the same webserver. I wouldn't want to throw a revision like this into a 300 seat call center without a lot of production testing.

Thanks for all of your hard work on this one aouyar!

PostPosted: Fri Jun 25, 2010 2:27 am
by aouyar
The problem with the current situation is that 2 or 3 temporary network disruptions lasting less than a few seconds each, over an 8-hour shift for one agent, can cause 2 or 3 annoying disconnections of the agent interface. These can only be resolved by logging out of the application and logging in again, and this is only possible once the agent realizes that his/her session is behaving strangely. (In fact the clock on the agent interface is a reliable indicator of the disconnection event, but the agents are not always attentive to it.) For the supervisor this can turn into a major headache, because for a 40-50 agent Call Center this can translate into dozens of complaints from agents.

Some clients started complaining more and more after the upgrade to 2.2, because agents that appear as paused in the Agent Status Side Panel (Paused by System) do not appear to be paused in the agent interface.

I totally agree with William Conley that most of the network problems that cause the disconnections should not occur in the first place, and that this patch might simply mask valuable symptoms that might indicate network problems, but even with quite decent hardware disconnections still occur once in a while. That's why, in my post in Mantis, I proposed to count the disconnection events and give a visual indication of the network health to the agent.

PostPosted: Fri Jun 25, 2010 8:39 am
by williamconley
I've never had a situation where the agent was paused but was not notified. (well, within 6 seconds anyway)

PostPosted: Fri Jun 25, 2010 11:56 am
by mflorell
After a lot of testing and a few changes I have committed this feature to SVN agc_2.2.0 and trunk.

The attempts value is configurable in vicidial.php(and/or options.php in trunk):
$conf_check_attempts = '3';


Please test and confirm that the new code works for you.

PostPosted: Fri Jun 25, 2010 12:57 pm
by williamconley
that'll have to be the other guys, i only have one client currently experiencing the issue and he won't budge off 2.0.5. so the manual patch is the best he'll get.

PostPosted: Mon Jul 26, 2010 4:19 am
by phil_discount
Hello,

i've got the same problem.

i made some changes in vicidial.php and we use VERSION: 2.2.1-260 BUILD: 100527-2211...the fix isn't included in that version and i won't upgrade because of the changes in my vicidial.php.

is it possible to make the changes manually?
if yes, how can i do it?

Thanks
regards
philip

PostPosted: Mon Jul 26, 2010 7:17 am
by mflorell
You would have to make the changes in the patch manually then.

What changes to the interface have you made?

PostPosted: Mon Jul 26, 2010 3:05 pm
by phil_discount
ohhh sorry, my fault.
i didn't see the patch...i downloaded the new vicidial.php and searched for all the differences...a lot of work :-)

it seems to work fine...tomorrow i will see it in production with 40 live agents.

regards
philip

PostPosted: Wed Nov 10, 2010 8:18 pm
by AlSam
mflorell wrote:After a lot of testing and a few changes I have committed this feature to SVN agc_2.2.0 and trunk.

The attempts value is configurable in vicidial.php(and/or options.php in trunk):
$conf_check_attempts = '3';

Can't find this variable in /var/www/html/agc2/vicidial.php. Am I looking in the right place? This might help me out. Just today I had a situation where an agent's session was paused without her or the supervisor doing so. Thanks.

PostPosted: Wed Nov 10, 2010 10:15 pm
by williamconley
agc2 is strictly "VicidialNOW"/"GoAutoDial". you should try the GoAutoDial 2.0 release (which i believe is 2.2.1) and may have the file updated.

PostPosted: Thu Nov 11, 2010 2:48 pm
by AlSam
I am currently running 2.2.1-237 build 100510-2015

PostPosted: Thu Nov 11, 2010 3:19 pm
by williamconley
right. For the Admin section of Vicidial.

But the agc (agent portion) is stock and the agc2 (also agent portion) is NOT STOCK. It is not part of Vicidial.

The vicidial.php file in agc2 is altered by "VicidialNOW/GoAutoDial" (gardo) and not supported by The Vicidial Group. The Vicidial Group maintains agc on all releases; Gardo maintains agc2 on GoAutoDial (formerly VicidialNOW) only. agc2 is not in any other release.

if you want it to work ... use agc instead of agc2. Then go find the technical difference and fix it and tell gardo ... by posting in the GoAutoDial/vicidialnow forum (or ask him how to fix it in that forum). It's right next door. :)

PostPosted: Sat Nov 13, 2010 12:08 pm
by AlSam
Thanks for the clarification william

Re:

PostPosted: Thu Dec 03, 2020 7:50 pm
by marzo
mflorell wrote:After a lot of testing and a few changes I have committed this feature to SVN agc_2.2.0 and trunk.

The attempts value is configurable in vicidial.php(and/or options.php in trunk):
$conf_check_attempts = '3';


Please test and confirm that the new code works for you.


Hello.
We have a Cluster of Vicibox 8.0.0 | Vicidial 2.14b0.5 | SVN 3304 | DB Schema 1608 | Asterisk 11.25.3-vici
The cluster has 1 database, 3 web and 3 telephony servers.
All of the servers have their time synchronized with an NTP server.
Our agents do not make any phone calls. They just log in to Vicidial to register their connection times.
The problem we have is that the agents frequently get the following message: Your session has been paused. As shown here https://ibb.co/xs0RVt8
I have modified the file vicidial.php, the parameter:
$max_check_attempts from a value of 3 to 10. However, this did not solve the problem.
This problem has been happening since the system was built.
The problem also occurs when using OnHook agents.
When the problem occurs this is what the AGENT ACTIVITY FOR THIS TIME PERIOD shows https://ibb.co/64yrJB8
Any ideas ?
Regards

Re: lagged pause

PostPosted: Fri Dec 04, 2020 7:33 am
by carpenox
check the rtptimeout in sip.conf

Re: lagged pause

PostPosted: Fri Dec 04, 2020 8:15 am
by marzo
carpenox wrote:check the rtptimeout in sip.conf

rtptimeout=36000

Re: lagged pause

PostPosted: Fri Dec 04, 2020 10:39 am
by carpenox
do you have the one set to keep the connection active? rtpkeepalive=30 is good

Re: lagged pause

PostPosted: Fri Dec 04, 2020 11:25 am
by marzo
carpenox wrote:do you have the one set to keep the connection active? rtpkeepalive=30 is good

I changed to rtpkeepalive=30
The problem also occurs when using OnHook agents.

Re: lagged pause

PostPosted: Fri Dec 04, 2020 11:47 am
by marzo
I was checking the file vicidial.php that comes with my version of Vicidial.
This file has the patch developed by aouyar http://www.eflo.net/VICIDIALmantis/view.php?id=360:
--- vicidial.php.orig 2010-06-21 17:59:31.000000000 -0500
+++ vicidial.php 2010-06-21 18:02:31.000000000 -0500
@@ -3288,6 +3288,7 @@
{
if (typeof(xmlhttprequestcheckconf) == "undefined") {
//alert (xmlhttprequestcheckconf == xmlhttpSendConf);
+ xmlhttprequestcheckconf_wait = 0;
custchannellive--;
if ( (agentcallsstatus == '1') || (callholdstatus == '1') )
{
@@ -3330,12 +3331,12 @@
if (xmlhttprequestcheckconf)
{
checkconf_query = "server_ip=" + server_ip + "&session_name=" + session_name + "&user=" + user + "&pass=" + pass + "&client=vdc&conf_exten=" + taskconfnum + "&auto_dial_level=" + auto_dial_level + "&campagentstdisp=" + campagentstdisp;
- xmlhttprequestcheckconf.open('POST', 'conf_exten_check.php');
+ xmlhttprequestcheckconf.open('POST', 'conf_exten_check.php', true);
xmlhttprequestcheckconf.setRequestHeader('Content-Type','application/x-www-form-urlencoded; charset=UTF-8');
xmlhttprequestcheckconf.send(checkconf_query);
xmlhttprequestcheckconf.onreadystatechange = function()
- {
- if (xmlhttprequestcheckconf.readyState == 4 && xmlhttprequestcheckconf.status == 200)
+ {
+ if (xmlhttprequestcheckconf && xmlhttprequestcheckconf.readyState == 4 && xmlhttprequestcheckconf.status == 200)
{
var check_conf = null;
var LMAforce = taskforce;
@@ -3691,11 +3692,37 @@
nochannelinsession++;
}
}
+ delete xmlhttprequestcheckconf;
xmlhttprequestcheckconf = undefined;
- delete xmlhttprequestcheckconf;
}
+ else if (xmlhttprequestcheckconf && xmlhttprequestcheckconf.readyState == 4 && xmlhttprequestcheckconf.status != 200) {
+ // Cleanup after AJAX Request returns error.
+ // alert("Status: " + xmlhttprequestcheckconf.status);
+ delete xmlhttprequestcheckconf;
+ xmlhttprequestcheckconf = undefined;
+ }
+ }
+ }
+ }
+ else {
+ if (xmlhttprequestcheckconf) {
+ xmlhttprequestcheckconf_wait++;
+ if (xmlhttprequestcheckconf_wait >= 3) {
+ // Abort AJAX Request, due to timeout.
+ // The handler must take care of cleanup.
+ // alert("xmlhttprequestcheckconf: Abort (Wait > 3 sec)");
+ xmlhttprequestcheckconf.abort();
}
}
+ if (xmlhttprequestcheckconf_wait >= 5) {
+ // In case the handler function fails to do cleanup, cleanup manually.
+ xmlhttprequestcheckconf_wait = 0;
+ delete xmlhttprequestcheckconf;
+ xmlhttprequestcheckconf = undefined;
+ }
+ else {
+ xmlhttprequestcheckconf = undefined;
+ }
}
}
except the line:
xmlhttprequestcheckconf.open('POST', 'conf_exten_check.php', true);

The question is: if vicidial.php has the patch, why do the agents get the following message: Your session has been paused. As shown here https://ibb.co/xs0RVt8

Re: lagged pause

PostPosted: Sun Dec 27, 2020 6:00 pm
by marzo
I used the procedure outlined by aouyar to simulate network problems:

* Execute the following command for monitoring live_agents tables continuously to detect disconnects:
while true; do clear; echo "SELECT user,status,last_state_change,last_update_time FROM vicidial_live_agents;" | mysql -u root asterisk; sleep 2; done

* Execute the following command on web server to simulate a 5 second disruption of communication with the web server (drop all packets to port 80):
iptables -A INPUT -i eth0 -p tcp --dport 80 -j DROP; sleep 5; iptables -F INPUT

* Execute the following command on web server to simulate a 5 second period where the web server rejects connections (reject all packets to port 80):
iptables -A INPUT -i eth0 -p tcp --dport 80 -j REJECT; sleep 5; iptables -F INPUT
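
While the monitoring loop above is running, the interesting column is last_update_time. The following sketch shows one way to pick out agents whose record has gone stale. The helper names and the sample rows are invented for illustration, and it assumes timestamps in 'YYYY-MM-DD HH:MM:SS' form that can be treated as UTC.

```javascript
// Hypothetical helpers, not part of VICIDIAL. They take rows shaped like the
// output of the monitoring query above and return the users whose
// last_update_time is older than a limit.
// Assumption: timestamps are 'YYYY-MM-DD HH:MM:SS' and are treated as UTC.
function parseMysqlDatetime(text) {
  return Date.parse(text.replace(' ', 'T') + 'Z');
}

function staleAgents(rows, nowText, limitSeconds) {
  const now = parseMysqlDatetime(nowText);
  return rows
    .filter((row) => (now - parseMysqlDatetime(row.last_update_time)) / 1000 >= limitSeconds)
    .map((row) => row.user);
}

// Made-up sample data for illustration only.
const sampleRows = [
  { user: '1001', status: 'READY', last_update_time: '2020-12-27 18:00:00' },
  { user: '1002', status: 'READY', last_update_time: '2020-12-27 17:59:25' },
];
console.log(staleAgents(sampleRows, '2020-12-27 18:00:01', 30));
```

With a 30-second limit only the second sample agent is reported, since its record is 36 seconds old.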

I did some testing with sleep times from 5 seconds up to 35 seconds to simulate network problems of short and long duration.
My vicidial.php has $conf_check_attempts = '3'; # number of attempts to try before loosing webserver connection, for bad network setups

For times of 5, 10, 15, 20, 25 and 30 seconds I did not get the message "Your session has been paused" at the agent interface.
This means that if you have moderate network problems or a moderately overloaded web server you are not going to get the message "Your session has been paused" at the agent interface.

For times equal to or greater than 35 seconds I got the message "Your session has been paused" at the agent interface.
This means that if you have extreme network problems or an extremely overloaded web server you are going to get the message "Your session has been paused" at the agent interface.

aouyar really did a good job with that patch!

Re:

PostPosted: Fri Apr 07, 2023 7:38 pm
by marzo
mflorell wrote:There are several factors to an agent going LAGGED, but they are almost all caused by the agent interface losing contact with the webserver for more than 30 seconds so that their vicidial_live_agents record is not being updated.


The main cause of an agent going LAGGED is the agent using the Google Chrome browser.
The Google Chrome browser has a bug. When the agent minimizes the Google Chrome browser window because the agent has to do other jobs in other windows, after some minutes (3, 4, 5, 6, 7 or more) Google Chrome stops sending any information to the webserver.
I found this using Wireshark at the agent's station. I observed that after some minutes (3, 4, 5, 6, 7 or more) the Google Chrome browser no longer sends the HTTP POST /agc/vdc_db_query.php HTTP/1.1 to the webserver.
Then I queried the table vicidial_agent_log. I observed the column sub_status with a value of LAGGED.
Then I queried the table vicidial_live_agents. I observed the column status with a value of PAUSED.
At the agent interface a window appears with the following message: Your session has been paused. At this point the agent must click OK to close this window. The agent must close that session and then start a new session.
This problem does not happen when the agent uses Mozilla Firefox.

Re: lagged pause

PostPosted: Sat Apr 08, 2023 7:45 am
by mflorell
It's not a bug if they put it in there on purpose, the Chrome developers consider that to be a "feature". Here's more info,
https://www.vicidial.org/docs/WEB_BROWS ... TTLING.txt
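
If the once-per-second poll is driven by a browser timer, throttling of a minimized window would show up as long gaps between ticks. A minimal sketch of spotting such gaps in a list of recorded tick times follows. The function name, the parameters and the sample numbers are hypothetical, and this only detects a stall; it does not prevent the browser from throttling.

```javascript
// Hypothetical sketch: find gaps in a series of tick timestamps (milliseconds).
// A gap counts as a stall when it exceeds expectedIntervalMs * toleranceFactor.
function findStalls(tickTimesMs, expectedIntervalMs, toleranceFactor) {
  const stalls = [];
  for (let i = 1; i < tickTimesMs.length; i += 1) {
    const gap = tickTimesMs[i] - tickTimesMs[i - 1];
    if (gap > expectedIntervalMs * toleranceFactor) {
      // Record when the stall started and how long it lasted.
      stalls.push({ at: tickTimesMs[i - 1], gapMs: gap });
    }
  }
  return stalls;
}

// Made-up tick times: a 60-second pause after the third tick.
console.log(findStalls([0, 1000, 2000, 62000, 63000], 1000, 2));
```

Any gap longer than 30 seconds in such a log would be enough, by the figures quoted earlier in this thread, for the agent to be set to LAGGED.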