Looking for help on ancient Vicidial server - Random Pausing
Posted: Fri Feb 26, 2021 3:21 pm
Hello, everyone.
We have a very old Vicidial server we have been using for a VERY long time. And I understand everyone's knee-jerk response is going to be, "You need to upgrade." I accept that, but I would really appreciate any kind of insight as to why, after over a decade of running, the server is suddenly pausing agents randomly throughout the day.
First, the server details. This is all running on a Dell 1950. Without lshw available on the install itself, I can only say that it has two 4-core processors, 16GB of RAM, and 165GB of free disk space. It has a cron job scheduled to reboot every morning at 3am. The OS is openSUSE 12.1, Asterisk 1.4.44-vivi, Vicidial version 2.6-395a, build 130221-1736. All carriers are connected through SIP.
Second, the agent hardware. All agents are using either Polycom SP 300 or 330 phones. Our service center agents (by far the most users) are all using Ubuntu thin clients and accessing Vicidial with Firefox. The remaining users are on either Windows 7 or 10 workstations with either Chrome or Firefox as their browser.
As I mentioned, the problem is agents are getting randomly paused. (Meaning they either see the "Your session has been paused" alert or, in many cases, they get no indication and it's only when the real-time screen is checked that you can see them paused.) This started happening last Thursday, February 18. It started with only 2 or 3 agents. My initial suspicion was that another user was attempting to log in using one of the affected agents' IDs. That was quickly proven wrong. Over the course of the week, more and more agents have been suffering from random pausing. Initially, it was limited to Windows users, as everyone using our Ubuntu thin clients was seemingly immune. Yesterday, however, even the Linux users started experiencing it. We've run the full gamut of web browsers (IE, Edge, Firefox, and Chrome), but the browser seems to have no bearing on the issue.
I've searched the internet and this forum for any clues and (from what limited information I could find) I've attempted the following troubleshooting:
Scanned the MySQL database for any table corruption. (None found.)
Time sync issues. (Seemingly none, as all servers and workstations are set to sync to a single server on our network. When checked, all machines seem to be within 1 second of each other.)
Network traffic. (I've consulted our NMS logs and, as far as I can determine, there have been no irregular spikes in traffic. To be honest, the trends seem to show there's been LESS traffic than normal.)
Misconfigured campaign. (Again, everything has been running smoothly for over a decade. But I rebuilt one of the campaigns and even created a brand new user to see if that might resolve the issue. It did not, as the new user also started suffering the pausing issue almost immediately.)
I've tried looking through the log files (vdautodial.<date> in particular) for any kind of information. While the log does show what time a given agent got paused, it doesn't exactly say what agent, extension, campaign, ingroup, or anything really. I've taken the times it lists and checked against the vicidial_user_log table in the database to find out who and what campaign. Sadly, though, it doesn't suggest what caused the pause in the first place. Oddly enough, even though the log says, "lagged call vla agent PAUSED," the user log says they were actually logged out.
Here is a snippet of the vdautodial log where an agent was paused.
Code:
2021-02-26 08:29:02|SERVER CALLS PER SECOND MAXIMUM SET TO: 20 |50||
2021-02-26 08:29:02|LIVE AGENTS LOGGED IN: 1 ACTIVE CALLS: 7|
2021-02-26 08:29:02|OLD TRUNK SHORTS CLEARED: 1 |'','NEW_SVC'||
2021-02-26 08:29:02|NEW_SVC 192.168.10.205: agents: 0 (READY: 0) dial_level: 1 (0|1|0) -4|
2021-02-26 08:29:02|NEW_SVC 192.168.10.205: Calls to place: -7 (0 - 7 [7 + 0|7|2]) 7 |
2021-02-26 08:29:02|CAMPAIGN DIFFERENTIAL: 2.85 0.3 (0.65 - 0.35)|
2021-02-26 08:29:02|LOCAL TRUNK SHORTAGE: 0|0 (0 - 23)|
2021-02-26 08:29:02|NEW_SVC 2: INBOUND QUEUE NO DIAL, NO DIALING|
2021-02-26 08:29:02|| dead call vac INBOUND do nothing|4385303|9122281772|CLOSER||
2021-02-26 08:29:02|| dead call vac INBOUND do nothing|4385308|8132634570|CLOSER||
2021-02-26 08:29:02|| dead call vac INBOUND do nothing|4385317|9049938721|CLOSER||
2021-02-26 08:29:02|| dead call vac INBOUND do nothing|4385318|9049030349|CLOSER||
2021-02-26 08:29:02|| dead call vac INBOUND do nothing|4385331|9049555911|CLOSER||
2021-02-26 08:29:02|| logindate UPDATED 1|'NEW_SVC'||
2021-02-26 08:29:05|SERVER CALLS PER SECOND MAXIMUM SET TO: 20 |50||
2021-02-26 08:29:05|LIVE AGENTS LOGGED IN: 1 ACTIVE CALLS: 7|
2021-02-26 08:29:05|OLD TRUNK SHORTS CLEARED: 1 |'','NEW_SVC'||
2021-02-26 08:29:05|NEW_SVC 192.168.10.205: agents: 0 (READY: 0) dial_level: 1 (0|1|0) -4|
2021-02-26 08:29:05|NEW_SVC 192.168.10.205: Calls to place: -7 (0 - 7 [7 + 0|7|2]) 7 |
2021-02-26 08:29:05|CAMPAIGN DIFFERENTIAL: 3.05 0.15 (0.6 - 0.45)|
2021-02-26 08:29:05|LOCAL TRUNK SHORTAGE: 0|0 (0 - 23)|
2021-02-26 08:29:05|NEW_SVC 2: INBOUND QUEUE NO DIAL, NO DIALING|
2021-02-26 08:29:05|| dead call vac INBOUND do nothing|4385303|9122281772|CLOSER||
2021-02-26 08:29:05|| dead call vac INBOUND do nothing|4385308|8132634570|CLOSER||
2021-02-26 08:29:05|| dead call vac INBOUND do nothing|4385317|9049938721|CLOSER||
2021-02-26 08:29:05|| dead call vac INBOUND do nothing|4385318|9049030349|CLOSER||
2021-02-26 08:29:05|| dead call vac INBOUND do nothing|4385331|9049555911|CLOSER||
2021-02-26 08:29:05|| lagged call vla agent PAUSED 1|1|20210226082835|20210226082855|20210226082905||
2021-02-26 08:29:05|| lagged agent LOGOUT entry inserted 7256|INBOUND|||
2021-02-26 08:29:05|| logindate UPDATED 1|'NEW_SVC'||
2021-02-26 08:29:05|| updating server parameters 23|8365|-5|default||
2021-02-26 08:29:07|SERVER CALLS PER SECOND MAXIMUM SET TO: 20 |50||
2021-02-26 08:29:07|LIVE AGENTS LOGGED IN: 1 ACTIVE CALLS: 7|
2021-02-26 08:29:07|OLD TRUNK SHORTS CLEARED: 1 |'','NEW_SVC'||
2021-02-26 08:29:07|NEW_SVC 192.168.10.205: agents: 0 (READY: 0) dial_level: 1 (0|1|0) -4|
2021-02-26 08:29:07|NEW_SVC 192.168.10.205: Calls to place: -7 (0 - 7 [7 + 0|7|2]) 7 |
2021-02-26 08:29:07|CAMPAIGN DIFFERENTIAL: 3.25 0 (0.55 - 0.55)|
2021-02-26 08:29:07|LOCAL TRUNK SHORTAGE: 0|0 (0 - 23)|
2021-02-26 08:29:07|NEW_SVC 2: INBOUND QUEUE NO DIAL, NO DIALING|
2021-02-26 08:29:07|| dead call vac INBOUND do nothing|4385303|9122281772|CLOSER||
2021-02-26 08:29:07|| dead call vac INBOUND do nothing|4385308|8132634570|CLOSER||
2021-02-26 08:29:07|| dead call vac INBOUND do nothing|4385317|9049938721|CLOSER||
2021-02-26 08:29:07|| dead call vac INBOUND do nothing|4385318|9049030349|CLOSER||
2021-02-26 08:29:07|| dead call vac INBOUND do nothing|4385331|9049555911|CLOSER||
2021-02-26 08:29:07|| logindate UPDATED 1|'NEW_SVC'||
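For anyone who wants to pull just the lag events out of a log like the one above, here is a rough Python sketch. It assumes the line layout shown in the snippet and nothing more. The function name `parse_lagged` is made up for illustration. The script only measures how far each of the three 14-digit timestamps on the "lagged call vla agent PAUSED" line sits behind the time the line was logged; what Vicidial actually uses those timestamps for is not something the log states, so treat any interpretation (for example, that they are cutoff times for the agent screen's last update) as an assumption to verify.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2021-02-26 08:29:05|| lagged call vla agent PAUSED 1|1|20210226082835|20210226082855|20210226082905||
LAGGED_RE = re.compile(
    r"^(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\|\|?\s*"
    r"lagged call vla agent PAUSED\s+(?P<fields>.*)$"
)


def parse_lagged(lines):
    """Return one dict per lag event: when it was logged and how many
    seconds each embedded 14-digit timestamp lies before that moment."""
    events = []
    for line in lines:
        m = LAGGED_RE.match(line.strip())
        if not m:
            continue
        logged = datetime.strptime(m.group("logged"), "%Y-%m-%d %H:%M:%S")
        # Keep only the pipe-separated fields that look like YYYYMMDDHHMMSS.
        stamps = [
            datetime.strptime(field, "%Y%m%d%H%M%S")
            for field in m.group("fields").split("|")
            if re.fullmatch(r"\d{14}", field)
        ]
        offsets = [int((logged - s).total_seconds()) for s in stamps]
        events.append({"logged": logged, "offsets_sec": offsets})
    return events


if __name__ == "__main__":
    sample = [
        "2021-02-26 08:29:05|LIVE AGENTS LOGGED IN: 1 ACTIVE CALLS: 7|",
        "2021-02-26 08:29:05|| lagged call vla agent PAUSED "
        "1|1|20210226082835|20210226082855|20210226082905||",
    ]
    for event in parse_lagged(sample):
        print(event["logged"], event["offsets_sec"])
```

On the sample line, the three timestamps come out 30, 10, and 0 seconds behind the log time. The event times this produces can then be checked against vicidial_user_log, the same way it was done by hand.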
I will gladly provide any additional information that may help to find the cause of the problem. In the meantime, I have a ghosted image of this server from 2018 that I'm going to attempt to restore, but I would really like to carry the database over so I wouldn't be stuck recreating a couple hundred users and a dozen or so routes.
One other thing that I only noticed in the log files today: the pausing seems to start just before 8am and never happens after 8pm.
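To check that time-of-day pattern across a whole day's log rather than by eye, a small tally like the one below could help. It is only a sketch: the function name `lag_events_by_hour` is hypothetical, it assumes every line starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the snippet above, and the log file path is whatever you pass on the command line.

```python
import sys
from collections import Counter


def lag_events_by_hour(lines):
    """Count 'lagged call vla agent PAUSED' lines per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        if "lagged call vla agent PAUSED" not in line:
            continue
        # Timestamp prefix looks like "2021-02-26 08:29:05";
        # characters 11-12 hold the two-digit hour.
        try:
            hour = int(line[11:13])
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        counts[hour] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Usage: python lag_by_hour.py <path to a vdautodial log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hour, count in lag_events_by_hour(fh).items():
            print(f"{hour:02d}:00  {count}")
```

If the counts rise and fall with the number of logged-in agents or call volume, that would point toward load rather than a fixed schedule, though the tally alone cannot prove either.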