Hi,
I ran into a strange issue today.
Early this morning (PYST 0900), when the agents in all the active campaigns logged in, they received the message "Your session has been paused".
When the agents unpaused themselves, the message popped up again.
Now, I have seen this behavior before; back then it was caused by crashed tables (one of the _live ones), nothing that could not be
repaired with myisamchk. But this time there were no crashed tables.
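For reference, the kind of crashed-table check I mean can be sketched roughly as follows. This is only a sketch: it assumes the database is named asterisk, that MySQL credentials are available (for example in ~/.my.cnf), and the helper name not_ok_lines is made up for illustration. It simply keeps every line of mysqlcheck output that does not end in OK.

```shell
#!/bin/sh
# Hypothetical helper: read mysqlcheck output on stdin and keep only the
# non-empty lines that do not end in "OK" (problem tables and their messages).
not_ok_lines() {
    awk 'NF > 0 && $NF != "OK"'
}

# Example usage (not executed here; assumes the database is called "asterisk"
# and that login credentials are configured, e.g. in ~/.my.cnf):
#   mysqlcheck --check asterisk | not_ok_lines
# Repairing a single table, e.g. one of the _live ones (table name is an example):
#   mysqlcheck --repair asterisk vicidial_live_agents
```

If the filter prints nothing, every table was reported as OK.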
The checks I did were:
- Network: ping, traceroute.
- Database: mysqlanalyze, SHOW TABLE STATUS FROM asterisk, journalctl -m
- Web: journalctl, tailf /var/log/apache2/{access,error}.log
- Telephony: tailf /var/log/astguiclient/screenlog.0
- Web load balancer: stats
- General OS: df -h, free -m, top, journalctl -m, checked the ASTGUIClient script messages with mail.
The actions I took:
- asterisk -x 'core restart now'
- systemctl stop mysql && systemctl restart mysql
- systemctl restart apache2
- systemctl restart haproxy
- echo 3 > /proc/sys/vm/drop_caches
Additionally, I have the cluster monitored by a virtual node running Zabbix 3, checking astguiclient conferences, channels, trunk status,
networking, load, storage and the like. No network outage was recorded.
To rule out the Web Load Balancer, I tried logging in directly to the web nodes, with the same result.
In the end, I resolved the situation by restarting all nodes in the cluster in this order: Database, Web, Telephony.
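The restart order can be sketched as a dry run like the one below. The host names (db1, web1, tel1 and so on) are placeholders I made up for illustration, not the real node names, and the function only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical host names, grouped by role. Replace with the real ones.
DB_NODES="db1"
WEB_NODES="web1 web2"
TEL_NODES="tel1 tel2 tel3 tel4 tel5"

# Print one reboot command per node, in the order Database, Web, Telephony.
# Nothing is executed; this is a dry run to review the sequence first.
restart_plan() {
    for host in $DB_NODES $WEB_NODES $TEL_NODES; do
        echo "ssh root@$host systemctl reboot"
    done
}

restart_plan
```

In practice each node should be back up before the next one is rebooted; the sketch does not wait for that.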
I have created a tarball of /var/log/astguiclient from every server so I can check for anything I may have missed.
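Collecting the logs on each node can be sketched like this. The helper name archive_logs, the label and the output directory are my own placeholders; only the /var/log/astguiclient path comes from the actual setup.

```shell
#!/bin/sh
# Hypothetical helper: archive one log directory into
# <outdir>/<label>-logs.tar.gz
# Usage: archive_logs <label> <logdir> <outdir>
archive_logs() {
    label=$1
    logdir=$2
    outdir=$3
    mkdir -p "$outdir" || return 1
    # -C switches to the parent directory so the archive holds relative
    # paths such as astguiclient/screenlog.0
    tar -czf "$outdir/$label-logs.tar.gz" \
        -C "$(dirname "$logdir")" "$(basename "$logdir")"
}

# Example (run on each node; label and output directory are arbitrary):
#   archive_logs "$(hostname)" /var/log/astguiclient /tmp/vicidial-logs
```

Using the host name as the label keeps the archives from different nodes apart when they are copied to one place.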
Does anyone know anything about this, or have any pointers on where to look?
The cluster has been running for about 9 months now without issues.
This is a clean ViciBox install. The Web Load Balancer was added later, after exhaustive tests.
The SBC listed below handles all the SIP trunks to our various ITSPs.
The cluster serves 150 agents, with inbound and manual campaigns, most of them inbound.
All the Contact Centers are connected to the cluster via a redundant 1 Gbps fiber MPLS link each. The Contact Centers are separated from the
data center.
Regards
=======================
Vicidial Cluster
Installed using ViciBox_v.7.x86_64-7.0.2 except where noted.
Vicidial version: 2.12-561a | Build: 160708-0745 | Revision: 2561
All nodes (except the virtual ones) have bonded network interfaces (active-backup) attached to 2 switches.
1 Database Node: Dell PowerEdge R720 | CPU: Intel Xeon E5-2650 @ 2.0 GHz | RAM: 32 GB | Storage: 1 TB SAS RAID 5 | OS: ViciBox_v.7.x86_64-7.0.2
5 Telephony Nodes: Xorcom XE 3000 | CPU: Intel Core i3 i3-2120 @ 3.3 GHz | RAM: 4 GB | Storage: 500 GB SATA Linux SW RAID 1 | OS:
ViciBox_v.7.x86_64-7.0.2
2 Web Nodes: Virtual Machine (KVM) | CPU: Intel Core i7 9xx @ 2.4 GHz | RAM: 8GB | Storage: 20 GB | OS: ViciBox_v.7.x86_64-7.0.2
1 Web Load Balancer: Virtual Machine (KVM) | CPU: Intel Core i7 9xx @ 2.4 GHz | RAM: 1GB | Storage: 20 GB | OS: CentOS 7 running HAProxy
1 Archive Server: HP ProLiant ML150 G6 | CPU: Intel Xeon X3430 @ 2.4 GHz | RAM: 4GB | Storage: 2 TB SATA | OS: ViciBox_v.7.x86_64-7.0.2
2 Hypervisors: HP ProLiant ML150 G6 | CPU: Intel Xeon X3430 @ 2.4 GHz | RAM: 16GB | Storage: 500 GB SATA | OS: CentOS 7
1 SBC: Dell Optiplex | CPU: Intel Xeon X3430 @ 2.4 GHz | RAM: 16GB | Storage: 500 GB SATA | OS: Elastix 2.4 x86_64
=======================