vicidial monitoring tool - zabbix
Posted:
Tue Oct 18, 2022 3:44 pm
by jayboo876
I'm currently researching a tool to monitor a vicidial cluster: something with a few dashboards to monitor generic system resources, db, web, carriers, calls, etc. Zabbix (https://www.zabbix.com/integrations/asterisk) seems to be a good tool that keeps popping up. I just wanted to know if anyone has been using it and can provide feedback or suggest something better. Thanks.
installation
generic vicibox10 iso, ast16 update, cluster mode, dedicated virtual servers
Re: vicidial monitoring tool - zabbix
Posted:
Mon Oct 24, 2022 7:08 am
by jamiemurray
I set this up already and have been playing around with setting some alerts up to replace my current solution.
Besides the server and services monitoring, a few other things I've been working on are:
- Wait Time Alert - If the wait time over the last couple of minutes is over 30s, increase the max dial level slightly, up to a defined limit; if the wait time still doesn't decrease, alert.
- Dialable leads low - If the estimated time left before leads run out is less than 30 minutes, reset the campaign lists; if by the next minute the estimate is still under 30 minutes, alert.
- Monitor call failure counts - If all calls start failing, pause the campaigns and alert; if call failures suddenly increase, alert.
- FAS alerts - If auto-detected FAS calls increase, alert.
- Agent Dispo alerts - Consecutive identical dispos (common with new agents: they get trigger-happy on the hotkeys and dispo every call the same way regardless of what happened).
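As a rough illustration of the "act first, alert only if that doesn't help" pattern in the Wait Time item, here is a minimal Python sketch. It is not Zabbix or VICIdial code: the `Campaign` class, `check_wait_time`, and the 0.1 step are hypothetical names and values chosen for the example, and it assumes one reading of the rule (alert once the ceiling is reached and the wait is still over the limit). Fetching the wait time and writing the new dial level back to the dialer are left out.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    dial_level: float      # current dial level
    max_dial_level: float  # ceiling that is never exceeded


WAIT_LIMIT_S = 30.0  # threshold mentioned in the post
DIAL_STEP = 0.1      # assumption: the post only says "slightly"


def check_wait_time(camp: Campaign, avg_wait_s: float) -> str:
    """Run one check cycle and return 'ok', 'raised', or 'alert'."""
    if avg_wait_s <= WAIT_LIMIT_S:
        return "ok"
    if camp.dial_level < camp.max_dial_level:
        # Remediate first: nudge the dial level up, capped at the ceiling.
        camp.dial_level = min(camp.max_dial_level,
                              round(camp.dial_level + DIAL_STEP, 2))
        return "raised"
    # Already at the ceiling and callers are still waiting too long.
    return "alert"


if __name__ == "__main__":
    camp = Campaign("EN", dial_level=1.0, max_dial_level=1.2)
    for wait in (45.0, 40.0, 38.0):
        print(check_wait_time(camp, wait), camp.dial_level)
```

The dialable-leads rule would follow the same shape: try the automatic fix (reset the lists), re-check on the next cycle, and only then raise an alert.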
It hasn't been hugely urgent for me, as I already have a solution looking after most of these things. But the flexibility of Zabbix lets you take automatic actions based on conditions with relative ease and apply further checks to confirm the issue is resolved, which will greatly reduce the alarms we receive for some campaigns we look after. It also gives us a simple interface to manage alert events and collaborate on them without conflict.
Currently we use Teams internally and just type a message in the group chat along the lines of [CLIENT]/[CAMP]/[Alarm Type] Responding / Resolved, e.g. ACME/EN/Data Low Responding or ACME/EN/Wait Time Responding. Occasionally someone forgets, and then two people start working on the same alarm (my team all work from home) and end up making conflicting changes to campaigns etc.
This is especially true when we have multiple alarms going off at once. I also like how alarms can be prioritized in the interface based on severity, something the crude solution I have in place now doesn't allow us to do.
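To make the coordination problem concrete, here is a small Python sketch of the [CLIENT]/[CAMP]/[Alarm Type] Responding / Resolved convention with a check for double responders. It only illustrates the idea; `parse_status`, `claim`, and the user names are hypothetical, and it has nothing to do with how Zabbix or Teams handle this internally.

```python
from typing import Dict, Optional, Tuple

AlarmKey = Tuple[str, str, str]  # (client, campaign, alarm type)


def parse_status(message: str) -> Tuple[AlarmKey, str]:
    """Split 'ACME/EN/Data Low Responding' into its key and status."""
    client, camp, rest = message.strip().split("/", 2)
    # The alarm type may contain spaces, so the status is the last word.
    alarm_type, status = rest.rsplit(" ", 1)
    if status not in ("Responding", "Resolved"):
        raise ValueError(f"unknown status: {status!r}")
    return (client, camp, alarm_type), status


def claim(owners: Dict[AlarmKey, str], user: str, message: str) -> Optional[str]:
    """Apply one chat message to the owners table.

    Returns the current owner's name if someone else is already
    responding to this alarm, otherwise None.
    """
    key, status = parse_status(message)
    if status == "Resolved":
        owners.pop(key, None)
        return None
    current = owners.get(key)
    if current is not None and current != user:
        return current  # conflict: alarm already claimed
    owners[key] = user
    return None


if __name__ == "__main__":
    owners: Dict[AlarmKey, str] = {}
    print(claim(owners, "alice", "ACME/EN/Data Low Responding"))
    print(claim(owners, "bob", "ACME/EN/Data Low Responding"))
```

The second call returns the name of whoever claimed the alarm first, which is the point where the second person would be told to stand down.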