
What would you monitor in a working Vicidial server?

Posted: Fri Aug 01, 2014 3:44 pm
by ccabrera
I'm trying to develop a series of tests to measure the health of several Vicidial systems (something like Nagios monitoring, but Vicidial-specific). So far, I'm monitoring these:

- Average load
- Disk space
- Number of Asterisk calls (core show channels)
- SIP responsiveness
- IAX responsiveness
- SSH access
- Status of PRI/R2 trunks
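
For illustration, the host-level checks in this list (load and disk space) could be scripted roughly as follows. This is only a minimal sketch, not an actual setup: the thresholds, the `classify` helper, and the function names are illustrative assumptions, and the 0/1/2 values follow the Nagios plugin exit-code convention.

```python
import os
import shutil

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit-code convention


def classify(value, warn, crit):
    """Return a Nagios-style status for a metric where higher is worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_load(warn=4.0, crit=8.0):
    """Classify the 1-minute load average (Unix only). Thresholds are assumptions."""
    one_minute, _, _ = os.getloadavg()
    return classify(one_minute, warn, crit), one_minute


def check_disk(path="/", warn=80.0, crit=90.0):
    """Classify the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    return classify(percent_used, warn, crit), percent_used


if __name__ == "__main__":
    checks = (("load", check_load()), ("disk", check_disk()))
    for name, (status, value) in checks:
        print(f"{name}: status={status} value={value:.2f}")
```

Sensible load thresholds depend on the number of CPU cores, so the defaults above would need tuning per server.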

However, I've found that although these metrics tell me whether Asterisk is working, they do not necessarily measure Vicidial very well. For example:

- What if the hopper isn't getting filled?
- What if the dialer is not sending any outbound calls, and the calls measured by core show channels are only the ones from the Meetme conferences?
- What if everyone is on pause, or calls are simply getting dropped because of a database error?

And so on...

So what I'm looking for is key data that I can get from either the logs or the MySQL database, query every minute or so, load into a monitoring system, and use to raise an alarm when any of those numbers look wrong.
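
To make the idea concrete, here is a rough sketch of what such a once-a-minute check might look like. The table and column names in `QUERIES` (`vicidial_hopper`, `vicidial_live_agents`, `vicidial_auto_calls`, and their `status` / `call_type` values) are assumptions about a typical Vicidial schema and should be verified against the actual database. The `collect` and `evaluate` helpers and the `min_hopper` threshold are hypothetical names made up for this example.

```python
# ASSUMPTION: table/column names reflect a typical Vicidial schema.
# Check them against your own installation before use.
QUERIES = {
    "hopper_leads": "SELECT COUNT(*) FROM vicidial_hopper WHERE status = 'READY'",
    "agents_total": "SELECT COUNT(*) FROM vicidial_live_agents",
    "agents_paused": "SELECT COUNT(*) FROM vicidial_live_agents WHERE status = 'PAUSED'",
    "outbound_calls": (
        "SELECT COUNT(*) FROM vicidial_auto_calls "
        "WHERE call_type IN ('OUT', 'OUTBALANCE')"
    ),
}


def collect(cursor):
    """Run every query on a DB-API cursor and return {metric_name: count}."""
    metrics = {}
    for name, sql in QUERIES.items():
        cursor.execute(sql)
        metrics[name] = cursor.fetchone()[0]
    return metrics


def evaluate(metrics, min_hopper=1):
    """Turn raw counts into a list of human-readable alerts (empty if healthy)."""
    alerts = []
    agents = metrics["agents_total"]
    if agents == 0:
        return alerts  # nobody is logged in, so an idle dialer is expected
    if metrics["hopper_leads"] < min_hopper:
        alerts.append("hopper is running empty while agents are logged in")
    if metrics["agents_paused"] == agents:
        alerts.append("every logged-in agent is paused")
    elif metrics["outbound_calls"] == 0:
        alerts.append("agents are available but no outbound calls are in progress")
    return alerts
```

`collect` accepts any Python DB-API cursor, and the output of `evaluate` can be fed to whatever monitoring system is in use. Inbound-only campaigns would legitimately have an empty hopper, so the rules would need to be adjusted per campaign type.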

Does this make sense to anyone?

What would you monitor?

Re: What would you monitor in a working Vicidial server?

Posted: Mon Aug 04, 2014 11:43 pm
by williamconley
- Website response (is the web server live?)
- MySQL response (is MySQL responding?)
- SIP registration to another server (indicates Asterisk is online, not crashed)
- A DID round-trip call generator, to make sure both inbound and outbound calls complete. If the test call uses DTMF tones to navigate a call menu, you'll know audio is working too. Note that the call should actually go through the carrier, to confirm the carrier is online.
- Carrier stats (an imbalance, for instance congestion suddenly exceeding 50% of your call results)
- Number of registered agent phones, logged-in dialer agents, and logged-in web server agents (a sudden change in those numbers could indicate a fault, or just a shift change, but it is interesting nonetheless)
- MySQL load information (MySQL exposes a good bit of it, and it can be tracked as well)
- Hard drive lag time (a dying hard drive will rat itself out this way)
- Bandwidth usage
- Number and IP addresses of internet connections (useful and fairly easy to gather; discard after a few days to avoid data overload)
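
Two of the checks above, the congestion share of call results and a sudden change in agent or phone counts, come down to simple arithmetic. A minimal sketch follows, assuming the call results arrive as a list of Asterisk-style dial status strings such as "ANSWER" or "CONGESTION". Where that list comes from (for example a carrier log table) is left open. The function names and the `max_ratio` default are hypothetical.

```python
from collections import Counter


def congestion_share(dial_statuses):
    """Return the fraction of call results that are CONGESTION (0.0 if no calls)."""
    counts = Counter(dial_statuses)
    total = sum(counts.values())
    return counts["CONGESTION"] / total if total else 0.0


def sudden_change(previous, current, max_ratio=0.5):
    """Return True when a count moved by more than max_ratio of its previous value."""
    if previous == 0:
        return current > 0  # anything appearing from zero counts as a change
    return abs(current - previous) / previous > max_ratio


if __name__ == "__main__":
    results = ["ANSWER", "CONGESTION", "CONGESTION", "BUSY"]
    if congestion_share(results) > 0.5:
        print("ALERT: congestion is over 50% of call results")
    if sudden_change(previous=40, current=10):
        print("NOTICE: logged-in agent count changed sharply (fault or shift change?)")
```

A flagged change in agent counts is only a prompt to look closer, since a shift change produces the same signal as a fault.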