
cpu load spike causing time sync problems

Posted: Tue Mar 21, 2017 7:43 pm
by TwistedFister
Vicibox 7.0.3 from .iso | Vicidial 2.14-579a Build 161128-1746 | Asterisk 11.22.0-vici | Cluster setup: 1 web 1 DB 6 telephony | No Digium/Sangoma Hardware | No Extra Software After Installation

We currently have 95 agents and we record all calls. As we increase the size of our staff, we are starting to have time sync issues, but only on 2 of the 6 nodes.

These are the 2 that are giving us problems:
Image
Image

This is one that isn't:
Image

All of the nodes are configured the same, but the 2 in question are slightly older. My question is: what process can we delay until after hours to avoid this?

crontab:

Code:
### keepalive script for astguiclient processes
* * * * * /usr/share/astguiclient/ADMIN_keepalive_ALL.pl

### Compress log files and remove the really old ones
25 2 * * * /usr/bin/find /var/log/astguiclient -maxdepth 1 -type f -mtime +1 -print | grep -v \.gz | xargs gzip -9 >/dev/null 2>&1
30 2 * * * /usr/bin/find /var/log/astguiclient -maxdepth 1 -type f -mtime 1 -print | grep -v \.gz | xargs gzip -9 >/dev/null 2>&1
28 0 * * * /usr/bin/find /var/log/astguiclient -maxdepth 1 -type f -mtime +30 -print | xargs rm -f
30 0 * * * /usr/bin/find / -maxdepth 1 -name "screenlog.0*" -mtime +30 -print | xargs rm -f
35 2 * * * /usr/bin/find /var/log/asterisk -maxdepth 1 -type f -mtime +1 -print | grep -v \.gz | xargs gzip -9 >/dev/null 2>&1
40 2 * * * /usr/bin/find /var/log/asterisk -maxdepth 1 -type f -mtime 1 -print | grep -v \.gz | xargs gzip -9 >/dev/null 2>&1
29 0 * * * /usr/bin/find /var/log/asterisk -maxdepth 3 -type f -mtime +30 -print | xargs rm -f

### recording mixing/compressing/ftping scripts
#0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57 * * * * /usr/share/astguiclient/AST_CRON_audio_1_move_mix.pl
0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57 * * * * /usr/share/astguiclient/AST_CRON_audio_1_move_mix.pl --MIX
0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57 * * * * /usr/share/astguiclient/AST_CRON_audio_1_move_VDonly.pl
30 18 * * *  /usr/share/astguiclient/AST_CRON_audio_2_compress.pl --MP3
*/5 19-23 * * * /usr/share/astguiclient/AST_CRON_audio_3_ftp.pl --MP3
*/5 0-4 * * * /usr/share/astguiclient/AST_CRON_audio_3_ftp.pl --MP3
#0 1 * * * /usr/share/astguiclient/AST_CRON_audio_4_ftp2.pl --ftp-server=server.ip --ftp-login=user --ftp-pass=pass --ftp-directory=/ --ftp-persistent --ftp-validate --transfer-limit=100000 --list-limit=100000


### remove old recordings more than 7 days old, and delete originals after 1 day
#24 0 * * * /usr/bin/find /var/spool/asterisk/monitorDONE -maxdepth 2 -type f -mtime +7 -print | xargs rm -f
24 1 * * * /usr/bin/find /var/spool/asterisk/monitorDONE/ORIG -maxdepth 2 -type f -mtime +1 -print | xargs rm -f

### kill Hangup script for Asterisk updaters
* * * * * /usr/share/astguiclient/AST_manager_kill_hung_congested.pl

### updater for voicemail
* * * * * /usr/share/astguiclient/AST_vm_update.pl

### updater for conference validator
* * * * * /usr/share/astguiclient/AST_conf_update.pl

### reset several temporary-info tables in the database
2 1 * * * /usr/share/astguiclient/AST_reset_mysql_vars.pl

### Reboot nightly to manage asterisk issues and memory leaks - uncomment if issues arise
00 6 * * 1-5 /sbin/reboot

### remove text to speech file more than 4 days old
#20 0 * * * /usr/bin/find /var/lib/asterisk/sounds/tts/ -maxdepth 2 -type f -mtime +4 -print | xargs rm -f

## uncomment below if you want to log agent phone_ip
#*/5 * * * * /usr/share/astguiclient/AST_phone_update.pl --agent-lookup

@reboot /usr/src/firewall/firewall.sh

Re: cpu load spike causing time sync problems

Posted: Tue Mar 21, 2017 8:22 pm
by williamconley
1) Neither of your images posted.

2) Time sync errors are often not really time sync errors. How is your time sync configured? Is one server the NTP master, with all the other servers syncing from it?

3) It's common for an overload (CPU, running out of network ports, running out of Apache processes ... lots of available overloads!) to cause a backlog on the system. Packets then fail to get between the agent's web browser and the agent web server, and a single dropped packet causes a "time sync" error, because one of the responsibilities of each of those packets is to update a timing field.

I.e.: Dropped packet = no time update = "time sync error!" and a logged-out agent.

4) So what, precisely, is overloading? (And no, posting your stock crontab is not particularly useful.)
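
One rough way to start answering point 4 is to log load against core count on the two problem nodes and capture the top CPU consumers whenever the box looks saturated. The sketch below is only an illustration under stated assumptions: the function names (load_per_core, is_overloaded, snapshot) and the 1.00 per-core cutoff are hypothetical and not part of Vicidial, and it assumes a Linux system with /proc/loadavg, getconf, and the procps version of ps.

```shell
#!/bin/sh
# Hypothetical helpers for spotting CPU overload; names and threshold are assumptions.

# Print the load average divided by the core count, to two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Exit 0 when per-core load exceeds the limit (default 1.00, an assumed cutoff).
is_overloaded() {
    awk -v load="$1" -v cores="$2" -v limit="${3:-1.00}" \
        'BEGIN { exit !((load / cores) > limit) }'
}

# Print one snapshot line; when overloaded, also list the top five CPU consumers.
snapshot() {
    load=$(cut -d' ' -f1 /proc/loadavg)      # 1-minute load average
    cores=$(getconf _NPROCESSORS_ONLN)       # online CPU cores
    echo "$(date '+%F %T') load1=$load cores=$cores per_core=$(load_per_core "$load" "$cores")"
    if is_overloaded "$load" "$cores"; then
        ps -eo pcpu,pid,comm --sort=-pcpu | head -n 6
    fi
}
```

Calling snapshot once a minute during production hours on a problem node and on a healthy node, with the output appended to a file, should make it possible to line up the spikes with whichever process is at the top of the list when agents get the error. This only covers CPU; the other overloads mentioned above (network ports, Apache processes) would need their own checks.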