Vicibox random crash in Dell Optiplex 360
Posted:
Tue Nov 15, 2011 8:58 am
by indreias
Hello,
I am running an up-to-date ViciBox 3.1.12 (openSUSE 11.3) and experiencing random freezes of the machine (a Dell Optiplex 360).
When I boot with "nosmp maxcpus=0" (the options used by the Failsafe boot entry), everything is OK (except that only one CPU is available...).
Could somebody provide some hints?
Best regards,
Ioan.
host65:~ # uname -a
Linux host65 2.6.34.10-0.2-pae #1 SMP 2011-07-20 18:48:56 +0200 i686 i686 i386 GNU/Linux
Posted:
Tue Nov 15, 2011 4:48 pm
by williamconley
check your logs in /var/log for messages and any form of error at the time of death.
check your cpu fan to be sure you are not overheating (a single cpu generates less heat, so the fan copes more easily)
Posted:
Wed Nov 16, 2011 3:49 am
by indreias
Thanks for the hints, but:
# nothing interesting was saved in /var/log/messages
# the fan is working OK - I have also monitored the "sensors" output and both CPUs show normal values (42 and 36 degrees).
# I configured a kernel dump and a reboot on kernel panic and on oops (panic_on_oops) - but the machine still freezes randomly without leaving any traces
# the freeze is somehow "deep", as nothing works at this stage, including magic SysRq - I suspect a hardware problem, but with no traces I cannot pinpoint exactly what it is...
PS: yesterday I performed a zypper update and the freeze is still present with the new kernel (2.6.34.10-0.4-pae).
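For readers who want to reproduce the panic-and-reboot setup mentioned above, here is a minimal sketch. It is not the exact configuration used in this thread: the function names and the 10-second reboot delay are assumptions chosen for illustration. It only prints sysctl lines and reads the current values, so it is safe to run without root.

```shell
# Print sysctl settings that turn an oops into a panic and reboot
# the machine a few seconds after a panic.
# The default delay of 10 seconds is an assumed value, not from the thread.
panic_sysctl_conf() {
    delay="${1:-10}"
    printf 'kernel.panic_on_oops = 1\n'
    printf 'kernel.panic = %s\n' "$delay"
}

# Show the values currently in effect (read-only).
show_panic_settings() {
    for key in panic panic_on_oops; do
        if [ -r "/proc/sys/kernel/$key" ]; then
            printf 'kernel.%s = %s\n' "$key" "$(cat "/proc/sys/kernel/$key")"
        fi
    done
}
```

To apply the settings, the output of panic_sysctl_conf could be appended to /etc/sysctl.conf as root and loaded with "sysctl -p". As the post notes, a hard freeze that never reaches the panic path will not be caught by these settings.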
Posted:
Wed Nov 16, 2011 7:32 am
by indreias
Just a short note on the workaround identified today:
- the application that triggers the random crash is the mondoarchive backup utility, which we use on all of our production servers for full backups
- when I run mondoarchive on a single CPU (using the maxcpus=1 kernel option or taskset -c <cpu_number> mondoarchive ...), we could not reproduce the freeze (in 25 tests performed in a loop).
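The taskset workaround and the test loop described above could be sketched roughly as follows. The helper names run_pinned and repeat_pinned are hypothetical, and the command to run is passed in as arguments because the mondoarchive options are not shown in this thread.

```shell
# Run a command pinned to a single CPU using taskset (util-linux).
# Usage: run_pinned <cpu_number> <command> [args...]
run_pinned() {
    cpu="$1"
    shift
    taskset -c "$cpu" "$@"
}

# Repeat a pinned command a given number of times, stopping at the
# first failure. Usage: repeat_pinned <cpu_number> <runs> <command> [args...]
repeat_pinned() {
    cpu="$1"
    runs="$2"
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        run_pinned "$cpu" "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $runs runs on CPU $cpu"
}
```

A call such as "repeat_pinned 0 25 mondoarchive ..." (with the real mondoarchive options in place of the dots) would mirror the 25-run test. Note that a real freeze would hang the loop rather than return a failure, so the loop only confirms that the runs completed.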
All my best,
Ioan
Posted:
Wed Nov 16, 2011 7:48 pm
by williamconley
mondoarchive isn't exactly mondo any more.
look at /usr/share/astguiclient/ADMIN_backup.pl
it will even ftp the resulting backup set off server for you. put it in cron.
Posted:
Thu Nov 17, 2011 9:42 am
by indreias
Thanks William,
I was aware of ADMIN_backup.pl - but we are using mondoarchive to perform a full backup. It generates an ISO file that is useful for restoring the server in an emergency.
Could you elaborate on your message? "mondoarchive isn't exactly mondo any more." is puzzling me.
Best regards,
Ioan