Vicibox random crash in Dell Optiplex 360

Post by indreias » Tue Nov 15, 2011 8:58 am

Hello,

I am running an up-to-date ViciBox 3.1.12 (openSUSE 11.3) and experiencing random freezes of the machine (a Dell Optiplex 360).

When booting with "nosmp maxcpus=0" (the options from the Failsafe boot entry), everything is OK (except that only one CPU is available...).

Could somebody provide some hints?

Best regards,
Ioan.

host65:~ # uname -a
Linux host65 2.6.34.10-0.2-pae #1 SMP 2011-07-20 18:48:56 +0200 i686 i686 i386 GNU/Linux
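
For anyone comparing the normal boot with the Failsafe boot, a quick check of which kernel options are active and how many CPUs the kernel brought online might look like the sketch below. It assumes a Linux system with /proc mounted and the x86 layout of /proc/cpuinfo; `online_cpus` is a helper name made up for this example.

```shell
# Count the CPUs the running kernel has brought online
# (one "processor" line per CPU in the x86 /proc/cpuinfo layout).
online_cpus() {
    grep -c '^processor' /proc/cpuinfo
}

# Show the options the kernel was booted with (e.g. nosmp, maxcpus=...).
cat /proc/cmdline
echo "online CPUs: $(online_cpus)"
```

With the Failsafe options in effect, the count should drop to 1.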
Last edited by indreias on Wed Nov 16, 2011 7:32 am, edited 1 time in total.
indreias
 
Posts: 16
Joined: Thu Apr 01, 2010 6:35 am

Post by williamconley » Tue Nov 15, 2011 4:48 pm

check your logs in /var/log for messages and any form of error at the time of death.

check your cpu fan to be sure you are not overheating (1 cpu generates less heat, so the fan can keep up more easily)
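
One possible way to do the log check is sketched below. The keyword list is only an assumption about what a kernel failure might leave behind, not an exhaustive set, and `scan_log` is a hypothetical helper name.

```shell
# Print the lines (with line numbers) of a log file that mention
# common kernel failure keywords, ignoring case.
# Usage: scan_log <logfile>
scan_log() {
    grep -inE 'oops|panic|lockup|machine check|call trace|segfault' "$1"
}
```

For example, `scan_log /var/log/messages | tail -n 50` shows the most recent matches; the timestamps can then be compared with the time of the freeze.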
Vicidial Installation and Repair, plus Hosting and Colocation
Newest Product: Vicidial Agent Only Beep - Beta
http://www.PoundTeam.com # 352-269-0000 # +44(203) 769-2294
williamconley
 
Posts: 20258
Joined: Wed Oct 31, 2007 4:17 pm
Location: Davenport, FL (By Disney!)

Post by indreias » Wed Nov 16, 2011 3:49 am

Thanks for the provided hints, but:

# nothing interesting was saved in /var/log/messages

# the fan is working OK - I have also monitored the "sensors" output and both CPUs show normal values (42 and 36 degrees).

# I configured a kernel dump and a reboot on kernel panic and panic_on_oops - but the machine still freezes randomly without leaving any traces

# the freeze is somehow "deep", as nothing works at this stage, including magic SysRq - I suspect a hardware problem, but with no traces I cannot point out exactly what the problem is...

PS: yesterday I performed a zypper update and the freeze is still present with the new kernel (2.6.34.10-0.4-pae).
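
When a freeze leaves no traces, it can be worth confirming that the panic-related settings are really active on the running kernel. The sketch below only reads the current values from /proc/sys and changes nothing; `show_panic_settings` is a name made up for this example.

```shell
# Print the current value of each panic-related kernel setting.
#   kernel.panic          seconds to wait before rebooting after a panic (0 = no reboot)
#   kernel.panic_on_oops  1 = treat a kernel oops as a panic
#   kernel.sysrq          non-zero = magic SysRq functions enabled
show_panic_settings() {
    for key in panic panic_on_oops sysrq; do
        file=/proc/sys/kernel/$key
        if [ -r "$file" ]; then
            echo "kernel.$key = $(cat "$file")"
        else
            echo "kernel.$key = (not readable)"
        fi
    done
}

show_panic_settings
```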

Post by indreias » Wed Nov 16, 2011 7:32 am

Just a short note on the workaround identified today:

- the application which produces the random crash is the mondoarchive backup utility, used on all of our production servers for full backups

- when I run mondoarchive on only one CPU (using the maxcpus=1 kernel option or taskset -c <cpu_number> mondoarchive ...) we could not reproduce the freeze (in 25 tests performed in a loop).
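
A test loop along these lines could be sketched as follows. This is not the exact script used for the 25 tests; `run_pinned` is a hypothetical helper, and it assumes taskset from util-linux is installed.

```shell
# Run a command several times while pinned to one CPU and print the number
# of runs that exited with a non-zero status.
# Usage: run_pinned <cpu_number> <runs> <command> [args...]
run_pinned() {
    rp_cpu=$1
    rp_runs=$2
    shift 2
    rp_failed=0
    rp_i=1
    while [ "$rp_i" -le "$rp_runs" ]; do
        # taskset -c restricts the command and its children to the given CPU;
        # the command's own output goes to stderr so stdout carries only the count.
        if ! taskset -c "$rp_cpu" "$@" >&2; then
            rp_failed=$((rp_failed + 1))
            echo "run $rp_i of $rp_runs failed" >&2
        fi
        rp_i=$((rp_i + 1))
    done
    echo "$rp_failed"
}
```

It would be called as `run_pinned 0 25 mondoarchive ...` with the usual mondoarchive arguments. A hard freeze will not show up as a failed run; it shows up as the loop never finishing.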

All my best,
Ioan

Post by williamconley » Wed Nov 16, 2011 7:48 pm

mondoarchive isn't exactly mondo any more.

look at /usr/share/astguiclient/ADMIN_backup.pl

it will even ftp the resulting backup set off server for you. put it in cron.

Post by indreias » Thu Nov 17, 2011 9:42 am

Thanks William,

I was aware of ADMIN_backup.pl, but we are using mondoarchive to perform a full backup. It generates an ISO file that is useful for restoring the server in an emergency.

Could you elaborate on your message? The statement "mondoarchive isn't exactly mondo any more." is puzzling me.

Best regards,
Ioan

