Linux crash

Posted: Tue Mar 04, 2014 4:58 am
by phil_discount
Hello,

We've got about 10 clusters running vicidial 2.2.0.
They were installed from Redux 3.0.5 (openSUSE 11.3) and Redux 4.0.3 (openSUSE 12.1).
Every server has different hardware.
Some servers act only as a web server or only as a telephony server.

Every week one server goes completely down: no picture on the VGA port, keyboard not responding.
We already installed kdump to get a crash dump, but when a server crashes on its own, it doesn't create a dump file.
If I crash the server manually, it creates a dump file under /var/crash/DATE/...

I have no idea how to solve the problem. I looked at every log in /var/log.
Sometimes the last message a server logs is "kernel: hrtimer: interrupt took 141378 ns", but not every time.

If anybody has an idea what I can do to find the problem, please tell me :-)

One server has 8 cores and 10 GB RAM, load average 0.18 0.16 0.16, telephony only, about 25 agents.
I don't think it can be a lack of power.

regards
philip

Re: Linux crash

Posted: Tue Mar 04, 2014 6:35 am
by geoff3dmg
When a server crashes with no screen output it's usually a hardware issue. I'm assuming the BIOS/Firmware is up to date. Test the hard disk(s) and the memory (I'd suspect the memory). Beyond that you can try and use the serial console to get a kernel dump.
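
On the missing kdump file mentioned above: kdump only captures a dump when the kernel actually panics, so a hard hang that never panics leaves nothing behind, even though a manually triggered crash works. Below is a minimal sketch in Python, assuming a Linux /proc filesystem, that lists panic-related sysctls currently set to 0. The choice of sysctls is an assumption for illustration, not a diagnosis of these servers, and which of them exist depends on the kernel version. `read_sysctls` and `disabled` are hypothetical helper names.

```python
from pathlib import Path

# Sysctls that make the kernel panic (and so hand over to kdump)
# instead of limping on or hanging. This list is an assumption.
PANIC_SYSCTLS = [
    "panic_on_oops",
    "softlockup_panic",
    "hung_task_panic",
    "unknown_nmi_panic",
]


def read_sysctls(names, root="/proc/sys/kernel"):
    """Return {name: value}; value is None if the file is absent or unreadable."""
    values = {}
    for name in names:
        try:
            values[name] = Path(root, name).read_text().strip()
        except OSError:
            values[name] = None
    return values


def disabled(values):
    """Names of sysctls that exist but are currently set to 0."""
    return sorted(name for name, value in values.items() if value == "0")


if __name__ == "__main__":
    for name in disabled(read_sysctls(PANIC_SYSCTLS)):
        print(f"kernel.{name} = 0")
```

Anything it prints could be switched on with sysctl (for example `sysctl -w kernel.softlockup_panic=1`), so that a lockup of that kind becomes a panic kdump may be able to catch. It will not help if the machine freezes so hard that the kernel cannot run at all.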

Re: Linux crash

Posted: Tue Mar 04, 2014 6:46 am
by phil_discount
We have 10 different servers, so I can't believe it's a hardware issue. Any other idea? ;-)
I will google for console dump.
Do you know a good tool for checking RAM and hard disk?

Re: Linux crash

Posted: Tue Mar 04, 2014 6:53 am
by geoff3dmg
You can run a 'memtest86+' memory test from the grub boot menu. You can use the 'badblocks' command from the console on the hard drive.

Re: Linux crash

Posted: Wed Mar 05, 2014 5:28 am
by phil_discount
I checked memory and hard disk: no errors, everything was fine.
Now I have disabled all rsync cron jobs that sync some files (not vicidial related) from network shares.
Perhaps rsync is causing the problems.

I will report back.

Re: Linux crash

Posted: Sat Mar 22, 2014 4:51 pm
by phil_discount
Since I split the web and Asterisk servers, everything works fine.

Re: Linux crash

Posted: Sat Mar 22, 2014 5:10 pm
by williamconley
Another method you could use (next time) is to boot from a CD after a hard power-down. You may find the server was in fact still running but had no networking, and since it had no keyboard/monitor attached at the last reboot, it refused to interact with them when they were added later. We've had a few systems (still do, in fact) that will not accept a video or keyboard added after boot if none was present at boot.

Re: Linux crash

Posted: Sun Mar 23, 2014 5:08 am
by phil_discount
Each server is connected to a KVM network switch. After a crash the keyboard isn't responding and there is no picture on the screen.

Re: Linux crash

Posted: Wed Mar 26, 2014 1:30 pm
by williamconley
Perhaps the KVM is crashing.

At any rate, put in a CD and boot from it. Then you can browse the log files around the moment of the crash. If you see log entries from AFTER the moment of the crash, it was obviously still running, since the CD boot will not have written any entries.
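
The log check described above can be sketched in Python. This is only an illustration, assuming the classic syslog line format ("Mar  4 04:58:01 host ...", with no year) and that you know roughly when the box stopped responding and when you pulled the power. `parse_ts` and `entries_between` are hypothetical helper names.

```python
from datetime import datetime


def parse_ts(line, year):
    """Parse a classic syslog prefix such as 'Mar  4 04:58:01'.

    Syslog lines carry no year, so the caller supplies one.
    Returns None for lines that do not start with a timestamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def entries_between(lines, stopped, powered_off, year):
    """Lines logged after the box stopped responding but before power-down.

    Any hit means the system was still running while it looked dead.
    """
    hits = []
    for line in lines:
        ts = parse_ts(line, year)
        if ts is not None and stopped < ts < powered_off:
            hits.append(line.rstrip("\n"))
    return hits
```

From the CD environment you would mount the server's disk and pass the open log file as `lines`, for example the messages file under wherever you mounted it. An empty result suggests the system really stopped at the moment of the crash, while any hits suggest it kept running without console or network.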