Facing a RAM concern - memory leaks ?
Posted:
Fri Feb 18, 2011 7:55 am
by ronator
Hello geeks,
for the last few days I've been "optimizing" some vicibox settings on my web-db combo server because I wanted it to "dedicate" more resources to those important services. Meanwhile, I ran into a strange situation: on my first vici phone server (the one running all the keepalive scripts) the RAM usage does not behave typically, even for Linux ^^ Let's say I've got 4 GB; top says:
Mem: 3950M used: 195M buffers: 94M cache: 551M
As far as I know, Linux uses the cache as a disk cache to improve system performance. Normally it should also free the cache again once data has been written to disk or when other processes need more active RAM. But I have been watching the cache and it keeps growing until almost all RAM is in cache ... and when this happens, asterisk crashes. When I read something about memory leaks and checked crontab, which also mentions that I should reboot the machine every night "due to asterisk issues or memory leaks", I thought everything was ok. But the other machines do not fill up the RAM cache that quickly. I tried to find the responsible process (vmstat, ps, top) but all I found was rsyslogd writing to /var/log. My idea is that I may have done something wrong, so that a script is running but not finishing correctly. That is only a guess, because I know PHP is known for memory leaks.
I know this is a very superficial problem description, but maybe someone has an idea where to look, or what is known to cause this behaviour?
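To tell reclaimable disk cache apart from memory that processes are really holding, the figures can be read from /proc/meminfo rather than from top. The sketch below is only an illustration: the function names are made up for this example, the sample numbers are invented and not taken from the server in this thread, and it assumes the usual `Key: value kB` layout of /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Separate buffers/cache (reclaimable) from memory that is really in use."""
    reclaimable = info.get("Buffers", 0) + info.get("Cached", 0)
    used = info["MemTotal"] - info["MemFree"] - reclaimable
    return {
        "used_kb": used,
        "reclaimable_kb": reclaimable,
        "free_kb": info["MemFree"],
    }


# Invented sample values for illustration only (not from the thread).
SAMPLE = """MemTotal: 4000000 kB
MemFree: 3000000 kB
Buffers: 100000 kB
Cached: 500000 kB
"""

summary = memory_summary(parse_meminfo(SAMPLE))
for name, kb in summary.items():
    print(f"{name}: {kb // 1024} MB")
```

On a live Linux box the same functions could be fed `open("/proc/meminfo").read()` at intervals. If `used_kb` stays flat while only `reclaimable_kb` grows, the growth is ordinary page cache rather than a leaking process.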
top shows now after around 30 minutes:
Mem: 3950M used: 230M buffers:170M cache: 596M
that's just over 1 MB per minute ...
Phone-Server-only with
Vicibox: 3.0.6 32 bit
Asterisk 1.4.21.1
ASTGUI VERSION: 2.2.1-237, BUILD: 100510-2015
Posted:
Fri Feb 18, 2011 8:54 am
by williamconley
cache filling up is not likely your issue. mysql always eats all available memory for cache so it "can" reload data pages, but it lets the system re-use that memory whenever it wants. this improves data seeks in mysql and still leaves the necessary memory available for the system.
it sounds to me like you're a guy pushing buttons a little deeply in the system. i'm guessing you've changed a few things (trying to "optimize") and have broken something essential.
i recommend you reinstall and not touch anything outside the Vicidial GUI and crontab options for a few days and see if your problem "goes away". then you can "optimize" an item or two a day with good recordkeeping to find out what you changed that was a bad idea.
nope
Posted:
Fri Feb 18, 2011 9:06 am
by ronator
oh, well, i only make one change a day, test it, then go on ... and you did misunderstand me ... the web-db combo is running smoothly, my modifications did not ruin anything on that machine. it's the first phone server, where no mysqld or httpd is running at all. Only the keepalive scripts and asterisk (so to speak). So it is not mysql that eats up my resources:
top on db-web
Mem: 32218M used: 1133M buffers: 269M cache: 6404M
Since I raised the maxclient settings in httpd, I am still happy with 1GB used and since I got 32GB, mysql may eat more if it wants...
another suggestion ?
Posted:
Fri Feb 18, 2011 9:22 am
by williamconley
did you use the same vicibox install cd on this box? (also: how long is it between reboot and "death"?)
Posted:
Fri Feb 18, 2011 9:30 am
by ronator
yes, all machines use vicibox 3.0.6 (the db with the 64-bit version), and the old vicinow system has been down for 2 weeks.
The cache eats approximately 1 MB/min, which makes 1440 MB/day. Since I've got 4 GB in there, it will work for two days (2880 MB) and crash on the third day.
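The arithmetic above can be written as a small sketch. It assumes the growth is linear and that the memory is never given back, which is exactly what is in question in this thread; the function names are made up for illustration, and the input numbers are the top readings quoted earlier (3950M total, "used" going from 195M to 230M in about 30 minutes).

```python
def growth_rate_mb_per_min(start_mb, end_mb, minutes):
    """Average growth between two readings, in MB per minute."""
    return (end_mb - start_mb) / minutes


def days_until_full(total_mb, current_mb, rate_mb_per_min):
    """Days until total_mb is reached, assuming constant linear growth."""
    if rate_mb_per_min <= 0:
        return float("inf")  # no growth means the limit is never reached
    minutes_per_day = 60 * 24
    return (total_mb - current_mb) / (rate_mb_per_min * minutes_per_day)


# Readings quoted in the thread: "used" went from 195M to 230M in ~30 minutes.
rate = growth_rate_mb_per_min(195, 230, 30)
print(f"measured rate: {rate:.2f} MB/min")

# The rounded estimate from the post: 1 MB/min on a box with 3950M of RAM.
print(f"days until full at 1 MB/min: {days_until_full(3950, 0, 1.0):.2f}")
```

At the rounded 1 MB/min the box fills in a bit under three days, which matches the "works for two days, crashes on the third" estimate. The projection only means something if the growing figure is memory that cannot be reclaimed.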
I remember someone telling me that if I am using voice files/moh/announcements, it would always be safer to reboot the machines at night due to asterisk issues / memory leaks (like the commented-out lines in crontab on viciboxes!)
Posted:
Fri Feb 18, 2011 9:59 am
by williamconley
Two choices: reboot nightly (like everyone else)
or dig in and find it. I've looked a few times, but it's time-consuming and no one wants to pay for it.
Do Not Assume that it is a single item. And DO NOT assume it is not MySQL (because the mysql CLIENT is still running, no?)
Posted:
Fri Feb 18, 2011 10:35 am
by ronator
ok, one point for you regarding mysql-client :-p but to defend myself: I just copied some settings from huge-myql.conf just to keep more data in RAM so it should not affect the clients. but that is just an assumption, not a fact ^^
Posted:
Fri Feb 18, 2011 11:20 am
by williamconley
You got a problem with rebooting every night like the rest of us? You "special" or something?
(If you DO decide to hunt this down, i'll help ... slowly, on here)
Posted:
Fri Feb 18, 2011 11:37 am
by ronator
everyone is special, which actually means, no one is
no, I ain't got no problem with it. but I may need a WOL solution, because otherwise I cannot ensure that the db-server is up and running before the phone servers.
if I reboot the phone servers first and then the db-server (so that the vicis don't see mysql disappear), then the vicis are up before mysql and they will cry. if I reboot mysql first, the vicis also start screaming and spam my mailbox ...
So I have to shut down vicis, reboot mysql, sleep(100000), and then boot vicis. Or did I understand something wrong ?
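Instead of a fixed sleep, the phone servers could wait until the database answers before starting their services. The sketch below is one possible approach, not something Vicidial or vicibox ships: `wait_for_port` is a made-up helper, "db-server" is a placeholder host name, and 3306 is only MySQL's default port, so adjust it if the server listens elsewhere.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=5.0):
    """Return True once a TCP connection to host:port succeeds.

    Retries every interval_s seconds and returns False if the port
    is still unreachable after roughly timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError if the connect fails or times out
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A boot script on a phone server could call something like `wait_for_port("db-server", 3306)` and only start the keepalive scripts when it returns True. An open port only shows that mysqld is listening, not that it is ready to serve queries, so a short extra delay may still be wise.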
I will go on watching the behaviour and I'd love to stay in contact with you, because maybe I can find some important information. But just like you, no one is going to pay me, and my boss just wants to see the machine working .. not me debugging ... I can hear 'em saying: "So then just reboot!"
It's a cruel world
Posted:
Fri Feb 18, 2011 11:46 am
by williamconley
i thought it was a madmadmadmadmadmad world.
you can't just "pause" the other servers for 5 minutes while mysql reboots, then reboot the others? (shut off the watchdog for that 5 minute period?)
then you'll only get a freakout when the mysql server does its monthly drive-check.
Posted:
Mon Mar 21, 2011 5:00 pm
by Kumba
MySQL shouldn't need rebooting nightly, weekly or really monthly. We really only reboot ours about once a year when we open them up, blow them out, check fans, scan the filesystems/HD, and maybe upgrade the OS/etc. Basically, yearly maintenance. For instance:
Code:
db5:~ # uptime
5:50pm up 197 days 18:14, 1 user, load average: 2.35, 2.29, 2.37
They pretty much just run. Same thing with the web servers:
Code:
web2:~ # uptime
5:52pm up 197 days 14:54, 1 user, load average: 0.25, 0.39, 0.40
As you can see, our last reboot corresponded with a move we did in our colo last Halloween combined with maintenance. The above servers are serving roughly 200 concurrent agents at any given time.
The telephony (asterisk) servers are a different story. Those can last anywhere from a few days to a month before they just self-detonate and need to be rebooted. The short, easy solution for us was to reboot in a controlled manner as opposed to an uncontrolled one. Since historically we see no traffic between 4:00am and 4:05am, we chose this as the time period to reboot. We have yet to have a customer call and let us know their agents couldn't place calls at 4am, so it seems to be a good fix.
Whether the fix is good is all open to interpretation.
Posted:
Wed Mar 23, 2011 9:24 pm
by williamconley
linux weirdos who never want to reboot their servers. lol. impressive, though.
we reboot everything nightly just to be safe, so we wouldn't know "how long they can go". we don't have any 24 hour call centers.
although I will say that our Server2k and server 2k3 boxes ran for months (years?) without a reboot unless the box was physically moved.
Posted:
Thu Mar 24, 2011 5:15 pm
by Kumba
No point in rebooting the database and web servers unless you are running a bunch of all in one boxes. Dialers, because of Asterisk specifically, are what need rebooting. You can even get away with killing asterisk and all the screens on the box then restarting if you want to have only a 15-second downtime window.
Pointlessly rebooting the database has left some DBs crashed for whatever reason, probably because of the caching scheme used on the hard drives.
Posted:
Thu Mar 24, 2011 5:36 pm
by williamconley
The ONLY damaged DBs we've ever had/heard of have been from power outages. (So far)
We have had excellent luck with daily reboots across the board if Vicidial is in ANY way installed on the servers, whereas we've had down servers weekly or monthly for those we've opted not to reboot.
That being said, we no longer "experiment" with this concept at all. We reboot all servers that are not 24-hour servers nightly (if we control this function). The 24-hour servers are in direct control of the client, and once again we do not delve into these matters.
Since we instituted that rule, we have only ONE server that regularly crashes (and brother is that one old, soon to be rebuilt and no longer act as router for the network it is on! LOL).
So since we no longer experiment with NOT rebooting, I can't honestly say whether anything since Redux 2.0 would be able to skip it, so that's good to hear. But I can say that rebooting nightly has not caused any issues on any of our servers. At that point: we stop meddling as they are live client servers and it would be inconvenient to experiment.