Page 1 of 1

High SQL Load

PostPosted: Wed Feb 03, 2016 9:51 am
by adymeblack
I have a brand new 9 (8 web/dialers and 1 DB) server cluster up and running with on average 75 agents logged in and taking calls at about a 3:1 ratio and I am having a lot of issues with SQL constantly running over 1.4, which in turn is causing problems for my agents (being paused, lead info taking forever to load, double dialing,etc)

I've minimized all SQL usage as much as I feasibly can but it doesn't seem to really help. I've had over 100 agents running flawlessly on a smaller 5 server cluster using the same exact hardware so i am at a loss as to where the issue lies.

Any help would be greatly appreciated.


VERSION: 2.12-538a
BUILD: 160122-1401
Asterisk: 1.8.32.3-vici
ISO: 6.0.4
2x- Intel Xeon 3.0 GHz Quad-Core CPUs
16GB RAM
2x 240GB SSD
No additional hardware
No customization

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 11:16 am
by mflorell
What kind of RAID card are you using for your DB server?
What are the specs on the drives on your DB server?
How much RAM on your DB server?

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 11:26 am
by adymeblack
Dell PowerEdge PERC 6 - RAID 0
16GB RAM
Sandisk 240GB 6 Gb/s

I also had the same issue with 7200K SATA and 15k SAS drives.

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 5:19 pm
by adymeblack
Anyone have an idea?

I'm starting to pull my hair out.

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 6:09 pm
by mflorell
That should be able to handle 100 agents if it is a dedicated DB server.

What does your /etc/my.cnf look like?

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 6:30 pm
by williamconley
check your "show processlist" entries to see if there is a process that runs too long regularly, or at least when you experience "the issue". slow query log can also be useful.

when you installed the dialers, how did you cluster them? (please say you just used the scripts in the Vicibox .iso that Kumba designed ... and not that you manually clustered them all, LOL)

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 6:34 pm
by adymeblack
It should and it has and with more agents in the past and barely broke a sweat.

On the my.cnf file, standard config generated by the installer. I have made no changes to it.

Code: Select all

# Basic ViciBox my.cnf for a quad-core 8-GB RAM or so
# Use the dedicate my-big.cnf for 16+GB RAM and 8+ Cores
#

# The following options will be passed to all MySQL clients
[client]
#password   = your_password
port      = 3306
socket      = /var/run/mysql/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port      = 3306
socket      = /var/run/mysql/mysql.sock
datadir   = /var/lib/mysql
skip-external-locking
key_buffer_size = 640M
max_allowed_packet = 2M
table_open_cache = 8192
sort_buffer_size = 4M
net_buffer_length = 8K
read_buffer_size = 4M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 64M
thread_cache_size = 50
query_cache_size = 0
thread_concurrency=8
skip-name-resolve
connect_timeout=60
long_query_time=3
log_slow_queries
max_connections=768
open_files_limit=24576
max_heap_table_size=32M
expire_logs_days=3
default-storage-engine=MyISAM
table_definition_cache=8192
table_cache=8192
concurrent_insert=2
myisam_recover
myisam_repair_threads=1
tmpdir          = /tmp/


# This will disable networking
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# binary logging format - mixed recommended
binlog_format=mixed

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id   = 1
slave-skip-errors=1032,1690


# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slaver server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id       = 2
#
# The replication master for this slave - required
#master-host     =   <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user     =   <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =   <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port     =  <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /var/lib/mysql
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = /var/lib/mysql
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 16M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 5M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

# The safe_mysqld script
[safe_mysqld]
log-error   = /var/log/mysql/mysqld.log
socket      = /var/run/mysql/mysql.sock

[mysqldump]
socket      = /var/run/mysql/mysql.sock
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin
log        = /var/log/mysqld_multi.log
# user       = multi_admin
# password   = secret

# If you want to use mysqld_multi uncomment 1 or more mysqld sections
# below or add your own ones.

# WARNING
# --------
# If you uncomment mysqld1 than make absolutely sure, that database mysql,
# configured above, is not started.  This may result in corrupted data!
# [mysqld1]
# port       = 3306
# datadir    = /var/lib/mysql
# pid-file   = /var/lib/mysql/mysqld.pid
# socket     = /var/lib/mysql/mysql.sock
# user       = mysql

# [mysqld2]
# port       = 3307
# datadir    = /var/lib/mysql-databases/mysqld2
# pid-file   = /var/lib/mysql-databases/mysqld2/mysql.pid
# socket     = /var/lib/mysql-databases/mysqld2/mysql.sock
# user       = mysql

# [mysqld3]
# port       = 3308
# datadir    = /var/lib/mysql-databases/mysqld3
# pid-file   = /var/lib/mysql-databases/mysqld3/mysql.pid
# socket     = /var/lib/mysql-databases/mysqld3/mysql.sock
# user       = mysql

# [mysqld6]
# port       = 3309
# datadir    = /var/lib/mysql-databases/mysqld6
# pid-file   = /var/lib/mysql-databases/mysqld6/mysql.pid
# socket     = /var/lib/mysql-databases/mysqld6/mysql.sock
# user       = mysql




Re: High SQL Load

PostPosted: Wed Feb 03, 2016 6:37 pm
by adymeblack
williamconley wrote:check your "show processlist" entries to see if there is a process that runs too long regularly, or at least when you experience "the issue". slow query log can also be useful.

when you installed the dialers, how did you cluster them? (please say you just used the scripts in the Vicibox .iso that Kumba designed ... and not that you manually clustered them all, LOL)


I've been watching the process list and everything seems to move along as it should. Nothing really seems to "hang" or i haven't been able to catch it.

Everything was done via the installer from the DB to the dialers. The only extra software i installed was traceroute so i could verify it was using the right gateway to my SIP provider.

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 6:44 pm
by williamconley
If there is no "backup" causing the problems with the db server, then there should be no delay in response to the dialers. Without the delayed response, the dialers should have no problems. If there is a delay, but the delay is not due to the DB response, perhaps its a network delay ...

Do your server share a private network to communicate with one another? (ie: public IP for communication with Carriers and private IP for communication between servers without the possibility of "outside interference"?)

If it is not a network delay, and it's not a DB delay ... then I don't see how it could be a delay at all. In which case I might be looking at whether it's just one of the dialers and not the DB itself. The issues you've reported, were they logged as to which server, agent, lead, time of day ...? (Looking for a common thread that could be used to demonstrate that the problem is not universal but limited to a specific segment of your system.)

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 6:52 pm
by adymeblack
The standard setup is eth0 for the private network shared by the servers and agents, eth1 is a different private network that is connected to a separate dedicated router for outbound SIP traffic.

The more i think about it, it seems to pick on the same agents on 2 of the dialer servers. If it was one or two of the dialer servers causing issues it would explain why i'm getting almost identical results even though i've had the db installed on 2 physical servers this week.

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 7:17 pm
by williamconley
I applaud your open mind for that "shift" to "maybe it's not the DB".

I also strongly suggest you implement logging of the complaints. We tend to push statistics to match our expectations and overlook those that do not match. So having them written down would make ignoring them more difficult ... and having an actual list of the problems may point you directly at the problem.

Sledge-hammer to "prove" the problem is in a specific server (or whatever you believe). This is best done by turning off the server, or moving the user to a whole new workstation ... that sort of thing. "Idle" the offender and see if the problem vanishes.

Remember: It doesn't count unless you can "revive" the problem by putting the offender back online, and make the problem disappear again by removing it again. Takes out the possibility of coincidence if you do it enough times. If you "think" you found it, but didn't prove it, it'll inevitably resurface during your heaviest load (ie: Least Convenient Moment, LOL).

When you're satisfied you've found it, you can either try to fix the problem or trash "the problem" completely, of course. 8-)

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 7:29 pm
by adymeblack
I try to "log" them, but the way i usually find out there are issues is when all the supervisors (or worse, VP's) all tell me at once in various ways of communications. It get's hard to keep up quickly.

The only issue i have with that suggestion is i don't have a lot of spare servers to shift the load do while i narrow it down (i have some ordered but they take forever to ship). This only seems to rear it's head while under full load during production hours. With that being said, is there a way i can "generate" that kind of load during off-hours to help me find the offending server a little faster?

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 7:46 pm
by williamconley
Yep. Been there. Ask them with a blank face for the log. And why you were not notified at the time of occurrence so you could have viewed the problem first hand.

After all, when the car makes a funny noise you take the car to the mechanic and let him listen to the noise. What are the odds on telling your wife "there was a noise, but it's gone now" and having your wife (based on that description) call the mechanic and have him fix it over the phone. Same deal. Give them that description and ask if there is (a) something ELSE going on here or (b) a way to set this up to get fixed professionally (log, email from room managers at the time of occurrence, something!).

Of course, a professional room has a "Duty IT Technician" to call and have someone look into the problem LIVE. We provide this service for many Call Centers. The room managers call us and we have a look and render an opinion (often we run a server health check script to see if any "known problem areas" can give us a hint). A room manager who does NOT make that phone call is taken aside and threatened. Otherwise, you're just stuck in "committee" mode and trying to fix problems long after their symptoms are gone. Not particularly efficient.

Just sayin' (not that I've been in that situation ... today LOL).

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 8:08 pm
by adymeblack
Any suggestions on what to look for on the dialers that would lead to a super high load like that?

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 8:14 pm
by williamconley
htop and ps aux may provide some answers. i also like this "top 10 cpu hogs" line:

Code: Select all
ps -eo pcpu,pid,user,args | sort -k 1 -r | head -11

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 8:51 pm
by adymeblack
I took a look at the "top 10 cpu hogs" and didn't find anything too terribly interesting. I did have one server that had a max of 11% for asterisk, but it has a slightly slower processor than the rest so i expected that.

Although i'm looking through the netstat and i see 3 servers with 3x-4x more connections back to the DB than the others.

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 9:10 pm
by williamconley
Are you presently experiencing the "symptoms" in question? Cuz if you're not ... you're likely spinning your wheels.

Re: High SQL Load

PostPosted: Wed Feb 03, 2016 9:12 pm
by adymeblack
I am not, it's off hours so they are literally doing nothing. I'm looking over everything on all the servers to see if i can find anything that stands out among the rest even under no load.

Re: High SQL Load

PostPosted: Thu Feb 04, 2016 11:43 am
by adymeblack
I am now in the thick of the "issues". While they are not as bad as they have been, still having issues with double dialing, being paused every other call, etc.

SQL load still varies from 0.2 to 1.5. I've enabled slow query logging and nothing is coming up. I put a load balancer in front of all the web servers because on 2 of them Apache was generating the highest load. A little bit of improvement, but not a whole lot. As far as i can tell at present there is nothing in the network causing a bottle neck.

I'm in the process of creating a dedicated web server for the cluster. I'm hoping if i can offload all the apache traffic to a different box, the rest of them will slow down a bit.

Here's to hoping.

Re: High SQL Load

PostPosted: Thu Feb 04, 2016 9:18 pm
by mflorell
You shouldn't need more than 1 web server if you have 100 agents. Are you using eaccellerator or another PHP caching engine?

Re: High SQL Load

PostPosted: Fri Feb 05, 2016 12:30 am
by williamconley
Is apache "in general" causing load? Or are there specific apache processes hammering the server while others are (as expected) fairly minor?

If so, kill the heavy loaded processes and see what breaks. It could be you have some rogue processes or a person running something unexpected. Hopefully that's the case (a single rogue process being run by someone who should not be doing so ...) and you'll get a phone call about a web page that keeps crashing from someone other than an agent.

Re: High SQL Load

PostPosted: Fri Feb 05, 2016 2:16 pm
by adymeblack
mflorell wrote:You shouldn't need more than 1 web server if you have 100 agents. Are you using eaccellerator or another PHP caching engine?


Nope, was using a load balancer (Zen Load Balancer to be specific) but even when i took that out of the loop it made no difference.

Although i may have found the issue, but i'm waiting til EOD to determine if that was the actual issue or not. It seems that if i turn off the chat function it calms way down. I still see the DB spike every once in a while, but it is a lot shorter in duration, not as intense, and for the most part the complaints from agents have stopped coming in at the same pace. So i'm figuring that could be it reloading the hopper or something similar.

I have cautious optimism now mainly because since the beginning it was off and on.

Re: High SQL Load

PostPosted: Fri Feb 05, 2016 4:20 pm
by mflorell
Very interesting, if you happen to see what chat-related queries are causing problems, that would be great information to have. It is a newer feature, and it's possible there might be some query optimizations that are needed.

Re: High SQL Load

PostPosted: Fri Feb 05, 2016 4:32 pm
by adymeblack
Honestly the entire time I've been watching the process list, i think i may have seen one or 2 chat queries flash up for a couple seconds then were gone.

Most of the time i would only see 2 or 3 queries going at once and they were usually lead related.

I turned off the chat around 10 this morning and i'm pretty sure something in that was causing the issue. Since then i think i've had one agent get paused and one other have one or two double-dials.

Re: High SQL Load

PostPosted: Mon Jan 30, 2017 12:46 pm
by adymeblack
They're back....kind of. Luckily not as intense, but still problematic.

Since the last post on this thread I have upgraded my DB server:

Dell PowerEdge R610
Intel Xeon X5690 6-core 3.47GHz (2x)
32 GB RAM

I've also upgraded Vicidial:
VERSION: 2.14-585a
BUILD: 170114-1356
Installed via Vicibox ISO 7.0.4
Added traceroute and webmin

Still having some load issues. I only have about 80 agents dialing off this server right now (no chat) but the average load is about 0.8-1.9 and it is all MySQL.

Nothing too major in the slow_query_log, maybe one query here and there.

I'm thinking that i need to tune SQL to make more use of the memory and ease up on the CPU.

How would i go about doing that?........if it's even an option.

Thanks.

Re: High SQL Load

PostPosted: Mon Jan 30, 2017 12:52 pm
by mflorell
Try running the "extras/mysql-tuning.sh" script to see what my.cnf settings changes you might want to make.

Re: High SQL Load

PostPosted: Mon Jan 30, 2017 2:22 pm
by williamconley
And verify that your HDs are 6G/sec and RAID10. The sql load could be increased by slow response times.

You could also turn on the "everything" sql log for 10 seconds and inspect your queries to see if there's anything amiss, repeated unnecessary queries from something misfiring for instance. Note that those would not be in the slow query log because they may respond quickly.

There may also be some queries just under the slow-query threshold, show processlist has been known to flag this fairly well.

Re: High SQL Load

PostPosted: Mon Jan 30, 2017 4:05 pm
by adymeblack
mflorell wrote:Try running the "extras/mysql-tuning.sh" script to see what my.cnf settings changes you might want to make.


I forgot about that little extra, i'll run it after EoB and see what it has to say.

Re: High SQL Load

PostPosted: Sun Mar 05, 2017 5:00 pm
by adymeblack
Took me forever to get to it. But i tried the Sql Tuning script and got this:

Code: Select all
 -- MYSQL PERFORMANCE TUNING PRIMER --
         - By: Matthew Montgomery -

MySQL Version 10.1.20-MariaDB x86_64

Uptime = 0 days 15 hrs 52 min 15 sec
Avg. qps = 691
Total Questions = 39498035
Threads Connected = 122

Warning: Server has not been running for at least 48hrs.
It may not be safe to use these recommendations

To find out more information on how each of these
runtime variables effects performance visit:
UNSUPPORTED MYSQL VERSION