
Database High CPU load

PostPosted: Thu Nov 23, 2017 1:18 pm
by covarrubiasgg
Hello everyone,

First, this is my build:

1 x Database Server [2 x Xeon E5504 2.0 GHz, 128GB RAM, RAID 10 on an LSI MegaRAID 9265-8i with 8 x 15k SAS drives (3 + 3 in the array, 2 hot spares)]
1 x Web Server [1 x Xeon E3-1220 3.10 GHz, 32GB RAM]
3 x Dialers [2 x Xeon L5520, 16GB RAM each]
1 x FreeNAS box as archive server (not sure about the specs)

Everything was installed using ViciBox 7.0.4, VERSION: 2.14-639a BUILD: 171028-0913

The database is not too big; we archive every weekend so it only holds the last 6 weeks.
vicidial_list is around 4 million records, but only about 300k of them are in active lists.
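
For reference, a rough way to check those counts (this assumes the stock VICIdial schema, where vicidial_lists.active is 'Y' for active lists):

Code:
-- total leads vs. leads sitting in active lists
SELECT COUNT(*) AS total_leads FROM vicidial_list;

SELECT COUNT(*) AS active_list_leads
  FROM vicidial_list vl
  JOIN vicidial_lists l ON l.list_id = vl.list_id
 WHERE l.active = 'Y';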

This is for 80 outbound agents, dial method RATIO and dial level 2.

The symptoms:

At some point agents start getting "Your session has been paused", random disconnections, wrong data on the real-time report such as agents showing as paused while they are actually on calls, etc.

The load on the database gets very high, peaking at 8.0 with all cores at 100%.

mysql SHOW PROCESSLIST shows a long list of queries stuck in "Waiting for table level lock" and "Waiting for query cache lock".

MariaDB [asterisk]> show open tables WHERE In_use > 0;
+----------+---------------------------+--------+-------------+
| Database | Table                     | In_use | Name_locked |
+----------+---------------------------+--------+-------------+
| asterisk | vicidial_agent_log        |    206 |           0 |
| asterisk | vicidial_manager_chats    |      5 |           0 |
| asterisk | vicidial_users            |      7 |           0 |
| asterisk | vicidial_live_agents      |     26 |           0 |
| asterisk | vicidial_campaigns        |      1 |           0 |
| asterisk | vicidial_manager_chat_log |    222 |           0 |
+----------+---------------------------+--------+-------------+
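
To see what the stuck queries are actually waiting on during one of these events, a quick diagnostic query against information_schema should work on any MariaDB 10.x (just a sketch, nothing VICIdial-specific):

Code:
-- count the waiting queries grouped by wait state
SELECT STATE, COUNT(*) AS queries
  FROM information_schema.PROCESSLIST
 WHERE STATE LIKE 'Waiting for%'
 GROUP BY STATE
 ORDER BY queries DESC;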


What have I done so far?

I have already tried some crazy ideas in my.cnf: increasing key_buffer_size to very high values and increasing query_cache_size to 256M, but that broke things. The database started logging "mysql error in accept: bad file descriptor" and the web server was reporting "MySQL server has gone away", so I fell back to the original ViciBox config and just increased max_connections from 500 to 2048.
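
In case anyone wants to check the same things on their box, the effective cache settings and the cache hit/prune counters can be pulled with standard MariaDB commands (nothing VICIdial-specific here):

Code:
-- effective query cache settings
SHOW VARIABLES LIKE 'query_cache%';

-- cache activity counters; a high Qcache_lowmem_prunes usually means the cache is churning
SHOW GLOBAL STATUS LIKE 'Qcache%';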

Here is my current my.cnf:

Code:
[mysqld]
log_bin=/var/lib/mysql/mysql-bin
binlog_format=mixed
server-id=1
relay-log  = /var/lib/mysql/mysqld-relay-bin
slave-skip-errors = 1032,1690,1062
datadir=/var/lib/mysql
sql_mode=NO_ENGINE_SUBSTITUTION
port = 3306
socket = /var/run/mysql/mysql.sock
skip-external-locking
skip-name-resolve
connect_timeout=60
long_query_time=3
slow_query_log=1
slow-query-log-file=/var/log/mysql/mysqld-slow.log
max_connections=2048
key_buffer_size=4G
max_allowed_packet=16M
table_open_cache=512
table_definition_cache=2048
open_files_limit=24576
sort_buffer_size=4M
net_buffer_length=8K
read_buffer_size=4M
read_rnd_buffer_size=16M
myisam_sort_buffer_size=128M
join_buffer_size=1M
thread_cache_size=100
query_cache_size=32M
thread_concurrency=8
default-storage-engine=MyISAM
expire_logs_days=3
concurrent_insert=2
myisam_repair_threads=2
myisam_use_mmap=1
skip-innodb
delay_key_write=ALL
max_write_lock_count=1


Here is a sample from top. Please note the free memory and the very low wa, so I don't think it is an I/O bottleneck; iotop shows similar values.

Code:
top - 10:53:11 up 58 min,  2 users,  load average: 6.38, 6.95, 6.47
Tasks: 174 total,   2 running, 172 sleeping,   0 stopped,   0 zombie
%Cpu(s): 34.1 us, 13.0 sy,  0.0 ni, 50.0 id,  0.1 wa,  0.0 hi,  2.8 si,  0.0 st
KiB Mem:  13202600+total,  5330308 used, 12669569+free,    37520 buffers
KiB Swap:  6304760 total,        0 used,  6304760 free.  2804556 cached Mem



Any ideas? I'm starting to think about faulty hardware, but a faulty CPU is not that common :/

What would you try before throwing the server through the window?



UPDATE:

During a severe slowdown event, with 700 queries locked in the SHOW PROCESSLIST output, I ran:

mysql> SET GLOBAL query_cache_size = 0;
mysql> SET GLOBAL query_cache_type = 0;

Immediately all the locks were gone. CPU is still very high, but agents are not reporting issues and mtop hasn't shown any red queries, not even yellow.
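
If turning the cache off keeps things stable, the same change can be made permanent in my.cnf under [mysqld], as the persistent equivalent of the SET GLOBAL commands above:

Code:
# keep the query cache disabled across restarts
query_cache_type = 0
query_cache_size = 0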

Re: Database High CPU load

PostPosted: Thu Nov 23, 2017 7:03 pm
by mflorell
We have seen some systems that perform better when the query cache is turned off. One of our largest systems with over 700 concurrent agents has the query cache turned off on the database.