Slow DB at 120 live, aiming for 350+ live agents.
Posted: Mon May 07, 2012 7:52 am
Hi,
We are looking to install and maintain a sizeable multi-server environment with manual, ratio and adaptive dialing up to 4 ratio and blended campaigns for a client with up to 350 live agents per campaign. My setup, tested with 120 live agents across around 7 simultaneous campaigns:
OS: OpenSuse from Vicidial 3.1.15 from Vicibox Redux preload ISO
Vicidial: 2.4-351a
MySQL: 5.1.57-log SUSE MySQL RPM
Server Specs of various Dells, e.g. 1950:
DB x1: Dual Xeon E5530 @ 2.40GHz, 16GB RAM, 15k RPM SAS RAID10 4 drives, Perc6i Logic MegaRAID SAS 1078.
Dialer x4: Dual Xeon L5420 @ 2.5GHz, 16GB RAM, 15k RPM SAS RAID1 2 drives, Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS.
Web x1: Dual Xeon E5420 @ 2.5GHz, 16GB RAM, 15k RPM SAS RAID1 2 drives, Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS.
Codecs: G729a
Carrier: VOIP over IAX2
My problem is that at around 120 live agents I see database bottlenecks, with sometimes over 30 calls waiting while there are agents available.
I am not confident about supporting 350 agents on a single multi-server system. Queries against the vicidial_log table in particular take over 8 seconds to return when many calls are waiting to be connected to a waiting agent, although I'm not sure whether this translates into dropped calls or a business performance problem. Example from the MySQL slow query log:
# Time: 120505 15:55:26
# User@Host: cron[cron] @ [10.100.0.248]
# Query_time: 8.505915 Lock_time: 3.828021 Rows_sent: 0 Rows_examined: 4375907
SET timestamp=1336226126;
UPDATE vicidial_log FORCE INDEX(lead_id) set status='ADC' where lead_id = '90******' and uniqueid LIKE "13********%";
# User@Host: cron[cron] @ [10.100.0.248]
# Query_time: 6.365207 Lock_time: 3.287518 Rows_sent: 0 Rows_examined: 2184425
SET timestamp=1336226234;
UPDATE vicidial_list set status='PU' where lead_id='60********' and status NOT IN('CBHOLD','CALLBK');
....
# Time: 120507 11:30:54
# User@Host: cron[cron] @ [10.100.0.249]
# Query_time: 4.296744 Lock_time: 0.000043 Rows_sent: 0 Rows_examined: 4381492
SET timestamp=1336383054;
UPDATE vicidial_log FORCE INDEX(lead_id) set status='ADC' where lead_id = '70*******' and uniqueid LIKE "13********%";
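To make the split between lock wait and actual execution easier to see, the header lines of entries like the ones above can be summarized with a short script. This is only a minimal sketch in Python and not part of Vicidial: the function name `summarize_slow_log` and the dictionary keys are made up for illustration, and it assumes nothing beyond the `# Query_time: ... Lock_time: ... Rows_examined: ...` header format shown in the log.

```python
import re

# Matches the per-query statistics line of the MySQL slow query log.
STATS_RE = re.compile(
    r"# Query_time: (?P<query>[\d.]+)\s+Lock_time: (?P<lock>[\d.]+)\s+"
    r"Rows_sent: (?P<sent>\d+)\s+Rows_examined: (?P<examined>\d+)"
)


def summarize_slow_log(text):
    """Return one dict per entry, splitting total time into lock wait and execution."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            match = STATS_RE.match(line)
            if match:
                query_time = float(match.group("query"))
                lock_time = float(match.group("lock"))
                current = {
                    "query_time": query_time,
                    "lock_time": lock_time,
                    "exec_time": query_time - lock_time,
                    "lock_share": lock_time / query_time if query_time else 0.0,
                    "rows_examined": int(match.group("examined")),
                    "statement": "",
                }
                entries.append(current)
            else:
                # Other comment lines (# Time, # User@Host) start a new entry.
                current = None
        elif current is not None and not line.startswith("SET timestamp"):
            # Everything else after the stats line is the statement text.
            current["statement"] = (current["statement"] + " " + line).strip()
    return entries


# First and last entries copied from the log above (values masked by the poster).
SAMPLE = """\
# Query_time: 8.505915 Lock_time: 3.828021 Rows_sent: 0 Rows_examined: 4375907
SET timestamp=1336226126;
UPDATE vicidial_log FORCE INDEX(lead_id) set status='ADC' where lead_id = '90******' and uniqueid LIKE "13********%";
# Query_time: 4.296744 Lock_time: 0.000043 Rows_sent: 0 Rows_examined: 4381492
SET timestamp=1336383054;
UPDATE vicidial_log FORCE INDEX(lead_id) set status='ADC' where lead_id = '70*******' and uniqueid LIKE "13********%";
"""

if __name__ == "__main__":
    for entry in summarize_slow_log(SAMPLE):
        print(
            "total=%.2fs lock=%.2fs (%.0f%%) rows_examined=%d"
            % (
                entry["query_time"],
                entry["lock_time"],
                entry["lock_share"] * 100,
                entry["rows_examined"],
            )
        )
```

For the first entry the lock wait is roughly 45% of the total (3.83 s of 8.51 s). The last entry has almost no lock wait but still examines about 4.4 million rows, so lock contention and the size of the scan look like two separate things to investigate.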
I don't see a direct disk bottleneck on the DB server, where disk util% is not even 10%. I've gone through the usual MySQL optimizations and indexes, and even moved records older than 2 months to backup tables to keep the scans lean.
What's the reason the vicidial_log updates use FORCE INDEX(lead_id)?
Should I consider table partitioning across multiple servers or MariaDB?
What do I need to do to max out the number of live agents on the system and aim for 500+?
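On the FORCE INDEX(lead_id) question, one way to see what the forced index is actually doing is to run the same WHERE clause through EXPLAIN. MySQL 5.1 only accepts EXPLAIN on SELECT statements (EXPLAIN for UPDATE arrived in 5.6.3), so the UPDATE has to be rewritten as a SELECT first. The helper below is a rough sketch of that rewrite: `explain_select_for` is a hypothetical name, it only handles simple single-table UPDATE ... WHERE statements like those in the slow query log, and the masked values (asterisks) would need to be replaced with real ones before running the output.

```python
import re

# Simple single-table UPDATE with an optional index hint and a WHERE clause.
UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\w+)\s*"
    r"(?P<hint>(?:FORCE|USE|IGNORE)\s+INDEX\s*\([^)]*\))?"
    r"\s+SET\s+.*?\s+WHERE\s+(?P<where>.*?);?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def explain_select_for(update_sql, keep_hint=True):
    """Build an EXPLAIN SELECT that uses the same table, hint and WHERE clause."""
    match = UPDATE_RE.match(update_sql)
    if not match:
        raise ValueError("not a simple single-table UPDATE ... WHERE statement")
    parts = ["EXPLAIN SELECT * FROM", match.group("table")]
    if keep_hint and match.group("hint"):
        parts.append(match.group("hint"))
    parts += ["WHERE", match.group("where")]
    return " ".join(parts) + ";"


# Statement copied from the slow query log; asterisks are the poster's masking.
SLOW_UPDATE = """UPDATE vicidial_log FORCE INDEX(lead_id) set status='ADC' where lead_id = '90******' and uniqueid LIKE "13********%";"""

if __name__ == "__main__":
    # Plan with the hint, as the update runs today.
    print(explain_select_for(SLOW_UPDATE))
    # Plan the optimizer would choose on its own.
    print(explain_select_for(SLOW_UPDATE, keep_hint=False))
```

Comparing the `key` and `rows` columns of the two plans shows whether the optimizer would pick a different index without the hint, and how many rows it expects to read in each case.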
I realise splitting up the agents across different systems is possible, but it adds reporting and management overhead. The company I work for is considering official paid support for this implementation and hand-over to guarantee at least 350 live agents if it is feasible.
I would appreciate any suggestions for now, thanks!
Vincent.