vicidial.org

by **perlmutr** » Wed Jan 06, 2021 1:06 pm

Any suggestions as to why the load on dialer 3 is as high as it is? I am running a press-1 campaign set to 100 channels per user with 3 users logged in. I thought the balance rank setting would control this. Its help text reads: "This field allows you to set the order in which this server is to be used for balance dialing, if balance dialing is enabled. The server with the highest rank will be used first in placing Balance fill calls."

Thank you.

server definition
SERVER      DESCRIPTION               ACT        LOAD        CHAN  AGNT  DISK  Max Trunks  Calls/sec  Balance Rank
LeadgenD2   Server LeadgenD2          Y / Y / N  174 - 43%   105   0     3%    150         20         5
LeadgenDB0  ViciDial Database server  Y / Y / Y  457 - 26%   156   3     22%   150         20         4
LeadgenDi3  Server LeadgenDi3         Y / Y / N  3156 - 78%  227   0     16%   150         20         2
LeadgenDl1  Server LeadgenDl1         Y / Y / N  1110 - 88%  122   0     1%    150         20         5

SERVER CPU INFORMATION

LeadgenDb0:~ # lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Model name: Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
Stepping: 5
CPU MHz: 2660.004
BogoMIPS: 5320.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15
========================================================

LeadgenDl1:~ # lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Model name: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
Stepping: 10
CPU MHz: 2926.000
CPU max MHz: 2926.0000
CPU min MHz: 1596.0000
BogoMIPS: 5866.42
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 3072K
NUMA node0 CPU(s): 0,1
========================================================

LeadgenD2:~ # lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz
Stepping: 7
CPU MHz: 3204.906
CPU max MHz: 3300.0000
CPU min MHz: 1600.0000
BogoMIPS: 5986.81
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
========================================================

LeadgenDl3:~ # lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 15
Model: 4
Model name: Intel(R) Xeon(TM) CPU 2.80GHz
Stepping: 8
CPU MHz: 2800.000
CPU max MHz: 2800.0000
CPU min MHz: 2400.0000
BogoMIPS: 5586.26
L1d cache: 16K
L2 cache: 2048K
NUMA node0 CPU(s): 0-7
========================================================

Vicidial Version 2.14-730A SVN 3180 DB schema version 1582 asterisk version 13.21.1-vici

I have a 4-machine cluster:
Database/Web/Dialer machine: dual quad-core Xeon 2.67 GHz, 48GB mem / Dialer: dual dual-core Xeon 2.80 GHz, 8GB mem / Dialer: Core i5 3.0 GHz, 4GB mem / Dialer: Core 2 Duo 2.93 GHz, 3GB mem / soon, Archive: i3 3.1 GHz, 10GB mem
All machines except the archive have 2 NICs: one internal and one external for traffic.

by **carpenox** » Wed Jan 06, 2021 1:28 pm

What else is dialer3 used for, if anything? Check what is using the RAM during the high load with the following command (it lists PID, %MEM and command, sorted by memory use):

ps aux | awk '{print $2, $4, $11}' | sort -k2rn | head -n 20

by **williamconley** » Wed Jan 06, 2021 5:56 pm

Check the load average via "uptime" or "htop" to verify that you have the real load averages.

Are you experiencing issues, or is this an ethereal question?

Does this server handle inbound calls?

Balance dialing timing is odd. Balance dialing itself can cause extra load. Try checking cpu hogs to see which processes are using the most load.

ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -11

I generally recommend putting one agent on each server and allowing each server to dial 100 channels and turning off balance dialing. Note that balance dialing affects only the initiation of outbound calls. When an answer occurs, the call will be routed to an available agent immediately, even if on another server, even with balance dialing off.

by **perlmutr** » Thu Jan 07, 2021 12:15 pm

thank you all for your suggestions.

dialer 3 does nothing different from dialer 1 and dialer 2.
I can't really tell if I am experiencing issues from this.
This is a new client and I am testing what the limits of my servers are. To satisfy requirements I may have to add additional servers to the cluster. I am attempting to determine at what point the call quality will deteriorate, at which time I was thinking of reducing the dialing channels on my database server and adding additional dialers.

Would turning on system performance stats on the servers cause excess overhead? Where is the data logged? I don't see any info on this in web searches.

as soon as our client is available for testing I will be increasing my channels and will run htop and uptime to learn more.

here is what i see when system is idle:

Idle Status

SERVER      DESCRIPTION               ACT        LOAD     CHAN  AGNT  DISK
LeadgenD2   Server LeadgenD2          Y / Y / N  6 - 1%   0     0     3%
LeadgenDB0  ViciDial Database server  Y / Y / Y  14 - 1%  0     0     22%
LeadgenDi3  Server LeadgenDi3         Y / Y / N  26 - 2%  0     0     17%
LeadgenDl1  Server LeadgenDl1         Y / Y / N  19 - 3%  0     0     1%

I appreciate all of the input, as I have learned Linux while building these servers and installing ViciDial and have no formal training.

by **williamconley** » Thu Jan 21, 2021 12:20 pm

To test the capacity of dialers and web servers in a cluster: load as many agents onto one of each until that server begins to act erratically. Then back off by 20% and see if that fixes the problem. Repeat the test a few times to be sure where your threshold is. But do NOT make the mistake of assuming this threshold is written in blood. Too many variables. But you may get a good idea of the basic limitation for that single server, and then apply this same limit to all identical servers to get your system capacity.
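As a rough sketch of the "back off by 20%" arithmetic: the function name, the 25-agent figure and the three identical dialers below are made up for illustration and are not measurements from this cluster.

```shell
# backoff_limit ERRATIC_COUNT [PERCENT]
# Prints the agent count after backing off by PERCENT (default 20),
# rounded down to a whole agent.
backoff_limit() {
    # $1 = agent count at which the server turned erratic
    # $2 = back-off percentage (defaults to 20)
    echo $(( $1 * (100 - ${2:-20}) / 100 ))
}

# Hypothetical example: one dialer became erratic at 25 agents.
per_server=$(backoff_limit 25)          # 25 * 80 / 100 = 20
echo "per-server limit: $per_server agents"
echo "three identical dialers: $(( per_server * 3 )) agents"
```

Treat the result as a starting point for the next test run, not as a fixed capacity figure.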

If you can, capture the logs during the "erratic behavior" moment so you can investigate where your limitation came from. Chase it down and see if this is a hard barrier or something that can be raised (such as additional open files or network ports or meetme rooms). For instance, if the limitation was dropped packets, you can improve your network to avoid dropping packets and then test again.
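One possible way to have something to look at after an "erratic behavior" moment is to record load and the top CPU users at a fixed interval during the test. This is only a sketch: the function name and the log path are hypothetical, it assumes a Linux box with the procps version of ps, and it does not touch the Asterisk or ViciDial logs themselves.

```shell
# capture_snapshot OUTFILE
# Appends a timestamp, the load averages and the ten busiest
# processes (by CPU) to OUTFILE.
capture_snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        cat /proc/loadavg                                   # 1/5/15 minute load averages
        ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -n 11
    } >> "$1"
}

# Usage on each dialer during a test run (stop with Ctrl-C);
# /tmp/vici_load.log is just an example path:
#   while true; do capture_snapshot /tmp/vici_load.log; sleep 10; done
```

Afterwards the timestamps in the file can be lined up against the moment the agents reported trouble.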

Avoid running more than one role on a single server as you get more servers in the cluster.

Single server: DB/Dialer/Web/Archive
Two server: DB/Web/Archive | Dialer
Three server: DB/Web/Archive | Dialer | Dialer
Four server: DB/Archive | Dialer | Dialer | Web
Five server: DB/Archive | Dialer | Dialer | Dialer | Web
Six server: DB/Archive | Dialer | Dialer | Dialer | Web | Web

The Archive server can actually be anywhere in the system. In fact, it can be a completely unrelated server in a different location running just Web/FTP without any loss of functionality.

If your DB must share a role, consider Web rather than Dialer on the DB server. It is not necessary to disable a role on a server that's not using that role, but configuration options may be changed to avoid excessive waste for an unused role and it does free up some resources to disable a role entirely. Not running asterisk on the DB server doesn't free up huge resources, but it does free up SOME resources. If your DB server is your chokepoint (common), then freeing up resources on it is necessary.

Keep an eye on the Average Server Load (using htop or uptime) for all servers during production. If the Average Server Load never exceeds half of the CPU core count, that server is not overloaded as yet. For example: staying under 4.0 on an 8 core system is smooth sailing. Once you exceed half of the core count (sustained for more than a few seconds), that server is nearing full load. It will progress from half to full MUCH faster than it did from idle (0.1?) to half. And once it approaches full load (8.0 on an 8 core system) it will overload quickly. Not to say that an 8 core system will crash and burn if it's running at 16.0, because I've seen that many times. But any hiccup after full load can cause a crash. For instance, running a report or increasing a dial ratio. Anything can trigger failure at that point.
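The half-of-core-count rule of thumb can be sketched as a small check. The function names and the three labels are invented for this example, and it assumes Linux (/proc/loadavg and nproc). Note that nproc counts logical CPUs, so on a hyperthreaded box it reports more than the physical core count.

```shell
# load_status LOAD CORES
# Classifies a load average against the core count:
#   below half the cores      -> ok
#   half up to the core count -> nearing-full
#   at or above the cores     -> at-or-over-full
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load < cores / 2)  print "ok"
        else if (load < cores) print "nearing-full"
        else                   print "at-or-over-full"
    }'
}

# check_this_host: apply the rule to the 1-minute load of the local machine.
check_this_host() {
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    cores=$(nproc)
    echo "load $load1 on $cores CPUs: $(load_status "$load1" "$cores")"
}
```

Running check_this_host on each server prints one line per call. A single reading means little, since the rule is about load that is sustained for more than a few seconds.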

Happy Hunting! 8-)

by **perlmutr** » Mon Jan 25, 2021 11:46 am

thank you william conley

cluster server performance question
