Page 1 of 1

AGC Interface - system delay

PostPosted: Mon Oct 31, 2011 11:02 am
by chrism
We seem to be having issues where there is a 'system delay' after trying to dial the next number for a manual dial campaign.

Vicidial:
Vicibox Redux .iso with SVN upgrade | VERSION: 2.4-334a BUILD: 110831-2038 | Asterisk 1.4.39.2-vici | Single Server with separate web interface for AGC | No Digium/Sangoma Hardware | No Extra Software After Installation

top - 12:01:00 up 12:30, 3 users, load average: 0.50, 0.55, 0.37
Tasks: 147 total, 1 running, 146 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.7%us, 1.2%sy, 0.0%ni, 94.6%id, 0.0%wa, 0.1%hi, 0.4%si, 0.0%st
Mem: 2037796k total, 1966196k used, 71600k free, 9424k buffers
Swap: 4200444k total, 15332k used, 4185112k free, 1151484k cached


Kernel: Linux 2.6.34.10-0.2-pae #1 SMP 2011-07-20 18:48:56 +0200 i686 i686 i386 GNU/Linux
Quad Core Processor

AGC:
On separate web server running Apache connected through a private gigabit network to the database
AGC Version: VERSION: 2.4-330c BUILD: 110802-0122

Detailed Description:
On the AGC interface, we are seeing the error mesage:

System Delay, Please try again
code:740739


Inside of asterisk, the following error message is appearing for the same channel:

WARNING[30606] channel.c: Exceptionally long voice queue length queuing to Local/8600056@default-13de,1


This campaign is set to manual dial and the error message appears if clicking the 'DIAL LEAD' button too quickly (i.e. within 1 to 5 seconds after hanging up). If they wait about 5 seconds and then dial the next lead, everything is fine.

Could this be an issue with the database? I noticed that the my.cnf is tuned differently than I've done in the past.

Thanks again for all the help.

Chris

PostPosted: Mon Oct 31, 2011 11:28 am
by williamconley
test: put an agent on your database server (for agent web) to exclude clustering as your "issue".

please include your vicibox version (the fact that you upgraded is handy, but the original installer version can be useful information)

does this system work in autodial mode?

AGC:
On separate web server running Apache connected through a private gigabit network to the database
is this system also a full vicibox installation with the same svn code installed? how did you cluster it? is the "private gigabit network" in the same physical location? (assuming it is ... but have to ask)

exactly where does this system delay message appear? what type of phones are you using?

please show the entire WARNING cli in case there is something else useful, in fact the date/time in association to the system delay error message would be useful.

PostPosted: Mon Oct 31, 2011 12:23 pm
by chrism
Running it locally on the same server seems to be giving the same results. I changed the phone number below to a (212) number.

For the version of Vicibox, /etc/vicibox-version says:

ViciBox Redux v.3.1.0 released


The section that ties to the "system delay" notice on the AGC side under /var/log/asterisk/messages is:

[Oct 31 11:39:42] VERBOSE[30607] logger.c: [Oct 31 11:39:42] -- Executing [h@default:1] DeadAGI("Local/8600056@default-13de,1", "agi://127.0.0.1:4577/call_log--HVcauses--PRI-----NODEBUG-----16-----ANSWER-----60-----40") in new stack
[Oct 31 11:39:42] VERBOSE[30784] logger.c: [Oct 31 11:39:42] == Manager 'sendcron' logged off from 209.160.73.112
[Oct 31 11:39:44] VERBOSE[30789] logger.c: [Oct 31 11:39:44] == Parsing '/etc/asterisk/manager.conf': [Oct 31 11:39:44] VERBOSE[30789] logger.c: [Oct 31 11:39:44] Found
[Oct 31 11:39:44] VERBOSE[30789] logger.c: [Oct 31 11:39:44] == Manager 'sendcron' logged on from 209.160.73.112
[Oct 31 11:39:44] VERBOSE[30789] logger.c: [Oct 31 11:39:44] == Manager 'sendcron' logged off from 209.160.73.112
[Oct 31 11:39:44] WARNING[30606] channel.c: Exceptionally long voice queue length queuing to Local/8600056@default-13de,1
[Oct 31 11:39:45] WARNING[30606] channel.c: Exceptionally long voice queue length queuing to Local/8600056@default-13de,1
[Oct 31 11:39:46] WARNING[30606] channel.c: Exceptionally long voice queue length queuing to Local/8600056@default-13de,1
[Oct 31 11:39:47] VERBOSE[30607] logger.c: [Oct 31 11:39:47] -- AGI Script agi://127.0.0.1:4577/call_log--HVcauses ... -60-----40 completed, returning 0
[Oct 31 11:39:47] VERBOSE[30607] logger.c: [Oct 31 11:39:47] == Spawn extension (default, 8412012125551212, 3) exited no
n-zero on 'Local/8600056@default-13de,1'
[Oct 31 11:39:47] VERBOSE[30606] logger.c: [Oct 31 11:39:47] == Spawn extension (default, 8600056, 1) exited non-zero on 'Local/8600056@default-13de,2'
[Oct 31 11:39:47] VERBOSE[30606] logger.c: [Oct 31 11:39:47] -- Executing [h@default:1] DeadAGI("Local/8600056@default-13de,2", "agi://127.0.0.1:4577/call_log--HVcauses--PRI-----NODEBUG-----0---------------") in new stack
[Oct 31 11:39:47] VERBOSE[30606] logger.c: [Oct 31 11:39:47] -- AGI Script agi://127.0.0.1:4577/call_log--HVcauses ... ---------- completed, returning 0
[Oct 31 11:39:47] VERBOSE[30803] logger.c: [Oct 31 11:39:47] == Parsing '/etc/asterisk/manager.conf': [Oct 31 11:39:47] VERBOSE[30803] logger.c: [Oct 31 11:39:47] Found
[Oct 31 11:39:47] VERBOSE[30803] logger.c: [Oct 31 11:39:47] == Manager 'sendcron' logged on from 209.160.73.112
[Oct 31 11:39:47] VERBOSE[30803] logger.c: [Oct 31 11:39:47] == Manager 'sendcron' logged off from 209.160.73.112
[Oct 31 11:39:47] VERBOSE[30804] logger.c: [Oct 31 11:39:47] == Parsing '/etc/asterisk/manager.conf': [Oct 31 11:39:47] VERBOSE[30804] logger.c: [Oct 31 11:39:47] Found
[Oct 31 11:39:47] VERBOSE[30804] logger.c: [Oct 31 11:39:47] == Manager 'sendcron' logged on from 209.160.73.112
[Oct 31 11:39:47] VERBOSE[30804] logger.c: [Oct 31 11:39:47] == Manager 'sendcron' logged off from 209.160.73.112
[Oct 31 11:39:47] VERBOSE[30805] logger.c: [Oct 31 11:39:47] == Parsing '/etc/asterisk/manager.conf': [Oct 31 11:39:47] VERBOSE[30805] logger.c: [Oct 31 11:39:47] Found
[Oct 31 11:39:47] VERBOSE[30805] logger.c: [Oct 31 11:39:47] == Manager 'sendcron' logged on from 209.160.73.112
[Oct 31 11:39:47] VERBOSE[30806] logger.c: [Oct 31 11:39:47] -- Executing [58600056@default:1] MeetMe("Local/58600056@default-e0c9,2", "8600056|Fmq") in new stack
[Oct 31 11:39:47] VERBOSE[30805] logger.c: [Oct 31 11:39:47] > Channel Local/58600056@default-e0c9,1 was answered.
[Oct 31 11:39:47] VERBOSE[30807] logger.c: [Oct 31 11:39:47] -- Executing [8309@default:1] Answer("Local/58600056@default-e0c9,1", "") in new stack
[Oct 31 11:39:47] VERBOSE[30807] logger.c: [Oct 31 11:39:47] -- Executing [8309@default:2] Monitor("Local/58600056@default-e0c9,1", "wav|204901_20111031-113943_8602773750") in new stack
[Oct 31 11:39:47] VERBOSE[30805] logger.c: [Oct 31 11:39:47] == Manager 'sendcron' logged off from 209.160.73.112
[Oct 31 11:39:47] VERBOSE[30807] logger.c: [Oct 31 11:39:47] -- Executing [8309@default:3] Wait("Local/58600056@default-e0c9,1", "1800") in new stack


is this system also a full vicibox installation with the same svn code installed? how did you cluster it? is the "private gigabit network" in the same physical location? (assuming it is ... but have to ask)


No, this webserver is not actually a typical clustered web server. All that we do is set up a web server (multi-homed - one public IP and one private IP) and it is communicating with the database on the Vicibox server. The only thing that we copy over are the /agc/ directory and the /etc/astguiclient.conf file so that the web server can access the appropriate database IP. Plus, before we upgraded, we had this same webserver running and it seems that the error occurs even if the web interface is accessed from the same system as the database. It is a 'private gigabit network' in the same location.

exactly where does this system delay message appear? what type of phones are you using?


The system delay message is appearing in the AGC interface. We are using Polycom 300 and Polycom 301 phones.

Thank you in advance.

PostPosted: Mon Oct 31, 2011 3:07 pm
by williamconley
Running it locally on the same server seems to be giving the same results.
I'm not sure this is convincing me you abandoned that cool "mini server" you created and just went with the "local" server for the experiment. Please put something in here that makes it plain that you tried a test where the "other" server was completely unused, your agent web, agent phone registration, database, and carriers are all on one server. Just to be clear.

also verify that this system is physical, not virtual. just to be funny.

i've never seen
Exceptionally long voice queue length queuing
before, so I'm not sure which form of "debug" would contain deeper data on it.

also, may be useful to copy/paste the agc message verbatim as well. just in case a google search triggers something.

Issue Resolved - my.cnf changes

PostPosted: Tue Nov 01, 2011 8:36 am
by chrism
The AGC message verbatim:
Agent Alert! System Delay, Please try again
code:741221 - 1


After doing some research, I found that I had a similar problem in the past with the database just not being able to respond fast enough. I found that changing my.cnf helped a lot. I applied the following changes to our ViciBox install and it seems that the error has gone away. My assumption is that asterisk wasn't able to respond fast enough because the database was being overworked and not caching enough.

Code: Select all
vicibox:/etc # diff /etc/my.cnf /etc/my.20111101.cnf
32,35c32,33
< key_buffer_size = 200M
< join_buffer=1M
< max_allowed_packet = 16M
< table_cache=1536
---
> key_buffer_size = 640M
> max_allowed_packet = 2M
37,38c35,36
< sort_buffer_size = 2M
< read_buffer_size = 2M
---
> sort_buffer_size = 4M
> read_buffer_size = 4M
40,41c38
< max_connect_errors=10
< read_rnd_buffer_size = 2M
---
> read_rnd_buffer_size = 16M
43,46c40,41
< thread_cache_size = 128
< query_cache_size = 64M
< query_cache_limit = 1M
< query_cache_type = 1
---
> thread_cache_size = 16
> query_cache_size = 256M
52c47
< connect_timeout=100
---
> connect_timeout=60
55,56c50
< max_connections=900
< interactive_timeout = 100
---
> max_connections=640
173a168
> key_buffer_size = 20M
179,180c174
< interactive_timeout=100
< wait_timeout=100
---
> interactive-timeout


/etc/my.20111101.cnf = Original my.cnf from Vicibox
/etc/my.cnf = Modified my.cnf

If you also notice, the default Vicibox has key_buffer_size = 20M and key_buffer_size = 640M defined in two different places inside of the my.cnf. I'd imagine this might have caused an issue too. I brought mine down to 200M from 640M and removed the duplicate 20M reference to key_buffer_size. Where is a good place to report this as an issue?

I'm not sure which one of the changes helped, but this configuration seems to work.

Here is the final version of my.cnf:

Code: Select all
cat my.cnf
# Example MySQL config file for medium systems.
#
# This is for a system with little memory (32M - 64M) where MySQL plays
# an important part, or systems up to 128M where MySQL is used together with
# other programs (such as a web server)
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /var/lib/mysql) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
port            = 3306
socket          = /var/run/mysql/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port            = 3306
socket          = /var/run/mysql/mysql.sock
# Change following line if you want to store your database elsewhere
datadir = /var/lib/mysql
skip-locking
key_buffer_size = 200M
join_buffer=1M
max_allowed_packet = 16M
table_cache=1536
table_open_cache = 8192
sort_buffer_size = 2M
read_buffer_size = 2M
net_buffer_length = 8K
max_connect_errors=10
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 128M
thread_cache_size = 128
query_cache_size = 64M
query_cache_limit = 1M
query_cache_type = 1
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 16
#log=/var/log/mysqldquery.log
# ViciDial DB-Only Settings
skip-name-resolve
connect_timeout=100
long_query_time = 3
log_slow_queries
max_connections=900
interactive_timeout = 100
open_files_limit=24576
max_heap_table_size = 64M
concurrent_insert=2
expire_logs_days=3
skip-innodb


# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# binary logging format - mixed recommended
binlog_format=mixed

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id       = 1

# The safe_mysqld script
[safe_mysqld]
log-error       = /var/log/mysql/mysqld.log
socket          = /var/run/mysql/mysql.sock

[mysqldump]
socket          = /var/run/mysql/mysql.sock
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[myisamchk]
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive_timeout=100
wait_timeout=100

[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin
log        = /var/log/mysqld_multi.log


In addition, I noticed a lot of my tables are set up as:

Code: Select all
ENGINE=MyISAM DEFAULT CHARSET=latin1;


However, the default database had many of the tables created as:

Code: Select all
ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


Could this be an issue with the CHARSET and/or COLLATE option and that's why I had to configure those settings inside of my.cnf?