vicidial.org

by **sbenson** » Mon Sep 17, 2012 5:01 pm

I have set up a new server to be a DB slave to the primary DB server. Everything looks fine, then at random it reports being about 19k seconds behind the master. No errors show up when that happens, and I can't find any pattern in the timing. Both servers' clocks are in sync. Has anyone seen this before?

Vici-DB2:~ # ssh root@Vici-DB1 'echo DB1;date'; echo DB2; date
Password:
DB1
Mon Sep 17 14:58:58 MST 2012
DB2
Mon Sep 17 14:58:58 MST 2012

Vici-DB2:~ # date;for i in {1..20}; do echo "show slave status\G"|mysql|grep Seconds_Behind_Master|awk -F':' '{print $2" seconds behind master"}';sleep .5;done;date
Mon Sep 17 14:56:56 MST 2012
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
19935 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
0 seconds behind master
19941 seconds behind master
19941 seconds behind master
Mon Sep 17 14:57:07 MST 2012

Any suggestions? This server is actually the replacement for our current second DB server, which has the same problem.

Thanks in advance,
Scott

by **williamconley** » Mon Sep 17, 2012 10:32 pm

I'm not sure if it is even abnormal. Have you looked at some mysql forums?

Seconds_Behind_Master

This field is an indication of how “late” the slave is:

When the slave SQL thread is actively processing updates, this field is the number of seconds that have elapsed since the timestamp of the most recent event on the master executed by that thread.

When the SQL thread has caught up to the slave I/O thread and is idle waiting for more events from the I/O thread, this field is zero.

In essence, this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread.

If the network connection between master and slave is fast, the slave I/O thread is very close to the master, so this field is a good approximation of how late the slave SQL thread is compared to the master. If the network is slow, this is not a good approximation; the slave SQL thread may quite often be caught up with the slow-reading slave I/O thread, so Seconds_Behind_Master often shows a value of 0, even if the I/O thread is late compared to the master. In other words, this column is useful only for fast networks.

This time difference computation works even if the master and slave do not have identical clock times, provided that the difference, computed when the slave I/O thread starts, remains constant from then on. Any changes—including NTP updates—can lead to clock skews that can make calculation of Seconds_Behind_Master less reliable.

This field is NULL (undefined or unknown) if the slave SQL thread is not running, or if the slave I/O thread is not running or not connected to master. For example, if the slave I/O thread is running but is not connected to the master and is sleeping for the number of seconds given by the CHANGE MASTER TO statement or --master-connect-retry option (default 60) before reconnecting, the value is NULL. This is because the slave cannot know what the master is doing, and so cannot say reliably how late it is.

The value of Seconds_Behind_Master is based on the timestamps stored in events, which are preserved through replication. This means that if a master M1 is itself a slave of M0, any event from M1's binary log that originates from M0's binary log has M0's timestamp for that event. This enables MySQL to replicate TIMESTAMP successfully. However, the problem for Seconds_Behind_Master is that if M1 also receives direct updates from clients, the Seconds_Behind_Master value randomly fluctuates because sometimes the last event from M1 originates from M0 and sometimes is the result of a direct update on M1.
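As a rough illustration of the three cases described in that documentation (NULL, zero, and a positive value), here is a minimal shell sketch. The function name `classify_sbm` is made up for this example, and it assumes the input is the text of `SHOW SLAVE STATUS\G` in the usual `Field: value` layout.

```shell
#!/usr/bin/env bash
# classify_sbm: hypothetical helper, not part of MySQL.
# Reads `SHOW SLAVE STATUS\G` text on stdin and interprets the
# Seconds_Behind_Master line according to the rules quoted above.
classify_sbm() {
  awk -F': *' '
    $1 ~ /Seconds_Behind_Master$/ {
      found = 1
      v = $2
      if (v == "NULL")              print "unknown"       # SQL or I/O thread not running/connected
      else if (v !~ /^[0-9]+$/)     print "unparsed: " v  # unexpected value, reported as-is
      else if (v + 0 == 0)          print "caught up"     # SQL thread idle, waiting on I/O thread
      else                          print v " seconds behind"
    }
    END { if (!found) print "field not found" }
  '
}
```

It could be fed from the client with something like `mysql -e 'SHOW SLAVE STATUS\G' | classify_sbm`, assuming the `mysql` client can log in without prompting.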

by **Kumba** » Mon Sep 17, 2012 11:59 pm

I've seen this happen when the slave is underpowered and something like the archive script or DB optimize script runs on the master, but usually not within a 20-second span. Considering the 19K seconds is gone one second later, I would say this is just a time slip from NTP, a fast clock, or the like. In other words, say DB1 was half a second ahead of DB2. You could easily get some funny readings if DB2 processed a packet with a timestamp that was in the future, even if it was only milliseconds. The same could be said if NTP adjusts for clock drift and you end up getting strange timestamps.

The bigger question is: are there any real side effects? If the slave is replicating and not really getting out of sync, then all you are chasing is a harmless annoyance.

by **Vince-0** » Tue Sep 18, 2012 2:18 am

I have seen instances where enabling the slave compressed protocol helps with this apparent time problem, provided your slave has spare resources to deal with the decompression:

http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#sysvar_slave_compressed_protocol

I also used binlog_format=statement. It generates a lot of warnings in the logs (these can be switched off), but it keeps the slave in sync where I had problems in the past with row-based binlog formats.

by **sbenson** » Tue Sep 18, 2012 10:42 am

First off, thank you for all of your help. I've set up replication before and never had this issue; it only seems to happen with this instance of MySQL on these ViciBox servers. I'm not sure whether the master box is causing the issue, but both boxes are definitely powerful enough to handle the replication. There are no MySQL errors causing the syncing to stop. It just does this at random. I don't think the data is actually affected; I'm just wondering if anyone has seen this before. We have monitoring tools watching these servers and I don't want to get false notifications because of this. Below is the information about the servers. Thanks in advance.
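One way to keep a monitoring check from firing on a single stray reading is to alert only when several consecutive samples exceed a threshold. The sketch below is just that idea in shell; the function name `sustained_lag`, the threshold, and the sample count are illustrative assumptions, not settings from any particular monitoring tool. NULL samples are ignored here and would need their own check.

```shell
#!/usr/bin/env bash
# sustained_lag THRESHOLD COUNT   (hypothetical helper for illustration)
# Reads one Seconds_Behind_Master sample per line on stdin.
# Prints "ALERT at sample N" as soon as COUNT consecutive numeric samples
# exceed THRESHOLD; prints "OK" if that never happens.
sustained_lag() {
  awk -v limit="$1" -v need="$2" '
    $1 ~ /^[0-9]+$/ {
      if ($1 + 0 > limit + 0) run++   # extend the current run of high samples
      else run = 0                    # a normal sample resets the run
      if (run >= need + 0) { print "ALERT at sample " NR; alerted = 1; exit }
    }
    END { if (!alerted) print "OK" }
  '
}
```

With the pattern from the first post (isolated spikes surrounded by zeros), requiring three consecutive high samples would stay quiet, while a slave that is genuinely lagging would still trigger the alert.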

Scott

vicibox-db1:~ # free -m
             total       used       free     shared    buffers     cached
Mem:         16049      14137       1911          0        114      12713
-/+ buffers/cache:       1309      14740
Swap:         4108         36       4072

vicibox-db1:~ # uptime
08:19am up 140 days 2:19, 3 users, load average: 0.00, 0.03, 0.00

vicibox-db1:~ # cat /proc/cpuinfo |egrep 'processor|model name'
processor : 0
model name : AMD Opteron(tm) Processor 4234
processor : 1
model name : AMD Opteron(tm) Processor 4234
processor : 2
model name : AMD Opteron(tm) Processor 4234
processor : 3
model name : AMD Opteron(tm) Processor 4234
processor : 4
model name : AMD Opteron(tm) Processor 4234
processor : 5
model name : AMD Opteron(tm) Processor 4234
processor : 6
model name : AMD Opteron(tm) Processor 4234
processor : 7
model name : AMD Opteron(tm) Processor 4234
processor : 8
model name : AMD Opteron(tm) Processor 4234
processor : 9
model name : AMD Opteron(tm) Processor 4234
processor : 10
model name : AMD Opteron(tm) Processor 4234
processor : 11
model name : AMD Opteron(tm) Processor 4234

Vici-DB2:~ # free -m
             total       used       free     shared    buffers     cached
Mem:          7989       7333        655          0        150       6038
-/+ buffers/cache:       1145       6844
Swap:         4101          0       4101

Vici-DB2:~ # uptime
08:18am up 6 days 13:45, 2 users, load average: 0.20, 0.16, 0.15

Vici-DB2:~ # cat /proc/cpuinfo |egrep 'processor|model name'
processor : 0
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
processor : 1
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
processor : 2
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
processor : 3
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
processor : 4
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
processor : 5
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
processor : 6
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
processor : 7
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

by **williamconley** » Fri Mar 15, 2013 3:12 pm

I would attempt to correlate the events with something else in the logs, particularly NTP. If NTP makes a slight correction during this period ...

How are you syncing the clocks exactly?
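To make that kind of correlation easier, each sample could be logged with a wall-clock timestamp and the thread states, so spikes can be lined up against NTP or other log entries afterwards. This is only a sketch: `status_fields` and `sample_slave_status` are made-up names, and it assumes the `mysql` client can log in without prompting, as in the one-liner from the first post.

```shell
#!/usr/bin/env bash
# status_fields: hypothetical helper. Reads `SHOW SLAVE STATUS\G` text on
# stdin and prints the two thread states and the lag on a single line.
status_fields() {
  awk -F': *' '
    $1 ~ /Slave_IO_Running$/      { io  = $2 }
    $1 ~ /Slave_SQL_Running$/     { sql = $2 }
    $1 ~ /Seconds_Behind_Master$/ { sbm = $2 }
    END { printf "io=%s sql=%s behind=%s\n", io, sql, sbm }
  '
}

# sample_slave_status: hypothetical loop that prints one timestamped line
# per second until interrupted. Redirect its output to a file to keep a log.
sample_slave_status() {
  while true; do
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
      "$(mysql -e 'SHOW SLAVE STATUS\G' | status_fields)"
    sleep 1
  done
}
```

Running `sample_slave_status >> /tmp/slave_lag.log` (the path is arbitrary) on the slave for a while should give lines whose timestamps can be compared directly with the system logs.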

Problems with replication