OT: Switching the MySQL replication Master

This is one of those blog posts where I (ab)use blogs.perl.org to document some notes, primarily for myself ...

  • Server A : is the old "Master"
  • Server B : is the new "Master" (a rebuild of A) and for a transition period is replicating from A until it is promoted to the main Master and A can be switched off.
  • Server C : currently replicates from A, however we want to instead move it to replicate from B in advance of replacing A with B.

The idea of having A replicating to B means that during the "switchover" from the (old) A to the (new) B, we minimise the downtime. We don't need to take A offline, dump the (large) DBs, copy them to B, reload them, then bring B back up as the new master.

Instead, we can stop everything on A, stop A replicating to B, and bring up B as the new master. Faster & less to go wrong & less downtime.

Our problem was that we had a lot of slaves chained off Server C, and relatively large databases that could take ~1 hour to dump/reload. Thus the more normal route of:

mysqldump --master-data master_db > some_dump.sql
rsync some_dump.sql slave:
ssh slave "mysql -u root < some_dump.sql"

would work, but we would need to duplicate this on each of the (many) servers down the replication chain. This seemed ugly.

Some background

MySQL binary log (if enabled) logs all the changes to the database, typically by logging the SQL queries that executed and changed data.

A MySQL Slave has two replication processes. The IO process which connects to the master and copies down data (aka SQL statements) from the masters binary log.

The slave stores these SQL statements in it's local Relay log. The second SQL replication process then applies these statements to the local/slave database.

Aside: recent versions of MySQL can now replicate the SQL statements (the older approach) or the row data.

If you want a single server (C) to act as both a Slave (from A), and then also act as a Master to other downstream slaves (A => C => other slaves), then in addition to

-- tell C to replicate "db_name" from it's master
replicate-do-db = db_name
-- tell C to binary log "db_name" so others can slave from C
binlog-do-db = db_name

and the various other options to tell C what/where it's Master is (which you can either put directly into my.cnf or instead use the CHANGE MASTER sql syntax

... you also need:

log-slave-updates

Without this, Server C won't binary log the statements it receives from an up-stream master. Namely, it will only log queries executed on C. Not queries executed on A and replicated to C. Without this binary logging, C cannot function as a Master to other slaves.

What to do

Stop C (slave) replicating from A (old master). B (new master) is still replicating from A, and thus B is guaranteed to be at the same point, or ahead, of C.

ssh C
mysql -u root
> STOP SLAVE;

Find the last statement that C has executed by looking in C's relay logs.

mysql -u root
> show slave status\G

and look for the lines:

Relay_Log_File: host-relay-bin.000002

that file will by default appear in your MySQL data dir (/var/lib/mysql on debian based OSs) or can be configured with the relay-log option.

cd /var/lib/mysql
mysqlbinlog host-relay-bin.000002

Find the equivalent statement in the new Master's (B) binary logs, the location of which is set by the log-bin option in your my.cnf

The most recent binary log is likely the one in use, but running "SHOW SLAVE STATUS\G" on the server will confirm this. Note that we're looking in B's binary logs, as these are what will be sent to our slave C.

mysqlbinlog mysql-bin.000022 | tail -100

should show something like:

#111115 13:02:48 server id 3102  end_log_pos 46316299   Query   thread_id=94257 exec_time=0     error_code=0
use db_name/*!*/;
SET TIMESTAMP=1321362168/*!*/;
UPDATE `some_table` SET `some_col` = 'some_value' WHERE `some_other_col` = 'some_other_value'
/*!*/;
# at 46316299
#111115 13:02:48 server id 3102  end_log_pos 46316326   Xid = 5346061
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

In this case, the position we're after is 46316326 (assuming that the last statement to execute on C was the "UPDATE some_table SET some_col = 'some_value'..." one).

Finally, re-configure the Master details for C

ssh C
mysql -u root
> CHANGE MASTER TO
      MASTER_HOST = 'ip.or.hostname.for.B.com',
      MASTER_LOG_FILE = 'mysql-bin.000057',      -- binary log on B
      MASTER_LOG_POS = 46316326;                    -- position in B's binary log
> START SLAVE;
> SHOW SLAVE STATUS\G

Conclusion

This might seem insanely complex, but when you have many active production servers all replicating in non trivial patterns and relatively long "chains" of Master -> Slave/Master -> Slave/Master -> Slave and you want to minimise the downtime ... it does at least appear to work.

It's not really all that complicated -- just a lot of concepts and details (plus a few weird MySQL-isms) to wrap your head around.

Leave a comment

About minty

user-pic I blog about Perl.