Is MySQL Master Slave Replication Working
Often, the bottleneck and single point of failure to many big systems is Database.
There are slaves configured for Master to share atleast the READ load.
MySQL Master-Slave Replication is so robust at core, there are very less mentions of its not working properly.
But when replication fails, and if left unnoticed, can cause serious problems like wrong reporting, wrong results / interpetations in applications connected to slaves.
E.g. E-Commerce selling T-shirts. All orders recorded in Master (obviously :) ). An App connected to Slave reading delivered orders and paying the merchant accordingly.
If replication fails, associated merchants will be unhappy to not get any payments as none reported as released order since last payment.
Orders are sold but not replicated on slave to reflect :(
Morale: Replication Errors should not go unnoticed and reported almost realtime so they can be looked at the earliest.
Here are few links which can get you great idea on how to set-up basic replication monitoring:
- How to Monitor MySQL Replication?
- Replication monitoring with monit
- More bits on monit
- MySQL Master-Slave Replication – Bash Script Monitor
Exceptional Replication Error due to mismatch in records created on master / slave when a bulk insert query replicated on slave form master
As mentioned here
INSERT INTO packages(`package_number`, `status`, `shipping_status`) ( SELECT orders.order_number, IF(SUM(line_items.status) = COUNT(*) * 2, "cancelled", "released"), orders.shipping_status FROM line_items INNER JOIN orders ON orders.id = line_items.order_id WHERE line_items.shippable = 1 GROUP by line_items.order_id )
Mass Insert Queries like above not dictating
ORDER BY SQL clause can
result into rows in
packages table with same package number having different
primary id column and disrupting the Replication Flow.
Little fact about mysql replication is that it works by replaying commands over SLAVE as executed on MASTER using bin log being sent from MASTER.
The bug was unusual and very-very hard to catch as no errors on mysql-replication reported.
It got surfaced by comparing same reports on MASTER and SLAVE and noticing different results and that too not because of replication lag as stayed same throughout the day :(
- Monitor your MySQL Replication as it can disrupt the system in unusual ways
- Data Migrations might result in different data-sets on Master / Slave,
Keep a Watch !!
- If possible, add a
ORDER BYclause to force ordering and increasing chances of same data-set
- After migration, compare the data-sets for exact replicas and set-up replication again if inconsistent
- If possible, add a