How we automated the handling of MySQL primary failovers in DBaaS

In part I of this blog series we gave an overview of our DBaaS design. In this blog (part II) we discuss the automated MySQL primary failover design in some detail.

Automated MySQL primary failover

We use HashiCorp Consul and a custom consul watcher service, along with the orchestrator service, to inject proxySQL route changes in the event of a MySQL primary failure in a shard.

Consul cluster

Consul nodes, typically three in number, are configured to run in three different AZs. These nodes form a cluster, and the orchestrator and proxySQL nodes are configured to join that cluster. Orchestrator nodes are installed with the consul agent binary so that they can easily make API calls to the cluster. We use consul as a key/value store and leverage its key watcher functionality.
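As a quick illustration of the kind of API call this enables, here is a minimal sketch of reading one of the keys through the local agent's HTTP API. It assumes the agent listens on the default 127.0.0.1:8500 and uses the requests library; the tenant and shard in the key path are placeholders, and the payload is the one written by the orchestrator hooks described below.

import base64
import json

import requests

# Hypothetical key for tenant-A, shard 1; the layout mirrors what the
# orchestrator hooks write into consul.
key = "mysql-clusters/tenant-A/1/master"
resp = requests.get(f"http://127.0.0.1:8500/v1/kv/{key}")
resp.raise_for_status()

# The KV API returns a list of entries; values are base64-encoded.
entry = resp.json()[0]
print(json.loads(base64.b64decode(entry["Value"])))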

Orchestrator hooks

In part I of this blog series we discussed a typical MySQL shard architecture. In our setup, a typical shard contains a primary, a candidate primary and a replica. During a primary failure event, the orchestrator promotes the candidate primary to primary and re-points the replica to the new primary. The orchestrator topology recovery process goes through several stages and lets administrators hook into those stages via custom scripts. For more information, see the orchestrator documentation on hooks.

We register one script each for the PreFailoverProcesses, PostFailoverProcesses and PostUnsuccessfulFailoverProcesses hooks. When invoked, the scripts write the appropriate entries into consul keys. Note that the contents of the scripts shown here are just illustrative.
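For reference, the hooks are registered in orchestrator's JSON configuration roughly as follows; the script paths below are illustrative placeholders, not our actual layout.

{
  "PreFailoverProcesses": [
    "/path/to/orchestrator-prefailover.sh"
  ],
  "PostFailoverProcesses": [
    "/path/to/orchestrator-postfailover.sh"
  ],
  "PostUnsuccessfulFailoverProcesses": [
    "/path/to/orchestrator-postunsuccessfulfailover.sh"
  ]
}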

orchestrator-prefailover.sh


#!/bin/bash

# Pre-failover hook. We populate the consul key with some json data. We derive the team and shard number, which is not shown here.
# The payload is as follows
# {
#           "failed_host": "1.2.3.4",
#           "hook": "pre-failover",
#           "shard_number": 1,
#           "successor_host": ""
# }
#
# Note that the ORC_FAILED_HOST env variable is automatically set by orchestrator
consul kv put mysql-clusters/"$team"/"$shard"/master "{\"hook\": \"pre-failover\", \"failed_host\": \"${ORC_FAILED_HOST}\", \"successor_host\": \"\", \"shard_number\": ${shard_number}}"

orchestrator-postfailover.sh


#!/bin/bash

# Post-failover hook. We populate the consul key with json data in the following format. We derive the team and shard number, which is not shown here.
# The payload is as follows
# {
#      "failed_host": "1.2.3.4",
#      "successor_host": "5.6.7.8",
#      "hook": "post-failover",
#      "shard_number": 1
# }

# Note that the ORC_FAILED_HOST and ORC_SUCCESSOR_HOST env variables are automatically set by orchestrator.
# We update the consul key only if there is a successor host.
if [ ! -z "${ORC_SUCCESSOR_HOST}" ]
then

consul kv put mysql-clusters/"$team"/"$shard"/master "{\"hook\": \"post-failover\", \"failed_host\": \"${ORC_FAILED_HOST}\", \"successor_host\": \"${ORC_SUCCESSOR_HOST}\", \"shard_number\": ${shard_number}}"
fi

orchestrator-postunsuccessfulfailover.sh


#!/bin/bash

# Not much here. We just send an alert.
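# For example (hypothetical webhook URL; the actual alerting integration is not shown):
# curl -s -X POST "$ALERT_WEBHOOK_URL" -d "Automated failover failed for host ${ORC_FAILED_HOST}"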

Consul watcher service

We developed a custom consul watcher service that runs on all the proxySQL nodes. The service watches for consul key changes for the tenant that the proxySQL node is responsible for, and reacts to the changes made by the pre-failover and post-failover hook scripts by injecting proxySQL route changes. The following diagram gives an overview of the setup.

[Diagram: per-tenant consul watcher services on proxySQL nodes, each watching its mysql-clusters/<tenant> key path in consul]

As you can see, each tenant's consul watcher service watches for changes under its respective key path in consul. Here, the consul watcher running on all tenant-A proxySQL nodes watches for changes under the mysql-clusters/tenant-A key path. Similarly, the tenant-B watcher services watch for changes under the mysql-clusters/tenant-B key path.

The consul watcher service is a simple python script that we run as a systemd service, as shown below:


[Unit]
Description=Consul watcher for proxysql
After=proxysql.target
Wants=network-online.target

[Service]
Type=simple
User=proxysql
Group=proxysql
ExecStart=/usr/local/bin/consul watch -type=keyprefix -prefix=mysql-clusters/{{ tenant }} /opt/proxysql/consul_watcher.py

ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGINT
TimeoutStopSec=5
Restart=on-failure
SyslogIdentifier=consul-watcher

[Install]
# Assumption: enable the unit at the standard multi-user target.
WantedBy=multi-user.target

This service reacts to consul key changes under the mysql-clusters/tenant-X/… path. When the orchestrator runs the pre-failover script during a primary failure event for a given tenant shard, the service fetches the failed primary information from the json payload inside the consul key and modifies the running proxySQL config to shun the failed primary. This config change happens on all the proxySQL nodes of the given tenant; the idea is to stop write traffic to the shard primary just before the orchestrator attempts the actual failover. When the post-failover script is executed by the orchestrator after the failover, the service reads the new payload from the consul key and adds the successor host (the former candidate primary) to the right hostgroup as the new shard primary.
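To make the flow concrete, the following is a minimal sketch of what such a watcher handler could look like. It is not our production script: it assumes the handler is invoked by consul watch (so the watch data arrives as json on stdin), that pymysql is available, that the proxySQL admin interface listens on 127.0.0.1:6032 with illustrative credentials, and that the writer hostgroup id can be derived from the shard number (a hypothetical mapping).

#!/usr/bin/env python3
# consul_watcher.py -- illustrative sketch, not the production script.
import base64
import json
import sys

import pymysql  # any MySQL-protocol client can talk to the proxySQL admin port

WRITER_HOSTGROUP_BASE = 10  # hypothetical mapping: shard N writes go to hostgroup 10 + N


def admin_conn():
    # proxySQL admin interface; credentials here are illustrative defaults.
    return pymysql.connect(host="127.0.0.1", port=6032,
                           user="admin", password="admin", autocommit=True)


def apply_change(payload):
    hostgroup = WRITER_HOSTGROUP_BASE + int(payload["shard_number"])
    conn = admin_conn()
    try:
        with conn.cursor() as cur:
            if payload["hook"] == "pre-failover":
                # Take the failed primary out of rotation so that writes stop
                # before orchestrator attempts the actual failover.
                cur.execute(
                    "UPDATE mysql_servers SET status='OFFLINE_HARD' "
                    "WHERE hostgroup_id=%s AND hostname=%s",
                    (hostgroup, payload["failed_host"]))
            elif payload["hook"] == "post-failover" and payload["successor_host"]:
                # Make the successor the only writer in the shard's hostgroup.
                cur.execute("DELETE FROM mysql_servers WHERE hostgroup_id=%s",
                            (hostgroup,))
                cur.execute(
                    "INSERT INTO mysql_servers (hostgroup_id, hostname, port) "
                    "VALUES (%s, %s, 3306)",
                    (hostgroup, payload["successor_host"]))
            # Push the change to the running config and persist it.
            cur.execute("LOAD MYSQL SERVERS TO RUNTIME")
            cur.execute("SAVE MYSQL SERVERS TO DISK")
    finally:
        conn.close()


def main():
    # For a keyprefix watch, consul passes the handler a json array of KV
    # entries on stdin; each entry's Value is base64-encoded.
    entries = json.load(sys.stdin) or []
    for entry in entries:
        if not entry.get("Value"):
            continue
        apply_change(json.loads(base64.b64decode(entry["Value"])))


if __name__ == "__main__":
    main()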

Conclusion

Manual intervention to fix a failed primary was not going to meet our SLA requirement. Automated MySQL primary failover, therefore, was critical to achieving high availability. In this blog we saw how we integrated orchestrator hooks with consul and a consul key watcher service to inject proxySQL route changes in order to handle MySQL primary failovers. In the next blog of this series we will discuss our DBaaS backup and restoration strategies.

Cover image: Vignesh Rajan