Active/Passive MySQL High Availability Pacemaker Cluster with DRBD

High availability means many things, but the primary meaning is contained within the name. Keeping a database available at all times can be a rough challenge, and for certain environments redundancy is a must: two of everything is the order of the day. This sounds expensive at first, but a hardware failure at the wrong time can result in millions of dollars in losses for some businesses.
Keeping two machines synchronized is no easy task. Some solutions seek to share a single disk device, mounted by two servers that can replace each other during failures or upgrades. Others opt for duplicating the machines and the disks, to prevent a single
point of failure such as a SAN or other attached storage.

Pacemaker is nowadays one of the reference solutions for high availability. The Pacemaker + Corosync stack is able to detect failures of various services and automatically decides to fail over the failing resource to another node when possible.

To manage a specific service resource, Pacemaker interacts with it through a so-called “Resource Agent”. A Resource Agent is an external program that abstracts the service it manages and presents a consistent view to the cluster.

Fencing is one of the mandatory pieces you need when building a highly available cluster for your database. It is the ability to isolate a node from the cluster.

Should an issue happen where the master does not answer the cluster, successful fencing is the only way to be sure of its status: shut down, or unable to accept new work or touch data. It avoids countless situations where you end up with split-brain scenarios or data corruption.

DRBD stands for Distributed Replicated Block Device and solves several of the replication problems mentioned earlier.
First and foremost, it is the equivalent of a network-driven RAID-1 at the block level, sitting below the OS filesystem. What does this mean? With replication protocol C, a write must be acknowledged by both nodes before it succeeds, so just like synchronous replication, no committed transaction can be lost: it has been written to both nodes.

Architecture Setup

(Architecture diagram: two-node Pacemaker/Corosync cluster with DRBD-replicated storage, fronted by HAProxy.)

Networking, Firewall and SELinux Configuration

The network configuration is as follows:

  1. 10.7.7.0/24 – LAN with access to the Internet (host-only adapter)
  2. 172.16.21.0/24 – non-routable cluster heartbeat vlan for Corosync (internal network adapter)
  3. 172.16.22.0/24 – non-routable cluster heartbeat vlan for DRBD (internal network adapter).

Hostnames and IPs as defined in /etc/hosts file:

10.7.7.60 haproxysrv
10.7.7.61 mysqlsrv1
10.7.7.62 mysqlsrv2
172.16.21.11 mysqlsrv1-cr
172.16.21.12 mysqlsrv2-cr
172.16.22.11 mysqlsrv1-drbd
172.16.22.12 mysqlsrv2-drbd

We have set the following hostnames:

[mysqlsrv1]# hostnamectl set-hostname mysqlsrv1
[mysqlsrv2]# hostnamectl set-hostname mysqlsrv2

The network configuration for the first node is shown below; it is the same for the second node except for the IPs, which are listed above.

[mysqlsrv1]# cat /etc/sysconfig/network-scripts/ifcfg-enp0s8
#Corosync ring0
NAME="enp0s8"
DEVICE="enp0s8"
IPADDR="172.16.21.11"
PREFIX="24"
TYPE="Ethernet"
IPV4_FAILURE_FATAL="yes"
IPV6INIT="no"
DEFROUTE="no"
PEERDNS="no"
PEERROUTES="no"
ONBOOT="yes"
[mysqlsrv1]# cat /etc/sysconfig/network-scripts/ifcfg-enp0s9
#DRBD
NAME="enp0s9"
DEVICE="enp0s9"
IPADDR="172.16.22.11"
PREFIX="24"
TYPE="Ethernet"
IPV4_FAILURE_FATAL="yes"
IPV6INIT="no"
DEFROUTE="no"
PEERDNS="no"
PEERROUTES="no"
ONBOOT="yes"
[mysqlsrv1]# cat /etc/sysconfig/network-scripts/ifcfg-enp0s17
#LAN
NAME="enp0s17"
DEVICE="enp0s17"
IPADDR="10.7.7.61"
PREFIX="24"
GATEWAY="10.7.7.1"
DNS1="7.7.7.7"
DNS2="8.8.4.4"
TYPE="Ethernet"
IPV4_FAILURE_FATAL="yes"
IPV6INIT="no"
DEFROUTE="yes"
PEERDNS="yes"
ONBOOT="yes"

For testing purposes we switch off firewalld:

[ALL]# systemctl stop firewalld.service
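If you would rather keep firewalld running, the cluster, DRBD, and MySQL traffic can be allowed through instead. A minimal sketch, assuming the stock firewalld service definitions (high-availability, mysql) are available:

[ALL]# firewall-cmd --permanent --add-service=high-availability
[ALL]# firewall-cmd --permanent --add-service=mysql
[ALL]# firewall-cmd --permanent --add-port=7789/tcp
[ALL]# firewall-cmd --reload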

We have also disabled IPv6. Open /etc/sysctl.conf for editing, add the following lines, and apply them with sysctl -p:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
[ALL]# sysctl -p

SELinux

SELinux is set to permissive mode.
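A minimal way to do this, both at runtime and persistently across reboots, could look like the following:

[ALL]# setenforce 0
[ALL]# sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config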

Install Pacemaker and Corosync

[ALL]# yum install -y pcs

Installing pcs will pull in pacemaker, corosync, and resource-agents as dependencies.
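To confirm that the whole stack landed, a quick check might be:

[ALL]# rpm -q pcs pacemaker corosync resource-agents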

For SELinux management:

[ALL]# yum install -y policycoreutils-python

In RHEL 7, we have to set up a password for the pcs administration account named hacluster:

[ALL]# echo "passwd" | passwd hacluster --stdin

Start and enable the service:

[ALL]# systemctl start pcsd.service
[ALL]# systemctl enable pcsd.service

Configure Corosync

Authenticate as the hacluster user. Authorisation tokens are stored in the file /var/lib/pcsd/tokens.

[mysqlsrv1]# pcs cluster auth mysqlsrv1-cr mysqlsrv2-cr -u hacluster -p passwd
mysqlsrv1-cr: Authorized
mysqlsrv2-cr: Authorized

Generate and synchronise the Corosync configuration.

[mysqlsrv1]# pcs cluster setup --name mysql_cluster mysqlsrv1-cr mysqlsrv2-cr

Start the cluster on all nodes:

[mysqlsrv1]# pcs cluster start --all
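At this point it is worth confirming that Corosync is communicating on its ring and that Pacemaker sees both nodes online, for example:

[mysqlsrv1]# corosync-cfgtool -s
[mysqlsrv1]# pcs status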

Install DRBD and MySQL

DRBD Installation

DRBD refers to block devices designed as a building block to form high availability clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be understood as network-based RAID-1.

Import the ELRepo package signing key, enable the repository and install the DRBD kernel module with utilities:

[ALL]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[ALL]# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
[ALL]# yum install -y kmod-drbd84 drbd84-utils

To avoid issues with SELinux, for the time being, we are going to exempt DRBD processes from SELinux control:

[ALL]# semanage permissive -a drbd_t

LVM Volume for DRBD

Create a new 1GB logical volume for DRBD:

[mysqlsrv1]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  vg_centos7   1   3   0 wz--n- 63.21g 45.97g
[ALL]# lvcreate --name lv_drbd --size 1024M vg_centos7
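Before layering DRBD on top, it is worth confirming that the logical volume exists on both nodes:

[ALL]# lvs vg_centos7/lv_drbd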

Configure DRBD

Configure DRBD to use single-primary mode with replication protocol C:

[ALL]# cat << EOL >/etc/drbd.d/mysql01.res
resource mysql01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/vg_centos7/lv_drbd;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on mysqlsrv1 {
  address  172.16.22.11:7789;
 }
 on mysqlsrv2 {
  address  172.16.22.12:7789;
 }
}
EOL

We have a resource named mysql01 which uses /dev/vg_centos7/lv_drbd as the lower-level device and is configured with internal metadata.

The resource uses TCP port 7789 for its network connections, and binds to the IP addresses 172.16.22.11 and 172.16.22.12, respectively.

In case we run into problems, we have to ensure that TCP port 7789 is open in the firewall for the DRBD interface and that the resource name matches the file name.
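A quick way to sanity-check that the resource file parses cleanly and that the names line up is to dump the configuration exactly as DRBD sees it:

[ALL]# drbdadm dump mysql01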

Create the local metadata for the DRBD resource:

[ALL]# drbdadm create-md mysql01

Make sure the DRBD kernel module is loaded, then bring up the DRBD resource:

[ALL]# drbdadm up mysql01

For data consistency, tell DRBD which node should be considered to have the correct data (can be run on any node as both have garbage at this point):

[mysqlsrv1]# drbdadm primary --force mysql01

Observe the sync:

[mysqlsrv1]# drbd-overview
 0:mysql01/0  SyncSource Primary/Secondary UpToDate/Inconsistent 
	[=>..................] sync'ed: 11.8% (926656/1048508)K
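The same progress can also be followed from the kernel's view of the device, which is useful on builds where drbd-overview is not available:

[mysqlsrv1]# cat /proc/drbd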

Create a filesystem on the DRBD device and tune as required:

[mysqlsrv1]# mkfs.ext4 -m 0 -L drbd /dev/drbd0
[mysqlsrv1]# tune2fs -c 30 -i 180d /dev/drbd0

Mount the disk, we will populate it with MySQL content shortly:

[mysqlsrv1]# mount /dev/drbd0 /mnt

MySQL Installation

[ALL]# yum -y install https://dev.mysql.com/get/mysql80-community-release-el7-1.noarch.rpm
[ALL]# yum install -y mysql-community-server

Ensure the MySQL service is disabled, as it will be managed by Pacemaker:

# systemctl disable mysqld.service

Now start the MySQL service manually on one of the cluster nodes:

[mysqlsrv1]# systemctl start mysqld.service

We then initialize a fresh MySQL database on the DRBD-backed mount with the mysql_install_db command:

[mysqlsrv1]# mysql_install_db --datadir=/mnt --user=mysql
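Note that mysql_install_db was deprecated in MySQL 5.7 and removed in 8.0, so with the mysql80-community packages installed above the equivalent initialization would be roughly the following (the temporary root password is written to the error log):

[mysqlsrv1]# mysqld --initialize --datadir=/mnt --user=mysql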

Run secure installation:

[mysqlsrv1]# mysql_secure_installation

We need to apply the same SELinux file context to the new datadir as the default MySQL datadir has. The mysqld policy stores data with multiple different file context types under the /var/lib/mysql directory. If we want to store the data in a different directory, we can use the semanage command to add a file context:

[mysqlsrv1]# semanage fcontext -a -t mysqld_db_t "/mnt(/.*)?"
[mysqlsrv1]# restorecon -Rv /mnt

Please be advised that changes made with the chcon command do not survive a filesystem relabel or the execution of the restorecon command; always use semanage.
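To verify that the new context has actually been applied to the mount point:

[mysqlsrv1]# ls -dZ /mnt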

At this point our preparation is complete. We can unmount the temporarily mounted filesystem and stop the MySQL service:

[mysqlsrv1]# umount /mnt
[mysqlsrv1]# systemctl stop mysqld.service

The last thing to do is to put a very basic my.cnf configuration in place:

[ALL]# cat << EOL > /etc/my.cnf
[mysqld]
symbolic-links=0
bind_address            = 0.0.0.0
datadir                 = /var/lib/mysql
pid_file                = /var/run/mysql/mysqld.pid
socket                  = /var/run/mysql/mysqld.sock

[mysqld_safe]
bind_address            = 0.0.0.0
datadir                 = /var/lib/mysql
pid_file                = /var/run/mysql/mysqld.pid
socket                  = /var/run/mysql/mysqld.sock

!includedir /etc/my.cnf.d
EOL

Configure Pacemaker Cluster

We want the configuration logic and ordering to be as below:

  1. Start: mysql_fs01 -> mysql_service01 -> mysql_VIP01,
  2. Stop: mysql_VIP01 -> mysql_service01 -> mysql_fs01.

Where mysql_fs01 is the filesystem resource, mysql_service01 is the mysqld service resource, and mysql_VIP01 is the floating virtual IP 10.7.7.60.

One handy feature pcs has is the ability to queue up several changes into a file and commit those changes atomically. To do this, we start by populating the file with the current raw XML config from the CIB:

[mysqlsrv1]# pcs cluster cib clust_cfg

Disable STONITH. Be advised that node-level fencing configuration depends heavily on the environment. You can check this page for Pacemaker STONITH device configuration on VMware.

[mysqlsrv1]# pcs -f clust_cfg property set stonith-enabled=false

Set quorum policy to ignore:

[mysqlsrv1]# pcs -f clust_cfg property set no-quorum-policy=ignore

Prevent the resources from moving after recovery as it usually increases downtime:

[mysqlsrv1]# pcs -f clust_cfg resource defaults resource-stickiness=200

Create a cluster resource named mysql_data01 for the DRBD device, and an additional master/slave clone resource MySQLClone01 that allows mysql_data01 to run on both cluster nodes at the same time, with only one instance promoted to the DRBD Primary role:

[mysqlsrv1]# pcs -f clust_cfg resource create mysql_data01 ocf:linbit:drbd \
  drbd_resource=mysql01 \
  op monitor interval=30s
[mysqlsrv1]# pcs -f clust_cfg resource master MySQLClone01 mysql_data01 \
  master-max=1 master-node-max=1 \
  clone-max=2 clone-node-max=1 \
  notify=true

Note the meta variables used:

master-max: how many copies of the resource can be promoted to master status,
master-node-max: how many copies of the resource can be promoted to master status on a single node,
clone-max: how many copies of the resource to start. Defaults to the number of nodes in the cluster,
clone-node-max: how many copies of the resource can be started on a single node,
notify: when stopping or starting a copy of the clone, tell all the other copies beforehand and again when the action was successful.

Create a cluster resource named mysql_fs01 for the filesystem. Tell the cluster that the clone resource MySQLClone01 must be run on the same node as the filesystem resource, and that the clone resource must be started before the filesystem resource.

[mysqlsrv1]# pcs -f clust_cfg resource create mysql_fs01 Filesystem \
  device="/dev/drbd0" \
  directory="/var/lib/mysql" \
  fstype="ext4"
[mysqlsrv1]# pcs -f clust_cfg constraint colocation add mysql_fs01 with MySQLClone01 \
  INFINITY with-rsc-role=Master
[mysqlsrv1]# pcs -f clust_cfg constraint order promote MySQLClone01 then start mysql_fs01

Create a cluster resource named mysql_service01 for the MySQL service. Tell the cluster that the MySQL service must be run on the same node as the mysql_fs01 filesystem resource, and that the filesystem resource must be started first.

[mysqlsrv1]# pcs -f clust_cfg resource create mysql_service01 ocf:heartbeat:mysql \
  binary="/usr/bin/mysqld_safe" \
  config="/etc/my.cnf" \
  datadir="/var/lib/mysql" \
  pid="/var/lib/mysql/mysql.pid" \
  socket="/var/lib/mysql/mysql.sock" \
  additional_parameters="--bind-address=0.0.0.0" \
  op start timeout=60s \
  op stop timeout=60s \
  op monitor interval=20s timeout=30s
[mysqlsrv1]# pcs -f clust_cfg constraint colocation add mysql_service01 with mysql_fs01 INFINITY
[mysqlsrv1]# pcs -f clust_cfg constraint order mysql_fs01 then mysql_service01

Finally, create a cluster resource named mysql_VIP01 for the virtual IP 10.7.7.60.

[mysqlsrv1]# pcs -f clust_cfg resource create mysql_VIP01 ocf:heartbeat:IPaddr2 \
 ip=10.7.7.60 cidr_netmask=32 \
 op monitor interval=30s

Why use IPaddr2 and not IPaddr?

  1. IPaddr – manages virtual IPv4 addresses (portable version),
  2. IPaddr2 – manages virtual IPv4 addresses (Linux specific version).

The virtual IP resource mysql_VIP01 must, naturally, run on the same node as the MySQL service resource, and must be started last. This ensures that all other resources are already started before we connect to the virtual IP.

[mysqlsrv1]# pcs -f clust_cfg constraint colocation add mysql_VIP01 with mysql_service01 INFINITY
[mysqlsrv1]# pcs -f clust_cfg constraint order mysql_service01 then mysql_VIP01

Let us check the configuration:

[mysqlsrv1]# pcs -f clust_cfg constraint
Location Constraints:
Ordering Constraints:
  promote MySQLClone01 then start mysql_fs01 (kind:Mandatory)
  start mysql_fs01 then start mysql_service01 (kind:Mandatory)
  start mysql_service01 then start mysql_VIP01 (kind:Mandatory)
Colocation Constraints:
  mysql_fs01 with MySQLClone01 (score:INFINITY) (with-rsc-role:Master)
  mysql_service01 with mysql_fs01 (score:INFINITY)
  mysql_VIP01 with mysql_service01 (score:INFINITY)
[mysqlsrv1]# pcs -f clust_cfg resource show
 Master/Slave Set: MySQLClone01 [mysql_data01]
     Stopped: [ mysqlsrv1-cr mysqlsrv2-cr ]
 mysql_fs01	(ocf::heartbeat:Filesystem):	Stopped
 mysql_service01	(ocf::heartbeat:mysql):	Stopped
 mysql_VIP01	(ocf::heartbeat:IPaddr2):	Stopped

We can commit changes now and check cluster status:

[mysqlsrv1]# pcs cluster cib-push clust_cfg
[mysqlsrv1]# pcs status
[...]

Online: [ mysqlsrv1-cr mysqlsrv2-cr ]

Full list of resources:

 Master/Slave Set: MySQLClone01 [mysql_data01]
     Masters: [ mysqlsrv1-cr ]
     Stopped: [ mysqlsrv2-cr ]
 mysql_fs01     (ocf::heartbeat:Filesystem):    Started mysqlsrv1-cr
 mysql_service01        (ocf::heartbeat:mysql): Started mysqlsrv1-cr
 mysql_VIP01    (ocf::heartbeat:IPaddr2):	Started mysqlsrv1-cr

[...]

Once the configuration has been committed, Pacemaker will:

  1. Start DRBD on both cluster nodes,
  2. Select one node for promotion to the DRBD Primary role,
  3. Mount the filesystem, configure the cluster IP address, and start the MySQL server on the same node,
  4. Commence resource monitoring.

We can test the MySQL service by telnetting to the virtual IP 10.7.7.60 on TCP port 3306:

# telnet 10.7.7.60 3306
Trying 10.7.7.60...
Connected to 10.7.7.60.
Escape character is '^]'.
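For a more complete test than a raw TCP connection, the mysql client can be pointed at the virtual IP, assuming an account that is allowed to connect remotely exists (appuser below is only a placeholder):

# mysql -h 10.7.7.60 -P 3306 -u appuser -p -e "SELECT @@hostname;"

The returned hostname should move to the other node after a failover of the cluster resources.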

Configure HAProxy

With the MySQL cluster set up, you need a single endpoint that always reaches the active node, regardless of which server in the cluster currently holds that role. This is where HAProxy comes in.

On the server that has HAProxy installed, edit the configuration file at /etc/haproxy/haproxy.cfg to contain the following:

global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen mysql
    bind *:5000
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server mysqlsrv1 10.7.7.61:3306 maxconn 100 check port 3306
    server mysqlsrv2 10.7.7.62:3306 maxconn 100 check port 3306

Restart HAProxy to use the new settings:

sudo systemctl restart haproxy

If HAProxy fails to start, check for syntax errors:

/usr/sbin/haproxy -c -V -f /etc/haproxy/haproxy.cfg
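Once HAProxy is running, clients connect to port 5000 on the HAProxy host instead of going to the nodes directly. A quick check, assuming the mysql client is installed there and a remotely reachable account exists (appuser is again just a placeholder):

[haproxysrv]# mysql -h 127.0.0.1 -P 5000 -u appuser -p -e "SELECT @@hostname;"

The statistics page configured on port 7000 also shows which backend HAProxy currently considers up.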
