Active/Passive Cluster with Pacemaker, Corosync

In this article I explain how to set up and maintain an Active/Passive cluster using Pacemaker and Corosync, with DRBD replication.

We have two nodes (same hardware), one active and the other passive. If the active node goes down, the passive one automatically takes over its role and all running services.

This article was written entirely for and on Debian 6 (squeeze). With some luck you may get it running on Ubuntu as well.


1. Basic notes

Node-1:
Hostname: node-1
IP: 192.168.2.101

Node-2:
Hostname: node-2
IP: 192.168.2.102

Partitioning:

/dev/vda1 System / ext4
/dev/vda5 reserved for DRBD – empty
/dev/vda6 swap area

Cluster IP: 192.168.2.100
We need this cluster IP to reach the currently active node.

This is the physical network structure of the cluster:
[Figure: Structure]

2. Basic configuration
To ssh from one node to another without passwords:

root@node-1:~# ssh-keygen
root@node-2:~# ssh-keygen
root@node-1:~# ssh-copy-id root@192.168.2.102
root@node-2:~# ssh-copy-id root@192.168.2.101

Now we need to make sure the machines can reach each other by name. If you have a DNS server,
add entries for both nodes there. Otherwise, add them to /etc/hosts on each node.
Below are the entries for my cluster nodes:

root@node-1:~# cat /etc/hosts
127.0.0.1       localhost
192.168.2.101   node-1
192.168.2.102   node-2

root@node-2:~# cat /etc/hosts
127.0.0.1       localhost
192.168.2.101   node-1
192.168.2.102   node-2
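
If you want to check the hosts file before going on, something like the following could be used. It is only a sketch: the helper name hosts_has_entry is my own invention, and it looks at the file itself rather than at what the resolver returns.

```shell
#!/bin/sh
# hosts_has_entry FILE IP NAME -> exit status 0 if FILE maps IP to NAME.
# "hosts_has_entry" is a made-up helper name, not a standard tool.
hosts_has_entry() {
    # Skip comment lines, then look for a line whose first field is the IP
    # and whose remaining fields contain the host name.
    awk -v ip="$2" -v name="$3" '
        /^[[:space:]]*#/ { next }
        $1 == ip {
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Check the local hosts file for both cluster nodes.
for entry in "192.168.2.101 node-1" "192.168.2.102 node-2"; do
    set -- $entry
    if hosts_has_entry /etc/hosts "$1" "$2"; then
        echo "ok: $2 -> $1"
    else
        echo "missing: $2 -> $1"
    fi
done
```

Run it on both nodes; every line should start with "ok".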

3. Installation
Install Pacemaker on both nodes (the package normally pulls in Corosync as a dependency):

aptitude install pacemaker

4. Initial corosync configuration:
Generate the authentication key for corosync (openais) and copy it to node-2:

root@node-1:~# corosync-keygen
root@node-1:~# scp /etc/corosync/authkey node-2:/etc/corosync/authkey
# on both nodes:
chown root:root /etc/corosync/authkey
chmod 400 /etc/corosync/authkey

Then allow the init script to start corosync and copy the setting to node-2:

vim /etc/default/corosync
# change to yes
START=yes
scp /etc/default/corosync node-2:/etc/default/corosync
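
Before starting corosync it can help to double-check the mode of the key file on both nodes. A minimal sketch, assuming GNU stat (as shipped with Debian); the function name key_is_private is my own and not part of corosync:

```shell
#!/bin/sh
# key_is_private FILE -> exit status 0 if FILE has mode 400 (owner read-only).
# Assumes GNU stat; "key_is_private" is a made-up helper name.
key_is_private() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "400" ]
}

if key_is_private /etc/corosync/authkey; then
    echo "authkey permissions look fine"
else
    echo "authkey is missing or not mode 400"
fi
```

The check only looks at the permission bits, not at the owner.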

The only block you must change is the interface block inside totem (the network settings).
Full corosync.conf:

totem {
        version: 2
 
        # How long before declaring a token lost (ms)
        token: 3000
 
        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10
 
        # How long to wait for join messages in the membership protocol (ms)
        join: 60
 
        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 3600
 
        # Turn off the virtual synchrony filter
        vsftype: none
 
        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20
 
        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes
 
        # Disable authentication and encryption (set to "on" to use /etc/corosync/authkey)
        secauth: off
 
        # How many threads to use for encryption/decryption
        threads: 0
 
        # Optionally assign a fixed node id (integer)
        # nodeid: 1234
 
        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: none
 
        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 192.168.2.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
 
amf {  
        mode: disabled
}
 
service {
        # Load the Pacemaker Cluster Resource Manager
        ver:       0
        name:      pacemaker
}
 
aisexec {
        user:   root
        group:  root
}
 
logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
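
Note that bindnetaddr is the network address of the interface, not the IP of the node, which is why the same file works on both nodes. If your network differs from mine, the value can be derived from the node IP and the netmask. A small sketch, assuming a netmask of 255.255.255.0 (the netmask is my assumption here, and network_addr is a made-up helper name):

```shell
#!/bin/sh
# network_addr IP NETMASK -> prints the IPv4 network address (IP AND NETMASK).
# "network_addr" is a made-up helper name.
network_addr() {
    # Split both dotted quads into four positional octets each.
    old_ifs=$IFS
    IFS=.
    set -- $1 $2
    IFS=$old_ifs
    # Bitwise AND each IP octet ($1..$4) with the matching mask octet ($5..$8).
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# node-1 with an assumed /24 netmask
network_addr 192.168.2.101 255.255.255.0
```

For node-1 this prints 192.168.2.0, the value used for bindnetaddr above.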

Now copy this configuration to node-2:

scp /etc/corosync/corosync.conf node-2:/etc/corosync/corosync.conf

5. Trial and error:
Let’s start corosync and see what happens:

root@node-1:~# /etc/init.d/corosync start
root@node-2:~# /etc/init.d/corosync start

Check cluster status:

crm_mon --one-shot -V
crm_verify -L

Take a look at your logs:

less /var/log/corosync/corosync.log

6. First cluster resource:

We now turn off STONITH since we don’t need it in this example configuration:

crm configure property stonith-enabled=false

Now we add our first resource, the virtual IP, to the configuration:

crm configure primitive FAILOVER-ADDR ocf:heartbeat:IPaddr2 params ip="192.168.2.100" nic="eth0" op monitor interval="10s" meta is-managed="true"

Of course, there are many more options available besides nic.

7. Use crm:
To use all features of the cluster you should use the Cluster Resource Manager (CRM) command-line tool, crm. It is great and very powerful.

root@node-2:~# crm
crm(live)#

Use the help command, and TAB for completion.

Available commands:
 
        cib              manage shadow CIBs
        resource         resources management
        node             nodes management
        options          user preferences
        configure        CRM cluster configuration
        ra               resource agents information center
        status           show cluster status
        quit,bye,exit    exit the program
        help             show help
        end,cd,up        go back one level

Get cluster status:

root@node-2:~# crm_mon --one-shot
============
Last updated: Tue Dec 28 21:50:38 2010
Stack: openais
Current DC: node-1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============
 
Online: [ node-1 node-2 ]
     FAILOVER-ADDR    (ocf::heartbeat:IPaddr2):   Started node-2
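
If you only want to know where one resource is currently running, the one-shot output can be filtered. This is a sketch that relies on the line layout shown above (resource name first, "Started" and the node name last); resource_node is a made-up helper name, and other Pacemaker versions may format the output differently.

```shell
#!/bin/sh
# resource_node NAME -> reads crm_mon text on stdin and prints the node
# on which resource NAME is reported as "Started" (nothing if not found).
# "resource_node" is a made-up helper name.
resource_node() {
    awk -v rsc="$1" 'NF >= 3 && $1 == rsc && $(NF - 1) == "Started" { print $NF }'
}

# Sample line copied from the output above; on a live cluster you would
# pipe "crm_mon --one-shot" into the function instead.
sample='     FAILOVER-ADDR    (ocf::heartbeat:IPaddr2):   Started node-2'
printf '%s\n' "$sample" | resource_node FAILOVER-ADDR
```

With the sample line this prints node-2.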

8. Maintenance
To list the cluster configuration:

crm configure show

To show the cluster status and stay on the status screen (refreshed every second):

crm_mon --interval=1

A shorter command for the cluster status that also shows the fail counts:

crm_mon -i 2 -f

resource-stickiness option
This sets the score (weight) with which a resource prefers to stay on the node where it is currently running.
When a node goes down and comes back up, the resources that moved to the other node (the one that stayed up) are kept there. This prevents sync problems on the node that was down, and keeps a flapping node from dragging the cluster services back and forth.

crm configure rsc_defaults resource-stickiness=100
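
To see why a stickiness of 100 keeps a resource in place, compare it with the location preference of 25 for node-1 from the example configuration further down. The following is a deliberately simplified sketch of that comparison: real Pacemaker scoring also adds up other constraints and counts stickiness per member of a group, and stays_put is a made-up helper name.

```shell
#!/bin/sh
# stays_put STICKINESS OTHER_SCORE
# Simplified model: the resource stays on its current node while its
# stickiness is higher than the score pulling it to the other node.
# A tie is treated as "may move" here, which is an assumption.
stays_put() {
    [ "$1" -gt "$2" ]
}

# resource-stickiness=100 versus "location ... 25: node-1"
if stays_put 100 25; then
    echo "resource stays on the node where it is running"
else
    echo "resource may move back to the preferred node"
fi
```

With a stickiness of 0 the preference of 25 would win and the resource would move back as soon as node-1 returns.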

migration-threshold option
The default migration-threshold is INFINITY (shown as 1000000), which is a bit too much:

# Before modification
apache2: migration-threshold=1000000 fail-count=1
# After modification
apache2: migration-threshold=3 fail-count=1

Let’s change it:

crm configure rsc_defaults migration-threshold=3
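
With a threshold of 3, a resource is moved away from a node once its fail-count on that node reaches 3. The rule can be sketched against the status lines shown above. This is only an illustration: threshold_reached is a made-up helper name, and it handles numeric values only.

```shell
#!/bin/sh
# threshold_reached LINE -> exit status 0 if fail-count >= migration-threshold.
# "threshold_reached" is a made-up helper name; numeric values only.
threshold_reached() {
    # Extract the two numbers from a line such as
    # "apache2: migration-threshold=3 fail-count=1".
    threshold=$(printf '%s\n' "$1" | sed -n 's/.*migration-threshold=\([0-9][0-9]*\).*/\1/p')
    failcount=$(printf '%s\n' "$1" | sed -n 's/.*fail-count=\([0-9][0-9]*\).*/\1/p')
    # Treat a line without both fields as "not reached".
    [ -n "$threshold" ] && [ -n "$failcount" ] && [ "$failcount" -ge "$threshold" ]
}

if threshold_reached "apache2: migration-threshold=3 fail-count=1"; then
    echo "apache2 would be moved away from this node"
else
    echo "apache2 stays on this node"
fi
```

For the line above (one failure out of three allowed) the resource stays where it is.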

crm_mon
This setup is very useful for quickly checking the cluster status in a browser. It should run on both nodes:

crm_mon --daemonize --as-html /var/www/cluster/index.html

Autostart (on Debian, add this line to /etc/rc.local before the final "exit 0"):

crm_mon --daemonize --as-html /var/www/cluster/index.html

The result is a plain HTML page that refreshes every 5 seconds.

To migrate a resource (here the group AP-CLUST from the example configuration below) to another node, do:

crm resource migrate AP-CLUST node-2

To clean up resource messages, do:

crm resource cleanup FAILOVER-ADDR

To stop the FAILOVER-ADDR resource (use start instead of stop to start it again), do:

crm resource stop FAILOVER-ADDR

To put the first node into standby mode and bring it back online:

crm node standby node-1
crm node online node-1

Other
Add resources:

# DRBD:
crm configure primitive DRBD ocf:linbit:drbd params drbd_resource="drbd0" op monitor interval="120s"
# Mount target:
crm configure primitive SRV-MOUNT ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/srv/" fstype="ext4"
# MySQL resource:
crm configure primitive MYSQL lsb:mysql op monitor interval="10s"

Example configuration:

root@node-1:~# crm configure show
node node-1
node node-2
primitive APACHE2 lsb:apache2 \
	op monitor interval="35s" timeout="20s" start-delay="0s"
primitive DRBD ocf:linbit:drbd \
	params drbd_resource="drbd0" \
	op monitor interval="120s" \
	op start interval="0" timeout="240s" \
	op stop interval="0" timeout="100s"
primitive DRBD-LINKS heartbeat:drbdlinks \
	op monitor interval="10s" \
	meta target-role="Started"
primitive FAILOVER-ADDR ocf:heartbeat:IPaddr2 \
	params ip="192.168.2.100" nic="eth0" \
	op monitor interval="10s" \
	meta is-managed="true"
primitive FAILOVER-SRC ocf:heartbeat:IPsrcaddr \
	params ipaddress="192.168.2.100"
primitive MYSQL lsb:mysql \
	op monitor interval="10s"
primitive PING-NET ocf:pacemaker:ping \
	params dampen="5s" multiplier="100" host_list="192.168.2.1 192.168.2.36 8.8.8.8" \
	op monitor interval="60s" timeout="60" \
	op start interval="0" timeout="60s" \
	op stop interval="0" timeout="60s"
primitive SRV-MOUNT ocf:heartbeat:Filesystem \
	params device="/dev/drbd0" directory="/srv/" fstype="ext4" \
	op start interval="0" timeout="60s" \
	op stop interval="0" timeout="60s"
group AP-CLUST FAILOVER-ADDR SRV-MOUNT FAILOVER-SRC APACHE2 DRBD-LINKS MYSQL \
	meta target-role="Started"
ms DRBD-DATA DRBD \
	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone PING-NET-CLONE PING-NET
location cli-standby-AP-CLUST AP-CLUST \
	rule $id="cli-standby-rule-AP-CLUST" -inf: #uname eq node-2
location connected_node AP-CLUST \
	rule $id="connected_node-rule" -inf: not_defined pingd or pingd lte 0
location master-prefer-node-1 FAILOVER-ADDR 25: node-1
colocation AP-CLUST_on_DRBD inf: AP-CLUST DRBD-DATA:Master
order AP-CLUST_after_DRBD-DATA inf: DRBD-DATA:promote AP-CLUST:start
property $id="cib-bootstrap-options" \
	dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	stonith-enabled="false" \
	no-quorum-policy="ignore" \
	start-failure-is-fatal="false" \
	last-lrm-refresh="1313668727"
rsc_defaults $id="rsc-options" \
	resource-stickiness="100" \
	migration-threshold="3"

Special:
NTP:
It is highly recommended to run an NTP service on your cluster nodes. Doing so ensures all nodes agree on the current time and makes reading log files significantly easier. For example, to find the cause of a split brain you need exact timestamps.

OCF resources
To find all OCF resource agents provided by Pacemaker and Heartbeat, run:

crm ra list ocf heartbeat
crm ra list ocf pacemaker

Dealing with failcounts:
The output of “crm_mon -i1 -f” shows you some failcounts for your resources:

APACHE2: migration-threshold=3 fail-count=2

Clean up a resource that has failcounts:

crm resource cleanup APACHE2

Reset failcounts to zero:

crm resource failcount APACHE2 set node-1 0
crm resource failcount APACHE2 set node-2 0

Maintenance of cluster resources:
If you want to take your MySQL server down for a short time (for example for an upgrade), set the resource to unmanaged first so that the cluster does not react:

crm resource meta MYSQL set is-managed false
# after upgrades or restarts, put it back into managed status
crm resource meta MYSQL set is-managed true

2 Responses to “Active/Passive Cluster with Pacemaker, Corosync”


  • Hey, yeah we are working really on much similar and great things – that is cool, two heads are better than one:)

    Pacemaker is really fascinating, I had a lot fun to work with it. I have a webserver-cluster running under Fedora 14 ; normally I use only Debian but in this case I wanted to have newer version of pacemaker.

    My apprenticeship project was about HA-Cluster with Pacemaker.

    On what kind of cluster are you working?

    id

  • Looks like we are working on many similar things :-)

    Thanks for this post, I still have to go through it, but if you want to be able to ssh from one host to the other, copy your SSH key on both servers and use “ssh -A”.

    Cheers
    Seb
