DRBD master/slave setup

July 7, 2011 by Igor Drobot 8 Comments

Recently I have been getting a lot of questions and feedback on some older posts (Active/Passive Cluster) where I used DRBD. I never actually wrote about how to set up DRBD, so I am trying to fill this gap now. This post will also lay the groundwork for some new ideas and posts related to Active/Passive clustering with KVM.

DRBD (Distributed Replicated Block Device) is a really cool cluster solution for keeping redundant data on two or more nodes. Everybody knows RAID; DRBD is like RAID 1 over the network.

Install the DRBD virtual package (this works on Debian and Ubuntu):

apt-get install drbd-utils

In my case DRBD runs under the control of my cluster software, which means it is managed by corosync. So I remove DRBD from the boot sequence:

insserv -v -r drbd

Load the DRBD kernel module now and make sure it is loaded again at the next boot:

modprobe drbd
echo 'drbd' >> /etc/modules
lsmod | grep drbd
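Running the echo line a second time leaves two drbd entries in /etc/modules. A minimal sketch of an idempotent alternative, assuming a POSIX shell and grep; ensure_line is a hypothetical helper name, and the demo uses a scratch file instead of the real /etc/modules:

```shell
#!/bin/sh
# Append a line to a file unless an identical line is already present.
ensure_line() {
  file=$1
  line=$2
  # -q quiet, -x whole-line match, -F fixed string; a missing file counts as "not present"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# On a node: ensure_line /etc/modules drbd
# Demo on a scratch file instead of /etc/modules:
tmp=$(mktemp)
ensure_line "$tmp" drbd
ensure_line "$tmp" drbd
cat "$tmp"    # prints a single line: drbd
rm -f "$tmp"
```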

By the way, this could be the network configuration of the nodes (/etc/network/interfaces), syncing over a dedicated crossover cable:

# Cluster sync interface on node 1
auto eth1
iface eth1 inet static
        address 169.254.0.1
        netmask 255.255.255.248

# Cluster sync interface on node 2
auto eth1
iface eth1 inet static
        address 169.254.0.4
        netmask 255.255.255.248
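Both sync addresses have to be in the same subnet. With the netmask 255.255.255.248 (/29) the network is 169.254.0.0 with usable hosts .1 to .6, so .1 and .4 match. A small sketch that checks this by ANDing each address with the netmask; the function names are hypothetical and it assumes a shell with 64-bit arithmetic (bash or dash on a 64-bit system):

```shell
#!/bin/sh
# Convert a dotted quad such as 169.254.0.1 into a single integer
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back into a dotted quad
int_to_ip() {
  n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Network address = address AND netmask
network_of() {
  int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# Succeeds if two addresses share a network under the given netmask
same_subnet() {
  [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

network_of 169.254.0.1 255.255.255.248   # node 1: 169.254.0.0
network_of 169.254.0.4 255.255.255.248   # node 2: 169.254.0.0
same_subnet 169.254.0.1 169.254.0.4 255.255.255.248 && echo "same /29"
```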

Configure DRBD

vim /etc/drbd.conf
# Commented out: my resource file defines its own global and common sections
#include "drbd.d/global_common.conf";
include "drbd.d/*.res";

Define my resource:

vim /etc/drbd.d/zeldor.res
# Content of zeldor.res
global {
    usage-count no;
}
 
common {
  syncer { rate 100M; }
}
 
resource drbd0 {
  protocol C;
 
  handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
  }
 
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
 
  disk {
    on-io-error   detach;
  }
 
  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
 
  syncer {
    rate 100M;
    al-extents 257;
  }
 
  on kvm-1 {
    device     /dev/drbd0;
    disk       /dev/sda5;
    address    169.254.0.1:7788;
    flexible-meta-disk  internal;
  }
 
  on kvm-2 {
    device     /dev/drbd0;
    disk       /dev/sda5;
    address    169.254.0.4:7788;
    flexible-meta-disk  internal;
  }
}
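The resource uses flexible-meta-disk internal, so DRBD keeps its metadata at the end of /dev/sda5 and /dev/drbd0 ends up slightly smaller than the partition. A minimal sketch of the size estimate given in the DRBD User's Guide, Ms = ⌈Cs / 2^18⌉ × 8 + 72 with both values in 512-byte sectors; drbd_md_sectors is a hypothetical name and the 400 GiB partition is an assumed example:

```shell
#!/bin/sh
# Estimated size of DRBD internal metadata (DRBD User's Guide formula):
#   Ms = ceil(Cs / 2^18) * 8 + 72, with Cs and Ms in 512-byte sectors
drbd_md_sectors() {
  cs=$1
  echo $(( (cs + 262143) / 262144 * 8 + 72 ))   # 262144 = 2^18; adding 2^18 - 1 rounds up
}

# Hypothetical 400 GiB backing partition: 400 * 2^21 sectors
sectors=$(( 400 * 2097152 ))
md=$(drbd_md_sectors "$sectors")
echo "metadata: $md sectors = $(( md / 2 )) KiB"   # metadata: 25672 sectors = 12836 KiB
```

On a node, blockdev --getsz /dev/sda5 prints the partition size in 512-byte sectors, which can be passed straight to the function.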

The rate is given in bytes per second (M means megabytes), so a sync rate of around 100M to 110M is enough to fill a gigabit interlink connection.
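The arithmetic behind that: a gigabit link carries at most 1000 / 8 = 125 MB/s before protocol overhead, and a resync can never be faster than the slower of network and disks. A sketch of that bound plus the time a full sync needs at a given rate; the function names are hypothetical, the 150 MB/s disk figure is an assumed example, and 408820M is the device size shown in the drbd-overview output of this post:

```shell
#!/bin/sh
# Upper bound for the resync rate in MB/s: the slower of network and disk.
#   $1 = link speed in Mbit/s, $2 = sustained disk throughput in MB/s
max_sync_rate_mb() {
  link_mb=$(( $1 / 8 ))     # 8 bits per byte, protocol overhead ignored
  if [ "$link_mb" -lt "$2" ]; then echo "$link_mb"; else echo "$2"; fi
}

# Seconds for a full sync of $1 MB at $2 MB/s, rounded up
sync_seconds() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

max_sync_rate_mb 1000 150   # gigabit link, assumed 150 MB/s disks: 125
sync_seconds 408820 100     # 408820M at 100M: 4089 seconds, roughly 68 minutes
```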

First start of DRBD
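Before you start DRBD, fix the permissions so that the cluster software (members of the haclient group) can run drbdsetup and drbdmeta as root: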

chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta
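The six commands leave both binaries setuid root, executable by root and the haclient group but not by others. A sketch that wraps them in a loop, with chgrp first because on Linux changing the group of an executable clears its setuid bit. restrict_to_haclient is a hypothetical name, the demo runs on a scratch file and assumes GNU stat; on the nodes it would be called as restrict_to_haclient /sbin/drbdsetup /sbin/drbdmeta:

```shell
#!/bin/sh
# Apply the permission fix to every file passed in.
# chgrp runs first: changing the group of an executable clears its setuid bit.
restrict_to_haclient() {
  for f in "$@"; do
    chgrp haclient "$f" 2>/dev/null || echo "warning: could not chgrp $f (no haclient group?)" >&2
    chmod o-x "$f"     # others may no longer execute it
    chmod u+s "$f"     # run as the owner (root)
  done
}

# Try it on a scratch file first instead of the real /sbin binaries
tmp=$(mktemp)
chmod 755 "$tmp"
restrict_to_haclient "$tmp"
stat -c '%a %A' "$tmp"    # 4754 -rwsr-xr--
rm -f "$tmp"
```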

Start DRBD:

/etc/init.d/drbd start

Now we are ready to create the metadata for the drbd0 resource (run this on both nodes):

drbdadm create-md drbd0

Restart DRBD so that it picks up the newly created metadata:

/etc/init.d/drbd restart

Run the following only on the node that should become primary (master); the other node stays secondary (slave) and becomes the sync target:

drbdadm -- --overwrite-data-of-peer primary drbd0

That's all; the DRBD device is now syncing:

root@kvm-fw1:~> drbd-overview
0:drbd0  SyncSource Primary/Secondary UpToDate/Inconsistent C r----
     [>....................] sync'ed:  1.0% (404800/408820)M
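drbd-overview is convenient interactively; for scripts the same information is in /proc/drbd, where cs: is the connection state, ro: the roles and ds: the disk states (local/peer). A minimal parsing sketch with awk; drbd_field and drbd_in_sync are hypothetical helper names, and the sample line follows the 8.3 /proc/drbd format:

```shell
#!/bin/sh
# Print the value of one /proc/drbd field (cs, ro or ds) read from stdin
drbd_field() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++)
      if (index($i, key ":") == 1) { print substr($i, length(key) + 2); exit }
  }'
}

# Succeeds once both sides report UpToDate (reads /proc/drbd-style text on stdin)
drbd_in_sync() {
  [ "$(drbd_field ds)" = "UpToDate/UpToDate" ]
}

# On a node: drbd_field cs < /proc/drbd
sample='0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----'
printf '%s\n' "$sample" | drbd_field cs   # SyncSource
printf '%s\n' "$sample" | drbd_field ro   # Primary/Secondary
printf '%s\n' "$sample" | drbd_in_sync || echo "still syncing"
```

With several resources in /proc/drbd, drbd_field reports the first one only.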

This article will help you to optimize your sync speed.

Now you can choose between static and dynamic use of the DRBD device.
Static:
Create a filesystem (on the primary node only):

mkfs.ext3 /dev/drbd0
# or
mkfs.ext4 /dev/drbd0
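Only the primary can open /dev/drbd0, so mkfs should run there and nowhere else. A small guard, assuming drbdadm role drbd0 prints the roles as local/peer (for example Primary/Secondary); is_local_primary is a hypothetical name and the demo only echoes instead of formatting anything:

```shell
#!/bin/sh
# Succeeds if the local side of a "local/peer" role string is Primary
is_local_primary() {
  case "$1" in
    Primary/*) return 0 ;;
    *)         return 1 ;;
  esac
}

# On a node (not run here):
#   if is_local_primary "$(drbdadm role drbd0)"; then mkfs.ext4 /dev/drbd0; fi
is_local_primary "Primary/Secondary" && echo "would run mkfs.ext4 /dev/drbd0"
is_local_primary "Secondary/Primary" || echo "skipping: not primary"
```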

Dynamic:
Use it as an LVM physical volume and be free to do whatever you want with it (my choice).

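A little fix for LVM: let it scan only /dev/drbd0, so that it does not also find the physical volume on the backing disk /dev/sda5: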

vim /etc/lvm/lvm.conf
# By default we accept every block device:
#filter = [ "a/.*/" ]
# We want only drbd0
filter = [ "a|drbd0|", "r|.*|" ]
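lvm.conf filters are evaluated in order: the first pattern whose regular expression matches the device path decides (a accepts, r rejects), and a device that matches no pattern is accepted. The sketch below mimics that first-match rule with grep -E to show why /dev/drbd0 is scanned and /dev/sda5 is not. It is an illustration under that assumption, not LVM's own matching code; LVM also tries other names of the same device, such as symlinks under /dev/disk. lvm_filter_accepts is a hypothetical name:

```shell
#!/bin/sh
# Succeeds if the device path passes the filter patterns under first-match rules:
# the first pattern whose regex matches decides (a = accept, r = reject);
# a device that matches no pattern is accepted.
lvm_filter_accepts() {
  dev=$1
  shift
  for pat in "$@"; do
    action=${pat%"${pat#?}"}        # first character: a or r
    rest=${pat#?}                   # e.g. |drbd0|
    delim=${rest%"${rest#?}"}       # delimiter character, here |
    regex=${rest#?}
    regex=${regex%"$delim"}         # text between the delimiters
    if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
      [ "$action" = a ]
      return
    fi
  done
  return 0
}

for d in /dev/drbd0 /dev/sda5; do
  if lvm_filter_accepts "$d" 'a|drbd0|' 'r|.*|'; then
    echo "$d: scanned"
  else
    echo "$d: ignored"
  fi
done
```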

Create a physical volume:

pvcreate /dev/drbd0

Create a volume group:

vgcreate zeldor /dev/drbd0

Create a logical volume:

lvcreate -n data --size 100g zeldor

Interested? Read more about LVM.

Troubleshooting and Maintenance section
Split-Brain Solution: primary/unknown and secondary/unknown

1. First unmount the DRBD device, if you can.
2. On the primary node, run:

drbdadm connect all

3. On the secondary (faulty) node, run the following. It discards all local data on that node and resyncs it from the primary:

drbdadm -- --discard-my-data connect all
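The procedure above boils down to: the node that keeps its data connects normally, the node that gives up its data connects with --discard-my-data. A hypothetical helper that maps the local role from drbdadm role drbd0 (for example Primary/Unknown) to the matching command from those steps; it only prints the command and runs nothing, so you still decide yourself which node is the faulty one:

```shell
#!/bin/sh
# Print the split-brain recovery command for this node, based on its local role
split_brain_step() {
  case "$1" in
    Primary/*)   echo "drbdadm connect all" ;;
    Secondary/*) echo "drbdadm -- --discard-my-data connect all" ;;
    *)           echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# On a node: split_brain_step "$(drbdadm role drbd0)"
split_brain_step Primary/Unknown     # drbdadm connect all
split_brain_step Secondary/Unknown   # drbdadm -- --discard-my-data connect all
```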

Some other commands

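Promote to primary (master):

drbdadm primary all

Demote to secondary (slave):

drbdadm secondary all

Show the disk state of a resource in human-readable form (local/peer, e.g. UpToDate/UpToDate):

drbdadm dstate drbd0

Start a manual full resync from the peer (this invalidates and overwrites all local data):

drbdadm invalidate all

Start a manual full resync of the other node (the peer's data is invalidated and overwritten with the local data):

drbdadm invalidate-remote all

If you have more than one resource, be careful and name the resource explicitly instead of using all:

drbdadm invalidate resource-name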

Comments

BJörn says

December 13, 2013 at 16:43

I did this:

sudo /etc/init.d/drbd stop
sudo /etc/init.d/drbd start
then only on the master:
sudo drbdadm primary --force all
sudo drbdadm -- --overwrite-data-of-peer -primary all
sudo /etc/init.d/drbd stop
sudo /etc/init.d/drbd start
sudo drbdadm detach all
sudo drbdadm attach all
sudo drbdadm primary opt
sudo mkfs.ext4 /dev/drbd0
sudo mount /dev/drbd0 /opt
cat /proc/drbd

… and YES, it’s synchronizing!
zeldor says

December 13, 2013 at 11:41

UpToDate sounds good so far. Now you have to tell it to become primary. As a next step you can establish the connection to the backup instance.
BJörn says

December 13, 2013 at 11:15

Hi,
now I get:

vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFBitMapS ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

and
vst@…-master:~$ tail -20 /var/log/syslog
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.785798] block drbd0: asender terminated
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.785798] block drbd0: Terminating drbd0_asender
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814645] block drbd0: bitmap WRITE of 11315 pages took 8 jiffies
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814708] block drbd0: 1414 GB (370739140 bits) marked out-of-sync by on disk bit-map.
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: Connection closed
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: conn( ProtocolError -> Unconnected )
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: receiver terminated
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: Restarting drbd0_receiver
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814752] block drbd0: receiver (re)started
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814752] block drbd0: conn( Unconnected -> WFConnection )
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536052] block drbd0: Handshake successful: Agreed network protocol version 96
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536066] block drbd0: conn( WFConnection -> WFReportParams )
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536069] block drbd0: Starting asender thread (from drbd0_receiver [15924])
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536362] block drbd0: data-integrity-alg:
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536370] block drbd0: drbd_sync_handshake:
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536372] block drbd0: self 80ACD3B404EC4838:5BA1526BD0669FF7:5BA0526BD0669FF7:5B9F526BD0669FF7 bits:370739140 flags:0
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536373] block drbd0: peer 5BA1526BD0669FF6:0000000000000000:0000000000000000:0000000000000000 bits:370739140 flags:0
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536374] block drbd0: uuid_compare()=1 by rule 70
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536375] block drbd0: Becoming sync source due to disk states.
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536377] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )

Any guess what’s going on?
zeldor says

December 13, 2013 at 11:06
Hi,

Start the initial full synchronization:

drbdadm primary --force resource

The DRBD synchronisation can be started with the following command on the master:

drbdadm -- --overwrite-data-of-peer primary all

Afterwards you can stop DRBD completely, start it again and see what happens.
BJörn says

December 13, 2013 at 10:32

Hi Zeldor (Igor),

I don’t have any data on the DRBD devices yet. So, tabula rasa (wipe everything ;-) ) would be fine! I just want to get it working quickly.

Please, could you tell me which commands I have to punch in?

Thanx.
zeldor says

December 13, 2013 at 10:20

Hi Björn (if you prefer German, let me know ;)),

your problem is:
you have two DRBD instances. They see each other and are connected, which is fine, but neither of them is primary. Just making one of them primary will not solve it.
Both are inconsistent, so you have to make one of them primary, issue a full resync and make the other one secondary. You can also discard all data on the secondary.

Do you have any production or important data on this DRBD device?
BJörn says

December 13, 2013 at 10:02

Dear Igor,

I have been on this for days now and I definitely need help. Maybe you can lend a hand with DRBD. This is what it looks like:

vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

Then I do:

vst@…-master:~$ sudo drbdadm -- --overwrite-data-of-peer primary opt
DRBD module version: 8.3.13
userland version: 8.3.11

vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

if I do cat /proc/drbd a couple of times I sometimes get

vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
[>....................] sync'ed: 0.1% (1448196/1448196)M
finish: 1029:49:51 speed: 0 (0) K/sec

but with the next cat /proc/drbd I get again

vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

Tail returns:

vst@…master:~$ tail -20 /var/log/syslog
Dec 13 09:59:16 ubu-alf-master kernel: [ 2426.004673] block drbd0: Becoming sync source due to disk states.
Dec 13 09:59:16 ubu-alf-master kernel: [ 2426.004676] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.002260] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007341] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007356] block drbd0: conn( WFBitMapS -> SyncSource )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007371] block drbd0: Began resync as SyncSource (will sync 1482956560 KB [370739140 bits set]).
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007413] block drbd0: updated sync UUID 80ACD3B404EC4839:521A526BD0669FF7:5219526BD0669FF7:5218526BD0669FF7
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.017877] block drbd0: /build/buildd/linux-lts-quantal-3.5.0/drivers/block/drbd/drbd_receiver.c:1988: sector: 0s, size: 16777216
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.020454] block drbd0: error receiving RSDataRequest, l: 24!
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.021282] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.026196] block drbd0: asender terminated
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.026196] block drbd0: Terminating drbd0_asender
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054432] block drbd0: bitmap WRITE of 11315 pages took 8 jiffies
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054500] block drbd0: 1414 GB (370739140 bits) marked out-of-sync by on disk bit-map.
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054507] block drbd0: Connection closed
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054507] block drbd0: conn( ProtocolError -> Unconnected )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054513] block drbd0: receiver terminated
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054514] block drbd0: Restarting drbd0_receiver
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054545] block drbd0: receiver (re)started
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054545] block drbd0: conn( Unconnected -> WFConnection )

I tried the Split-Brain Solution on both sides – nothing works!

Right now, there is no data on the master or the slave. So I started again from scratch to set up a DRBD system, and I always get the same issue. All configuration files look similar to yours in http://zeldor.biz/2011/07/drbd-masterslave-setup/.

Thank you for your help in advance.
Björn

Trackbacks

DRBD, Open vSwitch.. | TooMeeK says:

May 28, 2012 at 20:44

[…] in Ubuntu and DRBD for KVM cluster with GFS2, 2-Node Red Hat KVM Cluster Tutorial and DRBD master/slave setup. I’m also looking into LCMC. This entry was posted in KVM, linux by tomcio. Bookmark the […]
