zeldor.biz

Linux, programming and more

DRBD master/slave setup

July 7, 2011 by Igor Drobot 8 Comments

Lately I have received a lot of questions and feedback on some older posts (Active/Passive Cluster) where I used DRBD. I never actually wrote up how to set up DRBD, so this post fills that gap. It will also serve as groundwork for new ideas and posts related to Active/Passive clustering with KVM.

DRBD (Distributed Replicated Block Device) is a really cool cluster solution for keeping redundant data on two or more nodes. Everybody knows RAID; DRBD is like RAID 1 over the network.

Install DRBD virtual package (This will work on Debian and Ubuntu):

apt-get install drbd-utils

In my case DRBD runs under the control of my cluster software, which means DRBD is managed by corosync. So I remove DRBD from the init boot sequence:

insserv -v -r drbd

We need to load the kernel module and tell Linux to load it again on every boot:

# Load the DRBD kernel module now
modprobe drbd
# Load it automatically on every boot
echo 'drbd' >> /etc/modules
# Verify that the module is loaded
lsmod | grep drbd
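Running the echo line above twice leaves two identical entries in /etc/modules. A minimal sketch of an append that is safe to repeat could look like this; add_module_once is a hypothetical helper name, and the demo writes to a scratch file rather than the real /etc/modules:

```shell
# Append a module name to a modules file only if it is not listed yet.
# add_module_once is a hypothetical helper, not part of DRBD or Debian.
add_module_once() {
    module="$1"
    file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the name as a fixed string
    if ! grep -qxF "$module" "$file" 2>/dev/null; then
        echo "$module" >> "$file"
    fi
}

# Demo against a scratch file instead of the real /etc/modules
demo_file="$(mktemp)"
add_module_once drbd "$demo_file"
add_module_once drbd "$demo_file"   # second call changes nothing
cat "$demo_file"
rm -f "$demo_file"
```

On a real node the second argument would be /etc/modules, and the call needs root.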

By the way, this could be your network configuration for the nodes – sync over a dedicated crossover cable:

# Cluster Sync IFACE on node 1
auto eth1
iface eth1 inet static
        address 169.254.0.1
        netmask 255.255.255.248

# Cluster Sync IFACE on node 2
auto eth1
iface eth1 inet static
        address 169.254.0.4
        netmask 255.255.255.248
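Both sync addresses have to sit in the same subnet, otherwise the nodes cannot reach each other over the crossover link. As a rough sanity check, the following sketch applies the netmask to both addresses and compares the results. The helper names ip_to_int and same_subnet are made up for this example:

```shell
# Convert a dotted-quad IPv4 address to an integer.
# ip_to_int and same_subnet are hypothetical helper names.
ip_to_int() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

# Succeed if both addresses share the same network under the given netmask.
same_subnet() {
    a="$(ip_to_int "$1")"
    b="$(ip_to_int "$2")"
    m="$(ip_to_int "$3")"
    [ $(( a & m )) -eq $(( b & m )) ]
}

if same_subnet 169.254.0.1 169.254.0.4 255.255.255.248; then
    echo "both sync addresses are in the same subnet"
else
    echo "addresses are in different subnets"
fi
```

With the netmask 255.255.255.248 the usable range is small, so an address like 169.254.0.9 would already fall into a different subnet.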

Configure DRBD

vim /etc/drbd.conf
# I comment out the global include to use my own resource file
#include "drbd.d/global_common.conf";
include "drbd.d/*.res";

Define my resource:

vim /etc/drbd.d/zeldor.res
# Content of zeldor.res
global {
    usage-count no;
}

common {
  syncer { rate 100M; }
}

resource drbd0 {
  protocol C;

  handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
  }

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 100M;
    al-extents 257;
  }

  on kvm-1 {
    device     /dev/drbd0;
    disk       /dev/sda5;
    address    169.254.0.1:7788;
    flexible-meta-disk  internal;
  }

  on kvm-2 {
    device     /dev/drbd0;
    disk       /dev/sda5;
    address    169.254.0.4:7788;
    flexible-meta-disk  internal;
  }
}

The config above uses a sync rate of 100M; a rate of around 110M would be suitable to fill a gigabit interlink connection.
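To get a feeling for what the sync rate means in practice, here is a rough back-of-the-envelope sketch. It assumes the rate is in MiB per second and takes the device size (408820 M) from the drbd-overview output shown later in this post. estimate_sync_minutes is a hypothetical helper, and real syncs will be slower whenever the disks or the link cannot sustain the configured rate:

```shell
# Estimate the duration of a full sync in whole minutes.
# estimate_sync_minutes is a hypothetical helper name.
# $1: device size in MiB, $2: sync rate in MiB per second (assumed unit)
estimate_sync_minutes() {
    size_mib="$1"
    rate_mib_s="$2"
    echo $(( size_mib / rate_mib_s / 60 ))
}

# Roughly 400 GB at 100 MiB/s
estimate_sync_minutes 408820 100   # prints 68
```

So even at full speed the initial sync of this device takes a bit more than an hour.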

First start of DRBD
Before you start DRBD, fix the permissions:

chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta

Start DRBD:

/etc/init.d/drbd start

Now we are ready to create the metadata for the drbd0 device:

drbdadm create-md drbd0

Restart DRBD so that it picks up the newly created resource:

/etc/init.d/drbd restart

Execute the following on the node that should become primary (master); all other nodes will automatically be set to secondary (slave) mode:

drbdadm -- --overwrite-data-of-peer primary drbd0

That's all, the DRBD device is now syncing.

root@kvm-fw1:~> drbd-overview
0:drbd0  SyncSource Primary/Secondary UpToDate/Inconsistent C r----
     [>....................] sync'ed:  1.0% (404800/408820)M
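If you want to watch the progress from a script, the percentage can be pulled out of that output. This is only a sketch that works on text in the format shown above; sync_percent is a hypothetical helper, and the sample text is copied from this post instead of being read from a live system:

```shell
# Extract the "sync'ed" percentage from drbd-overview style text on stdin.
# sync_percent is a hypothetical helper name.
sync_percent() {
    sed -n "s/.*sync'ed:[[:space:]]*\([0-9.]*\)%.*/\1/p"
}

# Sample output copied from the post
sample="0:drbd0  SyncSource Primary/Secondary UpToDate/Inconsistent C r----
     [>....................] sync'ed:  1.0% (404800/408820)M"

printf '%s\n' "$sample" | sync_percent   # prints 1.0
```

On a real node you would pipe drbd-overview into sync_percent. The function prints nothing once the sync is finished, because the progress line is no longer there.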

This article will help you to optimize your sync speed.

Now you can choose between static and dynamic usage of the DRBD device.
Static:
You can create a filesystem directly on it:

mkfs.ext3 /dev/drbd0
# or
mkfs.ext4 /dev/drbd0

Dynamic:
You can use it for LVM and be free to do everything you want with it (my choice).

A little fix for LVM, so that it only looks at the DRBD device:

vim /etc/lvm/lvm.conf
# By default we accept every block device:
#filter = [ "a/.*/" ]
# We want only drbd0
filter = [ "a|drbd0|", "r|.*|" ]
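The filter is read from left to right and the first matching pattern decides: "a|drbd0|" accepts every device path that contains drbd0, and "r|.*|" rejects everything else. That keeps LVM away from the backing partition /dev/sda5, which carries the same data as /dev/drbd0. The following sketch only imitates that decision for a few example paths; it does not call LVM, and lvm_filter_decision is a made-up name:

```shell
# Imitate the decision of: filter = [ "a|drbd0|", "r|.*|" ]
# lvm_filter_decision is a hypothetical helper, it does not talk to LVM.
lvm_filter_decision() {
    case "$1" in
        *drbd0*) echo accept ;;   # first pattern: a|drbd0|
        *)       echo reject ;;   # second pattern: r|.*|
    esac
}

for dev in /dev/drbd0 /dev/sda5 /dev/sdb; do
    printf '%s -> %s\n' "$dev" "$(lvm_filter_decision "$dev")"
done
```

Only /dev/drbd0 is accepted here; the other two paths are rejected.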

Create a physical volume:

pvcreate /dev/drbd0

Create a volume group:

vgcreate zeldor /dev/drbd0

Create a logical volume:

lvcreate -n data --size 100g zeldor

Interested? Read more about LVM

Troubleshooting and Maintenance section
Split-Brain Solution: primary/unknown and secondary/unknown

1. First, unmount the DRBD device if you can.
2. On the primary node issue:

drbdadm connect all

3. On the secondary (faulty) node execute the following. This discards all data on that node and resyncs it from the primary:

drbdadm -- --discard-my-data connect all
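To recognise the primary/unknown or secondary/unknown situation from a script, you can look at the status line in /proc/drbd. The sketch below assumes the DRBD 8.x line format as it appears in the comments under this post (newer DRBD versions report status differently). drbd_state is a hypothetical helper and the sample line is hard-coded:

```shell
# Print "<connection state> <roles>" from a /proc/drbd status line on stdin.
# drbd_state is a hypothetical helper; it assumes the DRBD 8.x line format.
drbd_state() {
    sed -n 's/.*cs:\([A-Za-z]*\) ro:\([A-Za-z]*\/[A-Za-z]*\).*/\1 \2/p'
}

# Sample status line in the format shown in the comments below
line=' 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Inconsistent C r-----'
state="$(printf '%s\n' "$line" | drbd_state)"
echo "$state"

case "$state" in
    *"/Unknown") echo "peer role unknown: the nodes are not connected" ;;
    *)           echo "peer is visible" ;;
esac
```

An unknown peer role alone does not prove a split brain, the other node may simply be down, so check the kernel log before discarding any data.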

Some other commands

Set the node to primary (master):

drbdadm primary all

Set the node to secondary (slave):

drbdadm secondary all

Human-readable DRBD disk state:

drbdadm dstate drbd0

Start a manual resync (this invalidates all data on the local node):

drbdadm invalidate all

Start a manual resync on the other node (this invalidates the data on the peer):

drbdadm invalidate-remote all

If you have more than one resource, be careful and name the resource explicitly instead of using all:

drbdadm invalidate resource-name

Filed Under: Debian, Linux, Networking Tagged With: Cluster, Debian, DRBD, drbd syncer, HA, KVM, LVM, Ubuntu

Comments

  1. BJörn says

    December 13, 2013 at 16:43

    I did this:

    sudo /etc/init.d/drbd stop
    sudo /etc/init.d/drbd start
    then on the master only:
    sudo drbdadm primary --force all
    sudo drbdadm -- --overwrite-data-of-peer -primary all
    sudo /etc/init.d/drbd stop
    sudo /etc/init.d/drbd start
    sudo drbdadm detach all
    sudo drbdadm attach all
    sudo drbdadm primary opt
    sudo mkfs.ext4 /dev/drbd0
    sudo mount /dev/drbd0 /opt
    cat /proc/drbd

    … and YES, it’s synchronizing!

  2. zeldor says

    December 13, 2013 at 11:41

    UpToDate sounds good so far. Now you have to tell it to become primary. As the next step you can establish the connection to the backup instance.

  3. BJörn says

    December 13, 2013 at 11:15

    Hi,
    now I get:

    vst@…-master:~$ cat /proc/drbd
    version: 8.3.13 (api:88/proto:86-96)
    srcversion: C0F510A918B92928FB51EE3
    0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

    vst@…-master:~$ cat /proc/drbd
    version: 8.3.13 (api:88/proto:86-96)
    srcversion: C0F510A918B92928FB51EE3
    0: cs:WFBitMapS ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

    and
    vst@…-master:~$ tail -20 /var/log/syslog
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.785798] block drbd0: asender terminated
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.785798] block drbd0: Terminating drbd0_asender
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814645] block drbd0: bitmap WRITE of 11315 pages took 8 jiffies
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814708] block drbd0: 1414 GB (370739140 bits) marked out-of-sync by on disk bit-map.
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: Connection closed
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: conn( ProtocolError -> Unconnected )
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: receiver terminated
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: Restarting drbd0_receiver
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814752] block drbd0: receiver (re)started
    Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814752] block drbd0: conn( Unconnected -> WFConnection )
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536052] block drbd0: Handshake successful: Agreed network protocol version 96
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536066] block drbd0: conn( WFConnection -> WFReportParams )
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536069] block drbd0: Starting asender thread (from drbd0_receiver [15924])
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536362] block drbd0: data-integrity-alg:
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536370] block drbd0: drbd_sync_handshake:
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536372] block drbd0: self 80ACD3B404EC4838:5BA1526BD0669FF7:5BA0526BD0669FF7:5B9F526BD0669FF7 bits:370739140 flags:0
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536373] block drbd0: peer 5BA1526BD0669FF6:0000000000000000:0000000000000000:0000000000000000 bits:370739140 flags:0
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536374] block drbd0: uuid_compare()=1 by rule 70
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536375] block drbd0: Becoming sync source due to disk states.
    Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536377] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )

    Any guess what’s going on?

  4. zeldor says

    December 13, 2013 at 11:06

    Hi,

    Start the initial full synchronization:

    drbdadm primary --force resource

    The DRBD synchronisation can be started with the following command on the master:

    drbdadm -- --overwrite-data-of-peer primary all

    Afterwards you can stop DRBD completely, start it again, and see what happens.

  5. BJörn says

    December 13, 2013 at 10:32

    Hi Zeldor (Igor),

    I don’t have any data on the DRBD devices yet. So, tabula rasa (flatten everything ;-) ) would be fine! I just want to get it working quickly.

    Please, could you tell me which commands I have to punch in?

    Thanx.

  6. zeldor says

    December 13, 2013 at 10:20

    Hi Björn (if you prefer German, let me know ;)),

    your problem is:
    You have two DRBD instances. They see each other and are connected, which is fine. Neither of them is primary, but simply promoting one to primary will not solve it.
    Both are inconsistent, so you have to make one primary, trigger a full resync, and make the peer secondary. You can also discard all data on the secondary.

    Do you have any production or important data on this DRBD device?

  7. BJörn says

    December 13, 2013 at 10:02

    Dear Igor,

    I have been on this for days now and I definitely need help. Maybe you can lend a hand with DRBD. This is what it looks like:

    vst@…-master:~$ cat /proc/drbd
    version: 8.3.13 (api:88/proto:86-96)
    srcversion: C0F510A918B92928FB51EE3
    0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

    Then I do:

    vst@…-master:~$ sudo drbdadm -- --overwrite-data-of-peer primary opt
    DRBD module version: 8.3.13
    userland version: 8.3.11

    vst@…-master:~$ cat /proc/drbd
    version: 8.3.13 (api:88/proto:86-96)
    srcversion: C0F510A918B92928FB51EE3
    0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

    if I do cat /proc/drbd a couple of times I sometimes get

    vst@…-master:~$ cat /proc/drbd
    version: 8.3.13 (api:88/proto:86-96)
    srcversion: C0F510A918B92928FB51EE3
    0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
    [>....................] sync'ed: 0.1% (1448196/1448196)M
    finish: 1029:49:51 speed: 0 (0) K/sec

    but with the next cat /proc/drbd I get again

    vst@…-master:~$ cat /proc/drbd
    version: 8.3.13 (api:88/proto:86-96)
    srcversion: C0F510A918B92928FB51EE3
    0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560

    Tail returns:

    vst@…master:~$ tail -20 /var/log/syslog
    Dec 13 09:59:16 ubu-alf-master kernel: [ 2426.004673] block drbd0: Becoming sync source due to disk states.
    Dec 13 09:59:16 ubu-alf-master kernel: [ 2426.004676] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.002260] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007341] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007356] block drbd0: conn( WFBitMapS -> SyncSource )
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007371] block drbd0: Began resync as SyncSource (will sync 1482956560 KB [370739140 bits set]).
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007413] block drbd0: updated sync UUID 80ACD3B404EC4839:521A526BD0669FF7:5219526BD0669FF7:5218526BD0669FF7
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.017877] block drbd0: /build/buildd/linux-lts-quantal-3.5.0/drivers/block/drbd/drbd_receiver.c:1988: sector: 0s, size: 16777216
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.020454] block drbd0: error receiving RSDataRequest, l: 24!
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.021282] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.026196] block drbd0: asender terminated
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.026196] block drbd0: Terminating drbd0_asender
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054432] block drbd0: bitmap WRITE of 11315 pages took 8 jiffies
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054500] block drbd0: 1414 GB (370739140 bits) marked out-of-sync by on disk bit-map.
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054507] block drbd0: Connection closed
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054507] block drbd0: conn( ProtocolError -> Unconnected )
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054513] block drbd0: receiver terminated
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054514] block drbd0: Restarting drbd0_receiver
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054545] block drbd0: receiver (re)started
    Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054545] block drbd0: conn( Unconnected -> WFConnection )

    I tried the Split-Brain Solution on both sides, but nothing works!

    Right now there is no data on the master or the slave, so I started again from scratch to set up a DRBD system, and I always get the same issue. All configuration files look similar to yours in http://zeldor.biz/2011/07/drbd-masterslave-setup/.

    Thank you for your help in advance.
    Björn
