Lately I have been getting a lot of questions and feedback on some older posts (Active/Passive Cluster) where I used DRBD. I never actually wrote up how to set up DRBD, so I am filling that gap now. This post will also serve as groundwork for some new ideas and posts related to Active/Passive clustering with KVM.
DRBD (Distributed Replicated Block Device) is a really cool cluster solution for keeping redundant data on two or more nodes. Everybody knows RAID; DRBD is like RAID 1 over the network.
Install the DRBD virtual package (this works on Debian and Ubuntu):
    apt-get install drbd-utils
In my case DRBD runs under the control of my cluster software, which means it is managed by corosync. So I remove DRBD from the init startup sequence:
    insserv -v -r drbd
We need to load the kernel module now and tell Linux to load it again on every boot:
    modprobe drbd
    echo 'drbd' >> /etc/modules
    lsmod | grep drbd
By the way, this could be your network configuration for the nodes – sync over a dedicated crossover cable:
    # Cluster Sync IFACE on node 1
    auto eth1
    iface eth1 inet static
        address 169.254.0.1
        netmask 255.255.255.248
    # Cluster Sync IFACE on node 2
    auto eth1
    iface eth1 inet static
        address 169.254.0.4
        netmask 255.255.255.248
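As a quick sanity check that both sync addresses really land in the same /29, here is a small shell sketch that computes each network address. The helper names `ip_to_int` and `network_of` are made up for this illustration and are not part of any networking tool; the sketch sticks to plain POSIX shell arithmetic.

```shell
# Hypothetical helpers for illustration only.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the network address (as an integer) for an address and a netmask.
network_of() {
    echo $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

mask=255.255.255.248
if [ "$(network_of 169.254.0.1 "$mask")" = "$(network_of 169.254.0.4 "$mask")" ]; then
    echo "same subnet"
else
    echo "different subnets"
fi
```

With this netmask the subnet spans 169.254.0.0 to 169.254.0.7, so both nodes can talk to each other directly over the crossover cable.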
Configure DRBD
    vim /etc/drbd.conf
    # I comment this out to use my own resource file
    #include "drbd.d/global_common.conf";
    include "drbd.d/*.res";
Define my resource:
    vim /etc/drbd.d/zeldor.res

    # Content of zeldor.res
    global { usage-count no; }
    common { syncer { rate 100M; } }
    resource drbd0 {
        protocol C;
        handlers {
            pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        }
        startup {
            degr-wfc-timeout 120; # 2 minutes.
        }
        disk {
            on-io-error detach;
        }
        net {
            after-sb-0pri disconnect;
            after-sb-1pri disconnect;
            after-sb-2pri disconnect;
            rr-conflict disconnect;
        }
        syncer {
            rate 100M;
            al-extents 257;
        }
        on kvm-1 {
            device /dev/drbd0;
            disk /dev/sda5;
            address 169.254.0.1:7788;
            flexible-meta-disk internal;
        }
        on kvm-2 {
            device /dev/drbd0;
            disk /dev/sda5;
            address 169.254.0.4:7788;
            flexible-meta-disk internal;
        }
    }
A sync rate of around 110M would be enough to fill a gigabit interlink; the 100M configured above stays just below that.
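The arithmetic behind that figure, as a rough sketch: a link's raw ceiling in megabytes per second is its speed in megabits divided by eight, and protocol overhead takes a bit more off the top. The function name `link_mbytes_per_sec` is made up for this example.

```shell
# Hypothetical helper: theoretical ceiling of a link in megabytes per second
# (megabits per second divided by 8 bits per byte, overhead not included).
link_mbytes_per_sec() {
    echo $(( $1 / 8 ))
}

link_mbytes_per_sec 1000   # gigabit Ethernet
```

For a gigabit link this gives 125, so a sync rate a little above 100M is about as much as the wire can realistically carry.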
First start of DRBD
Before you start DRBD, fix the permissions:
    chgrp haclient /sbin/drbdsetup
    chmod o-x /sbin/drbdsetup
    chmod u+s /sbin/drbdsetup
    chgrp haclient /sbin/drbdmeta
    chmod o-x /sbin/drbdmeta
    chmod u+s /sbin/drbdmeta
Start DRBD:
    /etc/init.d/drbd start
Now we are ready to create the drbd0 device:
    drbdadm create-md drbd0
Restart DRBD so it picks up the newly created resource:
    /etc/init.d/drbd restart
Run this on the node that should become primary (master); all other nodes are automatically set to secondary (slave) mode:
    drbdadm -- --overwrite-data-of-peer primary drbd0
That's all, the DRBD device is now syncing.
    root@kvm-fw1:~> drbd-overview
    0:drbd0 SyncSource Primary/Secondary UpToDate/Inconsistent C r----
    [>....................] sync'ed: 1.0% (404800/408820)M
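If you want to check the sync state from a script instead of by eye, the status line in `/proc/drbd` can be picked apart with standard tools. The following is only a sketch: `drbd_field` is a made-up helper name, and the sample line is hard-coded in the DRBD 8.3 `/proc/drbd` format rather than read from a live node.

```shell
# Hypothetical helper: extract one "key:value" field (cs, ro or ds)
# from a /proc/drbd status line.
#   $1 = field name, $2 = status line
drbd_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Sample line, hard-coded for the example; on a real node it would come
# from /proc/drbd.
line=' 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----'

echo "connection: $(drbd_field cs "$line")"
echo "roles:      $(drbd_field ro "$line")"
echo "disks:      $(drbd_field ds "$line")"
```

The `cs` field is the connection state, `ro` the local/peer roles and `ds` the local/peer disk states, which is usually all a monitoring check needs.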
This article will help you to optimize your sync speed.
Now you can choose between a static or dynamic usage of the DRBD device
Static:
You can create a filesystem:
    mkfs.ext3 /dev/drbd0
    # or
    mkfs.ext4 /dev/drbd0
Dynamic:
You can use it for LVM and be free to do everything you want with it. (my choice)
A little fix for lvm:
    vim /etc/lvm/lvm.conf
    # By default we accept every block device:
    #filter = [ "a/.*/" ]
    # We want only drbd0
    filter = [ "a|drbd0|", "r|.*|" ]
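To see what this filter does to individual device paths, here is a tiny shell imitation of it. It is an illustration of this one two-entry filter only, not LVM's own matching code; the function name `lvm_filter_verdict` and the extra device `/dev/sdb1` are invented for the example.

```shell
# Mimics only the filter [ "a|drbd0|", "r|.*|" ]:
# the first pattern that matches decides, as in LVM.
lvm_filter_verdict() {
    case "$1" in
        *drbd0*) echo accept ;;   # matched by a|drbd0|
        *)       echo reject ;;   # falls through to r|.*|
    esac
}

for dev in /dev/drbd0 /dev/sda5 /dev/sdb1; do
    echo "$dev: $(lvm_filter_verdict "$dev")"
done
```

The point of rejecting everything else is presumably that LVM should find the physical volume only through `/dev/drbd0` and not a second time through the backing partition `/dev/sda5`.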
Create a physical volume:
    pvcreate /dev/drbd0
Create a volume group:
    vgcreate zeldor /dev/drbd0
Create a logical volume:
    lvcreate -n data --size 100g zeldor
Interested? Read more about LVM
Troubleshooting and Maintenance section
Split-Brain Solution: primary/unknown and secondary/unknown
1. First, unmount the DRBD device if you can.
2. On the primary node, issue:
    drbdadm connect all
3. On the secondary (faulty) node, execute the following. This discards the data on that node and resyncs it from the primary:
    drbdadm -- --discard-my-data connect all
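Because step 3 throws data away, it can help to make the choice explicit before typing anything. The sketch below only prints the command from the steps above for a given role and runs nothing; `split_brain_cmd` and its `keep`/`discard` arguments are names invented for this example.

```shell
# Hypothetical dry-run helper: print (do not execute) the recovery command
# for a node, depending on whether its data is kept or discarded.
split_brain_cmd() {
    case "$1" in
        keep)    echo "drbdadm connect all" ;;
        discard) echo "drbdadm -- --discard-my-data connect all" ;;
        *)       echo "usage: split_brain_cmd keep|discard" >&2; return 1 ;;
    esac
}

split_brain_cmd keep      # node whose data survives
split_brain_cmd discard   # node that will be overwritten
```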
Some other commands
Promote the node to primary (master):
    drbdadm primary all
Demote the node to secondary (slave):
    drbdadm secondary all
Human-readable disk state of a resource:
    drbdadm dstate drbd0
Start a manual resync (this invalidates the local data, which is then resynced from the peer):
    drbdadm invalidate all
Start a manual resync on the other node:
    drbdadm invalidate-remote all
If you have more than one resource, be careful and name the resource explicitly:
    drbdadm invalidate resource-name
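One way to avoid invalidating the wrong thing on a multi-resource setup is a small guard that refuses `all`. This is a sketch only: `invalidate_cmd` is a made-up name, and it prints the command instead of running it.

```shell
# Hypothetical guard: refuse "all" (or an empty name) and print the
# invalidate command for exactly one named resource.
invalidate_cmd() {
    case "$1" in
        ""|all) echo "refusing: name a single resource" >&2; return 1 ;;
        *)      echo "drbdadm invalidate $1" ;;
    esac
}

invalidate_cmd drbd0
```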
BJörn says
I did this:
sudo /etc/init.d/drbd stop
sudo /etc/init.d/drbd start
then on the master only:
sudo drbdadm primary --force all
sudo drbdadm -- --overwrite-data-of-peer primary all
sudo /etc/init.d/drbd stop
sudo /etc/init.d/drbd start
sudo drbdadm detach all
sudo drbdadm attach all
sudo drbdadm primary opt
sudo mkfs.ext4 /dev/drbd0
sudo mount /dev/drbd0 /opt
cat /proc/drbd
… and YES, it’s synchronizing!
zeldor says
UpToDate sounds good so far. Now you have to tell it to be primary. As the next step you can establish the connection to the backup instance.
BJörn says
Hi,
now I get:
vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFBitMapS ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
and
vst@…-master:~$ tail -20 /var/log/syslog
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.785798] block drbd0: asender terminated
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.785798] block drbd0: Terminating drbd0_asender
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814645] block drbd0: bitmap WRITE of 11315 pages took 8 jiffies
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814708] block drbd0: 1414 GB (370739140 bits) marked out-of-sync by on disk bit-map.
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: Connection closed
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: conn( ProtocolError -> Unconnected )
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: receiver terminated
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814715] block drbd0: Restarting drbd0_receiver
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814752] block drbd0: receiver (re)started
Dec 13 11:14:00 ubu-alf-master kernel: [ 6910.814752] block drbd0: conn( Unconnected -> WFConnection )
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536052] block drbd0: Handshake successful: Agreed network protocol version 96
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536066] block drbd0: conn( WFConnection -> WFReportParams )
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536069] block drbd0: Starting asender thread (from drbd0_receiver [15924])
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536362] block drbd0: data-integrity-alg:
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536370] block drbd0: drbd_sync_handshake:
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536372] block drbd0: self 80ACD3B404EC4838:5BA1526BD0669FF7:5BA0526BD0669FF7:5B9F526BD0669FF7 bits:370739140 flags:0
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536373] block drbd0: peer 5BA1526BD0669FF6:0000000000000000:0000000000000000:0000000000000000 bits:370739140 flags:0
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536374] block drbd0: uuid_compare()=1 by rule 70
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536375] block drbd0: Becoming sync source due to disk states.
Dec 13 11:14:01 ubu-alf-master kernel: [ 6911.536377] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Any guess what’s going on?
zeldor says
Hi,
Start the initial full synchronization. On the master, it can be started with the following command:
drbdadm primary --force resource
Afterwards you can stop DRBD completely, start it again, and see what happens.
BJörn says
Hi Zeldor (Igor),
I don’t have any data on the DRBD devices yet. So, tabula rasa (flatten everything ;-) ) would be fine! I just want to get it working quickly.
Please, could you tell me which commands I have to punch in?
Thanx.
zeldor says
Hi Björn (if you prefer German, let me know ;)),
your problem is this:
You have two DRBD instances. They see each other and are connected, which is fine. Neither of them is primary. Simply promoting one of them to primary will not solve it.
Both are inconsistent, so you have to make one primary, issue a full resync, and make the peer secondary. You can also discard all data on the secondary.
Do you have any production or otherwise important data on this DRBD device?
BJörn says
Dear Igor,
I have been at this for days now and I definitely need help. Maybe you can lend a hand with DRBD. This is what it looks like:
vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
Then I do:
vst@…-master:~$ sudo drbdadm -- --overwrite-data-of-peer primary opt
DRBD module version: 8.3.13
userland version: 8.3.11
vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
if I do cat /proc/drbd a couple of times I sometimes get
vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
[>....................] sync'ed: 0.1% (1448196/1448196)M
finish: 1029:49:51 speed: 0 (0) K/sec
but with the next cat /proc/drbd I get again
vst@…-master:~$ cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1482956560
Tail returns:
vst@…master:~$ tail -20 /var/log/syslog
Dec 13 09:59:16 ubu-alf-master kernel: [ 2426.004673] block drbd0: Becoming sync source due to disk states.
Dec 13 09:59:16 ubu-alf-master kernel: [ 2426.004676] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.002260] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007341] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007356] block drbd0: conn( WFBitMapS -> SyncSource )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007371] block drbd0: Began resync as SyncSource (will sync 1482956560 KB [370739140 bits set]).
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.007413] block drbd0: updated sync UUID 80ACD3B404EC4839:521A526BD0669FF7:5219526BD0669FF7:5218526BD0669FF7
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.017877] block drbd0: /build/buildd/linux-lts-quantal-3.5.0/drivers/block/drbd/drbd_receiver.c:1988: sector: 0s, size: 16777216
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.020454] block drbd0: error receiving RSDataRequest, l: 24!
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.021282] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.026196] block drbd0: asender terminated
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.026196] block drbd0: Terminating drbd0_asender
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054432] block drbd0: bitmap WRITE of 11315 pages took 8 jiffies
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054500] block drbd0: 1414 GB (370739140 bits) marked out-of-sync by on disk bit-map.
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054507] block drbd0: Connection closed
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054507] block drbd0: conn( ProtocolError -> Unconnected )
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054513] block drbd0: receiver terminated
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054514] block drbd0: Restarting drbd0_receiver
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054545] block drbd0: receiver (re)started
Dec 13 09:59:17 ubu-alf-master kernel: [ 2427.054545] block drbd0: conn( Unconnected -> WFConnection )
I tried the Split-Brain Solution on both sides, and nothing works!
Right now there is no data on the master or the slave, so I started from scratch again to set up a DRBD system, and I always run into the same issue. All configuration files look similar to yours in http://zeldor.biz/2011/07/drbd-masterslave-setup/.
Thank you for your help in advance.
Björn