Introduction
DRBD is a system that allows you to create software RAID1 over a local network.
This enables high availability and resource sharing on a cluster without a disk array.
Here we will install DRBD8, with the goal of implementing a cluster filesystem (see the documentation on OCFS2), which DRBD7 does not support.
We’ll use the DRBD8 packages from Debian repositories. We’ll work on a 2-node cluster.
Installation
First, install the following package:
aptitude install drbd8-utils
Then we’ll load the module and make it persistent (for future reboots):
modprobe drbd
echo "drbd" >> /etc/modules
Configuration
drbd.conf
The default drbd.conf is fine as it is: it only includes other files, which keeps the configuration extensible:
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
I didn’t modify it.
global_common.conf
This is the default file; it can contain host configurations, but it also lets you define settings shared by all your DRBD resources (the common section):
# Global configuration
global {
    # Do not report usage statistics to LinBit
    usage-count no;
}
# All resources inherit the options set in this section
common {
    # Protocol C (synchronous replication)
    protocol C;
    startup {
        # Wait for connection timeout (in seconds)
        wfc-timeout 1;
        # Wait for connection timeout if this node was part of a degraded cluster (in seconds)
        degr-wfc-timeout 1;
    }
    net {
        # Maximum number of requests to be allocated by DRBD
        max-buffers 8192;
        # The highest number of data blocks between two write barriers
        max-epoch-size 8192;
        # The size of the TCP socket send buffer
        sndbuf-size 512k;
        # How often the I/O subsystem's controller is forced to process pending I/O requests
        unplug-watermark 8192;
        # The HMAC algorithm used for peer authentication
        cram-hmac-alg sha1;
        # The shared secret used in peer authentication
        shared-secret "xxx";
        # Split-brain policies
        # Split brain detected, resource is not in the Primary role on any host
        after-sb-0pri disconnect;
        # Split brain detected, resource is in the Primary role on one host
        after-sb-1pri disconnect;
        # Split brain detected, resource is in the Primary role on both hosts
        after-sb-2pri disconnect;
        # Helps to solve cases where the outcome of the resync decision is incompatible with the current role assignment
        rr-conflict disconnect;
    }
    handlers {
        # Run if the node is primary, degraded and the local copy of the data is inconsistent
        pri-on-incon-degr "echo Current node is primary, degraded and the local copy of the data is inconsistent | wall ";
    }
    disk {
        # Downgrade the disk status to inconsistent on I/O errors
        on-io-error pass_on;
        # Disable write barriers (protection against power failure is handled by the hardware)
        no-disk-barrier;
        # Disable disk flushes on the backing device
        no-disk-flushes;
        # Do not let write requests drain before write requests of a new reordering domain are issued
        no-disk-drain;
        # Disable the use of disk flushes and barrier BIOs when accessing the meta-data device
        no-md-flushes;
    }
    syncer {
        # The maximum bandwidth a resource uses for background re-synchronization
        rate 500M;
        # Control how big the hot area (= active set) can get
        al-extents 3833;
    }
}
I’ve commented all my changes.
Now we’ll create a file (for example /etc/drbd.d/r0.res, so that it matches the drbd.d/*.res include above) to define our resource r0:
resource r0 {
    # Node 1
    on srv1 {
        device /dev/drbd0;
        # Disk containing the DRBD partition
        disk /dev/mapper/datas-drbd;
        # IP address of this host
        address 192.168.100.1:7788;
        # Store metadata on the same device
        meta-disk internal;
    }
    # Node 2
    on srv2 {
        device /dev/drbd0;
        disk /dev/mapper/lvm-drbd;
        address 192.168.20.4:7788;
        meta-disk internal;
    }
}
Synchronization
We need to launch the first sync now.
On both nodes, start by creating the DRBD metadata for the resource (shown here for the resource r0 defined above):
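# initialize the DRBD metadata on the backing device
drbdadm create-md r0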
Still on both nodes, run this command to activate (attach and connect) the resource:
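# attach the backing disk and connect to the peer
drbdadm up r0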
Then, on the first node only, we’ll start the initial block-by-block replication:
drbdadm -- --overwrite-data-of-peer primary r0
Then we’ll have to wait for the sync to finish before continuing:
> cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
ns:912248 nr:0 dw:0 dr:920640 al:0 bm:55 lo:1 pe:388 ua:2048 ap:0 ep:1 wo:b oos:3283604
[===>................] sync'ed: 21.9% (3283604/4194304)K
finish: 1:08:24 speed: 580 (452) K/sec
Reading /proc/drbd lets you follow the replication status. At the end, you should have something like this:
> cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
ns:0 nr:4194304 dw:4194304 dr:0 al:0 bm:256 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
If you want to run in dual-primary (dual master) mode, these options must be set in the configuration:
resource <resource> {
    startup {
        become-primary-on both;
    }
    net {
        protocol C;
        allow-two-primaries yes;
    }
}
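If you add these options to an already running cluster, you can apply them without restarting DRBD, for example by running this on both nodes:

# tell DRBD to pick up the changed configuration
drbdadm adjust all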
Now we can activate the other node as primary as well; on the second node (assuming resource r0):
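# promote this node to primary (allowed on both nodes once allow-two-primaries is set)
drbdadm primary r0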
Once the synchronization is complete, DRBD is installed and properly configured.
You now need to format the device /dev/drbd0 with a filesystem: ext3, for example, for active/passive, or a cluster filesystem such as OCFS2 (there are others, like GFS2) if you want active/active.
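For example, depending on the mode you chose:

# active/passive
mkfs.ext3 /dev/drbd0
# active/active (requires a working OCFS2 cluster configuration, see the OCFS2 documentation)
mkfs.ocfs2 /dev/drbd0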
Then mount the volume in a folder to access the data:
mount /dev/drbd0 /mnt/data
Only a primary node can mount and access the data on the DRBD volume.
When DRBD works with HeartBeat in CRM mode, if the primary node goes down, the cluster is able to switch the secondary node to primary.
When the old primary is “UP” again, it will synchronize and become a secondary in turn.
Become master
To set all volumes as primary:
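# set every DRBD resource on this node to primary
drbdadm primary all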
Note: Replace all with the name of your volume if you only want to operate on one.
Become slave
To set a volume as slave:
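# replace r0 with the name of your resource (or use "all" for every resource)
drbdadm secondary r0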
Manual synchronization
To start a manual synchronization (will invalidate all your data):
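# mark the local data as outdated and force a full resync from the peer
drbdadm invalidate all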
To do the same from the other side (invalidate the peer's data instead):
drbdadm invalidate_remote all
My sync doesn’t work, I have: Secondary/Unknown
If you have this type of message:
> cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4194304
You need to check that the machines are properly configured for your resources and that they can reach each other on the DRBD port (firewall rules, etc.), for example with telnet.
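A quick way to test this, using the hostname and port from the resource configuration above:

telnet srv2 7788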
What to do in case of split brain?
If you find yourself in this situation:
> cat /proc/drbd
primary/unknown
Proceed as follows:
- Unmount the drbd volumes
- On the primary:
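# on the node whose data you want to keep, simply reconnect the resource
drbdadm connect all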
- On the secondary (this will destroy its local data and resynchronize everything from the primary):
drbdadm -- --discard-my-data connect all