BTRFS: Using the Ext4 Replacement
Software version | 0.19 |
Operating System | Debian 7 |
Website | BTRFS Website |
Others | Kernel used: 3.2.0-2-amd64 |
Introduction
BTRFS is a strong candidate to replace the aging ExtX family of filesystems. Those familiar with the ZFS filesystem will find that BTRFS draws heavily from it.
BTRFS, like Ext4, is based on the concept of extents. An extent is a contiguous area (which can reach several hundred MB, unlike the clusters of some older formats) reserved each time a file is saved on the hard drive. When data is appended to a file, or the file is completely rewritten, the new data can often be added directly to the extent rather than to another area of the disk, which would increase fragmentation. Large files are thus stored more efficiently, at the price of a higher disk space usage, a cost that has decreased considerably. BTRFS stores the data of very small files inline with the metadata rather than in a separate extent.
BTRFS supports “subvolumes”: separate trees within the filesystem (each with its own root) containing directories and files. Several trees can therefore coexist, each largely independent of the main system. This also allows better separation of data and different quotas on different subvolumes. The most practical use of this system concerns snapshots. A snapshot “takes a photograph” of a filesystem at a given moment in order to back it up. Under BTRFS a snapshot is itself a subvolume, so it can be modified afterward. Having a writable snapshot is of obvious interest for high-availability online databases.
To implement subvolumes and snapshots, BTRFS uses the classic “copy-on-write” technique. When data is written to a disk block, the block is copied to another location in the filesystem and the new data is recorded on the copy instead of on the original. The metadata pointing to the block is then automatically updated to take the new data into account. This gives a transactional mechanism distinct from the journaling present in Ext3. Taking a snapshot before each write would make it possible to return to that snapshot in case of a problem, but this raises, if not performance problems, at least questions: should you take a snapshot at each write, or after a certain volume of data? There is also the time lost at each creation and destruction of a snapshot. The developers do not emphasize this use of snapshots.
BTRFS has its own data protection techniques: the use of back references (i.e., knowing, from a data block, which metadata points to the block) allows identification of system corruptions. If a file claims to belong to a set of blocks and these blocks claim to be related to another file, this indicates that the consistency of the system is altered. BTRFS also performs checksums on all data and stored metadata to detect all kinds of corruptions on the fly, repair some of them, and thus offer a better level of reliability.
BTRFS allows hot resizing of the filesystem (including shrinking it) while maintaining excellent protection of the metadata, which is duplicated in several places for safety. The operation is simple: btrfsctl -r +2g /mnt adds 2 GiB to your filesystem (with the newer btrfs tool, the equivalent is btrfs filesystem resize +2g /mnt). This function is not intended to duplicate what the Linux logical volume manager offers, but to complement it.
Filesystem checking with the btrfsck program is error-tolerant and, by design, presented as extremely fast. The use of B-trees allows the disk structure to be explored at a speed essentially limited by the disk’s read speed. The price to pay is a large memory footprint, since btrfsck uses three times more memory than e2fsck.
BTRFS respects the hierarchy of Linux’s functional “layers”. For example, while offering functions to complement it, it tries as much as possible not to rewrite the whole volume management system proposed as standard by LVM.
Patches adding Google’s lightweight and fast Snappy compression algorithm were proposed in January 2012, with reported data access around 10% faster than with LZO compression and around 15% faster than without compression.
Patches for the LZ4 compression algorithm followed in February 2012, with reported performance about 20-30% better than Snappy. The kernel used here supports zlib and LZO compression.
Installation
We’ll need to install the BTRFS tools:
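As a minimal sketch, assuming the tools are shipped in the Debian package btrfs-tools, the installation could look like this. The run helper is my own addition, not part of any tool: it only prints each command unless DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

install_btrfs_tools() {
    run apt-get update
    run apt-get install -y btrfs-tools   # assumed package name on Debian 7
}

install_btrfs_tools
```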
Usage
Many of the examples below assume a disk layout in which sda3 and sda4 are the partitions we’ll be working with.
Creating a BTRFS Partition
I have a 3GB partition here (sda4). I’ll format it as BTRFS:
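A sketch of the formatting step, assuming /dev/sda4 holds nothing you want to keep (mkfs.btrfs destroys existing data). The run helper is a hypothetical safety wrapper that only prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Create a BTRFS filesystem on the given block device.
format_btrfs() {
    run mkfs.btrfs "$1"
}

format_btrfs /dev/sda4
```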
Note: It is strongly recommended to create BTRFS filesystems on top of LVM, which makes future hot resizing much easier!
My partition is ready to be mounted:
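Mounting could be sketched as follows, assuming /mnt as the mount point (my choice for illustration). As before, the hypothetical run helper prints the command instead of executing it unless DRY_RUN=0.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Mount a BTRFS device on a mount point.
mount_btrfs() {
    run mount -t btrfs "$1" "$2"
}

mount_btrfs /dev/sda4 /mnt
```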
And we can see that the partition is correctly mounted:
Subvolumes
Just like ZFS, BTRFS lets you create subvolumes. Within a formatted partition (think of it as a VG under LVM), you can create subvolumes (comparable to LVs under LVM), which give great flexibility in how the data is organized.
Let’s create our first subvolume:
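A minimal sketch, assuming the filesystem is mounted on /mnt and the subvolume is called volume1 as in the rest of this section. The run helper is hypothetical and only prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Create subvolume <name> at the root of the mounted filesystem <mountpoint>.
create_subvolume() {
    run btrfs subvolume create "$1/$2"
}

create_subvolume /mnt volume1
```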
A subvolume appears in the directory tree as a directory at the root of the volume’s mount point.
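Subvolumes can be listed with btrfs subvolume list. The sketch below assumes each output line has the form "ID <number> ... path <name>" and uses an illustrative sample line rather than real output; parse_subvolumes is a hypothetical helper that keeps only the identifier and the path.

```shell
# On a real system the listing would come from: btrfs subvolume list /mnt
# Illustrative line (assumed format; the ID 145 is the one discussed in the text).
sample='ID 145 top level 5 path volume1'

# Print "<id> <path>" for each subvolume line read on stdin.
# Note: paths containing spaces are not handled by this simple sketch.
parse_subvolumes() {
    awk '$1 == "ID" { print $2, $NF }'
}

printf '%s\n' "$sample" | parse_subvolumes
```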
The number 145 uniquely identifies our subvolume. The volume1 path is also indicated. The volume1 path, which is also the name of the subvolume, is relative to the root mount of our btrfs volume.
To mount a subvolume at the same location:
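One way to do this, sketched under the assumption that the volume lives on /dev/sda4 and is currently mounted on /mnt, is to unmount it and remount with the subvol option. The run helper is hypothetical and prints commands by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Replace the mounted volume with one of its subvolumes, selected by name.
mount_subvolume() {
    run umount "$3"
    run mount -o "subvol=$2" "$1" "$3"
}

mount_subvolume /dev/sda4 volume1 /mnt
```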
You can also mount your subvolume from its identifier:
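The same mount by identifier could look like this, assuming the ID 145 from the listing and the device /dev/sda4. Only the subvolid mount option differs; the run helper is again a hypothetical dry-run wrapper.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Mount a subvolume selected by its numeric identifier.
mount_subvolume_by_id() {
    run mount -o "subvolid=$2" "$1" "$3"
}

mount_subvolume_by_id /dev/sda4 145 /mnt
```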
Converting an extX Partition to BTRFS
It’s possible to convert an ext3 or ext4 partition to btrfs! In this case, I’ll convert sda3, which is already in ext4. We’ll use the btrfs-convert command:
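A cautious sketch of the conversion, assuming /dev/sda3 is unmounted and backed up. Checking the ext4 filesystem with e2fsck first is my own addition, not a step stated above; the run helper is hypothetical and prints commands by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Check the ext filesystem, then convert it in place to btrfs.
convert_to_btrfs() {
    run e2fsck -f "$1"
    run btrfs-convert "$1"
}

convert_to_btrfs /dev/sda3
```

As far as I know, btrfs-convert -r can roll the conversion back as long as the saved image of the original filesystem has not been deleted.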
And that’s all there is to it :-). I can now mount it:
And verify that everything is good:
Resizing a Partition
Method 1: Filesystem Expansion
Growing a filesystem on the fly is very simple as long as it sits on LVM, or the underlying partition is already at least as large as the desired size. Otherwise, the partition must first be expanded offline, then the filesystem. You can give the filesystem a specific size x instead of having it take all the available space (resize2fs accepts a size in the same way).
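A sketch of both variants, assuming the filesystem is mounted on /mnt and that 2 GiB is the amount to add (an arbitrary figure for illustration). The run helper is hypothetical and prints commands by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Grow the mounted filesystem by a given amount, e.g. +2g,
# or pass "max" to take all the space available on the device.
grow_fs() {
    run btrfs filesystem resize "$1" "$2"
}

grow_fs +2g /mnt
grow_fs max /mnt
```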
Method 2: Adding a Device
We have another possibility in case our current volume becomes too small… we can add a device to an existing volume. First, let’s get a status of the btrfs volumes present on our system:
If I look at my mounted partitions:
I have my sda3 which is 1G. We’ll add sda4 to it:
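Adding the device could be sketched like this, assuming sda3 is mounted on /mnt. The final balance step, which spreads existing data over both devices, is an optional addition of mine, and the run helper is a hypothetical dry-run wrapper.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Add a device to a mounted btrfs volume and show the result.
add_device() {
    run btrfs device add "$1" "$2"
    run btrfs filesystem show
    run btrfs filesystem balance "$2"   # optional: redistribute existing data
}

add_device /dev/sda4 /mnt
```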
Let’s check that /mnt has been enlarged:
Method 3: RAID 0
There is a solution like RAID 0, but with data distribution and metadata replication of the filesystem on all disks:
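A sketch of such a layout, assuming sda3 and sda4 are both empty: the -d and -m options of mkfs.btrfs select the data and metadata profiles. The run helper is hypothetical and prints the command by default, since mkfs.btrfs is destructive.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Stripe data (raid0) and mirror metadata (raid1) across the given devices.
make_raid0() {
    run mkfs.btrfs -d raid0 -m raid1 "$@"
}

make_raid0 /dev/sda3 /dev/sda4
```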
The data is striped (RAID 0) across the partitions, while the system and metadata chunks are mirrored (RAID 1). This means that even if you lose one of the partitions, you will still be able to mount the remaining one. However, since the data is distributed rather than replicated, you will have lost the data stored on the missing partition.
Reducing a Partition
Shrinking a filesystem on the fly is just as simple: specify the amount to remove, then the mount point of the filesystem to shrink:
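For instance, assuming /mnt is the mount point and 500 MiB is the amount to remove (an arbitrary figure), a sketch could be the following. The run helper is hypothetical and prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Shrink the mounted filesystem by the given amount, e.g. 500m.
shrink_fs() {
    run btrfs filesystem resize "-$1" "$2"
}

shrink_fs 500m /mnt
```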
And that’s it :-)
RAID 1
Software RAID 1 is also possible. Remember that the disks should be the same size; otherwise the usable capacity is limited by the smallest disk. Let’s start by initializing our RAID:
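A minimal sketch, assuming sda3 and sda4 are the two mirror members and hold no data you need. Both data and metadata use the raid1 profile here; the run helper is hypothetical and prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Mirror both data and metadata across the given devices.
make_raid1() {
    run mkfs.btrfs -d raid1 -m raid1 "$@"
}

make_raid1 /dev/sda3 /dev/sda4
```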
Then you can mount either sda3 or sda4; replication is handled automatically :-)
Compression
Cold Method
It’s possible to have a compressed filesystem. For this, nothing could be simpler:
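Compression is enabled with a mount option. The sketch below assumes the device /dev/sda4 and the mount point /mnt; compress=lzo can be used instead of compress to select LZO rather than the default zlib. The run helper is hypothetical and prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Mount with transparent compression; $3 is the mount option to use.
mount_compressed() {
    run mount -o "$3" "$1" "$2"
}

mount_compressed /dev/sda4 /mnt compress
```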
With the compress-force option, btrfs compresses every file, even those that appear hard to compress. By default, btrfs compresses less aggressively, because compressing large files can generate a lot of I/O: the on-the-fly compression logic spares the processor by giving up on a file when its first writes show that it compresses poorly:
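Forcing compression only changes the mount option. This sketch assumes the same device and mount point as before, with a hypothetical run helper that prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Mount with compression forced on every file.
mount_compress_force() {
    run mount -o compress-force "$1" "$2"
}

mount_compress_force /dev/sda4 /mnt
```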
Hot Method
To enable compression on a btrfs filesystem that is already mounted, use the following command, which activates the compression option and compresses the data already present on the disk:
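One possible approach, which is my assumption rather than necessarily the original command, is to remount with the compress option and then defragment existing files with -c so that they are rewritten compressed. Files are enumerated with find because I am not certain this version of the tools can recurse on its own; the run helper is hypothetical and prints commands by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Turn compression on for a mounted filesystem, then recompress existing files.
enable_compression_live() {
    run mount -o remount,compress "$1"
    run find "$1" -xdev -type f -exec btrfs filesystem defragment -c {} +
}

enable_compression_live /mnt
```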
Snapshot
There are several complementary tools that allow managing snapshots such as Snapper or yum-plugin-fs-snapshot on Fedora/RedHat. But for now, we’ll see how to manage snapshots the standard btrfs way.
From a volume, we create a snapshot in which we can safely modify the file tree:
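A sketch of the snapshot creation, assuming the volume is mounted on /mnt and the snapshot is called snap1 (a name chosen for illustration). The run helper is hypothetical and prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Snapshot the subvolume at <source> into <destination>.
take_snapshot() {
    run btrfs subvolume snapshot "$1" "$2"
}

take_snapshot /mnt /mnt/snap1
```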
We now unmount the current volume and mount our snapshot in its place, then create a new file in it:
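These steps could be sketched as follows, assuming the device /dev/sda4, a snapshot named snap1 and a test file called new_file (all names are illustrative). The run helper is hypothetical and prints commands by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Swap the mounted volume for the snapshot and modify it.
work_in_snapshot() {
    run umount "$3"
    run mount -o "subvol=$2" "$1" "$3"
    run touch "$3/new_file"
}

work_in_snapshot /dev/sda4 snap1 /mnt
```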
Revert
To cancel our modifications, simply unmount the snapshot and mount the original volume again:
Merge
To merge the data, we first need to retrieve the ID of our subvolume. The set-default command, followed by the subvolume ID and then the original volume’s mount point, declares it as the new default volume:
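A sketch of both steps, assuming the output format of btrfs subvolume list is "ID <number> ... path <name>". The listing below is illustrative, not real output: the snapshot name snap1 and its ID 146 are invented for the example. subvol_id and run are hypothetical helpers, and run prints the command by default.

```shell
# Dry-run helper (hypothetical): prints each command; set DRY_RUN=0 (as root) to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Print the ID of the subvolume whose path is $1, reading a listing on stdin.
subvol_id() {
    awk -v p="$1" '$1 == "ID" && $NF == p { print $2 }'
}

# Illustrative listing; on a real system use: btrfs subvolume list /mnt
listing='ID 145 top level 5 path volume1
ID 146 top level 5 path snap1'

id=$(printf '%s\n' "$listing" | subvol_id snap1)
run btrfs subvolume set-default "$id" /mnt
```

The new default takes effect the next time the filesystem is mounted without an explicit subvol or subvolid option.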
References
https://fr.wikipedia.org/wiki/Btrfs
https://www.funtoo.org/wiki/BTRFS_Fun
https://www.rashardkelly.com/extending-a-btrfs-filesystem-2/
Last updated 05 Jul 2012, 21:08 CEST.