GlusterFS: High Availability Cluster Filesystem
Introduction
GlusterFS is an open source, distributed, parallel file system capable of scaling to several petabytes. It is a cluster/network file system made of two components: a server and a client. The storage server (or each server in a cluster) runs glusterfsd, and clients mount the exported file systems with the mount command or the glusterfs client, using FUSE.
The goal here is to run 2 servers that will perform complete replication of part of a filesystem.
Be careful not to run this type of architecture across the Internet, as performance would be catastrophic: whenever a node wants to read a file, it must first contact all the other nodes to check for discrepancies, and only then is the read allowed, which can take a long time depending on the architecture.
Installation
Installing on Debian is easy:
aptitude install glusterfs-server glusterfs-examples
Configuration
hosts
As with any respectable cluster, we must configure the hosts table correctly to avoid trouble if DNS becomes unavailable. Add the following hosts:
(/etc/hosts)
192.168.110.2 rafiki.deimos.fr rafiki
192.168.20.6 ed.deimos.fr ed
Generating Configurations
We’ll simplify things here by generating a configuration for a RAID 1:
cd /etc/glusterfs
rm -f *
/usr/bin/glusterfs-volgen --name www --raid 1 rafiki:/var/www-orig ed:/var/www-orig
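glusterfs-volgen writes its output to the current directory: one export volfile per server plus a shared client volfile. With the names used above you should end up with something like the following (the ed-www-export.vol name is inferred from the hostname):
ls
ed-www-export.vol  rafiki-www-export.vol  www-tcp.vol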
Then, on each server, rename the export file that matches that server to glusterfsd.vol and the tcp file to glusterfs.vol. On rafiki, for example:
mv rafiki-www-export.vol glusterfsd.vol
mv www-tcp.vol glusterfs.vol
Don’t forget to do the same on the other server, then restart your GlusterFS server on both nodes.
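On ed, for example, and assuming the Debian package ships its init script as glusterfs-server, this would be:
mv ed-www-export.vol glusterfsd.vol
mv www-tcp.vol glusterfs.vol
/etc/init.d/glusterfs-server restart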
Server
On the server side, we’ll apply this configuration:
(/etc/glusterfs/glusterfsd.vol)
### file: server-volume.vol.sample
#####################################
### GlusterFS Server Volume File ##
#####################################
#### CONFIG FILE RULES:
### "#" is comment character.
### - Config file is case sensitive
### - Options within a volume block can be in any order.
### - Spaces or tabs are used as delimiter within a line.
### - Multiple values to options will be : delimited.
### - Each option should end within a line.
### - Missing or commented fields will assume default values.
### - Blank/commented lines are allowed.
### - Sub-volumes should already be defined above before referring.
volume posix1
  type storage/posix
  option directory /var/www-orig
end-volume

volume locks1
  type features/locks
  subvolumes posix1
end-volume

volume brick1
  type performance/io-threads
  option thread-count 8
  subvolumes locks1
end-volume

volume server-tcp
  type protocol/server
  option transport-type tcp
  option auth.addr.brick1.allow *
  option transport.socket.listen-port 6996
  option transport.socket.nodelay on
  subvolumes brick1
end-volume
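Note that the storage/posix translator does not create the backend directory for you, so make sure it exists on both servers:
mkdir -p /var/www-orig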
Client
For the client part, we declare a replicated (“raid1”) volume. Here is the configuration to apply on the “ed” node:
(/etc/glusterfs/glusterfs.vol)
### file: client-volume.vol.sample
#####################################
### GlusterFS Client Volume File ##
#####################################
#### CONFIG FILE RULES:
### "#" is comment character.
### - Config file is case sensitive
### - Options within a volume block can be in any order.
### - Spaces or tabs are used as delimiter within a line.
### - Each option should end within a line.
### - Missing or commented fields will assume default values.
### - Blank/commented lines are allowed.
### - Sub-volumes should already be defined above before referring.
# RAID 1
# TRANSPORT-TYPE tcp
volume ed-1
  type protocol/client
  option transport-type tcp
  option remote-host ed
  option transport.socket.nodelay on
  option transport.remote-port 6996
  option remote-subvolume brick1
end-volume

volume rafiki-1
  type protocol/client
  option transport-type tcp
  option remote-host rafiki
  option transport.socket.nodelay on
  option transport.remote-port 6996
  option remote-subvolume brick1
end-volume

volume mirror-0
  type cluster/replicate
  subvolumes rafiki-1 ed-1
end-volume

volume readahead
  type performance/read-ahead
  option page-count 4
  subvolumes mirror-0
end-volume

volume iocache
  type performance/io-cache
  option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
  option cache-timeout 1
  subvolumes readahead
end-volume

volume quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes iocache
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes quickread
end-volume

volume statprefetch
  type performance/stat-prefetch
  subvolumes writebehind
end-volume
Execution
Server
Restart GlusterFS after adapting the configuration to your needs.
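On Debian, assuming the init script is named glusterfs-server, a restart plus a quick check that the server-tcp translator is listening on the configured port (6996) looks like this:
/etc/init.d/glusterfs-server restart
netstat -plnt | grep 6996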
Client
Simply mount the GlusterFS volume (without -f, the client reads /etc/glusterfs/glusterfs.vol by default):
glusterfs /var/www
You now have access to your glusterfs mount point in /var/www.
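To make the mount persistent across reboots, the client volfile can also be referenced from /etc/fstab. This is a sketch, assuming the FUSE-based mount helper shipped with this GlusterFS release:
(/etc/fstab)
/etc/glusterfs/glusterfs.vol /var/www glusterfs defaults,_netdev 0 0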
FAQ
Force Client Synchronization
If you want to force data synchronization on a client, it’s simple: go to the directory where the GlusterFS share is mounted (here /var/www), then perform a full directory traversal like this:
ls -lRa
Looking up every file this way forces the replicate translator to self-heal it, so any missing or out-of-date copies get rewritten on the other node.
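An equivalent, commonly suggested way to trigger the self-heal is to stat every file through the mount point:
find /var/www -noleaf -print0 | xargs --null stat > /dev/null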
www-posix: Extended attribute not supported
If you look in your logs and see something like this:
(/var/log/glusterfs/glusterfsd.vol.log)
...
+------------------------------------------------------------------------------+
[2010-10-17 00:40:30] W [afr.c:2743:init] www-replicate: Volume is dangling.
[2010-10-17 00:40:30] C [posix.c:4936:init] www-posix: Extended attribute not supported, exiting.
[2010-10-17 00:40:30] E [xlator.c:844:xlator_init_rec] www-posix: Initialization of volume 'www-posix' failed, review your volfile again
[2010-10-17 00:40:30] E [glusterfsd.c:591:_xlator_graph_init] glusterfs: initializing translator failed
[2010-10-17 00:40:30] E [glusterfsd.c:1395:main] glusterfs: translator initialization failed. exiting
It means GlusterFS cannot set extended attributes on the backend directory. In my case, this happened in an OpenVZ container. Here is the fix to apply on the host machine (not inside the VE); be aware that it requires stopping the VE, applying the configuration, then restarting it.
If you run GlusterFS inside a VE, you may also hit FUSE permission problems:
fuse: failed to open /dev/fuse: Permission denied
To work around this, we create the fuse device for the VE from the host and give the VE admin rights (not great in terms of security, but there is no other choice):
vzctl set $my_veid --devices c:10:229:rw --save
vzctl exec $my_veid mknod /dev/fuse c 10 229
vzctl set $my_veid --capability sys_admin:on --save
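Restart the VE afterwards so the new device and capability are taken into account:
vzctl stop $my_veid
vzctl start $my_veid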
Note: Don’t forget to load the fuse module on your host machine:
(/etc/modules)
...
fuse
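If the module is not already loaded, you can load it immediately without rebooting:
modprobe fuse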