If you have already set up MySQL replication, you know that you’re limited to 2 master nodes maximum. Many actions must be done manually, making it difficult to have something that’s both scalable and accessible for simultaneous writes, since by default writes are synchronous.
Note: If you need advice or support for MySQL/MariaDB/Galera, I recommend OceanDBA.
There’s a tool called Galera that integrates with MariaDB or MySQL (through recompilation in both cases) and allows multi-master replication (3 nodes minimum). There are several products that use Galera:
MariaDB Galera Cluster
Percona Cluster
MySQL Galera Cluster
For those wondering, this is really different from MySQL Cluster which knows how to scale writes. Here it’s only multi-threaded. And unlike MHA which is an asynchronous solution, Galera is synchronous.
Galera only works with the InnoDB engine and allows:
Synchronous replication
Active multi-master replication
Simultaneous read/write on multiple nodes
Automatic detection when a node fails
Automatic node reintegration
No lag on slaves
No lost transactions
Lower client latency
Although this seems perfect on paper, there are some limitations:
Only supports InnoDB
All tables must have primary keys
DELETE only works on tables with primary keys
LOCK/UNLOCK/GET_LOCK/RELEASE_LOCK doesn’t work in multi-master
Query logs can only be sent to files, not tables
XA transactions are not supported
To help you understand a typical architecture:
There’s also an online tool to help build this kind of infrastructure: Galera Configurator
At the time of writing, there is a small dependency issue with some packages. You’ll need to download them in advance and install them. You don’t need to follow the rest of this installation if you haven’t encountered errors with the previous steps.
# MariaDB database server configuration file.# Pierre Mavro / Deimosfr## You can copy this file to one of:# - "/etc/mysql/my.cnf" to set global options,# - "~/.my.cnf" to set user-specific options.## One can use all long options that the program supports.# Run program with --help to get a list of available options and with# --print-defaults to see which it would actually understand and use.## For explanations see# http://dev.mysql.com/doc/mysql/en/server-system-variables.html# This will be passed to all mysql clients# It has been reported that passwords should be enclosed with ticks/quotes# escpecially if they contain "#" chars...# Remember to edit /etc/mysql/debian.cnf when changing the socket location.[client]port=3306socket=/var/run/mysqld/mysqld.sock# Here is entries for some specific programs# The following values assume you have at least 32M ram# This was formally known as [safe_mysqld]. Both versions are currently parsed.[mysqld_safe]socket=/var/run/mysqld/mysqld.socknice=0[mysqld]## * Basic Settings#user=mysqlpid-file=/var/run/mysqld/mysqld.pidsocket=/var/run/mysqld/mysqld.sockport=3306basedir=/usrdatadir=/var/lib/mysqlinnodb_log_group_home_dir=/var/lib/mysqltmpdir=/tmplc_messages_dir=/usr/share/mysqllc_messages=en_USskip-external-lockingcharacter_set_server=utf8collation_server=utf8_general_ci## Instead of skip-networking the default is now to listen only on# localhost which is more compatible and is not less secure.bind-address=0.0.0.0## * Fine Tuning#max_connections=500connect_timeout=5wait_timeout=600max_allowed_packet=16Mthread_cache_size=128sort_buffer_size=16Mbulk_insert_buffer_size=16Mtmp_table_size=32Mmax_heap_table_size=64Mnet_buffer_length=4k## * MyISAM## This replaces the startup script and checks MyISAM tables if needed# the first time they are touched. On error, make copy and try a repair.myisam_recover=BACKUPkey_buffer_size=128M#open-files-limit = 2000table_open_cache=400myisam_sort_buffer_size=512Mconcurrent_insert=2read_buffer_size=2Mread_rnd_buffer_size=1M## * Query Cache Configuration## Cache only tiny result sets, so we can fit more in the query cache.query_cache_limit=128Kquery_cache_size=64M# for more write intensive setups, set to DEMAND or OFF#query_cache_type = DEMAND## * Logging and Replication## Both location gets rotated by the cronjob.# Be aware that this log type is a performance killer.# As of 5.1 you can enable the log at runtime!#general_log_file = /var/log/mysql/mysql.log#general_log = 1## Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.## we do want to know about network errors and suchlog_warnings=2## Enable the slow query log to see queries with especially long duration#slow_query_log[={0|1}]slow_query_log_file=/var/log/mysql/mariadb-slow.loglong_query_time=10#log_slow_rate_limit = 1000log_slow_verbosity=query_plan#log-queries-not-using-indexes#log_slow_admin_statements## The following can be used as easy to replay backup logs or for replication.# note: if you are setting up a replication slave, see README.Debian about# other settings you may need to change.#server-id = 1#report_host = master1#auto_increment_increment = 2#auto_increment_offset = 1log_bin=/var/log/mysql/mariadb-binlog_bin_index=/var/log/mysql/mariadb-bin.index# not fab for performance, but safer#sync_binlog = 1expire_logs_days=10max_binlog_size=100M# slaves#relay_log = /var/log/mysql/relay-bin#relay_log_index = /var/log/mysql/relay-bin.index#relay_log_info_file = /var/log/mysql/relay-bin.info#log_slave_updates#read_only## If applications support it, this stricter sql_mode prevents some# mistakes like inserting invalid dates etc.#sql_mode = NO_ENGINE_SUBSTITUTION,TRADITIONAL## * InnoDB## InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.# Read the manual for more InnoDB related options. There are many!default_storage_engine=InnoDB# you can't just change log file size, requires special procedure#innodb_log_file_size = 50Minnodb_buffer_pool_size=256Minnodb_log_buffer_size=8Minnodb_log_file_size=256Mthread_concurrency=64innodb_thread_concurrency=64innodb_read_io_threads=16innodb_write_io_threads=16innodb_flush_log_at_trx_commit=2innodb_file_per_table=1innodb_open_files=400innodb_io_capacity=600innodb_lock_wait_timeout=60innodb_flush_method=O_DIRECTinnodb_doublewrite=0innodb_additional_mem_pool_size=20Minnodb_buffer_pool_restore_at_startup=500innodb_file_per_table## * Security Features## Read the manual, too, if you want chroot!# chroot = /var/lib/mysql/## For generating SSL certificates I recommend the OpenSSL GUI "tinyca".## ssl-ca=/etc/mysql/cacert.pem# ssl-cert=/etc/mysql/server-cert.pem# ssl-key=/etc/mysql/server-key.pem[mysqldump]quickquote-namesmax_allowed_packet=16M[mysql]#no-auto-rehash # faster start of mysql but no tab completition[isamchk]key_buffer=16M[mysqlhotcopy]interactive-timeout## * IMPORTANT: Additional settings that can override those from this file!# The files must end with '.cnf', otherwise they'll be ignored.#!includedir /etc/mysql/conf.d/
Now we’ll delete the log files because we just changed their configuration (or we won’t be able to start the instance) and be able to start our MariaDB service:
# MariaDB-specific config file.# Read by /etc/mysql/my.cnf[client]# Default is Latin1, if you need UTF-8 set this (also in server section)#default-character-set = utf8[mysqld]## * Character sets## Default is Latin1, if you need UTF-8 set all this (also in client section)##character-set-server = utf8#collation-server = utf8_general_ci#character_set_server = utf8#collation_server = utf8_general_ci# Load Galera Clusterwsrep_provider=/usr/lib/galera/libgalera_smm.sowsrep_cluster_name='mariadb_cluster'wsrep_node_name=node2wsrep_node_address="10.0.0.2"wsrep_cluster_address='gcomm://10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4'wsrep_retry_autocommit=0wsrep_sst_method=rsyncwsrep_provider_options="gcache.size = 1G; gcache.name = /tmp/galera.cache"#wsrep_replication_myisam = 1#wsrep_sst_receive_address = <x.x.x.x>#wsrep_notify_cmd="script.sh"# Other mysqld optionsbinlog_format=ROWinnodb_autoinc_lock_mode=2innodb_flush_log_at_trx_commit=2innodb_locks_unsafe_for_binlog=1
wsrep_cluster_name: the name of the Galera cluster. Use this especially if you have multiple Galera clusters in the same subnet to prevent nodes from entering the wrong cluster.
wsrep_node_name: the name of the machine where this configuration file is located. You’ll understand that you absolutely must avoid duplicates (especially for debugging ;-))
wsrep_node_address: IP address of the current node (same warning as the previous line)
wsrep_cluster_address: list of cluster members that can be master (separated by commas).
Adapt the wsrep_node_name and wsrep_cluster_address lines to your respective machines.
The above configuration is applicable to all nodes except the master (node 1). The master should have the identical configuration with the only difference being this line:
1
wsrep_cluster_address='gcomm://'
warning
It’s important that only one machine has the ‘gcomm://’ configuration, as this initializes the cluster.
It’s possible to set up a geo cluster, but the disadvantage is that when there’s a network outage exceeding the specified timeouts, one of the clusters will have to completely resynchronize. Here’s the line to configure to modify these timeouts:
If you use the ‘wsrep_sst_receive_address’ option, you’ll need to add a parameter to this line (ist.recv_addr) with the same IP as the ‘wsrep_sst_receive_address’ option:
There are several solutions for data transfers between nodes. In the example above, we used rsync. When a machine requests a donor (another machine) to receive data, transactions are blocked for the donor during the data exchange period with the receiver!!!
warning
If you use a load balancer, you’ll need to remove the donor node during this period.
We also see that the wsrep_local_state value changes to 4 when the process is complete.
info
The problem with this method is that it doesn’t handle IST (Incremental State Transfer). Everything must be transferred in case of problems, not just the incremental data. You need to look at rsync or xtrabackup methods to be able to do IST.
XtraBackup is the best method today. It allows transfers while minimizing lock time to just a few seconds. To use this method, we need to install XtraBackup.
To install it on Debian, it’s very simple, we’ll add its repository:
Before using, you need to understand the principle. When we start our MariaDB instances, we’ll end up in this configuration (I deliberately didn’t draw all the communication arrows to avoid clutter, but all nodes talk to each other):
Node 1 initializes the cluster with the empty gcomm value.
The other nodes connect to node 1 and exchange their data to have the same data level everywhere
To avoid split brains (cluster inconsistencies), it’s advisable to use a tool provided with the Galera cluster that acts as a cluster quorum, especially if you’re in 2-node mode (and that’s generally the main advantage). Here’s a use case:
You’ll need to use the garbd service. It’s installed by default but simply not activated. To configure it, we’ll edit its configuration:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Copyright (C) 2012 Coedership Oy# This config file is to be sourced by garb service script.# A space-separated list of node addresses (address[:port]) in the clusterGALERA_NODES="10.0.0.1:4567 10.0.0.2:4567 10.0.0.3:4567 10.0.0.4:4567"# Galera cluster name, should be the same as on the rest of the nodes.GALERA_GROUP="mariadb_cluster"# Optional Galera internal options string (e.g. SSL settings)# see http://www.codership.com/wiki/doku.php?id=galera_parameters# GALERA_OPTIONS=""# Log file for garbd. Optional, by default logs to syslog# LOG_FILE=""
Adapt GALERA_NODES with the list of all your nodes and GALERA_GROUP with the name of the Galera cluster. Now we just need to activate it at machine startup and start the service:
1
2
update-rc.d -f garb defaults
service garb start
Now, on one of your nodes, you’ll see there’s a new node, which is actually just the quorum:
The ‘galera-info’ option prevents problems during the uuid request (which will always return 0). If this option is not specified, incremental restorations won’t be possible.
info
For databases containing only InnoDB tables, it’s possible to have no blocking at all during the lock by adding the ‘–no-lock’ option.
During restoration, one of the nodes will have blocked transactions after the Xtrabackup restoration, while the differential is being applied. To restore, copy the backup files to the MariaDB directory:
You don’t need to worry about replication if a node other than the master (node1) fails. Once repaired and powered on, it will automatically reconnect to node 1 and catch up. However, in case of a problem with node 1:
The other nodes will continue to communicate with each other and wait for the master to return. Once the master is turned back on, you’ll need to tell it another node to which it should connect to continue synchronization:
Whether you want to force a reconnection or perform maintenance on the master node, it’s advisable to redirect the other servers to another master to avoid outages:
I’ve also tested violently shutting down any node and turning it back on. Once integrated into the cluster, it properly retrieves all the differential information. I had no corruption problems. The only issues I encountered were with MariaDB startup and lock problems as explained in the FAQ.
It’s possible to update a node from a delta between an up-to-date version and one that’s behind. To do this, check the version in which one of the up-to-date nodes is:
If you really don’t know what a node has and you want to completely resynchronize it because you’re unsure about its data, just delete the MariaDB content and restart it:
1
2
3
service mysql stop
rm -Rf /var/lib/mysql/*
service mysql start
All data will then resynchronize.
warning
This can take some time if your database is large or if the bandwidth between nodes is low.
When you have one or more nodes in a split brain state, it’s possible to continue using the cluster and discard all changes from the other down nodes so they’ll do a complete sync when they start back up. On a ‘primary’ server:
One of my MariaDB services refuses to start after shutting it down link
It can happen that when you turn off a MariaDB service, the lock files aren’t properly released and the rsync service is still running. To clean everything up (without completely restarting the machine), follow these steps:
Make sure MariaDB is no longer running:
1
2
service mysql stop
ps aux | grep mysql
If it’s still running, kill the process!
Check that the rsync process is no longer running and kill it if it is
Delete the lock files:
1
rm -f /var/run/mysqld/mysqld.sock
Check that the directory for storing the pid exists, otherwise create it:
1
if[ ! -d /var/run/mysqld ] ; then mkdir /var/run/mysqld ; chown mysql. /var/run/mysqld ; fi