ClusterControl can be deployed in a couple of different ways to achieve high availability. One is to form a cluster of ClusterControl nodes so that there is always a node able to take over cluster management; we call this solution CMON HA, a highly available cluster of cmon daemons that provides ClusterControl high availability. The other solution is the ClusterControl secondary standby host, which acts as a hot standby in case the primary ClusterControl host goes down.
Requirements
In order to deploy a cluster of ClusterControl nodes, the following requirements must be met:
- CMON HA is only available on ClusterControl v1.9.6 and later.
- A minimum of 3 ClusterControl nodes is required.
- A multi-master MySQL/MariaDB solution, e.g., MySQL/MariaDB Galera Cluster to host the cmon database on each ClusterControl Controller host.
- All ClusterControl Controllers (cmon) must be able to communicate and “see” each other (including the Galera Cluster). If they communicate over a WAN, this can be achieved using a site-to-site VPN (if the ClusterControl Controllers are configured with private IP addresses), point-to-point tunneling (e.g., WireGuard, AutoSSH tunneling, PPTP), cloud VPC peering, private connectivity (e.g., AWS Direct Connect, Azure ExpressRoute, Google Cloud Interconnect), or direct public IP address communication over the WAN. See the Deployment Steps section for the ports to be allowed.
- None of the ClusterControl Controllers may manage or monitor any database cluster when CMON HA is activated. Otherwise, you will have to re-import those clusters into ClusterControl afterward.
- The CMON HA feature must be enabled via the ClusterControl CLI; otherwise, ClusterControl will just act as a standalone controller.
CMON Cluster Operation
ClusterControl CMON HA is a high-availability extension to ClusterControl’s controller backend. It uses the RAFT protocol for leader election and for keeping configuration files in sync. All CMON HA nodes connect to the same database cluster for the cmon database, and this database cluster is not managed by CMON HA itself.
In RAFT, there is one leader at any given time and the rest of the nodes are followers. The leader sends heartbeats to the followers. If a follower does not receive a heartbeat in time, it moves into the candidate state and starts an election. The candidate always votes for itself, so to reach a majority, at least half of the remaining nodes must also vote for it. If the election is won, the node promotes itself to leader and starts sending heartbeats.
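As a quick illustration of the quorum arithmetic (a minimal shell sketch, not part of ClusterControl, where N is the total number of CMON HA nodes):
N=3
QUORUM=$(( N / 2 + 1 ))        # votes needed to win an election (2 when N=3)
TOLERATED=$(( (N - 1) / 2 ))   # node failures the cluster can survive (1 when N=3)
echo "nodes=$N quorum=$QUORUM tolerated_failures=$TOLERATED"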
There are some important considerations one has to know when running CMON HA:
- Limited failure tolerance – The RAFT protocol tolerates only one node failure in a 3-node cluster. To tolerate two node failures, the cluster needs at least 5 nodes. For example, in a 3-node CMON HA cluster, if two nodes lose network connectivity, no node can obtain a majority in any election it requests.
- Extension to avoid split brain – ClusterControl includes protection against split brain. If the heartbeat is not acknowledged by at least half of the followers, a network partition is assumed and the leader that finds itself in the minority steps down to become a follower. At the same time, the followers forming a majority in the other network partition elect a new leader once heartbeats stop arriving. If no network partition holds a majority, no leader is elected.
- Local cmon database connection – Every ClusterControl Controller connects to the cmon database locally, where the connection string is 127.0.0.1 (see the mysql_hostname value inside /etc/cmon.cnf). Therefore, every ClusterControl node must run a MySQL/MariaDB database service and be a member of the Galera Cluster.
- Galera special behavior – Keep in mind that with a 3-node Galera Cluster, which is the recommended setup for storing the cmon database, if none of the nodes can reach the others, the Galera Cluster only becomes operational again once all nodes are recovered and the cluster is complete. As long as the Galera Cluster is not recovered, the CMON HA cluster cannot work properly either.
As this is a new feature, there are a number of known limitations:
- No automatic CMON HA deployment support. Installation and deployment steps are shown below.
- The database cluster storing the cmon database is not managed and is expected not to be managed by the CMON HA instance that uses it as a data storage backend.
- There is no way to force an election between CMON HA nodes.
- There are no priorities amongst the CMON HA nodes; none can be designated as a preferred leader.
- There is no way to manually choose a CMON HA node to be a leader.
- At the moment, the ClusterControl GUI has no automatic redirection to the leader. You will get a redirection error if you log in to a controller that is in the follower state. See Connecting via ClusterControl GUI.
Deployment Steps
There are several ways to install a CMON HA cluster. One could use another ClusterControl installation to deploy a Galera Cluster and use it as the cmon database backend for the CMON HA cluster, or use the ClusterControl installer script and convert the MySQL/MariaDB installation into a Galera Cluster. The deployment steps explained in this article are based on the latter.
In this example, we will deploy a 3-node CMON HA cluster including a MariaDB Galera Cluster as the cmon database backend. Assume we have 3 hosts capable of communicating via public IP addresses:
- 100.18.98.75 – cmon1 (Site A)
- 35.208.13.166 – cmon2 (Site B)
- 202.131.17.88 – cmon3 (Site C)
ClusterControl nodes must be able to communicate with each other on the following ports:
- tcp/9500 – CMON RPC
- tcp/9501 – CMON RPC (TLS)
- tcp/3306 – MySQL/MariaDB Galera Cluster – database connections
- tcp/4567 – MySQL/MariaDB Galera Cluster – gcomm
- tcp/4568 – MySQL/MariaDB Galera Cluster – IST
- tcp/4444 – MySQL/MariaDB Galera Cluster – SST
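For example, on a firewalld-based host (such as the Rocky Linux nodes used below), these ports could be opened as follows; this is only a sketch, so adjust zones and source restrictions to your own network:
# Open the CMON RPC and Galera ports between the ClusterControl nodes
sudo firewall-cmd --permanent --add-port=9500/tcp --add-port=9501/tcp
sudo firewall-cmd --permanent --add-port=3306/tcp --add-port=4567/tcp --add-port=4568/tcp --add-port=4444/tcp
sudo firewall-cmd --reload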
The following steps are based on Rocky Linux 8 64-bit. We expect similar steps for other RHEL-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.
1) Prepare the host and download the installer script:
dnf install -y wget vim epel-release curl net-tools sysstat mlocate
wget https://severalnines.com/downloads/cmon/install-cc
chmod 755 install-cc
2) Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):
cmon1:
HOST=100.18.98.75 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
cmon2:
HOST=35.208.13.166 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
cmon3:
HOST=202.131.17.88 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
S9S_ROOT_PASSWORD is the root password of the MySQL/MariaDB server hosting the cmon database, while S9S_CMON_PASSWORD is the cmon database user password.
3) Once the installation completes on all nodes, stop the cmon service in preparation for CMON HA:
systemctl stop cmon
4) Comment out the cmon cron job temporarily to make sure cmon will not be started automatically (we will enable it again later):
sed -i '/^\*.*pidof/s/^/# /' /etc/cron.d/cmon
5) Run the following command to set the ClusterControl Controller service to listen on all IP addresses:
echo 'RPC_BIND_ADDRESSES="0.0.0.0"' | sudo tee -a /etc/default/cmon
6) Stop the MariaDB service:
systemctl stop mariadb
7) Convert the default MariaDB 10.3 installed by the installer script into a MariaDB Galera Cluster:
dnf install -y mariadb-server-galera mariadb-backup
This command installs the packages necessary to run a Galera Cluster, such as the Galera replication library, its dependencies, and the backup/restore tools.
8) Copy the cmon_password value inside /etc/s9s.conf on cmon1 to all nodes (we will use cmon1 as the reference point):
cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"
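One way to propagate this value (a sketch, assuming the token shown above is the one found on cmon1) is to read it on cmon1 and rewrite the corresponding line on cmon2 and cmon3:
# On cmon1: display the token to copy
grep '^cmon_password' /etc/s9s.conf
# On cmon2 and cmon3: replace the local token with the value taken from cmon1
sudo sed -i 's/^cmon_password.*/cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"/' /etc/s9s.conf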
9) Set the following lines inside /etc/my.cnf under the [mysqld] directive:
cmon1:
wsrep_on = ON
wsrep_node_address = 100.18.98.75 # cmon1 primary IP address
wsrep_provider = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
cmon2:
wsrep_on = ON
wsrep_node_address = 35.208.13.166 # cmon2 primary IP address
wsrep_provider = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
cmon3:
wsrep_on = ON
wsrep_node_address = 202.131.17.88 # cmon3 primary IP address
wsrep_provider = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
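Before bootstrapping, it is worth confirming that the Galera provider library exists at the path configured in wsrep_provider (a quick sanity check; the path may differ between package versions):
ls -l /usr/lib64/galera/libgalera_smm.so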
10) Bootstrap the Galera cluster on the first node only, cmon1:
galera_new_cluster
11) On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (a full state snapshot transfer) from cmon1, then start the MariaDB Galera service (one node at a time):
rm -f /var/lib/mysql/grastate.dat
systemctl start mariadb
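The initial SST can take a while depending on the size of the cmon database and the network. You can follow the joiner's progress in the service log, for example:
journalctl -u mariadb -f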
12) Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:
$ mysql -uroot -p -e "show status like 'wsrep%'"
...
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_ready | ON |
| wsrep_local_state_comment | Synced |
...
Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).
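If you prefer a shorter output, you can query only the relevant status variables, for example:
mysql -uroot -p -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_cluster_status','wsrep_ready','wsrep_local_state_comment')"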
13) Now we are ready to start the cmon service, re-enable the cmon cron job, and activate CMON HA on the first node (only proceed to the next node after all commands are successful):
cmon1:
systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
s9s controller --enable-cmon-ha
cmon2:
systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
cmon3:
systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
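Before moving on, you can confirm on each node that cmon is running and listening on the RPC ports configured earlier (ss may require root privileges):
systemctl status cmon --no-pager
sudo ss -ltnp | grep -E ':(9500|9501)'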
14) Verify that CMON HA can see all nodes in the cluster:
$ s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75 100.18.98.75 9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166 35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88 202.131.17.88 9501 Responding to heartbeats.
The leftmost column (S) indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.
15) Open the ClusterControl GUI of the leader node via a web browser by going to https://{leader_host_ip_address}/clustercontrol and create a new admin user. In this particular example, the ClusterControl GUI URL is https://100.18.98.75/clustercontrol (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See User Guide GUI or User Guide GUI v2.
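A quick way to confirm the leader's web front end responds from your workstation (a sketch; -k skips certificate verification for the default self-signed certificate):
curl -k -I https://100.18.98.75/clustercontrol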
The following steps are based on Ubuntu 22.04 LTS 64-bit (Jammy Jellyfish). We expect similar steps for other Debian-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.
1) Prepare the host and download the installer script:
apt install -y wget vim curl net-tools sysstat mlocate
wget https://severalnines.com/downloads/cmon/install-cc
chmod 755 install-cc
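If ufw is active on the Ubuntu hosts, the CMON HA and Galera ports listed in the previous section must also be allowed between the nodes (a sketch; tighten the source addresses to your own environment):
sudo ufw allow 9500,9501,3306,4567,4568,4444/tcp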
2) Before running the installer script, we have to modify it to install the MariaDB server and client instead (otherwise, the installer script will default to the MySQL 8.0 installation available in the repository):
sed -i 's/mysql-server/mariadb-server/g' install-cc
sed -i 's/mysql-client/mariadb-client/g' install-cc
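You can quickly verify the substitution before running the script, for example:
grep -c 'mariadb-server' install-cc   # should be greater than 0
grep -c 'mysql-server' install-cc     # should be 0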
3) Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):
cmon1:
HOST=100.18.98.75 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
cmon2:
HOST=35.208.13.166 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
cmon3:
HOST=202.131.17.88 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
S9S_ROOT_PASSWORD is the root password of the MySQL/MariaDB server hosting the cmon database, while S9S_CMON_PASSWORD is the cmon database user password.
After the modification in step 2, the script will install the necessary packages for MariaDB 10.6, which includes Galera Cluster.
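To double-check that the Galera-enabled MariaDB packages are in place and that the provider library matches the wsrep_provider path used in step 9 (a sketch; adjust the path if your packages install it elsewhere):
dpkg -l | grep -E 'mariadb-server|galera'
ls -l /usr/lib/galera/libgalera_smm.so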
4) Once the installation completes on all nodes, stop the cmon service in preparation for CMON HA:
systemctl stop cmon
5) Comment out the cmon cron job temporarily to make sure cmon will not be started automatically (we will enable it again later):
sed -i '/^\*.*pidof/s/^/# /' /etc/cron.d/cmon
6) Run the following command to set the ClusterControl Controller service to listen on all IP addresses:
echo 'RPC_BIND_ADDRESSES="0.0.0.0"' | sudo tee -a /etc/default/cmon
7) Stop the MariaDB service:
systemctl stop mariadb
8) Copy the cmon_password value inside /etc/s9s.conf on cmon1 to all nodes (we will use cmon1 as the reference node):
cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"
9) Set the following lines inside /etc/my.cnf under the [mysqld] directive:
cmon1:
wsrep_on = ON
wsrep_node_address = 100.18.98.75 # cmon1 primary IP address
wsrep_provider = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
cmon2:
wsrep_on = ON
wsrep_node_address = 35.208.13.166 # cmon2 primary IP address
wsrep_provider = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
cmon3:
wsrep_on = ON
wsrep_node_address = 202.131.17.88 # cmon3 primary IP address
wsrep_provider = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
10) Bootstrap the Galera cluster on the first node only, cmon1:
galera_new_cluster
11) On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (a full state snapshot transfer) from cmon1, then start the MariaDB Galera service (one node at a time):
rm -f /var/lib/mysql/grastate.dat
systemctl start mysql
12) Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:
$ mysql -uroot -p -e "show status like 'wsrep%'"
...
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_ready | ON |
| wsrep_local_state_comment | Synced |
...
Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).
13) Now we are ready to start the cmon service, re-enable the cmon cron job, and activate CMON HA on the first node (only proceed to the next node after all commands are successful):
cmon1:
systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
s9s controller --enable-cmon-ha
cmon2:
systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
cmon3:
systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
14) Verify that CMON HA can see all nodes in the cluster:
$ s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75 100.18.98.75 9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166 35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88 202.131.17.88 9501 Responding to heartbeats.
The leftmost column (S) indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.
15) Open the ClusterControl GUI of the leader node via a web browser by going to https://{leader_host_ip_address}/clustercontrol and create a new admin user. In this particular example, the ClusterControl GUI URL is https://100.18.98.75/clustercontrol (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See User Guide GUI or User Guide GUI v2.
Connecting via ClusterControl GUI
When running in CMON HA mode, only one ClusterControl Controller is active (the leader) and the rest are followers. If you use the ClusterControl GUI to access a host that is not the leader, you will get a redirection error. This error means that the ClusterControl Controller on that host will not serve requests coming from this particular GUI and returns a redirection warning instead. To determine which node is the leader, use the ClusterControl CLI as shown below:
$ s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75 100.18.98.75 9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166 35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88 202.131.17.88 9501 Responding to heartbeats.
The ClusterControl CLI is able to follow redirects, so you may execute the above command from any controller node’s terminal as long as the node is part of the same CMON HA cluster.
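If you want to point the CLI at a specific controller instead of the one configured locally, the s9s client also accepts a controller URL on the command line (a hedged example; a follower should redirect the request to the current leader):
s9s controller --list --long --controller="https://35.208.13.166:9501"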