
High Availability ClusterControl (CMON HA)

ClusterControl can be deployed in a couple of different ways. One is to form a cluster of ClusterControl nodes so that there is always a node able to handle cluster management. We call this solution CMON HA: a highly available cluster of cmon daemons that provides ClusterControl high availability. Another solution, ClusterControl secondary standby host, acts as a hot standby in case the primary ClusterControl host goes down.

Requirements

In order to deploy a cluster of ClusterControl nodes, the following requirements must be met:

  • CMON HA is only available on ClusterControl v1.9.6 and later.
  • A minimum of 3 ClusterControl nodes is required.
  • A multi-master MySQL/MariaDB solution, e.g., MySQL/MariaDB Galera Cluster, to host the cmon database on each ClusterControl Controller host.
  • All ClusterControl Controllers (cmon) must be able to communicate and “see” each other (including the Galera Cluster). If they communicate over WAN, this can be achieved by using a site-to-site VPN (if ClusterControl Controllers are configured with private IP addresses), point-to-point tunneling (e.g., WireGuard, AutoSSH tunneling, PPTP), cloud VPC peering, private connectivity (e.g., AWS Direct Connect, Azure ExpressRoute, GCP Interconnect), or direct public IP address communication over WAN. See the Deployment Steps section for the ports that must be allowed.
  • None of the ClusterControl Controllers may be managing or monitoring any database clusters when CMON HA is activated. Otherwise, you must re-import those clusters into ClusterControl afterwards.
  • The ClusterControl CMON HA feature must be enabled via the ClusterControl CLI; otherwise, ClusterControl will act as a standalone controller.

CMON Cluster Operation

ClusterControl CMON HA is a high-availability extension to ClusterControl’s controller backend. It uses the RAFT protocol for leader election and for keeping configuration files in sync. All CMON HA nodes connect to the same database cluster hosting the cmon database; this database cluster is not managed by CMON HA itself.

In RAFT, there is one leader at any given time and the rest of the nodes are followers. The leader sends heartbeats to the followers. If a follower does not receive a heartbeat in time, it moves into a candidate state and starts an election. A candidate always votes for itself, so to obtain a majority, at least half of the remaining nodes must also vote for it. If the election is won, the node promotes itself to leader and starts sending heartbeats.
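
As a rule of thumb, a RAFT cluster of N nodes needs floor(N/2) + 1 votes to elect (and keep) a leader, which is why odd-sized clusters of at least 3 nodes are used:

majority(3) = 2   (tolerates 1 failed node)
majority(5) = 3   (tolerates 2 failed nodes)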

There are some important considerations one has to know when running CMON HA:

  • Limited failure tolerance – The RAFT protocol tolerates only one node failure in a 3-node cluster; tolerating two failed nodes requires a set of at least 5 nodes. For example, in a 3-node CMON HA cluster, if two nodes lose network connectivity, no majority can be reached in an election requested by any of the nodes.
  • Extension to avoid split brain – There is protection against split brain in ClusterControl. If the heartbeat is not acknowledged by at least half of the followers, a network partition is assumed and the leader, finding itself in the minority, steps down to become a follower. At the same time, the followers forming a majority in another network partition will elect a new leader once the heartbeat stops arriving. If there is no majority in any of the network partitions, no leader will be elected.
  • Every ClusterControl Controller connects to the cmon database locally, using 127.0.0.1 as the connection address (see the mysql_hostname value inside /etc/cmon.cnf; a sample is shown after this list). Therefore, every ClusterControl node must run a MySQL/MariaDB database service and must be a member of the Galera Cluster.
  • Galera special behavior – Keep in mind that in the case of a 3-node Galera Cluster, which is the recommended setup to store the cmon database, if none of the nodes can reach the others, the Galera Cluster only becomes operational again once all nodes have recovered and the cluster is complete. As long as the Galera Cluster is not recovered, the CMON HA cluster cannot work properly either.
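
For reference, the local database connection on each controller is defined in /etc/cmon.cnf; the relevant lines typically look similar to the following (values shown here are illustrative and will differ per installation):

mysql_hostname=127.0.0.1
mysql_port=3306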

As this is a new feature, there are a number of known limitations:

  • No automatic CMON HA deployment support; installation and deployment steps are shown below.
  • The database cluster storing the cmon database is not, and must not be, managed by the CMON HA instance that uses it as its data storage backend.
  • There is no way to force an election between CMON HA nodes.
  • There are no priorities among the CMON HA nodes; no node can be designated as the preferred leader.
  • There is no way to manually choose a CMON HA node to be a leader.
  • At the moment, the ClusterControl GUI has no automatic redirection to the leader. You will get a redirection error if you log in to a controller that is in the follower state. See Connecting via ClusterControl GUI.

Deployment Steps

There are several ways to install a CMON HA cluster. One could use another ClusterControl installation to deploy a Galera node and use it as the cmon database for the CMON HA cluster, or use the ClusterControl installer script and convert the MySQL/MariaDB installation into a Galera Cluster. The deployment steps explained in this article are based on the latter.

In this example, we will deploy a 3-node CMON HA cluster including a MariaDB Galera Cluster as the cmon database backend. Assume we have 3 hosts capable of communicating via public IP addresses:

  • 100.18.98.75 – cmon1 (Site A)
  • 35.208.13.166 – cmon2 (Site B)
  • 202.131.17.88 – cmon3 (Site C)
Attention

ClusterControl nodes must be able to communicate with each other on the following ports:

  • tcp/9500 – CMON RPC
  • tcp/9501 – CMON RPC (TLS)
  • tcp/3306 – MySQL/MariaDB Galera Cluster – database connections
  • tcp/4567 – MySQL/MariaDB Galera Cluster – gcomm
  • tcp/4568 – MySQL/MariaDB Galera Cluster – IST
  • tcp/4444 – MySQL/MariaDB Galera Cluster – SST
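
How these ports are opened depends on your firewall setup. As a rough sketch, with firewalld (Red Hat/Rocky Linux) or ufw (Debian/Ubuntu) the rules could look like the following; adjust zones and source restrictions to your own network:

# firewalld (Red Hat/Rocky Linux)
for p in 9500 9501 3306 4567 4568 4444; do firewall-cmd --permanent --add-port=${p}/tcp; done
firewall-cmd --reload

# ufw (Debian/Ubuntu)
for p in 9500 9501 3306 4567 4568 4444; do ufw allow ${p}/tcp; done
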
Red Hat/Rocky Linux

The following steps are based on Rocky Linux 8 64bit. We expect similar steps for other RHEL-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.

1) Prepare the host and download the installer script:

dnf install -y wget vim epel-release curl net-tools sysstat mlocate
wget https://severalnines.com/downloads/cmon/install-cc
chmod 755 install-cc

2) Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):

cmon1:

HOST=100.18.98.75 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc

cmon2:

HOST=35.208.13.166 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc

cmon3:

HOST=202.131.17.88 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
Note

S9S_ROOT_PASSWORD is the MySQL root user password where the cmon database is hosted, while S9S_CMON_PASSWORD is the cmon database user password.

3) Once the installation completes on all nodes, we have to stop the cmon service for CMON HA preparation:

systemctl stop cmon

4) Temporarily comment out the cmon cron job to make sure cmon will not be started automatically (we will re-enable it later):

sed -i '/^\*.*pidof/s/^/# /' /etc/cron.d/cmon

5) Run the following command to set the ClusterControl Controller service to listen to all IP addresses:

echo 'RPC_BIND_ADDRESSES="0.0.0.0"' | sudo tee -a /etc/default/cmon
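
To confirm the setting has been appended (it takes effect when the cmon service is started later), you can run a quick check, for example:

grep RPC_BIND_ADDRESSES /etc/default/cmon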

6) Stop the MariaDB service:

systemctl stop mariadb

7) Convert the default MariaDB 10.3 installation performed by the installer script into a MariaDB Galera Cluster:

dnf install -y mariadb-server-galera mariadb-backup
Attention

This command installs the packages needed to run a Galera Cluster, such as the Galera replication library, its dependencies, and backup/restore tools.

8) Copy the cmon_password value from /etc/s9s.conf on cmon1 to /etc/s9s.conf on all other nodes (we will use cmon1 as the reference node):

cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"
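
One minimal way to do this is to read the generated value on cmon1 and then overwrite the local value on cmon2 and cmon3 (a sketch; adjust if your /etc/s9s.conf formats the entry differently):

# on cmon1: read the generated value
grep cmon_password /etc/s9s.conf

# on cmon2 and cmon3: set the same value
sed -i 's/^cmon_password.*/cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"/' /etc/s9s.conf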

9) Set the following lines inside /etc/my.cnf under the [mysqld] section:

cmon1:

wsrep_on               = ON
wsrep_node_address     = 100.18.98.75   # cmon1 primary IP address
wsrep_provider         = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
wsrep_cluster_name     = 'CMON_HA_Galera'
wsrep_sst_method       = rsync
binlog_format          = 'ROW'

cmon2:

wsrep_on               = ON
wsrep_node_address     = 35.208.13.166   # cmon2 primary IP address
wsrep_provider         = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
wsrep_cluster_name     = 'CMON_HA_Galera'
wsrep_sst_method       = rsync
binlog_format          = 'ROW'

cmon3:

wsrep_on               = ON
wsrep_node_address     = 202.131.17.88   # cmon3 primary IP address
wsrep_provider         = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
wsrep_cluster_name     = 'CMON_HA_Galera'
wsrep_sst_method       = rsync
binlog_format          = 'ROW'
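
Before bootstrapping, it may be worth verifying that the Galera provider library referenced by wsrep_provider actually exists at the configured path (the location can differ between distributions and Galera versions):

ls -l /usr/lib64/galera/libgalera_smm.so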

10) Bootstrap the Galera cluster on the first node only, cmon1:

galera_new_cluster
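
Once bootstrapped, cmon1 should report itself as a synced, one-node primary cluster. A quick way to check:

mysql -uroot -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"   # expect 1 at this stage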

11) On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (full syncing) from cmon1 and start the MariaDB Galera service (one node at a time):

rm -f /var/lib/mysql/grastate.dat
systemctl start mariadb
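
Before starting the next node, it is a good idea to confirm that the joiner has completed its SST and is synced, for example:

mysql -uroot -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"   # expect Synced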

12) Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:

$ mysql -uroot -p -e "show status like 'wsrep%'"
...
| wsrep_cluster_size            | 3                                                 |
| wsrep_cluster_status          | Primary                                           |
| wsrep_ready                   | ON                                                |
| wsrep_local_state_comment     | Synced                                            |
...
Warning

Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).

13) Now we are ready to start the cmon service, re-enable the cmon cron job, and activate CMON HA on the first node (only proceed to the next node after all commands have completed successfully):

cmon1:

systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
s9s controller --enable-cmon-ha

cmon2:

systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon

cmon3:

systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
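
As a quick sanity check before verifying cluster membership, confirm that the cmon service is active on every node:

systemctl is-active cmon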

14) Verify that CMON HA can see all nodes in the cluster:

$ s9s controller --list --long
S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75   100.18.98.75  9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166  35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88  202.131.17.88 9501 Responding to heartbeats.

The leftmost column indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.

15) Open the ClusterControl GUI of the leader node in a web browser by going to https://{leader_host_ip_address}/clustercontrol and create a new admin user. In this example, the ClusterControl GUI URL is https://100.18.98.75/clustercontrol (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See User Guide GUI or User Guide GUI v2.

Debian/Ubuntu

The following steps are based on Ubuntu 22.04 LTS 64bit (Jammy Jellyfish). We expect similar steps for other Debian-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.

1) Prepare the host and download the installer script:

apt install -y wget vim curl net-tools sysstat mlocate
wget https://severalnines.com/downloads/cmon/install-cc
chmod 755 install-cc

2) Before running the installer script, we have to modify it to install MariaDB server/client instead (otherwise, the installer script defaults to the MySQL 8.0 packages available in the repository):

sed -i 's/mysql-server/mariadb-server/g' install-cc
sed -i 's/mysql-client/mariadb-client/g' install-cc
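
You can verify that the substitution took effect before running the installer, for example:

grep -c 'mariadb-server' install-cc   # should print a non-zero count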

3) Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):

cmon1:

HOST=100.18.98.75 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc

cmon2:

HOST=35.208.13.166 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc

cmon3:

HOST=202.131.17.88 \
S9S_CMON_PASSWORD='q1w2e3!@#' \
S9S_ROOT_PASSWORD='q1w2e3!@#' \
S9S_DB_PORT=3306 \
./install-cc
Note

S9S_ROOT_PASSWORD is the MySQL root user password where the cmon database is hosted, while S9S_CMON_PASSWORD is the cmon database user password.

Attention

After the modification in step 2, the script will install the necessary packages for MariaDB 10.6, which includes Galera Cluster.

4) Once the installation completes on all nodes, we have to stop the cmon service for CMON HA preparation:

systemctl stop cmon

5) Temporarily comment out the cmon cron job to make sure cmon will not be started automatically (we will re-enable it later):

sed -i '/^\*.*pidof/s/^/# /' /etc/cron.d/cmon

6) Run the following command to set the ClusterControl Controller service to listen to all IP addresses:

echo 'RPC_BIND_ADDRESSES="0.0.0.0"' | sudo tee -a /etc/default/cmon

7) Stop the MariaDB service:

systemctl stop mariadb

8) Copy the cmon_password value from /etc/s9s.conf on cmon1 to /etc/s9s.conf on all other nodes (we will use cmon1 as the reference node):

cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"

9) Set the following lines inside /etc/my.cnf under the [mysqld] section:

cmon1:

wsrep_on               = ON
wsrep_node_address     = 100.18.98.75   # cmon1 primary IP address
wsrep_provider         = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
wsrep_cluster_name     = 'CMON_HA_Galera'
wsrep_sst_method       = rsync
binlog_format          = 'ROW'

cmon2:

wsrep_on               = ON
wsrep_node_address     = 35.208.13.166   # cmon2 primary IP address
wsrep_provider         = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
wsrep_cluster_name     = 'CMON_HA_Galera'
wsrep_sst_method       = rsync
binlog_format          = 'ROW'

cmon3:

wsrep_on               = ON
wsrep_node_address     = 202.131.17.88   # cmon3 primary IP address
wsrep_provider         = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
wsrep_cluster_name     = 'CMON_HA_Galera'
wsrep_sst_method       = rsync
binlog_format          = 'ROW'

10) Bootstrap the Galera cluster on the first node only, cmon1:

galera_new_cluster

11) On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (full syncing) from cmon1 and start the MariaDB Galera service (one node at a time):

rm -f /var/lib/mysql/grastate.dat
systemctl start mysql

12) Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:

$ mysql -uroot -p -e "show status like 'wsrep%'"
...
| wsrep_cluster_size            | 3                                                 |
| wsrep_cluster_status          | Primary                                           |
| wsrep_ready                   | ON                                                |
| wsrep_local_state_comment     | Synced                                            |
...
Warning

Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).

13) Now we are ready to start the cmon service, re-enable the cmon cron job, and activate CMON HA on the first node (only proceed to the next node after all commands have completed successfully):

cmon1:

systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
s9s controller --enable-cmon-ha

cmon2:

systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon

cmon3:

systemctl start cmon
sed -i '/pidof/ s/# *//' /etc/cron.d/cmon

14) Verify that CMON HA can see all nodes in the cluster:

$ s9s controller --list --long
S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75   100.18.98.75  9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166  35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88  202.131.17.88 9501 Responding to heartbeats.

The leftmost column indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.

15) Open the ClusterControl GUI of the leader node in a web browser by going to https://{leader_host_ip_address}/clustercontrol and create a new admin user. In this example, the ClusterControl GUI URL is https://100.18.98.75/clustercontrol (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See User Guide GUI or User Guide GUI v2.

Connecting via ClusterControl GUI

When running in CMON HA mode, only one ClusterControl Controller is active (the leader); the rest are followers. If you use the ClusterControl GUI to access a controller that is not the leader, you will get a redirection error.

This error means that the ClusterControl Controller on that host will not serve requests coming from this particular GUI; it returns a redirection warning instead. To determine which node is the leader, use the ClusterControl CLI as shown below:

$ s9s controller --list --long
S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75   100.18.98.75  9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166  35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88  202.131.17.88 9501 Responding to heartbeats.

The ClusterControl CLI is able to follow redirects, so you may execute the above command in any controller node’s terminal, as long as the node is part of the same cluster.
