
Redundancy and High Availability

ClusterControl can be deployed in two different ways for redundancy and high availability:

  1. Secondary standby - Acts as a hot standby in case the primary ClusterControl host goes down.
  2. CMON HA - Build a cluster of ClusterControl controllers to achieve high availability.

Secondary standby

It is possible to have more than one ClusterControl server monitor a single cluster. This is useful for multi-datacenter clusters, where a ClusterControl server on the remote site can continue to monitor and manage the surviving nodes if the inter-site connection goes down. However, the ClusterControl servers must be configured to work in active-passive mode to avoid race conditions when recovering failed nodes or clusters.

In active mode, the ClusterControl node acts as the primary controller and performs automatic recovery and management activities; therefore, Cluster/Node Auto Recovery must be turned on on the primary controller. The secondary ClusterControl node, however, must be configured with Cluster/Node Auto Recovery turned off.
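If you prefer the command line over the GUI, the same auto-recovery toggles are exposed through the s9s CLI. A sketch, not a definitive procedure: the cluster ID below is illustrative (look yours up with `s9s cluster --list`), and you should verify the flags against the `s9s cluster` manual for your installed version.

```shell
# On the primary ClusterControl server: keep automatic recovery enabled.
# --cluster-id=1 is an example ID; substitute your own.
s9s cluster --enable-recovery --cluster-id=1 --log

# On the secondary (standby) ClusterControl server: keep recovery disabled.
s9s cluster --disable-recovery --cluster-id=1 --log
```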

Installing standby server

The steps described in this section must be performed on the secondary ClusterControl server.

  1. Install ClusterControl as explained in the Quickstart.
  2. Import the same cluster via ClusterControl GUI → Deploy a cluster → Import a database cluster. Ensure to toggle off Cluster auto-recovery and Node auto-recovery in the Node configuration section. Repeat this step if you want to import more than one cluster.


  3. Configure the cluster to mirror the settings of the primary ClusterControl (backup schedules, alerting configuration, user roles, etc).

Nothing needs to be performed on the primary side. The primary ClusterControl server performs automatic recovery in case of node or cluster failure. Use the secondary ClusterControl server for monitoring purposes only; perform management and recovery activities such as rebuilding replication, resyncing nodes, and backup and restore on the primary ClusterControl server.

Info

You don't need an additional ClusterControl license for multiple ClusterControl instances. You can apply the same license as your primary ClusterControl server onto the secondary server. The license is bound to the number of database/load balancer nodes it manages.

Activating the secondary standby

To make the standby server run in active mode, do the following:

  1. If the primary ClusterControl server is still alive, stop its ClusterControl controller services or shut down the server. To stop all ClusterControl processes, run the following command on the primary ClusterControl server:

    systemctl stop cmon cmon-cloud cmon-ssh cmon-events
    
  2. Toggle on Cluster auto-recovery and Node auto-recovery on the secondary ClusterControl server.


At this point, the standby server has taken over the primary role and you can perform management activities on the database nodes or clusters.

Attention

Do not let two or more ClusterControl instances perform automatic recovery to the same cluster at a given time.

CMON HA

ClusterControl CMON HA is an extension to ClusterControl's controller backend. It forms a cluster of ClusterControl nodes and ensures that one node is always able to handle cluster management. This solution, known as CMON HA, builds a cluster of ClusterControl controller services to achieve high availability.

Requirements

In order to deploy a cluster of ClusterControl nodes, the following requirements must be met:

  • CMON HA is only available on ClusterControl v1.9.6 and later, with an Enterprise license.
  • A minimum of 3 ClusterControl nodes is required.
  • A multi-master MySQL/MariaDB solution, e.g., MySQL/MariaDB Galera Cluster to host the cmon database on each ClusterControl Controller host.
  • All ClusterControl Controllers (cmon) must be able to communicate and "see" each other (including the Galera Cluster). If they communicate over WAN, this can be achieved by using site-to-site VPN (if ClusterControl controllers are configured with private IP addresses), point-to-point tunneling (e.g, WireGuard, AutoSSH tunneling, PPTP), cloud VPC peering, private connectivity (e.g, AWS DirectConnect, Azure ExpressRoute, GCP Interlink) or direct public IP address communication over WAN. See the Deployment Steps section for ports to be allowed.
  • None of the ClusterControl Controllers (cmon) may manage or monitor database clusters when CMON HA is being activated. Otherwise, you must re-import them into ClusterControl afterward.
  • The CMON HA feature must be enabled via the ClusterControl CLI; otherwise, ClusterControl will act as a standalone controller.

Cluster operation

ClusterControl CMON HA is a high-availability extension to ClusterControl's controller backend. It uses the RAFT protocol for leader election and for keeping configuration files in sync. All CMON HA nodes connect to the same database cluster as their data store, and this database cluster is not managed by CMON HA.

In RAFT, there is one leader at a given time and the rest of the nodes are the followers. The leader sends heartbeats to the followers. If a follower does not get a heartbeat in time, it sets itself into a candidate state and starts an election. The candidate always votes for itself, so to have a majority, at least half of the remaining nodes must vote for it. If the election is won, the node sets itself to leader status and starts to send heartbeats.

There are some important considerations one has to know when running CMON HA:

  • Limited failure tolerance - The RAFT protocol tolerates only one node failure in a 3-node cluster; tolerating two simultaneous failures requires at least 5 nodes. For example, in a 3-node CMON HA cluster, if two nodes lose network connectivity, no node can obtain a majority in an election.
  • Extension to avoid split brain - There is protection against split brain in ClusterControl. If the heartbeat is not accepted by at least half of the followers, then a network partition is assumed and the leader recognized as being in the minority will step down to be a follower. At the same time, the followers being in the majority in another network partition will elect a new leader if the heartbeat is not received. If there is no majority in any of the network partitions, there will be no leader elected.
  • Every ClusterControl controller connects to the cmon database locally, with 127.0.0.1 as the connection string (see the mysql_hostname value inside /etc/cmon.cnf). Therefore, every ClusterControl node must run a MySQL/MariaDB database service and be a member of the Galera Cluster.
  • Galera special behavior - Keep in mind that in a 3-node Galera Cluster (the recommended setup for storing the cmon database), if none of the nodes can reach the others, the Galera Cluster becomes operational again only after all the nodes are recovered and the cluster is complete. As long as the Galera Cluster is not recovered, the CMON HA cluster cannot work properly either.
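The quorum arithmetic behind these considerations can be sketched in a few lines. This is plain Python for illustration only, not part of ClusterControl:

```python
def majority(n: int) -> int:
    """Smallest number of votes that forms a majority of n nodes."""
    return n // 2 + 1

def failure_tolerance(n: int) -> int:
    """Number of nodes that can fail while the survivors still hold a majority."""
    return n - majority(n)

for n in (3, 5, 7):
    print(f"{n} nodes: majority = {majority(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

A 3-node cluster therefore tolerates a single failure, and two simultaneous failures leave no partition with a majority, exactly as described above.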

As this is a new feature, there are a number of known limitations:

  • No automatic CMON HA deployment support; installation and deployment steps are shown below.
  • The database cluster storing the cmon database is not, and must not be, managed by the CMON HA instance that uses it as its data storage backend.
  • There is no way to force an election between CMON HA nodes.
  • There are no priorities amongst the CMON HA nodes to designate a preferred leader.
  • There is no way to manually choose a CMON HA node to be a leader.
  • At the moment, the ClusterControl GUI has no automatic redirection to the leader. You will get a redirection error if you log in to a controller that is in the follower state. See Connecting via ClusterControl GUI.

Deployment steps

There are several ways to install a CMON HA cluster. One could use another ClusterControl installation to deploy a Galera Cluster and use it as the cmon database for the CMON HA cluster, or use the ClusterControl installer script and convert the MySQL/MariaDB installation into a Galera Cluster. The deployment steps in this article are based on the latter.

In this example, we will deploy a 3-node CMON HA cluster including a MariaDB Galera Cluster as the cmon database backend. Assume we have 3 hosts capable of communicating via public IP addresses:

  • 100.18.98.75 - cmon1 (Site A)
  • 35.208.13.166 - cmon2 (Site B)
  • 202.131.17.88 - cmon3 (Site C)

The high-level architecture diagram will look like this:

architecture-beta
    group dc1[Site A]
    group dc2[Site B]
    group dc3[Site C]

    service cmon1(server)[cmon1] in dc1
    service cmon2(server)[cmon2] in dc2
    service cmon3(server)[cmon3] in dc3
    service cloud1(cloud)
    service cloud2(cloud)
    service cloud3(cloud)

    cmon1:R -- L:cloud1
    cloud1:R -- L:cmon2
    cmon2:B -- T:cloud2
    cloud2:L -- R:cmon3
    cmon3:L -- R:cloud3
    cloud3:T -- B:cmon1

Attention

ClusterControl nodes must be able to communicate with each other on the following ports:

  • tcp/9500 - CMON RPC
  • tcp/9501 - CMON RPC (TLS)
  • tcp/3306 - MySQL/MariaDB Galera Cluster - database connections
  • tcp/4567 - MySQL/MariaDB Galera Cluster - gcomm
  • tcp/4568 - MySQL/MariaDB Galera Cluster - IST
  • tcp/4444 - MySQL/MariaDB Galera Cluster - SST
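Before deploying, it can save time to verify that each node can reach its peers on these ports. The probe below is a minimal sketch in plain Python (a hypothetical helper, not part of ClusterControl); the peer IP in the usage comment is illustrative:

```python
import socket

# Ports required between CMON HA nodes (from the list above).
CMON_HA_PORTS = (9500, 9501, 3306, 4567, 4568, 4444)

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage against a peer controller:
# for p in CMON_HA_PORTS:
#     print(p, "open" if port_open("100.18.98.75", p) else "CLOSED")
```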

The following steps are based on Rocky Linux 8 64bit. We expect similar steps for other similar RHEL-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.

  1. Prepare the host and download the installer script:

    dnf install -y wget vim epel-release curl net-tools sysstat mlocate
    wget https://severalnines.com/downloads/cmon/install-cc
    chmod 755 install-cc
    
  2. Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):

    # On cmon1:
    HOST=100.18.98.75 \
    S9S_CMON_PASSWORD='q1w2e3!@#' \
    S9S_ROOT_PASSWORD='q1w2e3!@#' \
    S9S_DB_PORT=3306 \
    ./install-cc

    # On cmon2:
    HOST=35.208.13.166 \
    S9S_CMON_PASSWORD='q1w2e3!@#' \
    S9S_ROOT_PASSWORD='q1w2e3!@#' \
    S9S_DB_PORT=3306 \
    ./install-cc

    # On cmon3:
    HOST=202.131.17.88 \
    S9S_CMON_PASSWORD='q1w2e3!@#' \
    S9S_ROOT_PASSWORD='q1w2e3!@#' \
    S9S_DB_PORT=3306 \
    ./install-cc
    

    Note

    S9S_ROOT_PASSWORD is the root password of the MySQL server that hosts the cmon database, while S9S_CMON_PASSWORD is the cmon database user password.

  3. Once the installation completes on all nodes, we have to stop the cmon service for CMON HA preparation:

    systemctl stop cmon
    
  4. Temporarily comment out the cmon cron job to make sure cmon is not started automatically (we will enable it again later):

    sed -i '/^\*.*pidof/s/^/# /' /etc/cron.d/cmon
    
  5. Run the following command to set the ClusterControl Controller service to listen to all IP addresses:

    echo 'RPC_BIND_ADDRESSES="0.0.0.0"' | sudo tee -a /etc/default/cmon
    
  6. Stop the MariaDB service:

    systemctl stop mariadb
    
  7. Convert the default MariaDB 10.3 installed by the installer script to MariaDB Galera Cluster:

    dnf install -y mariadb-server-galera mariadb-backup
    

    Attention

    This command will install the necessary packages to run a Galera Cluster like the Galera replication library, some dependencies and also backup/restore tools.

  8. Copy the cmon_password value from /etc/s9s.conf on cmon1 to the same file on all other nodes (we will use cmon1 as the reference node):

    cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"
    
  9. Set the following lines inside /etc/my.cnf under the [mysqld] section:

    wsrep_on               = ON
    wsrep_node_address     = 100.18.98.75   # cmon1 primary IP address
    wsrep_provider         = '/usr/lib64/galera/libgalera_smm.so'
    wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
    wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
    wsrep_cluster_name     = 'CMON_HA_Galera'
    wsrep_sst_method       = rsync
    binlog_format          = 'ROW'
    
    wsrep_on               = ON
    wsrep_node_address     = 35.208.13.166   # cmon2 primary IP address
    wsrep_provider         = '/usr/lib64/galera/libgalera_smm.so'
    wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
    wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
    wsrep_cluster_name     = 'CMON_HA_Galera'
    wsrep_sst_method       = rsync
    binlog_format          = 'ROW'
    
    wsrep_on               = ON
    wsrep_node_address     = 202.131.17.88   # cmon3 primary IP address
    wsrep_provider         = '/usr/lib64/galera/libgalera_smm.so'
    wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
    wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
    wsrep_cluster_name     = 'CMON_HA_Galera'
    wsrep_sst_method       = rsync
    binlog_format          = 'ROW'
    
  10. Bootstrap the Galera cluster on the first node only, cmon1:

    galera_new_cluster
    
  11. On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (full syncing) from cmon1 and start the MariaDB Galera service (one node at a time):

    rm -f /var/lib/mysql/grastate.dat
    systemctl start mariadb
    
  12. Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:

    $ mysql -uroot -p -e "show status like 'wsrep%'"
    ...
    | wsrep_cluster_size            | 3                                                 |
    | wsrep_cluster_status          | Primary                                           |
    | wsrep_ready                   | ON                                                |
    | wsrep_local_state_comment     | Synced                                            |
    ...
    

    Warning

    Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).
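If you script this verification, the three checks above can be automated. A minimal sketch in plain Python (hypothetical helpers, not part of ClusterControl), assuming tab-separated output such as `mysql -N -B -e "show status like 'wsrep%'"` produces:

```python
def parse_wsrep_status(text):
    """Parse tab-separated `Variable_name<TAB>Value` lines into a dict."""
    status = {}
    for line in text.strip().splitlines():
        parts = line.split("\t", 1)
        if len(parts) == 2:
            status[parts[0]] = parts[1]
    return status

def galera_healthy(status, expected_size=3):
    """True if the node is Synced in a Primary component of the expected size."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_ready") == "ON"
        and status.get("wsrep_local_state_comment") == "Synced"
        and int(status.get("wsrep_cluster_size", 0)) == expected_size
    )
```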

  13. Now we are ready to start the cmon service, re-enable the cmon cron job, and activate CMON HA on the first node (only proceed to the next node after all commands have succeeded):

    # On cmon1 (first node):
    systemctl start cmon
    sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
    s9s controller --enable-cmon-ha

    # On cmon2:
    systemctl start cmon
    sed -i '/pidof/ s/# *//' /etc/cron.d/cmon

    # On cmon3:
    systemctl start cmon
    sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
    
  14. Verify that CMON HA can see all nodes in the cluster:

    $ s9s controller --list --long
    S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
    l 1.9.6.6408 system admins 100.18.98.75   100.18.98.75  9501 CmonHA just become enabled, starting as leader.
    f 1.9.6.6408 system admins 35.208.13.166  35.208.13.166 9501 Responding to heartbeats.
    f 1.9.6.6408 system admins 202.131.17.88  202.131.17.88 9501 Responding to heartbeats.
    

    The leftmost column indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.

  15. Open the ClusterControl GUI of the leader node in a web browser by going to https://{leader_host_ip_address}/ and create a new admin user. In this example, the ClusterControl GUI URL is https://100.18.98.75/ (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See the User Guide.

The following steps are based on Ubuntu 22.04 LTS 64bit (Jammy Jellyfish). We expect similar steps for other similar Debian-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.

  1. Prepare the host and download the installer script:

    apt install -y wget vim curl net-tools sysstat mlocate
    wget https://severalnines.com/downloads/cmon/install-cc
    chmod 755 install-cc
    
  2. Before running the installer script, we have to modify it to install MariaDB server/client instead (otherwise, the installer script will default to MySQL 8.0 installation available in the repository):

    sed -i 's/mysql-server/mariadb-server/g' install-cc
    sed -i 's/mysql-client/mariadb-client/g' install-cc
    
  3. Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):

    # On cmon1:
    HOST=100.18.98.75 \
    S9S_CMON_PASSWORD='q1w2e3!@#' \
    S9S_ROOT_PASSWORD='q1w2e3!@#' \
    S9S_DB_PORT=3306 \
    ./install-cc

    # On cmon2:
    HOST=35.208.13.166 \
    S9S_CMON_PASSWORD='q1w2e3!@#' \
    S9S_ROOT_PASSWORD='q1w2e3!@#' \
    S9S_DB_PORT=3306 \
    ./install-cc

    # On cmon3:
    HOST=202.131.17.88 \
    S9S_CMON_PASSWORD='q1w2e3!@#' \
    S9S_ROOT_PASSWORD='q1w2e3!@#' \
    S9S_DB_PORT=3306 \
    ./install-cc
    

    Note

    S9S_ROOT_PASSWORD is the root password of the MySQL server that hosts the cmon database, while S9S_CMON_PASSWORD is the cmon database user password.

    Attention

    After the modification on step 2, this script will install the necessary packages for MariaDB 10.6 where Galera Cluster is included.

  4. Once the installation completes on all nodes, we have to stop the cmon service for CMON HA preparation:

    systemctl stop cmon
    
  5. Temporarily comment out the cmon cron job to make sure cmon is not started automatically (we will enable it again later):

    sed -i '/^\*.*pidof/s/^/# /' /etc/cron.d/cmon
    
  6. Run the following command to set the ClusterControl Controller service to listen to all IP addresses:

    echo 'RPC_BIND_ADDRESSES="0.0.0.0"' | sudo tee -a /etc/default/cmon
    
  7. Stop the MariaDB service:

    systemctl stop mariadb
    
  8. Copy the cmon_password value from /etc/s9s.conf on cmon1 to the same file on all other nodes (we will use cmon1 as the reference node):

    cmon_password = "d30bc09b-ecd4-4812-8b56-15e6dd524dee"
    
  9. Set the following lines inside /etc/mysql/my.cnf under the [mysqld] section:

    wsrep_on               = ON
    wsrep_node_address     = 100.18.98.75   # cmon1 primary IP address
    wsrep_provider         = '/usr/lib/galera/libgalera_smm.so'
    wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
    wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
    wsrep_cluster_name     = 'CMON_HA_Galera'
    wsrep_sst_method       = rsync
    binlog_format          = 'ROW'
    
    wsrep_on               = ON
    wsrep_node_address     = 35.208.13.166   # cmon2 primary IP address
    wsrep_provider         = '/usr/lib/galera/libgalera_smm.so'
    wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
    wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
    wsrep_cluster_name     = 'CMON_HA_Galera'
    wsrep_sst_method       = rsync
    binlog_format          = 'ROW'
    
    wsrep_on               = ON
    wsrep_node_address     = 202.131.17.88   # cmon3 primary IP address
    wsrep_provider         = '/usr/lib/galera/libgalera_smm.so'
    wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
    wsrep_cluster_address  = gcomm://100.18.98.75,35.208.13.166,202.131.17.88   # All nodes' IP addresses
    wsrep_cluster_name     = 'CMON_HA_Galera'
    wsrep_sst_method       = rsync
    binlog_format          = 'ROW'
    
  10. Bootstrap the Galera cluster on the first node only, cmon1:

    galera_new_cluster
    
  11. On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (full syncing) from cmon1 and start the MariaDB Galera service (one node at a time):

    rm -f /var/lib/mysql/grastate.dat
    systemctl start mariadb
    
  12. Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:

    $ mysql -uroot -p -e "show status like 'wsrep%'"
    ...
    | wsrep_cluster_size            | 3                                                 |
    | wsrep_cluster_status          | Primary                                           |
    | wsrep_ready                   | ON                                                |
    | wsrep_local_state_comment     | Synced                                            |
    ...
    

    Warning

    Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).

  13. Now we are ready to start the cmon service, re-enable the cmon cron job, and activate CMON HA on the first node (only proceed to the next node after all commands have succeeded):

    # On cmon1 (first node):
    systemctl start cmon
    sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
    s9s controller --enable-cmon-ha

    # On cmon2:
    systemctl start cmon
    sed -i '/pidof/ s/# *//' /etc/cron.d/cmon

    # On cmon3:
    systemctl start cmon
    sed -i '/pidof/ s/# *//' /etc/cron.d/cmon
    
  14. Verify that CMON HA can see all nodes in the cluster:

    $ s9s controller --list --long
    S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
    l 1.9.6.6408 system admins 100.18.98.75   100.18.98.75  9501 CmonHA just become enabled, starting as leader.
    f 1.9.6.6408 system admins 35.208.13.166  35.208.13.166 9501 Responding to heartbeats.
    f 1.9.6.6408 system admins 202.131.17.88  202.131.17.88 9501 Responding to heartbeats.
    

    The leftmost column indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.

  15. Open the ClusterControl GUI of the leader node in a web browser by going to https://{leader_host_ip_address}/ and create a new admin user. In this example, the ClusterControl GUI URL is https://100.18.98.75/ (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See the User Guide.

Connecting via ClusterControl GUI

When running in CMON HA mode, only one ClusterControl Controller is active (leader), and the rest will be followers. If you are using ClusterControl GUI to access the controller, you will get an error if the corresponding host is not a leader.

The error means that the ClusterControl Controller on this host will not serve incoming requests from this particular GUI and returns a redirection warning instead. To determine which node is the leader, use the ClusterControl CLI as shown below:

$ s9s controller --list --long
S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75   100.18.98.75  9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166  35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88  202.131.17.88 9501 Responding to heartbeats.

The ClusterControl CLI is able to follow redirects, so you may execute the above command on any controller node's terminal as long as the node is in the same cluster.
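If you need the leader programmatically, it can be picked out of the CLI output by the `l` flag in the first column. A minimal sketch in plain Python (a hypothetical helper, not part of ClusterControl), assuming the column layout shown above:

```python
def find_leader(s9s_output):
    """Return the NAME column of the row flagged 'l' (leader), or None."""
    for line in s9s_output.strip().splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "l":
            return fields[4]  # NAME column
    return None
```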