Redundancy & High Availability
ClusterControl can be deployed in a couple of different ways for redundancy and high availability:
- Secondary standby - Acts as a hot standby in case the primary ClusterControl host goes down.
- CMON HA - Build a cluster of ClusterControl controllers to achieve high availability.
- CMON Controller Pool - Introduced in ClusterControl 2.3.4, the CMON Controller Pool is a new scaling method. It enables a group of CMON controllers to work together, managing database clusters with dynamic, coordinated ownership.
Secondary standby
It is possible to have more than one ClusterControl server monitoring a single cluster. This is useful for multi-datacenter clusters, where a ClusterControl server on the remote site can monitor and manage the surviving nodes if the connection between sites goes down. However, the ClusterControl servers must be configured to work in active-passive mode to avoid race conditions when recovering failed nodes or clusters.
In active mode, the ClusterControl node acts as the primary controller and performs automatic recovery and management activities, so Cluster/Node Auto Recovery must be turned on there. On the secondary ClusterControl node, Cluster/Node Auto Recovery must be turned off.
Installing standby server
The steps described in this section must be performed on the secondary ClusterControl server.
- Install ClusterControl as explained in the Quickstart.
- Import the same cluster via ClusterControl GUI → Deploy a cluster → Import a database cluster. Make sure to toggle off Cluster auto-recovery and Node auto-recovery in the Node configuration section. Repeat this step if you want to import more than one cluster.
- Configure the cluster to follow similar settings to the primary ClusterControl (backup schedules, alerting configuration, user roles, etc.).
Nothing should be performed on the primary side. The primary ClusterControl server shall perform automatic recovery in case of node or cluster failure. Use the secondary ClusterControl server for monitoring purposes only. For management and recovery activities like rebuilding replication, resyncing a node, or backup and restore, perform those on the primary ClusterControl server.
Info
You don't need an additional ClusterControl license for multiple ClusterControl instances. You can apply the same license as your primary ClusterControl server onto the secondary server. The license is bound to the number of database/load balancer nodes it manages.
Activating the secondary standby
To make the standby server run in active mode, you must do the following:
- If the primary ClusterControl server is still alive, stop the primary ClusterControl controller services or shut down the server. To stop all ClusterControl processes, run the following command on the primary ClusterControl server:
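For example, on a systemd-based host, the ClusterControl services can be stopped like this (a sketch; the exact set of service names may vary between ClusterControl versions):
$ systemctl stop cmon cmon-ssh cmon-events cmon-cloud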
- Toggle on Cluster auto-recovery and Node auto-recovery on the secondary ClusterControl server.
At this point, the standby server has taken over the primary role and you can perform the management activities on the database nodes or clusters.
Attention
Do not let two or more ClusterControl instances perform automatic recovery on the same cluster at the same time.
CMON HA
ClusterControl CMON HA is an extension to ClusterControl's controller backend. It forms a cluster of ClusterControl nodes and ensures that there is always a node able to handle cluster management. In other words, CMON HA builds a cluster of ClusterControl controller services to achieve high availability.
Requirements
In order to deploy a cluster of ClusterControl nodes, the following requirements must be met:
- CMON HA is only available on ClusterControl v1.9.6 and later, with an Enterprise license.
- A minimum of 3 ClusterControl nodes is required.
- A multi-master MySQL/MariaDB solution, e.g., MySQL/MariaDB Galera Cluster, to host the cmon database on each ClusterControl Controller host.
- All ClusterControl Controllers (cmon) must be able to communicate and "see" each other (including the Galera Cluster). If they communicate over WAN, this can be achieved by using a site-to-site VPN (if ClusterControl controllers are configured with private IP addresses), point-to-point tunneling (e.g., WireGuard, AutoSSH tunneling, PPTP), cloud VPC peering, private connectivity (e.g., AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect), or direct public IP address communication over WAN. See the Deployment Steps section for ports to be allowed.
- None of the ClusterControl Controllers may manage or monitor any database cluster when CMON HA is activated. Otherwise, you must re-import those clusters into ClusterControl afterwards.
- The CMON HA feature must be enabled via the ClusterControl CLI; otherwise, ClusterControl will just act as a standalone controller.
Cluster operation
ClusterControl CMON HA is a high-availability extension to ClusterControl's controller backend. It uses the RAFT protocol for leader election and for keeping configuration files in sync. All the CMON HA nodes connect to the same database cluster for their cmon database, and this database cluster is not managed by CMON HA.
In RAFT, there is one leader at a given time and the rest of the nodes are the followers. The leader sends heartbeats to the followers. If a follower does not get a heartbeat in time, it sets itself into a candidate state and starts an election. The candidate always votes for itself, so to have a majority, at least half of the remaining nodes must vote for it. If the election is won, the node sets itself to leader status and starts to send heartbeats.
There are some important considerations one has to know when running CMON HA:
- Limited failure tolerance - The RAFT protocol tolerates only one node failure in a 3-node cluster. To tolerate two node failures, the cluster needs at least 5 nodes. For example, in a 3-node CMON HA cluster, if two nodes lose network connectivity, no node can obtain a majority in the election it requests.
- Extension to avoid split brain - There is protection against split brain in ClusterControl. If the heartbeat is not accepted by at least half of the followers, then a network partition is assumed and the leader recognized as being in the minority will step down to be a follower. At the same time, the followers being in the majority in another network partition will elect a new leader if the heartbeat is not received. If there is no majority in any of the network partitions, there will be no leader elected.
- Local cmon database connection - Every ClusterControl controller connects to the cmon database locally, using 127.0.0.1 as the connection host (see the mysql_hostname value inside /etc/cmon.cnf, illustrated after this list). Therefore, every ClusterControl node must run a MySQL/MariaDB database service and be a member of the Galera Cluster.
- Galera special behavior - Keep in mind that with a 3-node Galera Cluster (recommended for storing the cmon database), if none of the nodes can reach the others, the Galera Cluster only becomes operational again once all the nodes are recovered and the cluster is complete. As long as the Galera Cluster is not recovered, the CMON HA cluster cannot work properly either.
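As an illustration of the local connection described above, the relevant lines in /etc/cmon.cnf typically look like the following (the values shown are examples only):
mysql_hostname=127.0.0.1   # cmon connects to the local Galera member
mysql_port=3306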
As this is a new feature, there are a number of known limitations:
- No automatic CMON HA deployment support. Installation and deployment steps are shown below.
- The database cluster storing the cmon database is not, and must not be, managed by the CMON HA installation that uses it as its data storage backend.
- There is no way to force an election between CMON HA nodes.
- There are no priorities amongst the CMON HA nodes to make one or another the preferred leader.
- There is no way to manually choose a CMON HA node to be a leader.
- At the moment, the ClusterControl GUI has no automatic redirection to the leader. You will get a redirection error if you log in to a controller that is in the follower state. See Connecting via ClusterControl GUI.
Deployment steps
There can be many ways to install the CMON HA cluster. One could use another ClusterControl installation to deploy a Galera Cluster node and use it as the cmon database for the CMON HA cluster, or use the ClusterControl installer script and convert the MySQL/MariaDB installation into a Galera Cluster. The deployment steps explained in this article are based on the latter.
In this example, we will deploy a 3-node CMON HA cluster including a MariaDB Galera Cluster as the cmon database backend. Assume we have 3 hosts capable of communicating via public IP addresses:
- 100.18.98.75 - cmon1 (Site A)
- 35.208.13.166 - cmon2 (Site B)
- 202.131.17.88 - cmon3 (Site C)
The high-level architecture diagram will look like this:
architecture-beta
group dc1[Site A]
group dc2[Site B]
group dc3[Site C]
service cmon1(server)[cmon1] in dc1
service cmon2(server)[cmon2] in dc2
service cmon3(server)[cmon3] in dc3
service cloud1(cloud)
service cloud2(cloud)
service cloud3(cloud)
cmon1:R -- L:cloud1
cloud1:R -- L:cmon2
cmon2:B -- T:cloud2
cloud2:L -- R:cmon3
cmon3:L -- R:cloud3
cloud3:T -- B:cmon1
Attention
ClusterControl nodes must be able to communicate with each other on the following ports:
- tcp/9500 - CMON RPC
- tcp/9501 - CMON RPC (TLS)
- tcp/3306 - MySQL/MariaDB Galera Cluster - database connections
- tcp/4567 - MySQL/MariaDB Galera Cluster - gcomm
- tcp/4568 - MySQL/MariaDB Galera Cluster - IST
- tcp/4444 - MySQL/MariaDB Galera Cluster - SST
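For example, on RHEL-based hosts running firewalld, the ports above could be opened as follows (a sketch; adapt it to your own firewall tooling and security policy):
$ for port in 9500 9501 3306 4567 4568 4444; do firewall-cmd --permanent --add-port=${port}/tcp; done
$ firewall-cmd --reload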
The following steps are based on Rocky Linux 8 64bit. We expect the steps to be similar on other RHEL-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.
- Prepare the host and download the installer script:
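For example (the URL below is the standard installer location; verify it against the current Severalnines documentation):
$ wget https://severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc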
- Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):
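A sketch for cmon1, assuming the standard installer environment variables and substituting your own passwords and IP address:
$ S9S_CMON_PASSWORD='cmonP455' S9S_ROOT_PASSWORD='rootP455' HOST=100.18.98.75 ./install-cc   # placeholder passwords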
Note
S9S_ROOT_PASSWORD is the MySQL root user password where the cmon database is hosted, while S9S_CMON_PASSWORD is the cmon database user password.
- Once the installation completes on all nodes, we have to stop the cmon service for CMON HA preparation:
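For example, on a systemd-based host:
$ systemctl stop cmon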
- Comment out the cmon cron job temporarily to make sure cmon will not be started automatically (we will enable it back later):
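The location of the cmon cron entry can vary between installations; locate it first, then comment out the cmon line(s) manually. A sketch:
$ crontab -l | grep -i cmon       # check root's crontab
$ grep -ril cmon /etc/cron.d/     # check cron.d entries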
- Run the following command to set the ClusterControl Controller service to listen to all IP addresses:
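One common approach is to set the RPC bind address in the controller's defaults file so that it is picked up the next time cmon starts (a sketch; confirm the exact variable name against your ClusterControl version):
$ echo 'RPC_BIND_ADDRESSES="0.0.0.0"' >> /etc/default/cmon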
- Stop the MariaDB service:
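For example:
$ systemctl stop mariadb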
- Convert the default MariaDB 10.3 installed by the installer script to a MariaDB Galera Cluster:
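A sketch of one possible approach on Rocky Linux 8, assuming MariaDB 10.3 from the AppStream repository (package names may differ if you use another repository):
$ dnf install -y mariadb-server-galera   # pulls in the Galera replication library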
Attention
This command will install the packages necessary to run a Galera Cluster, such as the Galera replication library, its dependencies, and backup/restore tools.
- Copy the cmon_password value inside /etc/s9s.conf on cmon1 to all nodes (we will use cmon1 as the reference point).
- Set the following lines inside /etc/my.cnf under the [mysqld] directive:
On cmon1:
wsrep_on = ON
wsrep_node_address = 100.18.98.75 # cmon1 primary IP address
wsrep_provider = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
On cmon2:
wsrep_on = ON
wsrep_node_address = 35.208.13.166 # cmon2 primary IP address
wsrep_provider = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
On cmon3:
wsrep_on = ON
wsrep_node_address = 202.131.17.88 # cmon3 primary IP address
wsrep_provider = '/usr/lib64/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
- Bootstrap the Galera cluster on the first node only, cmon1:
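For example, using the bootstrap helper shipped with MariaDB's systemd integration:
$ galera_new_cluster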
- On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (full syncing) from cmon1 and start the MariaDB Galera service (one node at a time):
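For example, assuming the default MariaDB data directory:
$ rm -f /var/lib/mysql/grastate.dat
$ systemctl start mariadb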
- Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:
$ mysql -uroot -p -e "show status like 'wsrep%'"
...
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_ready | ON |
| wsrep_local_state_comment | Synced |
...
Warning
Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).
- Now we are ready to start the cmon service, re-enable the cmon cron, and activate CMON HA on the first node (only proceed to the next node after all commands are successful):
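A sketch of the sequence (re-enable the cron entry by reversing the change made earlier; the s9s call below is the CLI command that turns on CMON HA):
$ systemctl start cmon
$ s9s controller --enable-cmon-ha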
- Verify if CMON HA can see all nodes in the cluster:
$ s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75 100.18.98.75 9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166 35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88 202.131.17.88 9501 Responding to heartbeats.
The leftmost column indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.
- Open the ClusterControl GUI of the leader node in a web browser by going to https://{leader_host_ip_address}/ and create a new admin user. In this particular example, the ClusterControl GUI URL is https://100.18.98.75/ (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See User Guide.
The following steps are based on Ubuntu 22.04 LTS 64bit (Jammy Jellyfish). We expect the steps to be similar on other Debian-based OS distributions. Execute the commands on all 3 CMON HA nodes unless specified otherwise.
- Prepare the host and download the installer script:
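For example (the same installer location as on the RHEL-based setup; verify it against the current Severalnines documentation):
$ wget https://severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc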
- Before running the installer script, we have to modify it to install the MariaDB server/client instead (otherwise, the installer script will default to the MySQL 8.0 installation available in the repository).
- Perform ClusterControl installation using the install-cc script (we use a one-liner method and define the public IP address as the host value):
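A sketch for cmon1, assuming the standard installer environment variables and substituting your own passwords and IP address:
$ S9S_CMON_PASSWORD='cmonP455' S9S_ROOT_PASSWORD='rootP455' HOST=100.18.98.75 ./install-cc   # placeholder passwords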
Note
S9S_ROOT_PASSWORD is the MySQL root user password where the cmon database is hosted, while S9S_CMON_PASSWORD is the cmon database user password.
Attention
After the modification in step 2, this script will install the necessary packages for MariaDB 10.6, which includes Galera Cluster.
- Once the installation completes on all nodes, we have to stop the cmon service for CMON HA preparation:
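For example, on a systemd-based host:
$ systemctl stop cmon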
- Comment out the cmon cron job temporarily to make sure cmon will not be started automatically (we will enable it back later):
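As on the RHEL-based setup, locate the cmon cron entry first and comment out the cmon line(s) manually. A sketch:
$ crontab -l | grep -i cmon       # check root's crontab
$ grep -ril cmon /etc/cron.d/     # check cron.d entries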
- Run the following command to set the ClusterControl Controller service to listen to all IP addresses:
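One common approach is to set the RPC bind address in the controller's defaults file (a sketch; confirm the exact variable name against your ClusterControl version):
$ echo 'RPC_BIND_ADDRESSES="0.0.0.0"' >> /etc/default/cmon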
- Stop the MariaDB service:
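For example:
$ systemctl stop mariadb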
- Copy the cmon_password value inside /etc/s9s.conf on cmon1 to all nodes (we will use cmon1 as the reference node).
- Set the following lines inside /etc/my.cnf under the [mysqld] directive:
On cmon1:
wsrep_on = ON
wsrep_node_address = 100.18.98.75 # cmon1 primary IP address
wsrep_provider = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
On cmon2:
wsrep_on = ON
wsrep_node_address = 35.208.13.166 # cmon2 primary IP address
wsrep_provider = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
On cmon3:
wsrep_on = ON
wsrep_node_address = 202.131.17.88 # cmon3 primary IP address
wsrep_provider = '/usr/lib/galera/libgalera_smm.so'
wsrep_provider_options = 'gcache.size=1024M;gmcast.segment=0'
wsrep_cluster_address = gcomm://100.18.98.75,35.208.13.166,202.131.17.88 # All nodes' IP addresses
wsrep_cluster_name = 'CMON_HA_Galera'
wsrep_sst_method = rsync
binlog_format = 'ROW'
- Bootstrap the Galera cluster on the first node only, cmon1:
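For example, using the bootstrap helper shipped with MariaDB's systemd integration:
$ galera_new_cluster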
- On the remaining nodes (cmon2 and cmon3), remove the grastate.dat file to force an SST (full syncing) from cmon1 and start the MariaDB Galera service (one node at a time):
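For example, assuming the default MariaDB data directory:
$ rm -f /var/lib/mysql/grastate.dat
$ systemctl start mariadb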
- Verify the Galera Cluster is communicating correctly. On all nodes, you should see the following:
$ mysql -uroot -p -e "show status like 'wsrep%'"
...
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_ready | ON |
| wsrep_local_state_comment | Synced |
...
Warning
Do not proceed to the next step until you get the same output as above. The cluster status must be Primary and Synced, with the correct cluster size (total number of nodes in a cluster).
- Now we are ready to start the cmon service, re-enable the cmon cron, and activate CMON HA on the first node (only proceed to the next node after all commands are successful):
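A sketch of the sequence (re-enable the cron entry by reversing the change made earlier; the s9s call below is the CLI command that turns on CMON HA):
$ systemctl start cmon
$ s9s controller --enable-cmon-ha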
- Verify if CMON HA can see all nodes in the cluster:
$ s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75 100.18.98.75 9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166 35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88 202.131.17.88 9501 Responding to heartbeats.
The leftmost column indicates the controller role: l means leader and f means follower. In the above output, cmon1 is the leader.
- Open the ClusterControl GUI of the leader node in a web browser by going to https://{leader_host_ip_address}/ and create a new admin user. In this particular example, the ClusterControl GUI URL is https://100.18.98.75/ (the leader controller). After creating the admin user, you will be redirected to the ClusterControl dashboard, where you can start managing your database clusters. See User Guide.
Connecting via ClusterControl GUI
When running in CMON HA mode, only one ClusterControl Controller is active (leader), and the rest will be followers. If you are using ClusterControl GUI to access the controller, you will get an error if the corresponding host is not a leader.
The error means that the ClusterControl Controller on this host will not serve incoming requests from this particular GUI and returns a redirection warning instead. To determine which node is the leader, use the ClusterControl CLI as shown below:
$ s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.9.6.6408 system admins 100.18.98.75 100.18.98.75 9501 CmonHA just become enabled, starting as leader.
f 1.9.6.6408 system admins 35.208.13.166 35.208.13.166 9501 Responding to heartbeats.
f 1.9.6.6408 system admins 202.131.17.88 202.131.17.88 9501 Responding to heartbeats.
The ClusterControl CLI can follow redirects, so you may execute the above command in any controller node's terminal, as long as the node is part of the same cluster.
CMON Controller Pool
CMON Controller Pool is a new feature that enables multiple ClusterControl controllers (CMON services) to work together as a pool, sharing the workload of managing database clusters. Instead of relying on a single controller to manage all clusters, the controller pool distributes the cluster load across multiple controller instances, improving performance and scalability and ensuring high availability.
Note
This feature is currently in Technical Preview and is not meant for production use.
Enabling controller pool
- Go to ClusterControl GUI → Settings → Controller pool → Enable controller pool.
- A configuration wizard will pop up. Go to the Configuration section and specify an RPC port for the new controller. If left blank, the script in step 3 will auto-assign a random port.
- Execute the script built by the wizard:
Example
root@ds-cc-pool1:~# sudo "$(dirname "$(which cmon || echo /usr/sbin/cmon)")/controller_pool.sh"
[2025-10-27 08:21:44] Checking cmon.service status...
[2025-10-27 08:21:44] cmon.service is running. Verifying pool mode...
[2025-10-27 08:21:44] cmon.service is running WITHOUT --pool. Colocated controllers require the main service to use --pool to avoid conflicting behavior.
You can enable pool mode by appending to /etc/default/cmon:
EXTRA_OPTS=--pool
and then restarting the service:
sudo systemctl restart cmon.service
Do you want to enable --pool in cmon.service now? (y/N): y
[2025-10-27 08:21:55] Adding new EXTRA_OPTS line with --pool to /etc/default/cmon
[2025-10-27 08:21:55] Restarting cmon.service to apply changes...
[2025-10-27 08:22:07] cmon.service restarted with --pool. Proceeding...
[2025-10-27 08:22:07] Using RPC port: 9600
[2025-10-27 08:22:07] Using cmon binary: /usr/sbin/cmon
[2025-10-27 08:22:07] Starting colocated cmon controller...
[2025-10-27 08:22:07] Command: /usr/sbin/cmon --pool -d --rpc-port=9600 --bind-addr= --events-client=http://127.0.0.1:9510 --cloud-service=http://127.0.0.1:9518
[2025-10-27 08:22:07] Logging to /var/log/cmon_colocated_9600.log (use --stdout to disable redirection)
- Once the above script is executed, leave it running in the terminal; you will see a new CMON controller running on a different port (9600) listed in the Controller pool page.
- To remove a controller, simply go to Actions → Remove and ClusterControl GUI will send a signal to terminate the chosen CMON process.
Adding more controllers
You can add more controllers to the pool by clicking the Add controller button and following the deployment wizard. You will basically execute the same script as shown above multiple times (each in a different terminal), each with a different CMON RPC port.
In this example, we added the third CMON controller instance into the pool:
Example
root@ds-cc-pool1:~# sudo "$(dirname "$(which cmon || echo /usr/sbin/cmon)")/controller_pool.sh"
[2025-10-27 08:26:30] Checking cmon.service status...
[2025-10-27 08:26:30] cmon.service is running. Verifying pool mode...
[2025-10-27 08:26:30] cmon.service configured with --pool (EXTRA_OPTS). Proceeding...
[2025-10-27 08:26:30] Port 9600 is in use, trying next available port...
[2025-10-27 08:26:30] Using RPC port: 9700
[2025-10-27 08:26:30] Using cmon binary: /usr/sbin/cmon
[2025-10-27 08:26:30] Starting colocated cmon controller...
[2025-10-27 08:26:30] Command: /usr/sbin/cmon --pool -d --rpc-port=9700 --bind-addr= --events-client=http://127.0.0.1:9510 --cloud-service=http://127.0.0.1:9518
[2025-10-27 08:26:30] Logging to /var/log/cmon_colocated_9700.log (use --stdout to disable redirection)
A new controller will be added into the pool, running on a different port as shown in the following screenshot:
Take note that currently, all instances reside on the same ClusterControl host (co-located). However, future controller scalability features will enable multiple ClusterControl instances on different hosts to operate as a unified entity, managing entire database clusters. This will distribute the load across all CMON processes and include automatic failover/takeover for cluster management.
To remove a controller, simply go to Actions → Remove and ClusterControl GUI will send a signal to terminate the chosen CMON process.