Provides information and management options for database clusters managed by ClusterControl. Clicking Clusters in the sidebar menu lists all database clusters on the main panel; each entry shows a real-time summary of the cluster together with a dropdown management menu (see Cluster Actions).
Each entry will have the following details:
- Cluster name: Configurable during import or deployment, and also via the s9s command line (see the sketch after this list).
- Cluster ID: Immutable ID assigned by ClusterControl after it is registered.
- Cluster type: The database cluster type, recognized by ClusterControl. See Supported Databases.
- Nodes: All nodes grouped under this particular cluster. See Nodes.
- Auto recovery: The ClusterControl automatic recovery settings. See Automatic Recovery.
- Load: Last 5 minutes of cluster load average.
- …: Every supported database cluster has its own set of dropdown menus. See Cluster Actions.
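The same summary information can be retrieved from the command line. A minimal sketch using the s9s CLI, assuming it is installed and configured on the ClusterControl host (cluster ID 1 is an example):

```bash
# List all registered clusters with their ID, state, type, and name.
s9s cluster --list --long

# Print a detailed status summary for a single cluster.
s9s cluster --stat --cluster-id=1
```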
Choose a database cluster from the expandable list in the left-side menu to drill down into the cluster-specific features. When a cluster is selected, the navigation breadcrumbs (top panel) will reflect the corresponding cluster.
Cluster Actions
Provides shortcuts to the main cluster functionality. Each database cluster type has its own set of actions, as described below:
- MySQL/MariaDB (replication/standalone)
- Galera Cluster
- PostgreSQL/TimescaleDB (streaming replication/standalone)
- MongoDB
- Redis
- Microsoft SQL Server
- Elasticsearch
MySQL/MariaDB (replication/standalone)
Applies to MySQL, Percona Server, and MariaDB Server.

| Feature | Description |
| --- | --- |
| Schedule maintenance | |
| Add replication node | |
| Add load balancer | |
| Change RPC API token | |
| Disable/Enable cluster recovery | |
| Disable/Enable node recovery | |
| Enable readonly | |
| Remove cluster | |
| Backup Settings | |
| View Nodes | |
Galera Cluster
Applies to MySQL, Percona XtraDB Cluster, and MariaDB (Galera Cluster).

| Feature | Description |
| --- | --- |
| Schedule maintenance | |
| Add node | |
| Add replication node | |
| Add load balancer | |
| Change RPC API token | |
| Disable/Enable cluster recovery | |
| Disable/Enable node recovery | |
| Enable readonly | |
| Rolling restart | |
| Clone cluster | |
| Find most advanced node | |
| Create replica cluster | |
| Bootstrap cluster | |
| Remove cluster | |
| Stop Cluster | |
| Backup Settings | |
| View Nodes | |
PostgreSQL/TimescaleDB (streaming replication/standalone)
Applies to PostgreSQL and TimescaleDB.

| Feature | Description |
| --- | --- |
| Schedule maintenance | |
| Add replication node | |
| Add load balancer | |
| Change RPC API token | |
| Disable/Enable cluster recovery | |
| Disable/Enable node recovery | |
| Configure WAL | |
| Create replica cluster | |
| Enable TimescaleDB | |
| Install PgBackRest | |
| Reconfigure PgBackRest | |
| Uninstall PgBackRest | |
| Remove cluster | |
| Backup Settings | |
| View Nodes | |
MongoDB
Applies to MongoDB Replica Set and MongoDB Sharded Cluster.

| Feature | Description |
| --- | --- |
| Schedule maintenance | |
| Add node | |
| Convert to shard | |
| Change RPC API token | |
| Disable/Enable cluster recovery | |
| Disable/Enable node recovery | |
| Install Percona Backup | |
| Remove cluster | |
| Backup Settings | |
| View Nodes | |
Redis
Applies to Redis Standalone and Redis Sentinel.

| Feature | Description |
| --- | --- |
| Schedule maintenance | |
| Add node | |
| Change RPC API token | |
| Disable/Enable cluster recovery | |
| Disable/Enable node recovery | |
| Remove cluster | |
| Backup Settings | |
| View Nodes | |
Microsoft SQL Server
Applies to Microsoft SQL Server 2019.

| Feature | Description |
| --- | --- |
| Schedule maintenance | |
| Disable/Enable cluster recovery | |
| Disable/Enable node recovery | |
| Remove cluster | |
| Backup Settings | |
| View Nodes | |
Elasticsearch
| Feature | Description |
| --- | --- |
| Schedule maintenance | |
| Add Node | |
| Change RPC API token | |
| Disable/Enable cluster recovery | |
| Disable/Enable node recovery | |
| Remove cluster | |
| Backup Settings | |
| View Nodes | |
Automatic Recovery
ClusterControl is programmed with a number of recovery algorithms to automatically respond to different types of common failures affecting your database systems. It understands different database topologies and database-related process management, which helps determine the best way to recover the cluster. Some topology managers only cover cluster recovery and leave node recovery to you; ClusterControl supports recovery at both the cluster and node levels.
There are two recovery components supported by ClusterControl:
- Cluster – Attempts to recover a cluster to an operational state. Cluster recovery covers recovery attempts to bring up the entire cluster topology. See Cluster Recovery.
- Node – Attempts to recover a node to an operational state. Node recovery covers issues such as a node being stopped outside of ClusterControl's knowledge, e.g., via a stop command issued from the SSH console, or a process killed by the OOM killer. See Node Recovery.
Together, these two components ensure that service availability is as high as possible.
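Both recovery levels map to cmon configuration parameters. A minimal verification sketch, assuming the conventional /etc/cmon.d/cmon_<cluster-id>.cnf layout and cluster ID 1:

```bash
# Check whether cluster- and node-level automatic recovery are enabled
# for cluster ID 1 (1 = enabled, 0 = disabled).
grep -E '^enable_(cluster|node)_autorecovery' /etc/cmon.d/cmon_1.cnf
```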
Node Recovery
ClusterControl is able to recover a database node in case of intermittent failure by monitoring the process and connectivity to the database nodes. For the process, it works similarly to systemd: it makes sure the MySQL service is started and running unless you intentionally stopped it via the ClusterControl UI.
If the node comes back online, ClusterControl will establish a connection back to the database node and will perform the necessary actions. The following is what ClusterControl would do to recover a node:
- It will wait 30 seconds for systemd/chkconfig/init to start up the monitored services/processes.
- If the monitored services/processes are still down, ClusterControl will try to start the database service automatically.
- If ClusterControl is unable to recover the monitored services/processes, an alarm will be raised.
If a database shutdown is initiated by the user via ClusterControl, ClusterControl will not attempt to recover the particular node at a later stage. It expects the user to start it again via the ClusterControl UI (Start Node) or explicitly by using an OS command.
The recovery includes all database-related services like ProxySQL, HAProxy, MaxScale, Keepalived, PgBouncer, Prometheus exporters, and garbd. Special attention goes to Prometheus exporters, where ClusterControl uses a program called `daemon` to daemonize the exporter process. ClusterControl will try to connect to the exporter's listening port for health checks and verification. Thus, it is recommended to open the exporter ports from the ClusterControl and Prometheus servers to make sure there are no false alarms during recovery.
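Since the exporter health check depends on network reachability, a quick check from the ClusterControl host can rule out false alarms. A hedged sketch (the hostname is a placeholder, and 9100 is the default node_exporter port; adjust per exporter):

```bash
# Verify the exporter's listening port is reachable from the ClusterControl host.
nc -zv db1.example.com 9100

# Optionally confirm the exporter is actually serving metrics.
curl -s http://db1.example.com:9100/metrics | head -n 3
```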
Cluster Recovery
ClusterControl understands the database topology and follows best practices in performing the recovery. For a database cluster that comes with built-in fault tolerance like Galera Cluster, NDB Cluster, and MongoDB Replica Set, the failover process will be performed automatically by the database server via quorum calculation, heartbeat, and role switching (if any). ClusterControl monitors the process and makes the necessary adjustments to the visualization, like reflecting the changes under the Topology view and adjusting the monitoring and management components for the new role, e.g., a new primary node in a replica set.
For database technologies that do not have built-in fault tolerance with automatic recovery, like MySQL/MariaDB Replication and PostgreSQL/TimescaleDB Streaming Replication (see further down), ClusterControl will perform the recovery procedures by following the best practices provided by the database vendor. If the recovery fails, user intervention is required, and you will get an alarm notification about it.
In a mixed/hybrid topology, for example, an asynchronous replica that is attached to a Galera Cluster or NDB Cluster, the node will be recovered by ClusterControl if cluster recovery is enabled.
Cluster recovery does not apply to standalone MySQL servers. However, it’s recommended to turn on both node and cluster recoveries for this cluster type in the ClusterControl UI.
MySQL/MariaDB Replication
ClusterControl supports recovery of the following MySQL/MariaDB replication setup:
- Primary-replica with MySQL GTID
- Primary-replica with MariaDB GTID
- Primary-primary with MySQL GTID
- Primary-primary with MariaDB GTID
- Asynchronous replica attached to a Galera Cluster
ClusterControl will respect the following parameters when performing cluster recovery:
- `enable_cluster_autorecovery`
- `auto_manage_readonly`
- `repl_password`
- `repl_user`
- `replication_auto_rebuild_slave`
- `replication_check_binlog_filtration_bf_failover`
- `replication_check_external_bf_failover`
- `replication_failed_reslave_failover_script`
- `replication_failover_blacklist`
- `replication_failover_events`
- `replication_failover_wait_to_apply_timeout`
- `replication_failover_whitelist`
- `replication_onfail_failover_script`
- `replication_post_failover_script`
- `replication_post_switchover_script`
- `replication_post_unsuccessful_failover_script`
- `replication_pre_failover_script`
- `replication_pre_switchover_script`
- `replication_skip_apply_missing_txs`
- `replication_stop_on_error`
For more details on each of the parameters, refer to the documentation page.
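As a hedged illustration only (the parameter names are the cmon settings listed above, but the values and the file path are examples), these parameters live in the cluster's cmon configuration file:

```bash
# Inspect the failover-related settings for cluster ID 1.
grep -E '^(enable_cluster_autorecovery|replication_)' /etc/cmon.d/cmon_1.cnf

# Example output (illustrative values only):
# enable_cluster_autorecovery=1
# replication_auto_rebuild_slave=1
# replication_failover_whitelist=10.0.0.11;10.0.0.12
# replication_stop_on_error=1
```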
ClusterControl will obey the following rules when monitoring and managing a primary-replica replication setup (a quick audit sketch follows the list):
- All nodes will be started with `read_only=ON` and `super_read_only=ON` (regardless of their role).
- Only one primary (`read_only=OFF`) is allowed to operate at any given time.
- ClusterControl relies on the MySQL variable `report_host` to map the topology.
- If two or more nodes have `read_only=OFF` at the same time, ClusterControl will automatically set `read_only=ON` on all of them, to protect them against accidental writes. User intervention is required to pick the actual primary by disabling read-only.
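To audit these flags on a node yourself, a minimal sketch (hostname and credentials are placeholders):

```bash
# Check the node's read-only flags and the report_host used for topology mapping.
mysql -h db1.example.com -uroot -p -e \
  "SELECT @@read_only, @@super_read_only, @@report_host;"
```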
In case the active primary goes down, ClusterControl will attempt to perform the primary failover in the following order:
- After 3 seconds of primary unreachability, ClusterControl will raise an alarm.
- Check the replica availability; at least one of the replicas must be reachable by ClusterControl.
- Pick a replica as the candidate for the new primary.
- If GTID is enabled, ClusterControl will calculate the probability of errant transactions (see the sketch after this list).
- If no errant transaction is detected, the chosen replica will be promoted to the new primary.
- Create and grant the replication user to be used by the replicas.
- Change the primary for all replicas that were pointing to the old primary to the newly promoted primary.
- Start the replicas and enable read-only.
- Flush logs on all nodes.
If the replica promotion fails, ClusterControl will abort the recovery job. User intervention or a cmon service restart is required to trigger the recovery job again.
When the old primary is available again, it will be started as read-only and will not be part of the replication. User intervention is required.
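For MySQL GTID, the errant-transaction check can be approximated manually. A hedged sketch (the GTID sets below are placeholders you would copy from `SELECT @@GLOBAL.gtid_executed` on each node):

```bash
# GTIDs executed on the candidate replica but absent from the primary's set
# are errant transactions; an empty result means the replica is safe to promote.
mysql -uroot -p -e \
  "SELECT GTID_SUBTRACT('<replica_gtid_executed>', '<primary_gtid_executed>') AS errant_transactions;"
```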
PostgreSQL/TimescaleDB Streaming Replication
ClusterControl supports recovery of the following PostgreSQL replication setup:
- PostgreSQL Streaming Replication
- TimescaleDB Streaming Replication
ClusterControl will respect the following parameters when performing cluster recovery:
- `enable_cluster_autorecovery`
- `repl_password`
- `repl_user`
- `replication_auto_rebuild_slave`
- `replication_failover_whitelist`
- `replication_failover_blacklist`
For more details on each of the parameters, refer to the documentation page.
ClusterControl will obey the following rules when managing and monitoring a PostgreSQL streaming replication setup (a verification sketch follows the list):
- `wal_level` is set to `replica` (or `hot_standby`, depending on the PostgreSQL version).
- The parameter `archive_mode` is set to `ON` on the primary.
- A `recovery.conf` file is set on the replica nodes, which turns the node into a hot standby with read-only enabled.
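A minimal sketch to confirm the first two settings on a running primary (connection details are placeholders):

```bash
# Show the WAL level and archive mode ClusterControl expects on the primary.
psql -h primary.example.com -U postgres -c \
  "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level', 'archive_mode');"
```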
In case the active primary goes down, ClusterControl will attempt to perform the cluster recovery in the following order:
- After 10 seconds of primary unreachability, ClusterControl will raise an alarm.
- After a 10-second graceful waiting timeout, ClusterControl will initiate the primary failover job.
- Sample the `replayLocation` and `receiveLocation` on all available nodes to determine the most advanced node (see the sketch after this list).
- Promote the most advanced node as the new primary.
- Stop the replicas.
- Verify the synchronization state with `pg_rewind`.
- Restart the replicas with the new primary.
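The receive/replay positions correspond to standard PostgreSQL functions (the names below are for PostgreSQL 10 and later; the hostname is a placeholder). A hedged sketch of sampling them on a standby:

```bash
# Run on each standby; the node with the highest positions is the most advanced.
psql -h replica1.example.com -U postgres -c \
  "SELECT pg_last_wal_receive_lsn() AS receive_location,
          pg_last_wal_replay_lsn() AS replay_location;"
```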
If the replica promotion fails, ClusterControl will abort the recovery job. User intervention or a cmon service restart is required to trigger the recovery job again.
When the old primary is available again, it will be forced to shut down and will not be part of the replication. User intervention is required. See further down.
When the old primary comes back online and the PostgreSQL service is running, ClusterControl will force a shutdown of the PostgreSQL service. This protects the server from accidental writes, since it would have been started without a recovery file (`recovery.conf`) and would therefore be writable. You should expect lines like the following to appear in `postgresql-{day}.log`:
```
2019-11-27 05:06:10.091 UTC [2392] LOG: database system is ready to accept connections
2019-11-27 05:06:27.696 UTC [2392] LOG: received fast shutdown request
2019-11-27 05:06:27.700 UTC [2392] LOG: aborting any active transactions
2019-11-27 05:06:27.703 UTC [2766] FATAL: terminating connection due to administrator command
2019-11-27 05:06:27.704 UTC [2758] FATAL: terminating connection due to administrator command
2019-11-27 05:06:27.709 UTC [2392] LOG: background worker "logical replication launcher" (PID 2419) exited with exit code 1
2019-11-27 05:06:27.709 UTC [2414] LOG: shutting down
2019-11-27 05:06:27.735 UTC [2392] LOG: database system is shut down
```
PostgreSQL was started after the server came back online at around 05:06:10, but ClusterControl performed a fast shutdown about 17 seconds later, at around 05:06:27. If this is not what you want, you can momentarily disable node recovery for this cluster.
Add Node
Adds a new node by creating it or importing an existing one into the running database cluster. At the moment, this feature is available for the following cluster types:
- Galera Cluster – You may add a new node, or import an existing database node into the cluster.
- Redis – Only adding a new Redis replica is supported.
- Elasticsearch – Only adding a new Elasticsearch master, data, master-data or coordinator node is supported.
Galera Cluster
Applies to Percona XtraDB Cluster and MariaDB (Galera Cluster). Adds a new or existing database node into the cluster. You can scale out your cluster by adding more database nodes. The new node will automatically join and synchronize with the rest of the cluster.
Create a database node
If you specify a new hostname or IP address, make sure that the node is accessible from the ClusterControl node via passwordless SSH. See Passwordless SSH.
| Field | Description |
| --- | --- |
| **Node Configuration** | |
| Data directory | |
| Galera segment | |
| Configuration template | |
| Install software | |
| Disable firewall | |
| Disable SELinux/AppArmor | |
| **Advanced Settings** | |
| Rebuild from a backup | |
| Include in LoadBalancer set (if exists) | |
| **Add Node** | |
| Node | |
Import a database node
Imports an existing replication node into ClusterControl. Use this feature if you have added a replica manually to your cluster and want it to be detected/managed by ClusterControl. ClusterControl will detect the new database node as part of the cluster and start managing and monitoring it along with the rest of the cluster nodes. This is useful if a replica node has been configured outside ClusterControl, e.g., through Puppet, Ansible, or manually.
| Field | Description |
| --- | --- |
| **Node Configuration** | |
| Port | |
| Include in LoadBalancer set (if exists) | |
| **Add Node** | |
| Node | |
Redis
Adds a new Redis replica node to join the cluster. If you specify a new hostname or IP address, make sure that the node is accessible from the ClusterControl node via passwordless SSH. See Passwordless SSH.
| Field | Description |
| --- | --- |
| **Node configuration** | |
| Port | |
| Redis sentinel port | |
| Install software | |
| Disable firewall | |
| Disable SELinux/AppArmor | |
| **Add Node** | |
| Node | |
Elasticsearch
Adds a new Elasticsearch master, data, master-data or coordinator node to join the selected cluster. If you specify a new hostname or IP address, make sure that the node is accessible from the ClusterControl node via passwordless SSH. See Passwordless SSH.
| Field | Description |
| --- | --- |
| **Node configuration** | |
| Port | |
| Install software | |
| Disable firewall | |
| Disable SELinux/AppArmor | |
| **Add Node** | |
| Node | |
Add Replication Node
MySQL/MariaDB Replication/Galera Cluster
Adding a replication node requires at least one existing node already configured with binary logging and GTID enabled. This is also true for Galera Cluster. The following must be true for the primaries (a verification sketch follows the list):
- At least one primary among the Galera nodes.
- MySQL/MariaDB GTID must be enabled.
- `log_slave_updates` must be enabled.
- The primary's MySQL port is accessible by ClusterControl and the replicas.
- To enable binary logs for Galera Cluster, go to ClusterControl → Nodes → choose the database server → Enable Binary Logging.
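A quick hedged way to confirm the primary's prerequisites for the MySQL flavor (for MariaDB the GTID variables differ; hostname and credentials are placeholders):

```bash
# Binary logging, GTID, and log_slave_updates must all be enabled on the primary.
mysql -h primary.example.com -uroot -p -e \
  "SELECT @@log_bin, @@gtid_mode, @@log_slave_updates;"
```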
For the replica, you will need a separate host or VM, with or without MySQL installed. If you do not have MySQL installed and choose to have ClusterControl install MySQL on the replica host, ClusterControl will perform the necessary actions to prepare the replica. This includes configuring the root password, creating the replication user, configuring MySQL, starting the service, and starting the replication link. The MySQL or MariaDB packages will be based on the chosen vendor; for example, if you are running a Percona XtraDB Cluster, ClusterControl will prepare the replica using Percona Server. Prior to the deployment, you must perform the following actions:
- The replica node must be accessible using passwordless SSH from the ClusterControl server.
- The MySQL port (default 3306) and netcat port 9999 on the replica host must be open for connections.
Create a Replication Node
Applies to MySQL, Percona Server, Percona XtraDB Cluster, and MariaDB (Server and Galera Cluster). The replica will be set up through backup streaming from the primary to the replica.
| Field | Description |
| --- | --- |
| **Node configuration** | |
| Netcat port | |
| Port | |
| Install software | |
| Disable firewall | |
| Disable SELinux/AppArmor | |
| **Advanced settings** | |
| Rebuild from a backup | |
| Include in LoadBalancer set (if exists) | |
| Delay the replica node | |
| Semi-synchronous replication | |
| **Add Node** | |
| Primary node | |
| Node | |
Import a Replication Node
Applies to MySQL, Percona Server, Percona XtraDB Cluster, and MariaDB (Server and Galera Cluster). Imports an existing replication node into ClusterControl. Use this feature if you have added a replica manually to your cluster and want it to be detected/managed by ClusterControl. ClusterControl will detect the new database node as part of the cluster and start managing and monitoring it along with the rest of the cluster nodes. This is useful if a replica node has been configured outside of ClusterControl, e.g., through Puppet, Ansible, or manually.
| Field | Description |
| --- | --- |
| **Node Configuration** | |
| Port | |
| Include in LoadBalancer set (if exists) | |
| **Add Node** | |
| Node | |
PostgreSQL/TimescaleDB
A PostgreSQL replication replica requires at least one primary node. The following must be true for the primary:
- At least one primary under the same cluster ID.
- Only PostgreSQL 9.6 and later are supported.
- Primary’s PostgreSQL port is accessible by ClusterControl and replica hosts.
For the replica, you will need a separate host or VM, with or without PostgreSQL installed. If you do not have PostgreSQL installed and choose to have ClusterControl install it on the host, ClusterControl will perform the necessary actions to prepare the replica, for example, create the replication user, configure PostgreSQL, start the server, and start the replication. Prior to the deployment, you must perform the following actions:
- The replica node must be accessible using passwordless SSH from the ClusterControl server.
- The PostgreSQL port (default 5432) on the replica must be open for connections for at least the ClusterControl server and the other members in the cluster.
To prepare the PostgreSQL configuration file for the replica, go to ClusterControl → Manage → Configurations → Template Configuration files. Later, specify this template file when adding a replica.
Create a Replication Node
Applies to PostgreSQL and TimescaleDB. The replica will be set up by streaming a pg_basebackup backup from the primary node to the replica node. The primary node's configuration will be altered to allow the replica node to join.
| Field | Description |
| --- | --- |
| **Node configuration** | |
| Port | |
| Use package default for datadir | |
| Install software | |
| **Advanced settings** | |
| Include in LoadBalancer set (if exists) | |
| Instance name | |
| Semi-synchronous replication | |
| **Add Node** | |
| Primary node | |
| Node | |
Import a Replication Node
Applies to PostgreSQL and TimescaleDB. Imports an existing replication node into ClusterControl. Use this feature if you have added a replica manually to your cluster and want it to be detected/managed by ClusterControl. ClusterControl will detect the new database node as part of the cluster and start managing and monitoring it along with the rest of the cluster nodes. This is useful if a replica node has been configured outside of ClusterControl, e.g., through Puppet, Ansible, or manually.
| Field | Description |
| --- | --- |
| **Node Configuration** | |
| Port | |
| Logfile path | |
| Include in LoadBalancer set (if exists) | |
| **Add Node** | |
| Node | |
Configure WAL
Applies to PostgreSQL and TimescaleDB. The WALs are the REDO logs in PostgreSQL. REDO logs contain all changes that were made in the database, and they are used for replication, recovery, online backup, and point-in-time recovery (PITR). Any changes that have not been applied to the data pages can be redone from the REDO logs.
This step is not mandatory, but it is extremely important for a robust replication setup, as it is necessary to avoid the main server recycling old WAL files that have not yet been applied to the replica. If this occurs, the replica has to be recreated from scratch. Enabling write-ahead logging (WAL) makes it possible to support online backup and point-in-time recovery in PostgreSQL.
| Feature | Description |
| --- | --- |
| Archive Mode | |
| Compressed WAL Archive | |
| Custom WAL Archive Directory | |
| Apply to | |
Changing the current value leads to the loss of the collected continuous WAL archive, and thus the loss of the time frame in which point-in-time recovery (PITR) is possible.
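A hedged sketch of verifying the resulting archiving setup on the primary (connection details are placeholders, and the exact archive_command ClusterControl configures may differ):

```bash
# Confirm archiving is active and see where WAL segments are copied.
psql -h primary.example.com -U postgres -c "SHOW archive_mode;"
psql -h primary.example.com -U postgres -c "SHOW archive_command;"
```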
Cluster-Cluster Replication
Applies to Percona XtraDB Cluster, MariaDB (Galera Cluster), PostgreSQL, and TimescaleDB. This feature allows you to create a new cluster that will be replicating from this cluster. One common use case is disaster recovery: a hot standby site/cluster that can take over when the main site/cluster has failed. Clusters can be rebuilt from an existing backup or by streaming from a primary on the source cluster.
For MySQL-based clusters, ClusterControl will configure asynchronous MySQL replication from a primary cluster to a replica cluster. For PostgreSQL-based clusters, ClusterControl will configure asynchronous streaming replication between a primary cluster and a replica cluster. For Galera Cluster, the asynchronous replication can optionally be set up as uni-directional (primary → replica) or bi-directional (primary ↔ replica) replication.
There are two ways ClusterControl can create a replica cluster:
- Streaming data from primary cluster – Stream the data from a primary using hot backup tools, e.g., Percona XtraBackup, MariaDB Backup, or pg_basebackup. You need to pick one node of the source cluster to replicate from.
- Stage cluster from a backup – Choose an existing full backup from the dropdown list. For MySQL or MariaDB, if none is listed, take a full backup of one of the database nodes in your cluster which has binary logging enabled.
Once the above options have been selected, the cluster deployment wizard will appear similar to deploying a new cluster. See Deploy Database Cluster.
A replica cluster will appear in the database cluster list after the deployment finishes. You will notice that the replica cluster entry has a green footer showing that it is a replica of another cluster. For bi-directional replication you will see a double-headed arrow (a single-headed arrow for uni-directional replication) with the name and ID of the cluster it is replicating from, indicating that the cluster-cluster replication is now active. You can also check the topology from the cluster information card (hover over the cluster name to make the card appear); you should see a topology view similar to the example below:
We highly recommend enabling cluster-wide read-only on the replica cluster. Disable read-only only when promoting the replica cluster as the new primary cluster.
Clone Cluster
Applies to Percona XtraDB Cluster and MariaDB (Galera Cluster). This feature allows you to create, in one click, an exact copy of your Galera Cluster onto a new set of hosts. The most common use case for cloning a deployment is setting up a staging deployment for further development and testing. Cloning is a 'hot' procedure and does not affect the operations of the source cluster.
A clone will be created for this cluster. The following procedure applies:
- Create a new cluster consisting of one node.
- Stage the new cluster with SST (it is now closed).
- Nodes will be added to the cloned cluster until the cloned cluster size is reached.
- Query Monitor settings and the settings for the Cluster Recovery and Node Recovery options are not cloned.
- The `my.cnf` file may not be identical on the cloned cluster.
| Field | Description |
| --- | --- |
| **Configuration** | |
| Cluster Name | |
| Repository | |
| Disable firewall | |
| Disable SELinux/AppArmor | |
| **Add Host** | |
| Node | |
Backup Settings
Manages the default backup settings for the corresponding cluster.
| Feature | Description |
| --- | --- |
| Default backup directory | |
| Default subdirectory | |
| Netcat port | |
| Enable hash check on created backup files | |
| Default backup retention period | |
| Default cloud backup retention period | |
Backup Subdirectory
| Variable | Description |
| --- | --- |
| B | The date and time when the backup creation began. |
| H | The name of the backup host (the host that created the backup). |
| i | The numerical ID of the cluster. |
| I | The numerical ID of the backup. |
| J | The numerical ID of the job that created the backup. |
| M | The backup method (e.g. "mysqldump", "pg_basebackup", "mongodump"). |
| O | The name of the user who initiated the backup job. |
| S | The name of the storage host (the host that stores the backup files). |
| % | The percent sign itself. Use two percent signs (`%%`); as with the standard printf() function, they are interpreted as a single percent sign. |
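For example, a subdirectory pattern of `BACKUP-%I` stores the backup with ID 42 under a subdirectory named `BACKUP-42`, while `%M-%I` would produce, e.g., `mysqldump-42` for a mysqldump backup.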