Provides detailed information for each node in the cluster. In the left-hand column, you will find a list of all nodes that are members of the cluster, including the ClusterControl node. If you added slaves, HAProxy, Keepalived, MaxScale, ProxySQL, or Garbd to your cluster through ClusterControl, these will also be listed.
Node Monitoring
To learn more about how ClusterControl monitors the hosts, see Monitoring Operations.
A node in the list appears in red to indicate that it is unhealthy. The tabs show performance and resource usage for the selected node; there are also database-specific tabs depending on the type of database running on the host.
Database node status indicator:
Status | Description |
---|---|
OK | This indicates the node is working fine. |
WARNING | Indicates the node is degraded and not performing as expected. |
MAINTENANCE | Indicates the node is under maintenance. |
PROBLEMATIC | Indicates the node is down or unreachable. |
Starting from ClusterControl 1.9.7 (September 2023), ClusterControl GUI v2 is the default frontend graphical user interface (GUI) for ClusterControl. Note that the GUI v1 is considered a feature-freeze product with no future development. All new developments will be happening on ClusterControl GUI v2. See User Guide (GUI v2).
Controller Node
Field | Description |
---|---|
Overview | |
Top | |
Database Nodes
Field | Description |
---|---|
Overview | |
Top | |
Logs | |
DB Performance | |
DB Status | |
DB Variables | |
HAProxy Nodes
Provides a detailed view of HAProxy statistics, similar to the HAProxy stats page. If HAProxy is deployed using ClusterControl, the HAProxy stats page is automatically created on port 9600. You can access it directly using the Admin User and Admin Password values specified during deployment at ClusterControl → Manage → Load Balancer → Install HAProxy → Show Advanced Settings.
Field | Description |
---|---|
Stats URL | |
Update | |
Enabled | |
ProxySQL Nodes
Provides a detailed view of ProxySQL stats. ClusterControl connects to the ProxySQL admin interface to retrieve the stats and visualize them here.
Monitor
Field | Description |
---|---|
ProxySQL Host Groups | |
ProxySQL Stats | |
Top Queries
Lists the queries digested by the ProxySQL instance. For each query, a context menu appears when you hover over its row. An example of inspecting the same digest data manually follows the table.
Field | Description |
---|---|
Clear Queries | |
Create Rule | |
Cache Query | |
Full Digests | |
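Under the hood, these figures come from ProxySQL's query digest statistics. As a minimal illustration (using standard ProxySQL admin tables; connect to the admin interface, typically on port 6032), you can inspect the same data manually:

mysql> SELECT digest_text, count_star, sum_time FROM stats_mysql_query_digest ORDER BY sum_time DESC LIMIT 10;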
Rules
Lists out all query rules created under this ProxySQL instance.
Field | Description |
---|---|
Add New Rule | Creates a new query rule. Details at ProxySQL MySQL query rules. |
Edit | Edit an existing query rule. This expands a dialog for you to fine-tune the query rule before applying it to ProxySQL. |
Delete | Delete an existing query rule. |
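For reference, creating a rule through Add New Rule is roughly equivalent to the following ProxySQL admin statements (a hedged sketch; the rule ID, match pattern, and destination hostgroup are illustrative values, not defaults):

mysql> INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES (100, 1, '^SELECT .* FOR UPDATE', 10, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
mysql> SAVE MYSQL QUERY RULES TO DISK;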
Servers
Lists out all backend servers created under this ProxySQL instance.
Field | Description |
---|---|
Add Server | |
Host Groups | |
ProxySQL Cluster | |
Users
Lists out all users created under this ProxySQL instance.
Field | Description |
---|---|
Import Users | |
Add New User | |
Edit | |
Drop User | |
Variables
Lists out all ProxySQL variables for this instance. You can filter the variables using the lookup field. Details at ProxySQL Global Variables.
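These variables map to ProxySQL's global_variables admin table. A minimal sketch of reading and changing one of them (the 2000 ms value is illustrative):

mysql> SELECT * FROM global_variables WHERE variable_name LIKE 'mysql-monitor%';
mysql> UPDATE global_variables SET variable_value='2000' WHERE variable_name='mysql-monitor_ping_interval';
mysql> LOAD MYSQL VARIABLES TO RUNTIME;
mysql> SAVE MYSQL VARIABLES TO DISK;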
Scheduler Scripts
Lists out scheduler scripts, commonly configured when running ProxySQL on top of a Galera Cluster. The scheduler is a cron-like implementation integrated inside ProxySQL with millisecond granularity. Details at ProxySQL Scheduler.
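For instance, a Galera health-check job registered in the scheduler table typically looks like the following (a sketch; the script path is the one commonly shipped with ProxySQL packages, and the 5000 ms interval is illustrative):

mysql> INSERT INTO scheduler (id, active, interval_ms, filename) VALUES (1, 1, 5000, '/usr/share/proxysql/tools/proxysql_galera_checker.sh');
mysql> LOAD SCHEDULER TO RUNTIME;
mysql> SAVE SCHEDULER TO DISK;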
Node Performance
Provides a summary of host information and statistics histograms, including CPU, disk, network, and memory usage.
Process List
Lists out the ProxySQL process list, similar to the output of SELECT * FROM stats_mysql_processlist. This can be useful for troubleshooting processes and for verifying in real time that queries are routed to the correct hostgroup.
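The same information is available directly from the ProxySQL admin interface, for example:

mysql> SELECT hostgroup, srv_host, user, db, command, time_ms FROM stats_mysql_processlist;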
Prometheus
Provides a detailed view of the Prometheus server that ClusterControl deploys and uses for agent-based monitoring.
Exporters
Lists out exporter jobs per host. A green exporter means the exporter is working correctly.
Settings
Shows the Prometheus settings.
Node Actions
SSH Console
Opens a web-based SSH terminal in a new browser window that allows executing shell commands on the server directly from the browser, as the configured os_user. This feature is only supported with Apache 2.4+ and a running cmon-ssh daemon. For more details on this component, see ClusterControl SSH.
Schedule Maintenance Mode
Puts individual nodes into maintenance mode which prevents ClusterControl from raising alarms and notifications during the maintenance period. When toggling ON, you can set the maintenance period for a pre-defined time or schedule it accordingly. Specify the reason for auditing purposes. ClusterControl will not degrade the node, hence the node’s state remains as it is unless you perform any maintenance on it.
Alarms and notifications for this node will be activated again once the maintenance period ends, or when you explicitly toggle it OFF.
If node auto-recovery is enabled, ClusterControl will always recover a node regardless of the maintenance mode status. Don’t forget to disable node auto-recovery to avoid ClusterControl interfering with your maintenance tasks.
Reboot Host
Initiates a system reboot of the selected host. Once initiated, ClusterControl will monitor the reboot progress every 5 seconds for 10 minutes (600 seconds) before declaring the reboot operation has failed.
Restart Node
Restarts the active monitored process of the selected host. For example, if the node’s role is HAProxy, ClusterControl will restart the HAProxy process. This is not a system reboot. Only available if the service is started.
You can configure the graceful shutdown timeout (default is 1800 seconds) in the Confirm Shutdown dialog; after this timeout, ClusterControl gives up waiting for the node to terminate gracefully. If the node is still running after the timeout, you can send the SIGKILL signal to force it down by toggling on the “Force stop (SIGKILL) node after the graceful shutdown timeout has been reached” option.
The node will be shut down and enter maintenance mode.
Stop Node
Stops the monitored process of the selected host. For example, if the node’s role is HAProxy, ClusterControl will stop the HAProxy process. This is not a system shutdown. Only available if the service is started.
You can configure the graceful shutdown timeout (default is 1800 seconds) in the Confirm Shutdown dialog; after this timeout, ClusterControl gives up waiting for the node to terminate gracefully. If the node is still running after the timeout, you can send the SIGKILL signal to force it down by toggling on the “Force stop (SIGKILL) node after the graceful shutdown timeout has been reached” option.
The node will be shut down and enter maintenance mode.
Unregister Node
Removes the database node from the database cluster and/or ClusterControl monitoring. You can choose one of these three options:
Field | Description |
---|---|
Keep the service running | |
Stop service and keep files untouched | |
Stop and uninstall service (all configuration files will be deleted) | |
Cluster-specific Node Actions
Some node management features are specific to a particular cluster type, as described in the following sections.
Galera Cluster
These are specific options available for Galera nodes:
Field | Description |
---|---|
Resync Node | |
Bootstrap Cluster | |
Rebuild Replication Slave | |
Start Node | |
Make Primary | |
Enable Binary Logging | |
Rebuilding Replication Slave will wipe out the selected node’s MySQL datadir.
MySQL Cluster
These are specific options available for MySQL Cluster nodes:
Field | Description |
---|---|
Shutdown Node | |
Restart Node | |
Start Node | |
MySQL Replication
These are specific options available for MySQL replication nodes:
Field | Description |
---|---|
Disable Readonly | |
Enable Readonly | |
Promote Slave | |
Start Slave | |
Stop Slave | |
Rebuild Replication Slave | |
Change Replication Master | |
Reset Slave | |
Reset Slave All | |
Rebuilding Replication Slave will wipe out the selected node’s MySQL datadir.
MySQL Standalone
These are specific options available for MySQL standalone nodes:
Field | Description |
---|---|
Enable Binary Logging | |
Disable Read Only | |
Enable Read Only | |
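Functionally, the read-only toggles correspond to statements like the following executed on the database node (shown for illustration; super_read_only is only available on MySQL 5.7.8 and later):

mysql> SET GLOBAL read_only = ON;
mysql> SET GLOBAL super_read_only = ON;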
ProxySQL
The following are specific options available for ProxySQL nodes:
- Sync Instances
- Synchronizes a ProxySQL configuration with other instances to keep them identical. You can perform syncing operations (export & import), export (backup), or import (restore) of ProxySQL configurations.
- For export (backup), the configuration data will be exported into several SQL dump files where applicable. The following configuration data will be exported:
- Query Rules
- Host Groups/Servers
- Users and corresponding MySQL users
- Global Variables
- Scheduler
proxysql.cnf
- For import (restore), the existing configuration will be overwritten.
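To verify that two instances really ended up identical after a sync, one quick check (assuming ProxySQL 1.4.2 or later, where configuration checksums are available) is to compare the runtime checksums on each instance:

mysql> SELECT name, version, checksum FROM runtime_checksums_values;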
Failover, Switchover, Topology Changes and Recovery
ClusterControl performs failover, switchover, and recovery procedures based on the cluster topology that the user has set up, since MySQL can run in a hybrid replication mode, e.g., a three-node Galera Cluster with two asynchronous replication slaves attached to it.
Galera Cluster Recovery
Node Recovery
In the Galera Cluster, all nodes are equal – each node holds the same role and the same dataset. Therefore, there is no failover within the cluster if a node fails; only the application side requires failover, to skip the unavailable nodes while the cluster is partitioned. For this reason, it is highly recommended to place load balancers on top of a Galera Cluster to:
- Unify the multiple database endpoints to a single endpoint (load balancer host or virtual IP address as the endpoint).
- Balance the database connections between the backend database servers.
- Perform health checks and only forward the database connections to healthy nodes.
- Redirect/rewrite/block offending (badly written) queries before they hit the database servers.
For the Galera Cluster, ClusterControl supports HAProxy, MariaDB MaxScale, and ProxySQL. ClusterControl also supports virtual IP address implementation through Keepalived. If having a load balancer is not an option, ensure your applications are aware of these topology changes and redirect the request to the healthy node accordingly. There are several MySQL connectors that come with built-in automatic failover like php-mysqlnd_ms for PHP and MySQL Connector/J for Java.
For node recovery: if the cluster loses a minority of its nodes at one time, the majority will most likely remain operational, thanks to Galera’s quorum calculation and group communication. When the problematic node comes back up, it re-establishes group communication with the operational nodes, and an automatic syncing operation takes place before the node is allowed to rejoin the cluster. In short, node recovery is handled automatically by Galera. ClusterControl still oversees this recovery process and notifies users of its status and progress; ClusterControl’s automatic node recovery only kicks in if Galera’s automatic recovery fails.
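You can observe this from the database side with standard Galera status counters, which show each node’s view of the cluster, for example:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';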
Cluster Recovery
A cluster is deemed failed if all nodes or the majority of the nodes go offline without a graceful shutdown. Offline in this context means they are not able to see each other through Galera’s replication traffic or group communication. Examples of cluster failure include power trips affecting all or the majority of the nodes, MySQL/MariaDB or Galera software crashes due to bugs, or shared-storage failures. If a total failure happens, bootstrapping is the only way to go.
In the case of a network glitch, Galera will always attempt to recover a partitioned cluster once the network issue is resolved. Galera will automatically re-establish the communication between members, exchange the nodes’ states, and determine whether the primary component can be re-formed by comparing node states, UUIDs, and seqnos. If re-forming is possible, Galera merges the partitions and the cluster resumes operation without any intervention. Otherwise, you have to promote at least one of the nodes to become a primary component, or re-bootstrap the cluster.
To re-bootstrap a cluster, pick one node (usually the most advanced one, identified by the highest wsrep_last_committed value) and go to Cluster Action → Bootstrap Cluster → Bootstrap Node. ClusterControl will then perform the cluster bootstrapping process from the chosen node. If you cannot determine which node is the most advanced, choose Auto-Select from the dropdown and ClusterControl will bootstrap from the most up-to-date node.
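To identify the most advanced node manually, compare this counter across all nodes and pick the highest value:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';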
Otherwise, you can run the following command on the most advanced node to promote it to a primary component:
mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=1';
To learn more about Galera Cluster recovery when network partitioning happens, check out this blog post, Galera Cluster Recovery 101 – A Deep Dive into Network Partitioning.
Asynchronous Cluster Recovery
On the other hand, it’s also possible to have asynchronous replication between two Galera Clusters. This should be handled differently and we have covered the failover and failback procedures in this blog post, Asynchronous Replication Between MySQL Galera Clusters – Failover and Failback.
MySQL Replication Master Failover
In MySQL Replication, the process of promoting a replica (slave) to become a master after the old master has failed is called failover. On the other hand, a switchover happens when the user triggers the promotion: a new master is promoted from a replica designated by the user, and the old master typically becomes a replica of the new master.
ClusterControl applies industry best practices to make sure that the failover process is performed correctly. It also ensures that the process will be safe – default settings are intended to abort the failover if possible issues are detected. Those settings can be overridden by the user should they want to prioritize failover over data safety. Take note that ClusterControl will perform automatic recovery only if auto-recovery is toggled on.
All of the configuration options mentioned in this section can be configured under /etc/cmon.d/cmon_X.cnf, where X is the cluster ID of the MySQL Replication cluster.
For more info on how to control the replication failover behavior performed by ClusterControl (whitelist/blacklist configuration, determine a good/bad master candidate, etc), check out this blog post, How to Control Replication Failover for MySQL and MariaDB. For a complete list of configuration options related to MySQL Replication, see ClusterControl Controller Configuration Options.
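As an illustration, a minimal sketch of such tuning in /etc/cmon.d/cmon_X.cnf, using two options from the controller configuration reference (the IP addresses are hypothetical; verify the option names against your ClusterControl version):

# Only these replicas may be promoted during failover, in order of preference
replication_failover_whitelist=10.0.0.11,10.0.0.12
# Never promote this (e.g., delayed or backup) replica
replication_failover_blacklist=10.0.0.13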
Reverse Proxies with Keepalived
To eliminate a single point of failure (SPOF) in the load balancer tier, a redundant reverse proxy is one of the ways to go. That’s why to deploy Keepalived using ClusterControl, you need two or more load balancers installed by or imported into ClusterControl. Keepalived will be used to tie load balancers together with a floating IP address in an active-passive mode, where the active node is the one that holds the virtual IP address at one given time.
For production usage, we highly recommend running the load balancer software on a standalone host, not co-located with your database nodes. Check out this blog post, How ClusterControl Configures Virtual IP and What to Expect During Failover, for more info on how ClusterControl deploys Keepalived and what happens during failover.