Manages ClusterControl integration modules. Starting from version 1.5.0, there are two modules available:
- 3rd Party Notifications via the clustercontrol-notifications package.
- Cloud Provider integration via the clustercontrol-cloud and clustercontrol-clud packages.
3rd Party Notifications
Configures third-party notifications on events triggered by ClusterControl, allowing you to integrate ClusterControl into your company’s communication channels, incident response systems, and workflow applications. To prevent users from receiving too many notifications, an integration can be limited to send out only specific critical alerts or warnings (see Warning Events and Critical Events).
Introducing the ClusterControl Alerting Integrations.
Supported services are:
Incident management services | Chat services | Others |
---|---|---|
PagerDuty | Slack | Webhook |
VictorOps | Telegram | |
OpsGenie | | |
ServiceNow | | |
Note that events are only created when either of the following occurs:
- Alarm created
- Alarm ended
Thus, if alarms are already raised when you create an integration (Slack, for example), you will never see events for those existing alarms. You will only see events for alarms created after the integration.
Field | Description |
---|---|
Add new integration |
|
Select Service |
|
Service Configuration |
|
Notification Configuration |
|
Edit |
|
Delete |
|
Event | Description |
---|---|
All Events | All ClusterControl events including warning and critical events. |
All Warning Events | All ClusterControl warning events, e.g. cluster degradation, network glitch. See Warning Events. |
All Critical Events | All ClusterControl critical events, e.g. cluster failed, host failed. See Critical Events. |
Network | Network-related events, e.g. host unreachable, SSH issues. |
CMON Database | Internal CMON database-related events, e.g. unable to connect to CMON database, datadir mounted as read-only. |
Mail | Mail system-related events, e.g. unable to send mail, mail server unreachable. |
Cluster | Cluster-related events, e.g. cluster failure, cluster degradation, time drifting. |
Cluster Configuration | Cluster configuration events, e.g. SST account mismatch. |
Cluster Recovery | Recovery events, e.g. cluster or node recovery failures. |
Node | Node-related events, e.g. node disconnected, missing GRANT, failed to start HAProxy, failed to start NDB cluster nodes. |
Host | Host-related messages, e.g. CPU/disk/RAM/swap exceeds thresholds, memory full. |
Database Health | Database health-related events, e.g. memory usage of MySQL servers, connections, missing primary key. |
Database Performance | Alarms for long-running transactions, replication lag, and deadlocks. |
Software Installation | Software installation-related events, e.g. license expiration. |
Backup | Backup-related events, e.g. backup failed. |
Warning Events
ClusterControl reuses the same alarm code names for other database cluster types when the condition is similar. For example, if a PostgreSQL slave is lagging behind, the alarm’s internal code name is still “MySqlReplicationLag”. However, the actual alarm text will contain the relevant PostgreSQL wording. This is handled internally.
Area | Alarms | Severity | Description |
---|---|---|---|
Node | MySqlReplicationLag | Warning | MySQL replication slave lag, default 10 seconds. |
Node | MySqlReplicationBroken | Warning | The SQL thread has stopped. For PostgreSQL, it means the slave has been disconnected from the master. |
Node | CertificateExpiration | Warning | SSL certificate expiration time (<=31 days, >7 days). |
Node | MySqlAdvisor | Warning | Raised by wsrep_sst_method.js and wsrep_node_name.js advisors. |
Node | MySqlTableAnalyzer | Warning | Raised by schema_check_nopk.js advisor. |
Node | StorageMyIsam | Warning | Raised by schema_check_myisam.js advisor. |
Node | MySqlIndexAnalyzer | Warning | Raised by schema_check_dupl_index.js advisor. |
Node | MySqlReplicationLooseServer | Warning | A slave is found but its master can’t be determined, or it is not part of the cluster. |
Host | HostSwapV2 | Warning | A configurable number of pages has been swapped in/out during a configurable period of time. Default 20 pages in 10 minutes. |
Host | HostSwapping | Warning | >5% swap space has been used. |
Host | HostCpuUsage | Warning | >80%, <90% CPU used. |
Host | HostRamUsage | Warning | >80%, <90% RAM used. |
Host | HostDiskUsage | Warning | >80%, <90% disk space used on a monitored_mountpoint. |
Host | ProcessCpuUsage | Warning | >95% CPU used on average by a process for 15 minutes. |
Backup | BackupFailed | Warning | The backup job fails. |
Recovery | GaleraWsrepMissing | Warning | wsrep_cluster_address or wsrep_provider is missing. |
Recovery | GaleraSstAuth | Warning | SST settings (user/pass) are wrong. |
Network | HostFirewall | Warning | The host is not responding to ping after 3 cycles. |
Network | HostSshSlow | Warning | It takes 6-12 seconds to SSH into a host. |
Cluster | ClusterTimeDrift | Warning | Time drift between ClusterControl and database nodes. |
Cluster | ClusterLicenseExpire | Warning | The license is about to expire. |
Cluster | ClusterInconsitentView | Warning | The load balancer and ClusterControl see a different set of working nodes (the master is down from ClusterControl’s point of view, while the load balancer or the slave reports the master as working). |
Critical Events
ClusterControl reuses the same alarm code names for other database cluster types when the condition is similar. For example, if a PostgreSQL server goes down, the alarm’s internal code name is still “MySqlDisconnected”. However, the actual alarm text will contain the relevant PostgreSQL wording. This is handled internally.
Area | Alarms | Severity | Description |
---|---|---|---|
Node | MySqlDisconnected | Critical | The database server cannot be reached. |
Node | MySqlGrantMissing | Critical | Node does not have the correct privileges set for the cmon user. |
Node | MySqlLongRunningQuery | Critical | Queries are running for too long. Only used if configured; by default it is not. |
Node | ProcFailedRestart | Critical | A process (HAProxy, ProxySQL, Garbd, MaxScale) could not be restarted after a failure. |
Node | CertificateExpiration | Critical | SSL certificate expiration time (<=7 days). |
Node | MySqlReplicationMultiMaster | Critical | Multiple writable masters detected. |
Host | HostSwapV2 | Critical | A configurable number of pages has been swapped in/out during a configurable period of time. Default 20 pages in 10 minutes. |
Host | HostSwapping | Critical | >20% swap space has been used. |
Host | HostCpuUsage | Critical | >90% CPU used. |
Host | HostRamUsage | Critical | >90% RAM used. |
Host | HostDiskUsage | Critical | >90% disk space used on a monitored_mountpoint. |
Host | ProcessCpuUsage | Critical | >99% CPU used on average by a process for 15 minutes. |
Backup | BackupVerificationFailed | Critical | Backup verification fails. |
Recovery | GaleraWsrepMissing | Critical | wsrep_cluster_address or wsrep_provider is missing, and is still missing after 20 sample cycles (~100 seconds in this case). |
Recovery | GaleraClusterSplit | Critical | There is a split brain. |
Recovery | ClusterRecoveryFail | Critical | Recovery has failed. |
Recovery | GaleraConfigProblem1 | Critical | A configuration issue is preventing the node from starting. |
Recovery | GaleraNodeRecoveryFail | Critical | Automatic recovery has failed 3 consecutive times. |
Recovery | ReplicationFailoverBlacklistError | Critical | Raised during automatic failover when the only possible candidate is blacklisted. |
Network | HostUnreachable | Critical | The host is not responding to ping after 3 cycles. |
Network | HostSshFailed | Critical | Please check SSH access to the host. The host may also be down. |
Network | HostSshAuth | Critical | Please check whether the configured SSH key is authenticated on the host. |
Network | HostSudoError | Critical | sudo command error on the host. |
Network | HostSshSlow | Critical | It takes >12 seconds to SSH into a host. |
Cluster | ClusterFailure | Critical | The cluster has failed. |
Cluster | ClusterLicenseExpire | Critical | The license has expired. |
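The threshold-based alarms in the Warning Events and Critical Events tables can be read as a simple mapping from a measured value to a severity. The following is a minimal sketch of that mapping for the host usage and certificate expiration alarms, not ClusterControl's actual implementation. The function names are hypothetical, and because the tables do not state what happens at exactly 90% usage, the sketch assumes that value is still a warning.

```python
from typing import Optional


def classify_usage(percent: float) -> Optional[str]:
    """Map a CPU, RAM, or disk usage percentage to an alarm severity.

    Per the tables: >80% and <90% is a warning, >90% is critical.
    Assumption: exactly 90% is treated as a warning here, since the
    tables leave that boundary unspecified.
    """
    if percent > 90:
        return "CRITICAL"
    if percent > 80:
        return "WARNING"
    return None  # below both thresholds, no alarm


def classify_certificate(days_left: int) -> Optional[str]:
    """CertificateExpiration: <=7 days is critical, <=31 days is a warning."""
    if days_left <= 7:
        return "CRITICAL"
    if days_left <= 31:
        return "WARNING"
    return None
```

For example, `classify_usage(85)` falls in the warning band, while `classify_certificate(7)` is already critical.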
Webhook Integrations
For webhook integrations, ClusterControl will raise an alarm by sending the JSON data using the HTTP POST method to the configured endpoint. The webhook endpoint should receive the following example event if a new alarm is created:
{
"id": 470,
"status": "CREATED",
"component": "Node",
"hostname": "192.168.20.62",
"title": "Server disconnected",
"message": "PostgreSQL server on 192.168.20.62:5432 disconnected: Connect failure: Watchdog: Failed connection to [email protected]:5432. timeout expired\n",
"recommendation": "Check node status on UI and error log of failed server.",
"severity": "CRITICAL"
}
The above event was created and sent out to the webhook endpoint after ClusterControl has detected that one of our database servers (192.168.20.62) was down. After the server came back online, ClusterControl would then send another event with the same alarm ID (470) with a different status “ENDED”, indicating the above alarm has ended and the node is back operational:
{
"id": 470,
"status": "ENDED",
"component": "Node",
"hostname": "192.168.20.62",
"title": "Server disconnected",
"message": "PostgreSQL server on 192.168.20.62:5432 disconnected: Connect failure: Watchdog: Failed connection to [email protected]:5432. timeout expired\n",
"recommendation": "Check node status on UI and error log of failed server.",
"severity": "CRITICAL"
}
Commonly, the “ended” event is almost identical to the “created” event except for the “status” value.
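Because the “created” and “ended” events share the same alarm ID, a consumer can pair them to keep track of which alarms are still open. The following is a minimal sketch of that bookkeeping, assuming events are plain dictionaries with the `id` and `status` fields shown in the examples above. The function name `open_alarms` is hypothetical.

```python
from typing import Dict, Iterable


def open_alarms(events: Iterable[dict]) -> Dict[int, dict]:
    """Return alarms that were CREATED but not yet ENDED, keyed by alarm ID."""
    active: Dict[int, dict] = {}
    for event in events:
        if event["status"] == "CREATED":
            active[event["id"]] = event
        elif event["status"] == "ENDED":
            # An ENDED event may refer to an alarm this consumer never
            # saw being created, so a missing ID is not treated as an error.
            active.pop(event["id"], None)
    return active
```

Feeding it the two example events for alarm 470 in order leaves no open alarms, while feeding it only the first one leaves alarm 470 open.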
You may use https://webhook.site/ to test out the webhook integration with ClusterControl.
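If you prefer to test against your own endpoint, a small receiver is enough. The following is a minimal sketch using only the Python standard library. It assumes the payload contains at least the `id`, `status`, and `severity` fields shown in the examples above. The names `parse_event`, `AlarmHandler`, and `make_server`, the port, and the choice of response codes are all illustrative, not something ClusterControl requires.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events seen so far, kept in memory for this sketch


def parse_event(body: bytes) -> dict:
    """Decode a webhook body and check the fields used by this sketch."""
    event = json.loads(body.decode("utf-8"))
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    for field in ("id", "status", "severity"):
        if field not in event:
            raise ValueError(f"missing field: {field}")
    return event


class AlarmHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_event(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON too: JSONDecodeError subclasses ValueError.
            self.send_response(400)
            self.end_headers()
            return
        received.append(event)
        self.send_response(204)  # accepted, nothing to return
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(port: int = 8080) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), AlarmHandler)
```

Running `make_server().serve_forever()` and pointing the webhook integration at that host and port should collect incoming events in `received`.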
Cloud Providers
Manages resources and credentials for cloud providers. Note that this feature requires two packages called clustercontrol-cloud and clustercontrol-clud. The former is a helper daemon that extends CMON’s capability of cloud communication, while the latter is a file manager client to upload and download files on cloud instances. Both packages are dependencies of the clustercontrol UI package and will be installed automatically if they do not exist.
ClusterControl Components.
The credentials that have been set up here can be used to:
- Manage cloud resources (instances, virtual network, subnet)
- Deploy databases in the cloud
- Upload backup to cloud storage
To create a cloud profile, click on Add Cloud Credentials and follow the wizard accordingly. Supported cloud providers are:
- Amazon Web Services
- Google Cloud Platform
- Microsoft Azure
Amazon Web Services Credential
The stored AWS credential will be used by ClusterControl to list Amazon EC2 instances, spin up new instances when deploying a cluster, and upload/download backups to AWS S3.
To create an access key for your AWS account root user:
- Use your AWS account email address and password to sign in to the AWS Management Console as the AWS account root user.
- On the IAM Dashboard page, choose your account name in the navigation bar, and then choose My Security Credentials.
- If you see a warning about accessing the security credentials for your AWS account, choose Continue to Security Credentials.
- Expand the Access keys (access key ID and secret access key) section.
- Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you can’t retrieve this secret access key again.
Field | Description |
---|---|
Name | Credential name. |
AWS Key ID | Your AWS Access Key ID as described on this page. You can get this from the AWS IAM Management console. |
AWS Key Secret | Your AWS Secret Access Key as described on this page. You can get this from the AWS IAM Management console. |
Default Region | Choose the default AWS region for this credential. |
Comment (Optional) | Description of the credential. |
AWS Instances
Lists out your AWS instances. You can perform simple AWS instance management tasks directly from ClusterControl, which uses your defined AWS credentials to connect to the AWS API.
Field | Description |
---|---|
AWS Credentials | Choose which credential to use to access your AWS resources. |
Stop | Shut down the instance. |
Reboot | Restart the instance. |
Terminate | Shut down and terminate the instance. |
AWS VPC
This allows you to conveniently manage your VPC from ClusterControl, which uses your defined AWS credentials to connect to AWS VPC. Most of the functionalities are dynamically populated and integrated to have the same look and feel as the AWS VPC console. Thus, you may refer to the VPC User Guide for details on how to manage AWS VPC.
Field | Description |
---|---|
Start VPC Wizard | Open the VPC creation wizard. Please refer to Getting Started Guide for details on how to start creating a VPC. |
AWS Credentials | Choose which credentials to use to access your AWS resources. |
Region | Choose the AWS region for the VPC. |
VPC | List of VPCs created under the selected region. |
Subnet | List of VPC subnets created under the selected region. |
Route Tables | List of routing tables created under the selected region. |
Internet Gateway | List of internet gateways created under the selected region. |
Network ACL | List of network Access Control Lists created under the selected region. |
Security Group | List of security groups created under the selected region. |
Running Instances | List of all running instances under the selected region. |
Google Cloud Platform Credentials
To create a service account:
- Open the Service Accounts page in the Cloud Platform Console.
- Select your project and click Continue.
- In the left navigation, click Service accounts.
- Look for the service account for which you wish to create a key, click on the vertical ellipses button in that row, and click Create key.
- Select JSON as the Key type and click Create.
Field | Description |
---|---|
Name | Credential name. |
Read from JSON | The service account definition in JSON format. |
Comment (Optional) | Description of the credential. |
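Before uploading the downloaded key under Read from JSON, it can be worth checking that the file is well-formed. The following is a minimal sketch of such a check. The field names `type`, `project_id`, `private_key`, and `client_email` are part of the standard Google service account key file, but which of them ClusterControl actually reads is an assumption here, and the function name is hypothetical.

```python
import json

# Assumption: a reasonable minimum set of fields for a usable key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text: str) -> list:
    """Return a list of problems found in a service account key file's text."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") not in (None, "service_account"):
        problems.append("type is not 'service_account'")
    return problems
```

An empty list means the file passed these basic checks. It does not confirm that the key is valid or has the right permissions.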
Microsoft Azure Credentials
In order to provide access to Azure services, you need to register an application and grant it access to your Azure resources.
- Create an application:
  1. Log in to Microsoft Azure portal → Azure Active Directory → App registrations → New application registration.
  2. Provide a name and URL for the application. Select “Web app / API” for the type of application you want to create.
  3. After specifying the values, click Create.
- Get application ID and authentication key:
  1. From App registrations in Azure Active Directory, select your application.
  2. Copy the Application ID. You should pass that value as application_id.
  3. To generate an authentication key, select Manage → Certificates & secrets → New client secret.
  4. Provide a description of the key and a duration for the key. When done, select Save.
  5. After saving the key, the value of the key is displayed. Copy this value because you are not able to retrieve the key later. Pass this value as client_secret.
- Get tenant ID:
  1. Go to Microsoft Azure portal → Azure Active Directory → Properties for your Azure AD tenant.
  2. Copy the Directory ID. Pass this value as tenant_id.
- Get subscription ID:
  1. Go to Microsoft Azure portal → Subscriptions.
  2. Select your subscription from the list.
  3. Copy the Subscription ID. Pass this value as subscription_id.
- Create a resource group:
  1. Go to Microsoft Azure portal → Resource groups → Add.
  2. Fill in the values and click Create.
  3. Copy the Resource group name and use it as resource_group in the credentials.
  4. Wait until the Resource group is created and click Go to Resource group → Access control (IAM) → Add → Add Role Assignment.
  5. Select Contributor as the Role, then type your application’s name into the search input and select it from the list.
  6. Click Save.
- Create a storage account (for Upload Backup to Cloud feature):
  1. Go to Microsoft Azure portal → Storage accounts → Add.
  2. Fill in the Name and select Blob storage as Account kind.
  3. Copy the Name value and use it as storage_account in the credentials.
  4. Select Enabled for Secure transfer required, select Subscription and Resource group (use the same Resource group as in the previous steps).
  5. Select the storage Location and then click Create.
Finally, create a new text file on your workstation and copy all the required information retrieved from the previous steps into it in JSON format. For example:
{
"application_id":"7f649053-xxxx-xxxx-xxxx-2179c1fa83b8",
"client_secret":"jbzW9tj4AyXHDkfO/KoTL9OP5EexpD6jeHROo2S4xxxx",
"tenant_id":"ce6b8358-xxxx-xxxx-xxxx-49b8c7a5cbc2",
"subscription_id":"6fafe95c-xxxx-xxxx-xxxx-1c33daa1c2c3",
"resource_group":"cc",
"storage_account":"mybackupazure"
}
Then, when configuring the Azure credentials, load the above text file under the Read from JSON field. Take note that the storage_account value is optional.
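Writing the file by hand is easy to get wrong, so it can help to generate it. The following is a minimal sketch that builds the JSON text from the six keys shown in the example above, treating `storage_account` as optional and the rest as required. The function name `build_azure_credentials` and the decision to reject unknown keys are choices made for this sketch only.

```python
import json

REQUIRED = ("application_id", "client_secret", "tenant_id",
            "subscription_id", "resource_group")
OPTIONAL = ("storage_account",)


def build_azure_credentials(**values: str) -> str:
    """Return the credentials as JSON text, or raise ValueError if incomplete."""
    missing = [key for key in REQUIRED if not values.get(key)]
    if missing:
        raise ValueError("missing: " + ", ".join(missing))
    unknown = [key for key in values if key not in REQUIRED + OPTIONAL]
    if unknown:
        # Most likely a typo in a key name, so fail instead of writing it out.
        raise ValueError("unknown: " + ", ".join(unknown))
    return json.dumps(values, indent=2)
```

The returned text can be saved to a file and loaded under Read from JSON. Leaving out `storage_account` produces a file without that key.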
The uploaded backup will be available under BLOB CONTAINERS storage. You can verify its existence in the cloud by going to Microsoft Azure portal → Storage Accounts → [your storage account] → Storage Explorer → BLOB CONTAINERS.
Field | Description |
---|---|
Name | Credential name. |
Read from JSON | The service account definition in JSON format. |
Comment (Optional) | Description of the credential. |
S3-Compatible Storage Provider Credentials
ClusterControl 1.9.0 introduces support for any cloud storage provider that supports AWS’s S3-compatible object storage API.
Field | Description |
---|---|
Name | Credential name. |
Endpoint | The S3-compatible endpoint in {host}:{port} format. For OpenStack Swift, see this. |
Region (Optional) | The region name. This field is optional. |
Access Key | The access key provided by the cloud provider identity access management console. |
Secret Key | The secret for the defined Access Key. |
Use SSL (Optional) | Whether to use plain HTTP or HTTPS. |
Comment (Optional) | Description of the credential. This field is optional. |
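The Endpoint and Use SSL fields together determine the URL a client would connect to. The following is a minimal sketch of how a {host}:{port} endpoint could be checked and combined with the SSL setting. It illustrates the format only and is not how ClusterControl builds the URL internally. The function name `endpoint_url` is hypothetical, and IPv6 literals are not handled.

```python
def endpoint_url(endpoint: str, use_ssl: bool = True) -> str:
    """Turn a {host}:{port} endpoint into an http:// or https:// URL."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError("expected {host}:{port}, got %r" % endpoint)
    if not 0 < int(port) < 65536:
        raise ValueError("port out of range: %s" % port)
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"
```

An endpoint without a port, such as a bare hostname, raises `ValueError` instead of guessing a default.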
Starting from ClusterControl 1.9.7 (September 2023), ClusterControl GUI v2 is the default frontend graphical user interface (GUI) for ClusterControl. Note that the GUI v1 is considered a feature-freeze product with no future development. All new developments will be happening on ClusterControl GUI v2. See User Guide (GUI v2).