Maintenance

ClusterControl offers a maintenance mode for individual nodes or entire clusters. This feature suppresses alarms and notifications for a set or scheduled duration. Users can also document the reason for the maintenance, such as RAM upgrades or OS patching, for auditing. While in maintenance mode, ClusterControl will not automatically alter the node's state unless manual actions are taken.

Attention

If automatic cluster/node recovery is enabled, ClusterControl will always recover a node/cluster regardless of the maintenance mode status. Do not forget to disable cluster/node recovery to avoid ClusterControl interfering with your maintenance tasks.

Types of maintenance mode

ClusterControl has built-in logic to handle two types of maintenance mode:

Cluster-wide maintenance
Individual node maintenance

Cluster-wide

Cluster-wide maintenance mode lets you schedule a maintenance window for the entire cluster rather than on a per-node basis. This is suitable if you want to perform maintenance on all nodes in the cluster in the same maintenance window. When active:

Alarms and notifications are suppressed; ClusterControl will not generate alerts (e.g., threshold breaches, failover notifications) for cluster events during the maintenance period.
All nodes are marked “under maintenance”. Every node, regardless of role (primary, replica, proxy, etc.), is shown with the maintenance badge in the ClusterControl GUI, and a global banner flags the upcoming or active window.
Automatic recovery can be disabled (optional). By default recovery/failover remains enabled, but you can turn it off to avoid “false” repairs during planned work.

Individual node

Schedule a maintenance window for a specific cluster node when targeted maintenance is required. While active:

Alarms and notifications are suppressed; ClusterControl will not generate alerts (e.g., threshold breaches, failover notifications) for node events during the maintenance period.
Only the selected node is marked "under maintenance" and will display a maintenance badge in the ClusterControl GUI.
Automatic recovery can be disabled (optional). By default recovery/failover remains enabled, but you can turn it off to avoid “false” repairs during planned work.

Configuring maintenance mode

Maintenance mode can be configured from the ClusterControl GUI or with the ClusterControl CLI.

ClusterControl GUI

Cluster-wide maintenance mode

Log in to your ClusterControl GUI → Clusters.
Choose a database cluster.
Go to Actions → Schedule maintenance. This will bring up a panel where you can activate a cluster-wide maintenance mode.

Node maintenance mode

Log in to your ClusterControl GUI → Clusters.
Choose a database cluster.
Go to Nodes → select the node → Actions → Schedule maintenance. This will bring up a panel where you can activate maintenance mode for the corresponding node.

Alarms and notifications are reactivated once the maintenance period is over, or when the operator explicitly disables maintenance mode, either for a node (Nodes → select the node → Actions → Disable Maintenance Mode) or for a cluster (choose a database cluster → Actions → Disable Maintenance Mode).

ClusterControl CLI

Create a maintenance period for PostgreSQL node 10.35.112.21, starting at 05:44:55 UTC on 2024-05-15 and lasting one full day (cmon expects maintenance times in UTC):

```
s9s maintenance \
     --create \
     --nodes=10.35.112.21:5432 \
     --begin=2024-05-15T05:44:55.000Z \
     --end=2024-05-16T05:44:55.000Z \
     --reason='Upgrading RAM' \
     --batch
```

Create a new maintenance period for 192.168.1.121 that starts tomorrow and ends an hour later:

```
s9s maintenance --create \
       --nodes=192.168.1.121 \
       --begin="$(date --date='now + 1 day' --utc '+%Y-%m-%d %H:%M:%S')" \
       --end="$(date --date='now + 1 day + 1 hour' --utc '+%Y-%m-%d %H:%M:%S')" \
       --reason="Upgrading software."
```
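
Because cmon expects UTC, it can help to compute both ends of the window from a single reference time before creating the maintenance period. The following is a minimal sketch, assuming GNU `date` is available. The function `utc_window` and the variables `WINDOW_BEGIN` and `WINDOW_END` are hypothetical names used for illustration and are not part of s9s. The script only prints the resulting command for review; it does not run it.

```shell
# Hypothetical helper (not part of s9s): compute a maintenance window in UTC.
# $1 = minutes from now until the window starts, $2 = window length in minutes.
# Assumes GNU date, which accepts --date=@EPOCH.
utc_window() {
    now="$(date +%s)"
    begin_epoch=$(( now + $1 * 60 ))
    end_epoch=$(( begin_epoch + $2 * 60 ))
    # Same ISO 8601 UTC layout as the PostgreSQL example above.
    fmt='+%Y-%m-%dT%H:%M:%S.000Z'
    WINDOW_BEGIN="$(date --utc --date="@${begin_epoch}" "$fmt")"
    WINDOW_END="$(date --utc --date="@${end_epoch}" "$fmt")"
}

# Window that starts in 24 hours (1440 minutes) and lasts 60 minutes.
utc_window 1440 60

# Print the command for review instead of executing it.
printf "s9s maintenance --create --nodes=%s --begin=%s --end=%s --reason='%s'\n" \
    "192.168.1.121" "$WINDOW_BEGIN" "$WINDOW_END" "Upgrading software."
```

Deriving the end time from the computed start time, rather than calling `date` twice with relative expressions, keeps the window length exact even if the clock ticks between the two calls.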

Checking maintenance status

Checking the maintenance status lets you quickly determine whether maintenance mode is active. While it is active, alarms and notifications are suppressed.

ClusterControl GUI

Cluster-wide maintenance mode

Log in to your ClusterControl GUI → Clusters.
Choose a database cluster.
If you see a blue wrench icon next to the cluster status, maintenance mode is active for the cluster. Hovering over the wrench icon shows a pop-up card with more details about the maintenance, such as its status, end time, who initiated it, and the reason.

Node maintenance mode

Log in to your ClusterControl GUI → Nodes.
Choose a node from the list.
If you see a blue wrench icon next to the node status, maintenance mode is active for the node. Hovering over the wrench icon shows a pop-up card with more details about the maintenance, such as its status, end time, who initiated it, and the reason.

ClusterControl CLI

List all clusters and nodes that have a maintenance period:
```
s9s maintenance --list --long
```
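
For scripted checks, the listing can be filtered for a specific host. The sketch below is only an illustration: `in_maintenance` is a hypothetical helper, not an s9s feature, and it assumes that the host or cluster name appears as a separate whitespace-delimited field in the listing. It reports whether an entry exists for the target, not whether the window is currently active.

```shell
# Hypothetical helper (not part of s9s): succeed if the given host or cluster
# name appears as a whole field in the listing text read from standard input.
# Assumption: the listing prints the host as its own whitespace-delimited field.
in_maintenance() {
    awk -v target="$1" '
        { for (i = 1; i <= NF; i++) if ($i == target) found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

It could be used as `s9s maintenance --list --long | in_maintenance 10.35.112.21 && echo "maintenance entry found"`. Comparing whole fields avoids a false match on a longer address such as 10.35.112.210.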

Deactivating maintenance mode

Maintenance mode automatically ends after the set end time. You can also manually deactivate it earlier once maintenance is finished and the server is ready. Alarms and notifications will be reactivated once the maintenance period is over.

To deactivate the maintenance mode:

ClusterControl GUI

Cluster-wide maintenance mode

Log in to your ClusterControl GUI → Clusters.
Choose the database cluster.
Go to Actions → Maintenance → Stop current maintenance. Confirm the action in the confirmation box to proceed.

Node maintenance mode

Log in to your ClusterControl GUI → Clusters.
Choose the database cluster.
Go to Nodes → select the node → Actions → Maintenance → Stop current maintenance. Confirm the action in the confirmation box to proceed.

ClusterControl CLI

Find the UUID of the maintenance period by listing all clusters and nodes that are under maintenance:
```
s9s maintenance --list --long
```
Delete the maintenance period with UUID 70346c3:
```
s9s maintenance --delete --uuid=70346c3
```
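
When the delete step is scripted, a small guard can catch an empty or mistyped UUID before the command runs. This is a sketch under stated assumptions: `delete_maintenance` is a hypothetical wrapper, not part of s9s, and it assumes that UUIDs consist only of hexadecimal digits and dashes, as in the example value 70346c3.

```shell
# Hypothetical wrapper (not part of s9s): reject an empty or malformed UUID
# before calling the delete command shown above.
# Assumption: UUIDs contain only hexadecimal digits and dashes.
delete_maintenance() {
    uuid="$1"
    case "$uuid" in
        ''|*[!0-9a-fA-F-]*)
            echo "refusing to delete: '$uuid' does not look like a UUID" >&2
            return 1
            ;;
    esac
    s9s maintenance --delete --uuid="$uuid"
}
```

Calling `delete_maintenance 70346c3` runs the same command as above, while `delete_maintenance ""` returns an error without contacting the controller.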