Overview
ClusterControl provides comprehensive backup and restore capabilities to ensure the continued availability and integrity of your database systems. This overview outlines how ClusterControl simplifies the process of creating reliable backups, managing your backup strategy, and performing efficient data restoration for various database technologies.
What you can do
Here is a list of what you can do with the backup and restore features in ClusterControl, across its supported database engines:
- Centralized backup management platform for all your database clusters - Create, schedule, verify, store, and restore backups, all from a unified dashboard in ClusterControl GUI or via the ClusterControl CLI.
- Create or schedule physical or logical database backups on any database node.
- Create or schedule full, incremental, or partial (specific schema/table) backups.
- Define rotation and retention policies for local or cloud backups. Old backups are purged automatically.
- Get alerts via email, PagerDuty, Slack or custom webhooks on any backup-related events.
- Create or schedule operational reports for backup status.
- Backups can be stored on local storage, controller storage, public cloud object storage, or any private cloud object storage that supports S3-compatible storage.
- Automatic backup verification.
- Ability to restore to a temporary database server, not part of the cluster.
- Create a new cluster from an existing backup.
- Backup encryption.
- Backup compression with configurable compression level.
- Point-in-time recovery for MySQL, MariaDB and PostgreSQL.
- Partial backup and restore.
- Backup history.
- Manage backups via ClusterControl CLI.
- Incremental or differential backup chaining - Backups are grouped together based on the backup type.
- Automatic backup host selection and backup failover.
- Backup throttling.
- Configure the backup file name and its subdirectory format.
- Upload/Download existing backups to/from cloud.
Backup jobs
The backup process performed by ClusterControl runs as a background thread (RUNNING3) that does not block other non-backup jobs in the queue. If a backup job takes hours to complete, other non-backup jobs can still run simultaneously via the main thread (RUNNING). You can see the job progress at ClusterControl GUI → Activity center → Jobs.
Note that ClusterControl executes a single backup job per cluster at any given time. If multiple scheduled backups for a cluster overlap, the subsequent schedules will not start until the active backup job finishes. For instance, if a full daily backup is set for 7:00 AM and incremental backups are scheduled hourly, the 7:00 AM incremental backup will likely wait for the full backup to complete before it begins. You can avoid this by excluding certain hours in the schedule using the scheduler's advanced editor, which follows the UNIX cron format, as shown in the following example:
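For instance, assuming the full backup runs daily at 7:00 AM, an hourly incremental schedule that skips the 7:00 AM slot could look like this (illustrative cron schedule entries):

```
0 7 * * *          # full backup, daily at 7:00 AM
0 0-6,8-23 * * *   # incremental backups, hourly, skipping the 7:00 AM slot
```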
Backup encryption and decryption
If the encryption option is enabled for a particular backup, ClusterControl will use OpenSSL to encrypt the backup using the AES-256 CBC algorithm. Encryption happens on the backup node. If you choose to store the backup on the controller node, the backup files are streamed over in encrypted format through socat or netcat.
If compression is enabled, the backup is first compressed and then encrypted, resulting in a smaller backup size. The encryption key is generated automatically (if it does not exist) and stored inside the CMON configuration for the particular cluster under the backup_encryption_key option. This key is stored base64-encoded and must be decoded before it can be used to decrypt the backup. The following command shows how to decode the key:
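Assuming the per-cluster CMON configuration lives in /etc/cmon.d/cmon_X.cnf (the default location) and the key value is stored single-quoted, the key can be extracted and decoded as follows:

```shell
$ cat /etc/cmon.d/cmon_X.cnf | grep ^backup_encryption_key | cut -d"'" -f2 | base64 -d > keyfile.key
```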
Where X is the cluster ID. The command reads the backup_encryption_key value and decodes it to binary output, so it is important to redirect the output to a file; in the example, we redirected the output to keyfile.key. The key file, which stores the actual encryption key, can then be used in the OpenSSL command to decrypt the backup, for example:
$ cat {BACKUPFILE}.aes256 | openssl enc -d -aes-256-cbc -pass file:/path/to/keyfile.key > backup_file.xbstream.gz
Or, you can pass the stdin to the respective restore command chain, for example:
$ cat {BACKUPFILE}.aes256 | openssl enc -d -aes-256-cbc -pass file:/path/to/keyfile.key | gzip -dc | xbstream -x -C /var/lib/mysql
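To sanity-check a key file, you can reproduce the compress-then-encrypt pipeline locally on throwaway data (a sketch only; the file names are illustrative, and openssl and gzip are assumed to be installed):

```shell
cd "$(mktemp -d)"
# Stand-in for the decoded backup_encryption_key
openssl rand 32 > keyfile.key
printf 'hello backup\n' > data.txt
# ClusterControl's order: compress first, then encrypt
gzip -c data.txt | openssl enc -aes-256-cbc -pass file:keyfile.key > data.gz.aes256
# Restore order: decrypt first, then decompress
out=$(cat data.gz.aes256 | openssl enc -d -aes-256-cbc -pass file:keyfile.key | gzip -dc)
echo "$out"
```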
Backup retention
Backup retention can be configured at ClusterControl GUI → choose a cluster → Backups → ... → Backup Settings → Default backups retention period (default is 31 days), which sets the number of days ClusterControl keeps existing backups. Backups older than the defined value will be deleted. You can also customize the retention period per backup (default, custom, or keep forever) under the Backup Retention field when creating or scheduling a backup.
The purging is based on the following conditions:
- When a new backup is successfully created and backup verification is not requested, older backups are checked and removed.
- When a verified backup is successfully created, older backups are checked and removed.
- The backup housekeeping job runs every 24 hours. Thus, even if no backups are created and none are verified, backup retention is still enforced every 24 hours.
Backups stored in the cloud follow the Cloud backup retention period setting (default is 180 days).
Danger
Backups exceeding the retention period will be deleted. Once deleted, backups cannot be recovered.
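The age-based purge can be illustrated with plain shell (a sketch only; ClusterControl's housekeeping job does this internally, and the file names below are made up):

```shell
tmp=$(mktemp -d)
# Stage one backup past the 31-day default retention and one fresh backup
touch -d '40 days ago' "$tmp/backup-old.xbstream.gz"
touch "$tmp/backup-new.xbstream.gz"
# List files older than 31 days, i.e. the purge candidates
expired=$(find "$tmp" -name '*.xbstream.gz' -mtime +31)
echo "$expired"
```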
Supported backup methods
ClusterControl offers support for various backup tools and methods, tailored to the specific database clusters in use. The table below details these options:
| Cluster type | Supported backup methods |
|---|---|
| MySQL-based clusters | mysqldump, Percona Xtrabackup (full and incremental) |
| MariaDB-based clusters | mariadb-dump, MariaDB Backup (full and incremental) |
| PostgreSQL-based clusters | pg_dump/pg_dumpall, pg_basebackup, pgBackRest (full, differential, and incremental) |
| MongoDB-based clusters | mongodump, Percona Backup for MongoDB |
| Redis-based clusters | RDB and AOF files |
| SQL Server | Native backup via sqlcmd (full, differential, and transaction log) |
| Elasticsearch | Elasticsearch snapshot |
Each database backup method is explained in the following sections.
mysqldump
ClusterControl performs mysqldump against all or selected databases using the --single-transaction option. If it detects that binary logging is enabled on the node, it automatically adds --master-data=2 so that the binary log file and position are recorded in the dump file. ClusterControl generates a set of 4 mysqldump files with the following suffixes:
- _data.sql.gz – Schemas’ data.
- _schema.sql.gz – Schemas’ structure.
- _mysqldb.sql.gz – MySQL system database.
- _triggerseventroutines.sql.gz – MySQL triggers, events, and routines.
Percona Xtrabackup
Percona Xtrabackup is an open-source MySQL hot backup utility from Percona. It is a combination of xtrabackup (written in C) and innobackupex (written in Perl) and can back up data from InnoDB, XtraDB, and MyISAM tables. Xtrabackup does not lock your database during the backup process. For large databases (100+ GB), it provides a much better restoration time compared to mysqldump. The restoration process involves preparing the MySQL data from the backup files before replacing or switching it with the current data directory on the target node.
Because Xtrabackup can create full and incremental MySQL backups, ClusterControl manages incremental backups and groups the combination of full and incremental backups into a backup set. A backup set has an ID based on the latest full backup ID; all incremental backups taken after a full backup belong to the same backup set. The backup set can then be restored as a single unit using the Restore Backup feature.
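The grouping rule can be sketched in a few lines (an illustrative model only, not ClusterControl's actual code): each set is keyed by the ID of its full backup, and every incremental taken after that full joins the same set.

```python
def group_backup_sets(backups):
    """Group full and incremental backups into backup sets keyed by full-backup ID."""
    sets, current_full = {}, None
    for b in backups:                      # backups in chronological order
        if b["type"] == "full":
            current_full = b["id"]         # a new full backup starts a new set
            sets[current_full] = [b["id"]]
        elif current_full is not None:
            sets[current_full].append(b["id"])
    return sets

backups = [
    {"id": 10, "type": "full"},
    {"id": 11, "type": "incremental"},
    {"id": 12, "type": "incremental"},
    {"id": 13, "type": "full"},
    {"id": 14, "type": "incremental"},
]
print(group_backup_sets(backups))  # → {10: [10, 11, 12], 13: [13, 14]}
```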
Attention
Without a full backup to start from, the incremental backups are useless.
mariadb-dump
mariadb-dump is a logical backup utility included with MariaDB Server. It is essentially MariaDB's fork of the original MySQL mysqldump tool and has been its replacement since MariaDB 10.4. It supports MariaDB system-versioned tables, sequences, virtual columns, and MariaDB-specific privilege grants, which differ from the MySQL implementation.
On all supported versions for MariaDB, ClusterControl will default to mariadb-dump as the preferred logical backup method.
MariaDB Backup
MariaDB Backup is a fork of Percona XtraBackup with added support for compression and data-at-rest encryption available in MariaDB, included in MariaDB 10.1.23 and later. It is an open-source tool provided by MariaDB for performing physical online backups of InnoDB, Aria, and MyISAM tables. MariaDB Backup is available on Linux and Windows.
On all supported versions for MariaDB, ClusterControl will default to MariaDB Backup as the preferred backup method and snapshot state transfer (SST) method.
pg_dump/pg_dumpall
Depending on the database backup target configuration, ClusterControl automatically chooses the suitable method to create the logical backup using either the pg_dump or pg_dumpall backup tool. Typically, pg_dump is chosen for a partial logical backup, while pg_dumpall is used for a full logical backup.
ClusterControl performs pg_dumpall against all databases with the --clean option, which includes SQL commands to clean (drop) databases before recreating them; DROP commands for roles and tablespaces are added as well. The output file has a .sql.gz extension, and the file name contains the timestamp of the backup.
pg_basebackup
pg_basebackup is used to take base backups of a running PostgreSQL database cluster. These are taken without affecting other clients of the database and can be used both for point-in-time recovery and as the starting point for a log-shipping or streaming-replication standby server. It makes a binary copy of the database cluster files while making sure the system is put in and out of backup mode automatically. Backups are always taken of the entire database cluster; it is not possible to back up individual databases or database objects.
ClusterControl connects to the replication stream using the replication user (default is cmon_replication) with --wal-method=fetch option when creating the backup. The output will be base.tar.gz inside the backup directory.
pgBackRest
pgBackRest is open-source software developed to perform efficient backups of PostgreSQL databases that measure in tens of terabytes and greater. It supports per-file checksums, compression, partial/failed backup resume, high-performance parallel transfer, asynchronous archiving, tablespaces, expiration, full/differential/incremental backups, local/remote operation via SSH, hard-linking, restore, and more. The tool is written in C and does not depend on rsync or tar; instead, it performs its own deltas, which gives it maximum flexibility.
Starting from ClusterControl 1.9.0, pgBackRest can be configured as follows:
- Primary: Install on the current master but not on slaves. The backup repository (host to store the backup data) will be configured to be on the master. There will be no SSH configuration for PgBackRest.
- All database nodes: Install on all database nodes. The backup repository will be created on the current master. The backup will be made by using a standby node. PgBackRest will use SSH for communication between hosts.
- All database nodes and a dedicated repository host: Install on all PostgreSQL database nodes. The backup repository will be made on a specified host. The backup will be made by using a standby node. PgBackRest will use SSH for communication between hosts.
Note
The pgBackRest backup directory cannot be reused or shared with other backup methods.
During the first attempt at making a pgBackRest backup, ClusterControl will re-configure the node to install and configure pgBackRest. Take note that this operation requires a database restart and might introduce downtime to your database. A configuration file will be created at /etc/pgbackrest.conf and will be configured according to the version used and the location of the PostgreSQL data. The pgBackRest default repository path is /var/lib/pgbackrest. Additionally, ClusterControl will configure the following lines inside postgresql.conf (which explains why it requires a restart during the first run):
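The exact values depend on the PostgreSQL version and the stanza name ClusterControl generates, but the WAL-archiving portion of postgresql.conf typically looks like the following sketch (the stanza name below is a placeholder):

```ini
archive_mode = on
archive_command = 'pgbackrest --stanza=<stanza-name> archive-push %p'
```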
Attention
PITR using pg_basebackup will no longer be possible if switching the WAL archiving method from "local WAL archiving" to "pgBackRest".
Info
In the ClusterControl GUI, pgBackRest nodes are listed with a fake port number (200000 + cluster ID) because the internal CmonHost object requires a host:port combination. In fact, the pgBackRest process neither listens on nor needs a port number.
Full backup
Full backup of pgBackRest copies the entire contents of the database cluster to the backup. The first backup of the database cluster is always a Full Backup. pgBackRest is always able to restore a full backup directly. The full backup does not depend on any files outside of the full backup for consistency.
Differential backup
For differential backup, pgBackRest copies only those database cluster files that have changed since the last full backup. pgBackRest restores a differential backup by copying all of the files in the chosen differential backup and the appropriate unchanged files from the previous full backup. The advantage of a differential backup is that it requires less disk space than a full backup, however, the differential backup and the full backup must both be valid to restore the differential backup.
For example, if a full backup is taken on Sunday and the following daily differential backups are scheduled, the data that is backed up will be:
- Monday – data from Sunday to Monday
- Tuesday – data from Sunday to Tuesday
- Wednesday – data from Sunday to Wednesday
- Thursday – data from Sunday to Thursday
Incremental backup
For incremental backup, pgBackRest copies only those database cluster files that have changed since the last backup (which can be another incremental backup, a differential backup, or a full backup). Because an incremental backup includes only the files changed since the prior backup, it is generally much smaller than a full or differential backup. As with the differential backup, the incremental backup depends on other backups being valid in order to restore. All prior incremental backups back to the prior differential, the prior differential backup itself, and the prior full backup must all be valid to perform a restore of the incremental backup. If no differential backup exists, then all prior incremental backups back to the prior full backup, and the full backup itself, must be valid to restore the incremental backup.
For example, if a full backup is taken on Sunday and the following daily incremental backups are scheduled, the data that is backed up will be:
- Monday – data from Sunday to Monday
- Tuesday – data from Monday to Tuesday
- Wednesday – data from Tuesday to Wednesday
- Thursday – data from Wednesday to Thursday
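The dependency rules above can be sketched as follows (a simplified model for illustration, not pgBackRest's implementation): an incremental depends on the backup immediately before it, while a differential depends directly on the previous full backup.

```python
def restore_chain(backups, target_index):
    """Return the IDs of all backups that must be valid to restore the target."""
    chain = [backups[target_index]]
    i = target_index
    while chain[-1]["type"] != "full":
        i -= 1
        prev = backups[i]
        if chain[-1]["type"] == "diff":
            if prev["type"] == "full":     # skip backups between the diff and its full
                chain.append(prev)
        else:                              # incremental: needs its immediate predecessor
            chain.append(prev)
    return [b["id"] for b in reversed(chain)]

backups = [
    {"id": 1, "type": "full"},
    {"id": 2, "type": "incr"},
    {"id": 3, "type": "incr"},
    {"id": 4, "type": "diff"},
    {"id": 5, "type": "incr"},
    {"id": 6, "type": "incr"},
]
print(restore_chain(backups, 5))  # → [1, 4, 5, 6]
```

Restoring backup 6 needs the incrementals back to the differential (5 and 6), the differential (4), and the full (1), while the incrementals taken before the differential (2 and 3) are not required.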
mongodump
ClusterControl performs a standard mongodump with the --journal option, which allows mongodump operations to use the durability journal to ensure that the export is in a consistent state.
Info
mongodump (and mongorestore) is not available for MongoDB Sharded Cluster 4 and later that have sharded transactions in progress, as backups created with mongodump do not maintain the atomicity guarantees of transactions across shards.
Percona Backup for MongoDB
Percona Backup for MongoDB is a distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets. Percona Backup for MongoDB supports Percona Server for MongoDB and MongoDB Community v3.6 or higher with MongoDB Replication enabled (standalone is not supported due to the dependency on MongoDB’s oplog). The Percona Backup for MongoDB project inherited from and replaces mongodb-consistent-backup, which is no longer actively developed or supported.
Percona Backup for MongoDB requires an extra step for installation and configuration, and it is not enabled by default. You can use ClusterControl to install this tool by going to ClusterControl GUI → Clusters → choose the MongoDB cluster → Cluster Action → Install Percona Backup, or simply choose the percona-backup-mongodb backup method from the dropdown list when configuring a backup. If the tool is not installed, ClusterControl will advise installing the backup tool first before moving forward to the next step.
Attention
Percona Backup for MongoDB requires a remote file server mounted to a local directory. It is the responsibility of the server administrators to guarantee that the same remote directory is mounted at exactly the same local path on all servers in the MongoDB cluster or non-sharded replica set. If the path is accidentally a normal local directory, errors will eventually occur, most likely during a restore attempt.
Redis/Valkey RDB & AOF
RDB is the Redis/Valkey Database Backup file. It is a dump of all user data at a particular timestamp, stored in an internal, compressed serialization format, and is used for point-in-time recovery (recovery from a timestamp). AOF stands for Append Only File, a persistence technique in which an RDB file is generated once and incoming writes are appended to it as they arrive.
When backing up Redis or Valkey, ClusterControl backs up both RDB and AOF (if enabled) files on the selected database node.
MSSQL Backup
ClusterControl performs a backup routine on a Microsoft SQL Server by using the sqlcmd client to connect to the SQL Server and take backups. For a full backup, ClusterControl will iterate against all databases and create a standard full backup. Full database backups represent the database at the time the backup is finished.
A differential backup captures only the data that has changed since the last full backup. This type of backup works with less data than a full database backup, which also shortens the time required to complete the backup. All differential backups are grouped under the last full backup on the Backup page.
At least one full backup must exist before any log backups can be created. After that, the transaction log can be backed up at any time, unless it is already being backed up. It is recommended to take log backups frequently, both to minimize work-loss exposure and to truncate the transaction log.
A database administrator typically creates a full database backup occasionally, such as weekly, and, optionally, creates a series of differential database backups at shorter intervals, such as daily. Independent of the database backups, the database administrator backs up the transaction log at frequent intervals.
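As a sketch of the statements involved (the database name, credentials, and paths below are hypothetical; ClusterControl drives equivalents of these through sqlcmd):

```shell
sqlcmd -S localhost -U sa -Q "BACKUP DATABASE [mydb] TO DISK = N'/var/opt/mssql/backup/mydb_full.bak'"
sqlcmd -S localhost -U sa -Q "BACKUP DATABASE [mydb] TO DISK = N'/var/opt/mssql/backup/mydb_diff.bak' WITH DIFFERENTIAL"
sqlcmd -S localhost -U sa -Q "BACKUP LOG [mydb] TO DISK = N'/var/opt/mssql/backup/mydb_log.trn'"
```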
One limitation of MSSQL Backup is that backups created by a more recent version of SQL Server cannot be restored on earlier versions of SQL Server.
Elasticsearch Snapshot
An Elasticsearch snapshot is a way to back up your Elasticsearch indices and data. The process involves creating a snapshot of one or more indices and storing it in a designated repository. This snapshot can then be used to restore the indices to their previous state in case of data loss or other issues. The snapshot process is incremental, meaning that only changes made since the last snapshot are stored. Snapshots can be taken manually or automatically on a schedule, and can also be used to migrate data between clusters or to create new indices.
Note that during cluster deployment, ClusterControl configures Elasticsearch snapshot storage by requiring the specification of the repository name, storage location, and file system path early in the process. The snapshot repository will then be available under the Backups section of the Elasticsearch cluster.
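Outside ClusterControl, the same mechanism can be exercised directly through the standard Elasticsearch snapshot API (the repository and snapshot names below are hypothetical):

```shell
# Register a shared-filesystem snapshot repository
curl -X PUT "localhost:9200/_snapshot/my_repo" -H 'Content-Type: application/json' \
  -d '{"type": "fs", "settings": {"location": "/mnt/es-backups"}}'
# Take a snapshot of all indices
curl -X PUT "localhost:9200/_snapshot/my_repo/snapshot_1?wait_for_completion=true"
```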
