Kubernetes
Info: This feature is in the Technical Preview phase.
ClusterControl enables database management and monitoring in Kubernetes by leveraging community and vendor-provided database operators. The agent-operator interacts with Kubernetes resources like database operators and other Kubernetes services. The Kubernetes namespace used for all ClusterControl custom resources is severalnines-system.
The integration with our Kubernetes operator is handled with a new proxy process, kuber-proxy, which is installed alongside ClusterControl Ops-C.
Steps
To set up your first database in Kubernetes, follow these steps.
1. Install ClusterControl Ops-C / MCC
ClusterControl Ops-C / MCC is installed by default, so you only need to follow the instructions in the Installation section. See Online Installation.
Check that the kuber-proxy process is reachable on port 50051 (default), using either a public or private address depending on your setup (for example, a public managed Kubernetes cluster versus an on-premises installation).
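A quick reachability check (clustercontrol-host is a placeholder for the address the agents will use):
# From a machine with network access to the ClusterControl host:
nc -zv clustercontrol-host 50051
# On the ClusterControl host itself, confirm kuber-proxy is listening:
ss -lntp | grep 50051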
2. Install the required tools
The following tools are required: helm (used below to install the agent-operator chart) and kubectl (used to verify and troubleshoot the installation).
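You can verify that both are available on the machine you will run the installation from:
kubectl version --client
helm version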
3. Create a Kubernetes Cluster
If you don't have one yet, create a new Kubernetes cluster, for example on DigitalOcean or Google Kubernetes Engine. Follow the provider's instructions and specify the resources and configuration you need.
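As a sketch, creating a small cluster on Google Kubernetes Engine from the command line might look like this (cluster name, zone, and node count are placeholders):
gcloud container clusters create cc-demo --zone europe-north1-a --num-nodes 3
# Fetch credentials so kubectl and helm can talk to the new cluster:
gcloud container clusters get-credentials cc-demo --zone europe-north1-a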
4. Connect to a Kubernetes cluster / environment
After creating the Kubernetes cluster (or if it was already created), you have to connect to the environment by installing the agent-operator. The Kubernetes agent-operator establishes the connection to your Kubernetes resources. It acts as a bridge between ClusterControl and the Kubernetes environment, allowing you to monitor and perform operations directly through ClusterControl.
The Kubernetes integration is accessed from a new dedicated page, Kubernetes, where you can manage all things related to Kubernetes like your database clusters, backups, database operators and connections to different Kubernetes clusters.
You can add any Kubernetes cluster, whether it is an on-premises self-hosted cluster, a public managed service, or a Docker Desktop or OrbStack development environment, as long as the agent-operator is able to reach the kuber-proxy process using the provided gRPC address of the ClusterControl host where the proxy process is running. The default port that kuber-proxy listens on is 50051.
The Kubernetes clusters that are successfully connected will appear in the Environments page.
The agent-operator is installed using a Helm chart, which is executed against the currently active Kubernetes context (cluster). You can check which Kubernetes cluster your current context points to.
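For example, using kubectl's built-in context commands:
kubectl config current-context
# List all configured contexts if you need to switch:
kubectl config get-contexts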
After you have verified that this is the Kubernetes cluster you want to use, connect it to ClusterControl by clicking Connect environment:
helm install kuber-agent kuber-agent --repo https://severalnines.github.io/helm-charts/ \
--set agent.name="cc-dev" \
--set proxy.grpcAddress="clustercontrol-k8s.dev.io:50051" \
--set agent.publicKey="<public-key>"
Copy the agent.publicKey value, and review and change the proxy.grpcAddress if ClusterControl's suggestion is incorrect, before running the Helm chart.
helm install kuber-agent kuber-agent --repo https://severalnines.github.io/helm-charts/ --namespace severalnines-system --create-namespace --set agent.publicKey="LS0tLS1CRUdJTiBSU0EgUFVCTElDIEtFWS0tLS0tCk1JSUJDZ0tDQVFFQW8zbG42QVU3Q2ZGMjkxOTlRb01iVEZEUU1sSEJzU3RsRWNKYjhMRlNmSS96NjllcWJLNzAKVnlGMmhuQW15UWZLWnVvMnFaQjRjdnNCclJSR2ZzVk1PR3owSG5DQlh6MTlUQTBXVmtEcWZ2M0xTRkhzbmV6agp2SXFSYmc5R2xQTWptc00000000000000000000000000000000zZzWXdha1lDdmhjckw2cnA2TzgreVRQQkRxCk1FWW54bW8xK3Y4RmJ5eWRIeXFkelIyQXkrMmk5d3lyMXpjRUJiOWRCVHZNQ3piNTNOZHBBQnQ4T2xTanAvZzcKdEVHWWJDbFZNa3ZVa0JJK1NXc3BYN21zbmNmZXFyNUZOZzJIUVd4a291bFpOQ2RVVEN0MXpSeUx0dmFSVWJTagpobC9mcUJydmFHWU1tNmdWRnFvQzZ0RjBOR00vVUFMdklRSURBUUFCCi0tLS0tRU5EIFJTQSBQVUJMSUMgS0VZLS0tLS0K" --set agent.name="cc-dev" --set proxy.grpcAddress="cc-ubuntu24.local:50051"
NAME: kuber-agent
LAST DEPLOYED: Mon Apr 14 20:42:19 2025
NAMESPACE: severalnines-system
STATUS: deployed
REVISION: 1
TEST SUITE: No
Verify that the agent-operator’s CRDs have been installed with:
$ kubectl get crds
NAME CREATED AT
backuppolicies.moco.cybozu.com 2025-03-11T11:29:16Z
backups.postgresql.cnpg.io 2025-03-06T22:44:13Z
certificaterequests.cert-manager.io 2025-03-11T11:28:08Z
certificates.cert-manager.io 2025-03-11T11:28:08Z
challenges.acme.cert-manager.io 2025-03-11T11:28:08Z
clusterimagecatalogs.postgresql.cnpg.io 2025-03-06T22:44:13Z
clusterissuers.cert-manager.io 2025-03-11T11:28:08Z
clusters.postgresql.cnpg.io 2025-03-06T22:44:13Z
.....
configversions.data.severalnines.com 2025-02-25T23:31:42Z
databasebackups.agent.severalnines.com 2025-03-06T22:01:14Z
databasebackupschedules.agent.severalnines.com 2025-03-04T09:20:56Z
databaseclusters.agent.severalnines.com 2025-03-04T10:14:27Z
databaseoperators.agent.severalnines.com 2025-03-04T09:20:56Z
^^^......^^^
databases.postgresql.cnpg.io 2025-03-06T22:44:13Z
imagecatalogs.postgresql.cnpg.io 2025-03-06T22:44:13Z
issuers.cert-manager.io 2025-03-11T11:28:08Z
mysqlclusters.moco.cybozu.com 2025-03-11T11:29:16Z
orders.acme.cert-manager.io 2025-03-11T11:28:08Z
perconaxtradbclusterbackups.pxc.percona.com 2025-02-18T21:38:11Z
perconaxtradbclusterrestores.pxc.percona.com 2025-03-04T05:46:12Z
perconaxtradbclusters.pxc.percona.com 2025-03-04T05:46:12Z
poolers.postgresql.cnpg.io 2025-03-06T22:44:13Z
publications.postgresql.cnpg.io 2025-03-06T22:44:13Z
scheduledbackups.postgresql.cnpg.io 2025-03-06T22:44:13Z
subscriptions.postgresql.cnpg.io 2025-03-06T22:44:13Z
(Screenshot: Kubernetes cluster connected.)
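You can also verify from the command line that the agent-operator pod is running (the pod name suffix will differ):
kubectl get pods -n severalnines-system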
5. Deploy a Kubernetes Database Operator
Before deploying a database cluster, a corresponding database operator for a specific database vendor or technology must be installed in the Kubernetes cluster that you want to use. For example, cloudnative-pg for PostgreSQL.
You can deploy a database operator from the Environments page or from the Operators page.
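Either way, the deployed operators show up as DatabaseOperator resources, which you can also inspect from the command line:
kubectl get databaseoperators -A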
6. Create a new database cluster
Once the desired database operators have been deployed, the next step is to deploy a new database cluster using one of those operators.
The database deployment wizard is in the Kubernetes -> Clusters page. Click on Deploy database cluster to open the deployment dialog.
Follow the steps in the deployment dialog:
- Enter a name for your database cluster.
- Select which Kubernetes environment to use.
- Select which database operator to use / which database technology to deploy.
- Select the Kubernetes namespace that your database should be contained in.
- Select the number of database nodes that your cluster should have.
Once successfully deployed, your database will be listed in the Clusters page.
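You can also verify the deployment from the command line; the agent tracks it as a DatabaseCluster resource, and the operator creates its own resources alongside it (the second command assumes a CloudNativePG/PostgreSQL deployment):
kubectl get databaseclusters -A
kubectl get clusters.postgresql.cnpg.io -A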
7. Database Connection Details
To enable your applications to connect to the database, you need the connection details for the database cluster. These details can be retrieved by selecting the cluster and using the Info menu action.
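If you prefer the command line, the operator itself also exposes the connection details. A sketch for a CloudNativePG cluster named cnpg-cluster in the default namespace (names are illustrative): the application credentials live in the cnpg-cluster-app secret and the read-write endpoint is the cnpg-cluster-rw service.
kubectl get secret cnpg-cluster-app -n default -o jsonpath='{.data.username}' | base64 -d; echo
kubectl get secret cnpg-cluster-app -n default -o jsonpath='{.data.password}' | base64 -d; echo
kubectl get svc -n default | grep cnpg-cluster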
Supported operators
The following database operators are currently supported:
- PostgreSQL (cloudnative-pg)
- MySQL Replication (moco)
Managing Backups
Backups are created by the database operator, and the method and feature set depend on what the operator supports. ClusterControl provides backups with the cloudnative-pg and moco operators, using cloud-based storage.
Existing cloud storage credentials created in ClusterControl will be migrated over to Kubernetes secrets so they can be re-used with the database operators.
Backups are normally created by adding a backup specification to a YAML file, which is then applied to the database cluster via its database operator.
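As a sketch of this operator-native approach with CloudNativePG (cluster and backup names are illustrative, and the cluster must already have object storage configured), an on-demand backup resource looks like this:
kubectl apply -f - <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: cnpg-cluster-manual-backup
  namespace: default
spec:
  cluster:
    name: cnpg-cluster
EOF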
ClusterControl supports adding a backup schedule through its own custom resource definition, which aims to provide an abstraction over the different backup features of each database operator.
Restoring a cluster from backup is also handled by the database operator.
Kubernetes Proxy Server Certificates
Kubernetes Proxy requires a Certificate Authority (CA) certificate, a CA private key, a server certificate, and a server private key to enable TLS for its gRPC services.
Automatic Certificate Generation (Default Behavior)
On startup, if Kubernetes Proxy does not find existing certificates at its configured paths (see Using Your Own Existing Certificates), it will automatically:
- Generate a new Root Certificate Authority (CA): ca.crt (CA certificate) and ca.key (CA private key).
- Generate a new server certificate (server.crt) and server private key (server.key). This server certificate will be signed by the auto-generated CA.
These files are typically created in a default directory such as /usr/share/kuber-proxy/certs/. The proxy's startup logs will indicate the exact paths used if certificates are auto-generated.
Important
The CA certificate (ca.crt), whether auto-generated by the proxy or provided by you, is crucial. During the agent registration process, the proxy sends this CA certificate to the agent. The agent then uses this CA certificate to verify the proxy's server certificate in all subsequent mTLS connections, ensuring secure communication. The initial registration call from the agent to the proxy uses InsecureSkipVerify = true, so the agent does not need this CA certificate before its first successful registration.
Using Your Own Existing Certificates
For production environments or when you have specific PKI requirements, you will likely want to use your own existing CA and server certificates rather than relying on auto-generated ones.
Certificate Requirements
- CA Certificate: Your CA's public certificate file (e.g., my_company_ca.crt).
- CA Private Key: Your CA's private key file (e.g., my_company_ca.key). The proxy needs this to be able to issue mTLS certificates for agents during their registration.
- Server Certificate: The proxy's server certificate (e.g., kuber_proxy_server.crt), which must be signed by your CA. This certificate should have the correct Common Name (CN) and Subject Alternative Names (SANs) matching the hostname(s) and/or IP address(es) Kubernetes Proxy will be accessible on (e.g., DNS:kubernetes-proxy.example.com, IP:192.168.1.100). See the openssl sketch after this list.
- Server Private Key: The private key corresponding to the proxy's server certificate (e.g., kuber_proxy_server.key).
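A minimal sketch of issuing such a server certificate with openssl, assuming you already have my_company_ca.crt and my_company_ca.key (hostname, IP address, and validity period are placeholders):
# Generate the server key and a certificate signing request
openssl req -new -newkey rsa:2048 -nodes \
  -keyout kuber_proxy_server.key -out kuber_proxy_server.csr \
  -subj "/CN=kubernetes-proxy.example.com"
# Sign it with your CA, adding the SANs the agents will connect to
openssl x509 -req -in kuber_proxy_server.csr \
  -CA my_company_ca.crt -CAkey my_company_ca.key -CAcreateserial \
  -out kuber_proxy_server.crt -days 825 \
  -extfile <(printf "subjectAltName=DNS:kubernetes-proxy.example.com,IP:192.168.1.100")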
File Permissions
The Kubernetes Proxy process must have read access to these certificate and key files. Ensure that the user account running Kubernetes Proxy has the necessary permissions. Private key files should be strictly protected (e.g., readable only by the proxy user/group).
Example (adjust user, group, paths, and permissions as per your setup):
# Example: Grant ownership to the 's9s_cc:severalnines' user and group
sudo chown s9s_cc:severalnines /opt/custom_certs/my_company_ca.crt
sudo chown s9s_cc:severalnines /opt/custom_certs/my_company_ca.key
sudo chown s9s_cc:severalnines /opt/custom_certs/kuber_proxy_server.crt
sudo chown s9s_cc:severalnines /opt/custom_certs/kuber_proxy_server.key
# Set restrictive permissions, especially for private keys
sudo chmod 644 /opt/custom_certs/my_company_ca.crt
sudo chmod 600 /opt/custom_certs/my_company_ca.key
sudo chmod 644 /opt/custom_certs/kuber_proxy_server.crt
sudo chmod 600 /opt/custom_certs/kuber_proxy_server.key
Configuration Methods
You can configure Kubernetes Proxy to use your existing certificates in one of two ways:
- Using Default File Paths: Place your four certificate and key files (CA certificate, CA private key, server certificate, server private key) into the default directory that Kubernetes Proxy checks at startup, ensuring they are named as follows:
  - CA Certificate: /usr/share/kuber-proxy/certs/ca.crt
  - CA Key: /usr/share/kuber-proxy/certs/ca.key
  - Server Certificate: /usr/share/kuber-proxy/certs/server.crt
  - Server Key: /usr/share/kuber-proxy/certs/server.key
  If Kubernetes Proxy finds these files with the correct names in its default certificate directory, it will use them.
- Using Environment Variables: Set the following environment variables to point to the absolute paths of your certificate and key files. This is often the preferred method for containerized deployments or when using custom file locations. These variables can be set in the proxy's service unit file (e.g., /etc/default/kuber-proxy for systemd services) or as part of its container/pod environment.
  - PROXY_CA_CERT_PATH=/path/to/your/ca.crt
  - PROXY_CA_KEY_PATH=/path/to/your/ca.key
  - PROXY_CERT_PATH=/path/to/your/server.crt
  - PROXY_KEY_PATH=/path/to/your/server.key
  Replace /path/to/your/ with the actual paths to your respective files. An example environment file is shown below.
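A sketch of such an environment file for a systemd-managed proxy (the kuber-proxy unit name and the certificate paths are assumptions; adjust to your installation):
# /etc/default/kuber-proxy
PROXY_CA_CERT_PATH=/opt/custom_certs/my_company_ca.crt
PROXY_CA_KEY_PATH=/opt/custom_certs/my_company_ca.key
PROXY_CERT_PATH=/opt/custom_certs/kuber_proxy_server.crt
PROXY_KEY_PATH=/opt/custom_certs/kuber_proxy_server.key
After editing the file, restart the proxy (e.g., sudo systemctl restart kuber-proxy) so the new paths take effect.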
Troubleshooting Issues
Use k9s to interact with your Kubernetes cluster. It makes it easier to navigate and visualize the resources deployed.
Restart agent
Sometimes you may need to restart the agent when issues occur. Known issues that are fixed by a restart are:
- Agent is in a disconnected state and doesn't attempt to connect to proxy.
- Moco operator backups are successful but the backups are not visible in the UI.
To restart the agent, restart its deployment (or delete the agent pod so it is recreated):
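The deployment name below is inferred from the agent pod name shown later in this guide; adjust it if yours differs.
kubectl rollout restart deployment/kuber-agent-controller-manager -n severalnines-system
# Alternatively, delete the agent pod by label and let the Deployment recreate it:
kubectl delete pod -l app.kubernetes.io/instance=kuber-agent -n severalnines-system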
Agent with debug log
To see debug logs when installing or upgrading the agent, add --set debug.logLevel=debug to the helm install or upgrade command.
- Watch agent logs:
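For example, streaming the logs with the same label selector used in the search command below:
kubectl logs -l app.kubernetes.io/instance=kuber-agent -n severalnines-system -f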
- Search in logs:
kubectl logs -l app.kubernetes.io/instance=kuber-agent -n severalnines-system --tail=-1 | grep "error"
Basic Troubleshooting Commands
- View Pods in a Specific Namespace:
- Get Detailed Information About a Resource:
- View Pod Logs:
- List All Pods Across Namespaces:
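Illustrative forms of these commands (substitute your own namespace, resource type, and pod name):
# View pods in a specific namespace:
kubectl get pods -n severalnines-system
# Get detailed information about a resource:
kubectl describe <resource-type> <name> -n <namespace>
# View pod logs:
kubectl logs <pod-name> -n <namespace>
# List all pods across namespaces:
kubectl get pods --all-namespaces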
Example Issues and Solutions
Let's look at some examples of different issues and their possible solutions.
Failed Database Operators
- Moco Operator Failure: Check the operator status:
kubectl get databaseoperator
NAME        TYPE        STATUS    VERSION   AGE
cnpg        cnpg        Ready               38m
moco        moco        Error               37m
stackgres   stackgres   Ready               35m
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-86d7c7b689-znbz5 1/1 Running 0 13m
cert-manager cert-manager-cainjector-77894f5f57-djr94 1/1 Running 0 13m
cert-manager cert-manager-webhook-6cf469dbd6-tkvnd 1/1 Running 0 13m
cnpg-system cnpg-controller-manager-8db87d769-jcvp2 1/1 Running 1 (13m ago) 13m
default cnpg-cluster-1 1/1 Running 0 11m
default cnpg-cluster-2 1/1 Running 0 10m
default cnpg-cluster-3 1/1 Running 0 10m
severalnines-system kuber-agent-controller-manager-c6b7d9489-259gp 1/1 Running 0 19h
stackgres stackgres-operator-5bfff484d8-55dmj 0/1 ContainerCreating 0 10m
stackgres stackgres-operator-set-crd-version-n2rm7 0/1 Completed 0 10m
stackgres stackgres-restapi-6554545f7b-mmpmc 0/2 ContainerCreating 0 10m
- Diagnosis: The moco operator requires cert-manager and the operator pods. While cert-manager is running, the operator pods have not been deployed.
- Check the agent logs for details:
kubectl logs kuber-agent-controller-manager-c6b7d0000-000gp -n severalnines-system -f
2025-03-19T04:27:24Z ERROR Failed to install operator {"controller": "databaseoperator", "controllerGroup": "agent.severalnines.com", "controllerKind": "DatabaseOperator", "DatabaseOperator": {"name":"moco","namespace":"default"}, "namespace": "default", "name": "moco", "reconcileID": "af00009c-0000-4b9a-0000-faadfbf70000", "error": "failed to install cert-manager: failed to apply manifest: [failed to apply resource moco-system/cert-manager-edit: clusterroles.rbac.authorization.k8s.io \"cert-manager-edit\" is forbidden: user \"system:serviceaccount:severalnines-system:agent-operator-controller-manager\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:severalnines-system\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"acme.cert-manager.io\"], Resources:[\"challenges\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"acme.cert-manager.io\"], Resources:[\"orders\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"certificaterequests\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"certificates\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"issuers\"], Verbs:[\"deletecollection\"]}, failed to apply resource moco-system/cert-manager-controller-approve:cert-manager-io: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-approve:cert-manager-io\" is forbidden: user \"system:serviceaccount:severalnines-system:agent-operator-controller-manager\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:severalnines-system\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"cert-manager.io\"], Resources:[\"signers\"], ResourceNames:[\"clusterissuers.cert-manager.io/*\"], Verbs:[\"approve\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"signers\"], ResourceNames:[\"issuers.cert-manager.io/*\"], Verbs:[\"approve\"]}, failed to apply resource moco-system/cert-manager-controller-certificatesigningrequests: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-certificatesigningrequests\" is forbidden: user \"system:serviceaccount:severalnines-system:agent-operator-controller-manager\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:severalnines-system\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"certificates.k8s.io\"], Resources:[\"signers\"], ResourceNames:[\"clusterissuers.cert-manager.io/*\"], Verbs:[\"sign\"]}\n{APIGroups:[\"certificates.k8s.io\"], Resources:[\"signers\"], ResourceNames:[\"issuers.cert-manager.io/*\"], Verbs:[\"sign\"]}, failed to apply resource moco-system/cert-manager-controller-approve:cert-manager-io: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-approve:cert-manager-io\" not found, failed to apply resource moco-system/cert-manager-controller-certificatesigningrequests: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-certificatesigningrequests\" not found]"}
github.com/severalnines/clustercontrol-k8s/agent-operator/internal/controller.(*DatabaseOperatorReconciler).Reconcile
/app/agent-operator/internal/controller/databaseoperator_controller.go:275
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224
2025-03-19T04:27:24Z ERROR Reconciler error {"controller": "databaseoperator", "controllerGroup": "agent.severalnines.com", "controllerKind": "DatabaseOperator", "DatabaseOperator": {"name":"moco","namespace":"default"}, "namespace": "default", "name": "moco", "reconcileID": "af00009c-0000-4b9a-0000-faadfbf70000", "error": "failed to install cert-manager: failed to apply manifest: [failed to apply resource moco-system/cert-manager-edit: clusterroles.rbac.authorization.k8s.io \"cert-manager-edit\" is forbidden: user \"system:serviceaccount:severalnines-system:agent-operator-controller-manager\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:severalnines-system\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"acme.cert-manager.io\"], Resources:[\"challenges\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"acme.cert-manager.io\"], Resources:[\"orders\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"certificaterequests\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"certificates\"], Verbs:[\"deletecollection\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"issuers\"], Verbs:[\"deletecollection\"]}, failed to apply resource moco-system/cert-manager-controller-approve:cert-manager-io: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-approve:cert-manager-io\" is forbidden: user \"system:serviceaccount:severalnines-system:agent-operator-controller-manager\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:severalnines-system\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"cert-manager.io\"], Resources:[\"signers\"], ResourceNames:[\"clusterissuers.cert-manager.io/*\"], Verbs:[\"approve\"]}\n{APIGroups:[\"cert-manager.io\"], Resources:[\"signers\"], ResourceNames:[\"issuers.cert-manager.io/*\"], Verbs:[\"approve\"]}, failed to apply resource moco-system/cert-manager-controller-certificatesigningrequests: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-certificatesigningrequests\" is forbidden: user \"system:serviceaccount:severalnines-system:agent-operator-controller-manager\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:severalnines-system\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"certificates.k8s.io\"], Resources:[\"signers\"], ResourceNames:[\"clusterissuers.cert-manager.io/*\"], Verbs:[\"sign\"]}\n{APIGroups:[\"certificates.k8s.io\"], Resources:[\"signers\"], ResourceNames:[\"issuers.cert-manager.io/*\"], Verbs:[\"sign\"]}, failed to apply resource moco-system/cert-manager-controller-approve:cert-manager-io: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-approve:cert-manager-io\" not found, failed to apply resource moco-system/cert-manager-controller-certificatesigningrequests: clusterroles.rbac.authorization.k8s.io \"cert-manager-controller-certificatesigningrequests\" not found]"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224
2025-03-19T04:27:24Z INFO status-predicate DIAGNOSTIC: Detected status-only change, filtering out reconciliation {"name": "moco", "kind": "", "oldGeneration": 1, "newGeneration": 1, "resourceVersion": "24302778", "oldResourceVersion": "24302764"}
2025/03/19 04:27:25 Received message from proxy: {"group":"agent.severalnines.com","version":"v1alpha1","kind":"DatabaseOperator","namespace":"","limit":50,"continue":"","labelSelector":"","fieldSelector":"","annotationSelector":""} (list_resources)
Failed Backups
- List database backups:
kubectl get databasebackup
NAME STATUS AGE STARTED COMPLETED
cnpg-cluster-backup-cnpg-cluster-backup-backup-20250319055800 Failed 26s
- Check CNPG backup status:
kubectl get backup
NAME AGE CLUSTER METHOD PHASE ERROR
cnpg-cluster-backup-backup-20250319055800 94s cnpg-cluster barmanObjectStore failed can't execute backup: cmd: [/controller/manager backup cnpg-cluster-backup-backup-20250319055800]...
- Get backup details:
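For example, using kubectl describe on the Backup resource listed above:
kubectl describe backup cnpg-cluster-backup-backup-20250319055800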
- Error:
Status:
Error: can't execute backup: cmd: [/controller/manager backup cnpg-cluster-backup-backup-20250319055800]
error: command terminated with exit code 1
stdErr: {"level":"info","ts":"2025-03-19T05:58:22.131051022Z","msg":"Error while requesting backup","logging_pod":"cnpg-cluster-2","backupURL":"http://localhost:8010/pg/backup","statusCode":500,"body":"error while requesting backup: while starting backup: cannot recover backup credentials: while getting secret s3-credentials: secrets \"s3-credentials\" not found\n"}
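In this example the backup fails because the s3-credentials secret referenced by the cluster's barmanObjectStore configuration does not exist. A sketch of creating it (the key names are placeholders and must match what the cluster specification references):
kubectl create secret generic s3-credentials -n default \
  --from-literal=ACCESS_KEY_ID=<access-key> \
  --from-literal=ACCESS_SECRET_KEY=<secret-key>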
Node Maintenance and Troubleshooting
- Check Node Status
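Typical commands (the node name is a placeholder):
kubectl get nodes -o wide
kubectl describe node <node-name>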
- Debug Node Issues
Start a debug container on a node:
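For example (node name and image are placeholders):
kubectl debug node/<node-name> -it --image=busybox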
- Manage Container Runtime
Access the host filesystem:
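From within the node debug pod, where kubectl debug mounts the node's root filesystem under /host:
chroot /host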
List Container Images:
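Assuming the node uses a CRI runtime with crictl installed:
crictl images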
List All Containers (Including Stopped):
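Also with crictl:
crictl ps -a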
- Disk Usage Management
Check Log Usage:
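Illustrative locations; adjust for your distribution and runtime:
du -sh /var/log/pods /var/log/containers
journalctl --disk-usage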
Check Container Runtime Storage:
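For a containerd-based node (the data directory path is an assumption; adjust for your runtime):
crictl imagefsinfo
df -h /var/lib/containerd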
- Cleanup Tasks
Remove Stopped Containers:
crictl ps -a -o json | jq -r '.containers[] | select(.state=="CONTAINER_EXITED") | .id' | xargs -r crictl rm
Remove Dangling Images:
crictl images -o json | jq -r '.images[] | select((.repoTags == null) or ((.repoTags | length)==0)) | .id' | xargs -r crictl rmi
Manually Purge Database Operators and Kuber-agent
Kuber-agent
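The agent is installed as a Helm release, so a reasonable sketch of a manual purge is to uninstall that release and, optionally, remove the agent's CRDs and namespace (destructive; release name and namespace are the ones used in the installation step above):
helm uninstall kuber-agent -n severalnines-system
# Optional: remove the agent CRDs and the namespace
kubectl get crds -o name | grep agent.severalnines.com | xargs -r kubectl delete
kubectl delete namespace severalnines-system --ignore-not-found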
Cloudnative-pg
#!/bin/bash
set -e
CNPG_NAMESPACE="cnpg-system" # Change this if you used a custom namespace
echo "===> Deleting CloudNative-PG Deployments..."
kubectl delete deployments --all -n ${CNPG_NAMESPACE} --ignore-not-found
echo "===> Deleting CloudNative-PG StatefulSets (if any)..."
kubectl delete statefulsets --all -n ${CNPG_NAMESPACE} --ignore-not-found
echo "===> Deleting CloudNative-PG Services..."
kubectl delete svc --all -n ${CNPG_NAMESPACE} --ignore-not-found
echo "===> Deleting CloudNative-PG Jobs (if any)..."
kubectl delete jobs --all -n ${CNPG_NAMESPACE} --ignore-not-found
echo "===> Deleting CloudNative-PG Custom Resources (PostgreSQL clusters)..."
# Iterate per namespace so each Cluster resource is deleted where it lives
kubectl get clusters.postgresql.cnpg.io -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name | while read ns name; do
  kubectl delete clusters.postgresql.cnpg.io "$name" -n "$ns" --ignore-not-found
done
echo "===> Deleting CloudNative-PG CRDs..."
kubectl get crds | grep cnpg | awk '{print $1}' | while read crd; do
kubectl delete crd "$crd" --ignore-not-found
done
echo "===> Deleting leftover CloudNative-PG pods..."
kubectl delete pods --all -n ${CNPG_NAMESPACE} --ignore-not-found
echo "===> Attempting to delete namespace ${CNPG_NAMESPACE}..."
kubectl delete namespace ${CNPG_NAMESPACE} --ignore-not-found || true
# Force-remove finalizers if namespace is stuck
if kubectl get namespace ${CNPG_NAMESPACE} &>/dev/null; then
echo "===> Namespace ${CNPG_NAMESPACE} still exists. Removing finalizers..."
kubectl get namespace ${CNPG_NAMESPACE} -o json | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/${CNPG_NAMESPACE}/finalize" -f -
fi
echo "===> Final verification:"
kubectl get all -A | grep cnpg || echo "No running CloudNative-PG resources found."
kubectl get crds | grep cnpg || echo "No CloudNative-PG CRDs remaining."
echo "✅ CloudNative-PG operator purge completed."
Moco
#!/bin/bash
set -e
MOCO_NAMESPACE="moco-system"
echo "===> Deleting MOCO Deployments (if any)..."
kubectl delete deployments --all -n ${MOCO_NAMESPACE} --ignore-not-found
echo "===> Deleting MOCO StatefulSets (if any)..."
kubectl delete statefulsets --all -n ${MOCO_NAMESPACE} --ignore-not-found
echo "===> Deleting MOCO Services..."
kubectl delete svc --all -n ${MOCO_NAMESPACE} --ignore-not-found
echo "===> Deleting MOCO Jobs (if any)..."
kubectl delete jobs --all -n ${MOCO_NAMESPACE} --ignore-not-found
echo "===> Deleting MOCO Custom Resources (mysqlclusters, etc.)..."
# Iterate per namespace so each MySQLCluster resource is deleted where it lives
kubectl get mysqlclusters.moco.cybozu.com -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name | while read ns name; do
  kubectl delete mysqlclusters.moco.cybozu.com "$name" -n "$ns" --ignore-not-found
done
echo "===> Deleting MOCO CRDs..."
kubectl get crds | grep moco | awk '{print $1}' | while read crd; do
kubectl delete crd "$crd" --ignore-not-found
done
echo "===> Deleting leftover MOCO pods..."
kubectl delete pods --all -n ${MOCO_NAMESPACE} --ignore-not-found
echo "===> Attempting to delete MOCO namespace..."
kubectl delete namespace ${MOCO_NAMESPACE} --ignore-not-found || true
# If stuck in Terminating
if kubectl get namespace ${MOCO_NAMESPACE} &>/dev/null; then
echo "===> Namespace ${MOCO_NAMESPACE} still exists. Removing finalizers..."
kubectl get namespace ${MOCO_NAMESPACE} -o json | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/${MOCO_NAMESPACE}/finalize" -f -
fi
echo "===> Final verification:"
kubectl get all -A | grep moco || echo "No running MOCO resources found."
kubectl get crds | grep moco || echo "No MOCO CRDs remaining."
echo "✅ MOCO operator purge completed."