Shutdown Proxmox cluster with Ceph storage

To shut down a Proxmox cluster without data loss or corruption, especially when Ceph storage is in use, you must follow a specific procedure.

When working with a high-availability infrastructure, performing maintenance tasks can be a delicate process. Shutting down a cluster, even for a planned event like a power outage or hardware upgrade, requires careful execution to prevent data corruption and unexpected downtime.

This guide will walk you through the correct procedure for gracefully shutting down a Proxmox cluster with a Ceph storage backend, ensuring that all services and data remain intact.

Shutdown Proxmox cluster with Ceph storage

Before proceeding with the actual shutdown, ensure the Ceph cluster is in a healthy state.
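A minimal sketch of this health check, assuming the standard ceph CLI is available on the node. The function name ceph_is_healthy is hypothetical and only used for illustration.

```shell
# Print the Ceph health line and succeed only when it reports HEALTH_OK.
# (ceph_is_healthy is an illustrative helper name, not part of Ceph.)
ceph_is_healthy() {
    health=$(ceph health) || return 1
    echo "$health"
    case "$health" in
        HEALTH_OK*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

Calling ceph_is_healthy before the shutdown gives a simple gate: if it fails, resolve the reported warnings first.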

Set the noout flag so that Ceph does not automatically mark an OSD (Object Storage Daemon) as out when it goes offline, which would trigger a data rebalancing process.

Without the noout flag, Ceph’s default behavior is to mark an unresponsive OSD as out after a timeout (the mon_osd_down_out_interval setting, 10 minutes by default in recent Ceph releases), and then begin re-replicating the data from that OSD to others in the cluster. This wastes resources and degrades performance unnecessarily when the OSD is expected to come back online shortly.

Set the flag from any one node; it applies to the whole cluster.
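As a sketch of this step, assuming the standard ceph CLI: the flag is set with ceph osd set noout, and the flags line of ceph osd dump can confirm it. The wrapper name set_noout is hypothetical.

```shell
# Set the cluster-wide noout flag, then confirm it appears in the OSD map flags.
# (set_noout is an illustrative helper name.)
set_noout() {
    ceph osd set noout || return 1
    ceph osd dump | grep '^flags' | grep -q noout
}
```

Run set_noout once; a non-zero exit status means the flag did not show up in the OSD map.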

Shut down all VMs and containers.
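One possible way to do this from the command line on each node, sketched under the assumption that the qm and pct tools print their usual list layout (status in the third column for qm list, in the second for pct list). The function name stop_all_guests is hypothetical.

```shell
# Gracefully shut down every running VM and container on the local node.
# (stop_all_guests is an illustrative helper name.)
stop_all_guests() {
    # qm list: header line, then VMID NAME STATUS ...
    qm list | awk 'NR > 1 && $3 == "running" { print $1 }' | while read -r vmid; do
        qm shutdown "$vmid" < /dev/null
    done
    # pct list: header line, then VMID Status ...
    pct list | awk 'NR > 1 && $2 == "running" { print $1 }' | while read -r ctid; do
        pct shutdown "$ctid" < /dev/null
    done
}
```

Guests can equally be shut down from the web interface; the loop is only a convenience when many guests are running.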

If your infrastructure uses the HA capability, you should disable the HA manager during a planned shutdown or maintenance of a Proxmox node. Doing so prevents the HA manager from performing unnecessary and disruptive actions, such as migrating services, when you are intentionally taking a node offline.

Check the status of the HA manager by running ha-manager status on any node.

On each node, stop the pve-ha-lrm service to prevent the node from trying to take action on its local services. This service manages the local VMs and containers and executes the commands it receives from the CRM. If you only stop it on one node, the other nodes will still be running their LRM services and the CRM will still be making decisions.
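A minimal sketch of stopping the LRM on the local node with systemctl and checking that it really stopped. The function name stop_ha_lrm is hypothetical.

```shell
# Stop the local resource manager and verify it is no longer active.
# (stop_ha_lrm is an illustrative helper name.)
stop_ha_lrm() {
    systemctl stop pve-ha-lrm || return 1
    # is-active exits 0 while the unit is running, so success here means the stop failed
    if systemctl is-active --quiet pve-ha-lrm; then
        return 1
    fi
}
```

Repeat this on every node before touching the CRM.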

Once the LRMs are stopped on all nodes, you can stop the pve-ha-crm service on each node. This service also runs on every node, but only one node at a time is the master CRM. This master is the decision-maker for the entire cluster.
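Because the same unit has to be stopped on every node, the step can be driven from one machine over SSH. The sketch below assumes root SSH access between cluster nodes; the function name stop_unit_on_nodes and the node names pve1, pve2 and pve3 are hypothetical.

```shell
# Run "systemctl stop <unit>" on every node passed as an argument.
# (stop_unit_on_nodes is an illustrative helper name.)
stop_unit_on_nodes() {
    unit=$1
    shift
    for node in "$@"; do
        ssh "root@$node" systemctl stop "$unit" || return 1
    done
}
```

For example, stop_unit_on_nodes pve-ha-crm pve1 pve2 pve3 stops the CRM everywhere, and the same helper works for pve-ha-lrm in the previous step.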

If you stop the master CRM but the LRMs are still running, a new master will be elected and it will continue to manage the highly available resources, potentially overriding any manual changes you’ve made.

Stopping both services on all nodes effectively freezes the high-availability stack for the entire cluster.

Now all Ceph services must be stopped on each node.
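A hedged sketch of this step: on systemd-based Ceph installations, the ceph.target unit groups the monitor, manager, OSD and MDS daemons, so stopping it should stop them all. The follow-up pgrep check and the function name stop_ceph_services are additions for illustration.

```shell
# Stop all Ceph daemons on the local node via their umbrella systemd target.
# (stop_ceph_services is an illustrative helper name.)
stop_ceph_services() {
    systemctl stop ceph.target || return 1
    # Fail if any Ceph daemon process is still running
    if pgrep -x 'ceph-(osd|mon|mgr|mds)' > /dev/null; then
        return 1
    fi
}
```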

Run the sync and shutdown commands on all nodes. The sync command ensures pending disk writes are flushed. Wait for each node to fully shut down before moving to the next.
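The two commands can be wrapped as below. This is a sketch; the exact shutdown options used originally are not known, so -h now (halt immediately) is an assumption, and flush_and_poweroff is a hypothetical name.

```shell
# Flush pending writes to disk, then halt the local node.
# (flush_and_poweroff is an illustrative helper name.)
flush_and_poweroff() {
    sync
    shutdown -h now
}
```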

The three nodes have been shut down.

Power on the cluster

After shutting down Proxmox, there is no specific sequence to follow when powering the physical nodes back on to restore cluster functionality.

After the nodes have powered on, wait for them to form a cluster quorum.
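Quorum can be polled from the shell. The sketch below assumes that pvecm status prints a "Quorate: Yes" line once quorum is reached; the function name wait_for_quorum and its retry defaults are hypothetical.

```shell
# Poll "pvecm status" until the cluster reports quorum or the retries run out.
# Usage: wait_for_quorum [tries] [delay_seconds]
# (wait_for_quorum is an illustrative helper name.)
wait_for_quorum() {
    tries=${1:-30}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if pvecm status 2>/dev/null | grep -Eq 'Quorate:[[:space:]]+Yes'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```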

Check the Ceph storage cluster health. Note the noout flag is reported as set.
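A sketch of this check using the standard ceph CLI: ceph status prints the overall health, and the flags line of ceph osd dump shows whether noout is still set. The function name report_ceph_after_boot is hypothetical.

```shell
# Show the Ceph status and point out when the noout flag is still set.
# (report_ceph_after_boot is an illustrative helper name.)
report_ceph_after_boot() {
    ceph status
    if ceph osd dump | grep '^flags' | grep -q noout; then
        echo "noout flag is still set"
    fi
}
```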

Since the noout flag was set during the shutdown procedure, you must now unset it so that Ceph resumes its normal handling of offline OSDs.
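The flag is cleared with ceph osd unset noout. A minimal sketch that also verifies the result; the wrapper name unset_noout is hypothetical.

```shell
# Clear the cluster-wide noout flag and confirm it is gone from the OSD map flags.
# (unset_noout is an illustrative helper name.)
unset_noout() {
    ceph osd unset noout || return 1
    if ceph osd dump | grep '^flags' | grep -q noout; then
        return 1
    fi
}
```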

The HA status check reports that it is not active.

When the Proxmox cluster is powered on, the HA services must be reactivated in the reverse order: the CRM first, then the LRM.

On each node, start the pve-ha-crm service.
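A sketch of the restart driven from one machine over SSH. Starting pve-ha-lrm after the CRM is inferred from the "reverse order" rule rather than stated explicitly; the function name start_ha_on_nodes and the node names pve1, pve2 and pve3 are hypothetical, and root SSH access between nodes is assumed.

```shell
# Start the CRM on every node first, then the LRM on every node.
# (start_ha_on_nodes is an illustrative helper name.)
start_ha_on_nodes() {
    for unit in pve-ha-crm pve-ha-lrm; do
        for node in "$@"; do
            ssh "root@$node" systemctl start "$unit" || return 1
        done
    done
}
```

For example, start_ha_on_nodes pve1 pve2 pve3 brings the HA stack back up in that order across the cluster.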

The Proxmox cluster is now up and running.

By following this procedure, you can safely shut down Proxmox with Ceph without risk of data loss or corruption.
