Recovering from a Catastrophic Failure

In the rare event that a GridGuard™ Virtual Appliance suffers a catastrophic failure, several recovery options are available. Try them in the order listed, moving to the next option only if the previous one is not possible.

  1. Reboot the server: Once the server has rebooted, it automatically synchronizes with the other nodes in the cluster and resumes processing transactions as normal.
  2. Recover from snapshot: If the machine cannot be rebooted and virtual machine snapshots are available, restore the machine to a previously saved snapshot. Once the machine boots, it synchronizes with the other nodes in the cluster.
  3. Build new appliance: If the machine cannot be recovered (for example, because of a problem with the underlying hardware, such as a disk failure) and GridGuard™ backups are available, restore a backup onto a newly set up machine. The restore updates both the configuration and the data on the server to the state saved in the backup file.
  4. Rebuild from a cluster node: If none of the three options above works, instantiate a brand new machine, export the configuration from one of the other nodes in the cluster, import it into the new machine, and then manually add the new machine to the cluster.
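
The escalation through the four options can be summarized as a simple decision function. The following Python sketch is illustrative only: it does not call any GridGuard™ interface, and the names NodeState, RecoveryOption, and choose_recovery are hypothetical. The three inputs are facts an administrator would establish by hand before choosing a path.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryOption(Enum):
    """The four recovery paths, numbered as in the list above."""
    REBOOT = 1
    RESTORE_SNAPSHOT = 2
    RESTORE_BACKUP = 3
    REBUILD_FROM_PEER = 4


@dataclass
class NodeState:
    """Hypothetical inputs, determined manually by the administrator."""
    can_reboot: bool        # the failed machine still boots
    snapshot_usable: bool   # a VM snapshot exists and can be restored
    backup_available: bool  # a GridGuard backup file is on hand


def choose_recovery(state: NodeState) -> RecoveryOption:
    """Return the first applicable option, least disruptive first."""
    if state.can_reboot:
        return RecoveryOption.REBOOT
    if state.snapshot_usable:
        return RecoveryOption.RESTORE_SNAPSHOT
    if state.backup_available:
        return RecoveryOption.RESTORE_BACKUP
    # Nothing local to restore from: build a new machine and import
    # the configuration exported from another node in the cluster.
    return RecoveryOption.REBUILD_FROM_PEER
```

For example, choose_recovery(NodeState(can_reboot=False, snapshot_usable=False, backup_available=True)) returns RecoveryOption.RESTORE_BACKUP, which corresponds to option 3.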

Schedule periodic backups on the GridGuard™ server. It is also a good idea to keep virtual appliance snapshots, especially before making critical changes to the system, such as applying upgrades. Together, backups and snapshots enable you to recover gracefully from a catastrophic failure.
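
If backup files are copied to a directory that a monitoring host can read, a small check can warn when the newest backup is older than expected. The Python sketch below rests on assumptions that are not part of GridGuard™: that backups exist as ordinary files in one directory, that file modification time reflects when the backup was taken, and that a 24-hour window is acceptable. The function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: str,
                            now: Optional[float] = None) -> Optional[float]:
    """Age in hours of the most recently modified file, or None if empty."""
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    current = time.time() if now is None else now
    return (current - newest_mtime) / 3600.0


def backup_is_stale(backup_dir: str,
                    max_age_hours: float = 24.0,
                    now: Optional[float] = None) -> bool:
    """True if there is no backup at all or the newest one is too old."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours
```

A scheduled job could call backup_is_stale with the directory where backups are stored (a site-specific path) and raise an alert when it returns True.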