Resiliency: How to do a takeover to move services from Master instance to the Clone?
When you have a resilient webrunner or a resilient KVM, you actually have three instances running:
- master instance (runner0 or kvm0): where the services are running
- PBS: where the backup is stored
- clone instance (runner1 or kvm1): where the services are cloned each day
If for some reason you want to move your services from the master to the clone (because the master is down, overloaded, ...), follow the steps below. Once you do the takeover, your services will no longer run on the previous master.
I Go to the takeover URL of your instance
You will find the takeover URL in the connection parameters of your instance (either "takeover-runner-1-url" or "takeover-kvm-1-url"). Also note the takeover password ("takeover-runner-1-password" or "takeover-kvm-1-password"); you will need it later.
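As a minimal sketch, the lookup of the takeover URL and password could be expressed as follows. It assumes the connection parameters have been copied by hand into a plain Python dictionary; the function name takeover_credentials and the sample values are hypothetical, and only the four parameter names come from the text above.

```python
def takeover_credentials(connection_parameters):
    """Return (url, password) for the takeover interface.

    Tries the webrunner parameter names first, then the KVM ones.
    """
    for kind in ("runner", "kvm"):
        url = connection_parameters.get(f"takeover-{kind}-1-url")
        password = connection_parameters.get(f"takeover-{kind}-1-password")
        if url and password:
            return url, password
    raise KeyError("no takeover URL/password found in connection parameters")


# Hypothetical sample values, for illustration only.
sample = {
    "takeover-kvm-1-url": "https://takeover.example.invalid/",
    "takeover-kvm-1-password": "not-a-real-password",
}
url, password = takeover_credentials(sample)
print(url)
```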
The takeover URL opens the takeover interface, which shows the last valid backup date, a field for the takeover password and a red "Take Over" button.
II Make sure your backup is OK
In the takeover interface, you will see the date of the last valid backup. Make sure this date is recent enough that you won't lose any data. You should also read the log from the importer (the script that copies the data, then compiles and deploys the services on the clone). The login and password for the importer log are the same as for the monitor. You will find the password in the "monitor-setup-url" connection parameter.
Note: the backup is done once per day by default. You can change this setting with the "Periodicity of backup" parameter of your instance. For example, setting this parameter to "0 * * * *" runs the backup every hour.
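To see why "0 * * * *" means "every hour", here is a small sketch of how a five-field cron expression (minute, hour, day of month, month, day of week) can be matched against a point in time. It is a deliberate simplification and not the matcher used by cron or by the resiliency stack: it only understands "*" and comma-separated numbers, and it requires all five fields to match. The function names are hypothetical.

```python
from datetime import datetime


def field_matches(field, value):
    """Return True if one cron field ("*" or comma-separated numbers) matches value."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_fires_at(expression, moment):
    """Return True if the simplified five-field expression fires at `moment`."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0, while datetime.isoweekday() counts Sunday as 7.
    cron_weekday = moment.isoweekday() % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


hourly = "0 * * * *"
print(cron_fires_at(hourly, datetime(2024, 1, 1, 13, 0)))   # True: minute is 0
print(cron_fires_at(hourly, datetime(2024, 1, 1, 13, 30)))  # False: minute is 30
```

Only the minute field is constrained in "0 * * * *", so the expression matches once in every hour, at minute 0.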
For KVM, the backup covers your virtual disk (that is, the whole VM). For webrunner, the backup only copies what is inside the ~/srv/backup directory.
WARNING: For webrunner, the resiliency backup only covers what is inside the ~/srv/backup directory. You need to make sure that the software release running inside your webrunner fills this directory properly, and that it does its own backup at the frequency you want.
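One way to check that the backup directory is really being filled is to look at the age of its most recent file. The sketch below is only an illustration: the helper names and the 24-hour threshold are assumptions, and a recent file does not prove that the backup is complete or restorable.

```python
import os
import time


def newest_file_age(directory):
    """Return the age in seconds of the most recently modified file.

    Returns None if the directory contains no readable files.
    """
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # e.g. a broken symlink; ignore it
            if newest is None or mtime > newest:
                newest = mtime
    if newest is None:
        return None
    return time.time() - newest


def backup_looks_fresh(directory, max_age_seconds):
    """Return True if at least one file was modified within max_age_seconds."""
    age = newest_file_age(directory)
    return age is not None and age <= max_age_seconds


if __name__ == "__main__":
    backup_dir = os.path.expanduser("~/srv/backup")
    # 24 hours is an assumed threshold; use your own backup frequency here.
    print(backup_looks_fresh(backup_dir, 24 * 3600))
```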
Here is an example for the ERP5 software release:
- ERP5 is running its own backup, backing up ZODB and MariaDB into ~/srv/backup. This is done at a fixed interval set by an ERP5 parameter (see https://lab.nexedi.com/nexedi/slapos/blob/master/software/erp5/instance-erp5-input-schema.json#L280).
You can check that both cron files exist inside your webrunner with "ls instance/*/etc/cron.d/tidstorage" and "ls instance/*/etc/cron.d/mariadb-backup".
- webrunner is running the resiliency, meaning it will rsync the data from ~/srv/backup into the PBS. This is done at a fixed interval set by the "Periodicity of backup" parameter of your webrunner. You can check the cron file with "ls ~/etc/cron.d/backup" in your webrunner.
- The PBS pushes the data to runner1 as soon as it receives it.
- runner1 recompiles your ERP5 software release, then restores the data inside each partition and runs a custom script inside each partition (see "Invoke custom import scripts defined by each instances" in the resilient.log). Those scripts restore the MariaDB and ZODB data.
III Do the takeover
If the backup is OK, paste the takeover password and click the red "Take Over" button.
IV Wait for the end of the takeover
Now you have to wait for the resiliency to replace your master. The original kvm0 (or runner0) will be renamed to "broken", the kvm1 (or runner1) will be renamed to kvm0 (or runner0), and a new clone will be requested. Once the process is over, the system will be readjusted and your access will be moved to the new master.
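The renaming described above can be pictured with a toy model. This is purely illustrative and says nothing about how the resiliency stack implements it: the function name and the "computer-A"/"computer-B" identifiers are hypothetical, and None simply stands for a clone that has been requested but is not yet allocated.

```python
def apply_takeover(instances, prefix):
    """Return the role-to-computer mapping after a takeover.

    `instances` maps role names such as "kvm0" and "kvm1" to an identifier.
    """
    master, clone = f"{prefix}0", f"{prefix}1"
    return {
        "broken": instances[master],  # the old master is renamed to "broken"
        master: instances[clone],     # the old clone becomes the new master
        clone: None,                  # a new clone is requested
    }


before = {"kvm0": "computer-A", "kvm1": "computer-B"}
print(apply_takeover(before, "kvm"))
# {'broken': 'computer-A', 'kvm0': 'computer-B', 'kvm1': None}
```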
V Update your frontend (webrunner) or your IPv6 (kvm)
Since your services now run on a different computer, their IPv6 address has changed.
If you have a webrunner, you should redeploy your services inside the webrunner so that they know about the change. You will also need to update all the frontends pointing to your services (especially the custom frontend, if you have one).
If you have a KVM, you may need to update the IPv6 address inside your machine if you used the IPv6 address provided by SlapOS.