From eb576cb94678087d8e31314ee4450f033f2a8be4 Mon Sep 17 00:00:00 2001 From: Rafael Brito Date: Tue, 31 Aug 2021 10:12:37 -0500 Subject: [PATCH 1/7] #4067 Initial design of the new plugins - pre-post backup and restore Signed-off-by: Rafael Brito --- .../new-prepost-backuprestore-plugin-hooks.md | 372 ++++++++++++++++++ 1 file changed, 372 insertions(+) create mode 100644 design/new-prepost-backuprestore-plugin-hooks.md diff --git a/design/new-prepost-backuprestore-plugin-hooks.md b/design/new-prepost-backuprestore-plugin-hooks.md new file mode 100644 index 0000000000..d6ead1867e --- /dev/null +++ b/design/new-prepost-backuprestore-plugin-hooks.md @@ -0,0 +1,372 @@ +# Pre-Backup, Post-Backup, Pre-Restore, and Post-Restore Action Plugin Hooks + +## Abstract + +Velero should provide a way to trigger actions before and after each backup and restore. +**Important**: These proposed plugin hooks are fundamentally different from the existing plugin hooks, BackupItemAction and RestoreItemAction, which are triggered per item during backup and restore, respectively. +The proposed plugin hooks are to be executed only once. + +These plugin hooks will be invoked: + +- PreBackupAction: plugin actions are executed after the backup object is created and validated but before the backup is being processed, more precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the backup object is not processed. +- PostBackupAction: plugin actions are executed after the backup finishes processing all items and volumes snapshots are completed, more precisely _after_ function [recordBackupMetrics](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L630). 
If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object. +- PreRestoreAction: plugin actions are executed after the restore object is created and validated and backup object is fetched, more precisely _after_ function [c.fetchBackupInfo](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L345). If the PreRestoreActions return an err, the restore object is not processed. +- PostRestoreAction: plugin actions are executed after the restore finishes processing all items and volumes snapshots are restored, more precisely _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object. + +## Background + +Increasingly, Velero is employed for workload migrations across different Kubernetes clusters. +Using Velero for migrations requires an atomic operation involving a Velero backup on a source cluster followed by a Velero restore on a destination cluster. + +It is common during these migrations to perform many actions inside and outside Kubernetes clusters. +**Attention**: these actions are not necessarily per resource item, but they are actions to be executed _once_ before and/or after the migration itself (remember, migration in this context is Velero Backup + Velero Restore). + +One important use case driving this proposal is migrating stateful workloads at scale across different clusters/storage backends. 
+Today, Velero's Restic integration is the response for such use cases, but there are some limitations: + +- Quiesce/unquiesce workloads: Pod hooks are useful for quiescing/unquiescing workloads, but platform engineers often do not have the luxury/visibility/time/knowledge to go through each pod in order to add specific commands to quiesce/unquiesce workloads. +- Orphan PVC/PV pairs: PVCs/PVs that do not have associated running pods are not backed up and consequently, are not migrated. + +Aiming to address these two limitations, and separate from this proposal, we would like to write a Velero plugin that takes advantage of the proposed Pre-Backup plugin hook. This plugin will be executed _once_ (not per resource item) prior backup. It will scale down the applications setting `.spec.replicas=0` to all deployments, statefulsets, daemonsets, replicasets, etc. and will start a small-footprint staging pod that will mount all PVC/PV pairs. Similarly, we would like to write another plugin that will utilize the proposed Post-Restore plugin hook. This plugin will unquiesce migrated applications by killing the staging pod and reinstating original `.spec.replicas values` after the Velero restore is completed. + +Other examples of plugins that can use the proposed plugin hooks are + +- PostBackupAction: trigger a Velero Restore after a successful Velero backup (and complete the migration operation). +- PreRestoreAction: pre-expand the cluster's capacity via Cluster API to avoid starvation of cluster resources before the restore. +- PostRestoreAction: call actions to be performed outside Kubernetes clusters, such as configure a global load balancer (GLB) that enables the new cluster. + +This design seeks to provide missing extension points. This proposal's scope is to only add the new plugin hooks, not the plugins themselves. + +## Goals + +- Provide PreBackupAction, PostBackupAction, PreRestoreAction, and PostRestoreAction APIs for plugins to implement. 
+- Update Velero backup and restore creation logic to invoke registered PreBackupAction and PreRestoreAction plugins before processing the backup and restore respectively. +- Update Velero backup and restore complete logic to invoke registered PostBackupAction and PostRestoreAction plugins after flagging the objects as completed. + +## Non-Goals + +- Specific implementations of the PreBackupAction, PostBackupAction, PreRestoreAction and PostRestoreAction API beyond test cases. + +## High-Level Design + +The PreBackupAction plugin API will resemble the BackupItemAction plugin hook design, but with the fundamental difference that it will receive only as input the Velero `Backup` object created. +It will not receive any resource list items because the backup is not yet running at that stage. +In addition, the `PreBackupAction` interface will only have an `Execute()` method since the plugin will be executed once per Backup creation, not per item. + +The Velero backup controller will be modified so that if there are any PreBackupAction plugins registered, they will be executed after backup object is created and validated but they will execute prior to processing the backup items and volume snapshots. More precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the function `runBackup` returns it and backup object is not processed. + +The PostBackupAction plugin API will resemble the BackupItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Backup` object without any resource list items. +By this stage, the backup has already been executed, with items backed up and volumes snapshots processed. +The `PostBackupAction` interface will only have an `Execute()` method since the plugin will be executed only once per Backup, not per item. 
+ +The Velero backup controller package will be modified. +If there are any PostBackupAction plugins registered, they will be executed as almost last step of Backup object, more precisely _after_ function [recordBackupMetrics](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L630) but before the backup logs are persisted on the gzip file. We want to capture the logs from the PostBackupActions as part of the backup object. If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object on `backup.Status.Warnings` and `backup.Status.Errors`. + +The PreRestoreAction plugin API will resemble the RestoreItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Restore` object created. +It will not receive any resource list items because the restore has not yet been running at that stage. +In addition, the `PreRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore creation, not per item. + +The Velero restore controller will be modified so that if there are any PreRestoreAction plugins registered, they will be executed after the restore object is created and validated and backup object is fetched, more precisely _after_ function [c.fetchBackupInfo](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L345). At this point, the user might want to take some actions with the backup object prior processing the restore (i.e. calculate the amount of resources needed for the restore). If the PreRestoreActions return an err, the function `validateAndComplete` returns it and restore object is not processed. 
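The gating behaviour described above for pre-restore actions can be sketched as follows. This is only an illustration under stated assumptions: `Restore` is a stand-in for Velero's `api.Restore`, and `actionFunc`, `capacityCheck`, `runPreRestoreActions`, and `restoreWithPreHooks` are hypothetical names that do not exist in Velero. The sketch shows one plausible reading of the text: actions run in order, and the first error prevents the restore from being processed.

```go
package main

import (
	"errors"
	"fmt"
)

// Restore is a stand-in for Velero's api.Restore (assumption: only two fields modeled).
type Restore struct {
	Name       string
	BackupName string
}

// PreRestoreAction mirrors the shape of the interface proposed in this document.
type PreRestoreAction interface {
	Execute(restore *Restore) error
}

// actionFunc adapts a plain function to PreRestoreAction (hypothetical helper).
type actionFunc func(restore *Restore) error

func (f actionFunc) Execute(restore *Restore) error { return f(restore) }

var errNoBackup = errors.New("restore does not reference a backup")

// capacityCheck is a hypothetical pre-restore action that needs a backup to inspect.
var capacityCheck PreRestoreAction = actionFunc(func(r *Restore) error {
	if r.BackupName == "" {
		return errNoBackup
	}
	return nil
})

// runPreRestoreActions executes the actions in order and stops at the first error.
func runPreRestoreActions(restore *Restore, actions []PreRestoreAction) error {
	for i, action := range actions {
		if err := action.Execute(restore); err != nil {
			return fmt.Errorf("pre-restore action %d failed for restore %q: %w", i, restore.Name, err)
		}
	}
	return nil
}

// restoreWithPreHooks only calls doRestore when every pre-restore action succeeded.
func restoreWithPreHooks(restore *Restore, actions []PreRestoreAction, doRestore func(*Restore)) error {
	if err := runPreRestoreActions(restore, actions); err != nil {
		return err
	}
	doRestore(restore)
	return nil
}

func main() {
	restore := &Restore{Name: "migrate-shop"} // no BackupName, so capacityCheck fails
	err := restoreWithPreHooks(restore, []PreRestoreAction{capacityCheck}, func(r *Restore) {
		fmt.Println("restoring", r.Name)
	})
	fmt.Println("result:", err)
}
```

Wrapping the error with `%w` keeps the plugin's original error available to callers through `errors.Is`, which may be useful when the controller decides how to report the failure.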
+ +The PostRestoreAction plugin API will resemble the RestoreItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Restore` object without any resource list items. +At this stage, the restore has already been executed. +The `PostRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore, not per item. + +The Velero restore controller package will be modified. +If any PostRestoreAction plugins are registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored, more precisely _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object, on `restoreWarnings`, `restoreErrors`. + + +## Detailed Design + +### New types + +#### PreBackupAction + +The `PreBackupAction` interface is as follows: + +```go +// PreBackupAction provides a hook into the backup process before it begins. +type PreBackupAction interface { + // Execute the PreBackupAction plugin providing it access to the Backup that + // is being executed + Execute(backup *api.Backup) error +} +``` + +`PreBackupAction` will be defined in `pkg/plugin/velero/pre_backup_action.go`. + +#### PostBackupAction + +The `PostBackupAction` interface is as follows: + +```go +// PostBackupAction provides a hook into the backup process after it completes. +type PostBackupAction interface { + // Execute the PostBackupAction plugin providing it access to the Backup that + // has been completed + Execute(backup *api.Backup) error +} +``` + +`PostBackupAction` will be defined in `pkg/plugin/velero/post_backup_action.go`. 
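From a plugin author's point of view, the two backup interfaces above might be implemented along the following lines. This is a minimal sketch, not part of the proposal: `Backup` is a stand-in for Velero's `api.Backup`, and `quiesceAction` and `notifyAction` are hypothetical plugins invented for illustration. Real plugins would also need the gRPC registration described later in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// Backup is a stand-in for Velero's api.Backup (assumption: only two fields modeled).
type Backup struct {
	Name               string
	IncludedNamespaces []string
}

// PreBackupAction and PostBackupAction mirror the interfaces proposed above.
type PreBackupAction interface {
	Execute(backup *Backup) error
}

type PostBackupAction interface {
	Execute(backup *Backup) error
}

// quiesceAction is a hypothetical pre-backup plugin. It runs once per backup
// and records the namespaces it would scale down.
type quiesceAction struct {
	quiesced []string
}

func (q *quiesceAction) Execute(backup *Backup) error {
	if backup == nil {
		return errors.New("backup is nil")
	}
	if len(backup.IncludedNamespaces) == 0 {
		return fmt.Errorf("backup %q selects no namespaces to quiesce", backup.Name)
	}
	q.quiesced = append(q.quiesced, backup.IncludedNamespaces...)
	return nil
}

// notifyAction is a hypothetical post-backup plugin that records finished backups.
type notifyAction struct {
	notified []string
}

func (n *notifyAction) Execute(backup *Backup) error {
	if backup == nil {
		return errors.New("backup is nil")
	}
	n.notified = append(n.notified, backup.Name)
	return nil
}

// Compile-time checks that the sketches satisfy the proposed interfaces.
var (
	_ PreBackupAction  = (*quiesceAction)(nil)
	_ PostBackupAction = (*notifyAction)(nil)
)

func main() {
	backup := &Backup{Name: "nightly", IncludedNamespaces: []string{"shop", "billing"}}
	pre, post := &quiesceAction{}, &notifyAction{}

	if err := pre.Execute(backup); err != nil {
		fmt.Println("pre-backup action failed:", err)
		return
	}
	// The backup itself would be processed here.
	if err := post.Execute(backup); err != nil {
		fmt.Println("post-backup action failed:", err)
	}
	fmt.Println("quiesced:", pre.quiesced, "notified:", post.notified)
}
```

Because each interface has a single `Execute` method and no per-item selector, a plugin has no way to narrow its scope to particular resources; any filtering has to be derived from the `Backup` object itself.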
+ +#### PreRestoreAction + +The `PreRestoreAction` interface is as follows: + +```go +// PreRestoreAction provides a hook into the restore process before it begins. +type PreRestoreAction interface { + // Execute the PreRestoreAction plugin providing it access to the Restore that + // is being executed + Execute(restore *api.Restore) error +} +``` + +`PreRestoreAction` will be defined in `pkg/plugin/velero/pre_restore_action.go`. + +#### PostRestoreAction + +The `PostRestoreAction` interface is as follows: + +```go +// PostRestoreAction provides a hook into the restore process after it completes. +type PostRestoreAction interface { + // Execute the PostRestoreAction plugin providing it access to the Restore that + // has been completed + Execute(restore *api.Restore) error +} +``` + +`PostRestoreAction` will be defined in `pkg/plugin/velero/post_restore_action.go`. + +### Generate Protobuf Definitions and Client/Servers + +In `pkg/plugin/proto`, add the following: + +1. Protobuf definitions will be necessary for PreBackupAction in `pkg/plugin/proto/PreBackupAction.proto`. + +```protobuf +message PreBackupActionExecuteRequest { + ... +} + +service PreBackupAction { + rpc Execute(PreBackupActionExecuteRequest) returns (Empty); +} +``` + +Once these are written, a client and server implementation can be written in `pkg/plugin/framework/pre_backup_action_client.go` and `pkg/plugin/framework/pre_backup_action_server.go`, respectively. + +2. Protobuf definitions will be necessary for PostBackupAction in `pkg/plugin/proto/PostBackupAction.proto`. + +```protobuf +message PostBackupActionExecuteRequest { + ... +} + +service PostBackupAction { + rpc Execute(PostBackupActionExecuteRequest) returns (Empty); +} +``` + +Once these are written, a client and server implementation can be written in `pkg/plugin/framework/post_backup_action_client.go` and `pkg/plugin/framework/post_backup_action_server.go`, respectively. + +3. 
Protobuf definitions will be necessary for PreRestoreAction in `pkg/plugin/proto/PreRestoreAction.proto`. + +```protobuf +message PreRestoreActionExecuteRequest { + ... +} + +service PreRestoreAction { + rpc Execute(PreRestoreActionExecuteRequest) returns (Empty); +} +``` + +Once these are written, a client and server implementation can be written in `pkg/plugin/framework/pre_restore_action_client.go` and `pkg/plugin/framework/pre_restore_action_server.go`, respectively. + +4. Protobuf definitions will be necessary for PostRestoreAction in `pkg/plugin/proto/PostRestoreAction.proto`. + +```protobuf +message PostRestoreActionExecuteRequest { + ... +} + +service PostRestoreAction { + rpc Execute(PostRestoreActionExecuteRequest) returns (Empty); +} +``` + +Once these are written, a client and server implementation can be written in `pkg/plugin/framework/post_restore_action_client.go` and `pkg/plugin/framework/post_restore_action_server.go`, respectively. + +### Restartable Plugins + +Similar to the `RestoreItemAction` and `BackupItemAction` plugins, restartable processes will need to be implemented (with the difference that there is no `AppliesTo()` method). + +In `pkg/plugin/clientmgmt/`, add: + +1. `restartable_pre_backup_action.go`, creating the following unexported type: + +```go +type restartablePreBackupAction struct { + key kindAndName + sharedPluginProcess RestartableProcess +} + +func newRestartablePreBackupAction(name string, sharedPluginProcess RestartableProcess) *restartablePreBackupAction { + // ... +} + +func (r *restartablePreBackupAction) getPreBackupAction() (velero.PreBackupAction, error) { + // ... +} + +func (r *restartablePreBackupAction) getDelegate() (velero.PreBackupAction, error) { + // ... +} + +// Execute restarts the plugin's process if needed, then delegates the call. +func (r *restartablePreBackupAction) Execute(backup *api.Backup) error { + // ... +} +``` + +2. 
`restartable_post_backup_action.go`, creating the following unexported type: + +```go +type restartablePostBackupAction struct { + key kindAndName + sharedPluginProcess RestartableProcess +} + +func newRestartablePostBackupAction(name string, sharedPluginProcess RestartableProcess) *restartablePostBackupAction { + // ... +} + +func (r *restartablePostBackupAction) getPostBackupAction() (velero.PostBackupAction, error) { + // ... +} + +func (r *restartablePostBackupAction) getDelegate() (velero.PostBackupAction, error) { + // ... +} + +// Execute restarts the plugin's process if needed, then delegates the call. +func (r *restartablePostBackupAction) Execute(backup *api.Backup) error { + // ... +} +``` + +3. `restartable_pre_restore_action.go`, creating the following unexported type: + +```go +type restartablePreRestoreAction struct { + key kindAndName + sharedPluginProcess RestartableProcess +} + +func newRestartablePreRestoreAction(name string, sharedPluginProcess RestartableProcess) *restartablePreRestoreAction { + // ... +} + +func (r *restartablePreRestoreAction) getPreRestoreAction() (velero.PreRestoreAction, error) { + // ... +} + +func (r *restartablePreRestoreAction) getDelegate() (velero.PreRestoreAction, error) { + // ... +} + +// Execute restarts the plugin's process if needed, then delegates the call. +func (r *restartablePreRestoreAction) Execute(restore *api.Restore) error { + // ... +} +``` + +4. `restartable_post_restore_action.go`, creating the following unexported type: + +```go +type restartablePostRestoreAction struct { + key kindAndName + sharedPluginProcess RestartableProcess +} + +func newRestartablePostRestoreAction(name string, sharedPluginProcess RestartableProcess) *restartablePostRestoreAction { + // ... +} + +func (r *restartablePostRestoreAction) getPostRestoreAction() (velero.PostRestoreAction, error) { + // ... 
+} + +func (r *restartablePostRestoreAction) getDelegate() (velero.PostRestoreAction, error) { + // ... +} + +// Execute restarts the plugin's process if needed, then delegates the call. +func (r *restartablePostRestoreAction) Execute(restore *api.Restore) error { + // ... +} +``` + +### Plugin Manager Changes + +Add the following methods to the `Manager` interface in `pkg/plugin/clientmgmt/manager.go`: + +```go +type Manager interface { + ... + // GetPreBackupAction returns a PreBackupAction plugin for name. + GetPreBackupAction(name string) (PreBackupAction, error) + + // GetPreBackupActions returns all PreBackupAction plugins. + GetPreBackupActions() ([]PreBackupAction, error) + + // GetPostBackupAction returns a PostBackupAction plugin for name. + GetPostBackupAction(name string) (PostBackupAction, error) + + // GetPostBackupActions returns all PostBackupAction plugins. + GetPostBackupActions() ([]PostBackupAction, error) + + // GetPreRestoreAction returns a PreRestoreAction plugin for name. + GetPreRestoreAction(name string) (PreRestoreAction, error) + + // GetPreRestoreActions returns all PreRestoreAction plugins. + GetPreRestoreActions() ([]PreRestoreAction, error) + + // GetPostRestoreAction returns a PostRestoreAction plugin for name. + GetPostRestoreAction(name string) (PostRestoreAction, error) + + // GetPostRestoreActions returns all PostRestoreAction plugins. + GetPostRestoreActions() ([]PostRestoreAction, error) + +} +``` + +`GetPreBackupAction` and `GetPreBackupActions` will invoke the `restartablePreBackupAction` implementations. +`GetPostBackupAction` and `GetPostBackupActions` will invoke the `restartablePostBackupAction` implementations. +`GetPreRestoreAction` and `GetPreRestoreActions` will invoke the `restartablePreRestoreAction` implementations. +`GetPostRestoreAction` and `GetPostRestoreActions` will invoke the `restartablePostRestoreAction` implementations. 
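To illustrate how a controller could consume the new getters, the sketch below runs every registered post-backup action and counts failures on the backup. It is not the proposed implementation: `Backup` and its `Errors` field are stand-ins (the latter loosely modelling `backup.Status.Errors`), the `Manager` interface is narrowed to a single method, and `fakeManager`, `okAction`, `failingAction`, and `runPostBackupActions` are hypothetical. Continuing after a failed action is an assumption; the design only states that post-action errors count towards the backup status.

```go
package main

import (
	"errors"
	"fmt"
)

// Backup is a stand-in for Velero's api.Backup; Errors loosely models
// backup.Status.Errors (assumption).
type Backup struct {
	Name   string
	Errors int
}

// PostBackupAction mirrors the interface proposed in this document.
type PostBackupAction interface {
	Execute(backup *Backup) error
}

// Manager is narrowed to the one getter this sketch needs.
type Manager interface {
	GetPostBackupActions() ([]PostBackupAction, error)
}

// fakeManager is a hypothetical in-memory Manager used only for illustration.
type fakeManager struct {
	actions []PostBackupAction
}

func (m fakeManager) GetPostBackupActions() ([]PostBackupAction, error) {
	return m.actions, nil
}

// okAction always succeeds; failingAction always fails with the given reason.
type okAction struct{}

func (okAction) Execute(*Backup) error { return nil }

type failingAction struct{ reason string }

func (a failingAction) Execute(*Backup) error { return errors.New(a.reason) }

// runPostBackupActions executes every registered action. A failing action does
// not stop the remaining ones; each failure increments backup.Errors.
func runPostBackupActions(m Manager, backup *Backup) error {
	actions, err := m.GetPostBackupActions()
	if err != nil {
		return fmt.Errorf("getting post-backup actions: %w", err)
	}
	for _, action := range actions {
		if err := action.Execute(backup); err != nil {
			backup.Errors++
		}
	}
	return nil
}

func main() {
	backup := &Backup{Name: "nightly"}
	manager := fakeManager{actions: []PostBackupAction{
		okAction{},
		failingAction{reason: "load balancer unreachable"},
	}}
	if err := runPostBackupActions(manager, backup); err != nil {
		fmt.Println("could not run post-backup actions:", err)
		return
	}
	fmt.Println("post-backup errors recorded:", backup.Errors)
}
```

The same loop shape would apply to `GetPostRestoreActions`; the pre-action getters differ only in that the text above has the first error abort processing.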
+ +## Alternatives Considered + +An alternative to these plugin hooks is to implement all the pre/post backup/restore logic _outside_ Velero. +In this case, one would need to write an external controller that works similarly to what [Konveyor Crane](https://github.com/konveyor/mig-controller/blob/master/pkg/controller/migmigration/quiesce.go) does today when quiescing applications. +We find this a viable approach, but we think that Velero users can benefit from Velero having greater embedded capabilities, which will allow users to write or load plugin extensions without relying on external components. + +## Security Considerations + +The plugins will only be invoked if loaded at the user's discretion. +It is recommended to check plugins for security vulnerabilities before loading them. + +## Compatibility + +In terms of backward compatibility, this design should stay compatible with most Velero installations that are upgrading. +If plugins are not present, then the backup/restore process should proceed the same way it worked before their inclusion. + +## Implementation + +The implementation dependencies are roughly in the order in which they are described in the [Detailed Design](#detailed-design) section. 
+ +## Open Issues From 96b2285b43a7c634633880a97e6e68a8562b0d5d Mon Sep 17 00:00:00 2001 From: Rafael Brito Date: Tue, 31 Aug 2021 15:28:16 -0500 Subject: [PATCH 2/7] Update new-prepost-backuprestore-plugin-hooks.md --- design/new-prepost-backuprestore-plugin-hooks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/design/new-prepost-backuprestore-plugin-hooks.md b/design/new-prepost-backuprestore-plugin-hooks.md index d6ead1867e..3ec94d9099 100644 --- a/design/new-prepost-backuprestore-plugin-hooks.md +++ b/design/new-prepost-backuprestore-plugin-hooks.md @@ -10,8 +10,8 @@ These plugin hooks will be invoked: - PreBackupAction: plugin actions are executed after the backup object is created and validated but before the backup is being processed, more precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the backup object is not processed. - PostBackupAction: plugin actions are executed after the backup finishes processing all items and volumes snapshots are completed, more precisely _after_ function [recordBackupMetrics](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L630). If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object. -- PreRestoreAction: plugin actions are executed after the restore object is created and validated and backup object is fetched, more precisely _after_ function [c.fetchBackupInfo](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L345). If the PreRestoreActions return an err, the restore object is not processed. 
-- PostRestoreAction: plugin actions are executed after the restore finishes processing all items and volumes snapshots are restored, more precisely _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object. +- PreRestoreAction: plugin actions are executed after the restore object is created and validated and backup object is fetched, more precisely in function `runValidatedRestore` _before_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PreRestoreActions return an err, the restore object is not processed. +- PostRestoreAction: plugin actions are executed after the restore finishes processing all items and volumes snapshots are restored, more precisely in function `runValidatedRestore` _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object. ## Background @@ -66,14 +66,14 @@ The PreRestoreAction plugin API will resemble the RestoreItemAction plugin desig It will not receive any resource list items because the restore has not yet been running at that stage. In addition, the `PreRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore creation, not per item. 
-The Velero restore controller will be modified so that if there are any PreRestoreAction plugins registered, they will be executed after the restore object is created and validated and backup object is fetched, more precisely _after_ function [c.fetchBackupInfo](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L345). At this point, the user might want to take some actions with the backup object prior processing the restore (i.e. calculate the amount of resources needed for the restore). If the PreRestoreActions return an err, the function `validateAndComplete` returns it and restore object is not processed. +The Velero restore controller will be modified so that if there are any PreRestoreAction plugins registered, they will be executed after restore object is created and validated and backup object is fetched, more precisely in function `runValidatedRestore` _before_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PreRestoreActions return an err, the restore object is not processed. If the PreRestoreActions return an err, the function `runValidatedRestore` returns it and restore object is not processed. The PostRestoreAction plugin API will resemble the RestoreItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Restore` object without any resource list items. At this stage, the restore has already been executed. The `PostRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore, not per item. The Velero restore controller package will be modified. 
-If any PostRestoreAction plugins are registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored, more precisely _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object, on `restoreWarnings`, `restoreErrors`. +If any PostRestoreAction plugins are registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored, more precisely in function `runValidatedRestore` _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object, on `restoreWarnings`, `restoreErrors`. ## Detailed Design From 206bd8c63e35eca471942e253aeb08466fe4041a Mon Sep 17 00:00:00 2001 From: Rafael Brito Date: Tue, 21 Sep 2021 10:54:44 -0500 Subject: [PATCH 3/7] Updated design doc as per feedback Signed-off-by: Rafael Brito --- .../new-prepost-backuprestore-plugin-hooks.md | 40 ++++++++++++------- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/design/new-prepost-backuprestore-plugin-hooks.md b/design/new-prepost-backuprestore-plugin-hooks.md index 3ec94d9099..2506ec57ad 100644 --- a/design/new-prepost-backuprestore-plugin-hooks.md +++ b/design/new-prepost-backuprestore-plugin-hooks.md @@ -9,9 +9,9 @@ The proposed plugin hooks are to be executed only once. 
These plugin hooks will be invoked: - PreBackupAction: plugin actions are executed after the backup object is created and validated but before the backup is being processed, more precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the backup object is not processed. -- PostBackupAction: plugin actions are executed after the backup finishes processing all items and volumes snapshots are completed, more precisely _after_ function [recordBackupMetrics](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L630). If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object. -- PreRestoreAction: plugin actions are executed after the restore object is created and validated and backup object is fetched, more precisely in function `runValidatedRestore` _before_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PreRestoreActions return an err, the restore object is not processed. -- PostRestoreAction: plugin actions are executed after the restore finishes processing all items and volumes snapshots are restored, more precisely in function `runValidatedRestore` _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object. 
+- PostBackupAction: plugin actions are executed after the backup is finished and persisted, more precisely _after_ function [c.runBackup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L274). If the PostBackupActions return errors or warnings, these return statuses are counted towards the backup Status object and the final status of the backup will be patched. +- PreRestoreAction: plugin actions are executed after the restore object is created and validated and before the backup object is fetched, more precisely in function `validateAndComplete` _before_ function [backupXorScheduleProvided](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L316). If the PreRestoreActions return an error, the restore object is not processed. +- PostRestoreAction: plugin actions are executed after the restore finishes processing all items, volume snapshots are restored, and logs are persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). If the PostRestoreActions return errors or warnings, these return statuses are counted towards the final restore Status object. ## Background @@ -19,7 +19,7 @@ Increasingly, Velero is employed for workload migrations across different Kubern Using Velero for migrations requires an atomic operation involving a Velero backup on a source cluster followed by a Velero restore on a destination cluster. It is common during these migrations to perform many actions inside and outside Kubernetes clusters. -**Attention**: these actions are not necessarily per resource item, but they are actions to be executed _once_ before and/or after the migration itself (remember, migration in this context is Velero Backup + Velero Restore). 
+**Attention**: these actions are not per resource item, but they are actions to be executed _once_ before and/or after the migration itself (remember, migration in this context is Velero Backup + Velero Restore). One important use case driving this proposal is migrating stateful workloads at scale across different clusters/storage backends. Today, Velero's Restic integration is the response for such use cases, but there are some limitations: @@ -29,52 +29,62 @@ Today, Velero's Restic integration is the response for such use cases, but there Aiming to address these two limitations, and separate from this proposal, we would like to write a Velero plugin that takes advantage of the proposed Pre-Backup plugin hook. This plugin will be executed _once_ (not per resource item) prior backup. It will scale down the applications setting `.spec.replicas=0` to all deployments, statefulsets, daemonsets, replicasets, etc. and will start a small-footprint staging pod that will mount all PVC/PV pairs. Similarly, we would like to write another plugin that will utilize the proposed Post-Restore plugin hook. This plugin will unquiesce migrated applications by killing the staging pod and reinstating original `.spec.replicas values` after the Velero restore is completed. -Other examples of plugins that can use the proposed plugin hooks are +Other examples of plugins that can use the proposed plugin hooks are: - PostBackupAction: trigger a Velero Restore after a successful Velero backup (and complete the migration operation). - PreRestoreAction: pre-expand the cluster's capacity via Cluster API to avoid starvation of cluster resources before the restore. - PostRestoreAction: call actions to be performed outside Kubernetes clusters, such as configure a global load balancer (GLB) that enables the new cluster. +The post backup actions will be executed after the backup is uploaded (persisted) on the disk. 
The logs of post-backup actions will be uploaded on the disk once the actions are completed. + +The post restore actions will be executed after the restore is uploaded (persisted) on the disk. The logs of post-restore actions will be uploaded on the disk once the actions are completed. + This design seeks to provide missing extension points. This proposal's scope is to only add the new plugin hooks, not the plugins themselves. ## Goals - Provide PreBackupAction, PostBackupAction, PreRestoreAction, and PostRestoreAction APIs for plugins to implement. - Update Velero backup and restore creation logic to invoke registered PreBackupAction and PreRestoreAction plugins before processing the backup and restore respectively. -- Update Velero backup and restore complete logic to invoke registered PostBackupAction and PostRestoreAction plugins after flagging the objects as completed. - +- Update Velero backup and restore complete logic to invoke registered PostBackupAction and PostRestoreAction plugins the objects are uploaded on disk. +- Create two new Backup phases: `ExecutingPreBackupActions` (after `New` and before `InProgress`) and `ExecutingPostBackupActions` (after `Uploading` or `UploadingPartialFailure`) +- Create two new Restore phases: `ExecutingPreRestoreActions` (after `New` and before `InProgress`) and `ExecutingPostRestoreActions` (after `InProgress`) + ## Non-Goals - Specific implementations of the PreBackupAction, PostBackupAction, PreRestoreAction and PostRestoreAction API beyond test cases. +- For migration specific actions (Velero Backup + Velero Restore), add disk synchronization during the validation of the Restore (making sure the newly created backup will show during restore) ## High-Level Design +The Velero backup controller package will be modified for `PreBackupAction` and `PostBackupAction`. 
+ The PreBackupAction plugin API will resemble the BackupItemAction plugin hook design, but with the fundamental difference that it will receive only as input the Velero `Backup` object created. It will not receive any resource list items because the backup is not yet running at that stage. In addition, the `PreBackupAction` interface will only have an `Execute()` method since the plugin will be executed once per Backup creation, not per item. -The Velero backup controller will be modified so that if there are any PreBackupAction plugins registered, they will be executed after backup object is created and validated but they will execute prior to processing the backup items and volume snapshots. More precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the function `runBackup` returns it and backup object is not processed. +The Velero backup controller will be modified so that if there are any PreBackupAction plugins registered, they will be executed after backup object is created and validated but they will execute prior to processing the backup items and volume snapshots. More precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). During the execution of `PreBackupAction`, the status of the backup object will be set to `ExecutingPreBackupActions`. If the PreBackupActions return an err, the function `runBackup` returns it and backup object is not processed. The PostBackupAction plugin API will resemble the BackupItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Backup` object without any resource list items. -By this stage, the backup has already been executed, with items backed up and volumes snapshots processed. 
+By this stage, the backup has already been executed, with items backed up and volumes snapshots processed and persisted. The `PostBackupAction` interface will only have an `Execute()` method since the plugin will be executed only once per Backup, not per item. -The Velero backup controller package will be modified. -If there are any PostBackupAction plugins registered, they will be executed as almost last step of Backup object, more precisely _after_ function [recordBackupMetrics](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L630) but before the backup logs are persisted on the gzip file. We want to capture the logs from the PostBackupActions as part of the backup object. If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object on `backup.Status.Warnings` and `backup.Status.Errors`. +If there are any PostBackupAction plugins registered, they will be executed after backup is processed and persisted, more precisely _after_ [c.runBackup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L274). During the execution of `PostBackupAction`, the status of the backup object will be set to `ExecutingPostBackupActions`. We want to capture the logs from the PostBackupActions on the object storage, so after execution of `PostBackupAction`, backup controller will persist the logs adding a new log on the existent backup store via a new method called `PatchBackup` on `BackupStore` interface. If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object on `backup.Status.Warnings` and `backup.Status.Errors`. + +The Velero restore controller package will be modified for `PreRestoreAction` and `PostRestoreAction`. 
The PreRestoreAction plugin API will resemble the RestoreItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Restore` object created. It will not receive any resource list items because the restore has not yet been running at that stage. In addition, the `PreRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore creation, not per item. -The Velero restore controller will be modified so that if there are any PreRestoreAction plugins registered, they will be executed after restore object is created and validated and backup object is fetched, more precisely in function `runValidatedRestore` _before_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PreRestoreActions return an err, the restore object is not processed. If the PreRestoreActions return an err, the function `runValidatedRestore` returns it and restore object is not processed. +The Velero restore controller will be modified so that if there are any PreRestoreAction plugins registered, they will be executed after restore object is created and the basic semantics of restore object are passed, more precisely in function `validateAndComplete` _before_ function [backupXorScheduleProvided](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L316). At this point, the backup or schedule object have not been retrieved yet. +Inside the `PreRestoreAction` plugin execution, the status of the restore object will be set to `ExecutingPreRestoreActions` and we will proactively sync the object storage. +If the PreRestoreActions return an err, the restore object is not processed. If the PreRestoreActions return an err, the function `ValidatedRestore` returns it and restore object is not processed. 
The PostRestoreAction plugin API will resemble the RestoreItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Restore` object without any resource list items. At this stage, the restore has already been executed. The `PostRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore, not per item. -The Velero restore controller package will be modified. -If any PostRestoreAction plugins are registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored, more precisely in function `runValidatedRestore` _after_ function [c.restore.Restore](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L479). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the restore Status object, on `restoreWarnings`, `restoreErrors`. - +If any PostRestoreAction plugins are registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored and logs persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). In this case registerd, the status of the restore object will be set to `ExecutingPreRestoreActions`. If the actions return errors or warnings, these return statuses are counted towards to the restore Status object, on `restoreWarnings`, `restoreErrors` and dissaminated to the restore's final status. 
## Detailed Design From c6742115cee3d5d61fbbd7237e0d5746e2609f88 Mon Sep 17 00:00:00 2001 From: Rafael Brito Date: Tue, 28 Sep 2021 18:44:59 -0500 Subject: [PATCH 4/7] Adding design changes as per feedback --- .../new-prepost-backuprestore-plugin-hooks.md | 207 +++++++++++++++++- 1 file changed, 195 insertions(+), 12 deletions(-) diff --git a/design/new-prepost-backuprestore-plugin-hooks.md b/design/new-prepost-backuprestore-plugin-hooks.md index 2506ec57ad..703e0043c9 100644 --- a/design/new-prepost-backuprestore-plugin-hooks.md +++ b/design/new-prepost-backuprestore-plugin-hooks.md @@ -3,15 +3,90 @@ ## Abstract Velero should provide a way to trigger actions before and after each backup and restore. -**Important**: These proposed plugin hooks are fundamentally different from the existing plugin hooks, BackupItemAction and RestoreItemAction, which are triggered per item during backup and restore, respectively. -The proposed plugin hooks are to be executed only once. +**Important**: These proposed plugin hooks are fundamentally different from the existing plugin hooks, BackupItemAction and RestoreItemAction, which are triggered per resource item during backup and restore, respectively. +The proposed plugin hooks are to be executed only once: pre-backup (before backup starts), post-backup (after the backup is completed and uploaded to object storage, including volumes snapshots), pre-restore (before restore starts) and post-restore (after the restore is completed, including volumes are restored). 
+ +### PreBackup and PostBackup Actions + +For the backup, the sequence of events of Velero backup are the following (these sequence depicted is prior upcoming changes for [upload progress #3533](https://github.com/vmware-tanzu/velero/issues/3533) ): + +New Backup Request + |--> Validation of the request + |--> Set Backup Phase "In Progress" + | --> Start Backup + | --> Discover all Plugins + |--> Check if Backup Exists + |--> Backup all K8s Resource Items + |--> Perform all Volumes Snapshots + |--> Final Backup Phase is determined + |--> Persist Backup and Logs on Object Storage + +We propose the pre-backup and post-backup plugin hooks to be executed in this sequence: + +New Backup Request + |--> Validation of the request + |--> Set Backup Phase "In Progress" + | --> Start Backup + | --> Discover all Plugins + |--> Check if Backup Exists + |--> *PreBackupActions* are executed, logging actions on existent backup log file + |--> Backup all K8s Resource Items + |--> Perform all Volumes Snapshots + |--> Final Backup Phase is determined + |--> Persist Backup and logs on Object Storage + |--> *PostBackupActions* are executed, logging to its own file These plugin hooks will be invoked: -- PreBackupAction: plugin actions are executed after the backup object is created and validated but before the backup is being processed, more precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the backup object is not processed. -- PostBackupAction: plugin actions are executed after the backup is finished and persisted, more precisely _after_ function [c.runBackup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L274). 
If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object and the the final status of the backup will be patched. -- PreRestoreAction: plugin actions are executed after the restore object is created and validated and before the backup object is fetched, more precisely in function `validateAndComplete` _before_ function [backupXorScheduleProvided](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L316). If the PreRestoreActions return an err, the restore object is not processed. -- PostRestoreAction: plugin actions are executed after the restore finishes processing all items and volumes snapshots are restored and logs persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). If the PostRestoreActions return errors or warnings, these return statuses are counted towards to the final restore Status object. +- PreBackupAction: plugin actions are executed after the backup object is created and validated but before the backup is being processed, more precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the backup object is not processed and the Backup phase will be set as `FailedPreBackupActions`. + +- PostBackupAction: plugin actions are executed after the backup is finished and persisted, more precisely _after_ function [c.runBackup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L274). 
+ +The proposed plugin hooks will execute actions that will have statuses on their own: +`Backup.Status.PreBackupActionsStatuses` and `Backup.Status.PostBackupActionsStatuses` which will be an array of a proposed struct `ActionStatus` with PluginName, StartTimestamp, CompletionTimestamp and Phase. + +### PreRestore and PostRestore Actions + +For the restore, the sequence of events of Velero restore are the following (these sequence depicted is prior upcoming changes for [upload progress #3533](https://github.com/vmware-tanzu/velero/issues/3533) ): + +New Restore Request + |--> Validation of the request + |--> Checks if restore is from a backup or a schedule + |--> Fetches backup + |--> Set Restore Phase "In Progress" + |--> Start Restore + |--> Discover all Plugins + |--> Download backup file to temp + |--> Fetch list of volumes snapshots + |--> Restore K8s items, including PVs + |--> Final Restore Phase is determined + |--> Persist Restore logs on Object Storage + +We propose the pre-restore and post-restore plugin hooks to be executed in this sequence: + +New Restore Request + |--> Validation of the request + |--> Checks if restore is from a backup or a schedule + |--> Fetches backup + |--> Set Restore Phase "In Progress" + |--> Start Restore + |--> Discover all Plugins + |--> Download backup file to temp + |--> Fetch list of volumes snapshots + |--> *PreRestoreActions* are executed, logging actions on existent backup log file + |--> Restore K8s items, including PVs + |--> Final Restore Phase is determined + |--> Persist Restore logs on Object Storage + |--> *PostRestoreActions* are executed, logging to its own file + +These plugin hooks will be invoked: + +- PreRestoreAction: plugin actions are executed after the restore object is created and validated and before the backup object is fetched, more precisely in function `runValidatedRestore` _after_ function 
[info.backupStore.GetBackupVolumeSnapshots](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L460). If the PreRestoreActions return an error, the restore object is not processed and the Restore phase will be set to `FailedPreRestoreActions`. + +- PostRestoreAction: plugin actions are executed after the restore finishes processing all items, volume snapshots are restored, and logs are persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). + +The proposed plugin hooks will execute actions that have their own statuses: +`Restore.Status.PreRestoreActionsStatuses` and `Restore.Status.PostRestoreActionsStatuses`, each of which will be an array of a proposed struct `ActionStatus` with PluginName, StartTimestamp, CompletionTimestamp and Phase. ## Background @@ -46,9 +121,11 @@ This design seeks to provide missing extension points. This proposal's scope is - Provide PreBackupAction, PostBackupAction, PreRestoreAction, and PostRestoreAction APIs for plugins to implement. - Update Velero backup and restore creation logic to invoke registered PreBackupAction and PreRestoreAction plugins before processing the backup and restore respectively. - Update Velero backup and restore complete logic to invoke registered PostBackupAction and PostRestoreAction plugins the objects are uploaded on disk. -- Create two new Backup phases: `ExecutingPreBackupActions` (after `New` and before `InProgress`) and `ExecutingPostBackupActions` (after `Uploading` or `UploadingPartialFailure`) -- Create two new Restore phases: `ExecutingPreRestoreActions` (after `New` and before `InProgress`) and `ExecutingPostRestoreActions` (after `InProgress`) - +- Create one `ActionStatus` struct to keep track of execution of the plugin hooks.
This struct has PluginName, StartTimestamp, CompletionTimestamp and Phase. +- Add sub statuses for the plugins on Backup object: `Backup.Status.PreBackupActionsStatuses` and `Backup.Status.PostBackupActionsStatuses`. They will be flagged as optional and nullable. They will be populated only for each plugin registered for the PreBackup and PostBackup hooks, respectively. +- Add sub statuses for the plugins on Restore object: `Restore.Status.PreRestoreActionsStatuses` and `Restore.Status.PostRestoreActionsStatuses`. They will be flagged as optional and nullable. They will be populated only for each plugin registered for the PreRestore and PostRestore hooks, respectively. +- These sub statuses will be populated only if Pre/Post Backup/Restore plugins are registered. + ## Non-Goals - Specific implementations of the PreBackupAction, PostBackupAction, PreRestoreAction and PostRestoreAction API beyond test cases. @@ -62,13 +139,13 @@ The PreBackupAction plugin API will resemble the BackupItemAction plugin hook de It will not receive any resource list items because the backup is not yet running at that stage. In addition, the `PreBackupAction` interface will only have an `Execute()` method since the plugin will be executed once per Backup creation, not per item. -The Velero backup controller will be modified so that if there are any PreBackupAction plugins registered, they will be executed after backup object is created and validated but they will execute prior to processing the backup items and volume snapshots. More precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). During the execution of `PreBackupAction`, the status of the backup object will be set to `ExecutingPreBackupActions`. If the PreBackupActions return an err, the function `runBackup` returns it and backup object is not processed.
+The Velero backup controller will be modified so that if there are any PreBackupAction plugins registered, they will be executed after the restore object is created and validated and before the backup object is fetched, more precisely in function `runValidatedRestore` _after_ function [info.backupStore.GetBackupVolumeSnapshots](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L460). If the PreRestoreActions return an err, the restore object is not processed and the Restore phase will be set a `FailedPreRestoreActions`. The PostBackupAction plugin API will resemble the BackupItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Backup` object without any resource list items. By this stage, the backup has already been executed, with items backed up and volumes snapshots processed and persisted. The `PostBackupAction` interface will only have an `Execute()` method since the plugin will be executed only once per Backup, not per item. -If there are any PostBackupAction plugins registered, they will be executed after backup is processed and persisted, more precisely _after_ [c.runBackup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L274). During the execution of `PostBackupAction`, the status of the backup object will be set to `ExecutingPostBackupActions`. We want to capture the logs from the PostBackupActions on the object storage, so after execution of `PostBackupAction`, backup controller will persist the logs adding a new log on the existent backup store via a new method called `PatchBackup` on `BackupStore` interface. If the PostBackupActions return errors or warnings, these return statuses are counted towards to the backup Status object on `backup.Status.Warnings` and `backup.Status.Errors`. 
+If there are any PostBackupAction plugins registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored and logs persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). The Velero restore controller package will be modified for `PreRestoreAction` and `PostRestoreAction`. @@ -84,10 +161,104 @@ The PostRestoreAction plugin API will resemble the RestoreItemAction plugin desi At this stage, the restore has already been executed. The `PostRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore, not per item. -If any PostRestoreAction plugins are registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored and logs persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). In this case registerd, the status of the restore object will be set to `ExecutingPreRestoreActions`. If the actions return errors or warnings, these return statuses are counted towards to the restore Status object, on `restoreWarnings`, `restoreErrors` and dissaminated to the restore's final status. +If any PostRestoreAction plugins are registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored and logs persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). 
## Detailed Design +### New Status struct + +To keep the status of the plugins, we propose the following struct: + +```go +type ActionStatus struct { + // PluginName is the name of the registered plugin + // retrieved by the PluginManager as id.Name + // +optional + // +nullable + PluginName string `json:"pluginName,omitempty"` + + // StartTimestamp records the time the plugin started. + // +optional + // +nullable + StartTimestamp *metav1.Time `json:"startTimestamp,omitempty"` + + // CompletionTimestamp records the time the plugin was completed. + // +optional + // +nullable + CompletionTimestamp *metav1.Time `json:"completionTimestamp,omitempty"` + + // Phase is the current state of the Action. + // +optional + // +nullable + Phase ActionPhase `json:"phase,omitempty"` +} + +// ActionPhase is a string representation of the lifecycle phase of an action being executed by a plugin +// of a Velero backup or restore. +// +kubebuilder:validation:Enum=InProgress;Completed;Failed +type ActionPhase string + +const ( + // ActionPhaseInProgress means the action is being executed + ActionPhaseInProgress ActionPhase = "InProgress" + + // ActionPhaseCompleted means the action finished successfully + ActionPhaseCompleted ActionPhase = "Completed" + + // ActionPhaseFailed means the action failed + ActionPhaseFailed ActionPhase = "Failed" +) + +``` + +### Backup Status of the Plugins + +The `Backup` Status section will have the following fields: + +```go +type BackupStatus struct { + (...) + // PreBackupActionsStatuses contains information about the pre-backup plugins' execution. + // Note that this information will only be populated if there are pre-backup plugin actions + // registered + // +optional + // +nullable + PreBackupActionsStatuses *[]ActionStatus `json:"preBackupActionsStatuses,omitempty"` + + // PostBackupActionsStatuses contains information about the post-backup plugins' execution.
+ // Note that this information will only be populated if there are post-backup plugin actions + // registered + // +optional + // +nullable + PostBackupActionsStatuses *[]ActionStatus `json:"postBackupActionsStatuses,omitempty"` + +} +``` + +### Restore Status of the Plugins + +The `Restore` Status section will have the following fields: + +```go +type RestoreStatus struct { + (...) + // PreRestoreActionsStatuses contains information about the pre-restore plugins' execution. + // Note that this information will only be populated if there are pre-restore plugin actions + // registered + // +optional + // +nullable + PreRestoreActionsStatuses *[]ActionStatus `json:"preRestoreActionsStatuses,omitempty"` + + // PostRestoreActionsStatuses contains information about the post-restore plugins' execution. + // Note that this information will only be populated if there are post-restore plugin actions + // registered + // +optional + // +nullable + PostRestoreActionsStatuses *[]ActionStatus `json:"postRestoreActionsStatuses,omitempty"` + +} +``` + ### New types #### PreBackupAction @@ -150,6 +321,18 @@ type PostRestoreAction interface { `PostRestoreAction` will be defined in `pkg/plugin/velero/post_restore_action.go`. +### New BackupStore Interface Methods + +To persist the logs originating from the PostBackup and PostRestore plugins, create two additional methods on the `BackupStore` interface: + +```go +type BackupStore interface { + (...) + PutPostBackuplog(backup string, log io.Reader) error + PutPostRestoreLog(backup, restore string, log io.Reader) error + (...) +}
+``` + ### Generate Protobuf Definitions and Client/Servers In `pkg/plugin/proto`, add the following: From f47dbc254701a80c2f734b2175b6c9d9a2a80840 Mon Sep 17 00:00:00 2001 From: Rafael Brito Date: Tue, 28 Sep 2021 18:53:38 -0500 Subject: [PATCH 5/7] Update design on prepost-backup-restore plugins --- .../new-prepost-backuprestore-plugin-hooks.md | 31 ++++++++++--------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/design/new-prepost-backuprestore-plugin-hooks.md b/design/new-prepost-backuprestore-plugin-hooks.md index 703e0043c9..07f53e90c5 100644 --- a/design/new-prepost-backuprestore-plugin-hooks.md +++ b/design/new-prepost-backuprestore-plugin-hooks.md @@ -10,6 +10,7 @@ The proposed plugin hooks are to be executed only once: pre-backup (before backu For the backup, the sequence of events of Velero backup are the following (these sequence depicted is prior upcoming changes for [upload progress #3533](https://github.com/vmware-tanzu/velero/issues/3533) ): +``` New Backup Request |--> Validation of the request |--> Set Backup Phase "In Progress" @@ -20,22 +21,23 @@ New Backup Request |--> Perform all Volumes Snapshots |--> Final Backup Phase is determined |--> Persist Backup and Logs on Object Storage - +``` We propose the pre-backup and post-backup plugin hooks to be executed in this sequence: +``` New Backup Request |--> Validation of the request |--> Set Backup Phase "In Progress" | --> Start Backup | --> Discover all Plugins |--> Check if Backup Exists - |--> *PreBackupActions* are executed, logging actions on existent backup log file + |--> **PreBackupActions** are executed, logging actions on existent backup log file |--> Backup all K8s Resource Items |--> Perform all Volumes Snapshots |--> Final Backup Phase is determined |--> Persist Backup and logs on Object Storage - |--> *PostBackupActions* are executed, logging to its own file - + |--> **PostBackupActions** are executed, logging to its own file +``` These plugin hooks will be invoked: 
- PreBackupAction: plugin actions are executed after the backup object is created and validated but before the backup is being processed, more precisely _before_ function [c.backupper.Backup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L590). If the PreBackupActions return an err, the backup object is not processed and the Backup phase will be set as `FailedPreBackupActions`. @@ -48,7 +50,7 @@ The proposed plugin hooks will execute actions that will have statuses on their ### PreRestore and PostRestore Actions For the restore, the sequence of events of Velero restore are the following (these sequence depicted is prior upcoming changes for [upload progress #3533](https://github.com/vmware-tanzu/velero/issues/3533) ): - +``` New Restore Request |--> Validation of the request |--> Checks if restore is from a backup or a schedule @@ -61,9 +63,9 @@ New Restore Request |--> Restore K8s items, including PVs |--> Final Restore Phase is determined |--> Persist Restore logs on Object Storage - +``` We propose the pre-restore and post-restore plugin hooks to be executed in this sequence: - +``` New Restore Request |--> Validation of the request |--> Checks if restore is from a backup or a schedule @@ -73,11 +75,12 @@ New Restore Request |--> Discover all Plugins |--> Download backup file to temp |--> Fetch list of volumes snapshots - |--> *PreRestoreActions* are executed, logging actions on existent backup log file + |--> **PreRestoreActions** are executed, logging actions on existent backup log file |--> Restore K8s items, including PVs |--> Final Restore Phase is determined |--> Persist Restore logs on Object Storage - |--> *PostRestoreActions* are executed, logging to its own file + |--> **PostRestoreActions** are executed, logging to its own file +``` These plugin hooks will be invoked: @@ -102,7 +105,7 @@ Today, Velero's Restic integration is the response for such use cases, but there - 
Quiesce/unquiesce workloads: Pod hooks are useful for quiescing/unquiescing workloads, but platform engineers often do not have the luxury/visibility/time/knowledge to go through each pod in order to add specific commands to quiesce/unquiesce workloads. - Orphan PVC/PV pairs: PVCs/PVs that do not have associated running pods are not backed up and consequently, are not migrated. -Aiming to address these two limitations, and separate from this proposal, we would like to write a Velero plugin that takes advantage of the proposed Pre-Backup plugin hook. This plugin will be executed _once_ (not per resource item) prior backup. It will scale down the applications setting `.spec.replicas=0` to all deployments, statefulsets, daemonsets, replicasets, etc. and will start a small-footprint staging pod that will mount all PVC/PV pairs. Similarly, we would like to write another plugin that will utilize the proposed Post-Restore plugin hook. This plugin will unquiesce migrated applications by killing the staging pod and reinstating original `.spec.replicas values` after the Velero restore is completed. +Aiming to address these two limitations, and separate from this proposal, we would like to write a Velero plugin that takes advantage of the proposed Pre-Backup plugin hook. This plugin will be executed _once_ (not per resource item) prior to the backup. It will scale down the applications by setting `.spec.replicas=0` on all deployments, statefulsets, daemonsets, replicasets, etc. and will start a small-footprint staging pod that will mount all PVC/PV pairs. Similarly, we would like to write another plugin that will utilize the proposed Post-Restore plugin hook. This plugin will unquiesce migrated applications by killing the staging pod and reinstating the original `.spec.replicas` values after the Velero restore is completed.
Other examples of plugins that can use the proposed plugin hooks are: @@ -139,13 +142,13 @@ The PreBackupAction plugin API will resemble the BackupItemAction plugin hook de It will not receive any resource list items because the backup is not yet running at that stage. In addition, the `PreBackupAction` interface will only have an `Execute()` method since the plugin will be executed once per Backup creation, not per item. -The Velero backup controller will be modified so that if there are any PreBackupAction plugins registered, they will be executed after the restore object is created and validated and before the backup object is fetched, more precisely in function `runValidatedRestore` _after_ function [info.backupStore.GetBackupVolumeSnapshots](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L460). If the PreRestoreActions return an err, the restore object is not processed and the Restore phase will be set a `FailedPreRestoreActions`. +The Velero backup controller will be modified so that if there are any PreBackupAction plugins registered, they will be The PostBackupAction plugin API will resemble the BackupItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Backup` object without any resource list items. By this stage, the backup has already been executed, with items backed up and volumes snapshots processed and persisted. The `PostBackupAction` interface will only have an `Execute()` method since the plugin will be executed only once per Backup, not per item. 
-If there are any PostBackupAction plugins registered, they will be executed after the restore finishes processing all items and volumes snapshots are restored and logs persisted, more precisely in function `processRestore` _after_ setting [`restore.Status.CompletionTimestamp`](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L273). +If there are any PostBackupAction plugins registered, they will be executed after the backup is finished and persisted, more precisely _after_ function [c.runBackup](https://github.com/vmware-tanzu/velero/blob/74476db9d791fa91bba0147eac8ec189820adb3d/pkg/controller/backup_controller.go#L274). The Velero restore controller package will be modified for `PreRestoreAction` and `PostRestoreAction`. @@ -153,9 +156,7 @@ The PreRestoreAction plugin API will resemble the RestoreItemAction plugin desig It will not receive any resource list items because the restore has not yet been running at that stage. In addition, the `PreRestoreAction` interface will only have an `Execute()` method since the plugin will be executed only once per Restore creation, not per item. -The Velero restore controller will be modified so that if there are any PreRestoreAction plugins registered, they will be executed after restore object is created and the basic semantics of restore object are passed, more precisely in function `validateAndComplete` _before_ function [backupXorScheduleProvided](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L316). At this point, the backup or schedule object have not been retrieved yet. -Inside the `PreRestoreAction` plugin execution, the status of the restore object will be set to `ExecutingPreRestoreActions` and we will proactively sync the object storage. -If the PreRestoreActions return an err, the restore object is not processed. 
If the PreRestoreActions return an err, the function `ValidatedRestore` returns it and restore object is not processed. +The Velero restore controller will be modified so that if there are any PreRestoreAction plugins registered, they will be executed after the restore object is created and validated and before the backup object is fetched, more precisely in function `runValidatedRestore` _after_ function [info.backupStore.GetBackupVolumeSnapshots](https://github.com/vmware-tanzu/velero/blob/7c75cd6cf854064c9a454e53ba22cc5881d3f1f0/pkg/controller/restore_controller.go#L460). If the PreRestoreActions return an err, the restore object is not processed and the Restore phase will be set to `FailedPreRestoreActions`. The PostRestoreAction plugin API will resemble the RestoreItemAction plugin design, but with the fundamental difference that it will receive only as input the Velero `Restore` object without any resource list items. At this stage, the restore has already been executed. From 82ce62e8a4214e4b8178bf802f8cf9c2d7179248 Mon Sep 17 00:00:00 2001 From: Rafael Brito Date: Tue, 5 Oct 2021 10:32:14 -0500 Subject: [PATCH 6/7] More color on how to call plugins Signed-off-by: Rafael Brito --- .../new-prepost-backuprestore-plugin-hooks.md | 165 +++++++++++++++++- 1 file changed, 156 insertions(+), 9 deletions(-) diff --git a/design/new-prepost-backuprestore-plugin-hooks.md b/design/new-prepost-backuprestore-plugin-hooks.md index 07f53e90c5..b4ea16912e 100644 --- a/design/new-prepost-backuprestore-plugin-hooks.md +++ b/design/new-prepost-backuprestore-plugin-hooks.md @@ -260,7 +260,57 @@ type RestoreStatus struct { } ``` -### New types +### New Backup and Restore Phases + +#### New Backup Phase: FailedPreBackupActions + +In case the PreBackupActionsStatuses has at least one `ActionPhase` = `Failed`, it means at least one of the plugins returned an error and consequently, the backup will not move forward. 
The final status of the Backup object will be set as `FailedPreBackupActions`: + +```go + +// BackupPhase is a string representation of the lifecycle phase +// of a Velero backup. +// +kubebuilder:validation:Enum=New;FailedValidation;FailedPreBackupActions;InProgress;Uploading;UploadingPartialFailure;Completed;PartiallyFailed;Failed;Deleting +type BackupPhase string + +const ( + + (...) + + // BackupPhaseFailedPreBackupActions means one or more of the Pre Backup Actions have failed + // and therefore backup will not run. + BackupPhaseFailedPreBackupActions BackupPhase = "FailedPreBackupActions" + + (...) +) + +``` + +#### New Restore Phase: FailedPreRestoreActions + +In case the PreRestoreActionsStatuses has at least one `ActionPhase` = `Failed`, it means at least one of the plugins returned an error and consequently, the restore will not move forward. The final status of the Restore object will be set as `FailedPreRestoreActions`: + +```go + +// RestorePhase is a string representation of the lifecycle phase +// of a Velero restore +// +kubebuilder:validation:Enum=New;FailedValidation;FailedPreRestoreActions;InProgress;Completed;PartiallyFailed;Failed +type RestorePhase string + +const ( + + (...) + + // RestorePhaseFailedPreRestoreActions means one or more of the Pre Restore Actions have failed + // and therefore restore will not run. + RestorePhaseFailedPreRestoreActions RestorePhase = "FailedPreRestoreActions" + + (...) +) + +``` + +### New Interface types #### PreBackupAction @@ -269,8 +319,8 @@ The `PreBackupAction` interface is as follows: ```go // PreBackupAction provides a hook into the backup process before it begins. 
type PreBackupAction interface { - // Execute the PreBackupAction plugin providing it access to the Backup that - // is being executed + // Execute the PreBackupAction plugin providing it access to the Backup that + // is being executed Execute(backup *api.Backup) error } ``` @@ -284,8 +334,8 @@ The `PostBackupAction` interface is as follows: ```go // PostBackupAction provides a hook into the backup process after it completes. type PostBackupAction interface { - // Execute the PostBackupAction plugin providing it access to the Backup that - // has been completed + // Execute the PostBackupAction plugin providing it access to the Backup that + // has been completed Execute(backup *api.Backup) error } ``` @@ -299,8 +349,8 @@ The `PreRestoreAction` interface is as follows: ```go // PreRestoreAction provides a hook into the restore process before it begins. type PreRestoreAction interface { - // Execute the PreRestoreAction plugin providing it access to the Restore that - // is being executed + // Execute the PreRestoreAction plugin providing it access to the Restore that + // is being executed Execute(restore *api.Restore) error } ``` @@ -314,8 +364,8 @@ The `PostRestoreAction` interface is as follows: ```go // PostRestoreAction provides a hook into the restore process after it completes. type PostRestoreAction interface { - // Execute the PostRestoreAction plugin providing it access to the Restore that - // has been completed + // Execute the PostRestoreAction plugin providing it access to the Restore that + // has been completed Execute(restore *api.Restore) error } ``` @@ -334,6 +384,9 @@ type BackupStore interface { (...) ``` +The implementation of these new two methods will go hand-in-hand with the changes of uploading phases rebase. 
+ + ### Generate Protobuf Definitions and Client/Servers In `pkg/plugin/proto`, add the following: @@ -543,6 +596,100 @@ type Manager interface { `GetPreRestoreAction` and `GetPreRestoreActions` will invoke the `restartablePreRestoreAction` implementations. `GetPostRestoreAction` and `GetPostRestoreActions` will invoke the `restartablePostRestoreAction` implementations. +### How to invoke the Plugins + +#### Getting Pre/Post Backup Actions + +Getting Actions on `backup_controller.go` in `runBackup`: + +```go + + backupLog.Info("Getting PreBackup actions") + preBackupActions, err := pluginManager.GetPreBackupActions() + if err != nil { + return err + } + + backupLog.Info("Getting PostBackup actions") + postBackupActions, err := pluginManager.GetPostBackupActions() + if err != nil { + return err + } +``` + +#### Pre Backup Actions Plugins + +Calling the Pre Backup actions: + +```go + for _, preBackupAction := range preBackupActions { + err := preBackupAction.Execute(backup.Backup) + if err != nil { + backup.Backup.Status.Phase = velerov1api.BackupPhaseFailedPreBackupActions + return err + } + } +``` + +#### Post Backup Actions Plugins + +Calling the Post Backup actions: + +```go + for _, postBackupAction := range postBackupActions { + err := postBackupAction.Execute(backup.Backup) + if err != nil { + postBackupLog.Error(err) + } + } +``` + +#### Getting Pre/Post Restore Actions + +Getting Actions on `restore_controller.go` in `runValidatedRestore`: + +```go + + restoreLog.Info("Getting PreRestore actions") + preRestoreActions, err := pluginManager.GetPreRestoreActions() + if err != nil { + return errors.Wrap(err, "error getting pre-restore actions") + } + + restoreLog.Info("Getting PostRestore actions") + postRestoreActions, err := pluginManager.GetPostRestoreActions() + if err != nil { + return errors.Wrap(err, "error getting post-restore actions") + } +``` + +#### Pre Restore Actions Plugins + +Calling the Pre Restore actions: + +```go + for _, preRestoreAction := 
range preRestoreActions { + err := preRestoreAction.Execute(restoreReq.Restore) + if err != nil { + restoreReq.Restore.Status.Phase = velerov1api.RestorePhaseFailedPreRestoreActions + return errors.Wrap(err, "error executing pre-restore action") + } + } +``` + +#### Post Restore Actions Plugins + +Calling the Post Restore actions: + +```go + for _, postRestoreAction := range postRestoreActions { + err := postRestoreAction.Execute(restoreReq.Restore) + if err != nil { + postRestoreLog.Error(err.Error()) + } + } +``` + ## Alternatives Considered An alternative to these plugin hooks is to implement all the pre/post backup/restore logic _outside_ Velero. From cd5fa3b0d5d9efafdeba0ef2c24378b1d4f37310 Mon Sep 17 00:00:00 2001 From: Rafael Brito Date: Mon, 13 Dec 2021 22:01:20 -0600 Subject: [PATCH 7/7] Proposing annotations to skip plugin execution Signed-off-by: Rafael Brito --- .../new-prepost-backuprestore-plugin-hooks.md | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/design/new-prepost-backuprestore-plugin-hooks.md b/design/new-prepost-backuprestore-plugin-hooks.md index b4ea16912e..65e3051472 100644 --- a/design/new-prepost-backuprestore-plugin-hooks.md +++ b/design/new-prepost-backuprestore-plugin-hooks.md @@ -690,6 +690,28 @@ Calling the Post Restore actions: } ``` +### Giving the User the Option to Skip the Execution of the Plugins + +Velero plugins are loaded as init containers. If plugins are unloaded, they trigger a restart of the Velero controller. +Not to mention that if one plugin does not get loaded for any reason (e.g., a Docker Hub image rate limit), Velero does not start. +In other words, the constant load/unload of plugins can disrupt the Velero controller, and they cannot be the only method to run the actions from these plugins selectively. +As part of this proposal, we want to give the Velero user the ability to skip the execution of the plugins via annotations on the Velero CR backup and restore objects. 
+If one of these annotations exists, the given plugin, referenced below as `plugin-name`, will be skipped. + +Backup Object Annotations: + +``` + <plugin-name>/prebackup=skip + <plugin-name>/postbackup=skip +``` + +Restore Object Annotations: + +``` + <plugin-name>/prerestore=skip + <plugin-name>/postrestore=skip +``` + ## Alternatives Considered An alternative to these plugin hooks is to implement all the pre/post backup/restore logic _outside_ Velero.
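As a rough illustration of the skip annotations proposed in PATCH 7/7, the check a controller might perform before calling a hook could look like the sketch below. This is only a sketch under stated assumptions: the helper `shouldSkip`, the hook-name constants, and the plugin name `example-plugin` are hypothetical and do not appear in the patch, which only specifies the annotation format.

```go
package main

// skipValue is the annotation value that requests skipping a hook,
// as written in the proposal ("<plugin-name>/prebackup=skip").
const skipValue = "skip"

// Hook suffixes taken from the annotation keys listed in the proposal.
// The constant names themselves are hypothetical.
const (
	hookPreBackup   = "prebackup"
	hookPostBackup  = "postbackup"
	hookPreRestore  = "prerestore"
	hookPostRestore = "postrestore"
)

// shouldSkip is a hypothetical helper: it reports whether the annotations of a
// Backup or Restore object contain "<pluginName>/<hook>" set to "skip".
// Reading from a nil map is safe in Go and yields the empty string,
// so objects without annotations are never skipped.
func shouldSkip(annotations map[string]string, pluginName, hook string) bool {
	return annotations[pluginName+"/"+hook] == skipValue
}
```

In the invocation loops shown earlier in the patch, such a check would presumably run before each `Execute` call, using the annotations of the Backup or Restore object; how the controller learns each plugin's name is left open here because the patch does not specify it.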