GKE to AWS - msg="unable to successfully complete restic restores of pod's volumes" error="timed out waiting for all PodVolumeRestores to complete" #1993
Do you have a storage class on the AWS cluster that has the same name as the one the backed-up PVs were on in the GKE cluster? Or, did you set up a storage class mapping?
I did a storage class mapping using the YAML below, and it is able to create the PVs and PVCs but unable to restore data from the source.
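For reference, the storage-class mapping Velero supports for restores is a ConfigMap in the velero namespace. The poster's actual YAML was not captured in this thread; the sketch below shows the documented shape, with illustrative class names (GKE's "standard" mapped to AWS's "gp2"):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # The name is arbitrary; the labels are what Velero's restore item action keys on.
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # <storage class name on the source cluster>: <storage class name on the target cluster>
  standard: gp2
```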
Hmm, OK. I am also noticing that the pods you're ending up with in the AWS cluster have different names than the ones that were backed up from the GKE cluster. GKE:
AWS:
This implies that the pods that were restored by Velero were subsequently deleted and replaced with new ones, likely by the deployment/replicaset controllers on the target cluster. This would pose a problem for the restic restore process. Unfortunately, I'm not sure why this would be happening. Let's look at some more info: can you provide the YAML for the deployments, replicasets, and pods on both the GKE and AWS clusters? It'd be easiest if you could put this into a gist using YAML formatting.
Hi Kriss, please let me know if you are looking for any additional information.
Sorry for the delay on this. I'm seeing that in the
So it looks like the ReplicaSets that Velero actually restored were very quickly replaced by new ones, implying a Deployment rollout happened (https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment). If you still have the environment around, could you provide the YAML for all of these ReplicaSets in the target cluster? According to the documentation, we should only get a Deployment rollout if the pod template spec changes, so I'm wondering if we can identify what changed.
Potentially similar cause to #1981
Closing this out as inactive; feel free to reach out again as needed.
What steps did you take and what happened:
I was trying to restore a Velero restic backup taken from a GKE cluster to an AWS (kops) cluster.
On the target AWS cluster, the backup location points to the GCP bucket where the backup was taken.
Once I initiate the restore from the AWS Kubernetes cluster, it starts the restoration, and it even restores the namespace, Deployments, Pods, Persistent Volumes, etc. without any issues.
It gets stuck in a pending status for a very long time and finally comes out as PartiallyFailed. From the logs we can see the message below:
time="2019-10-24T15:15:12Z" level=error msg="unable to successfully complete restic restores of pod's volumes" error="timed out waiting for all PodVolumeRestores to complete" logSource="pkg/restore/restore.go:1126" restore=velero/restore-from-gcp
time="2019-10-24T15:15:12Z" level=error msg="unable to successfully complete restic restores of pod's volumes" error="timed out waiting for all PodVolumeRestores to complete" logSource="pkg/restore/restore.go:1126" restore=velero/restore-from-gcp
time="2019-10-24T15:15:12Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:465" restore=velero/restore-from-gcp
But the strange thing is, it is able to restore all the other components, including creating the Persistent Volumes, but it is unable to restore the data.
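One knob worth knowing about with this error: in Velero v1.1 the server waits a fixed time (default 1h, controlled by the `--restic-timeout` server flag) for PodVolumeRestores to complete before giving up. A sketch of lengthening it by editing the velero Deployment's args (the 4h value here is illustrative, not from the thread):

```yaml
# Fragment of the velero Deployment spec (kubectl -n velero edit deployment/velero).
spec:
  template:
    spec:
      containers:
      - name: velero
        args:
        - server
        # Default is 1h; PodVolumeRestores still incomplete after this are abandoned.
        - --restic-timeout=4h
```

In this particular case the restores never start at all (the pods were replaced by a Deployment rollout), so a longer timeout alone would not fix it, but it rules out a simple slow-transfer cause.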
Source
GKE Cluster
Source backup location: GCP bucket
[prassomanp_gmail_com@bastion wp-mysql]$ velero create backup wp-mysql --include-namespaces webapp
Backup request "wp-mysql" submitted successfully.
Run `velero backup describe wp-mysql` or `velero backup logs wp-mysql` for more details.
[prassomanp_gmail_com@bastion wp-mysql]$
[prassomanp_gmail_com@bastion velero-v1.1.0-linux-amd64]$ velero backup describe wp-mysql --details
Name: wp-mysql
Namespace: velero
Labels: velero.io/storage-location=default
Annotations:
Phase: Completed
Namespaces:
Included: webapp
Excluded:
Resources:
Included: *
Excluded:
Cluster-scoped: auto
Label selector:
Storage Location: default
Snapshot PVs: auto
TTL: 720h0m0s
Hooks:
Backup Format Version: 1
Started: 2019-10-24 19:18:09 +0530 IST
Completed: 2019-10-24 19:18:28 +0530 IST
Expiration: 2019-11-23 19:18:09 +0530 IST
Resource List:
apps/v1/Deployment:
- webapp/wordpress
- webapp/wordpress-mysql
apps/v1/ReplicaSet:
- webapp/wordpress-dccb8668f
- webapp/wordpress-mysql-7d4fc77fdc
v1/Endpoints:
- webapp/wordpress
- webapp/wordpress-mysql
v1/Event:
- webapp/mysql-pv-claim.15d0981661b47672
- webapp/wordpress-dccb8668f-zzx65.15d0981db4ec6cfe
- webapp/wordpress-dccb8668f-zzx65.15d0981e2c0c49cb
- webapp/wordpress-dccb8668f-zzx65.15d0981f32b733d9
- webapp/wordpress-dccb8668f-zzx65.15d09820af5e6504
- webapp/wordpress-dccb8668f-zzx65.15d0982466dfc134
- webapp/wordpress-dccb8668f-zzx65.15d09824f8788385
- webapp/wordpress-dccb8668f-zzx65.15d09825030857dc
- webapp/wordpress-dccb8668f.15d0981db4da467e
- webapp/wordpress-mysql-7d4fc77fdc-bx6rz.15d09815e87ffe56
- webapp/wordpress-mysql-7d4fc77fdc-bx6rz.15d0981663f0b116
- webapp/wordpress-mysql-7d4fc77fdc-bx6rz.15d09817691f3518
- webapp/wordpress-mysql-7d4fc77fdc-bx6rz.15d0981ac656a7f5
- webapp/wordpress-mysql-7d4fc77fdc-bx6rz.15d0981aca686649
- webapp/wordpress-mysql-7d4fc77fdc-bx6rz.15d0981ad5657418
- webapp/wordpress-mysql-7d4fc77fdc.15d09815e8236957
- webapp/wordpress-mysql.15d09815e6831b9b
- webapp/wordpress.15d0981d9fe6ef20
- webapp/wordpress.15d0981db2a2e909
- webapp/wordpress.15d0982a8e3167ec
- webapp/wp-pv-claim.15d0981e29d3b0ed
v1/Namespace:
- webapp
v1/PersistentVolume:
- pvc-a79b80c7-f661-11e9-a78e-42010a80013b
- pvc-bb9786ef-f661-11e9-a78e-42010a80013b
v1/PersistentVolumeClaim:
- webapp/mysql-pv-claim
- webapp/wp-pv-claim
v1/Pod:
- webapp/wordpress-dccb8668f-zzx65
- webapp/wordpress-mysql-7d4fc77fdc-bx6rz
v1/ResourceQuota:
- webapp/gke-resource-quotas
v1/Secret:
- webapp/default-token-rc6mx
- webapp/mysql-pass
v1/Service:
- webapp/wordpress
- webapp/wordpress-mysql
v1/ServiceAccount:
- webapp/default
Persistent Volumes:
Restic Backups:
Completed:
webapp/wordpress-dccb8668f-zzx65: wordpress-persistent-storage
webapp/wordpress-mysql-7d4fc77fdc-bx6rz: mysql-persistent-storage
[prassomanp_gmail_com@bastion velero-v1.1.0-linux-amd64]$
Target
AWS cluster (KOPS)
backup-location: Source GCP bucket
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ velero backup-location get
NAME PROVIDER BUCKET/PREFIX ACCESS MODE
default gcp gcpvelerotest ReadWrite
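The `default` location shown above corresponds to a BackupStorageLocation resource on the target cluster. Reconstructed from that output as a sketch of the v1.1 API (only the fields implied by the output are shown):

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: gcp
  objectStorage:
    bucket: gcpvelerotest   # the source GCP bucket, accessed ReadWrite from AWS
```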
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ velero backup get
NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR
backup-w-annotate Completed 2019-10-23 17:30:55 +0530 IST 28d default
backup-wo-annotate Completed 2019-10-23 17:25:00 +0530 IST 28d default
wp-mysql Completed 2019-10-24 19:18:09 +0530 IST 29d default
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ velero restore create restore-from-gcp --from-backup wp-mysql
Restore request "restore-from-gcp" submitted successfully.
Run `velero restore describe restore-from-gcp` or `velero restore logs restore-from-gcp` for more details.
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ velero restore get
NAME BACKUP STATUS WARNINGS ERRORS CREATED SELECTOR
restore-from-gcp wp-mysql InProgress 0 0 2019-10-24 19:45:12 +0530 IST
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
While the restore was still in progress, the namespace got restored in AWS along with the Pods, Services, and Persistent Volumes.
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ kubectl get all -n webapp
NAME READY STATUS RESTARTS AGE
pod/wordpress-76b5d9f5c8-hfnjr 1/1 Running 0 37m
pod/wordpress-mysql-66594fb556-fpmsp 1/1 Running 0 37m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/wordpress LoadBalancer 100.66.57.204 ab25be46af66811e9a4310a2a60e9fd1-495652919.us-east-1.elb.amazonaws.com 80:32467/TCP 37m
service/wordpress-mysql ClusterIP None 3306/TCP 37m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/wordpress 1/1 1 1 37m
deployment.apps/wordpress-mysql 1/1 1 1 37m
NAME DESIRED CURRENT READY AGE
replicaset.apps/wordpress-76b5d9f5c8 1 1 1 37m
replicaset.apps/wordpress-dccb8668f 0 0 0 37m
replicaset.apps/wordpress-mysql-66594fb556 1 1 1 37m
replicaset.apps/wordpress-mysql-7d4fc77fdc 0 0 0 37m
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ kubectl get pvc -n webapp
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-b1c5cf63-f668-11e9-a431-0a2a60e9fd17 3Gi RWO gp2 37m
wp-pv-claim Bound pvc-b1cb2c62-f668-11e9-a431-0a2a60e9fd17 3Gi RWO gp2 37m
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
And finally, it shows as PartiallyFailed.
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ velero restore describe restore-from-gcp --details
Name: restore-from-gcp
Namespace: velero
Labels:
Annotations:
Phase: PartiallyFailed (run 'velero restore logs restore-from-gcp' for more information)
Errors:
Velero: timed out waiting for all PodVolumeRestores to complete
Cluster:
Namespaces:
Backup: wp-mysql
Namespaces:
Included: *
Excluded:
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings:
Label selector:
Restore PVs: auto
Restic Restores:
New:
webapp/wordpress-dccb8668f-zzx65: wordpress-persistent-storage
webapp/wordpress-mysql-7d4fc77fdc-bx6rz: mysql-persistent-storage
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
What did you expect to happen:
A complete restoration, including data.
In this case Velero was able to restore the Pods, Services, and Persistent Volumes, but it failed to restore the data.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
target-aws-cluster-logs.txt
gke-source-cluster-logs.txt
Anything else you would like to add:
Attached more detailed logs.
Environment:
Both source & target:
Velero version (use `velero version`):
Client:
Version: v1.1.0
Git commit: a357f21
Server:
Version: v1.1.0
Velero features (use `velero client config get features`):
Source:
velero client config get features
Target:
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ velero client config get features
features:
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
Kubernetes version (use `kubectl version`):
AWS:
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
[ec2-user@ip-172-31-87-112 velero-v1.1.0-linux-amd64]$
GKE:
[prassomanp_gmail_com@bastion ~]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T17:01:15Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-gke.0", GitCommit:"569511c9540f78a94cc6a41d895c382d0946c11a", GitTreeState:"clean", BuildDate:"2019-08-21T23:28:44Z", GoVersion:"go1.11.13b4", Compiler:"gc", Platform:"linux/amd64"}
[prassomanp_gmail_com@bastion ~]$
Kubernetes installer & version:
Cloud provider or hardware configuration:
Source: GKE
Target: AWS (kops)
OS (e.g. from `/etc/os-release`):