Restic volume not restored when using OpenShift (DeploymentConfig + ReplicationController) #1981
hmm, based on the following lines:
it looks like there might be an issue with the informer caches. Could you try deleting all of the restic daemonset pods, letting them get re-created, and then trying another restore? (You'll want to delete the target namespace as well before kicking off the new restore.)
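The suggestion above can be sketched as a few commands; this is a hypothetical sequence assuming Velero's restic daemonset runs in the default `velero` namespace with the `name=restic` pod label, and `<target-namespace>` is a placeholder:

```shell
# Delete all restic daemonset pods; the daemonset re-creates them
# with fresh informer caches.
kubectl -n velero delete pods -l name=restic

# Watch the replacements come back up before retrying.
kubectl -n velero get pods -l name=restic -w

# Delete the target namespace before kicking off the new restore.
kubectl delete namespace <target-namespace>
```

Adjust the namespace and label selector to match your actual install.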
Hi @skriss, thanks for the answer. I already tried this. I'm restoring on another cluster, and it is a fresh one, so there shouldn't be any cache. Could that be the issue (restoring to another cluster)? I'm not able to try it until Monday, but I'm not sure it will fix the issue since I already tried on a fresh instance. Any other ideas? :)
Hey, so I made a fresh new install to test this and make sure it is not a cache issue. Here is the whole procedure to help you debug. (I installed the restic DS before Velero because I adapted it for OKD; maybe this is the issue.) Installation on Cluster Tools && Cluster Tools-B:
Cluster Tools Backup location
Cluster Tools-B Backup location
And then I edited the BackupStorageLocation cluster-tools to add a "tools" prefix. Now everything is running fine:
Cluster Tools : Creating the backup
Backup get: velero backup get

Velero logs: https://gist.github.com/Stolr/23d0dd11b301150ccb336a12b77107a1
Backup description: https://gist.github.com/Stolr/9b862178df8f951cbd9b50357bd502c8
Backup logs: https://gist.github.com/Stolr/293051c52536541fec55f924f76386be

I can see there are 2 errors, but it says my restic backups are completed, so that should not be relevant; it is probably due to some broken pods in that namespace. The first time I didn't have that error, but the restic issue was still there. Now, on Cluster Tools-B:
Same issue: the restore stays In Progress because the restic volumes are not restored.
Restic logs: https://gist.github.com/Stolr/dabac536a5235b87ecd184045ab2e7b5
Velero logs: https://gist.github.com/Stolr/5dfc2f7fea9c63f0ddbd61d9276ac984
Restore logs: No available PodVolumeRestore
My Bitbucket data is not restored. No init container is created. But the Postgres one is working as expected. Do you find anything in all these logs that could explain this? Thanks for your help!
@Stolr i'm not exactly sure what's going on, but I do see that during the backup, the pod that's being backed up is the "14" one. I'm not super-familiar with OpenShift's deploymentconfigs and (apparently) their use of replication controllers, but in plain vanilla Kubernetes, the way this would work is we'd restore pod "14", then restore the replicaset controlling it, and that replicaset would see pod "14" and "adopt" it. It seems like possibly something about the deploymentconfig/replicationcontroller is preventing this "adoption" from happening, and triggering the creation of a new pod "15". Does this ring any bells for you? Maybe we can figure it out together :)
@Stolr @skriss sorry to jump into the conversation, just a thought. Instead of annotating the pod itself, can you try annotating the pod template spec of the parent controller, i.e. the Deployment or ReplicationController?
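The idea above could look roughly like this patch; `tools`, `bitbucket`, and `bitbucket-data` are placeholder names, and the DeploymentConfig resource assumes an OpenShift cluster. The `backup.velero.io/backup-volumes` annotation goes on the template's metadata, not the controller's, so re-created pods carry it too:

```shell
# Sketch: move the restic annotation onto the pod template spec of the
# parent controller so replacement pods inherit it.
kubectl -n tools patch deploymentconfig bitbucket --type=merge -p '
spec:
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: bitbucket-data
'
```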
@skriss Wow, thanks!! For some reason, OpenShift triggers a new deployment, probably because of all the resources being restored. There's no way to return to "14", even with a rollback. I'm also not super familiar with OpenShift. Everything is working as expected using a Deployment. @yashbhutwala: I might try this when I'm able, thanks for your answer. Thanks both for your help. Since this issue is related to OpenShift, you can close or rename the issue if you want. Best regards, and thanks again for helping me get through this.
@sseago @dymurray do you guys have any thoughts on what's going on here? (#1981 (comment))
Off the top of my head, I'm not sure what's going on, although I haven't looked at the logs in detail yet. The redeployment of a new pod may well be affecting things here, since the new pod probably won't have the restic annotation.

For the work my group has been doing, we actually do a two-phase backup/restore, in part to eliminate as much complexity as possible from the environment Restic is working in. We create a full backup without any restic annotations, and then a limited backup with just the PVs/PVCs and the pods which mount them, with the restic annotations. Then, on restore, we first restore the restic backup (pods only; no deployments, deploymentconfigs, etc.) -- this is when the restic copies happen. Then those restored pods are deleted and we do the full restore (without restic annotations). I don't know that all of this is necessary for a basic backup/restore -- in our case we're using it for app migration from one cluster to another, with the possibility of running the restic/PV migration more than once before the final migration.

In any case, if you're restoring deploymentconfigs which are then rolling out new pods post-restore, that could definitely interfere with restic. I don't know what the appropriate general-purpose answer is here -- our approach has been for a very specific migration use case. I wonder whether the same issue comes up with non-OpenShift resources: DaemonSets, Deployments, etc. Annotating the pod template spec, as suggested above (in addition to annotating the pod), may be the way to go here. I'm not sure whether it will resolve this issue completely or not, though.
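The two-phase approach described above can be sketched with the Velero CLI; the backup names, namespace, and the assumption that the restic annotations are applied between the two backups are all illustrative:

```shell
# Phase 1: full backup of the app, with no restic annotations in place.
velero backup create myapp-full --include-namespaces myapp

# Phase 2: after annotating only the pods that mount the PVCs, take a
# limited backup of just the PVs/PVCs and those pods.
velero backup create myapp-pv \
  --include-namespaces myapp \
  --include-resources persistentvolumes,persistentvolumeclaims,pods

# On restore: first the restic phase (pods only, no controllers)...
velero restore create --from-backup myapp-pv

# ...then delete the restored pods and run the full restore, so the
# controllers roll out fresh pods against the already-populated PVs.
kubectl -n myapp delete pods --all
velero restore create --from-backup myapp-full
```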
To add on to what Scott said, yes, we hit this same problem very early on. This is a problem that extends beyond OCP-specific restores; my understanding is that any pod which is managed by another resource faces this risk. If a pod is managed by another resource, the restic restore will generally fail, since both the pod and the managing resource are restored, which causes the initial pod (with the restic annotation) to be overwritten. I could have sworn there was an open issue on this but I can't seem to find it right now.
We haven't seen this, at least not with pods managed by replicasets/deployments. Per my comment (#1981 (comment)), during a restic restore, we first restore the pod & trigger a restic restore, then restore the owning replicaset and deployment. The pod is successfully "adopted" by the replicaset, since the pod's spec matches the pod template spec from the replicaset. If that behavior were different, then I agree it would likely cause problems with restic restores, which seems to be what we're seeing here. Can you shed any more light on why the DeploymentConfig restore is triggering the creation of a new pod, rather than adopting the existing one?
From what I've seen with DeploymentConfigs, they don't always trigger new pods, but sometimes they do. I believe they actually do (initially) adopt the restored pod, as expected, but if there's a ConfigChange trigger registered, then the restore event on the DeploymentConfig will sometimes fire it, if the restore process looks like a configuration change. Most of my experience here is in restoring resources to a different cluster than the backup came from, with some spec params modified by a plugin on restore ("image" references, for example, if the image is located in an in-cluster registry). The pod as restored will run for a short amount of time, but will terminate as soon as the ConfigChange-triggered replacement is ready. Most recently, this week I restored a couple of DeploymentConfigs to the same cluster the backup was run in, and in that case I did not see a replacement being created post-restore.
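One way to check whether a ConfigChange trigger is in play, and to take it out of the picture during a restore, is sketched below; `myapp` is a placeholder DeploymentConfig name, and switching triggers to manual before the restore (and back afterwards) is an assumption about the workflow, not something the thread confirms:

```shell
# Inspect the triggers registered on the DeploymentConfig.
oc get dc myapp -o jsonpath='{.spec.triggers}'

# Switch the DC to manual triggering so a restore event doesn't
# look like a configuration change and roll out a replacement pod.
oc set triggers dc/myapp --manual
```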
So I spent some time digging into this, and based on what I've learned I can say that yes, the method Velero is currently taking with restic restores has its shortcomings. Currently, we are lucky that a deployment doesn't trigger a new generation of the pod in 99% of the restore use cases. If you specifically trigger a redeploy during the restic restore, then things will break, as shown in #1919. With deploymentconfigs, there are a number of triggers you can set which will trigger the redeploy of a pod, but the bigger issue is that currently with DCs the pod is restored first with the restic annotation, and then later adopted by the DC controller and redeployed, wiping the annotation out. If a plugin is used to not restore a pod if it is managed by a DC, in conjunction with placing the annotation on the DC pod template spec, then the restic restore has a good chance of succeeding, but the same concern that Kubernetes could trigger a new deployment for Deployments and DeploymentConfigs during restore is a larger problem that needs to be solved.
open to ideas on how to improve this. the data populator KEP that's making the rounds upstream may be relevant/useful, though AFAIK it's only for PVs, not any pod volume.
Well, I had just the same problem! Restore completed, no errors in the logs, but the PV is completely empty! Sucks.
I wanted to restore only the PVC with the PV itself, and did:

velero restore create --from-backup daily-20200528020046 --include-namespaces test-project --include-resources persistentvolumeclaims,persistentvolumes --restore-volumes=true

It completed with no errors, but there is no data at all.
What is interesting is that I tested it before, but only after removing the whole project, and then it was OK and the data was even there. So it works only when restoring whole projects? Is it not possible to restore just a volume?
I can confirm: I can restore volumes only by restoring a whole project, i.e. a whole namespace, and it must be empty. So in my case, I needed to restore to a mapped temporary namespace, then go there and scale everything down, then spin up a new pod just to attach the PV and rsync the data out of the volume to my host. Then I deleted the temporary namespace. I ran the helper pod again in my original project, connected to the PV there, and rsynced all the data in. Later I did a chown with the user ID of the container, removed the helper pod, and finally scaled the deployment back up. It worked, and the data from the backup snapshot was there. But the process is very inconvenient in such cases, very clumsy.
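The workaround above can be sketched as a command sequence; the namespace mapping, `helper-pod.yaml`, `/data`, and the OpenShift-style UID are all placeholders, and the helper pod is assumed to mount the restored PVC:

```shell
# 1. Restore into a mapped temporary namespace.
velero restore create --from-backup daily-20200528020046 \
  --namespace-mappings test-project:test-project-tmp

# 2. Scale everything down there, then run a helper pod mounting the PV.
kubectl -n test-project-tmp scale deployment --all --replicas=0
kubectl -n test-project-tmp apply -f helper-pod.yaml

# 3. Copy the data out to the host, then delete the temporary namespace.
kubectl -n test-project-tmp cp helper:/data ./restored-data
kubectl delete namespace test-project-tmp

# 4. Run the same helper pod in the original project, copy the data back,
#    fix ownership for the container's UID, and scale back up.
kubectl -n test-project apply -f helper-pod.yaml
kubectl -n test-project cp ./restored-data helper:/data
kubectl -n test-project exec helper -- chown -R 1000680000:0 /data
kubectl -n test-project delete pod helper
kubectl -n test-project scale deployment --all --replicas=1
```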
I'm facing this issue when restoring a backup of prometheus-operator. My restore tests were done in the same cluster where the backup lives, but in another namespace; the production application was still live in its own namespace. My cluster is running in EKS, version 1.16. There are three PVs that should be backed up: grafana, prometheus and alertmanager. The prometheus and grafana PVs could be restored without problems, but the alertmanager PV could not, because the alertmanager StatefulSet is dynamically created by an Alertmanager object (from the monitoring.coreos.com/v1 API). I can see in the velero logs that it could successfully restore the alertmanager pod and inject the restic-wait container into it. But when the Alertmanager object is restored, it creates the StatefulSet, which replaces the pod. These are the velero logs that prove the restic-wait container creation on the alertmanager pod:

time="2020-08-18T11:23:39Z" level=info msg="Restoring resource 'pods' into namespace 'monitoring-restored'" logSource="pkg/restore/restore.go:702" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Getting client for /v1, Kind=Pod" logSource="pkg/restore/restore.go:746" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing item action for pods" logSource="pkg/restore/restore.go:964" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing AddPVCFromPodAction" cmd=/velero logSource="pkg/restore/add_pvc_from_pod_action.go:44" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Adding PVC monitoring/alertmanager-prometheus-operator-alertmanager-db-alertmanager-prometheus-operator-alertmanager-0 as an additional item to restore" cmd=/velero logSource="pkg/restore/add_pvc_from_pod_action.go:58" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Skipping persistentvolumeclaims/monitoring-restored/alertmanager-prometheus-operator-alertmanager-db-alertmanager-prometheus-operator-alertmanager-0 because it's already been restored." logSource="pkg/restore/restore.go:844" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing item action for pods" logSource="pkg/restore/restore.go:964" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing item action for pods" logSource="pkg/restore/restore.go:964" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing ResticRestoreAction" cmd=/velero logSource="pkg/restore/restic_restore_action.go:69" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Restic backups for pod found" cmd=/velero logSource="pkg/restore/restic_restore_action.go:95" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Getting plugin config" cmd=/velero logSource="pkg/restore/restic_restore_action.go:99" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="No config found for plugin" cmd=/velero logSource="pkg/restore/restic_restore_action.go:160" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Using image \"velero/velero-restic-restore-helper:v1.4.2\"" cmd=/velero logSource="pkg/restore/restic_restore_action.go:106" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="No config found for plugin" cmd=/velero logSource="pkg/restore/restic_restore_action.go:195" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="No config found for plugin" cmd=/velero logSource="pkg/restore/restic_restore_action.go:206" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Done executing ResticRestoreAction" cmd=/velero logSource="pkg/restore/restic_restore_action.go:155" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Attempting to restore Pod: alertmanager-prometheus-operator-alertmanager-0" logSource="pkg/restore/restore.go:1070" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Acquiring lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:122" volumeNamespace=monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Acquired lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:131" volumeNamespace=monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Ready repository found" backupLocation=default logSource="pkg/restic/repository_ensurer.go:147" volumeNamespace=monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Released lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:128" volumeNamespace=monitoring

One second later, the Alertmanager object is restored:

time="2020-08-18T11:23:40Z" level=info msg="Restoring resource 'alertmanagers.monitoring.coreos.com' into namespace 'monitoring-restored'" logSource="pkg/restore/restore.go:702" restore=velero/monitoring
time="2020-08-18T11:23:40Z" level=info msg="Getting client for monitoring.coreos.com/v1, Kind=Alertmanager" logSource="pkg/restore/restore.go:746" restore=velero/monitoring
time="2020-08-18T11:23:40Z" level=info msg="Attempting to restore Alertmanager: prometheus-operator-alertmanager" logSource="pkg/restore/restore.go:1070" restore=velero/monitoring

This is the backup's content:

velero backup describe monitoring --details
Name: monitoring
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.16.8-eks-e16311
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=16+
Phase: Completed
Errors: 0
Warnings: 0
Namespaces:
Included: monitoring
Excluded: <none>
Resources:
Included: *
Excluded: certificates.cert-manager.io, certificaterequests.cert-manager.io, orders.acme.cert-manager.io
Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1
Started: 2020-08-18 10:17:16 +0200 CEST
Completed: 2020-08-18 10:17:57 +0200 CEST
Expiration: 2020-09-17 10:17:16 +0200 CEST
Total items to be backed up: 234
Items backed up: 234
Resource List:
apiextensions.k8s.io/v1/CustomResourceDefinition:
- alertmanagers.monitoring.coreos.com
- prometheuses.monitoring.coreos.com
- prometheusrules.monitoring.coreos.com
- servicemonitors.monitoring.coreos.com
apps/v1/ControllerRevision:
- monitoring/alertmanager-prometheus-operator-alertmanager-54df75fb5b
- monitoring/prometheus-operator-prometheus-node-exporter-599f4fbbfd
- monitoring/prometheus-prometheus-operator-prometheus-6cbd9d8d8b
apps/v1/DaemonSet:
- monitoring/prometheus-operator-prometheus-node-exporter
apps/v1/Deployment:
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
apps/v1/ReplicaSet:
- monitoring/prometheus-operator-grafana-5986dbf74f
- monitoring/prometheus-operator-grafana-7ff4f8b97b
- monitoring/prometheus-operator-kube-state-metrics-6f8cc5ffd5
- monitoring/prometheus-operator-operator-fd978d8d7
apps/v1/StatefulSet:
- monitoring/alertmanager-prometheus-operator-alertmanager
- monitoring/prometheus-prometheus-operator-prometheus
extensions/v1beta1/Ingress:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-prometheus
monitoring.coreos.com/v1/Alertmanager:
- monitoring/prometheus-operator-alertmanager
monitoring.coreos.com/v1/Prometheus:
- monitoring/prometheus-operator-prometheus
monitoring.coreos.com/v1/PrometheusRule:
- monitoring/prometheus-operator-alertmanager.rules
- monitoring/prometheus-operator-etcd
- monitoring/prometheus-operator-general.rules
- monitoring/prometheus-operator-k8s.rules
- monitoring/prometheus-operator-kube-apiserver-slos
- monitoring/prometheus-operator-kube-apiserver.rules
- monitoring/prometheus-operator-kube-prometheus-general.rules
- monitoring/prometheus-operator-kube-prometheus-node-recording.rules
- monitoring/prometheus-operator-kube-scheduler.rules
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-kubelet.rules
- monitoring/prometheus-operator-kubernetes-apps
- monitoring/prometheus-operator-kubernetes-resources
- monitoring/prometheus-operator-kubernetes-storage
- monitoring/prometheus-operator-kubernetes-system
- monitoring/prometheus-operator-kubernetes-system-apiserver
- monitoring/prometheus-operator-kubernetes-system-controller-manager
- monitoring/prometheus-operator-kubernetes-system-kubelet
- monitoring/prometheus-operator-kubernetes-system-scheduler
- monitoring/prometheus-operator-node-exporter
- monitoring/prometheus-operator-node-exporter.rules
- monitoring/prometheus-operator-node-network
- monitoring/prometheus-operator-node.rules
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-operator
monitoring.coreos.com/v1/ServiceMonitor:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-apiserver
- monitoring/prometheus-operator-coredns
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-controller-manager
- monitoring/prometheus-operator-kube-etcd
- monitoring/prometheus-operator-kube-proxy
- monitoring/prometheus-operator-kube-scheduler
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-kubelet
- monitoring/prometheus-operator-node-exporter
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
networking.k8s.io/v1beta1/Ingress:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-prometheus
rbac.authorization.k8s.io/v1/ClusterRole:
- prometheus-operator-grafana-clusterrole
- prometheus-operator-kube-state-metrics
- prometheus-operator-operator
- prometheus-operator-operator-psp
- prometheus-operator-prometheus
- prometheus-operator-prometheus-psp
- psp-prometheus-operator-kube-state-metrics
- psp-prometheus-operator-prometheus-node-exporter
rbac.authorization.k8s.io/v1/ClusterRoleBinding:
- prometheus-operator-grafana-clusterrolebinding
- prometheus-operator-kube-state-metrics
- prometheus-operator-operator
- prometheus-operator-operator-psp
- prometheus-operator-prometheus
- prometheus-operator-prometheus-psp
- psp-prometheus-operator-kube-state-metrics
- psp-prometheus-operator-prometheus-node-exporter
rbac.authorization.k8s.io/v1/Role:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test
rbac.authorization.k8s.io/v1/RoleBinding:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test
v1/ConfigMap:
- monitoring/prometheus-operator-apiserver
- monitoring/prometheus-operator-cluster-total
- monitoring/prometheus-operator-controller-manager
- monitoring/prometheus-operator-etcd
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-config-dashboards
- monitoring/prometheus-operator-grafana-datasource
- monitoring/prometheus-operator-grafana-test
- monitoring/prometheus-operator-k8s-coredns
- monitoring/prometheus-operator-k8s-resources-cluster
- monitoring/prometheus-operator-k8s-resources-namespace
- monitoring/prometheus-operator-k8s-resources-node
- monitoring/prometheus-operator-k8s-resources-pod
- monitoring/prometheus-operator-k8s-resources-workload
- monitoring/prometheus-operator-k8s-resources-workloads-namespace
- monitoring/prometheus-operator-kubelet
- monitoring/prometheus-operator-namespace-by-pod
- monitoring/prometheus-operator-namespace-by-workload
- monitoring/prometheus-operator-node-cluster-rsrc-use
- monitoring/prometheus-operator-node-rsrc-use
- monitoring/prometheus-operator-nodes
- monitoring/prometheus-operator-persistentvolumesusage
- monitoring/prometheus-operator-pod-total
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-proxy
- monitoring/prometheus-operator-scheduler
- monitoring/prometheus-operator-statefulset
- monitoring/prometheus-operator-workload-total
- monitoring/prometheus-prometheus-operator-prometheus-rulefiles-0
v1/Endpoints:
- monitoring/alertmanager-operated
- monitoring/prometheus-operated
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-node-exporter
v1/Event:
- monitoring/prometheus-operator-admission-create-ngxh5.162c4eac037b378f
- monitoring/prometheus-operator-admission-create-ngxh5.162c4eac3d7a4c20
- monitoring/prometheus-operator-admission-create-ngxh5.162c4eacfd856868
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead0a39ac70
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead13445eeb
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead713ac0dc
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead8cff268e
- monitoring/prometheus-operator-admission-create.162c4eac0309e0cb
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb4cb068bca
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb517441275
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb51d3ac352
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb52b6739be
- monitoring/prometheus-operator-admission-patch.162c4eb4ca533c92
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e619ce31070
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e637284176b
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e71b870b6b4
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7b2a4186a1
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ba8d7beae
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c23594737
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c2b84195d
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c36882b14
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c67022081
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7e1d1be052
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7e2fa25cfc
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7e40871cd9
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ec738b8ab
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7eca575457
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ed9a8b480
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ed9d7309c
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e826db5397b
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e82883b0412
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e82a018a2d3
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4eb25495f5c9
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4eb25496faf5
- monitoring/prometheus-operator-grafana-5986dbf74f-q7q88.162c4e6199f96154
- monitoring/prometheus-operator-grafana-5986dbf74f-q7q88.162c4e619a02fad3
- monitoring/prometheus-operator-grafana-5986dbf74f-q7q88.162c4e619a049a56
- monitoring/prometheus-operator-grafana-5986dbf74f.162c4e619c9d5316
- monitoring/prometheus-operator-grafana-5986dbf74f.162c4eb254704b48
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf32ce5cd2
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf7c0b856d
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf7f2b718e
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf874cd7a1
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf9133924c
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf9468b3d9
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf9decda56
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafce8b1887
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafd2252390
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafdbc8ec47
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafdc3af0c0
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafe8f543b5
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaff4fd17b2
- monitoring/prometheus-operator-grafana-7ff4f8b97b.162c4eaf31f3b2ba
- monitoring/prometheus-operator-grafana.162c4eaf3087e1e1
- monitoring/prometheus-operator-grafana.162c4eb253680d39
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e71bff6bc71
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7210a514b8
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e735b1ee1b9
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7405ed1b22
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e74199ddb10
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7b19464cfd
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7b1a45f166
- monitoring/prometheus-operator-prometheus-node-exporter.162c4e71bdefdbaf
- monitoring/prometheus-operator-prometheus-node-exporter.162c4e7b1a499523
v1/Namespace:
- monitoring
v1/PersistentVolume:
- pvc-502cf99f-99fb-4a83-abd9-2a15bcf2a30d
- pvc-7107894a-2ede-473e-9c24-2cb5a3f9d7f1
- pvc-e6d638c0-b4a8-4bcf-a9d1-1f66c387c7e9
v1/PersistentVolumeClaim:
- monitoring/alertmanager-prometheus-operator-alertmanager-db-alertmanager-prometheus-operator-alertmanager-0
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-prometheus-operator-prometheus-db-prometheus-prometheus-operator-prometheus-0
v1/Pod:
- monitoring/alertmanager-prometheus-operator-alertmanager-0
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk
- monitoring/prometheus-operator-kube-state-metrics-6f8cc5ffd5-47jbw
- monitoring/prometheus-operator-operator-fd978d8d7-cf956
- monitoring/prometheus-operator-prometheus-node-exporter-fxl7s
- monitoring/prometheus-prometheus-operator-prometheus-0
v1/Secret:
- monitoring/alertmanager-prometheus-operator-alertmanager
- monitoring/alertmanager.ict.navinfo.cloud-tls
- monitoring/default-token-vf8dm
- monitoring/grafana.ict.navinfo.cloud-tls
- monitoring/ict-admission
- monitoring/prometheus-operator-admission
- monitoring/prometheus-operator-alertmanager-token-jxljb
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test-token-q5lsl
- monitoring/prometheus-operator-grafana-token-949ch
- monitoring/prometheus-operator-kube-state-metrics-token-9gsz5
- monitoring/prometheus-operator-operator-token-556vs
- monitoring/prometheus-operator-prometheus-node-exporter-token-9f545
- monitoring/prometheus-operator-prometheus-token-bxb9w
- monitoring/prometheus-prometheus-operator-prometheus
- monitoring/prometheus-prometheus-operator-prometheus-tls-assets
- monitoring/prometheus.ict.navinfo.cloud-tls
- monitoring/sh.helm.release.v1.prometheus-operator.v1
- monitoring/sh.helm.release.v1.prometheus-operator.v2
v1/Service:
- monitoring/alertmanager-operated
- monitoring/prometheus-operated
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-node-exporter
v1/ServiceAccount:
- monitoring/default
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-node-exporter
Velero-Native Snapshots: <none included>
Restic Backups:
Completed:
monitoring/alertmanager-prometheus-operator-alertmanager-0: alertmanager-prometheus-operator-alertmanager-db
monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk: storage
monitoring/prometheus-prometheus-operator-prometheus-0: prometheus-prometheus-operator-prometheus-db

These are the restore details. Note that velero couldn't restore the alertmanager-prometheus-operator-alertmanager StatefulSet because it was already created by the Alertmanager object. It also couldn't restore the prometheus-prometheus-operator-prometheus StatefulSet, because it is created by the Prometheus object (another prometheus-operator CRD). But its PV could be restored, because the created StatefulSet could "adopt" the restored pod. I have no clue why the alertmanager StatefulSet couldn't "adopt" the restored alertmanager pod. Perhaps a race condition or something else...

Name: monitoring
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: PartiallyFailed (run 'velero restore logs monitoring' for more information)
Warnings:
Velero: <none>
Cluster: could not restore, customresourcedefinitions.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, customresourcedefinitions.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, customresourcedefinitions.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-grafana-clusterrolebinding" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-kube-state-metrics" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-operator-psp" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-operator" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-prometheus-psp" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-prometheus" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "psp-prometheus-operator-kube-state-metrics" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "psp-prometheus-operator-prometheus-node-exporter" already exists. Warning: the in-cluster version is different than the backed-up version.
Namespaces:
monitoring-restored: could not restore, endpoints "alertmanager-operated" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, services "alertmanager-operated" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, services "prometheus-operated" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, statefulsets.apps "alertmanager-prometheus-operator-alertmanager" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, statefulsets.apps "prometheus-prometheus-operator-prometheus" already exists. Warning: the in-cluster version is different than the backed-up version.
Errors:
Velero: timed out waiting for all PodVolumeRestores to complete
Cluster: <none>
Namespaces: <none>
Backup: monitoring
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: monitoring=monitoring-restored
Label selector: <none>
Restore PVs: auto
Restic Restores:
Completed:
monitoring-restored/prometheus-operator-grafana-7ff4f8b97b-jxwzk: storage
monitoring-restored/prometheus-prometheus-operator-prometheus-0: prometheus-prometheus-operator-prometheus-db
New:
monitoring-restored/alertmanager-prometheus-operator-alertmanager-0: alertmanager-prometheus-operator-alertmanager-db

I'll try to first restore the Pods and PVs, and then the rest. |
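The "adoption" mentioned above can be checked directly: a restored pod has been adopted when the recreated StatefulSet appears in its ownerReferences. A minimal sketch, using a trimmed, hypothetical pod object in place of the real one (on a live cluster you would fetch it with `kubectl -n monitoring-restored get pod <pod-name> -o json`):

```shell
# Hypothetical, trimmed pod JSON standing in for real kubectl output.
cat > pod.json <<'EOF'
{"metadata":{"name":"alertmanager-prometheus-operator-alertmanager-0",
 "ownerReferences":[{"kind":"StatefulSet",
  "name":"alertmanager-prometheus-operator-alertmanager","controller":true}]}}
EOF

# If the pod was adopted, an ownerReference of kind StatefulSet is present.
grep -o '"kind":"StatefulSet"' pod.json
```

A pod with no such ownerReference was never adopted by the controller, which matches the stuck-alertmanager symptom described above.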
The PV restore using the command below executed successfully:

velero restore create monitoring-1 --from-backup monitoring --namespace-mappings monitoring:monitoring-restored \
    --exclude-resources=alertmanager.monitoring.coreos.com,prometheuses.monitoring.coreos.com

After that, I could restore the Alertmanager and Prometheus objects without trouble:

velero restore create monitoring-cdrs --from-backup monitoring --namespace-mappings monitoring:monitoring-restored \
    --include-resources=alertmanager.monitoring.coreos.com,prometheuses.monitoring.coreos.com |
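After a two-step restore like this, it's worth confirming that no PodVolumeRestore is left stuck in the New phase. A hedged sketch of that check — the sample JSON here is hypothetical and stands in for `kubectl -n velero get podvolumerestores -o json`:

```shell
# Hypothetical PodVolumeRestore list standing in for real kubectl output.
cat > pvrs.json <<'EOF'
{"items":[
 {"metadata":{"name":"monitoring-1-abc"},"status":{"phase":"Completed"}},
 {"metadata":{"name":"monitoring-1-def"},"status":{"phase":"Completed"}}]}
EOF

# Count restores still stuck in the New phase; 0 means restic finished.
stuck=$(grep -c '"phase":"New"' pvrs.json || true)
echo "stuck restores: $stuck"
```

Any nonzero count means a restored pod is still waiting on its restic data, which is exactly the symptom this issue reports.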
What steps did you take and what happened:
I'm trying to restore a restic volume.
My backup contains 2 volumes across 2 deployments.
Backup
When I restore from this backup, the postgres pod is restored properly, but the bitbucket server pod is not.
The init container is not created on the bitbucket-server pod, so the restic restore stays stuck in the "New" phase, even though the pod is created and running. It shouldn't be.
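The missing init container can be confirmed on the restored pod itself: Velero's restic restore injects an init container (named restic-wait in Velero 1.1) into each restored pod that has volumes to restore. A minimal sketch using a trimmed, hypothetical pod object (on a real cluster: `kubectl -n <namespace> get pod <pod-name> -o json`):

```shell
# Hypothetical, trimmed restored-pod JSON standing in for real kubectl output.
cat > restored-pod.json <<'EOF'
{"metadata":{"name":"bitbucket-server-1-abcde"},
 "spec":{"initContainers":[{"name":"restic-wait"}]}}
EOF

# The restic-wait init container must be present for the PodVolumeRestore
# to progress; if it is missing, the restore stays in the New phase.
if grep -q '"name":"restic-wait"' restored-pod.json; then
  echo "restic-wait init container present"
else
  echo "restic-wait missing: PodVolumeRestore will stay in the New phase"
fi
```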
*** Restic log ***
*** Velero Log ***
https://gist.github.com/Stolr/02ee7e4ee7d662b94df52de93f953ab3
*** PodVolumeRestore ***
** Environment **
velero version
Client:
Version: v1.1.0
Git commit: a357f21
Server:
Version: v1.1.0
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
openshift v3.11.0+bdd37ad-314
kubernetes v1.11.0+d4cacc0
The namespace does not exist before the restore, so every resource is new on the cluster.
Any idea?
Thanks a lot