
velero backup create fails to upload backup to s3 using aws plugin #7543

Closed
Wayne-H-Ha opened this issue Mar 19, 2024 · Discussed in #7542 · 57 comments

@Wayne-H-Ha

Wayne-H-Ha commented Mar 19, 2024

Discussed in #7542

Originally posted by Wayne-H-Ha March 19, 2024
We used to be able to create backups using velero 1.12.2 and aws plugin 1.8.2.

We tried velero 1.13.0 and plugin 1.9.0 and it failed, so we switched back to the older versions.

We tried again with velero 1.13.1 and plugin 1.9.1 and it still fails. Is there any configuration change we need to make in order to use the new version?

We looked for the backup in s3 and it was never uploaded there.

When we describe the backup, it returns:

velero-v1.13.1-linux-amd64/velero describe backup cp-20240319163110 | tail
Started:    2024-03-19 16:32:01 +0000 UTC
Completed:  <n/a>
Expiration:  2024-04-18 16:32:01 +0000 UTC
Total items to be backed up:  2871
Items backed up:              2871
Backup Volumes:
  <error getting backup volume info: DownloadRequest.velero.io "cp-20240319163110-63fd6028-b8fb-4c35-97e2-7fbfc44f74f3" is invalid: spec.target.kind: Unsupported value: "BackupVolumeInfos": supported values: "BackupLog", "BackupContents", "BackupVolumeSnapshots", "BackupItemOperations", "BackupResourceList", "BackupResults", "RestoreLog", "RestoreResults", "RestoreResourceList", "RestoreItemOperations", "CSIBackupVolumeSnapshots", "CSIBackupVolumeSnapshotContents">

We believe the problem is that an "@aws" suffix is being added to the key ID. For example, aws_access_key_id = "3..0" but "3..0@aws" is passed to s3. Is there a configuration we can use to avoid having this suffix added?

cat /credentials/cloud
[default]
aws_access_key_id = "3..0"
aws_secret_access_key = "a..b"
@sseago
Collaborator

sseago commented Mar 19, 2024

It looks like you may have the wrong CRDs installed. BackupVolumeInfos was a new value added to spec.target.kind for DownloadRequest in 1.13. If you're trying to run Velero 1.13 but have Velero 1.12 CRDs installed, that would explain the error.
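
A quick way to verify which CRDs are installed (a minimal sketch, assuming kubectl access to the cluster Velero runs in):

# If this prints nothing, the cluster still has the pre-1.13 Velero CRDs.
kubectl get crd downloadrequests.velero.io -o yaml | grep BackupVolumeInfos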

@Wayne-H-Ha
Author

Thanks for the quick response. I found in the docs that I can run:

velero install --crds-only --dry-run -o yaml

So I ran the above using velero 1.12.2 and 1.13.1, and as you said, I found BackupVolumeInfos only in the output produced by 1.13.1:

% diff -w crds.1.13.1 crds.1.12.2 | grep BackupVolumeInfos        
<                       - BackupVolumeInfos

My next question is: how do I update the CRDs from 1.12.2 to 1.13.1?

@qiuming-best
Contributor

Here is one doc that you could reference

@Wayne-H-Ha
Author

Thanks for the link to the doc. I have run the following:

velero-v1.13.1-linux-amd64/velero install --crds-only --dry-run -o yaml | kubectl apply -f -

velero-v1.13.1-linux-amd64/velero backup create cp-20240320020119

But the backup still fails:

velero-v1.13.1-linux-amd64/velero backup describe cp-20240320020119
Name:         cp-20240320020119
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.27.11+IKS
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=27
Phase:  Failed (run `velero backup logs cp-20240320020119` for more information)
Namespaces:
  Included:  *
  Excluded:  <none>
Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto
Label selector:  <none>
Or label selector:  <none>
Storage Location:  default
Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero
TTL:  720h0m0s
CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s
Hooks:  <none>
Backup Format Version:  1.1.0
Started:    2024-03-20 02:06:35 +0000 UTC
Completed:  <n/a>
Expiration:  2024-04-19 02:06:35 +0000 UTC
Total items to be backed up:  2989
Items backed up:              2989
Backup Volumes:
  Velero-Native Snapshots: <none included>
  CSI Snapshots: <none included or not detectable>
  Pod Volume Backups: <none included>
HooksAttempted:  0
HooksFailed:     0

Maybe it is still adding the "@aws" suffix to the key ID?

@blackpiglet
Contributor

Not sure about the @aws suffix. IMO, there is no need to add that.

Could you post the error information of the failed backup?

@Wayne-H-Ha
Author

Wayne-H-Ha commented Mar 20, 2024

Thanks for looking into this problem. Here is the error I found for the failed backup:

Mar 19 22:31:03 velero-58c946d54d-k5xdt velero info time="2024-03-20T02:30:53Z" level=info msg="Setting up backup store to persist the backup" backup=velero/cp-20240320023032 logSource="pkg/controller/backup_controller.go:729"
Mar 19 22:31:03 velero-58c946d54d-k5xdt velero error time="2024-03-20T02:30:53Z" level=error msg="Error uploading log file" backup=cp-20240320023032 bucket=codeengine-cp-dev-relint error="rpc error: code = Unknown desc = error putting object dev-relint-controlplane/backups/cp-20240320023032/cp-20240320023032-logs.gz: operation error S3: PutObject, https response error StatusCode: 403, RequestID: b54ad6b1-c6a4-443f-9e99-be04b978a9bf, HostID: , api error AccessDenied: Access Denied" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:253" error.function="main.(*ObjectStore).PutObject" logSource="pkg/persistence/object_store.go:252" prefix=dev-relint-controlplane
Mar 19 22:31:03 velero-58c946d54d-k5xdt velero info time="2024-03-20T02:30:53Z" level=info msg="Initial backup processing complete, moving to Finalizing" backup=velero/cp-20240320023032 logSource="pkg/controller/backup_controller.go:743"
Mar 19 22:31:03 velero-58c946d54d-k5xdt velero error time="2024-03-20T02:30:53Z" level=error msg="backup failed" backuprequest=velero/cp-20240320023032 controller=backup error="rpc error: code = Unknown desc = error putting object dev-relint-controlplane/backups/cp-20240320023032/velero-backup.json: operation error S3: PutObject, https response error StatusCode: 403, RequestID: 74bbf24c-dc0a-43c2-8b92-3796929fd421, HostID: , api error AccessDenied: Access Denied" logSource="pkg/controller/backup_controller.go:288"
Mar 19 22:31:03 velero-58c946d54d-k5xdt velero info time="2024-03-20T02:30:53Z" level=info msg="Updating backup's final status" backuprequest=velero/cp-20240320023032 controller=backup logSource="pkg/controller/backup_controller.go:307"

The s3 support team recommended:

We usually see a suffix of @aws in the access_key_id of HMAC access when the s3 signature/presigned URL is not correct. We suggest engaging Velero support to investigate whether there is a behavior change in the s3 signature/presigned URL in their new version.

@blackpiglet
Contributor

What is the backup repository's backend?
Is it AWS S3 or an on-premises OSS?

@Wayne-H-Ha
Author

The S3 backend is IBM Cloud Object Storage, which behaves like AWS S3.

@sseago
Collaborator

sseago commented Mar 20, 2024

Hmm. Looks like you may have the wrong bucket permissions for your s3 bucket. See the bucket policies section at https://github.com/vmware-tanzu/velero-plugin-for-aws/blob/main/README.md and compare with what you have.
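
One way to take Velero out of the picture is to attempt a PutObject directly with the same credentials (a sketch, assuming the aws CLI is available; the COS_* variables and the test key are placeholders matching the install command later in this thread):

# A 403 here points at the bucket policy or credentials rather than Velero.
AWS_SHARED_CREDENTIALS_FILE=/tmp/cos-credentials \
aws s3api put-object \
  --endpoint-url "${COS_ENDPOINT}" \
  --bucket "${COS_BUCKET}" \
  --key "${COS_PREFIX}/velero-permission-test" \
  --body /etc/hostname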

@Wayne-H-Ha
Author

Thanks for the link to the documentation. As I mentioned earlier, backups used to work for us with velero 1.12.2 and aws plugin 1.8.2, so we're not sure why they stopped working when we upgraded to 1.13.0 and 1.9.0, or 1.13.1 and 1.9.1. Here is the velero install command we have used across many versions of velero, including 1.11 and earlier:

      /tmp/velero-${RELEASE}-linux-amd64/velero install \
      --image "${REGISTRY_PATH}/velero:${VELERO_IMAGE_TAG}"  \
      --provider aws \
      --plugins ${REGISTRY_PATH}/velero-plugin-for-aws-amd64:${VELERO_PLUGIN_IMAGE_TAG} \
      --bucket ${COS_BUCKET}  \
      --prefix ${COS_PREFIX} \
      --secret-file /tmp/cos-credentials \
      --use-volume-snapshots=false \
      --backup-location-config region=us-east-1,s3ForcePathStyle="true",s3Url=${COS_ENDPOINT} \
      --velero-pod-cpu-request "700m" \
      --velero-pod-mem-request "$MEM_REQUEST" \
      --velero-pod-cpu-limit "700m" \
      --velero-pod-mem-limit "$MEM_LIMIT"

@sseago
Collaborator

sseago commented Mar 20, 2024

I can't think of any changes we've made to the way we handle uploads that would trigger new permission requirements between 1.12 and 1.13, although maybe there's something I'm not aware of. It may be worth creating a new bucket and making sure it has the recommended bucket policy in place to see whether this works, which will eliminate the possibility that something changed in the bucket itself.

@Wayne-H-Ha
Author

We tried the following combinations:

Velero / aws plugin:
1.12.2 / 1.8.2: works
1.12.2 / 1.9.1: fails
1.13.1 / 1.8.2: works
1.13.1 / 1.9.1: fails

So we suspect aws plugin 1.9.1 is adding "@aws" to the end of the key ID, which causes velero to fail to upload the backup to IBM Cloud Object Storage.

@blackpiglet
Contributor

The issue may relate to the AWS SDK version bump in the Velero AWS plugin version v1.9.
Could you give more information about your suspected @aws suffix?

Did you see that in the secret, the pod, or the Velero log?

@Wayne-H-Ha
Author

I contacted IBM Cloud Object Storage and they said they found the following in their log (note the "@aws" suffix at the end of remote_user):

orig_timestamp  Mar 19, 2024 @ 14:31:07.000
container_name  codeengine-cp-dev-relint
request_type    REST.PUT.OBJECT
access_status   403
remote_user     3f3dad27c65d41b4835b8a3be6d91cb0@aws
credential_type hmac
user_agent      aws-sdk-go-v2/1.21.0 os/linux lang/go#1.21.6

@Alwinius

We have the same issue as described here, and we are using official Amazon S3. Let me know if you need any logs.

@blackpiglet
Contributor

IMO, this "@aws" may not be an issue; the 403 error code implies permission denied.
Is there any possibility that the permissions granted to the Velero role are insufficient?

@Wayne-H-Ha
Author

Wayne-H-Ha commented Mar 22, 2024

We tried the following combinations:

Velero / aws plugin:
1.12.2 / 1.8.2: works
1.12.2 / 1.9.1: fails
1.13.1 / 1.8.2: works
1.13.1 / 1.9.1: fails

So we suspect aws plugin 1.9.1 is adding "@aws" to the end of the key ID, which causes velero to fail to upload the backup to IBM Cloud Object Storage.

As I mentioned previously, we tried the newest velero (1.13.1) with the newest plugin (1.9.1) and it failed, but if we switch to the older plugin 1.8.2 then it works. In both cases we have the same permissions.

@reasonerjt
Contributor

reasonerjt commented Mar 22, 2024

@Wayne-H-Ha

Thanks for the link to the documentation. As I mentioned earlier, backups used to work for us with velero 1.12.2 and aws plugin 1.8.2, so we're not sure why they stopped working when we upgraded to 1.13.0 and 1.9.0, or 1.13.1 and 1.9.1.

Since aws-plugin v1.9.x we've switched to aws-sdk-go-v2, so there might be a compatibility issue; some change in sdk-v2 may be making IBM Object Storage think "@aws" was added. Is it possible to check with IBM and have them explain how the remote_user was extracted?

I may look into the code, but I can't commit to a fix because currently the plugin works with AWS S3 and S3-compatible storage (MinIO) in our pipeline.

@Wayne-H-Ha
Author

We have the same issue as described here, and we are using official Amazon S3. Let me know if you need any logs.

@reasonerjt Yes, I will report your findings to IBM Cloud Object Storage. But please also be aware that @Alwinius said he has this problem with official Amazon S3 as well.

@mateusoliveira43
Contributor

I also experienced this problem on IBM Cloud with aws plugin v1.9.1.

@Wayne-H-Ha
Author

@reasonerjt IBM Cloud Object Storage team replied:

The expected value of the remote user should be the HMAC access key ID without the trailing @aws.

For example, in "3f3dad27c65d41b4835b8a3be6d91cb0@aws", the "3f3dad27c65d41b4835b8a3be6d91cb0" part is the expected access key ID.

@reasonerjt
Contributor

@Wayne-H-Ha
So the @aws is not in the credentials file.
You will need to check with IBM where it comes from; they will need to check their code to find out.
I briefly checked the SDK and didn't find it adding the suffix.

@Wayne-H-Ha
Author

@reasonerjt I just got the reply from IBM Cloud Object Storage (COS). I hope you understand the reply as I don't have enough knowledge to digest the information.

COS internal teams managed to capture debug-logged requests for the HTTP 403s on PUT. Specifically, the AWS signature does not match what we are expecting, so we stop processing the request any further.

Request_id 1) 0ed2fc0b-acf8-4d05-b003-dd5a1bf1b072:

2024-04-02 03:30:32.330 DEBUG [etp466364426-20827]
{s3.auth:56ac6033-f67f-4ba2-a2e0-b7b65350824d} org.cleversafe.s3.auth.AwsAuthenticator -
Invalid AWS V4 Chunked Headers: Incorrect value for Content Hash on Chunked Put Request

in the other:

Request_id 2) 5982df29-85a9-4492-9573-54aaba4b484e:

2024-04-02 03:30:32.319 DEBUG [etp579017959-19571]
{s3.auth:a4493674-ffdc-48c7-920c-2133c490c197} org.cleversafe.s3.auth.AwsAuthenticator -
Invalid AWS V4 Chunked Headers: Incorrect value for Content Hash on Chunked Put Request

Checking COS logs, they can see all HTTP 403 for PUT were for user_agent
"aws-sdk-go-v2/1.21.0 os/linux lang/go#1.21.6 md/GOOS#linux md/GOARCH#amd64 api/s3#1.40.0 ft/s3-transfer".
The write requests which succeeded for the bucket were for user_agent
"aws-sdk-go/1.44.253 (go1.20.10; linux; amd64) S3Manager."

@sseago
Collaborator

sseago commented Jun 28, 2024

@Wayne-H-Ha are you seeing other debug logs? If not, it might be better to replace the last two lines with a combined --log-level=debug

@sseago
Collaborator

sseago commented Jun 28, 2024

(Oh, I'm just noticing that the docs suggest it the way you had it -- so the main question is whether you're seeing other "level=debug" logs. There should be many of them if the log level is debug. If there aren't, then we'll need to figure out why the setting isn't working. If there are, then we may need to look into what exactly should be logged here, and which of those messages you're seeing and which you aren't.)
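
If it helps, a minimal way to check and change the server-side setting (a sketch, assuming the default velero Deployment in the velero namespace):

# Show the current server args and look for --log-level.
kubectl -n velero get deploy velero -o jsonpath='{.spec.template.spec.containers[0].args}'
# Then add or change --log-level=debug in the container args, e.g. interactively:
kubectl -n velero edit deploy velero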

@kaovilai
Contributor

I found out that sdk-v2 by default does not produce logs. I'm PRing to the aws plugin in a bit.


@Wayne-H-Ha
Author

I see more than 40K entries for level=debug, more than 10K entries for level=info, and only 2 entries for level=error:

cat velero-bundle/kubecapture/core_v1/velero/velero-fd9576677-jv989/velero/velero.log | wc -l
52957

cat velero-bundle/kubecapture/core_v1/velero/velero-fd9576677-jv989/velero/velero.log | egrep "level=debug" | wc -l
42203

cat velero-bundle/kubecapture/core_v1/velero/velero-fd9576677-jv989/velero/velero.log | egrep "level=info" | wc -l
10752

cat velero-bundle/kubecapture/core_v1/velero/velero-fd9576677-jv989/velero/velero.log | egrep -v "level=debug|level=info" | wc -l
2

grep "level=error" velero-bundle/kubecapture/core_v1/velero/velero-fd9576677-jv989/velero/velero.log -B2 -A2
time="2024-06-28T20:31:02Z" level=debug msg="found preexisting restartable plugin process" backup=velero/cp-20240628203047 command=/plugins/velero-plugin-for-aws kind=ObjectStore logSource="pkg/plugin/clientmgmt/manager.go:144" name=velero.io/aws
time="2024-06-28T20:31:02Z" level=debug msg="Skip generating BackupVolumeInfo when the CSI feature is disabled." backup=velero/cp-20240628203047 logSource="internal/volume/volumes_information.go:516"
time="2024-06-28T20:31:02Z" level=error msg="Error uploading log file" backup=cp-20240628203047 bucket=codeengine-cp-dev-relint error="rpc error: code = Unknown desc = error putting object dev-relint-controlplane/backups/cp-20240628203047/cp-20240628203047-logs.gz: operation error S3: PutObject, https response error StatusCode: 400, RequestID: d2aa3a87-377d-4e29-bcd1-04d54ffe21c3, HostID: , api error MissingDigest: Missing required content hash for this request: Content-MD5 or x-amz-content-sha256" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:280" error.function="main.(*ObjectStore).PutObject" logSource="pkg/persistence/object_store.go:256" prefix=dev-relint-controlplane
time="2024-06-28T20:31:02Z" level=info msg="Initial backup processing complete, moving to Finalizing" backup=velero/cp-20240628203047 logSource="pkg/controller/backup_controller.go:756"
time="2024-06-28T20:31:02Z" level=debug msg="received EOF, stopping recv loop" backup=velero/cp-20240628203047 cmd=/plugins/velero-plugin-for-aws err="rpc error: code = Unavailable desc = error reading from server: EOF" logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" pluginName=stdio
--
time="2024-06-28T20:31:02Z" level=info msg="plugin process exited" backup=velero/cp-20240628203047 cmd=/velero id=1380 logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/velero
time="2024-06-28T20:31:02Z" level=debug msg="plugin exited" backup=velero/cp-20240628203047 cmd=/velero logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75"
time="2024-06-28T20:31:02Z" level=error msg="backup failed" backuprequest=velero/cp-20240628203047 controller=backup error="rpc error: code = Unknown desc = error putting object dev-relint-controlplane/backups/cp-20240628203047/velero-backup.json: operation error S3: PutObject, https response error StatusCode: 400, RequestID: ec71a2ad-9f33-41d2-8a80-67cd6401dc06, HostID: , api error MissingDigest: Missing required content hash for this request: Content-MD5 or x-amz-content-sha256" logSource="pkg/controller/backup_controller.go:288"
time="2024-06-28T20:31:02Z" level=info msg="Updating backup's final status" backuprequest=velero/cp-20240628203047 controller=backup logSource="pkg/controller/backup_controller.go:307"
time="2024-06-28T20:31:02Z" level=debug msg="Getting Backup" backup=velero/cp-20240628203047 controller=backup-finalizer logSource="pkg/controller/backup_finalizer_controller.go:91"

@kaovilai
Contributor

@Wayne-H-Ha try this image with debug logging enabled.
ghcr.io/kaovilai/velero-plugin-for-aws:sdk-v2-logging

from vmware-tanzu/velero-plugin-for-aws#207

Then relay that info to IBM COS.
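
The MissingDigest failure can also be reproduced outside Velero, which may make the report to IBM COS more concrete (a sketch, assuming the aws CLI; the bucket and key names here are hypothetical, and --content-md5 supplies the content hash the bucket demands):

# Compute the base64-encoded MD5 the bucket requires and send it explicitly.
md5=$(openssl md5 -binary ./payload.bin | base64)
aws s3api put-object \
  --endpoint-url "${COS_ENDPOINT}" \
  --bucket "${COS_BUCKET}" \
  --key test/payload.bin \
  --body ./payload.bin \
  --content-md5 "${md5}"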

@Wayne-H-Ha
Author

@kaovilai Thanks for providing the image with debug logging enabled. I have reproduced the problem and sent the new logs to IBM COS for them to investigate.

@kaovilai
Contributor

Let us know of any updates.

@Wayne-H-Ha
Author

IBM COS said that since our bucket has a retention policy set, setting checksumAlgorithm to "" will not work for us. They need to implement sdk-v2 support in IBM COS.

@sseago
Collaborator

sseago commented Jul 12, 2024

@Wayne-H-Ha So does this mean a new version of IBM COS will be needed? Is this on the roadmap?

@Wayne-H-Ha
Author

IBM COS said they are working on implementing sdk-v2 support in their product.

gjanders added a commit to gjanders/velero that referenced this issue Aug 5, 2024
As per vmware-tanzu#7543 setting checksumAlgorithm to avoid 403 errors

Added plugins line as velero install failed without this option in version 1.14.0

Removed the volumesnapshotlocation as it does not exist in 1.14.0

Signed-off-by: Gareth Anderson <[email protected]>
gjanders added a commit to gjanders/velero that referenced this issue Aug 5, 2024
Added plugins line as velero install failed without this option in version 1.14.0

Removed the volumesnapshotlocation as it does not exist in 1.14.0

Signed-off-by: Joe Beda <[email protected]>
gjanders added a commit to gjanders/velero that referenced this issue Aug 5, 2024
Added plugins line as velero install failed without this option in version 1.14.0

Removed the volumesnapshotlocation as it does not exist in 1.14.0

Signed-off-by: Gareth Anderson <[email protected]>
gjanders added a commit to gjanders/velero that referenced this issue Aug 5, 2024
Added option checksumAlgorithm; this stops 403 errors as per vmware-tanzu#7543
Added plugins line as velero install failed without this option in version 1.14.0
Removed the volumesnapshotlocation as it does not exist in 1.14.0

Signed-off-by: Gareth Anderson <[email protected]>
@RangerRick

Just adding my voice to this: we ran into it using the Replicated backup tools (which are Velero under the covers) against DigitalOcean's S3-compatible "Spaces". Setting checksumAlgorithm: "" on the BackupStorageLocation resource fixed it for us too, but I'm not able to twiddle that for the restore.
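
For anyone else landing here, the change amounts to one field on the BSL (a sketch, assuming a BackupStorageLocation named default in the velero namespace):

# An empty string tells the aws plugin not to request a checksum algorithm on uploads.
kubectl -n velero patch backupstoragelocation default --type merge \
  -p '{"spec":{"config":{"checksumAlgorithm":""}}}'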

@sseago
Collaborator

sseago commented Sep 6, 2024

@RangerRick "Setting checksumAlgorithm: "" on the BackupStorageLocation resource fixed it for us too, but I'm not able to twiddle that for the restore." -- I'm not sure what you mean there. If it's set on the BSL, then that setting is in use for any operation that accesses the object store -- backup, restore, etc.

@Wayne-H-Ha
Author

Wayne-H-Ha commented Sep 6, 2024

@sseago I have checksumAlgorithm set to "" in the BSL, and velero backup create works, but velero restore create --from-backup fails. It looks like it failed to get the following objects from IBM COS:

Get "https://s3.direct.eu-de.cloud-object-storage.appdomain.cloud/codeengine-cp-dev-relint-2/dev-relint-controlplane/restores/cp-20240906165700-20240906131034/
Get "https://s3.direct.eu-de.cloud-object-storage.appdomain.cloud/codeengine-cp-dev-relint-2/dev-relint-controlplane/restores/cp-20240906165700-20240906131034/
Get "https://s3.direct.eu-de.cloud-object-storage.appdomain.cloud/codeengine-cp-dev-relint-2/dev-relint-controlplane/restores/cp-20240906165700-20240906131034/
restore-cp-20240906165700-20240906131034-results.gz
restore-cp-20240906165700-20240906131034-results.gz
cp-20240906165700-20240906131034-volumeinfo.json.gz
?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=xxxxxxxaws_access_key_idxxxxxxxx%2F20240906%2Fus-east-1%2Fs3%2Faws4_request
?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=xxxxxxxaws_access_key_idxxxxxxxx%2F20240906%2Fus-east-1%2Fs3%2Faws4_request
?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=xxxxxxxaws_access_key_idxxxxxxxx%2F20240906%2Fus-east-1%2Fs3%2Faws4_request
&X-Amz-Date=20240906T171349Z
&X-Amz-Date=20240906T171349Z
&X-Amz-Date=20240906T171419Z
&X-Amz-Expires=600&X-Amz-SignedHeaders=host&x-id=GetObject
&X-Amz-Expires=600&X-Amz-SignedHeaders=host&x-id=GetObject
&X-Amz-Expires=600&X-Amz-SignedHeaders=host&x-id=GetObject
&X-Amz-Signature=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx": context deadline exceeded>
&X-Amz-Signature=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx": context deadline exceeded>
&X-Amz-Signature=yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy": context deadline exceeded>

@RangerRick

@RangerRick "Setting checksumAlgorithm: "" on the BackupStorageLocation resource fixed it for us too, but I'm not able to twiddle that for the restore." -- I'm not sure what you mean there. If it's set on the BSL, then that setting is in use for any operation that accesses the object store -- backup, restore, etc.

@sseago Sorry, I mean specifically in Replicated's tools, which automate the entire disaster recovery process, so I have no way to hook into the gap between pulling the metadata and starting the restore. I'm sure they could work around it too, but it would be nice if the s3 plugin could negotiate these things cleanly and transparently in the first place.

@sseago
Collaborator

sseago commented Sep 6, 2024

The s3 plugin is configured via the BackupStorageLocation, and this field is included there. If you are unable to configure the BSL completely (including this parameter) via Replicated's tools, then you may need to open an issue against Replicated. Since this is part of the BackupStorageLocation configuration, Velero takes its configuration from there; if checksumAlgorithm isn't set, Velero uses a default. The issue is that the default doesn't work in your environment, so you need to set this when you create the BSL -- and if Replicated is creating the BSL, then the Replicated tooling needs to be able to set this field if they claim to be compatible with the latest velero-plugin-for-aws releases.
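
For completeness, setting it at BSL-creation time would look roughly like this (a sketch; bucket, prefix, and s3Url are hypothetical placeholders):

kubectl apply -f - <<'EOF'
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-bucket   # hypothetical bucket
    prefix: my-prefix   # hypothetical prefix
  config:
    region: us-east-1
    s3ForcePathStyle: "true"
    s3Url: https://s3.example.appdomain.cloud   # hypothetical endpoint
    checksumAlgorithm: ""
EOF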

@reasonerjt
Contributor

Thanks @sseago. Since the original issue is to be resolved on the IBM side per this comment, let me close this issue.
