Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[36384] Add job state time limit actions to batch queue #36658

Closed

Conversation

jchorl
Copy link
Contributor

@jchorl jchorl commented Mar 29, 2024

Description

A new option has been added to batch job queues: https://aws.amazon.com/about-aws/whats-new/2024/03/aws-batch-alerts-detect-jobs-runnable-state/

This PR just adds support for the new API fields to the batch queue terraform resource.

Relations

Closes #36384

References

API reference: https://docs.aws.amazon.com/batch/latest/APIReference/API_JobStateTimeLimitAction.html

Output from Acceptance Testing

I didn't run acceptance tests to avoid dangling resources in my account.

I did however use this to update job queues and it worked.

@github-actions github-actions bot added size/L Managed by automation to categorize the size of a PR. documentation Introduces or discusses updates to documentation. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. service/batch Issues and PRs that pertain to the batch service. provider Pertains to the provider itself, rather than any interaction with AWS. labels Mar 29, 2024
@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Mar 29, 2024
Copy link

Thank you for your contribution! 🚀

Please note that the CHANGELOG.md file contents are handled by the maintainers during merge. This is to prevent pull request merge conflicts, especially for contributions which may not be merged immediately. Please see the Contributing Guide for additional pull request review items.

Remove any changes to the CHANGELOG.md file and commit them in this pull request to prevent delays with reviewing and potentially merging this pull request.

Copy link
Collaborator

@drewmullen drewmullen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR. Was planning to get to that this week... appreciate you stepping up. Here's some requested changes:

Also please include a changelog entry

internal/service/batch/job_queue.go Show resolved Hide resolved
internal/service/batch/job_queue.go Show resolved Hide resolved
internal/service/batch/job_queue.go Show resolved Hide resolved
internal/service/batch/job_queue_test.go Show resolved Hide resolved
website/docs/r/batch_job_queue.html.markdown Outdated Show resolved Hide resolved
website/docs/r/batch_job_queue.html.markdown Outdated Show resolved Hide resolved
website/docs/r/batch_job_queue.html.markdown Outdated Show resolved Hide resolved
if !plan.JobStateTimeLimitAction.IsNull() && !plan.JobStateTimeLimitAction.Equal(state.JobStateTimeLimitAction) {
flex.Expand(ctx, plan.JobStateTimeLimitAction, &input.JobStateTimeLimitActions)
update = true
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can reason be updated?

@justinretzolk justinretzolk added enhancement Requests to existing resources that expand the functionality or scope. and removed needs-triage Waiting for first response or review from a maintainer. labels Apr 1, 2024
@drewmullen
Copy link
Collaborator

@jchorl looks like I gave you the validation functions using the old framework and this resource is on the new framework… I can send some examples tomorrow if you have trouble

@jchorl
Copy link
Contributor Author

jchorl commented Apr 2, 2024

@jchorl looks like I gave you the validation functions using the old framework and this resource is on the new framework… I can send some examples tomorrow if you have trouble

You're too fast, just figuring it out locally. Will push updates soon.

@drewmullen
Copy link
Collaborator

I saw the notification and immediately felt bad for giving bad code lol

@jchorl

@jchorl
Copy link
Contributor Author

jchorl commented Apr 2, 2024

I think I addressed all feedback. Unfortunately I can't test updating the reason this sec because there's some unrelated concurrent changes in-flux that can't be targetted around. I can try to test tomorrow.

Copy link
Collaborator

@drewmullen drewmullen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. Seems like the CI is complaining about the test config name... odd that it hasnt complained about this until now... i believe the solution would be to add a _, aka ..Config_Base... might be b (lower case)... i cant remember

internal/service/batch/job_queue_test.go Outdated Show resolved Hide resolved
internal/service/batch/job_queue_test.go Outdated Show resolved Hide resolved
@ChannyClaus
Copy link
Contributor

ChannyClaus commented Aug 8, 2024

@jchorl are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)

update - my company may be getting acquired... might still do it but if someone else wants to jump in, feel free to!

@jchorl
Copy link
Contributor Author

jchorl commented Aug 8, 2024

@jchorl are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)

Please feel free to take it over - I haven't had a chance to figure out terraform tests, especially with all the resources they configure in your AWS acct.

I do think there is a lingering issue in this PR with state getting out of sync - i.e. if you apply the same config twice I think it still shows changes.

@ChannyClaus
Copy link
Contributor

@jchorl are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)

Please feel free to take it over - I haven't had a chance to figure out terraform tests, especially with all the resources they configure in your AWS acct.

I do think there is a lingering issue in this PR with state getting out of sync - i.e. if you apply the same config twice I think it still shows changes.

made a PR #38784 to pick this up. do you happen to remember the configuration with which you ran into the out-of-sync state issue? seems like when i make changes to the job_state_time_limit_action attribute, the subsequent runs of terraform apply shows no change.

on a related-ish note, hopefully someone will be able to take a look at the PR soon, since given the acquisition going on at my company, it's unclear how much longer i'll have access to AWS account here :.)

@jchorl
Copy link
Contributor Author

jchorl commented Aug 9, 2024

@jchorl are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)

Please feel free to take it over - I haven't had a chance to figure out terraform tests, especially with all the resources they configure in your AWS acct.
I do think there is a lingering issue in this PR with state getting out of sync - i.e. if you apply the same config twice I think it still shows changes.

made a PR #38784 to pick this up. do you happen to remember the configuration with which you ran into the out-of-sync state issue? seems like when i make changes to the job_state_time_limit_action attribute, the subsequent runs of terraform apply shows no change.

on a related-ish note, hopefully someone will be able to take a look at the PR soon, since given the acquisition going on at my company, it's unclear how much longer i'll have access to AWS account here :.)

I just built my branch and tf-applied off that provider. Got a bunch of:

│ Error: Provider produced inconsistent final plan                                                                                                                                             
│                                                                                                                                                                                              
│ When expanding the plan for module.whatever.compute_env["foo"] to include new values learned so far during apply, provider               
│ "registry.terraform.io/hashicorp/aws" changed the planned action from Update to DeleteThenCreate.                                                                                            
│                                                                                                                                                                                              
│ This is a bug in the provider, which should be reported in the provider's own issue tracker. 

I'm not sure what this error means/why it would occur. This was off just a normal terraform apply. If you can toggle on/off the job_state_time_limit_action, tf apply, and not hit this, maybe it's fixed.

Genuine thanks for picking this up.

@ChannyClaus
Copy link
Contributor

yup, tried it with

terraform {
  required_providers {
	aws = {
      source  = "terraform.local/local/aws"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "us-west-2"
}


resource "aws_batch_compute_environment" "sample" {
  compute_environment_name = "sample"

  compute_resources {
    max_vcpus = 16
    subnets = [
      "<redacted>",
    ] 
    security_group_ids = [
      "<redacted>",
    ]
    type = "FARGATE"
  }

  type         = "MANAGED"
}

resource "aws_batch_job_queue" "test_queue" {
  name     = "tf-test-batch-job-queue4"
  state    = "ENABLED"
  priority = 1
  compute_environment_order {
    order               = 1
    compute_environment = aws_batch_compute_environment.sample.arn
  }
  job_state_time_limit_action {
    action           = "CANCEL"
    max_time_seconds = 606
    reason           = "CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY"
    state            = "RUNNABLE"
  }
}

i don't seem to get that error message.

and no problem! we ended up setting up a cronjob to pick up on jobs that get stuck on RUNNABLE and i'm just glad AWS got around to support this natively on their end.

@jchorl jchorl closed this Aug 13, 2024
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Introduces or discusses updates to documentation. enhancement Requests to existing resources that expand the functionality or scope. provider Pertains to the provider itself, rather than any interaction with AWS. service/batch Issues and PRs that pertain to the batch service. size/L Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement]: aws_batch_job_queue jobStateTimeLimitActions parameter
4 participants