[36384] Add job state time limit actions to batch queue #36658

jchorl · 2024-03-29T21:36:45Z

Description

A new option has been added to batch job queues: https://aws.amazon.com/about-aws/whats-new/2024/03/aws-batch-alerts-detect-jobs-runnable-state/

This PR just adds support for the new API fields to the batch queue terraform resource.

Relations

Closes #36384

References

API reference: https://docs.aws.amazon.com/batch/latest/APIReference/API_JobStateTimeLimitAction.html

Output from Acceptance Testing

I didn't run acceptance tests to avoid dangling resources in my account.

I did however use this to update job queues and it worked.

github-actions · 2024-03-29T21:37:32Z

Thank you for your contribution! 🚀

Please note that the CHANGELOG.md file contents are handled by the maintainers during merge. This is to prevent pull request merge conflicts, especially for contributions which may not be merged immediately. Please see the Contributing Guide for additional pull request review items.

Remove any changes to the CHANGELOG.md file and commit them in this pull request to prevent delays with reviewing and potentially merging this pull request.

drewmullen

Thanks for opening this PR. Was planning to get to that this week... appreciate you stepping up. Here are some requested changes:

Also please include a changelog entry

internal/service/batch/job_queue.go

internal/service/batch/job_queue_test.go

website/docs/r/batch_job_queue.html.markdown

drewmullen · 2024-04-01T15:04:50Z

internal/service/batch/job_queue.go

+	if !plan.JobStateTimeLimitAction.IsNull() && !plan.JobStateTimeLimitAction.Equal(state.JobStateTimeLimitAction) {
+		flex.Expand(ctx, plan.JobStateTimeLimitAction, &input.JobStateTimeLimitActions)
+		update = true
+	}


Can reason be updated in place?
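To make the question concrete, here is a minimal, stdlib-only sketch of the comparison the diff performs. It is a model, not the provider's code: `TimeLimitAction`, `actionsEqual`, and `needsUpdate` are hypothetical names, and plain Go slices stand in for the plugin framework's list types. Under those assumptions, a change to `reason` alone makes the planned list differ from state and so sets the update flag.

```go
package main

import "fmt"

// TimeLimitAction is a hypothetical stand-in for one
// job_state_time_limit_action block; it is not a provider type.
type TimeLimitAction struct {
	Action         string
	MaxTimeSeconds int64
	Reason         string
	State          string
}

// actionsEqual reports whether two lists hold the same blocks in the same order.
func actionsEqual(a, b []TimeLimitAction) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// needsUpdate mirrors the shape of the condition in the diff:
// the planned value is set and differs from the value in state.
func needsUpdate(plan, state []TimeLimitAction) bool {
	return plan != nil && !actionsEqual(plan, state)
}

func main() {
	state := []TimeLimitAction{{"CANCEL", 600, "CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY", "RUNNABLE"}}
	plan := []TimeLimitAction{{"CANCEL", 600, "MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT", "RUNNABLE"}}

	fmt.Println(needsUpdate(plan, state))  // only the reason differs: prints true
	fmt.Println(needsUpdate(state, state)) // identical lists: prints false
}
```

Whether the Batch API accepts the changed `reason` on an existing queue is a separate question that this model does not answer.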

Co-authored-by: drewmullen <[email protected]>

drewmullen · 2024-04-02T22:44:55Z

@jchorl looks like I gave you the validation functions using the old framework and this resource is on the new framework… I can send some examples tomorrow if you have trouble
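As a rough illustration of what such validation has to enforce, the sketch below checks one block using only the standard library. It does not use either Terraform framework's validator API. The constraints are assumptions taken from a reading of the API reference linked in the description (`action` limited to `CANCEL` at the time of this PR, `state` limited to `RUNNABLE`, `max_time_seconds` between 600 and 86400, `reason` required) and should be verified against that page. `TimeLimitAction` and `validate` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Assumed bounds for max_time_seconds (10 minutes to 24 hours);
// verify against the AWS Batch API reference before relying on them.
const (
	minTimeSeconds = 600
	maxTimeSeconds = 86400
)

// TimeLimitAction is a hypothetical model of one job_state_time_limit_action block.
type TimeLimitAction struct {
	Action         string
	MaxTimeSeconds int64
	Reason         string
	State          string
}

// validate collects every violation instead of stopping at the first one.
func validate(a TimeLimitAction) error {
	var problems []string
	if a.Action != "CANCEL" { // assumed to be the only supported action
		problems = append(problems, fmt.Sprintf("action: unsupported value %q", a.Action))
	}
	if a.MaxTimeSeconds < minTimeSeconds || a.MaxTimeSeconds > maxTimeSeconds {
		problems = append(problems, fmt.Sprintf("max_time_seconds: %d is outside [%d, %d]",
			a.MaxTimeSeconds, minTimeSeconds, maxTimeSeconds))
	}
	if a.Reason == "" {
		problems = append(problems, "reason: must not be empty")
	}
	if a.State != "RUNNABLE" { // assumed to be the only supported state
		problems = append(problems, fmt.Sprintf("state: unsupported value %q", a.State))
	}
	if len(problems) == 0 {
		return nil
	}
	return errors.New(strings.Join(problems, "; "))
}

func main() {
	ok := TimeLimitAction{"CANCEL", 606, "CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY", "RUNNABLE"}
	bad := TimeLimitAction{"CANCEL", 60, "", "RUNNABLE"}

	fmt.Println(validate(ok))  // nil error
	fmt.Println(validate(bad)) // reports max_time_seconds and reason
}
```

In the provider itself these checks would be attached to the schema attributes rather than run by hand; the sketch only shows the rules.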

jchorl · 2024-04-02T22:55:49Z

> @jchorl looks like I gave you the validation functions using the old framework and this resource is on the new framework… I can send some examples tomorrow if you have trouble

You're too fast, just figuring it out locally. Will push updates soon.

drewmullen · 2024-04-02T23:26:39Z

I saw the notification and immediately felt bad for giving bad code lol

@jchorl

jchorl · 2024-04-02T23:46:32Z

I think I addressed all feedback. Unfortunately I can't test updating the reason right now because there are some unrelated concurrent changes in flux that can't be targeted around. I can try to test tomorrow.

drewmullen

This looks really good. Seems like the CI is complaining about the test config name... odd that it hasn't complained about this until now... I believe the solution would be to add a _, aka ..Config_Base... might be b (lower case)... I can't remember

internal/service/batch/job_queue_test.go

Co-authored-by: drewmullen <[email protected]>

ChannyClaus · 2024-08-08T19:35:14Z

@jchorl ~~are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)~~

update - my company may be getting acquired... might still do it but if someone else wants to jump in, feel free to!

jchorl · 2024-08-08T19:54:53Z

> @jchorl are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)

Please feel free to take it over - I haven't had a chance to figure out terraform tests, especially with all the resources they configure in your AWS acct.

I do think there is a lingering issue in this PR with state getting out of sync - i.e. if you apply the same config twice I think it still shows changes.

ChannyClaus · 2024-08-09T20:02:31Z

> > @jchorl are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)
>
> Please feel free to take it over - I haven't had a chance to figure out terraform tests, especially with all the resources they configure in your AWS acct.
>
> I do think there is a lingering issue in this PR with state getting out of sync - i.e. if you apply the same config twice I think it still shows changes.

made a PR #38784 to pick this up. do you happen to remember the configuration with which you ran into the out-of-sync state issue? it seems that when i make changes to the job_state_time_limit_action attribute, subsequent runs of terraform apply show no changes.

on a related-ish note, hopefully someone will be able to take a look at the PR soon, since given the acquisition going on at my company, it's unclear how much longer i'll have access to AWS account here :.)

jchorl · 2024-08-09T20:36:39Z

> > > @jchorl are you still working on this PR? if you're busy with other things, happy to take over (this feature would be incredibly helpful for us 😅)
>
> > Please feel free to take it over - I haven't had a chance to figure out terraform tests, especially with all the resources they configure in your AWS acct.
> > I do think there is a lingering issue in this PR with state getting out of sync - i.e. if you apply the same config twice I think it still shows changes.
>
> made a PR #38784 to pick this up. do you happen to remember the configuration with which you ran into the out-of-sync state issue? seems like when i make changes to the job_state_time_limit_action attribute, the subsequent runs of terraform apply shows no change.
>
> on a related-ish note, hopefully someone will be able to take a look at the PR soon, since given the acquisition going on at my company, it's unclear how much longer i'll have access to AWS account here :.)

I just built my branch and tf-applied off that provider. Got a bunch of:

│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.whatever.compute_env["foo"] to include new values learned so far during apply, provider
│ "registry.terraform.io/hashicorp/aws" changed the planned action from Update to DeleteThenCreate.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

I'm not sure what this error means/why it would occur. This was off just a normal terraform apply. If you can toggle on/off the job_state_time_limit_action, tf apply, and not hit this, maybe it's fixed.

Genuine thanks for picking this up.

ChannyClaus · 2024-08-09T20:50:28Z

yup, tried it with

terraform {
  required_providers {
    aws = {
      source = "terraform.local/local/aws"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "us-west-2"
}


resource "aws_batch_compute_environment" "sample" {
  compute_environment_name = "sample"

  compute_resources {
    max_vcpus = 16
    subnets = [
      "<redacted>",
    ]
    security_group_ids = [
      "<redacted>",
    ]
    type = "FARGATE"
  }

  type = "MANAGED"
}

resource "aws_batch_job_queue" "test_queue" {
  name     = "tf-test-batch-job-queue4"
  state    = "ENABLED"
  priority = 1
  compute_environment_order {
    order               = 1
    compute_environment = aws_batch_compute_environment.sample.arn
  }
  job_state_time_limit_action {
    action           = "CANCEL"
    max_time_seconds = 606
    reason           = "CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY"
    state            = "RUNNABLE"
  }
}

i don't seem to get that error message.

and no problem! we ended up setting up a cronjob to pick up on jobs that get stuck in RUNNABLE and i'm just glad AWS got around to supporting this natively on their end.

github-actions · 2024-09-14T02:16:22Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

[36384] Add job state time limit actions to batch queue

5a40e79

terraform-aws-provider bot added the needs-triage label (Waiting for first response or review from a maintainer.) Mar 29, 2024

rev changelog

d81c8b7

drewmullen suggested changes Apr 1, 2024

drewmullen reviewed Apr 1, 2024

justinretzolk added the enhancement label (Requests to existing resources that expand the functionality or scope.) and removed the needs-triage label Apr 1, 2024

Apply suggestions from code review

315e6d2

Co-authored-by: drewmullen <[email protected]>

compiles

03a501a

jchorl added 3 commits April 2, 2024 23:41

update tests

ec6a023

update secs

14661a2

add changelog

7509f0b

drewmullen suggested changes Apr 4, 2024

internal/service/batch/job_queue_test.go (outdated, resolved)

internal/service/batch/job_queue_test.go (outdated, resolved)

jchorl and others added 3 commits April 4, 2024 14:38

Apply suggestions from code review

2e46952

Co-authored-by: drewmullen <[email protected]>

f

1be0813

f

a6ab8e8

ChannyClaus mentioned this pull request Aug 9, 2024

[36384] Add job state time limit actions to batch queue #38784

Merged

jchorl closed this Aug 13, 2024

github-actions bot locked as resolved and limited conversation to collaborators Sep 14, 2024
