Error tagging resources for target_group #16860

Closed
sj158 opened this issue Dec 21, 2020 · 12 comments · Fixed by #17280
Labels
service/elbv2 Issues and PRs that pertain to the elbv2 service.
Milestone

Comments

@sj158

sj158 commented Dec 21, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform v0.13.5
hashicorp/aws 3.21.0

Note: Most recent core version is 0.14.3. However, a switch is not possible right now since some modules (e.g. terraform-aws-modules/rds/aws in version ~> 2.0) do not support the recent 0.14.x versions.

Affected Resource(s)

  • aws_lb_target_group

Terraform Configuration Files

The setup is fairly ordinary. An excerpt of the HCL definitions follows:

variable "lb_listeners" {
  type    = list
  default = [{ "protocol" = "TCP", "src_port" = "8765", "trg_port" = "8765" }]
}

resource "aws_lb" "lb_component" {
  count = var.minimal_setting ? 0 : 1

  name               = "${local.resource_prefix}-LB"
  internal           = true
  load_balancer_type = var.lb_type

  tags = merge(local.tags,
    {
      "tag1" = var.tag1 ? "1" : null
      "tag2" = var.tag2 ? "someothervalue" : null
  }, )
}

resource "aws_lb_target_group" "tg_component" {
  count = var.minimal_setting ? 0 : length(var.lb_listeners)

  name                 = "${local.resource_prefix}-${var.lb_listeners[count.index].trg_port}-TG"
  port                 = var.lb_listeners[count.index].trg_port
  protocol             = var.lb_listeners[count.index].protocol
  vpc_id               = var.vpc_id
  target_type          = "instance"
  
  tags = local.tags
}

resource "aws_lb_listener" "listener_component" {
  count = var.minimal_setting ? 0 : length(var.lb_listeners)

  load_balancer_arn = aws_lb.lb_component[0].arn
  port              = var.lb_listeners[count.index].src_port
  protocol          = var.lb_listeners[count.index].protocol

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tg_component[count.index].arn
  }

}

Debug Output

[....]
aws_fsx_windows_file_system.fsx_file_system: Still creating... [18m10s elapsed]
aws_fsx_windows_file_system.fsx_file_system: Creation complete after 18m12s [id=fs-xxxxxx]

Error: error updating LB Target Group (arn:aws:elasticloadbalancing:eu-central-1:xxxxxx:targetgroup/tgname-8765-TG/ba2efa855f5cebf0) tags: error tagging resource (arn:aws:elasticloadbalancing:eu-central-1:xxxxxx:targetgroup/tgname-8765-TG/ba2efa855f5cebf0): TargetGroupNotFound: One or more target groups not found
    status code: 400, request id: xxxx-xxx-xxx-xxx-xxxx
[....]

Expected Behavior

The target group is found and its tags are created.

Actual Behavior

The target group cannot be found, so its tags cannot be created either.

Steps to Reproduce

The bug occurs only occasionally and cannot be reproduced reliably. However, the same kind of bug was already found (and fixed) for other resources in Error tagging resources #12427 and Error tagging resources #24395. I guess the target_group resource was missed when those were fixed.

References

@ghost ghost added the service/elbv2 Issues and PRs that pertain to the elbv2 service. label Dec 21, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Dec 21, 2020
@matt-ns

matt-ns commented Jan 21, 2021

We're encountering the same issue occasionally on Terraform 0.12.28 with version 3.21.0 of the AWS provider and gathered some more information from CloudTrail following a failed execution. A successful CreateTargetGroup call is made, but within the same second a subsequent call to AddTags fails with TargetGroupNotFoundException.

CreateTargetGroup:

{
  "eventVersion": "1.08",
  "userIdentity": <redacted>,
  "eventTime": "2021-01-18T17:52:26Z",
  "eventSource": "elasticloadbalancing.amazonaws.com",
  "eventName": "CreateTargetGroup",
  "awsRegion": "eu-west-1",
  "sourceIPAddress": <redacted>,
  "userAgent": "aws-sdk-go/1.36.0 (go1.14.5; linux; amd64) exec-env/AWS_ECS_EC2 APN/1.0 HashiCorp/1.0 Terraform/0.12.28 (+https://www.terraform.io)",
  "requestParameters": {
    "unhealthyThresholdCount": 3,
    "healthCheckTimeoutSeconds": 29,
    "healthyThresholdCount": 2,
    "protocol": "HTTP",
    "matcher": {
      "httpCode": "200"
    },
    "targetType": "instance",
    "healthCheckPort": "8000",
    "healthCheckPath": <redacted>,
    "vpcId": <redacted>,
    "port": 8000,
    "healthCheckProtocol": "HTTP",
    "healthCheckIntervalSeconds": 30,
    "name": <redacted>,
    "healthCheckEnabled": true
  },
  "responseElements": {
    "targetGroups": [
      {
        "targetGroupArn": <redacted>,
        "healthCheckPort": "8000",
        "healthCheckPath": <redacted>,
        "healthCheckEnabled": true,
        "healthCheckTimeoutSeconds": 29,
        "protocol": "HTTP",
        "healthCheckProtocol": "HTTP",
        "unhealthyThresholdCount": 3,
        "healthCheckIntervalSeconds": 30,
        "port": 8000,
        "matcher": {
          "httpCode": "200"
        },
        "targetGroupName": <redacted>,
        "vpcId": <redacted>,
        "protocolVersion": "HTTP1",
        "targetType": "instance",
        "healthyThresholdCount": 2
      }
    ]
  },
  "requestID": <redacted>,
  "eventID": <redacted>,
  "readOnly": false,
  "eventType": "AwsApiCall",
  "apiVersion": "2015-12-01",
  "managementEvent": true,
  "eventCategory": "Management",
  "recipientAccountId": <redacted>
}

AddTags:

{
  "eventVersion": "1.08",
  "userIdentity": <redacted>,
  "eventTime": "2021-01-18T17:52:27Z",
  "eventSource": "elasticloadbalancing.amazonaws.com",
  "eventName": "AddTags",
  "awsRegion": "eu-west-1",
  "sourceIPAddress": <redacted>,
  "userAgent": "aws-sdk-go/1.36.0 (go1.14.5; linux; amd64) exec-env/AWS_ECS_EC2 APN/1.0 HashiCorp/1.0 Terraform/0.12.28 (+https://www.terraform.io)",
  "errorCode": "TargetGroupNotFoundException",
  "errorMessage": "One or more target groups not found",
  "requestParameters": {
    "tags": <redacted>,
    "resourceArns": [
      <same ARN as $.responseElements.targetGroups[0].targetGroupArn in CreateTargetGroup>
    ]
  },
  "responseElements": null,
  "requestID": <redacted>,
  "eventID": <redacted>,
  "readOnly": false,
  "eventType": "AwsApiCall",
  "apiVersion": "2015-12-01",
  "managementEvent": true,
  "eventCategory": "Management",
  "recipientAccountId": <redacted>
}

After AddTags fails, there are no other events referencing the same target group ARN. This looks like a possible consistency issue with the ELB API, but we aren't sure. We also observed that, during the same execution, the following events were generated for a different target group that was created successfully:

  • 2021-01-18T17:52:12Z: CreateTargetGroup
  • 2021-01-18T17:52:12Z: DescribeTags
  • 2021-01-18T17:52:12Z: AddTags

Assuming CloudTrail events are always ordered correctly, it may be significant that, for the target group that was created successfully, the DescribeTags call comes immediately after creation and before AddTags. Perhaps this is not abnormal behaviour, though.

@mgusiew-guide
Contributor

mgusiew-guide commented Jan 21, 2021

My team is also running into this issue on TF 0.13.5 and AWS provider 3.22. It seems the provider tries to create tags even though the target group does not yet exist.

@shuheiktgw
Collaborator

To reproduce this behavior, I ran the acceptance test about 20 times but could not trigger the error. The TF version is v0.14.5 and terraform-aws-provider points to the master branch. I tested in the ap-northeast-1 region.

make testacc TESTARGS='-run=TestAccAWSLBTargetGroup_tags'
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./aws -v -count 1 -parallel 20 -run=TestAccAWSLBTargetGroup_tags -timeout 120m
=== RUN   TestAccAWSLBTargetGroup_tags
=== PAUSE TestAccAWSLBTargetGroup_tags
=== CONT  TestAccAWSLBTargetGroup_tags
--- PASS: TestAccAWSLBTargetGroup_tags (276.03s)
PASS
ok  	github.com/terraform-providers/terraform-provider-aws/aws	279.406s

@sj158
Author

sj158 commented Jan 25, 2021

Hi @shuheiktgw,

many thanks for your investigation so far!

I haven't tested this on v0.14.x yet because of some module incompatibilities. However, on v0.13.x (and also 0.12.x, as posted by others above) the error occurs occasionally.
For example, I had no problems at all for 2 weeks with IaC running once a day. Then suddenly the error occurred every day (without any changes on my side). So it's really unpredictable. Maybe AWS sometimes provisions the target group quickly enough and sometimes, for whatever reason, takes longer.

Could you please have a look at ticket #12427 (I referred to it in the initial post)? They had a very similar problem with other resources and solved it. Maybe their sync/wait implementation could be reused for target groups?

Tags are pretty important for billing, tracing and automation. Would be great if you could somehow make this work reliably.

Thanks in advance and best regards,
Stefan

@shuheiktgw
Collaborator

@sj158 Thank you for your input! I see. Then I'll create a PR similar to #12738 and let the maintainers decide whether to merge it.

@shuheiktgw
Collaborator

shuheiktgw commented Jan 25, 2021

Tackled it in #17280. Hope the PR will be reviewed soon.

@nateww

nateww commented Feb 8, 2021

I'm seeing the same behavior as the original author, and I JUST updated to the latest Terraform release (0.14.6). Even more concerning, it gets the ARN of the newly created target group but then fails to treat it as valid. I'm also doing tagging.

module.websocket.aws_lb_target_group.app[0]: Creating...
...
[ ~12 minutes elapsed while I'm spinning up more infrastructure, but NO other messages regarding the TG is output. ]
Error: error updating LB Target Group (arn:aws:elasticloadbalancing:us-west-2:1233456789:targetgroup/websocket-nate-test/adb48fa8a765cbd7) tags: error tagging resource (arn:aws:elasticloadbalancing:us-west-2:123456789:targetgroup/websocket-nate-test/adb48fa8a765cbd7): TargetGroupNotFound: Target groups 'arn:aws:elasticloadbalancing:us-west-2:123456789:targetgroup/websocket-nate-test/adb48fa8a765cbd7' not found
	status code: 400, request id: 7a37c60b-1b74-40ca-84b4-3c71e9e81e3e

Another TG was created a few lines after I started creating the one above; these are the relevant log lines from that run.

module.turforsurf.aws_lb_target_group.app[0]: Creating...
module.turforsurf.aws_lb_target_group.app[0]: Creation complete after 0s [id=arn:aws:elasticloadbalancing:us-west-2:123456789:targetgroup/turforsurf-nate-test/f4f0c91946300de9]

It sure smells like some weird race condition in the AWS provider code (trying to do the two requests at the same time), but in my case the creation and the tagging appear to be many minutes apart.

@njoyneer

I have the same issue (Terraform 0.12.25), but in my case the debug log shows that Terraform sends the requests in series, only after it receives AWS's response that the target group was created:

2021-02-15T12:43:33.427Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: Action=CreateTargetGroup...
2021-02-15T12:43:33.555Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: 2021/02/15 12:43:33 [DEBUG] [aws-sdk-go] <CreateTargetGroupResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2015-12-01/">
2021-02-15T12:43:33.555Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4:   <CreateTargetGroupResult>...
...
2021-02-15T12:43:33.555Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4:         <TargetGroupArn>arn:aws:elasticloadbalancing:...
2021-02-15T12:43:33.556Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: </CreateTargetGroupResponse>

The debug log also shows that Terraform sends the AddTags request only after it receives AWS's confirmation that the group was created:

2021-02-15T12:43:33.556Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: 2021/02/15 12:43:33 [DEBUG] [aws-sdk-go] DEBUG: Request elasticloadbalancing/AddTags Details:
...
2021-02-15T12:43:33.556Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: Action=AddTags&ResourceArns.member.1=arn%3Aaws%3Aelasticloadbalancing...
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: 2021/02/15 12:43:33 [DEBUG] [aws-sdk-go] DEBUG: Response elasticloadbalancing/AddTags Details:
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: ---[ RESPONSE ]--------------------------------------
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: HTTP/1.1 400 Bad Request
...
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4:   <Error>
...
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4:     <Code>TargetGroupNotFound</Code>
...
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: </ErrorResponse>
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: 2021/02/15 12:43:33 [DEBUG] [aws-sdk-go] DEBUG: Validate Response elasticloadbalancing/AddTags failed, attempt 0/25, error TargetGroupNotFound: Target groups ... not found
2021-02-15T12:43:33.570Z [DEBUG] plugin.terraform-provider-aws_v2.70.0_x4: 	status code: 400, request id: ...
2021/02/15 12:43:33 [DEBUG] ...aws_lb_target_group.this[X]: apply errored, but we're indicating that via the Error pointer rather than returning it: error updating LB Target Group

So to me this looks more like a resource inconsistency on AWS's side than a Terraform issue. In the Terraform debug logs, the AddTags request for a target group always appears only after Terraform has received the creation confirmation, whether the TG was created shortly after the request or after some delay.

@YakDriver YakDriver added this to the v3.29.0 milestone Feb 18, 2021
@YakDriver YakDriver removed the needs-triage Waiting for first response or review from a maintainer. label Feb 18, 2021
@YakDriver
Member

This should be resolved with #17280. If you continue to have issues, please open a new issue and let us know.

@sj158
Author

sj158 commented Feb 19, 2021

Thanks for your work! Hopefully this solves the problem once and for all.

Special thanks to @shuheiktgw and the other guys supporting this issue by describing their similar problem here.

@ghost

ghost commented Feb 19, 2021

This has been released in version 3.29.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost

ghost commented Mar 20, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2021
7 participants