Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Behavior of Variable Validation Against Data Blocks and Resources on First Apply #35846

Open
konradzagozda opened this issue Oct 12, 2024 · 6 comments

Comments

@konradzagozda
Copy link

Terraform Version

1.9.7

Terraform Configuration Files

variable "unused_variable" {
    description = "This variable is not used in the configuration"
    type        = string
    default     = "unused"

    validation {
        condition = var.unused_variable != null && startswith(data.aws_s3_bucket.example_bucket_data.bucket, "zzz")
        error_message = "test"
    }
}

resource "random_string" "bucket_suffix" {
  length  = 6
  special = false
  upper   = false
}

resource "aws_s3_bucket" "example_bucket" {
  bucket = "kz-${random_string.bucket_suffix.result}"
}


data "aws_s3_bucket" "example_bucket_data" {
  bucket = aws_s3_bucket.example_bucket.bucket
}

Debug Output

first apply success: https://gist.github.com/konradzagozda/cd5f98261795ece6561fac8a1f29723a
second apply fails: https://gist.github.com/konradzagozda/65a77d4d6228fef75f3043bcc9a8c1ca
destroy fails: https://gist.github.com/konradzagozda/d42619fe01c7536e5e4772508c9c4346

Expected Behavior

Variable validation could occur after it can reference the attributes of a resource.

Actual Behavior

  1. Validation passes every time when the resource was not present before the apply.
  2. After that, it works as expected.

Steps to Reproduce

  1. terraform init
  2. terraform apply - success
  3. terraform destroy|apply - fail

Additional Context

A relatively new feature for variable validation was released in 1.9: "Input variable validation rules can refer to other objects."

While I find this super useful for referencing other variables, it behaves strangely when I try to reference resource or data blocks.

For example, let's say I have a feature for my app that has external infrastructure dependencies. I'd like to use a data block to fetch that information to validate whether the feature can be enabled successfully.

    validation {
        condition = var.unused_variable != null && startswith(data.aws_s3_bucket.example_bucket_data.bucket, "zzz")
        error_message = "test"
    }

I would expect this to fail every time because the prefix doesn't match the expected name.

However, the behavior is as follows:

First apply: passes without issues.
Second apply: actually validates against the data block and produces the error message.
Destroy: blocks the destroy operation due to the validation.

I can see this being useful if validation that includes references to resources and blocks were moved to the final stage of the apply process, perhaps with the ability to roll back on failure.

References

No response

@konradzagozda konradzagozda added bug new new issue not yet triaged labels Oct 12, 2024
@jbardin
Copy link
Member

jbardin commented Oct 12, 2024

Hi @konradzagozda,

The result here is what I would expect, because you are validating something which cannot be known at the time of validation. This means it can't be evaluated until the next plan phase when all values have become known. It doesn't really make sense for an input variable to reference something which isn't know during plan, because there would be no way to resolve the dependencies in an order which could later be useful, hence it ends up in a deadlock like you've seen here.

This situation was one of the pitfalls associated with allowing references to other objects, and it doesn't seem that it was fully addressed. I'm not sure if we can fully prevent references like this now that the feature is released, but maybe it's possible to warn about them during the plan. It's possible that anytime a variable validation is unknown during plan it could result in creating an impasse during apply, so perhaps that should warrant a warning at least about the situation. Or perhaps that impasse is correct, and showing that something was incorrectly configured and must be changed in order to proceed, however I don't really like the fact that you can't destroy from the resulting state+config either.

As for the example presented here: the fact that you have a data source and a managed resource representing the same object means there's likely some other structural problem within your module which needs addressing first, so there may not be a need for the type of validation shown. I know this is a contrived example, but here the validation really should be a condition on the bucket itself or in a check block rather than in an input variable.

@jbardin jbardin added core and removed new new issue not yet triaged labels Oct 12, 2024
@konradzagozda
Copy link
Author

Hi @jbardin
Thanks for the clarifications. I could have been more concise with my example, I used an extra resource to see if the same issue happens with resource blocks—sorry for the redundancy.

It looks like the validation block runs against the last known state of terraform. Will this behaviour remain?
Perhaps validation could be delayed until a later stage for this type of validation in the current terraform run, and produce warnings at the end. It would allow validating the resulting infrastructure against the variable, not just the variable itself.

@jbardin
Copy link
Member

jbardin commented Oct 12, 2024

Thanks, that makes it a little more clear what aspect of the behavior you were concerned about.

The validation is run when the variable is evaluated, and since the data referenced by the condition is unknown during plan, it can't produce a result. Because your variable is not used elsewhere within the configuration, it's not defined whether validation will run or not during apply, but right now it's skipped during apply because it's not required to apply any changes. If you add that variable as in input to another resource, the apply will fail accordingly.

Regardless of whether it fails on the first apply or not, the validation is still happening too late to be of any use, because the condition isn't known until after it's already been applied. The end result is also the same, in that you have a state from which you cannot plan or destroy. This simpler version of the config demonstrates things more clearly:

variable "unused_variable" {
  type    = string
  default = "unused"
  validation {
    condition     = var.unused_variable == terraform_data.example.id
    error_message = "test"
  }
}
resource "terraform_data" "example" {
}

# uncomment this to fail during first apply
#resource "terraform_data" "fail" {
#  input = var.unused_variable
#}

We could issue a warning if the validation is unknown, but that's likely to cause a lot of noise for otherwise valid existing configurations. Maybe we could detect if it's unknown due to something in the same module, but again there are valid configs which could trigger it.

The main problem here is that you're not only validating the variable, you're using the variable's validation to check the status of something else unrelated. The variable validations are only meant to check the inputs into the module, which is why they were initially limited to only refer to themselves, and checking the results of other things within the module is going to cause confusing results like we've seen here. Since there should be no reason for an unused input variable, we're in kind of an edge case that maybe doesn't need to be cleanly resolved. If you want to add an arbitrary validation of resources, a variable is not the place to put it.

@konradzagozda
Copy link
Author

Got it, so I can validate against resources and data blocks, but the variable must be used to create the dependency graph correctly.
That's amazing, I can do:

resource "aws_s3_bucket" "example_bucket" {
  bucket = "${var.bucket_name_prefix}-ig29qi"
}

resource "terraform_data" "example" {
  input = var.unused_variable
}

variable "unused_variable" {
    description = "This variable is not used in the configuration"
    type        = string
    default     = "kz-sandbox-ig29qi.s3.amazonaws.com"

    validation {
        # argument that is not known until apply
        condition = var.unused_variable == aws_s3_bucket.example_bucket.bucket_domain_name 
        error_message = "test"
    }
}

In the state, I can see the dependency inferred based on this validation:

    {
      "mode": "managed",
      "type": "terraform_data",
      ...
      "instances": [
        {
          ...
          "dependencies": [
            "aws_s3_bucket.example_bucket"
          ]
        }
      ]
    }

For externally managed resources of different kinds, this could be powerful for validating whether the environment matches Terraform configuration expectations.

One issue I'm seeing is that if validation doesn't pass it blocks apply midway creating a deadlock with a partially applied configuration. Destroy will also validate variable that didn't have a chance to be used to create resource. Shall that be the case?

@konradzagozda
Copy link
Author

The root cause of the initial issue I was having is that an unused variable is not treated as a first-class entity for building the dependency graph, but rather as an intermediary.

If variables could have dependencies regardless of whether they are used or not, that would fix the issue.

    {
      "object_kind": "var",
      "config_addr": "var.unused_variable",
      "status": "pass",
      "objects": [
        {
          "object_addr": "var.unused_variable",
          "status": "pass"
        }
      ],
      dependencies: [ # new
          "some_resource.used_in_validation"
      ] 
    },

@jbardin
Copy link
Member

jbardin commented Oct 13, 2024

@konradzagozda, variables have dependencies just like anything else that contains references in the configuration, it wouldn't change anything to record them in the state. Resource dependencies are recorded in the state for when they need to be deleted and there is no longer an associated configuration.

Yes, the problem with using a variable validation to check something which has yet to be created is that it more likely to abort partway through the apply process, leaving a partially applied configuration. While I'd like to take care of the validation inconsistency presented here in some way such that it always works in a well-defined manner, the example use case still looks dubious when there are other means of validation or policy enforcement more relevant to the resource being checked. Input variable validation is designed to check the module input values, not to validate resource created by the module, and using it otherwise could still yield surprising behaviors because the validation error is not coming from the resource, but rather a variable which is not otherwise depended on by anything else the configuration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants