
Kubernetes nginx-ingress controller cannot create load balancer #183

Closed
1 of 4 tasks
jens-ghc opened this issue Oct 26, 2018 · 16 comments

Comments

@jens-ghc

jens-ghc commented Oct 26, 2018

I have issues

I'm submitting a...

  • [x] bug report
  • [ ] feature request
  • [ ] support request
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

No ELB or NLB load balancer is created by EKS when using the nginx ingress controller.

If this is a bug, how to reproduce? Please include a code sample if relevant.

When defining an nginx ingress controller in Kubernetes (using the defaults, which use the classic ELB load balancer) and watching the output in the Kubernetes dashboard, the following error message is shown:

Error creating load balancer (will retry): failed to ensure load balancer for service ingress-nginx/ingress-nginx: AccessDenied: User: arn:aws:sts::xxxxxxx:assumed-role/eks-cluster/xxxxxxx is not authorized to perform: ec2:DescribeAccountAttributes status code: 403, request id: 84509652-d8ec-11e8-92e4-c38efc9058a2

What's the expected behavior?

A load balancer should be created

Are you able to fix this problem and submit a PR? Link here if you have already.

This problem is probably related to #103. The problem is fixed when adding this role policy:

resource "aws_iam_role_policy" "eks_cluster_ingress_loadbalancer_creation" {
  name   = "eks-cluster-ingress-loadbalancer-creation"
  role       = "${aws_iam_role.cluster.name}"
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeAccountAttributes"
      ],
      "Resource": "*"
    }
  ]
}
POLICY
}

Environment details

  • Affected module version:
  • OS:
  • Terraform version:

Any other relevant info

@max-rocket-internet
Contributor

That's very odd. ec2:DescribeAccountAttributes is not included in the default policy written by AWS. And also, I installed 2 nginx-ingress controllers last week without any problems. Both with classic ELBs.

@jens-ghc
Author

jens-ghc commented Oct 30, 2018

I created two completely new EKS clusters this week in us-west-2 and in both cases the load balancers were not created by the ingress controller. I'm currently using the new NLB, but when I tried with the classic ELB the same problem happened. Only when I then manually added the ec2:DescribeAccountAttributes permission did the load balancer get created.

@aalimovs

I just created a new cluster and saw the same error, but in the end it did create the load balancer without doing any changes to the IAM policies.

@max-rocket-internet
Contributor

Hmmm. I don't really understand why this is.

The EKS service should be using a service-linked role named AWSServiceRoleForElasticLoadBalancing, which includes this permission:
https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/elb-service-linked-roles.html
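If that service-linked role doesn't exist yet in an account (e.g. no load balancer was ever created there), one possible workaround is to pre-create it with Terraform. This is only a minimal sketch of that idea, not something this module or the AWS docs prescribe here:

resource "aws_iam_service_linked_role" "elasticloadbalancing" {
  # Pre-creates AWSServiceRoleForElasticLoadBalancing so the first
  # LoadBalancer Service doesn't have to create it on the fly.
  aws_service_name = "elasticloadbalancing.amazonaws.com"
}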

@jens-ghc
Author

jens-ghc commented Nov 2, 2018

I think there were a couple of issues around the service-linked role before. My issue seems to match #87 and #103 (and also this StackOverflow issue): the problem seems to be caused by the ingress controller trying to create the very first load balancer in that specific AWS account. As stated in the other linked issues, the problem might not occur if there is already another load balancer active in the AWS account. In my case there is no other load balancer, which might trigger the issue.

In order to fix the issue, I see two possible paths. Wanted to see what you think about them:

  • Option 1: Add the permission as shown in my fix above
  • Option 2: Instead, expose the name of the created cluster role as an output of the module. Users that run into the issue can then manually add the permission in their own code. The added benefit is that, through this hook, additional permissions can be assigned to that role if someone needs them.

Option 2 would look something like this: In terraform-aws-modules-terraform-aws-eks/outputs.tf add this output:

output "cluster_iam_role_name" {
  value       = "${aws_iam_role.cluster.name}"
}

Then in the consumer of the EKS module, do this:

resource "aws_iam_role_policy" "eks_cluster_ingress_loadbalancer_creation" {
name   = "eks-cluster-ingress-loadbalancer-creation"
role       = "${module.eks.cluster_iam_role_name}"
policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeAccountAttributes"
      ],
      "Resource": "*"
    }
  ]
}
POLICY
}

@max-rocket-internet
Contributor

The problem seems to be caused by the ingress controller trying to create the very first load balancer in that specific AWS account

I think this is correct. I asked in the AWS Slack org and someone said the same:

[screenshot of the Slack reply, 2018-11-02, saying the same]

I think option 1 is better and cleaner than option 2, but it would be even better, and make more sense, for AWS to add this to THEIR policy. I'll ask them.

@jens-ghc
Author

jens-ghc commented Nov 3, 2018

Thanks for researching this @max-rocket-internet!

@jens-ghc
Author

jens-ghc commented Nov 3, 2018

Also one more data point: after I added the permission and the ingress was able to create the NLB, I completely destroyed the EKS cluster. At that point there was no load balancer left in that AWS account. When I then created a new EKS cluster, I ran into the same permission issue as before. So if temporarily creating a first ELB is supposed to permanently get rid of this problem, it might require a load balancer outside of EKS (as mentioned by Chris Hein above). I haven't tested this though.

@mmcaya
Contributor

mmcaya commented Nov 5, 2018

@jens-totemic: Adding/removing the load balancer is a red herring, as that will only result in the provisioning of the required service-linked role for continued ELB operation. This is happening regardless of the presence of the service-linked role.

The root cause is that the EKS cluster policy arn:aws:iam::aws:policy/AmazonEKSClusterPolicy does not have all the IAM permissions AWS has documented as required for a CreateLoadBalancer API call (e.g. the call used in the Kubernetes AWS load balancer cloud provider package). See here: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/elb-api-permissions.html

It is actually missing 3 permissions:

ec2:DescribeAccountAttributes
ec2:DescribeAddresses
ec2:DescribeInternetGateways
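
For illustration, a variant of the role policy from the issue description that grants all three could look like this (the resource and policy names here are just examples, and it reuses the module-internal aws_iam_role.cluster reference from the fix above):

resource "aws_iam_role_policy" "eks_cluster_elb_api_permissions" {
  name   = "eks-cluster-elb-api-permissions"
  role   = "${aws_iam_role.cluster.name}"
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeAccountAttributes",
        "ec2:DescribeAddresses",
        "ec2:DescribeInternetGateways"
      ],
      "Resource": "*"
    }
  ]
}
POLICY
}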

I'm not sure how these permissions are being used in the specific setup exposing the error; I will try a simple test to reproduce it via a CLI call. For reference, could you provide the ingress controller config you are using?

As @max-rocket-internet noted, the real solution is getting AWS to correct the cluster policy, or at least having them note why those three documented permissions were omitted. When they added iam:CreateServiceLinkedRole to the cluster policy a little while back, it rendered the module fixes for #87 moot and the fix was eventually reverted; that should be avoided here if possible.

I'll follow up in the AWS #kubernetes channel later to see what additional info they can provide.

@jens-ghc
Author

jens-ghc commented Nov 5, 2018

@mmcaya I'm setting up the ingress using the standard configuration files provided by Kubernetes, like this:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/service-nlb.yaml

Thanks for investigating this further!

@pangorgo

pangorgo commented Dec 5, 2018

The same happened to me. In my case, tagging was the reason. As the EKS documentation states, EKS adds some tags to resources like the VPC or subnets: https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html

As you might expect, managing tags with Terraform and with EKS itself at the same time can end up in some unpredictable situations. In my case, I first ran this module and it created EKS, which tagged the subnets and VPC. I made some mistakes in the configuration, so I iterated a bit and ran terraform a couple of times. Terraform obviously removed the EKS-managed tags, which are kubernetes.io/cluster/<cluster name>=shared. That's why, when I was finally done with the Terraform part and provisioned a LoadBalancer service on top of EKS, it was not able to discover the proper subnet.

Then I stumbled upon this issue: hashicorp/terraform#6632 , where I found a solution that worked for me.

Basically I've added this lifecycle rule to my VPC and subnets, to prevent Terraform from removing the EKS-managed tags:

  lifecycle {
    ignore_changes = ["tags.%", "tags.kubernetes.io/cluster"]
  }

What is interesting is that you don't have to provide the full tag name; a prefix is enough. This is really important in this case because after .../cluster/ there is a dynamic part of the tag containing the cluster name.
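
For context, a minimal sketch of how that lifecycle block fits into a subnet resource (Terraform 0.11 style; the resource name and CIDR are hypothetical). The subnet only declares its own tags and leaves the EKS-managed kubernetes.io/cluster/<cluster name> tag alone:

resource "aws_subnet" "example" {
  vpc_id            = "${aws_vpc.example.id}"
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-west-2a"

  tags = {
    Name = "eks-example-subnet"
  }

  lifecycle {
    # Ignore the tag count and any tag key starting with "kubernetes.io/cluster"
    ignore_changes = ["tags.%", "tags.kubernetes.io/cluster"]
  }
}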

@max-rocket-internet
Contributor

@pangorgo

pangorgo commented Dec 6, 2018

Yeah, the example is legit as long as you manage the VPC and EKS in the same directory (root module).
In my case I wanted to manage the VPC Terraform code and the EKS code in separate directories, and that's where the problem arises. The lifecycle solution is good enough for me.

@max-rocket-internet
Contributor

I'm gonna close this now. Feel free to reopen if it's still an issue.

@ixtli

ixtli commented Mar 15, 2019

I just wanna leave a note here because I'm trying to build a Terraform config that creates an EKS cluster in a new VPC with new subnets, installs helm/tiller, and then installs some packages, all in one script. I was pulling my hair out about this until I read @pangorgo say

As you might expect, managing tags with Terraform and with EKS itself at the same time can end up in some unpredictable situations. In my case, I first ran this module and it created EKS, which tagged the subnets and VPC. I made some mistakes in the configuration, so I iterated a bit and ran terraform a couple of times. Terraform obviously removed the EKS-managed tags, which are kubernetes.io/cluster/<cluster name>=shared. That's why, when I was finally done with the Terraform part and provisioned a LoadBalancer service on top of EKS, it was not able to discover the proper subnet.

which made me check the tags on my subnets. Only then did I find out that I never actually checked whether the tags were applied right. I was doing

resource "aws_subnet" "public" {
  count             = "${length(data.aws_availability_zones.available.names)}"
  vpc_id            = "${aws_vpc.current.id}"
  cidr_block        = "10.${var.cidr_b_class}.${count.index * 16}.0/20"
  availability_zone = "${data.aws_availability_zones.available.names[count.index]}"

  tags = {
    Name = "${var.name}-public-az-${count.index}"

    "kubernetes.io/cluster/${local.cluster_name}" = "shared"
  }

  depends_on = ["aws_internet_gateway.default"]
}

Turns out there was a reason why this was done with the map() interpolation in the docs! To be clear for anyone else new to this: YOU CAN'T USE INTERPOLATIONS IN MAP KEYS THIS WAY. The vim plugin happily highlighted "kubernetes.io/cluster/${local.cluster_name}" = "shared", but the key was passed literally as "kubernetes.io/cluster/${local.cluster_name}".

EDIT:

Figured I should post the correct resource for completeness:

resource "aws_subnet" "public" {
  count             = "${length(data.aws_availability_zones.available.names)}"
  vpc_id            = "${aws_vpc.current.id}"
  cidr_block        = "10.${var.cidr_b_class}.${count.index * 16}.0/20"
  availability_zone = "${data.aws_availability_zones.available.names[count.index]}"

  tags = "${
		map(
    	"Name", "${var.name}-public-az-${count.index}",
    	"kubernetes.io/cluster/${local.cluster_name}" , "shared"
		)
	}"

  depends_on = ["aws_internet_gateway.default"]
}

ivan-sukhomlyn added a commit to ivan-sukhomlyn/terraform-aws-eks that referenced this issue May 31, 2020
AmazonEKSClusterPolicy IAM policy doesn't contain all necessary
permissions to create ELB service-linked role required during
LB creation on AWS with K8S Service.

terraform-aws-modules#900
terraform-aws-modules#183 (comment)
dpiddockcmp pushed a commit that referenced this issue Jun 28, 2020
…ster (#902)

AmazonEKSClusterPolicy IAM policy doesn't contain all necessary permissions to create ELB service-linked role required during LB provisioning at AWS by K8S Service.

#900
#183 (comment)
@github-actions

github-actions bot commented Dec 2, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 2, 2022