Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled rate limit which causes other errors #537

Closed
uthark opened this issue Jul 22, 2019 · 1 comment
Closed

Unhandled rate limit which causes other errors #537

uthark opened this issue Jul 22, 2019 · 1 comment
Labels

Comments

@uthark
Copy link
Contributor

uthark commented Jul 22, 2019

We found the following behavior:

  1. Because of ENIs are released when API is down, causing potential outage #431, we decided to warming maximum number of ENIs that can be attached to an instance.
  2. We have medium size clusters (hundreds of nodes)
  3. During rolling out, we observed spike of AWS API Errors in the plugin.
  4. During investigation, we found the error in the plugin.

Steps to reproduce

  1. Setup WARM_ENI_TARGET to max available to your instance type.
  2. Rollout change in a big cluster.
  3. CNI Plugins on each nodes will start to warming ENIs.

ipamd logs:

2019-07-19T17:40:34Z [INFO]     Exceeded instance ENI attachment limit: 7
2019-07-19T17:40:34Z [ERROR]    Failed to attach ENI eni-029644ab1bc40fc71: AttachmentLimitExceeded: Interface count 9 exceeds the limit for m4.4xlarge
        status code: 400, request id: b14ba702-ca17-4772-a0a8-7c577257d2ab
2019-07-19T17:40:34Z [DEBUG]    Trying to delete ENI: eni-029644ab1bc40fc71

Upon further investigation, we found the issue in AllocENI function:

func (cache *EC2InstanceMetadataCache) AllocENI(useCustomCfg bool, sg []*string, subnet string) (string, error) {
eniID, err := cache.createENI(useCustomCfg, sg, subnet)
if err != nil {
return "", errors.Wrap(err, "AllocENI: failed to create ENI")
}
cache.tagENI(eniID)
attachmentID, err := cache.attachENI(eniID)
if err != nil {
_ = cache.deleteENI(eniID)
return "", errors.Wrap(err, "AllocENI: error attaching ENI")
}
// also change the ENI's attribute so that the ENI will be deleted when the instance is deleted.
attributeInput := &ec2.ModifyNetworkInterfaceAttributeInput{
Attachment: &ec2.NetworkInterfaceAttachmentChanges{
AttachmentId: aws.String(attachmentID),
DeleteOnTermination: aws.Bool(true),
},
NetworkInterfaceId: aws.String(eniID),
}
start := time.Now()
_, err = cache.ec2SVC.ModifyNetworkInterfaceAttribute(attributeInput)
awsAPILatency.WithLabelValues("ModifyNetworkInterfaceAttribute", fmt.Sprint(err != nil)).Observe(msSince(start))
if err != nil {
awsAPIErrInc("ModifyNetworkInterfaceAttribute", err)
err := cache.FreeENI(eniID)
if err != nil {
awsUtilsErrInc("ENICleanupUponModifyNetworkErr", err)
}
return "", errors.Wrap(err, "AllocENI: unable to change the ENI's attribute")
}
log.Infof("Successfully created and attached a new ENI %s to instance", eniID)
return eniID, nil
}

  1. If ModifyNetworkInterfaceAttribute call was unsuccessful (it was in our case - cloudtrail logs indicates that this call was rate-limited), then plugin will try to delete ENI.
  2. ENI can't be deleted because it is attached to an instance.
62019-07-19T17:25:55Z [DEBUG]	Trying to delete ENI: eni-0a0b3f58e0fe6d7d0
2019-07-19T17:29:30Z [DEBUG]	Not able to delete ENI yet (attempt 1/20): InvalidParameterValue: Network interface 'eni-0a0b3f58e0fe6d7d0' is currently in use.
	status code: 400, request id: 9c360487-187e-4573-a072-4a234b8d1946 

This error causes another issue: we will have an ENI instance that is attached to an EC2 instance, but CNI Plugin doesn't know about it, and it will try to warm another CNI plugin and it will fail wit the error message AttachmentLimitExceeded.

The reason CNI Plugin get this error is because it was not properly deleted. As a result, internal datastore is not updated. And on the next call of ìncreasIPPool it will fail at

if c.dataStore.GetENIs() < c.maxENI {
c.tryAllocateENI()
c.updateLastNodeIPPoolAction()
Because we have 8 ENIs attached to the instance out of possible 8 ENI, but ENI thinks that we have only 7 attached and will try to provision another.

@uthark
Copy link
Contributor Author

uthark commented Jul 22, 2019

@mogren Could you please take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants