Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset avc_cache_threshold to 512 as higher values cause performance issues #9923

Closed
wants to merge 1 commit into from
Closed

Reset avc_cache_threshold to 512 as higher values cause performance issues #9923

wants to merge 1 commit into from

Conversation

fcami
Copy link

@fcami fcami commented Sep 5, 2018

Reset avc_cache_threshold to 512 which is the default in RHEL7.
Higher values cause performance problems and any improvement is marginal for most workloads.
Rationale: SELinuxProject/selinux-kernel#34 (comment)
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1548428

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fcami
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jmencak

If they are not already assigned, you can assign the PR to them by writing /assign @jmencak in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 5, 2018
@papr-bot
Copy link

papr-bot commented Sep 5, 2018

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 5, 2018
@eparis
Copy link
Member

eparis commented Sep 5, 2018

I'd like to see a value in the 4k-16k range. 512 is a big inadequate when you have 200 containers on a node. Though clearly 64k goes way to far the other direction.

@jeremyeder
Copy link
Contributor

@fcami we need to be data-driven. Let's close this PR until we have run the tests. I created a JIRA card for our team to re-qualify this setting. Until then, there is no reason at all that a customer could not adjust that value if they run into issues. OK?

So, we will post a PR to openshift-ansible change this setting once those tests are complete (in the 4.0 timeframe).

@eparis
Copy link
Member

eparis commented Sep 5, 2018

/ok-to-test

we have data. 65k bad. 512 better. I'd rather, if we're stabbing in the dark, go in the middle. But I can wait until 4.0.

@openshift-ci-robot openshift-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 5, 2018
@fcami
Copy link
Author

fcami commented Sep 5, 2018

@jeremyeder I'm concerned that 64K is much too high a default. Maybe 512 is too low, but then, could we ship 3.11 with an intermediate value? It is not easy to find out that SELinux is the root cause when a cluster with much churn uses too much CPU kernel-side.

@DanyC97
Copy link
Contributor

DanyC97 commented Sep 5, 2018

my 0.02 $ from a user of Origin running few prod clusters with 2k pods each.

@jeremyeder how would you expect a user will be able to get to the bottom of this issue w/o having any metrics nor any KB with the info provided in the BZ above?

Imo i do agree with @fcami @eparis to sort it out now and maybe get a KB published asap rather than wait for 4.0.

@fcami
Copy link
Author

fcami commented Sep 5, 2018

Updated to 4K as per @eparis ' comment "in the 4k-16k range".

@jeremyeder
Copy link
Contributor

Just so you're all aware, this setting has been in place since before 3.0 GA and we've only rarely heard about any concerns. So I don't see how this is an emergency.

Regardless, I've already assigned the JIRA research card to a member of my team @ekuric and discussed with him having a recommendation in place for 3.11, before 3.11 code-freeze next Wednesday. He will update this issue accordingly.

Feel free to write a Kbase article to address releases prior to 3.11.

@sdodson
Copy link
Member

sdodson commented Sep 6, 2018

/hold
Whenever @eparis and @jeremyeder say this is good to go i'm fine pulling it in. I just want to avoid someone on my team slapping lgtm on this and it going in.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2018
@fcami
Copy link
Author

fcami commented Sep 6, 2018

Closing since this will be fixed in 3.11 as per previous comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants