[net] Exclude net prio and classid cgroups to avoid conflict with cgroup2 #198

Merged
merged 1 commit into from Mar 11, 2021


@ghost ghost commented Mar 2, 2021

  • Unset CONFIG_CGROUP_NET_CLASSID and CONFIG_CGROUP_NET_PRIO, together with
    the dependent options CONFIG_NET_CLS_CGROUP and
    CONFIG_NETFILTER_XT_MATCH_CGROUP, in Kconfig to disable the net_cls and
    net_prio v1 cgroups.
    The system runs programs that use both cgroup v1 and v2: for example, docker
    uses net_prio and net_cls, while "ip vrf" uses cgroup2 socket matching.
    However, the Linux kernel does not allow net_prio or net_cls to be used
    at the same time as cgroup2 socket matching. See the comment in the source file:
    https://elixir.bootlin.com/linux/v4.19.156/source/include/linux/cgroup-defs.h#L745
    The related warning appears on startup: "sonic INFO kernel: [ 14.057746] cgroup:
    cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation".
    Disabling net_prio and net_cls prevents this conflict and lets programs
    that rely on cgroup2 socket matching work correctly.

Signed-off-by: Maksym Belei [email protected]

What I did
Resolves sonic-net/sonic-buildimage#6858
The following v1 cgroups have been disabled in the Linux kernel: net_prio and net_cls. The options that depend on them have been disabled too.
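
One way to confirm that a built kernel really has these options off is to inspect its `.config`. The sketch below is illustrative and not part of this PR. It assumes the standard kernel `.config` convention, where an enabled option appears as `CONFIG_X=y` or `CONFIG_X=m` and a disabled one appears as `# CONFIG_X is not set` or is absent. The function name `still_enabled` and the sample text are hypothetical.

```python
import re

# The four options named in this PR.
OPTIONS = [
    "CONFIG_CGROUP_NET_CLASSID",
    "CONFIG_CGROUP_NET_PRIO",
    "CONFIG_NET_CLS_CGROUP",
    "CONFIG_NETFILTER_XT_MATCH_CGROUP",
]

def still_enabled(config_text, options=OPTIONS):
    """Return the options that are set to y or m in the given .config text."""
    enabled = []
    for name in options:
        # Match e.g. "CONFIG_CGROUP_NET_PRIO=y" on a line of its own.
        if re.search(rf"^{re.escape(name)}=[ym]\s*$", config_text, re.MULTILINE):
            enabled.append(name)
    return enabled

# Hypothetical excerpt of a .config produced after this change.
sample = """\
CONFIG_CGROUPS=y
# CONFIG_CGROUP_NET_PRIO is not set
# CONFIG_CGROUP_NET_CLASSID is not set
"""
print(still_enabled(sample))  # prints []
```

An empty list means none of the four options is built in or built as a module in the inspected configuration.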

Why I did it
Using these v1 cgroups makes cgroup2 socket matching impossible. Here is the relevant comment in the Linux kernel source:
https://elixir.bootlin.com/linux/v4.19.156/source/include/linux/cgroup-defs.h#L745
Syslog with the related warning:
sonic INFO kernel: [ 14.057746] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation

Because some utilities in the system, such as ip vrf, use cgroup2, net_prio and net_cls have to be disabled to ensure that those utilities work correctly.
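
To check whether a given boot hit this conflict, one can search the kernel log for the warning quoted above. This is a minimal sketch, assuming the log text has already been read into a string (for example from `dmesg` output or a syslog file). The function name `socket_matching_disabled` and the second log line are hypothetical.

```python
# Text of the kernel warning quoted above, without host and timestamp prefix.
WARNING = "disabling cgroup2 socket matching due to net_prio or net_cls activation"

def socket_matching_disabled(log_text):
    """Return the log lines that contain the cgroup2 socket matching warning."""
    return [line for line in log_text.splitlines() if WARNING in line]

log = (
    # Warning line as reported in this PR.
    "sonic INFO kernel: [ 14.057746] cgroup: cgroup: disabling cgroup2 "
    "socket matching due to net_prio or net_cls activation\n"
    # Made-up unrelated line, only here to show that it is filtered out.
    "sonic INFO kernel: [ 14.100000] some unrelated message\n"
)
hits = socket_matching_disabled(log)
print(len(hits))  # prints 1
```

With net_prio and net_cls compiled out, the expectation is that this search returns no lines on a fresh boot.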

How I verified it

  1. sudo config vrf add mgmt
  2. sudo ip vrf exec mgmt ping {IP address of eth0 interface}, or show ntp (if NTP is configured through the eth0 interface)

If the cgroup conflict is present, the ip vrf utility cannot run commands within the scope of the VRF.
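
As a complement to the manual steps above, the v1 controllers known to the running kernel can be read from `/proc/cgroups`; a controller that was compiled out does not appear there. The sketch below parses that file's whitespace-separated columns (subsystem name, hierarchy ID, number of cgroups, enabled flag). It is not part of this PR's verification, and the function name and sample content are hypothetical.

```python
def present_controllers(proc_cgroups_text):
    """Return the set of controller names listed in /proc/cgroups content."""
    names = set()
    for line in proc_cgroups_text.splitlines():
        # Skip blank lines and the header line, which starts with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split()[0])
    return names

# Hypothetical /proc/cgroups content from a kernel built with this change.
sample = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpuset\t3\t1\t1\n"
    "memory\t5\t60\t1\n"
)
conflicting = {"net_cls", "net_prio"} & present_controllers(sample)
print(sorted(conflicting))  # prints []
```

On a real system the input would come from `open("/proc/cgroups").read()`; a non-empty result means net_cls or net_prio is still available in the kernel.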

@paulmenzel
Contributor

Thank you for the patch. Can you please add details on how to reproduce this? I would have imagined that, if there were a conflict, Linux would have a measure to prevent it.

@ghost
Author

ghost commented Mar 3, 2021

@paulmenzel, could you check the related defect, sonic-net/sonic-buildimage#6858? It describes how to reproduce the issue and contains the results of my investigation, with evidence of the conflict between cgroup v1 and v2 usage. Also, here is a third-party source with information about net_prio, net_cls and socket matching in cgroup2.

I am currently testing the change locally. Could somebody check it with a clean Jenkins build?

@paulmenzel
Contributor

Thank you for your quick reply. A commit message should be self-contained, in my opinion. I am going to check in the next few days, but it’d be great if you extended the commit message.

@ghost
Author

ghost commented Mar 3, 2021

The commit message has been updated. Could you check it?

@ghost ghost marked this pull request as ready for review March 3, 2021 09:30
@lguohan lguohan merged commit ba9558c into sonic-net:master Mar 11, 2021
@ghost
Author

ghost commented Mar 11, 2021

@lguohan, the issue occurs on 202012 too. I think we can cherry-pick the fix to 202012 without any changes. Could you mark it for 202012?

@dprital
Collaborator

dprital commented Mar 15, 2021

@daall - Can you please merge this to 202012?

daall pushed a commit that referenced this pull request Mar 16, 2021
…ing (#198)

Successfully merging this pull request may close these issues.

[ntp] manully added serveron frontpanel and mgmt not connecting after applying mgmt vrf with no ntp from dhcp