Additional checks for cni 0.4.0 failing (at least with the bridge plugin) #74

Closed
mrunalp opened this issue Jun 5, 2020 · 21 comments

@mrunalp
Member

mrunalp commented Jun 5, 2020

Here is the config I am trying to use in CRI-O:

{
  "cniVersion": "0.4.0",
  "name": "bridge-firewall",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "hairpinMode": true,
      "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ],
        "ranges": [
            [
               {
                   "subnet": "10.85.0.0/16",
                   "gateway": "10.85.0.1"
               }
            ]
        ]
      }
    },
    {
      "type": "firewall"
    }
  ]
}

However, I see that the additional checks in ocicni for version 0.4.0 and higher start failing:

Jun 05 13:57:28 localhost.localdomain crio[25444]: time="2020-06-05 13:57:28.849936252-07:00" level=error msg="Error checking network: Interface veth6e529a55 Mac doesn't match: 66:43:79:e6:a9:17 not found"
Jun 05 13:57:28 localhost.localdomain crio[25444]: time="2020-06-05 13:57:28.849949659-07:00" level=error msg="Error while checking pod to CNI network \"bridge-firewall\": Interface veth6e529a55 Mac doesn't match: 66:43:79:e6:a9:17 not found"
Jun 05 13:57:42 localhost.localdomain crio[25444]: time="2020-06-05 13:57:42.845894713-07:00" level=error msg="Error checking network: Interface vethc824d8d9 Mac doesn't match: e6:0f:c8:73:c9:ad not found"
Jun 05 13:57:42 localhost.localdomain crio[25444]: time="2020-06-05 13:57:42.845901992-07:00" level=error msg="Error while checking pod to CNI network \"bridge-firewall\": Interface vethc824d8d9 Mac doesn't match: e6:0f:c8:73:c9:ad not found"

If I compile the check out, then pods come up fine for me:

diff --git a/vendor/github.com/cri-o/ocicni/pkg/ocicni/ocicni.go b/vendor/github.com/cri-o/ocicni/pkg/ocicni/ocicni.go
index 85dcdcfe2..82bf12bb6 100644
--- a/vendor/github.com/cri-o/ocicni/pkg/ocicni/ocicni.go
+++ b/vendor/github.com/cri-o/ocicni/pkg/ocicni/ocicni.go
@@ -778,24 +778,8 @@ func (network *cniNetwork) addToNetwork(ctx context.Context, rt *libcni.RuntimeC
 func (network *cniNetwork) checkNetwork(ctx context.Context, rt *libcni.RuntimeConf, cni *libcni.CNIConfig, nsManager *nsManager, netns string) (cnitypes.Result, error) {
        logrus.Infof("About to check CNI network %s (type=%v)", network.name, network.config.Plugins[0].Network.Type)
-       gtet, err := cniversion.GreaterThanOrEqualTo(network.config.CNIVersion, "0.4.0")
-       if err != nil {
-               return nil, err
-       }
-
        var result cnitypes.Result
-
-       // When CNIVersion supports Check, use it.  Otherwise fall back on what was done initially.
-       if gtet {
-               err = cni.CheckNetworkList(ctx, network.config, rt)
-               logrus.Infof("Checking CNI network %s (config version=%v)", network.name, network.config.CNIVersion)
-               if err != nil {
-                       logrus.Errorf("Error checking network: %v", err)
-                       return nil, err
-               }
-       }
-
-       result, err = cni.GetNetworkListCachedResult(network.config, rt)
+       result, err := cni.GetNetworkListCachedResult(network.config, rt)

Is this something we have to fix in ocicni or upstream in the bridge plugin?

@mrunalp
Member Author

mrunalp commented Jun 5, 2020

@dcbw @mccv1r0 ptal.

@mccv1r0
Contributor

mccv1r0 commented Jun 8, 2020

@mrunalp Is this from a unit test? ocicnitool? I.e., is there a way I can reproduce it?

Error checking network: Interface veth6e529a55 Mac doesn't match: 66:43:79:e6:a9:17 not found

After a CNI ADD, info about the interface is cached (in case a CNI DEL is needed and, e.g., oci no longer has access to the info to supply it to DEL). It looks like CRI-O calls checkNetwork to verify that the cached info hasn't changed in the network namespace since it was added. This error indicates that (at least) the veth MAC found in the container is different from what was cached earlier. Or there is a bug somewhere.
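
To make the idea concrete, here is a minimal sketch of that kind of comparison, assuming a simplified model of the cached result. The cachedInterface type, the checkMac helper, and the MAC values are hypothetical and exist only for illustration; this is not the bridge plugin's or libcni's actual code, and the observed MACs are passed in as a map so the sketch runs without a real network namespace.

```go
package main

import (
	"fmt"
	"strings"
)

// cachedInterface is a hypothetical, trimmed-down stand-in for one interface
// entry in the result that was cached after ADD.
type cachedInterface struct {
	Name string
	Mac  string
}

// checkMac compares the cached MAC with the MAC observed now.
// observed maps interface name to current MAC and is supplied by the caller.
func checkMac(cached cachedInterface, observed map[string]string) error {
	mac, ok := observed[cached.Name]
	if !ok {
		return fmt.Errorf("interface %s not found", cached.Name)
	}
	// MAC addresses are compared case-insensitively.
	if !strings.EqualFold(mac, cached.Mac) {
		return fmt.Errorf("interface %s Mac doesn't match: %s not found", cached.Name, cached.Mac)
	}
	return nil
}

func main() {
	// Made-up values for illustration only.
	cached := cachedInterface{Name: "eth0", Mac: "aa:bb:cc:00:00:01"}

	// Nothing changed since ADD: the check returns a nil error.
	fmt.Println(checkMac(cached, map[string]string{"eth0": "aa:bb:cc:00:00:01"}))

	// The MAC differs from the cached one: the check returns an error.
	fmt.Println(checkMac(cached, map[string]string{"eth0": "aa:bb:cc:00:00:02"}))
}
```

If a check of this shape fails while the pod networking itself works, either the state really changed after ADD or the cached data and the lookup disagree about which interface to inspect.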

@mrunalp
Member Author

mrunalp commented Jun 8, 2020

@mccv1r0 The check is from the code in ocicni that I have pasted in the first comment. It looks like the check is failing for the bridge plugin so it probably needs to be fixed upstream for it.

@mccv1r0
Contributor

mccv1r0 commented Jun 8, 2020

I see the code. But what is failing? I.e., in what context?

I can add and check the cniVersion 0.4.0 config you provided above just fine using ocicnitool (upstream master branch):

$ sudo ./ocicnitool add mcc-cni-test0 mccPod0 mccId0 /var/run/netns/mcc-cni-test0 
INFO[0000] Found CNI network bridge-firewall (type=bridge) at /etc/cni/net.d/1-bridge-firewall.conflist 
INFO[0000] Found CNI network crionet_test_args (type=bridge) at /etc/cni/net.d/10-plugin_test-args.conf 
INFO[0000] Found CNI network crio-bridge (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf 
INFO[0000] Found CNI network v6UP (type=bridge) at /etc/cni/net.d/600-v6UP.conflist 
INFO[0000] Found CNI network v6LAN (type=bridge) at /etc/cni/net.d/601-v6LAN.conflist 
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
INFO[0000] Found CNI network nginx-net (type=bridge) at /etc/cni/net.d/nginx-net.conflist 
INFO[0000] Found CNI network podnet (type=bridge) at /etc/cni/net.d/podnet.conflist 
INFO[0000] Found CNI network test-network (type=bridge) at /etc/cni/net.d/test-network.conflist 
INFO[0000] Update default CNI network name to bridge-firewall 
INFO[0000] Got pod network &{Name:mccPod0 Namespace:mcc-cni-test0 ID:mccId0 NetNS:/var/run/netns/mcc-cni-test0 Networks:[] RuntimeConfig:map[]} 
INFO[0000] About to add CNI network bridge-firewall (type=bridge) 
IP: 10.85.0.7/16 (eth0 32:5a:e0:f2:86:af)
$
$
$ sudo ./ocicnitool status mcc-cni-test0 mccPod0 mccId0 /var/run/netns/mcc-cni-test0 
INFO[0000] Found CNI network bridge-firewall (type=bridge) at /etc/cni/net.d/1-bridge-firewall.conflist 
INFO[0000] Found CNI network crionet_test_args (type=bridge) at /etc/cni/net.d/10-plugin_test-args.conf 
INFO[0000] Found CNI network crio-bridge (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf 
INFO[0000] Found CNI network v6UP (type=bridge) at /etc/cni/net.d/600-v6UP.conflist 
INFO[0000] Found CNI network v6LAN (type=bridge) at /etc/cni/net.d/601-v6LAN.conflist 
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
INFO[0000] Found CNI network nginx-net (type=bridge) at /etc/cni/net.d/nginx-net.conflist 
INFO[0000] Found CNI network podnet (type=bridge) at /etc/cni/net.d/podnet.conflist 
INFO[0000] Found CNI network test-network (type=bridge) at /etc/cni/net.d/test-network.conflist 
INFO[0000] Update default CNI network name to bridge-firewall 
INFO[0000] Got pod network &{Name:mccPod0 Namespace:mcc-cni-test0 ID:mccId0 NetNS:/var/run/netns/mcc-cni-test0 Networks:[] RuntimeConfig:map[]} 
INFO[0000] About to check CNI network bridge-firewall (type=bridge) 
INFO[0000] Checking CNI network bridge-firewall (config version=0.4.0) 
IP: 10.85.0.7/16 (eth0 32:5a:e0:f2:86:af)

Notice that the line INFO[0000] Checking CNI network bridge-firewall (config version=0.4.0) is logged by the code you removed in your diff.

Inside the netns, "check" finds what it expects:

$ sudo ip netns exec mcc-cni-test0 ip addr show 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
13: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 32:5a:e0:f2:86:af brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.85.0.7/16 brd 10.85.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::305a:e0ff:fef2:86af/64 scope link 
       valid_lft forever preferred_lft forever
$

Thus, check passes.

When the code fails for you, what are you running? A CI test? k8s? What versions?

Perhaps (and this is just a guess, I need more info) the CNI and/or CNI Plugins being used need to be updated?

@mrunalp
Member Author

mrunalp commented Jun 8, 2020

@mccv1r0 I am running a local k8s cluster with cri-o master and k8s master with the version of ocicni vendored in cri-o master. I am using v0.8.6 for the CNI plugins. What version are the plugins in your test above?

@mccv1r0
Contributor

mccv1r0 commented Jun 9, 2020

@mrunalp I used the CNI plugins master branch. But check has been in for well over a year.

I tried again with v0.8.6 and things work as shown above. Is k8s pointing CNI to the proper netns?

@mrunalp
Member Author

mrunalp commented Jun 15, 2020

I was testing with manage_ns_lifecycle = true, which changes the way we handle network namespaces from the default of directly accessing the network namespace of the pod pid.

cc: @haircommander

@mrunalp
Member Author

mrunalp commented Jun 15, 2020

I also tested with that set to false and no change for me :/

@harche

harche commented Dec 2, 2020

I ran into the same issue today with upstream k8s master and crio master.

@haircommander
Member

Do you have cycles to look into it, @harche?

@harche

harche commented Dec 3, 2020

@haircommander A little busy with 4.7 bugs right now, but I can definitely look at it after that. Would that be fine?

@haircommander
Member

yeah that's fine!

@champtar

champtar commented Apr 20, 2023

I just faced a similar issue trying to migrate from containerd to cri-o. It seems containerd isn't calling CNI CHECK (err = cni.CheckNetworkList(ctx, network.config, rt) in ocicni), so we get some false positives (the plugin is working but the check part is buggy).
A workaround is to use cniVersion 0.3.1.
Here are 2 CNI plugin bug fixes: containernetworking/plugins#885 / containernetworking/plugins#887
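
The cniVersion 0.3.1 workaround relies on the version gate visible in the diff in the first comment: CHECK is only called when the config version is 0.4.0 or higher. Below is a rough standalone sketch of such a gate. The parseVersion and supportsCheck helpers are hypothetical and only handle plain "major.minor.patch" strings; ocicni itself uses cniversion.GreaterThanOrEqualTo, which this sketch does not reproduce.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a "major.minor.patch" string into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// supportsCheck reports whether cniVersion is at least 0.4.0, the first
// version for which the CHECK call is made in the code shown in the diff.
func supportsCheck(cniVersion string) (bool, error) {
	got, err := parseVersion(cniVersion)
	if err != nil {
		return false, err
	}
	minimum := [3]int{0, 4, 0}
	for i := range got {
		if got[i] != minimum[i] {
			return got[i] > minimum[i], nil
		}
	}
	return true, nil // exactly 0.4.0
}

func main() {
	for _, v := range []string{"0.3.1", "0.4.0", "1.0.0"} {
		ok, err := supportsCheck(v)
		fmt.Println(v, ok, err)
	}
}
```

With a gate like this, a config declaring 0.3.1 never reaches the CHECK call, which is why downgrading the version hides the failure without fixing the plugin.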

@haircommander
Member

But after those fixes, does the issue go away, @champtar?

@champtar

@haircommander For my use case (ptp and tuning plugins with my particular config), yes, the issue is fixed,
but other plugins and configs might uncover more bugs, as CHECK is not called by containerd.
It would be good to add an option to not call CHECK as a workaround when similar bugs are encountered.

@squeed
Collaborator

squeed commented Apr 25, 2023

There is also the option to disable check in the CNI config file itself. We anticipated that some configurations might fail check due to complicated chains. But, bugs are possible too :-/.

@champtar

Thanks @squeed! Just tried, but disableCheck seems to have no effect :(

@squeed
Collaborator

squeed commented Apr 25, 2023

That's not good! Note that disableCheck only works for config lists, not for singular configs. Maybe that's the issue?
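
As an illustration of the list-versus-singular distinction, here is a small sketch that reads disableCheck from the top level of a config list and rejects input with no "plugins" array. The confList struct and the checkDisabled helper are assumptions made for this example; this is not libcni's parser, only the shape of the lookup.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// confList is a hypothetical, minimal view of a CNI config list.
// disableCheck sits at the top level, next to "plugins".
type confList struct {
	CNIVersion   string            `json:"cniVersion"`
	Name         string            `json:"name"`
	DisableCheck bool              `json:"disableCheck"`
	Plugins      []json.RawMessage `json:"plugins"`
}

// checkDisabled reports whether the config list asks to skip CHECK.
// Input without a "plugins" array is treated as a singular config and rejected.
func checkDisabled(data []byte) (bool, error) {
	var l confList
	if err := json.Unmarshal(data, &l); err != nil {
		return false, err
	}
	if len(l.Plugins) == 0 {
		return false, fmt.Errorf("not a config list: no \"plugins\" array")
	}
	return l.DisableCheck, nil
}

func main() {
	list := []byte(`{
  "cniVersion": "0.4.0",
  "name": "bridge-firewall",
  "disableCheck": true,
  "plugins": [{"type": "bridge"}, {"type": "firewall"}]
}`)
	fmt.Println(checkDisabled(list))
}
```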

@champtar

> That's not good! Note that disableCheck only works for config lists, not for singular configs. Maybe that's the issue?

I'm using multus with a singular config. If I try to put the multus config in a list:

"cniVersion": "0.4.0",
"disableCheck": true,
"name": "multus-cni-network",
"plugins": [{
  "type": "multus-cni",

I get Error loading CNI config file /usr/lib/cni/net.d/00-multus.conf: error parsing configuration: missing 'type'

@champtar

It works OK if I move disableCheck inside the multus config.

@haircommander
Member

For posterity: this should have been fixed with CNI bridge plugin 1.2.0: containernetworking/plugins#809

@champtar I think you're hitting a different issue. Are you still encountering it? If so, can you open a new issue please?
