Mismatch between ot ping and net ping #34597

Closed
alexbarcelo opened this issue Apr 27, 2021 · 7 comments
Labels: area: Networking, area: OpenThread, bug

@alexbarcelo

Describe the bug

The ping command of the network subsystem (net ping) behaves differently from the one of the OpenThread subsystem (ot ping).

To Reproduce

Build an application for an nRF52840 board. The relevant flags, AFAICT, are:

CONFIG_NET_CONFIG_NEED_IPV6=y
CONFIG_NET_CONFIG_NEED_IPV4=n
CONFIG_NET_CONFIG_SETTINGS=y
CONFIG_NET_IPV6=y
CONFIG_NET_IPV4=n

CONFIG_NET_IPV6_NBR_CACHE=y
CONFIG_NET_IPV6_MLD=y

CONFIG_NET_L2_OPENTHREAD=y

CONFIG_NET_IPV6_NBR_CACHE=y
CONFIG_OPENTHREAD_SLAAC=y
CONFIG_SHELL=y
CONFIG_OPENTHREAD_SHELL=y
CONFIG_NET_IF_UNICAST_IPV6_ADDR_COUNT=6
CONFIG_NET_CONFIG_MY_IPV6_ADDR=""

Join the device to an OpenThread network. Check that the outputs of net ipv6 and ot ipaddr are consistent with each other. Then try to ping, for instance, the border router:

ot ping 2600:70ff:f0de:200::1
Done

16 bytes from 2600:70ff:f0de:200:0:0:0:1: icmp_seq=13 hlim=64 time=15ms

But this fails:

net ping 2600:70ff:f0de:200::1
PING 2600:70ff:f0de:200::1
Ping timeout
rtt:~$ [00:36:42.907,531] <err> net_otPlat_radio: Error while calling otIp6Send
rtt:~$ [00:36:43.908,416] <err> net_otPlat_radio: Error while calling otIp6Send
rtt:~$ [00:36:44.909,301] <err> net_otPlat_radio: Error while calling otIp6Send
rtt:~$ rtt:~$ 

I am also unable to configure the routes in order to perform pings to IPv6 addresses outside the Thread network, but that may be due to my ignorance or misconfiguration up to this point.

Expected behavior

Pings on the IPv6 network should just work.

Impact

I have connectivity issues when trying to configure other things (such as LWM2M). I am not sure if the problem is the same, but I don't know how to diagnose while the pings are behaving like this. Maybe it is intended behaviour but I cannot grasp why.

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK through PlatformIO. Just updated.
  • Build: Zephyr OS build zephyr-v20500
@rlubos
Contributor

rlubos commented Apr 27, 2021

Would be good to see net iface output. I've investigated a similar issue some time ago and it turned out that Zephyr selected some rubbish source address due to misconfiguration:
#30688 (comment)

@alexbarcelo
Author

alexbarcelo commented Apr 27, 2021

The net iface output is the following:

00> net iface
00> 
00> Interface 0x20000f64 (OpenThread) [1]
00> =====================================
00> Link addr : F4:CE:36:D7:AC:93:0B:2C
00> MTU       : 1280
00> IPv6 unicast addresses (max 6):
00>   fe80::44d1:78a8:a61:c74a autoconf preferred infinite
00>   fde7:7d74:a202:3406:5ffe:afc9:adc8:b6bb autoconf preferred infinite meshlocal
00>   2600:70ff:f0de:200:c761:be4b:47d1:8900 autoconf preferred infinite
00> IPv6 multicast addresses (max 8):
00>   ff02::1
00>   ff02::1:ff02:d8f
00>   ff02::1:ff93:b2c
00>   ff02::1:ff61:c74a
00>   ff02::1:ffc8:b6bb
00>   ff03::1
00>   ff03::fc
00>   ff33:40:fde7:7d74:a202:3406:0:1
00> IPv6 prefixes (max 2):
00>   <none>
00> IPv6 hop limit           : 64
00> IPv6 base reachable time : 30000
00> IPv6 reachable time      : 23903
00> IPv6 retransmit timer    : 0
00> rtt:~$ 

I have been trying to update all the codebase (I am using the ot-br-agent for the Raspberry Pi and the official OpenThread NCP build for the nRF stick). The behaviour is the same. I am not sure it is a Zephyr issue, but the Border Router seems to work and the errors

net_otPlat_radio: Error while calling otIp6Send

were bugging me. I assumed that this error was internal to the Zephyr networking stuff, as it seems to come from within the framework, but I have no idea if it is really an error or a harmless log.

@alexbarcelo
Author

Some additional things I have discovered, that may or may not be relevant here (I apologize for my lack of understanding).

If I add a route to the netdata and register that, the ot ping works for servers in the LAN. The following can be seen in the Zephyr device:

rtt:~$ net route

IPv6 routes for interface 0x20000f64 (OpenThread)
=================================================
rtt:~$ ot netdata show
Prefixes:
2600:70ff:f0de:200::/64 parosn med f800
Routes:
2600:70ff:f0de:0::/64 s med f800
Services:
(...)
Done

(Note how the route appears in the netdata but does not appear in the net subsystem).

The net ping doesn't work, the ot ping does. If I do a tcpdump on the Border Router, only the ot ping is seen in the dump, and it seems to work properly:

17:12:36.773835 IP6 2600:70ff:f0de:200:c761:be4b:47d1:8900 > machine.domain.lan: ICMP6, echo request, seq 27, length 16

@rlubos
Contributor

rlubos commented Apr 27, 2021

The unicast addresses you have on the interface seem fine - Zephyr should select the right address for an external destination.

The "Error while calling otIp6Send" message you're observing is actually the reason why net ping doesn't work. For whatever reason, OT is refusing to send the IP packet created by Zephyr. One possible reason is wrong source address selection, which is why I asked for the net iface output, but that doesn't seem to be the case for you.

One potential misconfiguration I see is

CONFIG_NET_IPV6_NBR_CACHE=y
CONFIG_NET_IPV6_MLD=y

These are usually disabled with OpenThread, since Thread networks don't use these mechanisms. I can't tell for sure whether this is the reason for the failures you observe (it could be that OT refuses to send ND messages), but it's certainly worth checking how it behaves with these options disabled.
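
A minimal sketch of that change in prj.conf, assuming every other option from the original report stays as it is. Note that the original fragment sets CONFIG_NET_IPV6_NBR_CACHE=y twice, so both occurrences would need to change:

```
# Thread networks do not use the IPv6 neighbor cache (ND) or MLD
CONFIG_NET_IPV6_NBR_CACHE=n
CONFIG_NET_IPV6_MLD=n
```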

I might try to reproduce in a similar setup when I get some time (no promises on when though).

@alexbarcelo
Author

You are right! Thank you! As I suspected, the problem was caused by my botched attempts at making things work.

Once both flags are set to n, net route complains with "Set CONFIG_NET_ROUTE to enable network route support.", which I assume is what we want, because the network system then works through the OpenThread machinery (as it should).

rtt:~$ net ping 2600:70ff:f0de::1
PING 2600:70ff:f0de::1
8 bytes from 2600:70ff:f0de::1 to 2600:70ff:f0de:200:c761:be4b:47d1:8900: icmp_seq=0 ttl=63 rssi=0 time=18 ms
8 bytes from 2600:70ff:f0de::1 to 2600:70ff:f0de:200:c761:be4b:47d1:8900: icmp_seq=1 ttl=63 rssi=0 time=13 ms
8 bytes from 2600:70ff:f0de::1 to 2600:70ff:f0de:200:c761:be4b:47d1:8900: icmp_seq=2 ttl=63 rssi=0 time=13 ms
rtt:~$ 

No need to reproduce it, that was right on the money. After solving it I was gonna say "wouldn't it be nice for the system to complain under that kind of misconfiguration?" but then I realized that a device may have multiple network interfaces, so it may not be appropriate to trigger a build failure (for instance, it is conceivable to have NBR things on a WiFi interface and an OpenThread interface for meshlocal-only stuff).

@rlubos
Contributor

rlubos commented Apr 27, 2021

After solving it I was gonna say "wouldn't it be nice for the system to complain under that kind of misconfiguration?"

Well, that's definitely something that needs addressing, specifically for multi-interface setups. We should rather have an option to enable/disable ND/MLD on each interface separately at runtime instead of relying on static configuration. OpenThread could then enforce that ND/MLD are disabled on its interface. But we're simply not there yet.

@KenMacD

KenMacD commented Jan 22, 2023

I also ran into this issue today. I'll second the suggestion that a build error would have been nice.
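
Until something like that exists upstream, an application whose only network interface is OpenThread could add its own compile-time guard. The following is only a sketch of that idea, not an existing Zephyr feature: the macro name APP_OT_ND_MLD_CONFLICT is hypothetical, and the guard relies on the assumption that the Zephyr build exposes bool Kconfig options set to y as defined CONFIG_* preprocessor macros and leaves options set to n undefined. As noted earlier in the thread, it would be wrong for devices that also have an interface which needs ND/MLD.

```c
/*
 * Hypothetical application-level guard, e.g. near the top of main.c.
 * Assumption: options set to y are visible as defined CONFIG_* macros,
 * options set to n are not defined at all.
 */
#if defined(CONFIG_NET_L2_OPENTHREAD) && \
    (defined(CONFIG_NET_IPV6_NBR_CACHE) || defined(CONFIG_NET_IPV6_MLD))
#define APP_OT_ND_MLD_CONFLICT 1
#else
#define APP_OT_ND_MLD_CONFLICT 0
#endif

/* Stop the build instead of failing later at runtime in otIp6Send. */
#if APP_OT_ND_MLD_CONFLICT
#error "OpenThread-only build: set CONFIG_NET_IPV6_NBR_CACHE=n and CONFIG_NET_IPV6_MLD=n"
#endif
```

With the configuration from the original report the build would stop at the #error line; once both options are set to n it compiles as usual.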
