Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gnrc_ipv6_nib: don't ignore PL for route #10627

Merged
merged 1 commit into from
Mar 24, 2020

Conversation

miri64
Copy link
Member

@miri64 miri64 commented Dec 18, 2018

Contribution description

When pinging to a prefix for which there is a prefix list entry on the node (so no next hop) but a default route, a packet to a non-existent address under that prefix results in the packet being forwarded to the default route instead (see this thread on the devel mailing list). This fixes it, so the node tries address resolution on the interface the prefix list entry is associated to.

Testing procedure

Flash the gnrc_border_router example and try to ping a non-existent address from the host under the prefix advertised by the UHCP server (e.g. 2001:db8::1234). Without this fix you will get Hop limit exceeded messages, since the package circles over the ethos interface (which is the default route of the border router) until the hop limit exceeds. With this PR, neighbor solicitations to the solicited nodes multicast address of 2001:db8::1234 (ff02::1:ff00:1234) go out the WPAN interface.

Issues/PRs references

None

@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: network Area: Networking labels Dec 18, 2018
@cladmi
Copy link
Contributor

cladmi commented Dec 18, 2018

I tested on IoT-LAB with iotlab-m3. The server has an IPv6 default route.

One node uses ethos using 2001:db8::1/64 tap address with a sniffer on channel 11 and flashed with: ETHOS_BAUDRATE=500000 DEFAULT_CHANNEL=11 BOARD=iotlab-m3 IOTLAB_NODE=m3-8.grenoble.iot-lab.info make -C examples/gnrc_border_router/ clean flash.

With both master and this PR on the node I get the same output for ping6:

> ping6 2001:db8::1234
ping6 2001:db8::1234
ping timeout
ping timeout
ping timeout

But with the PR I see NS packets on the radio sniffer:

tcpdump  -r pr_ping6.pcap.pcap  -q | head -n 5
reading from file pr_ping6.pcap.pcap, link-type IEEE802_15_4 (IEEE 802.15.4 with FCS)
19:30:28.008129 IEEE 802.15.4 Data packet 
19:30:29.007616 IEEE 802.15.4 Data packet 
19:30:30.008690 IEEE 802.15.4 Data packet 
19:30:32.004247 IEEE 802.15.4 Data packet 
19:30:33.003704 IEEE 802.15.4 Data packet 

On master, when pinging on the linux machine (IoT-LAB server that has an IPv6 default route) after a long time I got the 'Hop limit reached' that came from a host somewhere after the default route.
And I had no packets on the radio interface.

ping6 2001:db8::1234                                                                                                                                                            
PING 2001:db8::1234(2001:db8::1234) 56 data bytes                                                                                                                                                  
From 2001:660:5307:30ff::5 icmp_seq=458 Time exceeded: Hop limit     

With the PR I see the NS on the radio interface.

@cladmi
Copy link
Contributor

cladmi commented Dec 18, 2018

As I do not really understand how the change is done I asked @miri64 irl if the comment should not need to be updated.

@cladmi
Copy link
Contributor

cladmi commented Dec 18, 2018

Edit from my message before, the Time exceeded was from the other interface address not another one on the network.

When run from an iotlab-a8 I get the Time exceeded instantly and repeatedly. I can also monitor the packets with tcpdump that have decreasing ttl.

@cladmi
Copy link
Contributor

cladmi commented Dec 18, 2018

This now prevents me to do the behavior that is done by tunslip6 to have an address from the advertised prefix as tap0 interface.
So basically set 2001:db8::/64 as prefix and set 2001:db8::1/64 to tap0 and ping the host on this address. It was working in master. It maybe was not supposed to be supported but I could ping the host before.

Unrelated issue I think, but when trying to ping to a public ip address that is reachable through the default route, the riot node is using a link local address as source.
2018-12-18-222637_2560x1440_scrot
It also uses a link local address when pinging an address on the same host as ethos but then I can get the ping answer.

@miri64
Copy link
Member Author

miri64 commented Jan 9, 2019

So basically set 2001:db8::/64 as prefix and set 2001:db8::1/64 to tap0 and ping the host on this address. It was working in master. It maybe was not supposed to be supported but I could ping the host before.

IMHO this is wrong on tunslip6's part. The 2001:db8::/64 prefix is the prefix for the WPAN, so adding 2001:db8::1/64 to tap0 (which is topologically one hop before the WPAN) is wrong.

        o tap0
        |
        o BR
  -------------- WPAN (2001:db8::/64)
       / \
      o   o
     /   / \
    o   /   o
   / \ /
  o   o

@miri64
Copy link
Member Author

miri64 commented Jan 9, 2019

Unrelated issue I think, but when trying to ping to a public ip address that is reachable through the default route, the riot node is using a link local address as source.
2018-12-18-222637_2560x1440_scrot
It also uses a link local address when pinging an address on the same host as ethos but then I can get the ping answer.

Maybe #10499 fixes that. Something similar was reported there (the border router waited for a RA on its Ethernet interface, since there was no distinction between non-6LN and 6LN interfaces and so the global address never became valid).

@miri64
Copy link
Member Author

miri64 commented Jan 29, 2019

So basically set 2001:db8::/64 as prefix and set 2001:db8::1/64 to tap0 and ping the host on this address. It was working in master. It maybe was not supposed to be supported but I could ping the host before.

IMHO this is wrong on tunslip6's part. The 2001:db8::/64 prefix is the prefix for the WPAN, so adding 2001:db8::1/64 to tap0 (which is topologically one hop before the WPAN) is wrong.

        o tap0
        |
        o BR
  -------------- WPAN (2001:db8::/64)
       / \
      o   o
     /   / \
    o   /   o
   / \ /
  o   o

@tcschmidt @waehlisch @emmanuelsearch do you have any idea about this?

@tcschmidt
Copy link
Member

So basically set 2001:db8::/64 as prefix and set 2001:db8::1/64 to tap0 and ping the host on this address. It was working in master. It maybe was not supposed to be supported but I could ping the host before.

IMHO this is wrong on tunslip6's part. The 2001:db8::/64 prefix is the prefix for the WPAN, so adding 2001:db8::1/64 to tap0 (which is topologically one hop before the WPAN) is wrong.

        o tap0
        |
        o BR
  -------------- WPAN (2001:db8::/64)
       / \
      o   o
     /   / \
    o   /   o
   / \ /
  o   o

@tcschmidt @waehlisch @emmanuelsearch do you have any idea about this?

I'm not sure I understand all details, but if tap0 is the upstream of the BR, then it should have a prefix different from the prefix in the WPAN. 2001:db8::1/64 could be a typical address of the WPAN interface of the BR, so the downstream.

@miri64
Copy link
Member Author

miri64 commented Jan 29, 2019

I'm not sure I understand all details, but if tap0 is the upstream of the BR, then it should have a prefix different from the prefix in the WPAN. 2001:db8::1/64 could be a typical address of the WPAN interface of the BR, so the downstream.

That confirms my suspicion that the tunslip tools are enforcing wrong behavior.

@miri64
Copy link
Member Author

miri64 commented May 6, 2019

Ping?

@cladmi
Copy link
Contributor

cladmi commented May 6, 2019

As this changes the current behavior, it would be nice to document it in the commit.
Also are there anythings that should be re-tested in regards to the changed route handling ?

@miri64
Copy link
Member Author

miri64 commented May 6, 2019

As this changes the current behavior, it would be nice to document it in the commit.

Added an empty squash commit for that (5ea42f7 when writing this post).

Also are there anythings that should be re-tested in regards to the changed route handling ?

I suggest retesting the things above.

@miri64
Copy link
Member Author

miri64 commented Mar 6, 2020

sliptty is now merged (#10477) which provides an alternative to tunslip which, if I read #10627 (comment) correctly, configured the routes wrongly.

@benpicco
Copy link
Contributor

benpicco commented Mar 24, 2020

I can reproduce the issue:

Without this patch, when I restart the border router while pinging a node I get

64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=571 ttl=63 time=30.6 ms
64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=572 ttl=63 time=30.5 ms
64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=573 ttl=63 time=29.6 ms
From fdea:dbee:f::1 icmp_seq=574 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=575 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=576 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=577 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=578 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=579 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=580 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=581 Time exceeded: Hop limit
…
From fdea:dbee:f::1 icmp_seq=622 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=623 Time exceeded: Hop limit
64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=624 ttl=63 time=29.8 ms
64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=625 ttl=63 time=30.1 m

until I restart the node or wait 1 minute.

With this patch routing continues right after the border router received a prefix again:

64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=722 ttl=63 time=30.5 ms
64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=723 ttl=63 time=30.8 ms
From fdea:dbee:f::1 icmp_seq=724 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=725 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=726 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=727 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=728 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=729 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=730 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=731 Time exceeded: Hop limit
From fdea:dbee:f::1 icmp_seq=732 Time exceeded: Hop limit
64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=740 ttl=63 time=29.9 ms
64 bytes from 2001:db8::3c8b:2187:3d8b:2186: icmp_seq=741 ttl=63 time=29.4 ms

@miri64
Copy link
Member Author

miri64 commented Mar 24, 2020

What does "routing continues" mean in this context?

@benpicco
Copy link
Contributor

The message gets delivered to the right node.

@miri64
Copy link
Member Author

miri64 commented Mar 24, 2020

So your assessment is, this fixes the issue?

@benpicco
Copy link
Contributor

benpicco commented Mar 24, 2020

I would say so.
I also read the discussion again and it makes sense that the BR should send out the packet to an unknown address if it's in the prefix of the PAN.

It also makes sense that the PAN and the upstream interface should have different prefixes. (But I made that mistake too at some point. Might start a wiki page with common IPv6 mistakes, those things just tend to work and first and then introduce subtle errors. Just like those self-assembling 6LoWPAN nodes…)

Please squash. (don't forget you added a commit message there in the last commit, so don't accidentally fixup)

@miri64
Copy link
Member Author

miri64 commented Mar 24, 2020

Please squash. (don't forget you added a commit message there in the last commit, so don't accidentally fixup)

That's why I marked it with squash! so git rebase -i --autosquash picks it up correctly ;-)

When pinging to a prefix for which there is a prefix list entry on the
node (so no next hop) but a default route, a packet to a non-existent
address under that prefix results in the packet being forwarded to the
default route instead. This fixes it, so the node tries address
resolution on the interface the prefix list entry is associated to.
@miri64
Copy link
Member Author

miri64 commented Mar 24, 2020

Squashed!

@miri64 miri64 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Mar 24, 2020
Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense and improves the routing behavior.

@miri64 miri64 merged commit e47c8fa into RIOT-OS:master Mar 24, 2020
@miri64 miri64 deleted the gnrc_ipv6_nib/fix/forward-to-pl branch March 24, 2020 20:50
miri64 added a commit to miri64/RIOT that referenced this pull request Mar 25, 2020
In 06aa65e (RIOT-OS#10627) a new behavior was
introduced in IPv6 route resolution to try address resolution only at
interfaces that have the prefix of the address to be resolved configured
in the prefix list. This however only makes sense, if the prefix
configured is [on-link], otherwise there is small likelihood of the
address to be resolved being on that link.

For the error case presented for 06aa65e (circular routing at the border
router) this made sense, however within a 6LoWPAN, due to the prefix
being valid for the entire mesh, this leads to the nodes always trying
classic address resolution for in-network addresses instead of just
routing to the default route.
Classic address resolution however fails, as 6LoWPAN hosts typically
[don't join the solicited-node multicast address of their unicast
addresses][6LN-iface-init], resulting in in-network addresses not being
reachable.

As such, to prevent both error cases

- the fallback to address resolution by prefix list must only be used
  when the prefix is on-link,
- the prefix configured by DHCPv6/UHCP at the 6LoWPAN border router
  must be configured as on-link, but
- the prefix must not be advertised as on-link within the 6LoWPAN to
  still be [in line with RFC 6775][RFC-6775-forbidden]

With this change these cases are covered.

[on-link]: https://tools.ietf.org/html/rfc4861#page-6
[RFC 6775]: https://tools.ietf.org/html/rfc6775
[6LN-iface-init]: https://tools.ietf.org/html/rfc6775#section-5.2
[RFC-6775-forbidden]: https://tools.ietf.org/html/rfc6775#section-6.1
@miri64
Copy link
Member Author

miri64 commented Mar 25, 2020

Sadly, this broke routing within the a LoWPAN. See #13709.

miri64 added a commit to miri64/RIOT that referenced this pull request Mar 25, 2020
In 06aa65e (RIOT-OS#10627) a new behavior was
introduced in IPv6 route resolution to try address resolution only at
interfaces that have the prefix of the address to be resolved configured
in the prefix list. This however only makes sense, if the prefix
configured is [on-link], otherwise there is small likelihood of the
address to be resolved being on that link.

For the error case presented for 06aa65e (circular routing at the border
router) this made sense, however within a 6LoWPAN, due to the prefix
being valid for the entire mesh, this leads to the nodes always trying
classic address resolution for in-network addresses instead of just
routing to the default route.
Classic address resolution however fails, as 6LoWPAN hosts typically
[don't join the solicited-node multicast address of their unicast
addresses][6LN-iface-init], resulting in in-network addresses not being
reachable.

As such, to prevent both error cases

- the fallback to address resolution by prefix list must only be used
  when the prefix is on-link,
- the prefix configured by DHCPv6/UHCP at the 6LoWPAN border router
  must be configured as on-link, but
- the prefix must not be advertised as on-link within the 6LoWPAN to
  still be [in line with RFC 6775][RFC-6775-forbidden]

With this change these cases are covered.

[on-link]: https://tools.ietf.org/html/rfc4861#page-6
[RFC 6775]: https://tools.ietf.org/html/rfc6775
[6LN-iface-init]: https://tools.ietf.org/html/rfc6775#section-5.2
[RFC-6775-forbidden]: https://tools.ietf.org/html/rfc6775#section-6.1
@leandrolanzieri leandrolanzieri added this to the Release 2020.04 milestone Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants