bgpd crashes in vpn_leak_to_vrf_update_all #2473

Closed
nuqleo opened this issue Jun 16, 2018 · 8 comments

nuqleo commented Jun 16, 2018

Hi,

I am trying to run FRR built from current git on Fedora 28 with the following bgpd.conf:

!
! Zebra configuration saved from vty
!   2018/06/16 21:17:55
!
frr version 5.1-dev
frr defaults traditional
!
hostname frr
log file /var/log/frr/bgpd.log
!
!
router bgp 65001 vrf vrf11
 !
 address-family ipv4 unicast
  redistribute connected
  rd vpn export 65001:11
  rt vpn import 65001:12
  rt vpn export 65001:11
 exit-address-family
!
router bgp 65001 vrf vrf12
 !
 address-family ipv4 unicast
  redistribute connected
  rd vpn export 65001:12
  rt vpn import 65001:11
  rt vpn export 65001:12
 exit-address-family
!
line vty
!

Interface setup:

frr# show interface vrf all 
Interface eth0 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: Default-IP-Routing-Table
  index 4 metric 0 mtu 1500 speed 0 
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 00:16:3e:60:ca:79
  inet6 fe80::216:3eff:fe60:ca79/64
  Interface Type Other

Interface lo is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: Default-IP-Routing-Table
  index 1 metric 0 mtu 65536 speed 0 
  flags: <UP,LOOPBACK,RUNNING>
  Type: Loopback
  Interface Type Other

Interface eth0.11 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: vrf11
  index 6 metric 0 mtu 1500 speed 0 
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 00:16:3e:60:ca:79
  inet 10.0.11.1/24 broadcast 10.0.11.255
  inet6 fe80::216:3eff:fe60:ca79/64
  Interface Type Vlan
  VLAN Id 11
  Link ifindex 4(eth0)

Interface vrf11 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: vrf11
  index 3 metric 0 mtu 65536 speed 0 
  flags: <UP,RUNNING,NOARP>
  Type: Ethernet
  HWaddr: 0e:e5:f7:f8:31:02
  Interface Type VRF

Interface eth0.12 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: vrf12
  index 5 metric 0 mtu 1500 speed 0 
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 00:16:3e:60:ca:79
  inet 10.0.12.1/24 broadcast 10.0.12.255
  inet6 fe80::216:3eff:fe60:ca79/64
  Interface Type Vlan
  VLAN Id 12
  Link ifindex 4(eth0)

Interface vrf12 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: vrf12
  index 2 metric 0 mtu 65536 speed 0 
  flags: <UP,RUNNING,NOARP>
  Type: Ethernet
  HWaddr: 8e:72:a4:49:53:d4
  Interface Type VRF

But the bgpd process crashes with this backtrace:

#0  0x00005555555d2070 in vpn_leak_to_vrf_update_all (bgp_vrf=bgp_vrf@entry=0x555555c937a0, bgp_vpn=0x0, afi=afi@entry=AFI_IP) at ../bgpd/bgp_table.h:154
#1  0x00005555555df4c5 in vpn_leak_postchange (bgp_vrf=0x555555c937a0, bgp_vpn=<optimized out>, afi=AFI_IP, direction=BGP_VPN_POLICY_DIR_FROMVPN) at ../bgpd/bgp_mplsvpn.h:202
#2  af_rt_vpn_imexport_magic (self=<optimized out>, no=<optimized out>, rtlist=<optimized out>, direction_str=<optimized out>, argv=<optimized out>, argc=<optimized out>, vty=<optimized out>) at bgp_vty.c:6561
#3  af_rt_vpn_imexport (self=<optimized out>, vty=<optimized out>, argc=<optimized out>, argv=<optimized out>) at ../bgpd/bgp_vty_clippy.c:308
#4  0x00007ffff7b4e9e7 in cmd_execute_command_real (vline=vline@entry=0x555555c8d540, vty=vty@entry=0x555555c8fd40, cmd=cmd@entry=0x0, filter=FILTER_STRICT) at lib/command.c:1046
#5  0x00007ffff7b50ff9 in cmd_execute_command_strict (vline=vline@entry=0x555555c8d540, vty=vty@entry=0x555555c8fd40, cmd=cmd@entry=0x0) at lib/command.c:1144
#6  0x00007ffff7b511a2 in command_config_read_one_line (vty=vty@entry=0x555555c8fd40, cmd=cmd@entry=0x0, use_daemon=use_daemon@entry=0) at lib/command.c:1293
#7  0x00007ffff7b5131f in config_from_file (vty=vty@entry=0x555555c8fd40, fp=fp@entry=0x555555881290, line_num=line_num@entry=0x7fffffffd21c) at lib/command.c:1339
#8  0x00007ffff7b94045 in vty_read_file (confp=0x555555881290) at lib/vty.c:2366
#9  vty_read_config (config_file=<optimized out>, config_default_dir=config_default_dir@entry=0x7ffff7dd5d20 <config_default> "") at lib/vty.c:2552
#10 0x00007ffff7b681df in frr_config_fork () at lib/libfrr.c:728
#11 0x000055555557efe8 in main (argc=1, argv=0x7fffffffe458) at bgp_main.c:429
@paulzlabn
Contributor

Initial impressions before testing: I think this crash arises because bgp_vpn is NULL (see vpn_leak_to_vrf_update_all() in the stack trace), which implies bgp_get_default() returns NULL, which is probably due to a missing default bgp instance.

You will need a default instance in order to do any vrf importing/exporting in any case, so defining one should let you avoid this crash.

(For my own future reference:) 1. Code should tolerate missing default instance and not crash; maybe generate error message; 2. Does default instance need to be defined in config before other instances? If not, does there need to be any handling for deferred vrf/vpn leak operations resulting from prior configuration?
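
As a rough illustration of point 1, here is a minimal Python model (not FRR code) of the failure and of a guarded variant. The names `instances`, `get_default_instance`, `leak_to_vrf_unguarded`, and `leak_to_vrf_guarded` are hypothetical stand-ins chosen for this sketch; the real code is C and may be structured quite differently.

```python
import logging

log = logging.getLogger("bgpd-model")

# Hypothetical registry of BGP instances keyed by VRF name.
# The key None stands for the default instance.
instances = {}


def get_default_instance():
    """Model of a lookup that yields None when no default instance exists."""
    return instances.get(None)


def leak_to_vrf_unguarded(vrf_name):
    bgp_vpn = get_default_instance()
    # Uses bgp_vpn unconditionally, analogous to frame #0 of the backtrace,
    # where bgp_vpn=0x0 is dereferenced.
    return list(bgp_vpn["vpn_rib"])


def leak_to_vrf_guarded(vrf_name):
    bgp_vpn = get_default_instance()
    if bgp_vpn is None:
        # Tolerate the missing default instance and report it instead.
        log.warning("no default BGP instance; skipping VPN->%s leak", vrf_name)
        return []
    return list(bgp_vpn["vpn_rib"])
```

The guarded variant only avoids the crash. It does not address point 2, that is, whether leak operations requested before the default instance exists should be deferred and replayed later.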

@nuqleo
Author

nuqleo commented Jun 24, 2018

No crash after I added a default bgp instance.

/etc/frr/bgpd.conf

!
! Zebra configuration saved from vty
!   2018/06/25 00:02:22
!
frr version 5.1-dev
frr defaults traditional
!
hostname frr
log file /var/log/frr/bgpd.log
!
!
router bgp 65001
 bgp router-id 10.0.11.1
!
router bgp 65001 vrf vrf11
 !
 address-family ipv4 unicast
  redistribute connected
  rd vpn export 65001:11
  rt vpn import 65001:12
  rt vpn export 65001:11
 exit-address-family
!
router bgp 65001 vrf vrf12
 !
 address-family ipv4 unicast
  redistribute connected
  rd vpn export 65001:12
  rt vpn import 65001:11
  rt vpn export 65001:12
 exit-address-family
!
line vty
!

/etc/frr/zebra.conf

!
! Zebra configuration saved from vty
!   2018/06/25 00:02:21
!
frr version 5.1-dev
frr defaults traditional
!
hostname frr
log file /var/log/frr/zebra.log
!
!
!
!
!
!
line vty
!

But no routes were actually imported:

frr# show bgp vrf all

Instance Default:
No BGP prefixes displayed, 0 exist

Instance vrf11:
BGP table version is 1, local router ID is 10.0.11.1, vrf id 3
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.11.0/24     0.0.0.0                  0         32768 ?

Displayed  1 routes and 1 total paths

Instance vrf12:
BGP table version is 1, local router ID is 10.0.12.1, vrf id 2
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.12.0/24     0.0.0.0                  0         32768 ?

Displayed  1 routes and 1 total paths
frr# show ip route vrf all 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR,
       > - selected route, * - FIB route


VRF vrf11:
C>* 10.0.11.0/24 is directly connected, eth0.11, 00:04:39
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR,
       > - selected route, * - FIB route


VRF vrf12:
C>* 10.0.12.0/24 is directly connected, eth0.12, 00:04:39

/var/log/frr/zebra.log

2018/06/25 00:04:17 ZEBRA: Initializing internal label manager
2018/06/25 00:04:17 ZEBRA: zebra 5.1-dev starting: vty@2601
2018/06/25 00:04:17 ZEBRA: client 13 says hello and bids fair to announce only bgp routes vrf=0
2018/06/25 00:04:17 ZEBRA: Assigned Label Chunk 16 - 65 to bgp instance 0
2018/06/25 00:04:17 ZEBRA: zebra_redistribute_add: Specified Route Type 0 does not exist
2018/06/25 00:04:17 ZEBRA: zebra_redistribute_add: Specified Route Type 0 does not exist
2018/06/25 00:04:17 ZEBRA: zebra_redistribute_add: Specified Route Type 0 does not exist
2018/06/25 00:04:17 ZEBRA: zebra_redistribute_add: Specified Route Type 0 does not exist
2018/06/25 00:04:17 ZEBRA: zebra_redistribute_add: Specified Route Type 0 does not exist
2018/06/25 00:04:17 ZEBRA: zebra_redistribute_add: Specified Route Type 0 does not exist

@paulzlabn
Contributor

paulzlabn commented Jun 24, 2018

Thank you for providing test results; at least that confirms my hunch. I am thinking about the correct way to deal with cases where the default instance is defined after the vpn instances.

In order to leak the routes, you need to explicitly enable leaking. If you want to leak routes in both directions between the vrfs, there are four leak paths that must be enabled:

  • vrf11 (unicast RIB) -> default (vpn RIB)
  • default (vpn RIB) -> vrf12 (unicast RIB)
  • vrf12 (unicast RIB) -> default (vpn RIB)
  • default (vpn RIB) -> vrf11 (unicast RIB)

(The current code does not support direct vrf->vrf leak paths, so they all have to go through the default instance vpn RIB)

So each vrf block should have "export vpn" and "import vpn":

router bgp 65001 vrf vrf11
 !
 address-family ipv4 unicast
  redistribute connected
  rd vpn export 65001:11
  rt vpn import 65001:12
  rt vpn export 65001:11
  export vpn
  import vpn
 exit-address-family
!
router bgp 65001 vrf vrf12
 !
 address-family ipv4 unicast
  redistribute connected
  rd vpn export 65001:12
  rt vpn import 65001:11
  rt vpn export 65001:12
  export vpn
  import vpn
 exit-address-family

(The manual describes this syntax in BGP>BGP Router Configuration>VRFs>VRF Route Leaking)
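
To make the four leak paths concrete, here is a small Python model of the two-step flow through the default instance's vpn RIB. It is a simplified sketch, not FRR's implementation: the `Vrf` class and `leak` function are hypothetical, and the model only captures the export/import switches and route-target matching described above.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    connected: list             # prefixes in the VRF unicast RIB
    rt_export: str              # models "rt vpn export"
    rt_import: str              # models "rt vpn import"
    export_vpn: bool = False    # models "export vpn"
    import_vpn: bool = False    # models "import vpn"
    imported: list = field(default_factory=list)


def leak(vrfs):
    # Step 1: VRF unicast RIB -> default-instance vpn RIB,
    # each route tagged with the exporting VRF's route target.
    vpn_rib = [(prefix, v.rt_export)
               for v in vrfs if v.export_vpn
               for prefix in v.connected]
    # Step 2: vpn RIB -> VRF unicast RIB, only where the route target
    # matches the importing VRF's import route target.
    for v in vrfs:
        v.imported = [prefix for prefix, rt in vpn_rib
                      if v.import_vpn and rt == v.rt_import]
    return vpn_rib


vrf11 = Vrf("vrf11", ["10.0.11.0/24"], "65001:11", "65001:12", True, True)
vrf12 = Vrf("vrf12", ["10.0.12.0/24"], "65001:12", "65001:11", True, True)
leak([vrf11, vrf12])
print(vrf11.imported)  # ['10.0.12.0/24']
print(vrf12.imported)  # ['10.0.11.0/24']
```

With both flags left at their `False` defaults the model leaks nothing, which mirrors the earlier configuration where the route targets were set but no routes were imported.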

@nuqleo
Author

nuqleo commented Jun 24, 2018

Thank you for the information.

Sorry for asking here, but will direct vrf->vrf leaking be implemented in the future?

After I added "export vpn" and "import vpn", the routes are shown in "show bgp" but not in "show ip route". Why?

frr# show bgp vrf all 

Instance Default:
No BGP prefixes displayed, 0 exist

Instance vrf11:
BGP table version is 1, local router ID is 10.0.11.1, vrf id 3
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.11.0/24     0.0.0.0                  0         32768 ?
   10.0.12.0/24     0.0.0.0@2<               0         32768 ?

Displayed  2 routes and 2 total paths

Instance vrf12:
BGP table version is 1, local router ID is 10.0.12.1, vrf id 2
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
   10.0.11.0/24     0.0.0.0@3<               0         32768 ?
*> 10.0.12.0/24     0.0.0.0                  0         32768 ?

Displayed  2 routes and 2 total paths
frr# show ip route vrf all 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR,
       > - selected route, * - FIB route


VRF vrf11:
C>* 10.0.11.0/24 is directly connected, eth0.11, 00:06:02
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR,
       > - selected route, * - FIB route


VRF vrf12:
C>* 10.0.12.0/24 is directly connected, eth0.12, 00:06:02

The messages "zebra_redistribute_add: Specified Route Type 0 does not exist" and "zebra_redistribute_delete: Specified Route Type 0 does not exist" still appear. What do they mean?

@paulzlabn
Contributor

paulzlabn commented Jun 25, 2018

Sorry for asking here, but will direct vrf->vrf leaking be implemented in the future?

I do not know; it is probably best to ask on the frr users mailing list.

After I added "export vpn" and "import vpn", the routes are shown in "show bgp" but not in "show ip route". Why?

I think you are showing output from vtysh. "show bgp" shows the contents of the BGP routing tables, but "show ip route" shows the contents of the zebra routing tables. The BGP routes will only be sent to zebra if they are marked valid (*) and best (>).
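
As a quick way to check this against pasted output, the following Python sketch classifies route lines from "show bgp" by their status codes. The helper names are hypothetical, and the sketch assumes the status codes occupy the first three columns of each route line, as they do in the output shown in this thread.

```python
def parse_status(line):
    """Return (valid, best, prefix) for one route line of 'show bgp' output."""
    flags = line[:3]                 # status-code columns precede the prefix
    prefix = line[3:].split()[0]
    return ("*" in flags, ">" in flags, prefix)


def would_be_sent_to_zebra(line):
    """True only when the route is marked both valid (*) and best (>)."""
    valid, best, _ = parse_status(line)
    return valid and best
```

Applied to the vrf11 table above, the line for 10.0.11.0/24 starts with "*>" and passes the check, while the leaked 10.0.12.0/24 line has no status codes at all and does not, which is consistent with it being absent from "show ip route".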

It is possible you are encountering issue #2381. I put a speculative fix in PR #2540 but I have not had a chance to test it yet. Please feel free to try it out if you like.

@ton31337
Member

@nuqleo is this happening with the latest release?

@nuqleo
Author

nuqleo commented May 7, 2020

I haven't tried, but I think that crash should not happen any more.

@ton31337
Member

ton31337 commented May 8, 2020

@polychaeta autoclose in 1 day.
