Saturday, June 28, 2014

BGP Next-Hop Self, BGP Next-Hop Processing.

Lessons Learned:

Topology: [diagram not available]

Issue: in a fully meshed iBGP design with multiple edge routers peering with the provider at different exit points, one edge router may not have a route to the next hop used by another edge router. By default, an edge router passes the eBGP next hop (the provider’s address on the transit link) to its iBGP peers unchanged. The iBGP sessions still come up, because BGP only needs reachability between the peering addresses to form an adjacency, but prefixes whose next hop cannot be resolved are not usable.
One way to correct this is to advertise the transit networks on the edge devices into our IGP.
This gives the other routers full reachability to those next hops, so iBGP can install the routes in the routing table.
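To make the failure mode concrete, here is a small Python model (a sketch, not IOS code) of the one best-path check that bites here: a path is only usable if its next hop resolves through the routing table. The BGP entries and the 204.12.28.0/24 transit network come from the outputs in this post; the rest of the IGP table (100.100.4.0/24 and an assumed 206.33.33.0/24) is a simplified stand-in, and real best-path selection has many more steps than this.

```python
import ipaddress


def next_hop_resolves(next_hop: str, igp_prefixes: list[str]) -> bool:
    """True if any IGP or connected prefix covers the BGP next-hop address."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(p) for p in igp_prefixes)


def usable_prefixes(bgp_paths: dict[str, str], igp_prefixes: list[str]) -> list[str]:
    """BGP prefixes whose next hop resolves, i.e. candidates for '>' and the RIB."""
    return [pfx for pfx, nh in bgp_paths.items() if next_hop_resolves(nh, igp_prefixes)]


# iBGP paths as seen on R8 (prefix -> next hop), taken from the BGP tables in this post.
bgp_paths = {
    "4.4.4.0/24": "100.100.4.4",
    "10.1.1.0/24": "204.12.28.254",
    "10.3.1.0/24": "204.12.28.254",
    "172.16.1.0/24": "206.33.33.1",
}

# Assumed, simplified IGP table on R8 before the fix.
igp_before = ["100.100.4.0/24", "206.33.33.0/24"]
# The fix: carry the transit network 204.12.28.0/24 in the IGP as well.
igp_after = igp_before + ["204.12.28.0/24"]

before = usable_prefixes(bgp_paths, igp_before)
after = usable_prefixes(bgp_paths, igp_after)
print("before:", before)
print("after: ", after)
```

Only the 10.x.x.x paths change state, which matches the missing ">" on those entries in the first BGP table.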

Before adding the transit network for the peer advertising the 10.x.x.x routes, we see that there is no best path to these prefixes (no ">" marker), so they cannot be installed into the routing table.

R8#sh ip route 10.3.1.0
% Network not in table

   Network          Next Hop            Metric LocPrf Weight Path
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
* i10.1.1.0/24      204.12.28.254            0    100      0 400 i
* i10.2.1.0/24      204.12.28.254            0    100      0 400 i
* i10.3.1.0/24      204.12.28.254            0    100      0 400 i
*>i172.16.1.0/24    206.33.33.1              0    100      0 300 i
*>i172.16.2.0/24    206.22.22.2              0    100      0 200 i

-------------------------------------------------------------------------------------
After adding the transit network into our IGP:

   Network          Next Hop            Metric LocPrf Weight Path
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
*>i10.1.1.0/24      204.12.28.254            0    100      0 400 i
*>i10.2.1.0/24      204.12.28.254            0    100      0 400 i
*>i10.3.1.0/24      204.12.28.254            0    100      0 400 i
*>i172.16.1.0/24    206.33.33.1              0    100      0 300 i
*>i172.16.2.0/24    206.22.22.2              0    100      0 200 i


R8#sh ip route 10.3.1.0
Routing entry for 10.3.1.0/24
 Known via "bgp 100", distance 200, metric 0
  Tag 400, type internal
  Last update from 204.12.28.254 00:00:05 ago
  Routing Descriptor Blocks:
  * 204.12.28.254, from 100.100.4.4, 00:00:05 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 400

Now the route is in the table. It’s learned via BGP AS 100 with an administrative distance of 200 (iBGP).
We also see that the next hop of the route is 204.12.28.254, which is not directly connected to this router.

We can continue our recursive lookup by showing the route for 204.12.28.254

R8#sh ip route 204.12.28.254
Routing entry for 204.12.28.0/24
  Known via "eigrp 10", distance 90, metric 30720, type internal
  Redistributing via eigrp 10
  Last update from 192.168.48.4 on FastEthernet0/1, 00:03:49 ago
  Routing Descriptor Blocks:
  * 192.168.48.4, from 192.168.48.4, 00:03:49 ago, via FastEthernet0/1
      Route metric is 30720, traffic share count is 1
      Total delay is 200 microseconds, minimum bandwidth is 100000 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

R8#

We’re learning this route from 192.168.48.4 through our FastEthernet0/1 interface:

interface FastEthernet0/1
ip address 192.168.48.8 255.255.255.0
speed 100
full-duplex
end

We could also have done the verification by looking at the CEF table, for example:

R8#sh ip cef 204.12.28.254
204.12.28.0/24, version 57, epoch 0, cached adjacency 192.168.48.4
0 packets, 0 bytes
  via 192.168.48.4, FastEthernet0/1, 3 dependencies
    next hop 192.168.48.4, FastEthernet0/1
    valid cached adjacency
R8#

In reality, CEF pre-calculates the recursion, as well as the Layer 2 header that’s going to be used on that link.
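The idea that CEF resolves the recursion ahead of time can be sketched in Python. This is a toy model, not how CEF is implemented: it takes the three routes from R8’s outputs above (the iBGP route, the EIGRP route to the next hop, and the connected FastEthernet0/1 subnet), resolves each one to a final next hop and interface once, and then forwards with a single longest-prefix match and no further recursion.

```python
import ipaddress
from typing import Optional

# Simplified RIB for R8: prefix -> (next hop, outgoing interface or None).
RIB = {
    "10.3.1.0/24": ("204.12.28.254", None),                 # iBGP, recursive next hop
    "204.12.28.0/24": ("192.168.48.4", "FastEthernet0/1"),  # EIGRP
    "192.168.48.0/24": (None, "FastEthernet0/1"),           # connected
}


def longest_match(address: str, table: dict) -> Optional[str]:
    """Return the most specific prefix in `table` that contains `address`."""
    ip = ipaddress.ip_address(address)
    hits = [ipaddress.ip_network(p) for p in table if ip in ipaddress.ip_network(p)]
    return str(max(hits, key=lambda n: n.prefixlen)) if hits else None


def resolve(address: str, depth: int = 0):
    """Recursively look up `address` in the RIB until a route with an interface is found."""
    prefix = longest_match(address, RIB)
    if prefix is None or depth > 8:  # unresolvable, or guard against recursion loops
        return None
    next_hop, interface = RIB[prefix]
    if interface is not None:
        # A connected route has no next hop: forward to the address itself.
        return (next_hop or address, interface)
    return resolve(next_hop, depth + 1)


def build_fib() -> dict:
    """Pre-compute the recursion once per prefix, roughly what CEF does ahead of time."""
    fib = {}
    for prefix, (next_hop, interface) in RIB.items():
        if interface is not None:
            fib[prefix] = (next_hop or "attached", interface)
        else:
            resolved = resolve(next_hop)
            if resolved is not None:
                fib[prefix] = resolved
    return fib


FIB = build_fib()


def forward(destination: str):
    """Forwarding needs one longest-prefix match on the pre-built FIB, no recursion."""
    prefix = longest_match(destination, FIB)
    return FIB[prefix] if prefix else None


print(forward("10.3.1.1"))  # ('192.168.48.4', 'FastEthernet0/1')
```

The result lines up with the "via 192.168.48.4, FastEthernet0/1" entry in the CEF output.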

If we look at “show ip cef <prefix> detail” we can see the interface that it’s using.
Adding the internal keyword shows the same entry with some extra internal detail, such as the refcount:

R8#sh ip cef 204.12.28.254 internal – This destination
204.12.28.0/24, version 57, epoch 0, cached adjacency 192.168.48.4 – Points to this next Hop
0 packets, 0 bytes
  via 192.168.48.4, FastEthernet0/1, 3 dependencies – Out this interface.
    next hop 192.168.48.4, FastEthernet0/1
    valid cached adjacency
  refcount 5

The “show ip cef internal” output would also show the load distribution if there were multiple paths. This pre-resolved entry is ultimately what the router uses when it forwards packets.

The above solution was to advertise a route to the next hop by adding the transit networks into our IGP, which is a perfectly valid design solution. The other solution is to change the next hop to something the routers already have a route to.


We can do this by using our existing peer-group configuration;

router bgp 100
 neighbor IBGP_PEERS peer-group
 neighbor IBGP_PEERS remote-as 100
 neighbor IBGP_PEERS update-source Loopback10
 neighbor 100.100.4.4 peer-group IBGP_PEERS
 neighbor 100.100.5.5 peer-group IBGP_PEERS
 neighbor 100.100.6.6 peer-group IBGP_PEERS
 neighbor 100.100.7.7 peer-group IBGP_PEERS
 neighbor 100.100.8.8 peer-group IBGP_PEERS

To test this, I’ll remove the transit networks for R4 and R5 from our IGP:
R4 = 204.12.28.253 255.255.255.0
R5 = 206.22.22.5 255.255.255.0

So under the peer-group I need to say: when I learn a route from an eBGP neighbor and turn around and advertise it to an iBGP neighbor, set the next-hop value to my own local peering address.

Example:
router bgp 100
 neighbor IBGP_PEERS next-hop-self

Since the update source for the peer group is my Loopback10 interface, all of the BGP routes I advertise to these neighbors will have the Loopback10 address as their next hop.
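A rough Python model of this next-hop-self behaviour, assuming the IOS default: for a route learned from an eBGP peer, the next hop sent to iBGP peers is left unchanged unless next-hop-self is configured, in which case it becomes the update-source address. The `Route` class and `ibgp_next_hop` function are hypothetical names for illustration only; R4’s Loopback10 address 100.100.4.4 and the eBGP next hop 204.12.28.254 come from the outputs in this post. If the update-source were Fa0/0 instead, you would pass that interface’s address.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    next_hop: str
    learned_from: str  # "ebgp" or "local" (hypothetical labels for this sketch)


def ibgp_next_hop(route: Route, next_hop_self: bool, update_source_ip: str) -> str:
    """Next hop this router puts in the UPDATE it sends to an iBGP peer."""
    if route.learned_from == "local":
        # Locally originated routes are advertised with our own peering address.
        return update_source_ip
    if route.learned_from == "ebgp":
        # Default iBGP behaviour keeps the eBGP next hop; next-hop-self replaces it.
        return update_source_ip if next_hop_self else route.next_hop
    raise ValueError("iBGP-learned routes are not re-advertised to iBGP peers in a full mesh")


# On R4: update-source Loopback10 = 100.100.4.4
r = Route("10.1.1.0/24", "204.12.28.254", "ebgp")
print(ibgp_next_hop(r, next_hop_self=False, update_source_ip="100.100.4.4"))  # 204.12.28.254
print(ibgp_next_hop(r, next_hop_self=True, update_source_ip="100.100.4.4"))   # 100.100.4.4
```

The locally originated branch is why 4.4.4.0/24 already showed 100.100.4.4 as its next hop before next-hop-self was added.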

Once I make the change, I need to send an update to all my neighbors. I don’t have to tear down the sessions to do this, because an outbound soft reset simply re-sends the updates.

We can do a hard reset with clear ip bgp *, which resets every session,
or
we can just do clear ip bgp 100 out, which will not reset the TCP sessions; it just sends the neighbors a new set of triggered updates.

We can now see that the next hops are all the Loopbacks:


   Network          Next Hop            Metric LocPrf Weight Path
*>i0.0.0.0          100.100.6.6              0    100      0 300 i
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
*>i10.1.1.0/24      100.100.4.4              0    100      0 400 i
*>i10.2.1.0/24      100.100.4.4              0    100      0 400 i
*>i10.3.1.0/24      100.100.4.4              0    100      0 400 i
*>i172.16.1.0/24    100.100.6.6              0    100      0 300 i
*>i172.16.2.0/24    100.100.5.5              0    100      0 200 i
R7#

Also, because my loopbacks are advertised in my IGP, the router can perform route recursion, and I can verify that again via the CEF table.

R7#sh ip cef 10.1.1.0 detail
10.1.1.0/24, version 37, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 100.100.4.4, 0 dependencies, recursive
    next hop 192.168.67.6, FastEthernet0/0 via 100.100.4.0/24
    valid adjacency
  Recursive load sharing using 100.100.4.0/24.
R7#

Or use “show ip cef <prefix> internal”, which also shows the load distribution for the route:

R7#sh ip cef 10.1.1.0 inter
10.1.1.0/24, version 37, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 100.100.4.4, 0 dependencies, recursive
    next hop 192.168.67.6, FastEthernet0/0 via 100.100.4.0/24
    valid adjacency

  Recursive load sharing using 100.100.4.0/24
  Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (refcount 5)

  Hash  OK  Interface                 Address         Packets
  1     Y   FastEthernet0/0           192.168.67.6          0
  2     Y   FastEthernet0/1           192.168.57.5          0
  3     Y   FastEthernet0/0           192.168.67.6          0
  4     Y   FastEthernet0/1           192.168.57.5          0
  5     Y   FastEthernet0/0           192.168.67.6          0
  6     Y   FastEthernet0/1           192.168.57.5          0
  7     Y   FastEthernet0/0           192.168.67.6          0
  8     Y   FastEthernet0/1           192.168.57.5          0
  9     Y   FastEthernet0/0           192.168.67.6          0
  10    Y   FastEthernet0/1           192.168.57.5          0
  11    Y   FastEthernet0/0           192.168.67.6          0
  12    Y   FastEthernet0/1           192.168.57.5          0
  13    Y   FastEthernet0/0           192.168.67.6          0
  14    Y   FastEthernet0/1           192.168.57.5          0
  15    Y   FastEthernet0/0           192.168.67.6          0
  16    Y   FastEthernet0/1           192.168.57.5          0
  refcount 5
R7#
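The "Load distribution: 0 1 0 1 ..." line can be reproduced with a small Python sketch: 16 hash buckets are filled round-robin across the equal-cost paths to 100.100.4.0/24, and each flow is mapped to a bucket by hashing its source and destination addresses. The round-robin fill matches the output above; the hash function (zlib.crc32) is only a stand-in, since the actual IOS per-destination hash is not shown here.

```python
import zlib

# Equal-cost paths to 100.100.4.0/24, from the 'show ip cef ... internal' output above.
PATHS = [
    ("FastEthernet0/0", "192.168.67.6"),
    ("FastEthernet0/1", "192.168.57.5"),
]
NUM_BUCKETS = 16


def build_buckets(num_paths: int, num_buckets: int = NUM_BUCKETS) -> list[int]:
    """Fill the hash buckets round-robin with path indexes (equal-cost paths)."""
    return [i % num_paths for i in range(num_buckets)]


def pick_path(src: str, dst: str, buckets: list[int]):
    """Per-destination sharing: the same src/dst pair always lands in the same bucket.

    zlib.crc32 is a stand-in hash, not the algorithm IOS uses.
    """
    bucket = zlib.crc32(f"{src}->{dst}".encode()) % len(buckets)
    return PATHS[buckets[bucket]]


buckets = build_buckets(len(PATHS))
print("Load distribution:", " ".join(map(str, buckets)))
# Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
```

Because the bucket depends only on the address pair, every packet of a given flow sticks to one interface, while different flows spread across both.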


So the next-hop-self command basically says: when I learn routes from my eBGP peers and then send those routes on to my iBGP peers, change the next-hop value to my own local peering address, in this case Loopback10.

We’re using Loopback10 because that’s what’s specified in the update-source command. For example, if my update-source were set to my Fa0/0 interface, that’s what the next hop would be changed to.

Next-HOP – Route-map.
We can also use a route-map to manually change the next hop value.
Based on the topology, I’ll use R7.

Current BGP config:

router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor IBGP_PEERS peer-group
 neighbor IBGP_PEERS remote-as 100
 neighbor IBGP_PEERS update-source Loopback10
 neighbor IBGP_PEERS next-hop-self
 neighbor 100.100.4.4 peer-group IBGP_PEERS
 neighbor 100.100.5.5 peer-group IBGP_PEERS
 neighbor 100.100.6.6 peer-group IBGP_PEERS
 neighbor 100.100.8.8 peer-group IBGP_PEERS
 no auto-summary

New config:


route-map CHANGE_NEXT_HOP permit 10
 set ip next-hop 100.100.7.7

router bgp 100
 neighbor IBGP_PEERS route-map CHANGE_NEXT_HOP out
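How an outbound route-map like CHANGE_NEXT_HOP treats each update can be sketched in Python: sequences are checked in order, the first one whose match clause fits decides permit or deny, its set clause is applied, and anything that matches no sequence hits the implicit deny. A sequence with no match clause, as here, matches every route. The `Update`/`RouteMapEntry` structures are a hypothetical model, not an IOS API, and 7.7.7.0/24 is a made-up prefix standing in for whatever R7 advertises.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Update:
    prefix: str
    next_hop: str


@dataclass
class RouteMapEntry:
    seq: int
    permit: bool
    match: Optional[Callable[[Update], bool]] = None  # None = no match clause: matches everything
    set_next_hop: Optional[str] = None


def apply_route_map(entries: list[RouteMapEntry], update: Update) -> Optional[Update]:
    """Return the (possibly modified) update, or None if it is denied."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.match is None or entry.match(update):
            if not entry.permit:
                return None
            if entry.set_next_hop is not None:
                return replace(update, next_hop=entry.set_next_hop)
            return update
    return None  # implicit deny at the end of every route-map


# route-map CHANGE_NEXT_HOP permit 10 / set ip next-hop 100.100.7.7
CHANGE_NEXT_HOP = [RouteMapEntry(seq=10, permit=True, set_next_hop="100.100.7.7")]

out = apply_route_map(CHANGE_NEXT_HOP, Update("7.7.7.0/24", "0.0.0.0"))
print(out)  # Update(prefix='7.7.7.0/24', next_hop='100.100.7.7')
```

Since sequence 10 has no match clause, every update R7 sends to IBGP_PEERS gets next hop 100.100.7.7 and nothing reaches the implicit deny.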

---------------------------------------

Note: for the next-hop value, all we’re really looking for is that the other routers can do the full route recursion. If I set the next hop to a value that’s not in their routing table, that’s not going to work.

We can use our own IP address, the neighbor’s address, or even a physical link address. The only issue with a physical link is that if the link goes down, the BGP peers would not be able to use any of the routes associated with that router.

Where a physical interface can be a good thing is when we want to control traffic: if the interface does go down, we don’t want traffic to choose another path.
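The trade-off between a loopback and a physical-interface next hop can be sketched in Python. In this simplified, hypothetical model, R4’s Loopback10 subnet stays in the IGP as long as any path to R4 survives, while the R4-R8 link subnet (192.168.48.0/24, with 192.168.48.4 presumably being R4 going by the addressing scheme) disappears when that link fails. The "other links" flag is an assumption standing in for R4’s remaining connections.

```python
import ipaddress

LOOPBACK_NH = "100.100.4.4"   # R4 Loopback10, from the outputs in this post
PHYSICAL_NH = "192.168.48.4"  # address on the R4-R8 link (assumed to be R4)


def reachable_prefixes(r4_r8_up: bool, other_links_up: bool) -> list[str]:
    """Prefixes the IGP still carries toward R4 for the given link states (simplified)."""
    prefixes = []
    if r4_r8_up or other_links_up:
        prefixes.append("100.100.4.0/24")   # the loopback survives while any path to R4 exists
    if r4_r8_up:
        prefixes.append("192.168.48.0/24")  # the connected subnet goes away with the link
    return prefixes


def next_hop_valid(next_hop: str, prefixes: list[str]) -> bool:
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)


after_failure = reachable_prefixes(r4_r8_up=False, other_links_up=True)
print("loopback next hop valid:", next_hop_valid(LOOPBACK_NH, after_failure))  # True
print("physical next hop valid:", next_hop_valid(PHYSICAL_NH, after_failure))  # False
```

Which outcome is "right" depends on intent: the loopback keeps the routes usable over another path, while the physical address deliberately invalidates them when that one link fails.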

We can even use an IP SLA operation to track reachability, tie it to a static route, and use that route’s address as the next-hop value.

Example:

On R4, which peers with an eBGP neighbor (R1):

ip sla 1
 icmp-echo 10.3.1.1
 timeout 2000
 frequency 5
!
ip sla schedule 1 life forever start-time now
!
! Track object 2 follows the reachability result of IP SLA operation 1
track 2 ip sla 1 reachability
!
! 169.254.0.0/16 is the IPv4 link-local (automatic private) range
ip route 169.254.0.1 255.255.255.255 Null0 track 2
!
route-map CHANGE_NEXT_HOP permit 10
 set ip next-hop 169.254.0.1
!
router bgp 100
 neighbor IBGP_PEERS route-map CHANGE_NEXT_HOP out

Just verifying I can actually ping the IP address.

---------------------------------------------------------------------------------
R4(config-sla-monitor-echo)#do ping 10.3.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.3.1.1, timeout is 2 seconds:
!!!!!
---------------------------------------------------------------------------------


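So now if 10.3.1.1 stops responding, the IP SLA operation fails, track 2 goes down, and the tracked static route is removed. The next hop 169.254.0.1 is then no longer resolvable, and my routes show as invalid paths. (For the other iBGP routers to resolve 169.254.0.1 in the first place, the /32 also has to be advertised into the IGP.)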
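The chain on R4 (IP SLA probe, track object 2, tracked static route, BGP next hop) can be modelled step by step in Python. This is a simplified sketch: it treats each probe result as immediately changing the track state and ignores IOS timers and track delays. It also assumes the /32 is carried in the IGP and that no default or other covering route resolves 169.254.0.1; the base routing table and the probe results are hypothetical.

```python
import ipaddress

NEXT_HOP = "169.254.0.1"
TRACKED_ROUTE = "169.254.0.1/32"


def rib_for(track_up: bool, base_rib: list[str]) -> list[str]:
    """Routes seen by the iBGP peers: the tracked /32 is present only while track 2 is Up."""
    return base_rib + [TRACKED_ROUTE] if track_up else list(base_rib)


def next_hop_resolves(next_hop: str, rib: list[str]) -> bool:
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(p) for p in rib)


base_rib = ["100.100.4.0/24", "100.100.7.0/24"]  # assumed, simplified IGP routes
probe_results = [True, True, False, True]        # hypothetical echo results for 10.3.1.1

# In this sketch the track state simply mirrors the latest probe result.
validity = [next_hop_resolves(NEXT_HOP, rib_for(ok, base_rib)) for ok in probe_results]
print(validity)  # [True, True, False, True]
```

When the probe recovers, the static route comes back, the next hop resolves again, and the paths become valid without touching the BGP configuration.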
