Saturday, June 28, 2014

BGP Next-Hop Self, BGP Next-Hop Processing.

Lessons Learned:

Topology:

Issue: in a fully meshed iBGP implementation where multiple edge routers peer with the provider at different exit points, there can be a case where one edge router doesn't have a full path to the next hop advertised by another edge router. As far as BGP is concerned, as long as we have reachability to the peer we can establish the adjacency, so the iBGP sessions come up even though the routes learned over them can't be used.
One way to correct this issue would be to advertise the transit networks on the edge devices into our IGP.
This gives the other routers full reachability to those next hops, so iBGP can install the routes in the routing table.

Before adding the transit network of the peer advertising the 10.x.x.x routes, we see that there is no best path (no > marker) to these prefixes, so they cannot be installed into the routing table:

R8#sh ip route 10.3.1.0
% Network not in table

   Network          Next Hop            Metric LocPrf Weight Path
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
* i10.1.1.0/24      204.12.28.254            0    100      0 400 i
* i10.2.1.0/24      204.12.28.254            0    100      0 400 i
* i10.3.1.0/24      204.12.28.254            0    100      0 400 i
*>i172.16.1.0/24    206.33.33.1              0    100      0 300 i
*>i172.16.2.0/24    206.22.22.2              0    100      0 200 i
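The table above boils down to one rule: a path only becomes a best-path candidate if its next hop resolves through the routing table. Here's a minimal Python sketch of that rule. The RIB contents are an assumption based on the loopbacks and transit networks mentioned in this post, not R8's actual table, and real best-path selection has many more steps.

```python
from __future__ import annotations

import ipaddress


def next_hop_reachable(next_hop: str, rib_prefixes: list[str]) -> bool:
    """True if any route in the RIB covers the BGP next hop."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(prefix) for prefix in rib_prefixes)


def usable_paths(bgp_paths: dict[str, str], rib_prefixes: list[str]) -> dict[str, bool]:
    """Map each BGP prefix to whether its next hop resolves (i.e. it can get the '>')."""
    return {prefix: next_hop_reachable(nh, rib_prefixes) for prefix, nh in bgp_paths.items()}


# BGP paths as seen on R8 (prefix -> next hop), taken from the table above
bgp_paths = {
    "4.4.4.0/24": "100.100.4.4",
    "10.3.1.0/24": "204.12.28.254",
    "172.16.1.0/24": "206.33.33.1",
}

# Assumed IGP routes on R8 before and after the transit network is advertised
rib_before = ["100.100.4.0/24", "100.100.5.0/24", "100.100.6.0/24", "206.33.33.0/24"]
rib_after = rib_before + ["204.12.28.0/24"]
```

With `rib_before`, 10.3.1.0/24 comes back False; adding 204.12.28.0/24 flips it to True, which mirrors the second table.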

-------------------------------------------------------------------------------------
After adding the transit network into our IGP:

   Network          Next Hop            Metric LocPrf Weight Path
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
*>i10.1.1.0/24      204.12.28.254            0    100      0 400 i
*>i10.2.1.0/24      204.12.28.254            0    100      0 400 i
*>i10.3.1.0/24      204.12.28.254            0    100      0 400 i
*>i172.16.1.0/24    206.33.33.1              0    100      0 300 i
*>i172.16.2.0/24    206.22.22.2              0    100      0 200 i


R8#sh ip route 10.3.1.0
Routing entry for 10.3.1.0/24
 Known via "bgp 100", distance 200, metric 0
  Tag 400, type internal
  Last update from 204.12.28.254 00:00:05 ago
  Routing Descriptor Blocks:
  * 204.12.28.254, from 100.100.4.4, 00:00:05 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 400

Now the route is in the table. It's learned from BGP AS 100 with an administrative distance of 200 (iBGP).
We also see that the next hop of the route is 204.12.28.254, which is not directly connected to this router.

We can continue our recursive lookup by showing the route for 204.12.28.254:

R8#sh ip route 204.12.28.254
Routing entry for 204.12.28.0/24
  Known via "eigrp 10", distance 90, metric 30720, type internal
  Redistributing via eigrp 10
  Last update from 192.168.48.4 on FastEthernet0/1, 00:03:49 ago
  Routing Descriptor Blocks:
  * 192.168.48.4, from 192.168.48.4, 00:03:49 ago, via FastEthernet0/1
      Route metric is 30720, traffic share count is 1
      Total delay is 200 microseconds, minimum bandwidth is 100000 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

R8#

We're learning this route from 192.168.48.4 through our FastEthernet0/1 interface:

interface FastEthernet0/1
ip address 192.168.48.8 255.255.255.0
speed 100
full-duplex
end
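The recursive lookup we just did by hand (BGP route, then the IGP route to its next hop, then the connected interface) can be sketched in Python. This is a simplified model, not how IOS implements it: the RIB dictionary below is a hypothetical rendering of the three routes shown above, and `resolve` is an illustrative helper name.

```python
from __future__ import annotations

import ipaddress

# Hypothetical RIB for R8 modelled on the outputs above:
# prefix -> (next hop or None if directly connected, outgoing interface or None if recursive)
RIB: dict[str, tuple[str | None, str | None]] = {
    "10.3.1.0/24": ("204.12.28.254", None),                 # iBGP route, next hop not connected
    "204.12.28.0/24": ("192.168.48.4", "FastEthernet0/1"),  # EIGRP route
    "192.168.48.0/24": (None, "FastEthernet0/1"),           # connected subnet
}


def longest_match(rib, destination: str) -> str | None:
    """Return the most specific RIB prefix covering destination, if any."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(p) for p in rib if addr in ipaddress.ip_network(p)]
    return str(max(matches, key=lambda net: net.prefixlen)) if matches else None


def resolve(rib, destination: str, max_depth: int = 8):
    """Follow next hops recursively until a route with an outgoing interface is found.

    Returns (forwarding next hop, interface), or None if the chain cannot be resolved.
    """
    target = destination
    for _ in range(max_depth):
        prefix = longest_match(rib, target)
        if prefix is None:
            return None
        next_hop, interface = rib[prefix]
        if interface is not None:
            # Connected routes forward straight to the target itself
            return (next_hop or target, interface)
        target = next_hop
    return None
```

`resolve(RIB, "10.3.1.1")` ends at 192.168.48.4 out FastEthernet0/1, the same answer the CEF entry gives.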

We could also have done the verification by looking at the CEF table, for example:

R8#sh ip cef 204.12.28.254
204.12.28.0/24, version 57, epoch 0, cached adjacency 192.168.48.4
0 packets, 0 bytes
  via 192.168.48.4, FastEthernet0/1, 3 dependencies
    next hop 192.168.48.4, FastEthernet0/1
    valid cached adjacency
R8#

In reality, CEF pre-calculates the recursion and the Layer 2 header that's going to be used on that link.

If we look at #sh ip cef detail we can see the interface that it's using,
and #sh ip cef internal gives even more information:

R8#sh ip cef 204.12.28.254 internal – This destination
204.12.28.0/24, version 57, epoch 0, cached adjacency 192.168.48.4 – Points to this next Hop
0 packets, 0 bytes
  via 192.168.48.4, FastEthernet0/1, 3 dependencies – Out this interface.
    next hop 192.168.48.4, FastEthernet0/1
    valid cached adjacency
  refcount 5

The "sh ip cef internal" output would also show the load distribution if there were multiple paths. This fully resolved next hop and outgoing interface is ultimately what the router needs in order to forward the traffic.

The above solution was to advertise a route to the next hop by adding the transit networks into our IGP,
which is a perfectly valid design solution. The other solution is to change the next hop to something the routers already have a route to.


We can do this by using our existing peer-group configuration:

Router BGP 100
Neighbor IBGP_PEERS peer-group
Neighbor IBGP_PEERS remote-as 100
Neighbor IBGP_PEERS update-source loopback 10
Neighbor 100.100.4.4 peer-group IBGP_PEERS
Neighbor 100.100.5.5 peer-group IBGP_PEERS
Neighbor 100.100.6.6 peer-group IBGP_PEERS
Neighbor 100.100.7.7 peer-group IBGP_PEERS
Neighbor 100.100.8.8 peer-group IBGP_PEERS

To test this, I'll remove the transit networks for R4 and R5 from our IGP:
R4 = 204.12.28.253 255.255.255.0
R5 = 206.22.22.5 255.255.255.0

So under the peer group, I need to say: when I learn a route from an eBGP neighbor and turn around and advertise it to an iBGP neighbor, set the next hop value to my own local peering address.

Ex:
Router bgp 100
Neighbor IBGP_PEERS next-hop-self

Since the update source for the peer group is my loopback 10 interface, all of my BGP routes advertised to these neighbors will have the loopback 10 address as their next hop.
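To make the rewrite concrete, here's a simplified Python model of the next hop a router puts into an update for an iBGP peer, with and without next-hop-self. It assumes R4's loopback 10 address is 100.100.4.4 (as the next hops in the table further down suggest) and ignores route reflection and route-map overrides; `ibgp_next_hop` is a hypothetical helper, not an IOS function.

```python
def ibgp_next_hop(path_next_hop: str, next_hop_self: bool, session_source_ip: str) -> str:
    """Next hop placed in an update sent to an iBGP peer (simplified model).

    path_next_hop: next hop in our own BGP table ("0.0.0.0" = locally originated)
    session_source_ip: address the iBGP session is sourced from (the update-source)
    """
    if path_next_hop == "0.0.0.0":
        # Locally originated routes are advertised with our own peering address
        return session_source_ip
    if next_hop_self:
        # next-hop-self: replace the eBGP next hop with our peering address
        return session_source_ip
    # Default iBGP behaviour: the eBGP next hop is passed along unchanged
    return path_next_hop
```

For example, `ibgp_next_hop("204.12.28.254", True, "100.100.4.4")` gives 100.100.4.4, while the same call with `False` leaves 204.12.28.254 in place, which is exactly the unreachable next hop from the first table.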

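Once I make the change, I need to send an update to all my neighbors so they learn the new next hop.

We can do a clear ip bgp *, which resets every session
Or
We can just do a clear ip bgp 100 out, which does not reset the TCP sessions; it just sends the AS 100 neighbors a new triggered update. (The automatically negotiated route-refresh capability is what lets the inbound version, clear ip bgp 100 in, ask the neighbors to resend their updates without a reset.)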

We can now see that the next hops are all the Loopbacks:


   Network          Next Hop            Metric LocPrf Weight Path
*>i0.0.0.0          100.100.6.6              0    100      0 300 i
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
*>i10.1.1.0/24      100.100.4.4              0    100      0 400 i
*>i10.2.1.0/24      100.100.4.4              0    100      0 400 i
*>i10.3.1.0/24      100.100.4.4              0    100      0 400 i
*>i172.16.1.0/24    100.100.6.6              0    100      0 300 i
*>i172.16.2.0/24    100.100.5.5              0    100      0 200 i
R7#

Also, because my loopbacks are advertised in my IGP, I should be able to perform route recursion, and I can verify that again via the CEF table:

R7#sh ip cef 10.1.1.0 detail
10.1.1.0/24, version 37, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 100.100.4.4, 0 dependencies, recursive
    next hop 192.168.67.6, FastEthernet0/0 via 100.100.4.0/24
    valid adjacency
  Recursive load sharing using 100.100.4.0/24.
R7#

Or use #sh ip cef internal, which shows the load distribution for a route:

R7#sh ip cef 10.1.1.0 inter
10.1.1.0/24, version 37, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 100.100.4.4, 0 dependencies, recursive
    next hop 192.168.67.6, FastEthernet0/0 via 100.100.4.0/24
    valid adjacency

  Recursive load sharing using 100.100.4.0/24
  Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (refcount 5)

  Hash  OK  Interface                 Address         Packets
  1     Y   FastEthernet0/0           192.168.67.6          0
  2     Y   FastEthernet0/1           192.168.57.5          0
  3     Y   FastEthernet0/0           192.168.67.6          0
  4     Y   FastEthernet0/1           192.168.57.5          0
  5     Y   FastEthernet0/0           192.168.67.6          0
  6     Y   FastEthernet0/1           192.168.57.5          0
  7     Y   FastEthernet0/0           192.168.67.6          0
  8     Y   FastEthernet0/1           192.168.57.5          0
  9     Y   FastEthernet0/0           192.168.67.6          0
  10    Y   FastEthernet0/1           192.168.57.5          0
  11    Y   FastEthernet0/0           192.168.67.6          0
  12    Y   FastEthernet0/1           192.168.57.5          0
  13    Y   FastEthernet0/0           192.168.67.6          0
  14    Y   FastEthernet0/1           192.168.57.5          0
  15    Y   FastEthernet0/0           192.168.67.6          0
  16    Y   FastEthernet0/1           192.168.57.5          0
  refcount 5
R7#
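The "Load distribution" line maps 16 hash buckets onto the two equal-cost paths (0 = FastEthernet0/0, 1 = FastEthernet0/1). The Python sketch below reproduces that layout for the two-path case. The round-robin fill for other path counts and the CRC32 hash are assumptions for illustration only, not CEF's actual algorithm.

```python
import zlib

HASH_BUCKETS = 16  # the internal output above shows 16 buckets


def load_distribution(num_paths: int, buckets: int = HASH_BUCKETS) -> list[int]:
    """Spread path indexes round-robin over the hash buckets ("0 1 0 1 ...")."""
    return [bucket % num_paths for bucket in range(buckets)]


def path_for_flow(src: str, dst: str, distribution: list[int]) -> int:
    """Per-destination sharing: the same src/dst pair always lands in the same bucket.

    zlib.crc32 is only a stand-in; it is NOT the hash Cisco CEF actually uses.
    """
    bucket = zlib.crc32(f"{src}-{dst}".encode()) % len(distribution)
    return distribution[bucket]
```

Because the bucket depends only on the source/destination pair, a given flow sticks to one link, which is what "per-destination sharing" in the CEF output refers to.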


So the next-hop-self command basically says: when I learn routes from my eBGP peers and then send those routes on to my iBGP peers, change the next-hop value to my own local peering address, in this case loopback 10.

We're using loopback 10 because that's what's called out in the update-source command. For example, if my update-source were set to my Fa0/0 interface, that's what the next hop would be changed to.

Next-Hop Route-Map
We can also use a route-map to manually change the next hop value.
Based on the topology, I'll use R7.

Current BGP config:

router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor IBGP_PEERS peer-group
 neighbor IBGP_PEERS remote-as 100
 neighbor IBGP_PEERS update-source Loopback10
 neighbor IBGP_PEERS next-hop-self
 neighbor 100.100.4.4 peer-group IBGP_PEERS
 neighbor 100.100.5.5 peer-group IBGP_PEERS
 neighbor 100.100.6.6 peer-group IBGP_PEERS
 neighbor 100.100.8.8 peer-group IBGP_PEERS
 no auto-summary

New config:


route-map CHANGE_NEXT_HOP permit 10
 set ip next-hop 100.100.7.7

Router bgp 100
 Neighbor IBGP_PEERS route-map CHANGE_NEXT_HOP out

---------------------------------------

Note: for the next-hop value, all we're really looking for is that the other routers can do the full route recursion. If I set the next hop to a value that's not in their routing tables, that's not going to work.

We can use our own loopback address that the neighbors peer with, or even a physical link address. The only issue with a physical link is that if the link goes down, the BGP peers would not be able to use any of the routes associated with that next hop.

Where a physical interface can be a good thing is if we want to control traffic: if the interface does go down, we don't want traffic to choose another path.

We can even use an IP SLA operation to track reachability, tie it to a static route, and then use that static route's address as the next hop value.

Example:

On R4, which peers with an eBGP neighbor (R1):

ip sla 1
 icmp-echo 10.3.1.1
 timeout 2000
 frequency 5

ip sla schedule 1 life forever start-time now

! track object 2 follows the result of IP SLA operation 1
track 2 ip sla 1 reachability

! 169.254.0.1 is from the IPv4 automatic (link-local) range
ip route 169.254.0.1 255.255.255.255 Null0 track 2

route-map CHANGE_NEXT_HOP permit 10
 set ip next-hop 169.254.0.1

Router bgp 100
 Neighbor IBGP_PEERS route-map CHANGE_NEXT_HOP out

Just verifying that I can actually ping the tracked IP address:

---------------------------------------------------------------------------------
R4(config-sla-monitor-echo)#do ping 10.3.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.3.1.1, timeout is 2 seconds:
!!!!!                                                                                                                             
--------------------------------------------------------------------------------- 


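So now if 10.3.1.1 stops responding, IP SLA will kick in, the tracked static route is removed, and the routes using 169.254.0.1 as their next hop will show as invalid paths. Keep in mind that the other iBGP routers can only resolve 169.254.0.1 if R4 advertises that static route to them, for example by redistributing it into the IGP.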
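The tracking chain is easy to lose track of, so here's a tiny Python model of it: SLA probe result, then track object, then tracked static route, then whether the iBGP peers can resolve the next hop. The `static_advertised_in_igp` flag is an assumption (the config above doesn't show how the static is advertised), and the function name is hypothetical.

```python
def next_hop_state(sla_probe_ok: bool, static_advertised_in_igp: bool) -> str:
    """Simplified chain: IP SLA result -> track 2 -> tracked static -> next-hop validity.

    static_advertised_in_igp is an assumption: other iBGP routers can only resolve
    169.254.0.1 if R4 advertises the tracked static route to them (e.g. via the IGP).
    """
    track_up = sla_probe_ok        # track 2 follows ip sla 1 reachability
    static_installed = track_up    # ip route 169.254.0.1 ... Null0 track 2
    resolvable = static_installed and static_advertised_in_igp
    return "valid" if resolvable else "invalid"
```

When the probe to 10.3.1.1 fails, the result flips to "invalid" even though the BGP sessions themselves stay up.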

Monday, June 23, 2014

BGP Local AS, BGP Peer Groups.

Lessons Learned:
iBGP Peering Rules:

iBGP packets default to TTL 255
-implies neighbors do not have to be directly connected as long as IGP reachability exists
Loop prevention via route filtering
-iBGP-learned routes cannot be advertised on to another iBGP neighbor
-implies need for either:
--Fully meshed iBGP peerings
--Route reflection
--Confederations
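The loop-prevention rule above (iBGP split horizon) is what forces the full mesh. A small Python sketch shows why: with a partial mesh, an eBGP-learned route stops one iBGP hop away from the edge router. The router names and session lists are hypothetical examples, and route reflection/confederations are deliberately left out.

```python
from __future__ import annotations


def may_advertise(learned_from: str, to_peer: str) -> bool:
    """iBGP split horizon: learned_from is 'ebgp', 'ibgp', or 'local'; to_peer is 'ebgp' or 'ibgp'.

    Without route reflection or confederations, iBGP-learned routes never go to iBGP peers.
    """
    return not (learned_from == "ibgp" and to_peer == "ibgp")


def routers_that_learn(edge_router: str, ibgp_sessions: list[tuple[str, str]]) -> set[str]:
    """Which routers end up with a route the edge router learned via eBGP."""
    learned_via = {edge_router: "ebgp"}
    pending = [edge_router]
    while pending:
        router = pending.pop()
        for a, b in ibgp_sessions:
            if router not in (a, b):
                continue
            peer = b if router == a else a
            if peer not in learned_via and may_advertise(learned_via[router], "ibgp"):
                learned_via[peer] = "ibgp"
                pending.append(peer)
    return set(learned_via)
```

With sessions R4-R5 and R5-R6 only, R6 never hears about R4's eBGP routes; adding an R4-R6 session fixes it.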

====================================

Topology:

Before we even configure BGP, we can test the low-level connectivity by opening a telnet session to the peer IP address on port 179. If the connection opens, the remote side has the session configured; if it isn't configured on the way back, the connection will simply be refused.

This verifies basic TCP reachability to the peer and can also rule out any filters in the network.
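The telnet check is really just a TCP handshake to port 179. A minimal Python equivalent, assuming it runs on a host that can reach the peer (`tcp_179_open` is a hypothetical helper name):

```python
import socket


def tcp_179_open(peer_ip: str, port: int = 179, timeout: float = 2.0) -> bool:
    """Rough equivalent of 'telnet <peer> 179': True if the TCP handshake completes."""
    try:
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        # Refused (the peer answered with RST), unreachable, or timed out
        return False
```

Like the telnet test, a False result doesn't say which of those it was; a refusal usually means the far end has no matching neighbor statement, while a timeout points more toward routing or filtering.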

Note: There can be design cases where a single router needs to peer with an eBGP router using a different AS than the rest of the routers, for example during an AS migration.
To handle this, under the BGP process we would configure
the peer address: #neighbor x.x.x.x remote-as XX
Then we would configure the local AS (the migrating one):
#neighbor x.x.x.x local-as XX
We can also let the peer bring the session up using either the real AS or the local AS with the dual-as option, for example:
#neighbor x.x.x.x local-as XX no-prepend replace-as dual-as

All internal routers in the topology are configured for AS 100.
The eBGP neighbors at the edge are configured for AS 400, 200, and 300 respectively.

We can verify the peerings using the simple command #sh ip bgp summary:

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
204.12.28.253   4   100      55      55        5    0    0 00:51:07        1
What we want to see here, for starters, is any integer 0 or above under State/PfxRcd; it shows the number of prefixes we're receiving from that neighbor. What we don't want to see is Active or Idle, which indicates something is wrong with the peering.
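The same check can be scripted against saved output. The Python sketch below reads the neighbor rows and reports either the prefix count or the state word. It assumes the ten-column IOS layout shown above; `peer_states` is a hypothetical helper, and the Active row in the usage sample is made up for illustration.

```python
from __future__ import annotations

import ipaddress


def _is_ip(text: str) -> bool:
    """True if text parses as an IP address (neighbor rows start with one)."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def peer_states(summary: str) -> dict[str, tuple[str, int | None]]:
    """Map each neighbor to ('Established', prefixes) or (state, None) for Active/Idle/etc."""
    states: dict[str, tuple[str, int | None]] = {}
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) >= 10 and _is_ip(fields[0]):
            state_or_count = fields[9]  # the State/PfxRcd column
            if state_or_count.isdigit():
                states[fields[0]] = ("Established", int(state_or_count))
            else:
                states[fields[0]] = (" ".join(fields[9:]), None)
    return states
```

Anything whose first element isn't "Established" is a peering worth looking at with the debugs below.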

If there are peering issues, we can use #debug ip bgp or #debug ip packet, as long as we apply an ACL that permits only BGP.
Note: verify there's no authentication configured before running the debugs.

Ex:
#access-list 100 permit tcp any eq bgp any
#access-list 100 permit tcp any any eq bgp
#debug ip packet detail 100
If possible, we can always do a #clear ip bgp *


Based on the topology, without route reflection or confederations, we need a full mesh of peerings. Configuration-wise, that means each router needs a separate neighbor statement pointing at every other router in the topology.

That is not feasible with 100+ routers in the topology.
One way to simplify this is to put the iBGP peers into a configuration template called a peer group.

The peer group is also an actual optimization of the update process: the update is generated once for the entire peer group instead of individually for each neighbor.
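To put numbers on the full-mesh requirement: n routers need n(n-1)/2 iBGP sessions and n-1 neighbor statements each. A quick Python check using the five AS 100 routers in this topology (R4 to R8) and the 100-router case mentioned above:

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions needed for a full mesh of `routers` speakers: n(n-1)/2."""
    return routers * (routers - 1) // 2


def neighbor_statements_per_router(routers: int) -> int:
    """Each router needs a neighbor statement for every other router."""
    return routers - 1
```

Five routers need 10 sessions; 100 routers need 4,950 sessions and 99 neighbor statements on every box. Peer groups shrink the per-router configuration, not the number of sessions; that is what route reflection or confederations address.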

To configure the Peer Groups:

Start the BGP process

Router BGP 100 - start the process
Neighbor IBGP_PEERS peer-group - give it a name
Neighbor IBGP_PEERS remote-as 100 - all peers will be in AS 100
Neighbor IBGP_PEERS update-source Loopback0 - optional peer feature
Neighbor 100.100.4.4 peer-group IBGP_PEERS - all iBGP peer addresses
Neighbor 100.100.5.5 peer-group IBGP_PEERS
Neighbor 100.100.6.6 peer-group IBGP_PEERS
Neighbor 100.100.7.7 peer-group IBGP_PEERS
Neighbor 100.100.8.8 peer-group IBGP_PEERS


What makes this configuration easier is that not only is it an optimization behind the scenes of how the update process works, but we can also apply the same template on all of the routers.

Note: you won’t be able to peer with your own local address.
% Cannot configure the local system as neighbor

Now if we look at #sh ip bgp summary we should see that all the peerings are up and that we still have a peering with the eBGP peer.

R5#sh ip bgp summary
BGP router identifier 100.100.5.5, local AS number 100
BGP table version is 7, main routing table version 7
8 network entries using 936 bytes of memory
8 path entries using 416 bytes of memory
6/3 BGP path/bestpath attribute entries using 744 bytes of memory
3 BGP AS-PATH entries using 72 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2168 total bytes of memory
BGP activity 12/4 prefixes, 12/4 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
100.100.4.4     4   100       8       8        7    0    0 00:02:22        4
100.100.6.6     4   100       6       6        7    0    0 00:00:08        2
100.100.7.7     4   100       6       8        7    0    0 00:02:52        0
100.100.8.8     4   100       4       6        7    0    0 00:00:38        0
206.22.22.2     4   200     148     152        7    0    0 00:09:19        1
R5#

So now we should have a full mesh of peerings inside the network as well as the peerings with the eBGP neighbors.
The result is that every iBGP peer should receive the eBGP updates that came in, passed along by the edge router that learned them.

However, once a route comes in over iBGP, it will not be advertised on to other iBGP neighbors, based on the iBGP rules.
Without route reflection, you cannot pass iBGP-learned routes on to other iBGP neighbors.

We should only see the routes updated from the eBGP source:

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
100.100.4.4     4   100      15      15        7    0    0 00:09:45        4
100.100.6.6     4   100      13      13        7    0    0 00:07:30        2
100.100.7.7     4   100      14      16        7    0    0 00:10:14        0
100.100.8.8     4   100      12      14        7    0    0 00:08:01        0
206.22.22.2     4   200     155     159        7    0    0 00:16:42        1
R5#
R5#sh ip bgp summary
BGP router identifier 100.100.5.5, local AS number 100
BGP table version is 8, main routing table version 8
9 network entries using 1053 bytes of memory
9 path entries using 468 bytes of memory
6/3 BGP path/bestpath attribute entries using 744 bytes of memory
3 BGP AS-PATH entries using 72 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2337 total bytes of memory
BGP activity 13/4 prefixes, 13/4 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
100.100.4.4     4   100      17      18        8    0    0 00:11:41        4
100.100.6.6     4   100      15      16        8    0    0 00:09:26        2
100.100.7.7     4   100      16      19        8    0    0 00:12:10        0
100.100.8.8     4   100      13      16        8    0    0 00:09:57        0
206.22.22.2     4   200     158     161        8    0    0 00:18:38        2
R5#sh ip bgp
BGP table version is 8, local router ID is 100.100.5.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*> 5.5.5.0/24       0.0.0.0                  0         32768 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
* i10.1.1.0/24      204.12.28.254            0    100      0 400 i
* i10.2.1.0/24      204.12.28.254            0    100      0 400 i
* i10.3.1.0/24      204.12.28.254            0    100      0 400 i
* i172.16.1.0/24    206.33.33.1              0    100      0 300 i
*> 172.16.2.0/24    206.22.22.2              0             0 200 i
*> 222.222.222.0    206.22.22.2              0             0 200 i
R5#

How to read the output of #sh ip bgp

Starting from the left:
Between the asterisk (valid) and the lowercase letter i we should see the > sign, or a blank if the path was not selected as best. The > indicates the best route: the one that's a candidate to be installed in the routing table and the one we advertise.

The lowercase letter i means the route came from an internal BGP peer.
Next we have the actual prefix. Note: if the subnet mask doesn't show up here, it means the prefix has its classful mask. Anything that is a subnet or an aggregate shows the actual mask value.
Next is the next hop value; this is what we need to be able to reach via the IGP in order to actually use the prefix.
Next is the MED, which is a non-transitive attribute, only significant between my AS and a directly connected neighbor AS.
The LocPrf (local preference) is 100 by default.
The Weight value is 0 for anything that's not locally originated (locally originated routes get 32768).
Then the AS path,
followed by the origin code, where lowercase i for IGP is better than ? for incomplete.

Note: any time we redistribute a route into BGP it gets the origin code of incomplete. This makes it less preferred than a route that was configured with a network statement under the BGP process.
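The column walk-through above can be turned into a small Python parser. It assumes the fixed column positions of the header line in this output (prefix from column 3, next hop from column 20, Metric/LocPrf/Weight right-aligned and ending at columns 45, 52, and 59, path from column 61); other IOS versions may shift them, and lines where a long prefix wraps onto two rows are not handled. `parse_row` and `BgpRow` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BgpRow:
    valid: bool          # '*'
    best: bool           # '>'
    internal: bool       # 'i' (learned from an iBGP peer)
    network: str
    next_hop: str
    metric: int | None   # MED
    local_pref: int | None
    weight: int | None
    as_path: list[str]
    origin: str          # 'i', 'e', or '?'


def _num(field: str) -> int | None:
    field = field.strip()
    return int(field) if field else None


def parse_row(line: str) -> BgpRow:
    """Parse one single-line row of 'show ip bgp' using fixed column offsets."""
    status = line[0:3]
    path_and_origin = line[61:].split()
    return BgpRow(
        valid=status[0] == "*",
        best=status[1] == ">",
        internal=status[2] == "i",
        network=line[3:20].strip(),
        next_hop=line[20:36].strip(),
        metric=_num(line[36:46]),
        local_pref=_num(line[46:53]),
        weight=_num(line[53:60]),
        as_path=path_and_origin[:-1],
        origin=path_and_origin[-1],
    )
```

Parsing the `* i10.1.1.0/24` row from R5's table gives `best=False`, next hop 204.12.28.254, and AS path ["400"], matching the reading described above.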

R5#sh ip route bgp
B    222.222.222.0/24 [20/0] via 206.22.22.2, 00:18:11
     4.0.0.0/24 is subnetted, 1 subnets
B       4.4.4.0 [200/0] via 100.100.4.4, 00:29:45
     6.0.0.0/24 is subnetted, 1 subnets
B       6.6.6.0 [200/0] via 100.100.6.6, 00:27:30
     172.16.0.0/24 is subnetted, 1 subnets
B       172.16.2.0 [20/0] via 206.22.22.2, 00:36:07
R5#


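Here we can see that the router installs the routes learned from its eBGP neighbor and the iBGP routes whose next hops are the AS 100 loopbacks, but not the iBGP-learned routes with external next hops (204.12.28.254 and 206.33.33.1). The reason is an issue with next-hop reachability.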

Sunday, June 22, 2014

BGP 4-Byte ASNs

Lessons Learned: 

4-Byte BGP ASNs
0.0 - 65535.65535 notation (asdot)
-0.[0-65535] denotes the original 2-byte ASNs
Requires backwards compatibility with old code
-4-byte ASN support is negotiated during capability exchange
-"OLD" BGP speakers are sent 4-byte ASNs encoded as ASN 23456
-Real AS path is encoded with the optional transitive attributes AS4_AGGREGATOR and AS4_PATH
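The asdot notation is just the high and low 16 bits of the 32-bit AS number written as two values, so 1.5 is 1 × 65536 + 5 = 65541. A small Python sketch of the conversion both ways (helper names are hypothetical):

```python
def asdot_to_asplain(asn: str) -> int:
    """'1.5' -> 65541; plain values such as '100' (or asdot+ '0.100') map to themselves."""
    if "." not in asn:
        value = int(asn)  # already a plain number
    else:
        high, low = (int(part) for part in asn.split("."))
        if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
            raise ValueError(f"not a valid ASN: {asn!r}")
        value = high * 65536 + low
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a valid ASN: {asn!r}")
    return value


def asplain_to_asdot(asn: int) -> str:
    """65541 -> '1.5'; 2-byte ASNs are written without the '0.' prefix, as in 'router bgp 100'."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError(f"not a valid ASN: {asn}")
    high, low = divmod(asn, 65536)
    return f"{high}.{low}" if high else str(low)
```

Writing 2-byte ASNs as plain numbers matches how they appear in the configs here; the list above uses the 0.x form, which is the same value.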

Most devices running later versions of 12.4T code will support 4-byte ASNs.
The quickest clue is under the BGP process: when you configure the AS number, if you're not allowed to enter, for example,
router bgp 1.5
then the code you're running does not support the newer 4-byte AS numbers.

To view the 4-byte field you can simply use the #sh ip bgp command; the AS path will show the 4-byte number.

Note: From the perspective of any system that only supports 2-byte AS numbers, a 4-byte-AS peer identifies itself in its OPEN message as AS 23456 (hex 0x5BA0). Any 4-byte AS in the AS path is also encoded as AS 23456.

This substitution is done by the 4-byte-capable router when it sends updates to a router that does not support 4-byte ASNs, so the path might look like 4 23456 23456 23456. The real path, un-encoded, would look like 4 1.4 1.5 1.6, etc.

The key point is that on a device that only supports 2-byte ASNs, the neighbor statement for a 4-byte-AS peer must say that the remote AS is 23456.
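Here's a Python sketch of how the example path is presented to a 2-byte-only speaker: every ASN that doesn't fit in 16 bits becomes AS_TRANS (23456), while the real path would travel separately in AS4_PATH. The helper names are hypothetical; this models only the AS_PATH substitution, not the AS4_PATH merge on the receiving side.

```python
from __future__ import annotations

AS_TRANS = 23456  # 0x5BA0, the placeholder ASN seen by 2-byte-only speakers


def to_asplain(asn: str) -> int:
    """Accept '4', '0.100', or asdot '1.5' and return the 32-bit value."""
    if "." in asn:
        high, low = (int(part) for part in asn.split(".", 1))
        return high * 65536 + low
    return int(asn)


def two_byte_as_path(as_path: list[str]) -> list[int]:
    """AS_PATH as sent to an old speaker: every 4-byte ASN becomes AS_TRANS.

    (The real path travels separately in the AS4_PATH attribute.)
    """
    return [value if value <= 0xFFFF else AS_TRANS for value in map(to_asplain, as_path)]
```

`two_byte_as_path(["4", "1.4", "1.5", "1.6"])` gives [4, 23456, 23456, 23456], the path from the note above.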


Note: One thing to be aware of if you're doing eBGP multihop peering: we need to make sure not to introduce a "BGP race condition" where the neighbor's peering address is also learned as a route through BGP itself, i.e. peering with a neighbor that is advertising the prefix of that same peering address into BGP. This can cause the session to time out, because BGP cannot rely on itself for transport.

Establishing BGP Peerings, eBGP Multihop, BGP Neighbor Disable Connected Check

Lessons learned:

BGP Transport
BGP uses TCP port 179 for transport
-implies that BGP needs IGP first
BGP neighbor statement tells process to….
-listen for remote address via TCP 179
-initiate a session to remote address via TCP 179
-if collision, higher router-id becomes TCP client
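The collision rule in the last bullet compares router IDs as 32-bit numbers: the connection opened by the router with the higher ID survives, so that router ends up as the TCP client. A Python sketch, assuming a router ID of 9.9.9.9 for R9 (only R8's 8.8.8.8 is shown in this post); the helper name is hypothetical.

```python
import ipaddress


def surviving_initiator(router_id_a: str, router_id_b: str) -> str:
    """When both routers open a TCP session at once, keep the connection initiated
    by the higher BGP router ID (compared as a 32-bit number, not as text)."""
    a = int(ipaddress.IPv4Address(router_id_a))
    b = int(ipaddress.IPv4Address(router_id_b))
    return router_id_a if a > b else router_id_b
```

Comparing numerically matters: as text, "9.0.0.1" would sort above "10.0.0.1", but as a 32-bit value 10.0.0.1 is higher.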

If we cannot establish basic IP reachability then we cannot establish the TCP session.

Enabling basic BGP process between two routers.
TOPOLOGY:

Let's first set up a basic peering. From the topology I will use R9 and R8, with their AS numbers being the numbers of the routers.

Before we start the process, we can turn on a few debugs:
one to look at the actual BGP peering messages and the other a low-level debug for the IP transport.

The current underlying IGP is EIGRP. When we run the debug we want to make sure we filter out the EIGRP and only debug the BGP packets.
To do this we will need to create an extended access-list.

  access-list 100 deny eigrp any any
  access-list 100 permit ip any any

We can then turn on the debug on both R8 and R9, so our output shouldn't show any EIGRP hellos or updates.

R9#debug ip packet detail 100
IP packet debugging is on (detailed) for access list 100
R9#
-------------------------------------------------------------------------------------------------------------------------------

We can verify the debug is working by sending a ping to the neighbor and looking at the debug output:
R9#ping 192.168.89.8

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.89.8, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/8/16 ms
R9#
*Mar  1 02:11:19.183: IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB
*Mar  1 02:11:19.187: IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 100, sending
*Mar  1 02:11:19.187:     ICMP type=8, code=0
*Mar  1 02:11:19.191: IP: tableid=0, s=192.168.89.8 (Serial0/0), d=192.168.89.9 (Serial0/0), routed via RIB

-------------------------------------------------------------------------------------------------------------------------------
FYI, this debug generates a large amount of data; we would normally want to send this output to a syslog server or the logging buffer.
To limit the output we can also turn off timestamps:
EX:
R9(config)#no service timestamps

We will also turn on debug ip bgp.
This essentially turns on all BGP debugs for the IPv4 Unicast address family.

R9#debug ip bgp
BGP debugging is on for address family: IPv4 Unicast
R9#

On each router I've set up a basic BGP peering with the neighbor remote-as command and advertised each router's loopback network:
R8 = 8.8.8.0 /24
R9 = 9.9.9.0/24
===================

IP: s=192.168.89.8 (Serial0/0), d=192.168.89.9, len 44, rcvd 0

    TCP src=17950, dst=179, seq=488103526, ack=0, win=16384 SYN

Note: here in the debug we see the TCP source port, and the destination is port 179. We also see this is a SYN packet.

IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB
IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 40, sending
    TCP src=179, dst=17950, seq=0, ack=488103527, win=0 ACK RST

Notice that the final part of the output is an ACK RST. This is because I have not yet added the neighbor statement on this router (R9).
Now here's the output after I've added the neighbor statement:

BGP: 192.168.89.8 went from OpenConfirm to Established
%BGP-5-ADJCHANGE: neighbor 192.168.89.8 Up
IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB
IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 92, sending
    TCP src=179, dst=29773, seq=2220868845, ack=3315269002, win=16268 ACK PSH
IP: s=192.168.89.8 (Serial0/0), d=192.168.89.9, len 78, rcvd 0
    TCP src=29773, dst=179, seq=3315269002, ack=2220868897, win=16268 ACK PSH
IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB
IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 78, sending
    TCP src=179, dst=29773, seq=2220868897, ack=3315269040, win=16230 ACK PSH
IP: s=192.168.89.8 (Serial0/0), d=192.168.89.9, len 40, rcvd 0
    TCP src=29773, dst=179, seq=3315269040, ack=2220868935, win=16230 ACK

Once I configured the neighbor statement, the router started listening for the session. The key is that BGP is not dynamic: it cannot learn its peers automatically, so they must be specified manually.

Note: In a BGP network it's important to figure out the actual route between the two neighbors before we establish the peering. This scenario was between two neighbors only, but as the RST in the debugs shows, if there were multiple paths and the routers were not configured correctly, that could keep us from establishing the peering, especially if the routers are more than one hop away. The routing table then determines which path, and therefore which source address, the session uses, and so whether it will be accepted.
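The ACK RST seen before the neighbor statement existed can be summarized as a one-line rule for the listening side. Here's a simplified Python model. It ignores authentication, TTL checks, and collisions, and `answer_bgp_syn` is a hypothetical name.

```python
from __future__ import annotations


def answer_bgp_syn(source_ip: str, configured_neighbors: set[str]) -> str:
    """Simplified passive side of a BGP session on TCP 179.

    A router only accepts the connection if the SYN's source address matches
    one of its neighbor statements; otherwise it answers with a reset.
    """
    return "SYN-ACK" if source_ip in configured_neighbors else "RST"
```

Before R9 had `neighbor 192.168.89.8` configured, R8's SYN from 192.168.89.8 fell into the "RST" branch; afterwards it's accepted.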


-------------------------------  
eBGP Multihop

Based on the topology, let's assume now that R9 (BGP AS 9) wants to peer with R3 (BGP AS 3); this would be considered a multihop BGP peering.
When R9 configures its neighbor statement, let's say we point it at the Fa0/1 interface on R3. When R3 configures its neighbor statement, we point it at the S0/0 interface of R9.

Now we need to consider which local interface R9 will use when it generates the TCP packet going to R3.

We can obviously verify this by looking at the routing table for that route.

R3:
FastEthernet0/1            172.16.37.3

R9#sh ip route 172.16.37.3
Routing entry for 172.16.37.0/24
  Known via "eigrp 1", distance 90, metric 2172416, type internal
  Redistributing via eigrp 1
  Last update from 172.16.79.7 on Serial0/1, 03:20:17 ago -- packets to R3 leave via Serial0/1, so that interface's address is the source of the TCP session to R3.
  Routing Descriptor Blocks:
  * 172.16.79.7, from 172.16.79.7, 03:20:17 ago, via Serial0/1
      Route metric is 2172416, traffic share count is 1
      Total delay is 20100 microseconds, minimum bandwidth is 1544 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

Problem: on R3 we cannot guarantee the return path to R9 is going to be via the same interface.

R3#sh ip route 172.16.79.9
Routing entry for 172.16.79.0/24
  Known via "eigrp 1", distance 90, metric 2172416, type internal
  Redistributing via eigrp 1
  Last update from 172.16.37.7 on FastEthernet0/1, 02:28:42 ago
  Routing Descriptor Blocks:
  * 172.16.37.7, from 172.16.37.7, 02:28:42 ago, via FastEthernet0/1
      Route metric is 2172416, traffic share count is 1
      Total delay is 20100 microseconds, minimum bandwidth is 1544 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

In this case the return path is symmetric: R3 reaches R9 through 172.16.37.7 out FastEthernet0/1. This means that each BGP packet R3 sends to R9 is sourced from its FastEthernet0/1 address, 172.16.37.3.

If the source address does not match the neighbor statement configured on the other router, the TCP session will not work.

The way around this issue is the update-source command. In any situation where we're not peering over a direct connection between the neighbors, we should manually specify where the packets are sourced from, so the remote end agrees with it.

In most real-world designs the update source is a loopback interface.
The reason is that if I'm advertising my loopback into my IGP, it really doesn't matter which path the packets take, as long as a path is available.

How we configure this is simple.
Under the process we would simply say:
router bgp (AS#)
 neighbor x.x.x.x update-source serial0/0

Basically, in cases where the BGP peers have multiple paths between them, you might want to consider using the update-source command and sourcing from the loopback interface.
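Here's a simplified Python model of why update-source matters. The session source is either the update-source address or the outgoing interface's address, and a connection is only accepted if that source matches the receiver's neighbor statement. It ignores TTL (ebgp-multihop) and the connected check, which come up next. The loopback addresses 8.8.8.8 and 9.9.9.9 are assumptions based on the advertised 8.8.8.0/24 and 9.9.9.0/24 networks, and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BgpSpeaker:
    name: str
    egress_ip: str                       # address of the interface the IGP uses toward the peer
    neighbor_ip: str                     # address in this router's neighbor statement
    update_source_ip: str | None = None  # address of the update-source interface, if configured

    def source_ip(self) -> str:
        # Without update-source, the session is sourced from the outgoing interface
        return self.update_source_ip or self.egress_ip


def accepted(initiator: BgpSpeaker, receiver: BgpSpeaker) -> bool:
    """The receiver only accepts a connection whose source matches its neighbor statement."""
    return initiator.source_ip() == receiver.neighbor_ip


def session_can_form(a: BgpSpeaker, b: BgpSpeaker) -> bool:
    # Either side may initiate; one accepted direction is enough for the session to come up
    return accepted(a, b) or accepted(b, a)
```

With R8 and R9 pointing their neighbor statements at each other's loopbacks but no update-source, neither direction is accepted. Setting update-source on one side is enough for that side's connection to succeed, and setting it on both makes either router able to initiate.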

Let’s change the peering now and keep the same debugs running. This time we’ll add the update-source of each routers loopback interface.

The current configs look like this:
R9 –

router bgp 9
no synchronization
bgp log-neighbor-changes
network 9.9.9.0 mask 255.255.255.0
neighbor 192.168.89.8 remote-as 8
no auto-summary

R8 -
router bgp 8
no synchronization
bgp log-neighbor-changes
network 8.8.8.0 mask 255.255.255.0
neighbor 192.168.89.9 remote-as 9
no auto-summary


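Basically, we're adding these commands on R8 and R9 respectively:
Neighbor 192.168.89.9 update-source lo8

Neighbor 192.168.89.8 update-source lo9

Keep in mind that the receiving router matches the session's source address against its neighbor statements, so once both sides source from their loopbacks, each neighbor statement needs to point at the peer's loopback address (for example 8.8.8.8) for the session to be accepted.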

Also before we can establish the TCP session we need to verify reachability.

R9#ping 8.8.8.8

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/28 ms
R9#

Since the destination is an EBGP peer and its address is not reachable via a connected interface, by default neither BGP process is going to establish the session. Essentially, no one is going to send the initial TCP SYN.

We can correct this with the disable-connected-check feature; this is exactly what it is used for.
EX: R9(config-router)#neighbor 192.168.89.8 disable-connected-check

We can view all the details with the command
# sh ip bgp neighbor

R9#sh ip bgp neighbors
BGP neighbor is 192.168.89.8,  remote AS 8, external link -- here it shows the neighbor address and remote AS
  BGP version 4, remote router ID 8.8.8.8  --  We’re running version 4 and the remote router ID is 8.8.8.8
  BGP state = Established, up for 00:00:15
  Last read 00:00:15, last write 00:00:15, hold time is 180, keepalive interval is 60 seconds -- Note: the keepalive and hold timers do not have to match; the lower hold time is negotiated
  Neighbor capabilities:
    Route refresh: advertised and received(old & new)
    Address family IPv4 Unicast: advertised and received
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  2          2
    Notifications:          0          0
    Updates:                2          2
    Keepalives:            24         24
    Route Refresh:          0          0
    Total:                 28         28
  Default minimum time between advertisement runs is 30 seconds
For address family: IPv4 Unicast
  BGP table version 5, neighbor version 5/0
Output queue size : 0
  Index 1, Offset 0, Mask 0x2
  1 update-group member
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               1          1 (Consumes 52 bytes)
    Prefixes Total:                 1          1
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          1
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Bestpath from this peer:              1        n/a
    Total:                                1          0
  Number of NLRIs in the update sent: max 1, min 1

  Connections established 2; dropped 1
  Last reset 00:09:52, due to Peer closed the session
Connection state is ESTAB, I/O status: 1, unread input bytes: 0           
Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 1 -- This shows the outgoing TTL, 1 by default for an EBGP peer
Local host: 192.168.89.9, Local port: 42146  --  These next two lines will tell us who is the server and the client
Foreign host: 192.168.89.8, Foreign port: 179

NOTE: Remember that the server sources its traffic from TCP 179, so R9 is the client: its traffic comes from client port 42146 and goes to the server, 192.168.89.8, on port 179. Really this won’t matter unless there is filtering in place for port 179. The client is the side that initiated the session; we only look at the router-ID if both sides try to open the session at the same time.
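The client/server reading of the "Local host / Foreign host" lines can be sketched in Python. This is a simplified model that only looks at which end of the connection is using port 179; the function name is hypothetical.

```python
BGP_PORT = 179


def tcp_role(local_port: int, foreign_port: int) -> str:
    """Return this router's TCP role for a BGP session, based on which end
    of the connection is using the well-known BGP port 179."""
    if foreign_port == BGP_PORT and local_port != BGP_PORT:
        return "client"  # we connected *to* the peer's port 179
    if local_port == BGP_PORT and foreign_port != BGP_PORT:
        return "server"  # the peer connected to *our* port 179
    raise ValueError("exactly one side should be using port 179")


# R9 output above: local port 42146, foreign port 179.
print(tcp_role(42146, 179))  # client
# R9-to-R3 output: local port 179, foreign port 63387.
print(tcp_role(179, 63387))  # server
```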

Enqueued packets for retransmit: 0, input: 0  mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x1B28EC):
Timer          Starts    Wakeups            Next
Retrans             4          0             0x0
TimeWait            0          0             0x0
AckHold             3          1             0x0
SendWnd             0          0             0x0
KeepAlive           0          0             0x0
GiveUp              0          0             0x0
PmtuAger            0          0             0x0
DeadWait            0          0             0x0

iss: 4093217048  snduna: 4093217203  sndnxt: 4093217203     sndwnd:  16230
irs: 2835067880  rcvnxt: 2835068035  rcvwnd:      16230  delrcvwnd:    154

SRTT: 124 ms, RTTO: 1405 ms, RTV: 1281 ms, KRTT: 0 ms
minRTT: 8 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: active open, nagle
IP Precedence value : 6

Datagrams (max data segment is 1460 bytes):
Rcvd: 5 (out of order: 0), with data: 3, total data bytes: 154
Sent: 7 (retransm


EBGP – TTL
Example: Based on the lab image, if I wanted to peer with R3 (running AS 3) from R9, under the BGP process:
I would need to configure the neighbor statement for R3 and add the remote-as.
I would then also need to configure my update-source.
I would also configure ebgp-multihop on the neighbor statement for R3.

Ex:
# neighbor x.x.x.x ebgp-multihop # (default is 255)
Note: if we do not configure the ebgp-multihop command, the TTL defaults to 1.
With ebgp-multihop and no value (TTL 255), this essentially means that no matter how far away I am from the peer, I’ll establish the session.

Once again we can verify this with the #sh ip bgp neighbor command

R9#sh ip bgp neighbors
BGP neighbor is 3.3.3.3,  remote AS 3, external link
  BGP version 4, remote router ID 3.3.3.3
  BGP state = Established, up for 00:00:38
  Last read 00:00:38, last write 00:00:08, hold time is 180, keepalive interval is 60 seconds
  Neighbor capabilities:
    Route refresh: advertised and received(old & new)
    Address family IPv4 Unicast: advertised and received
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  1          1
    Notifications:          0          0
    Updates:                2          1
    Keepalives:             3          1
    Route Refresh:          0          0
    Total:                  6          3
  Default minimum time between advertisement runs is 30 seconds

 For address family: IPv4 Unicast
  BGP table version 4, neighbor version 4/0
 Output queue size : 0
  Index 1, Offset 0, Mask 0x2
  1 update-group member
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               3          1 (Consumes 52 bytes)
    Prefixes Total:                 3          1
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          1
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Total:                                0          0
  Number of NLRIs in the update sent: max 2, min 1

  Connections established 1; dropped 0
  Last reset never
  External BGP neighbor may be up to 255 hops away.
Connection state is ESTAB, I/O status: 1, unread input bytes: 0           
Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 255
Local host: 192.168.89.9, Local port: 179
Foreign host: 3.3.3.3, Foreign port: 63387

Note: By default the ebgp-multihop command is only going to control what the TTL is on your outgoing packets.


Sunday, June 15, 2014

EBGP Overview, BGP Peering Types.

Lessons learned:

BGP:
Open standards based
-RFC 4271, BGP 4
Classless path vector routing protocol
-Uses multiple attributes for routing decisions
-Supports VLSM and summarization
-Extensible
--IPv4 Multicast, IPv6, MPLS, etc.

BGP on Cisco’s Site:

BGP is an open standards protocol; it’s considered a path vector protocol. IGPs make decisions based on one value, the metric to reach a destination, choosing the lowest-cost path end to end. BGP, by contrast, was originally designed with policy in mind: attributes are carried on a per-route basis, and those attributes determine how we route to a destination.

BGP supports VLSM and summarization. With the size of the global table growing constantly, it’s important to be able to summarize prefix information.
Many sites offer routers that you can log in to and view the global BGP table, called “route-views” or “route-servers.”

Ex: route-server.ip.att.net – they’re basically just routers online that you can connect to. We can use these to check policies we’re trying to apply to our outbound advertisements as they pertain to the global internet.

EX: connect to the router and log in as rviews with a password of rviews…
rviews@route-server.ip.att.net>

You can log in and see roughly how large the global BGP table currently is.

You can also get a list of Route-servers here: http://www.netdigix.com/servers.html

Example of “sh ip bgp sum” from a route-views server:
route-views>sh ip bgp summary
BGP router identifier 128.223.51.103, local AS number 6447
BGP table version is 1192614054, main routing table version 1192614054

521135 network entries using 68789820 bytes of memory  -- this says we currently have 521135 entries

15209495 path entries using 790893740 bytes of memory -- this says there are 15209495 paths to reach these entries.

2508169/94266 BGP path/bestpath attribute entries using 421372392 bytes of memory
2166488 BGP AS-PATH entries using 86870842 bytes of memory
66927 BGP community entries using 5058824 bytes of memory
396 BGP extended community entries using 12842 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1372998460 total bytes of memory
Dampening enabled. 11709 history paths, 16038 dampened paths
BGP activity 878512/333647 prefixes, 50876541/35465181 paths, scan interval 60 secs

Neighbor        V          AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
4.69.184.193    4       3356 7842992   73921 1192614065    0    0 4w5d       490479

---------------------------------------------------- 

BGP is extensible; it can be used for more than just IP unicast routing.

----------------------------------------------------

BGP ASNs

Autonomous Systems (AS) per RFC 4271 -
-A set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to determine how to route packets within the AS, and using an inter-AS routing protocol to determine how to route packets to other ASes.

-ASNs are allocated by Internet Assigned Numbers Authority (IANA).

In general a routing policy will apply for the AS as a whole, where the policy for AS1 is different from AS2, etc.
Note: Routers inside an AS need IGP reachability first. In reality, BGP is not a routing protocol in the IGP sense; it’s an application that is mainly designed to do two things:
advertise an IP prefix, and advertise a next-hop value associated with that prefix.

The design issue: the next-hop value that BGP reports must then be resolved recursively through some other IGP routing protocol. BGP will rely on EIGRP, OSPF, etc. BGP is for destinations outside our network that we’re trying to reach.

The AS numbers themselves are assigned by IANA.

There’s recently been a change in the format that AS numbers use:
Original 2-byte field
-Values 0 – 65535
-Public ASNs 1 – 64511
-Private ASNs 64512 – 65534 (65535 is reserved), similar to RFC 1918 addressing
Currently 4-byte field
-RFC 4893 BGP Support for Four-octet AS Number Space
-IOS supports it as of 12.4(24)T
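A rough Python classifier for the ranges listed above. It is a sketch only: it treats 0 and 65535 as reserved and 23456 as AS_TRANS, and it ignores finer-grained reservations (documentation ranges, 4-byte private ranges); the function name is hypothetical.

```python
def classify_asn(asn: int) -> str:
    """Rough classification of an AS number using the ranges in these notes."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("ASNs are at most 32 bits")
    if asn > 65535:
        return "4-byte"
    if asn in (0, 65535):
        return "reserved"
    if asn == 23456:
        return "AS_TRANS"  # placeholder used for 4-byte ASNs toward old speakers
    if asn >= 64512:
        return "private 2-byte"
    return "public 2-byte"


for asn in (701, 64512, 65534, 65535, 23456, 65546):
    print(asn, classify_asn(asn))
```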

As of today, almost all 2-byte ASNs have been allocated.

4-Byte BGP ASNs

-0.0 – 65535.65535 notation
-0.(0-65535) denotes the original 2-byte ASNs
Requires backwards compatibility with old code
-4-byte ASN support is negotiated during capability exchange
-old BGP speakers are sent asdot numbers encoded as ASN “23456”. From the perspective of IOS versions that DO NOT support 4-byte AS numbers, every AS with a 4-byte value is represented as AS 23456. This doesn’t mean the information is lost:

-the real AS path is encoded in the optional transitive attributes AS4_AGGREGATOR and AS4_PATH.
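The dotted notation is just the 32-bit value split into a high and a low 16-bit half. A minimal Python sketch of the conversion, plus the AS 23456 substitution an old speaker would see (function names are hypothetical; the real 4-byte values would still travel in AS4_PATH, which this sketch does not model):

```python
AS_TRANS = 23456  # stand-in ASN that old 2-byte speakers see


def asplain_to_asdot(asn: int) -> str:
    """65546 -> '1.10', 701 -> '0.701' (high 16 bits . low 16 bits)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("ASN out of 32-bit range")
    high, low = divmod(asn, 65536)
    return f"{high}.{low}"


def asdot_to_asplain(asdot: str) -> int:
    """'1.10' -> 65546."""
    high, low = (int(part) for part in asdot.split("."))
    if not (0 <= high <= 65535 and 0 <= low <= 65535):
        raise ValueError("each half must fit in 16 bits")
    return high * 65536 + low


def as_path_for_old_speaker(as_path: list) -> list:
    """Replace every 4-byte ASN with AS_TRANS, as seen by a 2-byte-only speaker."""
    return [asn if asn <= 65535 else AS_TRANS for asn in as_path]


print(asplain_to_asdot(65546))                # 1.10
print(asdot_to_asplain("1.10"))               # 65546
print(as_path_for_old_speaker([701, 65546]))  # [701, 23456]
```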

Establishing BGP Peerings:
Like an IGP, the first step in BGP is to find neighbors to exchange information with.

The overall logic BGP uses for establishing peerings, exchanging updates, routing the traffic, etc. is the same as we use in our IGPs.
For example, in EIGRP and OSPF our first step is to figure out who our neighbors are on the connected links that we want to run our protocols on.
Once we find the neighbors we go through an adjacency negotiation where we define attributes, Area #, etc.
Once neighbors come up we exchange information and then we can do the path selection.

Same type logic in BGP –
First step is how do we establish the peering?

Unlike IGP….
-BGP does not have its own transport. EX: OSPF uses IP protocol 89 and EIGRP uses 88, while BGP runs on top of TCP. This implies that BGP neighbors must have IGP reachability before they can form a BGP peering. Since BGP is a standard TCP application, the normal client/server roles of TCP apply.

-BGP has different types of neighbors (iBGP, EBGP, route reflectors, etc.); these control how updates are processed and how best paths are selected.
-BGP neighbors are not discovered. Unlike IGPs, which use multicast, neighbors are not dynamically discovered; peerings are based on unicast neighbor statements under the process.

-BGP neighbors do not have to be connected, because TCP is the transport protocol.

BGP Transport:
BGP uses TCP port 179 for transport
-implies that BGP needs an IGP first.
The BGP neighbor statement tells the process to….
-listen for the remote address via TCP 179
-initiate a session to the remote address via TCP 179
-if there is a collision, the higher router-ID becomes the TCP client. This can happen if both routers try to establish the session at the same time.
 Normally the client initiates the session to port 179.
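The collision rule can be sketched in Python. This assumes a simplified model where the two router-IDs are compared as 32-bit numbers and the session opened by the higher one is kept, so that router ends up as the TCP client; the function name is hypothetical. Note the comparison is numeric, not a string comparison.

```python
import ipaddress


def surviving_client(router_id_a: str, router_id_b: str) -> str:
    """On a connection collision, keep the session initiated by the router
    with the higher router-ID; that router ends up as the TCP client."""
    a = int(ipaddress.IPv4Address(router_id_a))
    b = int(ipaddress.IPv4Address(router_id_b))
    if a == b:
        raise ValueError("router-IDs must be unique")
    return router_id_a if a > b else router_id_b


print(surviving_client("8.8.8.8", "9.9.9.9"))   # 9.9.9.9
# Numeric comparison: 10.0.0.1 beats 9.9.9.9 even though "9" > "1" as text.
print(surviving_client("10.0.0.1", "9.9.9.9"))  # 10.0.0.1
```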

Handshake Example:
R1 -- > < --R2
1)      R1 sends a TCP SYN packet with a random source port and a destination port of 179 (R2 only accepts it if it is configured with a neighbor statement for R1).
2)      R2 replies with the second part of the handshake, a TCP SYN-ACK sourced from port 179 and destined to R1’s random port, saying “I also want to start a session.”
3)      R1 replies with a TCP ACK, and the session is fully open.

Key point: R1 (the client) will always be sending traffic toward port 179, and R2 (the server) will always be sending traffic from port 179.
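The handshake above, sketched in Python as a list of (sender, receiver, source port, destination port, flags) tuples. The ephemeral-port range 1024-65535 is an assumption for illustration; in the output earlier, IOS happened to pick 42146. The function name is hypothetical.

```python
import random
from typing import Optional

BGP_PORT = 179


def bgp_handshake(client: str, server: str, client_port: Optional[int] = None) -> list:
    """Return the TCP 3-way handshake that opens a BGP session."""
    if client_port is None:
        # The client picks a random source port (range assumed here).
        client_port = random.randint(1024, 65535)
    return [
        (client, server, client_port, BGP_PORT, "SYN"),
        (server, client, BGP_PORT, client_port, "SYN-ACK"),
        (client, server, client_port, BGP_PORT, "ACK"),
    ]


for segment in bgp_handshake("R1", "R2", client_port=42146):
    print(segment)
```

Every segment R1 sends is addressed to port 179, and every segment R2 sends comes from port 179, which is the key point above.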

BGP Peering types:
External BGP (EBGP) Peers
-Neighbors outside my Autonomous System
Internal BGP (iBGP) Peers
-Neighbors inside my AS
Update and path selection rules change depending on what type of peer a route is being sent to/received from.

EBGP Peering Rules:
EBGP packets default to TTL 1
-can be modified if neighbors are multiple hops away.
Ex:
--#neighbor (address) ebgp-multihop (TTL)
--#neighbor (address) ttl-security hops (hops)  - Common today, used to protect against remote TCP reset attacks.

Note: these commands are mutually exclusive; you would use one or the other, not both.
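A minimal Python model of the default TTLs (EBGP 1, iBGP 255) and of why a TTL-1 packet cannot reach a peer several hops away. Assumptions: "links" counts the router-to-router links the packet crosses (1 for a directly connected neighbor), each router in the middle decrements the TTL and drops rather than forward with TTL 0, and R3's distance of 3 links is made up. ttl-security is not modeled.

```python
from typing import Optional

DEFAULT_TTL = {"ebgp": 1, "ibgp": 255}


def peer_type(local_as: int, remote_as: int) -> str:
    """Same AS -> iBGP, different AS -> EBGP."""
    return "ibgp" if local_as == remote_as else "ebgp"


def outgoing_ttl(kind: str, ebgp_multihop: Optional[int] = None) -> int:
    """TTL on outgoing BGP packets; ebgp_multihop models 'neighbor x.x.x.x
    ebgp-multihop <ttl>' (pass 255 for the command without a value)."""
    if kind == "ebgp" and ebgp_multihop is not None:
        return ebgp_multihop
    return DEFAULT_TTL[kind]


def reaches_peer(ttl: int, links: int) -> bool:
    """A packet with TTL t can cross at most t links."""
    return ttl >= links


# R9 (AS 9) to R8 (AS 8), directly connected: one link.
print(reaches_peer(outgoing_ttl(peer_type(9, 8)), 1))            # True
# R9 to R3 (AS 3), assumed to be 3 links away.
print(reaches_peer(outgoing_ttl(peer_type(9, 3)), 3))            # False
print(reaches_peer(outgoing_ttl("ebgp", ebgp_multihop=255), 3))  # True
```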

Non multi-hop peers must be directly connected by default.
-can be modified if connected neighbors peer via loopbacks
-# neighbor disable-connected-check - This disables the connected check; normally it is used when peering between loopback addresses.

Note: by default, when a router goes to establish an EBGP peering, it looks in the routing table for the neighbor’s address. If it doesn’t find a directly connected route, it won’t try to establish the TCP 3-way handshake, so no “open” message is ever sent.
Neighbor disable-connected-check will disable this behavior.
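The connected check can be sketched in Python with the standard ipaddress module. This is a simplified model for a single-hop EBGP neighbor; the function name is hypothetical, and the subnet masks assumed for R9's connected networks are guesses, not taken from the lab.

```python
import ipaddress


def passes_connected_check(neighbor: str, connected: list,
                           disable_connected_check: bool = False) -> bool:
    """Single-hop EBGP: only try to open the session if the neighbor address
    falls inside a directly connected subnet, unless the check is disabled."""
    if disable_connected_check:
        return True
    addr = ipaddress.ip_address(neighbor)
    return any(addr in ipaddress.ip_network(subnet) for subnet in connected)


# Assumed connected subnets on R9 (the masks are guesses).
r9_connected = ["192.168.89.0/24", "9.9.9.0/24"]

print(passes_connected_check("192.168.89.8", r9_connected))  # True
print(passes_connected_check("8.8.8.8", r9_connected))       # False
print(passes_connected_check("8.8.8.8", r9_connected,
                             disable_connected_check=True))  # True
```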

Loop prevention via AS-Path
-Local ASN is “prepended” to outbound updates
-inbound updates containing local ASN are discarded
-Can be modified with # neighbor allowas-in command.  

Every time an update is sent out to EBGP peers, we take our local AS number and prepend it to the AS path attribute inside that update.
The AS path tracks which ASes this update went through, from the originator to our local AS. This can also be seen by using the “route-views” servers.

Ex:
route-views>sh ip bgp        

   Network          Next Hop            Metric LocPrf Weight Path
*  1.0.0.0/24       157.130.10.233                         0 701 6453 15169 i

This output basically tells us that this route was originated by AS 15169, then passed through AS 6453 and then AS 701. So the neighboring AS we received it from is AS 701.

The leftmost number is the AS we are learning the route from; the rightmost number is the AS in which the prefix originated.
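Reading the Path column can be sketched in Python. This assumes the input is just the AS path plus origin code (without the Weight column) and ignores AS_SETs in braces; the function name is hypothetical.

```python
def parse_as_path(path: str) -> tuple:
    """Split a Path column such as '701 6453 15169 i' into
    (neighbor AS, origin AS, origin code)."""
    *asns, origin_code = path.split()
    if not asns:
        raise ValueError("locally originated routes have an empty AS path")
    return int(asns[0]), int(asns[-1]), origin_code


print(parse_as_path("701 6453 15169 i"))  # (701, 15169, 'i')
```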

Key point: whenever we send an update outbound, we take our own AS number and put it as the first number in the AS path.
If for some reason we see our own AS number in the path of an INBOUND update, we automatically filter that update out. Basic loop prevention logic.

We can modify this with the # neighbor allowas-in command, for cases where sites use the same AS number and are separated by a different AS in the middle.
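A minimal Python sketch of AS-path prepending and the inbound loop check. The allowas_in count loosely mirrors the optional occurrence count IOS accepts on allowas-in; 0 here means the command is not configured. The ASNs and function names are hypothetical.

```python
def prepend(local_as: int, as_path: list) -> list:
    """Outbound EBGP update: our ASN becomes the leftmost entry."""
    return [local_as] + as_path


def accept_inbound(local_as: int, as_path: list, allowas_in: int = 0) -> bool:
    """Inbound EBGP update: discard it if our own ASN shows up in the path
    more often than allowed (0 models allowas-in not being configured)."""
    return as_path.count(local_as) <= allowas_in


# Hypothetical: two customer sites both use AS 100, joined by provider AS 200.
path = prepend(100, [])    # site A sends [100]
path = prepend(200, path)  # provider sends [200, 100]
print(accept_inbound(100, path))                # False: loop prevention drops it
print(accept_inbound(100, path, allowas_in=1))  # True: allowas-in lets it through
```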


EBGP Peering Rules:
Next-hop processing  
-Outbound EBGP updates have local update-source for neighbor set as next-hop.
-EX: if update source is loopback 0, next-hop is loopback0

Can be modified with route-map action “set ip next-hop” but typically shouldn’t
-ex 3rd party next hop.
-------------------

When we send updates outbound, whatever the local address for that peering is becomes the next-hop value of the route. Ex:

R1 and R2
Both routers peer via their loopback interfaces
R1
Loopback IP 1.1.1.1
R2
Loopback IP 2.2.2.2
R1 advertises 10.0.0.0 /24 --> when this update is advertised to R2 over the EBGP session, R2 will say the prefix 10.0.0.0 /24 is reachable via the next hop of 1.1.1.1

This implies that R2 will need an additional step in the route recursion to figure out what is the local connected interface that R2 will use in order to reach the destination 1.1.1.1
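The extra recursion step can be sketched in Python: look up the BGP next hop, then find the most specific IGP route that covers it. The tables and the interface name are hypothetical; if the IGP has no route to the next hop, the BGP prefix is unusable.

```python
import ipaddress
from typing import Optional


def longest_match(table: dict, address: str) -> Optional[str]:
    """Return the key of the most specific prefix in `table` containing `address`."""
    addr = ipaddress.ip_address(address)
    matches = [(ipaddress.ip_network(p), p) for p in table]
    matches = [(net, p) for net, p in matches if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def resolve(prefix: str, bgp_table: dict, igp_table: dict) -> Optional[str]:
    """Recursive lookup: BGP prefix -> BGP next hop -> IGP route -> interface.
    Returns None when the next hop is not reachable through the IGP."""
    next_hop = bgp_table[prefix]
    igp_prefix = longest_match(igp_table, next_hop)
    return igp_table[igp_prefix] if igp_prefix else None


# Hypothetical tables on R2; the interface name is made up.
r2_bgp = {"10.0.0.0/24": "1.1.1.1"}
r2_igp = {"1.1.1.1/32": "Serial0/0"}

print(resolve("10.0.0.0/24", r2_bgp, r2_igp))  # Serial0/0
print(resolve("10.0.0.0/24", r2_bgp, {}))      # None
```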

If we want to modify this, we can use the route-map option, but typically you would not use this in a normal design.
This is normally called a third-party next hop: the local router sends the update but tells you to use some other device in the data plane in order to get there.

Example: 

R1 sends an update to R2 for prefix 10.0.0.0/24, and instead of using the normal next-hop value (whatever the update source is from R1 to R2), we can tell R2 to use R3 (3.3.3.3).

This means the control plane for BGP is going to be between R1 and R2. But then the actual data plane (actual traffic forwarding) would be through some other device.

One of the flexible features of BGP is that the control plane is not actually tied to the data plane.

iBGP peering Rules.

iBGP Peering packets default to TTL 255
-implies neighbors do not have to be connected as long as IGP reachability exists

Loop prevention via route filtering
- iBGP learned routes cannot be advertised on to another iBGP neighbor.
-implies need for either….
-Fully meshed iBGP peerings – most efficient for selecting correct path
-route reflection
-Confederation

--------------------------------------------------

Key note: iBGP packets are sent with a TTL of 255, which means the neighbors do not have to be directly connected. As long as there’s IGP reachability in the internal network, we can establish the iBGP peering and ultimately advertise the prefixes.
Based on the next-hop processing rules of iBGP, the control-plane messages again do not have to follow the actual data-plane forwarding path.

The loop prevention for iBGP uses a very simple concept: if you learn a route from an iBGP neighbor, DON’T advertise it to another iBGP neighbor.
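A minimal Python sketch of this iBGP rule. It models only the "no iBGP-learned route to another iBGP peer" check; route reflection, confederations, and other advertisement rules are deliberately left out, and the peer names are hypothetical.

```python
def advertise_to(learned_from: str, peers: dict) -> list:
    """learned_from is 'ibgp', 'ebgp' or 'local'; peers maps name -> 'ibgp'/'ebgp'.
    Routes learned from an iBGP peer are never sent to other iBGP peers."""
    return [name for name, kind in peers.items()
            if not (learned_from == "ibgp" and kind == "ibgp")]


# Hypothetical peers of one edge router.
peers = {"R2": "ibgp", "R3": "ibgp", "ISP": "ebgp"}

print(advertise_to("ibgp", peers))   # ['ISP']
print(advertise_to("local", peers))  # ['R2', 'R3', 'ISP']
```

Because an iBGP-learned route stops after one iBGP hop, every iBGP speaker must hear it directly from the edge router, which is why a full mesh (or route reflection or confederations) is needed.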

Next-hop processing:
-outbound iBGP updates do not modify the next-hop attribute, regardless of iBGP peer type:
--iBGP peer
--Route reflectors client peer
--Route reflectors non-client peer
--confederation EBGP peer

Can be modified with # neighbor x.x.x.x next-hop-self or a route-map action set ip next-hop.
By default, this basically means the original entry point for the route is maintained throughout all of the updates in the iBGP network.

The next-hop value will always be the value that came from your EBGP neighbor to begin with, unless we use the next-hop-self command.
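The next-hop rules can be summarized in a small Python model: EBGP updates carry the local update source, while iBGP updates keep the received next hop unless next-hop-self is configured. This is a simplified sketch (third-party next hops and route-map overrides are not modeled). 204.12.28.254 is the provider next hop from the earlier output; the edge router's update source 100.100.7.7 is hypothetical.

```python
def next_hop_for(peer_type: str, received_next_hop: str, update_source: str,
                 next_hop_self: bool = False) -> str:
    """Next-hop attribute placed in an outbound update (simplified model)."""
    if peer_type == "ebgp" or next_hop_self:
        return update_source
    return received_next_hop  # iBGP: keep the EBGP-learned next hop


provider_nh = "204.12.28.254"
edge_loopback = "100.100.7.7"  # hypothetical update source on the edge router

print(next_hop_for("ibgp", provider_nh, edge_loopback))                      # 204.12.28.254
print(next_hop_for("ibgp", provider_nh, edge_loopback, next_hop_self=True))  # 100.100.7.7
```

With the default behavior, every iBGP router must be able to reach 204.12.28.254 through the IGP; with next-hop-self, it only needs to reach the edge router's update source.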