Saturday, September 20, 2014

GLBP Overview

GLBP

Lessons Learned:


The advantage of using GLBP over HSRP and VRRP is that multiple routers can be active at the same time, which lets us load-balance traffic leaving the network.


GLBP
-Cisco proprietary protocol
-Extends HSRP functionality
-Adds load balancing function to FHRP
-Every physical gateway may now be active
--Called Active Virtual Forwarders (AVF)
--Each AVF assigned a virtual MAC address
-One gateway responds to ARP requests for GLBP IP
--Called Active Virtual Gateway (AVG)
-ARP response uses virtual MACs of AVFs to implement load balancing.


GLBP extends the HSRP functionality by decoupling two roles: the router in charge of the ARP responses and the routers that actually forward the traffic sent to the virtual MAC addresses.
In HSRP and VRRP we only have a single role, the active router.
GLBP has two separate roles, the AVG and the AVF.

The AVFs are the routers that have the virtual MAC addresses assigned and actually forward the packets out of the network. The key is the additional role, the AVG: it is in charge of responding to the ARP requests from the end hosts, and it responds with the virtual MAC address of one of the AVFs.

The LB is based on the ARP replies: each end host sends its traffic to whichever AVF virtual MAC it received.

GLBP Implementation:

AVG elected based on priority
-one router is the AVG; the others back it up (standby / listen)
-No AVG preemption by default
--enable using glbp XXX preempt

To enable weighted load balancing
-command “glbp XXX load-balancing weighted”
-Assign weights with “glbp XXX weighting Y”
-Weight can be adjusted based on object tracking

Note: There will be one AVG, and all the other routers will be backing up the AVG. The show output below reports round-robin as the load-balancing method until weighted load balancing is configured.


GLBP CFG :

Topology:





glbp 1 ip 192.168.1.254

Once the routers go through their initial convergence, they discover which router is the AVG and which are the AVFs:

R2#
*Mar  1 00:08:14.251: %GLBP-6-STATECHANGE: FastEthernet0/0 Grp 1 state Standby -> Active
R2#
*Mar  1 00:08:24.251: %GLBP-6-FWDSTATECHANGE: FastEthernet0/0 Grp 1 Fwd 1 state Listen -> Active

Verify:

From the output we can see that preemption is disabled, the active router is R2, and the priority is 100. All of this refers to the AVG.
Further down the output we can see that there are 3 forwarders, along with the group members and their MAC addresses.

Technically it doesn’t matter which router is the AVG as long as the AVFs are correct, since those are the devices that actually forward the traffic onto the link.

Note: the AVG is basically the control plane for the GLBP feature. All forwarding takes place at the AVFs.


R2#sh glbp
FastEthernet0/0 - Group 1
  State is Active
    2 state changes, last state change 00:04:15
  Virtual IP address is 192.168.1.254
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 2.692 secs
  Redirect time 600 sec, forwarder time-out 14400 sec
  Preemption disabled
  Active is local
  Standby is 192.168.1.3, priority 100 (expires in 7.076 sec)
  Priority 100 (default)
  Weighting 100 (default 100), thresholds: lower 1, upper 100
  Load balancing: round-robin
  Group members:
    c000.1f40.0000 (192.168.1.1)
    c002.1f40.0000 (192.168.1.3)
    cc01.1a94.0000 (192.168.1.2) local
  There are 3 forwarders (1 active)

R1#sh glbp
FastEthernet0/0 - Group 1
  State is Listen
  Virtual IP address is 192.168.1.254
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 2.876 secs
  Redirect time 600 sec, forwarder time-out 14400 sec
  Preemption disabled
  Active is 192.168.1.2, priority 100 (expires in 9.692 sec)
  Standby is 192.168.1.3, priority 100 (expires in 7.036 sec)
  Priority 100 (default)
  Weighting 100 (default 100), thresholds: lower 1, upper 100
  Load balancing: round-robin
  Group members:
    c000.1f40.0000 (192.168.1.1) local
    c002.1f40.0000 (192.168.1.3)
    cc01.1a94.0000 (192.168.1.2)
  There are 3 forwarders (1 active)

R3#sh glbp
FastEthernet0/0 - Group 1
  State is Standby
    1 state change, last state change 00:05:47
  Virtual IP address is 192.168.1.254
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 0.096 secs
  Redirect time 600 sec, forwarder time-out 14400 sec
  Preemption disabled
  Active is 192.168.1.2, priority 100 (expires in 9.764 sec)
  Standby is local
  Priority 100 (default)
  Weighting 100 (default 100), thresholds: lower 1, upper 100
  Load balancing: round-robin
  Group members:
    c000.1f40.0000 (192.168.1.1)
    c002.1f40.0000 (192.168.1.3) local
    cc01.1a94.0000 (192.168.1.2)
  There are 3 forwarders (1 active)


Now to configure the weighting of traffic on the AVG.
The weighting essentially controls how the AVG distributes the forwarders in its ARP replies.

On the router that is elected the AVG (in this case R2):



R2(config-if)#int fa0/0

R2(config-if)#glbp 1 weighting 10

R2(config-if)#glbp 1 load-balancing weighted


Basically we’ve configured a weighting value (weighting can get more complex if we enable tracking and base it on how many upstream neighbors we have).
We have now configured a value and told GLBP to load-balance based on that value. On the other two forwarders I will configure a weight of 5.

The verify output shows that we still have one active forwarder locally and the original three forwarders.

R2#sh glbp
FastEthernet0/0 - Group 1
  State is Active
    2 state changes, last state change 00:32:37
  Virtual IP address is 192.168.1.254
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 1.356 secs
  Redirect time 600 sec, forwarder time-out 14400 sec
  Preemption disabled
  Active is local
  Standby is 192.168.1.3, priority 100 (expires in 8.660 sec)
  Priority 100 (default)
  Weighting 10 (configured 10), thresholds: lower 1, upper 10
  Load balancing: weighted
  Group members:
    c000.1f40.0000 (192.168.1.1)
    c002.1f40.0000 (192.168.1.3)
    cc01.1a94.0000 (192.168.1.2) local
  There are 3 forwarders (1 active)
  Forwarder 1
    State is Active
      1 state change, last state change 00:32:27
    MAC address is 0007.b400.0101 (default)
    Owner ID is cc01.1a94.0000
    Redirection enabled
    Preemption enabled, min delay 30 sec
    Active is local, weighting 10
    Arp replies sent: 1


We now need to configure load balancing on all the routers to use the configured weight.
EX: glbp 1 load-balancing weighted


R3#sh glbp
FastEthernet0/0 - Group 1
  State is Standby
    1 state change, last state change 00:35:55
  Virtual IP address is 192.168.1.254
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 2.320 secs
  Redirect time 600 sec, forwarder time-out 14400 sec
  Preemption disabled
  Active is 192.168.1.2, priority 100 (expires in 8.568 sec)
  Standby is local
  Priority 100 (default)
  Weighting 5 (configured 5), thresholds: lower 1, upper 5
  Load balancing: weighted
  Group members:
    c000.1f40.0000 (192.168.1.1)
    c002.1f40.0000 (192.168.1.3) local
    cc01.1a94.0000 (192.168.1.2)
  There are 3 forwarders (1 active)
  Forwarder 1
    State is Listen
    MAC address is 0007.b400.0101 (learnt)
    Owner ID is cc01.1a94.0000
    Time to live: 14398.560 sec (maximum 14400 sec)
    Preemption enabled, min delay 30 sec
    Active is 192.168.1.2 (primary), weighting 10 (expires in 7.424 sec)
  Forwarder 2
    State is Listen
    MAC address is 0007.b400.0102 (learnt)
    Owner ID is c000.1f40.0000
    Time to live: 14397.492 sec (maximum 14400 sec)
    Preemption enabled, min delay 30 sec
    Active is 192.168.1.1 (primary), weighting 5 (expires in 7.492 sec)
  Forwarder 3
    State is Active
      1 state change, last state change 00:36:00
    MAC address is 0007.b400.0103 (default)
    Owner ID is c002.1f40.0000
    Preemption enabled, min delay 30 sec
    Active is local, weighting 5
R3#

Note: R3 is the active router for forwarder 3; also notice that forwarder preemption is enabled.
Note: remember the LB is based on ARP request and reply. The LB is not per destination like CEF; it is per ARP request.
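
To make the per-ARP-request behavior concrete, here is a minimal Python sketch of an AVG handing out forwarder virtual MACs in proportion to weight. The virtual MACs and the weights (10, 5, 5) come from the show glbp output above. The selection algorithm (a smooth weighted round-robin) is an assumption chosen for illustration, not the documented IOS implementation.

```python
from collections import Counter


def weighted_arp_replies(weights, num_requests):
    """Return the virtual MAC answered for each ARP request, in order.

    Smooth weighted round-robin: every forwarder earns credit equal to
    its weight on each request; the forwarder with the most credit
    answers and pays back the sum of all weights.
    """
    credit = {mac: 0 for mac in weights}
    total = sum(weights.values())
    replies = []
    for _ in range(num_requests):
        for mac, weight in weights.items():
            credit[mac] += weight
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        replies.append(chosen)
    return replies


# Virtual MACs and weights as reported by "sh glbp" on R3.
weights = {
    "0007.b400.0101": 10,  # forwarder 1, owned by R2
    "0007.b400.0102": 5,   # forwarder 2, owned by R1
    "0007.b400.0103": 5,   # forwarder 3, owned by R3
}

replies = weighted_arp_replies(weights, 20)
share = Counter(replies)
for mac, count in share.items():
    print(mac, count)
```

With these weights, half of the hosts that ARP for the gateway end up using R2 and a quarter each use R1 and R3. A host that has already cached a virtual MAC keeps using it, which is why the balance is per host and not per destination.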

If we wanted to use tracking, we can configure the weighting and tracking together.

EX: glbp 1 weighting track 10 decrement (value to decrement) 

Sunday, August 17, 2014

MPLS Overview, MPLS Label Distribution Protocol (LDP)

Multiprotocol Label Switching
-Open standard per RFC 3031
-Previously Cisco proprietary tag switching

MPLS Overview – Multiprotocol
-can transport different payloads
-Layer 2
--Ethernet, HDLC, PPP, Frame Relay and ATM
-Layer 3
--IPv4 & IPv6

MPLS is made up of two portions: multiprotocol and label switching.
It is multiprotocol because the encapsulation can transport different types of payloads,
including both layer 2 and layer 3 (there are also extensions to transport IPv6 over MPLS, called 6PE).

MPLS Label Switching
-Traffic is switched between interfaces based on locally significant label values
-Similar to how a Frame Relay or ATM switch uses input/output DLCIs and VPI/VCIs

MPLS Label Format
-4 byte header used to switch packets
-RFC 3032 – MPLS Label Stack Encoding
--20 bit label = locally significant to router
--3 bit EXP = Class of Service (QoS)
--1 bit S = Defines last label in the label stack (used by provider edge router)
--8 bit TTL = Time to live

Sits between the layer 2 and layer 3 encap.
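
As a quick illustration of the 4-byte layout listed above (20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL), here is a small Python sketch that packs and unpacks one label stack entry. The function names are made up for this example, and the label value 16 is just a sample.

```python
import struct


def encode_label_entry(label, exp, bottom, ttl):
    """Pack one MPLS label stack entry into 4 bytes (network byte order)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= exp < 2**3:
        raise ValueError("EXP must fit in 3 bits")
    if not 0 <= ttl < 2**8:
        raise ValueError("TTL must fit in 8 bits")
    word = (label << 12) | (exp << 9) | (int(bool(bottom)) << 8) | ttl
    return struct.pack("!I", word)


def decode_label_entry(data):
    """Unpack 4 bytes into the label, EXP, S bit, and TTL fields."""
    (word,) = struct.unpack("!I", data)
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


entry = encode_label_entry(label=16, exp=0, bottom=True, ttl=255)
print(entry.hex())
print(decode_label_entry(entry))
```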


How Labels work:
-MPLS Labels are bound to FECs
--Forwarding Equivalence Class
--IPv4 prefix for the CCIE purposes
-Router uses MPLS LFIB to switch traffic
--Essentially CEF table + Label
-Switching Logic
--If traffic comes in IF1 with label X send it out IF2 with Label Y

Each label value is associated with an IPv4 prefix in the network; the binding of the label to the prefix is known as the FEC.
The router uses that binding to switch labeled traffic out of its CEF table, which with labels added is known as the LFIB (CEF + label value).

In IPv4 we determine the outgoing interface based on the destination address.
In MPLS the outgoing interface is based on the incoming label value. The routers in the MPLS transit path need to agree on what the locally significant labels are. The actual label values can be reused over and over because they are only relevant to the local router.

MPLS Device Roles:
PE / LER
-Provider Edge Router / Label Edge Router - exchanges IPv4 routes between provider and customer
-Connects to Customer Edge (CE) devices
-Receives unlabeled packets and adds label
--AKA label push or label imposition -- adds labels to normal IP packet
-In L3VPN performs both IP routing & MPLS lookups

Once in the provider network, traffic is handled by:

P / LSR devices
-Provider Router / Label Switch Router (not attached to any customers)
-Connects to PEs and/or other P routers
-Switches traffic based only on MPLS label

Key:
Provider routers only switch traffic based on MPLS label.
Design: the advantage of running MPLS, from the service provider’s perspective, is that it reduces the load on the control plane of the SP core.

Label Push / Pop / Swap:
PE and P routers perform three major operations

Label push – Done by the provider edge: adds a label to an inbound packet
--AKA add label to incoming packet
--AKA label imposition

Label Swap - Normally done by the P routers in the core of the SP network, where we receive an inbound packet that already has a label assigned
-Replace the label on an incoming packet with a new label as it’s sent out another interface.

Label POP - Where the packet exits the network: removes the label and sends the packet to the customer.
-Remove a label from an outgoing packet
-AKA label disposition
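
The three operations can be sketched in Python as actions on a label stack. The router names (PE1, P1, P2, PE2), the label values, and the LFIB contents below are all hypothetical, picked only to show a packet being pushed, swapped twice, and popped.

```python
def push(stack, label):
    """Label imposition: add a label on top of the stack."""
    return [label] + stack


def swap(stack, label):
    """Replace the top label with the label the next hop expects."""
    return [label] + stack[1:]


def pop(stack):
    """Label disposition: remove the top label."""
    return stack[1:]


# Hypothetical LFIBs, keyed by incoming label: (action, outgoing label).
lfib = {
    "P1": {100: ("swap", 200)},
    "P2": {200: ("swap", 300)},
    "PE2": {300: ("pop", None)},
}


def forward(ingress_label, path):
    """Trace one packet from the ingress PE across the given routers."""
    stack = push([], ingress_label)
    trace = [("PE1", "push", list(stack))]
    for router in path:
        action, out_label = lfib[router][stack[0]]
        stack = swap(stack, out_label) if action == "swap" else pop(stack)
        trace.append((router, action, list(stack)))
    return trace


trace = forward(100, ["P1", "P2", "PE2"])
for hop in trace:
    print(hop)
```

Note that each router only looks at the top label it receives; label 100 means something to P1 only, which is the local significance described earlier.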

Label Distribution:

In order for the routers to do the label switching, they must first agree on the mappings between the label numbers and the IP prefixes (the FECs). Label values are only locally significant, so we don’t need information about the entire topology.
LDP relies on the underlying IGP to know about the topology and keep it loop free.

The first of the protocols that advertise the label bindings are:

Legacy Cisco TDP and the open standard LDP.

Label distribution –
Adjacent P/PEs must agree on a label per FEC.
Label bindings can be dynamic through:
-Tag Distribution Protocol (TDP)
-Label Distribution Protocol (LDP)
-Resource Reservation Protocol (RSVP)
--used for MPLS Traffic Engineering (MPLS TE)
-Multiprotocol BGP (MP-BGP)

LDP:
-Standard per RFC 3036 (since obsoleted by RFC 5036)
-Neighbor discovery
--uses UDP port 646 to 224.0.0.2
-Neighbor adjacency – once the neighbors discover each other they use TCP
--TCP port 646 to remote LDP router-ID
Note: same logic as BGP using its loopback to source the TCP session.
The loopback selection logic for the router-ID is the same as for, EX: OSPF

Label Advertisement
-Advertise FEC for connected IGP interfaces
-Advertise FEC for IGP learned routes.


LDP is an IGP-based label distribution protocol; this means that LDP will only advertise labels for locally connected IGP interfaces and IGP-learned prefixes.
EX: sh ip route ospf – these will be advertised as FECs


MPLS Config:

Globally - ip cef
Globally - mpls label protocol {ldp | tdp}
Interface - mpls ip
#sh mpls interfaces – interfaces running LDP
#sh mpls ldp neighbor – shows ADJ status – verify peering
#sh mpls forwarding-table – equivalent of sh ip route
#sh ip cef (internal) – how the router is encapsulating the packet. Label value, etc. Same info as forwarding table.
#debug mpls packet

NOTE: mpls ip at the interface level will enable the process. The new default label protocol is LDP; the old default was TDP.

Note: Since LDP is an IGP-based protocol, once LDP is enabled on a per-link basis we should see label bindings for all the routes in the routing table: any connected interface and anything learned from the IGP. Labels are locally significant on a hop-by-hop basis. This means the label for a prefix will change at each hop, as advertised by each router for that prefix.

Once LDP is enabled, and assuming there’s nothing wrong with the underlying transport, all the label announcements should happen automatically.

Penultimate Hop Popping (PHP)

Used to remove labels one hop early for prefixes connected to the last-hop router, such as CE-facing links. This is basically an optimization of the label lookup: the device that is the next-to-last hop automatically removes the topmost label in the stack before the packet is sent out to the last-hop neighbor.

Penultimate means next to last. (hop)
Normally last hop must….
--lookup MPLS label
--Pop MPLS label
--Lookup IPv4 Destination
PHP avoids extra lookup on last hop
Accomplished through Implicit NULL label advertisement for connected prefixes.

Note: Any time we show the MPLS forwarding table, a decimal value under the outgoing label column means that the destination is several hops away. For any destination where the local router is the next-to-last hop, we should see the words “Pop Label” under the outgoing heading.

Note: if we see “No Label” it means that the traffic is being sent out a non-LDP-enabled interface (a normal IP interface).
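
The three cases in the notes above can be summarized in a short Python sketch. It models only how the label advertised by the next hop maps to the outgoing column of the forwarding table; the prefixes and label values in the sample table are hypothetical. The implicit NULL label is the reserved value 3 from RFC 3032.

```python
IMPLICIT_NULL = 3  # reserved label value (RFC 3032) used to request PHP


def outgoing_action(advertised_label):
    """Map the label received from the next hop to the forwarding action."""
    if advertised_label is None:
        # Next hop sent no binding: plain IP interface, not running LDP.
        return "No Label"
    if advertised_label == IMPLICIT_NULL:
        # We are the penultimate hop: pop the top label before forwarding.
        return "Pop Label"
    # Destination is further away: swap to the next hop's label.
    return str(advertised_label)


# Hypothetical bindings received from next hops, keyed by prefix.
received = {
    "10.0.0.5/32": 21,             # several hops away
    "10.0.0.2/32": IMPLICIT_NULL,  # connected on our neighbor
    "192.168.50.0/24": None,       # reached over a non-LDP link
}

for prefix, label in received.items():
    print(f"{prefix:<18} {outgoing_action(label)}")
```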


Thursday, July 24, 2014

BGP Confederation Configuration:

Reduces full mesh iBGP requirement by splitting AS into smaller Sub-ASes
--Inside Sub-AS full mesh or RR requirement remains
--Between Sub-AS acts like EBGP
Devices outside the confederation do not know about the internal structure
--Sub-AS numbers are stripped from advertisements to true EBGP peers
Typically uses ASNs in private range (64512 – 65535)

------------------------
Essentially, BGP confederations are when we take the entire AS and split it into smaller, more manageable sub-ASes.
This is similar to clusters in route reflection, but based on the normal loop prevention mechanism of EBGP:
the loop prevention is based on the AS path.

Inside each sub-AS we still need a full mesh of peerings or route reflection. What changes in a confederation is the handling of updates between the sub-ASes, over what are now known as confederation EBGP peerings.
The sub-AS numbers will be stripped from advertisements to routers outside the AS, that is, routers that are not part of the confederation.

BGP Confederation Configuration:

Enable the BGP process
# router bgp (SUB-AS)
Specify the main AS number
# bgp confederation identifier (main-AS)
Specify other sub-ASes that you peer with
# bgp confederation peers (sub-as1 sub-as2, etc)
--not all Sub-ASes, just those directly peered with.

Note: in order to tell the difference between a true EBGP peer and a confederation EBGP peer, any router on the edge of the sub-AS
will need to specify which of its neighbors are confederation peers and which are true EBGP peers.

For this we would use the # bgp confederation peers (sub-as1 sub-as2, etc)
command. It only needs to be configured on the sub-AS edge, and only for the sub-ASes we’re directly peering with.

We will need to use the private range for the internal sub-ASes, and they cannot overlap. When sub-ASes exchange information, they prepend their sub-AS number to the AS path of the routes.
One thing to note about confederation EBGP peers: since these technically count as EBGP peerings, the TTL will be set to 1 by default.
Note: if you want to avoid the full mesh inside a sub-AS, you can configure route reflectors inside your sub-ASes.

Once all peerings are up, we would verify them. We should expect to see the sub-AS numbers in the AS path only inside the confederation. On a true EBGP neighbor we would not want to see the sub-AS in the path.
If we are traversing through sub-ASes, we would see the sub-ASes prepended and in parentheses, ex: (65000 65146) 100 200, etc.
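
Here is a rough Python sketch of how the AS path behaves across the confederation. It keeps the confederation segment (the part shown in parentheses) separate from the normal AS sequence. The sub-AS numbers 65146 and 65000 and the path 100 200 follow the example above; the main AS number 500 is made up for illustration.

```python
def to_confed_ebgp(confed_seq, as_seq, my_sub_as):
    """Advertise to a confederation EBGP peer: prepend our sub-AS."""
    return [my_sub_as] + confed_seq, as_seq


def to_true_ebgp(confed_seq, as_seq, confed_id):
    """Advertise to a true EBGP peer: strip sub-ASes, prepend the main AS."""
    return [], [confed_id] + as_seq


def accepts(confed_seq, my_sub_as):
    """Loop prevention: reject a path that already contains our sub-AS."""
    return my_sub_as not in confed_seq


def render(confed_seq, as_seq):
    """Show the path with the confederation segment in parentheses."""
    parts = []
    if confed_seq:
        parts.append("(" + " ".join(str(a) for a in confed_seq) + ")")
    parts.extend(str(a) for a in as_seq)
    return " ".join(parts)


# A route with external path "100 200" enters the confederation in
# sub-AS 65146, then crosses sub-AS 65000 on its way to a third sub-AS.
path = ([], [100, 200])
path = to_confed_ebgp(*path, 65146)
path = to_confed_ebgp(*path, 65000)

inside = render(*path)
outside = render(*to_true_ebgp(*path, 500))  # 500 = hypothetical main AS
print(inside)
print(outside)
print(accepts(path[0], 65146))
```

The last line shows the loop check: if the update came back to sub-AS 65146, the router there would find its own sub-AS in the path and drop it.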

Note: there are special configurations that relate only to confederation designs, as well as community attributes specific to confederations.
These are:

# bgp bestpath med confed – which means that between our confederation peers we can compare the MED to figure out which path to choose.

We could also create a route-map to set the community attribute.
For example, set community local-AS (also known as NO_EXPORT_SUBCONFED). This means that the prefixes will not go out of the local sub-AS.
We can also set the community “no-export”; this will essentially send the prefix to our confederation EBGP peers but not to our true EBGP peers.

This just depends on where we want the prefix to be confined: the local sub-AS or the entire confederation.

Sunday, July 6, 2014

Large Scale BGP Route Reflection


Lesson Learned:

Larger scale BGP designs cannot be serviced by only a single RR
--Single RR is a single point of failure
RR Clusters allow redundancy and hierarchy
--Cluster is defined by the clients a RR serves
--RRs in the same Cluster use the same Cluster-ID
Inter-Cluster peerings between RRs can be client or non-client peerings
--Depends on redundancy design.

Consider multiple RRs, each with its own cluster, for example RR1, RR2, and RR3, each with clients connected to it.
The ultimate design goal is that when a client of RR1 sends an update to its RR, the update is passed down to the other clients in the cluster. The update is then passed to the RRs of the other clusters, which in turn send it down to their clients.

The rules of route reflection then determine whether or not the route is passed on. For example, if RR1, RR2, and RR3 are all clients of each other, updates will be passed along between them; if they are non-clients, an update learned from one non-client RR will not be reflected to another non-client RR.

Note: there is technically nothing wrong with RRs being clients of each other. The only issue is that there will be more overhead in the control plane. Even though the route would loop between the RRs, the cluster list prevents that loop from occurring. In a large-scale design you could be processing 300,000 to 400,000 routes.

The case where you would want the RRs to be clients of each other is link failure: if the RRs are non-clients and a link between two RRs fails, it would affect the updates between all RRs.

TOPOLOGY:




From the topology, the configs for RR1 and its clients will look like this:

RR Config:
R1:

Router bgp 100
neighbor R1CLIENTs peer-group
neighbor R1CLIENTs remote-as 100
neighbor R1CLIENTs route-reflector-client
neighbor R1CLIENTs update-source Loopback1
neighbor 2.2.2.2 peer-group R1CLIENTs
neighbor 3.3.3.3 peer-group R1CLIENTs

---------

CLIENT Config:

router bgp 100
neighbor 1.1.1.1 remote-as 100
neighbor 1.1.1.1 update-source loopback1



Cluster 2:

RR Config:

R4:

Router bgp 100
neighbor R4CLIENTs peer-group
neighbor R4CLIENTs remote-as 100
neighbor R4CLIENTs route-reflector-client
neighbor R4CLIENTs update-source Loopback1
neighbor 5.5.5.5 peer-group R4CLIENTs
neighbor 6.6.6.6 peer-group R4CLIENTs


CLIENT Config:

router bgp 100
neighbor 4.4.4.4 remote-as 100
neighbor 4.4.4.4 update-source loopback1

Cluster 3:

RR Config:

R7:

router bgp 100
neighbor R7CLIENTs peer-group
neighbor R7CLIENTs remote-as 100
neighbor R7CLIENTs route-reflector-client
neighbor R7CLIENTs update-source Loopback1
neighbor 8.8.8.8 peer-group R7CLIENTs
neighbor 9.9.9.9 peer-group R7CLIENTs

CLIENT Config:

router bgp 100
neighbor 7.7.7.7 remote-as 100
neighbor 7.7.7.7 update-source loopback1


Once these are all up we will need to verify the peerings.
We can do a #sh ip bgp sum; on each RR we should see its clients and the other RRs as neighbors:

Ex:
R1#sh ip bgp summary
BGP router identifier 1.1.1.1, local AS number 100
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4   100      37      37        1    0    0 00:34:50        0
3.3.3.3         4   100      37      37        1    0    0 00:34:53        0
4.4.4.4         4   100       6       6        1    0    0 00:02:50        0
7.7.7.7         4   100       6       6        1    0    0 00:02:52        0
R1#


Also – from RR4 –

R4#sh ip bgp
BGP table version is 20, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i1.1.1.0/24       1.1.1.1                  0    100      0 i
*> 4.4.4.0/24       0.0.0.0                  0         32768 i
r>i7.7.7.0/24       7.7.7.7                  0    100      0 i
R4#

We can see the routes and how they’re learned.

interface Loopback10
 ip address 100.100.100.9 255.255.255.0
end

So now on R9, let’s add a route, advertise it, and see how BGP propagates the route.
R9(config)#router bgp 100
R9(config-router)#network 100.100.100.0 mask 255.255.255.0

We can see from another RR in the topology that the route has been updated from one RR to another:

*>i100.100.100.0/24 9.9.9.9                  0    100      0 i
R7#


Note: If there’s more than one RR per cluster, we first need to set the same cluster ID on both RRs to keep loops from happening. We might also want to set the local preference, because the local preference applies to the entire AS.

For example:
On R1 – I’ll set the local preference to be 100 and on R4 I’ll set the preference to be 200.

R4:

R4#sh route-map
route-map PREF_200, permit, sequence 10
  Match clauses:
  Set clauses:
    local-preference 200
  Policy routing matches: 0 packets, 0 bytes

Then under the BGP process
I’ll apply this route-map inbound to neighbor 7.7.7.7 (R7), so routes learned from R7 get this local preference

R4(config-router)#neighbor 7.7.7.7 route-map PREF_200 in

Note: this change will not apply until we ask for a route refresh. For this lab I’ll just clear ip bgp.

   Network          Next Hop            Metric LocPrf Weight Path
*>i1.1.1.0/24       1.1.1.1                  0    100      0 i
*> 4.4.4.0/24       0.0.0.0                  0         32768 i
r>i7.7.7.0/24       7.7.7.7                  0    200      0 i
*>i100.100.100.0/24 9.9.9.9                  0    200      0 i

Now we can see the routes from R7 have a local preference of 200.
Note: The convergence of the BGP network is more a function of the underlying IGP than of BGP itself. We’re always relying on the IGP route to the next hop, which means that if we have high availability to the next hop, it doesn’t really matter what the BGP update is.



Tuesday, July 1, 2014

BGP Route Reflectors

Lessons Learned:

iBGP Route Reflection

Eliminates need for full mesh
-only need peering(s) to the RR(s)

Like OSPF DR and IS-IS DIS, minimizes prefix replication
--Send one update to the RR
--RR sends the update to its “clients”
Loop prevention through Cluster-ID
--RR discards routes received with its own Cluster-ID
--Does not modify other attributes such as next-hop

======================================================

The overall principle of route reflection can be thought of like the DR in OSPF: there is a centralized device, which is used to minimize the amount of control plane routing information that is sent throughout the network.

With a full mesh of iBGP peers there is a lot of administrative overhead to maintain all the peerings. It’s also a lot of control plane information that each router needs to process, as it’s sending duplicate updates to all the peers in the network. With route reflection we can centralize the peering arrangements so that the clients send one copy of their routing updates to the RR, and the RR turns around and sends them to all the clients in the network.

Note: you can have more than one RR in a network; it just depends on what your redundancy goals are, based on the physical topology of the network.

With RR we’re breaking the loop prevention mechanism of iBGP peers, which says that if you learn a route from an iBGP neighbor, you’re not allowed to advertise it to another iBGP neighbor.

We now need to implement another form of loop prevention. In RR this comes in the form of the Cluster-ID.

When an RR receives an update from a client and forwards it on to another client or another iBGP peer,
the RR’s Cluster-ID (by default its router-id) gets added to the cluster list of the update.
If an RR receives an update with its own Cluster-ID in the cluster list, it will automatically filter that update out.
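
A minimal Python sketch of the cluster list check, assuming two route reflectors whose Cluster-IDs are their router-ids (1.1.1.1 and a hypothetical second RR at 4.4.4.4). The update is modeled as a plain dict; real BGP carries the cluster list as a path attribute.

```python
def reflect(update, cluster_id):
    """Return the update as this RR would reflect it, or None if dropped.

    The RR drops any update that already carries its own Cluster-ID;
    otherwise it prepends its Cluster-ID to the cluster list.
    """
    if cluster_id in update["cluster_list"]:
        return None
    return {**update, "cluster_list": [cluster_id] + update["cluster_list"]}


# Update as originated by a client (no cluster list yet).
original = {"prefix": "6.1.1.0/24", "next_hop": "3.3.3.3", "cluster_list": []}

via_rr1 = reflect(original, "1.1.1.1")     # reflected by the first RR
via_rr4 = reflect(via_rr1, "4.4.4.4")      # reflected by the second RR
back_at_rr1 = reflect(via_rr4, "1.1.1.1")  # looped back: filtered

print(via_rr1["cluster_list"])
print(via_rr4["cluster_list"])
print(back_at_rr1)
```

The next hop stays 3.3.3.3 throughout, matching the point that reflection does not modify other attributes such as next-hop.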

Route Reflector Peerings

Route Reflectors can have three types of peers
EBGP Peers
--Neighbors in different AS (routers that are in a different AS than us locally)

Client Peers
--iBGP peers with # route-reflector-client (neighbors that have route-reflector-client configured in the RR’s neighbor command)

Non-Client peers
--iBGP peers without # route-reflector-client (regular iBGP neighbors, without the client command configured)

The reason that it is significant to separate these peers into these 3 separate roles is because the RR is going to treat routing updates differently depending on where the update is coming from and where the update is going to.

Route Reflector Update Processing

RR processes updates differently depending on what type of peer they came from.

EBGP learned routes….
--Can be advertised to EBGP peers & clients & Non-clients

Client learned routes
--can be advertised to EBGP peers, clients, and Non-clients

Non-Client learned routes
--can be advertised to EBGP peers and Clients (This is the only restriction)
RR placement based upon these rules.

Key point: remember that the RR will not advertise routes between two non-client peers.
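
The advertisement rules above fit in a few lines of Python. This sketch only encodes the peer-type matrix from the notes; it ignores everything else a real RR checks (best-path selection, not sending a route back to the peer it came from, policy, and so on).

```python
PEER_TYPES = ("ebgp", "client", "non-client")


def rr_advertises(learned_from, send_to):
    """Return True if the RR may pass a route between these peer types."""
    if learned_from not in PEER_TYPES or send_to not in PEER_TYPES:
        raise ValueError("unknown peer type")
    # The only restriction: non-client routes are not sent to non-clients.
    return not (learned_from == "non-client" and send_to == "non-client")


matrix = {
    (src, dst): rr_advertises(src, dst)
    for src in PEER_TYPES
    for dst in PEER_TYPES
}

for (src, dst), allowed in matrix.items():
    print(f"{src:>10} -> {dst:<10} {'advertise' if allowed else 'suppress'}")
```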

There are cases where there are multiple RR’s inside the same AS to avoid single points of failure.

Large Scale Route Reflection

Larger scale BGP designs cannot be serviced by only a single RR
--Single RR is a single point of failure.
RR “Clusters” allow redundancy and hierarchy
--Cluster is defined by the clients a RR serves
--RRs in the same cluster use the same cluster-ID
Inter-Cluster peerings between RRs can be client or non-client peerings.
--Depends on the redundancy design

Note: RRs will generally select only ONE best path on a per route basis.
This selection determines what goes into the routing table and, ultimately, which routes can be advertised.

Topology RR:



First thing we need to do is to verify we have underlying reachability. For this exercise I’ll use EIGRP AS 100.
Since I will be using the loopbacks as the update source on each router, I will advertise them into my IGP.

R1#ping 2.2.2.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/22/32 ms
R1#ping 1.1.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
R1#

Then we need to configure R1 – who will be the RR:

R1:

router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 1.1.1.0 mask 255.255.255.0
 neighbor 2.2.2.2 remote-as 100
 neighbor 2.2.2.2 update-source Loopback1
 neighbor 2.2.2.2 route-reflector-client
 neighbor 3.3.3.3 remote-as 100
 neighbor 3.3.3.3 update-source Loopback1
 neighbor 3.3.3.3 route-reflector-client
 no auto-summary

Then we’ll configure each of the other routers as RR clients and point them both to the RR as the neighbor.

R2:

router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 2.2.2.0 mask 255.255.255.0
 neighbor 1.1.1.1 remote-as 100
 neighbor 3.3.3.3 remote-as 100
 no auto-summary
!

R3:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 3.3.3.0 mask 255.255.255.0
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source Loopback1
 neighbor 2.2.2.2 remote-as 100
 no auto-summary
!

We should now verify the config via #sh ip bgp

R1:
R1#sh ip bgp
BGP table version is 18, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.0/24       0.0.0.0                  0         32768 i
r>i2.2.2.0/24       2.2.2.2                  0    100      0 i
r>i3.3.3.0/24       3.3.3.3                  0    100      0 i
R1#
-------------------------------------------------------------------------------------------
R2#sh ip bgp
BGP table version is 18, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
r>i1.1.1.0/24       1.1.1.1                  0    100      0 i
*> 2.2.2.0/24       0.0.0.0                  0         32768 i
r>i3.3.3.0/24       3.3.3.3                  0    100      0 i
R2#
-------------------------------------------------------------------------------------------
R3#sh ip bgp
BGP table version is 18, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
r>i1.1.1.0/24       1.1.1.1                  0    100      0 i
r>i2.2.2.0/24       2.2.2.2                  0    100      0 i
*> 3.3.3.0/24       0.0.0.0                  0         32768 i
R3#

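In this case the r> next to the prefix does mean RIB-failure, but it is not a problem with route reflection: the loopback prefixes are already in the routing table via EIGRP, which has a better administrative distance than iBGP, so the BGP route is not installed.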

We can verify the RR is working correctly. I’ll add a loopback interface to R3 and add its prefix into BGP:

network 6.1.1.0 mask 255.255.255.0

We should now see this route on the RR:
R1#sh ip bgp
BGP table version is 19, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.0/24       0.0.0.0                  0         32768 i
r>i2.2.2.0/24       2.2.2.2                  0    100      0 i
r>i3.3.3.0/24       3.3.3.3                  0    100      0 i
*>i6.1.1.0/24       3.3.3.3                  0    100      0 i

Also on R2:

We can see the route with a next hop of 3.3.3.3 and learned from the Loopback of R1 – the RR

R2#sh ip route 6.1.1.0
Routing entry for 6.1.1.0/24
  Known via "bgp 100", distance 200, metric 0, type internal
  Last update from 3.3.3.3 00:30:32 ago
  Routing Descriptor Blocks:
  * 3.3.3.3, from 1.1.1.1, 00:30:32 ago
      Route metric is 0, traffic share count is 1

      AS Hops 0

Saturday, June 28, 2014

BGP Next-Hop Self, BGP Next-Hop Processing.

Lessons Learned:

Topology:















The issue: in a fully meshed iBGP implementation with multiple edge routers peering with the provider at different exit points, there can be a case where one edge router doesn’t have a full path to the other edge router’s exit point. When it comes to BGP, as long as we have reachability between the peers we can establish adjacency, but a route whose next hop we cannot reach will not be used.
One way to correct this issue would be to advertise the transit networks on the edge devices into our IGP.
This would give full reachability to the other edge devices’ transit links and allow iBGP to install the routes in the routing table.

Before adding the transit network for the peer advertising the 10.x.x.x routes, we see that we have no best path to these prefixes, so they cannot be installed into the routing table.

R8#sh ip route 10.3.1.0
% Network not in table

   Network          Next Hop            Metric LocPrf Weight Path
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
* i10.1.1.0/24      204.12.28.254            0    100      0 400 i
* i10.2.1.0/24      204.12.28.254            0    100      0 400 i
* i10.3.1.0/24      204.12.28.254            0    100      0 400 i
*>i172.16.1.0/24    206.33.33.1              0    100      0 300 i
*>i172.16.2.0/24    206.22.22.2              0    100      0 200 i

-------------------------------------------------------------------------------------
After adding the transit network into our IGP

   Network          Next Hop            Metric LocPrf Weight Path
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
*>i10.1.1.0/24      204.12.28.254            0    100      0 400 i
*>i10.2.1.0/24      204.12.28.254            0    100      0 400 i
*>i10.3.1.0/24      204.12.28.254            0    100      0 400 i
*>i172.16.1.0/24    206.33.33.1              0    100      0 300 i
*>i172.16.2.0/24    206.22.22.2              0    100      0 200 i


R8#sh ip route 10.3.1.0
Routing entry for 10.3.1.0/24
 Known via "bgp 100", distance 200, metric 0
  Tag 400, type internal
  Last update from 204.12.28.254 00:00:05 ago
  Routing Descriptor Blocks:
  * 204.12.28.254, from 100.100.4.4, 00:00:05 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 400

Now the route is in the table. It’s learned from BGP 100 (administrative distance 200, iBGP).
We also see that the next hop of the route is 204.12.28.254, which is not directly connected to this router.

We can continue our recursive lookup by showing the route for 204.12.28.254

R8#sh ip route 204.12.28.254
Routing entry for 204.12.28.0/24
  Known via "eigrp 10", distance 90, metric 30720, type internal
  Redistributing via eigrp 10
  Last update from 192.168.48.4 on FastEthernet0/1, 00:03:49 ago
  Routing Descriptor Blocks:
  * 192.168.48.4, from 192.168.48.4, 00:03:49 ago, via FastEthernet0/1
      Route metric is 30720, traffic share count is 1
      Total delay is 200 microseconds, minimum bandwidth is 100000 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

R8#

We’re learning this route from 192.168.48.4 through our FastEthernet0/1 interface.

interface FastEthernet0/1
ip address 192.168.48.8 255.255.255.0
speed 100
full-duplex
end

We also could have done the verification by looking at the CEF table, ex:

R8#sh ip cef 204.12.28.254
204.12.28.0/24, version 57, epoch 0, cached adjacency 192.168.48.4
0 packets, 0 bytes
  via 192.168.48.4, FastEthernet0/1, 3 dependencies
    next hop 192.168.48.4, FastEthernet0/1
    valid cached adjacency
R8#

CEF in reality is pre-calculating the recursion and the layer 2 header that’s going to be used on that link.

If we look at “sh ip cef detail” we can see the interface that it’s using.
The same goes for “sh ip cef internal”:

R8#sh ip cef 204.12.28.254 internal – This destination
204.12.28.0/24, version 57, epoch 0, cached adjacency 192.168.48.4 – Points to this next Hop
0 packets, 0 bytes
  via 192.168.48.4, FastEthernet0/1, 3 dependencies – Out this interface.
    next hop 192.168.48.4, FastEthernet0/1
    valid cached adjacency
  refcount 5

The “sh ip cef internal” output would also show the load distribution if there were multiple paths. This lookup is ultimately what the routing process is looking for.
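
To make the recursion concrete, here is a rough Python sketch of the lookup R8 just did by hand. It is only an illustration, not how IOS or CEF is implemented: the table layout and the function names (longest_match, resolve) are made up for this example, and the three entries are copied from the R8 output and interface config above.

```python
import ipaddress

# Toy routing table modeled on the R8 output above:
# prefix -> (source, next hop, outgoing interface).
# An entry with no interface still has to be resolved recursively.
ROUTES = {
    "10.3.1.0/24": ("bgp", "204.12.28.254", None),
    "204.12.28.0/24": ("eigrp", "192.168.48.4", "FastEthernet0/1"),
    "192.168.48.0/24": ("connected", None, "FastEthernet0/1"),
}


def longest_match(address, routes):
    """Return the most specific prefix that contains address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [ipaddress.ip_network(p) for p in routes if ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


def resolve(address, routes, max_depth=8):
    """Follow next hops until an outgoing interface is found.

    Returns (next_hop, interface), or None when the recursion fails.
    """
    next_hop = address
    for _ in range(max_depth):
        prefix = longest_match(next_hop, routes)
        if prefix is None:
            return None  # no route to the next hop: the path is unusable
        _source, via, interface = routes[prefix]
        if interface is not None:
            # For a connected route there is no "via": the target is the next hop.
            return (via or next_hop, interface)
        next_hop = via
    return None
```

Calling resolve("10.3.1.1", ROUTES) walks from the BGP route to the EIGRP route and lands on 192.168.48.4 out FastEthernet0/1. Take the 204.12.28.0/24 entry out and the same call returns None, which is the "no best path" situation from before the transit network was advertised.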

The above solution was to advertise a route to the next hop by adding the transit networks into our IGP, which is a perfectly valid design solution. The other solution is to change the next hop to something the routers already have a route to.


We can do this by using our existing peer-group configuration:

Router BGP 100 
Neighbor IBGP_PEERS peer-group 
Neighbor IBGP_PEERS remote-as 100 
Neighbor IBGP_PEERS update-source loopback 10
Neighbor 100.100.4.4 peer-group IBGP_PEERS
Neighbor 100.100.5.5 peer-group IBGP_PEERS
Neighbor 100.100.6.6 peer-group IBGP_PEERS
Neighbor 100.100.7.7 peer-group IBGP_PEERS
Neighbor 100.100.8.8 peer-group IBGP_PEERS

To test this I’ll remove the transit networks from our IGP for R4 and for R5
R4 = 204.12.28.253 255.255.255.0
R5 = 206.22.22.5 255.255.255.0

So under the peer-group, I need to say: when I learn a route from an EBGP neighbor and I turn around and advertise it to an iBGP neighbor, I want to set the next hop value to my own local peering address.

Ex:
Router bgp 100
Neighbor IBGP_PEERS next-hop-self

Now since my update source for the peer group is my loopback 10 interface, it would then mean that all of the EBGP-learned routes I advertise to these neighbors are going to have the next-hop address of the loopback 10.

Once I make the change I’m going to need to send an update to all my neighbors. I can do this with the automatically enabled route-refresh capability.

We can do a clear ip bgp * (this hard-resets every session)
Or
We can just do a clear ip bgp 100 out – this will not reset the TCP session, it will just send them a new triggered update.

We can now see that the next hops are all the Loopbacks:


   Network          Next Hop            Metric LocPrf Weight Path
*>i0.0.0.0          100.100.6.6              0    100      0 300 i
*>i4.4.4.0/24       100.100.4.4              0    100      0 i
*>i5.5.5.0/24       100.100.5.5              0    100      0 i
*>i6.6.6.0/24       100.100.6.6              0    100      0 i
*>i10.1.1.0/24      100.100.4.4              0    100      0 400 i
*>i10.2.1.0/24      100.100.4.4              0    100      0 400 i
*>i10.3.1.0/24      100.100.4.4              0    100      0 400 i
*>i172.16.1.0/24    100.100.6.6              0    100      0 300 i
*>i172.16.2.0/24    100.100.5.5              0    100      0 200 i
R7#

Also because my loopback is advertised in my IGP I should be able to perform route recursion and I can also verify that again via the CEF table.

R7#sh ip cef 10.1.1.0 detail
10.1.1.0/24, version 37, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 100.100.4.4, 0 dependencies, recursive
    next hop 192.168.67.6, FastEthernet0/0 via 100.100.4.0/24
    valid adjacency
  Recursive load sharing using 100.100.4.0/24.
R7#

Or use “sh ip cef internal”, which shows the load distribution for a route:

R7#sh ip cef 10.1.1.0 inter
10.1.1.0/24, version 37, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 100.100.4.4, 0 dependencies, recursive
    next hop 192.168.67.6, FastEthernet0/0 via 100.100.4.0/24
    valid adjacency

  Recursive load sharing using 100.100.4.0/24
  Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (refcount 5)

  Hash  OK  Interface                 Address         Packets
  1     Y   FastEthernet0/0           192.168.67.6          0
  2     Y   FastEthernet0/1           192.168.57.5          0
  3     Y   FastEthernet0/0           192.168.67.6          0
  4     Y   FastEthernet0/1           192.168.57.5          0
  5     Y   FastEthernet0/0           192.168.67.6          0
  6     Y   FastEthernet0/1           192.168.57.5          0
  7     Y   FastEthernet0/0           192.168.67.6          0
  8     Y   FastEthernet0/1           192.168.57.5          0
  9     Y   FastEthernet0/0           192.168.67.6          0
  10    Y   FastEthernet0/1           192.168.57.5          0
  11    Y   FastEthernet0/0           192.168.67.6          0
  12    Y   FastEthernet0/1           192.168.57.5          0
  13    Y   FastEthernet0/0           192.168.67.6          0
  14    Y   FastEthernet0/1           192.168.57.5          0
  15    Y   FastEthernet0/0           192.168.67.6          0
  16    Y   FastEthernet0/1           192.168.57.5          0
  refcount 5
R7#
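
The 16-row table above can be pictured with a small Python sketch. This is only a model of the idea: the bucket fill mirrors the alternating "0 1 0 1" pattern in the output, but the hash is a stand-in (zlib.crc32), not the algorithm CEF really uses, and the names build_buckets and pick_path are invented for this example.

```python
import zlib

# The two equal-cost paths from the R7 output above.
PATHS = [("FastEthernet0/0", "192.168.67.6"), ("FastEthernet0/1", "192.168.57.5")]
BUCKETS = 16


def build_buckets(paths, buckets=BUCKETS):
    """Fill the hash buckets round-robin across the equal-cost paths."""
    return [paths[i % len(paths)] for i in range(buckets)]


def pick_path(src, dst, table):
    """Pick a path for a source/destination pair.

    crc32 is an assumed stand-in hash; the point is only that the same
    pair always maps to the same bucket (per-destination sharing).
    """
    key = f"{src}->{dst}".encode()
    return table[zlib.crc32(key) % len(table)]
```

With two paths, build_buckets(PATHS) gives the same alternating FastEthernet0/0, FastEthernet0/1 layout as hash rows 1 to 16, and a given source/destination pair keeps landing on the same path every time.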


So the next-hop-self command basically says: when I learn routes from my EBGP peers and then send those routes on to my iBGP peers, change the next-hop value to my own local loopback interface – in this case loopback 10.

We’re using the loopback 10 because that’s what’s called out in the “update-source” command. For example, if my update-source was set to my FA0/0 interface, that’s what the next hop would be changed to.
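
Here is that behavior as a small Python sketch, just to show the rule and nothing about how IOS does it internally. The Route class and advertise_to_ibgp function are hypothetical names, the sketch assumes a plain full mesh (no route reflection), and the addresses are R4's from the tables above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    learned_from: str  # "ebgp" or "ibgp"


def advertise_to_ibgp(routes, update_source, next_hop_self):
    """Return the routes as they would be sent to an iBGP peer."""
    sent = []
    for route in routes:
        if route.learned_from == "ibgp":
            # Full mesh assumption: iBGP-learned routes are not passed on to iBGP peers.
            continue
        if next_hop_self:
            # Rewrite the EBGP next hop to our own peering (update-source) address.
            route = replace(route, next_hop=update_source)
        sent.append(route)
    return sent
```

With next_hop_self=False, R4 would send 10.1.1.0/24 with next hop 204.12.28.254, as in the first table. With next_hop_self=True and an update source of 100.100.4.4, the peers see 100.100.4.4 instead, matching the R7 output.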

Next-HOP – Route-map.
We can also use a route-map to manually change the next hop value.
So based on the topology, I’ll use router7.

Current BGP config:

router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor IBGP_PEERS peer-group
 neighbor IBGP_PEERS remote-as 100
 neighbor IBGP_PEERS update-source Loopback10
 neighbor IBGP_PEERS next-hop-self
 neighbor 100.100.4.4 peer-group IBGP_PEERS
 neighbor 100.100.5.5 peer-group IBGP_PEERS
 neighbor 100.100.6.6 peer-group IBGP_PEERS
 neighbor 100.100.8.8 peer-group IBGP_PEERS
 no auto-summary

New config:


route-map CHANGE_NEXT_HOP permit 10
 set ip next-hop 100.100.7.7

Router bgp 100
 Neighbor IBGP_PEERS route-map CHANGE_NEXT_HOP out

---------------------------------------

Note: for the next-hop value, all we’re really looking for is that the other routers can do the full route recursion. If I set the next hop to a value that’s not in the routing table, that’s not going to work.

We can use our own IP address that the neighbors peer with, or even a physical link. The only issue is if we use a physical link and the link goes down: the BGP peers would not be able to use any of the routes associated with that router.

Where the physical interface can be a good thing, is if we want to control traffic and if the interface does go down, we don’t want traffic to choose another path.

We can even use an IP SLA statement to track the interface and then tie it to a static route to then set the next hop value.

Example:

On R4 – which peers with an EBGP Neighbor (R1)

ip sla 1
 icmp-echo 10.3.1.1
 timeout 2000
 frequency 5

ip sla schedule 1 life forever start-time now

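track 2 ip sla 1 reachability (assumed: the track object was missing from my notes; older IOS releases use "track 2 rtr 1 reachability")

ip route 169.254.0.1 255.255.255.255 Null0 track 2 (note this is from the IPv4 automatic IP range, 169.254.0.0/16)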

route-map CHANGE_NEXT_HOP permit 10
 set ip next-hop 169.254.0.1

Router bgp 100
 Neighbor IBGP_PEERS route-map CHANGE_NEXT_HOP out

Just verifying I can actually ping the IP address.

---------------------------------------------------------------------------------
R4(config-sla-monitor-echo)#do ping 10.3.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.3.1.1, timeout is 2 seconds:
!!!!!                                                                                                                             
--------------------------------------------------------------------------------- 


So now if the route goes down, the IP SLA will kick in, remove the tracked route, and my routes will now show as invalid paths.
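
The chain of cause and effect can be sketched in a few lines of Python. This is a simplified model, not IOS behavior in detail: tracked_routes and valid_paths are made-up names, and next-hop reachability is reduced to "is there a host route for it", which is an assumption for the sake of the example.

```python
def tracked_routes(static_routes, track_state):
    """Keep static routes whose track object is up (or that have no track).

    static_routes maps prefix -> track object number (or None).
    track_state maps track object number -> True (up) / False (down).
    """
    return {
        prefix: track
        for prefix, track in static_routes.items()
        if track is None or track_state.get(track, False)
    }


def valid_paths(bgp_paths, routing_table):
    """A BGP path is usable only while its next hop is in the routing table."""
    return [(prefix, nh) for prefix, nh in bgp_paths if f"{nh}/32" in routing_table]
```

While track 2 is up, the 169.254.0.1/32 static stays in the table and the paths pointing at that next hop stay valid. As soon as the probe fails and the track goes down, the static is pulled and valid_paths returns an empty list for those prefixes.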