My CCIE Studies

Monday, June 23, 2014

BGP Local AS, BGP Peer Groups.

Lessons Learned:

iBGP Peering Rules:

iBGP packets default to TTL 255

-implies neighbors do not have to be directly connected as long as IGP reachability exists

Loop prevention via route filtering

-iBGP learned routes cannot be advertised on to another iBGP neighbor.

-implies need for one of the following:

--Fully meshed iBGP peerings

--Route reflection

--Confederation.

====================================

Topology:

Before we even configure BGP, we can test low-level connectivity by opening a Telnet session to the peer IP address on port 179. If the connection opens, the remote side is configured for the session; if it is not configured on the far end, the connection will simply be refused.

A basic Telnet session to the IP of the peer verifies basic TCP reachability. It can also rule out any filters in the network.
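The same reachability test can be scripted. Below is a minimal Python sketch, assuming a host that has IP reachability to the peer; the function name `bgp_port_reachable` is hypothetical and only mimics what the Telnet test does (open a TCP connection to port 179 and see whether it is accepted or refused).

```python
import socket


def bgp_port_reachable(peer_ip: str, port: int = 179, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to peer_ip:port can be opened."""
    try:
        # create_connection performs the full TCP three-way handshake.
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers a refused connection (RST), a timeout, or an unreachable host.
        return False
```

A `False` result does not say why the connection failed; a refused connection (no neighbor statement on the far end) and a filter dropping the packets look the same here.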

Note: There can be design cases where a single router peers with an EBGP router using a different AS than the rest of the routers, for example during a migration from one AS to another.

To handle this under the BGP process we would configure

The peer address #neighbor x.x.x.x remote-as XX

Then we would need to configure the local AS (migrating one)

#neighbor x.x.x.x local-as XX

We can also allow the peer to establish the session using either the new AS or the migrating AS, for example:

# neighbor x.x.x.x local-as XX no-prepend replace-as dual-as

All internal routers in the topology are configured for AS 100

The edge routers are configured for AS 400, 200, and 300 respectively.

We can verify using a simple command: # sh ip bgp summary

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd

204.12.28.253 4 100 55 55 5 0 0 00:51:07 1

What we want to see here, for starters, is an integer (0 or above) under State/PfxRcd: it shows the number of prefixes we’re receiving from that neighbor. What we don’t want to see is Active or Idle, which indicates something is wrong with the peering.
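As a rough illustration of that rule, here is a small Python sketch that reads one neighbor row of the summary output and decides whether the session is up. The function name and the returned keys are hypothetical; the column order is taken from the header line shown above.

```python
from typing import Any, Dict


def parse_bgp_summary_line(line: str) -> Dict[str, Any]:
    """Parse one neighbor row of 'show ip bgp summary'."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 columns in a neighbor row")
    # The last column is either a prefix count or a state name such as
    # Active or Idle; join the tail in case the state has two words.
    state = " ".join(fields[9:])
    established = state.isdigit()
    return {
        "neighbor": fields[0],
        "remote_as": fields[2],
        "up_down": fields[8],
        "established": established,
        "prefixes_received": int(state) if established else None,
        "state": "Established" if established else state,
    }


row = parse_bgp_summary_line("204.12.28.253 4 100 55 55 5 0 0 00:51:07 1")
print(row["state"], row["prefixes_received"])
```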

If there are peering issues we can use # debug ip bgp or # debug ip packet, as long as we apply an ACL that permits only BGP.

Note: verify there’s no authentication configured before debugs.

Ex:

#access-list 100 permit tcp any eq bgp any

#access-list 100 permit tcp any any eq bgp

# debug ip packet detail 100

If possible we can always do a #clear ip bgp *

Now, based on the topology, and without route reflection or confederation, we need a full mesh of peerings. Configuration-wise, each router needs a separate neighbor statement pointed at every other router in the topology.

That is not feasible with 100+ routers in the topology.
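To put a number on it, a full mesh of n routers needs n(n-1)/2 sessions, and each router carries n-1 neighbor statements. A quick Python sketch (function names are only for illustration):

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions in a full mesh of the given size."""
    return routers * (routers - 1) // 2


def neighbor_statements_per_router(routers: int) -> int:
    """Each router peers with every other router in the mesh."""
    return routers - 1


print(full_mesh_sessions(100))              # 4950
print(neighbor_statements_per_router(100))  # 99
```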

One way we can simplify this is to take the iBGP peers and put them into a configuration template called a peer group.

The peer group is also an optimization of the update process, because one update is generated for the entire peer group instead of individual updates being generated for each neighbor.

To configure the Peer Groups:

Start the BGP process

Router BGP 100 –start process

Neighbor IBGP_PEERS peer-group - give it a name

Neighbor IBGP_PEERS remote-as 100 - all peers will be in AS 100

Neighbor IBGP_PEERS update-source Loopback0 – optional peer feature

Neighbor 100.100.4.4 peer-group IBGP_PEERS – all iBGP peer addresses.

Neighbor 100.100.5.5 peer-group IBGP_PEERS

Neighbor 100.100.6.6 peer-group IBGP_PEERS

Neighbor 100.100.7.7 peer-group IBGP_PEERS

Neighbor 100.100.8.8 peer-group IBGP_PEERS

What makes this configuration easier: not only is it an optimization behind the scenes of how the update process works, we can also apply the same template to all of the routers.

Note: you won’t be able to peer with your own local address.

% Cannot configure the local system as neighbor
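Since the same template goes on every router, minus the router's own address, it can be generated. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes every router uses Loopback0 as the update source, as in the configuration above.

```python
from typing import Iterable


def render_ibgp_peer_group(
    local_as: int,
    all_loopbacks: Iterable[str],
    local_loopback: str,
    group: str = "IBGP_PEERS",
    update_source: str = "Loopback0",
) -> str:
    """Build the peer-group configuration for one router in the mesh."""
    lines = [
        f"router bgp {local_as}",
        f" neighbor {group} peer-group",
        f" neighbor {group} remote-as {local_as}",
        f" neighbor {group} update-source {update_source}",
    ]
    for address in all_loopbacks:
        # A router cannot peer with its own address, so skip it.
        if address == local_loopback:
            continue
        lines.append(f" neighbor {address} peer-group {group}")
    return "\n".join(lines)


loopbacks = ["100.100.4.4", "100.100.5.5", "100.100.6.6", "100.100.7.7", "100.100.8.8"]
print(render_ibgp_peer_group(100, loopbacks, "100.100.5.5"))
```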

Now if we look at #sh ip bgp summary we should see all the iBGP peers are up and that we still have a peering with the EBGP peer.

R5#sh ip bgp summary

BGP router identifier 100.100.5.5, local AS number 100

BGP table version is 7, main routing table version 7

8 network entries using 936 bytes of memory

8 path entries using 416 bytes of memory

6/3 BGP path/bestpath attribute entries using 744 bytes of memory

3 BGP AS-PATH entries using 72 bytes of memory

0 BGP route-map cache entries using 0 bytes of memory

0 BGP filter-list cache entries using 0 bytes of memory

BGP using 2168 total bytes of memory

BGP activity 12/4 prefixes, 12/4 paths, scan interval 60 secs

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd

100.100.4.4 4 100 8 8 7 0 0 00:02:22 4

100.100.6.6 4 100 6 6 7 0 0 00:00:08 2

100.100.7.7 4 100 6 8 7 0 0 00:02:52 0

100.100.8.8 4 100 4 6 7 0 0 00:00:38 0

206.22.22.2 4 200 148 152 7 0 0 00:09:19 1

R5#

So now we should have a full mesh of peerings inside the network, as well as the peerings for the EBGP neighbors.

The result is that, for every peer, the EBGP update that came in is passed along to all the iBGP neighbors.

However, once an iBGP neighbor receives the route, it will not advertise it on to other iBGP neighbors, based on the iBGP rules.

Without route-reflection you cannot exchange any iBGP learned routes to other iBGP neighbors.

We should only see the routes updated from the EBGP source:

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd

100.100.4.4 4 100 15 15 7 0 0 00:09:45 4

100.100.6.6 4 100 13 13 7 0 0 00:07:30 2

100.100.7.7 4 100 14 16 7 0 0 00:10:14 0

100.100.8.8 4 100 12 14 7 0 0 00:08:01 0

206.22.22.2 4 200 155 159 7 0 0 00:16:42 1

R5#

R5#sh ip bgp summary

BGP router identifier 100.100.5.5, local AS number 100

BGP table version is 8, main routing table version 8

9 network entries using 1053 bytes of memory

9 path entries using 468 bytes of memory

6/3 BGP path/bestpath attribute entries using 744 bytes of memory

3 BGP AS-PATH entries using 72 bytes of memory

0 BGP route-map cache entries using 0 bytes of memory

0 BGP filter-list cache entries using 0 bytes of memory

BGP using 2337 total bytes of memory

BGP activity 13/4 prefixes, 13/4 paths, scan interval 60 secs

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd

100.100.4.4 4 100 17 18 8 0 0 00:11:41 4

100.100.6.6 4 100 15 16 8 0 0 00:09:26 2

100.100.7.7 4 100 16 19 8 0 0 00:12:10 0

100.100.8.8 4 100 13 16 8 0 0 00:09:57 0

206.22.22.2 4 200 158 161 8 0 0 00:18:38 2

R5#sh ip bgp

BGP table version is 8, local router ID is 100.100.5.5

Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,

r RIB-failure, S Stale

Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path

*>i4.4.4.0/24 100.100.4.4 0 100 0 i

*> 5.5.5.0/24 0.0.0.0 0 32768 i

*>i6.6.6.0/24 100.100.6.6 0 100 0 i

* i10.1.1.0/24 204.12.28.254 0 100 0 400 i

* i10.2.1.0/24 204.12.28.254 0 100 0 400 i

* i10.3.1.0/24 204.12.28.254 0 100 0 400 i

* i172.16.1.0/24 206.33.33.1 0 100 0 300 i

*> 172.16.2.0/24 206.22.22.2 0 0 200 i

*> 222.222.222.0 206.22.22.2 0 0 200 i

R5#

How to read the output of #sh ip bgp

Starting from the left.

Between the asterisk and the lowercase letter i we should see the > sign (or nothing, depending on who we learn the update from). The > indicates the best route: the one that is a candidate to be installed in the routing table and the one we advertise.

The lowercase letter i means the route came from an internal BGP peer.

Next we have the actual prefix. Note: if the subnet mask doesn’t show up here, it means the prefix has the classful mask. Anything that is a subnet or an aggregate shows the actual mask value.

Next is the next-hop value; this is what we need to be able to reach via the IGP in order to actually use the prefix.

Next is the metric (MED), which is a non-transitive attribute. It is only significant between me and my directly connected AS.

The local-pref is 100 by default.

The Weight value is zero for anything that’s not locally originated (locally originated routes show 32768).

Then the AS-PATH

Followed by the origin code, where lowercase i for IGP is better than ? for incomplete.

Note: anytime we redistribute a route into BGP it gets the origin code of incomplete. It is less preferred than a route that was configured with the network statement under the BGP process.
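The origin comparison on its own can be sketched as a simple ranking, using the three origin codes from the legend in the output (i, e, ?). This is only the origin step in isolation, not a full best-path implementation; the names are hypothetical.

```python
# Lower rank is preferred: IGP, then EGP, then incomplete.
ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}


def preferred_origin(first: str, second: str) -> str:
    """Return the more preferred of two origin codes."""
    return first if ORIGIN_RANK[first] <= ORIGIN_RANK[second] else second


# A network-statement route (i) beats a redistributed route (?).
print(preferred_origin("?", "i"))
```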

R5#sh ip route bgp

B 222.222.222.0/24 [20/0] via 206.22.22.2, 00:18:11

4.0.0.0/24 is subnetted, 1 subnets

B 4.4.4.0 [200/0] via 100.100.4.4, 00:29:45

6.0.0.0/24 is subnetted, 1 subnets

B 6.6.6.0 [200/0] via 100.100.6.6, 00:27:30

172.16.0.0/24 is subnetted, 1 subnets

B 172.16.2.0 [20/0] via 206.22.22.2, 00:36:07

R5#

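Here we can see that the router is not installing the iBGP-learned routes whose next hops (204.12.28.254 and 206.33.33.1) it cannot reach; those entries have no > best marker in the BGP table. The iBGP routes with reachable next hops (4.4.4.0/24 and 6.6.6.0/24) and the EBGP routes are installed. The reason is an issue with next-hop reachability.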

Sunday, June 22, 2014

BGP 4-Byte ASN’s

Lessons Learned:

4-Byte BGP ASNs

0.0 – 65535.65535 notation

-0.[0-65535] denotes original 2-byte ASNs

Requires backwards compatibility with old code

-4 byte ASN support negotiated during capability exchange

-“OLD” BGP speakers are sent ASdot numbers encoded as ASN 23456

-Real AS-Path encoded with optional transitive attributes AS4_AGGREGATOR and AS4_PATH

Most Devices that are running later versions of 12.4T code will support the 4-Byte AS.

The quickest clue will be under the BGP process – when you configure the AS number, if you’re not allowed to add for example:

Router bgp 1.5 – Then the code you’re running does not support the newer 4-Byte AS Numbers.

To view the 4 Byte field you can simply use # sh ip bgp command – under the AS Path you will see the 4 byte number.

Note: From the perspective of any system that only supports 2-byte AS numbers, the AS that a 4-byte peer sends it is AS# 23456 (or hex 5BA0). Also, any 4-byte AS in the AS path is encoded as AS# 23456.

This encoding is done for routers that do not support 4-byte AS numbers, so the path they see might look like 4 23456 23456 23456. The real, un-encoded path would look like 4 1.4 1.5 1.6, etc.

The key point is that on a device that supports only 2-byte AS numbers, the neighbor statement for a 4-byte peer needs to say that the remote-as is 23456.
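The ASdot arithmetic and the 23456 substitution can be sketched in a few lines of Python. The function names are hypothetical; the only facts used are that ASdot is high.low with both halves in 0 to 65535, and that anything above 65535 is shown to an old speaker as 23456.

```python
AS_TRANS = 23456  # placeholder AS shown to 2-byte-only speakers (hex 5BA0)


def asdot_to_asplain(asdot: str) -> int:
    """Convert 'high.low' ASdot notation (or a plain number) to an integer."""
    if "." not in asdot:
        return int(asdot)
    high_text, low_text = asdot.split(".")
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 65535 and 0 <= low <= 65535):
        raise ValueError(f"invalid ASdot value: {asdot}")
    return high * 65536 + low


def as_seen_by_old_speaker(asn: int) -> int:
    """2-byte ASNs pass through unchanged; 4-byte ASNs appear as 23456."""
    return asn if asn <= 65535 else AS_TRANS


real_path = ["4", "1.4", "1.5", "1.6"]
print([as_seen_by_old_speaker(asdot_to_asplain(a)) for a in real_path])
# [4, 23456, 23456, 23456]
```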

Note: One thing to be aware of: if you’re doing EBGP multihop peering, make sure not to introduce a “BGP race condition”, where the neighbor’s address we peer with is also learned as a route in BGP from that same neighbor. This could cause the BGP session to time out, because BGP cannot rely on itself for transport.

Establishing BGP peerings, EBGP Multihop, BGP Neighbor Disable Connected Check

Lessons learned:

BGP Transport

BGP uses TCP port 179 for transport

-implies that BGP needs IGP first

BGP neighbor statement tells process to….

-listen for remote address via TCP 179

-initiate a session to remote address via TCP 179

-if collision, higher router-id becomes TCP client
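The collision rule in the last bullet comes down to comparing the two router-ids as 32-bit numbers. A minimal Python sketch, with a hypothetical function name; note that the comparison must be numeric, not a string comparison:

```python
import ipaddress


def surviving_tcp_client(router_id_a: str, router_id_b: str) -> str:
    """On a collision, the router with the higher router-id keeps the
    session it initiated, i.e. it ends up as the TCP client."""
    a = int(ipaddress.IPv4Address(router_id_a))
    b = int(ipaddress.IPv4Address(router_id_b))
    return router_id_a if a > b else router_id_b


print(surviving_tcp_client("8.8.8.8", "9.9.9.9"))      # 9.9.9.9
print(surviving_tcp_client("100.100.5.5", "9.9.9.9"))  # 100.100.5.5
```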

If we cannot establish basic IP reachability then we cannot establish the TCP session.

Enabling basic BGP process between two routers.

TOPOLOGY:

Let’s first set up a basic peering. From the topology I will use R9 and R8, with their AS numbers being the numbers of the routers.

Before we start the process – we can turn on a few debugs -

One to look at the actual BGP peering messages and the other is a low level debug for the IP transport.

The current underlying IGP is EIGRP. When we run the debug we want to make sure we filter out the EIGRP and only debug the BGP packets.

To do this we will need to create an extended access-list.

access-list 100 deny eigrp any any

access-list 100 permit ip any any

We can then turn on the debug on both R8 and R9. So then in our output we shouldn’t see any EIGRP hellos and updates.

R9#debug ip packet detail 100

IP packet debugging is on (detailed) for access list 100

R9#

-------------------------------------------------------------------------------------------------------------------------------

We can verify the debug is working by sending a ping to neighbor and look at the debug output.

R9#ping 192.168.89.8

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to 192.168.89.8, timeout is 2 seconds:

!!!!!

Success rate is 100 percent (5/5), round-trip min/avg/max = 1/8/16 ms

R9#

*Mar 1 02:11:19.183: IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB

*Mar 1 02:11:19.187: IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 100, sending

*Mar 1 02:11:19.187: ICMP type=8, code=0

*Mar 1 02:11:19.191: IP: tableid=0, s=192.168.89.8 (Serial0/0), d=192.168.89.9 (Serial0/0), routed via RIB

-------------------------------------------------------------------------------------------------------------------------------

FYI this debug generates a large amount of data; we would normally want to send this output to syslog or a buffer.

To limit the output we can also turn off timestamps

EX:

R9(config)#no service timestamps

We will also turn on debug ip bgp

This essentially turns on all debugs for the address family IPv4 Unicast.

R9#debug ip bgp

BGP debugging is on for address family: IPv4 Unicast

R9#

Under each router I’ve set up basic BGP peering with the neighbor remote-as command and advertised each router’s loopback network.

R8 = 8.8.8.0 /24

R9 = 9.9.9.0/24

===================

IP: s=192.168.89.8 (Serial0/0), d=192.168.89.9, len 44, rcvd 0

TCP src=17950, dst=179, seq=488103526, ack=0, win=16384 SYN -- Note: here from the debug we see the TCP src port, then the destination is port = 179 . We also see this is a SYN packet.

IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB

IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 40, sending

TCP src=179, dst=17950, seq=0, ack=488103527, win=0 ACK RST

Notice: the final part of the output is an ACK RST. This is because I have not yet added the neighbor statement on this router (R9).

Now here’s the full output after I’ve added the neighbor statement.

BGP: 192.168.89.8 went from OpenConfirm to Established

%BGP-5-ADJCHANGE: neighbor 192.168.89.8 Up

IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB

IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 92, sending

TCP src=179, dst=29773, seq=2220868845, ack=3315269002, win=16268 ACK PSH

IP: s=192.168.89.8 (Serial0/0), d=192.168.89.9, len 78, rcvd 0

TCP src=29773, dst=179, seq=3315269002, ack=2220868897, win=16268 ACK PSH

IP: tableid=0, s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), routed via FIB

IP: s=192.168.89.9 (local), d=192.168.89.8 (Serial0/0), len 78, sending

TCP src=179, dst=29773, seq=2220868897, ack=3315269040, win=16230 ACK PSH

IP: s=192.168.89.8 (Serial0/0), d=192.168.89.9, len 40, rcvd 0

TCP src=29773, dst=179, seq=3315269040, ack=2220868935, win=16230 ACK

Once I configured the neighbor statement, the router started listening for the session. The key is that BGP is not dynamic: it cannot learn its peers automatically, they must be specified manually.

Note: It’s important in a BGP network to figure out the actual route between the two neighbors before we establish the peering. This scenario was between two neighbors only, but as the RST in the debugs shows, if there were multiple paths and a router was not configured correctly, it could keep us from establishing the peering, especially if the routers are more than one hop away. It then depends on the routing table which path the session takes and whether it is accepted.

-------------------------------

EBGP Multi-hop.

Based on the topology, let’s assume now that R9 (BGP AS 9) wants to peer with R3 (BGP AS 3). This would be considered a multi-hop BGP peering.

When R9 configures the neighbor statement – let’s say we point it at the FA0/1 interface on R3. When R3 configures the neighbor statement we point the neighbor statement at S0/0 interface of Router 9.

Now we need to consider that when R9 generates the TCP packet that’s going to R3. What is the local interface we would use to reach that destination?

We can obviously verify this by looking at the routing table for that route.

R3:

FastEthernet0/1 172.16.37.3

R9#sh ip route 172.16.37.3

Routing entry for 172.16.37.0/24

Known via "eigrp 1", distance 90, metric 2172416, type internal

Redistributing via eigrp 1

Last update from 172.16.79.7 on Serial0/1, 03:20:17 ago -- this is the source of the packet to R3.

Routing Descriptor Blocks:

* 172.16.79.7, from 172.16.79.7, 03:20:17 ago, via Serial0/1

Route metric is 2172416, traffic share count is 1

Total delay is 20100 microseconds, minimum bandwidth is 1544 Kbit

Reliability 255/255, minimum MTU 1500 bytes

Loading 1/255, Hops 1

Problem: on R3 we cannot guarantee the return path to R9 is going to be via the same interface.

R3#sh ip route 172.16.79.9

Routing entry for 172.16.79.0/24

Known via "eigrp 1", distance 90, metric 2172416, type internal

Redistributing via eigrp 1

Last update from 172.16.37.7 on FastEthernet0/1, 02:28:42 ago

Routing Descriptor Blocks:

* 172.16.37.7, from 172.16.37.7, 02:28:42 ago, via FastEthernet0/1

Route metric is 2172416, traffic share count is 1

Total delay is 20100 microseconds, minimum bandwidth is 1544 Kbit

Reliability 255/255, minimum MTU 1500 bytes

Loading 1/255, Hops 1

In this case the return path uses the same interface, FastEthernet0/1. This means that each BGP packet from R3 is going to be sourced from that interface’s IP address: 172.16.37.3

IF THIS DOES NOT match the neighbor statement that’s configured under the process, then the TCP session will not work.
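The check being described is simply "does the source address of the incoming packet equal an address I have a neighbor statement for". A hedged Python sketch of that comparison, with a hypothetical function name (this models the rule, not the IOS implementation):

```python
import ipaddress
from typing import Iterable


def source_matches_neighbor(packet_source: str, configured_neighbors: Iterable[str]) -> bool:
    """True if the packet's source IP equals a configured neighbor address."""
    source = ipaddress.ip_address(packet_source)
    return any(source == ipaddress.ip_address(n) for n in configured_neighbors)


# R9 has a neighbor statement for R3's FastEthernet0/1 address.
r9_neighbors = ["172.16.37.3"]
print(source_matches_neighbor("172.16.37.3", r9_neighbors))  # True
# If R3 sourced the session from another interface, it would not match.
print(source_matches_neighbor("3.3.3.3", r9_neighbors))      # False
```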

The way around this issue is the # update-source command. In any situation where we’re not peering over a direct connection between the neighbors, we should manually specify where the packets are sourced from, so that the remote end’s neighbor statement agrees with it.

In most real-world designs the update source will be a loopback interface.

The reason is that if I’m advertising my loopback into my IGP, then it really doesn’t matter which path is used, as long as a path is available.

How we configure this is simple.

Under the process we would simply say:

BGP (AS#)

# Neighbor x.x.x.x update-source serial0/0

Basically, where the BGP peers have multiple paths between them, you might want to consider using the update-source command and sourcing from the loopback interface.

Let’s change the peering now and keep the same debugs running. This time we’ll add the update-source of each routers loopback interface.

The current configs look like this:

R9 –

router bgp 9

no synchronization

bgp log-neighbor-changes

network 9.9.9.0 mask 255.255.255.0

neighbor 192.168.89.8 remote-as 8

no auto-summary

R8 -

router bgp 8

no synchronization

bgp log-neighbor-changes

network 8.8.8.0 mask 255.255.255.0

neighbor 192.168.89.9 remote-as 9

no auto-summary

Basically we’re adding these commands respectively.

Neighbor 192.168.89.9 update-source lo8

Neighbor 192.168.89.8 update-source lo9

Also before we can establish the TCP session we need to verify reachability.

R9#ping 8.8.8.8

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:

!!!!!

Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/28 ms

R9#

Since the destination is an EBGP peer and the route is not via a connected interface, neither process is going to establish the session by default. No one is going to send the initial TCP SYN.

This is what the disable-connected-check feature is used for.

EX: R9(config-router)#neighbor 192.168.89.8 disable-connected-check

We can view all the details with the command

# sh ip bgp neighbor

R9#sh ip bgp neighbors

BGP neighbor is 192.168.89.8, remote AS 8, external link – here it shows the neighbor ID

BGP version 4, remote router ID 8.8.8.8 -- We’re running Version 4 and the router id is 8.8.8.8

BGP state = Established, up for 00:00:15

Last read 00:00:15, last write 00:00:15, hold time is 180, keepalive interval is 60 seconds – Note: the keepalive and hold time do not have to match; they will be negotiated

Neighbor capabilities:

Route refresh: advertised and received(old & new)

Address family IPv4 Unicast: advertised and received

Message statistics:

InQ depth is 0

OutQ depth is 0

Sent Rcvd

Opens: 2 2

Notifications: 0 0

Updates: 2 2

Keepalives: 24 24

Route Refresh: 0 0

Total: 28 28

Default minimum time between advertisement runs is 30 second

For address family: IPv4 Unicast

BGP table version 5, neighbor version 5/0

Output queue size : 0

Index 1, Offset 0, Mask 0x2

1 update-group member

Sent Rcvd

Prefix activity: ---- ----

Prefixes Current: 1 1 (Consumes 52 bytes)

Prefixes Total: 1 1

Implicit Withdraw: 0 0

Explicit Withdraw: 0 0

Used as bestpath: n/a 1

Used as multipath: n/a 0

Outbound Inbound

Local Policy Denied Prefixes: -------- -------

Bestpath from this peer: 1 n/a

Total: 1 0

Number of NLRIs in the update sent: max 1, min 1

Connections established 2; dropped 1

Last reset 00:09:52, due to Peer closed the session

Connection state is ESTAB, I/O status: 1, unread input bytes: 0

Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 1 – This shows the TTL

Local host: 192.168.89.9, Local port: 42146 -- These next two lines will tell us who is the server and the client

Foreign host: 192.168.89.8, Foreign port: 179

NOTE: Remember the server sources its traffic from TCP 179, so this router is the client and the traffic is coming from client port 42146 and going to the server 192.168.89.8 on port 179. Really this won’t matter unless there is filtering in place for port 179. Normally the client is the one who initiates the session first. We only look at the router-id if both sides initiate at the same time.

Enqueued packets for retransmit: 0, input: 0 mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x1B28EC):

Timer Starts Wakeups Next

Retrans 4 0 0x0

TimeWait 0 0 0x0

AckHold 3 1 0x0

SendWnd 0 0 0x0

KeepAlive 0 0 0x0

GiveUp 0 0 0x0

PmtuAger 0 0 0x0

DeadWait 0 0 0x0

iss: 4093217048 snduna: 4093217203 sndnxt: 4093217203 sndwnd: 16230

irs: 2835067880 rcvnxt: 2835068035 rcvwnd: 16230 delrcvwnd: 154

SRTT: 124 ms, RTTO: 1405 ms, RTV: 1281 ms, KRTT: 0 ms

minRTT: 8 ms, maxRTT: 300 ms, ACK hold: 200 ms

Flags: active open, nagle

IP Precedence value : 6

Datagrams (max data segment is 1460 bytes):

Rcvd: 5 (out of order: 0), with data: 3, total data bytes: 154

Sent: 7 (retransm
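The Local port and Foreign port lines in the neighbor output can be read mechanically to tell who is the TCP client and who is the server. A small Python sketch, with a hypothetical function name:

```python
BGP_PORT = 179


def tcp_role(local_port: int, foreign_port: int) -> str:
    """Return 'client' or 'server' for the local router."""
    if foreign_port == BGP_PORT and local_port != BGP_PORT:
        # We connected out to the peer's port 179.
        return "client"
    if local_port == BGP_PORT and foreign_port != BGP_PORT:
        # The peer connected in to our port 179.
        return "server"
    raise ValueError("exactly one side should be using port 179")


print(tcp_role(42146, 179))  # client (the R9 to R8 session above)
```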

EBGP – TTL

Example: Based on the lab image – if I wanted to peer with R3 (running AS3) from R9.

Under the BGP process –

I would need to configure the neighbor statement for R3 and add the remote as

I would then also need to configure my update-source

I would also then configure the neighbor statement for R3 and configure ebgp-multihop

Ex:

# neighbor x.x.x.x ebgp-multihop # (if no number is given, the TTL is 255)

Note: if we do not configure the multi-hop command, the TTL defaults to one.

With a TTL of 255, no matter how far away I am from the neighbor I’ll establish the session.
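The effect of the TTL can be sketched as simple arithmetic: every router that forwards the packet decrements the TTL by one and drops the packet if it reaches zero. The Python below is a simplified model under that assumption; the function name is hypothetical.

```python
def packet_reaches_peer(initial_ttl: int, routers_in_between: int) -> bool:
    """True if the TTL is still at least 1 when the packet arrives.

    routers_in_between is the number of routers that forward the packet
    before it reaches the peer (0 for a directly connected neighbor).
    """
    return initial_ttl - routers_in_between >= 1


print(packet_reaches_peer(1, 0))    # True: default EBGP, directly connected
print(packet_reaches_peer(1, 1))    # False: one router in the path, no multihop
print(packet_reaches_peer(255, 1))  # True: ebgp-multihop with the default 255
```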

Once again we can verify this with the #sh ip bgp neighbor command

R9#sh ip bgp neighbors

BGP neighbor is 3.3.3.3, remote AS 3, external link

BGP version 4, remote router ID 3.3.3.3

BGP state = Established, up for 00:00:38

Last read 00:00:38, last write 00:00:08, hold time is 180, keepalive interval is 60 seconds

Neighbor capabilities:

Route refresh: advertised and received(old & new)

Address family IPv4 Unicast: advertised and received

Message statistics:

InQ depth is 0

OutQ depth is 0

Sent Rcvd

Opens: 1 1

Notifications: 0 0

Updates: 2 1

Keepalives: 3 1

Route Refresh: 0 0

Total: 6 3

Default minimum time between advertisement runs is 30 seconds

For address family: IPv4 Unicast

BGP table version 4, neighbor version 4/0

Output queue size : 0

Index 1, Offset 0, Mask 0x2

1 update-group member

Sent Rcvd

Prefix activity: ---- ----

Prefixes Current: 3 1 (Consumes 52 bytes)

Prefixes Total: 3 1

Implicit Withdraw: 0 0

Explicit Withdraw: 0 0

Used as bestpath: n/a 1

Used as multipath: n/a 0

Outbound Inbound

Local Policy Denied Prefixes: -------- -------

Total: 0 0

Number of NLRIs in the update sent: max 2, min 1

Connections established 1; dropped 0

Last reset never

External BGP neighbor may be up to 255 hops away.

Connection state is ESTAB, I/O status: 1, unread input bytes: 0

Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 255

Local host: 192.168.89.9, Local port: 179

Foreign host: 3.3.3.3, Foreign port: 63387

Note: By default the ebgp-multihop command is only going to control what the TTL is on your outgoing packets.

Sunday, June 15, 2014

EBGP Overview, BGP Peering Types.

Lessons learned:

BGP:

Open standards based

-RFC 4271, BGP 4

Classless path vector routing protocol

-Uses multiple attributes for routing decisions

-Supports VLSM and summarization

-Extensible

--IPv4 Multicast, IPv6, MPLS, etc.

BGP on Cisco’s Site:

http://www.cisco.com/c/en/us/tech/ip/border-gateway-protocol-bgp/index.html

BGP is an open standards protocol and is considered a path vector protocol. IGPs make decisions based on one value, the metric to reach a destination, choosing the lowest-cost path end to end. BGP was originally implemented with policy in mind: the individual attributes are applied on a per-route basis, and they determine how we route to a destination.

BGP supports VLSM and summarization. With the size of the global table growing constantly, it’s important to be able to summarize prefix information.

Many sites offer routers, called “route-views” or “route-servers”, that you can log in to and view the global BGP table.

Ex: route-server.ip.att.net – they’re basically just routers online that you can connect to. We can use these to check policies we’re trying to apply to our outbound advertisements as they pertain to the global internet.

EX: connect to the router and log in as rviews with a password of rviews…

rviews@route-server.ip.att.net>

You can login and see on average the size of the global BGP table.

You can also get a list of Route-servers here: http://www.netdigix.com/servers.html

Example of “sh ip bgp sum” off a route-views server:

route-views>sh ip bgp summary

BGP router identifier 128.223.51.103, local AS number 6447

BGP table version is 1192614054, main routing table version 1192614054

521135 network entries using 68789820 bytes of memory -> this says we currently have 521135 entries

15209495 path entries using 790893740 bytes of memory -> this says there are 15209495 paths to reach these entries.

2508169/94266 BGP path/bestpath attribute entries using 421372392 bytes of memory

2166488 BGP AS-PATH entries using 86870842 bytes of memory

66927 BGP community entries using 5058824 bytes of memory

396 BGP extended community entries using 12842 bytes of memory

0 BGP route-map cache entries using 0 bytes of memory

0 BGP filter-list cache entries using 0 bytes of memory

BGP using 1372998460 total bytes of memory

Dampening enabled. 11709 history paths, 16038 dampened paths

BGP activity 878512/333647 prefixes, 50876541/35465181 paths, scan interval 60 secs

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd

4.69.184.193 4 3356 7842992 73921 1192614065 0 0 4w5d 490479

----------------------------------------------------

BGP is extensible, it can be used for more than just IP unicast routing.

----------------------------------------------------

BGP ASNs

Autonomous Systems (AS) per RFC 4271 -

-A set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to determine how to route packets within the AS, and using an inter-AS routing protocol to determine how to route packets to other ASes.

-ASNs are allocated by Internet Assigned Numbers Authority (IANA).

http://www.iana.org/numbers/

In general a routing policy will apply for the AS as a whole, where the policy for AS1 is different from AS2, etc.

Note: Routers inside an AS need reachability first. In reality BGP is not a routing protocol so much as an application that is mainly designed to do two things.

It’s designed to advertise an IP prefix and a next-hop value associated with that prefix.

The design issue: the next-hop value that BGP reports must then be recursed through some IGP routing protocol. BGP relies on EIGRP, OSPF, etc. BGP is for destinations outside our network that we’re trying to reach.

The AS numbers themselves are assigned by IANA.

There’s recently been a change in the format that AS numbers use:

Original 2-byte field

-Values 0 – 65535

-Public ASNs 1 – 64511

-Private ASNs 64512 – 65535 (1024 numbers), similar to RFC 1918 private addresses
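The 2-byte ranges above can be expressed as a small classifier. This Python sketch follows the ranges exactly as listed in these notes; treating 0 as reserved is my assumption, and the function name is hypothetical.

```python
def classify_2byte_asn(asn: int) -> str:
    """Classify a 2-byte ASN using the ranges listed above."""
    if not 0 <= asn <= 65535:
        raise ValueError("not a 2-byte AS number")
    if 64512 <= asn <= 65535:
        return "private"
    if 1 <= asn <= 64511:
        return "public"
    return "reserved"  # assumption: AS 0 is not usable


print(classify_2byte_asn(100))    # public
print(classify_2byte_asn(65000))  # private
print(65535 - 64512 + 1)          # 1024 private numbers
```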

Currently 4- byte field

-RFC 4893 BGP Support for Four-octet AS Number Space

-IOS supports as of 12.4(24)T

As of today almost all 2 byte field AS’s have been allocated.

4-Byte BGP ASNs

- 0.0 – 65535.65535 notation

- 0.[0-65535] denotes original 2-byte ASNs

Requires backwards compatibility with old code

-4 Byte ASN support negotiated during capability exchange

-old BGP speakers are sent ASdot numbers encoded as ASN “23456”. From the perspective of the IOS versions that DO NOT support AS4 numbers, they will see every AS that has a 4-byte value represented as AS number 23456. This doesn’t mean the information is lost.

-Real AS-Path encoded with optional transitive attributes AS4_AGGREGATOR and AS4_PATH

Establishing BGP Peerings:

Like an IGP, the first step in BGP is to find neighbors to exchange information with.

The overall logic in BGP for establishing neighbors, exchanging updates, routing the traffic, etc. is the same as in our IGPs.

For example, in EIGRP and OSPF our first step is to figure out who our neighbors are on the connected links we want to run the protocol on.

Once we find the neighbors we go through an adjacency negotiation where we agree on attributes, Area #, etc.

Once neighbors come up we exchange information and then we can do the path selection.

Same type logic in BGP –

First step is how do we establish the peering?

Unlike IGP….

-BGP does not have its own transport. EX: OSPF uses IP protocol # 89, EIGRP # 88. BGP runs on top of TCP. This implies the BGP neighbors have to have IGP reachability before they can peer via BGP. Since BGP is a standard TCP application, the normal client/server roles of TCP apply.

-BGP has different types of neighbors: iBGP, EBGP, route reflectors, etc. This controls how updates are processed and how best path selection works.

-BGP neighbors are not discovered. Neighbors are not dynamically discovered, unlike IGPs that use multicast. Peerings are based on unicast neighbor statements under the process.

-BGP neighbors do not have to be connected – Because TCP is the transport protocol.

BGP Transport:

BGP uses TCP port 179 for transport

-implies the BGP needs IGP First.

BGP neighbor statement tells process to….

-listen for remote address via TCP 179

-initiate a session to remote address via TCP 179

-if collision, higher router-id becomes TCP client. Can happen if client and server try to establish at same time

Normally the client will initiate the session over port 179.

Handshake Example:

R1 -- > < --R2

1) R1 sends a TCP SYN packet with a random source port and a destination port of 179 (accepted if R2 is configured for the session from R1).

2) R2 replies with the second portion of the handshake, a TCP SYN ACK, saying I also want to start a session. It is sent with a source port of 179 and a destination of R1’s random source port.

3) R1 replies with a TCP ACK, and the session is fully open.

Key point: R1 will always be sending traffic toward port 179 and R2 will be sending traffic from port 179.

BGP Peering types:

External BGP (EBGP) Peers

-Neighbors outside my Autonomous System

Internal BGP (iBGP) Peers

-Neighbors inside my AS

Update and path selection rules change depending on what type of peer a route is being sent to/received from.

EBGP Peering Rules:

EBGP packets default to TTL 1

-can be modified if neighbors are multiple hops away.

Ex:

--#neighbor x.x.x.x ebgp-multihop (TTL)

--#neighbor x.x.x.x ttl-security hops (hop count) - Common today – used to protect against remote TCP reset attacks.

Note: these commands are mutually exclusive - you would use one or the other, not both.

Non multi-hop peers must be directly connected by default.

-can be modified if connected neighbors peer via loopbacks

-#neighbor x.x.x.x disable-connected-check - This disables the connected check; normally this is used when directly connected neighbors peer using loopback addresses.

Note: By default, when a router goes to establish a single-hop EBGP peering, it looks in the routing table for the neighbor’s address. If it doesn’t find a directly connected route, it will not try to establish the 3-way handshake, so the “open” message is never sent.

Neighbor disable-connected-check – will disable this behavior.

Loop prevention via AS-Path

-Local ASN is “prepended” to outbound updates

-inbound updates containing local ASN are discarded

-Can be modified with # neighbor allowas-in command.

Every time an update is sent out to EBGP peers, we take our local AS number and add it to the AS path attribute inside that update.

The AS path tracks which ASes this update went through from the originator to our local AS. This can also be seen by using the “route-views” servers.

Ex:

route-views>sh ip bgp

Network Next Hop Metric LocPrf Weight Path

* 1.0.0.0/24 157.130.10.233 0 701 6453 15169 i

This output tells us that this route was originated by AS 15169, then passed through 6453 and then 701. So the peer that the route is received from is in AS 701.

The leftmost number is the AS that we are learning the prefix from; the rightmost number is the AS that the prefix originated in.

Key point – whenever we send an update outbound to an EBGP peer, we take our own AS number and put it as the first number in the AS path.

If for some reason we see our own AS number INBOUND in the path, we automatically filter that update out. Basic loop prevention logic.

We can modify this with the # neighbor allowas-in command, for cases where two sites share the same AS number and are separated by a different AS number in the middle.
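To make the AS-path logic concrete, here is a small Python sketch of the prepend and inbound check. The function names are made up for illustration, and the allowas_in argument is a simplification of the IOS knob: it is treated as the number of times our own AS may appear in the path, with 0 meaning the command is not configured.

```python
def prepend_local_as(as_path, local_as):
    """Outbound to an EBGP peer: our AS becomes the leftmost entry."""
    return [local_as] + list(as_path)


def accept_inbound(as_path, local_as, allowas_in=0):
    """Inbound loop check: drop the update if our AS is already in the path.

    allowas_in is a simplified model of 'neighbor allowas-in': the
    number of occurrences of our own AS that we tolerate.
    """
    return list(as_path).count(local_as) <= allowas_in


# AS path from the route-views output above: 701 6453 15169
path = [701, 6453, 15169]
neighbor_as = path[0]   # leftmost: the AS we learned the prefix from
origin_as = path[-1]    # rightmost: the AS that originated the prefix
```

With this path, a router in AS 6453 would reject the update unless allowas_in is at least 1, while a router in AS 100 accepts it and would send it on as 100 701 6453 15169.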

EBGP Peering Rules:

Next-hop processing

-Outbound EBGP updates have local update-source for neighbor set as next-hop.

-EX: if update source is loopback 0, next-hop is loopback0

Can be modified with route-map action “set ip next-hop” but typically shouldn’t

-ex: 3rd-party next hop.

-------------------

When we send updates outbound, whatever the local address is for that peering is the value that goes into the next-hop attribute of the route. Ex:

R1 and R2

Both routers peer via their loopback interfaces

Loopback IP 1.1.1.1

Loopback IP 2.2.2.2

R1 advertises 10.0.0.0 /24 -- > when this update is advertised to R2 over the EBGP session, R2 will say the prefix 10.0.0.0 /24 is reachable via the next-hop of 1.1.1.1

This implies that R2 will need an additional step in the route recursion to figure out what is the local connected interface that R2 will use in order to reach the destination 1.1.1.1

We can modify this with the route-map option, but typically you would not use this in a normal design.

This is normally called a 3rd-party next hop – this is where the local router sends the update but then tells you to use some other device in the data plane in order to get there.

Example:

R1 sends an update to R2 for prefix 10.0.0.0/24, and instead of using the normal next-hop value (whatever the update source is from R1 to R2) we can tell R2 to use R3 (3.3.3.3).

This means the control plane for BGP is going to be between R1 and R2. But then the actual data plane (actual traffic forwarding) would be through some other device.

One of the flexible features of BGP is that the control plane is not actually tied to the data plane.

iBGP peering Rules.

iBGP Peering packets default to TTL 255

-implies neighbors do not have to be connected as long as IGP reachability exists

Loop prevention via route filtering

- iBGP learned routes cannot be advertised on to another iBGP neighbor.

-implies need for either….

-Fully meshed iBGP peerings – most efficient for selecting correct path

-route reflection

-Confederation

--------------------------------------------------

Key note: iBGP packets are sent with a TTL of 255, so there is effectively no hop limit on them and the neighbors do not have to be directly connected. As long as there’s IGP reachability in the internal network, we can establish the iBGP peering and ultimately advertise the prefixes.

Based on the next-hop processing rules of iBGP, the control plane messages again do not have to follow the actual data plane forwarding path.

The loop prevention for iBGP uses a very simple concept. If you learn a route from an iBGP neighbor – DON’T advertise it to another iBGP neighbor.
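The rule is simple enough to write as a two-line check. This Python sketch is only an illustration of the plain iBGP case: the function names are made up, and it deliberately ignores route reflectors and confederations, which exist precisely to relax this rule. The second function shows why a full mesh gets expensive: n routers need n(n-1)/2 sessions.

```python
def may_advertise(learned_from: str, send_to: str) -> bool:
    """Plain iBGP rule: never pass an iBGP-learned route to another iBGP peer.

    learned_from is "ebgp", "ibgp" or "local"; send_to is "ebgp" or "ibgp".
    Route reflection and confederations are not modelled here.
    """
    return not (learned_from == "ibgp" and send_to == "ibgp")


def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions needed so every router peers with every other."""
    return routers * (routers - 1) // 2
```

So an EBGP-learned route can go to every iBGP peer, but those peers cannot pass it on to each other, which is why each of them needs its own session to the router that learned it.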

Next-hop processing:

-outbound iBGP updates do not modify the next-hop attribute regardless of iBGP type.

--iBGP peer

--Route reflector client peer

--Route reflector non-client peer

--confederation EBGP peer

Can be modified with # neighbor x.x.x.x next-hop-self or the # route-map action set ip next-hop

This basically means the original entry point for the route is going to be maintained throughout all of the updates in the iBGP network.

The next-hop value will always be the value that came from your EBGP neighbor to begin with, unless we use the next-hop-self command.
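Putting the EBGP and iBGP next-hop rules side by side, a rough Python sketch could look like this. The function and argument names are my own, and it only covers the default behavior described in these notes plus next-hop-self; route-map overrides and 3rd-party next hops are left out.

```python
def next_hop_sent(session_type, received_next_hop, local_update_source,
                  next_hop_self=False):
    """Return the next-hop value placed in an outbound update.

    session_type: "ebgp" or "ibgp" (the session the update is sent over).
    received_next_hop: next hop the route was learned with, or None if
        the route is locally originated.
    local_update_source: our own address for this peering.
    """
    if session_type == "ebgp" or next_hop_self or received_next_hop is None:
        # EBGP (or next-hop-self): rewrite to our own peering address.
        return local_update_source
    # iBGP default: keep whatever the EBGP neighbor originally gave us.
    return received_next_hop
```

Using the addresses from the R1/R2 example: R2 learned 10.0.0.0/24 with next hop 1.1.1.1, so next_hop_sent("ibgp", "1.1.1.1", "2.2.2.2") keeps 1.1.1.1, and the iBGP peers need a route to 1.1.1.1. With next_hop_self=True the same call returns 2.2.2.2.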

Monday, May 19, 2014

Administrative Distance Based Routing Loops, Debug IP Routing, IP Route Profile

Lessons Learned:

Identifying Routing Loops.

Visually trace the route advertisement path

--ideally you should be able to predict 100% of routing loops before they occur

CLI tools

-Connectivity testing with ICMP via TCL

-debug ip routing

-ip route profile

-Traceroute

-------------------------------------------------------

If the route is looping, even pings and traceroutes might not tell us where the source of the problem is. This is what ip route profile and debug ip routing are good for.

TOPOLOGY:

BB3 will peer RIP with R4 – R4 will peer EIGRP with R5 and R4 will peer OSPF with R6.

On R4 – let’s redistribute RIP into EIGRP.

R4(config)#router eigrp 1

R4(config-router)#redistribute rip metric 100000 100 255 1 1500

Verify the routes we expect are showing up correctly:

D EX 10.10.10.0 [170/2221056] via 172.16.5.5, 00:00:52, FastEthernet0/0

D EX 11.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:52, FastEthernet0/0

D EX 12.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:54, FastEthernet0/0

D EX 13.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:54, FastEthernet0/0

Now let’s redistribute EIGRP into RIP

R4(config)#router rip

R4(config-router)#redistribute eigrp 1 metric 1

Verify the networks are as expected on the BB3 router:

R 172.16.0.0/16 [120/1] via 10.10.10.4, 00:00:11, FastEthernet0/0

Now we should verify the traffic path. Currently R4 is the only router doing redistribution, so we should only traverse that path.

BB3#traceroute 172.16.23.2

Type escape sequence to abort.

Tracing the route to 172.16.23.2

1 10.10.10.4 16 msec 40 msec 20 msec

2 172.16.45.5 16 msec 20 msec 24 msec

3 172.16.5.10 40 msec 16 msec 44 msec

4 172.16.10.2 48 msec * 56 msec

BB3#

Now on R3 we will do mutual redistribution between OSPF and EIGRP

R3(config)#router ospf 1

R3(config-router)#redistribute eigrp 1 subnets

R3(config)#router eigrp 1

R3(config-router)#redistribute ospf 1 metric 100000 100 255 1 1500

Now let’s verify the routes – we will check the path from R10.

R10#traceroute 192.168.69.6

Type escape sequence to abort.

Tracing the route to 192.168.69.6

1 172.16.10.2 16 msec 20 msec 28 msec

2 172.16.23.3 24 msec 48 msec 36 msec

3 192.168.93.9 48 msec 40 msec 56 msec

4 192.168.69.6 68 msec * 84 msec

R10#

Once all the redistribution is complete, we can check whether routes are being added or removed and collect statistics on the routing table to see if the network is stable or if there are changes in the routing table.

The “ip route profile” feature is designed to accomplish this. We will need to configure it on all the routers.

EX: R10(config)#ip route profile

Now that the feature is turned on – we can look at the output by simply saying

R3#sh ip route profile

IP routing table change statistics:

Frequency of changes in a 5 second sampling interval

-------------------------------------------------------------

Change/ Fwd-path Prefix Nexthop Pathcount Prefix

interval change add change change refresh

-------------------------------------------------------------

0 5 5 6 6 6

1 0 0 0 0 0

2 0 0 0 0 0

3 1 1 0 0 0

4 0 0 0 0 0

5 0 0 0 0 0

10 0 0 0 0 0

15 0 0 0 0 0

20 0 0 0 0 0

25 0 0 0 0 0

30 0 0 0 0 0

55 0 0 0 0 0

80 0 0 0 0 0

105 0 0 0 0 0

130 0 0 0 0 0

155 0 0 0 0 0

280 0 0 0 0 0

405 0 0 0 0 0

------------------------------------------------------------

Change/ Fwd-path Prefix Nexthop Pathcount Prefix

interval change add change change refresh

-------------------------------------------------------------

530 0 0 0 0 0

655 0 0 0 0 0

780 0 0 0 0 0

1405 0 0 0 0 0

2030 0 0 0 0 0

2655 0 0 0 0 0

3280 0 0 0 0 0

3905 0 0 0 0 0

7030 0 0 0 0 0

10155 0 0 0 0 0

13280 0 0 0 0 0

Overflow 0 0 0 0 0

R3#

This is basically taking a sample every 5 seconds: did the number of routes go up, down, etc.? Did the next hop change, did the forwarding path change?

We can read this like this:

The first row says there were 0 changes over a 5 second interval, and that this happened in 5 intervals. Basically there were 0 changes in 5 seconds, 5 times so far.

If we see the counters increase over time in the rows for 1 or more changes (the 3 through 80 rows, for example), that’s bad; it means there’s an issue:

-------------------------------------------------------------

Change/ Fwd-path Prefix Nexthop Pathcount Prefix

interval change add change change refresh

-------------------------------------------------------------

0 67 67 81 81 81

1 0 0 0 0 0

2 0 0 0 0 0

3 14 14 0 0 0

4 0 0 0 0 0

5 0 0 0 0 0

10 0 0 0 0 0

15 0 0 0 0 0

20 0 0 0 0 0

25 0 0 0 0 0

30 0 0 0 0 0

55 0 0 0 0 0

80 0 0 0 0 0

105 0 0 0 0 0

130 0 0 0 0 0

155 0 0 0 0 0

280 0 0 0 0 0

405 0 0 0 0 0

In general the values in every row below the first (the 0 row) should all be zero, and the values in the top row should be the ones counting up…

There are some issues here: 14 times there were 3 changes within a 5 second sampling interval, so there are issues with routes being added and the Fwd-path changing…

3 14 14 0 0 0

This shows routing table stability and instability. It will not show what the exact issues are; the key is that the feature helps diagnose them.

For the most part this will show either convergence of the network OR some type of flapping topology going on in the network.

The route profile will work better for AD based loops than Metric based loops.
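If the output is collected from several routers, reading it by eye gets tedious. Here is a rough Python sketch that picks out the rows that indicate instability, meaning a row for 1 or more changes per interval with any non-zero counter. The function name and the parsing rules are my own assumptions about the pasted text (six numeric columns per row); header lines and the Overflow row are simply skipped.

```python
def unstable_rows(profile_text):
    """Return (changes_per_interval, counters) for rows showing instability.

    Expects pasted 'show ip route profile' rows: six integers per line,
    the first being the number of changes in a 5 second interval.
    """
    rows = []
    for line in profile_text.splitlines():
        fields = line.split()
        # Skip headers, separators and the non-numeric Overflow row.
        if len(fields) != 6 or not all(f.isdigit() for f in fields):
            continue
        changes, *counters = (int(f) for f in fields)
        if changes > 0 and any(counters):
            rows.append((changes, counters))
    return rows


sample = """
0 67 67 81 81 81
1 0 0 0 0 0
2 0 0 0 0 0
3 14 14 0 0 0
Overflow 0 0 0 0 0
"""
flagged = unstable_rows(sample)
```

For the sample rows taken from the output above, the only flagged row is the 3-changes row with 14 in the Fwd-path and Prefix add columns. An empty list would mean the table looked stable while the feature was running.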

To determine the actual issues, we should most likely turn on debug ip routing

Note: in large scale networks it might be a better idea to send the log outputs to the buffer instead of the console.

R4#debug ip routing

IP routing debugging is on

*Mar 1 00:20:48.739: RT: add 11.0.0.0/8 via 10.10.10.3, rip metric [120/1]

*Mar 1 00:20:48.739: RT: NET-RED 11.0.0.0/8

*Mar 1 00:20:48.747: RT: SET_LAST_RDB for 12.0.0.0/8

NEW rdb: via 10.10.10.3

The first part of the output says we received 11.0.0.0/8 via 10.10.10.3 (BB3) via RIP with a distance of 120 and a metric of one.

This is now installed in the routing table.

*Mar 1 00:20:48.747: RT: add 12.0.0.0/8 via 10.10.10.3, rip metric [120/1]

*Mar 1 00:20:48.751: RT: NET-RED 12.0.0.0/8

*Mar 1 00:20:48.755: RT: SET_LAST_RDB for 13.0.0.0/8

NEW rdb: via 10.10.10.3

*Mar 1 00:20:48.759: RT: add 13.0.0.0/8 via 10.10.10.3, rip metric [120/1]

*Mar 1 00:20:48.763: RT: NET-RED 13.0.0.0/8

*Mar 1 00:20:48.995: RT: closer admin distance for 11.0.0.0, flushing 1 routes

*Mar 1 00:20:48.999: RT: NET-RED 11.0.0.0/8

*Mar 1 00:20:49.003: RT: SET_LAST_RDB for 11.0.0.0/8

NEW rdb: via 192.168.46.6

*Mar 1 00:20:49.007: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]

*Mar 1 00:20:49.011: RT: NET-RED 11.0.0.0/8

*Mar 1 00:20:49.047: RT: closer admin distance for 12.0.0.0, flushing 1 routes

*Mar 1 00:20:49.047: RT: NET-RED 12.0.0.0/8

*Mar 1 00:20:49.055: RT: SET_LAST_RDB for 12.0.0.0/8

NEW rdb: via 192.168.46.6

This now says that for prefix 11.0.0.0/8 there’s a “closer admin distance” via OSPF, with an AD of 110 and a metric of 20.

This will now override the RIP route, because an AD of 110 beats RIP’s AD of 120.
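The decision the routing table is making here can be sketched in a few lines of Python. This is only a model for illustration: the dictionary holds the default Cisco administrative distances, the function name is made up, and it only compares AD across protocols, since metrics from different protocols are not comparable.

```python
# Default administrative distances on Cisco IOS.
DEFAULT_AD = {
    "connected": 0,
    "static": 1,
    "eigrp": 90,
    "ospf": 110,
    "rip": 120,
    "eigrp_external": 170,
}


def best_route(candidates):
    """Pick the route the RIB would install for one prefix: lowest AD wins.

    Each candidate is a dict with "protocol", "next_hop" and "metric".
    The metric is carried along but NOT compared between protocols.
    """
    return min(candidates, key=lambda r: DEFAULT_AD[r["protocol"]])


# The two candidates for 11.0.0.0/8 seen in the debug output on R4.
candidates = [
    {"protocol": "rip", "next_hop": "10.10.10.3", "metric": 1},
    {"protocol": "ospf", "next_hop": "192.168.46.6", "metric": 20},
]
winner = best_route(candidates)
```

The winner is the OSPF route via 192.168.46.6, even though its metric of 20 is higher than RIP’s metric of 1. That is exactly the “closer admin distance” message in the debug.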

I’ve omitted the rest of the output, but it showed that over and over the route was added to the table via RIP, then quickly removed and added via OSPF.

Another way to test is to run an ICMP ping with a timeout of 1 – we should see intermittent connectivity.

We could also then turn on “debug ip icmp” to see exactly who is sending us the ICMP unreachable messages.

I now need to turn on “debug ip routing” on all the other routers.

On R5 -

*Mar 1 01:19:16.123: RT: no routes to 11.0.0.0

*Mar 1 01:19:16.127: RT: NET-RED 11.0.0.0/8

*Mar 1 01:19:16.131: RT: delete network route to 11.0.0.0

*Mar 1 01:19:16.131: RT: NET-RED 11.0.0.0/8

*Mar 1 01:19:16.163: RT: delete route to 13.0.0.0 via 172.16.45.4, eigrp metric [170/2195456]

*Mar 1 01:19:16.167: RT: SET_LAST_RDB for 13.0.0.0/8

OLD rdb: via 172.16.45.4, Serial0/0

So this is telling us that R4 is withdrawing the route it redistributed into EIGRP.

If I go to another router in the topology – R3

*Mar 1 01:21:30.635: RT: no routes to 11.0.0.0

*Mar 1 01:21:30.635: RT: NET-RED 11.0.0.0/8

*Mar 1 01:21:30.635: RT: delete network route to 11.0.0.0

*Mar 1 01:21:30.635: RT: NET-RED 11.0.0.0/8

*Mar 1 01:21:31.663: Periodic IP routing statistics collection

R3#

This also says I had a route to 11.0.0.0 and then it was withdrawn. This would lead us back to our router doing the redistribution (R4).

*Mar 1 01:24:20.799: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]

*Mar 1 01:24:20.799: RT: NET-RED 11.0.0.0/8

*Mar 1 01:24:20.827: RT: closer admin distance for 12.0.0.0, flushing 1 routes

*Mar 1 01:24:20.827: RT: NET-RED 12.0.0.0/8

*Mar 1 01:24:20.831: RT: SET_LAST_RDB for 12.0.0.0/8

NEW rdb: via 192.168.46.6

Once again we see that we’re deleting the RIP route and installing the OSPF route…

Once the RIP route is deleted, it can no longer be redistributed into EIGRP, and then it can no longer be redistributed into OSPF.

So the withdrawal will happen over and over. We now know the issue is related to the Administrative Distance.

So if we changed the distance so that the RIP route has the lower AD, it should correct the issue. The problem is that filtering in redistribution is not going to help.

One way to correct this is to just have R4 perform the redistribution itself into both protocols. This will keep R3 (currently redistributing between EIGRP and OSPF) from installing the routes from EIGRP, because of the AD.

The EIGRP routes will not get installed into the routing table, so they cannot be redistributed.

So on R4 – if we do the redistribution –

R4(config)#router ospf 1

R4(config-router)#redistribute rip subnets

R4(config)#router eigrp 1

R4(config-router)#redistribute ospf 1 metric 1 1 1 1 1

R4(config)#router rip

R4(config-router)#redistribute ospf 1 metric 1

R3#sh ip route 11.0.0.0

Routing entry for 11.0.0.0/8

Known via "ospf 1", distance 110, metric 20, type extern 2, forward metric 3

Redistributing via eigrp 1

Advertised by eigrp 1 metric 100000 100 255 1 1500

Last update from 192.168.93.9 on FastEthernet1/0, 00:01:00 ago

Routing Descriptor Blocks:

* 192.168.93.9, from 4.4.4.4, 00:01:00 ago, via FastEthernet1/0

Route metric is 20, traffic share count is 1

We can now see the route is installed via OSPF and not EIGRP.

The only problem with this is that if one of the links goes down and the routes are then learned from another protocol, we would have a routing loop again.

Basically it’s an order of operations issue again, because only what’s in the routing table can be redistributed.

To correct this issue completely we need to tell the redistributing router which routes to use from which protocols.