Monday, May 19, 2014

Administrative Distance Based Routing Loops, Debug IP Routing, IP Route Profile

Lessons Learned:

Identifying Routing Loops.
Visually trace the route advertisement path
--ideally you should be able to predict 100% of routing loops before they occur
CLI tools
-Connectivity testing with ICMP via TCL
-debug ip routing
-ip route profile
-Traceroute

------------------------------------------------------- 

If a route is looping, even pings and traceroutes might not show where the source of the problem is. This is where ip route profile and debug ip routing are useful.

TOPOLOGY:
BB3 peers with R4 over RIP. R4 peers with R5 over EIGRP and with R6 over OSPF.

On R4, let’s redistribute RIP into EIGRP.

R4(config)#router eigrp 1
R4(config-router)#redistribute rip metric 100000 100 255 1 1500

Verify the routes we expect are showing up correctly:
D EX    10.10.10.0 [170/2221056] via 172.16.5.5, 00:00:52, FastEthernet0/0
D EX 11.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:52, FastEthernet0/0
D EX 12.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:54, FastEthernet0/0
D EX 13.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:54, FastEthernet0/0

Now let’s redistribute EIGRP into RIP:
R4(config)#router rip
R4(config-router)#redistribute eigrp 1 metric 1

Verify the networks are as expected on the BB3 router:
R    172.16.0.0/16 [120/1] via 10.10.10.4, 00:00:11, FastEthernet0/0

Now we should verify the traffic path. R4 is currently the only router doing redistribution, so traffic should only traverse that path.

BB3#traceroute 172.16.23.2
Type escape sequence to abort.
Tracing the route to 172.16.23.2

  1 10.10.10.4 16 msec 40 msec 20 msec
  2 172.16.45.5 16 msec 20 msec 24 msec
  3 172.16.5.10 40 msec 16 msec 44 msec
  4 172.16.10.2 48 msec *  56 msec
BB3#

Now on R3 we will do mutual redistribution between OSPF and EIGRP:


R3(config)#router ospf 1
R3(config-router)#redistribute eigrp 1 subnets

R3(config)#router eigrp 1
R3(config-router)#redistribute ospf 1 metric 100000 100 255 1 1500

Now let’s verify the routes. We will check the path from R10:
R10#traceroute 192.168.69.6

Type escape sequence to abort.
Tracing the route to 192.168.69.6
  1 172.16.10.2 16 msec 20 msec 28 msec
  2 172.16.23.3 24 msec 48 msec 36 msec
  3 192.168.93.9 48 msec 40 msec 56 msec
  4 192.168.69.6 68 msec *  84 msec
R10#

Once all the redistribution is complete, we can check whether routes are being added or removed and collect statistics on the routing table to see if the network is stable or if there are changes in the routing table.
This is what the “ip route profile” feature is designed to do. We will need to configure it on all the routers.

EX: R10(config)#ip route profile

Now that the feature is turned on, we can look at the output with:
R3#sh ip route profile

IP routing table change statistics:
Frequency of changes in a 5 second sampling interval
-------------------------------------------------------------
Change/   Fwd-path  Prefix   Nexthop  Pathcount  Prefix
interval  change    add      change   change     refresh
-------------------------------------------------------------
0         5         5        6        6          6        
1         0         0        0        0          0        
2         0         0        0        0          0        
3         1         1        0        0          0        
4         0         0        0        0          0        
5         0         0        0        0          0        
10        0         0        0        0          0        
15        0         0        0        0          0        
20        0         0        0        0          0        
25        0         0        0        0          0        
30        0         0        0        0          0        
55        0         0        0        0          0        
80        0         0        0        0          0        
105       0         0        0        0          0        
130       0         0        0        0          0        
155       0         0        0        0          0        
280       0         0        0        0          0        
405       0         0        0        0          0        
------------------------------------------------------------
Change/   Fwd-path  Prefix   Nexthop  Pathcount  Prefix
interval  change    add      change   change     refresh
-------------------------------------------------------------
530       0         0        0        0          0        
655       0         0        0        0          0        
780       0         0        0        0          0        
1405      0         0        0        0          0        
2030      0         0        0        0          0        
2655      0         0        0        0          0        
3280      0         0        0        0          0        
3905      0         0        0        0          0         
7030      0         0        0        0          0        
10155     0         0        0        0          0        
13280     0         0        0        0          0        
Overflow  0         0        0        0          0        
R3#

This basically takes a sample every 5 seconds and records what changed in that window: were prefixes added, did the next hop change, did the forwarding path change?

We can read it like this:
The first row says there were 0 changes over a 5-second interval, and that this occurred in 5 intervals. In other words, there were 0 changes in 5 seconds, 5 times so far.
If we see the counts increase over time in the rows with more changes per interval (3 through 80, for example), that is bad and means there is an issue:

-------------------------------------------------------------
Change/   Fwd-path  Prefix   Nexthop  Pathcount  Prefix
interval  change    add      change   change     refresh
-------------------------------------------------------------
0         67        67       81       81         81       
1         0         0        0        0          0        
2         0         0        0        0          0        
3         14        14       0        0          0        
4         0         0        0        0          0        
5         0         0        0        0          0        
10        0         0        0        0          0        
15        0         0        0        0          0        
20        0         0        0        0          0        
25        0         0        0        0          0        
30        0         0        0        0          0        
55        0         0        0        0          0        
80        0         0        0        0          0        
105       0         0        0        0          0        
130       0         0        0        0          0        
155       0         0        0        0          0        
280       0         0        0        0          0        
405       0         0        0        0          0  

In general, the counters in every row below the first should all be zero, and the counters in the top row (0 changes) should be the ones counting up…
There are some issues here: 14 times there were 3 changes within a 5-second sampling interval, which points to routes being added and the Fwd-path changing…
3         14        14       0        0          0        

This shows routing table stability and instability. It will not show exactly what the issue is; the key point is that the feature helps diagnose issues.
For the most part it will show either the network converging or some type of flapping going on in the topology.
The route profile works better for AD-based loops than for metric-based loops.
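
When this output has to be checked on several routers, the reading described above can be automated. The following is a minimal Python sketch and not part of the original lab: the function names parse_profile and unstable_rows are hypothetical, the sample is a shortened copy of the unstable table shown above, and the parser assumes the six-column layout printed by this IOS version.

```python
# Shortened copy of the unstable "show ip route profile" table above.
SAMPLE = """\
Change/   Fwd-path  Prefix   Nexthop  Pathcount  Prefix
interval  change    add      change   change     refresh
-------------------------------------------------------------
0         67        67       81       81         81
1         0         0        0        0          0
2         0         0        0        0          0
3         14        14       0        0          0
4         0         0        0        0          0
"""

COLUMNS = ["fwd_path_change", "prefix_add", "nexthop_change",
           "pathcount_change", "prefix_refresh"]


def parse_profile(text):
    """Return {changes_per_interval: {column: number_of_intervals}}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six numeric fields; headers, rulers and
        # non-numeric rows such as "Overflow" are skipped.
        if len(fields) == 6 and all(f.isdigit() for f in fields):
            rows[int(fields[0])] = dict(zip(COLUMNS, map(int, fields[1:])))
    return rows


def unstable_rows(rows):
    """Rows where one or more changes per interval were actually observed."""
    return {changes: counts for changes, counts in rows.items()
            if changes > 0 and any(counts.values())}


for changes, counts in unstable_rows(parse_profile(SAMPLE)).items():
    hits = {name: n for name, n in counts.items() if n}
    print(f"{changes} changes per 5-second interval: {hits}")
```

For the sample, only the row with 3 changes per interval is reported, which matches the manual reading. The top row is ignored on purpose, since its counters are expected to grow on a stable network.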

To determine the actual issue, we should most likely turn on debug ip routing.
Note: in large-scale networks it is probably a better idea to send the debug output to the logging buffer instead of the console.

R4#debug ip routing
IP routing debugging is on


*Mar  1 00:20:48.739: RT: add 11.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar  1 00:20:48.739: RT: NET-RED 11.0.0.0/8
*Mar  1 00:20:48.747: RT: SET_LAST_RDB for 12.0.0.0/8
  NEW rdb: via 10.10.10.3

The first part of the output says we received 11.0.0.0/8 from 10.10.10.3 (BB3) via RIP, with a distance of 120 and a metric of 1.
This is now installed in the routing table.


*Mar  1 00:20:48.747: RT: add 12.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar  1 00:20:48.751: RT: NET-RED 12.0.0.0/8
*Mar  1 00:20:48.755: RT: SET_LAST_RDB for 13.0.0.0/8
  NEW rdb: via 10.10.10.3

*Mar  1 00:20:48.759: RT: add 13.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar  1 00:20:48.763: RT: NET-RED 13.0.0.0/8
*Mar  1 00:20:48.995: RT: closer admin distance for 11.0.0.0, flushing 1 routes
*Mar  1 00:20:48.999: RT: NET-RED 11.0.0.0/8
*Mar  1 00:20:49.003: RT: SET_LAST_RDB for 11.0.0.0/8
  NEW rdb: via 192.168.46.6

*Mar  1 00:20:49.007: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]
*Mar  1 00:20:49.011: RT: NET-RED 11.0.0.0/8
*Mar  1 00:20:49.047: RT: closer admin distance for 12.0.0.0, flushing 1 routes
*Mar  1 00:20:49.047: RT: NET-RED 12.0.0.0/8
*Mar  1 00:20:49.055: RT: SET_LAST_RDB for 12.0.0.0/8
  NEW rdb: via 192.168.46.6

This now says that for prefix 11.0.0.0/8 there is a “closer admin distance” via OSPF, with an AD of 110 and a metric of 20.
This will now override the RIP route.
I’ve omitted the rest of the output, but it showed that over and over the route was added to the table via RIP, then quickly removed and added via OSPF.
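
On a busy router the debug output scrolls quickly, so it can help to summarize a capture offline. The following is a minimal Python sketch, not something from the original lab: the function names are hypothetical, the sample lines are copied from the R4 output above, and the regular expression assumes the "RT: add ... metric [AD/metric]" line format shown there.

```python
import re
from collections import defaultdict

# Lines copied from the R4 "debug ip routing" output above.
SAMPLE = """\
*Mar  1 00:20:48.739: RT: add 11.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar  1 00:20:48.747: RT: add 12.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar  1 00:20:48.759: RT: add 13.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar  1 00:20:48.995: RT: closer admin distance for 11.0.0.0, flushing 1 routes
*Mar  1 00:20:49.007: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]
"""

ADD_RE = re.compile(
    r"RT: add (?P<prefix>\S+) via (?P<next_hop>[\d.]+), "
    r"(?P<proto>\w+) metric \[(?P<ad>\d+)/(?P<metric>\d+)\]"
)


def installs_by_prefix(log):
    """Map each prefix to the ordered list of (protocol, AD) installs."""
    installs = defaultdict(list)
    for match in ADD_RE.finditer(log):
        installs[match["prefix"]].append((match["proto"], int(match["ad"])))
    return dict(installs)


def contested_prefixes(log):
    """Prefixes that were installed by more than one protocol in the capture."""
    return sorted(prefix for prefix, events in installs_by_prefix(log).items()
                  if len({proto for proto, _ in events}) > 1)


print(contested_prefixes(SAMPLE))
```

A prefix that shows up here was installed by two different protocols within the same capture, which is the RIP-then-OSPF pattern described above. With a longer capture, the length of each install list gives a rough idea of how often the route is flapping.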

Another way to test is to run an ICMP ping with a timeout of 1; we should see intermittent connectivity.
We could also then turn on “debug ip icmp” to see exactly who is sending us the ICMP unreachable messages.


I now need to turn on “debug ip routing” on all the other routers.
On R5:

*Mar  1 01:19:16.123: RT: no routes to 11.0.0.0
*Mar  1 01:19:16.127: RT: NET-RED 11.0.0.0/8
*Mar  1 01:19:16.131: RT: delete network route to 11.0.0.0
*Mar  1 01:19:16.131: RT: NET-RED 11.0.0.0/8
*Mar  1 01:19:16.163: RT: delete route to 13.0.0.0 via 172.16.45.4, eigrp metric [170/2195456]
*Mar  1 01:19:16.167: RT: SET_LAST_RDB for 13.0.0.0/8
  OLD rdb: via 172.16.45.4, Serial0/0

So this tells us that R4 is withdrawing the route it had redistributed into EIGRP.


If I go to another router in the topology, R3:

*Mar  1 01:21:30.635: RT: no routes to 11.0.0.0
*Mar  1 01:21:30.635: RT: NET-RED 11.0.0.0/8
*Mar  1 01:21:30.635: RT: delete network route to 11.0.0.0
*Mar  1 01:21:30.635: RT: NET-RED 11.0.0.0/8
*Mar  1 01:21:31.663: Periodic IP routing statistics collection
R3#
R3#

This also says R3 had a route to 11.0.0.0 and then it was withdrawn. This leads us back to the router doing the redistribution (R4).

*Mar  1 01:24:20.799: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]
*Mar  1 01:24:20.799: RT: NET-RED 11.0.0.0/8
*Mar  1 01:24:20.827: RT: closer admin distance for 12.0.0.0, flushing 1 routes
*Mar  1 01:24:20.827: RT: NET-RED 12.0.0.0/8
*Mar  1 01:24:20.831: RT: SET_LAST_RDB for 12.0.0.0/8
  NEW rdb: via 192.168.46.6

Once again we see that we’re deleting the RIP route and installing the OSPF route…
Once the RIP route is deleted, it can no longer be redistributed into EIGRP, and then it cannot be redistributed from EIGRP into OSPF.
So the withdrawal will happen over and over. We now know the issue is related to the Administrative Distance.
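
The cycle can be reproduced with a very small model. The following Python sketch is a simplification and not a simulation of real protocol timers: it tracks a single prefix on R4, treats each loop iteration as one convergence step, and uses the distances seen in the debug output (RIP 120, OSPF 110). The function name simulate and the ospf_feedback flag are hypothetical.

```python
# Administrative distances as seen in the debug output on R4.
AD = {"rip": 120, "ospf": 110}


def simulate(rounds):
    """Return which protocol R4 has installed for the prefix at each step."""
    history = []
    ospf_feedback = False  # is the prefix currently fed back to R4 via OSPF?
    for _ in range(rounds):
        candidates = ["rip"]  # BB3 always advertises the prefix via RIP
        if ospf_feedback:
            candidates.append("ospf")
        winner = min(candidates, key=AD.get)  # lowest distance is installed
        history.append(winner)
        # R4 redistributes RIP into EIGRP only while the RIP route is in its
        # table, and R3 can redistribute into OSPF only while it hears EIGRP.
        ospf_feedback = (winner == "rip")
    return history


print(simulate(6))  # ['rip', 'ospf', 'rip', 'ospf', 'rip', 'ospf']
```

Each time the RIP route wins, it creates the OSPF route that displaces it, and each time the OSPF route wins, it removes its own source. The model never settles, which is the flapping the debugs show.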

So if we changed the distance so that the RIP route has the lower (preferred) distance, it should correct the issue. The problem is that filtering during redistribution is not going to help.
One way to correct this is to have R4 perform the redistribution into both protocols itself. R3 (currently redistributing between EIGRP and OSPF) will then prefer the OSPF route (AD 110) over the EIGRP external route (AD 170).
The EIGRP routes will not get installed in R3’s routing table, so they cannot be redistributed.

So on R4, we configure the redistribution:

R4(config)#router ospf 1
R4(config-router)#redistribute rip subnets

R4(config)#router eigrp 1
R4(config-router)#redistribute ospf 1 metric 1 1 1 1 1

R4(config)#router rip
R4(config-router)#redistribute ospf 1 metric 1

R3#sh ip route 11.0.0.0
Routing entry for 11.0.0.0/8
  Known via "ospf 1", distance 110, metric 20, type extern 2, forward metric 3
  Redistributing via eigrp 1
  Advertised by eigrp 1 metric 100000 100 255 1 1500
  Last update from 192.168.93.9 on FastEthernet1/0, 00:01:00 ago
  Routing Descriptor Blocks:
  * 192.168.93.9, from 4.4.4.4, 00:01:00 ago, via FastEthernet1/0
      Route metric is 20, traffic share count is 1

We can now see the route is installed via OSPF and not EIGRP.

The only problem with this is that if one of the links goes down and the routes are then learned from another protocol, we would have a routing loop again.
It is basically an order-of-operations problem again, because only what is in the routing table at that moment can be redistributed.

To correct this issue completely, we need to tell the redistributing routers which protocol to use for which routes.
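
The post stops before showing that configuration, so the following is only an illustration of the idea in Python, not the author's fix. It models route selection by administrative distance with an optional per-prefix override. The default distances are the standard Cisco values; the function name best_source, the override table and the value 121 are hypothetical. On IOS this kind of preference is normally expressed with the distance command under the routing process.

```python
# Default Cisco administrative distances for the sources used in this lab.
DEFAULT_AD = {"eigrp": 90, "ospf": 110, "rip": 120, "eigrp-external": 170}


def best_source(prefix, sources, overrides=None):
    """Pick the source with the lowest distance for a prefix.

    overrides maps (prefix, source) to a distance that replaces the default,
    which is the "tell the router which protocol to use" step.
    """
    overrides = overrides or {}

    def distance(source):
        return overrides.get((prefix, source), DEFAULT_AD[source])

    return min(sources, key=distance)


# R4 hears 11.0.0.0/8 natively from RIP and fed back through OSPF.
print(best_source("11.0.0.0/8", ["rip", "ospf"]))          # ospf

# Hypothetical override: make the fed-back OSPF copy less preferred than RIP.
pinned = {("11.0.0.0/8", "ospf"): 121}
print(best_source("11.0.0.0/8", ["rip", "ospf"], pinned))  # rip
```

With the override in place the RIP route stays in the table regardless of the order in which the routes arrive, so the redistributed copy can no longer displace its own source.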


1 comment:

  1. Thank you so much for sharing this convoluted troubleshooting. Bests, Dan
