Lessons Learned:
Identifying Routing Loops
Visually trace the route advertisement path
-- ideally you should be able to predict 100% of routing loops before they occur
CLI tools
-Connectivity testing with ICMP via TCL
-debug ip routing
-ip route profile
-Traceroute
-------------------------------------------------------
If a route is looping, even pings and traceroutes might not tell you where the source of the problem is. This is what ip route profile and debug ip routing are good for.
TOPOLOGY:
BB3 will peer RIP with R4 – R4 will peer EIGRP with R5 and
R4 will peer OSPF with R6.
On R4, let's redistribute RIP into EIGRP.
R4(config)#router eigrp 1
R4(config-router)#redistribute rip metric 100000 100 255 1 1500
Verify the routes we expect are showing up correctly:
D EX 10.10.10.0 [170/2221056] via 172.16.5.5, 00:00:52, FastEthernet0/0
D EX 11.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:52, FastEthernet0/0
D EX 12.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:54, FastEthernet0/0
D EX 13.0.0.0/8 [170/2221056] via 172.16.5.5, 00:00:54, FastEthernet0/0
Now let's redistribute EIGRP into RIP
R4(config)#router rip
R4(config-router)#redistribute eigrp 1 metric 1
Verify the networks are as expected on the BB3 router:
R 172.16.0.0/16 [120/1] via 10.10.10.4, 00:00:11, FastEthernet0/0
Now we should verify the traffic path. Currently R4 is the only router doing redistribution, so traffic should only traverse that path.
BB3#traceroute 172.16.23.2
Type escape sequence to abort.
Tracing the route to 172.16.23.2
  1 10.10.10.4 16 msec 40 msec 20 msec
  2 172.16.45.5 16 msec 20 msec 24 msec
  3 172.16.5.10 40 msec 16 msec 44 msec
  4 172.16.10.2 48 msec * 56 msec
BB3#
Now on R3 we will redistribute between OSPF and EIGRP
R3(config)#router ospf 1
R3(config-router)#redistribute eigrp 1 subnets
R3(config)#router eigrp 1
R3(config-router)#redistribute ospf 1 metric 100000 100 255 1 1500
Now let's verify the routes. We will look at the path taken from R10.
R10#traceroute 192.168.69.6
Type escape sequence to abort.
Tracing the route to 192.168.69.6
  1 172.16.10.2 16 msec 20 msec 28 msec
  2 172.16.23.3 24 msec 48 msec 36 msec
  3 192.168.93.9 48 msec 40 msec 56 msec
  4 192.168.69.6 68 msec * 84 msec
R10#
Once all the redistribution is complete, we can check whether routes are being added or removed and collect statistics on the routing table to see if the network is stable or if there are changes in the routing table.
The "ip route profile" feature is designed to accomplish this. We will need to configure it on all the routers.
EX: R10(config)#ip route profile
Now that the feature is turned on, we can look at the output by simply saying
R3#sh ip route profile
IP routing table change statistics:
Frequency of changes in a 5 second sampling interval
-------------------------------------------------------------
Change/   Fwd-path  Prefix    Nexthop   Pathcount  Prefix
interval  change    add       change    change     refresh
-------------------------------------------------------------
0         5         5         6         6          6
1         0         0         0         0          0
2         0         0         0         0          0
3         1         1         0         0          0
4         0         0         0         0          0
5         0         0         0         0          0
10        0         0         0         0          0
15        0         0         0         0          0
20        0         0         0         0          0
25        0         0         0         0          0
30        0         0         0         0          0
55        0         0         0         0          0
80        0         0         0         0          0
105       0         0         0         0          0
130       0         0         0         0          0
155       0         0         0         0          0
280       0         0         0         0          0
405       0         0         0         0          0
-------------------------------------------------------------
Change/   Fwd-path  Prefix    Nexthop   Pathcount  Prefix
interval  change    add       change    change     refresh
-------------------------------------------------------------
530       0         0         0         0          0
655       0         0         0         0          0
780       0         0         0         0          0
1405      0         0         0         0          0
2030      0         0         0         0          0
2655      0         0         0         0          0
3280      0         0         0         0          0
3905      0         0         0         0          0
7030      0         0         0         0          0
10155     0         0         0         0          0
13280     0         0         0         0          0
Overflow  0         0         0         0          0
R3#
This basically takes a sample every 5 seconds and records what changed: did the number of routes go up or down, did the next hop change, did the forwarding path change?
We can read this like this:
The first row says there were 0 changes over a 5 second interval, and the number of intervals in which that occurred was 5.
Basically there were 0 changes in 5 seconds, 5 times so far.
If we see the counters increase over time, especially in the higher change rows (3 through 80, for example), that's bad and means there's an issue:
-------------------------------------------------------------
Change/   Fwd-path  Prefix    Nexthop   Pathcount  Prefix
interval  change    add       change    change     refresh
-------------------------------------------------------------
0         67        67        81        81         81
1         0         0         0         0          0
2         0         0         0         0          0
3         14        14        0         0          0
4         0         0         0         0          0
5         0         0         0         0          0
10        0         0         0         0          0
15        0         0         0         0          0
20        0         0         0         0          0
25        0         0         0         0          0
30        0         0         0         0          0
55        0         0         0         0          0
80        0         0         0         0          0
105       0         0         0         0          0
130       0         0         0         0          0
155       0         0         0         0          0
280       0         0         0         0          0
405       0         0         0         0          0
In general, every row other than the 0 row should stay at zero, and the values in the top (0) row should be the ones counting up…
There are some issues here: 14 times there were 3 changes within a 5 second sampling interval, with routes being added and the Fwd-path changing…
3         14        14        0         0          0
This will show routing table stability and instability. It will not show what the exact issues are; the key is that the feature will help diagnose issues.
For the most part this will show either convergence of the
network OR some type of flapping topology going on in the network.
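The reading rule described above (the 0 row counts up, every other row should stay at zero) can be sketched as a small Python check. This is only an illustration, not part of the original lab: the function name `unstable_rows` is hypothetical, and it assumes the output of "show ip route profile" has been captured with each table row on a single line.

```python
def unstable_rows(profile_text):
    """Return (bucket, counters) for every non-zero-change row with activity."""
    flagged = []
    for line in profile_text.splitlines():
        fields = line.split()
        # Data rows: a bucket label followed by exactly five integer counters.
        # Header and separator lines fail this test and are skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        bucket = fields[0]
        if bucket == "0":
            continue  # the zero-change row is expected to count up
        counters = [int(f) for f in fields[1:]]
        if any(counters):
            flagged.append((bucket, counters))
    return flagged


# A few rows taken from the unstable output shown earlier.
sample = """\
Change/   Fwd-path  Prefix    Nexthop   Pathcount  Prefix
interval  change    add       change    change     refresh
0         67        67        81        81         81
1         0         0         0         0          0
3         14        14        0         0          0
Overflow  0         0         0         0          0
"""

print(unstable_rows(sample))
```

An empty result would suggest a stable table; any flagged row is a reason to move on to debug ip routing.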
The route profile will work better for AD based loops than
Metric based loops.
To determine the actual issues, we should most likely turn on debug ip routing
Note: in large scale networks it might be a better idea to
send the log outputs to the buffer instead of the console.
R4#debug ip routing
IP routing debugging is on
*Mar 1 00:20:48.739: RT: add 11.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar 1 00:20:48.739: RT: NET-RED 11.0.0.0/8
*Mar 1 00:20:48.747: RT: SET_LAST_RDB for 12.0.0.0/8
    NEW rdb: via 10.10.10.3
The first part of the output says we received 11.0.0.0/8 from 10.10.10.3 (BB3) via RIP with a distance of 120 and a metric of one.
This is now installed in the routing table.
*Mar 1 00:20:48.747: RT: add 12.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar 1 00:20:48.751: RT: NET-RED 12.0.0.0/8
*Mar 1 00:20:48.755: RT: SET_LAST_RDB for 13.0.0.0/8
    NEW rdb: via 10.10.10.3
*Mar 1 00:20:48.759: RT: add 13.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar 1 00:20:48.763: RT: NET-RED 13.0.0.0/8
*Mar 1 00:20:48.995: RT: closer admin distance for 11.0.0.0, flushing 1 routes
*Mar 1 00:20:48.999: RT: NET-RED 11.0.0.0/8
*Mar 1 00:20:49.003: RT: SET_LAST_RDB for 11.0.0.0/8
    NEW rdb: via 192.168.46.6
*Mar 1 00:20:49.007: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]
*Mar 1 00:20:49.011: RT: NET-RED 11.0.0.0/8
*Mar 1 00:20:49.047: RT: closer admin distance for 12.0.0.0, flushing 1 routes
*Mar 1 00:20:49.047: RT: NET-RED 12.0.0.0/8
*Mar 1 00:20:49.055: RT: SET_LAST_RDB for 12.0.0.0/8
    NEW rdb: via 192.168.46.6
This now says that for prefix 11.0.0.0/8 there is a "closer admin distance" via OSPF, with an AD of 110 and a metric of 20.
This will now override the RIP route.
I've omitted the rest of the output, but it showed that over and over the route was added to the table via RIP, then quickly removed and added via OSPF.
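The RIP-then-OSPF pattern in the debug output can also be picked out mechanically. The following Python sketch is only an illustration under stated assumptions: the function name `protocol_changes` is hypothetical, and it assumes the debug ip routing output was logged with each message on one line in the "RT: add ... metric [AD/metric]" form shown above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   RT: add 11.0.0.0/8 via 10.10.10.3, rip metric [120/1]
ADD_RE = re.compile(r"RT: add (\S+) via (\S+), (\w+) metric \[(\d+)/(\d+)\]")


def protocol_changes(log_text):
    """Map each prefix to its sequence of (protocol, AD) installs,
    keeping only prefixes whose installing protocol or AD changed."""
    history = defaultdict(list)
    for match in ADD_RE.finditer(log_text):
        prefix, _next_hop, protocol, distance, _metric = match.groups()
        entry = (protocol, int(distance))
        # Record an entry only when it differs from the previous install.
        if not history[prefix] or history[prefix][-1] != entry:
            history[prefix].append(entry)
    return {p: h for p, h in history.items() if len(h) > 1}


# Three "add" messages taken from the R4 output above.
log = """\
*Mar 1 00:20:48.739: RT: add 11.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar 1 00:20:48.747: RT: add 12.0.0.0/8 via 10.10.10.3, rip metric [120/1]
*Mar 1 00:20:49.007: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]
"""

print(protocol_changes(log))
```

A prefix that keeps alternating between two protocols in this history is a strong hint of an AD based loop rather than ordinary convergence.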
Another way to test is to run an ICMP ping with a timeout of 1; we should see intermittent connectivity.
We could also then turn on "debug ip icmp" to see exactly who is sending us the ICMP unreachable messages.
I now need to turn on "debug ip routing" on all the other routers.
On R5:
*Mar 1 01:19:16.123: RT: no routes to 11.0.0.0
*Mar 1 01:19:16.127: RT: NET-RED 11.0.0.0/8
*Mar 1 01:19:16.131: RT: delete network route to 11.0.0.0
*Mar 1 01:19:16.131: RT: NET-RED 11.0.0.0/8
*Mar 1 01:19:16.163: RT: delete route to 13.0.0.0 via 172.16.45.4, eigrp metric [170/2195456]
*Mar 1 01:19:16.167: RT: SET_LAST_RDB for 13.0.0.0/8
    OLD rdb: via 172.16.45.4, Serial0/0
So this is telling us that R4 is withdrawing the route it had redistributed into EIGRP.
If I go to another router in the topology – R3
*Mar 1 01:21:30.635: RT: no routes to 11.0.0.0
*Mar 1 01:21:30.635: RT: NET-RED 11.0.0.0/8
*Mar 1 01:21:30.635: RT: delete network route to 11.0.0.0
*Mar 1 01:21:30.635: RT: NET-RED 11.0.0.0/8
*Mar 1 01:21:31.663: Periodic IP routing statistics collection
R3#
R3#
This also says I had a route to 11.0.0.0 and then it was withdrawn. This leads us back to the router doing the redistribution (R4).
*Mar 1 01:24:20.799: RT: add 11.0.0.0/8 via 192.168.46.6, ospf metric [110/20]
*Mar 1 01:24:20.799: RT: NET-RED 11.0.0.0/8
*Mar 1 01:24:20.827: RT: closer admin distance for 12.0.0.0, flushing 1 routes
*Mar 1 01:24:20.827: RT: NET-RED 12.0.0.0/8
*Mar 1 01:24:20.831: RT: SET_LAST_RDB for 12.0.0.0/8
    NEW rdb: via 192.168.46.6
Once again we see that we’re deleting the RIP route and
installing the OSPF route…
Once the RIP route is deleted, it can no longer be redistributed into EIGRP, and so it can no longer be redistributed into OSPF either.
So the withdrawal will happen over and over. We now know the issue is related to the Administrative Distance.
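To make the cycle concrete, here is a toy Python model of R4's routing table for one RIP prefix. It is a simplified sketch, not how IOS is implemented: the names `best_source` and `simulate` are hypothetical, each loop pass stands for one convergence step, and the distances are the ones visible in the output above (OSPF 110, RIP 120, EIGRP external 170).

```python
# Administrative distances as seen in the show and debug output above.
ADMIN_DISTANCE = {"ospf": 110, "rip": 120, "eigrp_external": 170}


def best_source(candidates):
    """Pick the protocol with the lowest administrative distance."""
    return min(candidates, key=ADMIN_DISTANCE.__getitem__)


def simulate(rounds):
    """Return which protocol R4 installs for the prefix at each step.

    Assumption: BB3 always offers the prefix via RIP, and the OSPF copy
    only exists while R4 has the RIP route installed, because only an
    installed route is redistributed (RIP -> EIGRP on R4, then
    EIGRP -> OSPF on R3, then back to R4 via OSPF).
    """
    installed = []
    ospf_copy_present = False
    for _ in range(rounds):
        candidates = ["rip"] + (["ospf"] if ospf_copy_present else [])
        winner = best_source(candidates)
        installed.append(winner)
        # If OSPF won, the RIP route left the table, so the source of the
        # OSPF copy disappears and it is withdrawn before the next step.
        ospf_copy_present = winner == "rip"
    return installed


print(simulate(6))
```

The alternating result mirrors the add, flush, add sequence in the debug output: the better AD wins, which removes the very route that was feeding it.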
So if we changed the distance so that the RIP route has the lower distance, it should correct the issue. The problem is that filtering in redistribution is not going to help.
One way to correct this is to have R4 perform the redistribution itself into both protocols. This will keep R3 (currently redistributing between EIGRP and OSPF) from learning the routes from EIGRP because of the AD.
The EIGRP routes will not get installed into the routing table, so they cannot be redistributed.
So on R4 – if we do the redistribution –
R4(config)#router ospf 1
R4(config-router)#redistribute rip subnets
R4(config)#router eigrp 1
R4(config-router)#redistribute ospf 1 metric 1 1 1 1 1
R4(config)#router rip
R4(config-router)#redistribute ospf 1 metric 1
R3#sh ip route 11.0.0.0
Routing entry for 11.0.0.0/8
  Known via "ospf 1", distance 110, metric 20, type extern 2, forward metric 3
  Redistributing via eigrp 1
  Advertised by eigrp 1 metric 100000 100 255 1 1500
  Last update from 192.168.93.9 on FastEthernet1/0, 00:01:00 ago
  Routing Descriptor Blocks:
  * 192.168.93.9, from 4.4.4.4, 00:01:00 ago, via FastEthernet1/0
      Route metric is 20, traffic share count is 1
We can now see the route is installed via OSPF and not EIGRP.
The only problem with this – is that if one of the links
goes down and the routes are then learned from another protocol we would have a
routing loop again.
Basically it is an order of operations issue again, because only what is in the routing table at that moment can be redistributed.
To correct this issue completely we need to tell the redistributing router which routes to use from which protocols.