IP routing is a core component of any network, so being able to troubleshoot it matters just as much. In the previous article we saw some methods related to connected and static routing, and we started to discuss the issues with dynamic routing protocols. In this second part, we’ll continue with EIGRP and OSPF, and with some common problems that can affect any dynamic routing protocol. We can use the same lab topology as in the first part.
As mentioned before, EIGRP uses the automatic summarization feature (the show ip protocols command confirms it: “Automatic network summarization is in effect”). This can be useful, but consider the following case:
- On R2 we create three loopback interfaces with the addresses 172.16.0.1, 172.16.0.65, and 172.16.0.129, respectively, each with a /26 netmask.
- On R3 we create a loopback interface with the address of 172.16.0.193/26.
- We advertise these routes with the proper wildcard mask of 0.0.0.63.
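The wildcard mask in the network statement is simply the inverse of the subnet mask. As a quick check away from the router, here is a minimal Python sketch using the standard ipaddress module; the variable names are only illustrative and not part of the lab:

```python
import ipaddress

# R3's loopback from the list above: address plus prefix length.
loopback = ipaddress.ip_interface("172.16.0.193/26")

subnet = loopback.network      # the /26 subnet the address belongs to
netmask = subnet.netmask       # 255.255.255.192
wildcard = subnet.hostmask     # inverse of the netmask

# Print the arguments in the order a network statement expects them.
print(f"network {subnet.network_address} {wildcard}")
# prints: network 172.16.0.192 0.0.0.63
```

The same calculation works for any prefix length, which helps when the mask is less familiar than /26.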
If we now check the routing table on R1, we can see an entry about the 172.16.0.0/16 classful network, reachable via Fa0/0:
If we try to ping the 172.16.0.193 address on R3 from router R1 or from its LAN, it’s unreachable. Why? Both R2 and R3 advertise the whole 172.16.0.0/16 classful network. Because R2 offers the better metric, R1 chooses the path via R2. R2 has its own subnets from this classful network but, again because of auto-summary, it doesn’t have a specific route to 172.16.0.192/26. The best route it has is 172.16.0.0/16, whose exit interface is Null0, a “bit bucket.” Therefore R2 drops the packets destined for this subnet.
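The black hole follows from longest-prefix matching. The sketch below models R2's table as a plain dictionary. The table contents are an assumption for illustration (three loopbacks in the first three /26 subnets, plus the summary route to Null0), not output captured from the lab:

```python
import ipaddress

# Hypothetical view of R2's routing table with auto-summary enabled.
# Assumption: the three loopbacks occupy the first three /26 subnets.
R2_ROUTES = {
    "172.16.0.0/26": "Loopback (connected)",
    "172.16.0.64/26": "Loopback (connected)",
    "172.16.0.128/26": "Loopback (connected)",
    "172.16.0.0/16": "Null0 (summary)",
}

def lookup(destination, routes):
    """Return the exit of the most specific route covering destination, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return routes[str(best)]

print(lookup("172.16.0.193", R2_ROUTES))  # R3's loopback -> Null0 (summary)
print(lookup("172.16.0.129", R2_ROUTES))  # R2's own loopback -> Loopback (connected)
```

No /26 entry covers 172.16.0.193, so the only match left is the /16 summary pointing to Null0.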
The solution is to turn off automatic summarization with the no auto-summary command in router configuration mode on every router. Each of them then has specific routes to all subnets in the topology. It should be mentioned that from IOS 15.0 onward, auto-summary is disabled by default.
Another popular IGP that can be our best bet in some situations is OSPF. OSPF is a link-state protocol with several parameters and configuration settings that can cause problems if they are not configured properly.
First of all, OSPF is based heavily on adjacency with neighbors. When the OSPF process starts and networks are advertised, the router begins sending hello packets on the interfaces that belong to those networks. The adjacency can be established only if the following parameters match:
- area identifier
- network type
- hello and dead timers
Let’s see if these are configured properly.
Areas are important in planning OSPF, and in a big network they have advantages. In a simpler network we use only area 0, the so-called backbone area. When just the backbone is used, every network command has to end with the keyword area 0. If we accidentally use another number on one router, the OSPF adjacency won’t form. In this case we can use the debug ip ospf adj command to trace the adjacency:
The message states the reason: “mismatch area ID.” The hello packet contains this data, and therefore the two routers can decide to establish an adjacency or not.
Another configuration problem can be using the wrong netmask, which causes the neighbor routers to be on separate subnets. Try the following: configure the Fa0/1 interface on R2 with the same IP address but the netmask of 255.255.255.240 (or /28 in CIDR format). The other side uses the /29 mask, so if we issue the debug ip ospf events command, we can observe the following message:
We can see immediately that the hello parameters are mismatched, but it’s a bit harder to figure out the exact reason. Let’s interpret the output. The locally Configured (marked with a capital C) and the Received (capital R) dead and hello intervals are 40 seconds and 10 seconds, respectively. So far, so good. The problem is that the masks are not the same: the Received mask differs from the Configured one.
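Reading the debug line means comparing the Configured and Received values field by field. The sketch below does that comparison in Python. The HelloParams class and its field names are hypothetical and cover only the parameters discussed in this article, not the full hello packet:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class HelloParams:
    """Hello fields discussed in the text (names are illustrative)."""
    area: str
    netmask: str
    hello: int
    dead: int

def mismatches(configured, received):
    """List the fields where the received hello differs from the local settings."""
    return [
        f"{f.name}: C {getattr(configured, f.name)} / R {getattr(received, f.name)}"
        for f in fields(HelloParams)
        if getattr(configured, f.name) != getattr(received, f.name)
    ]

# One side keeps the /29 mask, the other was changed to /28.
local = HelloParams(area="0", netmask="255.255.255.248", hello=10, dead=40)
remote = HelloParams(area="0", netmask="255.255.255.240", hello=10, dead=40)

print(mismatches(local, remote))
# prints: ['netmask: C 255.255.255.248 / R 255.255.255.240']
```

An empty list means that, as far as these four values go, nothing prevents the adjacency.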
The way OSPF works on a router depends on the type of network to which its interfaces connect. For example, on Ethernet there can be a lot of neighbors, and therefore OSPF needs a DR and a BDR to maintain the adjacencies. On the other hand, on a point-to-point network there is no need for such devices. OSPF can recognize the network type automatically, as we can see by issuing the very informative show ip ospf interface name command:
Under some circumstances, we need to override the network type settings. For example, if we want to use OSPF on a frame relay network and simulate the behavior of a broadcast multiaccess network, we can set the serial interface network type to broadcast with the ip ospf network broadcast command.
Now let’s explore the effect of setting the Fa0/0 interface on R1 to point-to-point while leaving the other side (on R2) as broadcast. If we shut down the interfaces and then turn them on again, the first strange thing is that the adjacency forms more slowly than normal. Secondly, we can observe that R1 doesn’t get routes (e.g., the subnets of 172.16.0.0) directly from R2, but from R3. It means that R2 doesn’t advertise its networks via this faulty connection. Thirdly, if we check the situation with show ip ospf neighbor on R1 and R2, there’s an inconsistency: R2 sees R1 as DR, but on R1 we see the full state without any DR/BDR type. This kind of error is sometimes hard to recognize, so if you really need to change the network type, configure carefully!
In the previous article, we saw that EIGRP timers (hello and hold timer) can differ between neighbors, although wrong settings can cause trouble (for example, flapping routes). OSPF is very strict about these timers. The function of the dead timer is the same as that of the EIGRP hold timer: if we don’t get any hellos from the neighbor within this interval, we consider that neighbor “dead.” The values are carried in the hello packet, and they have to be exactly the same on two adjacent routers. Let’s experiment with this.
By default, on Ethernet (which is a broadcast multiaccess network type) the hello interval is 10 seconds, and the dead timer is 40 seconds. If, for example, we have a very stable connection, we can increase these values. We can check the current values with the show ip ospf interface command. Now try to set the hello timer on the Fa0/1 interface on R2 to 20 seconds. The adjacency goes down, as can be seen in the following picture:
The best method to reveal the reason for the error (besides checking the configuration, of course) is the debug ip ospf events command. We saw before that this will display the mismatched hello parameters message, and now we can use it again: look for the configured and received hello and dead timer values in the output on R1.
The solution is simple: configure the interface on the other side with the same value. By the way, if we want to change the dead interval, we can use the ip ospf dead-interval value command.
As mentioned before, OSPF needs designated and backup designated routers (DR and BDR) on a multiaccess network, even if it’s just an Ethernet link between two routers. These devices reduce the number of adjacencies each router has to maintain. The DR and BDR are elected in a process that first compares interface priority (the default value is 1); if it’s the same on every connecting router, the router ID decides: the router with the highest ID becomes the DR. The ID is a 32-bit value written like an IPv4 address. It can be set directly, it can come from a loopback address, or it can come from an active interface’s address. Ideally, all the routers have different IDs, so they can break the tie. But what happens if router IDs are the same? Let’s examine this situation.
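The tie-breaking order is easy to get wrong, so here is a small Python sketch of it. It is a simplification under stated assumptions: it picks only the DR for a fresh election, the router IDs are made up, and it ignores the BDR and the fact that a sitting DR keeps its role:

```python
import ipaddress

def elect_dr(routers):
    """Pick the DR: highest priority wins, highest router ID breaks ties.

    routers maps a router ID (dotted decimal) to its interface priority.
    A priority of 0 means the router never becomes DR or BDR.
    """
    eligible = {rid: prio for rid, prio in routers.items() if prio > 0}
    if not eligible:
        return None
    # Compare router IDs as 32-bit numbers, not as strings.
    return max(
        eligible,
        key=lambda rid: (eligible[rid], int(ipaddress.IPv4Address(rid))),
    )

# Hypothetical segment: all routers keep the default priority of 1.
segment = {"1.1.1.1": 1, "2.2.2.2": 1, "10.0.0.1": 1}
print(elect_dr(segment))   # 10.0.0.1 (highest router ID)

segment["1.1.1.1"] = 10    # raise one router's interface priority
print(elect_dr(segment))   # 1.1.1.1 (priority beats router ID)
```

Converting the ID to an integer matters: compared as text, "2.2.2.2" would wrongly beat "10.0.0.1".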
The simplest method to set the OSPF router ID to an easy-to-remember value is to use the router-id value command in router configuration mode. Now set the same value, 22.214.171.124, on both R1 and R2. The DR and BDR roles don’t change, because once a router becomes DR, it won’t give up this role until it is reloaded or its OSPF process is reset. We do the latter: issue the clear ip ospf process command in privileged mode but, before this, don’t forget to issue the debug ip ospf adj command as well. Soon we’ll get the following output:
The first line states that the router detected a duplicate router ID, so the adjacent routers cannot elect DR and BDR, and therefore the neighborship cannot form. The solution is obvious: make sure that every router has a unique router ID, on the shared segment and throughout the OSPF domain.
Now let’s examine the authentication issues with routing protocols. As I mentioned before, it’s important to authenticate the routing information sources in a production environment to prevent attackers from injecting false information about routes. RIPv2, EIGRP, and OSPF all support MD5 authentication; RIPv2 and OSPF also support a less secure method, plaintext authentication.
EIGRP and RIP use a so-called key chain for the authentication configuration. Think of it as a real key ring that holds many keys. The first important thing is to define at least one key on both sides using the same password; this is the key-string. As EIGRP supports MD5 authentication, no password will traverse the network: the peers use hash values (data computed from the shared key and the contents of the packet).
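To see why a single wrong character breaks the adjacency, consider this rough Python sketch of a keyed hash. It is not the real EIGRP packet layout: the payload and the way the key is appended are simplifying assumptions, chosen only to show that the digest depends on both the packet and the key:

```python
import hashlib

def keyed_digest(packet: bytes, key_string: str) -> str:
    """MD5 over the packet contents plus the shared key (simplified layout)."""
    return hashlib.md5(packet + key_string.encode("ascii")).hexdigest()

packet = b"hypothetical EIGRP update payload"

sender = keyed_digest(packet, "Cisc0")
good_receiver = keyed_digest(packet, "Cisc0")
bad_receiver = keyed_digest(packet, "Cisco")

print(sender == good_receiver)  # True: same key, digests match
print(sender == bad_receiver)   # False: one character off, digests differ
```

The receiver never sees the password itself; it can only tell that its own computation does not match the digest in the packet.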
First, configure EIGRP authentication, as in the next picture:
So far, so good. Now let’s see what happens if we intentionally configure a wrong key string, “Cisco,” which is easy to mix up with “Cisc0.” Before this, enable the debug eigrp packet terse command (this won’t display hello packet information). When we apply the configuration, the adjacency will go down, and we’ll see the following:
It can be deduced that the problem is with the authentication, more precisely with key 1. So we need to check the key-string value, character by character. We can also use the show key chain command for this.
Another problem can be a mismatched key ID. If, for example, we use key 1 on one side and key 2 on the other, even if the key-strings are the same, we’ll get the following error message in the debug:
EIGRP: pkt authentication key id = 2, key not defined or not live
Another feature is that keys have accept lifetimes and send lifetimes: periods of time in which the key is considered valid for receiving and for sending, respectively. If, for example, we want to use a key that authenticates a router just for a month, we can set the start and end date and time related to the key. By default both lifetime values are infinite, which means the key is always valid. If we set these time intervals incorrectly, there can be a period of time in which there’s no active key (for example, when we accidentally set the accept-lifetime starting date in the future). In this case we need to check the output of the show key chain command, like this:
Observe the “valid now” text at the end; that should be there. If it’s not, the debug eigrp packet terse command will show “No live authentication keys” messages.
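The “valid now” test is just a comparison of the current time against the start and end of the lifetime. The Python sketch below illustrates it. The dates and the function name are invented for the example, and None stands in for the infinite setting:

```python
from datetime import datetime

def is_valid_now(start, end, now):
    """True if now falls inside the lifetime; end=None stands for 'infinite'."""
    if now < start:
        return False
    return end is None or now <= end

now = datetime(2024, 3, 15, 12, 0)           # hypothetical router clock

one_month_key = (datetime(2024, 3, 1), datetime(2024, 4, 1))
future_key = (datetime(2025, 3, 1), None)    # start date typed a year too late

print(is_valid_now(*one_month_key, now))  # True
print(is_valid_now(*future_key, now))     # False: no live key yet
```

The comparison uses the router's own clock, so a wrong clock has the same effect as a wrong date in the key configuration.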
OSPF has two methods to authenticate the peers: plaintext/simple (Type 1) and MD5 (Type 2) authentication. (Type 0 authentication also exists, which is null authentication; this is the default.) Plaintext authentication is not very secure because the shared password travels in the OSPF packets in an easily readable format. Today MD5 authentication is preferred, but we can mix the two within an area. For example, in our topology we use Type 2 authentication everywhere except on the serial connection, where Type 1 authentication is acceptable. This is possible because authentication can be configured on a per-interface basis. So we configure OSPF in the following way:
- Issue the area 0 authentication message-digest command on every router in router configuration mode (this is not a must if we set the method on an interface basis—see the next commands).
- Issue ip ospf authentication message-digest, then ip ospf message-digest-key 1 md5 cisco command on the FastEthernet interfaces.
- Issue ip ospf authentication, then ip ospf authentication-key cisco on the serial interfaces.
Troubleshooting is meaningful only if there are problems, so let’s make some! First, observe the effect of misconfigured Type 1 passwords. On R1 use cisco, but on R3 use ciscp, a small typo. Not surprisingly, the adjacency cannot be established. To investigate the reason, use the debug ip ospf events command, and soon we get a message about mismatched authentication keys. Now issue the show run interface interface_name command to check the passwords.
More can go wrong with MD5 authentication. Not only must the password match, but the key ID must match as well. If it doesn’t, we’ll get “No message digest key x on interface,” meaning that the router has no key with the received key ID. If the key ID is OK but the MD5 password is not, we’ll receive a “Mismatch Authentication Key - Message Digest Key x” message. In both cases, we need to check the interface configuration and make sure that the proper values have been used.
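The two messages correspond to two separate checks, which the following Python sketch mimics. It is deliberately simplified: the function name is hypothetical, and it compares passwords directly, whereas the real protocol compares digests:

```python
def check_md5(local_keys, received_key_id, received_password):
    """Mimic the two checks described above.

    local_keys maps key ID to password on the receiving interface.
    Simplification: real OSPF compares digests, never the passwords.
    """
    if received_key_id not in local_keys:
        return f"No message digest key {received_key_id} on interface"
    if local_keys[received_key_id] != received_password:
        return f"Mismatch Authentication Key - Message Digest Key {received_key_id}"
    return "OK"

local_keys = {1: "cisco"}   # as in: ip ospf message-digest-key 1 md5 cisco

print(check_md5(local_keys, 2, "cisco"))  # key ID differs
print(check_md5(local_keys, 1, "ciscp"))  # key ID matches, password does not
print(check_md5(local_keys, 1, "cisco"))  # both match
```

The first message therefore points at the key number, and the second at the password behind that number.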
One last area that can be problematic: distributing a default route. If a router on the border of our network has a default static route towards outside networks, it can be propagated by the routing protocol. In this way, the inner routers will have a default route also. The configuration is quite simple:
- On the border router there must be a static default route (the “quad-zero” network).
- Depending on the routing protocol, we need to redistribute this information on this router. With EIGRP, we use the redistribute static command, while with RIP and OSPF we can use the default-information originate command.
If either step is missing, the other routers won’t get this information, but fortunately these two steps are really easy to handle.
At the end of these two articles it should be clear that troubleshooting requires comprehensive knowledge of the topic (in this case, routing), but for anyone who has that knowledge and keeps improving it, the effort is worth it. I hope that this material helped you become a bit better at troubleshooting!