Troubleshooting TCP/IP Communication Issues
Proactive Resolution Team
Novell Worldwide Support
15 May 2000
This document addresses communication issues that generate about a third of the support calls coming into the TCP/IP group at Novell Technical Support. We recommend that anyone who is implementing TCP/IP in a NetWare 5.x environment read and understand the information presented here.
This article is divided into two parts: understanding the concepts behind IP routing, and troubleshooting common TCP/IP problems. A follow-up article will explain some of the TCP/IP tools that are available for use in troubleshooting problems in a TCP/IP environment.
Concepts Behind TCP/IP Routing
The majority of connectivity issues involve problems with routing table entries. Every packet being processed by a TCP/IP host has a source and destination IP address. Upon receiving each packet, the IP protocol examines the destination address of the packet, compares it with entries in its local routing table, and then decides what action to take:
If the destination IP address is itself (that is, to a local application such as GroupWise, BorderManager Proxy Server, etc.), the packet is passed up to a protocol layer above IP.
If the packet is destined for another known network, the packet is forwarded through one of the locally-attached network adapters. (This assumes that the TCP/IP host has multiple interfaces and has routing enabled.)
If neither of the above applies, the packet is discarded.
The TCP/IP routing table can maintain four different types of routes, listed below in the order that they are searched for a match:
Host (a route to a single, specific destination IP address)
Subnet (a route to a subnet)
Network (a route to an entire network)
Default (used when there is no other match)
IP compares the destination IP address of the packet that it is processing with the entries in the table. If IP finds that a host entry exists and matches the destination IP address, it will forward the packet to the next hop associated with that host entry. Host entries are usually found in routing tables when ICMP (Internet Control Message Protocol) has added the entry, either because of the Path MTU discovery algorithm or in response to an "ICMP redirect" message. To check this, load the TCPCON utility at the server console prompt and look at the IP Routing Table option to verify whether the protocol associated with that route is ICMP.
IP has three classes of addresses that are used for host addressing: Class A, Class B, and Class C. Each class has a default subnet mask (255.0.0.0 for Class A, 255.255.0.0 for Class B, and 255.255.255.0 for Class C) that applies until a network of that class is divided into smaller networks (that is, subnetted). Once the network is subnetted, its addresses no longer use the default subnet mask.
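The relationship between an address class and its default mask can be expressed in a few lines. The following Python sketch is only an illustration; the function names are hypothetical and the sample addresses are documentation or private addresses, not addresses from Figure 1.

```python
import ipaddress


def classful_default_mask(address):
    """Return the default (classful) subnet mask for an IPv4 address.

    Returns None for Class D (multicast) and Class E addresses,
    which have no default subnet mask.
    """
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return "255.0.0.0"        # Class A
    if first_octet < 192:
        return "255.255.0.0"      # Class B
    if first_octet < 224:
        return "255.255.255.0"    # Class C
    return None


def is_subnetted(address, configured_mask):
    """True when the configured mask differs from the classful default."""
    return configured_mask != classful_default_mask(address)


print(classful_default_mask("10.1.2.3"))
print(is_subnetted("10.1.2.3", "255.255.255.0"))
```

A host configured with a mask other than the one returned by classful_default_mask is on a subnetted network, which is the case that produces subnet entries in the routing table.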
So if IP doesn't find a host entry, but does find a subnet entry that matches the packet's destination IP address, IP will forward the packet to the next hop associated with that subnet entry. Subnet entries exist when RIP2 (Routing Information Protocol version 2), OSPF (Open Shortest Path First), or static entries have been added to the routing table with a non-default subnet mask.
If IP doesn't find a subnet entry in the TCP/IP routing table but does find a network entry that matches the destination IP address, IP will forward the packet to the next hop associated with that network entry. (Customers running in default NetWare TCP/IP mode will have network entries.)
Finally, if IP doesn't find a network entry, but does find that a default route entry exists, IP will forward the packet to the next hop associated with that default entry. The default route is most commonly inserted as a static route through NetWare's server console INETCFG utility. However, the route may also be learned via RIP or OSPF. Failure to at least have a default route can often lead to communication problems on the network.
If an IP packet match has not been found in the TCP/IP routing table at this stage, the packet is simply dropped and an ICMP "destination unreachable" message is triggered to notify the sender that the host or network is unreachable.
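The search order described above (host, then subnet, then network, then default) gives the same result as choosing the matching route with the longest prefix. The Python sketch below illustrates that idea under stated assumptions: the routing table, the next-hop addresses, and the function name are hypothetical, and the code is not a description of how the NetWare TCP/IP stack is implemented.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop, route type).
# The addresses come from documentation ranges, not from Figure 1.
ROUTES = [
    ("192.0.2.77/32", "198.51.100.3", "host"),
    ("192.0.2.64/26", "198.51.100.2", "subnet"),
    ("192.0.2.0/24", "198.51.100.1", "network"),
    ("0.0.0.0/0", "198.51.100.254", "default"),
]


def lookup(destination, routes=ROUTES):
    """Return (next_hop, route_type) for the most specific match, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop, route_type in routes:
        network = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if dest in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, next_hop, route_type)
    if best is None:
        # No match at all: the packet would be dropped and an ICMP
        # "destination unreachable" message sent to the source.
        return None
    return best[1], best[2]


for address in ("192.0.2.77", "192.0.2.70", "192.0.2.5", "203.0.113.9"):
    print(address, lookup(address))
```

Removing the last entry from ROUTES makes lookup("203.0.113.9") return None, which corresponds to the missing-default-route case that causes many of the problems discussed in this article.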
When a TCP/IP communication problem occurs, the most common reason is that a route entry doesn't exist for the network or host with which you are trying to communicate. When this is the case, you can either add a route entry or try to figure out why the route is missing.
Troubleshooting Common TCP/IP Problems
When troubleshooting any networking problem, it is helpful to take a logical approach. Some questions to ask are:
What does work?
What doesn't work?
How are the things that do and don't work related?
Have the things that don't work ever worked on this computer/network?
If so, what has changed since the last time it did work?
Troubleshooting a problem "from the bottom up" is often a good way to quickly isolate what's wrong and come up with a solution. The "bottom up" approach from an IP routing perspective is to start by verifying that the problem is not related to the physical layer (cabling, hubs, switches, and so on) or ARP (Address Resolution Protocol). Next, you ensure that the IP routing table is functioning correctly. Finally, you check to see whether the problem is at a generic TCP/UDP or application level.
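The bottom-up order can be pictured as an ordered list of checks in which the first failure identifies the layer to investigate. The Python sketch below is only a way to organize the steps; the layer names follow the paragraph above, and the check functions are hypothetical placeholders that you would replace with the manual tests described in the scenarios.

```python
def first_failing_layer(checks):
    """Run checks from the bottom layer upward.

    `checks` is an ordered list of (layer_name, check) pairs, where each
    check is a callable returning True when that layer looks healthy.
    Returns the name of the first layer whose check fails, or None.
    """
    for layer_name, check in checks:
        if not check():
            return layer_name
    return None


# Placeholder results standing in for the outcome of manual tests.
checks = [
    ("physical layer", lambda: True),
    ("ARP", lambda: True),
    ("IP routing table", lambda: False),
    ("TCP/UDP or application", lambda: True),
]

print(first_failing_layer(checks))
```

Stopping at the first failure mirrors the reasoning of the approach: there is little point in examining the routing table until the cabling and ARP have been verified.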
To better understand the TCP/IP troubleshooting scenarios covered in this article, we'll use a small example network to illustrate some of the most common IP problems. This example network is shown in Figure 1.
Figure 1: Example network for TCP/IP troubleshooting scenarios.
In this network, Workstation 1 accesses the Internet/WAN through a NetWare server that contains two network adapters, each with its own IP address: 220.127.116.11 and 18.104.22.168. Workstation 2 accesses the Internet/WAN through the Internet Router, whose IP address is 22.214.171.124. Both the NetWare server and the UNIX box (whose IP address is 126.96.36.199) also communicate with the Internet/WAN through the Internet Router (188.8.131.52). The Internet Router's IP address on the Internet side is 184.108.40.206.
It's also important that you understand the terms "local host" and "remote host" in an IP network environment:
A local host is one that has the same IP network address/subnet mask as another host with which you are trying to communicate.
A remote host is one that has a different IP network address/subnet mask than another host with which you are trying to communicate.
From the point of view of Workstation 1 in Figure 1, the NetWare server is considered a local host because its network adapter is attached to the same IP subnet as Workstation 1. Workstation 2, whose IP subnet address is different than that of Workstation 1, can be considered a remote host.
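Whether another host is local or remote can be decided by applying your own subnet mask to both addresses and comparing the resulting network addresses. The following Python sketch shows the test; the function name is hypothetical and the sample addresses are documentation addresses chosen for illustration, not the addresses in Figure 1.

```python
import ipaddress


def is_local(my_address, my_mask, other_address):
    """True when other_address falls inside my own IP network/subnet."""
    my_network = ipaddress.ip_network(f"{my_address}/{my_mask}", strict=False)
    return ipaddress.ip_address(other_address) in my_network


# A workstation at 192.0.2.10 with mask 255.255.255.0:
print(is_local("192.0.2.10", "255.255.255.0", "192.0.2.1"))      # same subnet
print(is_local("192.0.2.10", "255.255.255.0", "198.51.100.20"))  # other subnet
```

A TCP/IP stack makes this same decision for every packet it sends: a local destination is resolved directly through ARP, while a remote destination is sent to a router.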
The following six scenarios, which represent the most common IP problems, use the example network in Figure 1 as a reference. The most common solutions are given for each problem. While the list of solutions is not comprehensive, it covers most of the routing issues that customers face.
Scenario 1: Cannot PING or Communicate with Local Router.
Symptom: The user cannot PING from Workstation 1 (220.127.116.11) to the local segment side of the NetWare server (18.104.22.168).
Solutions: If two nodes on the same subnet cannot PING each other successfully, you can use the "ARP -A" command at a Windows workstation to check the ARP table entries. The -A parameter displays the ARP entries by interrogating the current protocol data. If more than one network adapter uses the Address Resolution Protocol, you'll see entries for each ARP table.
You can also use the TCPCON utility on the NetWare server to view the IP Address Translations Table. Select the Protocol Information | IP | IP Address Translation options, and see if the computers have the correct MAC addresses listed for each other.
Note: You can use the IPConfig utility (for Windows NT), the WINIPCFG utility (for Windows 95/98), or type CONFIG <Enter> at the NetWare server console to determine a host's MAC address (displayed as Node Address).
If an ARP entry exists for the default router's IP address, perform the following troubleshooting steps.
Check for duplicate IP addresses. If another host with a duplicate IP address exists on the network, the ARP cache may contain the MAC (node) address for the other computer. If this is the case, change the IP address of one of the hosts so that it is not a duplicate on your subnet.
There may be a static (permanent) entry in the ARP cache that does not correspond to the MAC address of the host with which you are trying to communicate. If this is the case, delete that specific entry using the "ARP -D IP_address" command at a Windows workstation DOS prompt. You can also use the TCPCON utility on the NetWare server. Select the Protocol Information | IP | IP Address Translation options to view the IP Address Translations Table, highlight the appropriate Host Name / Mac Address entry, and press the <Delete> key.
The ARP table may be corrupted, in which case you must delete all entries by using the commands and/or utilities mentioned previously.
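Comparing cached ARP entries with the MAC addresses you expect can be done by eye, but a short script makes the comparison less error-prone when many hosts are involved. The Python sketch below assumes you have captured ARP output as text; the sample text, the addresses, and the function names are hypothetical, and the layout of real output varies between operating systems and versions.

```python
import re

# Matches an IPv4 address followed by a MAC address written with
# either "-" or ":" separators.
ENTRY = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<mac>[0-9A-Fa-f]{2}(?:[-:][0-9A-Fa-f]{2}){5})"
)


def normalize(mac):
    """Use one notation for MAC addresses so they can be compared."""
    return mac.replace("-", ":").lower()


def parse_arp(text):
    """Return {ip: mac} for every line that contains an ARP entry."""
    table = {}
    for line in text.splitlines():
        match = ENTRY.search(line)
        if match:
            table[match.group("ip")] = normalize(match.group("mac"))
    return table


def mismatches(arp_table, expected):
    """Return {ip: (cached_mac, expected_mac)} where the two differ."""
    return {
        ip: (arp_table[ip], normalize(mac))
        for ip, mac in expected.items()
        if ip in arp_table and arp_table[ip] != normalize(mac)
    }


# Illustrative sample only, not captured from a real system.
SAMPLE = """
  Internet Address      Physical Address      Type
  192.0.2.1             00-a0-c9-11-22-33     dynamic
  192.0.2.20            00-a0-c9-44-55-66     static
"""

expected = {"192.0.2.1": "00:A0:C9:11:22:33", "192.0.2.20": "00-a0-c9-aa-bb-cc"}
print(mismatches(parse_arp(SAMPLE), expected))
```

An address that appears in the result either has a stale or incorrect cache entry or is being answered by a second host using the same IP address.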
If no ARP entry exists for the default router's IP address, this usually indicates that there is a hardware problem with the devices on the network. Perform the following troubleshooting steps.
First check the physical connection of both hosts, because the ARP request is broadcast on the local segment and the target host should respond to it. When you type "set tcp arp debug = on" at the server console, all ARP packets transmitted and received by the stack are displayed at the server console, so you can verify whether a response to the original ARP request was received.
Verify that the IP address of the default route that is shown in the TCPCON utility through the IP Routing Table entry is correct and on the same IP subnet. You do this by checking with the IS&T department. If the workstation is requesting the MAC/IP address mapping for a different and possibly inactive IP address, there will be no ARP responses from that inactive host.
Scenario 2: Cannot PING or Communicate with Remote Interface of Local Router.
Symptom: The user can PING from Workstation 1 (22.214.171.124) to the local segment side of the NetWare server (126.96.36.199), but not from Workstation 1 to the other side of the NetWare server (188.8.131.52).
Solutions:
In this scenario, Workstation 1 needs to know which IP router to send the IP packet to when the destination network is on a different subnet (to a remote host, according to our earlier definition). This procedure is not required if Workstation 1 wants to communicate with hosts only on its local subnet (local host). Each TCP/IP stack configuration (whether client or server) has a parameter for a default router or gateway. (See TID #10018660 for information on configuring and troubleshooting client issues on Windows 95/98 and NT.)
In this scenario, Workstation 1 would need to configure as its default router the IP address of the server's network adapter that is local to the workstation. The IP address would be 184.108.40.206. This implies that any packets that Workstation 1 will transmit to any remote hosts will be sent through this IP address.
The NetWare server must be configured as an IP router so that it can forward packets from one network interface board (220.127.116.11) to the other (18.104.22.168). For this to happen, TCP/IP must have been loaded with the parameter "forward=yes" as part of the configuration.
The best way to verify that TCP/IP has been loaded with forwarding enabled is through the TCPCON utility. Load TCPCON at the server console. You will see the "IP Forwarded: numbers" entry in the lower left-hand corner of the top window. If this entry has numbers after it (even if it is 0), then this server is configured as an IP router. If this entry has DISABLED after the statistic, it is not set to gateway mode. To enable this, load the INETCFG utility at the server console, select the Protocols entry, the TCP/IP entry, and then ensure that the "IP Packet Forwarding" parameter is set to ENABLED. (See TID #10013002 for more details.)
Check to see whether the "Local Errors" field in TCPCON | Statistics | IP entry increases as your PING requests fail. This field increments anytime IP drops an incoming packet for any reason. If this field is increasing, perform the following diagnostic steps:
Verify that the packet is not being blocked through a filtering mechanism, such as IPFLT.NLM. If this NLM is loaded, type "Unload IPFLT.NLM" at the server prompt, then check to see if the behavior remains the same.
Check the LAN/WAN Drivers statistics in the server MONITOR utility to see if the server is running out of ECBs (as will be indicated by the "Receive discarded, no available buffers" parameter). To do this, load MONITOR at the server console prompt, select the LAN/WAN drivers entry, the Ethernet_II entry from the Available LAN Drivers, then press the Tab key and scroll down to Receive Discarded, No Available Buffers entry.
If this entry shows a non-zero value, increase the minimum packet receive buffers setting for the server. To do this in MONITOR, select the Server Parameters | Communications options, then select the Minimum Packet Receive Buffers entry and double it. Note that the changes won't take effect until you restart the server.
Go to the TCPCON | Statistics | ICMP entry and see if any of the fields other than "ICMP Echo's Sent and Received" are incrementing as the PING command fails. (A PING request is an ICMP Echo Request, so you will see this entry increase through the PING command.) Depending on the ICMP message that is increasing, this procedure may help pin-point some network related issues. For example, the ICMP Time Exceeded Messages entry may indicate routing loops, while the ICMP Source Quench Messages entry may mean there are problems with the system being overloaded.
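One way to apply the ICMP check is to note the counter values before and after a failed PING and look only at the counters that changed. The Python sketch below assumes you have copied the values by hand into two dictionaries; the counter names and the function name are hypothetical labels, not TCPCON field names.

```python
# Hints for the ICMP messages mentioned in this article.
HINTS = {
    "time_exceeded": "possible routing loop",
    "source_quench": "a system on the path may be overloaded",
    "dest_unreachable": "a router has no route to the destination",
}


def changed_counters(before, after, ignore=("echo_requests", "echo_replies")):
    """Return {counter: (increase, hint)} for counters that went up.

    Echo counters are ignored by default because PING itself
    increments them.
    """
    report = {}
    for name, new_value in after.items():
        delta = new_value - before.get(name, 0)
        if delta > 0 and name not in ignore:
            report[name] = (delta, HINTS.get(name, "no hint available"))
    return report


before = {"echo_requests": 10, "time_exceeded": 0, "source_quench": 2}
after = {"echo_requests": 14, "time_exceeded": 4, "source_quench": 2}
print(changed_counters(before, after))
```

In this made-up example only the Time Exceeded counter rose while the PING failed, which would point toward a routing loop rather than an overloaded system.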
Scenario 3: Cannot Ping or Communicate with Internet Router.
Symptom: From Workstation 1 (22.214.171.124) the user can ping both IP addresses that are bound to the network adapters in the NetWare server (126.96.36.199 and 188.8.131.52), but cannot ping the Internet Router (184.108.40.206).
Solutions:
By default, the NetWare server uses RIP as its routing protocol. However, most IP Routers use either OSPF (Open Shortest Path First) or IGRP (Interior Gateway Routing Protocol) as the routing protocol of choice. Because the two routers run different routing protocols, they will not update each other's routing tables. The IP Router will not have a route back to the 220.127.116.11 segment, and will therefore not know how to respond to Workstation 1's PING.
To fix this problem, insert a static route entry at the IP Router. On a NetWare server, this can be done using INETCFG by selecting Protocols| TCPIP| Static Routes. This entry tells the IP Router that in order to get to the 18.104.22.168 subnet, packets must go through the 22.214.171.124 gateway, which is the IP address of the NetWare server for the segment local to the IP Router. This implies that any time the Internet Router has a packet destined for 126.96.36.199, it will send it to the 188.8.131.52 gateway.
Another possible solution is to synchronize the routing protocols at the NetWare server or IP Router so that they both understand either RIP or OSPF. You do this by enabling the same routing protocol on all routers in the network. This will guarantee that the routes being advertised by both sides will dynamically enter the necessary routing tables. Note that an ASBR (Autonomous System Boundary Router) can also be set up on either router to act as a conversion gateway between OSPF and non-OSPF (static, ICMP, or RIP) routes.
Check to see whether an ARP entry exists for the Internet Router (184.108.40.206) in the NetWare server. To do this, go to the TCPCON | Protocol Information | IP | IP Address Translations screen.
If no ARP entry exists for the Internet Router, check the physical connection between the NetWare server and the Internet Router. Most IP routers offer commands to dump the ARP cache table; for example, Cisco's IOS router provides a "show ip arp" command.
If an ARP entry does exist for the Internet Router in the NetWare server, make sure that the hardware (MAC) address shown corresponds to the Internet Router's MAC address. If it doesn't match, the ARP table may be corrupted. In this case, load the TCPCON utility on the NetWare server. Select the Protocol Information | IP | IP Address Translation options to view the IP Address Translations Table, highlight the appropriate Host Name / Mac Address entry, and press the <Delete> key.
One other possible problem is that another device is responding to the ARPs using the Internet Router's IP address. In this case, there is either an IP address conflict or a bad switch.
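The missing return route described at the start of this scenario is easy to overlook, because the PING request itself reaches the router; it is the reply that has nowhere to go. The Python sketch below models each router as a simple list of known prefixes (next hops are omitted for brevity). The tables, addresses, and function names are hypothetical and do not correspond to Figure 1.

```python
import ipaddress


def has_route(table, address):
    """True when any prefix in the table contains the address."""
    target = ipaddress.ip_address(address)
    return any(target in ipaddress.ip_network(prefix) for prefix in table)


def ping_succeeds(source, destination, tables_on_path):
    """A PING works only if every router on the path has a route to the
    destination (for the request) and to the source (for the reply)."""
    return all(
        has_route(table, destination) and has_route(table, source)
        for table in tables_on_path
    )


# The server is attached to both segments; the router knows only its own.
server_table = ["192.0.2.0/24", "198.51.100.0/24"]
router_table = ["198.51.100.0/24"]

workstation, router = "192.0.2.10", "198.51.100.254"
print(ping_succeeds(workstation, router, [server_table, router_table]))

# Adding a static route for the workstation's subnet repairs the return path.
router_table.append("192.0.2.0/24")
print(ping_succeeds(workstation, router, [server_table, router_table]))
```

The first call fails only because the router lacks a route back to the workstation's subnet, which is the situation that the static route or a shared routing protocol corrects.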
Scenario 4: Cannot PING or Communicate with Remote Workstation.
Symptom: From Workstation 1 (220.127.116.11), the user can PING both IP addresses that are bound to the network adapters in the NetWare server (18.104.22.168 and 22.214.171.124), and the Internet Router (126.96.36.199), but cannot PING Workstation 2 (188.8.131.52).
Solutions:
As described in Scenario 2, the workstation must have its default router or gateway set in order to reply or send packets to segments other than its local segment gateway. (See TID #10018660 for information on configuring and troubleshooting client issues on Windows 95/98 and NT.)
From a Windows workstation DOS prompt, type the "NETSTAT -R" command. NETSTAT displays protocol statistics and current TCP/IP network connections, and the -R parameter displays the routing table. You can use this information to verify whether a default route exists from Workstation 1 to Workstation 2, and whether the route points to the correct next-hop router for that subnet.
Configure the default gateway on Workstation 2. In this scenario, the default route should point to the IP address of the Internet Router (184.108.40.206), or to the server network adapter that is local to Workstation 2's segment (220.127.116.11). Then reboot the workstation (unless you used the "ROUTE ADD" command as mentioned in TID #10018660 to insert the static route at the workstation).
Scenario 5: Cannot PING or Communicate with Remote UNIX Host.
Symptom: From Workstation 1 (18.104.22.168), the user can ping both IP addresses that are bound to the network adapters in the NetWare server (22.214.171.124 and 126.96.36.199) and the Internet Router (188.8.131.52), but cannot PING the UNIX box (184.108.40.206).
Solutions:
At the UNIX box, use the "netstat -r" command (UNIX commands are case-sensitive) to see if a default route (0.0.0.0) exists on that box. If no suitable route exists, you must enter a static route so that the UNIX box has a route to the 220.127.116.11 subnet. The syntax for adding a static route on the UNIX box in this scenario should be similar to the following:
route add net 18.104.22.168 22.214.171.124 1
(For more information on the route command for UNIX, refer to the documentation that comes with your UNIX software.)
Synchronize the routing protocols at the NetWare server or at the UNIX box so that both understand either RIP or OSPF (as explained in Scenario 3, Step 2). This will guarantee that the routes being advertised by each side will dynamically enter both routing tables. Note that an ASBR (Autonomous System Boundary Router) can also be set up on either the NetWare router or the UNIX box to act as a conversion gateway between OSPF and non-OSPF (static, ICMP, or RIP) routes.
Check to see whether an ARP entry exists for the UNIX box (126.96.36.199) in the NetWare server. To do this, go to the TCPCON | Protocol Information | IP | IP Address Translations screen.
If no ARP entry exists at the NetWare server for the UNIX box, check the physical connection between the NetWare server and the UNIX box. (This procedure is often independent of what Novell does, so begin by looking at cabling, switches, etc.)
If an ARP entry does exist at the NetWare server for the UNIX box, make sure that the hardware (MAC) address shown corresponds to the UNIX box's MAC address. If it doesn't match, the ARP table may be corrupt. In this case, use the TCPCON utility on the NetWare server. Select the Protocol Information | IP | IP Address Translation options to view the IP Address Translations Table, highlight the appropriate Host Name / MAC Address entry, and press the <Delete> key.
One other possible problem is that another device is responding to the ARPs using the UNIX box's IP address. In this case, there is either an IP address conflict or a bad switch.
Scenario 6: Cannot PING or Communicate with Remote Hosts Beyond the Internet Router.
Symptom: From Workstation 1 (188.8.131.52), the user can ping both IP addresses that are bound to the network adapters in the NetWare server (184.108.40.206 and 220.127.116.11) and the Internet Router (18.104.22.168). The user can also ping Workstation 2 (22.214.171.124) and the UNIX box (126.96.36.199), but cannot PING past the Internet Router.
Solutions:
In this scenario, the NetWare server knows about both the 188.8.131.52 and the 184.108.40.206 subnet segments, but it does not know where to route the packet if the destination is not on either of these segments. To fix this, you must add a default route to the NetWare server. Load the INETCFG utility at the server console and go to the Protocols | TCP/IP | Static Routing entry (be sure it is Enabled), then go to the Static Routing Table entry. Press <Insert> to add a Default Route with an IP address of Network/Host 0.0.0.0 and with the Next Hop Router on Route (Gateway) of 220.127.116.11 Metric 1 Passive. This information is then written to the SYS:ETC\GATEWAYS file. (See TID #2911404, "Set LAN Default Route NW 4.x, 3.x, WEB, Proxy," for more details.) Then use the Reinitialize System command from INETCFG's initial "Internetworking Configuration" window to add the static route to the routing tables located in the server's memory.
At the Internet Router's console prompt, run a command that displays the routing table (for example, "show ip route" on Cisco IOS) and see if Workstation 1's network (18.104.22.168) has an entry. Because of dynamic routing protocols such as RIP and OSPF, this should normally be the case. Problems here may indicate that the Internet Router's routing table is not being updated correctly. If no route exists, insert a static route for the 22.214.171.124 network and investigate why the 126.96.36.199 network is not being advertised by the dynamic routing protocols.
To troubleshoot this problem, you first need to understand the network layout. Having the layout in mind will enable you to identify other routers in the network that should be advertising the route. You can use LAN traces to verify whether or not these other routers are advertising the missing network, and if so, with the proper parameters, such as hop count. In some cases, invalid hop counts may be advertised and the routes are being dropped accordingly.
Verify that the packet is not being blocked through some filtering mechanism, such as the IPFLT.NLM. If this NLM is loaded, type "Unload IPFLT.NLM" at the server prompt, then check to see if the behavior is the same.
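When you examine route advertisements in a LAN trace, as suggested above, the hop count is the first value to check. In RIP, a metric of 16 means "unreachable," so a route advertised with that metric will not be used. The Python sketch below separates usable routes from rejected ones; the advertised routes and the function name are hypothetical, and the values would in practice be read from a trace.

```python
RIP_INFINITY = 16  # RIP treats a metric of 16 as unreachable


def split_by_metric(advertised):
    """Split {network: metric} into (usable, rejected) dictionaries.

    A route is usable when its metric is between 1 and 15 inclusive.
    """
    usable, rejected = {}, {}
    for network, metric in advertised.items():
        if 1 <= metric < RIP_INFINITY:
            usable[network] = metric
        else:
            rejected[network] = metric
    return usable, rejected


# Hypothetical advertisements copied from a trace.
advertised = {"192.0.2.0/24": 2, "198.51.100.0/24": 16, "203.0.113.0/24": 15}
usable, rejected = split_by_metric(advertised)
print("usable:", usable)
print("rejected:", rejected)
```

A network that appears in the rejected group is being advertised but dropped, which narrows the investigation to the router that is generating the invalid hop count.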
In the next column, we'll look into an extension of this troubleshooting scenario dealing with subnets and a couple of the more common problems that users face with subnetting.
* Originally published in Novell AppNotes