Server stops communicating via IP / Free Small ECBs dropping
(Last modified: 08Apr2005)
This document (10089748) is provided subject to the disclaimer at the end of this document.
fact
Novell NetWare
Novell NetWare 5.1
Novell NetWare 5.1 SP6
Novell NetWare 5.1 SP7
Novell NetWare 5.1 SP8
Novell NetWare 6.0
Novell NetWare 6 SP4
Novell NetWare 6 SP5
Novell NetWare 6.5
Novell NetWare 6.5 SP1
Novell NetWare 6.5 SP2
Novell NetWare 6.5 SP3
This is the official Small ECBs solution.
symptom
Server stops communicating via IP / Free Small ECBs dropping
Server loses IP connectivity
Small ECBs are not getting freed up
Small ECB leak (usually higher usage, NOT a leak)
Server unable to communicate over IP
IP communication fails
Free Small ECBs dropping in _ip screen
cause
The server runs out of Small ECBs (buffers) and stops communicating because there are no more Small ECBs to store data for processing.
Small ECBs are buffers that are 256 bytes in size. They are similar to regular ECBs, which are the size of the maximum physical receive packet size (4224 bytes by default). Using Small ECBs limits the memory required to process small amounts of data.
If the server runs out of Small ECBs, it can appear to stop communicating, but it will not be hung, and the depletion will not cause higher than normal utilization.
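As a rough illustration of why the 256-byte size matters, the arithmetic below compares the buffer space needed for a given number of Small ECBs with the space the same number of full-size ECBs would need. This is a sketch in Python for use on a workstation, not something that runs on the server. The function names are hypothetical, and the calculation counts only the buffer sizes quoted above, ignoring any per-buffer header overhead.

```python
# Buffer sizes quoted in this document.
SMALL_ECB_BYTES = 256      # size of one Small ECB
REGULAR_ECB_BYTES = 4224   # default maximum physical receive packet size


def pool_bytes(count, size):
    """Return the buffer space, in bytes, for `count` buffers of `size` bytes."""
    return count * size


def size_ratio():
    """Return how many times larger a regular ECB is than a Small ECB."""
    return REGULAR_ECB_BYTES / SMALL_ECB_BYTES


if __name__ == "__main__":
    # 512 = initial allocation, 1024 = default maximum, 65534 = highest setting.
    for count in (512, 1024, 65534):
        small = pool_bytes(count, SMALL_ECB_BYTES)
        regular = pool_bytes(count, REGULAR_ECB_BYTES)
        print(f"{count:>6} buffers: {small} bytes as Small ECBs, "
              f"{regular} bytes as regular-size ECBs")
```

Under these assumptions, even the highest setting of 65534 Small ECBs amounts to about 16 MB of buffer space.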
fix
The fix for any server experiencing a Small ECB depletion is to set the maximum Small ECB setting to 65534.
To do this, type the following at the server console and press Enter:
set tcp ip maximum small ecbs = 65534
Fixes for actual Small ECB leaks (as opposed to higher numbers of Small ECBs being used) were released in Consolidated Support Pack 11, which contains NetWare 6 SP5 and NetWare 6.5 SP2. These fixes are included in all subsequent releases of the OS, including Support Packs.
note
Additional troubleshooting information:
Small ECBs are simply resources used to hold data for processing. That data may be incoming or outgoing. If the data being received or sent is 256 bytes or smaller, it is placed in a Small ECB. If the data is incoming and the application it is destined for is hung, the received data is placed in a Small ECB that is never freed. This is an example of Small ECB depletion caused by a hung application. If the data is outbound and the LAN driver cannot put the packet on the wire, the Small ECB is not freed. This is an example of how a NIC or LAN driver can cause Small ECB depletion. There can also be a problem in an application's code, in Winsock, or in the TCP stack that prevents a Small ECB from being freed when it should be. This is an example of a defect in code. Finding the root cause of a defect in code will take some work.
You can check how many Small ECBs are in use on your server by typing _IP at the console, pressing Enter, and selecting option 1, "Display Current Activity". This screen shows three values relating to Small ECBs: Total Small ECBs, Free Small ECBs, and Max Small ECBs Allowed.
Total Small ECBs - This value shows how many Small ECBs are currently allocated for use on the system. The initial allocation is 512.
Free Small ECBs - This value shows how many of the Total Small ECBs are currently not in use. If the value for Free Small ECBs is 512, then none of the initially allocated Small ECBs (Total Small ECBs) are in use at this time. If it is 500, then 12 of the Total Small ECBs are currently in use.
Max Small ECBs Allowed - This value limits how many Total Small ECBs can be allocated. The default value is 1024. This value is typically more than enough, but it can be raised to a maximum of 65534 using the set tcp ip maximum small ecbs = command. Some of today's applications are specifically designed to take advantage of Small ECBs and will require more than the default maximum of 1024. NetMail is a good example of this kind of application.
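The relationship between the three values can be sketched as a small calculation. The Python helper below is hypothetical (it is not a NetWare tool or API); you would type in the numbers read from the _IP screen by hand. It derives how many Small ECBs are in use and how many more can still be allocated.

```python
def small_ecb_status(total, free, max_allowed):
    """Summarize the three Small ECB values shown on the _IP screen.

    total       -- Total Small ECBs (currently allocated)
    free        -- Free Small ECBs (allocated but not in use)
    max_allowed -- Max Small ECBs Allowed (allocation ceiling)
    """
    if not 0 <= free <= total <= max_allowed:
        raise ValueError("expected 0 <= free <= total <= max_allowed")
    in_use = total - free            # buffers currently holding data
    headroom = max_allowed - total   # buffers that can still be allocated
    return {
        "in_use": in_use,
        "headroom": headroom,
        # Nothing free and nothing left to allocate: full depletion.
        "exhausted": free == 0 and headroom == 0,
    }


if __name__ == "__main__":
    # Example from the text: 512 allocated, 500 free, default maximum 1024.
    print(small_ecb_status(512, 500, 1024))
```

A server that reports Total equal to Max Small ECBs Allowed with zero free is in the depleted state described in this document.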
If the server appears to not be communicating and you still have plenty of Free Small ECBs, you may want to try the following:
- Verify that the server can ping 127.0.0.1 - The default ping on the server is 40 bytes in size. If the ping of the loopback address works, then the TCP stack is functioning and is able to allocate and free buffers. You don't have a malfunctioning TCP stack causing the loss of IP communications.
- Verify that you can ping the server from another device and get a reply. If you can, then routing, the NIC, the LAN driver, and the TCP stack are all functioning fine. You may simply have a hung application giving you the impression that the server cannot communicate.
- Verify that the server can ping other devices. Again, if it can, then routing, the NIC, the LAN driver, and the TCP stack are functioning fine. Again, look at the applications running on the server that don't appear to be responsive.
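The three checks above can be read as a simple decision procedure. The Python sketch below is a simplification with hypothetical names. The conclusions for a failed check are inferences and are not stated in this document, which only says what a successful ping rules out.

```python
def interpret_ping_checks(loopback_ok, inbound_ok, outbound_ok):
    """Map the results of the three ping checks to a likely area to examine.

    loopback_ok -- the server can ping 127.0.0.1
    inbound_ok  -- another device can ping the server and get a reply
    outbound_ok -- the server can ping other devices
    """
    if not loopback_ok:
        # Assumption: a failing loopback ping points at the TCP stack itself.
        return "examine the TCP stack"
    if inbound_ok and outbound_ok:
        # Routing, NIC, LAN driver, and TCP stack are all working.
        return "examine unresponsive applications"
    # Assumption: the stack works locally, so look at the path off the server.
    return "examine routing, the NIC, and the LAN driver"


if __name__ == "__main__":
    print(interpret_ping_checks(True, True, True))
```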
Understanding how and why Small ECBs are allocated:
- 512 Small ECBs are initially allocated for use on the server.
- When one of the allocated Small ECBs is used, the Free Small ECBs value decrements. When that Small ECB is no longer needed, it is freed and the Free Small ECBs value increments. If the Free Small ECBs value reaches 0, meaning all of the allocated Small ECBs are in use, and another Small ECB is required, then one is allocated and the Total Small ECBs value is incremented by one.
If the Total Small ECBs value increments to the point that it equals the Max Small ECBs Allowed value, no more Small ECBs can be allocated. At this point, all applications needing to send data that should be put into a Small ECB will be unable to communicate until more Small ECBs are available. This makes it appear that the server can no longer communicate.
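The allocation behavior described above can be modeled in a few lines. The Python class below is a toy model for illustration only. The class and method names are hypothetical and do not correspond to anything in the NetWare TCP stack; the defaults of 512 and 1024 are the values given in this document.

```python
class SmallEcbPool:
    """Toy model of the Small ECB counters described in this document."""

    def __init__(self, initial=512, max_allowed=1024):
        self.total = initial            # Total Small ECBs
        self.free = initial             # Free Small ECBs
        self.max_allowed = max_allowed  # Max Small ECBs Allowed

    def acquire(self):
        """Take one Small ECB; return False if none can be provided."""
        if self.free > 0:
            self.free -= 1
            return True
        if self.total < self.max_allowed:
            # No free buffer: allocate a new one, which is used immediately.
            self.total += 1
            return True
        return False  # depleted: Total equals Max and nothing is free

    def release(self):
        """Return one Small ECB to the free list."""
        if self.free >= self.total:
            raise RuntimeError("release without a matching acquire")
        self.free += 1


if __name__ == "__main__":
    pool = SmallEcbPool()
    # Simulate an application that takes buffers and never frees them.
    granted = 0
    while pool.acquire():
        granted += 1
    print(granted, pool.total, pool.free)
```

In this model, an application that never releases its buffers is granted exactly 1024 of them before every further request fails, which matches the depletion condition described above.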
Rebooting the server unloads applications and reloads them, reinitializing resources including Small ECBs, so the newly loaded applications are again able to communicate. This is not a fix but rather a statement of fact explaining why a reboot allows the server to communicate again.
Finding the root cause of "complete Small ECB depletion" or a leak will be time consuming and will usually require debug code and coredumps.
Here is how to begin troubleshooting an actual Small ECB leak:
By definition, a Small ECB leak occurs when Small ECBs are allocated but never freed. This can be caused by a defect in code, or even a bad NIC.
The first thing to do is make sure that the server can allocate as many Small ECBs as it possibly can. To do this, type set tcp ip maximum small ecbs = 65534 at the console and press Enter. If this value doesn't provide the server with enough Small ECBs to perform its tasks, then further work is required.
Since all Small ECBs contain data relating to an application running on the server, it is easy to conclude that unloading an application, which frees up its resources, will free up the Small ECBs associated with that application. If you unload an application and the Free Small ECBs value is still decrementing, then you have not unloaded the application that is using up the Small ECBs. Conversely, if you unload an application and ALL of the Small ECBs are freed up, then you know that you have found the application that is allocating Small ECB resources but for some reason is not freeing them. There can also be the case where unloading an application causes the depletion to stop but NOT ALL of the Small ECBs are freed up. In this final case, that application may actually have a leak or be exposing a leak in Winsock, BSDSock, or JSock. These are the rare cases. There is usually not an actual leak or loss of Small ECB resources.
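The three outcomes of an unload test can be summarized as follows. This Python sketch uses hypothetical names and takes as input what you observe on the _IP screen after unloading one application. It is a summary of the reasoning above, not a diagnostic tool.

```python
def classify_unload_result(still_dropping, all_freed):
    """Classify what unloading one application revealed.

    still_dropping -- Free Small ECBs keeps decrementing after the unload
    all_freed      -- all of the Small ECBs were freed by the unload
    """
    if still_dropping:
        # The application consuming Small ECBs is still loaded.
        return "not this application"
    if all_freed:
        # The application was allocating Small ECBs and not freeing them.
        return "this application was holding the Small ECBs"
    # Depletion stopped but some buffers stayed allocated (the rare case).
    return "possible leak in the application or its socket layer"


if __name__ == "__main__":
    print(classify_unload_result(still_dropping=False, all_freed=True))
```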
Choosing which applications to unload first is up to you. Start with applications that have recently been added or updated. It is also a good idea to test the most heavily used application on the server. Remember to load the application again if unloading it did not release the Small ECBs. That way, when you find the application whose unloading frees up the Small ECBs, you can be certain it is the only application that was unloaded.
Any application can have this problem. For example, it has been seen with:
CPQWEBAG.NLM
PROXY.NLM
NLDAP.NLM
DS.NLM
SURVEY.NLM
TIMESYNC.NLM
NAMED.NLM
NetMail - which includes all of the modules loaded by a load IMS command.
GWINTER.NLM
WINSOCK modules
If you have determined that the root cause may be in the TCP stack, it is a very good idea to run the very latest TCP stack available. You can find it by going to Product Updates, typing TCP*, and clicking the search button. This lists all of the most current released TCP stacks.
If you have done all of the above and cannot identify the source of the Small ECB depletion, then debug code will be required. TCP development can provide debug code, through the WWS Connectivity team, that tracks Small ECBs. With this code running on the server, a coredump can be taken while the server is showing the Small ECB depletion, and the type of data contained in the Small ECBs can be identified. This helps to identify the application to look at. Further debug code may be required from that application or from the socket layer through which the application communicates with the TCP stack.
.document
Document Title: | Server stops communicating via IP / Free Small ECBs dropping |
Document ID: | 10089748 |
Solution ID: | NOVL94434 |
Creation Date: | 19Dec2003 |
Modified Date: | 08Apr2005 |
Novell Product Class: | NetWare |
disclaimer
The Origin of this information may be internal or external to Novell. Novell makes all reasonable efforts to verify this information. However, the information provided in this document is for your information only. Novell makes no explicit or implied claims to the validity of this information.
Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.