We Interrupt This Server: A Discussion on Interrupts and NetWare
01 Sep 1999
Interrupts may be viewed as the highest-priority threads in an operating system. The configuration of interrupts within a system may profoundly affect system performance, for good or ill, and a poor interrupt configuration may wreak havoc. How well interrupts are configured may have more to do with overall system performance than the administrator realizes. The system administrator who configures system interrupts therefore has a big responsibility in making sure that the priority of device interrupts in the system is set optimally.
This may require some experimentation. On some systems, the administrator may be constrained to use a configuration which is hardwired on the system board, leaving no room for reconfiguration. The purpose of this NetNote is to help make system administrators more aware of some of the issues surrounding interrupt configurations in NetWare, as well as to dispel some of the myths that have circulated in times past.
All versions of NetWare support interrupt sharing. To share a hardware interrupt, each device involved must be configured to deliver level-triggered interrupts. PCI and MCA bus devices are level-triggered by default and may therefore share with any other level-triggered device. EISA bus devices may be set up as edge- or level-triggered, but must be set up as level-triggered in order to share a hardware interrupt with another device. ISA bus devices are edge-triggered by definition and therefore may not share an interrupt with another hardware device.
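The bus rules above can be condensed into a small lookup. The Python sketch below is illustrative only: the names ALLOWED_TRIGGERS and can_share are hypothetical, not part of any NetWare tool, and the table treats the default trigger mode of PCI and MCA devices as their only mode.

```python
# Trigger modes assumed for each bus type, following the rules described above.
# Simplification: PCI and MCA are listed with their default mode only.
ALLOWED_TRIGGERS = {
    "PCI": {"level"},
    "MCA": {"level"},
    "EISA": {"edge", "level"},
    "ISA": {"edge"},
}


def can_share(dev_a, dev_b):
    """Return True if two (bus, trigger) devices may share one interrupt.

    Sharing requires both devices to deliver level-triggered interrupts.
    """
    for bus, trigger in (dev_a, dev_b):
        if trigger not in ALLOWED_TRIGGERS[bus]:
            raise ValueError(f"{bus} devices cannot be {trigger}-triggered")
    return dev_a[1] == "level" and dev_b[1] == "level"


print(can_share(("PCI", "level"), ("EISA", "level")))  # True
print(can_share(("PCI", "level"), ("ISA", "edge")))    # False
```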
With a few rare exceptions which we will note, configuring devices to share a common interrupt is fully supported by NetWare. In addition, sharing interrupts is often required to support system configurations with a large number of I/O devices.
In the past, some technical information documents have issued broad statements discouraging the sharing of interrupts. These statements have left some users puzzled about the capabilities of NetWare and Novell's true position on the subject. Let us clarify Novell's position: NetWare supports shared interrupt configurations! Having said that, let us also state that sharing interrupts is not the optimal thing to do and in a few rare situations may cause unwanted side effects.
The biggest downside to a shared interrupt configuration is that system performance will be degraded when multiple devices share the same interrupt. In most situations, this performance degradation is not significant and should not be a cause for concern. However, if the number of devices that share an interrupt increases beyond two or three, performance degradation may become noticeable.
The reason shared interrupts degrade system performance is that each time a device asserts its interrupt line, the interrupt handlers of several other device drivers may be called before the interrupt handler for the device which actually caused the interrupt. Each interrupt handler is required to determine if the current interrupt was caused by the device it services. If the interrupt handler detects that its corresponding device did generate an interrupt, the interrupt is claimed and serviced. If the interrupt was not caused by the corresponding device, the interrupt handler returns an error code to the OS indicating that the interrupt was not claimed or not serviced. When an interrupt is not claimed, the OS calls the next interrupt handler on the shared interrupt.
If all device drivers have been called and no device claims the interrupt, the NetWare operating system records this event as a spurious, or not claimed, interrupt event. Large numbers of spurious interrupts may indicate a problem with the system hardware, device hardware, or device driver. Spurious interrupt warnings may be turned off using the console command SET DISPLAY SPURIOUS INTERRUPT ALERTS = OFF.
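The claim-or-pass protocol described above can be sketched in a few lines of Python. This is a simplified model, not NetWare code: the class SharedInterrupt, its dispatch method, and the convention that a handler returns True when it claims the interrupt are all assumptions made for illustration.

```python
from typing import Callable, List

# A handler returns True if its device raised the interrupt and was serviced,
# and False if the interrupt was not claimed. (Hypothetical convention.)
Handler = Callable[[], bool]


class SharedInterrupt:
    def __init__(self, handlers: List[Handler]):
        self.handlers = handlers   # called in list order
        self.spurious_count = 0    # interrupts that no handler claimed

    def dispatch(self) -> int:
        """Call handlers in order; return how many were called."""
        for calls, handler in enumerate(self.handlers, start=1):
            if handler():
                return calls       # claimed and serviced, so stop here
        self.spurious_count += 1   # every handler declined the interrupt
        return len(self.handlers)


# Three devices share the line; the second one raised the interrupt.
irq = SharedInterrupt([lambda: False, lambda: True, lambda: False])
print(irq.dispatch())        # 2 handlers were called
print(irq.spurious_count)    # 0
```

The first handler in this example is called even though its device did nothing, which is the extra overhead that sharing introduces.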
Calling device drivers which did not generate the interrupt is time consuming. The overhead is directly proportional to the number of shared devices on the interrupt line. In the worst-case scenario, the server may end up calling every other interrupt handler on a shared interrupt line before finally calling the correct handler. At best, the server calls the correct handler on the very first try. The bottom line is that a shared interrupt configuration results in more overhead during interrupt processing.
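To put rough numbers on this overhead, the sketch below counts handler calls per interrupt for a fixed calling order. It assumes, purely for illustration, that each device on the line is equally likely to be the source of any given interrupt; the function name handler_calls is hypothetical.

```python
def handler_calls(n_devices: int):
    """Return (best, average, worst) handler calls per interrupt.

    The device at position k in the calling order costs k handler calls,
    and each position is assumed to be equally likely.
    """
    positions = range(1, n_devices + 1)
    return 1, sum(positions) / n_devices, n_devices


for n in (1, 2, 3, 6):
    print(n, handler_calls(n))
# With 3 devices the result is (1, 2.0, 3); with 6 it is (1, 3.5, 6).
```

Under this assumption the average cost is (n + 1) / 2 handler calls, so it grows linearly with the number of devices sharing the line.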
In some systems, interrupt sharing cannot be avoided and you must put up with the additional processing overhead. However, whenever possible, the system administrator is encouraged to configure the system's interrupt resources in such a way that the performance hit from the shared interrupts is minimal.
The Linked List and Interrupt Starvation
In NetWare 3 and 4, interrupt handlers on a shared interrupt are placed in data structures that are linked together into what software engineers refer to as a "linked list." Here the interrupt handlers on a shared interrupt list are called out in the same order every time. In some rare configurations, interrupt processing for the last device on the list can be delayed beyond what is required for acceptable system performance.
This delay in interrupt processing is most likely to occur when a busy device, generating a lot of interrupts, is hooked at the beginning of the linked list. In this situation, the busy device always gets the first shot at claiming and servicing an interrupt. Other handlers at the end of the list may be required to wait for a new interrupt to occur before they get a chance to service the interrupt.
While calling the busiest driver first may seem like a good idea, doing so actually causes devices downstream to experience what is referred to as interrupt starvation. Interrupt starvation is most likely to happen when a device generating lots of interrupts (such as an FDDI LAN adapter) is linked at the beginning of the list.
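A deliberately simplified worst case shows how starvation arises with a fixed calling order. In the Python sketch below, the function run_fixed_order and the device names are hypothetical, and the model assumes that each interrupt is claimed by the first device in the list with a request pending.

```python
def run_fixed_order(order, busy, rounds):
    """Simulate a fixed-order shared interrupt list.

    order:  handler list, head first
    busy:   set of devices that raise a new request before every interrupt
    rounds: number of interrupts to simulate
    Every device starts with one request pending.
    Returns how many times each device was serviced.
    """
    pending = set(order)
    serviced = {dev: 0 for dev in order}
    for _ in range(rounds):
        pending |= busy              # busy devices re-assert
        for dev in order:            # the walk always starts at the head
            if dev in pending:
                pending.discard(dev)
                serviced[dev] += 1
                break                # one claim per interrupt
    return serviced


print(run_fixed_order(["fddi", "disk", "msl"], {"fddi"}, 100))
# {'fddi': 100, 'disk': 0, 'msl': 0}
```

In this model the busy adapter at the head of the list claims every interrupt, so the two devices behind it are never serviced even though their requests have been pending from the start.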
Interrupt starvation may result in system failure if interrupt response time is critical. Such is the case for an MSL card in an SFT III server. If the MSL card is linked in the list behind another device driver, the required interrupt response time to keep the two servers synchronized could be compromised.
For this reason, Novell recommends that devices likely to generate large amounts of interrupt traffic be separated from devices that are not so busy. For example, you may want to let each high-speed LAN adapter have its own interrupt because they tend to generate a lot of interrupt traffic compared to disk devices, which typically do not generate as many interrupts. Therefore, sharing an interrupt between two disk devices should pose less of a problem than the sharing of interrupts between a LAN and disk device. An SFT III MSL card ought to have its own, non-shared, high-priority interrupt.
Novell recommends giving the device used to control the disk which contains the DOS or boot partition its own, non-shared, interrupt. This is not required, but it may help avoid problems with some device drivers.
If you are performing an "over-the-wire" network installation of NetWare 3 or NetWare 4, you should not share the DOS LAN adapter interrupt with any other device that may be used by NetWare. For NetWare 3 and NetWare 4, this is not a suggestion—it is a requirement. However, this requirement does not apply to NetWare 5 because NetWare 5 shuts down any DOS LAN client devices after a certain stage of the NetWare 5 install.
The NetWare 5 OS also calls out shared interrupt handlers in a round robin fashion, thereby eliminating the whole problem of interrupt starvation and priority inversion on shared interrupts. The fairness resulting from the round robin policy in NetWare 5 removes the need for the system administrator to pay much attention to which devices are shared.
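One common way to implement a round robin policy is to begin each walk of the handler list just after the device that was serviced last. Whether NetWare 5 does exactly this is an assumption; the Python sketch below, with its hypothetical function run_round_robin and device names, only illustrates why rotation removes starvation.

```python
def run_round_robin(order, busy, rounds):
    """Simulate a shared interrupt list with a rotating starting point.

    order:  handler list
    busy:   set of devices that raise a new request before every interrupt
    rounds: number of interrupts to simulate
    Every device starts with one request pending.
    Returns how many times each device was serviced.
    """
    pending = set(order)
    serviced = {dev: 0 for dev in order}
    start = 0                                   # where the next walk begins
    for _ in range(rounds):
        pending |= busy                         # busy devices re-assert
        for step in range(len(order)):
            idx = (start + step) % len(order)
            dev = order[idx]
            if dev in pending:
                pending.discard(dev)
                serviced[dev] += 1
                start = (idx + 1) % len(order)  # rotate past this device
                break
    return serviced


print(run_round_robin(["fddi", "disk", "msl"], {"fddi"}, 100))
# {'fddi': 98, 'disk': 1, 'msl': 1}
```

Even with a permanently busy adapter on the line, the disk and MSL requests are serviced on the second and third interrupts instead of waiting indefinitely.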
In summary, NetWare fully supports shared interrupt configurations. However, as a rule of thumb, and when your system configuration allows it, interrupt sharing should be avoided, or at least carefully configured. This will eliminate, or at least reduce, the interrupt processing overhead described above and the chance for interrupt priority inversion unique to NetWare 3 and 4.
For systems which employ the 8259 interrupt controller in a master/slave configuration, the priority of interrupts is as follows:
0, 1, 8, 9(2), 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7
with 0 being the highest priority and 7 being the lowest priority.
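This ordering follows from the cascade wiring: the slave controller's eight lines (IRQs 8-15) feed line 2 of the master, so they inherit that line's place in the master's priority order. The short Python sketch below derives the list, assuming both controllers run in their default fixed-priority mode; the names are illustrative.

```python
# Master 8259 lines 0-7 in priority order. Line 2 is the cascade input
# that carries the slave's eight lines (IRQs 8-15).
MASTER_LINES = range(8)
CASCADE_LINE = 2
SLAVE_IRQS = list(range(8, 16))


def irq_priority_order():
    """Return IRQ numbers from highest to lowest priority."""
    order = []
    for line in MASTER_LINES:
        if line == CASCADE_LINE:
            order.extend(SLAVE_IRQS)   # the slave's IRQs take line 2's slot
        else:
            order.append(line)
    return order


print(irq_priority_order())
# [0, 1, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]
```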
Novell recommends using the highest priority interrupt available for special-purpose LAN adapters such as an SFT III MSL. Such a configuration ensures devices used in the critical high-speed communication path between servers have the highest interrupt priority. However, a high-traffic interrupt device may starve other device drivers in the system. For this reason, it may be necessary to do some experimentation with your system in different interrupt configurations.
In general, Novell recommends that disk devices be given a higher priority interrupt than LAN devices. If this causes the system to behave poorly, you may want to rearrange the priority of device interrupts to better meet the needs of your network environment.
In the past, some technical information documents have issued broad statements discouraging the use of IRQ 9(2). These statements have left some users puzzled about IRQ 9(2) and Novell's position on the subject. So let us clarify Novell's official position.
Older ISA bus devices which may be configured to use IRQ 2 are really using IRQ 9 on PC/AT hardware. IRQs 2 and 9 are the same from both a software and hardware standpoint. Other than the fact that IRQ 9(2) is a high-priority interrupt, it is no different from any other interrupt and its use is not restricted in any way whatsoever.
However, because IRQ 9(2) is a high-priority interrupt, the devices using it should be chosen carefully. The misuse of a high-priority interrupt such as IRQ 9(2) may cause system problems such as interrupt starvation for devices stuck on lower priority interrupts. This issue is known as interrupt priority inversion. Problems caused by interrupt priority inversion are the most likely culprit leading to problems and/or myths concerning the use of IRQ 9(2).
In general, Novell recommends that disk devices be given a higher priority interrupt than LAN devices. If this causes the system to behave poorly you may wish to rearrange the priority of device interrupts to better meet the needs of your network environment. However, for SFT III, the MSL device should be given highest interrupt priority.
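As a starting point for experimentation, these recommendations can be turned into a simple first-pass assignment: MSL first, then disk devices, then LAN devices, each taking the best remaining interrupt. The Python sketch below is only an illustration; the function assign_irqs, the device names, and the three device classes are hypothetical, and the result should be tuned for a real system.

```python
# 8259 master/slave priority order, highest first.
PRIORITY_ORDER = [0, 1, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]

# Lower rank gets a higher-priority IRQ (assumed classes, for illustration).
CLASS_RANK = {"msl": 0, "disk": 1, "lan": 2}


def assign_irqs(devices, free_irqs):
    """Map device name -> IRQ, giving the best free IRQ to the most
    critical device class. `devices` is a list of (name, device_class)."""
    irqs = sorted(free_irqs, key=PRIORITY_ORDER.index)
    ranked = sorted(devices, key=lambda dev: CLASS_RANK[dev[1]])
    if len(ranked) > len(irqs):
        raise ValueError("not enough free IRQs; some devices must share")
    return {name: irq for (name, _), irq in zip(ranked, irqs)}


devices = [("lan0", "lan"), ("scsi0", "disk"), ("msl0", "msl")]
print(assign_irqs(devices, [5, 10, 11]))
# {'msl0': 10, 'scsi0': 11, 'lan0': 5}
```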
If a configuration does not employ IRQ 9(2) but has a device on IRQ 10, the same downstream concerns apply to IRQ 10. This usually becomes a problem only when you put an extremely busy device, one which generates a lot of back-to-back interrupts, on a high-priority interrupt.
The whole issue of interrupt priority inversion is a moot point if there is adequate time between successive high-priority interrupts for devices at a lower priority to be serviced. If interrupt priority does become an issue, it is the system administrator's responsibility to tune or adjust the priority of interrupts. However, in most configurations this is not a problem because there is adequate time between high-priority interrupts for all lower priority interrupts to get serviced. This is especially true if you put your LAN devices at the lowest priority, since high network traffic is typically the cause of most system interrupts.
In summary, IRQ 9(2) is a high-priority interrupt. Novell recommends use of IRQ 9(2) in both shared and non-shared interrupt configurations. To avoid interrupt priority inversion, the administrator may need to carefully select which devices use high-priority interrupts.
IRQs 7 and 15
In the past, some technical information documents have issued broad statements discouraging the use of IRQs 7 and 15. These statements have left some users puzzled about IRQs 7 and 15 and Novell's official position on the subject. This section will fully discuss the issues surrounding the use of IRQ 7 and 15.
When a device asserts its interrupt line, the corresponding 8259 interrupt input line detects the assertion and sets an interrupt request bit corresponding to that IRQ. If interrupts are enabled at the processor, the processor will acknowledge the interrupt request with a special interrupt acknowledge bus cycle. During this bus cycle, the appropriate 8259 places the interrupt vector corresponding to the highest priority interrupt request, on the system bus. During normal operation, at least one request bit is set indicating that the corresponding interrupt line is asserted.
If the interrupt request bit is cleared in the short interval between the time the processor is interrupted and the interrupt acknowledge cycle, the 8259 may not see any request bits set. However, it must still place an interrupt vector on the bus. This is where IRQs 7 and 15 come in.
The interrupt vector corresponding to input line 7 of the 8259 Programmable Interrupt Controller (PIC) is used in a special way. The vector assigned to line 7 of the 8259 is placed on the system bus when there are no interrupt request register (IRR) bits set in the 8259 during the interrupt acknowledge bus cycle. 8259 input line 7 corresponds to IRQ 7 on the primary controller and to IRQ 15 on the secondary controller.
The only reasons there would not be any request bits set at interrupt acknowledge time are:
A level-triggered device un-asserted its interrupt between the time the processor was interrupted and the interrupt acknowledgment cycle.
The interrupt was masked by software during the same period of time.
There is crosstalk or noise on the interrupt lines of the system during the same period of time.
When one of these phenomena occurs, NetWare—or any PC/AT OS for that matter—will get extra interrupts coming in on IRQs 7 or 15. NetWare has code to detect these events and reports them as "lost" interrupts.
Any interrupt input line on the master or slave 8259 can be the cause of this phenomenon. For example, if the interrupt came from lines 0-7 on the primary 8259, and at interrupt acknowledgement there is no request on IRQs 0-7, then a lost interrupt event on IRQ 7 would be recorded. If the cascade input, line 2 of the primary controller, has a valid request but lines 0-7 on the secondary controller, IRQs 8-15, do not have any request bits set, then a lost event on IRQ 15 will occur.
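The two cases above can be modeled with a small function that decides which IRQ's vector is delivered at interrupt acknowledge time. This Python sketch is a simplification that looks only at the request bits and ignores interrupt masks and in-service state; the function names are hypothetical.

```python
CASCADE_LINE = 2   # master input that carries the slave controller's output


def highest_priority_line(irr: int):
    """Return the lowest-numbered set bit (highest priority), or None."""
    for line in range(8):
        if irr & (1 << line):
            return line
    return None


def acknowledged_irq(master_irr: int, slave_irr: int) -> int:
    """Return the IRQ whose vector is delivered at acknowledge time.

    Each argument is an 8-bit snapshot of one controller's request bits.
    A controller with no request bits set falls back to its line 7 vector.
    """
    line = highest_priority_line(master_irr)
    if line is None:
        return 7                   # reported as a lost interrupt on IRQ 7
    if line != CASCADE_LINE:
        return line
    slave_line = highest_priority_line(slave_irr)
    if slave_line is None:
        return 15                  # reported as a lost interrupt on IRQ 15
    return 8 + slave_line


print(acknowledged_irq(0b00000000, 0b00000000))  # 7: request vanished on the master
print(acknowledged_irq(0b00000100, 0b00000000))  # 15: cascade set, slave empty
print(acknowledged_irq(0b00000100, 0b00001000))  # 11: normal delivery of IRQ 11
```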
NetWare detects the occurrence of lost interrupts and reports these events as a tool in debugging system hardware, device, or device driver problems. Unless these events are occurring in rapid succession, they may be ignored. For the most part, their occurrence is simply annoying and has little impact on the system. Lost interrupts may be ignored by turning the warnings off using the console command SET DISPLAY LOST INTERRUPT ALERTS = OFF.
However, there are some situations where lost interrupts may cause some devices to fail, as we will discuss in a moment. First, let's make some observations about lost interrupts.
The 8259 latches edge-triggered interrupts and holds them pending whether the interrupt is masked or not. When masked, the interrupt is held pending but the request bit is cleared. If an interrupt is masked immediately after the processor got the interrupt but before the interrupt acknowledgement cycle, the corresponding request bit will be cleared at interrupt acknowledgement time resulting in a "lost" interrupt. For the most part, NetWare does not spend much time changing the mask bits of the 8259 controllers so this is not normally the cause of lost interrupts.
Because edge-triggered interrupts are latched and held pending, they are not usually the reason for lost interrupts occurring in the system. It is typically level-triggered interrupt devices or bus noise that causes this problem. Also, the offending device need not be on IRQs 7 or 15.
For a level-triggered interrupt, the 8259 output signal follows the interrupt input signal. So if the interrupt is un-asserted at the 8259 input, the 8259 request bits and 8259 output will also un-assert. If you are looking for the cause of lost hardware interrupts in the system, the best place to start is by examining the level-triggered interrupt devices feeding the 8259.
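The difference between the two trigger modes, as this article describes them, can be captured in a toy model: an edge-triggered request is latched once a rising edge is seen, while a level-triggered request simply follows the input line. The Python sketch below is not a cycle-accurate model of the 8259, and the function name request_bit_at_ack is hypothetical.

```python
def request_bit_at_ack(trigger: str, line_samples):
    """Return True if the request bit is still set at acknowledge time.

    line_samples is a non-empty sequence of input line states
    (True = asserted) at successive instants, ending at the interrupt
    acknowledge cycle.
    """
    if trigger == "edge":
        # Latched: one low-to-high transition keeps the request pending.
        previous = False
        latched = False
        for level in line_samples:
            if level and not previous:
                latched = True
            previous = level
        return latched
    if trigger == "level":
        # Not latched: the bit follows the line, so the last sample decides.
        return bool(line_samples[-1])
    raise ValueError(f"unknown trigger mode: {trigger}")


# The device asserts its line, then drops it before the acknowledge cycle.
glitch = [False, True, False]
print(request_bit_at_ack("edge", glitch))    # True: request still pending
print(request_bit_at_ack("level", glitch))   # False: a lost interrupt
```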
As pointed out above, they need not be on IRQs 7 and 15. As a matter of fact, the device on IRQ 15 has a 1-in-8 chance of being the offending device and the device on IRQ 7 has a 1-in-15 chance of being the offending device.
As mentioned earlier, some devices may fail if they are assigned IRQs 7 or 15. This is usually only true for edge-triggered devices assigned to IRQs 7 or 15. Some older ISA type device drivers fail when their interrupt handler is invoked and their device is not the cause of the interrupt. Some edge-triggered devices do not have the ability to determine if they actually triggered the interrupt or not, so they just service the interrupt, which causes some devices or device drivers to fail in unpredictable ways.
In versions of NetWare prior to NetWare 4.11, there was a bug where the OS would incorrectly decide that a good interrupt on IRQ 7 or 15 was actually a "lost" interrupt and throw it away by not calling the interrupt handlers. This caused some edge-triggered devices which did not have interrupt retry timeout capability to fail. This problem was corrected in NetWare 4.11 and subsequent versions of NetWare do not have this problem.
If you have such a problem with an edge-triggered device on IRQ 7 or 15, move it to another interrupt. Put a level-triggered device or devices on IRQs 7 and 15. Level-triggered interrupt devices are accustomed to being called when they did not generate the interrupt. They also stay asserted until they are called. So there is no problem for level-triggered devices being placed on IRQs 7 or 15.
This statement is worth repeating. If you are having trouble with an edge-triggered device on IRQs 7 or 15, move the device to another interrupt and put level-triggered devices on IRQs 7 or 15. If a level-triggered device fails to function as a result of lost interrupts occurring on IRQs 7 and 15, the problem is most likely a device or driver problem and not a problem with NetWare.
In summary, lost interrupts may be caused by any device interrupt on either 8259. They are not necessarily caused by the device currently assigned to IRQ 7 or 15. For the most part, lost interrupts may be ignored, and the warnings may be turned off using the console command SET DISPLAY LOST INTERRUPT ALERTS = OFF. If you do have a problem with an edge-triggered device on IRQ 7 or 15, change the configuration so that level-triggered devices are using IRQs 7 and 15. Finally, Novell encourages administrators to use IRQs 7 and 15 in both shared and non-shared interrupt configurations, especially with level-triggered devices.
* Originally published in Novell AppNotes
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.