8259 Interrupts and NetWare
(Last modified: 17Jan2003)
This document (10024783) is provided subject to the disclaimer at the end of this document.
goal
8259 Interrupts and NetWare
fact
Formerly TID 2950154
fix
A Discussion on 8259 Interrupts and NetWare:
Interrupts may be viewed as the highest priority thread in an operating system. The configuration of the interrupts within a system may profoundly affect system performance for good or ill. A poor interrupt configuration may wreak havoc. How well interrupts are configured by the administrator may have more to do with overall system performance than the administrator realizes. The system administrator who configures system interrupts has a big responsibility in making sure that the priority of device interrupts in the system is set optimally. This may require some experimentation. On some systems the administrator may be constrained to use a configuration which is hardwired on the system motherboard leaving no room for reconfiguration.
The intent of this technical information document is to help make system administrators more aware of some of the issues surrounding interrupt configurations in NetWare as well as dispel some of the myths that have circulated in times past.
Interrupt Sharing:
All versions of NetWare support interrupt sharing. To share a hardware interrupt, each device involved must be configured to deliver level triggered interrupts. PCI and MCA bus devices are level triggered by default and may therefore share with any other level triggered device. EISA bus devices may be set up as edge or level triggered, but must be set up as level triggered in order to share a hardware interrupt with another device. ISA bus devices are edge triggered by definition and therefore may not share an interrupt with another hardware device.
With a few rare exceptions which we will note, configuring devices to share a common interrupt is fully supported by NetWare. In addition sharing interrupts is often required to support system configurations with a large number of IO devices.
In the past some technical information documents have issued broad statements discouraging the sharing of interrupts. These statements have left some users puzzled about the capabilities of NetWare and Novell's true position on the subject. So let us clarify Novell's position. NetWare supports shared interrupt configurations! Having stated that, let us also state that sharing interrupts is not the optimal thing to do and in a few rare situations may cause unwanted side effects.
The biggest downside to a shared interrupt configuration is that system performance will be degraded when multiple devices share the same interrupt. In most situations this performance degradation is not significant and should not be a cause for concern. However, if the number of devices that share an interrupt increases beyond two or three, performance degradation may become noticeable.
The reason shared interrupts degrade system performance is that each time a device asserts its interrupt line, the interrupt handlers of several other device drivers may be called before the interrupt handler for the device which actually caused the interrupt.
Each interrupt handler is required to determine if the current interrupt was caused by the device it services. If the interrupt handler detects that its corresponding device did generate an interrupt then the interrupt is claimed and serviced. If the interrupt was not caused by the corresponding device then the interrupt handler returns an error code to the OS indicating that the interrupt was not claimed or not serviced. When an interrupt is not claimed the OS calls the next interrupt handler on the shared interrupt.
If all device drivers have been called and none of them claims the interrupt, NetWare 3, 4, and 5 record this event as a spurious, or unclaimed, interrupt event. Large numbers of spurious interrupts may indicate a problem with the system hardware, device hardware or device driver. Spurious interrupt warnings may be turned off using the console command SET DISPLAY SPURIOUS INTERRUPT ALERTS = OFF.
Calling device drivers which did not generate the interrupt is time consuming. The overhead is directly proportional to the number of shared devices on the interrupt line. In the worst case we may end up calling every other interrupt handler on a shared interrupt line before finally calling the correct handler. At best we may call the correct handler on the very first try. The bottom line is that a shared interrupt configuration results in more overhead during interrupt processing.
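The dispatch behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not NetWare code: the function name dispatch_shared, the device names, and the use of a set to represent devices currently asserting the line are all assumptions made for the example. It shows why the number of handlers called grows with the position of the interrupting device in the chain, and how an unclaimed interrupt ends up counted as spurious.

```python
def dispatch_shared(chain, pending):
    """Walk a shared-interrupt handler chain in hook order.

    chain   -- list of device names, in the order their handlers were hooked
               (hypothetical names, for illustration only)
    pending -- set of device names currently asserting the shared line

    Returns (claiming_device, handlers_called). claiming_device is None when
    no handler claims the interrupt, which models a spurious interrupt.
    """
    handlers_called = 0
    for device in chain:
        handlers_called += 1
        if device in pending:
            # This handler recognizes its own device, claims the interrupt
            # and services it, so the search stops here.
            pending.discard(device)
            return device, handlers_called
        # Otherwise the handler reports "not claimed" and the next one is tried.
    return None, handlers_called


chain = ["lan", "disk_a", "disk_b"]
best = dispatch_shared(chain, {"lan"})       # correct handler on the first try
worst = dispatch_shared(chain, {"disk_b"})   # every other handler called first
spurious = dispatch_shared(chain, set())     # nobody claims the interrupt
```

In this model the best case costs one handler call, the worst case costs one call per device on the line, and a spurious interrupt costs a full pass through the chain with nothing serviced.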
In some systems interrupt sharing is not avoidable and we must just put up with the additional processing overhead. However, where possible the system administrator is encouraged to configure the system's interrupt resources in such a way that the performance hit from shared interrupts is reduced to a minimum.
In NetWare 3 and 4 interrupt handlers on a shared interrupt are placed in data structures linked together into what software engineers refer to as a linked list. In NetWare 3 and 4 the interrupt handlers on a shared interrupt list are called out in the same order every time. In some rare configurations interrupt processing for the last device on the list can be delayed beyond what is required for acceptable system performance.
This delay in interrupt processing is most likely to occur when a busy device, generating a lot of interrupts, is hooked at the beginning of the linked list. In this situation the busy device always gets the first shot at claiming and servicing an interrupt. Other handlers at the end of the list may be required to wait for a new interrupt to occur before they get a chance to service the interrupt.
While calling the busiest driver first may seem like a good idea, doing so actually causes devices down stream to experience what we refer to as interrupt starvation. Interrupt starvation is most likely to happen when a device generating lots of interrupts such as a FDDI LAN card is linked at the beginning of the list.
Interrupt starvation may result in system failure if interrupt response time is critical. Such is the case for an MSL card in an SFT III server. If the MSL card is linked in the list behind another device driver, the interrupt response time required to keep the two servers synchronized could be compromised.
For this reason Novell recommends that devices that are likely to generate large amounts of interrupt traffic be separated from devices that are not so "busy". For example, you may wish to let each high speed LAN card have its own interrupt because LAN cards tend to generate a lot of interrupt traffic. Disk devices on the other hand typically do not generate as many interrupts. Therefore sharing an interrupt between two disk devices should pose less of a problem than, say, sharing an interrupt between a LAN device and a disk device.
An SFT III MSL card really ought to have its own, non shared, high priority interrupt.
Novell also recommends giving the device used to control the disk which contains the DOS or boot partition its own, non shared, interrupt. This is not required but may avoid problems with some device drivers.
If you are doing an over the wire network install of NetWare 3 or NetWare 4 you may NOT share the DOS LAN card interrupt with any other device that may be used by NetWare. For NetWare 3 and NetWare 4 this is not a suggestion, it is a requirement. This requirement does not apply to NetWare 5 because NetWare 5 shuts any DOS LAN client devices down after a certain stage of the NetWare 5 install.
Also, in NetWare 5 the OS calls out shared interrupt handlers in a round robin fashion, and the whole problem of interrupt starvation and priority inversion on shared interrupts has been eliminated. The fairness resulting from the round robin policy in NetWare 5 removes the need for the system administrator to pay much attention to which devices share an interrupt.
In summary, NetWare fully supports shared interrupt configurations. However, as a good rule of thumb, and when your system configuration allows it, interrupt sharing should be avoided, or at least carefully configured. This will eliminate, or at least reduce, the interrupt processing overhead described above and the chance for interrupt priority inversion unique to NetWare 3 and NetWare 4.
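The difference between the fixed calling order of NetWare 3 and 4 and a round robin policy can be illustrated with a small simulation. The sketch below is a simplified model under stated assumptions, not the actual NetWare 5 algorithm: it assumes that exactly one handler services each interrupt, that every device in always_pending is asserting the line at every interrupt, and that round robin means the next search starts just past the device serviced last. The function name simulate and the device names are hypothetical.

```python
from collections import Counter


def simulate(chain, always_pending, interrupts, round_robin):
    """Count how often each device is serviced on a shared interrupt line.

    chain          -- device names in hook order (hypothetical names)
    always_pending -- devices assumed to be asserting the line every time
    interrupts     -- number of interrupts to simulate
    round_robin    -- False: always start at the head of the list
                      True:  start just past the device serviced last
                             (an assumed policy, for illustration only)
    """
    serviced = Counter()
    start = 0
    for _ in range(interrupts):
        for offset in range(len(chain)):
            index = (start + offset) % len(chain)
            device = chain[index]
            if device in always_pending:
                serviced[device] += 1
                if round_robin:
                    start = (index + 1) % len(chain)
                break  # one handler claims each interrupt in this model
    return serviced


chain = ["fddi", "msl"]            # busy LAN card hooked ahead of the MSL card
busy = {"fddi", "msl"}             # both devices always want service
fixed_order = simulate(chain, busy, 10, round_robin=False)
fair_order = simulate(chain, busy, 10, round_robin=True)
```

With the fixed order the device at the head of the list takes every interrupt and the second device is starved; with the rotating start position the two devices are serviced equally often.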
Interrupt Priority:
For systems which employ the 8259 interrupt controller in a master/slave configuration, the priority of interrupts, from highest to lowest, is as follows:
Highest - 0 Lowest - 7
0, 1, 8, 9(2), 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7
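The ordering above follows from the way the secondary 8259 is cascaded into input line 2 of the primary controller: IRQs 8 through 15 inherit the priority slot of line 2. The following sketch derives the list under the assumption that both controllers run in their default fixed priority mode, where line 0 is highest and line 7 is lowest. The function name is made up for this example.

```python
def cascaded_8259_priority(cascade_line=2):
    """Return IRQ numbers from highest to lowest priority for two cascaded
    8259 controllers, assuming default fixed priority (line 0 highest)."""
    order = []
    for line in range(8):                  # primary input lines 0..7
        if line == cascade_line:
            # The secondary controller's eight inputs (IRQs 8..15) all
            # take the priority slot of the primary's cascade line.
            order.extend(range(8, 16))
        else:
            order.append(line)
    return order


priority = cascaded_8259_priority()
# Position in the list gives the relative priority of an IRQ.
irq9_outranks_irq3 = priority.index(9) < priority.index(3)
```

Looking up an IRQ's position in the returned list is a quick way to compare two candidate assignments, for example to confirm that IRQ 9(2) outranks IRQ 3.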
Novell recommends using the highest priority interrupt available for special purpose NIC cards such as an SFT III MSL. Such a configuration ensures devices used in the critical high speed communication path between servers have the highest interrupt priority.
It should be noted that a high traffic interrupt device may in fact starve other device drivers in the system. For this reason it may be necessary to do some experimentation with your system in different interrupt configurations.
In general Novell recommends that disk devices be given a higher priority interrupt than LAN devices. If this causes the system to behave poorly you may wish to rearrange the priority of device interrupts to better meet the needs of your network environment.
For SFT III the mirrored server link (MSL) device should be given highest interrupt priority.
IRQ 9(2):
In the past some technical information documents have issued broad statements discouraging the use of IRQ 9(2). These statements have left some users puzzled about IRQ 9(2) and Novell's position on the subject. So let us clarify Novell's official position.
Older ISA bus devices which may be configured to use IRQ 2 are really using IRQ 9 on PC/AT hardware. IRQs 2 and 9 are the same from both a software and hardware standpoint. Other than the fact that IRQ 9(2) is a high priority interrupt, it is no different from any other interrupt and its use is not restricted in any way whatsoever.
However, because IRQ 9(2) is a high priority interrupt, the devices using it should be chosen carefully. The misuse of a high priority interrupt such as IRQ 9(2) may cause system problems such as down stream interrupt starvation for devices stuck on lower priority interrupts. This issue is known as interrupt priority inversion. Problems caused by interrupt priority inversion are the most likely culprit leading to problems and/or myths concerning the use of IRQ 9(2).
In general Novell recommends that disk devices be given a higher priority interrupt than LAN devices. If this causes the system to behave poorly you may wish to rearrange the priority of device interrupts to better meet the needs of your network environment.
For SFT III the MSL device should be given highest interrupt priority.
If a configuration did not employ IRQ 9(2), but had a device on IRQ 10 the same downstream concerns apply to IRQ 10. But this usually only becomes a problem when you put an extremely busy device, with lots and lots of back to back interrupts, on a high priority interrupt.
The whole issue of interrupt priority inversion is a moot point if there is adequate time between successive high priority interrupts for devices at a lower priority to be serviced. If interrupt priority does become an issue it is the system administrator's responsibility to tune or adjust the priority of interrupts in his system to his satisfaction. However, in most configurations this is not a problem because there is adequate time between high priority interrupts for all lower priority interrupts to get serviced. This is especially true if you put your LAN devices at the lowest priority, since high network traffic is typically the cause of most system interrupts.
In summary, IRQ 9(2) is a high priority interrupt. Novell recommends use of IRQ 9(2) in both shared and non shared interrupt configurations. To avoid interrupt priority inversion the administrator may need to carefully select which devices use high priority interrupts.
IRQs 7 and 15:
In the past some technical information documents have issued broad statements discouraging the use of IRQs 7 and 15. These statements have left some users puzzled about IRQs 7 and 15 and Novell's official position on the subject. In this section we will fully discuss the issues surrounding IRQ 7 and 15 and dispel some of the myths concerning their use.
When a device asserts its interrupt line the corresponding 8259 interrupt input line detects the assertion and sets an interrupt request bit corresponding to that IRQ. If interrupts are enabled at the processor, the processor will acknowledge the interrupt request with a special interrupt acknowledge bus cycle. During this bus cycle the appropriate 8259 places the interrupt vector corresponding to the highest priority interrupt request, on the system bus. During normal operation at least one request bit is set indicating that the corresponding interrupt line is asserted.
If the interrupt request bit is cleared in the short interval between the time the processor is interrupted and the interrupt acknowledge cycle, the 8259 may not see any request bits set. However, it must still place an interrupt vector on the bus. This is where IRQs 7 and 15 come in.
The interrupt vector corresponding to input line 7 of the 8259 Programmable Interrupt Controller (PIC) is used by the 8259 in a special way. The vector assigned to line 7 of the 8259 is placed on the system bus when there are no interrupt request register (IRR) bits set in the 8259 during the interrupt acknowledge bus cycle. 8259 input line 7 corresponds to IRQ 7 on the primary controller and to IRQ 15 on the secondary controller.
The only reasons there would not be any request bits set at interrupt acknowledge time are: 1) A level triggered device de-asserted its interrupt between the time the processor was interrupted and the interrupt acknowledge cycle. 2) The interrupt was masked by software during the same period of time. 3) There is crosstalk or noise on the interrupt lines of the system during the same period of time.
When this phenomenon occurs NetWare, or any PC/AT OS for that matter, will get extra interrupts coming in on IRQs 7 or 15. NetWare has code to detect these events and reports them as "lost" interrupts.
Any interrupt input line on the master or slave 8259 can be the cause of this phenomenon. For example, if the interrupt came from lines 0-7 on the primary 8259, and at interrupt acknowledge there is no request on IRQs 0-7, then a lost interrupt event on IRQ 7 would be recorded. If the cascade input, line 2 of the primary controller, has a valid request but lines 0-7 on the secondary controller, IRQs 8-15, do not have any request bits set, then a lost event on IRQ 15 will occur.
NetWare detects the occurrence of lost interrupts and reports these events as a tool or an aid in debugging system hardware, device, or device driver problems. Unless these events are occurring in rapid succession they may be ignored. For the most part their occurrence is simply annoying and has little impact on the system. Lost interrupt warnings may be turned off using the console command SET DISPLAY LOST INTERRUPT ALERTS = OFF.
However, there are some situations where lost interrupts may cause some devices to fail. These failure situations will be discussed in a moment. But first let's make some observations about lost interrupts.
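The rule described above can be modeled as a small decision function. This is a sketch of the behavior as this document describes it, not of any NetWare routine: the function name acknowledged_irq and the representation of each controller's request bits as a set of line numbers are assumptions, and the model assumes fixed priority with no interrupt already in service.

```python
def acknowledged_irq(primary_requests, secondary_requests, cascade_line=2):
    """Model which IRQ vector is delivered at interrupt acknowledge time.

    primary_requests   -- set of primary 8259 input lines (0..7) whose
                          request bits are set at acknowledge time
    secondary_requests -- same for the secondary 8259 (lines 0..7,
                          which correspond to IRQs 8..15)

    Returns (irq, lost). lost is True when the controller found no request
    bit set and fell back to its line 7 vector.
    """
    if not primary_requests:
        return 7, True                      # primary falls back to line 7
    line = min(primary_requests)            # lowest line number wins
    if line != cascade_line:
        return line, False
    if not secondary_requests:
        return 15, True                     # secondary falls back to line 7
    return 8 + min(secondary_requests), False


lost_on_primary = acknowledged_irq(set(), set())
lost_on_secondary = acknowledged_irq({2}, set())
normal_secondary = acknowledged_irq({2, 5}, {1, 6})
genuine_irq7 = acknowledged_irq({7}, set())
```

Note that a genuine request on line 7 and a lost interrupt produce the same IRQ number in this model, which is why the OS needs extra logic to tell the two apart.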
First, the 8259 latches edge triggered interrupts and holds them pending whether the interrupt is masked or not. When masked the interrupt is held pending but the request bit is cleared. If an interrupt was masked immediately after the processor got the interrupt but before the interrupt acknowledge cycle, the corresponding request bit will be clear at interrupt acknowledge time resulting in a "lost" interrupt. For the most part NetWare does not spend much time changing the mask bits of the 8259 controllers so this is not normally the cause of lost interrupts.
Because edge triggered interrupts are latched and held pending they are not usually the reason for lost interrupts occurring in the system. It is typically level triggered interrupt devices or bus noise that cause this problem. Also the offending device need not be on IRQs 7 or 15.
For a level triggered interrupt the 8259 output signal follows the interrupt input signal. So if the interrupt is de-asserted at the 8259 input, the 8259 request bits and 8259 output will also de-assert. If you are looking for the cause of lost hardware interrupts in the system, the best place to start is by examining the level triggered interrupt devices feeding the 8259. As pointed out above, they need not be on IRQs 7 and 15. As a matter of fact the device on IRQ 15 has a 1 in 8 chance of being the offending device and the device on IRQ 7 has a 1 in 15 chance of being the offending device.
As mentioned above some devices may fail if they are assigned IRQs 7 or 15. This is usually only true for edge triggered devices assigned to IRQs 7 or 15. Some older ISA type device drivers fail when their interrupt handler is invoked and their device is not the cause of the interrupt. Some edge triggered devices do not have the ability to determine if they actually triggered the interrupt or not, so they just service it which causes some devices or device drivers to fail in unpredictable ways.
In versions of NetWare prior to NetWare 4.11 there was a bug where the OS would incorrectly decide that a good interrupt on IRQ 7 or 15 was actually a "lost" interrupt and throw it away by not calling the interrupt handlers. This caused some edge triggered devices which did not have interrupt retry timeout capability to fail. This problem was corrected in NetWare 4.11. Subsequent versions of NetWare do not have this problem.
If you have such a problem with an edge triggered device on IRQ 7 or 15 move it to another interrupt. Put a level triggered device or devices on IRQs 7 and 15. Level triggered interrupt devices are accustomed to being called when they did not generate the interrupt. They also stay asserted until they are called. So there is no problem for level triggered devices on IRQs 7 or 15. This statement is worth repeating. If you are having trouble with an edge triggered device on IRQs 7 or 15 move it to another interrupt and put some level triggered devices on IRQs 7 or 15. If a level triggered device fails to function as a result of lost interrupts occurring on IRQs 7 and 15 then the problem is most likely a device or driver problem not a problem with NetWare.
In summary, lost interrupts may be caused by any device interrupt on either 8259. They are not necessarily caused by the device currently assigned to IRQ 7 or 15. For the most part lost interrupts may be ignored by turning the warnings off using the console command SET DISPLAY LOST INTERRUPT ALERTS = OFF. If you do have a problem with an edge triggered device on IRQ 7 or 15 then change the configuration so that level triggered devices are using IRQs 7 and 15. And finally Novell encourages administrators to use IRQs 7 and 15 in both shared and non shared interrupt configurations especially with level triggered devices.
document
Document Title: | 8259 Interrupts and NetWare |
Document ID: | 10024783 |
Solution ID: | 1.0.49345018.2492168 |
Creation Date: | 11Jan2000 |
Modified Date: | 17Jan2003 |
Novell Product Class: | Management Products NetWare |
disclaimer
The Origin of this information may be internal or external to Novell. Novell makes all reasonable efforts to verify this information. However, the information provided in this document is for your information only. Novell makes no explicit or implied claims to the validity of this information.
Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.