What Is a "Node Castout; Fatal SAN Device Alert"?
System Test Engineer
01 Nov 2001
If you switch to the system console screen, you'll see that the device driver has also reported a fatal Fibre Channel error. When the device driver decides it can no longer access the device, it deactivates the device. Every user of the device then receives a callback informing it that the device has gone away.
Clustering also receives this callback. According to the device driver, the I/O path to all shared disks has failed. Clustering therefore forces the node to fail as quickly as it can, so that applications can fail over to other nodes.
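The callback fan-out described above can be modeled in a few lines. This is a minimal, hypothetical Python sketch: `SharedDevice`, `ClusterNode`, `Application` and `driver_hard_fault` are illustrative names, not NetWare, clustering, or CPQFC.HAM interfaces, and the node "castout" is reduced to setting a flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model only: none of these names are NetWare or CPQFC APIs.
DeactivationCallback = Callable[[str], None]


@dataclass
class SharedDevice:
    name: str
    active: bool = True
    _callbacks: List[DeactivationCallback] = field(default_factory=list, repr=False)

    def register(self, callback: DeactivationCallback) -> None:
        """Register a user of the device to be told when the device goes away."""
        self._callbacks.append(callback)

    def deactivate(self) -> None:
        """Deactivate the device and notify every registered user exactly once."""
        if not self.active:
            return
        self.active = False
        for callback in list(self._callbacks):
            callback(self.name)


@dataclass
class ClusterNode:
    name: str
    castout: bool = False
    reason: str = ""

    def on_device_gone(self, device_name: str) -> None:
        # Losing shared storage means the node can no longer serve its
        # resources, so it fails fast instead of attempting a graceful shutdown.
        # Only the first notification matters; later ones are ignored.
        if not self.castout:
            self.castout = True
            self.reason = f"Fatal SAN device alert: {device_name}"


@dataclass
class Application:
    name: str
    events: List[str] = field(default_factory=list)

    def on_device_gone(self, device_name: str) -> None:
        # An ordinary user of the device simply records that it lost the disk.
        self.events.append(f"{self.name}: lost {device_name}")


def driver_hard_fault(devices: List[SharedDevice]) -> None:
    """Model the driver deactivating every device it had presented to the OS."""
    for device in devices:
        device.deactivate()


if __name__ == "__main__":
    disks = [SharedDevice("shared-disk-0"), SharedDevice("shared-disk-1")]
    node = ClusterNode("node1")
    app = Application("app")
    for disk in disks:
        disk.register(node.on_device_gone)
        disk.register(app.on_device_gone)
    driver_hard_fault(disks)
    print(node.castout, node.reason)
    print(app.events)
```

In this model the node reacts to the first deactivation and ignores the rest, which reflects the fail-fast intent: once the driver has declared the path dead, waiting for further notifications gains nothing.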
In this scenario, a graceful shutdown is unlikely to succeed, because applications will themselves fail or hang while trying to use a shared disk that is no longer available to them.
The root cause is that the shared disk device driver (in this case, CPQFC.HAM) has decided, for whatever reason, that the I/O path to shared disk has hard-failed, and so the driver deactivates all devices it previously presented to the NetWare operating system.
In other words, either the driver or the SAN itself has experienced a fault that has hard-failed the I/O path from server to disk. Normally, the CPQFC device driver module can deal with soft faults, such as Fibre Channel Loop Initialization Primitives (LIPs). In this case, however, it has decided that the SAN has hard-failed.
Compaq provides two tools, the FCMON and FCDIAGS utilities, that can help you determine what its FC driver thinks is happening on the SAN.
* Originally published in Novell AppNotes