
How to set up and use multipathing on SLES


Preamble: The procedure described in this article is only supported on SLES9 SP2 and higher. Earlier releases may not work as expected.

Note: Please see the SLES 11 SP1: Storage Administration Guide for current information on this topic.

1. Introduction

The Multipath IO (MPIO) support in SLES9 (SP2) is based on the Device Mapper (DM) multipath module of the Linux kernel, and the multipath-tools user-space package. These have been enhanced and integrated into SLES9 SP2 by SUSE Development.

DM MPIO is the preferred form of MPIO on SLES9 and the only option completely supported by Novell/SUSE.

DM MPIO features automatic configuration of the MPIO subsystem for a large variety of setups. Active/passive or active/active (with round-robin load balancing) configurations of up to 8 paths to each device are supported.

The framework is extensible via specific hardware handlers (see below) as well as via load balancing algorithms more sophisticated than round-robin.

The user-space component takes care of automatic path discovery and grouping, as well as automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes, if not obviates, the need for administrator attention in a production environment.

2. Supported configurations

  • Supported hardware: Architectures

    MPIO is supported on all seven architectures: IA32, AMD64/EM64T, IPF/IA64, p-Series (32-bit and 64-bit), and z-Series (31-bit and 64-bit).

  • Supported hardware: Storage subsystems

    The multipath-tools package is currently aware of the following storage subsystems:

    • 3Pardata VV

    • Compaq HSV110 / MSA1000

    • DDN SAN MultiDirector

    • DEC HSG80

    • EMC CLARiiON CX

    • FSC CentricStor

    • HP HSV110 / A6189A / Open-

    • Hitachi DF400 / DF500 / DF600

    • IBM 3542 / ProFibre 4000R

    • NETAPP

    • SGI TP9100 / TP9300 / TP9400 / TP9500

    • STK OPENstorage DS280

    • SUN StorEdge 3510 / T4

    In general, most other storage subsystems should work; however, the ones above will be detected automatically. Others might require an appropriate entry in the /etc/multipath.conf devices section.

    Storage arrays which require special commands on fail-over from one path to the other, or require special non-standard error handling, might require more extensive support; however, the DM framework has hooks for hardware handlers, and one such handler for the EMC CLARiiON CX family of arrays is already provided.

  • Hardware support: Host bus adapters

    • Qlogic

    • Emulex

    • LSI

    In general, all Fibre Channel / SCSI cards should work, as our MPIO implementation is above the device layer.

  • Supported software configurations summary

    Currently, DM MPIO is not available for either the root or the boot partition, as the boot loader does not know how to handle MPIO.

    All auxiliary data partitions such as /home or application data can be placed on an MPIO device.

    LVM2 is supported on top of DM MPIO. See the setup notes.

    Partitions are supported in combination with DM MPIO, but have limitations. See the setup notes.

    Software RAID on top of DM MPIO is also supported; however, note that auto-discovery is not available and that you will need to set up /etc/raidtab (if using raidtools) or /etc/mdadm.conf (if using mdadm) correctly; a sketch follows below.
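
    A minimal /etc/mdadm.conf sketch for this case might look as follows; the RAID level and the WWN-based device names are placeholders that you would replace with your own values:

    # Restrict device scanning to the MPIO device nodes, not the underlying sd* paths
    DEVICE /dev/disk/by-name/*

    # Example RAID1 array built from two MPIO devices (WWNs are placeholders)
    ARRAY /dev/md0 level=raid1 num-devices=2 devices=/dev/disk/by-name/<wwn-1>,/dev/disk/by-name/<wwn-2>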

3. Installation notes

  • Software installation

    Upgrade a system to SLES9 SP2 level (or more recent) and install the multipath-tools package.

  • Changing system configuration

    Using an editor of your choice, set the following value in /etc/sysconfig/hotplug:

    HOTPLUG_USE_SUBFS=no

    In addition to the above change, configure the system to load, within the INITRD, the device drivers for the controllers to which the MPIO devices are connected. The boot scripts will only detect MPIO devices if the modules for the respective controllers are loaded at boot time. To achieve this, simply add the needed driver module to the INITRD_MODULES variable in the file /etc/sysconfig/kernel.

    Example:

    Your system contains a RAID controller that is accessed by the cciss driver, and you are using ReiserFS as the filesystem. The MPIO devices will be connected to a Qlogic controller accessed by the driver qla2xxx, which is not yet configured to be used on this system. The corresponding entry in /etc/sysconfig/kernel will then probably look like this:

    INITRD_MODULES="cciss reiserfs"

    Using an editor, you would now change this entry:

    INITRD_MODULES="cciss reiserfs qla2xxx"

    When you have applied this change, you will need to recreate the INITRD on your system to reflect it. Simply run this command:

    mkinitrd

    If you are using GRUB as the boot manager, you do not need to make any further changes; upon the next reboot the needed driver will be loaded within the INITRD. If you are using LILO as the boot manager, remember to run it once to update the boot record, as shown below.
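
    For example, assuming LILO is installed in the standard location:

    # Re-read /etc/lilo.conf and rewrite the boot record so the new INITRD is used
    /sbin/lilo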

  • Configuring multipath-tools

    If your storage subsystem is one of those listed above, no further configuration should be required.

    You might otherwise have to create /etc/multipath.conf (see the examples under /usr/share/doc/packages/multipath-tools/) and add an appropriate devices entry for your storage subsystem.
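
    A minimal sketch of such a devices entry is shown below; the vendor and product strings, as well as the settings, are placeholders that you would replace with the values reported by your array (for example, as listed in /proc/scsi/scsi):

    devices {
            device {
                    # Vendor/product strings as reported by the array (placeholders)
                    vendor                  "VENDOR"
                    product                 "PRODUCT"
                    # Put all paths into one priority group (active/active);
                    # use "failover" instead for a simple active/passive setup
                    path_grouping_policy    multibus
                    # Method used to test whether a path is healthy
                    path_checker            tur
            }
    }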

    One particularly interesting option in the /etc/multipath.conf file is "polling_interval", which defines how frequently (in seconds) the paths are re-tested.
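
    For example, a sketch of a defaults section that re-tests the paths every 10 seconds (the value is an illustration, not a recommendation):

    defaults {
            # Path checker interval in seconds
            polling_interval    10
    }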

    Alternatively, you might choose to blacklist certain devices which you do not want multipath-tools to scan.
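
    Depending on your multipath-tools version, the corresponding section is called either devnode_blacklist or blacklist; a sketch that excludes local IDE and cciss devices might look like this:

    devnode_blacklist {
            # Exclude local (non-SAN) devices from multipath handling
            devnode "^hd[a-z]"
            devnode "^cciss!c[0-9]d[0-9]*"
    }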

    You can then run:

    multipath -v2 -d

    to perform a 'dry-run' with this configuration. This will only scan the devices and print what the setup would look like.

    The output will look similar to:

    3600601607cf30e00184589a37a31d911
    [size=127 GB][features="0"][hwhandler="1 emc"]
    \_ round-robin 0 [first]
      \_ 1:0:1:2 sdav 66:240  [ready ]
      \_ 0:0:1:2 sdr  65:16   [ready ]
    \_ round-robin 0
      \_ 1:0:0:2 sdag 66:0    [ready ]
      \_ 0:0:0:2 sdc  8:32    [ready ]
    

    showing you the name of the MPIO device, its size, the features and hardware handlers involved, as well as the (in this case, two) priority groups (PG). For each PG, it shows whether it is the first (highest priority) one, the scheduling policy used to balance IO within the group, and the paths contained within the PG. For each path, its physical address (host:bus:target:lun), device node name and major:minor number are shown, and of course whether the path is currently active or not.

    Paths are grouped into priority groups; there's always just one priority group in active use. To model an active/active configuration, all paths end up in the same group; to model active/passive, the paths which should not be active in parallel will be placed in several distinct priority groups. This normally happens completely automatically on device discovery.
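
    Should the automatic grouping not match your array, the policy can usually be forced on the command line; a sketch, assuming your multipath version supports the -p switch:

    # Dry-run, forcing all paths of each LU into a single group (active/active)
    multipath -v2 -d -p multibus

    # Dry-run, forcing one path per priority group (active/passive fail-over)
    multipath -v2 -d -p failover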

  • Enabling the MPIO components

    Now run

    /etc/init.d/boot.multipath start
    /etc/init.d/multipathd start
    

    as user root. The multipath devices should now show up automatically under /dev/disk/by-name/; the default naming will be the WWN of the Logical Unit, which you can override via /etc/multipath.conf to suit your tastes.
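
    A sketch of such an override, assuming your multipath-tools version supports the multipaths section; the WWID is taken from the example output above and the alias is arbitrary:

    multipaths {
            multipath {
                    # WWID of the Logical Unit to rename
                    wwid    3600601607cf30e00184589a37a31d911
                    # The map will then show up as /dev/disk/by-name/san_data
                    alias   san_data
            }
    }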

    Run

    insserv boot.multipath multipathd
    

    to integrate the multipath setup into the boot sequence.

    From now on all access to the devices should go through the MPIO layer.

  • Querying MPIO status

    To query the current MPIO status, run

    multipath -l
    

    This will output the current status of the multipath maps in a format similar to the output of the command already explained above:

    3600601607cf30e00184589a37a31d911
    [size=127 GB][features="0"][hwhandler="1 emc"]
    \_ round-robin 0 [active][first]
      \_ 1:0:1:2 sdav 66:240  [ready ][active]
      \_ 0:0:1:2 sdr  65:16   [ready ][active]
    \_ round-robin 0 [enabled]
      \_ 1:0:0:2 sdag 66:0    [ready ][active]
      \_ 0:0:0:2 sdc  8:32    [ready ][active]
    

    However, it includes additional information about which priority group is active, disabled or enabled, as well as for each path whether it is currently active or not.

  • Tuning the fail-over with specific HBAs

    HBA timeouts are typically set up for non-MPIO environments, where longer timeouts make sense - as the only alternative would be to error out the IO and propagate the error to the application. However, with MPIO, some faults (like cable failures) should be propagated upwards as fast as possible so that the MPIO layer can quickly take action and redirect the IO to another, healthy path.

    For the QLogic 2xxx family of HBAs, the following setting in /etc/modprobe.conf.local is thus recommended:

    options qla2xxx qlport_down_retry=1 ql2xfailover=0 ql2xretrycount=5
    
  • Managing IO in error situations

    In certain scenarios, where the driver, the HBA, or the fabric experiences spurious errors, it is advisable to configure DM MPIO to queue all IO in case of errors leading to the loss of all paths, and never to propagate errors upwards.

    This can be achieved by setting

    defaults {
            default_features "1 queue_if_no_path"
    }

    in /etc/multipath.conf.

    As this will lead to IO being queued forever, unless a path is reinstated, make sure that multipathd is running and works for your scenario. Otherwise, IO might be stalled forever on the affected MPIO device, until reboot or until you manually issue a

    dmsetup message 3600601607cf30e00184589a37a31d911 0 fail_if_no_path
    				

    (substituting the correct map name), which will immediately cause all queued IO to fail. You can reactivate the queue_if_no_path feature by issuing

    dmsetup message 3600601607cf30e00184589a37a31d911 0 queue_if_no_path
    

    You can also use these two commands to switch between both modes for testing, before committing the setting to your /etc/multipath.conf.

4. Using the MPIO devices

  • Using the whole MPIO devices directly

    If you want to use the whole LUs directly (if for example you're using the SAN features to partition your storage), you can simply use the /dev/disk/by-name/xxx names directly for mkfs, /etc/fstab, your application, etc.
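
    A minimal sketch, using the WWN-based name from the example output above; the filesystem type and mount point are arbitrary examples:

    # Create a filesystem directly on the multipathed Logical Unit
    mkreiserfs /dev/disk/by-name/3600601607cf30e00184589a37a31d911

    # Corresponding /etc/fstab entry (mount point /data is just an example)
    /dev/disk/by-name/3600601607cf30e00184589a37a31d911  /data  reiserfs  defaults  0 0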

  • Using LVM2 on top of the MPIO devices

    To make LVM2 recognize the MPIO devices as possible Physical Volumes (PVs), you will have to modify /etc/lvm/lvm.conf. You will also want to modify it so that it does not scan and use the physical paths, but only accesses your MPIO storage via the MPIO layer.

    Thus, change the "filter" entry in lvm.conf as follows and add the types extension to make LVM2 recognize them:

    filter = [ "a|/dev/disk/by-name/.*|", "r|.*|" ]
    types = [ "device-mapper", 1 ] 

    This will allow LVM2 to only scan the by-name paths and reject everything else. (If you are also using LVM2 on non-MPIO devices, you will of course need to make the necessary adjustments to suit your setup.)

    You can then use pvcreate and the other LVM2 commands as usual on the /dev/disk/by-name/ path.
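
    A sketch, again using the WWN from the example output above together with hypothetical volume group and logical volume names:

    # Initialize the MPIO device as an LVM2 physical volume
    pvcreate /dev/disk/by-name/3600601607cf30e00184589a37a31d911

    # Create a volume group and a logical volume on top of it
    vgcreate san_vg /dev/disk/by-name/3600601607cf30e00184589a37a31d911
    lvcreate -L 10G -n data_lv san_vg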

  • Partitions on top of MPIO devices

    It is not currently possible to partition the MPIO devices themselves. However, if the underlying physical device is partitioned, the MPIO device will reflect those partitions and the MPIO layer will provide /dev/disk/by-name/<name>p1 ... <name>pN devices so you can access the partitions through the MPIO layer.

    So you will have to partition the devices prior to enabling MPIO. If you change the partitioning of a running system, MPIO will not automatically detect and reflect the changes; you will have to reinitialize MPIO, which, on a running system with active access to the devices, will likely require a reboot.

    Thus, using the LUNs directly or via LVM2 is recommended.

5. More information

Should you have trouble using MPIO on SLES9 SP2, please contact Novell support.

More information can be found at http://christophe.varoqui.free.fr/ for the multipath-tools package as well as http://sources.redhat.com/dm/ for the kernel Device Mapper components.


Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.
