Network RAID1

The Linux SCSI Target Wiki

Network RAID1 is supported by LinuxIO (LIO) and allows two or more LIO systems to become physically redundant in order to mask hardware or storage array failures.

A prototype of a LinuxIO/Initiator ("T/I") Repeater Node was built with DRBD volumes as described below. A "T/I Repeater Node" is a physical or virtual machine that runs both the iSCSI target and initiator stacks. The DRBD T/I Repeater Node was implemented with Open-iSCSI and LIO targets running in DomUs under Xen; the Xen DomU VMs were used to ease development.

The setup can be ported into LIO-VM. For the Initiators, both Open-iSCSI and Core-iSCSI can be used. For a multi-OS T/I repeater node, local host-OS iSCSI storage can be imported through a hypervisor into LIO-VM.
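
On the initiator side, importing a LUN exported by a Network RAID1 node with Open-iSCSI boils down to a discovery followed by a login. A minimal sketch, with a hypothetical portal address and target IQN:

# Discover the targets exported by the Network RAID1 node (portal address is hypothetical)
iscsiadm -m discovery -t sendtargets -p 192.168.1.10

# Log in so the exported LUN shows up as a local SCSI disk (e.g. /dev/sdb)
iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.bbtest2:nr1 -p 192.168.1.10 --login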

Setup

A Network RAID1 demo setup can be built with virtual machines. In an early example based on Xen, both Initiator and LIO nodes were fully redundant. The early example contained four Xen paravirtualized machines (two LIO VMs and two Initiator VMs with ext3/OCFS2) running across two physical dom0 machines, each a two-socket, dual-core x86_64 host with 8 GB of memory. The two Network RAID1 client VMs had no local storage (other than a Xen block device for the root filesystem) and accessed storage on the LIO targets through Open-iSCSI. On both of the Network RAID1 target nodes, volumes were created on top of available SCSI block devices. On the primary Network RAID1 target node, the RAID1 array was built with:

mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal /dev/LIO-NR1-Elements/NR1-Local-Element \
      --write-mostly /dev/LIO-NR1-Elements/NR1-Remote-Element
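
The two RAID1 elements referenced above are LVM logical volumes in the LIO-NR1-Elements volume group; the local element sits on a local SCSI disk and the remote element on the iSCSI disk imported from the secondary node. A sketch of how they could be prepared beforehand (device names and sizes are hypothetical):

# Local SCSI disk and the iSCSI disk imported from the secondary node
pvcreate /dev/sda3 /dev/sdb
vgcreate LIO-NR1-Elements /dev/sda3 /dev/sdb

# One element per underlying disk (sizes are illustrative)
lvcreate -L 10G -n NR1-Local-Element LIO-NR1-Elements /dev/sda3
lvcreate -L 10G -n NR1-Remote-Element LIO-NR1-Elements /dev/sdb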

In the example, the Network RAID1 volume on LIO is constructed with Linux MD RAID1, with an internal write-intent bitmap and a write-mostly element flag. The internal bitmap tracks changed blocks and allows the Network RAID1 Primary and Secondary nodes to resynchronize quickly after a node failure. The write-mostly flag is set on the Primary's remote iSCSI volume, which represents the Secondary's local storage; this ensures that READ operations coming from frontend iSCSI initiators are served from the Primary's local storage.
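
As an illustration of that recovery path, a temporary outage of the secondary might be handled roughly as follows (a sketch; the device paths follow the example above):

# Mark the remote element failed and drop it from the array while the secondary is down
mdadm /dev/md0 --fail /dev/LIO-NR1-Elements/NR1-Remote-Element
mdadm /dev/md0 --remove /dev/LIO-NR1-Elements/NR1-Remote-Element

# Once the secondary is reachable again, re-add it; only the blocks flagged in the
# write-intent bitmap since the failure need to be resynchronized
mdadm /dev/md0 --re-add /dev/LIO-NR1-Elements/NR1-Remote-Element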

The resulting Network RAID1 array looks as follows:

[root@bbtest2 ~]# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 dm-2[0] dm-3[1](W)
    10477504 blocks [2/2] [UU]
    bitmap: 1/160 pages [4KB], 32KB chunk
unused devices: <none>

From there, a new volume group (LIO-NR1-VOL) is created on the LIO-NR1 array (/dev/md0), and a new volume (NR1-PRIMARY-VOL) is created inside it.
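
A minimal sketch of those two steps, using the names from the lvs listing below (the size allocation is illustrative):

# Initialize the array as a physical volume and build a volume group on it
pvcreate /dev/md0
vgcreate LIO-NR1-VOL /dev/md0

# Single volume spanning the array; this becomes the exported storage object
lvcreate -l 100%FREE -n NR1-PRIMARY-VOL LIO-NR1-VOL

The resulting layout shows up in lvs -v as follows: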

[root@bbtest2 ~]# lvs -v
Finding all logical volumes
LV                 VG               #Seg Attr   LSize  Maj Min KMaj KMin Origin Snap%  Move Copy%  Log LV UUID                               
NR1-Local-Element  LIO-NR1-Elements    1 -wimao 10.00G 253   2 253  2                                  Qu7YhW-vdWo-IZPd-yDxP-sEbm-xM8L-y96RPD
NR1-Remote-Element LIO-NR1-Elements    1 -wimao  9.99G 253   3 253  3                                  EEQewk-dhCW-UoMY-LgIK-QV8C-5Zlx-0Hppxc
NR1-PRIMARY-VOL    LIO-NR1-VOL         1 -wimao  9.98G 253   4 253  4                                  JYElqI-kJOD-QwRo-A68B-s6X9-jw6g-Jfyy1p
LogVol00           VolGroup00          1 -wi-ao  3.75G  -1  -1 253  0                                  69PKY5-5nIM-7TZX-vjoh-4sRJ-pALn-QDzNTq
LogVol01           VolGroup00          1 -wi-ao  1.00G  -1  -1 253  1                                  UwLWHP-J3Iv-s03T-q1gk-nXbE-hvBd-If7hg0

These iSCSI volumes and LIO-NR1 volumes need to be accessible on boot by LIO-Primary, and from there, the LVM UUID is passed into a virtual iBlock (BIO Sync Ack) or FILEIO (buffered Ack) object in the LIO storage engine.

[root@bbtest2 ~]# target-ctl listluninfo tpgt=1
-----------------------------[LUN Info for iSCSI TPG 1]-----------------------------
Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 128
iBlock device: dm-4  LVM UUID: JYElqI-kJOD-QwRo-A68B-s6X9-jw6g-Jfyy1p
Major: 253 Minor: 4  CLAIMED: IBLOCK
Type: Direct-Access     ANSI SCSI revision: 02  Unit Serial: JYElqI-kJOD-QwRo-A68B-s6X9-jw6g-Jfyy1p  DIRECT  EXPORTED
iSCSI Host ID: 0 iSCSI LUN: 0  Active Cmds: 0  Total Bytes: 10716446720
ACLed iSCSI Initiator Node(s):
      iqn.1994-05.com.redhat:6e211fb3697  0 -> 0
      iqn.1994-05.com.redhat:63a52b449156  0 -> 0

For testing purposes, all four VM disk images are located on iSCSI storage on their respective host virtualization machines. This storage comes from one of the LIO target nodes and is MD RAID6 SATA with LVM2 on top of the array.

The prototype has so far proved very stable while testing the possible failure scenarios.

For production systems, we would typically expect software or hardware RAID arrays, or Linux 2.6 LVM2 block devices, to be used as the backing storage.

Performance

Throughput

Running Network RAID1 on Dom0 rather than inside a DomU increases performance.

Using LVM block devices on the DomU Primary and Secondary T/I VMs as elements of /dev/md0 on the LIO-NR1 machines seems to be a bit slower than using raw SCSI block devices. An LVM volume (NR1-PRIMARY-VOL in the prototype) is then created on top of /dev/md0, and this is the storage object that is exported to the frontside iSCSI Initiators.

There is also a concern that using an internal write intent bitmap (which is pretty much a requirement for production) with MD has performance implications.
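
One possible mitigation, sketched below, is to recreate the internal bitmap with a larger chunk size so that fewer bitmap updates hit the disks; the chunk size shown is illustrative and trades finer-grained resync for lower write overhead:

# Drop the existing internal bitmap, then recreate it with a larger chunk size (in KiB)
mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=65536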

Latency

Dedicated 1 Gb/s or 10 Gb/s ports between the Network RAID1 nodes, running jumbo frames for the replication traffic on Dom0, should help improve latency and performance by reducing the number of interrupts produced by the networking hardware.
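
As a sketch, jumbo frames are enabled by raising the MTU on the dedicated replication interface (the interface name is hypothetical and must match on both nodes and any switch in between):

# Enable jumbo frames on the dedicated replication port
ip link set dev eth1 mtu 9000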

Capacity management

The amount of Network RAID1 storage available for frontend iSCSI initiators can be managed (grown) at least as follows:

* Growing an existing LIO-NR1 volume (NR1-PRIMARY-VOL in the prototype) by building a new LIO-NR1 array out of local/remote storage objects. The frontend iSCSI initiators then have to rescan the logical unit for the new capacity and expand the partition and filesystem (see the sketch after this list).
* Creating a new LIO-NR1 array and volume and making a new iSCSI LUN available to the frontend iSCSI initiators. These initiators can then create new filesystems or extend existing logical volumes.
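
A minimal sketch of the initiator-side steps for the first case, assuming the filesystem sits directly on the exported disk (device name is hypothetical):

# Rescan all active iSCSI sessions so the initiator notices the new capacity
iscsiadm -m session --rescan

# Grow the ext3 filesystem to use the added space
resize2fs /dev/sdb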

See also

* LinuxIO, targetcli
* VMware ESX, VMware vSphere and KVM

External links

* High Availability Wikipedia entry: http://en.wikipedia.org/wiki/High_availability
* Data Replication Wikipedia entry: http://en.wikipedia.org/wiki/Replication_%28computer_science%29#Disk_storage_replication
* DRBD Wikipedia entry: http://en.wikipedia.org/wiki/DRBD
* Xen Wikipedia entry: http://en.wikipedia.org/wiki/Xen
