
What is ZFS

ZFS is an Open Source project started in 2001 by Sun Microsystems, which was later bought by Oracle. It was developed as a 128-bit filesystem with an integrated volume manager for RAID and was specially designed to work with huge amounts of data. It offers data integrity, capacity, simple administration and high speed.

ZFS offers:

  • A pool-based storage administration
  • A filesystem with integrated Volume Manager
  • End-to-end data integrity by checksums
  • Transactional operations
  • Copy-on-write data model
  • 128 bit capacity and scalability
  • Self-healing
  • Dynamic block size
  • High speed through compression and prefetching
  • Simple administration
  • Host independent on-disk format
  • Online data integrity check
  • Open Source, free availability
  • Filesystem snapshots (without limit)
  • Clones
  • Quotas and reservations
  • Mirror, RAID-Z1 to RAID-Z3 (triple redundancy)
  • Transparent compression
  • Volumes
  • NFSv3/v4, iSCSI, CIFS/SMB shares
  • Export and import storage pools
  • Storage replication over the network
  • Boot environments
  • ARC - Adaptive Replacement Cache

more: https://openzfs.org

Installation of ZFS

Generic

  • *BSD: Already included in FreeBSD and NetBSD. There are solutions based on FreeBSD and OpenZFS for commercial use, like FreeNAS and TrueOS.
  • Linux: A FUSE-based port for Linux has existed since 2005; with FUSE, booting from a ZFS volume is not possible. To boot from a ZFS volume, the boot loader must support ZFS; currently GRUB can do that. To install ZFS on Linux, please go to http://zfsonlinux.org
  • Mac OS X: There is a commercial binary for Mac OS X at https://openzfsonosx.org
  • Windows: There is a beta for Windows, which has to be compiled with Microsoft Visual Studio. The sources are at https://github.com/openzfsonwindows/ZFSin

more: https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html

Creation of a Pool

A pool can be created by typing the command:

zpool create <pool_name> [<pool_type>] <devices_for_the_pool>

for instance,

zpool create mypool /dev/ada1

After running this command, ZFS creates and mounts the filesystem under /mypool without any change to fstab(5); the mount is restored automatically after a reboot and the pool is immediately ready to be used.

The pool_type could be one of the following:

  • stripe: the pool uses the disks directly, without any RAID configuration, like RAID-0 (this is the behaviour when no pool type is given). At least one device must be given as parameter.
  • mirror: the data is mirrored between the given devices, like RAID-1. At least 2 disks must be given as parameter.
  • raidz: the pool is configured as RAID-Z1, which is equivalent to RAID-5. One disk can fail without losing any data. At least 3 disks must be used for the pool.
  • raidz2: equivalent to RAID-6, where 2 disks can fail without losing data. At least 4 disks are needed for this configuration.
  • raidz3: up to 3 disks can fail without losing data, but at least 5 disks are needed for this configuration.

Pool types can be combined into more complex configurations; for instance, a RAID-1+0 equivalent can be created with the following command:

zpool create mypool mirror /dev/ada1 /dev/ada2 mirror /dev/ada3 /dev/ada4

Showing the status of the pool

To check the status of the pool, just use the command

zpool status

which shows the name of the pool, its state, whether a scan is in progress, the configuration of the pool and whether the pool has any errors.
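
A rough sketch of what the output may look like for a healthy mirrored pool named mypool (device names and details are examples only and vary between versions):

  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0

errors: No known data errors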

ZIL and L2ARC

ZIL is the ZFS Intent Log, which is similar to a database log: before a transaction is committed, it is written to the log, so in case of an interruption it can be reconstructed. After writing to the log, ZFS reports that the data has been written in a consistent state. When the ZIL is placed on a dedicated device, that device is called a SLOG (separate log device).

ZIL will lose all information on boot up

To attach a ZIL to a pool, use the command:

zpool add <pool_name> log [<pool_type>] <devices>

for instance,

zpool add spiegel log /dev/nvm0

L2ARC is the "Level 2 Adaptive Replacement Cache", which caches the most recently used and the most frequently used data from the disk, giving the user faster access to that data. ZFS also keeps track of data that has been evicted from both lists, in case it enters one of them again.

To attach a L2ARC to the pool, use the command:

zpool add <pool_name> cache [<pool_type>] <devices>

for instance,

zpool add spiegel cache /dev/nvm0

Another useful feature of the ARC is that the data placed in it can be compressed, allowing the system to cache more data than would fit uncompressed in RAM.
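
As a rough illustration (path and field names are assumptions and vary between OpenZFS versions), on Linux the ARC statistics, including the compressed and uncompressed sizes, can be read from procfs:

grep -E '^(size|compressed_size|uncompressed_size)' /proc/spl/kstat/zfs/arcstats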

Simple administration

Adding a device to the pool

Devices are added to the pool with a single command, which allows the user to add one or more devices in the desired configuration.

The command uses the following structure:

zpool add <pool_name> ([<pool_type>] <devices>){1,…}

So not only can a couple of devices be added to the pool, but different configurations can also be added by repeating the last block ([<pool_type>] <devices>) several times.

For instance, suppose we have configured a pool with two mirrored disks and would like to add another two mirrored disks, a cache device and two mirrored SLOGs. The command looks like this:

zpool add mypool mirror /dev/ada3 /dev/ada4 cache /dev/nvm1 log mirror /dev/nvm2 /dev/nvm3

Notice that after the two mirrored disks ada3 and ada4 the keyword cache appears, and after the caching device come the keywords log and mirror, indicating that the following devices become part of the log and that they are mirrored.

Expanding the pool to a mirrored configuration

If we have a pool in a stripe configuration and would like to attach a new device in order to mirror the old disk, this can be accomplished with the following command:

zpool attach <pool_name> <device_in_pool> <new_device>

The device already in the pool is given as the first device parameter; its VDEV is expanded to a mirror configuration using both devices.
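
for instance, assuming a single-disk pool named mypool built on /dev/ada1 (names are examples only),

zpool attach mypool /dev/ada1 /dev/ada2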

Expanding the pool by exchanging disks

The capacity of the pool can be expanded just by exchanging the old disks with disks that have more capacity. For instance, suppose our pool (in a mirrored configuration) has two 1TB disks and we want to replace them with 2TB disks. To do this, we exchange the first disk, wait until the resilvering is completed, then exchange the next one, wait again for the resilvering and, at the end, expand the pool to match the new space available on the disks.

After exchanging the disks, the size of the pool is initially still the same as before the exchange. A single disk is exchanged with the command:

zpool replace <pool_name> <device> [<new_device>]

The last argument gives the administrator the possibility to tell ZFS if the new device is not located in the same place as the old one; for example, the old device may have been attached to a SATA bus and the new one to a SCSI controller.

For instance, on FreeBSD we replace a SATA disk with a SCSI one using the command:

zpool replace mypool /dev/ada1 /dev/da0

If the new disks are bigger than the old ones, zpool list will show the additional space in the column EXPANDSZ. To expand the capacity of the pool, the user should run the online command with the option -e, as described:

zpool online -e <pool_name> <devices>

for instance,

zpool online -e mypool /dev/ada1 /dev/ada2

If there is physically no more room for a new disk, we must stop all activity on a disk before taking it out of the enclosure. The disk can be deactivated with the command:

zpool offline <pool_name> <device>

The device is then inactive for the pool and ready for the exchange; after exchanging it, it can be activated again with the online command:

zpool online <pool_name> <device>

Another possibility is to deactivate a device temporarily with the flag -t, telling the system that the disk should be reattached after a reboot.
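
for instance, to take a disk offline only until the next reboot (device name is an example),

zpool offline -t mypool /dev/ada1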

Pool properties

A pool in ZFS has several properties and some of them can be tuned. For instance, the user could write a script to fetch the size of the pool; to do that, this command can be used:

zpool get <property>|all <pool_name>

If the user is not sure which properties are available for the pool, the command accepts the keyword all as property, so that every single property is displayed.
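
for instance, to read only the size of the pool created above,

zpool get size mypool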

To set some property, the user can use the command:

zpool set <property>=<value> <pool_name>

Of course, not every property can be changed; some of them are immutable, like the size of the pool. Properties can also be set during the creation of the pool by giving the -o flag to the command zpool create, for instance:

zpool create -o comment="WD Gold" data mirror /dev/ada1 /dev/ada2
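
Setting the same comment on an already existing pool would look like this:

zpool set comment="WD Gold" data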

Properties that depend on the operating system can be set using the flag -O, like the property canmount on FreeBSD; this property is useful if the pool consists of a collection of virtual devices to be used as hard disks for virtual machines.

Showing the history of the Pool

Every single command issued for the pool is stored along with its timestamp. The history can be shown using this command:

zpool history <pool_name>

If the user also wants to list which user made the change and from where, the flag -l can be used; to also see ZFS-internal information for each transaction, the -i flag can be passed along to the command.
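
for instance, to show the complete history of mypool including users and internal events,

zpool history -il mypool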

Exporting / Importing pools

Exporting and importing pools is the best way to migrate a pool from old hardware to new hardware. This operation is independent of the operating system and the architecture of the machines, on both the source and the target side; one machine could be little endian and the other one big endian. This kind of information is stored in the ZFS metadata, so the data stored on the disks can be presented to the machine in the right byte order.

ZFS also recognizes the disks of the pool independently of where they are attached. A pool can be exported using the command:

zpool export <pool_name>

The available pools to import can be shown using the command

zpool import

without any further parameters. It could happen that a device of the pool is missing or unavailable. If too many devices are missing, the pool cannot be imported because there is not enough redundancy left to reconstruct it. To import a pool, use the command:

zpool import <pool_name> [<new_pool_name>]

If the pool should be tested before importing it permanently, the -R flag can be used to mount the pool under a temporary directory. Properties of the imported pool can also be set by giving the -o <property>=<value> flag to the command, so the pool could be imported read-only, for instance.

A recently destroyed pool (Oops!) can be recovered by giving the flag -D to this command; such a pool is listed with the state ONLINE (DESTROYED).
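
for instance, to move mypool to another machine and give it a new name on import (names are examples only),

zpool export mypool
zpool import mypool newpool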

Status of the pool

The pool or its devices can be in one of several states for various reasons; the states listed below and their causes are important information to keep in mind when looking at the status of the pool.

  • Online: There is no problem on the pool.
  • Offline: Device was deactivated by the administrator.
  • Degraded: At least one device or VDEV is missing or has generated too many errors to work properly. There is still enough redundancy to use the pool, though.
  • Faulted: Corrupted data was stored or the pool is not working properly. Usually, too many devices are missing, so the remaining redundancy is not sufficient.
  • Removed: Some controllers or operating systems recognize when a device is physically removed and report it to the pool. If the device is put back, ZFS tries to attach it again.
  • Missing: The device is no longer available and is missing from the pool.
  • Replacing: The device is currently being replaced by a new one.
  • Spare: A spare device to be used when one of the active devices fails; the spare then takes its place.
  • Resilvering: The device is currently being resilvered; redundancy data is being copied to it.

Spare disks

ZFS allows the user to specify spare disks that become available when one of the disks in the pool fails or is unavailable. This works when the spare disk was added to the pool using the keyword spare. There are two possibilities: adding a spare device during creation of the pool, or afterwards. Let's take a look at both options:

zpool create mypool mirror /dev/ada1 /dev/ada2 spare /dev/ada3

zpool add mypool spare /dev/ada3

When one of the disks is missing, it could be replaced with the command:

zpool replace <pool_name> <disk-uid> <spare_disk>
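
for instance, assuming the spare /dev/ada3 added above should take over for the failed disk /dev/ada1 (device names are examples only),

zpool replace mypool /dev/ada1 /dev/ada3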

The spare disk becomes part of a new subset of VDEVs inside the pool and resilvering begins automatically. This gives the user redundancy while the replacement device is on its way, because the remaining redundancy may already be in use. Once the new device is back online, detach the spare from the pool and put it back into the spare pool with the command:

zpool detach <pool_name> <spare_disk>

Scrub

ZFS, due to its copy-on-write design, does not need fsck, because the data on disk is always in a consistent state. Nevertheless, a check for errors is still needed, because of errors such as:

  • Bit rot
  • Phantom Writes
  • Misdirected reads or writes
  • DMA parity error
  • Slight data corruption (bit flips)

To detect and repair these problems, ZFS provides a command which should be run about once per month. The syntax of this command is:

zpool scrub [-s] <pool_name> [<pool_name> …]

The -s flag allows the user to stop the scrubbing process.
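
for instance, to scrub mypool and check its progress afterwards,

zpool scrub mypool
zpool status mypool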

Roberto Fernandez Cueto 2018/05/08 10:44
