====== What is ZFS ======

ZFS is an Open Source project started in 2001 by Sun Microsystems, a company later acquired by Oracle. It was developed as a 128-bit filesystem with an integrated volume manager for RAID and was specifically designed to work with huge amounts of data. It offers data integrity, capacity, simple administration and high speed.

ZFS offers:

  * Pool-based storage administration
  * A filesystem with integrated volume manager
  * End-to-end data integrity through checksums
  * Transactional operations
  * Copy-on-write data model
  * 128-bit capacity and scalability
  * Self-healing
  * Dynamic block size
  * High speed through compression and pre-fetch
  * Simple administration
  * Host-independent on-disk format
  * Online data integrity checks
  * Open Source, freely available
  * Filesystem snapshots (without limit)
  * Clones
  * Quotas and reservations
  * Mirrors, RAID-Z1 to RAID-Z3 (triple redundancy)
  * Transparent compression
  * Volumes
  * NFSv3/v4, iSCSI, CIFS/SMB shares
  * Export and import of storage pools
  * Storage replication over the network
  * Boot environments
  * ARC - Adaptive Replacement Cache

more: https://openzfs.org

====== Installation of ZFS ======

===== Generic =====

  * *BSD: Already included in FreeBSD and NetBSD. There are solutions based on FreeBSD and OpenZFS for commercial use, like FreeNAS and TrueOS.
  * Linux: A port based on FUSE has existed since 2005; booting from a ZFS volume is not possible with the FUSE port. To boot from a ZFS volume, the boot loader has to support ZFS; currently GRUB can do that. To install ZFS on Linux, please go to http://zfsonlinux.org
  * Mac OS X: There is a port for Mac OS X available at https://openzfsonosx.org
  * Windows: There is a beta for Windows which has to be compiled with Microsoft Visual Studio. The sources are at https://github.com/openzfsonwindows/ZFSin

more: https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html

** Creation of a Pool **

A pool can be created with the command:

''zpool create <pool_name> [<pool_type>] <devices>''

for instance, ''zpool create mypool /dev/ada1''

After this command, ZFS creates the folder ''/mypool'' without needing any change to ''fstab(5)''; the pool is mounted again automatically on reboot and is immediately ready to use. The pool type can be one of the following:

  * ''stripe'': the pool uses the disks directly, without any RAID configuration, like RAID-0. At least one device must be given as a parameter; this is the default when no type keyword is given.
  * ''mirror'': the data is mirrored between the given devices, like RAID-1. At least 2 disks must be given as parameters.
  * ''raidz'': the pool is configured as RAID-Z1, which is comparable to RAID-5. One disk can fail without losing any data. At least 3 disks must be used for the pool.
  * ''raidz2'': equivalent to RAID-6, where 2 disks can fail without losing data. At least 4 disks are needed for this configuration.
  * ''raidz3'': up to 3 disks can fail without losing data, but at least 5 disks are needed for this configuration.

These configurations can be combined; for instance, a RAID-1+0 can be accomplished with the following command:

''zpool create mypool mirror /dev/ada1 /dev/ada2 mirror /dev/ada3 /dev/ada4''

====== Showing the status of the pool ======

To check the status of the pool, use the command ''zpool status'', which shows the name of the pool, its state, whether a scan is in progress, the configuration of the pool and whether the pool has any errors.
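As an illustration, the following session creates a mirrored pool and inspects it. The device names and the exact layout of the output are assumptions and vary between systems:

<code>
# create a mirrored pool from two disks (hypothetical device names)
zpool create mypool mirror /dev/ada1 /dev/ada2

# show the health and layout of the pool
zpool status mypool
#   pool: mypool
#  state: ONLINE
# config:
#         NAME        STATE     READ WRITE CKSUM
#         mypool      ONLINE       0     0     0
#           mirror-0  ONLINE       0     0     0
#             ada1    ONLINE       0     0     0
#             ada2    ONLINE       0     0     0
# errors: No known data errors
</code>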
====== ZIL and L2ARC ======

The ZIL is the ZFS Intent Log, which works like a database log: before a transaction is committed, it is written to the log, and in case of an interruption the transaction can be replayed from it. After writing to the log, ZFS can report that the data has been written safely. The ZIL can be placed on a dedicated device, which is then called a SLOG (separate intent log).

** The ZIL only protects synchronous writes that are still in flight; its contents are replayed and then discarded when the pool is imported after a crash. **

To attach a log device to a pool, use the command:

''zpool add <pool_name> log [<type>] <devices>''

for instance, ''zpool add spiegel log /dev/nvm0''

The L2ARC is the "Level 2 Adaptive Replacement Cache", which caches the most recently used and the most frequently used data from the disks, giving the user faster access to that data. ZFS also keeps ghost lists of entries that have been evicted from both lists, so it can adapt when evicted data is requested again.

To attach an L2ARC device to a pool, use the command:

''zpool add <pool_name> cache [<type>] <devices>''

for instance, ''zpool add spiegel cache /dev/nvm0''

Another nice feature of the ARC is that the cached data can be stored compressed, allowing the system to cache more data than would otherwise fit in RAM.

====== Simple administration ======

===== Adding a device to the pool =====

A single command adds one or more devices to a pool, in the desired configuration. The command has the following structure:

''zpool add <pool_name> ([<type>] <devices>){1,...}''

So not only can a couple of devices be added to the pool; several different configurations can be added at once by repeating the last block (''[<type>] <devices>''). For instance, suppose we have a pool with two mirrored disks and we would like to add another two mirrored disks, a cache device and two mirrored SLOGs. The command looks like this:

''zpool add mypool mirror /dev/ada3 /dev/ada4 cache /dev/nvm1 log mirror /dev/nvm2 /dev/nvm3''

Notice that after the two mirrored disks ''ada3'' and ''ada4'' the keyword ''cache'' appears, and after the cache device come the keywords ''log'' and ''mirror'', indicating that the following devices belong to the log and are mirrored.

===== Expanding the pool to a mirrored configuration =====

If we have a pool in a stripe configuration and want to attach a new device in order to mirror the old disk, this can be accomplished with the following command:

''zpool attach <pool_name> <device_in_pool> <new_device>''

The device already in the pool is identified by the first device parameter; both devices together then form the mirror.
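A minimal sketch (the pool and device names are assumptions) of turning a single-disk pool into a two-way mirror:

<code>
# mypool currently consists of the single disk ada1;
# attach ada2 so that both disks form a mirror
zpool attach mypool /dev/ada1 /dev/ada2

# resilvering starts automatically; watch its progress
zpool status mypool
</code>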
===== Expanding the pool by exchanging disks =====

The capacity of the pool can also be expanded by exchanging the old disks for disks with more capacity. For instance, suppose our pool consists of two mirrored 1 TB disks and we want to grow it with 2 TB disks. To do this, we exchange the first disk, wait until the resilvering is completed, then exchange the next one, wait for the resilvering again, and at the end expand the pool to match the newly available space. A disk is exchanged with the command:

''zpool replace <pool_name> <old_device> [<new_device>]''

The optional last argument lets the administrator tell ZFS where the new device is located in case it does not appear in the same place as the old one; for example, the old device may have been attached to a SATA bus and the new one to a SCSI controller. For instance, we replace a SATA disk with a SCSI one on FreeBSD with the command:

''zpool replace mypool /dev/ada1 /dev/da0''

If the new disks are bigger than the old ones, ''zpool list'' shows the expandable size in the column ''EXPANDSZ''. After exchanging the disks, the size of the pool is still the same as before the exchange; to expand the capacity of the pool, use the ''online'' command with the option ''-e'':

''zpool online -e <pool_name> <devices>''

for instance, ''zpool online -e mypool /dev/ada1 /dev/ada2''

If there is physically no room for an additional disk, all activity on the old disk must be stopped before taking it out of the frame. The disk can be deactivated with the command:

''zpool offline <pool_name> <device>''

The device is then inactive in the pool and ready for the exchange; after exchanging it, the new device can be activated again with the ''online'' command:

''zpool online <pool_name> <device>''

Another possibility is to deactivate a device only temporarily with the flag ''-t'', which tells the system that the disk should be reattached after a reboot.

====== Pool properties ======

A pool in ZFS has several properties and some of them can be tuned. For instance, a user who wants to write a script that fetches the size of the pool can use this command:

''zpool get <property>|all <pool_name>''

If the user is not sure about the available properties of the pool, the command accepts the keyword ''all'' as property, so every single property is displayed. To set a property, use the command:

''zpool set <property>=<value> <pool_name>''

Of course, not every property can be changed; some of them are immutable, like the size of the pool. Properties can also be set during the creation of the pool by giving the ''-o'' flag to the command ''zpool create'', for instance:

''zpool create -o comment="WD Gold" data mirror /dev/ada1 /dev/ada2''

File system properties of the pool's root dataset can be set with the capital ''-O'' flag, like the property ''canmount''. This property is useful if the pool consists of a collection of virtual devices to be used as hard disks for virtual machines.

====== Showing the history of the Pool ======

Every single command given to the pool is stored along with its timestamp. The history can be shown with this command:

''zpool history <pool_name>''

If the user also wants to see which user made the change and from where, the flag ''-l'' can be used, and to check ZFS-internal information for the transactions, the ''-i'' flag can be passed along to the command.
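A short illustrative session (the pool name and the property value are assumptions):

<code>
# read a single property, or all of them
zpool get size mypool
zpool get all mypool

# set a tunable property
zpool set comment="WD Gold mirror" mypool

# show the command history, including user and host (-l)
# and ZFS-internal events (-i)
zpool history -l mypool
zpool history -i mypool
</code>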
====== Exporting / Importing pools ======

Exporting and importing pools is the best way to migrate a pool from old hardware to new hardware. This operation is independent of the operating system and the architecture of the machines on either side: one machine could be little-endian and the other one big-endian. This kind of information is stored in the metadata of ZFS, so the data stored on the disks can be handed to the machine in the right bit order. ZFS also recognizes the disks of the pool independently of the location they used to be attached to. A pool can be exported with the command:

''zpool export <pool_name>''

The pools available for import can be shown with the command ''zpool import'' without any further parameters. It can happen that a device of the pool is missing or unavailable. If too many devices are missing, the pool cannot be imported, because there is not enough redundancy left to reconstruct the data. To import a pool, use the command:

''zpool import <pool_name>''

If the pool should be tested before importing it completely, the ''-R'' flag can be used to mount the pool under a temporary root directory. Properties of the pool can also be set while importing by giving the ''-o <property>=<value>'' flag to the command, so the pool can be imported read-only, for instance. A recently destroyed pool (Ooops!) can be recovered by giving the flag ''-D'' to this command; the status of such a pool is shown as ''ONLINE (DESTROYED)'' when it is listed.

====== Status of the pool ======

The pool or its devices can be in one of several states for various reasons; the states listed below and their causes are important information to have in mind when looking at the status of the pool.

  * Online: There is no problem with the pool.
  * Offline: The device was deactivated by the administrator.
  * Degraded: At least one device or VDEV is missing or has generated too many errors to work properly. There is still enough redundancy to use the pool.
  * Faulted: Corrupted data was stored or the pool is not working properly. Usually so many devices are missing that the redundancy is no longer sufficient.
  * Removed: Some controllers or operating systems recognize when a device is physically removed and report it to the pool. If the device is reinserted, ZFS tries to attach it again.
  * Missing: The device is no longer available and is missing from the pool.
  * Replacing: The device is currently being replaced by a new one.
  * Spare: A spare device; when one of the active devices fails, the spare takes its place.
  * Resilvering: The device is currently being resilvered and redundancy data is being copied onto it.

====== Spare disks ======

ZFS allows the user to specify spare disks that become available when one of the disks of the pool fails. This works when the spare disk was added to the pool using the keyword ''spare''. There are two possibilities to add a spare device, during creation or afterwards; let's take a look at both options:

''zpool create mypool mirror /dev/ada1 /dev/ada2 spare /dev/ada3''

''zpool add mypool spare /dev/ada3''

When one of the disks fails, it can be replaced with the command:

''zpool replace <pool_name> <failed_device> <spare_device>''

The spare disk becomes part of a new replacing VDEV inside the pool and the resilvering begins automatically. This restores redundancy while the permanent replacement is on its way, because the redundancy that was in use may have disappeared. Once the new device is back online, the spare can be detached and put back into the spare pool with the command:

''zpool detach <pool_name> <spare_device>''

====== Scrub ======

Due to its copy-on-write design, ZFS does not need ''fsck'', because the data on disk is always in a consistent state. Nevertheless, the system still needs a check for errors such as these:

  * Bit rot
  * Phantom writes
  * Misdirected reads or writes
  * DMA parity errors
  * Slight data corruption (bit flips)

To detect and repair these problems, ZFS provides a command which should be run about once per month. Its syntax is:

''zpool scrub [-s] <pool_name> [<pool_name> ...]''

The ''-s'' flag stops a scrub that is in progress; a new scrub can be started again later.
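A brief sketch of such a routine check (the pool name is an assumption; how the monthly run is scheduled differs between operating systems):

<code>
# start a scrub of the pool, e.g. from a monthly cron job
zpool scrub mypool

# check the progress and the error counters
zpool status mypool

# stop a running scrub if it disturbs production workloads
zpool scrub -s mypool
</code>

 --- //[[r.fernandez-cueto@bally-wulff.de|Roberto Fernandez Cueto]] 2018/05/08 10:44//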