The translation is temporarily closed for contributions due to maintenance, please come back later.

Source string Read only

(itstool) path: listitem/para
Context English State
<link xlink:href="">OpenZFS</link>
<link xlink:href="">FreeBSD Wiki - <acronym>ZFS</acronym> Tuning</link>
<link xlink:href="">Oracle Solaris <acronym>ZFS</acronym> Administration Guide</link>
<link xlink:href="">Calomel Blog - <acronym>ZFS</acronym> Raidz Performance, Capacity and Integrity</link>
<acronym>ZFS</acronym> Features and Terminology
<acronym>ZFS</acronym> is a fundamentally different file system because it is more than just a file system. <acronym>ZFS</acronym> combines the roles of file system and volume manager, enabling additional storage devices to be added to a live system and having the new space available on all of the existing file systems in that pool immediately. By combining the traditionally separate roles, <acronym>ZFS</acronym> is able to overcome previous limitations that prevented <acronym>RAID</acronym> groups being able to grow. Each top level device in a pool is called a <emphasis>vdev</emphasis>, which can be a simple disk or a <acronym>RAID</acronym> transformation such as a mirror or <acronym>RAID-Z</acronym> array. <acronym>ZFS</acronym> file systems (called <emphasis>datasets</emphasis>) each have access to the combined free space of the entire pool. As blocks are allocated from the pool, the space available to each file system decreases. This approach avoids the common pitfall with extensive partitioning where free space becomes fragmented across the partitions.
A storage <emphasis>pool</emphasis> is the most basic building block of <acronym>ZFS</acronym>. A pool is made up of one or more vdevs, the underlying devices that store the data. A pool is then used to create one or more file systems (datasets) or block devices (volumes). These datasets and volumes share the pool of remaining free space. Each pool is uniquely identified by a name and a <acronym>GUID</acronym>. The features available are determined by the <acronym>ZFS</acronym> version number on the pool.
vdev Types
<emphasis>Disk</emphasis> - The most basic type of vdev is a standard block device. This can be an entire disk (such as <filename><replaceable>/dev/ada0</replaceable></filename> or <filename><replaceable>/dev/da0</replaceable></filename>) or a partition (<filename><replaceable>/dev/ada0p3</replaceable></filename>). On FreeBSD, there is no performance penalty for using a partition rather than the entire disk. This differs from recommendations made by the Solaris documentation.
Using an entire disk as part of a bootable pool is strongly discouraged, as this may render the pool unbootable. Likewise, you should not use an entire disk as part of a mirror or <acronym>RAID-Z</acronym> vdev. These are because it is impossible to reliably determine the size of an unpartitioned disk at boot time and because there's no place to put in boot code.
<emphasis>File</emphasis> - In addition to disks, <acronym>ZFS</acronym> pools can be backed by regular files, this is especially useful for testing and experimentation. Use the full path to the file as the device path in <command>zpool create</command>. All vdevs must be at least 128 MB in size.
<emphasis>Mirror</emphasis> - When creating a mirror, specify the <literal>mirror</literal> keyword followed by the list of member devices for the mirror. A mirror consists of two or more devices, all data will be written to all member devices. A mirror vdev will only hold as much data as its smallest member. A mirror vdev can withstand the failure of all but one of its members without losing any data.
A regular single disk vdev can be upgraded to a mirror vdev at any time with <command>zpool <link linkend="zfs-zpool-attach">attach</link></command>.
<emphasis><acronym>RAID-Z</acronym></emphasis> - <acronym>ZFS</acronym> implements <acronym>RAID-Z</acronym>, a variation on standard <acronym>RAID-5</acronym> that offers better distribution of parity and eliminates the <quote><acronym>RAID-5</acronym> write hole</quote> in which the data and parity information become inconsistent after an unexpected restart. <acronym>ZFS</acronym> supports three levels of <acronym>RAID-Z</acronym> which provide varying levels of redundancy in exchange for decreasing levels of usable storage. The types are named <acronym>RAID-Z1</acronym> through <acronym>RAID-Z3</acronym> based on the number of parity devices in the array and the number of disks which can fail while the pool remains operational.
In a <acronym>RAID-Z1</acronym> configuration with four disks, each 1 TB, usable storage is 3 TB and the pool will still be able to operate in degraded mode with one faulted disk. If an additional disk goes offline before the faulted disk is replaced and resilvered, all data in the pool can be lost.
In a <acronym>RAID-Z3</acronym> configuration with eight disks of 1 TB, the volume will provide 5 TB of usable space and still be able to operate with three faulted disks. <trademark>Sun</trademark> recommends no more than nine disks in a single vdev. If the configuration has more disks, it is recommended to divide them into separate vdevs and the pool data will be striped across them.
A configuration of two <acronym>RAID-Z2</acronym> vdevs consisting of 8 disks each would create something similar to a <acronym>RAID-60</acronym> array. A <acronym>RAID-Z</acronym> group's storage capacity is approximately the size of the smallest disk multiplied by the number of non-parity disks. Four 1 TB disks in <acronym>RAID-Z1</acronym> has an effective size of approximately 3 TB, and an array of eight 1 TB disks in <acronym>RAID-Z3</acronym> will yield 5 TB of usable space.
<emphasis>Spare</emphasis> - <acronym>ZFS</acronym> has a special pseudo-vdev type for keeping track of available hot spares. Note that installed hot spares are not deployed automatically; they must manually be configured to replace the failed device using <command>zfs replace</command>.
<emphasis>Log</emphasis> - <acronym>ZFS</acronym> Log Devices, also known as <acronym>ZFS</acronym> Intent Log (<link linkend="zfs-term-zil"><acronym>ZIL</acronym></link>) move the intent log from the regular pool devices to a dedicated device, typically an <acronym>SSD</acronym>. Having a dedicated log device can significantly improve the performance of applications with a high volume of synchronous writes, especially databases. Log devices can be mirrored, but <acronym>RAID-Z</acronym> is not supported. If multiple log devices are used, writes will be load balanced across them.
<emphasis>Cache</emphasis> - Adding a cache vdev to a pool will add the storage of the cache to the <link linkend="zfs-term-l2arc"><acronym>L2ARC</acronym></link>. Cache devices cannot be mirrored. Since a cache device only stores additional copies of existing data, there is no risk of data loss.
A pool is made up of one or more vdevs, which themselves can be a single disk or a group of disks, in the case of a <acronym>RAID</acronym> transform. When multiple vdevs are used, <acronym>ZFS</acronym> spreads data across the vdevs to increase performance and maximize usable space. <_:itemizedlist-1/>
Transaction Group (<acronym>TXG</acronym>)
<emphasis>Open</emphasis> - When a new transaction group is created, it is in the open state, and accepts new writes. There is always a transaction group in the open state, however the transaction group may refuse new writes if it has reached a limit. Once the open transaction group has reached a limit, or the <link linkend="zfs-advanced-tuning-txg-timeout"><varname>vfs.zfs.txg.timeout</varname></link> has been reached, the transaction group advances to the next state.
<emphasis>Quiescing</emphasis> - A short state that allows any pending operations to finish while not blocking the creation of a new open transaction group. Once all of the transactions in the group have completed, the transaction group advances to the final state.
<emphasis>Syncing</emphasis> - All of the data in the transaction group is written to stable storage. This process will in turn modify other data, such as metadata and space maps, that will also need to be written to stable storage. The process of syncing involves multiple passes. The first, all of the changed data blocks, is the biggest, followed by the metadata, which may take multiple passes to complete. Since allocating space for the data blocks generates new metadata, the syncing state cannot finish until a pass completes that does not allocate any additional space. The syncing state is also where <emphasis>synctasks</emphasis> are completed. Synctasks are administrative operations, such as creating or destroying snapshots and datasets, that modify the uberblock are completed. Once the sync state is complete, the transaction group in the quiescing state is advanced to the syncing state.
Transaction Groups are the way changed blocks are grouped together and eventually written to the pool. Transaction groups are the atomic unit that <acronym>ZFS</acronym> uses to assert consistency. Each transaction group is assigned a unique 64-bit consecutive identifier. There can be up to three active transaction groups at a time, one in each of these three states: <_:itemizedlist-1/> All administrative functions, such as <link linkend="zfs-term-snapshot"><command>snapshot</command></link> are written as part of the transaction group. When a synctask is created, it is added to the currently open transaction group, and that group is advanced as quickly as possible to the syncing state to reduce the latency of administrative commands.
Adaptive Replacement Cache (<acronym>ARC</acronym>)
<acronym>ZFS</acronym> uses an Adaptive Replacement Cache (<acronym>ARC</acronym>), rather than a more traditional Least Recently Used (<acronym>LRU</acronym>) cache. An <acronym>LRU</acronym> cache is a simple list of items in the cache, sorted by when each object was most recently used. New items are added to the top of the list. When the cache is full, items from the bottom of the list are evicted to make room for more active objects. An <acronym>ARC</acronym> consists of four lists; the Most Recently Used (<acronym>MRU</acronym>) and Most Frequently Used (<acronym>MFU</acronym>) objects, plus a ghost list for each. These ghost lists track recently evicted objects to prevent them from being added back to the cache. This increases the cache hit ratio by avoiding objects that have a history of only being used occasionally. Another advantage of using both an <acronym>MRU</acronym> and <acronym>MFU</acronym> is that scanning an entire file system would normally evict all data from an <acronym>MRU</acronym> or <acronym>LRU</acronym> cache in favor of this freshly accessed content. With <acronym>ZFS</acronym>, there is also an <acronym>MFU</acronym> that only tracks the most frequently used objects, and the cache of the most commonly accessed blocks remains.
<acronym>L2ARC</acronym> is the second level of the <acronym>ZFS</acronym> caching system. The primary <acronym>ARC</acronym> is stored in <acronym>RAM</acronym>. Since the amount of available <acronym>RAM</acronym> is often limited, <acronym>ZFS</acronym> can also use <link linkend="zfs-term-vdev-cache">cache vdevs</link>. Solid State Disks (<acronym>SSD</acronym>s) are often used as these cache devices due to their higher speed and lower latency compared to traditional spinning disks. <acronym>L2ARC</acronym> is entirely optional, but having one will significantly increase read speeds for files that are cached on the <acronym>SSD</acronym> instead of having to be read from the regular disks. <acronym>L2ARC</acronym> can also speed up <link linkend="zfs-term-deduplication">deduplication</link> because a <acronym>DDT</acronym> that does not fit in <acronym>RAM</acronym> but does fit in the <acronym>L2ARC</acronym> will be much faster than a <acronym>DDT</acronym> that must be read from disk. The rate at which data is added to the cache devices is limited to prevent prematurely wearing out <acronym>SSD</acronym>s with too many writes. Until the cache is full (the first block has been evicted to make room), writing to the <acronym>L2ARC</acronym> is limited to the sum of the write limit and the boost limit, and afterwards limited to the write limit. A pair of <citerefentry><refentrytitle>sysctl</refentrytitle><manvolnum>8</manvolnum></citerefentry> values control these rate limits. <link linkend="zfs-advanced-tuning-l2arc_write_max"><varname>vfs.zfs.l2arc_write_max</varname></link> controls how many bytes are written to the cache per second, while <link linkend="zfs-advanced-tuning-l2arc_write_boost"><varname>vfs.zfs.l2arc_write_boost</varname></link> adds to this limit during the <quote>Turbo Warmup Phase</quote> (Write Boost).


User avatar None

New source string

FreeBSD Doc (Archived) / books_handbookEnglish

New source string 10 months ago
Browse all component changes

Source information

Source string comment
(itstool) path: listitem/para
Source string location
String age
10 months ago
Source string age
10 months ago
Translation file
books/handbook.pot, string 7019