Source string Read only

(itstool) path: sect1/para
307/3070
Context English State
_ external ref='fig1' md5='__failed__'
+---------------+
| A |
+---------------+
<imageobject> <imagedata fileref="fig1"/> </imageobject> <textobject> <_:literallayout-1/> </textobject> <textobject> <phrase>A picture</phrase> </textobject>
A represents the file—pages may be paged in and out of the file's physical media as necessary. Paging in from the disk is reasonable for a program, but we really do not want to page back out and overwrite the executable. The VM system therefore creates a second layer, B, that will be physically backed by swap space:
_ external ref='fig2' md5='__failed__'
+---------------+
| B |
+---------------+
| A |
+---------------+
On the first write to a page after this, a new page is created in B, and its contents are initialized from A. All pages in B can be paged in or out to a swap device. When the program forks, the VM system creates two new object layers—C1 for the parent, and C2 for the child—that rest on top of B:
_ external ref='fig3' md5='__failed__'
+-------+-------+
| C1 | C2 |
+-------+-------+
| B |
+---------------+
| A |
+---------------+
In this case, let's say a page in B is modified by the original parent process. The process will take a copy-on-write fault and duplicate the page in C1, leaving the original page in B untouched. Now, let's say the same page in B is modified by the child process. The process will take a copy-on-write fault and duplicate the page in C2. The original page in B is now completely hidden since both C1 and C2 have a copy and B could theoretically be destroyed if it does not represent a <quote>real</quote> file; however, this sort of optimization is not trivial to make because it is so fine-grained. FreeBSD does not make this optimization. Now, suppose (as is often the case) that the child process does an <function>exec()</function>. Its current address space is usually replaced by a new address space representing a new file. In this case, the C2 layer is destroyed:
_ external ref='fig4' md5='__failed__'
+-------+
| C1 |
+-------+-------+
| B |
+---------------+
| A |
+---------------+
In this case, the number of children of B drops to one, and all accesses to B now go through C1. This means that B and C1 can be collapsed together. Any pages in B that also exist in C1 are deleted from B during the collapse. Thus, even though the optimization in the previous step could not be made, we can recover the dead pages when either of the processes exit or <function>exec()</function>.
This model creates a number of potential problems. The first is that you can wind up with a relatively deep stack of layered VM Objects which can cost scanning time and memory when you take a fault. Deep layering can occur when processes fork and then fork again (either parent or child). The second problem is that you can wind up with dead, inaccessible pages deep in the stack of VM Objects. In our last example if both the parent and child processes modify the same page, they both get their own private copies of the page and the original page in B is no longer accessible by anyone. That page in B can be freed.
FreeBSD solves the deep layering problem with a special optimization called the <quote>All Shadowed Case</quote>. This case occurs if either C1 or C2 take sufficient COW faults to completely shadow all pages in B. Lets say that C1 achieves this. C1 can now bypass B entirely, so rather then have C1-&gt;B-&gt;A and C2-&gt;B-&gt;A we now have C1-&gt;A and C2-&gt;B-&gt;A. But look what also happened—now B has only one reference (C2), so we can collapse B and C2 together. The end result is that B is deleted entirely and we have C1-&gt;A and C2-&gt;A. It is often the case that B will contain a large number of pages and neither C1 nor C2 will be able to completely overshadow it. If we fork again and create a set of D layers, however, it is much more likely that one of the D layers will eventually be able to completely overshadow the much smaller dataset represented by C1 or C2. The same optimization will work at any point in the graph and the grand result of this is that even on a heavily forked machine VM Object stacks tend to not get much deeper then 4. This is true of both the parent and the children and true whether the parent is doing the forking or whether the children cascade forks.
The dead page problem still exists in the case where C1 or C2 do not completely overshadow B. Due to our other optimizations this case does not represent much of a problem and we simply allow the pages to be dead. If the system runs low on memory it will swap them out, eating a little swap, but that is it.
The advantage to the VM Object model is that <function>fork()</function> is extremely fast, since no real data copying need take place. The disadvantage is that you can build a relatively complex VM Object layering that slows page fault handling down a little, and you spend memory managing the VM Object structures. The optimizations FreeBSD makes proves to reduce the problems enough that they can be ignored, leaving no real disadvantage.
SWAP Layers
Private data pages are initially either copy-on-write or zero-fill pages. When a change, and therefore a copy, is made, the original backing object (usually a file) can no longer be used to save a copy of the page when the VM system needs to reuse it for other purposes. This is where SWAP comes in. SWAP is allocated to create backing store for memory that does not otherwise have it. FreeBSD allocates the swap management structure for a VM Object only when it is actually needed. However, the swap management structure has had problems historically:
Under FreeBSD 3.X the swap management structure preallocates an array that encompasses the entire object requiring swap backing store—even if only a few pages of that object are swap-backed. This creates a kernel memory fragmentation problem when large objects are mapped, or processes with large runsizes (RSS) fork.
Also, in order to keep track of swap space, a <quote>list of holes</quote> is kept in kernel memory, and this tends to get severely fragmented as well. Since the <quote>list of holes</quote> is a linear list, the swap allocation and freeing performance is a non-optimal O(n)-per-page.
It requires kernel memory allocations to take place during the swap freeing process, and that creates low memory deadlock problems.
The problem is further exacerbated by holes created due to the interleaving algorithm.
Also, the swap block map can become fragmented fairly easily resulting in non-contiguous allocations.
Kernel memory must also be allocated on the fly for additional swap management structures when a swapout occurs.
It is evident from that list that there was plenty of room for improvement. For FreeBSD 4.X, I completely rewrote the swap subsystem:
Swap management structures are allocated through a hash table rather than a linear array giving them a fixed allocation size and much finer granularity.
Rather then using a linearly linked list to keep track of swap space reservations, it now uses a bitmap of swap blocks arranged in a radix tree structure with free-space hinting in the radix node structures. This effectively makes swap allocation and freeing an O(1) operation.
The entire radix tree bitmap is also preallocated in order to avoid having to allocate kernel memory during critical low memory swapping operations. After all, the system tends to swap when it is low on memory so we should avoid allocating kernel memory at such times in order to avoid potential deadlocks.
To reduce fragmentation the radix tree is capable of allocating large contiguous chunks at once, skipping over smaller fragmented chunks.
I did not take the final step of having an <quote>allocating hint pointer</quote> that would trundle through a portion of swap as allocations were made in order to further guarantee contiguous allocations or at least locality of reference, but I ensured that such an addition could be made.

Loading…

No matching activity found.

Browse all component changes

Glossary

English English
No related strings found in the glossary.

Source information

Source string comment
(itstool) path: sect1/para
Flags
read-only
Source string location
article.translate.xml:313
String age
a year ago
Source string age
a year ago
Translation file
articles/vm-design.pot, string 39