Source string Read only

(itstool) path: answer/para

556/5560
Context English State
Under 4.X we do not use a sequential list. Instead we use a radix tree and bitmaps of swap blocks rather than ranged list nodes. We take the hit of preallocating all the bitmaps required for the entire swap area up front but it winds up wasting less memory due to the use of a bitmap (one bit per block) instead of a linked list of nodes. The use of a radix tree instead of a sequential list gives us nearly O(1) performance no matter how fragmented the tree becomes.
How is the separation of clean and dirty (inactive) pages related to the situation where you see low cache queue counts and high active queue counts in <command>systat -vm</command>? Do the systat stats roll the active and dirty pages together for the active queue count?
I do not get the following:
It is important to note that the FreeBSD VM system attempts to separate clean and dirty pages for the express reason of avoiding unnecessary flushes of dirty pages (which eats I/O bandwidth), nor does it move pages between the various page queues gratuitously when the memory subsystem is not being stressed. This is why you will see some systems with very low cache queue counts and high active queue counts when doing a <command>systat -vm</command> command.
Yes, that is confusing. The relationship is <quote>goal</quote> verses <quote>reality</quote>. Our goal is to separate the pages but the reality is that if we are not in a memory crunch, we do not really have to.
What this means is that FreeBSD will not try very hard to separate out dirty pages (inactive queue) from clean pages (cache queue) when the system is not being stressed, nor will it try to deactivate pages (active queue -&gt; inactive queue) when the system is not being stressed, even if they are not being used.
In the <citerefentry><refentrytitle>ls</refentrytitle><manvolnum>1</manvolnum></citerefentry> / <command>vmstat 1</command> example, would not some of the page faults be data page faults (COW from executable file to private page)? I.e., I would expect the page faults to be some zero-fill and some program data. Or are you implying that FreeBSD does do pre-COW for the program data?
A COW fault can be either zero-fill or program-data. The mechanism is the same either way because the backing program-data is almost certainly already in the cache. I am indeed lumping the two together. FreeBSD does not pre-COW program data or zero-fill, but it <emphasis>does</emphasis> pre-map pages that exist in its cache.
In your section on page table optimizations, can you give a little more detail about <literal>pv_entry</literal> and <literal>vm_page</literal> (or should vm_page be <literal>vm_pmap</literal>—as in 4.4, cf. pp. 180-181 of McKusick, Bostic, Karel, Quarterman)? Specifically, what kind of operation/reaction would require scanning the mappings?
How does Linux do in the case where FreeBSD breaks down (sharing a large file mapping over many processes)?
A <literal>vm_page</literal> represents an (object,index#) tuple. A <literal>pv_entry</literal> represents a hardware page table entry (pte). If you have five processes sharing the same physical page, and three of those processes's page tables actually map the page, that page will be represented by a single <literal>vm_page</literal> structure and three <literal>pv_entry</literal> structures.
<literal>pv_entry</literal> structures only represent pages mapped by the MMU (one <literal>pv_entry</literal> represents one pte). This means that when we need to remove all hardware references to a <literal>vm_page</literal> (in order to reuse the page for something else, page it out, clear it, dirty it, and so forth) we can simply scan the linked list of <literal>pv_entry</literal>'s associated with that <literal>vm_page</literal> to remove or modify the pte's from their page tables.
Under Linux there is no such linked list. In order to remove all the hardware page table mappings for a <literal>vm_page</literal> linux must index into every VM object that <emphasis>might</emphasis> have mapped the page. For example, if you have 50 processes all mapping the same shared library and want to get rid of page X in that library, you need to index into the page table for each of those 50 processes even if only 10 of them have actually mapped the page. So Linux is trading off the simplicity of its design against performance. Many VM algorithms which are O(1) or (small N) under FreeBSD wind up being O(N), O(N^2), or worse under Linux. Since the pte's representing a particular page in an object tend to be at the same offset in all the page tables they are mapped in, reducing the number of accesses into the page tables at the same pte offset will often avoid blowing away the L1 cache line for that offset, which can lead to better performance.
FreeBSD has added complexity (the <literal>pv_entry</literal> scheme) in order to increase performance (to limit page table accesses to <emphasis>only</emphasis> those pte's that need to be modified).
But FreeBSD has a scaling problem that Linux does not in that there are a limited number of <literal>pv_entry</literal> structures and this causes problems when you have massive sharing of data. In this case you may run out of <literal>pv_entry</literal> structures even though there is plenty of free memory available. This can be fixed easily enough by bumping up the number of <literal>pv_entry</literal> structures in the kernel config, but we really need to find a better way to do it.
In regards to the memory overhead of a page table verses the <literal>pv_entry</literal> scheme: Linux uses <quote>permanent</quote> page tables that are not throw away, but does not need a <literal>pv_entry</literal> for each potentially mapped pte. FreeBSD uses <quote>throw away</quote> page tables but adds in a <literal>pv_entry</literal> structure for each actually-mapped pte. I think memory utilization winds up being about the same, giving FreeBSD an algorithmic advantage with its ability to throw away page tables at will with very low overhead.
Finally, in the page coloring section, it might help to have a little more description of what you mean here. I did not quite follow it.
Do you know how an L1 hardware memory cache works? I will explain: Consider a machine with 16MB of main memory but only 128K of L1 cache. Generally the way this cache works is that each 128K block of main memory uses the <emphasis>same</emphasis> 128K of cache. If you access offset 0 in main memory and then offset 128K in main memory you can wind up throwing away the cached data you read from offset 0!
Now, I am simplifying things greatly. What I just described is what is called a <quote>direct mapped</quote> hardware memory cache. Most modern caches are what are called 2-way-set-associative or 4-way-set-associative caches. The set-associatively allows you to access up to N different memory regions that overlap the same cache memory without destroying the previously cached data. But only N.
So if I have a 4-way set associative cache I can access offset 0, offset 128K, 256K and offset 384K and still be able to access offset 0 again and have it come from the L1 cache. If I then access offset 512K, however, one of the four previously cached data objects will be thrown away by the cache.
It is extremely important… <emphasis>extremely</emphasis> important for most of a processor's memory accesses to be able to come from the L1 cache, because the L1 cache operates at the processor frequency. The moment you have an L1 cache miss and have to go to the L2 cache or to main memory, the processor will stall and potentially sit twiddling its fingers for <emphasis>hundreds</emphasis> of instructions worth of time waiting for a read from main memory to complete. Main memory (the dynamic ram you stuff into a computer) is <emphasis>slow</emphasis>, when compared to the speed of a modern processor core.
Ok, so now onto page coloring: All modern memory caches are what are known as <emphasis>physical</emphasis> caches. They cache physical memory addresses, not virtual memory addresses. This allows the cache to be left alone across a process context switch, which is very important.
But in the <trademark class="registered">UNIX</trademark> world you are dealing with virtual address spaces, not physical address spaces. Any program you write will see the virtual address space given to it. The actual <emphasis>physical</emphasis> pages underlying that virtual address space are not necessarily physically contiguous! In fact, you might have two pages that are side by side in a processes address space which wind up being at offset 0 and offset 128K in <emphasis>physical</emphasis> memory.
A program normally assumes that two side-by-side pages will be optimally cached. That is, that you can access data objects in both pages without having them blow away each other's cache entry. But this is only true if the physical pages underlying the virtual address space are contiguous (insofar as the cache is concerned).
This is what Page coloring does. Instead of assigning <emphasis>random</emphasis> physical pages to virtual addresses, which may result in non-optimal cache performance, Page coloring assigns <emphasis>reasonably-contiguous</emphasis> physical pages to virtual addresses. Thus programs can be written under the assumption that the characteristics of the underlying hardware cache are the same for their virtual address space as they would be if the program had been run directly in a physical address space.
Note that I say <quote>reasonably</quote> contiguous rather than simply <quote>contiguous</quote>. From the point of view of a 128K direct mapped cache, the physical address 0 is the same as the physical address 128K. So two side-by-side pages in your virtual address space may wind up being offset 128K and offset 132K in physical memory, but could also easily be offset 128K and offset 4K in physical memory and still retain the same cache performance characteristics. So page-coloring does <emphasis>not</emphasis> have to assign truly contiguous pages of physical memory to contiguous pages of virtual memory, it just needs to make sure it assigns contiguous pages from the point of view of cache performance and operation.

Loading…

No matching activity found.

Browse all component changes

Glossary

English English
No related strings found in the glossary.

Source information

Source string comment

(itstool) path: answer/para

Flags
read-only
Source string location
article.translate.xml:808
String age
a year ago
Source string age
a year ago
Translation file
articles/vm-design.pot, string 98