Implementation on <trademark class="registered">Linux</trademark> i386
There are two main ways of setting up TLS in <trademark class="registered">Linux</trademark>. It can be set up when cloning a process with the <function>clone</function> syscall, or a thread can call <function>set_thread_area</function>. When a process passes the <literal>CLONE_SETTLS</literal> flag to <function>clone</function>, the kernel expects the memory pointed to by the <varname>%esi</varname> register to hold a <trademark class="registered">Linux</trademark> user space representation of a segment, which is translated to the machine representation of a segment and loaded into a GDT slot. The GDT slot can be specified by number, or -1 can be passed, meaning that the system itself should choose the first free slot. In practice, the vast majority of programs use only one TLS entry and do not care about the number of the entry. We exploit this in the emulation and in fact depend on it.
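To make the translation step concrete, the following is a minimal sketch of how such a user space representation could be packed into the two 32-bit words of an x86 segment descriptor. The names <literal>sketch_user_desc</literal> and <function>sketch_encode_descriptor</function> are illustrative and this is not the kernel's own code; the field names are assumed to mirror the <trademark class="registered">Linux</trademark> user space structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copy of the Linux user space segment representation. */
struct sketch_user_desc {
    unsigned int entry_number;          /* GDT slot, or -1 for "choose one" */
    unsigned int base_addr;             /* linear base address of the segment */
    unsigned int limit;                 /* 20-bit segment limit */
    unsigned int seg_32bit       : 1;
    unsigned int contents        : 2;
    unsigned int read_exec_only  : 1;
    unsigned int limit_in_pages  : 1;
    unsigned int seg_not_present : 1;
    unsigned int useable         : 1;
};

/* Pack the fields into the low and high words of an x86 descriptor. */
void
sketch_encode_descriptor(const struct sketch_user_desc *ud,
    uint32_t *low, uint32_t *high)
{
    assert(ud != NULL && low != NULL && high != NULL);

    *low = ((uint32_t)(ud->base_addr & 0xffffu) << 16) |
        (ud->limit & 0xffffu);
    *high = (ud->base_addr & 0xff000000u) |
        ((ud->base_addr & 0x00ff0000u) >> 16) |
        (ud->limit & 0x000f0000u) |
        ((uint32_t)(ud->read_exec_only ^ 1u) << 9) |    /* writable bit */
        ((uint32_t)ud->contents << 10) |                /* segment type */
        (1u << 12) |                                    /* code/data segment */
        (3u << 13) |                                    /* DPL 3 (user mode) */
        ((uint32_t)(ud->seg_not_present ^ 1u) << 15) |  /* present bit */
        ((uint32_t)ud->useable << 20) |                 /* available bit */
        ((uint32_t)ud->seg_32bit << 22) |               /* 32-bit segment */
        ((uint32_t)ud->limit_in_pages << 23);           /* page granularity */
}
```

The entry number is deliberately not part of the encoded words: it only selects which GDT slot receives the descriptor.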
Emulation of <trademark class="registered">Linux</trademark> TLS
TLS for the current thread is loaded by calling <function>set_thread_area</function>, while TLS for a second process created by <function>clone</function> is loaded in a separate block in <function>clone</function>. Those two functions are very similar. The only difference is the actual loading of the GDT segment, which happens on the next context switch for the newly created process, while <function>set_thread_area</function> must load it directly. The code basically does the following. It copies the <trademark class="registered">Linux</trademark> form of the segment descriptor from userland. The code checks the number of the descriptor, but because this differs between FreeBSD and <trademark class="registered">Linux</trademark> we fake it a little. We only support indexes of 6, 3 and -1. The 6 is the genuine <trademark class="registered">Linux</trademark> number, 3 is the genuine FreeBSD one and -1 means autoselection. Then we set the descriptor number to the constant 3 and copy this out to userspace. We rely on the userspace process using the number from the descriptor, but this works most of the time (we have never seen a case where this did not work) as the userspace process typically passes in -1. Then we convert the descriptor from the <trademark class="registered">Linux</trademark> form to a machine-dependent form (i.e. an operating system independent form) and copy this to the FreeBSD-defined segment descriptor. Finally we can load it. We assign the descriptor to the thread's PCB (process control block) and load the <varname>%gs</varname> segment using <function>load_gs</function>. This loading must be done in a critical section so that nothing can interrupt us. The <literal>CLONE_SETTLS</literal> case works exactly like this, except that the loading using <function>load_gs</function> is not performed.
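The descriptor-number handling described above can be sketched roughly as follows. This is a simplified illustration and not the actual emulation code: the structure, the constant names and the choice of <literal>EINVAL</literal> for unsupported indexes are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names; the values are the ones given in the text. */
#define SKETCH_LINUX_TLS_ENTRY     6    /* genuine Linux number */
#define SKETCH_FREEBSD_TLS_ENTRY   3    /* genuine FreeBSD number */
#define SKETCH_TLS_ENTRY_AUTO    (-1)   /* autoselection */

/* Simplified stand-in for the descriptor copied in from userland. */
struct sketch_tls_request {
    int          entry_number;
    unsigned int base_addr;
    unsigned int limit;
};

/*
 * Accept only the supported indexes and rewrite the request to the
 * single slot the emulation uses. Returns 0 on success; an unsupported
 * index leaves the request untouched and returns EINVAL (assumed).
 */
int
sketch_fixup_tls_entry(struct sketch_tls_request *req)
{
    assert(req != NULL);

    switch (req->entry_number) {
    case SKETCH_LINUX_TLS_ENTRY:
    case SKETCH_FREEBSD_TLS_ENTRY:
    case SKETCH_TLS_ENTRY_AUTO:
        /* This value is what gets copied back out to userspace. */
        req->entry_number = SKETCH_FREEBSD_TLS_ENTRY;
        return (0);
    default:
        return (EINVAL);
    }
}
```

Because every accepted request ends up with the same number, a caller that reads the number back from the descriptor always sees 3, which is the behaviour the emulation relies on.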
The segment used for this (segment number 3) is shared between FreeBSD processes and <trademark class="registered">Linux</trademark> processes, so the <trademark class="registered">Linux</trademark> emulation layer does not add any overhead over plain FreeBSD.
The amd64 implementation is similar to the i386 one, but there was initially no 32-bit segment descriptor used for this purpose (hence not even native 32-bit TLS users worked), so we had to add such a segment and implement its loading on every context switch (when a flag signaling use of 32-bit is set). Apart from this, the TLS loading is exactly the same; only the segment numbers are different, and the descriptor format and the loading differ slightly.
Introduction to synchronization
Threads need some kind of synchronization and <trademark class="registered">POSIX</trademark> provides several primitives: mutexes for mutual exclusion, read-write locks for mutual exclusion with a biased ratio of reads and writes, and condition variables for signaling a status change. It is interesting to note that the <trademark class="registered">POSIX</trademark> threading API lacks support for semaphores. The implementations of those synchronization routines depend heavily on the type of threading support we have. In the pure 1:M (userspace) model the implementation can be done solely in userspace and thus be very fast (the condition variables will probably end up being implemented using signals, i.e. not fast) and simple. In the 1:1 model, the situation is also quite clear: the threads must be synchronized using kernel facilities (which is very slow because a syscall must be performed). The mixed M:N scenario either combines the first and second approach or relies solely on the kernel. Thread synchronization is a vital part of thread-enabled programming and its performance can affect the resulting program a lot. Recent benchmarks on the FreeBSD operating system showed that an improved sx_lock implementation yielded a 40% speedup in <firstterm>ZFS</firstterm> (a heavy sx user). This is in-kernel code, but it shows clearly how important the performance of synchronization primitives is.
Threaded programs should be written with as little contention on locks as possible. Otherwise, instead of doing useful work, the thread just waits on a lock. Because of this, most well-written threaded programs show little lock contention.
Futexes introduction
<trademark class="registered">Linux</trademark> implements 1:1 threading, i.e. it has to use in-kernel synchronization primitives. As stated earlier, well-written threaded programs have little lock contention. So a typical sequence could be performed as two atomic operations, an increase and a decrease of the mutex reference counter, which is very fast, as presented by the following example:
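A minimal sketch of such a sequence, assuming <trademark class="registered">POSIX</trademark> threads; the names <varname>counter_mutex</varname>, <varname>shared_counter</varname> and <function>sketch_increment</function> are illustrative only:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Increment the shared counter under the mutex and return the new value. */
long
sketch_increment(void)
{
    long value;
    int error;

    error = pthread_mutex_lock(&counter_mutex);     /* first mutex call */
    assert(error == 0);
    value = ++shared_counter;                       /* some code */
    error = pthread_mutex_unlock(&counter_mutex);   /* second mutex call */
    assert(error == 0);
    return (value);
}
```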
1:1 threading forces us to perform two syscalls for those mutex calls, which is very slow.
The solution <trademark class="registered">Linux</trademark> 2.6 implements is called futexes. Futexes implement the check for contention in userspace and call kernel primitives only in case of contention. Thus the typical case takes place without any kernel intervention. This yields a reasonably fast and flexible implementation of synchronization primitives.
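The split between the userspace check and the kernel call can be illustrated with a small mutex sketch. It assumes a <trademark class="registered">Linux</trademark> system that provides <literal>SYS_futex</literal> and <literal>&lt;linux/futex.h&gt;</literal>, uses a commonly described three-state design, and is not the implementation used by any particular threading library; all <literal>sketch_</literal> names, including the call counter, are hypothetical.

```c
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* State: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
typedef struct {
    atomic_int state;
} sketch_mutex;

/* Counts how often this sketch enters the kernel (illustration only). */
static atomic_long sketch_kernel_calls;

static long
sketch_futex(atomic_int *uaddr, int op, int val)
{
    atomic_fetch_add(&sketch_kernel_calls, 1);
    return (syscall(SYS_futex, (int *)uaddr, op, val, NULL, NULL, 0));
}

void
sketch_lock(sketch_mutex *m)
{
    int c = 0;

    assert(m != NULL);
    /* Fast path: an uncontended lock is one atomic operation. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        sketch_futex(&m->state, FUTEX_WAIT, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void
sketch_unlock(sketch_mutex *m)
{
    assert(m != NULL);
    /* Fast path: nobody waited, so going from 1 to 0 is enough. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        /* Slow path: there may be waiters; release and wake one. */
        atomic_store(&m->state, 0);
        sketch_futex(&m->state, FUTEX_WAKE, 1);
    }
}
```

In the uncontended case a lock and unlock pair completes with two atomic operations and leaves <varname>sketch_kernel_calls</varname> at zero; the kernel is only entered once a second thread finds the mutex already held.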
Futex API
The futex syscall looks like this:
int futex(void *uaddr, int op, int val, struct timespec *timeout, void *uaddr2, int val3);
In this example <varname>uaddr</varname> is the address of the mutex in userspace, <varname>op</varname> is the operation we are about to perform, and the other parameters have a per-operation meaning.
Futexes implement the following operations:
<literal>FUTEX_WAIT</literal>: This operation verifies that the value <varname>val</varname> is stored at address <varname>uaddr</varname>. If not, <literal>EWOULDBLOCK</literal> is returned; otherwise the thread is queued on the futex and suspended. If the argument <varname>timeout</varname> is non-NULL it specifies the maximum time to sleep; otherwise the sleep is unbounded.
<literal>FUTEX_WAKE</literal>: This operation takes the futex at <varname>uaddr</varname> and wakes up the first <varname>val</varname> threads queued on it.
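The wait and wake operations described above could be wrapped as in the following sketch. It assumes a <trademark class="registered">Linux</trademark> system where the call is reached through <function>syscall</function> with <literal>SYS_futex</literal>; the wrapper names <function>sketch_futex_wait</function> and <function>sketch_futex_wake</function> are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Sleep on *uaddr if it still holds expected. A NULL timeout means the
 * sleep is unbounded. Returns -1 with errno set to EWOULDBLOCK when the
 * value at uaddr does not match.
 */
long
sketch_futex_wait(uint32_t *uaddr, uint32_t expected)
{
    assert(uaddr != NULL);
    return (syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
        NULL, NULL, 0));
}

/* Wake at most count threads queued on uaddr; returns how many woke up. */
long
sketch_futex_wake(uint32_t *uaddr, int count)
{
    assert(uaddr != NULL);
    return (syscall(SYS_futex, uaddr, FUTEX_WAKE, count,
        NULL, NULL, 0));
}
```

Calling <function>sketch_futex_wait</function> with an expected value that differs from the stored one returns immediately instead of sleeping, which is the check that lets userspace code avoid lost wakeups.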

