Current patches
The purpose of this page is to keep people better informed about
ongoing work between UML releases by making the patches currently in
my working pool visible to the public. This should alleviate several
issues with UML development:
- Not infrequently, someone finds a bug in UML, chases it down, and
  submits a patch, not knowing that the bug has already been fixed in my
  tree. Since my working tree isn't public until a release, there was
  no way for anyone to know that the bug was already fixed.
- Also not infrequently, the fixes in my tree are incomplete or wrong in
  some other way. Having those patches available before the release
  gives UML developers a way to test and sanity-check the patches before
  they are released to the public.
- Having the patches in a release split out makes it easier to fix new
  bugs by allowing users to back out patches until the bug disappears.
  Then we know which patch was responsible and can probably figure out
  the problem quickly. This also allows non-expert users to help track
  things down since the only expertise needed is the ability to run
  patch and build UML from source.
To this end, I've started using quilt to
manage patches, and will publish the unreleased patches in my current
tree here. It will be updated frequently so that there will only be a
short window between me putting a patch in my tree and it appearing
here.
So, here are the patches pending in my 2.6 tree. I gave up the
pretense that I'm still supporting 2.4, so those patches are gone.
These patches apply most easily using quilt on the full tarball. If
you're not a quilt user, the "series" file in the tarball gives you
the order in which to apply the patches.
fix-config |
Last Changed - Mon Nov 19 14:50:48 EST 2007 |
tlb-build |
Last Changed - Mon Nov 19 10:41:54 EST 2007 |
page-flags |
Last Changed - Mon Jan 7 12:52:48 EST 2008 |
Signals aren't being properly notified to ptrace on x86_64.
This patch pulls the addition of the openflags.d field from externfs.
This will be merged with the o_direct patch when it is sent to mainline.
These are AIO changes needed by the ubd driver and humfs.
One major problem fixed here is -EAGAIN handling. It is not enough
to simply pass the error back to the driver so it can retry later.
The ubd driver retries in its interrupt routine, the theory being
that it knows that some requests have been finished on the host, so
there is room to queue some more. The problem is that all the
host AIO requests may be humfs requests, in which case the ubd
interrupt handler will never be called, since it had nothing
pending.
The solution is to centralize the kernel side of AIO request
completion. Rather than have the aio thread send finished requests
directly to the driver which submitted them, they now go to a new
AIO IRQ handler, which sends finished requests to the appropriate
driver. When the host says -EAGAIN, the driver registers a restart
handler with the AIO subsystem. When a host AIO request finishes,
the AIO IRQ handler calls all registered restart handlers. So, a
driver will be notified when new requests can be queued, even if
it's not the one which clogged up the host queue in the first place.
This required a bunch of changes in the ubd driver and humfs. The
interrupt routines are drastically different, as they are no longer
directly called from the IRQ system. They take a list of completed
requests and finish them off. They register a structure at boot
time with the AIO subsystem. This is used as the start of the list
of completed requests. When there are requests mixed together from
different drivers, they need to be separated into different lists,
and the aio_driver is used for this.
aio_thread_reply is gone, as the err field was redundant. Once it's
gone, the aio_context is all that's left, so we might as well just
write aio_contexts between the kernel and aio thread.
UBD_IRQ and HUMFS_IRQ are no more, being replaced by AIO_IRQ.
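The dispatch scheme described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: apart from aio_context and aio_driver, every name here (the struct fields, aio_register_driver, aio_register_restart, aio_irq_handler, and the two toy drivers) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Field and function names are hypothetical; they only mirror the text. */
struct aio_driver;

struct aio_context {
	struct aio_driver *driver;   /* driver which submitted the request */
	struct aio_context *next;    /* link in a list of completed requests */
};

struct aio_driver {
	void (*handler)(struct aio_context *done); /* finishes a list of requests */
	void (*restart)(void);       /* non-NULL while waiting out an -EAGAIN */
	struct aio_context *done;    /* per-driver list built during dispatch */
	struct aio_driver *next;     /* list of registered drivers */
};

static struct aio_driver *drivers;

/* Called once per driver at boot time. */
static void aio_register_driver(struct aio_driver *drv)
{
	assert(drv != NULL && drv->handler != NULL);
	drv->done = NULL;
	drv->restart = NULL;
	drv->next = drivers;
	drivers = drv;
}

/* Called by a driver after the host returned -EAGAIN. */
static void aio_register_restart(struct aio_driver *drv, void (*restart)(void))
{
	drv->restart = restart;
}

/* The central handler: takes the mixed list of finished requests. */
static void aio_irq_handler(struct aio_context *finished)
{
	struct aio_driver *drv;

	/* Separate the mixed list into one list per driver. */
	while (finished != NULL) {
		struct aio_context *next = finished->next;

		finished->next = finished->driver->done;
		finished->driver->done = finished;
		finished = next;
	}

	/* Hand each driver its own completed requests. */
	for (drv = drivers; drv != NULL; drv = drv->next) {
		struct aio_context *done = drv->done;

		drv->done = NULL;
		if (done != NULL)
			drv->handler(done);
	}

	/* The host has room again, so wake every driver that was waiting. */
	for (drv = drivers; drv != NULL; drv = drv->next) {
		void (*restart)(void) = drv->restart;

		drv->restart = NULL;
		if (restart != NULL)
			restart();
	}
}

/* Two toy drivers standing in for ubd and humfs. */
static int ubd_finished, humfs_finished, ubd_restarts;

static int count(struct aio_context *done)
{
	int n = 0;

	for (; done != NULL; done = done->next)
		n++;
	return n;
}

static void ubd_handler(struct aio_context *done) { ubd_finished += count(done); }
static void humfs_handler(struct aio_context *done) { humfs_finished += count(done); }
static void ubd_restart(void) { ubd_restarts++; }
```

The point of the sketch is the last loop: a driver that registered a restart handler gets called even when every completed request belonged to some other driver, which is the case the old per-driver interrupt routines could not handle.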
externfs |
Last Changed - Mon Nov 19 20:02:35 EST 2007 |
This is the externfs/new hostfs/humfs patch. hostfs now seems to be stable.
The old hostfs will continue to exist until this one is as functional and
stable as the old one.
This deletes the old hostfs. This will be sent to mainline when the
externfs-based hostfs seems stable.
switch-pipe |
Last Changed - Mon Nov 19 10:56:24 EST 2007 |
This fixes the interface of make_pipe, which doesn't need to initialize
filehandles. Instead, it is just a wrapper around pipe which reclaims
descriptors if the initial call to pipe fails with -EMFILE.
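A minimal sketch of such a wrapper, assuming a hypothetical reclaim_descriptors hook (stubbed out here) and a -errno return convention; the signature of the real make_pipe may differ.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical hook: close some descriptors the caller can do without
 * and return how many were freed. This stub frees nothing.
 */
static int reclaim_descriptors(void)
{
	return 0;
}

/* Returns 0 and fills fds[] on success, -errno on failure. */
static int make_pipe(int fds[2])
{
	int err;

	assert(fds != NULL);

	if (pipe(fds) == 0)
		return 0;

	/* Save errno before calling anything which might clobber it. */
	err = errno;

	/* Only descriptor exhaustion is worth a second attempt. */
	if (err != EMFILE || reclaim_descriptors() == 0)
		return -err;

	if (pipe(fds) == 0)
		return 0;
	return -errno;
}
```

The retry happens at most once, so a reclaim hook which frees too little can't turn this into a loop.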
x11-fb |
Last Changed - Mon Nov 19 10:59:25 EST 2007 |
X11 framebuffer driver from Gerd Knorr.
You have to enable CONFIG_FB (UML-specific options/Graphics support/
Support for frame buffer devices), disable CONFIG_VGA_CONSOLE
(UML-specific options/Graphics support/Console display driver support/
VGA text console), and enable Framebuffer Console support (in the same
place), plus some fonts. You also seem to have to put 'x11=<width>x<height>'
on the command line.
From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Fix of a wrong condition.
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
This patch adds vm_area structs for stub-code and stub-data.
So, stub-area is displayed in /proc/XXX/maps. Also, stub-pages
are accessible for debuggers via ptrace now.
Linux has a gate-vma concept, that unfortunately supports one
gate-vma only. Thus, there need to be done some changes in
mm/memory.c and fs/proc/task_mmu.c.
This patch avoids the mainline changes by using some dirty tricks.
So, the patch is for testing only, mainline should be changed
to support more than one gate-vma.
From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
To support different subarches, UML must not use the same
address for jiffies and jiffies_64 in a hardcoded way.
I added JIFFIES_OFFSET to handle different arches. For
current arches, it is set to 0, for s390 it will be set to 4.
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
s390 syscalls might be done by a "svc X" instruction (2 bytes
in size) or a "exec X,Y" instruction (4 bytes in size).
There is no way to read the size of the instruction via ptrace,
so UML/s390 can't do syscall-restarting by resetting instruction
pointer to the value before the syscall.
Also, in most cases syscall number is hardcoded in the "SVC X"
instruction, so there is no way to handle ERESTART_RESTARTBLOCK
correctly by *really* restarting the syscall.
s390 host has implemented TIF_RESTART_SVC-flag to handle the
latter case.
In UML we have to use TIF_RESTART_SVC for both cases.
This patch implements TIF_RESTART_SVC in UML.
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
s390 normally doesn't support a method, that allows us to force
the host to skip its syscall restart handling.
I implemented a new method in the host, which also is posted to
LKML to hopefully be inserted in s390 mainline.
To check availability of this change, I added a new check, which
is done in a slightly different way for the other arches, too.
Success in check_ptrace() and success in the new check are
absolutely necessary for UML to run in any mode.
So I changed the sequence of checks to:
1) check_ptrace() being called at startup very early
2) check_ptrace() calls the new check, too
3) can_do_skas() is called after check_ptrace()
check_ptrace() will never return, if it fails, but it now uses
printf() and exit() instead of panic().
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
s390 doesn't have PTRACE_GETREGS and friends, but has
PTRACE_[PEEK|POKE]USR_AREA to let user of ptrace() read or write
struct user as he wants.
So we need to support this operation conditionally.
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
In s390, fpregs are not reset in signal handlers.
Thus we may stop stub_segv_handler on s390 with a breakpoint instruction
instead of calling getpid, kill and sigreturn.
To make this run, we must not mask any signals in stub_segv_handler.
So I added conditional execution of set_handler in userspace_tramp
depending on ARCH_STUB_NO_SIGRETURN. If this macro isn't defined,
the code remains unchanged, else no signals for sa_mask are defined
and SA_NODEFER is added to flags.
Using the change, we also no longer need to care about correct stack
pointer for sigreturn, which would cause some nasty code on s390.
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
A small stub optimization. I forget what the reasoning was, needs
more thought.
s390 |
Last Changed - Tue Oct 24 12:30:58 EDT 2006 |
This patch adds s390 (31-bit) to UML.
SKAS0 and SKAS3 are tested a bit: at least the system boots and
shuts down correctly, the network (tun/tap) works, and we could even
start YaST on it.
Notes:
We use a host running SuSE SLES8 with a "private" kernel named
2.4.21-fsc.11, that contains special adaptions and drivers for
Fujitsu-Siemens mainframes. This means, our current SKAS3-patch
doesn't fit to vanilla kernels. We will create a reworked patch for
vanilla later (2.4 and 2.6).
Our 2.4 kernel also contains two fixes and one enhancement, that all
are essential to make UML run, even in SKAS0. The enhancement is to
support PT_TRACESYSGOOD, that generally is available in 2.6 kernels
but not in 2.4 for s390. So I would suggest to use 2.6 host for the
moment.
The fixes meanwhile are included into mainline. I don't know precisely
the first version containing them, in 2.6.13-rc5 they are present.
As those two patches are very small, they are inserted here as a
comment (AFAICS, it's easy to make the changes by hand on older kernel
versions):
First patch to fix signal stack handling:
--- a/arch/s390/kernel/signal.c 2005-03-22 11:07:39.000000000 +0100
+++ b/arch/s390/kernel/signal.c 2005-03-22 11:08:44.000000000 +0100
@@ -285,7 +285,7 @@
/* This is the X/Open sanctioned signal stack switching. */
if (ka->sa.sa_flags & SA_ONSTACK) {
- if (! on_sig_stack(sp))
+ if (! sas_ss_flags(sp))
sp = current->sas_ss_sp + current->sas_ss_size;
}
Second patch to allow skipping of syscall restart:
--- a/arch/s390/kernel/ptrace.c 2005-05-07 07:20:31.000000000 +0200
+++ b/arch/s390/kernel/ptrace.c 2005-08-02 06:45:48.000000000 +0200
@@ -723,6 +761,13 @@
? 0x80 : 0));
/*
+ * If the debugger has set an invalid system call number,
+ * we prepare to skip the system call restart handling.
+ */
+ if (!entryexit && regs->gprs[2] >= NR_syscalls)
+ regs->trap = -1;
+
+ /*
* this isn't the same as continuing with a signal, but it will do
* for normal use. strace only continues with a signal if the
* stopping signal is not SIGTRAP. -brl
ubd-aio |
Last Changed - Mon Nov 19 11:23:59 EST 2007 |
This adds AIO support to the ubd driver.
ubd-atomic |
Last Changed - Mon Nov 19 11:24:02 EST 2007 |
To ensure that I/O can always make progress, even when there is no
memory, we provide static buffers which are to be used when dynamic
ones can't be allocated. These buffers are protected by flags which
are set when they are currently in use. The use of these flags is
protected by the queue lock, which is held for the duration of the
do_ubd_request call.
There is an allocation failure emulation
mechanism here - setting fail_start and fail_end will cause
allocations in that range (fail_start <= allocations < fail_end) to
fail, invoking the emergency mechanism.
When this is happening, I/O requests proceed one at a time,
essentially synchronously, until allocations start succeeding again.
This currently doesn't handle the bitmap array, since that can be of
any length, so we can't have a static version of it at this point.
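The emergency-buffer idea and the failure emulation can be sketched as below. Only fail_start and fail_end come from the description; the buffer type, the helper names, and the use of malloc in place of a kernel allocator are assumptions for the sake of a self-contained example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout, apart from fail_start and fail_end. */
static unsigned long allocations;         /* allocation attempts so far */
static unsigned long fail_start, fail_end;

struct io_buf {
	char data[512];
};

static struct io_buf emergency_buf;       /* the static fallback buffer */
static int emergency_in_use;              /* in the driver, guarded by the queue lock */

/* Fails on purpose while fail_start <= allocations < fail_end. */
static void *maybe_alloc(size_t size)
{
	unsigned long n = allocations++;

	if (n >= fail_start && n < fail_end)
		return NULL;
	return malloc(size);
}

/* Returns a buffer, or NULL if even the emergency buffer is busy. */
static struct io_buf *get_buf(void)
{
	struct io_buf *buf = maybe_alloc(sizeof(*buf));

	if (buf != NULL)
		return buf;
	if (emergency_in_use)
		return NULL;      /* caller waits for the request in flight */
	emergency_in_use = 1;
	return &emergency_buf;
}

static void put_buf(struct io_buf *buf)
{
	assert(buf != NULL);
	if (buf == &emergency_buf)
		emergency_in_use = 0;
	else
		free(buf);
}
```

With fail_start = 1 and fail_end = 3, the second and third requests can't allocate: the second gets the emergency buffer and the third has to wait until it is released, which is the one-at-a-time behavior described above.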
This patch completes the robustness and deadlock avoidance work by
handling the writing of the bitmap. The existing method of dealing
with low-memory situations by having an emergency structure for use
when memory can't be allocated won't work here because of the
variable size of the bitmap buffer, and the unknown (to me) limit of
a contiguous I/O request.
The allocation is avoided by writing directly from the bitmap rather
than allocating a buffer and copying the relevant chunk of the
bitmap into it.
This has a number of consequences. First, since the bitmap is
written directly from the device's bitmap, bits should not be set in
it until the I/O is just about to start. This is because reads
would see that and possibly race with the outgoing writes, returning
data from a section of the COW file which has never been written.
To prevent this, reads that overlap a pending bitmap write are
stalled until the write is finished. Modifying the bitmap bits as
late as possible shrinks the window in which this could happen.
Second, a section of bitmap that's being written out should not be
modified again until the write has finished. Otherwise, a bit might
be set and picked up by a pending I/O, resulting in it being on disk
too soon.
So, there are a couple new lists. Bitmap writes which have been
issued, but not finished are on the pending_bitmaps list. Any
subsequent bitmap writes which overlap a pending write have to
wait. These are put on the waiting_bitmaps list. Whenever a
pending bitmap write finishes, any overlapping waiting writes are
tried. They may continue waiting because they overlap an earlier
waiter, but at least one will proceed. Third, reads which overlap a
pending or waiting bitmap write will wait until those writes have
finished. This is done by do_io returning -EAGAIN, causing the
queue to wait until some requests have finished.
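A simplified sketch of the two lists and the overlap rule. The list names pending_bitmaps and waiting_bitmaps come from the description; the struct layout and function names are assumptions, and for brevity this version retries every waiter when a write finishes rather than only the overlapping ones.

```c
#include <assert.h>
#include <stddef.h>

struct bitmap_write {
	unsigned long start, end;      /* byte range [start, end) of the bitmap */
	struct bitmap_write *next;
};

static struct bitmap_write *pending_bitmaps;   /* issued, not yet finished */
static struct bitmap_write *waiting_bitmaps;   /* held back, oldest first */

/* Does w overlap any write on the given list? */
static int overlaps_any(const struct bitmap_write *list,
			const struct bitmap_write *w)
{
	for (; list != NULL; list = list->next)
		if (list->start < w->end && w->start < list->end)
			return 1;
	return 0;
}

/* Add w at the tail so that list order is submission order. */
static void append(struct bitmap_write **list, struct bitmap_write *w)
{
	while (*list != NULL)
		list = &(*list)->next;
	w->next = NULL;
	*list = w;
}

/* Returns 1 if the write was issued, 0 if it has to wait. */
static int submit_bitmap_write(struct bitmap_write *w)
{
	assert(w->start < w->end);
	if (overlaps_any(pending_bitmaps, w) ||
	    overlaps_any(waiting_bitmaps, w)) {
		append(&waiting_bitmaps, w);
		return 0;
	}
	append(&pending_bitmaps, w);
	return 1;
}

/* Called when a pending write completes; retries the waiters in order. */
static void finish_bitmap_write(struct bitmap_write *done)
{
	struct bitmap_write **p, *waiters, *w;

	for (p = &pending_bitmaps; *p != NULL; p = &(*p)->next) {
		if (*p == done) {
			*p = done->next;
			break;
		}
	}

	waiters = waiting_bitmaps;
	waiting_bitmaps = NULL;
	while ((w = waiters) != NULL) {
		waiters = w->next;
		submit_bitmap_write(w);   /* may go straight back to waiting */
	}
}

/* A read over [start, end) must stall if it touches an unfinished write. */
static int read_must_wait(unsigned long start, unsigned long end)
{
	struct bitmap_write r = { start, end, NULL };

	return overlaps_any(pending_bitmaps, &r) ||
	       overlaps_any(waiting_bitmaps, &r);
}
```

Because waiters are retried oldest first and each one is checked against the waiters ahead of it, a later write can never overtake an earlier overlapping one.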
This patch eliminates the atomic count associated with a bitmap_io
struct. The original thinking was that there would be a number of
aio structures associated with the bitmap_io, since different chunks
of the sg element could go to different layers. The count was
needed to know when the full sg segment reached disk and it was safe
to write the bitmap.
However, the flaw in that thinking is that a bitmap_io struct is
only needed for writes, and writes always go to the COW layer.
Hence, there will only be one bitmap_io per aio, and the counting is
unnecessary.
This patch makes it possible to merge the aio and bitmap_io structs,
which would be a good cleanup.
I noticed that the common case in io_submit is an immediate context
switch to the AIO thread when it returns from io_getevents, followed
by a switch back. This patch changes that by having the AIO thread
wait on a pipe before calling io_getevents. When the kernel
finishes a batch of I/O, it writes the number of requests down the
pipe, and the AIO thread waits for that number, and goes back to
sleeping on the pipe.
This probably shouldn't reach mainline, as O_DIRECT I/O should have
the property of causing switching on every I/O request. Also, the
wakeup mechanism should be only used when the other side might be
sleeping.
o_direct |
Last Changed - Mon Nov 19 11:31:50 EST 2007 |
This enables O_DIRECT on ubd devices. This needs work, as it will die
when creating a COW file. It also needs to do buffered I/O on backing
files.
This uses the C99 syntax to initialize an io_thread_req. Given how this
is compiled, it may not be a good idea, as it will consume more stack
than it should.
no-o-direct |
Last Changed - Mon Nov 19 11:31:55 EST 2007 |
This is the reversion of the o_direct patch so I can make COW files.
no-fakehd |
Last Changed - Mon Nov 19 11:31:56 EST 2007 |
The fakehd switch lost its implementation at some point. Since no one is
screaming for it, we might as well remove it.
cow-odirect |
Last Changed - Mon Nov 19 11:32:33 EST 2007 |
Start fixing the problems with aligned access to COW files when O_DIRECT
is enabled.
fuse |
Last Changed - Mon Nov 19 11:39:35 EST 2007 |
This is the start of the FUSE server support, which will export the UML
filesystem to the host as a FUSE filesystem.
Back out the odirect stuff temporarily.
tty-logging |
Last Changed - Mon Nov 19 11:40:03 EST 2007 |
Re-enabled tty logging.
Pull the signals_enabled bit apart into signal-specific bits. This is
used by unmask_* in the genirq patch.
This is used by unmask_* in genirq.
genirq |
Last Changed - Mon Jan 7 12:51:07 EST 2008 |
Use the genirq infrastructure correctly.
Get rid of init_irq_signals
init_new_thread_signals
start_uml_skas
userspace_tramp
exec_tramp - tt
do_new_thread_handler - tt
init_new_thread_signals should probably go away, except that this requires
tt mode to go away first
See why a separate boot_timer_handler is needed
do_boot_timer_handler
boot_timer_handler
* merge_time_init
See why SIGIO and SIGVTALRM mask off different sets of signals
* they don't any more
See if they need to mask off other signals at all
user_time_init should probably go away
* merge-time-init
Look at signal enabling/disabling in unblock_signals and the handlers
Get rid of timer_irq_inited
* no-timer-irq-inited
Merge timer_init and time_init
* merge-time-init
startup_sigvtalrm needs to call set_interval
* genirq
disable_timer should return int and not play with signals
disable_timer
error path of setup_sigvtalrm
just before exiting
* genirq
enable_timer
tt exec, separate from unblock_signals
tt new thread, separate from local_irq_enable
tt finish fork, separate from local_irq_enable
skas userspace_tramp
uml_idle_timer move to tt
* idle-timer
winch |
Last Changed - Mon Nov 19 11:56:10 EST 2007 |
SIGWINCH handling cleanups -
Reformatting of comments and code
Closed a memory and fd leak by freeing a winch structure and
everything associated with it if communication with the winch thread
dies for some reason.
free_winch in winch_interrupt may be wrong, as it calls free_irq
winch_handler_lock needs to become _irqsave
register_winch_irq may need to return int
no-os |
Last Changed - Mon Nov 19 11:56:20 EST 2007 |
There are a number of calls to os_* functions from code that's already
inside os-Linux, and thus can call libc directly. In most cases, the
wrappers added no value, and existed only to separate libc and kernel
headers. When the calls are under os-Linux, this is pointless.
os_{read,write}_file do add some value, unlike the other wrappers:
they retry if the corresponding read() or write() returns -EINTR, and
they handle userspace addresses properly by making sure they are
mapped, avoiding the read() or write() returning -EFAULT.
So, we need to be sure that when these calls are replaced, there is a
CATCH_EINTR added and that the buffer is kernel memory.
In addition, the wrappers returned -errno, rather than {-1,0}, so the
replacement code needs to look at errno for the error.
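As a rough illustration of what a replaced call site has to do, here is a sketch with a stand-in CATCH_EINTR definition (the real macro in the UML tree may be written differently) and a hypothetical read_retrying helper which keeps the old -errno convention.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Stand-in definition: repeat expr for as long as it fails with EINTR. */
#define CATCH_EINTR(expr) while (((expr) < 0) && (errno == EINTR))

/*
 * Hypothetical helper. Returns the byte count, or -errno on failure,
 * which is what callers of the old os_* wrappers expected.
 */
static ssize_t read_retrying(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);   /* in UML this also has to be kernel memory */

	CATCH_EINTR(n = read(fd, buf, len));
	if (n < 0)
		return -errno; /* libc reports -1 and sets errno */
	return n;
}
```

A caller that used to test the wrapper's return value for a negative error code can keep doing so; a caller converted to call read() directly has to check for -1 and then look at errno itself.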
This patch also makes a number of whitespace and formatting cleanups,
plus some error path fixes around the affected code.
os_print_error just goes away since it's replaced by perror and printk.
os_getpid is preserved for the moment because it takes care to make
sure that the system call is executed, avoiding an old libc bug which
cached the pid of a different thread.
etap_open needs to undo tap_open_common if a socketpair fails
Also maybe undo etap_tramp if an error came back
Split out os_set_exec_close interface change
tuntap_open needs to undo tap_open_common
clean up os_set_fd_async
Remove helper_hup and helper_pause from helper_child
Add NOCLDWAIT support
os_get_exec_close can just return the flag
ptrace_child should kill itself rather than calling _exit
Fix a confusing parameter name in the __const_udelay prototypes.
The UML changes here are suspect, and probably wrong, but also fairly
harmless.
logging |
Last Changed - Mon Nov 19 11:58:01 EST 2007 |
This is a little logger which dumps stuff out to a host file. Used for
tracking down otherwise intractable bugs.