What to do when UML doesn't work

Strange compilation errors when you build from source
As of test11, "ARCH=um" must be set in the environment or on the make command line for every step of building UML, including clean, distclean, mrproper, config, menuconfig, xconfig, dep, and linux. If you forget it for any of them, the i386 build seems to contaminate the UML build. If this happens, start from scratch with
host% make mrproper ARCH=um
and repeat the build process with ARCH=um on all the steps.

See the compilation page for more details.
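One way to make ARCH=um hard to forget is a small shell wrapper around make. This is a minimal sketch; the function name uml_make is hypothetical, and exporting ARCH=um in the shell you build from is an equivalent alternative.

```shell
# Hypothetical wrapper: every make invocation gets ARCH=um, so no build step
# (clean, mrproper, config, menuconfig, dep, linux, ...) can silently omit it.
uml_make() {
    make ARCH=um "$@"
}

# Example sequence for a 2.4-era tree, run from the top of the UML pool:
#   uml_make mrproper
#   uml_make menuconfig
#   uml_make dep        # 2.4 trees only
#   uml_make linux
```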

Another cause of strange compilation errors is building UML in /usr/src/linux. If you did this, the first thing you need to do is clean up the mess. The /usr/src/linux/include/asm link will now point to include/asm-um; make it point back to include/asm-i386. Then move your UML pool someplace else and build it there. Also see below, where a more specific set of symptoms is described.

UML hangs on boot after mounting devfs
The boot looks like this:

VFS: Mounted root (ext2 filesystem) readonly.
Mounted devfs on /dev

You're probably running a recent distribution on an old machine. I saw this with the RH7.1 filesystem running on a Pentium. The shared library loader, ld.so, was executing a conditional-move instruction (cmove) which the original Pentium doesn't support; the conditional moves were introduced with the Pentium Pro. If you run UML under the debugger, you'll see that the hang is caused by that one instruction generating an endless stream of SIGILLs.

The fix is to boot UML on an older filesystem.
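To check whether a host CPU has the conditional-move instructions before booting a newer filesystem, you can look for the cmov flag in the host's /proc/cpuinfo. A minimal sketch, assuming an x86 host whose /proc/cpuinfo has the usual "flags" lines; the function name is hypothetical:

```shell
# Succeeds (exit 0) if the cpuinfo file lists "cmov" among the CPU flags.
# Run it on the host: that is the CPU that actually executes UML's processes.
# Usage: cpu_has_cmov [cpuinfo-file]    (default: /proc/cpuinfo)
cpu_has_cmov() {
    grep '^flags' "${1:-/proc/cpuinfo}" | grep -qw cmov
}

# Example:
#   cpu_has_cmov || echo "no cmov: boot UML on an older filesystem"
```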

A variety of panics and hangs with /tmp on a reiserfs filesystem
I saw this on reiserfs 3.5.21, and it seems to be fixed in 3.5.27. Panics preceded by

Detaching pid nnnn

are diagnostic of this problem. This is a reiserfs bug which occasionally causes a thread to read stale data from an mmapped page shared with another thread. The fix is to upgrade the filesystem or to make /tmp an ext2 filesystem.
The compile fails with errors about conflicting types for 'open', 'dup', and 'waitpid'
This happens when you build in /usr/src/linux. The UML build makes the include/asm link point to include/asm-um. /usr/include/asm points to /usr/src/linux/include/asm, so when that link gets moved, files which need to include the asm-i386 versions of headers get the incompatible asm-um versions. The fix is to move the include/asm link back to include/asm-i386 and to do UML builds someplace else.
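The link repair can be scripted. This is a sketch that assumes the 2.4-style layout described above (include/asm, include/asm-i386, include/asm-um); restore_asm_link is a hypothetical name, and it's worth checking with ls -l that the link really points at asm-um before running it:

```shell
# Point include/asm back at asm-i386 in a kernel tree that a UML build touched.
# Usage: restore_asm_link [tree]    (default: /usr/src/linux)
restore_asm_link() {
    inc="${1:-/usr/src/linux}/include"
    if [ ! -d "$inc/asm-i386" ]; then
        echo "no $inc/asm-i386; not touching $inc/asm" >&2
        return 1
    fi
    # -n makes ln replace the existing asm symlink itself, instead of
    # following it and creating a new link inside asm-um.
    ln -sfn asm-i386 "$inc/asm"
}
```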
UML doesn't work when /tmp is an NFS filesystem
This seems to be similar to the reiserfs problem above. Some versions of NFS don't seem to handle mmap correctly, and UML depends on it. The workaround is to make /tmp a non-NFS directory.
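To see whether /tmp is on one of the problem filesystems described in the last two items, you can ask for its filesystem type. A sketch assuming GNU coreutils stat, whose -f -c %T prints the type name (for example "ext2/ext3", "reiserfs", "nfs"); the function names are hypothetical, and the check flags reiserfs regardless of version, even though 3.5.27 and later appear to be fixed:

```shell
# Succeeds if a filesystem type name is one of the problem cases above.
tmp_fs_risky() {
    case "$1" in
        reiserfs|nfs*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Usage: check_tmp [dir]    (default: /tmp)
# Exit status: 0 = looks fine, 1 = reiserfs or NFS, 2 = stat failed.
check_tmp() {
    dir="${1:-/tmp}"
    fstype=$(stat -f -c %T "$dir") || return 2
    if tmp_fs_risky "$fstype"; then
        echo "warning: $dir is on $fstype; consider making it ext2" >&2
        return 1
    fi
    return 0
}
```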
UML hangs on boot when compiled with gprof support
If you build UML with gprof support and, early in the boot, it prints

kernel BUG at page_alloc.c:100!

you have a buggy gcc. You can work around the problem by removing UM_FASTCALL from CFLAGS in arch/um/Makefile-i386. This will open up another bug, but that one is fairly hard to reproduce.
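If you'd rather not edit the Makefile by hand, sed can drop the flag. This is a sketch under an assumption: that the flag appears in arch/um/Makefile-i386 as the token -DUM_FASTCALL. Its exact form may differ between releases, so look at the file first. The function name is hypothetical, and the original file is kept as Makefile-i386.orig:

```shell
# ASSUMPTION: the flag is written as the token -DUM_FASTCALL in the Makefile.
# Usage: drop_um_fastcall [makefile]    (default: arch/um/Makefile-i386)
drop_um_fastcall() {
    mk="${1:-arch/um/Makefile-i386}"
    cp "$mk" "$mk.orig" || return 1
    # Delete the token together with any whitespace in front of it.
    sed 's/[[:space:]]*-DUM_FASTCALL//g' "$mk.orig" > "$mk"
    if grep -q UM_FASTCALL "$mk"; then
        echo "UM_FASTCALL still appears in $mk; edit it by hand" >&2
        return 1
    fi
}
```

Run it from the top of the UML pool, then rebuild with ARCH=um as usual.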
syslogd dies with a SIGTERM on startup
The exact boot error depends on the distribution that you're booting, but Debian produces this:

/etc/rc2.d/S10sysklogd: line 49:    93 Terminated start-stop-daemon --start --quiet --exec /sbin/syslogd -- $SYSLOGD

This is a syslogd bug. There's a race between a parent process installing a signal handler and its child sending the signal: if the child's SIGTERM arrives before the handler is in place, the default action kills the parent. See this uml-devel post for the details.
TUN/TAP networking doesn't work on a 2.4 host
There are a couple of problems, pointed out by Tim Robinson:
  • It doesn't work on hosts running 2.4.7 (or thereabouts) or earlier. The fix is to upgrade to something more recent and then read the next item.
  • If you see
        File descriptor in bad state
    when you bring up the device inside UML, you have a header mismatch between the original kernel and the upgraded one. Make /usr/src/linux point at the new headers. This will only be a problem if you build uml_net yourself.
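A quick way to tell whether a host falls on the wrong side of the first item is to compare uname -r against 2.4.7. A sketch; the cutoff is only the approximate one given above, and the function name is hypothetical:

```shell
# Succeeds if a kernel version string is 2.4.7 or older (the approximate
# TUN/TAP cutoff above). Suffixes such as "-ac4" or "smp" are ignored.
# Usage: kernel_too_old_for_tuntap VERSION
kernel_too_old_for_tuntap() {
    echo "$1" | awk -F'[.-]' '{
        v = $1 * 1000000 + $2 * 1000 + $3   # major.minor.patch as one number
        exit !(v <= 2004007)                # exit 0 means "too old"
    }'
}

# Example:
#   kernel_too_old_for_tuntap "$(uname -r)" && echo "upgrade the host kernel"
```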
You can network to the host but not to other machines on the net
If you can connect to the host, and the host can connect to UML, but you cannot connect to any other machines, then you may need to enable IP Masquerading on the host. This usually happens only when you use private IP addresses (192.168.x.x or 10.x.x.x) for host/UML networking, rather than the public address space that your host is connected to. UML does not enable IP Masquerading, so you will need to create a static rule to enable it:
host% iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
Replace eth0 with the interface that you use to talk to the rest of the world.

Documentation on IP Masquerading and SNAT can be found at www.netfilter.org.
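The host-side setup can be collected into one function. This sketch also turns on IP forwarding, a step not mentioned above but one the host needs in order to route the UML's packets at all; if your host already forwards, it is harmless. The function name and the RUN=echo dry-run convention are hypothetical; for real use, run it as root:

```shell
# Host-side NAT for UMLs on a private subnet. Run as root; with RUN=echo the
# commands are only printed. Usage: setup_uml_masq [external-interface]
setup_uml_masq() {
    ext_if="${1:-eth0}"
    # Let the host forward packets between the UML's interface and the outside.
    ${RUN:-} sysctl -w net.ipv4.ip_forward=1
    # Rewrite the UML's private source addresses to the host's address.
    ${RUN:-} iptables -t nat -A POSTROUTING -o "$ext_if" -j MASQUERADE
}
```

Since -A appends, each call adds another copy of the rule, so run it once (for example, from a boot script).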

If you can reach the local net, but not the outside Internet, then that is usually a routing problem. The UML needs a default route:

UML# route add default gw gateway IP
The gateway IP can be any machine on the local net that knows how to reach the outside world. Usually, this is the host or the local network's gateway.

Occasionally, we hear from someone who can reach some machines but not others on the same net, or who can reach some ports on other machines but not others. These problems are usually caused by strange firewalling somewhere between the UML and the other box. You can track this down by running tcpdump on every interface the packets travel over and seeing where they disappear. When you find a machine that takes the packets in but does not send them onward, that's the culprit.
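Starting the captures can be scripted so that each interface's traffic lands in its own file. A sketch: the function name, its arguments (an output directory, the UML's IP address, then the interfaces along the path), and the RUN=echo dry-run convention are all hypothetical; real captures need root:

```shell
# Start one background tcpdump per interface, each filtering on the UML's
# address and writing to OUTDIR/<interface>.txt. With RUN=echo the commands
# are written to those files instead of being run.
# Usage: trace_uml_path OUTDIR UML_IP IFACE...
trace_uml_path() {
    outdir="$1"; uml_ip="$2"; shift 2
    mkdir -p "$outdir" || return 1
    pids=""
    for ifc in "$@"; do
        # -n: no name lookups; -l: line-buffered output
        ${RUN:-} tcpdump -n -l -i "$ifc" host "$uml_ip" > "$outdir/$ifc.txt" 2>&1 &
        pids="$pids $!"
    done
    echo "capturing; stop with: kill$pids"
}
```

Run it on each machine along the path with that machine's interfaces, generate some traffic from the UML, stop the captures, and look for the first file where the packets stop showing up. Past a masquerading host the packets carry the host's address, so adjust the filter there.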

I have no root and I want to scream
Thanks to Birgit Wahlich for telling me about this strange one. It turns out that there's a limit of six environment variables on the kernel command line. When that limit is reached or exceeded, argument processing stops, which means that the 'root=' argument that UML usually adds is not seen. So the kernel has no idea what the root device is, and it panics.

The fix is to put less on the command line. Glomming all your setup variables into one is probably the best way to go.
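On the UML side, one way to glom the variables is to pass a single comma-separated variable and unpack it in an init script. This is only a sketch: the variable name UMLSETUP, the key:value format, and the UML_ prefix are hypothetical conventions of this example, not anything UML itself understands:

```shell
# Boot with one variable instead of many, e.g.
#   ./linux UMLSETUP=ip:10.0.0.2,host:uml1
# then, in an init script inside UML:  unpack_setup "$UMLSETUP"
# Each key:value pair becomes an exported UML_<key> variable.
unpack_setup() {
    old_ifs=$IFS
    IFS=,
    set -f                      # no globbing while splitting on commas
    for pair in $1; do
        case $pair in
            *:*) ;;
            *) echo "skipping malformed entry: $pair" >&2; continue ;;
        esac
        key=${pair%%:*}
        val=${pair#*:}
        case $key in
            ''|*[!A-Za-z0-9_]*) echo "skipping bad key: $key" >&2; continue ;;
        esac
        eval "UML_$key=\$val"   # key is validated above, so eval is safe here
        export "UML_$key"
    done
    set +f
    IFS=$old_ifs
}
```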

UML build conflict between ptrace.h and ucontext.h
On some older systems, /usr/include/asm/ptrace.h and /usr/include/sys/ucontext.h define the same names. So, when they're included together, the defines from one completely mess up the parsing of the other, producing errors like:

/usr/include/sys/ucontext.h:47: parse error before `10'

plus a pile of warnings.

This is a libc botch, which has since been fixed, and I don't see any way around it besides upgrading.

The UML BogoMips is exactly half the host's BogoMips
On i386 kernels, there are two ways of running the loop that is used to calculate the BogoMips rating: using the TSC if it's there, or using a one-instruction loop. The TSC produces twice the BogoMips of the loop. UML uses the loop, since it has nothing resembling a TSC, and will get almost exactly the same BogoMips as a host using the loop. However, a host with a TSC will report double the loop BogoMips, and therefore double the UML BogoMips.
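To check this on your own machine, compare the bogomips line of /proc/cpuinfo on the host with the same line inside UML. A sketch with hypothetical function names; per the explanation above, expect a host/UML ratio near 2 on a host with a TSC and near 1 otherwise:

```shell
# Print the first bogomips value from a cpuinfo-format file.
# Usage: bogomips_of [cpuinfo-file]    (default: /proc/cpuinfo)
bogomips_of() {
    awk -F: 'tolower($1) ~ /^bogomips/ { gsub(/[ \t]/, "", $2); print $2; exit }' \
        "${1:-/proc/cpuinfo}"
}

# Print HOST_BOGOMIPS / UML_BOGOMIPS with two decimals.
# Usage: bogomips_ratio HOST UML
bogomips_ratio() {
    awk -v h="$1" -v u="$2" 'BEGIN { printf "%.2f\n", h / u }'
}
```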
When you run UML, it immediately segfaults
If the host is configured with the 2G/2G address space split, that's why. See the "UML on 2G/2G hosts" page for the details on getting UML to run on your host.
xterms appear, then immediately disappear
If you're running an up-to-date kernel with an old release of uml_utilities, the port-helper program will not work properly, so xterms will exit straight after they appear. The solution is to upgrade to the latest release of uml_utilities. This problem usually occurs when you have installed a packaged release of UML and then compiled your own development kernel without upgrading uml_utilities from the source distribution.
cannot set up thread-local storage
This problem is fixed by the skas-hold-own-ldt patch that went into 2.6.15-rc1.

The boot looks like this:

cannot set up thread-local storage: cannot set up LDT for thread-local storage
Kernel panic - not syncing: Attempted to kill init!

Your UML kernel doesn't support the Native POSIX Thread Library (NPTL), and the binaries you're running are dynamically linked against it. Try running in SKAS3 mode first. You might be able to avoid the kernel panic by setting the LD_ASSUME_KERNEL environment variable on the command line:
./linux init=/bin/sh LD_ASSUME_KERNEL=2.4.1

Many commands are very restrictive about what they preserve in the environment when starting child processes, so relying on LD_ASSUME_KERNEL being set globally for every process in the whole system is generally not a good idea; it's very hard to guarantee. It's better to move the NPTL libraries away:

                
# mkdir -p mnt-uml
# mount root_fs mnt-uml/ -o loop
# mv mnt-uml/lib/tls mnt-uml/lib/tls.away
# umount mnt-uml

If you're running Debian, you might prefer to use dpkg-divert:
                
# export LD_ASSUME_KERNEL=2.4.1
# mkdir -p mnt-uml
# mount root_fs mnt-uml/ -o loop
# chroot mnt-uml
# mkdir /lib/tls.off
# cd /lib/tls
# for f in *; do
      dpkg-divert --local --rename --divert /lib/tls.off/"$f" --add /lib/tls/"$f"
  done
# exit
# umount mnt-uml
# unset LD_ASSUME_KERNEL

Process segfaults with a modern (NPTL-using) filesystem
These appear to be fixed with the tls patches from Blaisorblade that are currently in my patchset. You can apply the entire patchset, or you can move /lib/tls in the image away, as described above.
Any other panic, hang, or strange behavior
If you're seeing truly strange behavior, such as hangs or panics that happen in random places, or you try running the debugger to see what's happening and it acts strangely, then it could be a problem in the host kernel. If you're not running a stock Linus or -ac kernel, then try that. An early version of the preemption patch and a 2.4.10 SuSE kernel have caused very strange problems in UML.

Otherwise, let me know about it. Send a message to one of the UML mailing lists, either the developer list (user-mode-linux-devel at lists dot sourceforge dot net) or the user list (user-mode-linux-user at lists dot sourceforge dot net), whichever you prefer. Don't assume that everyone knows about it and that a fix is imminent.

If you want to be super-helpful, read the trouble-shooting page and follow the instructions contained therein.

Hosted at SourceForge