Supporting gprof and gcov does require some work inside UML. gprof allocates a buffer to store its profiling information. This buffer must be shared among all of the UML threads; otherwise, each thread would accumulate its samples in its own private copy. UML does this by locating that buffer and replacing it with a segment of shared memory. Also, SIGPROF and the profiling timer need to be initialized properly for each new UML thread.
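One way to replace an existing buffer with shared memory on a Linux host is sketched below, assuming the buffer's address and length are already known. The names share_region and demo_shared_counter are hypothetical, not UML functions, and a forked child stands in for a new UML thread. Because mappings work in whole pages, every page the buffer touches becomes shared, including any unrelated data that happens to sit on the same pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not UML's code): replace the pages covering
 * [addr, addr + len) with a MAP_SHARED anonymous mapping at the same
 * address, keeping their current contents. */
int share_region(void *addr, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);
    uintptr_t end = ((uintptr_t)addr + len + page - 1) & ~(page - 1);
    size_t span = end - start;

    /* Save the contents in a fresh private mapping, since a new
     * anonymous mapping starts out zero-filled. */
    void *copy = mmap(NULL, span, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (copy == MAP_FAILED)
        return -1;
    memcpy(copy, (void *)start, span);

    /* MAP_FIXED discards the old pages at this address and puts the
     * shared ones in their place. */
    void *p = mmap((void *)start, span, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        munmap(copy, span);
        return -1;
    }
    memcpy(p, copy, span);
    munmap(copy, span);
    return 0;
}

/* Demo: a child increments a counter in the shared buffer and the
 * parent reads the updated value after the child exits. */
long demo_shared_counter(void)
{
    long page = sysconf(_SC_PAGESIZE);
    long *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 41;
    if (share_region(buf, sizeof *buf) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        buf[0] += 1;   /* lands in the shared page, visible to the parent */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    long result = buf[0];
    munmap(buf, (size_t)page);
    return result;
}
```

With the remap in place, demo_shared_counter() should return 42. Without the call to share_region, the child's write would go to its own copy-on-write page and the parent would still see 41.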
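The per-thread profiling setup can be sketched in the same style. The sketch assumes that each new UML thread is a separate host process, created by fork() or by clone() without CLONE_THREAD, which keeps its parent's SIGPROF disposition but, as with fork(), starts with no ITIMER_PROF timer armed; the timer is therefore re-armed from a copy the parent kept. The names restart_prof_timer and demo_rearm_in_child are hypothetical, and unblocking SIGPROF is an added precaution rather than something described above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Re-arm profiling in a freshly created process. The SIGPROF handler is
 * inherited, but the ITIMER_PROF timer is not, so it is set again from
 * the parent's saved settings. Unblocking SIGPROF is an assumption, in
 * case it was blocked while the process was being created. */
int restart_prof_timer(const struct itimerval *saved)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPROF);
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;
    return setitimer(ITIMER_PROF, saved, NULL);
}

/* Returns 1 if the profiling timer is armed, 0 if not, -1 on error. */
static int timer_armed(void)
{
    struct itimerval cur;
    if (getitimer(ITIMER_PROF, &cur) != 0)
        return -1;
    return cur.it_value.tv_sec != 0 || cur.it_value.tv_usec != 0;
}

/* Demo: returns 1 if a forked child starts without a profiling timer
 * and restart_prof_timer() arms it again. */
int demo_rearm_in_child(void)
{
    /* 10 ms interval, chosen only for this demo: {it_interval, it_value}. */
    struct itimerval tv = { { 0, 10000 }, { 0, 10000 } };

    /* Ignore SIGPROF so its default action (terminate) cannot end the
     * demo; under gprof the profiling runtime installs its own handler. */
    signal(SIGPROF, SIG_IGN);
    if (setitimer(ITIMER_PROF, &tv, NULL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int inherited = timer_armed();
        int rearmed = restart_prof_timer(&tv) == 0 && timer_armed() == 1;
        _exit(inherited == 0 && rearmed ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    struct itimerval off = { { 0, 0 }, { 0, 0 } };
    setitimer(ITIMER_PROF, &off, NULL);   /* disarm in the parent */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```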
The gcov runtime writes out its accumulated data when the process exits normally. Unfortunately, in a multithreaded process, this happens at the first normal exit of any thread, even while the others are still running. So UML was changed slightly so that the only thread that exits normally is the tracing thread, when the virtual machine halts. All other threads are killed when they are no longer needed.
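The difference between exiting normally and being killed can be sketched with ordinary processes. Here the gcov runtime's exit-time dump is modelled by a hypothetical atexit() handler, fake_coverage_dump, that only counts how often it runs, and fork(), exit(), and kill() with SIGKILL stand in for UML's thread creation and teardown; run_workers is likewise a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Counter in shared memory: how many processes ran the exit-time dump. */
static atomic_int *dumps;

/* Stand-in for an exit-time coverage dump, registered with atexit(). */
static void fake_coverage_dump(void)
{
    atomic_fetch_add(dumps, 1);
}

static int setup_once(void)
{
    if (dumps != NULL)
        return 0;
    void *p = mmap(NULL, sizeof *dumps, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    dumps = p;
    atomic_init(dumps, 0);
    return atexit(fake_coverage_dump) == 0 ? 0 : -1;
}

/* Fork n workers. If exit_normally is set, they call exit(), which runs
 * the atexit handler; otherwise they idle until the parent kills them
 * with SIGKILL, which runs no handlers. Returns how many dumps the
 * workers performed, or -1 on error. */
int run_workers(int n, int exit_normally)
{
    pid_t pids[MAX_WORKERS];
    int failed = 0;

    if (n < 0 || n > MAX_WORKERS || setup_once() != 0)
        return -1;
    atomic_store(dumps, 0);
    fflush(NULL);  /* keep children from re-flushing inherited stdio buffers */

    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            n = i;          /* only reap the workers that exist */
            failed = 1;
            break;
        }
        if (pids[i] == 0) {
            if (exit_normally)
                exit(0);    /* normal exit: atexit handlers run */
            for (;;)
                pause();    /* idle until killed */
        }
    }
    for (int i = 0; i < n; i++) {
        if (!exit_normally || failed)
            kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
    return failed ? -1 : atomic_load(dumps);
}
```

run_workers(3, 1) should report three dumps, one per worker that exited normally, while run_workers(3, 0) should report none. In the second case only the parent, standing in for the tracing thread, runs the handler, when it eventually exits itself.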