Speeding this process up requires that threads somehow be able to intercept their own system calls. Any scheme in which some other thread intercepts system calls will cause multiple host context switches per UML system call.
The minimum overhead is two host kernel entries and exits per UML system call. The int 0x80 instruction forces the first kernel entry at the start of the system call. Returning to user mode at the end, with system call tracing re-enabled, requires a second kernel entry and exit.
Linux has no mechanism that lets a thread intercept its own system calls, so something new is needed.
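The cost of having a separate thread intercept system calls can be observed directly. The sketch below assumes a Linux host and the standard ptrace(2) interface, which is how one process traces another's system calls. The function name count_syscall_stops and the choice of getpid as the traced call are hypothetical, illustrative choices, not UML code. Under PTRACE_SYSCALL the traced child stops once on entry to and once on exit from every system call, and each stop wakes the tracer and then switches back to the child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that makes `calls` getpid() system
 * calls while traced with PTRACE_SYSCALL, and return how many syscall
 * stops the tracing parent observed (or -1 on error). */
static long count_syscall_stops(int calls)
{
    assert(calls >= 0);
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: request tracing, then stop so the parent can attach options. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);
        for (int i = 0; i < calls; i++)
            syscall(SYS_getpid); /* force a real system call, no vDSO */
        _exit(0);
    }

    int status;
    long stops = 0;
    /* Wait for the child's initial SIGSTOP (fails if tracing was refused). */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    /* Make syscall stops report SIGTRAP|0x80 so they are distinguishable. */
    ptrace(PTRACE_SETOPTIONS, child, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        /* Resume until the next system call entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) == -1)
            return -1;
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++; /* one round trip through the tracer */
    }
    return stops;
}
```

Calling count_syscall_stops(5) should report at least 10 stops: two per getpid() call, plus a few from the C library and the final exit. Each of those stops is a switch to the tracer and back, which is the overhead that self-interception would avoid.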