The performance of UML is dominated by the context switches back and forth between its processes and the tracing thread. So, any major performance improvements have to focus on eliminating the tracing thread.
This would require a mechanism for doing system call interception without using a separate thread. I'm planning on doing this by adding a new system call path in the host which delivers a signal to the process whenever it does a system call. The signal handler would be the current UML system call handler which reads the system call and arguments and executes the system call.
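To make the shape of that handler concrete, here is a minimal sketch in C. It is only a simulation under stated assumptions: the proposed host system call path does not exist here, so SIGUSR1 raised by the process itself stands in for the signal the host would deliver. The names syscall_request, syscall_table, guest_getpid, guest_add, and trapped_syscall are all hypothetical. A real handler would read the system call number and arguments from the saved registers in the signal context rather than from a shared record.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of the trapped call. A real handler would take
 * these values from the saved registers in the signal context. */
struct syscall_request {
    long nr;        /* system call number */
    long args[3];   /* first three arguments */
    long result;    /* return value filled in by the handler */
};

/* Request currently being serviced; set just before the signal is raised. */
static struct syscall_request *volatile pending;

/* Hypothetical stand-ins for the UML kernel's system call implementations. */
static long guest_getpid(const long *args) { (void)args; return 42; }
static long guest_add(const long *args) { return args[0] + args[1]; }

typedef long (*syscall_fn)(const long *args);
static const syscall_fn syscall_table[] = { guest_getpid, guest_add };
#define NR_SYSCALLS (sizeof(syscall_table) / sizeof(syscall_table[0]))

/* The signal handler doubles as the system call handler: it reads the
 * call number and arguments, runs the call, and stores the result. */
static void syscall_signal_handler(int sig)
{
    struct syscall_request *req = pending;

    (void)sig;
    if (req == NULL)
        return;
    if (req->nr < 0 || (size_t)req->nr >= NR_SYSCALLS)
        req->result = -ENOSYS;
    else
        req->result = syscall_table[req->nr](req->args);
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_syscall_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = syscall_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Simulates a process making a system call. In the proposed scheme the
 * host would deliver the signal; here the process raises it itself. */
static long trapped_syscall(long nr, long a0, long a1, long a2)
{
    struct syscall_request req = { nr, { a0, a1, a2 }, 0 };
    int rc;

    pending = &req;
    rc = raise(SIGUSR1);   /* returns only after the handler has run */
    assert(rc == 0);
    (void)rc;
    pending = NULL;
    return req.result;
}
```

After install_syscall_handler() succeeds, a call such as trapped_syscall(1, 2, 3, 0) is serviced entirely inside the process's own signal handler. No second thread is involved, which is the property the proposal is after.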
Another area for performance improvement is context switching. The problem is the address space scan which is required in order to bring the host address space up to date with UML. This could be eliminated by allowing address spaces to be created, manipulated, and switched from userspace. The address spaces of all UML processes could then be kept up to date at all times, so no scan would be needed on a context switch.