| Age | Commit message (Collapse) | Author |
|
Historically the code to set up AT_HWCAP and AT_PLATFORM was only built
for 32bit x86 as it was intermingled with the vDSO passthrough code.
Now that vDSO passthrough has been removed, always pass through AT_HWCAP
and AT_PLATFORM.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-10-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Inheriting the vDSO from the host is problematic. The values read
from the time functions will not be correct for the UML kernel.
Furthermore the start and end of the vDSO are not stable or
detectable by userspace. Specifically the vDSO datapages start
before AT_SYSINFO_EHDR and the vDSO itself is larger than a single page.
This codepath is only used on 32bit x86 UML. In my testing with both
32bit and 64bit hosts the passthrough functionality has always been
disabled anyways due to the checks against envp in scan_elf_aux().
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-4-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Setting all auxiliary vector values to default values if one of them
was not provided by the host will discard perfectly fine values.
Remove the elf_aux_platform fallback from the vDSO ones.
As zero is the correct fallback anyways, don't create a new conditional.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-3-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The generic UM code should not have references to x86-specific value.
Move the fallback into the x86-specific header.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-2-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Setting all auxiliary vector values to default values if one of them
was not provided by the host will discard perfectly fine values.
Move the elf_aux_platform fallback to its own conditional.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-1-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add initial symmetric multi-processing (SMP) support to UML. With
this support enabled, users can tell UML to start multiple virtual
processors, each represented as a separate host thread.
In UML, kthreads and normal threads (when running in kernel mode)
can be scheduled and executed simultaneously on different virtual
processors. However, the userspace code of normal threads still
runs within their respective single-threaded stubs.
That is, SMP support is currently available both within the kernel
and across different processes, but still remains limited within
threads of the same process in userspace.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-6-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Define timers on a per-CPU basis to enable each CPU to have its
own timer. This is a preparation for adding SMP support.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-5-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
With SMP and NO_HZ enabled, the CPU may still need to sleep even
if the timer is disarmed. Switch to deciding whether to sleep based
on pending resched. Additionally, because disabling IRQs does not
block SIGALRM, it is also necessary to check for any pending timer
alarms. This is a preparation for adding SMP support.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-4-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Turn signals_enabled, signals_pending and signals_active into
thread-local variables. This enables us to control and track
signals independently on each CPU thread. This is a preparation
for adding SMP support.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-3-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The file-based iomem emulation was introduced to support writing
paravirtualized drivers based on emulated iomem regions. However,
the only driver that makes use of it is an example driver called
mmapper, which was written over two decades ago.
We now have several modern device emulation mechanisms, such as
vhost-user-based virtio-uml. Remove the file-based iomem emulation
support to reduce the maintenance burden.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027054519.1996090-5-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull uml updates from Johannes Berg:
- minor preparations for SMP support
- SPARSE_IRQ support for kunit
- help output cleanups
* tag 'uml-for-linux-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Remove unused ipi_pipe field from cpuinfo_um
um: Remove unused cpu_data and current_cpu_data macros
um: Stop tracking virtual CPUs via mm_cpumask()
um: Centralize stub size calculations
um: Remove outdated comment about STUB_DATA_PAGES
um: Remove unused offset and child_err fields from stub_data
um: Indent time-travel help messages
um: Fix help message for ssl-non-raw
um: vector: Fix indentation for help message
um: Add missing trailing newline to help messages
um: virtio-pci: implement .shutdown()
um: Support SPARSE_IRQ
|
|
When copying FDs, the copy size should not include the control
message header (cmsghdr). Fix it.
Fixes: 5cde6096a4dd ("um: generalize os_rcv_fd")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
On one of my machines UML failed to start after enabling
SELinux.
UML failed to start because SELinux's execmod rule denies
executable pages on a modified file mapping.
Historically UML marks it's stack rwx.
AFAICT, these days this is no longer needed, so let's remove
PROT_EXEC.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Some help messages are missing a trailing newline. They should
end with two newlines, but only one is present. Add the missing
newline to make the --help output more readable.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The PID of the stub process can be obtained from current_mm_id().
There is no need to track it via userspace_pid[]. Stop doing that
to simplify the code.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250711065021.2535362-4-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It's no longer used. Remove it.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250711065021.2535362-3-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Avoid declaring a new variable 'ret' inside the 'if (using_seccomp)'
block, as the existing 'err' variable declared at the top of the
function already serves the same purpose.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250711065021.2535362-2-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It's only used within process.c. Make it static.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250708090403.1067440-2-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There was a typo that caused the extended FP state to be copied into the
wrong location on 32 bit. On 32 bit we only store the xstate internally
as that already contains everything. However, for compatibility, the
mcontext on 32 bit first contains the legacy FP state and then the
xstate.
The code copied the xstate on top of the legacy FP state instead of
using the correct offset. This offset was already calculated in the
xstate_* variables, so simply switch to those to fix the problem.
With this SECCOMP mode works on 32 bit, so lift the restriction.
Fixes: b1e1bd2e6943 ("um: Add helper functions to get/set state for SECCOMP")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250604081705.934112-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Instead of always sharing the FDs with the userspace process, only hand
over the FDs needed for mmap when required. The idea is that userspace
might be able to force the stub into executing an mmap syscall, however,
it will not be able to manipulate the control flow sufficiently to have
access to an FD that would allow mapping arbitrary memory.
Security wise, we need to be sure that only the expected syscalls are
executed after the kernel sends FDs through the socket. This is
currently not the case, as userspace can trivially jump to the
rt_sigreturn syscall instruction to execute any syscall that the stub is
permitted to do. With this, it can trick the kernel to send the FD,
which in turn allows userspace to freely map any physical memory.
As such, this is currently *not* secure. However, in principle the
approach should be fine with a more strict SECCOMP filter and a careful
review of the stub control flow (as userspace can prepare a stack). With
some care, it is likely possible to extend the security model to SMP if
desired.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250602130052.545733-8-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This detects seccomp support, sets the global using_seccomp variable and
initilizes the exec registers. The support is only enabled if the
seccomp= kernel parameter is set to either "on" or "auto". With "auto" a
fallback to ptrace mode will happen if initialization failed.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250602130052.545733-7-benjamin@sipsolutions.net
[extend help with Kconfig text from v2, use exit syscall instead of libc,
remove unneeded mctx_offset assignment, disable on 32-bit for now]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This adds the kernel side of the seccomp based process handling.
Co-authored-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250602130052.545733-6-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When in seccomp mode, we would hang forever on the futex if a child has
died unexpectedly. In contrast, ptrace mode will notice it and kill the
corresponding thread when it fails to run it.
Fix this issue using a new IRQ that is fired after a SIGCHLD and keeping
an (internal) list of all MMs. In the IRQ handler, find the affected MM
and set its PID to -1 as well as the futex variable to FUTEX_IN_KERN.
This, together with futex returning -EINTR after the signal is
sufficient to implement a race-free detection of a child dying.
Note that this also enables IRQ handling while starting a userspace
process. This should be safe and SECCOMP requires the IRQ in case the
process does not come up properly.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250602130052.545733-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The segv handler is called slightly differently depending on whether
PTRACE_FULL_FAULTINFO is set or not (32bit vs. 64bit). The only
difference is that we don't try to pass the registers and instruction
pointer to the segv handler.
It would be good to either document or remove the difference, but I do
not know why this difference exists. And, passing NULL can even result
in a crash.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20250602130052.545733-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
tgkill is a quite old syscall since kernel 2.5.75, but unfortunately glibc
doesn't support it before 2.30. Thus some systems fail to compile the
latest UserMode Linux.
Here is the compile error I encountered when I tried to compile UML in
my system shipped with glibc-2.28.
CALL scripts/checksyscalls.sh
CC arch/um/os-Linux/sigio.o
In file included from arch/um/os-Linux/sigio.c:17:
arch/um/os-Linux/sigio.c: In function ‘write_sigio_thread’:
arch/um/os-Linux/sigio.c:49:19: error: implicit declaration of function ‘tgkill’; did you mean ‘kill’? [-Werror=implicit-function-declaration]
CATCH_EINTR(r = tgkill(pid, pid, SIGIO));
^~~~~~
./arch/um/include/shared/os.h:21:48: note: in definition of macro ‘CATCH_EINTR’
#define CATCH_EINTR(expr) while ((errno = 0, ((expr) < 0)) && (errno == EINTR))
^~~~
cc1: some warnings being treated as errors
Fix it by Replacing glibc call with raw syscall.
Fixes: 33c9da5dfb18 ("um: Rewrite the sigio workaround based on epoll and tgkill")
Signed-off-by: Yongting Lin <linyongting@gmail.com>
Link: https://patch.msgid.link/20250527151222.40371-1-linyongting@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
These legacy network transports were marked as obsolete in commit
40814b98a570 ("um: Mark non-vector net transports as obsolete").
More than five years have passed since then. Remove these network
transports to reduce the maintenance burden.
Suggested-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20250503051710.3286595-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The existing sigio workaround implementation removes FDs from the
poll when events are triggered, requiring users to re-add them via
add_sigio_fd() after processing. This introduces a potential race
condition between FD removal in write_sigio_thread() and next_poll
update in __add_sigio_fd(), and is inefficient due to frequent FD
removal and re-addition. Rewrite the implementation based on epoll
and tgkill for improved efficiency and reliability.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250315161910.4082396-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Directly creating helper threads with VM_CLONE using clone can
compromise the thread safety of errno. Since all these helper
threads have been converted to use os_run_helper_thread(), let's
prevent using this flag in run_helper_thread().
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250319135523.97050-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The write_sigio thread and UML kernel thread share the same errno,
which can lead to conflicts when both call syscalls concurrently.
Switch to the pthread-based helper to address this issue.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250319135523.97050-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Introduce a new set of utility functions that can be used to create
pthread-based helpers. Helper threads created in this way will ensure
thread safety for errno while sharing the same memory space.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250319135523.97050-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There is no need to override the default version of this function
anymore as UML now has proper _nofault memory access functions.
Doing this also fixes the fact that the implementation was incorrect as
using mincore() will incorrectly flag pages as inaccessible if they were
swapped out by the host.
Fixes: f75b1b1bedfb ("um: Implement probe_kernel_read()")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250210160926.420133-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Mark read-only data actually read-only (simple mprotect), and
to be able to test it also implement _nofault accesses. This
works by setting up a new "segv_continue" pointer in current,
and then when we hit a segfault we change the signal return
context so that we continue at that address. The code using
this sets it up so that it jumps to a label and then aborts
the access that way, returning -EFAULT.
It's possible to optimize the ___backtrack_faulted() thing by
using asm goto (compiler version dependent) and/or gcc's (not
sure if clang has it) &&label extension, but at least in one
attempt I made the && caused the compiler to not load -EFAULT
into the register in case of jumping to the &&label from the
fault handler. So leave it like this for now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Co-developed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250210160926.420133-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The stub execution uses the somewhat new close_range and execveat
syscalls. Of these two, the execveat call is essential, but the
close_range call is more about stub process hygiene rather than safety
(and its result is ignored).
Replace both calls with a raw syscall as older machines might not have a
recent enough kernel for close_range (with CLOSE_RANGE_CLOEXEC) or a
libc that does not yet expose both of the syscalls.
Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs")
Reported-by: Glenn Washburn <development@efficientek.com>
Closes: https://lore.kernel.org/20250108022404.05e0de1e@crass-HP-ZBook-15-G2
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250113094107.674738-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
It's no longer used since commit 42fda66387da ("uml: throw out
CONFIG_MODE_TT").
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241128083137.2219830-9-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It's no longer used since commit 11100b1dfb6e ("uml: delete
unused code").
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241128083137.2219830-8-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It's only invoked during boot from main().
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241128083137.2219830-7-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It's only invoked during boot from main().
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241128083137.2219830-6-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It's only invoked during boot from main().
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241128083137.2219830-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This selects the THREAD_INFO_IN_TASK option for UM and changes the way
that the current task is discovered. This is trivial though, as UML
already tracks the current task in cpu_tasks[] and this can be used to
retrieve it.
Also remove the signal handler code that copies the thread information
into the IRQ stack. It is obsolete now, which also means that the
mentioned race condition cannot happen anymore.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Hajime Tazaki <thehajime@gmail.com>
Link: https://patch.msgid.link/20241111102910.46512-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The show_stack function had some code to detect double faults. However,
the logic is wrong and it would e.g. trigger if a WARNING happened
inside an IRQ.
Remove it without trying to add a new logic. The current behaviour,
which will just fault repeatedly until the IRQ stack is used up and the
host kills UML, seems to be good enough.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There is no need to sync the stub code to "disk" for the other process
to see the correct memory. Drop the fsync there and remove the helper
function.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since commit a95b37e20db9 ("kbuild: get <linux/compiler_types.h> out of
<linux/kconfig.h>") we can safely include these files in userspace code.
Doing so simplifies matters as options do not need to be exported via
asm-offsets.h anymore.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There is no point in either dumping the KASAN shadow memory or doing
copy-on-write after a fork on these memory regions.
This considerably speeds up coredump generation.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The write_sigio thread is not really a traditional thread. Set
the parent-death signal for it to ensure that it will be killed
if the UML kernel dies unexpectedly without proper cleanup.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241024142828.2612828-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This helper can be used to set the parent-death signal of the calling
process to SIGKILL to ensure that the process will be killed if the
UML kernel dies unexpectedly without proper cleanup. This helper will
be used in the follow-up patches.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241024142828.2612828-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Evidently, PATH_MAX isn't always defined, at least not via <limits.h>.
Simply remove the use and replace it by a constant 4k. As stat::st_size
is zero for /proc/self/exe we can't even size it automatically, and it
seems unlikely someone's going to try to run UML with such a path.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410240553.gYNIXN8i-lkp@intel.com/
Fixes: 031acdcfb566 ("um: restore process name")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The PTRACE_GETREGSET API has now existed since Linux 2.6.33. The XSAVE
CPU feature should also be sufficiently common to be able to rely on it.
With this, define our internal FP state to be the hosts XSAVE data. Add
discovery for the hosts XSAVE size and place the FP registers at the end
of task_struct so that we can adjust the size at runtime.
Next we can implement the regset API on top and update the signal
handling as well as ptrace APIs to use them. Also switch coredump
creation to use the regset API and finally set HAVE_ARCH_TRACEHOOK.
This considerably improves the signal frames. Previously they might not
have contained all the registers (i386) and also did not have the
sizes and magic values set to the correct values to permit userspace to
decode the frame.
As a side effect, this will permit UML to run on hosts with newer CPU
extensions (such as AMX) that need even more register state.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241023094120.4083426-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In time-travel mode userspace can do a lot of work without any time
passing. Unfortunately, this can result in OOM situations as the RCU
core code will never be run.
Work around this by keeping track of userspace processes that do not
yield for a lot of operations. When this happens, insert a jiffie into
the sched_clock clock to account time against the process and cause the
bookkeeping to run.
As sched_clock is used for tracing, it is useful to keep it in sync
between the different VMs. As such, try to remove added ticks again when
the actual clock ticks.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241010142537.1134685-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will
always be set. And the corresponding page will always be mapped or
unmapped depending on whether the PTE is present or not. The check
on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will
allow us to simplify the code and remove the unreachable code.
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This parameter is UML specific and is unknown to kernel. It should not
be propagated to kernel, otherwise it could be passed to user space as
a command line option by kernel with a warning like:
Unknown kernel command line parameters "noreboot", will be passed to user space.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-6-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|