linux-2.6-block.git
8 months agoum: move thread info into task
Benjamin Berg [Mon, 11 Nov 2024 10:29:10 +0000 (11:29 +0100)]
um: move thread info into task

This selects the THREAD_INFO_IN_TASK option for UM and changes the way
that the current task is discovered. This is trivial though, as UML
already tracks the current task in cpu_tasks[] and this can be used to
retrieve it.

Also remove the signal handler code that copies the thread information
into the IRQ stack. It is obsolete now, which also means that the
mentioned race condition cannot happen anymore.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Hajime Tazaki <thehajime@gmail.com>
Link: https://patch.msgid.link/20241111102910.46512-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Always dump trace for specified task in show_stack
Tiwei Bie [Wed, 6 Nov 2024 10:39:33 +0000 (18:39 +0800)]
um: Always dump trace for specified task in show_stack

Currently, show_stack() always dumps the trace of the current task.
However, it should dump the trace of the specified task if one is
provided. Otherwise, things like running "echo t > sysrq-trigger"
won't work as expected.

Fixes: 970e51feaddb ("um: Add support for CONFIG_STACKTRACE")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241106103933.1132365-1-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: vector: Do not use drvdata in release
Tiwei Bie [Mon, 4 Nov 2024 16:32:03 +0000 (00:32 +0800)]
um: vector: Do not use drvdata in release

The drvdata is not available in release. Let's just use container_of()
to get the vector_device instance. Otherwise, removing a vector device
will result in a crash:

RIP: 0033:vector_device_release+0xf/0x50
RSP: 00000000e187bc40  EFLAGS: 00010202
RAX: 0000000060028f61 RBX: 00000000600f1baf RCX: 00000000620074e0
RDX: 000000006220b9c0 RSI: 0000000060551c80 RDI: 0000000000000000
RBP: 00000000e187bc50 R08: 00000000603ad594 R09: 00000000e187bb70
R10: 000000000000135a R11: 00000000603ad422 R12: 00000000623ae028
R13: 000000006287a200 R14: 0000000062006d30 R15: 00000000623700b6
Kernel panic - not syncing: Segfault with no mm
CPU: 0 UID: 0 PID: 16 Comm: kworker/0:1 Not tainted 6.12.0-rc6-g59b723cd2adb #1
Workqueue: events mc_work_proc
Stack:
 60028f61 623ae028 e187bc80 60276fcd
 6220b9c0 603f5820 623ae028 00000000
 e187bcb0 603a2bcd 623ae000 62370010
Call Trace:
 [<60028f61>] ? vector_device_release+0x0/0x50
 [<60276fcd>] device_release+0x70/0xba
 [<603a2bcd>] kobject_put+0xba/0xe7
 [<60277265>] put_device+0x19/0x1c
 [<60281266>] platform_device_put+0x26/0x29
 [<60281e5f>] platform_device_unregister+0x2c/0x2e
 [<60029422>] vector_remove+0x52/0x58
 [<60031316>] ? mconsole_reply+0x0/0x50
 [<600310c8>] mconsole_remove+0x160/0x1cc
 [<603b19f4>] ? strlen+0x0/0x15
 [<60066611>] ? __dequeue_entity+0x1a9/0x206
 [<600666a7>] ? set_next_entity+0x39/0x63
 [<6006666e>] ? set_next_entity+0x0/0x63
 [<60038fa6>] ? um_set_signals+0x0/0x43
 [<6003070c>] mc_work_proc+0x77/0x91
 [<60057664>] process_scheduled_works+0x1b3/0x2dd
 [<60055f32>] ? assign_work+0x0/0x58
 [<60057f0a>] worker_thread+0x1e9/0x293
 [<6005406f>] ? set_pf_worker+0x0/0x64
 [<6005d65d>] ? arch_local_irq_save+0x0/0x2d
 [<6005d748>] ? kthread_exit+0x0/0x3a
 [<60057d21>] ? worker_thread+0x0/0x293
 [<6005dbf1>] kthread+0x126/0x12b
 [<600219c5>] new_thread_handler+0x85/0xb6

Cc: stable@vger.kernel.org
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20241104163203.435515-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: net: Do not use drvdata in release
Tiwei Bie [Mon, 4 Nov 2024 16:32:02 +0000 (00:32 +0800)]
um: net: Do not use drvdata in release

The drvdata is not available in release. Let's just use container_of()
to get the uml_net instance. Otherwise, removing a network device will
result in a crash:

RIP: 0033:net_device_release+0x10/0x6f
RSP: 00000000e20c7c40  EFLAGS: 00010206
RAX: 000000006002e4e7 RBX: 00000000600f1baf RCX: 00000000624074e0
RDX: 0000000062778000 RSI: 0000000060551c80 RDI: 00000000627af028
RBP: 00000000e20c7c50 R08: 00000000603ad594 R09: 00000000e20c7b70
R10: 000000000000135a R11: 00000000603ad422 R12: 0000000000000000
R13: 0000000062c7af00 R14: 0000000062406d60 R15: 00000000627700b6
Kernel panic - not syncing: Segfault with no mm
CPU: 0 UID: 0 PID: 29 Comm: kworker/0:2 Not tainted 6.12.0-rc6-g59b723cd2adb #1
Workqueue: events mc_work_proc
Stack:
 627af028 62c7af00 e20c7c80 60276fcd
 62778000 603f5820 627af028 00000000
 e20c7cb0 603a2bcd 627af000 62770010
Call Trace:
 [<60276fcd>] device_release+0x70/0xba
 [<603a2bcd>] kobject_put+0xba/0xe7
 [<60277265>] put_device+0x19/0x1c
 [<60281266>] platform_device_put+0x26/0x29
 [<60281e5f>] platform_device_unregister+0x2c/0x2e
 [<6002ec9c>] net_remove+0x63/0x69
 [<60031316>] ? mconsole_reply+0x0/0x50
 [<600310c8>] mconsole_remove+0x160/0x1cc
 [<60087d40>] ? __remove_hrtimer+0x38/0x74
 [<60087ff8>] ? hrtimer_try_to_cancel+0x8c/0x98
 [<6006b3cf>] ? dl_server_stop+0x3f/0x48
 [<6006b390>] ? dl_server_stop+0x0/0x48
 [<600672e8>] ? dequeue_entities+0x327/0x390
 [<60038fa6>] ? um_set_signals+0x0/0x43
 [<6003070c>] mc_work_proc+0x77/0x91
 [<60057664>] process_scheduled_works+0x1b3/0x2dd
 [<60055f32>] ? assign_work+0x0/0x58
 [<60057f0a>] worker_thread+0x1e9/0x293
 [<6005406f>] ? set_pf_worker+0x0/0x64
 [<6005d65d>] ? arch_local_irq_save+0x0/0x2d
 [<6005d748>] ? kthread_exit+0x0/0x3a
 [<60057d21>] ? worker_thread+0x0/0x293
 [<6005dbf1>] kthread+0x126/0x12b
 [<600219c5>] new_thread_handler+0x85/0xb6

Cc: stable@vger.kernel.org
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20241104163203.435515-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: ubd: Do not use drvdata in release
Tiwei Bie [Mon, 4 Nov 2024 16:32:01 +0000 (00:32 +0800)]
um: ubd: Do not use drvdata in release

The drvdata is not available in release. Let's just use container_of()
to get the ubd instance. Otherwise, removing a ubd device will result
in a crash:

RIP: 0033:blk_mq_free_tag_set+0x1f/0xba
RSP: 00000000e2083bf0  EFLAGS: 00010246
RAX: 000000006021463a RBX: 0000000000000348 RCX: 0000000062604d00
RDX: 0000000004208060 RSI: 00000000605241a0 RDI: 0000000000000348
RBP: 00000000e2083c10 R08: 0000000062414010 R09: 00000000601603f7
R10: 000000000000133a R11: 000000006038c4bd R12: 0000000000000000
R13: 0000000060213a5c R14: 0000000062405d20 R15: 00000000604f7aa0
Kernel panic - not syncing: Segfault with no mm
CPU: 0 PID: 17 Comm: kworker/0:1 Not tainted 6.8.0-rc3-00107-gba3f67c11638 #1
Workqueue: events mc_work_proc
Stack:
 00000000 604f7ef0 62c5d000 62405d20
 e2083c30 6002c776 6002c755 600e47ff
 e2083c60 6025ffe3 04208060 603d36e0
Call Trace:
 [<6002c776>] ubd_device_release+0x21/0x55
 [<6002c755>] ? ubd_device_release+0x0/0x55
 [<600e47ff>] ? kfree+0x0/0x100
 [<6025ffe3>] device_release+0x70/0xba
 [<60381d6a>] kobject_put+0xb5/0xe2
 [<6026027b>] put_device+0x19/0x1c
 [<6026a036>] platform_device_put+0x26/0x29
 [<6026ac5a>] platform_device_unregister+0x2c/0x2e
 [<6002c52e>] ubd_remove+0xb8/0xd6
 [<6002bb74>] ? mconsole_reply+0x0/0x50
 [<6002b926>] mconsole_remove+0x160/0x1cc
 [<6002bbbc>] ? mconsole_reply+0x48/0x50
 [<6003379c>] ? um_set_signals+0x3b/0x43
 [<60061c55>] ? update_min_vruntime+0x14/0x70
 [<6006251f>] ? dequeue_task_fair+0x164/0x235
 [<600620aa>] ? update_cfs_group+0x0/0x40
 [<603a0e77>] ? __schedule+0x0/0x3ed
 [<60033761>] ? um_set_signals+0x0/0x43
 [<6002af6a>] mc_work_proc+0x77/0x91
 [<600520b4>] process_scheduled_works+0x1af/0x2c3
 [<6004ede3>] ? assign_work+0x0/0x58
 [<600527a1>] worker_thread+0x2f7/0x37a
 [<6004ee3b>] ? set_pf_worker+0x0/0x64
 [<6005765d>] ? arch_local_irq_save+0x0/0x2d
 [<60058e07>] ? kthread_exit+0x0/0x3a
 [<600524aa>] ? worker_thread+0x0/0x37a
 [<60058f9f>] kthread+0x130/0x135
 [<6002068e>] new_thread_handler+0x85/0xb6

Cc: stable@vger.kernel.org
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20241104163203.435515-3-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: ubd: Initialize ubd's disk pointer in ubd_add
Tiwei Bie [Mon, 4 Nov 2024 16:32:00 +0000 (00:32 +0800)]
um: ubd: Initialize ubd's disk pointer in ubd_add

Currently, the initialization of the disk pointer in the ubd structure
is missing. It should be initialized with the allocated gendisk pointer
in ubd_add().

Fixes: 32621ad7a7ea ("ubd: remove the ubd_gendisk array")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20241104163203.435515-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: virtio_uml: query the number of vqs if supported
Benjamin Berg [Sun, 3 Nov 2024 21:28:54 +0000 (22:28 +0100)]
um: virtio_uml: query the number of vqs if supported

When the VHOST_USER_PROTOCOL_F_MQ protocol feature flag is set, we can
query the maximum number of virtual queues. Do so when supported and
extend the check to verify that we are not trying to allocate more
queues.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103212854.1436046-5-benjamin@sipsolutions.net
[add a message to the WARN_ON]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: virtio_uml: fix call_fd IRQ allocation
Benjamin Berg [Sun, 3 Nov 2024 21:28:53 +0000 (22:28 +0100)]
um: virtio_uml: fix call_fd IRQ allocation

If the device does not support slave requests, then the IRQ will not yet
be allocated. So initialize the IRQ to UM_IRQ_ALLOC so that it will be
allocated if none has been assigned yet and store it slightly later when
we know that it will not be immediately unregistered again.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103212854.1436046-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: virtio_uml: send SET_MEM_TABLE message with the exact size
Benjamin Berg [Sun, 3 Nov 2024 21:28:51 +0000 (22:28 +0100)]
um: virtio_uml: send SET_MEM_TABLE message with the exact size

The rust based userspace vhost devices are very strict and will not
accept the message if it is longer than required. So, only include the
data for the first memory region.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103212854.1436046-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: remove broken double fault detection
Benjamin Berg [Sun, 3 Nov 2024 15:05:06 +0000 (16:05 +0100)]
um: remove broken double fault detection

The show_stack function had some code to detect double faults. However,
the logic is wrong and it would e.g. trigger if a WARNING happened
inside an IRQ.

Remove it without trying to add a new logic. The current behaviour,
which will just fault repeatedly until the IRQ stack is used up and the
host kills UML, seems to be good enough.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: remove duplicate UM_NSEC_PER_SEC definition
Benjamin Berg [Sun, 3 Nov 2024 15:05:05 +0000 (16:05 +0100)]
um: remove duplicate UM_NSEC_PER_SEC definition

Just remove the first entry as there is a second later on.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: remove file sync for stub data
Benjamin Berg [Sun, 3 Nov 2024 15:05:04 +0000 (16:05 +0100)]
um: remove file sync for stub data

There is no need to sync the stub code to "disk" for the other process
to see the correct memory. Drop the fsync there and remove the helper
function.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: always include kconfig.h and compiler-version.h
Benjamin Berg [Sun, 3 Nov 2024 15:05:03 +0000 (16:05 +0100)]
um: always include kconfig.h and compiler-version.h

Since commit a95b37e20db9 ("kbuild: get <linux/compiler_types.h> out of
<linux/kconfig.h>") we can safely include these files in userspace code.
Doing so simplifies matters as options do not need to be exported via
asm-offsets.h anymore.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: set DONTDUMP and DONTFORK flags on KASAN shadow memory
Benjamin Berg [Sun, 3 Nov 2024 15:05:02 +0000 (16:05 +0100)]
um: set DONTDUMP and DONTFORK flags on KASAN shadow memory

There is no point in either dumping the KASAN shadow memory or doing
copy-on-write after a fork on these memory regions.

This considerably speeds up coredump generation.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: fix sparse warnings in signal code
Benjamin Berg [Thu, 31 Oct 2024 14:20:17 +0000 (15:20 +0100)]
um: fix sparse warnings in signal code

sparse reports that various places were missing the __user tag in casts.
In addition, one location was using 0 instead of NULL.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241031142017.430420-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: fix sparse warnings from regset refactor
Benjamin Berg [Thu, 31 Oct 2024 14:20:16 +0000 (15:20 +0100)]
um: fix sparse warnings from regset refactor

Some variables were not tagged with __user and another was not marked as
static even though it should be.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410280655.gOlEFwdG-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202410281821.WSPsAwq7-lkp@intel.com/
Fixes: 3f17fed21491 ("um: switch to regset API and depend on XSTATE")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241031142017.430420-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Remove double zero check
Shaojie Dong [Fri, 25 Oct 2024 09:02:37 +0000 (17:02 +0800)]
um: Remove double zero check

free_pages() performs a parameter null check inside
therefore remove double zero check here.

Signed-off-by: Shaojie Dong <quic_shaojied@quicinc.com>
Link: https://patch.msgid.link/20241025-upstream_branch-v5-1-b8998beb2c64@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: fix stub exe build with CONFIG_GCOV
Johannes Berg [Fri, 25 Oct 2024 08:27:01 +0000 (10:27 +0200)]
um: fix stub exe build with CONFIG_GCOV

CONFIG_GCOV is special and only in UML since it builds the
kernel with a "userspace" option. This is fine, but the stub
is even more special and not really a full userspace process,
so it then fails to link as reported.

Remove the GCOV options from the stub build.

For good measure, also remove the GPROF options, even though
they don't seem to cause build failures now.

To be able to do this, export the specific options (GCOV_OPT
and GPROF_OPT) but rename them so there's less chance of any
conflicts.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410242238.SXhs2kQ4-lkp@intel.com/
Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs")
Link: https://patch.msgid.link/20241025102700.9fbb9c34473f.I7f1537fe075638f8da64beb52ef6c9e5adc51bc3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Use os_set_pdeathsig helper in winch thread/process
Tiwei Bie [Thu, 24 Oct 2024 14:28:28 +0000 (22:28 +0800)]
um: Use os_set_pdeathsig helper in winch thread/process

Since we have a helper now, let's switch to using it. It will make
the code slightly more consistent.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241024142828.2612828-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Set parent-death signal for write_sigio thread/process
Tiwei Bie [Thu, 24 Oct 2024 14:28:27 +0000 (22:28 +0800)]
um: Set parent-death signal for write_sigio thread/process

The write_sigio thread is not really a traditional thread. Set
the parent-death signal for it to ensure that it will be killed
if the UML kernel dies unexpectedly without proper cleanup.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241024142828.2612828-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Set parent-death signal for ubd io thread/process
Tiwei Bie [Thu, 24 Oct 2024 14:28:26 +0000 (22:28 +0800)]
um: Set parent-death signal for ubd io thread/process

The ubd io thread is not really a traditional thread. Set the
parent-death signal for it to ensure that it will be killed if
the UML kernel dies unexpectedly without proper cleanup.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241024142828.2612828-3-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Add os_set_pdeathsig helper function
Tiwei Bie [Thu, 24 Oct 2024 14:28:25 +0000 (22:28 +0800)]
um: Add os_set_pdeathsig helper function

This helper can be used to set the parent-death signal of the calling
process to SIGKILL to ensure that the process will be killed if the
UML kernel dies unexpectedly without proper cleanup. This helper will
be used in the follow-up patches.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241024142828.2612828-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: remove PATH_MAX use
Johannes Berg [Thu, 24 Oct 2024 07:52:51 +0000 (09:52 +0200)]
um: remove PATH_MAX use

Evidently, PATH_MAX isn't always defined, at least not via <limits.h>.
Simply remove the use and replace it by a constant 4k. As stat::st_size
is zero for /proc/self/exe we can't even size it automatically, and it
seems unlikely someone's going to try to run UML with such a path.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410240553.gYNIXN8i-lkp@intel.com/
Fixes: 031acdcfb566 ("um: restore process name")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: switch to regset API and depend on XSTATE
Benjamin Berg [Wed, 23 Oct 2024 09:41:20 +0000 (11:41 +0200)]
um: switch to regset API and depend on XSTATE

The PTRACE_GETREGSET API has now existed since Linux 2.6.33. The XSAVE
CPU feature should also be sufficiently common to be able to rely on it.

With this, define our internal FP state to be the hosts XSAVE data. Add
discovery for the hosts XSAVE size and place the FP registers at the end
of task_struct so that we can adjust the size at runtime.

Next we can implement the regset API on top and update the signal
handling as well as ptrace APIs to use them. Also switch coredump
creation to use the regset API and finally set HAVE_ARCH_TRACEHOOK.

This considerably improves the signal frames. Previously they might not
have contained all the registers (i386) and also did not have the
sizes and magic values set to the correct values to permit userspace to
decode the frame.

As a side effect, this will permit UML to run on hosts with newer CPU
extensions (such as AMX) that need even more register state.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241023094120.4083426-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: insert scheduler ticks when userspace does not yield
Benjamin Berg [Thu, 10 Oct 2024 14:25:37 +0000 (16:25 +0200)]
um: insert scheduler ticks when userspace does not yield

In time-travel mode userspace can do a lot of work without any time
passing. Unfortunately, this can result in OOM situations as the RCU
core code will never be run.

Work around this by keeping track of userspace processes that do not
yield for a lot of operations. When this happens, insert a jiffie into
the sched_clock clock to account time against the process and cause the
bookkeeping to run.

As sched_clock is used for tracing, it is useful to keep it in sync
between the different VMs. As such, try to remove added ticks again when
the actual clock ticks.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241010142537.1134685-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Rename _PAGE_NEWPAGE to _PAGE_NEEDSYNC
Tiwei Bie [Fri, 11 Oct 2024 10:23:54 +0000 (18:23 +0800)]
um: Rename _PAGE_NEWPAGE to _PAGE_NEEDSYNC

The _PAGE_NEWPAGE bit does not really indicate that this is a new page,
but rather whether this entry needs to be synced or not. Renaming it
to _PAGE_NEEDSYNC will make it more clear how everything ties together.

Suggested-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011102354.1682626-3-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Abandon the _PAGE_NEWPROT bit
Tiwei Bie [Fri, 11 Oct 2024 10:23:53 +0000 (18:23 +0800)]
um: Abandon the _PAGE_NEWPROT bit

When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will
always be set. And the corresponding page will always be mapped or
unmapped depending on whether the PTE is present or not. The check
on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will
allow us to simplify the code and remove the unreachable code.

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: vdso: Always reject undefined references in during linking
Thomas Weißschuh [Fri, 11 Oct 2024 09:18:26 +0000 (11:18 +0200)]
um: vdso: Always reject undefined references in during linking

Instead of using a custom script to detect and fail on undefined
references, use --no-undefined for all VDSO linker invocations.

Drop the now unused checkundef.sh script.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20241011-vdso-checkundef-v1-2-1a46e0352d20@linutronix.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Do not propagate initrd parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:41 +0000 (12:04 +0800)]
um: Do not propagate initrd parameter to kernel

This parameter is UML specific. It specifies the name of the file
containing the initrd image, which is unknown to kernel.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-10-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: hostaudio: Do not propagate mixer parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:40 +0000 (12:04 +0800)]
um: hostaudio: Do not propagate mixer parameter to kernel

This parameter is UML specific and is unknown to kernel. It should
not be propagated to kernel, otherwise it will trigger a warning and
be passed to user space as an environment option.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-9-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: hostaudio: Do not propagate dsp parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:39 +0000 (12:04 +0800)]
um: hostaudio: Do not propagate dsp parameter to kernel

This parameter is UML specific and is unknown to kernel. It should
not be propagated to kernel, otherwise it will trigger a warning and
be passed to user space as an environment option.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-8-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agohostfs: Do not propagate hostfs parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:38 +0000 (12:04 +0800)]
hostfs: Do not propagate hostfs parameter to kernel

This parameter is UML specific and is unknown to kernel. It should not
be propagated to kernel, otherwise it will be passed to user space as
an environment option by kernel with a warning like:

Unknown kernel command line parameters "hostfs=/foo", will be passed to user space.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-7-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Do not propagate noreboot parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:37 +0000 (12:04 +0800)]
um: Do not propagate noreboot parameter to kernel

This parameter is UML specific and is unknown to kernel. It should not
be propagated to kernel, otherwise it could be passed to user space as
a command line option by kernel with a warning like:

Unknown kernel command line parameters "noreboot", will be passed to user space.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-6-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Do not propagate dtb parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:36 +0000 (12:04 +0800)]
um: Do not propagate dtb parameter to kernel

This parameter is UML specific and is unknown to kernel. It should not
be propagated to kernel, otherwise it will be passed to user space as
an environment option by kernel with a warning like:

Unknown kernel command line parameters "dtb=/foo", will be passed to user space.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Do not propagate uml_dir parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:35 +0000 (12:04 +0800)]
um: Do not propagate uml_dir parameter to kernel

This parameter is UML specific and is unknown to kernel. It should not
be propagated to kernel, otherwise it will be passed to user space as
an environment option by kernel with a warning like:

Unknown kernel command line parameters "uml_dir=/foo", will be passed to user space.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Do not propagate mem parameter to kernel
Tiwei Bie [Fri, 11 Oct 2024 04:04:34 +0000 (12:04 +0800)]
um: Do not propagate mem parameter to kernel

This parameter is UML specific and is unknown to kernel. It should not
be propagated to kernel, otherwise it will be passed to user space as
an environment option by kernel with a warning like:

Unknown kernel command line parameters "mem=2G", will be passed to user space.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-3-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: Remove UML specific debug parameter
Tiwei Bie [Fri, 11 Oct 2024 04:04:33 +0000 (12:04 +0800)]
um: Remove UML specific debug parameter

It does nothing but emit a warning when 'debug' is provided in the
kernel command line. It can be a bit annoying, as 'debug' is also a
valid kernel parameter to enable kernel debugging.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: remove fault_catcher infrastructure
Johannes Berg [Thu, 10 Oct 2024 20:45:14 +0000 (22:45 +0200)]
um: remove fault_catcher infrastructure

This was perhaps intended to do _nofault copies, but the
real reason is lost to history. Remove this, it's not
needed, and using longjmp() out of the middle of the
signal handler with all the state it has modified is
not going to be a good idea anyway.

Link: https://patch.msgid.link/20241010224513.901c4d390b3e.Ia74742668b44603c1ca23dd36f90e964e6e7ee55@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: restore process name
Johannes Berg [Thu, 10 Oct 2024 14:14:12 +0000 (16:14 +0200)]
um: restore process name

After the execve() to disable ASLR, comm is now "exe",
which is a bit confusing. Use readlink() to get this
to the right name again.

Disable stack frame size warnings on main.o since it's
part of the initial userspace and can use larger stack.

Fixes: 68b9883cc16e ("um: Discover host_task_size from envp")
Link: https://patch.msgid.link/20241010161411.c576e2aeb3e5.I244d4f34b8a8555ee5bec0e1cf5027bce4cc491b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
8 months agoum: make stub_exe _start() pure inline asm
Johannes Berg [Tue, 22 Oct 2024 12:02:38 +0000 (14:02 +0200)]
um: make stub_exe _start() pure inline asm

Since __attribute__((naked)) cannot be used with functions
containing C statements, just generate the few instructions
it needs in assembly directly.

While at it, fix the stack usage ("1 + 2*x - 1" is odd) and
document what it must do, and why it must adjust the stack.

Fixes: 8508a5e0e9db ("um: Fix misaligned stack in stub_exe")
Link: https://lore.kernel.org/linux-um/CABVgOSntH-uoOFMP5HwMXjx_f1osMnVdhgKRKm4uz6DFm2Lb8Q@mail.gmail.com/
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Fix misaligned stack in stub_exe
David Gow [Thu, 17 Oct 2024 23:10:08 +0000 (07:10 +0800)]
um: Fix misaligned stack in stub_exe

The stub_exe could segfault when built with some compilers (e.g. gcc
13.2.0), as SSE instructions which relied on stack alignment could be
generated, but the stack was misaligned.

This seems to be due to the __start entry point being run with a 16-byte
aligned stack, but the x86_64 SYSV ABI wanting the stack to be so
aligned _before_ a function call (so it is misaligned when the function
is entered due to the return address being pushed). The function
prologue then realigns it. Because the entry point is never _called_,
and hence there is no return address, the prologue is therefore actually
misaligning it, and causing the generated movaps instructions to
SIGSEGV. This results in the following error:
start_userspace : expected SIGSTOP, got status = 139

Don't generate this prologue for __start by using
__attribute__((naked)), which resolves the issue.

Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs")
Signed-off-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/linux-um/CABVgOS=boUoG6=LHFFhxEd8H8jDP1zOaPKFEjH+iy2n2Q5S2aQ@mail.gmail.com/
Link: https://patch.msgid.link/20241017231007.1500497-2-davidgow@google.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Disable auto variable initialization for stub_exe.c
Nathan Chancellor [Wed, 16 Oct 2024 21:12:38 +0000 (14:12 -0700)]
um: Disable auto variable initialization for stub_exe.c

When automatic variable initialization is enabled via
CONFIG_INIT_STACK_ALL_{PATTERN,ZERO}, clang will insert a call to
memset() to initialize an object created with __builtin_alloca(). This
ultimately breaks the build when linking stub_exe because it is a
standalone executable that does not include or link against memset().

  ld: arch/um/kernel/skas/stub_exe.o: in function `_start':
  arch/um/kernel/skas/stub_exe.c:83:(.ltext+0x15): undefined reference to `memset'

Disable automatic variable initialization for stub_exe.c by passing the
default value of 'uninitialized' to '-ftrivial-auto-var-init', which
avoids generating the call to memset(). This code is small and runs
quickly as it is just designed to set up an environment, so stack
variable initialization is unnecessary overhead for little gain.

Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20241016-uml-fix-stub_exe-clang-v1-2-3d6381dc5a78@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Fix passing '-n' to linker for stub_exe
Nathan Chancellor [Wed, 16 Oct 2024 21:12:37 +0000 (14:12 -0700)]
um: Fix passing '-n' to linker for stub_exe

When building stub_exe with clang, there is an error because '-n' is not
a recognized flag by the clang driver (which is being used to invoke the
linker):

  clang: error: unknown argument: '-n'

'-n' should be passed along to the linker, as it is the short flag for
'--nmagic', so prefix it with '-Wl,'.

Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20241016-uml-fix-stub_exe-clang-v1-1-3d6381dc5a78@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Switch to 4 level page tables on 64 bit
Benjamin Berg [Thu, 19 Sep 2024 12:45:11 +0000 (14:45 +0200)]
um: Switch to 4 level page tables on 64 bit

The larger memory space is useful to support more applications inside
UML. One example for this is ASAN instrumentation of userspace
applications which requires addresses that would otherwise not be
available.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-11-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: clear all memory in new userspace processes
Benjamin Berg [Thu, 19 Sep 2024 12:45:10 +0000 (14:45 +0200)]
um: clear all memory in new userspace processes

With the change to use execve() we can now safely clear the memory up to
STUB_START as rseq will not be trying to use memory in that region. Also,
on 64 bit the previous changes should mean that there is no usable
memory range above the stub.

Make the change and remove the comment as it is not needed anymore.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-10-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Discover host_task_size from envp
Benjamin Berg [Thu, 19 Sep 2024 12:45:09 +0000 (14:45 +0200)]
um: Discover host_task_size from envp

When loading the UML binary, the host kernel will place the stack at the
highest possible address. It will then map the program name and
environment variables onto the start of the stack.

As such, an easy way to figure out the host_task_size is to use the
highest pointer to an environment variable as a reference.

Ensure that this works by disabling address layout randomization and
re-executing UML in case it was enabled.

This increases the available TASK_SIZE for 64 bit UML considerably.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-9-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Limit TASK_SIZE to the addressable range
Benjamin Berg [Thu, 19 Sep 2024 12:45:08 +0000 (14:45 +0200)]
um: Limit TASK_SIZE to the addressable range

We may have a TASK_SIZE from the host that is bigger than UML is able to
address with a three-level pagetable on 64-bit. Guard against that by
clipping the maximum TASK_SIZE to the maximum addressable area.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-8-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Calculate stub data address relative to stub code
Benjamin Berg [Thu, 19 Sep 2024 12:45:07 +0000 (14:45 +0200)]
um: Calculate stub data address relative to stub code

Instead of using the current stack pointer, we can also use the current
instruction to calculate where the stub data is. With this the stub data
only needs to be aligned to a full page boundary.

Changing this has the advantage that we do not have a hole in the memory
space above the stub data (which would need to be explicitly cleared).

Another motivation to do this is that with the planned addition of a
SECCOMP based userspace the stack pointer may not be fully trustworthy.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-7-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Add compile time assert that stub fits on a page
Benjamin Berg [Thu, 19 Sep 2024 12:45:06 +0000 (14:45 +0200)]
um: Add compile time assert that stub fits on a page

The code assumes that the stub code can fit into a single page. This is
unlikely to ever change, but add a link time assert instead so that
there will be no hard to debug error.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-6-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Set parent death signal for winch thread/process
Benjamin Berg [Thu, 19 Sep 2024 12:45:05 +0000 (14:45 +0200)]
um: Set parent death signal for winch thread/process

The winch "thread" is really a separate process. Using prctl to set
PR_SET_PDEATHSIG ensures that this separate thread will be killed if the
UML kernel itself dies unexpectedly and does not perform proper cleanup.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Set parent death signal for userspace process
Benjamin Berg [Thu, 19 Sep 2024 12:45:04 +0000 (14:45 +0200)]
um: Set parent death signal for userspace process

Enable PR_SET_PDEATHSIG so that the UML userspace process will be killed
when the kernel exits unexpectedly.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: use execveat to create userspace MMs
Benjamin Berg [Thu, 19 Sep 2024 12:45:03 +0000 (14:45 +0200)]
um: use execveat to create userspace MMs

Using clone will not undo features that have been enabled by libc. An
example of this already happening is rseq, which could cause the kernel
to read/write memory of the userspace process. In the future the
standard library might also use mseal by default to protect itself,
which would also thwart our attempts at unmapping everything.

Solve all this by taking a step back and doing an execve into a tiny
static binary that sets up the minimal environment required for the
stub without using any standard library. That way we have a clean
execution environment that is fully under the control of UML.

Note that this changes things a bit as the FDs are not anymore shared
with the kernel. Instead, we explicitly share the FDs for the physical
memory and all existing iomem regions. Doing this is fine, as iomem
regions cannot be added at runtime.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-3-benjamin@sipsolutions.net
[use pipe() instead of pipe2(), remove unneeded close() calls]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Add generic stub_syscall1 function
Benjamin Berg [Thu, 19 Sep 2024 12:45:02 +0000 (14:45 +0200)]
um: Add generic stub_syscall1 function

The 64bit version did not have a stub_syscall1 function yet. Add it as
it will be useful to implement a static binary for stub loading.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: remove auxiliary FP registers
Benjamin Berg [Fri, 4 Oct 2024 23:38:21 +0000 (01:38 +0200)]
um: remove auxiliary FP registers

We do not need the extra save/restore of the FP registers when getting
the fault information. This was originally added in commit 2f56debd77a8
("uml: fix FP register corruption") but at that time the code was not
saving/restoring the FP registers when switching to userspace. This was
fixed in commit fbfe9c847edf ("um: Save FPU registers between task
switches") and since then the auxiliary registers have not been useful.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241004233821.2130874-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: remove dependency on undefined CC_CAN_LINK_STATIC_NO_RUNTIME_DEPS
Masahiro Yamada [Tue, 24 Sep 2024 11:33:40 +0000 (20:33 +0900)]
um: remove dependency on undefined CC_CAN_LINK_STATIC_NO_RUNTIME_DEPS

CC_CAN_LINK_STATIC_NO_RUNTIME_DEPS is not defined anywhere.

In the submitted patch set [1], the first patch "um/kconfig: introduce
CC_CAN_LINK_STATIC_NO_RUNTIME_DEPS" was not applied.

Only 2/3 and 3/3 were applied, which are now:

 - 730586ff7fad ("um: Allow static linking for non-glibc implementations")
 - 5e1121cd43d4 ("um: Some fixes to build UML with musl")

Given that nobody has noticed the missing first patch for many years,
it seems it was unnecessary.

[1]: https://lore.kernel.org/lkml/20200719210222.2811-1-ignat@cloudflare.com/

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://patch.msgid.link/20240924113342.32530-1-masahiroy@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Remove 3-level page table support on i386
Tiwei Bie [Wed, 18 Sep 2024 06:17:02 +0000 (14:17 +0800)]
um: Remove 3-level page table support on i386

The highmem support has been removed by commit a98a6d864d3b ("um:
Remove broken highmem support"). The 2-level page table is sufficient
on UML/i386 now. Remove the 3-level page table support on UML/i386
which is still marked as experimental.

Suggested-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240918061702.614837-1-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: always use the internal copy of the FP registers
Benjamin Berg [Fri, 13 Sep 2024 13:38:45 +0000 (15:38 +0200)]
um: always use the internal copy of the FP registers

When switching from userspace to the kernel, all registers including the
FP registers are copied into the kernel and restored later on. As such,
the true source for the FP register state is actually already in the
kernel and they should never be grabbed from the userspace process.

Change the various places to simply copy the data from the internal FP
register storage area. Note that on i386 the format of PTRACE_GETFPREGS
and PTRACE_GETFPXREGS is different enough that conversion would be
needed. With this patch, -EINVAL is returned if the non-native format is
requested.

The upside is, that this patchset fixes setting registers via ptrace
(which simply did not work before) as well as fixing setting floating
point registers using the mcontext on signal return on i386.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913133845.964292-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Fix the return value of elf_core_copy_task_fpregs
Tiwei Bie [Fri, 13 Sep 2024 02:33:02 +0000 (10:33 +0800)]
um: Fix the return value of elf_core_copy_task_fpregs

This function is expected to return a boolean value, which should be
true on success and false on failure.

Fixes: d1254b12c93e ("uml: fix x86_64 core dump crash")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240913023302.130300-1-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Fix the definition for physmem_size
Tiwei Bie [Mon, 16 Sep 2024 04:59:50 +0000 (12:59 +0800)]
um: Fix the definition for physmem_size

Currently physmem_size is defined as long long but declared locally
as unsigned long long before using it in separate .c files. Make them
match by defining physmem_size as unsigned long long and also move
the declaration to a common header to allow the compiler to check it.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240916045950.508910-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Remove highmem leftovers
Tiwei Bie [Mon, 16 Sep 2024 04:59:49 +0000 (12:59 +0800)]
um: Remove highmem leftovers

Highmem was only supported on UML/i386. And the support has been
removed by commit a98a6d864d3b ("um: Remove broken highmem support").
Remove the leftovers and stop UML from trying to setup highmem when
the sum of physmem_size and iomem_size exceeds max_physmem.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240916045950.508910-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Fix potential integer overflow during physmem setup
Tiwei Bie [Mon, 16 Sep 2024 04:59:48 +0000 (12:59 +0800)]
um: Fix potential integer overflow during physmem setup

This issue happens when the real map size is greater than LONG_MAX,
which can be easily triggered on UML/i386.

Fixes: fe205bdd1321 ("um: Print minimum physical memory requirement")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240916045950.508910-3-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Remove the redundant declaration of high_physmem
Tiwei Bie [Mon, 16 Sep 2024 04:59:47 +0000 (12:59 +0800)]
um: Remove the redundant declaration of high_physmem

high_physmem has already been declared in as-layout.h, so there is
no need to declare it explicitly in the .c file again.

While at it, group the declarations of __real_malloc and __real_free
together to make the code slightly more readable.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240916045950.508910-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Set HAVE_EFFICIENT_UNALIGNED_ACCESS for x86
Benjamin Berg [Fri, 13 Sep 2024 13:44:42 +0000 (15:44 +0200)]
um: Set HAVE_EFFICIENT_UNALIGNED_ACCESS for x86

The x86 port of UM has efficient unaligned access. Set the option as it
is appropriate and will e.g. cause UBSAN to not enable unaligned memory
access checking by default.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-6-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Remove unused os_getpgrp function
Benjamin Berg [Fri, 13 Sep 2024 13:44:41 +0000 (15:44 +0200)]
um: Remove unused os_getpgrp function

The function is not used anywhere.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Remove unused os_stop_process
Benjamin Berg [Fri, 13 Sep 2024 13:44:40 +0000 (15:44 +0200)]
um: Remove unused os_stop_process

The function is not used anywhere.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Remove unused os_process_parent
Benjamin Berg [Fri, 13 Sep 2024 13:44:39 +0000 (15:44 +0200)]
um: Remove unused os_process_parent

The function is not used anywhere.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoum: Remove unused os_process_pc
Benjamin Berg [Fri, 13 Sep 2024 13:44:38 +0000 (15:44 +0200)]
um: Remove unused os_process_pc

The function is not used anywhere in the codebase.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 months agoLinux 6.12-rc2 v6.12-rc2
Linus Torvalds [Sun, 6 Oct 2024 22:32:27 +0000 (15:32 -0700)]
Linux 6.12-rc2

9 months agoMerge tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masah...
Linus Torvalds [Sun, 6 Oct 2024 18:34:55 +0000 (11:34 -0700)]
Merge tag 'kbuild-fixes-v6.12' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Move non-boot built-in DTBs to the .rodata section

 - Fix Kconfig bugs

 - Fix maint scripts in the linux-image Debian package

 - Import some list macros to scripts/include/

* tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: deb-pkg: Remove blank first line from maint scripts
  kbuild: fix a typo dt_binding_schema -> dt_binding_schemas
  scripts: import more list macros
  kconfig: qconf: fix buffer overflow in debug links
  kconfig: qconf: move conf_read() before drawing tree pain
  kconfig: clear expr::val_is_valid when allocated
  kconfig: fix infinite loop in sym_calc_choice()
  kbuild: move non-boot built-in DTBs to .rodata section

9 months agoMerge tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Oct 2024 18:11:01 +0000 (11:11 -0700)]
Merge tag 'platform-drivers-x86-v6.12-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Intel PMC fix for suspend/resume issues on some Sky and Kaby Lake
   laptops

 - Intel Diamond Rapids hw-id additions

 - Documentation and MAINTAINERS fixes

 - Some other small fixes

* tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors
  platform/x86: wmi: Update WMI driver API documentation
  platform/x86: dell-ddv: Fix typo in documentation
  platform/x86: dell-sysman: add support for alienware products
  platform/x86/intel: power-domains: Add Diamond Rapids support
  platform/x86: ISST: Add Diamond Rapids to support list
  platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake
  platform/x86: dell-laptop: Do not fail when encountering unsupported batteries
  MAINTAINERS: Update Intel In Field Scan(IFS) entry
  platform/x86: ISST: Fix the KASAN report slab-out-of-bounds bug

9 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 6 Oct 2024 17:53:28 +0000 (10:53 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix pKVM error path on init, making sure we do not change critical
     system registers as we're about to fail

   - Make sure that the host's vector length is at capped by a value
     common to all CPUs

   - Fix kvm_has_feat*() handling of "negative" features, as the current
     code is pretty broken

   - Promote Joey to the status of official reviewer, while James steps
     down -- hopefully only temporarly

  x86:

   - Fix compilation with KVM_INTEL=KVM_AMD=n

   - Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use

  Selftests:

   - Fix compilation on non-x86 architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/reboot: emergency callbacks are now registered by common KVM code
  KVM: x86: leave kvm.ko out of the build if no vendor module is requested
  KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
  KVM: arm64: Fix kvm_has_feat*() handling of negative features
  KVM: selftests: Fix build on architectures other than x86_64
  KVM: arm64: Another reviewer reshuffle
  KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
  KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path

9 months agoMerge tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 6 Oct 2024 17:43:00 +0000 (10:43 -0700)]
Merge tag 'powerpc-6.12-3' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

 - Allow r30 to be used in vDSO code generation of getrandom

Thanks to Jason A. Donenfeld

* tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vdso: allow r30 in vDSO code generation of getrandom

9 months agokbuild: deb-pkg: Remove blank first line from maint scripts
Aaron Thompson [Fri, 4 Oct 2024 07:52:45 +0000 (07:52 +0000)]
kbuild: deb-pkg: Remove blank first line from maint scripts

The blank line causes execve() to fail:

  # strace ./postinst
  execve("./postinst", ...) = -1 ENOEXEC (Exec format error)
  strace: exec: Exec format error
  +++ exited with 1 +++

However running the scripts via shell does work (at least with bash)
because the shell attempts to execute the file as a shell script when
execve() fails.

Fixes: b611daae5efc ("kbuild: deb-pkg: split image and debug objects staging out into functions")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agokbuild: fix a typo dt_binding_schema -> dt_binding_schemas
Xu Yang [Wed, 25 Sep 2024 05:32:30 +0000 (13:32 +0800)]
kbuild: fix a typo dt_binding_schema -> dt_binding_schemas

If we follow "make help" to "make dt_binding_schema", we will see
below error:

$ make dt_binding_schema
make[1]: *** No rule to make target 'dt_binding_schema'.  Stop.
make: *** [Makefile:224: __sub-make] Error 2

It should be a typo. So this will fix it.

Fixes: 604a57ba9781 ("dt-bindings: kbuild: Add separate target/dependency for processed-schema.json")
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agoscripts: import more list macros
Sami Tolvanen [Mon, 23 Sep 2024 18:18:47 +0000 (18:18 +0000)]
scripts: import more list macros

Import list_is_first, list_is_last, list_replace, and list_replace_init.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agoplatform/x86: x86-android-tablets: Fix use after free on platform_device_register...
Hans de Goede [Sat, 5 Oct 2024 13:05:45 +0000 (15:05 +0200)]
platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors

x86_android_tablet_remove() frees the pdevs[] array, so it should not
be used after calling x86_android_tablet_remove().

When platform_device_register() fails, store the pdevs[x] PTR_ERR() value
into the local ret variable before calling x86_android_tablet_remove()
to avoid using pdevs[] after it has been freed.

Fixes: 5eba0141206e ("platform/x86: x86-android-tablets: Add support for instantiating platform-devs")
Fixes: e2200d3f26da ("platform/x86: x86-android-tablets: Add gpio_keys support to x86_android_tablet_init()")
Cc: stable@vger.kernel.org
Reported-by: Aleksandr Burakov <a.burakov@rosalinux.ru>
Closes: https://lore.kernel.org/platform-driver-x86/20240917120458.7300-1-a.burakov@rosalinux.ru/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20241005130545.64136-1-hdegoede@redhat.com
9 months agoplatform/x86: wmi: Update WMI driver API documentation
Armin Wolf [Sat, 5 Oct 2024 21:38:24 +0000 (23:38 +0200)]
platform/x86: wmi: Update WMI driver API documentation

The WMI driver core now passes the WMI event data to legacy notify
handlers, so WMI devices sharing notification IDs are now being
handled properly.

Fixes: e04e2b760ddb ("platform/x86: wmi: Pass event data directly to legacy notify handlers")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241005213825.701887-1-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86: dell-ddv: Fix typo in documentation
Anaswara T Rajan [Sat, 5 Oct 2024 07:00:56 +0000 (12:30 +0530)]
platform/x86: dell-ddv: Fix typo in documentation

Fix typo in word 'diagnostics' in documentation.

Signed-off-by: Anaswara T Rajan <anaswaratrajan@gmail.com>
Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241005070056.16326-1-anaswaratrajan@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86: dell-sysman: add support for alienware products
Crag Wang [Fri, 4 Oct 2024 15:27:58 +0000 (23:27 +0800)]
platform/x86: dell-sysman: add support for alienware products

Alienware supports firmware-attributes and has its own OEM string.

Signed-off-by: Crag Wang <crag_wang@dell.com>
Link: https://lore.kernel.org/r/20241004152826.93992-1-crag_wang@dell.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86/intel: power-domains: Add Diamond Rapids support
Srinivas Pandruvada [Thu, 3 Oct 2024 21:55:54 +0000 (14:55 -0700)]
platform/x86/intel: power-domains: Add Diamond Rapids support

Add Diamond Rapids (INTEL_PANTHERCOVE_X) to tpmi_cpu_ids to support
domaid id mappings.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20241003215554.3013807-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86: ISST: Add Diamond Rapids to support list
Srinivas Pandruvada [Thu, 3 Oct 2024 21:55:53 +0000 (14:55 -0700)]
platform/x86: ISST: Add Diamond Rapids to support list

Add Diamond Rapids (INTEL_PANTHERCOVE_X) to SST support list by adding
to isst_cpu_ids.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20241003215554.3013807-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoplatform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake
Hans de Goede [Thu, 3 Oct 2024 20:26:13 +0000 (22:26 +0200)]
platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake

There have been multiple reports that the ACPI PM Timer disabling is
causing Sky and Kaby Lake systems to hang on all suspend (s2idle, s3,
hibernate) methods.

Remove the acpi_pm_tmr_ctl_offset and acpi_pm_tmr_disable_bit settings from
spt_reg_map to disable the ACPI PM Timer disabling on Sky and Kaby Lake to
fix the hang on suspend.

Fixes: e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/linux-pm/18784f62-91ff-4d88-9621-6c88eb0af2b5@molgen.mpg.de/
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219346
Cc: Marek Maslanka <mmaslanka@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360/0596KF
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20241003202614.17181-2-hdegoede@redhat.com
9 months agoplatform/x86: dell-laptop: Do not fail when encountering unsupported batteries
Armin Wolf [Tue, 1 Oct 2024 21:28:35 +0000 (23:28 +0200)]
platform/x86: dell-laptop: Do not fail when encountering unsupported batteries

If the battery hook encounters a unsupported battery, it will
return an error. This in turn will cause the battery driver to
automatically unregister the battery hook.

On machines with multiple batteries however, this will prevent
the battery hook from handling the primary battery, since it will
always get unregistered upon encountering one of the unsupported
batteries.

Fix this by simply ignoring unsupported batteries.

Reviewed-by: Pali Rohár <pali@kernel.org>
Fixes: ab58016c68cc ("platform/x86:dell-laptop: Add knobs to change battery charge settings")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241001212835.341788-4-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoMAINTAINERS: Update Intel In Field Scan(IFS) entry
Jithu Joseph [Tue, 1 Oct 2024 17:08:08 +0000 (10:08 -0700)]
MAINTAINERS: Update Intel In Field Scan(IFS) entry

Ashok is no longer with Intel and his e-mail address will start bouncing
soon.  Update his email address to the new one he provided to ensure
correct contact details in the MAINTAINERS file.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Link: https://lore.kernel.org/r/20241001170808.203970-1-jithu.joseph@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoMerge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Sun, 6 Oct 2024 07:59:22 +0000 (03:59 -0400)]
Merge tag 'kvmarm-fixes-6.12-1' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical
  system registers as we're about to fail

- Make sure that the host's vector length is at capped by a value
  common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current
  code is pretty broken

- Promote Joey to the status of official reviewer, while James steps
  down -- hopefully only temporarly

9 months agox86/reboot: emergency callbacks are now registered by common KVM code
Paolo Bonzini [Tue, 1 Oct 2024 14:34:58 +0000 (10:34 -0400)]
x86/reboot: emergency callbacks are now registered by common KVM code

Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 months agoKVM: x86: leave kvm.ko out of the build if no vendor module is requested
Paolo Bonzini [Tue, 1 Oct 2024 14:15:01 +0000 (10:15 -0400)]
KVM: x86: leave kvm.ko out of the build if no vendor module is requested

kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
It provides no functionality on its own and it is unnecessary unless one
of the vendor-specific module is compiled.  In particular, /dev/kvm is
not created until one of kvm-intel.ko or kvm-amd.ko is loaded.

Use CONFIG_KVM to decide if it is built-in or a module, but use the
vendor-specific modules for the actual decision on whether to build it.

This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
are both disabled.  The cpu_emergency_register_virt_callback() function
is called from kvm.ko, but it is only defined if at least one of
CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.

Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 months agoMerge tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs
Linus Torvalds [Sat, 5 Oct 2024 22:18:04 +0000 (15:18 -0700)]
Merge tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "A lot of little fixes, bigger ones include:

   - bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs
     changes, now fixed along with another lost wakeup

   - fragmentation LRU fixes; fsck now repairs successfully (this is the
     data structure copygc uses); along with some nice simplification.

   - Rework logged op error handling, so that if logged op replay errors
     (due to another filesystem error) we delete the logged op instead
     of going into an infinite loop)

   - Various small filesystem connectivitity repair fixes"

* tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs:
  bcachefs: Rework logged op error handling
  bcachefs: Add warn param to subvol_get_snapshot, peek_inode
  bcachefs: Kill snapshot arg to fsck_write_inode()
  bcachefs: Check for unlinked, non-empty dirs in check_inode()
  bcachefs: Check for unlinked inodes with dirents
  bcachefs: Check for directories with no backpointers
  bcachefs: Kill alloc_v4.fragmentation_lru
  bcachefs: minor lru fsck fixes
  bcachefs: Mark more errors AUTOFIX
  bcachefs: Make sure we print error that causes fsck to bail out
  bcachefs: bkey errors are only AUTOFIX during read
  bcachefs: Create lost+found in correct snapshot
  bcachefs: Fix reattach_inode()
  bcachefs: Add missing wakeup to bch2_inode_hash_remove()
  bcachefs: Fix trans_commit disk accounting revert
  bcachefs: Fix bch2_inode_is_open() check
  bcachefs: Fix return type of dirent_points_to_inode_nowarn()
  bcachefs: Fix bad shift in bch2_read_flag_list()

9 months agoMerge tag 'for-linus-6.12a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 5 Oct 2024 17:59:44 +0000 (10:59 -0700)]
Merge tag 'for-linus-6.12a-rc2-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "Fix Xen config issue introduced in the merge window"

* tag 'for-linus-6.12a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Fix config option reference in XEN_PRIVCMD definition

9 months agoMerge tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 5 Oct 2024 17:47:00 +0000 (10:47 -0700)]
Merge tag 'ext4_for_linus-5.12-rc2' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix some ext4 bugs and regressions relating to oneline resize and fast
  commits"

* tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix off by one issue in alloc_flex_gd()
  ext4: mark fc as ineligible using an handle in ext4_xattr_set()
  ext4: use handle to mark fc as ineligible in __track_dentry_update()

9 months agoMerge tag 'cxl-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Linus Torvalds [Sat, 5 Oct 2024 17:40:16 +0000 (10:40 -0700)]
Merge tag 'cxl-fixes-6.12-rc2' of git://git./linux/kernel/git/cxl/cxl

Pull cxl fix from Ira Weiny:

 - Fix calculation for SBDF in error injection

* tag 'cxl-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  EINJ, CXL: Fix CXL device SBDF calculation

9 months agoMerge tag 'i2c-for-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 5 Oct 2024 17:31:04 +0000 (10:31 -0700)]
Merge tag 'i2c-for-6.12-rc2' of git://git./linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:

 - Fix potential deadlock during runtime suspend and resume (stm32f7)

* tag 'i2c-for-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: stm32f7: Do not prepare/unprepare clock during runtime suspend/resume

9 months agoMerge tag 'spi-fix-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Sat, 5 Oct 2024 17:25:04 +0000 (10:25 -0700)]
Merge tag 'spi-fix-v6.12-rc1' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A small set of driver specific fixes that came in since the merge
  window, about half of which is fixes for correctness in the use of the
  runtime PM APIs done as part of a broader cleanup"

* tag 'spi-fix-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: s3c64xx: fix timeout counters in flush_fifo
  spi: atmel-quadspi: Fix wrong register value written to MR
  spi: spi-cadence: Fix missing spi_controller_is_target() check
  spi: spi-cadence: Fix pm_runtime_set_suspended() with runtime pm enabled
  spi: spi-imx: Fix pm_runtime_set_suspended() with runtime pm enabled

9 months agoMerge tag 'hardening-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 5 Oct 2024 17:19:14 +0000 (10:19 -0700)]
Merge tag 'hardening-v6.12-rc2' of git://git./linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - gcc plugins: Avoid Kconfig warnings with randstruct (Nathan
   Chancellor)

 - MAINTAINERS: Add security/Kconfig.hardening to hardening section
   (Nathan Chancellor)

 - MAINTAINERS: Add unsafe_memcpy() to the FORTIFY review list

* tag 'hardening-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  MAINTAINERS: Add security/Kconfig.hardening to hardening section
  hardening: Adjust dependencies in selection of MODVERSIONS
  MAINTAINERS: Add unsafe_memcpy() to the FORTIFY review list

9 months agoMerge tag 'lsm-pr-20241004' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Linus Torvalds [Sat, 5 Oct 2024 17:10:45 +0000 (10:10 -0700)]
Merge tag 'lsm-pr-20241004' of git://git./linux/kernel/git/pcmoore/lsm

Pull lsm revert from Paul Moore:
 "Here is the CONFIG_SECURITY_TOMOYO_LKM revert that we've been
  discussing this week. With near unanimous agreement that the original
  TOMOYO patches were not the right way to solve the distro problem
  Tetsuo is trying the solve, reverting is our best option at this time"

* tag 'lsm-pr-20241004' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  tomoyo: revert CONFIG_SECURITY_TOMOYO_LKM support

9 months agoplatform/x86: ISST: Fix the KASAN report slab-out-of-bounds bug
Zach Wade [Mon, 23 Sep 2024 14:45:08 +0000 (22:45 +0800)]
platform/x86: ISST: Fix the KASAN report slab-out-of-bounds bug

Attaching SST PCI device to VM causes "BUG: KASAN: slab-out-of-bounds".
kasan report:
[   19.411889] ==================================================================
[   19.413702] BUG: KASAN: slab-out-of-bounds in _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.415634] Read of size 8 at addr ffff888829e65200 by task cpuhp/16/113
[   19.417368]
[   19.418627] CPU: 16 PID: 113 Comm: cpuhp/16 Tainted: G            E      6.9.0 #10
[   19.420435] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.20192059.B64.2207280713 07/28/2022
[   19.422687] Call Trace:
[   19.424091]  <TASK>
[   19.425448]  dump_stack_lvl+0x5d/0x80
[   19.426963]  ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.428694]  print_report+0x19d/0x52e
[   19.430206]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[   19.431837]  ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.433539]  kasan_report+0xf0/0x170
[   19.435019]  ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.436709]  _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[   19.438379]  ? __pfx_sched_clock_cpu+0x10/0x10
[   19.439910]  isst_if_cpu_online+0x406/0x58f [isst_if_common]
[   19.441573]  ? __pfx_isst_if_cpu_online+0x10/0x10 [isst_if_common]
[   19.443263]  ? ttwu_queue_wakelist+0x2c1/0x360
[   19.444797]  cpuhp_invoke_callback+0x221/0xec0
[   19.446337]  cpuhp_thread_fun+0x21b/0x610
[   19.447814]  ? __pfx_cpuhp_thread_fun+0x10/0x10
[   19.449354]  smpboot_thread_fn+0x2e7/0x6e0
[   19.450859]  ? __pfx_smpboot_thread_fn+0x10/0x10
[   19.452405]  kthread+0x29c/0x350
[   19.453817]  ? __pfx_kthread+0x10/0x10
[   19.455253]  ret_from_fork+0x31/0x70
[   19.456685]  ? __pfx_kthread+0x10/0x10
[   19.458114]  ret_from_fork_asm+0x1a/0x30
[   19.459573]  </TASK>
[   19.460853]
[   19.462055] Allocated by task 1198:
[   19.463410]  kasan_save_stack+0x30/0x50
[   19.464788]  kasan_save_track+0x14/0x30
[   19.466139]  __kasan_kmalloc+0xaa/0xb0
[   19.467465]  __kmalloc+0x1cd/0x470
[   19.468748]  isst_if_cdev_register+0x1da/0x350 [isst_if_common]
[   19.470233]  isst_if_mbox_init+0x108/0xff0 [isst_if_mbox_msr]
[   19.471670]  do_one_initcall+0xa4/0x380
[   19.472903]  do_init_module+0x238/0x760
[   19.474105]  load_module+0x5239/0x6f00
[   19.475285]  init_module_from_file+0xd1/0x130
[   19.476506]  idempotent_init_module+0x23b/0x650
[   19.477725]  __x64_sys_finit_module+0xbe/0x130
[   19.476506]  idempotent_init_module+0x23b/0x650
[   19.477725]  __x64_sys_finit_module+0xbe/0x130
[   19.478920]  do_syscall_64+0x82/0x160
[   19.480036]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   19.481292]
[   19.482205] The buggy address belongs to the object at ffff888829e65000
 which belongs to the cache kmalloc-512 of size 512
[   19.484818] The buggy address is located 0 bytes to the right of
 allocated 512-byte region [ffff888829e65000ffff888829e65200)
[   19.487447]
[   19.488328] The buggy address belongs to the physical page:
[   19.489569] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888829e60c00 pfn:0x829e60
[   19.491140] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   19.492466] anon flags: 0x57ffffc0000840(slab|head|node=1|zone=2|lastcpupid=0x1fffff)
[   19.493914] page_type: 0xffffffff()
[   19.494988] raw: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001
[   19.496451] raw: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000
[   19.497906] head: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001
[   19.499379] head: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000
[   19.500844] head: 0057ffffc0000003 ffffea0020a79801 ffffea0020a79848 00000000ffffffff
[   19.502316] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000
[   19.503784] page dumped because: kasan: bad access detected
[   19.505058]
[   19.505970] Memory state around the buggy address:
[   19.507172]  ffff888829e65100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   19.508599]  ffff888829e65180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   19.510013] >ffff888829e65200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   19.510014]                    ^
[   19.510016]  ffff888829e65280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   19.510018]  ffff888829e65300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   19.515367] ==================================================================

The reason for this error is physical_package_ids assigned by VMware VMM
are not continuous and have gaps. This will cause value returned by
topology_physical_package_id() to be more than topology_max_packages().

Here the allocation uses topology_max_packages(). The call to
topology_max_packages() returns maximum logical package ID not physical
ID. Hence use topology_logical_package_id() instead of
topology_physical_package_id().

Fixes: 9a1aac8a96dc ("platform/x86: ISST: PUNIT device mapping with Sub-NUMA clustering")
Cc: stable@vger.kernel.org
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zach Wade <zachwade.k@gmail.com>
Link: https://lore.kernel.org/r/20240923144508.1764-1-zachwade.k@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
9 months agoMerge tag 'linux_kselftest-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 5 Oct 2024 00:30:59 +0000 (17:30 -0700)]
Merge tag 'linux_kselftest-fixes-6.12-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "Fixes to build warnings, install scripts, run-time error path, and git
  status cleanups to tests:

   - devices/probe: fix for Python3 regex string syntax warnings

   - clone3: removing unused macro from clone3_cap_checkpoint_restore()

   - vDSO: fix to align getrandom states to cache line

   - core and exec: add missing executables to .gitignore files

   - rtc: change to skip test if /dev/rtc0 can't be accessed

   - timers/posix: fix warn_unused_result result in __fatal_error()

   - breakpoints: fix to detect suspend successful condition correctly

   - hid: fix to install required dependencies to run the test"

* tag 'linux_kselftest-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: breakpoints: use remaining time to check if suspend succeed
  kselftest/devices/probe: Fix SyntaxWarning in regex strings for Python3
  selftest: hid: add missing run-hid-tools-tests.sh
  selftests: vDSO: align getrandom states to cache line
  selftests: exec: update gitignore for load_address
  selftests: core: add unshare_test to gitignore
  clone3: clone3_cap_checkpoint_restore: remove unused MAX_PID_NS_LEVEL macro
  selftests:timers: posix_timers: Fix warn_unused_result in __fatal_error()
  selftest: rtc: Check if could access /dev/rtc0 before testing

9 months agobcachefs: Rework logged op error handling
Kent Overstreet [Tue, 24 Sep 2024 02:06:58 +0000 (22:06 -0400)]
bcachefs: Rework logged op error handling

Initially it was thought that we just wanted to ignore errors from
logged op replay, but it turns out we do need to catch -EROFS, or we'll
go into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Add warn param to subvol_get_snapshot, peek_inode
Kent Overstreet [Tue, 24 Sep 2024 09:33:07 +0000 (05:33 -0400)]
bcachefs: Add warn param to subvol_get_snapshot, peek_inode

These shouldn't always be fatal errors - logged op resume, in
particular, and we want it as a parameter there.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: Kill snapshot arg to fsck_write_inode()
Kent Overstreet [Mon, 30 Sep 2024 04:00:33 +0000 (00:00 -0400)]
bcachefs: Kill snapshot arg to fsck_write_inode()

It was initially believed that it would be better to be explicit about
the snapshot we're updating when writing inodes in fsck; however, it
turns out that passing around the snapshot separately is more error
prone and we're usually updating the inode in the same snapshow we read
it from.

This is different from normal filesystem paths, where we do the update
in the snapshot of the subvolume we're in.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>