Hector Martin [Sun, 6 Feb 2022 12:18:18 +0000 (21:18 +0900)]
PCI: apple: Add support for optional PWREN GPIO
WiFi and SD card devices on M1 Macs have a separate power enable GPIO.
Add support for this to the PCIe controller. This is modeled after how
pcie-fu740 does it.
Signed-off-by: Hector Martin <marcan@marcan.st>
Hector Martin [Sun, 6 Feb 2022 12:15:39 +0000 (21:15 +0900)]
PCI: apple: Probe all GPIOs for availability first
If we're probing the PCI controller and some GPIOs are not available and
cause a probe defer, we can end up leaving some ports initialized and
not others and making a mess.
Check for PERST# GPIOs for all ports first, and just return
-EPROBE_DEFER if any are not ready yet, without bringing anything up.
Fixes:
1e33888fbe44 ("PCI: apple: Add initial hardware bring-up")
Cc: stable@vger.kernel.org
Signed-off-by: Hector Martin <marcan@marcan.st>
Hector Martin [Sun, 6 Feb 2022 12:13:17 +0000 (21:13 +0900)]
PCI: apple: GPIO handling nitfixes
- Use devm managed GPIO getter
- GPIO ops can sleep in this context
Fixes:
1e33888fbe44 ("PCI: apple: Add initial hardware bring-up")
Cc: stable@vger.kernel.org
Signed-off-by: Hector Martin <marcan@marcan.st>
Stephen Rothwell [Thu, 17 Feb 2022 09:50:19 +0000 (20:50 +1100)]
Add linux-next specific files for
20220217
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Stephen Rothwell [Thu, 17 Feb 2022 09:07:21 +0000 (20:07 +1100)]
Merge branch 'akpm-current/current'
Stephen Rothwell [Thu, 17 Feb 2022 05:01:55 +0000 (16:01 +1100)]
Merge branch 'rust-next' of https://github.com/Rust-for-Linux/linux.git
Stephen Rothwell [Thu, 17 Feb 2022 04:45:40 +0000 (15:45 +1100)]
Merge branch 'next' of git://git./linux/kernel/git/mic/linux.git
Stephen Rothwell [Thu, 17 Feb 2022 04:38:31 +0000 (15:38 +1100)]
Merge branch 'master' of git://git./linux/kernel/git/crng/random.git
Stephen Rothwell [Thu, 17 Feb 2022 04:26:44 +0000 (15:26 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/rppt/memblock.git
Stephen Rothwell [Thu, 17 Feb 2022 04:25:11 +0000 (15:25 +1100)]
Merge branch 'mhi-next' of git://git./linux/kernel/git/mani/mhi.git
Stephen Rothwell [Thu, 17 Feb 2022 04:22:50 +0000 (15:22 +1100)]
Merge branch 'kunit' of git://git./linux/kernel/git/shuah/linux-kselftest.git
Stephen Rothwell [Thu, 17 Feb 2022 04:21:18 +0000 (15:21 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/mdf/linux-fpga.git
Stephen Rothwell [Thu, 17 Feb 2022 04:19:45 +0000 (15:19 +1100)]
Merge branch 'hyperv-next' of git://git./linux/kernel/git/hyperv/linux.git
Stephen Rothwell [Thu, 17 Feb 2022 04:03:28 +0000 (15:03 +1100)]
Merge branch 'main' of git://git.infradead.org/users/willy/xarray.git
Stephen Rothwell [Thu, 17 Feb 2022 04:00:52 +0000 (15:00 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/srini/nvmem.git
Stephen Rothwell [Thu, 17 Feb 2022 03:59:19 +0000 (14:59 +1100)]
Merge branch 'ntb-next' of https://github.com/jonmason/ntb.git
Stephen Rothwell [Thu, 17 Feb 2022 03:57:01 +0000 (14:57 +1100)]
Merge branch 'rtc-next' of git://git./linux/kernel/git/abelloni/linux.git
Stephen Rothwell [Thu, 17 Feb 2022 03:55:49 +0000 (14:55 +1100)]
Merge branch 'next' of git://git./linux/kernel/git/coresight/linux.git
Stephen Rothwell [Thu, 17 Feb 2022 03:55:48 +0000 (14:55 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/livepatching/livepatching
Stephen Rothwell [Thu, 17 Feb 2022 03:54:37 +0000 (14:54 +1100)]
Merge branch 'next' of git://git./linux/kernel/git/shuah/linux-kselftest.git
Stephen Rothwell [Thu, 17 Feb 2022 03:52:29 +0000 (14:52 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/thierry.reding/linux-pwm.git
Stephen Rothwell [Thu, 17 Feb 2022 03:50:08 +0000 (14:50 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/pinctrl/samsung.git
Stephen Rothwell [Thu, 17 Feb 2022 03:50:08 +0000 (14:50 +1100)]
Merge branch 'renesas-pinctrl' of git://git./linux/kernel/git/geert/renesas-drivers.git
Stephen Rothwell [Thu, 17 Feb 2022 03:50:07 +0000 (14:50 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/pinctrl/intel.git
Stephen Rothwell [Thu, 17 Feb 2022 03:50:07 +0000 (14:50 +1100)]
Merge branch 'for-next' of git://git./linux/kernel/git/linusw/linux-pinctrl.git
Waiman Long [Wed, 16 Feb 2022 04:32:37 +0000 (15:32 +1100)]
ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
When running the stress-ng clone benchmark with multiple testing threads,
it was found that there were significant spinlock contention in sget_fc().
The contended spinlock was the sb_lock. It is under heavy contention
because the following code in the critcal section of sget_fc():
hlist_for_each_entry(old, &fc->fs_type->fs_supers, s_instances) {
if (test(old, fc))
goto share_extant_sb;
}
After testing with added instrumentation code, it was found that the the
benchmark could generate thousands of ipc namespaces with the
corresponding number of entries in the mqueue's fs_supers list where the
namespaces are the key for the search. This leads to excessive time in
scanning the list for a match.
Looking back at the mqueue calling sequence leading to sget_fc():
mq_init_ns()
=> mq_create_mount()
=> fc_mount()
=> vfs_get_tree()
=> mqueue_get_tree()
=> get_tree_keyed()
=> vfs_get_super()
=> sget_fc()
Currently, mq_init_ns() is the only mqueue function that will indirectly
call mqueue_get_tree() with a newly allocated ipc namespace as the key for
searching. As a result, there will never be a match with the exising ipc
namespaces stored in the mqueue's fs_supers list.
So using get_tree_keyed() to do an existing ipc namespace search is just a
waste of time. Instead, we could use get_tree_nodev() to eliminate the
useless search. By doing so, we can greatly reduce the sb_lock hold time
and avoid the spinlock contention problem in case a large number of ipc
namespaces are present.
Of course, if the code is modified in the future to allow
mqueue_get_tree() to be called with an existing ipc namespace instead of a
new one, we will have to use get_tree_keyed() in this case.
The following stress-ng clone benchmark command was run on a 2-socket
48-core Intel system:
./stress-ng --clone 32 --verbose --oomable --metrics-brief -t 20
The "bogo ops/s" increased from 5948.45 before patch to 9137.06 after
patch. This is an increase of 54% in performance.
Link: https://lkml.kernel.org/r/20220121172315.19652-1-longman@redhat.com
Fixes:
935c6912b198 ("ipc: Convert mqueue fs to fs_context")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Marco Elver [Wed, 16 Feb 2022 04:32:36 +0000 (15:32 +1100)]
Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
This reverts commit
ea91a1d45d19469001a4955583187b0d75915759.
Since
df05c0e9496c ("Documentation: Raise the minimum supported version
of LLVM to 11.0.0") the minimum Clang version is now 11.0, which fixed
the UBSAN/KCSAN vs. KCOV incompatibilities.
Link: https://bugs.llvm.org/show_bug.cgi?id=45831
Link: https://lkml.kernel.org/r/YaodyZzu0MTCJcvO@elver.google.com
Link: https://lkml.kernel.org/r/20220128105631.509772-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:35 +0000 (15:32 +1100)]
selftests: use -isystem instead of -I to include headers
Selftests need kernel headers and glibc for compilation. In compilation
of selftests, uapi headers from kernel source are used instead of default
ones while glibc has already been compiled with different header files
installed in the operating system. So there can be redefinition warnings
from compiler. These warnings can be suppressed by using -isystem to
include the uapi headers.
Link: https://lkml.kernel.org/r/20220214160756.3543590-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Kees Cook [Wed, 16 Feb 2022 04:32:34 +0000 (15:32 +1100)]
selftests: kselftest framework: provide "finished" helper
Instead of having each time that wants to use ksft_exit() have to figure
out the internals of kselftest.h, add the helper ksft_finished() that
makes sure the passes, xfails, and skips are equal to the test plan count.
Link: https://lkml.kernel.org/r/20220201013717.2464392-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:32 +0000 (15:32 +1100)]
selftests: vm: remove dependecy from internal kernel macros
The defination of swap() is used from kernel's internal header when this
test is built in source tree. The build fails when this test is built out
of source tree as defination of swap() isn't found. Selftests shouldn't
depend on kernel's internal header files. They can only depend on uapi
header files. Add the defination of swap() to fix the build error:
gcc -Wall -I/linux_mainline2/build/usr/include -no-pie userfaultfd.c -lrt -lpthread -o /linux_mainline2/build/kselftest/vm/userfaultfd
userfaultfd.c: In function `userfaultfd_stress':
userfaultfd.c:1530:3: warning: implicit declaration of function `swap'; did you mean `swab'? [-Wimplicit-function-declaration]
1530 | swap(area_src, area_dst);
| ^~~~
| swab
/usr/bin/ld: /tmp/cclUUH7V.o: in function `userfaultfd_stress':
userfaultfd.c:(.text+0x4d64): undefined reference to `swap'
/usr/bin/ld: userfaultfd.c:(.text+0x4d82): undefined reference to `swap'
collect2: error: ld returned 1 exit status
Link: https://lkml.kernel.org/r/20220119101531.2850400-11-usama.anjum@collabora.com
Fixes:
2c769ed7137a ("tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:31 +0000 (15:32 +1100)]
selftests: vm: add the uapi headers include variable
Out of tree build of this test fails if relative path of the output
directory is specified. Add the KHDR_INCLUDES to correctly reach the
headers.
Link: https://lkml.kernel.org/r/20220119101531.2850400-10-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:30 +0000 (15:32 +1100)]
selftests: mptcp: add the uapi headers include variable
Out of tree build of this test fails if relative path of the output
directory is specified. Add the KHDR_INCLUDES to correctly reach the
headers.
Link: https://lkml.kernel.org/r/20220119101531.2850400-9-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:29 +0000 (15:32 +1100)]
selftests: net: add the uapi headers include variable
Out of tree build of this test fails if relative path of the output
directory is specified. Add the KHDR_INCLUDES to correctly reach the
headers.
Link: https://lkml.kernel.org/r/20220119101531.2850400-8-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:28 +0000 (15:32 +1100)]
selftests: landlock: add the uapi headers include variable
Out of tree build of this test fails if relative path of the output
directory is specified. Add the KHDR_INCLUDES to correctly reach the
headers.
Link: https://lkml.kernel.org/r/20220119101531.2850400-7-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:27 +0000 (15:32 +1100)]
selftests: kvm: add the uapi headers include variable
Out of tree build of this test fails if relative path of the output
directory is specified. Add KHDR_INCLUDES to correctly reach the headers.
Link: https://lkml.kernel.org/r/20220119101531.2850400-6-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:26 +0000 (15:32 +1100)]
selftests: futex: add the uapi headers include variable
Out of tree build of this test fails if relative path of the output
directory is specified. KBUILD_OUTPUT also doesn't point to the correct
directory when relative path is used. Thus out of tree builds fail.
Remove the un-needed include paths and use KHDR_INCLUDES to correctly
reach the headers.
Link: https://lkml.kernel.org/r/20220119101531.2850400-5-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:25 +0000 (15:32 +1100)]
selftests: correct the headers install path
uapi headers should be installed at the top of the object tree,
"<obj_tree>/usr/include". There is no need for kernel headers to be
present at kselftest build directory, "<obj_tree>/kselftest/usr/ include"
as well. This duplication can be avoided by correctly specifying the
INSTALL_HDR_PATH.
Link: https://lkml.kernel.org/r/20220119101531.2850400-4-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:24 +0000 (15:32 +1100)]
selftests: add and export a kernel uapi headers path
Kernel uapi headers can be present at different paths depending upon how
the build was invoked. It becomes impossible for the tests to include the
correct headers directory. Set and export KHDR_INCLUDES variable to make
it possible for sub make files to include the header files.
Link: https://lkml.kernel.org/r/20220119101531.2850400-3-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Mickal Salan <mic@digikod.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Muhammad Usama Anjum [Wed, 16 Feb 2022 04:32:23 +0000 (15:32 +1100)]
selftests: set the BUILD variable to absolute path
Patch series "selftests: Fix separate output directory builds", v2.
Build of several selftests fail if separate output directory is
specified by the following methods:
1) make -C tools/testing/selftests O=<build_dir>
2) export KBUILD_OUTPUT="build_dir"; make -C tools/testing/selftests
Build fails because of several reasons:
1) The kernel headers aren't found.
2) The path of output objects is wrong and hence unaccessible.
These problems can be solved by:
1) Including the correct path of uapi header files
2) By setting the BUILD variable correctly inside Makefile
Following different build scenarios have been tested after making these
changes to verify that nothing gets broken with these changes:
make -C tools/testing/selftests
make -C tools/testing/selftests/futex
make -C tools/testing/selftests/kvm
make -C tools/testing/selftests/landlock
make -C tools/testing/selftests/net
make -C tools/testing/selftests/net/mptcp
make -C tools/testing/selftests/vm
make -C tools/testing/selftests O=build
make -C tools/testing/selftests o=/opt/build
export KBUILD_OUTPUT="/opt/build"; make -C tools/testing/selftests
export KBUILD_OUTPUT="build"; make -C tools/testing/selftests
cd <any_dir>; make -C <src_path>/tools/testing/selftests
cd <any_dir>; make -C <src_path>/tools/testing/selftests O=build
This patch (of 10):
The build of kselftests fails if relative path is specified through
KBUILD_OUTPUT or O=<path> method. BUILD variable is used to determine the
path of the output objects. When make is run from other directories with
relative paths, the exact path of the build objects is ambiguous and build
fails.
make[1]: Entering directory '/home/usama/repos/kernel/linux_mainline2/tools/testing/selftests/alsa'
gcc mixer-test.c -L/usr/lib/x86_64-linux-gnu -lasound -o build/kselftest/alsa/mixer-test
/usr/bin/ld: cannot open output file build/kselftest/alsa/mixer-test
Set the BUILD variable to the absolute path of the output directory. Make
the logic readable and easy to follow. Use spaces instead of tabs for
indentation as if with tab indentation is considered recipe in make.
Link: https://lkml.kernel.org/r/20220119101531.2850400-1-usama.anjum@collabora.com
Link: https://lkml.kernel.org/r/20220119101531.2850400-2-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andr Almeida <andrealmeid@collabora.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mickal Salan <mic@digikod.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Aleksandr Nogikh [Wed, 16 Feb 2022 04:32:21 +0000 (15:32 +1100)]
kcov: properly handle subsequent mmap calls
Allocate the kcov buffer during KCOV_MODE_INIT in order to untie mmapping
of a kcov instance and the actual coverage collection process. Modify
kcov_mmap, so that it can be reliably used any number of times once
KCOV_MODE_INIT has succeeded.
These changes to the user-facing interface of the tool only weaken the
preconditions, so all existing user space code should remain compatible
with the new version.
Link: https://lkml.kernel.org/r/20220117153634.150357-3-nogikh@google.com
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Taras Madan <tarasmadan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Aleksandr Nogikh [Wed, 16 Feb 2022 04:32:20 +0000 (15:32 +1100)]
kcov: split ioctl handling into locked and unlocked parts
Patch series "kcov: improve mmap processing", v3.
Subsequent mmaps of the same kcov descriptor currently do not update the
virtual memory of the task and yet return 0 (success). This is
counter-intuitive and may lead to unexpected memory access errors.
Also, this unnecessarily limits the functionality of kcov to only the
simplest usage scenarios. Kcov instances are effectively forever attached
to their first address spaces and it becomes impossible to e.g. reuse the
same kcov handle in forked child processes without mmapping the memory
first. This is exactly what we tried to do in syzkaller and inadvertently
came upon this behavior.
This patch series addresses the problem described above.
This patch (of 3):
Currently all ioctls are de facto processed under a spinlock in order to
serialise them. This, however, prohibits the use of vmalloc and other
memory management functions in the implementations of those ioctls,
unnecessary complicating any further changes to the code.
Let all ioctls first be processed inside the kcov_ioctl() function which
should execute the ones that are not compatible with spinlock and then
pass control to kcov_ioctl_locked() for all other ones.
KCOV_REMOTE_ENABLE is processed both in kcov_ioctl() and
kcov_ioctl_locked() as the steps are easily separable.
Although it is still compatible with a spinlock, move KCOV_INIT_TRACE
handling to kcov_ioctl(), so that the changes from the next commit are
easier to follow.
Link: https://lkml.kernel.org/r/20220117153634.150357-1-nogikh@google.com
Link: https://lkml.kernel.org/r/20220117153634.150357-2-nogikh@google.com
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Guilherme G. Piccoli [Wed, 16 Feb 2022 04:32:19 +0000 (15:32 +1100)]
panic: allow printing extra panic information on kdump
Currently we have the "panic_print" parameter/sysctl to allow some extra
information to be printed in a panic event. On the other hand, the kdump
mechanism allows to kexec a new kernel to collect a memory dump for the
running kernel in case of panic.
Right now these options are incompatible: the user either sets the kdump
or makes use of "panic_print". The code path of "panic_print" isn't
reached when kdump is configured.
There are situations though in which this would be interesting: for
example, in systems that are very memory constrained, a handcrafted tiny
kernel/initrd for kdump might be used in order to only collect the dmesg
in kdump kernel. Even more common, systems with no disk space for the
full (compressed) memory dump might very well rely in this functionality
too, dumping only the dmesg with the additional information provided by
"panic_print".
So, this is what the patch does: allows both functionality to co-exist; if
"panic_print" is set and the system performs a kdump, the extra
information is printed on dmesg before the kexec. Some notes about the
design choices here:
(a) We could have introduced a sysctl or an extra bit on "panic_print"
to allow enabling the co-existence of kdump and "panic_print", but
seems that would be over-engineering; we have 3 cases, let's check how
this patch change things:
- if the user have kdump set and not "panic_print", nothing changes;
- if the user have "panic_print" set and not kdump, nothing changes;
- if both are enabled, now we print the extra information before kdump,
which is exactly the goal of the patch (and should be the goal of the
user, since they enabled both options).
(b) We assume that the code path won't return from __crash_kexec() so
we didn't guard against double execution of panic_print_sys_info().
Link: https://lkml.kernel.org/r/20211109202848.610874-4-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Guilherme G. Piccoli [Wed, 16 Feb 2022 04:32:18 +0000 (15:32 +1100)]
panic: add option to dump all CPUs backtraces in panic_print
Currently the "panic_print" parameter/sysctl allows some interesting debug
information to be printed during a panic event. This is useful for
example in cases the user cannot kdump due to resource limits, or if the
user collects panic logs in a serial output (or pstore) and prefers a fast
reboot instead of a kdump.
Happens that currently there's no way to see all CPUs backtraces in a
panic using "panic_print" on architectures that support that. We do have
"oops_all_cpu_backtrace" sysctl, but although partially overlapping in the
functionality, they are orthogonal in nature: "panic_print" is a panic
tuning (and we have panics without oopses, like direct calls to panic() or
maybe other paths that don't go through oops_enter() function), and the
original purpose of "oops_all_cpu_backtrace" is to provide more
information on oopses for cases in which the users desire to continue
running the kernel even after an oops, i.e., used in non-panic scenarios.
So, we hereby introduce an additional bit for "panic_print" to allow
dumping the CPUs backtraces during a panic event.
Link: https://lkml.kernel.org/r/20211109202848.610874-3-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Randy Dunlap [Wed, 16 Feb 2022 04:32:17 +0000 (15:32 +1100)]
sysctl: documentation: fix table format warning
Fix malformed table warning in sysctl documentation:
(don't use ':'s)
Documentation/admin-guide/sysctl/kernel.rst:798: WARNING: Malformed table.
Text in column margin in table line 7.
===== ============================================
bit 0 print all tasks info
bit 1 print system memory info
bit 2 print timer info
bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
bit 5: print all printk messages in buffer
bit 6: print all CPUs backtrace (if available in the arch)
Link: https://lkml.kernel.org/r/20220109055635.6999-1-rdunlap@infradead.org
Fixes:
934d51cad60c ("docs: sysctl/kernel: add missing bit to panic_print")
Fixes:
addc64999934 ("panic: add option to dump all CPUs backtraces in panic_print")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Guilherme G. Piccoli [Wed, 16 Feb 2022 04:32:16 +0000 (15:32 +1100)]
docs: sysctl/kernel: add missing bit to panic_print
Patch series "Some improvements on panic_print".
This is a mix of a documentation fix with some additions to the
"panic_print" syscall / parameter. The goal here is being able to collect
all CPUs backtraces during a panic event and also to enable "panic_print"
in a kdump event - details of the reasoning and design choices in the
patches.
This patch (of 3):
Commit
de6da1e8bcf0 ("panic: add an option to replay all the printk
message in buffer") added a new bit to the sysctl/kernel parameter
"panic_print", but the documentation was added only in
kernel-parameters.txt, not in the sysctl guide.
Fix it here by adding bit 5 to sysctl admin-guide documentation.
Link: https://lkml.kernel.org/r/20211109202848.610874-1-gpiccoli@igalia.com
Link: https://lkml.kernel.org/r/20211109202848.610874-2-gpiccoli@igalia.com
Fixes:
de6da1e8bcf0 ("panic: add an option to replay all the printk message in buffer")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tiezhu Yang [Wed, 16 Feb 2022 04:32:14 +0000 (15:32 +1100)]
kasan: no need to unset panic_on_warn in end_report()
panic_on_warn is unset inside panic(), so no need to unset it before
calling panic() in end_report().
Link: https://lkml.kernel.org/r/1644324666-15947-6-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tiezhu Yang [Wed, 16 Feb 2022 04:32:13 +0000 (15:32 +1100)]
ubsan: no need to unset panic_on_warn in ubsan_epilogue()
panic_on_warn is unset inside panic(), so no need to unset it before
calling panic() in ubsan_epilogue().
Link: https://lkml.kernel.org/r/1644324666-15947-5-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tiezhu Yang [Wed, 16 Feb 2022 04:32:12 +0000 (15:32 +1100)]
panic: unset panic_on_warn inside panic()
In the current code, the following three places need to unset
panic_on_warn before calling panic() to avoid recursive panics:
kernel/kcsan/report.c: print_report()
kernel/sched/core.c: __schedule_bug()
mm/kfence/report.c: kfence_report_error()
In order to avoid copy-pasting "panic_on_warn = 0" all over the places, it
is better to move it inside panic() and then remove it from the other
places.
Link: https://lkml.kernel.org/r/1644324666-15947-4-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tiezhu Yang [Wed, 16 Feb 2022 04:32:10 +0000 (15:32 +1100)]
docs: kdump: add scp example to write out the dump file
Except cp and makedumpfile, add scp example to write out the dump file.
Link: https://lkml.kernel.org/r/1644324666-15947-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tiezhu Yang [Wed, 16 Feb 2022 04:32:09 +0000 (15:32 +1100)]
docs: kdump: update description about sysfs file system support
Patch series "Update doc and fix some issues about kdump", v2.
This patch (of 5):
After commit
6a108a14fa35 ("kconfig: rename CONFIG_EMBEDDED to
CONFIG_EXPERT"), "Configure standard kernel features (for small systems)"
is not exist, we should use "Configure standard kernel features (expert
users)" now.
Link: https://lkml.kernel.org/r/1644324666-15947-1-git-send-email-yangtiezhu@loongson.cn
Link: https://lkml.kernel.org/r/1644324666-15947-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Jisheng Zhang [Wed, 16 Feb 2022 04:32:08 +0000 (15:32 +1100)]
arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.
Link: https://lkml.kernel.org/r/20211206160514.2000-5-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Jisheng Zhang [Wed, 16 Feb 2022 04:32:07 +0000 (15:32 +1100)]
x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.
Link: https://lkml.kernel.org/r/20211206160514.2000-4-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Jisheng Zhang [Wed, 16 Feb 2022 04:32:06 +0000 (15:32 +1100)]
riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.
Link: https://lkml.kernel.org/r/20211206160514.2000-3-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Jisheng Zhang [Wed, 16 Feb 2022 04:32:06 +0000 (15:32 +1100)]
kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2.
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.
I only modified x86, arm, arm64 and riscv, other architectures such as sh,
powerpc and s390 are better to be kept kexec code as-is so they are not
touched.
This patch (of 5):
Make the forward declarations of crashk_res, crashk_low_res and
crash_notes always visible. Code referring to these symbols can then just
check for IS_ENABLED(CONFIG_KEXEC_CORE), instead of requiring conditional
compilation using an #ifdef, thus preparing to increase compile coverage
and simplify the code.
Link: https://lkml.kernel.org/r/20211206160514.2000-1-jszhang@kernel.org
Link: https://lkml.kernel.org/r/20211206160514.2000-2-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Kees Cook [Wed, 16 Feb 2022 04:32:05 +0000 (15:32 +1100)]
selftests/exec: test for empty string on NULL argv
Test for the NULL argv argument producing a single empty string on exec.
Link: https://lkml.kernel.org/r/20220201011637.2457646-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ariadne Conill <ariadne@dereferenced.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Kees Cook [Wed, 16 Feb 2022 04:32:04 +0000 (15:32 +1100)]
exec: Fix min/max typo in stack space calculation
When handling the argc == 0 case, the stack space calculation should be
using max() not min().
Link: https://lkml.kernel.org/r/20220201190700.3147041-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ariadne Conill <ariadne@dereferenced.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Kees Cook [Wed, 16 Feb 2022 04:32:03 +0000 (15:32 +1100)]
exec: force single empty string when argv is empty
Quoting[1] Ariadne Conill:
"In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[2]:
The argument arg0 should point to a filename string that is
associated with the process being started by one of the exec
functions.
...
Interestingly, Michael Kerrisk opened an issue about this in 2008[3],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[4]
of this bug in a shellcode, we can reconsider.
This issue is being tracked in the KSPP issue tracker[5]."
While the initial code searches[6][7] turned up what appeared to be
mostly corner case tests, trying to that just reject argv == NULL
(or an immediately terminated pointer list) quickly started tripping[8]
existing userspace programs.
The next best approach is forcing a single empty string into argv and
adjusting argc to match. The number of programs depending on argc == 0
seems a smaller set than those calling execve with a NULL argv.
Account for the additional stack space in bprm_stack_limits(). Inject an
empty string when argc == 0 (and set argc = 1). Warn about the case so
userspace has some notice about the change:
process './argc0' launched './argc0' with NULL argv: empty string added
Additionally WARN() and reject NULL argv usage for kernel threads.
[1] https://lore.kernel.org/lkml/
20220127000724.15106-1-ariadne@dereferenced.org/
[2] https://pubs.opengroup.org/onlinepubs/
9699919799/functions/exec.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=8408
[4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[5] https://github.com/KSPP/linux/issues/176
[6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+*NULL&literal=0
[7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%2C%5Cs*NULL&literal=0
[8] https://lore.kernel.org/lkml/
20220131144352.GE16385@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Ariadne Conill <ariadne@dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Ariadne Conill <ariadne@dereferenced.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Helge Deller [Wed, 16 Feb 2022 04:32:02 +0000 (15:32 +1100)]
fat: use pointer to simple type in put_user()
The put_user(val,ptr) macro wants a pointer to a simple type, but in
fat_ioctl_filldir() the d_name field references an "array of chars". Be
more accurate and explicitly give the pointer to the first character of
the d_name[] array.
I noticed that issue while trying to optimize the parisc put_user() macro
and used an intermediate variable to store the pointer. In that case I
got this error:
In file included from include/linux/uaccess.h:11,
from include/linux/compat.h:17,
from fs/fat/dir.c:18:
fs/fat/dir.c: In function `fat_ioctl_filldir':
fs/fat/dir.c:725:33: error: invalid initializer
725 | if (put_user(0, d2->d_name) || \
| ^~
include/asm/uaccess.h:152:33: note: in definition of macro `__put_user'
152 | __typeof__(ptr) __ptr = ptr; \
| ^~~
fs/fat/dir.c:759:1: note: in expansion of macro `FAT_IOCTL_FILLDIR_FUNC'
759 | FAT_IOCTL_FILLDIR_FUNC(fat_ioctl_filldir, __fat_dirent)
Andreas Schwab <schwab@linux-m68k.org> suggested to use
__typeof__(&*(ptr)) __ptr = ptr;
instead. This works, but nevertheless it's probably reasonable to
fix the original caller too.
Link: https://lkml.kernel.org/r/Ygo+A9MREmC1H3kr@p100
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: David Laight <David.Laight@aculab.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Qinghua Jin [Wed, 16 Feb 2022 04:32:02 +0000 (15:32 +1100)]
minix: fix bug when opening a file with O_DIRECT
Testcase:
1. create a minix file system and mount it
2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT
3. open fails with -EINVAL but leaves an empty file behind. All other
open() failures don't leave the failed open files behind.
It is hard to check the direct_IO op before creating the inode. Just as
ext4 and btrfs do, this patch will resolve the issue by allowing to create
the file with O_DIRECT but returning error when writing the file.
Link: https://lkml.kernel.org/r/20220107133626.413379-1-qhjin.dev@gmail.com
Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com>
Reported-by: Colin Ian King <colin.king@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrei Vagin [Wed, 16 Feb 2022 04:32:01 +0000 (15:32 +1100)]
fs/pipe.c: local vars have to match types of proper pipe_inode_info fields
head, tail, ring_size are declared as unsigned int, so all local variables
that operate with these fields have to be unsigned to avoid signed integer
overflow.
Right now, it isn't an issue because the maximum pipe size is limited by
1U<<31.
Link: https://lkml.kernel.org/r/20220106171946.36128-1-avagin@gmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrei Vagin [Wed, 16 Feb 2022 04:32:01 +0000 (15:32 +1100)]
fs/pipe: use kvcalloc to allocate a pipe_buffer array
Right now, kcalloc is used to allocate a pipe_buffer array. The size of
the pipe_buffer struct is 40 bytes. kcalloc allows allocating reliably
chunks with sizes less or equal to PAGE_ALLOC_COSTLY_ORDER (3). It means
that the maximum pipe size is 3.2MB in this case.
In CRIU, we use pipes to dump processes memory. CRIU freezes a target
process, injects a parasite code into it and then this code splices memory
into pipes. If a maximum pipe size is small, we need to do many
iterations or create many pipes.
kvcalloc attempt to allocate physically contiguous memory, but upon
failure, fall back to non-contiguous (vmalloc) allocation and so it isn't
limited by PAGE_ALLOC_COSTLY_ORDER.
The maximum pipe size for non-root users is limited by the
/proc/sys/fs/pipe-max-size sysctl that is 1MB by default, so only the root
user will be able to trigger vmalloc allocations.
Link: https://lkml.kernel.org/r/20220104171058.22580-1-avagin@gmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrew Halaney [Wed, 16 Feb 2022 04:32:00 +0000 (15:32 +1100)]
init/main.c: silence some -Wunused-parameter warnings
There are a bunch of callbacks with unused arguments, go ahead and silence
those so "make KCFLAGS=-W init/main.o" is a little quieter. Here's a
little sample:
init/main.c:182:43: warning: unused parameter 'str' [-Wunused-parameter]
static int __init set_reset_devices(char *str)
Link: https://lkml.kernel.org/r/20210519162341.1275452-1-ahalaney@redhat.com
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Mark-PK Tsai [Wed, 16 Feb 2022 04:31:59 +0000 (15:31 +1100)]
init: use ktime_us_delta() to make initcall_debug log more precise
Use ktime_us_delta() to make the initcall_debug log more precise than
right shifting the result of ktime_to_ns() by 10 bits.
Link: https://lkml.kernel.org/r/20220209053350.15771-1-mark-pk.tsai@mediatek.com
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Maninder Singh [Wed, 16 Feb 2022 04:31:59 +0000 (15:31 +1100)]
kallsyms: print module name in %ps/S case when KALLSYMS is disabled
original:
With KALLSYMS
%pS %ps
[16.4200] hello_init+0x0/0x24 [crash] hello_init [crash]
Without KALLSYMS:
[16.2200] 0xbe200040 0xbe200040
With Patch (Without KALLSYMS:) load address + current offset [Module Name]
[13.5993] 0xbe200000+0x40 [crash] 0xbe200000+0x40 [crash]
It will help in better debugging and checking when KALLSYMS is disabled,
user will get information about module name and load address of module.
verified for arm64:
/ # insmod /crash.ko
[ 19.263556] 0xffff800000ec0000+0x38 [crash]
..
[ 19.276023] Call trace:
[ 19.276277] 0xffff800000ec0000+0x28 [crash]
[ 19.276567] 0xffff800000ec0000+0x58 [crash]
[ 19.276727] 0xffff800000ec0000+0x74 [crash]
[ 19.276866] 0xffff8000080127d0
[ 19.276978] 0xffff80000812d95c
[ 19.277085] 0xffff80000812f554
Link: https://lkml.kernel.org/r/20220201040044.1528568-1-maninder1.s@samsung.com
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Co-developed-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Alexey Dobriyan [Wed, 16 Feb 2022 04:31:58 +0000 (15:31 +1100)]
binfmt: move more stuff undef CONFIG_COREDUMP
struct linux_binfmt::core_dump and struct min_coredump::min_coredump are
used under CONFIG_COREDUMP only. Shrink those embedded configs a bit.
Link: https://lkml.kernel.org/r/YglbIFyN+OtwVyjW@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Alexey Dobriyan [Wed, 16 Feb 2022 04:31:58 +0000 (15:31 +1100)]
ELF: fix overflow in total mapping size calculation
Kernel assumes that ELF program headers are ordered by mapping address,
but doesn't enforce it. It is possible to make mapping size extremely
huge by simply shuffling first and last PT_LOAD segments.
As long as PT_LOAD segments do not overlap, it is silly to require sorting
by v_addr anyway because mmap() doesn't care.
Don't assume PT_LOAD segments are sorted and calculate min and max
addresses correctly.
Link: https://lore.kernel.org/all/YVmd7D0M6G/DcP4O@localhost.localdomain/
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Tested-by: Magnus Groß <magnus.gross@rwth-aachen.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Akira Kawata [Wed, 16 Feb 2022 04:31:57 +0000 (15:31 +1100)]
fs/binfmt_elf: refactor load_elf_binary function
I delete load_addr because it is not used anymore. And I rename
load_addr_set to first_pt_load because it is used only to capture the
first iteration of the loop.
Link: https://lkml.kernel.org/r/20211212232414.1402199-3-akirakawata1@gmail.com
Signed-off-by: Akira Kawata <akirakawata1@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Akira Kawata [Wed, 16 Feb 2022 04:31:56 +0000 (15:31 +1100)]
fs-binfmt_elf-fix-at_phdr-for-unusual-elf-files-v5
add comment per Kees
Link: https://lkml.kernel.org/r/20220127124014.338760-2-akirakawata1@gmail.com
Signed-off-by: Akira Kawata <akirakawata1@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Akira Kawata [Wed, 16 Feb 2022 04:31:55 +0000 (15:31 +1100)]
fs/binfmt_elf: fix AT_PHDR for unusual ELF files
Patch series "fs/binfmt_elf: Fix AT_PHDR for unusual ELF files", v4.
These patches fix a bug in AT_PHDR calculation.
We cannot calculate AT_PHDR as the sum of load_addr and exec->e_phoff.
This is because exec->e_phoff is the offset of PHDRs in the file and the
address of PHDRs in the memory may differ from it. These patches fix the
bug by calculating the address of program headers from PT_LOADs directly.
This patch (of 2):
As pointed out in the bugzilla discussion, we cannot calculate AT_PHDR as
the sum of load_addr and exec->e_phoff.
: The AT_PHDR of ELF auxiliary vectors should point to the memory address
: of program header. But binfmt_elf.c calculates this address as follows:
:
: NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff);
:
: which is wrong since e_phoff is the file offset of program header and
: load_addr is the memory base address from PT_LOAD entry.
:
: The ld.so uses AT_PHDR as the memory address of program header. In normal
: case, since the e_phoff is usually 64 and in the first PT_LOAD region, it
: is the correct program header address.
:
: But if the address of program header isn't equal to the first PT_LOAD
: address + e_phoff (e.g. Put the program header in other non-consecutive
: PT_LOAD region), ld.so will try to read program header from wrong address
: then crash or use incorrect program header.
This is because exec->e_phoff is the offset of PHDRs in the file and the
address of PHDRs in the memory may differ from it. This patch fixes the
bug by calculating the address of program headers from PT_LOADs directly.
Link: https://lkml.kernel.org/r/20211212232414.1402199-1-akirakawata1@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197921
Link: https://lkml.kernel.org/r/20211212232414.1402199-2-akirakawata1@gmail.com
Signed-off-by: Akira Kawata <akirakawata1@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Maninder Singh [Wed, 16 Feb 2022 04:31:55 +0000 (15:31 +1100)]
scripts/checkpatch.pl: remove _deferred and _deferred_once false warning
With commit
98e35f5894cf ("printk: git rid of [sched_delayed] message for
printk_deferred") printk_deferred and printk_deferred_once require
LOGLEVEL in argument, but checkpatch.pl was not fixed and still reports it
as warning:
WARNING: Possible unnecessary KERN_ALERT
printk_deferred(KERN_ALERT "checking deferred
");
As suggested by Andy, made 2 functions from logFunction.
1. logFunction: with all checks
2. logFunctionCore: without printk(?:_ratelimited|_once|_deferred) checking
and call logFunctionCore instead of logFunction for checking of loglevel,
which will exclude checking of printk(?:_ratelimited|_once|_deferred).
This way, there is no need to maintain same stanza at multiple places for
removing printk flavours.
Link: https://lkml.kernel.org/r/20220202103309.1914992-1-maninder1.s@samsung.com
Co-developed-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Joe Perches [Wed, 16 Feb 2022 04:31:54 +0000 (15:31 +1100)]
checkpatch: add early_param exception to blank line after struct/function test
Add early_param as another exception to the blank line preferred after
function/struct/union declaration or definition test.
Link: https://lkml.kernel.org/r/3bd6ada59f411a7685d7e64eeb670540d4bfdcde.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Joe Perches [Wed, 16 Feb 2022 04:31:53 +0000 (15:31 +1100)]
checkpatch: add --fix option for some TRAILING_STATEMENTS
Single line code like:
if (foo) bar;
should generally be written:
if (foo)
bar;
Add a --fix test to do so.
This fix is not done when an ASSIGN_IN_IF in the same line exists.
Link: https://lkml.kernel.org/r/20220128185924.80137-2-joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Joe Perches [Wed, 16 Feb 2022 04:31:53 +0000 (15:31 +1100)]
checkpatch: prefer MODULE_LICENSE("GPL") over MODULE_LICENSE("GPL v2")
There is no effective difference.
Given the large number of uses of "GPL v2", emit this message only for
patches as a trivial treeside sed could be done one day.
Ref: commit
bf7fbeeae6db ("module: Cure the MODULE_LICENSE "GPL" vs. "GPL v2" bogosity")
Link: https://lkml.kernel.org/r/20220128185924.80137-1-joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Guo Xuenan [Wed, 16 Feb 2022 04:31:52 +0000 (15:31 +1100)]
lz4: fix LZ4_decompress_safe_partial read out of bound
When partialDecoding, it is EOF if we've either, filled the output
buffer or can't proceed with reading an offset for following match.
In some extreme corner cases when compressed data is crusted corrupted,
UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial
may lead to read out of bound problem during decoding. lz4 upstream has
fixed it [2] and this issue has been disscussed here [3] before.
current decompression routine was ported from lz4 v1.8.3, bumping lib/lz4
to v1.9.+ is certainly a huge work to be done later, so, we'd better fix
it first.
[1] https://lore.kernel.org/all/
000000000000830d1205cf7f0477@google.com/
[2] https://github.com/lz4/lz4/commit/
c5d6f8a8be3927c0bec91bcc58667a6cfad244ad#
[3] https://lore.kernel.org/all/
CC666AE8-4CA4-4951-B6FB-
A2EFDE3AC03B@fb.com/
Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andy Shevchenko [Wed, 16 Feb 2022 04:31:51 +0000 (15:31 +1100)]
bitfield: add explicit inclusions to the example
It's not obvious that bitfield.h doesn't guarantee the bits.h inclusion
and the example in the former is confusing. Some developers think that
it's okay to just include bitfield.h to get it working. Change example to
explicitly include necessary headers in order to avoid confusion.
Link: https://lkml.kernel.org/r/20220207123341.47533-1-andriy.shevchenko@linux.intel.com
Fixes:
3e9b3112ec74 ("add basic register-field manipulation macros")
Depends-on:
8bd9cb51daac ("locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Jan DÄ…broÅ› <jsd@semihalf.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Christophe Leroy [Wed, 16 Feb 2022 04:31:50 +0000 (15:31 +1100)]
ilog2: force inlining of __ilog2_u32() and __ilog2_u64()
Building a kernel with CONFIG_CC_OPTIMISE_FOR_SIZE leads to __ilog2_u32()
being duplicated 50 times and __ilog2_u64() 3 times in vmlinux on a tiny
powerpc32 config.
__ilog2_u32() being 2 instructions it is not worth being kept out of line,
so force inlining. Allthough the u64 version is a bit bigger, there is
still a small benefit in keeping it inlined. On a 64 bits config there's
a real benefit.
With this change the size of vmlinux text is reduced by 1 kbytes, which is
approx 50% more than the size of the removed functions.
Before the patch there is for instance:
c00d2a94 <__ilog2_u32>:
c00d2a94: 7c 63 00 34 cntlzw r3,r3
c00d2a98: 20 63 00 1f subfic r3,r3,31
c00d2a9c: 4e 80 00 20 blr
c00d36d8 <__order_base_2>:
c00d36d8: 28 03 00 01 cmplwi r3,1
c00d36dc: 40 81 00 2c ble
c00d3708 <__order_base_2+0x30>
c00d36e0: 94 21 ff f0 stwu r1,-16(r1)
c00d36e4: 7c 08 02 a6 mflr r0
c00d36e8: 38 63 ff ff addi r3,r3,-1
c00d36ec: 90 01 00 14 stw r0,20(r1)
c00d36f0: 4b ff f3 a5 bl
c00d2a94 <__ilog2_u32>
c00d36f4: 80 01 00 14 lwz r0,20(r1)
c00d36f8: 38 63 00 01 addi r3,r3,1
c00d36fc: 7c 08 03 a6 mtlr r0
c00d3700: 38 21 00 10 addi r1,r1,16
c00d3704: 4e 80 00 20 blr
c00d3708: 38 60 00 00 li r3,0
c00d370c: 4e 80 00 20 blr
With the patch it has become:
c00d356c <__order_base_2>:
c00d356c: 28 03 00 01 cmplwi r3,1
c00d3570: 40 81 00 14 ble
c00d3584 <__order_base_2+0x18>
c00d3574: 38 63 ff ff addi r3,r3,-1
c00d3578: 7c 63 00 34 cntlzw r3,r3
c00d357c: 20 63 00 20 subfic r3,r3,32
c00d3580: 4e 80 00 20 blr
c00d3584: 38 60 00 00 li r3,0
c00d3588: 4e 80 00 20 blr
No more need for __order_base_2() to setup a stack frame and
save/restore caller address. And the following 'add 1' is
merged in the subtract.
Another typical use of it:
c080ff28 <hugepagesz_setup>:
...
c080fff8: 7f c3 f3 78 mr r3,r30
c080fffc: 4b 8f 81 f1 bl
c01081ec <__ilog2_u32>
c0810000: 38 63 ff f2 addi r3,r3,-14
...
Becomes
c080ff1c <hugepagesz_setup>:
...
c080ffec: 7f c3 00 34 cntlzw r3,r30
c080fff0: 20 63 00 11 subfic r3,r3,17
...
Here no need to move r30 argument to r3 then substract 14 to result. Just
work on r30 and merge the 'sub 14' with the 'sub from 31'.
Link: https://lkml.kernel.org/r/803a2ac3d923ebcfd0dd40f5886b05cae7bb0aba.1644243860.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Rasmus Villemoes [Wed, 16 Feb 2022 04:31:50 +0000 (15:31 +1100)]
include: drop pointless __compiler_offsetof indirection
(1) compiler_types.h is unconditionally included via an -include flag
(see scripts/Makefile.lib), and it defines __compiler_offsetof
unconditionally. So testing for definedness of __compiler_offsetof is
mostly pointless.
(2) Every relevant compiler provides __builtin_offsetof (even sparse
has had that for 14 years), and if for whatever reason one would end
up picking up the poor man's fallback definition (C file compiler with
completely custom CFLAGS?), newer clang versions won't treat the
result as an Integer Constant Expression, so if used in place where
such is required (static initializer or static_assert), one would get
errors like
t.c:11:16: error: static_assert expression is not an integral constant expression
t.c:11:16: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression
t.c:4:33: note: expanded from macro 'offsetof'
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
So just define offsetof unconditionally and directly in terms of
__builtin_offsetof.
Link: https://lkml.kernel.org/r/20220202102147.326672-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Kees Cook [Wed, 16 Feb 2022 04:31:49 +0000 (15:31 +1100)]
Kconfig.debug: make DEBUG_INFO always default=n
While trying to make sure CONFIG_DEBUG_INFO wasn't set for COMPILE_TEST, I
ordered the choices incorrectly to retain the prior default=n state. Move
DEBUG_INFO_NONE to the top so that the default choice is disabled, and
remove the "if COMPILE_TEST" as it is now redundant.
Link: https://lkml.kernel.org/r/20220128214131.580131-1-keescook@chromium.org
Link: https://lore.kernel.org/lkml/YfRY6+CaQxX7O8vF@dev-arch.archlinux-ax161
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Kees Cook [Wed, 16 Feb 2022 04:31:48 +0000 (15:31 +1100)]
Kconfig.debug: make DEBUG_INFO selectable from a choice
Currently it's not possible to enable DEBUG_INFO for an all*config build,
since it is marked as "depends on !COMPILE_TEST". This generally makes
sense because a debug build of an all*config target ends up taking much
longer and the output is much larger. Having this be "default off" makes
sense. However, there are cases where enabling DEBUG_INFO for such builds
is useful for doing treewide A/B comparisons of build options, etc.
Make DEBUG_INFO selectable from any of the DWARF version choice options,
with DEBUG_INFO_NONE being the default for COMPILE_TEST. The mutually
exclusive relationship between DWARF5 and BTF must be inverted, but the
result remains the same. Additionally moves DEBUG_KERNEL and DEBUG_MISC
up to the top of the menu because they were enabling features _above_ it,
making it weird to navigate menuconfig.
Link: https://lkml.kernel.org/r/20220125075126.891825-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Julius Hemanth Pitti [Wed, 16 Feb 2022 04:31:48 +0000 (15:31 +1100)]
proc/sysctl: make protected_* world readable
protected_* files have 600 permissions which prevents non-superuser from
reading them.
Container like "AWS greengrass" refuse to launch unless
protected_hardlinks and protected_symlinks are set. When containers like
these run with "userns-remap" or "--user" mapping container's root to
non-superuser on host, they fail to run due to denied read access to these
files.
As these protections are hardly a secret, and do not possess any security
risk, making them world readable.
Though above greengrass usecase needs read access to only
protected_hardlinks and protected_symlinks files, setting all other
protected_* files to 644 to keep consistency.
Link: http://lkml.kernel.org/r/20200709235115.56954-1-jpitti@cisco.com
Fixes:
800179c9b8a1 ("fs: add link restrictions")
Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Yang Li [Wed, 16 Feb 2022 04:31:47 +0000 (15:31 +1100)]
proc/vmcore: fix vmcore_alloc_buf() kernel-doc comment
Fix a spelling problem to remove warnings found by running
scripts/kernel-doc, which is caused by using 'make W=1'.
fs/proc/vmcore.c:492: warning: Function parameter or member 'size' not
described in 'vmcore_alloc_buf'
fs/proc/vmcore.c:492: warning: Excess function parameter 'sizez'
description in 'vmcore_alloc_buf'
Link: https://lkml.kernel.org/r/20220129011449.105278-1-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
David Hildenbrand [Wed, 16 Feb 2022 04:31:46 +0000 (15:31 +1100)]
proc/vmcore: fix possible deadlock on concurrent mmap and read
Lockdep noticed that there is chance for a deadlock if we have concurrent
mmap, concurrent read, and the addition/removal of a callback.
As nicely explained by Boqun:
"
Lockdep warned about the above sequences because rw_semaphore is a fair
read-write lock, and the following can cause a deadlock:
TASK 1 TASK 2 TASK 3
====== ====== ======
down_write(mmap_lock);
down_read(vmcore_cb_rwsem)
down_write(vmcore_cb_rwsem); // blocked
down_read(vmcore_cb_rwsem); // cannot get the lock because of the fairness
down_read(mmap_lock); // blocked
IOW, a reader can block another read if there is a writer queued by the
second reader and the lock is fair.
"
To fix, convert to srcu to make this deadlock impossible. We need srcu as
our callbacks can sleep. With this change, I cannot trigger any lockdep
warnings.
[ 6.386519] ======================================================
[ 6.387203] WARNING: possible circular locking dependency detected
[ 6.387965] 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1 Not tainted
[ 6.388899] ------------------------------------------------------
[ 6.389657] makedumpfile/542 is trying to acquire lock:
[ 6.390308]
ffffffff832d2eb8 (vmcore_cb_rwsem){.+.+}-{3:3}, at: mmap_vmcore+0x340/0x580
[ 6.391290]
[ 6.391290] but task is already holding lock:
[ 6.391978]
ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150
[ 6.392898]
[ 6.392898] which lock already depends on the new lock.
[ 6.392898]
[ 6.393866]
[ 6.393866] the existing dependency chain (in reverse order) is:
[ 6.394762]
[ 6.394762] -> #1 (&mm->mmap_lock#2){++++}-{3:3}:
[ 6.395530] lock_acquire+0xc3/0x1a0
[ 6.396047] __might_fault+0x4e/0x70
[ 6.396562] _copy_to_user+0x1f/0x90
[ 6.397093] __copy_oldmem_page+0x72/0xc0
[ 6.397663] read_from_oldmem+0x77/0x1e0
[ 6.398229] read_vmcore+0x2c2/0x310
[ 6.398742] proc_reg_read+0x47/0xa0
[ 6.399265] vfs_read+0x101/0x340
[ 6.399751] __x64_sys_pread64+0x5d/0xa0
[ 6.400314] do_syscall_64+0x43/0x90
[ 6.400778] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6.401390]
[ 6.401390] -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
[ 6.402063] validate_chain+0x9f4/0x2670
[ 6.402560] __lock_acquire+0x8f7/0xbc0
[ 6.403054] lock_acquire+0xc3/0x1a0
[ 6.403509] down_read+0x4a/0x140
[ 6.403948] mmap_vmcore+0x340/0x580
[ 6.404403] proc_reg_mmap+0x3e/0x90
[ 6.404866] mmap_region+0x504/0x880
[ 6.405322] do_mmap+0x38a/0x520
[ 6.405744] vm_mmap_pgoff+0xc1/0x150
[ 6.406258] ksys_mmap_pgoff+0x178/0x200
[ 6.406823] do_syscall_64+0x43/0x90
[ 6.407339] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6.407975]
[ 6.407975] other info that might help us debug this:
[ 6.407975]
[ 6.408945] Possible unsafe locking scenario:
[ 6.408945]
[ 6.409684] CPU0 CPU1
[ 6.410196] ---- ----
[ 6.410703] lock(&mm->mmap_lock#2);
[ 6.411121] lock(vmcore_cb_rwsem);
[ 6.411792] lock(&mm->mmap_lock#2);
[ 6.412465] lock(vmcore_cb_rwsem);
[ 6.412873]
[ 6.412873] *** DEADLOCK ***
[ 6.412873]
[ 6.413522] 1 lock held by makedumpfile/542:
[ 6.414006] #0:
ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150
[ 6.414944]
[ 6.414944] stack backtrace:
[ 6.415432] CPU: 0 PID: 542 Comm: makedumpfile Not tainted 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1
[ 6.416581] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 6.417272] Call Trace:
[ 6.417593] <TASK>
[ 6.417882] dump_stack_lvl+0x5d/0x78
[ 6.418346] print_circular_bug+0x5d7/0x5f0
[ 6.418821] ? stack_trace_save+0x3a/0x50
[ 6.419273] ? save_trace+0x3d/0x330
[ 6.419681] check_noncircular+0xd1/0xe0
[ 6.420217] validate_chain+0x9f4/0x2670
[ 6.420715] ? __lock_acquire+0x8f7/0xbc0
[ 6.421234] ? __lock_acquire+0x8f7/0xbc0
[ 6.421685] __lock_acquire+0x8f7/0xbc0
[ 6.422127] lock_acquire+0xc3/0x1a0
[ 6.422535] ? mmap_vmcore+0x340/0x580
[ 6.422965] ? lock_is_held_type+0xe2/0x140
[ 6.423432] ? mmap_vmcore+0x340/0x580
[ 6.423893] down_read+0x4a/0x140
[ 6.424321] ? mmap_vmcore+0x340/0x580
[ 6.424800] mmap_vmcore+0x340/0x580
[ 6.425237] ? vm_area_alloc+0x1c/0x60
[ 6.425661] ? trace_kmem_cache_alloc+0x30/0xe0
[ 6.426174] ? kmem_cache_alloc+0x1e0/0x2f0
[ 6.426641] proc_reg_mmap+0x3e/0x90
[ 6.427052] mmap_region+0x504/0x880
[ 6.427462] do_mmap+0x38a/0x520
[ 6.427842] vm_mmap_pgoff+0xc1/0x150
[ 6.428260] ksys_mmap_pgoff+0x178/0x200
[ 6.428701] do_syscall_64+0x43/0x90
[ 6.429126] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6.429745] RIP: 0033:0x7fc7359b8fc7
[ 6.430157] Code: 00 00 00 89 ef e8 69 b3 ff ff eb e4 e8 c2 64 01 00 66 90 f3 0f 1e fa 41 89 ca 41 f7 c1 ff 0f 00 00 75 10 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 21 c3 48 8b 05 21 7e 0e 00 64 c7 00 16 00 00
[ 6.432147] RSP: 002b:
00007fff35b4c208 EFLAGS:
00000246 ORIG_RAX:
0000000000000009
[ 6.432970] RAX:
ffffffffffffffda RBX:
0000000000000001 RCX:
00007fc7359b8fc7
[ 6.433746] RDX:
0000000000000001 RSI:
0000000000400000 RDI:
0000000000000000
[ 6.434529] RBP:
000055a1125ecf10 R08:
0000000000000003 R09:
0000000000002000
[ 6.435310] R10:
0000000000000002 R11:
0000000000000246 R12:
0000000000002000
[ 6.436093] R13:
0000000000400000 R14:
000055a1124269e2 R15:
0000000000000000
[ 6.436887] </TASK>
Link: https://lkml.kernel.org/r/20220119193417.100385-1-david@redhat.com
Fixes:
cc5f2704c934 ("proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrew Morton [Wed, 16 Feb 2022 04:31:45 +0000 (15:31 +1100)]
proc-alloc-path_max-bytes-for-proc-pid-fd-symlinks-fix
remove now-unneeded cast
Reported-by: kernel test robot <lkp@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Hao Lee <haolee.swjtu@gmail.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Hao Lee [Wed, 16 Feb 2022 04:31:45 +0000 (15:31 +1100)]
proc: alloc PATH_MAX bytes for /proc/${pid}/fd/ symlinks
It's not a standard approach that use __get_free_page() to alloc path
buffer directly. We'd better use kmalloc and PATH_MAX.
PAGE_SIZE is different on different archs. An unlinked file
with very long canonical pathname will readlink differently
because "(deleted)" eats into a buffer. --adobriyan
Link: https://lkml.kernel.org/r/Ye1fCxyZZ0I5lgOL@localhost.localdomain
Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tetsuo Handa [Wed, 16 Feb 2022 04:31:44 +0000 (15:31 +1100)]
kernel/hung_task.c: Monitor killed tasks.
syzbot's current top report is "no output from test machine" where the
userspace process failed to spawn a new test process for 300 seconds for
some reason. One of reasons which can result in this report is that an
already spawned test process was unable to terminate (e.g. trapped at an
unkillable retry loop due to some bug) after SIGKILL was sent to that
process. Therefore, reporting when a thread is failing to terminate
despite a fatal signal is pending would give us more useful information.
In the context of syzbot's testing where there are only 2 CPUs in the
target VM (which means that only small number of threads and not so much
memory) and threads get SIGKILL after 5 seconds from fork(), being unable
to reach do_exit() within 10 seconds is likely a sign of something went
wrong. Therefore, I would like to try this patch in linux-next.git for
feasibility testing whether this patch helps finding more bugs and
reproducers for such bugs, by bringing "unable to terminate threads"
reports out of "no output from test machine" reports.
Potential bad effect of this patch will be that kernel code becomes
killable without addressing the root cause of being unable to terminate,
for use of killable wait will bypass both TASK_UNINTERRUPTIBLE stall test
and SIGKILL after 5 seconds behavior, which will result in failing to
detect in real systems where SIGKILL won't be sent after 5 seconds when
something went wrong.
This version shares existing sysctl settings (e.g. check interval,
timeout, whether to panic) used for detecting TASK_UNINTERRUPTIBLE
threads. We will likely want to use different sysctl settings for
monitoring killed threads. But let's start as linux-next.git patch
without introducing new sysctl settings. We can add sysctl settings
before sending to linux.git.
Link: http://lkml.kernel.org/r/60d1d7f6-b201-3dcb-a51b-76a31bcfa919@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Liu Chuansheng <chuansheng.liu@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: linux-kernel@vger.kernel.org
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tetsuo Handa [Wed, 16 Feb 2022 04:31:43 +0000 (15:31 +1100)]
fs/buffer.c: dump more info for __getblk_gfp() stall problem
We need to dump more variables on top of
"fs/buffer.c: add debug print for __getblk_gfp() stall problem".
Link: http://lkml.kernel.org/r/12239545-7d8a-820f-48ba-952e2e98a05c@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tetsuo Handa [Wed, 16 Feb 2022 04:31:43 +0000 (15:31 +1100)]
fs/buffer.c: add debug print for __getblk_gfp() stall problem
Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
__getblk_gfp() line in the backtrace. Since there is a comment block that
says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
attempt from grow_dev_page() is failing, let's start from checking whether
syzbot is hitting that case. This change will be removed after the bug is
fixed.
Link: http://lkml.kernel.org/r/9b9fcdda-c347-53ee-fdbb-8a7d11cf430e@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: <syzkaller-bugs@googlegroups.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:43 +0000 (15:31 +1100)]
mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
Because DAMON debugfs interface and DAMON-based proactive reclaim are now
using monitoring operations via registration mechanism,
damon_{p,v}a_{target_valid,set_operations}() functions have no user. This
commit clean them up.
Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:43 +0000 (15:31 +1100)]
mm/damon/dbgfs-test: fix is_target_id() change
DAMON kunit tests for DAMON debugfs interface fails because it still
assumes setting empty monitoring operations makes DAMON debugfs interface
believe the target of the context don't have pid. This commit fixes the
kunit test fails by explicitly setting the context's monitoring operations
with the operations for the physical address space, which let debugfs
knows the target will not have pid.
Link: https://lkml.kernel.org/r/20220215184603.1479-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:43 +0000 (15:31 +1100)]
mm/damon/dbgfs: use operations id for knowing if the target has pid
DAMON debugfs interface depends on monitoring operations for virtual
address spaces because it knows if the target has pid or not by seeing if
the context is configured to use one of the virtual address space
monitoring operation functions. We can replace that check with 'enum
damon_ops_id' now, to make it independent. This commit makes the change.
Link: https://lkml.kernel.org/r/20220215184603.1479-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:42 +0000 (15:31 +1100)]
mm/damon/dbgfs: use damon_select_ops() instead of damon_{v,p}a_set_operations()
This commit makes DAMON debugfs interface to select the registered
monitoring operations for the physical address space or virtual address
spaces depending on user requests instead of setting it on its own. Note
that DAMON debugfs interface is still dependent to DAMON_VADDR with this
change, because it is also using its symbol, 'damon_va_target_valid'.
Link: https://lkml.kernel.org/r/20220215184603.1479-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:42 +0000 (15:31 +1100)]
mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations()
This commit makes DAMON_RECLAIM to select the registered monitoring
operations for the physical address space instead of setting it on its
own. This allows DAMON_RECLAIM be independent of DAMON_PADDR, but leave
the dependency as is, because it's the only one monitoring operations it
use, and therefore it makes no sense to build DAMON_RECLAIM without
DAMON_PADDR.
Link: https://lkml.kernel.org/r/20220215184603.1479-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:42 +0000 (15:31 +1100)]
mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall
This commit makes the monitoring operations for the physical address space
and virtual address spaces register themselves to DAMON in the
subsys_initcall step. Later, in-kernel DAMON user code can use them via
damon_select_ops() without have to unnecessarily depend on all possible
monitoring operations implementations.
Link: https://lkml.kernel.org/r/20220215184603.1479-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:42 +0000 (15:31 +1100)]
mm/damon: let monitoring operations can be registered and selected
In-kernel DAMON user code like DAMON debugfs interface should set 'struct
damon_operations' of its 'struct damon_ctx' on its own. Therefore, the
client code should depend on all supporting monitoring operations
implementations that it could use. For example, DAMON debugfs interface
depends on both vaddr and paddr, while some of the users are not always
interested in both.
To minimize such unnecessary dependencies, this commit makes the
monitoring operations can be registered by implementing code and then
dynamically selected by the user code without build-time dependency.
Link: https://lkml.kernel.org/r/20220215184603.1479-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:42 +0000 (15:31 +1100)]
mm/damon: rename damon_primitives to damon_operations
Patch series "Allow DAMON user code independent of monitoring primitives".
In-kernel DAMON user code is required to configure the monitoring context
(struct damon_ctx) with proper monitoring primitives (struct
damon_primitive). This makes the user code dependent to all supporting
monitoring primitives. For example, DAMON debugfs interface depends on
both DAMON_VADDR and DAMON_PADDR, though some users have interest in only
one use case. As more monitoring primitives are introduced, the problem
will be bigger.
To minimize such unnecessary dependency, this patchset makes monitoring
primitives can be registered by the implemnting code and later dynamically
searched and selected by the user code.
In addition to that, this patchset renames monitoring primitives to
monitoring operations, which is more easy to intuitively understand what
it means and how it would be structed.
This patch (of 8):
DAMON has a set of callback functions called monitoring primitives and let
it can be configured with various implementations for easy extension for
different address spaces and usages. However, the word 'primitive' is not
so explicit. Meanwhile, many other structs resembles similar purpose
calls themselves 'operations'. To make the code easier to be understood,
this commit renames 'damon_primitives' to 'damon_operations' before it is
too late to rename.
Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Baolin Wang [Wed, 16 Feb 2022 04:31:41 +0000 (15:31 +1100)]
mm/damon: remove redundant page validation
It will never get a NULL page by pte_page() as discussed in thread [1],
thus remove the redundant page validation to fix below Smatch static
checker warning.
mm/damon/vaddr.c:405 damon_hugetlb_mkold()
warn: 'page' can't be NULL.
[1] https://lore.kernel.org/linux-mm/
20220106091200.GA14564@kili/
Link: https://lkml.kernel.org/r/6d32f7d201b8970d53f51b6c5717d472aed2987c.1642386715.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:41 +0000 (15:31 +1100)]
mm/damon: remove the target id concept
DAMON asks each monitoring target ('struct damon_target') to have one
'unsigned long' integer called 'id', which should be unique among the
targets of same monitoring context. Meaning of it is, however, totally up
to the monitoring primitives that registered to the monitoring context.
For example, the virtual address spaces monitoring primitives treats the
id as a 'struct pid' pointer.
This makes the code flexible, but ugly, not well-documented, and
type-unsafe[1]. Also, identification of each target can be done via its
index. For the reason, this commit removes the concept and uses clear
type definition. For now, only 'struct pid' pointer is used for the
virtual address spaces monitoring. If DAMON is extended in future so that
we need to put another identifier field in the struct, we will use a union
for such primitives-dependent fields and document which primitives are
using which type.
[1] https://lore.kernel.org/linux-mm/
20211013154535.
4aaeaaf9d0182922e405dd1e@linux-foundation.org/
Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:41 +0000 (15:31 +1100)]
mm/damon/core: move damon_set_targets() into dbgfs
damon_set_targets() function is defined in the core for general use cases,
but called from only dbgfs. Also, because the function is for general use
cases, dbgfs does additional handling of pid type target id case. To make
the situation simpler, this commit moves the function into dbgfs and makes
it to do the pid type case handling on its own.
Link: https://lkml.kernel.org/r/20211230100723.2238-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:41 +0000 (15:31 +1100)]
Docs/admin-guide/mm/damon/usage: update for changed initail_regions file input
A previous commit made init_regions debugfs file to use target index
instead of target id for specifying the target of the init regions. This
commit updates the usage document to reflect the change.
Link: https://lkml.kernel.org/r/20211230100723.2238-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
SeongJae Park [Wed, 16 Feb 2022 04:31:41 +0000 (15:31 +1100)]
mm/damon/dbgfs/init_regions: use target index instead of target id
Patch series "Remove the type-unclear target id concept".
DAMON asks each monitoring target ('struct damon_target') to have one
'unsigned long' integer called 'id', which should be unique among the
targets of same monitoring context. Meaning of it is, however, totally up
to the monitoring primitives that registered to the monitoring context.
For example, the virtual address spaces monitoring primitives treats the
id as a 'struct pid' pointer.
This makes the code flexible but ugly, not well-documented, and
type-unsafe[1]. Also, identification of each target can be done via its
index. For the reason, this patchset removes the concept and uses clear
type definition.
[1] https://lore.kernel.org/linux-mm/
20211013154535.
4aaeaaf9d0182922e405dd1e@linux-foundation.org/
This patch (of 4):
Target id is a 'unsigned long' data, which can be interpreted differently
by each monitoring primitives. For example, it means 'struct pid *' for
the virtual address spaces monitoring, while it means nothing but an
integer to be displayed to debugfs interface users for the physical
address space monitoring. It's flexible but makes code ugly and
type-unsafe[1].
To be prepared for eventual removal of the concept, this commit removes a
use case of the concept in 'init_regions' debugfs file handling. In
detail, this commit replaces use of the id with the index of each target
in the context's targets list.
[1] https://lore.kernel.org/linux-mm/
20211013154535.
4aaeaaf9d0182922e405dd1e@linux-foundation.org/
Link: https://lkml.kernel.org/r/20211230100723.2238-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211230100723.2238-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>