linux-2.6-block.git
3 months agocachefiles: Set object to close if ondemand_id < 0 in copen
Zizhi Wo [Wed, 22 May 2024 11:43:06 +0000 (19:43 +0800)]
cachefiles: Set object to close if ondemand_id < 0 in copen

If copen is maliciously called in the user mode, it may delete the request
corresponding to the random id. And the request may have not been read yet.

Note that when the object is set to reopen, the open request will be done
with the still reopen state in above case. As a result, the request
corresponding to this object is always skipped in select_req function, so
the read request is never completed and blocks other process.

Fix this issue by simply set object to close if its id < 0 in copen.

Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-11-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: defer exposing anon_fd until after copy_to_user() succeeds
Baokun Li [Wed, 22 May 2024 11:43:05 +0000 (19:43 +0800)]
cachefiles: defer exposing anon_fd until after copy_to_user() succeeds

After installing the anonymous fd, we can now see it in userland and close
it. However, at this point we may not have gotten the reference count of
the cache, but we will put it during colse fd, so this may cause a cache
UAF.

So grab the cache reference count before fd_install(). In addition, by
kernel convention, fd is taken over by the user land after fd_install(),
and the kernel should not call close_fd() after that, i.e., it should call
fd_install() after everything is ready, thus fd_install() is called after
copy_to_user() succeeds.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-10-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: never get a new anonymous fd if ondemand_id is valid
Baokun Li [Wed, 22 May 2024 11:43:04 +0000 (19:43 +0800)]
cachefiles: never get a new anonymous fd if ondemand_id is valid

Now every time the daemon reads an open request, it gets a new anonymous fd
and ondemand_id. With the introduction of "restore", it is possible to read
the same open request more than once, and therefore an object can have more
than one anonymous fd.

If the anonymous fd is not unique, the following concurrencies will result
in an fd leak:

     t1     |         t2         |          t3
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              cachefiles_ondemand_get_fd
                load->fd = fd0
                ondemand_id = object_id0
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                   // restore REQ_A
                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req
                                      cachefiles_ondemand_get_fd
                                        load->fd = fd1
                                        ondemand_id = object_id1
             process_open_req(REQ_A)
             write(devfd, ("copen %u,%llu", msg->msg_id, size))
             cachefiles_ondemand_copen
              xa_erase(&cache->reqs, id)
              complete(&REQ_A->done)
   kfree(REQ_A)
                                  process_open_req(REQ_A)
                                  // copen fails due to no req
                                  // daemon close(fd1)
                                  cachefiles_ondemand_fd_release
                                   // set object closed
 -- umount --
 cachefiles_withdraw_cookie
  cachefiles_ondemand_clean_object
   cachefiles_ondemand_init_close_req
    if (!cachefiles_ondemand_object_is_open(object))
      return -ENOENT;
    // The fd0 is not closed until the daemon exits.

However, the anonymous fd holds the reference count of the object and the
object holds the reference count of the cookie. So even though the cookie
has been relinquished, it will not be unhashed and freed until the daemon
exits.

In fscache_hash_cookie(), when the same cookie is found in the hash list,
if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new
cookie waits for the old cookie to be unhashed, while the old cookie is
waiting for the leaked fd to be closed, if the daemon does not exit in time
it will trigger a hung task.

To avoid this, allocate a new anonymous fd only if no anonymous fd has
been allocated (ondemand_id == 0) or if the previously allocated anonymous
fd has been closed (ondemand_id == -1). Moreover, returns an error if
ondemand_id is valid, letting the daemon know that the current userland
restore logic is abnormal and needs to be checked.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-9-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: add spin_lock for cachefiles_ondemand_info
Baokun Li [Wed, 22 May 2024 11:43:03 +0000 (19:43 +0800)]
cachefiles: add spin_lock for cachefiles_ondemand_info

The following concurrency may cause a read request to fail to be completed
and result in a hung:

           t1             |             t2
---------------------------------------------------------
                            cachefiles_ondemand_copen
                              req = xa_erase(&cache->reqs, id)
// Anon fd is maliciously closed.
cachefiles_ondemand_fd_release
  xa_lock(&cache->reqs)
  cachefiles_ondemand_set_object_close(object)
  xa_unlock(&cache->reqs)
                              cachefiles_ondemand_set_object_open
                              // No one will ever close it again.
cachefiles_ondemand_daemon_read
  cachefiles_ondemand_select_req
  // Get a read req but its fd is already closed.
  // The daemon can't issue a cread ioctl with an closed fd, then hung.

So add spin_lock for cachefiles_ondemand_info to protect ondemand_id and
state, thus we can avoid the above problem in cachefiles_ondemand_copen()
by using ondemand_id to determine if fd has been closed.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-8-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: add consistency check for copen/cread
Baokun Li [Wed, 22 May 2024 11:43:02 +0000 (19:43 +0800)]
cachefiles: add consistency check for copen/cread

This prevents malicious processes from completing random copen/cread
requests and crashing the system. Added checks are listed below:

  * Generic, copen can only complete open requests, and cread can only
    complete read requests.
  * For copen, ondemand_id must not be 0, because this indicates that the
    request has not been read by the daemon.
  * For cread, the object corresponding to fd and req should be the same.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-7-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
Baokun Li [Wed, 22 May 2024 11:43:01 +0000 (19:43 +0800)]
cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()

The err_put_fd label is only used once, so remove it to make the code
more readable. In addition, the logic for deleting error request and
CLOSE request is merged to simplify the code.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-6-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
Baokun Li [Wed, 22 May 2024 11:43:00 +0000 (19:43 +0800)]
cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()

We got the following issue in a fuzz test of randomly issuing the restore
command:

==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60
Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963

CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564
Call Trace:
 kasan_report+0x93/0xc0
 cachefiles_ondemand_daemon_read+0xb41/0xb60
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0

Allocated by task 116:
 kmem_cache_alloc+0x140/0x3a0
 cachefiles_lookup_cookie+0x140/0xcd0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]

Freed by task 792:
 kmem_cache_free+0xfe/0x390
 cachefiles_put_object+0x241/0x480
 fscache_cookie_state_machine+0x5c8/0x1230
 [...]
==================================================================

Following is the process that triggers the issue:

     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
cachefiles_withdraw_cookie
 cachefiles_ondemand_clean_object(object)
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)

            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              msg->object_id = req->object->ondemand->ondemand_id
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                  xas_for_each(&xas, req, ULONG_MAX)
                                   xas_set_mark(&xas, CACHEFILES_REQ_NEW)

                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req
              copy_to_user(_buffer, msg, n)
               xa_erase(&cache->reqs, id)
               complete(&REQ_A->done)
              ------ close(fd) ------
              cachefiles_ondemand_fd_release
               cachefiles_put_object
 cachefiles_put_object
  kmem_cache_free(cachefiles_object_jar, object)
                                    REQ_A->object->ondemand->ondemand_id
                                     // object UAF !!!

When we see the request within xa_lock, req->object must not have been
freed yet, so grab the reference count of object before xa_unlock to
avoid the above issue.

Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-5-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
Baokun Li [Wed, 22 May 2024 11:42:59 +0000 (19:42 +0800)]
cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()

We got the following issue in a fuzz test of randomly issuing the restore
command:

==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962

CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
Call Trace:
 kasan_report+0x94/0xc0
 cachefiles_ondemand_daemon_read+0x609/0xab0
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0

Allocated by task 626:
 __kmalloc+0x1df/0x4b0
 cachefiles_ondemand_send_req+0x24d/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]

Freed by task 626:
 kfree+0xf1/0x2c0
 cachefiles_ondemand_send_req+0x568/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]
==================================================================

Following is the process that triggers the issue:

     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)

            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              cachefiles_ondemand_get_fd
              copy_to_user(_buffer, msg, n)
            process_open_req(REQ_A)
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                  xas_for_each(&xas, req, ULONG_MAX)
                                   xas_set_mark(&xas, CACHEFILES_REQ_NEW);

                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req

             write(devfd, ("copen %u,%llu", msg->msg_id, size));
             cachefiles_ondemand_copen
              xa_erase(&cache->reqs, id)
              complete(&REQ_A->done)
   kfree(REQ_A)
                                    cachefiles_ondemand_get_fd(REQ_A)
                                     fd = get_unused_fd_flags
                                     file = anon_inode_getfile
                                     fd_install(fd, file)
                                     load = (void *)REQ_A->msg.data;
                                     load->fd = fd;
                                     // load UAF !!!

This issue is caused by issuing a restore command when the daemon is still
alive, which results in a request being processed multiple times thus
triggering a UAF. So to avoid this problem, add an additional reference
count to cachefiles_req, which is held while waiting and reading, and then
released when the waiting and reading is over.

Note that since there is only one reference count for waiting, we need to
avoid the same request being completed multiple times, so we can only
complete the request if it is successfully removed from the xarray.

Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-4-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: remove requests from xarray during flushing requests
Baokun Li [Wed, 22 May 2024 11:42:58 +0000 (19:42 +0800)]
cachefiles: remove requests from xarray during flushing requests

Even with CACHEFILES_DEAD set, we can still read the requests, so in the
following concurrency the request may be used after it has been freed:

     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
                                  // close dev fd
                                  cachefiles_flush_reqs
                                   complete(&REQ_A->done)
   kfree(REQ_A)
              xa_lock(&cache->reqs);
              cachefiles_ondemand_select_req
                req->msg.opcode != CACHEFILES_OP_READ
                // req use-after-free !!!
              xa_unlock(&cache->reqs);
                                   xa_destroy(&cache->reqs)

Hence remove requests from cache->reqs when flushing them to avoid
accessing freed requests.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-3-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 months agocachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
Baokun Li [Wed, 22 May 2024 11:42:57 +0000 (19:42 +0800)]
cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd

This lets us see the correct trace output.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-2-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
4 months agoReapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
Will Deacon [Wed, 22 May 2024 10:53:05 +0000 (11:53 +0100)]
Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"

This reverts commit b8995a18417088bb53f87c49d200ec72a9dd4ec1.

Ard managed to reproduce the dm-crypt corruption problem and got to the
bottom of it, so re-apply the problematic patch in preparation for
fixing things properly.

Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY
Jiangfeng Xiao [Mon, 20 May 2024 13:34:37 +0000 (21:34 +0800)]
arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY

When CONFIG_DEBUG_BUGVERBOSE=n, we fail to add necessary padding bytes
to bug_table entries, and as a result the last entry in a bug table will
be ignored, potentially leading to an unexpected panic(). All prior
entries in the table will be handled correctly.

The arm64 ABI requires that struct fields of up to 8 bytes are
naturally-aligned, with padding added within a struct such that struct
are suitably aligned within arrays.

When CONFIG_DEBUG_BUGVERPOSE=y, the layout of a bug_entry is:

struct bug_entry {
signed int      bug_addr_disp; // 4 bytes
signed int      file_disp; // 4 bytes
unsigned short  line; // 2 bytes
unsigned short  flags; // 2 bytes
}

... with 12 bytes total, requiring 4-byte alignment.

When CONFIG_DEBUG_BUGVERBOSE=n, the layout of a bug_entry is:

struct bug_entry {
signed int      bug_addr_disp; // 4 bytes
unsigned short  flags; // 2 bytes
< implicit padding > // 2 bytes
}

... with 8 bytes total, with 6 bytes of data and 2 bytes of trailing
padding, requiring 4-byte alginment.

When we create a bug_entry in assembly, we align the start of the entry
to 4 bytes, which implicitly handles padding for any prior entries.
However, we do not align the end of the entry, and so when
CONFIG_DEBUG_BUGVERBOSE=n, the final entry lacks the trailing padding
bytes.

For the main kernel image this is not a problem as find_bug() doesn't
depend on the trailing padding bytes when searching for entries:

for (bug = __start___bug_table; bug < __stop___bug_table; ++bug)
if (bugaddr == bug_addr(bug))
return bug;

However for modules, module_bug_finalize() depends on the trailing
bytes when calculating the number of entries:

mod->num_bugs = sechdrs[i].sh_size / sizeof(struct bug_entry);

... and as the last bug_entry lacks the necessary padding bytes, this entry
will not be counted, e.g. in the case of a single entry:

sechdrs[i].sh_size == 6
sizeof(struct bug_entry) == 8;

sechdrs[i].sh_size / sizeof(struct bug_entry) == 0;

Consequently module_find_bug() will miss the last bug_entry when it does:

for (i = 0; i < mod->num_bugs; ++i, ++bug)
if (bugaddr == bug_addr(bug))
goto out;

... which can lead to a kenrel panic due to an unhandled bug.

This can be demonstrated with the following module:

static int __init buginit(void)
{
WARN(1, "hello\n");
return 0;
}

static void __exit bugexit(void)
{
}

module_init(buginit);
module_exit(bugexit);
MODULE_LICENSE("GPL");

... which will trigger a kernel panic when loaded:

------------[ cut here ]------------
hello
Unexpected kernel BRK exception at EL1
Internal error: BRK handler: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in: hello(O+)
CPU: 0 PID: 50 Comm: insmod Tainted: G           O       6.9.1 #8
Hardware name: linux,dummy-virt (DT)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : buginit+0x18/0x1000 [hello]
lr : buginit+0x18/0x1000 [hello]
sp : ffff800080533ae0
x29: ffff800080533ae0 x28: 0000000000000000 x27: 0000000000000000
x26: ffffaba8c4e70510 x25: ffff800080533c30 x24: ffffaba8c4a28a58
x23: 0000000000000000 x22: 0000000000000000 x21: ffff3947c0eab3c0
x20: ffffaba8c4e3f000 x19: ffffaba846464000 x18: 0000000000000006
x17: 0000000000000000 x16: ffffaba8c2492834 x15: 0720072007200720
x14: 0720072007200720 x13: ffffaba8c49b27c8 x12: 0000000000000312
x11: 0000000000000106 x10: ffffaba8c4a0a7c8 x9 : ffffaba8c49b27c8
x8 : 00000000ffffefff x7 : ffffaba8c4a0a7c8 x6 : 80000000fffff000
x5 : 0000000000000107 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff3947c0eab3c0
Call trace:
 buginit+0x18/0x1000 [hello]
 do_one_initcall+0x80/0x1c8
 do_init_module+0x60/0x218
 load_module+0x1ba4/0x1d70
 __do_sys_init_module+0x198/0x1d0
 __arm64_sys_init_module+0x1c/0x28
 invoke_syscall+0x48/0x114
 el0_svc_common.constprop.0+0x40/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x34/0xd8
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x190/0x194
Code: d0ffffe0 910003fd 91000000 9400000b (d4210000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: BRK handler: Fatal exception

Fix this by always aligning the end of a bug_entry to 4 bytes, which is
correct regardless of CONFIG_DEBUG_BUGVERBOSE.

Fixes: 9fb7410f955f ("arm64/BUG: Use BRK instruction for generic BUG traps")

Signed-off-by: Yuanbin Xie <xieyuanbin1@huawei.com>
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1716212077-43826-1-git-send-email-xiaojiangfeng@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoperf/arm-dmc620: Fix lockdep assert in ->event_init()
Namhyung Kim [Tue, 14 May 2024 18:00:50 +0000 (11:00 -0700)]
perf/arm-dmc620: Fix lockdep assert in ->event_init()

for_each_sibling_event() checks leader's ctx but it doesn't have the ctx
yet if it's the leader.  Like in perf_event_validate_size(), we should
skip checking siblings in that case.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Fixes: f3c0eba28704 ("perf: Add a few assertions")
Reported-by: Greg Thelen <gthelen@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Tuan Phan <tuanphan@os.amperecomputing.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20240514180050.182454-1-namhyung@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoRevert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
Will Deacon [Fri, 17 May 2024 11:55:55 +0000 (12:55 +0100)]
Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"

This reverts commit 2632e25217696712681dd1f3ecc0d71624ea3b23.

Johannes (and others) report data corruption with dm-crypt on Apple M1
which has been bisected to this change. Revert the offending commit
while we figure out what's going on.

Cc: stable@vger.kernel.org
Reported-by: Johannes Nixdorf <mixi@shadowice.org>
Link: https://lore.kernel.org/all/D1B7GPIR9K1E.5JFV37G0YTIF@shadowice.org/
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoMerge branch 'for-next/errata' into for-next/core
Will Deacon [Fri, 10 May 2024 13:34:37 +0000 (14:34 +0100)]
Merge branch 'for-next/errata' into for-next/core

* for-next/errata:
  arm64: errata: Add workaround for Arm errata 3194386 and 3312417
  arm64: cputype: Add Neoverse-V3 definitions
  arm64: cputype: Add Cortex-X4 definitions
  arm64: barrier: Restore spec_bar() macro

4 months agoarm64: errata: Add workaround for Arm errata 3194386 and 3312417
Mark Rutland [Wed, 8 May 2024 08:14:00 +0000 (09:14 +0100)]
arm64: errata: Add workaround for Arm errata 3194386 and 3312417

Cortex-X4 and Neoverse-V3 suffer from errata whereby an MSR to the SSBS
special-purpose register does not affect subsequent speculative
instructions, permitting speculative store bypassing for a window of
time. This is described in their Software Developer Errata Notice (SDEN)
documents:

* Cortex-X4 SDEN v8.0, erratum 3194386:
  https://developer.arm.com/documentation/SDEN-2432808/0800/

* Neoverse-V3 SDEN v6.0, erratum 3312417:
  https://developer.arm.com/documentation/SDEN-2891958/0600/

To workaround these errata, it is necessary to place a speculation
barrier (SB) after MSR to the SSBS special-purpose register. This patch
adds the requisite SB after writes to SSBS within the kernel, and hides
the presence of SSBS from EL0 such that userspace software which cares
about SSBS will manipulate this via prctl(PR_GET_SPECULATION_CTRL, ...).

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240508081400.235362-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64: cputype: Add Neoverse-V3 definitions
Mark Rutland [Wed, 8 May 2024 08:13:59 +0000 (09:13 +0100)]
arm64: cputype: Add Neoverse-V3 definitions

Add cputype definitions for Neoverse-V3. These will be used for errata
detection in subsequent patches.

These values can be found in Table B-249 ("MIDR_EL1 bit descriptions")
in issue 0001-04 of the Neoverse-V3 TRM, which can be found at:

  https://developer.arm.com/documentation/107734/0001/?lang=en

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240508081400.235362-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64: cputype: Add Cortex-X4 definitions
Mark Rutland [Wed, 8 May 2024 08:13:58 +0000 (09:13 +0100)]
arm64: cputype: Add Cortex-X4 definitions

Add cputype definitions for Cortex-X4. These will be used for errata
detection in subsequent patches.

These values can be found in Table B-249 ("MIDR_EL1 bit descriptions")
in issue 0002-05 of the Cortex-X4 TRM, which can be found at:

  https://developer.arm.com/documentation/102484/0002/?lang=en

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240508081400.235362-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64: barrier: Restore spec_bar() macro
Mark Rutland [Wed, 8 May 2024 08:13:57 +0000 (09:13 +0100)]
arm64: barrier: Restore spec_bar() macro

Upcoming errata workarounds will need to use SB from C code. Restore the
spec_bar() macro so that we can use SB.

This is effectively a revert of commit:

  4f30ba1cce36d413 ("arm64: barrier: Remove spec_bar() macro")

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240508081400.235362-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoMerge branch 'for-next/tlbi' into for-next/core
Will Deacon [Thu, 9 May 2024 14:56:26 +0000 (15:56 +0100)]
Merge branch 'for-next/tlbi' into for-next/core

* for-next/tlbi:
  arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES
  arm64: tlb: Improve __TLBI_VADDR_RANGE()
  arm64: tlb: Fix TLBI RANGE operand

4 months agoMerge branch 'for-next/selftests' into for-next/core
Will Deacon [Thu, 9 May 2024 14:56:18 +0000 (15:56 +0100)]
Merge branch 'for-next/selftests' into for-next/core

* for-next/selftests:
  kselftest: arm64: Add a null pointer check
  kselftest/arm64: Remove unused parameters in abi test

4 months agoMerge branch 'for-next/perf' into for-next/core
Will Deacon [Thu, 9 May 2024 14:56:10 +0000 (15:56 +0100)]
Merge branch 'for-next/perf' into for-next/core

* for-next/perf: (41 commits)
  arm64: Add USER_STACKTRACE support
  drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
  drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
  drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
  perf/arm-spe: Assign parents for event_source device
  perf/arm-smmuv3: Assign parents for event_source device
  perf/arm-dsu: Assign parents for event_source device
  perf/arm-dmc620: Assign parents for event_source device
  perf/arm-ccn: Assign parents for event_source device
  perf/arm-cci: Assign parents for event_source device
  perf/alibaba_uncore: Assign parents for event_source device
  perf/arm_pmu: Assign parents for event_source devices
  perf/imx_ddr: Assign parents for event_source devices
  perf/qcom: Assign parents for event_source devices
  Documentation: qcom-pmu: Use /sys/bus/event_source/devices paths
  perf/riscv: Assign parents for event_source devices
  perf/thunderx2: Assign parents for event_source devices
  Documentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths
  perf/xgene: Assign parents for event_source devices
  Documentation: xgene-pmu: Use /sys/bus/event_source/devices paths
  ...

4 months agoMerge branch 'for-next/mm' into for-next/core
Will Deacon [Thu, 9 May 2024 14:55:54 +0000 (15:55 +0100)]
Merge branch 'for-next/mm' into for-next/core

* for-next/mm:
  arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2
  arm64/mm: Add uffd write-protect support
  arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
  arm64/mm: Remove PTE_PROT_NONE bit
  arm64/mm: generalize PMD_PRESENT_INVALID for all levels
  arm64: mm: Don't remap pgtables for allocate vs populate
  arm64: mm: Batch dsb and isb when populating pgtables
  arm64: mm: Don't remap pgtables per-cont(pte|pmd) block

4 months agoMerge branch 'for-next/misc' into for-next/core
Will Deacon [Thu, 9 May 2024 14:55:47 +0000 (15:55 +0100)]
Merge branch 'for-next/misc' into for-next/core

* for-next/misc:
  arm64: simplify arch_static_branch/_jump function
  arm64: Add the arm64.no32bit_el0 command line option
  arm64: defer clearing DAIF.D
  arm64: assembler: update stale comment for disable_step_tsk
  arm64/sysreg: Update PIE permission encodings
  arm64: Add Neoverse-V2 part
  arm64: Remove unnecessary irqflags alternative.h include

4 months agoMerge branch 'for-next/kbuild' into for-next/core
Will Deacon [Thu, 9 May 2024 14:55:39 +0000 (15:55 +0100)]
Merge branch 'for-next/kbuild' into for-next/core

* for-next/kbuild:
  arm64: boot: Support Flat Image Tree
  arm64: Add BOOT_TARGETS variable

4 months agoMerge branch 'for-next/acpi' into for-next/core
Will Deacon [Thu, 9 May 2024 14:55:27 +0000 (15:55 +0100)]
Merge branch 'for-next/acpi' into for-next/core

* for-next/acpi:
  arm64: acpi: Honour firmware_signature field of FACS, if it exists
  ACPICA: Detect FACS even for hardware reduced platforms

4 months agoarm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2
Ryan Roberts [Thu, 9 May 2024 12:28:42 +0000 (13:28 +0100)]
arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2

The recent change to use pud_valid() as part of the implementation of
pud_user_accessible_page() fails to build when PGTABLE_LEVELS <= 2
because pud_valid() is not defined in that case.

Fix this by defining pud_valid() to false for this case. This means that
pud_user_accessible_page() will correctly always return false for this
config.

Fixes: f0f5863a0fb0 ("arm64/mm: Remove PTE_PROT_NONE bit")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405082221.43rfWxz5-lkp@intel.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20240509122844.563320-1-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64/mm: Add uffd write-protect support
Ryan Roberts [Fri, 3 May 2024 14:46:02 +0000 (15:46 +0100)]
arm64/mm: Add uffd write-protect support

Let's use the newly-free PTE SW bit (58) to add support for uffd-wp.

The standard handlers are implemented for set/test/clear for both pte
and pmd. Additionally we must also track the uffd-wp state as a pte swp
bit, so use a free swap pte bit (3).

Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20240503144604.151095-5-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
Ryan Roberts [Fri, 3 May 2024 14:46:01 +0000 (15:46 +0100)]
arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG

PTE_PRESENT_INVALID was previously occupying bit 59, which when a PTE is
valid can either be IGNORED, PBHA[0] or AttrIndex[3], depending on the
HW configuration. In practice this is currently not a problem because
PTE_PRESENT_INVALID can only be 1 when PTE_VALID=0 and upstream Linux
always requires the bit set to 0 for a valid pte.

However, if in future Linux wants to use the field (e.g. AttrIndex[3])
then we could end up with confusion when PTE_PRESENT_INVALID comes along
and corrupts the field - we would ideally want to preserve it even for
an invalid (but present) pte.

The other problem with bit 59 is that it prevents the offset field of a
swap entry within a swap pte from growing beyond 51 bits. By moving
PTE_PRESENT_INVALID to a low bit we can lay the swap pte out so that the
offset field could grow to 52 bits in future.

So let's move PTE_PRESENT_INVALID to overlay PTE_NG (bit 11).

There is no need to persist NG for a present-invalid entry; it is always
set for user mappings and is not used by SW to derive any state from the
pte. PTE_NS was considered instead of PTE_NG, but it is RES0 for
non-secure SW, so there is a chance that future architecture may
allocate the bit and we may therefore need to persist that bit for
present-invalid ptes.

These are both marginal benefits, but make things a bit tidier in my
opinion.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20240503144604.151095-4-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64/mm: Remove PTE_PROT_NONE bit
Ryan Roberts [Fri, 3 May 2024 14:46:00 +0000 (15:46 +0100)]
arm64/mm: Remove PTE_PROT_NONE bit

Currently the PTE_PRESENT_INVALID and PTE_PROT_NONE functionality
explicitly occupy 2 bits in the PTE when PTE_VALID/PMD_SECT_VALID is
clear. This has 2 significant consequences:

  - PTE_PROT_NONE consumes a precious SW PTE bit that could be used for
    other things.
  - The swap pte layout must reserve those same 2 bits and ensure they
    are both always zero for a swap pte. It would be nice to reclaim at
    least one of those bits.

But PTE_PRESENT_INVALID, which since the previous patch, applies
uniformly to page/block descriptors at any level when PTE_VALID is
clear, can already give us most of what PTE_PROT_NONE requires: If it is
set, then the pte is still considered present; pte_present() returns
true and all the fields in the pte follow the HW interpretation (e.g. SW
can safely call pte_pfn(), etc). But crucially, the HW treats the pte as
invalid and will fault if it hits.

So let's remove PTE_PROT_NONE entirely and instead represent PROT_NONE
as a present but invalid pte (PTE_VALID=0, PTE_PRESENT_INVALID=1) with
PTE_USER=0 and PTE_UXN=1. This is a unique combination that is not used
anywhere else.

The net result is a clearer, simpler, more generic encoding scheme that
applies uniformly to all levels. Additionally we free up a PTE SW bit
and a swap pte bit (bit 58 in both cases).

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20240503144604.151095-3-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64/mm: generalize PMD_PRESENT_INVALID for all levels
Ryan Roberts [Fri, 3 May 2024 14:45:59 +0000 (15:45 +0100)]
arm64/mm: generalize PMD_PRESENT_INVALID for all levels

As preparation for the next patch, which frees up the PTE_PROT_NONE
present pte and swap pte bit, generalize PMD_PRESENT_INVALID to
PTE_PRESENT_INVALID. This will then be used to mark PROT_NONE ptes (and
entries at any other level) in the next patch.

While we're at it, fix up the swap pte format comment to include
PTE_PRESENT_INVALID. This is not new, it just wasn't previously
documented.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20240503144604.151095-2-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64: simplify arch_static_branch/_jump function
George Guo [Tue, 30 Apr 2024 08:56:55 +0000 (16:56 +0800)]
arm64: simplify arch_static_branch/_jump function

Extracted the jump table definition code from the arch_static_branch and
arch_static_branch_jump functions into a macro JUMP_TABLE_ENTRY to reduce
code duplication.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
Link: https://lore.kernel.org/r/20240430085655.2798551-2-dongtai.guo@linux.dev
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64: Add USER_STACKTRACE support
chenqiwu [Tue, 19 Dec 2023 02:22:29 +0000 (10:22 +0800)]
arm64: Add USER_STACKTRACE support

Currently, userstacktrace is unsupported for ftrace and uprobe
tracers on arm64. This patch uses the perf_callchain_user() code
as blueprint to implement the arch_stack_walk_user() which add
userstacktrace support on arm64.
Meanwhile, we can use arch_stack_walk_user() to simplify the
implementation of perf_callchain_user().
This patch is tested pass with ftrace, uprobe and perf tracers
profiling userstacktrace cases.

Tested-by: chenqiwu <qiwu.chen@transsion.com>
Signed-off-by: chenqiwu <qiwu.chen@transsion.com>
Link: https://lore.kernel.org/r/20231219022229.10230-1-qiwu.chen@transsion.com
Signed-off-by: Will Deacon <will@kernel.org>
4 months agoarm64: Add the arm64.no32bit_el0 command line option
Andrea della Porta [Mon, 29 Apr 2024 10:28:33 +0000 (12:28 +0200)]
arm64: Add the arm64.no32bit_el0 command line option

Introducing the field 'el0' to the idreg-override for register
ID_AA64PFR0_EL1. This field is also aliased to the new kernel
command line option 'arm64.no32bit_el0' as a more recognizable
and mnemonic name to disable the execution of 32 bit userspace
applications (i.e. avoid Aarch32 execution state in EL0) from
kernel command line.

Link: https://lore.kernel.org/all/20240207105847.7739-1-andrea.porta@suse.com/
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Link: https://lore.kernel.org/r/20240429102833.6426-1-andrea.porta@suse.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agodrivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
Hao Chen [Thu, 25 Apr 2024 12:46:27 +0000 (20:46 +0800)]
drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()

pci_alloc_irq_vectors() allocates an irq vector. When devm_add_action()
fails, the irq vector is not freed, which leads to a memory leak.

Replace the devm_add_action with devm_add_action_or_reset to ensure
the irq vector can be destroyed when it fails.

Fixes: 66637ab137b4 ("drivers/perf: hisi: add driver for HNS3 PMU")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240425124627.13764-4-hejunhao3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agodrivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
Junhao He [Thu, 25 Apr 2024 12:46:26 +0000 (20:46 +0800)]
drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group

The perf tool allows users to create event groups through following
cmd [1], but the driver does not check whether the array index is out
of bounds when writing data to the event_group array. If the number of
events in an event_group is greater than HNS3_PMU_MAX_HW_EVENTS, the
memory write overflow of event_group array occurs.

Add array index check to fix the possible array out of bounds violation,
and return directly when write new events are written to array bounds.

There are 9 different events in an event_group.
[1] perf stat -e '{pmu/event1/, ... ,pmu/event9/}

Fixes: 66637ab137b4 ("drivers/perf: hisi: add driver for HNS3 PMU")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20240425124627.13764-3-hejunhao3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agodrivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
Junhao He [Thu, 25 Apr 2024 12:46:25 +0000 (20:46 +0800)]
drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group

The perf tool allows users to create event groups through following
cmd [1], but the driver does not check whether the array index is out of
bounds when writing data to the event_group array. If the number of events
in an event_group is greater than HISI_PCIE_MAX_COUNTERS, the memory write
overflow of event_group array occurs.

Add array index check to fix the possible array out of bounds violation,
and return directly when write new events are written to array bounds.

There are 9 different events in an event_group.
[1] perf stat -e '{pmu/event1/, ... ,pmu/event9/}'

Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240425124627.13764-2-hejunhao3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agokselftest: arm64: Add a null pointer check
Kunwu Chan [Tue, 23 Apr 2024 08:21:02 +0000 (16:21 +0800)]
kselftest: arm64: Add a null pointer check

There is a 'malloc' call, which can be unsuccessful.
This patch will add the malloc failure checking
to avoid possible null dereference and give more information
about test fail reasons.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240423082102.2018886-1-chentao@kylinos.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: defer clearing DAIF.D
Mark Rutland [Mon, 22 Apr 2024 11:35:23 +0000 (12:35 +0100)]
arm64: defer clearing DAIF.D

For historical reasons we unmask debug exceptions in __cpu_setup(), but
it's not necessary to unmask debug exceptions this early in the
boot/idle entry paths. It would be better to unmask debug exceptions
later in C code as this simplifies the current code and will make it
easier to rework exception masking logic to handle non-DAIF bits in
future (e.g. PSTATE.{ALLINT,PM}).

We started clearing DAIF.D in __cpu_setup() in commit:

  2ce39ad15182604b ("arm64: debug: unmask PSTATE.D earlier")

At the time, we needed to ensure that DAIF.D was clear on the primary
CPU before scheduling and preemption were possible, and chose to do this
in __cpu_setup() so that this occurred in the same place for primary and
secondary CPUs. As we cannot handle debug exceptions this early, we
placed an ISB between initializing MDSCR_EL1 and clearing DAIF.D so that
no exceptions should be triggered.

Subsequently we rewrote the return-from-{idle,suspend} paths to use
__cpu_setup() in commit:

  cabe1c81ea5be983 ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va")

... which allowed for earlier use of the MMU and had the desirable
property of using the same code to reset the CPU in the cold and warm
boot paths. This introduced a bug: DAIF.D was clear while
cpu_do_resume() restored MDSCR_EL1 and other control registers (e.g.
breakpoint/watchpoint control/value registers), and so we could
unexpectedly take debug exceptions.

We fixed that in commit:

  744c6c37cc18705d ("arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1")

... by having cpu_do_resume() use the `disable_dbg` macro to set DAIF.D
before restoring MDSCR_EL1 and other control registers. This relies on
DAIF.D being subsequently cleared again in cpu_resume().

Subsequently we reworked DAIF masking in commit:

  0fbeb318754860b3 ("arm64: explicitly mask all exceptions")

... where we began enforcing a policy that DAIF.D being set implies all
other DAIF bits are set, and so e.g. we cannot take an IRQ while DAIF.D
is set. As part of this the use of `disable_dbg` in cpu_resume() was
replaced with `disable_daif` for consistency with the rest of the
kernel.

These days, there's no need to clear DAIF.D early within __cpu_setup():

* setup_arch() clears DAIF.DA before scheduling and preemption are
  possible on the primary CPU, avoiding the problem we we originally
  trying to work around.

  Note: DAIF.IF get cleared later when interrupts are enabled for the
  first time.

* secondary_start_kernel() clears all DAIF bits before scheduling and
  preemption are possible on secondary CPUs.

  Note: with pseudo-NMI, the PMR is initialized here before any DAIF
  bits are cleared. Similar will be necessary for the architectural NMI.

* cpu_suspend() restores all DAIF bits when returning from idle,
  ensuring that we don't unexpectedly leave DAIF.D clear or set.

  Note: with pseudo-NMI, the PMR is initialized here before DAIF is
  cleared. Similar will be necessary for the architectural NMI.

This patch removes the unmasking of debug exceptions from __cpu_setup(),
relying on the above locations to initialize DAIF. This allows some
other cleanups:

* It is no longer necessary for cpu_resume() to explicitly mask debug
  (or other) exceptions, as it is always called with all DAIF bits set.
  Thus we drop the use of `disable_daif`.

* The `enable_dbg` macro is no longer used, and so is dropped.

* It is no longer necessary to have an ISB immediately after
  initializing MDSCR_EL1 in __cpu_setup(), and we can revert to relying
  on the context synchronization that occurs when the MMU is enabled
  between __cpu_setup() and code which clears DAIF.D

Comments are added to setup_arch() and secondary_start_kernel() to
explain the initial unmasking of the DAIF bits.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240422113523.4070414-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: assembler: update stale comment for disable_step_tsk
Mark Rutland [Mon, 22 Apr 2024 11:35:22 +0000 (12:35 +0100)]
arm64: assembler: update stale comment for disable_step_tsk

A comment in the disable_step_tsk macro refers to synchronising with
enable_dbg, as historically the entry used enable_dbg to unmask debug
exceptions after disabling single-stepping.

These days the unmasking happens in entry-common.c via
local_daif_restore() or local_daif_inherit(), so the comment is stale.
This logic is likely to chang in future, so it would be best to avoid
referring to those macros specifically.

Update the comment to take this into account, and describe it in terms
of clearing DAIF.D so that it doesn't macro where this logic lives nor
what it is called.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240422113523.4070414-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64/sysreg: Update PIE permission encodings
Shiqi Liu [Sun, 21 Apr 2024 06:33:28 +0000 (14:33 +0800)]
arm64/sysreg: Update PIE permission encodings

Fix left shift overflow issue when the parameter idx is greater than or
equal to 8 in the calculation of perm in PIRx_ELx_PERM macro.

Fix this by modifying the encoding to use a long integer type.

Signed-off-by: Shiqi Liu <shiqiliu@hust.edu.cn>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240421063328.29710-1-shiqiliu@hust.edu.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agokselftest/arm64: Remove unused parameters in abi test
xieming [Mon, 22 Apr 2024 01:57:30 +0000 (09:57 +0800)]
kselftest/arm64: Remove unused parameters in abi test

Remove unused parameter i in tpidr2.c main function.

Signed-off-by: xieming <xieming@kylinos.cn>
Link: https://lore.kernel.org/r/20240422015730.89805-1-xieming@kylinos.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-spe: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:50 +0000 (17:10 +0100)]
perf/arm-spe: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-24-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-smmuv3: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:49 +0000 (17:10 +0100)]
perf/arm-smmuv3: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-23-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-dsu: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:48 +0000 (17:10 +0100)]
perf/arm-dsu: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-22-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-dmc620: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:47 +0000 (17:10 +0100)]
perf/arm-dmc620: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-21-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-ccn: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:46 +0000 (17:10 +0100)]
perf/arm-ccn: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-20-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-cci: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:45 +0000 (17:10 +0100)]
perf/arm-cci: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-19-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/alibaba_uncore: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:44 +0000 (17:10 +0100)]
perf/alibaba_uncore: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-18-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm_pmu: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:43 +0000 (17:10 +0100)]
perf/arm_pmu: Assign parents for event_source devices

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20240412161057.14099-17-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/imx_ddr: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:42 +0000 (17:10 +0100)]
perf/imx_ddr: Assign parents for event_source devices

Currently all this device appear directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Cc: Frank Li <Frank.li@nxp.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-16-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/qcom: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:41 +0000 (17:10 +0100)]
perf/qcom: Assign parents for event_source devices

Currently all these devices appear directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parents to be the platform devices.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-15-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoDocumentation: qcom-pmu: Use /sys/bus/event_source/devices paths
Jonathan Cameron [Fri, 12 Apr 2024 16:10:40 +0000 (17:10 +0100)]
Documentation: qcom-pmu: Use /sys/bus/event_source/devices paths

To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-14-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/riscv: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:39 +0000 (17:10 +0100)]
perf/riscv: Assign parents for event_source devices

Currently all these devices appear directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parents to be the appropriate platform devices.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Cc: Atish Patra <atishp@atishpatra.org>
CC: Anup Patel <anup@brainfault.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-13-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/thunderx2: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:38 +0000 (17:10 +0100)]
perf/thunderx2: Assign parents for event_source devices

Currently all these devices appear directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parents to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-12-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoDocumentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths
Jonathan Cameron [Fri, 12 Apr 2024 16:10:37 +0000 (17:10 +0100)]
Documentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths

To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-11-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/xgene: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:36 +0000 (17:10 +0100)]
perf/xgene: Assign parents for event_source devices

Currently all these devices appear directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parents to be the hardware related struct device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Cc: Khuong Dinh <khuong@os.amperecomputing.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-10-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoDocumentation: xgene-pmu: Use /sys/bus/event_source/devices paths
Jonathan Cameron [Fri, 12 Apr 2024 16:10:35 +0000 (17:10 +0100)]
Documentation: xgene-pmu: Use /sys/bus/event_source/devices paths

To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-9-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm_cspmu: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:34 +0000 (17:10 +0100)]
perf/arm_cspmu: Assign parents for event_source devices

Currently all these devices appear directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parents to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-8-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/amlogic: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:33 +0000 (17:10 +0100)]
perf/amlogic: Assign parents for event_source devices

Currently all these devices appear directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parents to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Reviewed-by: Jiucheng Xu <jiucheng.xu@amlogic.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-7-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/hisi-hns3: Assign parents for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:32 +0000 (17:10 +0100)]
perf/hisi-hns3: Assign parents for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the PCI device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-6-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoDocumentation: hns-pmu: Use /sys/bus/event_source/devices paths
Jonathan Cameron [Fri, 12 Apr 2024 16:10:31 +0000 (17:10 +0100)]
Documentation: hns-pmu: Use /sys/bus/event_source/devices paths

To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-5-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/hisi-uncore: Assign parents for event_source devices
Jonathan Cameron [Fri, 12 Apr 2024 16:10:30 +0000 (17:10 +0100)]
perf/hisi-uncore: Assign parents for event_source devices

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the platform device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-4-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoDocumentation: hisi-pmu: Drop reference to /sys/devices path
Jonathan Cameron [Fri, 12 Apr 2024 16:10:29 +0000 (17:10 +0100)]
Documentation: hisi-pmu: Drop reference to /sys/devices path

Having assigned a parent to the device, the suggested path is
no longer valid.  As /sys/bus/event_sources based path is also
provided, simply drop mention of alternative.

Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-3-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/hisi-pcie: Assign parent for event_source device
Jonathan Cameron [Fri, 12 Apr 2024 16:10:28 +0000 (17:10 +0100)]
perf/hisi-pcie: Assign parent for event_source device

Currently the PMU device appears directly under /sys/devices/
Only root busses should appear there, so instead assign the pmu->dev
parent to be the PCI device.

Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-2-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: Add Neoverse-V2 part
Besar Wicaksono [Tue, 9 Jan 2024 19:23:08 +0000 (13:23 -0600)]
arm64: Add Neoverse-V2 part

Add the part number and MIDR for Neoverse-V2

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240109192310.16234-2-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: acpi: Honour firmware_signature field of FACS, if it exists
David Woodhouse [Mon, 11 Mar 2024 13:04:07 +0000 (13:04 +0000)]
arm64: acpi: Honour firmware_signature field of FACS, if it exists

If the firmware_signature changes then OSPM should not attempt to resume
from hibernate, but should instead perform a clean reboot. Set the global
swsusp_hardware_signature to allow the generic code to include the value
in the swsusp header on disk, and perform the appropriate check on resume.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20240412073530.2222496-3-dwmw2@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoACPICA: Detect FACS even for hardware reduced platforms
David Woodhouse [Mon, 11 Mar 2024 12:19:14 +0000 (12:19 +0000)]
ACPICA: Detect FACS even for hardware reduced platforms

ACPICA commit 44fc328a1a14b097d92b8be83989e4bf69b6e6cb

The FACS is optional even on hardware reduced platforms, and may exist
for the purpose of communicating the hardware_signature field to provoke
a clean reboot instead of a resume from hibernation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20240412073530.2222496-2-dwmw2@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: mm: Don't remap pgtables for allocate vs populate
Ryan Roberts [Fri, 12 Apr 2024 13:19:08 +0000 (14:19 +0100)]
arm64: mm: Don't remap pgtables for allocate vs populate

During linear map pgtable creation, each pgtable is fixmapped /
fixunmapped twice; once during allocation to zero the memory, and a
again during population to write the entries. This means each table has
2 TLB invalidations issued against it. Let's fix this so that each table
is only fixmapped/fixunmapped once, halving the number of TLBIs, and
improving performance.

Achieve this by separating allocation and initialization (zeroing) of
the page. The allocated page is now fixmapped directly by the walker and
initialized, before being populated and finally fixunmapped.

This approach keeps the change small, but has the side effect that late
allocations (using __get_free_page()) must also go through the generic
memory clearing routine. So let's tell __get_free_page() not to zero the
memory to avoid duplication.

Additionally this approach means that fixmap/fixunmap is still used for
late pgtable modifications. That's not technically needed since the
memory is all mapped in the linear map by that point. That's left as a
possible future optimization if found to be needed.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   11   (0%) |  161   (0%) |  656   (0%) |  1654   (0%)
after          |   10 (-11%) |  104 (-35%) |  438 (-33%) |  1223 (-26%)

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240412131908.433043-4-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: mm: Batch dsb and isb when populating pgtables
Ryan Roberts [Fri, 12 Apr 2024 13:19:07 +0000 (14:19 +0100)]
arm64: mm: Batch dsb and isb when populating pgtables

After removing uneccessary TLBIs, the next bottleneck when creating the
page tables for the linear map is DSB and ISB, which were previously
issued per-pte in __set_pte(). Since we are writing multiple ptes in a
given pte table, we can elide these barriers and insert them once we
have finished writing to the table.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   78   (0%) |  435   (0%) | 1723   (0%) |  3779   (0%)
after          |   11 (-86%) |  161 (-63%) |  656 (-62%) |  1654 (-56%)

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240412131908.433043-3-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: mm: Don't remap pgtables per-cont(pte|pmd) block
Ryan Roberts [Fri, 12 Apr 2024 13:19:06 +0000 (14:19 +0100)]
arm64: mm: Don't remap pgtables per-cont(pte|pmd) block

A large part of the kernel boot time is creating the kernel linear map
page tables. When rodata=full, all memory is mapped by pte. And when
there is lots of physical ram, there are lots of pte tables to populate.
The primary cost associated with this is mapping and unmapping the pte
table memory in the fixmap; at unmap time, the TLB entry must be
invalidated and this is expensive.

Previously, each pmd and pte table was fixmapped/fixunmapped for each
cont(pte|pmd) block of mappings (16 entries with 4K granule). This means
we ended up issuing 32 TLBIs per (pmd|pte) table during the population
phase.

Let's fix that, and fixmap/fixunmap each page once per population, for a
saving of 31 TLBIs per (pmd|pte) table. This gives a significant boot
speedup.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |  168   (0%) | 2198   (0%) | 8644   (0%) | 17447   (0%)
after          |   78 (-53%) |  435 (-80%) | 1723 (-80%) |  3779 (-78%)

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240412131908.433043-2-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: boot: Support Flat Image Tree
Simon Glass [Fri, 29 Mar 2024 03:28:36 +0000 (16:28 +1300)]
arm64: boot: Support Flat Image Tree

Add a script which produces a Flat Image Tree (FIT), a single file
containing the built kernel and associated devicetree files.
Compression defaults to gzip which gives a good balance of size and
performance.

The files compress from about 86MB to 24MB using this approach.

The FIT can be used by bootloaders which support it, such as U-Boot
and Linuxboot. It permits automatic selection of the correct
devicetree, matching the compatible string of the running board with
the closest compatible string in the FIT. There is no need for
filenames or other workarounds.

Add a 'make image.fit' build target for arm64, as well.

The FIT can be examined using 'dumpimage -l'.

This uses the 'dtbs-list' file but processes only .dtb files, ignoring
the overlay .dtbo files.

This features requires pylibfdt (use 'pip install libfdt'). It also
requires compression utilities for the algorithm being used. Supported
compression options are the same as the Image.xxx files. Use
FIT_COMPRESSION to select an algorithm other than gzip.

While FIT supports a ramdisk / initrd, no attempt is made to support
this here, since it must be built separately from the Linux build.

Signed-off-by: Simon Glass <sjg@chromium.org>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240329032836.141899-3-sjg@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: Add BOOT_TARGETS variable
Simon Glass [Fri, 29 Mar 2024 03:28:35 +0000 (16:28 +1300)]
arm64: Add BOOT_TARGETS variable

Add a new variable containing a list of possible targets. Mark them as
phony. This matches the approach taken for arch/arm

Signed-off-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Link: https://lore.kernel.org/r/20240329032836.141899-2-sjg@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: arm_pmuv3: Correctly extract and check the PMUVer
Yicong Yang [Thu, 11 Apr 2024 12:30:30 +0000 (20:30 +0800)]
arm64: arm_pmuv3: Correctly extract and check the PMUVer

Currently we're using "sbfx" to extract the PMUVer from ID_AA64DFR0_EL1
and skip the init/reset if no PMU present when the extracted PMUVer is
negative or is zero. However for PMUv3p8 the PMUVer will be 0b1000 and
PMUVer extracted by "sbfx" will always be negative and we'll skip the
init/reset in __init_el2_debug/reset_pmuserenr_el0 unexpectedly.

So this patch use "ubfx" instead of "sbfx" to extract the PMUVer. If
the PMUVer is implementation defined (0b1111) or not implemented(0b0000)
then skip the reset/init. Previously we'll also skip the init/reset
if the PMUVer is higher than the version we known (currently PMUv3p9),
with this patch we'll only skip if the PMU is not implemented or
implementation defined. This keeps consistence with how we probe
the PMU in the driver with pmuv3_implemented().

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240411123030.7201-1-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES
Gavin Shan [Fri, 5 Apr 2024 03:58:52 +0000 (13:58 +1000)]
arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES

MAX_TLBI_RANGE_PAGES pages is covered by SCALE#3 and NUM#31 and it's
supported now. Allow TLBI RANGE operation when the number of pages is
equal to MAX_TLBI_RANGE_PAGES in __flush_tlb_range_nosync().

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240405035852.1532010-4-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: tlb: Improve __TLBI_VADDR_RANGE()
Gavin Shan [Fri, 5 Apr 2024 03:58:51 +0000 (13:58 +1000)]
arm64: tlb: Improve __TLBI_VADDR_RANGE()

The macro returns the operand of TLBI RANGE instruction. A mask needs
to be applied to each individual field upon producing the operand, to
avoid the adjacent fields can interfere with each other when invalid
arguments have been provided. The code looks more tidy at least with
a mask and FIELD_PREP().

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240405035852.1532010-3-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoarm64: tlb: Fix TLBI RANGE operand
Gavin Shan [Fri, 5 Apr 2024 03:58:50 +0000 (13:58 +1000)]
arm64: tlb: Fix TLBI RANGE operand

KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
pages are collected by VMM and the page table entries become write
protected during live migration. Unfortunately, the operand passed
to the TLBI RANGE instruction isn't correctly sorted out due to the
commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
It leads to crash on the destination VM after live migration because
TLBs aren't flushed completely and some of the dirty pages are missed.

For example, I have a VM where 8GB memory is assigned, starting from
0x40000000 (1GB). Note that the host has 4KB as the base page size.
In the middile of migration, kvm_tlb_flush_vmid_range() is executed
to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
__kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported
by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
in the __flush_tlb_range_op() until the variable @scale underflows
and becomes -9, 0xffff708000040000 is set as the operand. The operand
is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
invalid @scale and @num.

Fix it by extending __TLBI_RANGE_NUM() to support the combination of
SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
be returned from the macro, meaning the TLBs for 0x200000 pages in the
above example can be flushed in one shoot with SCALE#3 and NUM#31. The
macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
comments are also adjusted accordingly.

Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
Cc: stable@kernel.org # v6.6+
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240405035852.1532010-2-gshan@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
5 months agoarm64: Remove unnecessary irqflags alternative.h include
Jinjie Ruan [Thu, 14 Mar 2024 06:38:19 +0000 (14:38 +0800)]
arm64: Remove unnecessary irqflags alternative.h include

Since commit 20af807d806d ("arm64: Avoid cpus_have_const_cap() for
ARM64_HAS_GIC_PRIO_MASKING"), the alternative.h include is not used,
so remove it.

Fixes: 20af807d806d ("arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20240314063819.2636445-1-ruanjinjie@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-cmn: Set PMU device parent
Robin Murphy [Tue, 9 Apr 2024 17:15:17 +0000 (18:15 +0100)]
perf/arm-cmn: Set PMU device parent

Now that perf supports giving the PMU device a parent, we can use our
platform device to make the relationship between CMN instances and PMU
IDs trivially discoverable, from either nominal direction:

root@crazy-taxi:~# ls /sys/devices/platform/ARMHC600:00 | grep cmn
arm_cmn_0
root@crazy-taxi:~# realpath /sys/bus/event_source/devices/arm_cmn_0/..
/sys/devices/platform/ARMHC600:00

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/25d4428df1ddad966c74a3ed60171cd3ca6c8b66.1712682917.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/thunderx2: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:50 +0000 (23:59 +0800)]
perf/thunderx2: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Link: https://lore.kernel.org/r/20240403155950.2068109-11-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/qcom_l2: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:49 +0000 (23:59 +0800)]
perf/qcom_l2: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Link: https://lore.kernel.org/r/20240403155950.2068109-10-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/hisi_uncore: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:48 +0000 (23:59 +0800)]
perf/hisi_uncore: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240403155950.2068109-9-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/hisi_pcie: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:47 +0000 (23:59 +0800)]
perf/hisi_pcie: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240403155950.2068109-8-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/dwc_pcie: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:46 +0000 (23:59 +0800)]
perf/dwc_pcie: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240403155950.2068109-7-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm_dsu: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:45 +0000 (23:59 +0800)]
perf/arm_dsu: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Link: https://lore.kernel.org/r/20240403155950.2068109-6-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm_cspmu: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:44 +0000 (23:59 +0800)]
perf/arm_cspmu: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Link: https://lore.kernel.org/r/20240403155950.2068109-5-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/arm-cmn: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:43 +0000 (23:59 +0800)]
perf/arm-cmn: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Link: https://lore.kernel.org/r/20240403155950.2068109-4-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoperf/alibaba_uncore_drw: Avoid placing cpumask on the stack
Dawei Li [Wed, 3 Apr 2024 15:59:42 +0000 (23:59 +0800)]
perf/alibaba_uncore_drw: Avoid placing cpumask on the stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240403155950.2068109-3-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agocpumask: add cpumask_any_and_but()
Mark Rutland [Wed, 3 Apr 2024 15:59:41 +0000 (23:59 +0800)]
cpumask: add cpumask_any_and_but()

In some cases, it's useful to be able to select a random cpu from the
intersection of two masks, excluding a particular CPU.

For example, in some systems an uncore PMU is shared by a subset of
CPUs, and management of this PMU is assigned to some arbitrary CPU in
this set. Whenever the management CPU is hotplugged out, we wish to
migrate responsibility to another arbitrary CPU which is both in this
set and online.

Today we can use cpumask_any_and() to select an arbitrary CPU in the
intersection of two masks. We can also use cpumask_any_but() to select
any arbitrary cpu in a mask excluding, a particular CPU.

To do both, we either need to use a temporary cpumask, which is
wasteful, or use some lower-level cpumask helpers, which can be unclear.

This patch adds a new cpumask_any_and_but() to cater for these cases.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Acked-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20240403155950.2068109-2-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
5 months agodrivers/perf: thunderx2_pmu: Replace open coded acpi_match_acpi_device()
Andy Shevchenko [Thu, 4 Apr 2024 16:59:23 +0000 (19:59 +0300)]
drivers/perf: thunderx2_pmu: Replace open coded acpi_match_acpi_device()

Replace open coded acpi_match_acpi_device() in get_tx2_pmu_type().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240404170016.2466898-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agodrivers: perf: Remove the now superfluous sentinel elements from ctl_table array
Joel Granados [Thu, 28 Mar 2024 15:57:54 +0000 (16:57 +0100)]
drivers: perf: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which will
reduce the overall build time size of the kernel and run time memory
bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel from sbi_pmu_sysctl_table

Signed-off-by: Joel Granados <j.granados@samsung.com>
Link: https://lore.kernel.org/r/20240328-jag-sysctl_remset_misc-v1-7-47c1463b3af2@samsung.com
Signed-off-by: Will Deacon <will@kernel.org>
5 months agoLinux 6.9-rc3 v6.9-rc3
Linus Torvalds [Sun, 7 Apr 2024 20:22:46 +0000 (13:22 -0700)]
Linux 6.9-rc3

5 months agoMerge tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 7 Apr 2024 16:33:21 +0000 (09:33 -0700)]
Merge tag 'x86-urgent-2024-04-07' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix MCE timer reinit locking

 - Fix/improve CoCo guest random entropy pool init

 - Fix SEV-SNP late disable bugs

 - Fix false positive objtool build warning

 - Fix header dependency bug

 - Fix resctrl CPU offlining bug

* tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
  x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
  x86/CPU/AMD: Track SNP host status with cc_platform_*()
  x86/cc: Add cc_platform_set/_clear() helpers
  x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
  x86/coco: Require seeding RNG with RDRAND on CoCo systems
  x86/numa/32: Include missing <asm/pgtable_areas.h>
  x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline

5 months agoMerge tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 7 Apr 2024 16:20:50 +0000 (09:20 -0700)]
Merge tag 'timers-urgent-2024-04-07' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:
 "Fix various timer bugs:

   - Fix a timer migration bug that may result in missed events

   - Fix timer migration group hierarchy event updates

   - Fix a PowerPC64 build warning

   - Fix a handful of DocBook annotation bugs"

* tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Return early on deactivation
  timers/migration: Fix ignored event due to missing CPU update
  vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h
  timers: Fix text inconsistencies and spelling
  tick/sched: Fix struct tick_sched doc warnings
  tick/sched: Fix various kernel-doc warnings
  timers: Fix kernel-doc format and add Return values
  time/timekeeping: Fix kernel-doc warnings and typos
  time/timecounter: Fix inline documentation

5 months agoMerge tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 7 Apr 2024 16:14:46 +0000 (09:14 -0700)]
Merge tag 'perf-urgent-2024-04-07' of git://git./linux/kernel/git/tip/tip

Pull x86 perf fix from Ingo Molnar:
 "Fix a combined PEBS events bug on x86 Intel CPUs"

* tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event

5 months agoMerge tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Sat, 6 Apr 2024 16:37:50 +0000 (09:37 -0700)]
Merge tag 'nfsd-6.9-2' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Address a slow memory leak with RPC-over-TCP

 - Prevent another NFS4ERR_DELAY loop during CREATE_SESSION

* tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
  SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP

5 months agoMerge tag 'i2c-for-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 6 Apr 2024 16:27:36 +0000 (09:27 -0700)]
Merge tag 'i2c-for-6.9-rc3' of git://git./linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:
 "A host driver build fix"

* tag 'i2c-for-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: pxa: hide unused icr_bits[] variable

5 months agoMerge tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 6 Apr 2024 16:14:18 +0000 (09:14 -0700)]
Merge tag 'xfs-6.9-fixes-2' of git://git./fs/xfs/xfs-linux

Pull xfs fix from Chandan Babu:

 - Allow creating new links to special files which were not associated
   with a project quota

* tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: allow cross-linking special files without project quota

5 months agoMerge tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 6 Apr 2024 16:06:17 +0000 (09:06 -0700)]
Merge tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix to retry close to avoid potential handle leaks when server
   returns EBUSY

 - DFS fixes including a fix for potential use after free

 - fscache fix

 - minor strncpy cleanup

 - reconnect race fix

 - deal with various possible UAF race conditions tearing sessions down

* tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
  smb: client: fix potential UAF in smb2_is_network_name_deleted()
  smb: client: fix potential UAF in is_valid_oplock_break()
  smb: client: fix potential UAF in smb2_is_valid_oplock_break()
  smb: client: fix potential UAF in smb2_is_valid_lease_break()
  smb: client: fix potential UAF in cifs_stats_proc_show()
  smb: client: fix potential UAF in cifs_stats_proc_write()
  smb: client: fix potential UAF in cifs_dump_full_key()
  smb: client: fix potential UAF in cifs_debug_files_proc_show()
  smb3: retrying on failed server close
  smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex
  smb: client: handle DFS tcons in cifs_construct_tcon()
  smb: client: refresh referral without acquiring refpath_lock
  smb: client: guarantee refcounted children from parent session
  cifs: Fix caching to try to do open O_WRONLY as rdwr on server
  smb: client: fix UAF in smb2_reconnect_server()
  smb: client: replace deprecated strncpy with strscpy

5 months agox86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
Borislav Petkov (AMD) [Fri, 5 Apr 2024 14:46:37 +0000 (16:46 +0200)]
x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk

srso_alias_untrain_ret() is special code, even if it is a dummy
which is called in the !SRSO case, so annotate it like its real
counterpart, to address the following objtool splat:

  vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0

Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org